Thursday, October 22, 2009

Object navigation: why is it okay for VoiceOver but not NVDA?

Since VoiceOver came to Mac OS X, many screen reader users have joyfully sung its praise - and there's certainly a lot to praise. Some who were previously Windows users have taken the plunge and moved to Mac OS X completely, never looking back. Others use both operating systems or perhaps dream of no longer being dependent on Windows for certain software. This is all fantastic, but as an NVDA developer, there are also aspects of this that frustrate and exasperate me.

Right from the start, NVDA has used "object navigation" in order to review the user interface, particularly to provide access to elements which can't be otherwise accessed using the keyboard. In short, every element in the user interface is generally represented by an object. These objects exist in a hierarchy or tree of objects. Object navigation allows the user to explore this hierarchy by moving between objects and then descending/entering/interacting with objects of interest. For example, entering a list would allow you to see its list items. This is in contrast to the "flat review" (or "screen review") method that Windows screen readers have traditionally used for review, whereby the user can review the content of the screen in a flat fashion from top to bottom, left to right, similar to the way a simple text document would be read. While this might seem more logical at first, it's worth remembering that a sighted user doesn't necessarily read the screen in this ordered fashion. Instead, they are more likely to focus on specific elements of interest, which in some ways is more akin to object navigation. As always, both methods have advantages and disadvantages.
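
To make the difference concrete, here is a minimal sketch of the two review styles over a toy object tree. Everything in it is hypothetical: `UIObject`, `ObjectNavigator` and `flat_review` are illustrative names, not NVDA's actual classes, and real flat review works from screen positions rather than tree order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class UIObject:
    """One element of a hypothetical UI tree (not NVDA's real object class)."""
    role: str
    name: str
    children: List["UIObject"] = field(default_factory=list)
    parent: Optional["UIObject"] = field(default=None, repr=False)

    def add(self, child: "UIObject") -> "UIObject":
        """Attach a child and return it so trees can be built fluently."""
        child.parent = self
        self.children.append(child)
        return child


class ObjectNavigator:
    """Object navigation: move between siblings, descend into or leave objects."""

    def __init__(self, root: UIObject):
        self.current = root

    def _move(self, target: Optional[UIObject]) -> Optional[UIObject]:
        # Only move when the target exists; otherwise stay where we are.
        if target is not None:
            self.current = target
        return target

    def _sibling(self, offset: int) -> Optional[UIObject]:
        parent = self.current.parent
        if parent is None:
            return None
        index = parent.children.index(self.current) + offset
        if 0 <= index < len(parent.children):
            return parent.children[index]
        return None

    def first_child(self) -> Optional[UIObject]:
        kids = self.current.children
        return self._move(kids[0] if kids else None)

    def parent(self) -> Optional[UIObject]:
        return self._move(self.current.parent)

    def next(self) -> Optional[UIObject]:
        return self._move(self._sibling(1))

    def previous(self) -> Optional[UIObject]:
        return self._move(self._sibling(-1))


def flat_review(obj: UIObject) -> List[str]:
    """Flat review, simplified: every leaf object in document order."""
    if not obj.children:
        return [obj.name]
    names: List[str] = []
    for child in obj.children:
        names.extend(flat_review(child))
    return names


# A small example tree: a window containing a list and a button.
window = UIObject("window", "Settings")
options = window.add(UIObject("list", "Options"))
options.add(UIObject("list item", "General"))
options.add(UIObject("list item", "Speech"))
window.add(UIObject("button", "OK"))
```

With object navigation, the user starting at the window moves to the "Options" list, chooses to enter it, and only then sees its items. With the flattened view, the items and the button are all read in one sequence and the list itself disappears as a unit.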

On the Mac, there is only object navigation; there is no concept of flat review of the entire screen using the keyboard. (In the latest version of Mac OS X, you can explore the screen in this fashion using the trackpad. However, NVDA's tracking and reading of text under the mouse allows you to do something similar.) It seems that Mac users are quite happy with this approach, and yet we are constantly bombarded by NVDA users with complaints about the difficulty of object navigation and requests for flat review functionality in NVDA.

So, here's my question for those of you who are either full or partial Mac converts. Why is object navigation quite acceptable on the Mac, yet not acceptable on Windows? It sometimes seems to me that some of the same users who frequently sing the praises of VoiceOver then turn and complain about the lack of flat review in NVDA. Perhaps I'm wrong and those users who want flat review would not be comfortable using VoiceOver. Even so, it sometimes seems that people are willing to accept a different approach on a completely new platform, yet are unwilling to accept it in a newer product on an existing platform. It's certainly true that NVDA's object navigation needs to be cleaned up a bit (removing extraneous objects, etc.), but I don't think this is the whole story.

On the web, this gets even more interesting. Due to the non-linear nature of web pages, Windows screen readers have had to provide their own "flat" representation of web pages. They then override the cursor and other keys to navigate within that representation. Aside from the technical issues associated with this, working with interactive controls on web forms requires a separate mode of interaction (named forms mode, focus mode, etc.) where the screen reader lets the user interact directly with the control by allowing cursor and other keys to pass straight through to the control. (Some screen readers can automatically switch to this mode when appropriate, but there are still two modes.) This is becoming more of a challenge with the ever-growing number of web applications, where more keys are required to work with the application. Object navigation solves this problem because the cursor and other application keys are never overridden by the screen reader, so the screen reader doesn't interfere with the functionality of web applications.

Again, VoiceOver uses object navigation on the web and VoiceOver users appear to be quite happy with this. NVDA currently does what other Windows screen readers do, but again, would you be happy if we abandoned this approach in favour of object navigation? It would certainly solve this web application problem once and for all. I get the impression that NVDA users would be unhappy if we changed this.

I want to hear your thoughts. Comment here, reach me on Twitter, or send me an email.


  1. I am not familiar with VoiceOver on the Mac, but you are right that in Windows the object hierarchy has a lot of extraneous objects which are unnecessary for the user. And I can't imagine a heuristic to determine which objects are necessary and possibly in what order (because the order can be incorrect as well). Maybe this is a reason for the user complaints.
    I think we will have a flat mode one day after the video hooks get implemented, so there will be two modes, as there currently are in virtual buffers. Note that Orca under Linux also presents a flat mode. Personally, I am looking forward to a simple flat mode in most circumstances, as well as clear and elegant object navigation for some programs.

  2. I appreciate the argument you are presenting. What I really don't understand is why the 'object' mode and 'flat' mode are not things that can be toggled by a web developer. It would seem to me that screen reader software would benefit greatly if these things could be toggled by the web page author instead of having to concern itself with tricky HTML semantics, compatibility between readers, et al.

    If an object mode could be toggled on/off consistently by a web developer, it would make everything easier. You go through a document and when the author recognizes content that is best served by flat mode, it gets toggled on and object mode is off. Same goes in the reverse.

    I guess I just don't understand why it's being made so difficult, not that I am pointing fingers.

  3. It's not quite as simple as allowing a web developer to toggle focus mode (i.e. disable browse mode/flat cursor) for part of a page. There are a lot of user experience issues to consider which you're overlooking:
    1. Conflict with user intention: If the screen reader just snapped into focus mode whenever the user cursored into a specific area of a page, that wouldn't be ideal; e.g. perhaps the user wanted to move *past* that area of the page. This is why we require the user to press enter on such elements to switch to focus mode. We do automatically switch if the user clicks on such an element or tabs to it.
    2. Consistency: A user should be able to expect that all controls of a given type will behave the same wherever they encounter them. NVDA should already behave correctly for most ARIA widgets. This should be enough and maintains consistency across pages.
    3. Finally, there *is* a way for a developer to completely disable browse mode: @role="application". However, I have almost never seen this used properly, which suggests that there isn't actually much need for what you are requesting.
    It might help if you can provide specific examples of the problems you are encountering. Generally, problems of this nature are due to authoring misunderstandings, though I'll grant that there are probably still some bugs that need to be fixed in NVDA with regard to focusable ARIA widgets.

  4. First of all, thank you for responding. I completely understand your point of view. What I am working on is an application that has a great number of moving parts that are appearing and disappearing at will. Skipping over, as you mentioned, an area would not only be disadvantageous - it would be downright confusing because you wouldn't land in the same place each time. We're not talking about a few modal windows - we're talking about an application that essentially IS modal windows.

    "Focus management", if you will, becomes a central concern. If I had to describe the scenario, it is an application with 'browser parts' as opposed to the other way around. As a result, interest in toggling methods becomes a central focus. (no pun intended, I just get tired of looking for synonyms after awhile, ha ha)

    Consistency sounds wonderful. Let's be honest, though - the parallels between the browser wars and screen reader behavior conflicts are practically 1 to 1, only the browser wars were easier to normalize.

    On a more personal opinion note, I feel that Rich Internet Applications are going to become progressively more modal. Would you agree with this, and even if you don't - what is NVDA's outlook on future RIA challenges?

  5. It sounds like your scenario is a prime candidate for @role="application". You would then mark the "browser" portions with @role="document". To use an example, the new Yahoo! Mail sets @role="application" on the top level document, which allows the user to interact with the application just like a desktop application (i.e. no browse mode). However, they set @role="document" on the content of email messages, which means that when they get focus, browse mode will be enabled.
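
As a rough illustration of that nesting, the rule can be modelled as "the nearest ancestor carrying one of the two roles decides the mode". This is only a sketch under that assumption: `Node` and `mode_on_focus` are made-up names, not NVDA code, and the model ignores the separate focus mode used for individual form controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """A hypothetical, stripped-down DOM node: tag, optional ARIA role, parent."""
    tag: str
    role: Optional[str] = None
    parent: Optional["Node"] = None


def mode_on_focus(node: Node) -> str:
    """Return "focus" or "browse" for a node that has just received focus.

    Walks up from the node; the nearest ancestor (or the node itself) with
    role "application" or "document" decides the mode.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.role == "application":
            return "focus"   # browse mode disabled, keys go to the page
        if current.role == "document":
            return "browse"  # browse mode re-enabled for this subtree
        current = current.parent
    return "browse"          # assumed default for an ordinary web page


# Example shaped like the mail client described above.
body = Node("body", role="application")
folder_list = Node("div", parent=body)
message = Node("div", role="document", parent=body)
paragraph = Node("p", parent=message)
```

Here `mode_on_focus(folder_list)` gives "focus" because the application role on the top level applies, while `mode_on_focus(paragraph)` gives "browse" because the message body is marked as a document.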

    I'm not really sure how to answer your last question. We want to do our best to stay on top of RIA issues as they arise and will continue to work closely with various areas of the community to that end. That said, I believe that many of the techniques required to deal with these issues already exist and it's just a matter of educating people as to how they should (and should not) be used. The use (and misuse) of @role="application" is a prime example.
