The purpose of the experiment was to compare WebTOC with other interfaces for browsing the World Wide Web. We wished to determine which interface was best for different tasks and which interface features users preferred. The independent variable was the type of interface (WebTOC, Textual TOC, and Netscape). The dependent variables were task execution time and subjective satisfaction. To measure task execution time, the subjects received tasks sequentially and wrote their answers on a sheet of paper before advancing to the next task. If a subject answered a question incorrectly, they were asked to continue until they found the correct answer. Thus, errors were factored into the study as higher completion times. To measure subjective satisfaction, subjects filled out a user satisfaction questionnaire upon completion of the experiment.
Our hypothesis was that users of WebTOC and the Textual TOC would execute tasks in shorter times than users of the textual HTML interface. We also believed that WebTOC users would be able to answer quantitative questions about the contents of a site more quickly. Finally, we believed that the two table-of-contents style interfaces would receive higher satisfaction ratings than the Netscape interface, and that users would find the tasks easier to complete using these interfaces.
Four subjects were tested in the pilot study, two per experimental treatment. The treatments tested were WebTOC and Netscape. Due to a lack of subjects, we were unable to test the treatment involving the Textual TOC. However, we felt that this omission was not very significant, as WebTOC and the Textual TOC are used similarly for all but one of our experimental tasks. The subjects who participated in the pilot study were all experienced users of computers and the World Wide Web. Only one had visited our experimental site before, though she did not rate her familiarity with the site very highly.
We made several observations over the course of the pilot experiment. We realized that some technical details had to be considered. For example, we needed to customize the Netscape window prior to the experiment so that the current URL and the directory buttons were not shown. This would limit subjects to the functionality provided by WebTOC and/or the more basic functions of Netscape (Back, Forward, etc.). We also found that the followed links needed to be reset after each experiment so that one subject would not benefit from the efforts of the previous one.
Other procedures were also refined. We formalized our training procedure, corrected the preliminary questionnaire and user satisfaction survey, and revised the wording of the tasks to make them clearer. In addition, we found that the subjects had difficulty with the wording of the following question:
Which of these two photographers have more works included in the collection: William Henry Jackson, or Carleton E. Watkins?
The subjects did not start their search in the author index, as we had anticipated, because they did not associate photographers with authors. Also, we discovered that it is possible to answer this question correctly without all the information required, due to the nature of the site. Rather than reword the question, we decided to introduce subjects to the Subject and Author Index format of the Library of Congress sites.
The completion times were extremely high for tasks three and four, though the lack of a maximum time per task contributed to this. After reviewing the times, we decided to set the maximum time to 5 minutes. As a result of the pilot study, we were able to estimate that the experiment would take about 45-50 minutes per group of subjects. The breakdown is as follows:
We tested 21 subjects, all of whom were student volunteers. Each treatment tested seven subjects. The subjects were asked to fill out a preliminary questionnaire before the experiment. From this questionnaire we determined that the subjects were generally experienced with computers and the World Wide Web. Due to the limited availability of subjects, no attempt was made to distribute experience levels evenly among the treatments.
90% of the 21 subjects were advanced computer users (having used computers for more than five years). The remaining subjects were novice users with less than one year of experience; these users were all in the Netscape group. Familiarity with the World Wide Web was evenly distributed across treatments. 38% of the subjects had used the World Wide Web for more than 2 years, and these subjects generally rated themselves as advanced users of the Web. 52% of the subjects had used the Web for 6 months to 2 years and characterized themselves as intermediate users. 10% of the subjects were novice users with less than 6 months of experience. The preliminary questionnaire asked several questions regarding experience with and usage of the World Wide Web; subjects with longer experience usually spent more time per week using the Web and spent longer in a single session. Only one of the 21 subjects had visited the Library of Congress American Memory sites before, and that was in the framework of a previous experiment. This subject rated his or her familiarity with the sites as "Not very familiar".
WebTOC is a new tool developed at the University of Maryland Human-Computer Interaction Laboratory (HCIL) that gives a visualization of the contents of a Web site. It consists of two parts: the WebTOC Parser and the WebTOC Viewer. The Parser starts with a Web page and follows all the local links, generating a hierarchical representation of the documents local to the site. The Viewer displays this information as a Table of Contents (WebTOC) for the site using a standard Web browser (see Figure 2-1 below).
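The Parser's behavior described above can be illustrated with a minimal sketch. This is not the WebTOC Parser itself: the function name `build_toc`, the `links_of` callback, the `SITE` table, and the example.org URLs are all hypothetical, and the sketch assumes that "local" means "on the same host as the starting page" and that each document is listed once, under the first page found linking to it.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def build_toc(root_url, links_of):
    """Return a nested dict {url: {child_url: {...}}} of documents local to the root's host.

    links_of is a hypothetical callable mapping a URL to the hrefs found on that page.
    """
    host = urlparse(root_url).netloc
    tree = {root_url: {}}
    seen = {root_url}
    queue = deque([(root_url, tree[root_url])])
    while queue:
        url, children = queue.popleft()
        for href in links_of(url):
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc != host:
                continue  # assumption: only same-host links count as local
            if target in seen:
                continue  # each document appears once in the hierarchy
            seen.add(target)
            children[target] = {}
            queue.append((target, children[target]))
    return tree


# Hypothetical in-memory site standing in for real pages.
SITE = {
    "http://example.org/": ["subject.html", "author.html", "http://other.example.com/"],
    "http://example.org/subject.html": ["lakes.html", "/"],
    "http://example.org/author.html": [],
    "http://example.org/lakes.html": [],
}
toc = build_toc("http://example.org/", lambda url: SITE.get(url, []))
```

Here the off-site link and the link back to the home page are both dropped, leaving a two-level hierarchy under the root.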
Each line of text in the WebTOC frame represents a link to a document, which may be another Web page or a multimedia file such as an image or audio file. A user can display a document referenced by a link by clicking on it as she would a normal hypertext link.
In addition to the lines of text, each local document is represented by a colored line with a length corresponding to the size of the file. The color of the line represents the type of file (e.g. text, image, audio). When a document contains links to other documents, the lines representing the documents it includes can be collapsed into a thicker "size bar" that shows the total size of the documents it references. Each size bar has a shadow under it. The size of the shadow indicates the number of items subordinate to the document it represents. This gives a visual cue to let the user distinguish quickly between items with a few subordinate links or many links. As an alternative, a bar representing the number of items can be shown. The user has the option of viewing just the colored lines without the text descriptions of the Table of Contents to get a compact visual representation that gives an overall view and makes size comparisons easier.
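The two quantities behind a collapsed size bar, the total size of the referenced documents and the number of subordinate items, can be sketched as a simple recursive summary. The `Doc` class, the `summarize` function, and the file names and sizes below are hypothetical, and the sketch assumes that all descendants are counted, not only direct children.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    size: int  # file size in bytes
    kind: str = "text"  # e.g. text, image, audio (would select the line color)
    children: list = field(default_factory=list)


def summarize(doc):
    """Return (total_bytes, item_count) for everything subordinate to doc.

    Assumption: the document's own size is excluded and all descendants count.
    """
    total, count = 0, 0
    for child in doc.children:
        sub_total, sub_count = summarize(child)
        total += child.size + sub_total
        count += 1 + sub_count
    return total, count


# Hypothetical three-document collection under an index page.
photo = Doc("photo.gif", 50, kind="image")
index = Doc("index.html", 10, children=[
    Doc("a.html", 100),
    Doc("b.html", 200, children=[photo]),
])
bar_bytes, shadow_items = summarize(index)
```

Under these assumptions the first value would set the length of the size bar and the second the size of its shadow.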
The World Wide Web sites used in the experiment were two collections from the American Memory collections of the Library of Congress. American Memory consists of primary source and archival materials relating to American culture and history. These historical collections are the key contribution of the Library of Congress to the National Digital Library. The American Variety Stage: Vaudeville and Popular Entertainment, 1870-1920 collection was used in the training tasks, while The Evolution of the Conservation Movement, 1850-1920 was used for the experimental tasks. These sites were chosen for two reasons. First, the initial inspiration for WebTOC came from working with the large digital collections of the Library of Congress in the framework of the Interface Design Project for the Library of Congress National Digital Library Program of the HCIL UMCP. Second, these sites provided sufficient levels and diversity of materials so that the potential of WebTOC could be fully tested in the experiment.
Other electronic and printed materials were required to conduct the experiment. Preliminary Questionnaires and User Satisfaction Surveys were developed. An application program (developed using Microsoft Access) was written to record the times for subjects to complete the tasks. Usage of this timer is described in Section 2.5 Procedures and Problems. The web browser used for this experiment was Netscape Navigator 3.0. The browser and the timer ran on Pentium PCs equipped with Windows 95. These computers were connected to the Internet so that the experimental site could be accessed.
The five tasks represented increasing difficulty from no. 1 to no. 5. There were two search tasks and two sibling comparison tasks (each category consisting of one easy task and one complex task), plus a task that involved following links embedded in the text. The first task was a simple search task (Is Moosehead Lake included in the subject index?) in which subjects had to find an item on the second level of the hierarchy. The second task (Which index contains more entries, the author index or the subject index?) was a simple sibling comparison task. A sibling comparison involves descending a hierarchy to find information, returning to a parent in the hierarchy, and descending along a different path in order to compare two nodes on the same level of the hierarchy. For this task the required data was at the second level of the hierarchy. The third task (There is one picture of the Potomac River in this collection. Where exactly was this picture taken?) was a complex search task in which subjects had to descend four levels into the hierarchy and often had to back up because of wrong decisions at higher levels. The fourth task (Which of these two photographers have more works included in the collection: William Henry Jackson or Carleton E. Watkins?) was a complex sibling comparison task in which the required information was four levels deep in the hierarchy. The last task (Between 1872 and 1889, a document entitled "The Extermination of the American Bison" was published. Who was the author of this document?) differed from the previous ones in that it involved following links embedded in text, whereas the previous ones concerned items in a hierarchy, not in text.
The simple and complex search tasks were intended to test how WebTOC enhances browsing hierarchical web sites as compared to browsing by following links embedded in pages. The simple and complex sibling comparison tasks built on the search tasks, but they also involved a comparison. We believe that WebTOC improves performance on tasks of this nature because it offers easy access to web pages: instead of going several levels up and down, WebTOC users can go straight to a page to look at it. In addition, the size bars allow users to compare the number of links from each page. The fifth task tested browsing by links embedded in text. Its purpose was to see if displaying available links in list form provided any advantage.
The first step in conducting the experiment was to arrange a time and place for our sessions. We had originally intended to conduct all three sessions of our experiment over the course of one afternoon, with 10 subjects per session. Each session would test a different treatment of our independent variable. We believed the AT&T Teaching Theater at the University of Maryland would be a suitable testing area for the following reasons: sufficient computers for each session, the availability of a projection system for training purposes, and an ergonomically comfortable environment. This room was reserved for a Saturday afternoon.
Unfortunately, we were unable to get enough subjects on our scheduled date and had to limit testing to one session that tested just one of our treatments. We were forced to conduct several follow-up sessions to test enough subjects in each group to allow a meaningful statistical analysis of the results. Some of these subjects were tested in the Human-Computer Interaction Laboratory in the Department of Computer Science. We were able to test a total of 7 subjects per treatment.
Prior to each experimental session, we set up a number of computers for subjects to use. The setup involved the following:
As soon as subjects arrived, they were asked to sit at a computer that had been set up, and fill out the preliminary questionnaire. Once all subjects were present, we started with an introduction and a training routine. The details of the training routine depended on the interface being tested, but each session consisted of a demonstration of all features of the interface as well as usage of the timer. The subjects could ask questions at any time during the training session. After the training, subjects were asked to perform two practice tasks, similar in nature to the experimental tasks, using the training site. We provided guidance to the subjects at this stage and answered any remaining questions.
When all the subjects were ready to begin, they started the experimental tasks. The timer operates as follows. The next task is displayed until the user is ready to proceed. The display then shrinks to a smaller one that shows the task at the top of the screen and the web browser in the rest of the screen (see Figure 2-2). When a subject completes a task, he or she presses a button to pause the timer. If 300 seconds elapse before the user completes the task, the timer stops and notifies the user. There was at least one observer for every two subjects. This observer verified the subject's answer before the subject could proceed to the next task. None of the subjects gave an incorrect answer, so we did not have to ask them to resume any tasks.
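The pause, resume, and time-limit behavior of the timer can be sketched as follows. This is not the Microsoft Access application used in the experiment: the class name `TaskTimer` and its methods are hypothetical, and for simplicity the sketch applies the 300-second limit when the timer is paused, whereas the actual timer stops by itself once the limit is reached.

```python
import time

MAX_SECONDS = 300.0  # per-task limit stated in the text


class TaskTimer:
    """Accumulates time on one task across pause/resume cycles, capped at a limit."""

    def __init__(self, clock=time.monotonic, limit=MAX_SECONDS):
        self._clock = clock  # injectable so the sketch can be exercised without waiting
        self._limit = limit
        self._started_at = None
        self.elapsed = 0.0

    def resume(self):
        """Start (or restart) timing, e.g. after an incorrect answer."""
        if self._started_at is None:
            self._started_at = self._clock()

    def pause(self):
        """Stop timing while an observer checks the answer; return the capped total."""
        if self._started_at is not None:
            self.elapsed += self._clock() - self._started_at
            self._started_at = None
        self.elapsed = min(self.elapsed, self._limit)
        return self.elapsed

    @property
    def timed_out(self):
        return self.elapsed >= self._limit


# Simulated clock readings: work 40 s, pause, then work another 30 s.
readings = iter([0.0, 40.0, 100.0, 130.0])
timer = TaskTimer(clock=lambda: next(readings))
timer.resume()
timer.pause()
timer.resume()
total = timer.pause()
```

Because time spent paused is not counted, a subject asked to resume after a wrong answer would accumulate only working time, which matches the way errors were folded into completion time.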
The final stage of the experiment involved a short debriefing in which we thanked the subjects for their participation and invited them to find out about the results of the study through our web site or over electronic mail. We also asked subjects to fill out a user satisfaction survey. In addition, we provided light refreshments as a gesture of our appreciation. After the experiment, we collected the raw data from each computer, and stored it in a database for future reference.
The approximate times for the various stages of the experiment are given below:
Apart from the difficulty of getting subjects, we experienced some problems with network speed, server availability, and computer defects. Network speed varied across experimental sessions, and this may have introduced a bias in the experiment. In particular, all seven of the WebTOC subjects were tested at the same time; therefore, their results are more likely to have been affected by network contention. Also, the server for the experimental site went down in the middle of one of our sessions, and we were forced to suspend the session and resume it at a later time. Finally, there were one or two cases of a computer in the teaching theater freezing up, forcing us to shift the subject to another one. Despite these setbacks, we were able to test all the treatments we had intended to test. While biases may exist due to circumstances beyond our control, we feel that the actual experimental procedure was consistent across sessions and that the recorded data was not corrupted in any way.