
Estimating the Proportion of Health-Related Websites Disclosing Information That Can Be Used to Assess Their Quality

Final Report - May 30, 2006

Appendix A
Report of Findings From Websites Evaluation Pretest



TO: Cynthia Baur

DATE: 1/27/2006

FROM: Margaret Gerteis, Anna Katz, Julie Ladinsky


Report of Findings from Websites Evaluation Pretest

Mathematica Policy Research, Inc. (MPR), under contract to the Office of Disease Prevention and Health Promotion (ODPHP), will develop and test a methodology for estimating the proportion of health websites that comply with disclosure criteria enumerated in Healthy People 2010 Health Communication Objective No. 11-4. Consistent with the requirements of this contract, the MPR project team has 1) finalized a methodology for determining the denominator, 2) developed technical specifications for the assessment, 3) drafted protocols for reviewing health websites, and 4) conducted a pretest of the protocols on a small sample of health websites. Here we describe our approach to the preliminary testing, report key findings, and recommend revisions to the protocols based on these findings. This memo will serve as a basis for our pretest debriefing to be held on January 30, 2006.


We pretested draft protocols on a sample of 10 websites that broadly represent the pool from which we will choose 100 sites. The purpose of this test was to determine if the protocol was able to appropriately elicit information about compliance with disclosure criteria from these sites and to determine needed adjustments for the full review.

Pretest Methodology

Sample Selection

Our aim in selecting the 10 sites for the pretest was to mimic the sample selection procedure that would be used in the full review by including 5 sites from the target stratum (that is, those sites that account for 60 percent of user visits to health websites) and 5 from the remainder. We also aimed to include the range of domains (.com, .org, .net, .gov, .edu) likely to show up in the final sample of 100. We first reviewed the 3,608 health websites from the database provided by Hitwise to determine the distribution of sites by stratum and domain. This distribution is shown in Table 1:

Table 1: Distribution of Health Websites from Hitwise Database

[Table body not recoverable. Column headings: Full Sample, Percent of Total; Target Stratum (a), Percent of Stratum, Percent of Domain. Final row: Total Hitwise Sample.]

a. The "target stratum" is defined as those websites that account for 60 percent of the visits to health websites from the Hitwise database.
b. These "other" sites in the Hitwise database include an array of for-profit (commercial), non-profit, governmental, and other sites, including domains outside the United States, with less commonly used domain indicators. These will be included in the sampling frame for the full review and will be classified according to their type of sponsorship. However, they were not included in the pretest sample.

We then used a quasi-random selection process to select 10 sites with the characteristics shown in Table 2:

Table 2: Distribution of Pretest Sample

[Table body not recoverable. Column headings: Number in Target Stratum, Number in Remainder.]

Of the 10 websites selected initially, two were found to be targeted to specialized audiences for specialized purposes unrelated to the purpose of this study (one site was a job listing for health professionals; the other listed federal grants for health researchers). We replaced these with two sites from the same domains and strata.
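The stratified selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the project team actually ran. The function names `assign_strata` and `draw_pretest_sample` are hypothetical, any visit counts passed in are invented, and the sketch assumes the target stratum is formed by ranking sites by visits and taking top-ranked sites until their cumulative share reaches 60 percent. It does not enforce the spread across domains (.com, .org, .net, .gov, .edu) that the pretest also aimed for.

```python
import random


def assign_strata(visits, target_share=0.60):
    """Split sites into a target stratum and a remainder.

    `visits` maps a site name to its visit count (hypothetical data).
    Assumption: sites are ranked by visits, and the target stratum is the
    smallest set of top-ranked sites whose cumulative visits reach
    `target_share` of all visits.
    """
    total = sum(visits.values())
    ranked = sorted(visits, key=visits.get, reverse=True)
    target, remainder, running = [], [], 0
    for site in ranked:
        if running < target_share * total:
            target.append(site)
            running += visits[site]
        else:
            remainder.append(site)
    return target, remainder


def draw_pretest_sample(visits, per_stratum=5, seed=0):
    """Draw `per_stratum` sites at random from each stratum."""
    target, remainder = assign_strata(visits)
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(target, per_stratum), rng.sample(remainder, per_stratum)
```

A site found to be ineligible could be replaced by drawing again from the unsampled sites in the same stratum (and, if domain is tracked, the same domain), which is in the spirit of the replacement step described above.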

Selection of Health Content

Because our protocols call for a review of three separate items of health-related content to answer specific questions, our next task was to select the 3 items for review for each of the 10 websites. Our aim in selecting the items for review was not only to ensure that both reviewers looked at the same content but also to minimize selection bias that might result from a given reviewer's particular interests or from website sponsors' efforts to direct users' attention to featured content. For each site, we traced three alternative paths from the home page to health-related content, using random numbers to identify topics or content from listed options. (You may recall that during prior discussions we agreed that any health-related content that users could access from the website under review would count, even if it led to content on other sites.)

Of the 30 items thus generated, 19 were items of health content residing on the website under review, 9 were items generated through links to other sites, and 2 were .pdf files (one from another website and one from the website under review).
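One way to picture the random-number path tracing is as a random walk through a site's menu. The sketch below is an assumption-laden illustration only: the nested-dictionary representation of a site, the function names `trace_path` and `select_items`, and the rule of rejecting duplicate paths are all hypothetical and are not taken from the protocols.

```python
import random


def trace_path(menu, rng):
    """Walk from the home page to one content item.

    `menu` is a nested dict of listed options (hypothetical structure);
    a value of None marks an item of health content. At each level one
    listed option is chosen with a random number.
    """
    path = []
    node = menu
    while node is not None:
        options = sorted(node)  # fix the order of the listed options
        choice = options[rng.randrange(len(options))]
        path.append(choice)
        node = node[choice]
    return path


def select_items(menu, n_items=3, seed=0, max_tries=1000):
    """Trace paths until `n_items` distinct content items are found."""
    rng = random.Random(seed)
    selected = []
    for _ in range(max_tries):
        if len(selected) == n_items:
            break
        path = trace_path(menu, rng)
        if path not in selected:
            selected.append(path)
    return selected
```

Because each choice comes from a random number rather than from the reviewer or from the site's featured content, both reviewers can be handed the same three paths.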

Mode of Administration

Although the pretest was designed primarily to test the protocols, we also wanted to obtain preliminary feedback on a mode of administration that we proposed to use for the full review. First, we transferred the protocols to an Excel worksheet to facilitate data input, scoring, and analysis. Second, we set up two computer screens to allow one reviewer to view and navigate both the website under review and the protocol at the same time. The second reviewer used the Excel worksheet but did not have access to two computer screens.

Review of Websites

Using the draft protocols submitted on December 19, 2005, two members of the project team, who are not the primary reviewers of the full sample, separately reviewed each of the 10 websites and the selected pages of health content. Each reviewer documented any finding that a particular disclosure item was present by indicating both the location (URL) and the wording of the content. They were also asked to track difficulties or questions that arose, as well as the time spent on each review.

Analysis and Debriefing

After both reviewers had finished reviewing the 10 websites, we conducted a simple test of inter-rater reliability, based on a comparison of their choices of specific response options on each question and for each website. We then debriefed reviewers, item by item and site by site, to explore sources of the discrepancies and to identify lingering questions of interpretation to be resolved with the project officer.
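The memo does not name the reliability statistic that was used. As a hedged illustration, two common choices for comparing two reviewers' categorical responses are simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the example responses below are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two reviewers chose the same response."""
    if len(a) != len(b) or not a:
        raise ValueError("response lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance,
    computed from each reviewer's marginal distribution of responses.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both reviewers gave the same single response throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical responses from two reviewers on four protocol questions
reviewer_1 = ["yes", "yes", "no", "no"]
reviewer_2 = ["yes", "no", "no", "no"]
print(percent_agreement(reviewer_1, reviewer_2))  # 0.75
print(cohens_kappa(reviewer_1, reviewer_2))       # 0.5
```

In this invented example the reviewers agree on three of four items, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.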

Key Findings: How the Review Process Worked

Site Selection

Even in a limited sample of 10 websites, the sampling method used for this pretest yielded a diverse array of health-related websites, suggestive of what we may expect to find in the larger universe. While our working definition of "health websites" has been intentionally inclusive, the fact that 2 of the 10 websites selected initially from the Hitwise database were clearly inappropriate for the purpose of this study suggests the need both to clarify exclusion criteria and to create a sample frame large enough to accommodate a potentially large number of ineligible websites. We propose to eliminate websites from the sample for the full review if they are designed to provide narrowly defined services for specialized audiences and have no health information that might be relevant to the general public. We will also design the sample frame such that replacement sites can be selected, where needed, consistent with the stratified sampling methodology that we have agreed upon.


Both reviewers spent well over an hour (1 hour 20 minutes to 1 hour 40 minutes) on each of the first five website reviews. Thereafter, most reviews were completed within an hour. While some of the time spent on earlier reviews resulted from ambiguities of meaning or interpretation that were later clarified, there was also clearly a "learning curve" as reviewers became accustomed to the protocols, the websites, and strategies to search for the disclosure criteria.

Mode of Administration

As noted above, we tested two aspects of the mode of administration of the review protocols: 1) the use of two computer screens, and 2) the use of an Excel worksheet, online and on paper. Having two computer screens to work from made it easier for the reviewer to move back and forth between the website under review and the review protocols without having to close either window. (The reviewer who did not have two screens found it easier to work from a paper version of the protocol than to switch between windows on a single screen.) We have therefore arranged for reviewers to have access to two screens for the full review.

However, the online Excel worksheet was somewhat unwieldy to use, requiring excessive scrolling (left/right and up/down) to view definitions or paste content, which made it too easy to lose one's place. Although the paper version was relatively easy to use, it created an extra step of later data entry into a spreadsheet for analysis, adding time to the process and creating the opportunity for more errors. We therefore explored alternatives, including web-based survey applications and Access databases, and propose to use an Access database for the full review.

Sources of Discrepancies Between Reviewers

The item-by-item, site-by-site comparison of the two sets of reviews yielded a large number of discrepancies, although simple measures of inter-rater reliability showed the two reviewers to be in moderate agreement overall across all of the response items. While particular questions and particular websites were sometimes more problematic than others, the sources of the discrepancies generally fell into one of four categories: 1) problems with the protocols, 2) differences in reviewers' subjective interpretations of the meaning of the criteria or what satisfies the criteria, 3) difficulty finding or identifying some disclosure elements, and 4) reviewer entry errors. We review each category briefly below.

Problems with the protocols often resulted from ambiguously worded questions or overlapping response categories. In most cases, we were able to agree on the meaning of the question and resolve ambiguities through rewording the question or the accompanying definition. In order to help reviewers identify disclosure elements, given that the specific wording would vary, we initially broke out questions and/or response categories to provide multiple cues and options. These were not mutually exclusive categories, however, and reviewers often disagreed as to which response applied even as they agreed that the criterion had been met. Disagreements of this sort would not affect overall scoring and can readily be accommodated through scoring algorithms. Where multiple response options were helpful to reviewers (for example, listing separately the different terms that may be used to describe how health content is reviewed), we propose to retain them. Where these options added to the confusion and were not necessary to determine compliance with a given criterion (for example, distinguishing between personal information and personal health information), we propose to combine or eliminate them.
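A scoring rule of the kind mentioned above can be sketched as follows: a criterion counts as met if a reviewer selected any of the response options that indicate the disclosure element is present, so that reviewers who picked different but overlapping options still receive the same score. This is an illustrative assumption about how such an algorithm might work, and the option labels and the function name `criterion_met` are hypothetical.

```python
def criterion_met(selected_options, qualifying_options):
    """Return True if any selected option indicates the element is present."""
    return bool(set(selected_options) & set(qualifying_options))


# Hypothetical response options for how health content is reviewed
REVIEW_TERMS = {"editorial board", "peer review", "medical review"}

reviewer_1 = ["peer review"]
reviewer_2 = ["medical review", "editorial board"]
reviewer_3 = ["not found"]

print(criterion_met(reviewer_1, REVIEW_TERMS))  # True
print(criterion_met(reviewer_2, REVIEW_TERMS))  # True
print(criterion_met(reviewer_3, REVIEW_TERMS))  # False
```

Here the first two reviewers disagree on the specific option but agree at the level that matters for compliance, which is why disagreements of this sort need not affect the overall score.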

Differences in interpreting the meaning of the criteria or identifying elements that would satisfy the criteria most often resulted from wide variations in disclosure practices among the websites under review. While information may have been presented that related to the criteria, the wording or presentation was such that it was not clear whether it satisfied them. In such cases, reviewers' judgment calls often differed. (We discussed and resolved the most common issues that arose in this regard when we spoke by telephone on January 19.) These issues are described further in the next section, as they relate to the sample websites' performance on specific disclosure criteria.

In a small number of cases, one reviewer was able to find specific disclosure elements that satisfied the criteria while the other was not. One might attribute this discrepancy to differences in the diligence or perceptive capabilities of individual reviewers. In this pretest, however, it happened more or less equally to both reviewers. Through discussion, we determined that in such cases the disclosure element was quite simply hard to find and often found by accident in sections ostensibly devoted to other topics.

Finally, a small number of discrepancies were simple entry errors, often attributable to losing one's place in the online Excel worksheet, as noted above. Reviewer entry errors were less common on the paper worksheet (notwithstanding the opportunity for later transcription errors when the data is entered into a database for analysis).

Key Findings: How the Sample Websites Fared on the Disclosure Criteria

Given the nature and purpose of the pretest, we did not compute a final compliance score for the websites in the pretest. However, our preliminary review suggests that none of the 10 websites satisfied all six of the disclosure criteria. One commercial site appeared to have satisfied five of the six. We discuss findings related to specific criteria below.


Most of the websites reviewed clearly identified the name of the organization responsible for the website, and most provided a street address as well as other contact information for the organization. However, sources of funding for the website were identified less often. Moreover, when sources were identified, it was not always easy to determine whether the information provided referred to funding for the sponsoring organization or funding for the website. Our initial review suggests that about half of the websites fully complied with this criterion.


Very few of the websites reviewed included an explicit statement about the mission or purpose of the website, although many described features or services available to website users. Here again, where mission statements were found, it was not always easy to distinguish whether they were intended to describe the mission of the sponsoring organization or the mission of the website. Statements regarding the website's association (or lack of association) with commercial products or services were often included in legal disclaimers (for example, through links identified in small type at the bottom of the page). Overall, about half of the pretest websites appeared to comply with this criterion.


Most of the websites that included advertising on the homepage clearly distinguished advertising from non-advertising content. In some cases, however, advertising on other pages was not so clearly distinguished. Moreover, it was not always clear where specific links would take the user and which ones would link to commercial promotions. Although it would not be feasible to pursue every link or review every page of content to determine compliance with this disclosure element, we will direct reviewers to explore at least two links beyond the content displayed on the home page to look for advertising content.

There was little consistency in how or where websites described their oversight of health content. Moreover, because many of the sites included content from many different sources, it was not always clear whether the policies that were described referred to all content or only some. This was also problematic in the case of "nested" websites (for example, websites for government programs nested within the parent agency and/or the department websites) for which generic review policies may be found at the parent (or grandparent) site.

Although a few sites clearly identified individual or organizational authors of specific health content, many did not. Sites that included many different types of health content from many different sources were often inconsistent in this regard, identifying authors in some cases but not in others. Authorship was especially ambiguous on websites where the content was (apparently) prepared by the site host. For example, health content on government websites may cite sources of information (research studies, data files) but not clearly indicate who was responsible for synthesizing, writing, or presenting the information.

Only one of the pretest websites reviewed appeared to comply fully with this criterion.


All but one of the pretest websites complied with this criterion by including a clearly marked privacy statement with fairly standard legal language explaining how personal information was used and/or protected. However, the distinction between personally identifiable information and personal health information (or between use of information and protection of information) did not prove useful in determining compliance with this disclosure element, because the language used was often generic and would apply to both kinds of information. We therefore propose to combine these elements in the protocol questions for the full review.


Most websites included some mechanism for website users to provide feedback (such as a user feedback or comment form), and in some cases a pop-up survey solicited specific feedback. However, few sites provided any explanation as to how that feedback would be used to improve website services. Three of the 10 pretest websites appear to have been fully compliant with this criterion.

Updating Health Content

None of the pretest websites consistently identified the date health content was created, reviewed, and updated on specific pages of health content. In many cases, a copyright date was indicated (as a month and year, a year, or a range of years) without any specific reference to the date the content was created. None of the sites differentiated between date reviewed and date updated, and many used other equivalent terms, such as "modified" or "revised." Sites that included many different types and/or sources of health content were inconsistent in this regard, clearly dating material in some cases and not in others. As a result, none of the pretest websites reviewed complied with this criterion.


As a result of our experience with the pretest, we propose to make the following modifications to our approach to the full review:

  1. Identify sites selected from the sample frame that are ineligible because of their specialized content and audiences and eliminate them before giving the sample to the reviewers. (We will do this when we review the sites to select pages of health content to review.) We have already generated a sample frame that will accommodate the need for replacements without compromising the integrity of the sample.
  2. Create an Access database to provide reviewers a more user-friendly interface and to allow direct entry of data for analysis.
  3. Provide each reviewer with two monitors to eliminate the need to switch between windows during the review.

Based on our review of findings from the pretest, we have also revised the protocols in order to clarify the meaning of questions and response categories and to provide further direction to reviewers through the accompanying explanations. A copy of the revised protocols is attached.

When we spoke by telephone on January 19, we also discussed and resolved some of the overarching issues that have arisen (for example, how to approach "nested" home pages, .pdf files, health content ghostwritten by site hosts, and copyright dates). We will also incorporate these resolutions into reviewer training and manuals.

However, the depth and breadth of questions and issues that arose during our review, and the lack of consistency among the health websites reviewed in their approach to many of the disclosure elements, suggest the need for an incremental approach that will allow us to resolve new issues as they arise and continue to revise the protocols as needed. For these reasons, the attached protocols should be regarded as a "work in progress."

As we also discussed, the time required to review each website, especially early in any given reviewer's learning curve, and the time that will be needed to resolve additional issues that are likely to arise also suggest the need for an alternative approach to the baseline review of 100 websites if the project is to be completed on time and within budget. We have had preliminary discussions with you about these issues and will describe recommended approaches and alternatives in a separate memo.

cc:  Davene Wright; Frank Potter; Margo Rosenbach
