This page has been archived.

Estimating the Proportion of Health-Related Websites Disclosing Information That Can Be Used to Assess Their Quality

Final Report - May 30, 2006

Methodology (continued)


Given budgetary and time constraints, we were limited to reviewing about 100 websites to develop baseline estimates of the proportion of health websites that comply with the disclosure criteria. In collaboration with the ODPHP Project Officer, we reviewed sampling options and chose a strategy that would balance ODPHP's interest in describing the universe of health-related websites, on the one hand, and the sites that account for most web traffic, on the other. The selected strategy called for stratifying the 3,608 websites from the Hitwise database into two groups, (1) the "target stratum" of the 213 most frequently visited sites (accounting for 60 percent of all visits) and (2) the "remainder" of 3,395 sites, and then drawing a simple random sample of 50 websites from each stratum. Although this option would provide less precision for the sample overall than an unstratified simple random sample, it allowed us to achieve greater precision for the target stratum while retaining a reasonable level of precision for the remainder. We also selected this method because it yields a sample that is representative of the universe of all health websites in the baseline period, which better supports ODPHP's need to track changes over time.
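As an illustration of the stratified design described above, the following is a minimal Python sketch. The placeholder site identifiers, the `stratified_srs` and `sampling_fractions` helpers, and the seed are hypothetical and do not come from the Hitwise frame; only the stratum sizes (213 and 3,395) and the per-stratum sample size (50) are taken from the text. The sketch shows why a site in the target stratum is far more likely to be selected than one in the remainder.

```python
import random


def stratified_srs(strata, n_per_stratum, seed=None):
    """Draw an independent simple random sample (without replacement)
    of n_per_stratum units from each stratum.

    strata: dict mapping a stratum name to a list of site identifiers.
    Returns a dict mapping each stratum name to its sampled identifiers.
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def sampling_fractions(strata, sample):
    """Selection probability f_h = n_h / N_h for each stratum."""
    return {name: len(sample[name]) / len(units) for name, units in strata.items()}


# Hypothetical frame: placeholder IDs sized like the two strata in the text.
strata = {
    "target": [f"target-{i}" for i in range(213)],
    "remainder": [f"remainder-{i}" for i in range(3395)],
}
sample = stratified_srs(strata, n_per_stratum=50, seed=2005)
fractions = sampling_fractions(strata, sample)
print({name: round(f, 4) for name, f in fractions.items()})
# prints {'target': 0.2347, 'remainder': 0.0147}
```

Under this sketch a target-stratum site is roughly 16 times as likely to be drawn as a remainder site, which is the oversampling that the weights discussed later undo.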

We also controlled sample selection by sorting the sampling frame by two factors, (1) the number of "top 100" subcategory lists a website appeared on and (2) the type of website (for profit, nonprofit, government, or foreign)7, and then applying a sequential selection procedure. In each stratum, we selected a larger equal-probability sample than we expected to need, so that sites found to be ineligible could be replaced. We then randomly partitioned this larger sample into subsamples of five sites, called waves. The random partitioning took into account the original sort order to ensure that the sample remained diverse on the two sorting factors. We released waves as needed throughout data collection to replace ineligible sites. Table 1 compares the types of sites in the universe with those in the baseline sample of 150 (before ineligible sites were excluded).
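The report does not specify the exact sequential selection algorithm, so the sketch below swaps in a related, simpler technique: equal-probability systematic sampling from the sorted frame, which also spreads the sample across the sort order. It then partitions the sample into waves of five so that each wave draws one site from each part of the sort order. The field names `top100_lists` and `site_type`, the helper functions, and the 400-site frame are hypothetical illustrations, not the procedure actually used.

```python
import random


def sort_frame(frame):
    """Sort by the two factors named in the text: number of "top 100"
    subcategory lists (most lists first), then type of site."""
    return sorted(frame, key=lambda site: (-site["top100_lists"], site["site_type"]))


def systematic_sample(sorted_frame, n, rng):
    """Equal-probability systematic sample of n units from a sorted list.

    With interval k = N / n and a random start in [0, k), every unit has
    selection probability n / N and the picks are spread evenly across
    the sort order (implicit stratification)."""
    N = len(sorted_frame)
    k = N / n
    start = rng.uniform(0, k)
    # min() guards against floating-point rounding pushing an index to N.
    return [sorted_frame[min(int(start + i * k), N - 1)] for i in range(n)]


def partition_into_waves(sample, wave_size, rng):
    """Randomly split a sample (kept in frame order) into waves of wave_size.

    The sample is cut into wave_size consecutive blocks; each block is
    shuffled and dealt one unit per wave, so every wave receives one unit
    from each part of the sort order."""
    if len(sample) % wave_size:
        raise ValueError("sample size must be a multiple of wave_size")
    n_waves = len(sample) // wave_size
    waves = [[] for _ in range(n_waves)]
    for b in range(wave_size):
        block = sample[b * n_waves:(b + 1) * n_waves]  # slice is a copy
        rng.shuffle(block)
        for wave, unit in zip(waves, block):
            wave.append(unit)
    rng.shuffle(waves)  # random release order
    return waves


# Hypothetical frame of 400 sites with made-up attribute values.
types = ["for profit", "nonprofit", "government", "foreign"]
frame = [{"id": i, "top100_lists": i % 3, "site_type": types[i % 4]} for i in range(400)]

rng = random.Random(7)
sample = systematic_sample(sort_frame(frame), 20, rng)
waves = partition_into_waves(sample, wave_size=5, rng=rng)
for number, wave in enumerate(waves, start=1):
    print(number, [site["id"] for site in wave])
```

Releasing waves in the shuffled order keeps each replacement batch a random, diverse slice of the larger sample.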

Table 1. Comparison of Websites in Universe and Baseline Sample, by Type of Site

Type of Site(a)    Universe            Baseline Sample
                   Number   Percent    Number   Percent

Total               3,608     100.0       150     100.0

For Profit          2,650      73.4       114      76.0
Nonprofit             674      18.7        17      11.3
Government            126       3.5        11       7.3
Foreign               158       4.4         8       5.3

Source: Hitwise Real-Time Competitive Intelligence. Analysis by Mathematica Policy Research.
(a) Type of Site: "For Profit" sites include domains ending in .com, .net, and .biz.
"Nonprofit" sites include domains ending in .org, .edu, and .info.
"Government" sites include domains ending in .gov, .mil, .us, and .int.
"Foreign" sites include domains ending in a foreign country's suffix (e.g., .fr, .uk, .au).

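The type-of-site rule in the notes to Table 1 amounts to a lookup on the top-level domain. A minimal sketch, assuming a bare hostname as input, follows; the function name `classify_site` is hypothetical, and treating any other two-letter alphabetic suffix as a foreign country code is an assumption, since the notes list only examples. As footnote 7 cautions, the domain is only a proxy for profit status and country of origin.

```python
FOR_PROFIT = {"com", "net", "biz"}
NONPROFIT = {"org", "edu", "info"}
GOVERNMENT = {"gov", "mil", "us", "int"}


def classify_site(hostname):
    """Map a hostname to the report's type-of-site categories by its
    top-level domain. Returns None when the rule does not cover the TLD."""
    tld = hostname.lower().rstrip(".").rsplit(".", 1)[-1]
    if tld in FOR_PROFIT:
        return "For Profit"
    if tld in NONPROFIT:
        return "Nonprofit"
    # .us is a country code, but the report groups it with Government,
    # so this check must come before the foreign-suffix check.
    if tld in GOVERNMENT:
        return "Government"
    # Assumption: any other two-letter alphabetic TLD is a country-code
    # suffix (.fr, .uk, .au, ...) and is classified as "Foreign".
    if len(tld) == 2 and tld.isalpha():
        return "Foreign"
    return None


print(classify_site("www.nhs.uk"))  # prints Foreign
```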
Before releasing the subsamples for baseline data collection, the MPR review supervisor examined each website in each subsample to identify sites that were inoperable, inaccessible, or otherwise not appropriate for review. Consistent with our working definition of "health-related websites," we included all accessible sites with at least three items of health information content, as broadly defined by the eHealth Code of Ethics. Of the 150 sites in the baseline sample, 48 (32 percent) were found to be ineligible, most often because they lacked sufficient health information content8 or because access to the site, or to the health information on it, was restricted9 (Table 2). The final sample size was 102.
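The screening rules above and in footnotes 8 and 9 can be written as a small decision function that returns one of the Table 2 reasons. This is a sketch under stated assumptions: the `SiteReview` record and its fields are hypothetical, the order in which the checks are applied is assumed, and counting "items of health information" under the eHealth Code of Ethics definition remains a reviewer judgment that the code takes as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteReview:
    """Hypothetical record a reviewer might fill in for one sampled site."""
    url: str
    active: bool                 # site loads and is operable
    duplicate_of: Optional[str]  # URL of another sampled site with the same content
    requires_login: bool         # registration or subscription gates the site
    open_health_items: int       # health information items visible without logging in


def ineligibility_reason(site: SiteReview) -> Optional[str]:
    """Return the Table 2 reason a site is ineligible, or None if eligible."""
    if not site.active:
        return "Inactive website"
    if site.duplicate_of is not None:
        return "Duplicate of another website in sample"
    # Gated sites that still show some health information to
    # nonregistered visitors are not excluded for restricted access.
    if site.requires_login and site.open_health_items == 0:
        return "Requires registration or subscription"
    if site.open_health_items == 0:
        return "No health information content"
    if site.open_health_items < 3:
        return "Less than 3 items of health content"
    return None


sites = [
    SiteReview("a.example.com", True, None, False, 12),
    SiteReview("b.example.org", True, None, True, 0),
    SiteReview("c.example.net", True, None, True, 4),
    SiteReview("d.example.com", False, None, False, 0),
]
print([ineligibility_reason(s) for s in sites])
```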

Table 2. Ineligible Sample Websites, by Reason for Ineligibility

Reason                                         Number

Total ineligible                                   48

No health information content(a)                   23
Less than 3 items of health content(a)              4
Requires registration or subscription              18
Duplicate of another website in sample(b)           2
Inactive website                                    1

Source: Hitwise Real-Time Competitive Intelligence. Analysis by Mathematica Policy Research.
(a) We used the eHealth Code of Ethics definition of health information. Accessed June 19, 2005 at
(b) Different URLs but the same content.

Table 3 compares the final sample of eligible websites with the universe (sample frame) and with the initial sample, by stratum and type of site. The table shows both unweighted and weighted percentages. Because we oversampled the most frequently visited sites, we weighted all estimates to adjust for the complex sample design.
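The Total row of Table 3 can be reproduced under one assumption: that each sampled site carries a design weight of N_h / n_h, where N_h is the stratum's frame size and n_h its initial (released) sample size. The sketch below uses only the stratum figures from Table 3; the helper names are hypothetical. Because every site within a stratum has the same weight, the weighted and unweighted percentages coincide within each stratum (74.3 and 62.5), and weighting matters only when the strata are combined. The type-of-site rows cannot be reproduced this way, because the table does not give their breakdown by stratum.

```python
# Stratum figures from Table 3: frame size N, initial sample n, eligible sites.
strata = {
    "most frequently visited": {"N": 213, "n": 70, "eligible": 52},
    "remainder": {"N": 3395, "n": 80, "eligible": 50},
}


def unweighted_rate(strata):
    """Share of all initially sampled sites that were eligible."""
    eligible = sum(s["eligible"] for s in strata.values())
    sampled = sum(s["n"] for s in strata.values())
    return eligible / sampled


def weighted_rate(strata):
    """Eligibility rate with each sampled site weighted by N_h / n_h."""
    num = sum(s["N"] / s["n"] * s["eligible"] for s in strata.values())
    den = sum(s["N"] / s["n"] * s["n"] for s in strata.values())
    return num / den


print(f"unweighted: {100 * unweighted_rate(strata):.1f}%")  # prints unweighted: 68.0%
print(f"weighted:   {100 * weighted_rate(strata):.1f}%")    # prints weighted:   63.2%
```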

Table 3. Sample Eligibility, by Selected Website Characteristics

                          Sample   Initial   Eligible Sites
                          Frame    Sample    Number   Percent of       Weighted Percent
                                                      Initial Sample   of Initial Sample

Total                     3,608      150       102     68.0             63.2

Stratum(a)
  Most frequently visited   213       70        52     74.3             74.3
  Remainder               3,395       80        50     62.5             62.5
Type of site(b)
  For profit              2,650      114        77     67.5             68.1
  Other                     958       36        25     69.4             50.5

Source: Hitwise Real-Time Competitive Intelligence. Analysis by Mathematica Policy Research.
Note: "Percent" is the unweighted percentage of initially sampled sites that were eligible; "Weighted Percent" takes into account the disproportionate number of sampled sites among the most frequently visited sites.
(a) Stratum: "Most frequently visited" sites are those that account for 60 percent of total user visits.
"Remainder" includes all other sites (that is, sites that account for the remaining 40 percent of total user visits).
(b) Type of Site: "For Profit" sites include domains ending in .com, .net, and .biz.
"Other" includes the following domains:
     "Nonprofit" sites include domains ending in .org, .edu, and .info.
     "Government" sites include domains ending in .gov, .mil, .us, and .int.
     "Foreign" sites include domains ending in a foreign country's suffix (e.g., .fr, .uk, .au).

7 We used domain names as proxy indicators of website type, because it was not feasible for reviewers to make a more thorough investigation. However, we recognize that domain extensions may not accurately reflect profit status or the country of origin.

8 Examples of health-related websites that lacked sufficient health information for our purposes are sites that listed job postings for health professionals or research grants available to health researchers, or that were designed only to support wholesale or retail sellers of specific commercial products. However, if websites designed for such purposes also included health-related information (for example, findings from research grants or information about the therapeutic effects of products), they were considered eligible.

9 Examples of health-related websites with restricted access are those accessible only to members or paying subscribers who must enter an identifying log-in name and password. We also excluded sites that required users to "register" by providing personal information. However, sites that limited access to registered users but also provided some health information to nonregistered visitors were considered eligible.
