Given budgetary and time constraints, we were limited to reviewing about 100 websites to develop baseline estimates of the proportion of health websites in compliance with the disclosure criteria. In collaboration with the ODPHP Project Officer, we reviewed sampling options and chose a strategy that balanced ODPHP's interest in describing the universe of health-related websites against its interest in the sites that account for most web traffic. The selected strategy called for stratifying the 3,608 websites from the Hitwise database into two groups: (1) the "target stratum" of the 213 sites most frequently visited (accounting for 60 percent of all visits), and (2) the 3,395 sites in the "remainder." The strategy then called for drawing a simple random sample of 50 websites from each stratum. Although this option provides less precision for the sample overall than an unstratified simple random sample, it allowed us to achieve greater precision for the target stratum while retaining a reasonable level of precision for the remainder. We also selected this method because it yields a sample that is representative of the universe of all health websites in the baseline period and therefore better supports ODPHP's need to track changes over time.
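As a rough illustration of what this allocation implies, the Python sketch below computes the sampling fraction and the base (design) weight for each stratum. The stratum sizes are taken from the text. The sample size of 50 per stratum is the planned allocation, not the final eligible count, so the weights shown are illustrative and are not the weights actually used in the analysis. The names `FRAME`, `PLANNED_SAMPLE`, `sampling_fraction`, and `base_weight` are hypothetical.

```python
# Stratum sizes as reported in the text (3,608 sites in total).
FRAME = {"target": 213, "remainder": 3395}

# Planned allocation of 50 sites per stratum. This is an assumption for
# illustration; the final eligible counts by stratum are not given here.
PLANNED_SAMPLE = {"target": 50, "remainder": 50}


def sampling_fraction(stratum):
    """Share of the stratum's sites that are selected."""
    return PLANNED_SAMPLE[stratum] / FRAME[stratum]


def base_weight(stratum):
    """Number of frame sites that each sampled site represents."""
    return FRAME[stratum] / PLANNED_SAMPLE[stratum]


for name in FRAME:
    print(name, round(sampling_fraction(name), 4), round(base_weight(name), 2))
```

Under these assumptions a sampled site in the target stratum stands for far fewer frame sites than a sampled site in the remainder, which is why the target stratum is estimated more precisely.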
We also controlled the sample selection by using a sequential selection procedure and sorting the sampling frame by two factors: (1) the number of "top 100" subcategory lists that a website was on, and (2) the type of website (for-profit, nonprofit, government, or foreign)7. In each stratum, we selected a larger equal-probability sample than we expected to need, so that we could replace sites found to be ineligible. We then randomly partitioned this larger sample into subsamples of five sites each (called waves). The random partitioning took into account the original sorting of the sample to ensure that each wave was diverse on the two sorting factors. We then released waves as needed throughout the data collection effort to replace ineligible sites. A comparison of the types of sites in the "universe" and the baseline sample of 150 (before exclusion of ineligible sites) is shown in Table 1.
Table 1. Comparison of Websites in Universe and Baseline Sample, by Type of Site
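The controlled selection and wave partitioning described above Table 1 can be sketched in Python as follows. This is a minimal sketch, not the procedure actually used: it substitutes a systematic sample with a random start for the sequential selection procedure, and it deals sorted blocks of sites across waves as one possible way to respect the sort order. The frame contents, the per-stratum sample size of 75, the seed, and all function names are assumptions made for illustration.

```python
import random


def systematic_sample(sorted_frame, n, rng):
    """Equal-probability systematic sample of n units from a sorted list."""
    size = len(sorted_frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the frame size")
    step = size / n
    start = rng.random() * step
    # min() guards against a floating-point index equal to the frame size.
    return [sorted_frame[min(int(start + i * step), size - 1)] for i in range(n)]


def partition_into_waves(sample, wave_size, rng):
    """Split a sorted sample into waves that each span the sort order."""
    if len(sample) % wave_size:
        raise ValueError("sample size must be a multiple of wave_size")
    n_waves = len(sample) // wave_size
    waves = [[] for _ in range(n_waves)]
    # Walk the sample in sort order, one block of n_waves units at a time,
    # and deal each block across the waves in a random order.
    for begin in range(0, len(sample), n_waves):
        block = sample[begin:begin + n_waves]
        order = list(range(n_waves))
        rng.shuffle(order)
        for unit, wave_index in zip(block, order):
            waves[wave_index].append(unit)
    return waves


rng = random.Random(2002)  # arbitrary seed, for reproducibility only
site_types = ["for-profit", "nonprofit", "government", "foreign"]

# Synthetic frame for the 213-site target stratum: (lists, type, site id).
frame = sorted(
    (rng.randint(1, 5), rng.choice(site_types), f"site{i:04d}")
    for i in range(213)
)

sample = systematic_sample(frame, 75, rng)  # 75 is a hypothetical size
waves = partition_into_waves(sample, 5, rng)
print(len(waves), "waves of", len(waves[0]), "sites")
```

Because each wave takes one site from every block of the sorted sample, any wave released as a replacement covers the range of both sorting factors rather than clustering at one end of the list.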
Prior to releasing the subsamples for baseline data collection, the MPR review supervisor examined each website in each subsample to identify sites that were inoperable, inaccessible, or otherwise not appropriate for review. Consistent with our working definition of "health-related websites," we included all accessible sites with at least three items of health information content, as broadly defined by the eHealth Code of Ethics. Of the 150 sites in the baseline sample, 48 (32 percent) were found to be ineligible, most often because they lacked sufficient health information content8 or because access to the site or to its health information was restricted (Table 2)9. The final sample size was 102.
Table 2. Ineligible Sample Websites, by Reason for Ineligibility
Table 3 shows how the final sample of eligible websites compares with the universe (sampling frame) and with the initial sample, by stratum and type of site. The table shows both unweighted and weighted percentages. Because we oversampled the sites most frequently visited, we have weighted all estimates to adjust for the complex sample design.
Table 3. Sample Eligibility, by Selected Website Characteristics
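To show why weighting matters when one stratum is oversampled, the Python sketch below compares an unweighted and a weighted proportion. The records and weights are invented toy values, not study data, and the function names are hypothetical. The weighted estimate is the ratio of the summed weights of sites meeting a criterion to the summed weights of all sampled sites.

```python
def unweighted_proportion(records):
    """Share of sampled sites meeting the criterion, ignoring the design."""
    return sum(flag for _, flag in records) / len(records)


def weighted_proportion(records):
    """Design-weighted share: sum(w * y) / sum(w)."""
    total_weight = sum(weight for weight, _ in records)
    return sum(weight * flag for weight, flag in records) / total_weight


# Toy records of (weight, meets_criterion). The weights are hypothetical:
# 4.0 stands for an oversampled stratum, 68.0 for a lightly sampled one.
records = [
    (4.0, 1), (4.0, 1), (4.0, 1), (4.0, 0),
    (68.0, 1), (68.0, 0), (68.0, 0), (68.0, 0),
]

print(round(unweighted_proportion(records), 3))
print(round(weighted_proportion(records), 3))
```

In this toy case the unweighted figure overstates the proportion because the oversampled stratum contributes half the records while representing only a small share of the frame.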
7 We used domain names as proxy indicators of website type, because it was not feasible for reviewers to make a more thorough investigation. However, we recognize that domain extensions may not accurately reflect profit status or the country of origin.
8 Examples of health-related websites that lacked sufficient health information for our purposes are sites that listed job postings for health professionals or research grants available for health researchers, or that were designed only to support wholesale or retail sellers of specific commercial products. However, if websites designed for such purposes also included health-related information—for example, findings from research grants or information about the therapeutic effects of products—they were considered eligible.
9 Examples of health-related websites with restricted access are those accessible only to members or paying subscribers who must enter an identifying log-in name and password. We also excluded sites that required users to "register" by providing personal information. However, sites that limited access to registered users but also provided some health information to nonregistered visitors were considered eligible.