Start with the sampling
Imagine you run a weekly poll on your site -- www.NovaScotiaBeerDrinkers.com -- asking members for their opinions on various topics. You've created a poll that asks members to name their favorite brand of beer (in Nova Scotia, there are three well-known brands: Keiths, Olands, and Schooner). To make the survey as inclusive as possible, you include "Other" among the responses.
You receive 1,000 responses and observe the results in Table 1. (The results shown in this article are for demonstration purposes only and not based on actual surveys.)
Table 1. The beer poll
Keiths | Olands | Schooner | Other |
285 (28.50%) | 250 (25.00%) | 215 (21.50%) | 250 (25.00%) |
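The percentages in Table 1 are simply each brand's count divided by the total number of responses. A minimal sketch of that arithmetic in Python follows; the variable names `counts`, `total`, and `shares` are illustrative, and the counts are the demonstration figures from the table.

```python
# Vote counts copied from Table 1 (demonstration data, not a real survey).
counts = {"Keiths": 285, "Olands": 250, "Schooner": 215, "Other": 250}

# Total number of responses in the sample.
total = sum(counts.values())

# Share of the sample choosing each brand, expressed as a percentage.
shares = {brand: 100.0 * n / total for brand, n in counts.items()}

# Print one line per brand in the same "count (percent)" form as the table.
for brand, n in counts.items():
    print(f"{brand:<9}{n:>5} ({shares[brand]:.2f}%)")
```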
The data appears to support the conclusion that Keiths is the most popular brand among Nova Scotia residents. Based on these numbers, can you draw this conclusion? In other words, can you make an inference about the population of Nova Scotia beer drinkers on the basis of results obtained from the sample?
Many factors related to how the sample was collected could render your inferences about relative popularity incorrect. Perhaps the sample includes a disproportionate number of employees of Keith's Brewery; perhaps you didn't properly guard against multiple votes by one person, which may have biased the outcome; perhaps those who elected to vote differ from those who elected not to vote; perhaps online voters differ from offline voters.
Most Web polls are subject to such interpretive difficulties, which arise whenever you try to draw conclusions about a population parameter from a sample statistic. From an experimental design point of view, one of the first questions to ask before you collect data is whether you can take steps to help ensure that your sample is representative of the population of interest.
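To make the gap between a sample statistic and a population parameter concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the population of 100,000 drinkers, the true 25% share for Keiths, and the response rates (2% for Keiths fans, 1% for everyone else). None of it comes from real survey data.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: exactly 25% prefer Keiths (the population parameter).
population = ["Keiths"] * 25_000 + ["Other brand"] * 75_000
true_share = population.count("Keiths") / len(population)

# Simple random sample: every member is equally likely to be polled.
random_sample = rng.sample(population, 1000)
random_share = random_sample.count("Keiths") / len(random_sample)

# Self-selected sample: assume Keiths fans respond twice as often (2% vs. 1%).
def responds(preference):
    rate = 0.02 if preference == "Keiths" else 0.01
    return rng.random() < rate

self_selected = [p for p in population if responds(p)]
biased_share = self_selected.count("Keiths") / len(self_selected)

print(f"Population share:      {true_share:.3f}")
print(f"Random-sample share:   {random_share:.3f}")
print(f"Self-selected share:   {biased_share:.3f}")
```

Under these assumptions the random sample should land close to the true share, while the self-selected sample overstates it, even though both contain roughly a thousand responses.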
If drawing conclusions about the population of interest is your motivation for a Web poll (versus entertainment for site visitors), then you should implement techniques to ensure one vote per person (for example, by requiring members to log in with a unique ID to vote) and randomize the selection of voters (for instance, by selecting a random subset of members and e-mailing them encouragement to vote).
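The two safeguards just described can be sketched in a few lines of Python. The function names `pick_invitees` and `count_votes`, the member IDs, and the ballot format (a list of ID and choice pairs) are all hypothetical; a real site would draw IDs from its member database and record votes behind a login.

```python
import random

def pick_invitees(member_ids, k, seed=None):
    """Return k distinct member IDs chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(member_ids), k)

def count_votes(ballots, invited):
    """Tally ballots, keeping only the first vote from each invited member."""
    invited = set(invited)
    seen = set()
    tally = {}
    for member_id, choice in ballots:
        if member_id not in invited or member_id in seen:
            continue  # skip uninvited voters and repeat votes
        seen.add(member_id)
        tally[choice] = tally.get(choice, 0) + 1
    return tally

# Hypothetical membership list and a random subset to e-mail.
members = [f"member{i}" for i in range(1, 21)]
invitees = pick_invitees(members, 5, seed=1)

# One invitee votes twice and one outsider tries to vote.
ballots = [
    (invitees[0], "Keiths"),
    (invitees[0], "Keiths"),
    (invitees[1], "Olands"),
    ("outsider", "Schooner"),
]
print(count_votes(ballots, invitees))
```

Here the duplicate ballot and the outsider's ballot are both discarded, so the tally holds one vote for Keiths and one for Olands.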
Ultimately, the aim is to eliminate, or at least reduce, various biases that might impair your ability to draw inferences about your population of interest.