Repoll
Another interesting application of the one-way Chi Square test is to repoll to see if responses have changed.
Imagine that you were to do another Web poll of Nova Scotia beer drinkers after a period had elapsed. You again ask about their favorite brand of beer and now observe the following:
Table 4. A new beer poll
Keiths | Olands | Schooner | Other |
385 (27.50%) | 350 (25.00%) | 315 (22.50%) | 350 (25.00%) |
Recall that the past data looked like this:
Table 1. The old beer poll, yet again
Keiths | Olands | Schooner | Other |
285 (28.50%) | 250 (25.00%) | 215 (21.50%) | 250 (25.00%) |
The obvious difference between the poll outcomes is that the first poll had 1,000 observations and the second one had 1,400 observations. The main effect of these additional observations is a 100-point increase in the frequency count for each response alternative.
When ready to do the analysis of the new poll, you can choose to analyze the data using the default method of computing the expected frequencies, or you can initialize the analysis with the expected probability of each outcome based on the proportions observed in the previous poll. In the second case, you load the previously obtained proportions into an expected probability array ($ExpProb) and use them to compute the expected frequency values for each response option.
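As a rough illustration of the second approach, the sketch below turns the old poll's proportions into expected frequencies for the new poll. It is a minimal standalone example, not the article's analysis script: the function name expectedFrequencies() and the $observed array are assumptions made for illustration, and only the $ExpProb name comes from the text.

```php
<?php
// Hypothetical sketch: names other than $ExpProb are illustrative assumptions.

// Proportions observed in the previous poll (Table 1), used as expected probabilities.
$ExpProb = [
    'Keiths'   => 0.285,
    'Olands'   => 0.250,
    'Schooner' => 0.215,
    'Other'    => 0.250,
];

// Frequencies observed in the new poll (Table 4).
$observed = [
    'Keiths'   => 385,
    'Olands'   => 350,
    'Schooner' => 315,
    'Other'    => 350,
];

// Scale each expected probability by the number of observations in the new poll.
function expectedFrequencies(array $probabilities, int $total): array
{
    $expected = [];
    foreach ($probabilities as $label => $p) {
        $expected[$label] = $p * $total;
    }
    return $expected;
}

$total    = array_sum($observed);
$expected = expectedFrequencies($ExpProb, $total);

foreach ($expected as $label => $freq) {
    printf("%-8s expected %.0f, observed %d\n", $label, $freq, $observed[$label]);
}
```

With the poll figures above, the expected frequencies should match the Expected row reported for the repoll (399, 350, 301, and 350).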
Listing 6 shows the beer-poll analysis code for detecting changing preferences:
Listing 6. Detecting changing preferences
Tables 5 and 6 show the HTML output that the beer_repoll_analysis.php script generates:
Table 5. Expected frequencies and variances from running beer_repoll_analysis.php
| Keiths | Olands | Schooner | Other | Totals |
Observed | 385 | 350 | 315 | 350 | 1400 |
Expected | 399 | 350 | 301 | 350 | 1400 |
Variance | 0.49 | 0.00 | 0.65 | 0.00 | 1.14 |
Table 6. Various Chi Square statistics from running beer_repoll_analysis.php
Statistic | DF | Obtained | Prob | Critical |
Chi Square | 3 | 1.14 | 0.77 | 7.81 |
Table 6 shows that, under the null hypothesis, you have a 77 percent probability of obtaining a Chi Square value of 1.14 or greater. Because the obtained value is well below the critical value of 7.81, you cannot reject the null hypothesis that the preferences of Nova Scotia beer drinkers have not changed since your last poll. Any discrepancies between the observed and expected frequencies can be accounted for as expected sampling variability from the same population of Nova Scotia beer drinkers. This null finding should not be a surprise given that the new poll results were produced simply by adding a constant of 100 to each previous poll outcome.
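The arithmetic behind Tables 5 and 6 can be checked by hand. The sketch below sums the squared difference between observed and expected frequencies, divided by the expected frequency, for each brand, and compares the total with the critical value from Table 6. The function name chiSquare() and the variable names are assumptions made for illustration, not part of the article's script, and the probability column is not computed here.

```php
<?php
// Hypothetical sketch: function and variable names are illustrative assumptions.

// Sum of (observed - expected)^2 / expected over all response options.
function chiSquare(array $observed, array $expected): float
{
    $sum = 0.0;
    foreach ($observed as $label => $o) {
        $e = $expected[$label];
        $sum += (($o - $e) ** 2) / $e;
    }
    return $sum;
}

// Observed and expected frequencies as reported in Table 5.
$observed = ['Keiths' => 385, 'Olands' => 350, 'Schooner' => 315, 'Other' => 350];
$expected = ['Keiths' => 399, 'Olands' => 350, 'Schooner' => 301, 'Other' => 350];

$obtained = chiSquare($observed, $expected);
$critical = 7.81; // Critical value from Table 6 (3 degrees of freedom)

printf("Chi Square obtained: %.2f\n", $obtained);
echo $obtained > $critical
    ? "Reject the null hypothesis\n"
    : "Fail to reject the null hypothesis\n";
```

Only the Keiths and Schooner columns contribute to the total (196/399 and 196/301), which is why the Olands and Other entries in the Variance row are zero.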
You can imagine, however, that the results might have been different and might have suggested that a different brand of beer was becoming more popular. You could identify which brand by noting the size of the variance reported in each column of Table 5. You can further imagine that such a finding would have significant financial implications for the breweries in question, since bar owners tend to stock the most popular beer in their locality.
These results would be subjected to intense scrutiny by brewery owners who would question the appropriateness of the analytic procedures and experimental methodology; in particular, they would question the representativeness of the samples. If you plan to conduct a Web experiment that may have significant practical implications, you need to pay equal attention to the experimental methodologies you use to collect the data and the analysis techniques you employ to make inferences from your data.
This article aims not only to give you a good grounding for understanding Web data more effectively, but also to offer some advice on how to defend your selection of statistical test and lend additional legitimacy to the conclusions you draw from the data.