Take Web data analysis to the next level with PHP
By Paul Meagher - 2004-04-12

Examples of measurement scales

A measurement scale simply specifies a procedure for assigning symbols, letters, or numbers to a phenomenon of interest. For example, the kilogram scale allows you to assign a number to an object indicating its weight based upon standardized displacements of measuring instruments.

Four measurement scales are of importance:

  • Ratio - The kilogram scale is an example of a ratio scale -- the symbols that are assigned to object attributes have numerical meaning, and the scale has a true zero point. You can perform various operations on those symbols (such as computing ratios) that are not meaningful for numerical values obtained using less powerful measurement scales.

  • Interval - An interval scale is a scale of measurement in which the distance between any two adjacent units of measurement (also known as intervals) is the same, but the zero point is arbitrary. Examples of interval scales include the measurement of longitude, the heights of tides, and calendar years. The values on an interval scale can be added and subtracted, but cannot be meaningfully multiplied or divided.

  • Rank - A rank scale applies to a set of data that is ordinal (the values and observations belonging to it can be put in order or have a rating scale attached). A common example is a "like-dislike" poll, in which numerals have been assigned to attributes (1 = Strongly dislike to 5 = Strongly like). Usually, categories for an ordinal set of data have a natural order, but the distance between adjoining points on the scale is not necessarily the same. You can count and order, but not measure, ordinal data.

  • Nominal - A nominal scale of measurement is the weakest form of measurement scale and involves assigning items to groups or categories. No quantitative information is conveyed and no ordering of the items is implied by the measurements. The main numerical operation you perform on nominal scale data is counting the frequency of items in each category.

The following table contrasts the features of each scale of measurement:

Measurement scales

Scale    | Absolute numerical meaning to attributes?    | Can perform most math operations?
Ratio    | Yes.                                         | Yes.
Interval | To the interval; zero point is arbitrary.    | Addition and subtraction.
Rank     | No.                                          | Counting and ordering.
Nominal  | No.                                          | Counting only.
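
To make the "counting and ordering" row concrete, here is a minimal sketch for rank data, assuming PHP 7 or later. The ratings and the function names `rating_counts` and `rating_median` are made up for illustration. The sketch counts how often each point on a 5-point "like-dislike" scale was chosen and picks the middle response after sorting. It deliberately avoids the arithmetic mean, which would treat the gaps between scale points as equal.

```php
<?php
// Count how many responses fall on each point of the rating scale.
function rating_counts(array $ratings): array
{
    $counts = array_count_values($ratings);
    ksort($counts); // present the scale points in their natural order
    return $counts;
}

// Median for an odd number of responses: sort, then take the middle value.
// (For an even number this returns the upper of the two middle values.)
function rating_median(array $ratings): int
{
    sort($ratings);
    return $ratings[intdiv(count($ratings), 2)];
}

// Hypothetical poll data: 1 = Strongly dislike ... 5 = Strongly like.
$ratings = [4, 2, 5, 3, 4, 1, 4, 5, 2];

print_r(rating_counts($ratings));
echo "Median rating: " . rating_median($ratings) . "\n";
```

Both summaries rely only on counting and ordering, so they remain valid even if "Strongly like" is not exactly one unit above "Like".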

In this article I will focus on data collected using a nominal scale of measurement and the inferential techniques appropriate for nominal data.

Use the nominal scale

Almost all Web users -- designers, customers, sysadmins -- are familiar with the nominal scale. Web polls and access logs are similar in that both often use a nominal scale as the measurement scale. In Web polls, site owners often measure preferences by asking people to choose a response option (such as, "Do you prefer Brand A, Brand B, or Brand C?"). Data is summarized by counting the frequency of each type of response.
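
As a rough illustration of summarizing poll data by counting, the sketch below tallies responses per option. The function name `tally_responses` and the sample votes are hypothetical, and the code assumes PHP 7 or later. Options are seeded with zero so that a brand nobody chose still appears in the summary.

```php
<?php
// Tally nominal poll responses: returns option => frequency.
// Responses that do not match a known option are ignored.
function tally_responses(array $responses, array $options): array
{
    $counts = array_fill_keys($options, 0);
    foreach ($responses as $response) {
        if (array_key_exists($response, $counts)) {
            $counts[$response]++;
        }
    }
    return $counts;
}

// Hypothetical votes for the "Brand A, Brand B, or Brand C" question.
$options   = ['Brand A', 'Brand B', 'Brand C'];
$responses = ['Brand A', 'Brand C', 'Brand A', 'Brand B', 'Brand A', 'Brand C'];

print_r(tally_responses($responses, $options));
```

The resulting frequency table is the only numerical summary that nominal data supports directly, and it is the kind of count data that inferential tests for nominal data take as input.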

In a similar vein, a common way to measure Web site traffic is to assign each hit or visit to the day of the week on which it occurred and count the number of hits or visits for each day. You can also count hits by browser type, operating system type, country of origin -- any categorical dimension you can imagine.
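
The day-of-week count can be sketched as follows, assuming the hits are already available as Unix timestamps (parsing a real access log is left out). The function name `hits_by_weekday` and the sample timestamps are hypothetical. Days are assigned in UTC here; a real report would probably use the site's own time zone.

```php
<?php
// Assign each hit to a weekday (a nominal category) and count per category.
function hits_by_weekday(array $timestamps): array
{
    $days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday'];
    $counts = array_fill_keys($days, 0);
    foreach ($timestamps as $ts) {
        // gmdate('l', ...) gives the full English weekday name in UTC.
        $counts[gmdate('l', $ts)]++;
    }
    return $counts;
}

// Hypothetical hits: two on 2004-04-12, one on 2004-04-17 (all UTC).
$hits = [
    gmmktime(10, 0, 0, 4, 12, 2004),
    gmmktime(23, 30, 0, 4, 12, 2004),
    gmmktime(8, 0, 0, 4, 17, 2004),
];

print_r(hits_by_weekday($hits));
```

The same pattern works for any other categorical dimension: swap the weekday lookup for a function that maps a hit to its browser, operating system, or country.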

Because Web polls and access statistics both involve counting the number of times data falls into a particular qualitative category, they can be analyzed with similar non-parametric statistical tests (tests that allow you to make inferences based on distribution shape rather than population parameters).

David Sheskin, in his Handbook of Parametric and Non-Parametric Statistical Procedures (p.19, 1997), differentiates between parametric and non-parametric tests in this way:

The distinction employed in this book for categorizing a procedure as parametric versus a nonparametric test is primarily based on the level of measurement represented by the data that are being analyzed. As a general rule, inferential statistical tests which evaluate categorical/nominal data and ordinal/rank-order data are categorized as nonparametric tests, while those tests that evaluate interval data or ratio data are categorized as parametric tests.

Non-parametric tests are also useful when certain assumptions underlying a parametric test are questionable; they are powerful in detecting population differences when parametric assumptions are not satisfied. In the case of Web polls, I am using a non-parametric analysis procedure because Web polls often use a nominal scale to record voter preference.

I am not suggesting that Web polls and Web access statistics should always use a nominal measurement scale, or that non-parametric statistical tests are the only ones that can be used for the analysis of this data. It is easy to imagine polls and surveys, for example, that would require users to provide numerical ratings (1 to 100) for each option and for which parametric statistical tests would be appropriate.

Still, many Web data streams involve compiling categorical count data. Data measured using more powerful measurement scales can also be turned into nominal scale data by defining intervals (such as 17 to 21) and assigning each data point to an interval (such as "young adult"). The ubiquity of frequency data, embedded in the experience of Web developers, makes non-parametric statistics an appropriate starting point for learning how to apply inferential techniques to data streams.
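
Turning ratio-scale data into nominal categories might look like the sketch below. Only the 17-to-21 "young adult" interval comes from the text. The other two bands, the function name `age_category`, and the sample ages are assumptions made for illustration.

```php
<?php
// Map an age (ratio scale) to a named interval (nominal scale).
// Only "young adult" (17 to 21) is taken from the article; the
// other bands are placeholders.
function age_category(int $age): string
{
    if ($age < 17) {
        return 'under 17';
    }
    if ($age <= 21) {
        return 'young adult';
    }
    return 'over 21';
}

// Hypothetical ages collected from a survey form.
$ages = [16, 17, 19, 21, 22, 35, 18];

// Once every data point has a category, the summary is a frequency count.
$counts = array_count_values(array_map('age_category', $ages));
print_r($counts);
```

Binning discards information (19 and 21 become indistinguishable), which is the price paid for being able to treat the data as simple category counts.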

To keep this article to a manageable size, I'll limit the discussion of the analysis of Web data streams to Web polls. Remember, however, that many Web data streams can be represented in terms of nominal count data and that the inferential techniques I discuss will allow you to go beyond reporting simple count data.




First published by IBM developerWorks


Article copyright and all rights retained by the author.