Simple linear regression with PHP: Part 1
By Paul Meagher - 2004-05-12

Fit the model to the data

The SimpleLinearRegression procedure fits a straight line to the data, where the line has the following standard form:

y = b + mx

The PHP form of this equation would look something like Listing 3:

Listing 3. PHP equation that fits the model to the data


$PredictedY[$i] = $YIntercept + $Slope * $X[$i];

The SimpleLinearRegression class uses a least-squares criterion for deriving estimates of what the Y Intercept and Slope parameters should be. These estimated parameters are used to construct a linear equation (see Listing 3) to model the relationship between the X and Y values.

Using the derived linear equation, you can then obtain predicted Y values for each X value. If the linear equation is a good fit to the data, then the observed and predicted Y values tend to agree.
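The least-squares step described above can be sketched in a few lines of PHP. This is only an illustration, not the code of the SimpleLinearRegression class: the function name fitLine() and the returned array keys are hypothetical, and the sketch assumes $X and $Y are plain zero-indexed arrays of equal length. It uses the textbook estimates (slope = sum of cross-products of deviations divided by sum of squared X deviations, intercept = mean of Y minus slope times mean of X) and then applies the equation from Listing 3.

```php
<?php
// Illustrative sketch only: fitLine() and its array keys are hypothetical
// and are not taken from the SimpleLinearRegression class.
function fitLine(array $X, array $Y): array {
    $n = count($X);
    if ($n < 2 || $n !== count($Y)) {
        throw new InvalidArgumentException("Need two equal-length arrays with at least 2 points.");
    }

    $meanX = array_sum($X) / $n;
    $meanY = array_sum($Y) / $n;

    $sumXY = 0.0; // sum of cross-products of the X and Y deviations
    $sumXX = 0.0; // sum of squared X deviations
    for ($i = 0; $i < $n; $i++) {
        $dx = $X[$i] - $meanX;
        $sumXY += $dx * ($Y[$i] - $meanY);
        $sumXX += $dx * $dx;
    }
    if ($sumXX == 0.0) {
        throw new InvalidArgumentException("X values must not all be equal.");
    }

    // Least-squares estimates of the two parameters.
    $slope      = $sumXY / $sumXX;
    $yIntercept = $meanY - $slope * $meanX;

    // Predicted Y for each X, using the equation from Listing 3.
    $predictedY = [];
    for ($i = 0; $i < $n; $i++) {
        $predictedY[$i] = $yIntercept + $slope * $X[$i];
    }

    return ['Slope' => $slope, 'YIntercept' => $yIntercept, 'PredictedY' => $predictedY];
}

// Made-up data that lies exactly on y = 1 + 2x.
$fit = fitLine([1, 2, 3, 4], [3, 5, 7, 9]);
printf("y = %.2f + %.2fx\n", $fit['YIntercept'], $fit['Slope']); // prints "y = 1.00 + 2.00x"
```

Because the sample data is exactly linear, the predicted values coincide with the observed ones; with real data the two sets of values only tend to agree.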

How to determine a good fit

The SimpleLinearRegression class generates a fairly large number of summary values. One important summary value is a T statistic that can be used to measure how well a linear equation fits the data. If the fit is good, then the T statistic tends to have a large value. If the T statistic is small, the linear equation should be replaced by a simpler default model that uses the mean of the Y values as the predictor. (The mean of a set of values is often a useful predictor of the next observed value, which is why it serves as the default model.)

To test whether the T statistic is large enough to reject the mean of the Y values as the best predictor, you need to compute the probability of obtaining a T statistic that large by chance. If that probability is low, then you can reject the null hypothesis that the mean is the best predictor and, correspondingly, gain confidence that a simple linear model offers a good fit for the data.
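This page does not spell out how the T statistic is computed. A minimal sketch, assuming it is the usual textbook statistic for simple linear regression (the estimated slope divided by its standard error, with n - 2 degrees of freedom), might look like the following. The function name slopeTStatistic() and the sample data are hypothetical and are not part of the SimpleLinearRegression class.

```php
<?php
// Illustrative sketch only: slopeTStatistic() is a hypothetical helper, not a
// method of the SimpleLinearRegression class. It assumes the T statistic is
// the slope estimate divided by its standard error.
function slopeTStatistic(array $X, array $Y): array {
    $n = count($X);
    if ($n < 3 || $n !== count($Y)) {
        throw new InvalidArgumentException("Need equal-length arrays with at least 3 points.");
    }

    $meanX = array_sum($X) / $n;
    $meanY = array_sum($Y) / $n;

    $sumXY = 0.0; // sum of cross-products of the X and Y deviations
    $sumXX = 0.0; // sum of squared X deviations
    for ($i = 0; $i < $n; $i++) {
        $dx = $X[$i] - $meanX;
        $sumXY += $dx * ($Y[$i] - $meanY);
        $sumXX += $dx * $dx;
    }
    if ($sumXX == 0.0) {
        throw new InvalidArgumentException("X values must not all be equal.");
    }
    $slope      = $sumXY / $sumXX;
    $yIntercept = $meanY - $slope * $meanX;

    // Sum of squared differences between observed and predicted Y values.
    $sumSquaredError = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $residual = $Y[$i] - ($yIntercept + $slope * $X[$i]);
        $sumSquaredError += $residual * $residual;
    }

    $df = $n - 2; // two parameters (slope and intercept) were estimated
    $slopeStdError = sqrt(($sumSquaredError / $df) / $sumXX);
    if ($slopeStdError == 0.0) {
        throw new RuntimeException("Residual variance is zero, so T is undefined.");
    }

    return ['T' => $slope / $slopeStdError, 'df' => $df];
}

// Made-up data with some scatter around an upward trend.
$stat = slopeTStatistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
printf("T = %.4f on %d degrees of freedom\n", $stat['T'], $stat['df']);
// prints "T = 2.1213 on 3 degrees of freedom"
```

The T value and its degrees of freedom are the two inputs needed for the probability calculation discussed next.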

So, how do you compute the probability of the T statistic?

Compute the T statistic probability

Because PHP lacks mathematical routines to compute the probability of a T statistic, I decided to shell out to the statistical computing package R (see www.r-project.org in Resources) to obtain the necessary values. I also wanted to raise awareness about this package because:

  1. R provides quite a few ideas PHP developers might want to emulate in a PHP math library.
  2. With R, you can confirm that values obtained from a PHP math library agree with those obtained from a mature, freely available, open source statistical package.

The code in Listing 4 demonstrates just how easy it is to shell out to R for one value.

Listing 4. Shell out to the R statistical computing package for one value



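<?php 

// Copyright 2003, Paul Meagher 
// Distributed under GPL   

class SimpleLinearRegression { 
   
  public $RPath  = "/usr/local/bin/R";  // Your path here

  // Evaluates R's dt() (the Student's t density function) at $T with $df
  // degrees of freedom and returns the value that R prints.
  function getStudentProb($T, $df) {    
    $Probability = 0.0;   
    // Cast to float so that only numeric text is interpolated into the shell command.
    $T  = (float) $T;
    $df = (float) $df;
    $cmd = "echo 'dt($T, $df)' | $this->RPath --slave"; 
    $result = shell_exec($cmd);    
    // R prints a single value as "[1] value"; split off the "[1]" index label.
    list($LineNumber, $Probability) = explode(" ", trim((string) $result)); 
    return $Probability;
  }

  // Evaluates R's qt() (the Student's t quantile function) for probability
  // $alpha with $df degrees of freedom and returns the value that R prints.
  function getInverseStudentProb($alpha, $df) {  
    $InverseProbability = 0.0; 
    $alpha = (float) $alpha;
    $df    = (float) $df;
    $cmd = "echo 'qt($alpha, $df)' | $this->RPath --slave"; 
    $result = shell_exec($cmd);  
    list($LineNumber, $InverseProbability) = explode(" ", trim((string) $result)); 
    return $InverseProbability;
  }

}

?>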

Note that the path to the R executable is set once as a class property and used by both functions. The first function returns a probability value associated with a T statistic based upon the Student's t distribution, while the second, inverse function computes the T statistic corresponding to a given alpha setting. The getStudentProb method is used to assess the fit of the linear model; the getInverseStudentProb method returns an intermediate value used to compute a confidence interval for each predicted Y value.
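Both methods in Listing 4 rely on R printing a single value in the form "[1] value" and split that text on a space. The following sketch shows one way that parsing step could be made more defensive, for example when R is not installed at the configured path and shell_exec() returns nothing usable. The helper name parseRScalar() is hypothetical and is not part of the SimpleLinearRegression class, and the sample strings are made up rather than captured from a real R session.

```php
<?php
// Hypothetical helper, not part of the article's class: extracts the number
// from a one-element R result such as "[1] 0.05743". Returns null when the
// text does not have that shape (for example, when the shell call failed).
function parseRScalar($output): ?float {
    if (!is_string($output)) {
        return null; // shell_exec() can return null or false on failure
    }

    // Split on any run of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', trim($output), -1, PREG_SPLIT_NO_EMPTY);

    // Expect exactly the "[1]" index label followed by one numeric token.
    if (count($tokens) !== 2 || $tokens[0] !== '[1]' || !is_numeric($tokens[1])) {
        return null;
    }

    return (float) $tokens[1];
}

// Made-up sample strings standing in for what shell_exec() might return.
$value   = parseRScalar("[1] 0.05743\n");
$failure = parseRScalar(null);

var_dump($value);   // float(0.05743)
var_dump($failure); // NULL
```

A caller could then treat a null result as "R did not return a usable value" instead of silently passing an empty string further into the calculation.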

Space constraints keep me from going into detail about all the functions in this class, so I encourage you to consult an undergraduate statistics textbook if you want to understand the terminology and steps involved in a simple linear regression analysis.




First published by IBM developerWorks


Copyright 2004-2025 GrindingGears.com. All rights reserved.
Article copyright and all rights retained by the author.