
Apply probability models to Web data using PHP

By Paul Meagher - 2004-04-14

And now, Exponential distribution

In probability modeling, you will not use the probability distributions superclass directly. Instead, you will work with specific subclasses of it, one for each common probability distribution for which the analytic work has been done to figure out how to implement the PDF, CDF, InverseCDF, and RNG methods.

R, for example, includes such methods for all of these univariate distributions:

Beta
Binomial
Cauchy
Chi Square
Exponential
F
Gamma
Geometric
Hypergeometric
Log-Normal
Logistic
Negative Binomial
Poisson
Student's t
Uniform
Weibull
Wilcoxon

As you can see, when it comes to fitting the probability distribution for your random variable to a specific theoretical probability distribution, you can choose from many such distributions. For you to become proficient in basic probability modeling, you need to become familiar with:

The visual representation of such distributions
The shape and location adjustment parameters that most probability distributions accept
Which adjustment parameters are used for model fitting

Statistics textbooks and Web sites can be consulted for these details. See Resources.

I want to focus on one particular probability distribution called the exponential distribution. What you learn from this example can be used to understand how to use other probability distributions to construct univariate probability models.

The exponential distribution in particular possesses these four advantages:

The implementation of the distribution function methods can be easily found in textbooks or online so that you can see where they came from. Many math books focus on using positive and negative exponential functions to model change in some variable (for example, balance) as a function of time (for example, term interest rate).
The exponential distribution can be used to construct basic probability models (such as assigning probabilities to various statements about a random variable) for the behavior of a large number of phenomena (for example, random variables of interest). It is commonly used to model the distribution of waiting times -- either the time elapsed until some event occurs or the interval between successive occurrences of an event. I will illustrate such an application in the next section.
Textbooks on applied probability modeling often focus on the Exponential and Poisson distributions in particular because together they have demonstrated general utility and can be used to construct more elaborate probability models. In other words, not all probability distributions are equivalent in terms of their usefulness in modeling a variety of real-world phenomena. The Exponential and Poisson distributions in particular are extremely useful for applied, real-world probability modeling. The Uniform, Normal, and Binomial probability distributions are three other distributions you should make an effort to study in detail.
When your random variable is exponentially distributed, understanding the likelihood of different outcomes is much harder to do by intuition, so it helps in these cases especially to have a formal probability model for the random variable.
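
To make the waiting-time use case concrete, here is a minimal sketch. It assumes that the time between page requests is exponentially distributed with a hypothetical rate of 0.5 requests per minute. The helper names and the rate are illustrative assumptions and are not part of the PHPMath package.

```php
<?php
// Hypothetical helper: probability that the wait exceeds $t when waiting
// times are exponential with rate $lambda (events per unit of time).
function prob_wait_exceeds($t, $lambda) {
  return exp(-$lambda * $t);
}

// Hypothetical helper: probability that the wait falls between $a and $b.
function prob_wait_between($a, $b, $lambda) {
  return exp(-$lambda * $a) - exp(-$lambda * $b);
}

$lambda = 0.5; // assumed rate: one request every two minutes on average

printf("Mean wait: %.2f minutes\n", 1 / $lambda);
printf("P(wait > 3 minutes): %.4f\n", prob_wait_exceeds(3, $lambda));
printf("P(1 < wait < 3 minutes): %.4f\n", prob_wait_between(1, 3, $lambda));
```

Both helpers follow from the cumulative distribution function 1 - exp(-lambda * t): the first is its complement, and the second is the difference between its values at the two endpoints.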

Start exploring the Exponential distribution with some examples. The shape and location aspects of the Exponential distribution are determined by a single parameter called lambda (λ, also known as the rate or decay parameter). Figure 6 plots the probability distributions for four different values of lambda.

Figure 6. Four lambda probability distributions
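
Curves of this kind can be reproduced numerically by evaluating the density lambda * exp(-lambda * x) over a grid of x values. The sketch below assumes four hypothetical values of lambda (0.5, 1, 1.5, and 2), which may differ from those plotted in Figure 6. The function name exponential_pdf() is also an assumption made for illustration.

```php
<?php
// Density of an exponential distribution for x >= 0:
// f(x) = lambda * exp(-lambda * x)
function exponential_pdf($x, $lambda) {
  return $lambda * exp(-$lambda * $x);
}

$lambdas = array(0.5, 1.0, 1.5, 2.0);      // assumed values of lambda
$x_vals  = array(0.0, 0.5, 1.0, 2.0, 3.0); // grid of x values

foreach ($lambdas as $lambda) {
  $row = array();
  foreach ($x_vals as $x) {
    $row[] = sprintf("%.4f", exponential_pdf($x, $lambda));
  }
  echo "lambda=" . $lambda . ": " . implode("  ", $row) . "\n";
}
```

Each curve starts at a height of lambda when x is 0 and decays toward zero, more quickly for larger values of lambda.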

Listing 4 is the code that implements the probability distribution methods for an Exponential distribution. Notice that the PDF method returns probability values for given values of X; this information was used to construct the lines in Figure 6. Also notice that the core distribution methods (PDF(), CDF(), InverseCDF(), and RNG()) return a single answer or a vector of answers depending on how the method is called. This feature will be demonstrated later in a script that tests the methods of this class.

Listing 4. Implementing probability distribution methods for an Exponential distribution


<?php

/**
* @package PHPMath_ProbabilityDistribution
*/

require_once 'PHPMath/ProbabilityDistribution/General.php';

/**
* The PHPMath_ProbabilityDistribution_Exponential class provides  
* an object for encapsulating exponential distributions.
* @version 0.2
* @author Mark Hale
* @author Paul Meagher
*/

class PHPMath_ProbabilityDistribution_Exponential 
     extends PHPMath_ProbabilityDistribution_General {
  
  var $rate; 

  function PHPMath_ProbabilityDistribution_Exponential($decay=1) {
    if($decay <= 0.0) {
      return PEAR::raiseError("Decay parameter should be positive.");
    }    
    $this->rate = $decay;
  }
   
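  function Mean() {
    // The mean of an exponential distribution is 1 / lambda.
    return 1.0 / $this->rate;
  }
  
  function Variance() {
    // The variance of an exponential distribution is 1 / lambda^2.
    return 1.0 / ($this->rate * $this->rate);
  }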
  
  function PDF($x) {    
    if (is_array($x)) {      
      $pdf_vals = array();
      $num_vals = count($x);
      for($i=0; $i < $num_vals; $i++) {                
        if ($x[$i] < 0.0) {
          return PEAR::raiseError("Input values must not be negative.");
        }
        $pdf_vals[$i] = $this->rate * exp(-$this->rate * $x[$i]);
      }    
      return $pdf_vals;    
    } else {          
      if ($x < 0.0) { 
        return PEAR::raiseError("Input value must not be negative.");
      }
      $pdf_val = $this->rate * exp(-$this->rate * $x);
      return $pdf_val;
    }
  }
  
  function CDF($x) {
    if (is_array($x)) {
      $cdf_vals = array();
      $num_vals = count($x);
      for($i=0; $i < $num_vals; $i++) {                
        if ($x[$i] < 0.0) {
          return PEAR::raiseError("Input values must not be negative.");
        }
        $cdf_vals[$i] = 1.0 - exp(-$this->rate * $x[$i]);
      }
      return $cdf_vals;
    } else {          
      if ($x < 0.0) {
        return PEAR::raiseError("Input value must not be negative.");
      }
      $cdf_val = 1.0 - exp(-$this->rate * $x);
      return $cdf_val;
    }
  }
  
  function InverseCDF($prob) {
    if (is_array($prob)) {
      $inv_vals = array();
      $num_vals = count($prob);
      for($i=0; $i < $num_vals; $i++) {                
        if ($this->checkRange($prob[$i])) {
          return PEAR::raiseError("Probability values must be 
               between 0.0 and 1.0");
        }
        $inv_vals[$i] = -log(1.0 - $prob[$i]) / $this->rate;
      }
      return $inv_vals;
    } else {
      if ($this->checkRange($prob)) {
        return PEAR::raiseError("Probability value must be 
             between 0.0 and 1.0");
      }
      $inv_val = -log(1.0 - $prob) / $this->rate;
      return $inv_val;
    }  
  }

  function RNG($num_vals=1) {
    if ($num_vals < 1) {
      return PEAR::raiseError("Number of random values must be 
           greater than 0");
    }
    if ($num_vals == 1) {
      // Shift the draw into (0, 1] so that log() never receives 0.
      $rand_val = (mt_rand() + 1) / (mt_getrandmax() + 1);
      return -log($rand_val) / $this->rate;
    } else {
      $rand_vals = array();
      for($i=0; $i < $num_vals; $i++) {
        // Shift the draw into (0, 1] so that log() never receives 0.
        $rand_val = (mt_rand() + 1) / (mt_getrandmax() + 1);
        $rand_vals[$i] = -log($rand_val) / $this->rate;
      }
      return $rand_vals;
    }
  }
    
}

?>
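
The InverseCDF() and RNG() methods are closely related. Solving p = 1 - exp(-lambda * x) for x gives x = -log(1 - p) / lambda, and passing uniform random numbers through that inverse yields exponentially distributed values (inverse transform sampling). Because 1 - U is uniform whenever U is, the listing can take -log() of the uniform draw directly. The standalone sketch below illustrates the idea. Its function names and the rate of 2 are assumptions made for illustration, and it deliberately keeps the uniform draw strictly between 0 and 1.

```php
<?php
// Hypothetical standalone versions of the CDF and its inverse.
function exp_cdf($x, $lambda) {
  return 1.0 - exp(-$lambda * $x);
}

function exp_inverse_cdf($p, $lambda) {
  return -log(1.0 - $p) / $lambda;
}

// Inverse transform sampling: push a uniform draw through the inverse CDF.
function exp_sample($lambda) {
  // This ratio lies strictly between 0 and 1, so log() never receives 0.
  $u = (mt_rand() + 1) / (mt_getrandmax() + 2);
  return exp_inverse_cdf($u, $lambda);
}

$lambda = 2.0; // assumed rate, for illustration only

// Round trip: CDF(InverseCDF(p)) should give back p (up to rounding).
$p = 0.95;
$round_trip = exp_cdf(exp_inverse_cdf($p, $lambda), $lambda);
printf("p = %.2f, round trip = %.6f\n", $p, $round_trip);

// Draw a few exponentially distributed values.
for ($i = 0; $i < 5; $i++) {
  printf("sample %d: %.4f\n", $i, exp_sample($lambda));
}
```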

To gain insight into how this class is used and what it does, I have created a script in the tests directory of the accompanying PHPMath_ProbabilityDistribution package that demonstrates usage and method-return values. The script is called exponential.php and looks like this:

Listing 5. The exponential.php script demonstrates usage and method-return values


<?php 
// exponential.php 
// Script to test ExponentialDistribution.php methods 

require_once "../ExponentialDistribution.php"; 
require_once "make_table.php"; 

$exp = new ExponentialDistribution(1); 

$Methods    = array("Mean()","Variance()","PDF(2)","CDF(2)",
     "InverseCDF(0.95)", "RNG(1)"); 
$Output[0] = $exp->Mean(); 
$Output[1] = $exp->Variance(); 
$Output[2] = $exp->PDF(2); 
$Output[3] = $exp->CDF(2); 
$Output[4] = $exp->InverseCDF(0.95); 
$Output[5] = $exp->RNG(1); 
make_table("Exponential Distribution (lambda=1)", "Methods", "Output", 
     $Methods, $Output); 

// Test PDF function by feeding an array of $x_vals and 
// getting a corresponding array of $p_vals. 
$X_Vals = array(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5); 
$P_Vals = $exp->PDF($X_Vals); 
make_table("PDF(X_VALS)", "X Vals Input", "P Vals Output", $X_Vals, 
     $P_Vals); 

// Test CDF function by feeding an array of $x_vals and getting 
// a corresponding array of $p_vals where each p_val corresponds 
// to p(x < $x_vals[$i]) 
$X_Vals = array(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5); 
$P_Vals = $exp->CDF($X_Vals); 
make_table("CDF(X_VALS)", "X Vals Input", "P Vals Output", 
     $X_Vals, $P_Vals); 

// Test InverseCDF function by feeding in P_Vals from previous 
// test. Result should be mirror of CDF output. 
$X_Vals = $exp->InverseCDF($P_Vals); 
make_table("InverseCDF(P_VALS)", "P Vals Input", "X Vals Output", 
     $P_Vals, $X_Vals); 

// Test RNG function by passing the number of values you want generated. 
// Result is an array of random numbers from an exponential distribution 
$Counter = range(0, 7); 
$Rnd_Vals = $exp->RNG(8); 
make_table("RNG(N_VALS)", "Counter", "Rnd Vals", $Counter, $Rnd_Vals); 
?>
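
The scalar-or-vector calling convention that this script exercises can also be sketched in isolation. The example below is not how the class implements it. It is a hypothetical helper built on PHP's array_map() and call_user_func(), shown only to illustrate the convention of returning a single value for a single input and an array for an array input.

```php
<?php
// Hypothetical helper: apply $fn to a scalar, or to every element of an array.
function apply_scalar_or_vector($fn, $x) {
  return is_array($x) ? array_map($fn, $x) : call_user_func($fn, $x);
}

// Example function of one variable: exponential PDF with an assumed rate of 1.
function unit_exponential_pdf($x) {
  return exp(-$x);
}

$single = apply_scalar_or_vector('unit_exponential_pdf', 2);
$vector = apply_scalar_or_vector('unit_exponential_pdf', array(0, 1, 2));

var_dump($single); // scalar in, scalar out
var_dump($vector); // array in, array of the same length out
```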

The exponential.php test script generates five HTML tables.

Table 2 shows the single values returned by each method in the Exponential.php class. You can verify that these methods are producing accurate results by comparing their output to what a more mature statistical computing environment like R would generate (for an example, revisit the dexp(), pexp(), qexp(), and rexp() functions).

Table 2. Single values returned by the methods in Exponential.php

Exponential Distribution (lambda=1)
Methods	Output
Mean()	1
Variance()	1
PDF(2)	0.13533528323661
CDF(2)	0.86466471676339
InverseCDF(0.95)	2.995732273554
RNG(1)	2.4945071670198

You can also use these tables to assist you in understanding the types of values the functions return when passed different arguments. Tables 3, 4, and 5 are particularly useful for this. They illustrate the vector orientation of these methods and consequently allow you to see the behavior of the three major distribution functions (PDF, CDF, and InverseCDF) over a range of input values (when lambda is set to 1).

Table 3. Lambda = 1, vector orientation of PDF function

PDF(X_VALS)
X Vals Input	P Vals Output
0	1
0.5	0.60653065971263
1	0.36787944117144
1.5	0.22313016014843
2	0.13533528323661
2.5	0.082084998623899
3	0.049787068367864
3.5	0.030197383422319

Table 4. Lambda = 1, vector orientation of CDF function

CDF(X_VALS)
X Vals Input	P Vals Output
0	0
0.5	0.39346934028737
1	0.63212055882856
1.5	0.77686983985157
2	0.86466471676339
2.5	0.9179150013761
3	0.95021293163214
3.5	0.96980261657768

Table 5. Lambda = 1, vector orientation of InverseCDF function

InverseCDF(P_VALS)
P Vals Input	X Vals Output
0	0
0.39346934028737	0.5
0.63212055882856	1
0.77686983985157	1.5
0.86466471676339	2
0.9179150013761	2.5
0.95021293163214	3
0.96980261657768	3.5

Table 6 displays output from the random-number generator. The RNG() method generates exponentially distributed random numbers using the value of lambda to adjust the shape and location aspects of the exponential distribution you are sampling from.

Table 6. Output from the random-number generator

RNG(N_VALS)
Counter	Rnd Vals
0	1.658002921542
1	2.0097168931621
2	0.41562004896617
3	1.9121062991322
4	0.21394501966731
5	0.8054845401079
6	0.50914083865687
7	1.2056725040028
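
Eight values are too few to judge whether a generator behaves as intended. One informal check is to draw many values and compare summary statistics with their theoretical counterparts: the sample mean should be close to 1 / lambda, and the fraction of draws below x should be close to the CDF at x. The sketch below assumes a sample size of 200,000 and a fixed seed, both chosen arbitrarily, and uses a standalone sampler rather than the class itself.

```php
<?php
mt_srand(42); // fixed seed so repeated runs start from the same state

$lambda = 1.0;    // same rate as in the tables
$n      = 200000; // assumed sample size
$sum    = 0.0;
$below  = 0;      // number of draws that fall below 1.0

for ($i = 0; $i < $n; $i++) {
  // Uniform draw strictly inside (0, 1), then the inverse-CDF transform.
  $u = (mt_rand() + 1) / (mt_getrandmax() + 2);
  $x = -log($u) / $lambda;
  $sum += $x;
  if ($x < 1.0) {
    $below++;
  }
}

$sample_mean = $sum / $n;
$frac_below  = $below / $n;

printf("Sample mean:   %.4f (theory: %.4f)\n", $sample_mean, 1 / $lambda);
printf("P(X < 1) est.: %.4f (theory: %.4f)\n", $frac_below, 1 - exp(-$lambda));
```

The estimates will not match the theoretical values exactly, but with a sample this large they should agree to roughly two decimal places.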



First published by IBM developerWorks