STATISTICAL METHODS OF ANALYSIS OF ROAD ACCIDENTS
The following statistical methods are used for the analysis of road accidents:
Logistic regression method
- Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome.
- The outcome is measured with a dichotomous variable (in which there are only two possible outcomes).
- Logistic regression is a form of predictive analysis.
- Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
- When selecting the model for the logistic regression analysis, an important consideration is the model fit.
- Adding independent variables to a logistic regression model will always increase the amount of variance explained in the log odds.
- However, adding more and more variables to the model can result in overfitting, which reduces the generalizability of the model beyond the data on which the model is fit.
- The statistical model of logistic regression is the most popular technique in accident severity research because the relationship between accident severity and correlated factors can be clearly identified.
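The mechanics described above can be sketched in a few lines of Python. Everything here is hypothetical: the covariates (speed over the limit, seat belt use), the data, and the fitted values. A real analysis would use a statistical package, but a hand-rolled maximum-likelihood fit shows what the model actually does:

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit p(fatal) = sigmoid(b0 + b.x) by batch gradient ascent
    on the log-likelihood (maximum likelihood estimation)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)              # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p             # score of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical accidents: (speed over limit in 10 km/h units, belt worn),
# outcome 1 = fatality (dichotomous dependent variable).
X = [(0, 1), (1, 1), (2, 1), (3, 0), (2, 0), (4, 0), (0, 1), (3, 1)]
y = [0, 0, 0, 1, 1, 1, 0, 1]
w = fit_logistic(X, y)
print(predict(w, (4, 0)))  # predicted fatality probability: high speed, no belt
```

Note that with such a tiny, cleanly separated dataset the fitted coefficients keep growing with more epochs, which is exactly the overfitting risk mentioned above.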
Evans pair and double pair comparison method
- This is one of the most popular techniques in the road traffic community for assessing the effect of risk factors on the probability of death in a road traffic accident.
- This approach is not model based and is not explicit about its assumptions concerning relative risk.
- It uses the concept of multiplicative relative risk.
- The double pair comparison method uses two such quantities.
- In both methods, Evans bases variance approximations on an assumption of independent Poisson processes.
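The double pair comparison reduces to simple arithmetic on fatality counts, which can be illustrated as follows. The scenario (belt use as the risk factor, the front-seat passenger as the control occupant) and all counts are hypothetical:

```python
# Double pair comparison (Evans-style sketch): estimate the relative
# fatality risk of belted vs. unbelted drivers, using the passenger in
# the same crash as a control occupant. All counts are hypothetical.

# Crashes with a belted (subject) driver carrying a passenger:
subject_driver_deaths = 80
subject_passenger_deaths = 200

# Crashes with an unbelted (comparison) driver carrying a passenger:
comparison_driver_deaths = 150
comparison_passenger_deaths = 180

# One driver/passenger death ratio per driver group ("pair" of counts):
r1 = subject_driver_deaths / subject_passenger_deaths        # 0.4
r2 = comparison_driver_deaths / comparison_passenger_deaths  # ~0.833

# Ratio of the two ratios: multiplicative relative risk of death for
# belted vs. unbelted drivers, with passenger risk cancelling out.
R = r1 / r2
print(round(R, 3))  # 0.48
```

Dividing the two ratios is what controls for crash severity: factors that affect driver and passenger alike cancel out of each ratio.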
Greenland method
- This method is based on the assumption of multiplicative relative risks and was proposed by Greenland (1994). Unlike Evans (1985), it makes its model assumptions explicit, and it is a true regression technique rather than a stratified approach.
- The Greenland method is desirable since it enables the estimation of the relative risk reduction, or case-load reduction, when a risk factor is changed.
- However, since the method assumes multiplicative relative risks, the reservations expressed about the unsuitability of this assumption for the range of probabilities encountered in the road traffic area also hold for this method.
- Greenland's method is recommended over the methods of Evans (1985).
Greenland's method is
- based on sound contemporary statistics,
- explicit in its model assumptions, and
- a true regression technique.
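Because Greenland's method assumes multiplicative relative risks, its core can be sketched as a log-linear (Poisson) regression, in which the exponential of a coefficient is a multiplicative relative risk. This is an illustrative sketch, not Greenland's exact formulation; the strata, counts, and exposures below are hypothetical:

```python
import math

def fit_poisson_loglinear(X, counts, exposure, lr=0.01, epochs=5000):
    """Log-linear regression: deaths ~ Poisson(exposure * exp(b0 + b.x)).
    exp(b_j) is the multiplicative relative risk for factor j."""
    d = len(X[0])
    w = [0.0] * (d + 1)              # w[0] is the intercept (log base rate)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, ei in zip(X, counts, exposure):
            mu = ei * math.exp(w[0] + sum(wj * xj
                                          for wj, xj in zip(w[1:], xi)))
            err = yi - mu            # Poisson score: observed minus expected
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical strata: covariate 1 = unbelted, 0 = belted,
# with death counts and occupant-exposure per stratum.
X = [(0,), (1,)]
deaths = [20, 45]
occupants = [100, 90]
w = fit_poisson_loglinear(X, deaths, occupants)
rr = math.exp(w[1])   # estimated relative risk, unbelted vs. belted
print(round(rr, 2))   # ~2.5 (death rates 0.20 vs 0.50 per occupant)
```

The closed-form check here is (45/90) / (20/100) = 2.5, which is the value the regression recovers.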
Conditional logistic regression
- It is one of the most under-utilized statistical techniques in the road traffic literature. It finesses the problem of truncation and enables a regression analysis of the FARS database.
- It is a very accessible technique which has been available in the statistical literature for many years.
- The technique is available in most modern statistical packages such as SAS, S-Plus and EGRET.
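The reduction that makes CLR work can be shown in a small sketch. For accidents with exactly two occupants, exactly one of whom died, conditioning on the number of deaths within each accident collapses the likelihood to a logistic model (with no intercept) on covariate differences. The data and covariate below are hypothetical; real analyses would use a package such as SAS, S-Plus or EGRET, as noted above:

```python
import math

def fit_clr_pairs(pairs, lr=0.5, epochs=3000):
    """Conditional logistic regression for two-occupant accidents with one
    fatality. Given one death per accident, the conditional probability
    that a particular occupant is the fatality is
    sigmoid(beta . (x_dead - x_survivor)), so we fit a no-intercept
    logistic model on within-accident covariate differences."""
    d = len(pairs[0][0])
    diffs = [[a - b for a, b in zip(dead, surv)] for dead, surv in pairs]
    beta = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x in diffs:
            z = sum(bj * xj for bj, xj in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (1.0 - p) * xj   # score of conditional likelihood
        beta = [bj + lr * g / len(diffs) for bj, g in zip(beta, grad)]
    return beta

# Hypothetical (died, survived) occupant pairs; one covariate: age / 10.
pairs = [([7.2], [3.0]), ([6.5], [2.8]), ([3.1], [6.0]),
         ([8.0], [4.1]), ([5.5], [2.2]), ([6.8], [3.3])]
beta = fit_clr_pairs(pairs)
print(beta[0] > 0)  # positive: older occupants more often the fatality
```

Notice that any covariate constant within an accident (such as the vehicle's speed) differences out to zero, which is exactly the limitation discussed under truncated logistic regression below.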
Truncated logistic regression
- Truncated logistic regression is used when data are recorded only for accidents in which at least one death occurred, as in fatality databases such as FARS.
- Such databases tend to be under-reported, as there is a tendency not to report accidents in which damage and injury are minimal.
- Application of logistic regression without any correction for truncation yields an exaggerated estimation of the relative advantages of one group over the other in terms of survival.
- The Truncated Logistic Regression (TLR) approach conditions on the event that an accident is observed, i.e. that it results in at least one fatality.
- Advantages of Truncated Logistic Regression:
- Since TLR uses the full information from the sample, it can be expected to lead to more accurate inference than conditional logistic regression (CLR) or the double pair comparison method.
- More effects can be fitted using TLR. The conditional logistic regression likelihood includes only terms which vary within a given accident. For example, in a single vehicle accident the speed of the car is the same for all occupants, so its effect on survival prospects cannot be estimated using CLR; TLR, on the other hand, can be used to estimate it.
- TLR can be used to estimate the relative seriousness of crashes for occupants.
- The TLR method allows us to estimate the probability that a given type of crash will kill a given type of occupant, and it enables the estimation of the probabilities of the various categories of injury.
- Only TLR can be used to estimate the total number of potentially fatal crashes. TLR estimates the probability that a particular configuration of factors results in a fatality; by dividing the observed number of crashes of a given type by this probability, we obtain an estimate of the total number of potentially fatal crashes of that type. These estimates can then be summed over the categories of crashes to obtain an estimate of the overall total.
- TLR can be generalized to different link functions, allowing us to choose the link function which best fits a given data set.
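The truncation correction and the "total potentially fatal crashes" calculation described above amount to simple probability arithmetic once occupant-level fatality probabilities are available. All numbers below are hypothetical:

```python
# Sketch of the truncation logic in TLR (hypothetical numbers).
# Suppose a fitted model gives each occupant of a certain crash type
# these probabilities of death:
p_death = [0.30, 0.15]          # two occupants

# Probability the crash enters a fatality-only database (FARS-style),
# i.e. that at least one occupant dies:
p_observed = 1.0
for p in p_death:
    p_observed *= (1.0 - p)     # probability everyone survives
p_observed = 1.0 - p_observed   # 1 - 0.70 * 0.85 = 0.405

# TLR conditions each crash's likelihood contribution on this event.
# Inverting the truncation: if 81 crashes of this type appear in the
# database, the estimated total number of potentially fatal crashes
# of this type is
observed = 81
total = observed / p_observed
print(round(total))  # 200
```

Summing such per-category estimates gives the overall estimate of potentially fatal crashes, as described above.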
Bayesian Techniques
In recent years there has been great research interest by statisticians in a variety of Bayesian techniques.
- The Bayesian paradigm assumes that the unknown parameters are themselves random variables which have a distribution called a prior.
- The prior is chosen by the researcher before the data are collected, to represent the researcher’s prior belief about the unknown parameter.
- The primary objection to Bayesian techniques has been that the choice of the prior is subjective: different researchers will have different priors and consequently arrive at different conclusions from the same set of data.
- This problem can be averted by choosing non-informative priors, which essentially assume that the researcher has no a priori knowledge about the unknown parameter. In this way classical frequentist inference can be mimicked.
- However, the methods can be applied to a variety of problems that are intractable using conventional statistical inference.
- The Bayesian method proposes a distribution for the unknown total number of accidents and for the distribution of the covariates in those accidents.
- The parameters of those distributions are themselves given a distribution, called a prior.
- Then the probable distribution of the parameters given the observed data, called the posterior, is calculated.
- From the posterior various quantities such as confidence intervals can be calculated.
- The mathematics involved in calculating the posterior is often intractable, since it can involve high dimensional integrals.
- That is the case when truncation is involved.
- Gibbs sampling is a technique which allows us to obtain estimates in such situations.
- Gibbs sampling and variants such as the Metropolis-Hastings algorithm are methods that generate observations from the posterior distributions and use the resulting data to perform inference.
- As a hypothetical example, suppose we wished to estimate the total number and character of accidents in rural areas using data from a truncated database. The Gibbs sampling technique would use the data from only the accidents with a fatality, and would give confidence intervals for the total number of accidents and the overall pattern of accidents.
- Gibbs sampling is one example of a set of modern statistical simulation techniques which are designed to explore likelihoods and posterior distributions.
- The methods are very computational and have only become feasible for general use in recent years.
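A minimal Gibbs sampler can be sketched for a toy model (not the truncated-accident model itself, which is considerably more involved): normally distributed data with unknown mean and precision, where both full conditional distributions are available in closed form. The data and prior settings are hypothetical:

```python
import random
import statistics

random.seed(1)

# Hypothetical measurements (e.g. standardized severity scores).
y = [random.gauss(5.0, 2.0) for _ in range(200)]
n, ybar = len(y), statistics.fmean(y)

# Gibbs sampler for a Normal(mu, 1/tau) model: flat prior on mu,
# weak Gamma(a, b) prior on the precision tau. Each step draws one
# parameter from its full conditional given the other.
a, b = 0.01, 0.01
tau = 1.0
mus = []
for it in range(3000):
    # mu | tau, y  ~  Normal(ybar, 1 / (n * tau))
    mu = random.gauss(ybar, (1.0 / (n * tau)) ** 0.5)
    # tau | mu, y  ~  Gamma(a + n/2, rate = b + 0.5 * sum((y_i - mu)^2))
    ss = sum((yi - mu) ** 2 for yi in y)
    tau = random.gammavariate(a + n / 2, 1.0 / (b + 0.5 * ss))
    if it >= 500:                 # discard burn-in draws
        mus.append(mu)

# The retained draws approximate the posterior of mu; their mean and
# quantiles give point estimates and credible intervals.
print(round(statistics.fmean(mus), 1))  # close to the sample mean
```

In the truncated-accident setting, the same alternating-draws idea is applied with an extra step that imputes the unobserved (zero-fatality) accidents, which is why no closed-form posterior is needed.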
- The concept of probability has been introduced in mathematics to deal with uncertainty.
- Probability assigns numerical values that quantify how reasonable each possible outcome is.
- The relationship between probability and information is given by Bayes' rule.
- Inductive learning through Bayes' rule is called Bayesian inference.
- Bayesian methods are data analysis tools that are derived from the principles of Bayesian inference.
- Bayesian methods are useful in road safety engineering as they predict missing data and forecast future data. They provide a computational framework for model estimation, selection and validation.
- Bayesian methods are used for a variety of inferential and statistical tasks.
- Statistical induction is a process of learning about the general characteristics of a population from a subset of members of that population.
- Numerical values of population characteristics are expressed in the form of a parameter theta, and numerical descriptions of the subset form a dataset y.
- Before a dataset is obtained, the numerical values of the population characteristics and the dataset are uncertain.
- Subsequently, after obtaining the dataset, the uncertainty of the population characteristics decreases.
- Quantifying this change is the purpose of Bayesian inference.
- Sample space is the set of all possible datasets.
- Parameter space is the set of possible parameter values from which one value best represents the true population characteristics.
- Ideal Bayesian learning begins with a numerical formulation of joint beliefs about the dataset and the parameters, expressed in terms of probability distributions over the sample space and the parameter space.
- Prior distribution
- Posterior distribution
- Population characteristics
- Sampling model
- The posterior distribution is obtained from the prior distribution and the sampling model through Bayes' rule: p(theta | y) = p(y | theta) p(theta) / p(y), where p(y) is the integral of p(y | theta) p(theta) over the parameter space.
- Bayes' rule does not by itself provide answers; rather, it changes our outlook in the context of new information.
- Bayes' rule is an optimal method for updating beliefs about parameters when given new information from a dataset.
- Thus, there is a strong theoretical justification for the use of Bayes' rule as a method of quantitative learning.
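The prior-to-posterior update can be made concrete with the simplest conjugate case: a Beta prior on a fatality probability combined with binomially distributed crash data. The prior parameters and counts below are purely illustrative:

```python
# Bayes' rule in its simplest conjugate form (illustrative numbers).
# Let theta be the probability that a crash of a given type is fatal.

# Prior belief: theta ~ Beta(2, 18), prior mean 2 / (2 + 18) = 0.10.
a, b = 2.0, 18.0

# Data: 30 crashes observed, 6 of them fatal.
fatal, crashes = 6, 30

# With a binomial sampling model, Bayes' rule gives the posterior
# Beta(a + fatal, b + non-fatal): the prior updated by the data.
a_post = a + fatal
b_post = b + (crashes - fatal)
post_mean = a_post / (a_post + b_post)

print(a_post, b_post)        # 8.0 42.0
print(round(post_mean, 2))   # 0.16
```

The posterior mean (0.16) sits between the prior mean (0.10) and the raw data rate (6/30 = 0.20), showing how the uncertainty about the population characteristic decreases as data accumulate.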
Empirical Bayes method
- The Empirical Bayes method addresses two problems of safety estimation: it increases the precision of estimates beyond what is possible when one is limited to a two to three year accident history, and it corrects for the regression-to-mean bias.
- The theory of the Empirical Bayes method is well developed. It is now used in the Interactive Highway Safety Design Model (IHSDM) and will be used in the Comprehensive Highway Safety Improvement Model (CHSIM).
- Safety can only be estimated, and estimates vary in their precision. The precision of an estimate is usually expressed by its standard deviation.
- The safety of entities on which many accidents occur during a short period can be estimated quite precisely by using only accident counts.
- When it takes a long time for few accidents to occur, the estimate is imprecise.
- An important shortcoming of safety estimates based only on accident counts is that they are too imprecise to be useful.
- A further disadvantage is that estimates based only on accident counts are subject to a common bias.
- The existence of this ’regression-to-mean’ bias has been long recognized; it is known to produce inflated estimates of countermeasure effectiveness.
- Rational management of safety is not possible if published studies give rise to unrealistic expectations about the effectiveness of safety improvements.
- The Empirical Bayes (EB) method for the estimation of safety increases the precision of estimation and corrects for the regression-to-mean bias.
- It is based on the recognition that accident counts are not the only clue to the safety of an entity.
- Hence, not only accident counts but also knowledge of the typical accident frequency on similar roads is to be considered.
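One common formulation of this combination (in the style popularized by Hauer and used in tools such as IHSDM) is a weighted average of the site's own accident count and the mean frequency of similar roads. The reference values and the site count below are hypothetical:

```python
# Empirical Bayes safety estimate (Hauer-style weighting; all numbers
# hypothetical). Reference population of similar road sections, per
# the same 3-year period as the site's count:
E_kappa = 4.0      # mean accident frequency E{kappa} of similar sites
VAR_kappa = 2.0    # variance of kappa across similar sites

# The site of interest recorded this many accidents in that period:
x = 12

# Weight given to the reference population vs. the site's own count.
# The noisier the counts relative to the between-site variation, the
# more weight goes to the population mean.
w = 1.0 / (1.0 + VAR_kappa / E_kappa)        # here 2/3

# EB estimate: the raw count is pulled back toward the population mean,
# which is precisely the correction for regression-to-mean bias.
eb_estimate = w * E_kappa + (1.0 - w) * x
print(round(eb_estimate, 2))  # 6.67
```

A site picked for treatment because it recorded 12 accidents is probably having an unusually bad period; the EB estimate (about 6.7) is the more realistic baseline against which to judge a countermeasure.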