- Frequentist statistics tests whether an event (hypothesis) occurs or not. It calculates the probability of an event over the long run of a repeated experiment. Bayesian statistics is different from this classical approach
- Bayesian statistics is a mathematical framework to update your beliefs as you observe more data
For example, let's consider an accident
- Try recollecting the events that occurred just before an accident (This is called PRIOR)
- Occurrence of event (Accident) = DATA
- As more DATA is observed = BELIEF is UPDATED = POSTERIOR
- Accident is an event with TWO outcomes (YES or NO)
- Point data (YES or NO)
- Accident is a RANDOM VARIABLE (probability of YES = X and probability of NO = 1-X)
- RANDOM VARIABLE
- Start with the objective (Accident | Data)
- CONDITIONAL PROBABILITY -> If the PRIOR says an accident is impossible (probability zero), no amount of DATA can change the BELIEF.
Bayes' rule
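The point about a prior that rules out an accident can be checked numerically. The following is a minimal sketch of Bayes' rule for a yes/no hypothesis; the function name `posterior` and the likelihood values 0.9 and 0.1 are assumptions chosen for illustration, not figures from any data set.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a yes/no hypothesis H given the observed data."""
    # Evidence: total probability of the data under both outcomes.
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the data have zero probability under this model")
    return p_data_given_h * prior / evidence


# Hypothetical likelihoods: the data are 9x more probable if an accident occurred.
open_minded = posterior(prior=0.10, p_data_given_h=0.9, p_data_given_not_h=0.1)
dogmatic = posterior(prior=0.0, p_data_given_h=0.9, p_data_given_not_h=0.1)

print(open_minded)  # 0.5
print(dogmatic)     # 0.0
```

A prior of 0.10 is pulled up to 0.5 by the data, while a prior of exactly zero stays at zero whatever the data say.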
- More data = More precise predictions
- Data is huge (difficult to compute) -> CHOP IT INTO PIECES (LARGE or unmanageable to SMALL CHUNKS or manageable pieces)
- Belief propagation in graphical models
Approximation -> Variational Bayes
- More data to update beliefs
- Use previous outputs as priors and subsequently update
- BELIEF influences the results (very subjective); similarly, the PRIOR is subjective
- Frequentists NOT INTERESTED in SINGLE EVENTS
- BAYESIAN statistics deals with UNCERTAIN EVENTS whereas FREQUENTIST statistics deals with REPEATABLE EVENTS
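Chopping the data into pieces and reusing each output as the next prior can be sketched with a conjugate Beta-Bernoulli model. This is an illustrative assumption (the notes do not name a specific model), and the observations below are made up.

```python
def update(prior, observations):
    """Conjugate Beta-Bernoulli update.

    prior is (a, b) for a Beta(a, b) belief about the accident probability;
    observations is a sequence of 1 (accident) or 0 (no accident).
    """
    a, b = prior
    yes = sum(observations)
    return (a + yes, b + len(observations) - yes)


data = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]  # hypothetical observations
flat_prior = (1, 1)  # Beta(1, 1) is uniform on [0, 1]

# All the data in one go.
all_at_once = update(flat_prior, data)

# The same data in chunks of 4; each posterior becomes the next prior.
belief = flat_prior
for start in range(0, len(data), 4):
    belief = update(belief, data[start:start + 4])

print(all_at_once, belief)  # (4, 10) (4, 10)
```

Both routes end at the same posterior, which is why processing manageable chunks loses nothing in this simple setting.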
Bayesian statistics can:
- -incorporate prior knowledge easily
- -update beliefs easily
- -tackle a wider set of problems as probabilities are BELIEFS
- However, Bayesian statistics MUST SPECIFY A MODEL
BELIEFs are SUBJECTIVE
Frequentist statistics
- -has non-parametric methods
- -probabilities are objective
- -hard to cheat
- -is focussed on repeatable events
- -prior knowledge is introduced in an ad hoc way
- -requires a large amount of data
FREQUENTIST and BAYESIAN statistics use the SAME RULES OF PROBABILITIES
Difference exists in set-up of WHAT IS RANDOM
Bayesian statistics uses UNCERTAINTY IN KNOWLEDGE
Frequentist statistics uses INTRINSIC RANDOMNESS
Using either method is acceptable, depending on the DATA AVAILABLE and CONSISTENCY
BAYESIAN STATISTICS IS A FUNDAMENTALLY DIFFERENT APPROACH TO STATISTICS
It comes with an associated set of MATHEMATICAL tools
In the BAYESIAN approach, DATA is FIXED and PARAMETERS may VARY
Frequentist statisticians talk about CONFIDENCE INTERVALS while BAYESIAN STATISTICIANS talk about CREDIBLE INTERVALS
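A credible interval is read directly off the posterior distribution of the parameter. As a rough sketch, the code below approximates the posterior of an accident probability on a grid under a uniform prior and then cuts equal tails. The function names, the grid size and the example counts (3 accidents in 12 observations) are assumptions for illustration.

```python
def grid_posterior(accidents, trials, grid_size=1001):
    """Grid-approximate the posterior of a Bernoulli rate under a uniform prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Likelihood times a flat prior, evaluated at every grid point.
    weights = [p ** accidents * (1 - p) ** (trials - accidents) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_interval(grid, probs, mass=0.95):
    """Equal-tailed interval: remove (1 - mass) / 2 of the probability from each tail."""
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, probs):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1 - tail:
            upper = p
    return lower, upper


grid, probs = grid_posterior(accidents=3, trials=12)
low, high = credible_interval(grid, probs)
print(low, high)
```

The interval states where the parameter lies with 95% posterior probability given the fixed data, which is a different statement from the long-run coverage of a confidence interval.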
Bayes' theorem talks about
- -Posterior
- -Likelihood
- -Prior
- -Evidence
- Posterior = (likelihood * prior)/Evidence
- Strength of Prior
Usefulness of Bayesian statistics in case of
- -Sparse data
- -Abundant data and
- -Uniform prior
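How much the prior matters in these three cases can be illustrated with the posterior mean of a Beta-Bernoulli model. This is a sketch under assumed numbers: the strong prior Beta(2, 98), the uniform prior Beta(1, 1) and both data sets are hypothetical.

```python
def posterior_mean(prior_a, prior_b, accidents, trials):
    """Mean of the Beta posterior after observing `accidents` out of `trials`."""
    return (prior_a + accidents) / (prior_a + prior_b + trials)


strong_prior = (2, 98)   # hypothetical: centred on 2%, worth about 100 observations
uniform_prior = (1, 1)   # flat over [0, 1]

# Both hypothetical data sets have an observed accident rate of 30%.
cases = {"sparse": (3, 10), "abundant": (300, 1000)}

results = {}
for label, (accidents, trials) in cases.items():
    results[label] = (
        posterior_mean(*strong_prior, accidents, trials),
        posterior_mean(*uniform_prior, accidents, trials),
    )
    print(label, results[label])
```

With sparse data the strong prior holds the estimate near 0.05 while the uniform prior gives about 0.33; with abundant data both estimates land close to the observed 0.30, so the choice of prior matters far less.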
Source of priors
Mathematical tools in Bayesian statistics
- -Analytical methods
- -Grid approximation
- -Markov chain Monte Carlo (MCMC) simulation
- MCMC
- -It is an algorithm for exploring parameter space
- -The time spent in each region approximates the posterior distribution of the parameters
- -Examples include Metropolis-Hastings, Gibbs sampling, etc.
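The idea of wandering through parameter space and counting the time spent at each value can be sketched with a random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal. The target here is an assumed example (an accident probability with a uniform prior and 3 accidents in 12 observations); the step size, chain length and burn-in fraction are arbitrary illustrative choices.

```python
import math
import random


def log_posterior(p, accidents, trials):
    """Unnormalised log posterior of a Bernoulli rate under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support: zero posterior density
    return accidents * math.log(p) + (trials - accidents) * math.log(1.0 - p)


def metropolis(accidents, trials, steps=20000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, accidents, trials)
    samples = []
    for _ in range(steps):
        # Symmetric Gaussian proposal around the current position.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, accidents, trials)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[steps // 10:]  # discard the first 10% as burn-in


samples = metropolis(accidents=3, trials=12)
estimate = sum(samples) / len(samples)
print(estimate)  # should be near the analytic posterior mean 4/14
```

For this simple model the exact posterior is Beta(4, 10), so the sampler can be checked against a known answer; MCMC earns its keep in models where no such closed form exists.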
Bayesian methods perform extremely well in complex (hierarchical) models
Bayesian methods should be used in case of complex models with many interacting parameters
Bayesian methods are preferred when assumptions CANNOT be made regarding estimates and the data is messy (disorganised, or with missing values and gaps)
Bayesian road safety analysis
- Bayesian statistics for determining hazardous road locations
- Hazardous locations that are prone to traffic accidents are called "BLACK SPOTS". Identifying these black spots helps in scheduling road safety policies.
- A Bayesian estimation of the model via a Markov chain Monte Carlo (MCMC) approach is used.
- Black spots are dangerous locations where accidents occur. Treating black spots is a well known and frequently used means of improving road safety.
- Black spots are spatial concentrations of interdependent high-frequency accident locations.
- From a statistical point of view, road accidents are treated as random events. They are in fact an unintentional result of human behaviour.
- Hence, it is impossible to predict the exact circumstance of every accident.
- There are several statistical models to analyse black spot data.
- Accident counts are commonly modelled with the Poisson probability law. To account for the extra-Poisson variation (overdispersion) found in accident counts, negative binomial regression models are used.
- Most recently, Bayesian techniques have been used to handle problems in traffic safety.
- To estimate accident frequencies, a hierarchical Bayesian Poisson model is used.
- Identification of sites that are more dangerous than others (black spots) helps in better scheduling of road safety policies.
- Bayesian estimation of the model using a Markov chain Monte Carlo method is proposed.
- The problem of identifying black spots is difficult since accidents are rare events, and the observed counts are not necessarily a good indicator because they are only a sample drawn from an underlying probability distribution.
- Policy making has a tremendous impact on society as it can reduce the accidents at a particular site.
- The hierarchical procedure for ranking sites takes into account fatalities and injuries at all levels, and combines this information by means of a cost function to rank the sites.
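To give a feel for why a Bayesian Poisson model helps with rare events, here is a deliberately simplified sketch: Poisson counts per site with a shared Gamma prior whose parameters are fixed by hand. A full hierarchical model would place priors on those parameters and estimate everything by MCMC, and would also fold in a cost function for fatalities and injuries; none of that is attempted here. The sites, counts and prior values are hypothetical.

```python
def posterior_rate(count, years, prior_shape, prior_rate):
    """Posterior mean accidents per year: Gamma prior combined with Poisson counts."""
    return (prior_shape + count) / (prior_rate + years)


# Hypothetical sites: (accidents observed, years of observation).
sites = {"A": (9, 3), "B": (3, 1), "C": (2, 3), "D": (0, 2)}

# Assumed shared prior Gamma(shape=2, rate=1): prior mean of 2 accidents per year.
ranked = sorted(
    sites,
    key=lambda name: posterior_rate(*sites[name], prior_shape=2.0, prior_rate=1.0),
    reverse=True,
)
print(ranked)  # ['A', 'B', 'C', 'D']
```

Sites A and B both show a raw rate of 3 accidents per year, but A has three years of evidence and B only one, so B is pulled further towards the prior mean (2.5 against 2.75) and ranks below A.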