
6 Probability Distributions Every Data Scientist Should Be Aware Of

By Urbi Ghosh, 3EA

For any data analyst or data scientist, whether a professional or a student, a working knowledge of probability distributions is essential. Distributions provide the principal foundation for analytics and inferential statistics. While the theory of probability covers the numerical and mathematical aspects, distributions help us visualize what is going on underneath.

This article discusses six important probability distributions.

[Figure: Types of Common Data]

6 Important Distribution Types

1. Bernoulli Distribution
It is the easiest distribution to understand with only two possible outcomes-
Success (1)
Failure (0)

Example: - When tossing a fair coin just once, you will get either a Head or a Tail. Treating a Head as a success, the random variable X having a Bernoulli distribution takes the value 1 with the probability of success p = 0.5, and the value 0 with the probability of failure q = 1 - p = 0.5.


The probabilities of success and failure may or may not be equally likely.

  • Probability Mass Function (p.m.f)-
    $P(X = x) = p^x (1-p)^{1-x}$, for $x \in \{0, 1\}$
  • Mean/ Expectation of random variable X having Bernoulli Distribution-
    $\mu = E(X) = 1 \cdot p + 0 \cdot (1-p) = p$
  • Variance of a random variable X having Bernoulli Distribution-
    $V(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p)$
  • Examples: -
    • A new born baby is either a girl or a boy
    • You either pass or fail in an examination
    • It will either rain or not rain tomorrow
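
The Bernoulli formulas can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the value p = 0.3, the seed and the sample size are illustrative assumptions, not part of the original article.

```python
import random


def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}; 0 elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def bernoulli_mean(p):
    """E(X) = 1*p + 0*(1 - p) = p."""
    return p


def bernoulli_variance(p):
    """V(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p)."""
    return p * (1 - p)


# Hypothetical success probability, chosen so that success and failure
# are NOT equally likely.
p = 0.3

# Simulate many single trials with a fixed seed so the run is repeatable.
rng = random.Random(0)
samples = [1 if rng.random() < p else 0 for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print("P(X=1):", bernoulli_pmf(1, p))
print("P(X=0):", bernoulli_pmf(0, p))
print("theoretical mean:", bernoulli_mean(p), "sample mean:", sample_mean)
print("theoretical variance:", bernoulli_variance(p))
```

The sample mean of the simulated trials should land close to p, which is what the expectation formula predicts.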

2. Binomial Distribution
Let's take our favourite sport, cricket, into consideration here. If you win the toss today, that doesn't mean that you will win the toss tomorrow too, because each toss is independent of the others. This is a case where the Binomial Distribution applies. Properties-

  1. Each trial is independent of the others
  2. There are only two possible outcomes in a trial: either a success or a failure
  3. A total of 'n' identical trials are carried out
  4. As all trials are identical, the probabilities of success and failure are the same for all of them.
  • Probability Mass Function (p.m.f)-
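    $P(X = x) = \binom{n}{x} p^x q^{n-x}$, for $x = 0, 1, \dots, n$, where X is the number of successes in n trials and $q = 1 - p$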
  • Mean/ Expectation of Binomial Distribution-
    μ = E(X)= n*p
  • Variance of Binomial Distribution-
    V(X) = n*p*q
  • Bernoulli Distribution is a special case of Binomial Distribution, the difference being that the former has a single trial (n = 1)
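
As a rough illustration of the binomial properties, the sketch below computes the p.m.f. for a series of coin tosses and recovers the mean n*p and variance n*p*q by direct summation. The choice of n = 10 tosses with p = 0.5 and the function name are assumptions made for the example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


# Hypothetical series: 10 independent tosses, each won with probability 0.5.
n, p = 10, 0.5

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The probabilities over all possible outcomes must add up to 1.
total = sum(pmf)

# Mean and variance computed from the p.m.f. itself, to compare with the
# closed forms n*p and n*p*q.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("sum of p.m.f.:", total)
print("mean:", mean, "expected n*p:", n * p)
print("variance:", variance, "expected n*p*q:", n * p * (1 - p))

# With a single trial the binomial p.m.f. reduces to the Bernoulli p.m.f.
print("n = 1, P(X=1):", binomial_pmf(1, 1, 0.3))
```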

3. Uniform Distribution
The basis of Uniform Distribution is that, unlike Bernoulli Distribution, all 'n' possible outcomes are equally likely. It is also called a rectangular distribution because its probability is constant across the range.

It describes a situation where all outcomes in a range between a minimum and a maximum value are equally probable. For example, the number of bouquets sold every day at a flower store could be uniformly distributed with a minimum of 10 and a maximum of 40.

  • Probability Density Function (p.d.f) of variable X having Uniform Distribution is-
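    $f(x) = \dfrac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise
    'a' & 'b' are the parameters of Uniform Distribution (the minimum and maximum values)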
  • Mean/ Expectation of Uniform Distribution-
    μ= E(X) = (a+b)/2
  • Variance of Uniform Distribution-
    $V(X) = \dfrac{(b-a)^2}{12}$
  • Standard Uniform Probability Density Function-
    $f(x) = 1$ for $0 \le x \le 1$, and $f(x) = 0$ otherwise
    Where a= 0 & b= 1
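
The flower store example can be worked through in a few lines. This is a sketch under the assumption that daily sales are treated as a continuous quantity between a = 10 and b = 40; the cumulative function is not given in the article and is obtained here by integrating the constant density.

```python
def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """P(X <= x): integral of the constant density from a up to x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# Parameters taken from the bouquet example: minimum 10, maximum 40.
a, b = 10, 40

mean = (a + b) / 2
variance = (b - a) ** 2 / 12

# Probability of selling between 15 and 30 bouquets on a given day
# (the interval 15..30 is an arbitrary illustrative choice).
prob_15_to_30 = uniform_cdf(30, a, b) - uniform_cdf(15, a, b)

print("density inside the range:", uniform_pdf(25, a, b))
print("mean:", mean)
print("variance:", variance)
print("P(15 <= X <= 30):", prob_15_to_30)
```

Because the density is flat, the probability of any sub-range is simply its width divided by (b - a).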

4. Normal Distribution
A normal distribution is an arrangement of a data set in which most values cluster in the centre of the range and the rest taper off symmetrically toward either end. Properties-

  1. Mean = Median = Mode, i.e. symmetry about the centre (50% of values are lower than the mean & 50% are higher than the mean)
  2. It has a bell shaped curve
  3. Area under the bell shaped normal curve is 1.
  4. Two parameters define a Normal Distribution- Mean (μ) & Standard Deviation (σ)
  5. About 68% of the values of a normal distribution lie within one std. dev. of the mean.
  6. About 95% of the values of a normal distribution lie within two std. dev. of the mean.

  • Probability Density Function (p.d.f) of variable X having Normal Distribution is-
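    $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$, for $-\infty < x < \infty$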
  • Mean/ Expectation of Normal Distribution-
    μ = E(X)
  • Variance of Normal Distribution-
    $V(X) = \sigma^2$
  • Examples: -
    • Marks in a Test
    • Errors in Measurement
    • Heights of People
    • Blood Pressures
    • Size of products produced by machines
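
The 68% and 95% properties can be verified with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The sketch below assumes a hypothetical population of heights with mean 170 cm and standard deviation 10 cm; those numbers are illustrative only.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density written out from its formula, for comparison."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# Hypothetical heights in cm (assumed values, not from the article).
mu, sigma = 170.0, 10.0
dist = NormalDist(mu=mu, sigma=sigma)

# Share of values within one and two standard deviations of the mean.
within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
within_two_sd = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)

# Symmetry about the centre: half of the values lie below the mean.
below_mean = dist.cdf(mu)

print("within 1 std. dev.:", round(within_one_sd, 4))
print("within 2 std. dev.:", round(within_two_sd, 4))
print("below the mean:", below_mean)
print("density at the mean (formula):", normal_pdf(mu, mu, sigma))
print("density at the mean (library):", dist.pdf(mu))
```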

5. Poisson Distribution
It describes the number of times an event occurs within a fixed interval of time or space, and is especially useful for events that are individually rare. Properties-

  1. The occurrence of one successful event should not influence the occurrence of any other successful event
  2. The probability of success in an interval is proportional to the length of the interval, so the rate of occurrence is the same over a short period as over a longer one
  3. Probability of success -> 0 as the interval gets smaller

  • Probability Mass Function (p.m.f)
    $P(X = x) = \dfrac{e^{-\mu}\mu^x}{x!}$, for $x = 0, 1, 2, \dots$, where $\mu = \lambda t$
  • Mean/ Expectation of Poisson Distribution-
    μ = E(X) = λt; t= time interval in which the event occurs
  • Variance of Poisson Distribution
    V(X) = μ = λt
    λ = Rate at which an event occurs
    t = length of a time interval
    X= Number of events that occur in that time interval
  • Examples: -
    • Number of crimes that are reported in a place on a day
    • Number of emergency calls received at a hospital in a day
    • Number of customers for car services in an hour
    • Number of typing errors on every page of a newspaper
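
The equality of mean and variance can be seen by summing over the Poisson p.m.f. directly. In this sketch the rate of 3 events per hour and the 2 hour interval are assumed values, and the infinite sum is truncated at 60 terms, which is far into the negligible tail for μ = 6.

```python
from math import exp, factorial


def poisson_pmf(x, lam, t=1.0):
    """P(X = x) = exp(-mu) * mu**x / x!, with mu = lam * t."""
    if x < 0:
        return 0.0
    mu = lam * t
    return exp(-mu) * mu**x / factorial(x)


# Hypothetical rate and interval: 3 events per hour, observed for 2 hours.
lam, t = 3.0, 2.0
mu = lam * t

# Truncate the infinite support at 60 terms (an assumption that is safe
# here because the probabilities beyond that point are vanishingly small).
pmf = [poisson_pmf(x, lam, t) for x in range(60)]

total = sum(pmf)
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("sum of p.m.f.:", total)
print("mean:", mean, "expected lam*t:", mu)
print("variance:", variance, "expected lam*t:", mu)
print("P(no events in the interval):", poisson_pmf(0, lam, t))
```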

6. Exponential Distribution
How long will you have to wait before a patient arrives at your clinic? How much time will elapse before an earthquake hits a given region? How long will it take before a call centre receives the next call? How long will a piece of equipment function before it seizes up?

Problems like these are generally solved in probabilistic terms using the Exponential Distribution, which describes the interval of time between events. It is used extensively in survival analysis, for questions ranging from the expected life of a machine to the expected life of a human.

  • Probability Density Function (p.d.f) of a random variable X having Exponential Distribution-
    $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$
    Parameter λ> 0 is called the rate
  • Mean of X having Exponential Distribution-
    E(X)= 1/λ
  • Variance of X having Exponential Distribution-
    $V(X) = \dfrac{1}{\lambda^2}$
  • Examples: -
    • Life of a Refrigerator
    • Length of time between arrivals at a hospital
    • Length of time between train arrivals
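
The call centre question can be sketched as follows. The rate of 0.5 calls per minute, the seed and the sample size are assumptions for illustration; the cumulative function is obtained by integrating the density and is not stated in the article. Simulated waiting times come from `random.expovariate` in the standard library.

```python
import random
from math import exp


def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, zero for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


# Hypothetical rate: on average 0.5 calls arrive per minute.
lam = 0.5

mean_wait = 1 / lam        # E(X) = 1 / lam, in minutes
variance = 1 / lam**2      # V(X) = 1 / lam^2

# Probability that the next call arrives within 2 minutes.
p_within_2 = exponential_cdf(2, lam)

# Simulate waiting times with a fixed seed and compare the averages.
rng = random.Random(42)
waits = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(waits) / len(waits)

print("expected wait:", mean_wait, "simulated wait:", sample_mean)
print("variance:", variance)
print("P(wait <= 2 minutes):", p_within_2)
```

The simulated average should sit close to 1/λ, which matches the stated mean of the distribution.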

Probability distributions are pervasive in numerous fields, in particular finance & insurance, physics, engineering and computer science, and even in the social sciences, where students of psychology and healthcare use them widely. They are simple to apply and have extensive applications.
