Probability Theory & Descriptive Statistics

Nadeeha Salam
Jun 29, 2021


“A reasonable probability is the only certainty” — E.W. Howe

Of all the mathematical concepts and tools we use in Data Science, Probability and Statistics are the most important. Hence, as a Data Scientist, I feel it is essential to have a clear grasp of the underlying concepts of Probability & Descriptive Statistics.

So, what is Probability Theory?

While probability measures how likely an event is to occur, Probability Theory is the whole field, or framework, dedicated to studying random phenomena. (Random phenomenon: a situation in which the possible outcomes are known, but which outcome will occur is not known in advance, so it is described in terms of probabilities. E.g., when tossing a coin, Head and Tail are the possible outcomes, but we cannot be sure which one will come up on a given toss.)

Now, let’s take a closer look at the terms and concepts related to Probability Theory & Descriptive Statistics.

Random Variables: A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. In other words, it is a rule that associates a number with each element of a sample space.

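Probability Distribution Function: It describes how the total probability of 1 is distributed over the possible values of a random variable. For a discrete random variable this is the probability mass function (PMF), $p(x) = P(X = x)$; for a continuous random variable it is the probability density function (PDF).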

Cumulative Distribution Function: The cumulative distribution function (CDF) of X is $F_X(x) = P(X \le x)$. It gives the probability that X takes a value that does not exceed x. CDFs are also known as “distribution functions” (DF).
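To tie these three definitions together, here is a minimal Python sketch (standard library only) for two tosses of a fair coin, with X defined as the number of heads. The names `X`, `pmf` and `cdf` are illustrative, not from any library.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT, each with probability 1/4.
sample_space = list(product("HT", repeat=2))
p_outcome = Fraction(1, len(sample_space))

# Random variable X: the rule "count the heads" maps each outcome to a number.
def X(outcome):
    return outcome.count("H")

# Probability mass function p(x) = P(X = x), built by adding up outcome probabilities.
pmf = {}
for outcome in sample_space:
    x = X(outcome)
    pmf[x] = pmf.get(x, 0) + p_outcome

# Cumulative distribution function F_X(x) = P(X <= x).
def cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

print(pmf)     # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
print(cdf(1))  # 3/4
```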

Types of Distributions:

There are basically two types of distributions: discrete distributions, where the random variable takes a countable set of values (such as integers or category labels), and continuous distributions, where it can take any value within an interval, which may even be unbounded.

Discrete Distributions

Now let’s discuss the different types of important discrete distributions.

  1. Uniform Distribution: In a Uniform Distribution, all outcomes are equally likely to occur.

e.g.: Rolling a fair die (k = 6)

Sample space: S = {1, 2, 3, …, k}, with $P(X = x) = 1/k$ for every x in S
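A quick check of the discrete uniform case in Python, assuming a fair six-sided die (k = 6). Exact fractions make the standard results, mean (k + 1)/2 and variance (k^2 - 1)/12, easy to see.

```python
from fractions import Fraction

k = 6  # a fair six-sided die: S = {1, 2, ..., k}
support = range(1, k + 1)

# Discrete uniform PMF: every outcome has probability 1/k.
pmf = {x: Fraction(1, k) for x in support}

mean = sum(x * p for x, p in pmf.items())                    # (k + 1) / 2
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # (k^2 - 1) / 12

print(mean, variance)  # 7/2 35/12
```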


2. Bernoulli Distribution: Here, there are only two possible outcomes: success (s) and failure (f).

e.g.: Tossing a coin, a Covid test result (positive or negative), etc.

S = {f,s}

Probability of success & failure: $P(X = 1) = p$, $P(X = 0) = 1 - p$
Expectation or mean: $E[X] = p$, with variance $\mathrm{Var}(X) = p(1 - p)$
Bernoulli Distribution for two outcomes 0 & 1 with P(0) = 0.4 & P(1) = 0.6
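A minimal sketch, assuming p = 0.6 as in the example: it computes the Bernoulli mean and variance from the PMF and compares the mean with a simulated sample. The seed and sample size are arbitrary choices.

```python
import random

p = 0.6  # success probability, as in the example with P(1) = 0.6 and P(0) = 0.4

pmf = {0: 1 - p, 1: p}
mean = sum(x * q for x, q in pmf.items())                    # E[X] = p
variance = sum((x - mean) ** 2 * q for x, q in pmf.items())  # Var(X) = p(1 - p)

# Simulate n Bernoulli trials; the fraction of successes should be close to p.
random.seed(0)
n = 100_000
draws = [1 if random.random() < p else 0 for _ in range(n)]
sample_mean = sum(draws) / n

print(round(mean, 4), round(variance, 4), round(sample_mean, 3))
```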

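3. Binomial Distribution: It is the distribution of the number of successes in n independent repetitions of a Bernoulli trial, each with the same success probability p.

e.g.: Tossing a coin ’n’ times and counting the heads

S = {f,s} for each trial; the number of successes X takes values in {0, 1, …, n}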

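Distribution and moments: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for $k = 0, 1, \dots, n$, with $E[X] = np$ and $\mathrm{Var}(X) = np(1 - p)$
Binomial Distribution for n = 5 & p = 0.5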
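The binomial PMF can be evaluated directly with `math.comb` (Python 3.8+). This sketch uses n = 5 and p = 0.5 from the example and checks that the moments match np and np(1 - p); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.5  # parameters of the example above
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))                    # should equal n * p
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # should equal n * p * (1 - p)

print([round(q, 5) for q in pmf])  # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(mean, variance)              # 2.5 1.25
```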

4. Geometric Distribution: It is the distribution of the number of trials k needed to get the first success in a sequence of independent Bernoulli trials, each with success probability p.

e.g.: Throwing darts at a dartboard until you first hit the bull’s eye

If the first success occurs on trial k, there are (k - 1) failures before it, so $P(X = k) = (1 - p)^{k-1} p$ for $k = 1, 2, \dots$

Mean or Expectation: $E[X] = 1/p$
Geometric Distribution for p=0.5
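A sketch of the geometric distribution in Python, counting trials up to and including the first success (the convention used above; some texts count only the failures instead). It evaluates the PMF for p = 0.5 and checks by simulation that the average number of trials is close to 1/p. The helper names are illustrative.

```python
import random

def geometric_pmf(k, p):
    """P(X = k): (k - 1) failures followed by the first success on trial k."""
    return (1 - p) ** (k - 1) * p

def trials_until_first_success(p, rng):
    """Simulate Bernoulli(p) trials and return the index of the first success."""
    k = 1
    while rng.random() >= p:  # this trial failed, which happens with probability 1 - p
        k += 1
    return k

p = 0.5
print([geometric_pmf(k, p) for k in range(1, 5)])  # [0.5, 0.25, 0.125, 0.0625]

rng = random.Random(42)
samples = [trials_until_first_success(p, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to E[X] = 1 / p = 2
```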

5. Poisson Distribution: The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time, when events occur independently at a constant average rate λ.

e.g.: the number of atoms decaying per second in a radioactive sample, the number of goals scored in a match, etc.

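Mean and variance are the same for Poisson: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$, with $E[X] = \mathrm{Var}(X) = \lambda$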
Poisson Distribution for rate = 1.30
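For the Poisson case, this sketch evaluates the PMF for λ = 1.30 and confirms numerically that the mean and variance both equal λ. The infinite support is truncated at k = 49, an assumption that is harmless here because the remaining probability is vanishingly small.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k! for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 1.30      # rate from the example above
ks = range(50)  # truncated support; the tail beyond k = 49 is negligible for this rate
pmf = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * q for k, q in zip(ks, pmf))
variance = sum((k - mean) ** 2 * q for k, q in zip(ks, pmf))

print(round(mean, 6), round(variance, 6))  # 1.3 1.3
```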

Continuous Distributions

Now let’s move on to the important continuous distributions. Note that for a continuous distribution, the probability of any single point is zero, $P(X = x) = 0$; probabilities are assigned to intervals by integrating the density.

  1. Uniform Distribution: Here X can take any value between two numbers, say α and β, with constant density $f(x) = \frac{1}{\beta - \alpha}$ for $\alpha \le x \le \beta$. It is written U(α, β), and its mean is (α + β)/2.
Uniform Distribution with α = -5 & β = 5
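A small sketch for U(α, β) with α = -5 and β = 5, showing that probabilities come from intervals via the CDF and that simulated values have mean (α + β)/2 and variance (β - α)^2/12. The helper functions are hypothetical; `random.uniform` is the standard-library sampler.

```python
import random

def uniform_pdf(x, a, b):
    """Density of U(a, b): constant 1 / (b - a) on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x): the fraction of [a, b] lying to the left of x."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = -5, 5  # alpha and beta from the example above

# Probabilities come from intervals: P(-1 < X <= 2) = F(2) - F(-1), which is 3/10.
prob = uniform_cdf(2, a, b) - uniform_cdf(-1, a, b)

# Sample mean and variance approach (a + b) / 2 = 0 and (b - a)^2 / 12 = 8.33...
rng = random.Random(1)
samples = [rng.uniform(a, b) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(round(prob, 4), round(mean, 2), round(var, 2))
```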

2. Normal Distribution: It is the most important distribution in statistics. It is symmetric and bell-shaped, and its tails approach the x-axis without ever touching it. It is described by two parameters, the mean μ and the standard deviation σ, and has density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$.

Normal distribution with mean 2.55 & standard deviation 0.7

Standard Normal Distribution: The standard normal distribution is the normal distribution with mean 0 and standard deviation 1.

Standard Normal Distribution

We often work with the standard normal distribution: any normal variable X with mean μ and standard deviation σ can be converted to it via $Z = (X - \mu)/\sigma$, so that probabilities can be read from a single Z-table.
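Python's standard library includes `statistics.NormalDist` (Python 3.8+), which makes it easy to check that standardizing gives the same answer as working with the original normal distribution. The sketch assumes the example parameters μ = 2.55 and σ = 0.7 and an arbitrary cut-off x = 3.25, which lies exactly one standard deviation above the mean.

```python
from statistics import NormalDist

mu, sigma = 2.55, 0.7  # the example normal distribution above
X = NormalDist(mu, sigma)
Z = NormalDist()       # standard normal: mean 0, standard deviation 1

x = 3.25
z = (x - mu) / sigma   # standardize: how many standard deviations x lies above the mean

# P(X <= x) computed directly equals P(Z <= z) from the standard normal (the Z-table lookup).
print(round(z, 6))                              # 1.0
print(round(X.cdf(x), 4), round(Z.cdf(z), 4))   # 0.8413 0.8413
```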

3. Exponential Distribution: It is the distribution of the waiting time from one event to the next in a Poisson process. It is described by the rate parameter λ, has density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ and mean 1/λ, and is written Exp(λ).

Exponential distribution with λ = 0.75
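A sketch for Exp(λ) with λ = 0.75: the CDF 1 - e^(-λx) gives the probability that the next event arrives within time x, and simulated waiting times average about 1/λ. Note that `random.expovariate` is parameterized by the rate, not the mean; the other helper names are illustrative.

```python
import math
import random

lam = 0.75  # rate from the example above

def exponential_pdf(x, lam):
    """f(x) = lam * e^(-lam * x) for x >= 0, zero for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """P(X <= x) = 1 - e^(-lam * x): probability the next event arrives within time x."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# random.expovariate takes the rate lambda (not the mean) as its argument.
rng = random.Random(7)
waits = [rng.expovariate(lam) for _ in range(100_000)]
mean_wait = sum(waits) / len(waits)  # should be close to 1 / lam = 1.333...

print(round(exponential_cdf(2.0, lam), 4), round(mean_wait, 3))
```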

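4. Gamma Distribution: It is a two-parameter family of continuous distributions on x > 0, with a shape parameter α and a rate parameter λ (or, equivalently, a scale 1/λ). The exponential distribution is the special case α = 1, and for integer α the gamma distribution gives the waiting time until the α-th event of a Poisson process.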

Gamma distribution with α = 1
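To illustrate the link between the gamma and exponential distributions, this sketch assumes an integer shape α = 3 and rate λ = 2 (values chosen only for illustration) and compares `random.gammavariate` draws with sums of three exponential waiting times. Both should average about α/λ = 1.5. Be aware that `gammavariate(alpha, beta)` takes a scale parameter, so the rate must be inverted.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(3)
alpha, lam = 3, 2.0  # hypothetical integer shape and rate, chosen for illustration
n = 100_000

# Gamma(shape alpha, rate lam); random.gammavariate takes (shape, scale), so pass scale = 1 / lam.
gamma_draws = [rng.gammavariate(alpha, 1 / lam) for _ in range(n)]

# For integer alpha: the waiting time until the alpha-th event of a Poisson process,
# i.e. the sum of alpha independent Exp(lam) inter-arrival times.
sum_of_exp = [sum(rng.expovariate(lam) for _ in range(alpha)) for _ in range(n)]

print(round(mean(gamma_draws), 2), round(mean(sum_of_exp), 2))  # both near alpha / lam = 1.5
```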

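5. Chi-square Distribution: It is a special case of the gamma distribution (shape ν/2, scale 2), described by a parameter ν called the degrees of freedom. A chi-square variable with ν degrees of freedom is the sum of the squares of ν independent standard normal variables.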

Chi-square distribution with degrees of freedom 3
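A simulation sketch for ν = 3 degrees of freedom: it builds chi-square draws as sums of squared standard normals, checks the mean ν and variance 2ν, and compares with the gamma special case (shape ν/2, scale 2). The seed and sample size are arbitrary.

```python
import random

rng = random.Random(5)
nu = 3  # degrees of freedom, as in the example above
n = 100_000

# A chi-square(nu) draw: the sum of squares of nu independent standard normal draws.
chi2 = [sum(rng.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(n)]
m = sum(chi2) / n
v = sum((c - m) ** 2 for c in chi2) / (n - 1)

# The same distribution as Gamma(shape nu / 2, scale 2); gammavariate takes (shape, scale).
gamma_draws = [rng.gammavariate(nu / 2, 2) for _ in range(n)]
m_gamma = sum(gamma_draws) / n

print(round(m, 2), round(v, 2), round(m_gamma, 2))  # near nu = 3, 2 * nu = 6, and 3
```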

6. t-Distribution: It is symmetric and bell-shaped like the standard normal distribution, but has heavier tails. It is described by the degrees of freedom ν, and it approaches the standard normal distribution as ν grows.
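The heavier tails can be seen by simulation. This sketch assumes ν = 3 (an illustrative choice) and builds t draws from the standard construction T = Z / sqrt(V/ν), with Z standard normal and V an independent chi-square(ν) variable, then compares P(|T| > 2) with P(|Z| > 2), roughly 0.14 versus 0.046.

```python
import math
import random

rng = random.Random(11)
nu = 3  # hypothetical degrees of freedom, chosen for illustration
n = 100_000

def t_draw(rng, nu):
    """T = Z / sqrt(V / nu), with Z standard normal and V chi-square(nu), independent."""
    z = rng.gauss(0, 1)
    v = sum(rng.gauss(0, 1) ** 2 for _ in range(nu))
    return z / math.sqrt(v / nu)

t_tail = sum(abs(t_draw(rng, nu)) > 2 for _ in range(n)) / n
z_tail = sum(abs(rng.gauss(0, 1)) > 2 for _ in range(n)) / n
print(round(t_tail, 3), round(z_tail, 3))  # the t tail probability is roughly three times larger
```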

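7. F-distribution: It is the distribution of the ratio of two independent chi-square random variables, each divided by its degrees of freedom: $F = \frac{U/\nu_1}{V/\nu_2}$, where U and V are chi-square with ν1 and ν2 degrees of freedom. Its two parameters are therefore ν1 and ν2.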

F-Distribution with degrees of freedom 6 & 7
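A simulation sketch for ν1 = 6 and ν2 = 7, building F draws as the ratio of scaled chi-square variables. Its sample mean should be near ν2/(ν2 - 2) = 1.4, the F mean whenever ν2 > 2. The helper `chi2_draw` is illustrative.

```python
import random

rng = random.Random(13)
nu1, nu2 = 6, 7  # degrees of freedom from the example above
n = 100_000

def chi2_draw(rng, nu):
    """Chi-square(nu) draw as a sum of nu squared standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(nu))

# F = (U / nu1) / (V / nu2) with U ~ chi-square(nu1) and V ~ chi-square(nu2), independent.
f_draws = [(chi2_draw(rng, nu1) / nu1) / (chi2_draw(rng, nu2) / nu2) for _ in range(n)]
f_mean = sum(f_draws) / n

print(round(f_mean, 2))  # near nu2 / (nu2 - 2) = 1.4
```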

Joint Distribution

These distributions describe two or more random variables together, and are used when we are interested in how those variables relate to each other. For example, we might study how Credit Score and Loan Repayment History jointly relate to the probability that a loan application is rejected.

Two-dimensional Random Variables:

  • Two-dimensional discrete Random Vectors
  • Two-dimensional continuous Random Vectors

Discrete Bivariate Probability Distributions:

Here, the joint probability mass function $p(x, y) = P(X = x, Y = y)$ assigns a probability to every pair of values of the two discrete random variables X and Y. These probabilities are non-negative and sum to 1 over all pairs.
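A minimal sketch of a discrete joint PMF as a Python dictionary keyed by (x, y) pairs. The table is hypothetical: think of X as "credit score is high" and Y as "loan was repaid", with made-up probabilities.

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y) = P(X = x, Y = y) for two binary variables
# (numbers invented for illustration).
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}

# A valid joint PMF is non-negative and sums to 1 over all pairs.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

# The probability of an event is the sum over the pairs in it, e.g. P(X = Y).
p_equal = sum(p for (x, y), p in joint.items() if x == y)
print(p_equal)  # 7/10
```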

Continuous Bivariate Probability Distributions:

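Here, the joint probability density function $f(x, y) \ge 0$ of two continuous random variables X and Y integrates to 1 over the plane, and the probability that (X, Y) falls in a region A is $P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$.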

Marginal Distributions: These give the distribution of a single random variable on its own, obtained from the joint distribution by summing or integrating over the other variable.

  • Discrete: $p_X(x) = \sum_y p(x, y)$
  • Continuous: $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

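Expectation and variance of the marginal distribution: $E[X] = \sum_x x\,p_X(x)$ (or $\int x\,f_X(x)\,dx$ in the continuous case), and $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.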
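Using a small hypothetical joint PMF of two binary variables, this sketch sums out one variable to get each marginal and then computes the marginal mean and variance with exact fractions. The helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint PMF of two binary variables X and Y (illustrative numbers).
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

def marginal(joint, axis):
    """Sum the joint PMF over the other variable: axis 0 gives p_X, axis 1 gives p_Y."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def expectation(pmf):
    """E[V] = sum of v * p(v) over the support."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """Var(V) = E[(V - E[V])^2]."""
    mu = expectation(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

p_x, p_y = marginal(joint, 0), marginal(joint, 1)
print(p_x[1], p_y[1])                   # 1/2 3/5
print(expectation(p_y), variance(p_y))  # 3/5 6/25
```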

Independent Random variables: Two random variables are said to be independent if knowing the value of one does not change the distribution of the other. Equivalently, the joint distribution factors into the product of the marginals:

  1. Discrete: $p(x, y) = p_X(x)\,p_Y(y)$ for all x, y

2. Continuous: $f(x, y) = f_X(x)\,f_Y(y)$ for all x, y
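A sketch of the discrete independence check: compute both marginals and test whether every joint probability equals their product. Both tables are hypothetical; the second is constructed as a product of marginals, so it passes.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check p(x, y) == p_X(x) * p_Y(y) for every pair (the discrete criterion)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x, y in product(xs, ys))

# Hypothetical tables: the first is dependent, the second is a product of marginals.
dependent = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
             (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}
independent = {(x, y): Fraction(1, 2) * (Fraction(2, 5) if y == 0 else Fraction(3, 5))
               for x, y in product((0, 1), repeat=2)}

print(is_independent(dependent), is_independent(independent))  # False True
```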

Conditional Distributions: To describe the distribution of one random variable given that another has taken a particular value, and so capture how the two are related, we use the conditional distribution:

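  • Discrete: $p_{Y|X}(y \mid x) = \frac{p(x, y)}{p_X(x)}$, for $p_X(x) > 0$
  • Continuous: $f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$, for $f_X(x) > 0$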

Conditional Expectations: Conditional distributions also have expectations. The conditional expectation of Y given X = x is the mean of the conditional distribution of Y:

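  • Discrete: $E[Y \mid X = x] = \sum_y y\,p_{Y|X}(y \mid x)$
  • Continuous: $E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y|X}(y \mid x)\,dy$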
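Using a small hypothetical joint PMF, this sketch computes the conditional distribution of Y given X = x by dividing the joint probabilities by the marginal p_X(x), and then the conditional expectation as the mean of that distribution. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint PMF of two binary variables X and Y (illustrative numbers).
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

def conditional_y_given_x(joint, x):
    """p_{Y|X}(y | x) = p(x, y) / p_X(x); requires p_X(x) > 0."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return {y: p / p_x for (xx, y), p in joint.items() if xx == x}

def conditional_expectation_y(joint, x):
    """E[Y | X = x]: the mean of the conditional distribution of Y."""
    return sum(y * p for y, p in conditional_y_given_x(joint, x).items())

print(conditional_y_given_x(joint, 1))  # {0: Fraction(1, 5), 1: Fraction(4, 5)}
print(conditional_expectation_y(joint, 0), conditional_expectation_y(joint, 1))  # 2/5 4/5
```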

Degree of Association

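Two measures of the degree of association are:

  • Covariance: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$. Its sign shows the direction of the linear relationship between two variables: positive when they tend to increase together, negative when one tends to decrease as the other increases. The values range from -∞ to +∞ and depend on the units of X and Y.
  • Correlation: $\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$. It is the covariance rescaled to lie between -1 and +1, so it measures both the strength and the direction of the linear relationship, independently of units.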
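Both measures can be computed directly from a joint PMF. This sketch uses a hypothetical 2-by-2 table (illustrative numbers) and exact fractions for the covariance; the correlation involves a square root, so it is computed as a float. The helper `E` is an illustrative name.

```python
import math
from fractions import Fraction

# Hypothetical joint PMF of two binary variables X and Y (illustrative numbers).
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

def E(g):
    """Expectation of g(X, Y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))  # equals E[XY] - E[X]E[Y]
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
corr = float(cov) / math.sqrt(float(var_x * var_y))  # always between -1 and 1

print(cov, round(corr, 4))  # 1/10 0.4082
```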

To conclude, we have discussed most of the major topics in probability theory and descriptive statistics. There are, however, many other distributions beyond those covered here that may play a crucial role in solving Data Science problems. Knowing and understanding these concepts is very important for any Data Scientist, as they are the cornerstone of Data Science.


