An Understanding of the Mathematical Concepts of Data Science

8 min read · Mar 14, 2021
Photo by Science in HD on Unsplash

As an aspiring data analyst or scientist, understanding the math is just as essential as understanding the programming. Learning it helps you make informed decisions about how likely an event is to take place, based on patterns in collected data.

Basic Probability

Probability theory is the mathematical framework one needs to understand to analyze chance events in a logically sound manner.

Event : In layman's terms, an event is something that takes place. In probability, an event is a subset of the respective sample space. A and B represent events.

Image credit: the Omni Calculator

Sample space : The set of all possible outcomes of a random experiment. Below is the sample space of a fair die.

The likelihood of occurrence of an event is known as its probability.

Probability of an event : The probability of an event is a number expressing how likely that event is to occur. The probability of an event always ranges between 0 and 1.

0 represents how low a probability can go. It indicates impossibility.

In a fair die, there are 6 possibilities: 1, 2, 3, 4, 5, and 6. What is the probability of rolling an 8?

  • P(rolling an 8) = 0/6 = 0. (There is no way to roll an 8, so the event is impossible.)

1 represents how high a probability can get. It indicates certainty.

  • P(rolling any of 1–6) = 6/6 = 1. (All 6 outcomes qualify, therefore it's certain the event will happen.)

N.B.: When dealing with unfair or weighted coins, the two outcomes are not equally likely.
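The die examples above can be sketched in a few lines of Python. The `probability` helper is hypothetical, written here just to show "favorable outcomes over total outcomes":

```python
# Probability of an event for a fair die: |event ∩ sample space| / |sample space|.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event, space=sample_space):
    # Only outcomes that actually belong to the sample space count.
    return Fraction(len(event & space), len(space))

print(probability({8}))           # 0   -> impossible event
print(probability(sample_space))  # 1   -> certain event
print(probability({2, 4, 6}))     # 1/2 -> rolling an even number
```

Using `Fraction` keeps the results exact, which matches how probabilities of discrete events are usually written by hand.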

Random Variable

A function that assigns a real number to each outcome in the probability space is known as a random variable. There are two classes of probability distribution :

Discrete :- A discrete random variable has a finite or countable number of possible values. Its distribution is described by nonnegative functions: the probability mass function f and the cumulative distribution function F,

P(X = x) = f(x) ; P(X ≤ x) = F(x)

Below are the major discrete distributions :

  1. Bernoulli :- A Bernoulli random variable takes the value 1 with probability p and the value 0 with probability 1 − p. For example, binary experiments such as a coin toss.
  2. Binomial :- A binomial random variable is the sum of n independent Bernoulli random variables, each with parameter p. For example, the number of successes in a specified number of identical binary experiments, such as the number of heads in five coin tosses.
  3. Geometric :- A geometric random variable counts the number of trials required to observe a single success, where each trial is independent and has success probability p. For example, the number of times a die must be rolled for a six to be observed.
  4. Poisson :- A Poisson random variable counts the number of events occurring in a fixed interval of time or space, given that these events occur with an average rate λ. For example, the number of goals scored in a soccer match.
  5. Negative Binomial :- A negative binomial random variable counts the number of successes in a sequence of independent Bernoulli trials with parameter p before r failures occur. For example, the number of tails flipped before three heads are observed in a sequence of coin tosses.
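As a quick sanity check, the first four distributions above can be written directly from their textbook formulas; the function names here are just illustrative:

```python
# Probability mass functions of some of the discrete distributions above.
from math import comb, exp, factorial

def bernoulli_pmf(k, p):    # P(X = k), where k is 0 or 1
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):  # k successes in n independent trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):    # first success on trial k (k >= 1)
    return (1 - p)**(k - 1) * p

def poisson_pmf(k, lam):    # k events, average rate lam
    return exp(-lam) * lam**k / factorial(k)

# Probability of exactly 2 heads in 5 fair coin tosses:
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```

Each function returns a probability, so summing a PMF over its whole support gives (approximately) 1, which is a useful check when implementing these by hand.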

Expectation of a Random Variable

It is the number that attempts to capture the center of a random variable's distribution.

It is defined as the probability-weighted sum of all possible values in the random variable’s support.


N.B.: Consider the probabilistic experiment of rolling a fair die: the sample mean of the rolls converges to the expectation as more rolls are observed. Changing the distribution over the different faces of the die (thereby making the die unfair) changes the expectation.
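The definition "probability-weighted sum over the support" translates almost word for word into code. This is a minimal sketch using a dict as the PMF:

```python
# Expectation as the probability-weighted sum over the support.
def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

fair_die = {x: 1/6 for x in range(1, 7)}
print(expectation(fair_die))  # ≈ 3.5

# An unfair die: changing the distribution changes the expectation.
loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}
print(expectation(loaded_die))  # ≈ 4.5
```

Note that the expectation of a fair die (3.5) is not itself a possible outcome; it is the long-run average of many rolls.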

Compound Probability

A compound probability is the probability of two or more events occurring together; for two independent events it is the product of their individual probabilities. The formula for calculating a compound event depends on the type of compound event occurring, i.e. whether it is mutually inclusive or mutually exclusive. Two tools are used to work with compound events:

  1. Set theory

  2. Counting

Set Theory : A set is a collection of objects. Set notation is used in probability theory to specify compound events. For example, the event "roll an even number" is specified by the set {2, 4, 6}.

image from shutterstock
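Python's built-in sets map directly onto this notation, so compound events on the die's sample space can be built with set operations. A small sketch:

```python
# Compound events as set operations on the die's sample space.
space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
at_least_4 = {4, 5, 6}

union = even | at_least_4         # "even OR at least 4"  -> {2, 4, 5, 6}
intersection = even & at_least_4  # "even AND at least 4" -> {4, 6}
complement = space - even         # "not even"            -> {1, 3, 5}
print(union, intersection, complement)
```

Dividing the size of any of these sets by the size of the sample space gives the corresponding event's probability.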

Counting : It can be surprisingly difficult to count the number of sequences or sets satisfying certain conditions. The main counting methods are the permutation and combination techniques.

Image from prepinsta.com

Example: consider a bag of balls in which each ball is a different color. Drawing the balls one at a time from the bag without replacement, how many different ordered sequences (permutations) of the balls are possible? How many different unordered sets (combinations)?

For Permutations:

The bag contains 4 balls: Ball A is orange, Ball B is green, Ball C is blue, and Ball D is yellow. How many different ordered sequences of the balls are possible?


The visual representation of the permutation method using 4 balls.

Here the number of different possible ordered sequences of the balls is 4! = 24.

For Combinations:

How many different unordered sets of the balls are possible?


The visual representation of the combination method using 4 balls.

Here the number of different unordered sets (taking all 4 balls at once) is C(4, 4) = 1.
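Both counts can be verified with the standard library, either by enumerating the arrangements with `itertools` or by evaluating the formulas directly:

```python
# Counting the 4-ball example two ways: enumeration vs. formula.
from itertools import combinations, permutations
from math import comb, factorial

balls = ["A", "B", "C", "D"]  # orange, green, blue, yellow

# Ordered sequences of all 4 balls: 4! = 24
print(len(list(permutations(balls))), factorial(4))   # 24 24

# Unordered sets of all 4 balls: C(4, 4) = 1
print(len(list(combinations(balls, 4))), comb(4, 4))  # 1 1
```

The enumeration agreeing with the closed-form formula is a handy check whenever the counting conditions get more complicated.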

Conditional Probability

This is the measure of the probability of an event occurring given that another event has already occurred. For example, the probability that Tobi will religiously have breakfast, lunch, and dinner every day next week is, in general, smaller than the probability that he will have breakfast, lunch, and dinner every day given that he has a drug prescription to take three times a day after each meal this week.

Image from onlinelearning.com

When two events A and B are dependent, the probability of both occurring is
P(A and B) = P(A) × P(B given A), or P(A and B) = P(A) × P(B | A)

If we divide both sides of the equation by P(A), we get the formula for conditional probability:
P(B | A) = P(A and B) / P(A)
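The formula can be checked by brute-force counting over a small sample space. This sketch uses the 36 equally likely outcomes of two fair dice (a made-up example, not from the article):

```python
# Conditional probability by counting: P(B | A) = P(A and B) / P(A).
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

A = [o for o in outcomes if o[0] == 5]         # event A: first die is 5
A_and_B = [o for o in A if o[0] + o[1] >= 10]  # ...and sum is at least 10

p_A = len(A) / len(outcomes)
p_A_and_B = len(A_and_B) / len(outcomes)
print(p_A_and_B / p_A)  # P(sum >= 10 | first die is 5) = 2/6 ≈ 0.333
```

Knowing the first die shows 5 shrinks the sample space to 6 outcomes, of which only (5, 5) and (5, 6) reach a sum of 10, hence 2/6.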

Central Limit Theorem : The Central Limit Theorem (CLT) states that the sample mean of a sufficiently large number of i.i.d. (independent and identically distributed) random variables is approximately normally distributed. The larger the sample, the better the approximation.

Image credit: Casey Dunn and Creature Cast
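A quick simulation makes the CLT tangible: as the sample size n grows, the means of repeated samples of die rolls cluster ever more tightly around the expectation of 3.5. A minimal sketch:

```python
# CLT demo: distribution of sample means of n fair die rolls.
import random
import statistics

random.seed(0)  # fixed seed so the demo is reproducible

def sample_means(n, reps=2000):
    """Means of `reps` independent samples, each of n die rolls."""
    return [statistics.mean(random.randint(1, 6) for _ in range(n))
            for _ in range(reps)]

for n in (1, 5, 50):
    means = sample_means(n)
    # The mean of the means stays near 3.5; the spread shrinks with n.
    print(n, round(statistics.mean(means), 2),
          round(statistics.stdev(means), 2))
```

Plotting a histogram of `sample_means(50)` would show the familiar bell shape, even though a single die roll is uniformly (not normally) distributed.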

Here we will be talking about how to draw conclusions from available data using different processes.

Frequentist inference

This is the process of determining properties of an underlying distribution via the observation of data. It is a type of statistical inference that draws conclusions from sample data by emphasizing the frequency or proportion of the data. Point estimation, the bootstrap, and confidence intervals are processes used for frequentist inference.

Point Estimation : It is the method of using sample data to calculate a single value which serves as a best guess of an unknown population parameter.

Point estimate : This is the name given to that single value, since it identifies a point in some parameter space.

I will be using the example illustrated here: we want to estimate the value of π by uniformly dropping samples on a square containing an inscribed circle.

Image credit: Seeing Theory, frequentist inference

The areas of the square and the circle were used to determine the ratio that estimates π.

Image credit: Seeing Theory, frequentist inference

Here n represents the number of samples that fell within the circle and m the total number of samples dropped; the estimator is π̂ = 4n/m.

With n = 253 and m = 300, π̂ = 4 × 253/300 ≈ 3.3733. From this sample and others like it, it can be shown that this estimator is unbiased and consistent.
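This Monte Carlo estimator is easy to reproduce. The sketch below samples points in the unit square and checks whether they land inside the inscribed quarter circle; the ratio of areas is π/4:

```python
# Monte Carlo estimate of pi: drop m uniform points on the unit square,
# count the n that fall inside the inscribed quarter circle, return 4n/m.
import random

random.seed(42)  # reproducible runs

def estimate_pi(m):
    n = sum(1 for _ in range(m)
            if random.random()**2 + random.random()**2 <= 1)
    return 4 * n / m

print(estimate_pi(300))        # rough, like the ≈ 3.37 result above
print(estimate_pi(1_000_000))  # much closer to 3.14159
```

Consistency shows up directly: the more samples dropped, the closer the estimate tends to get to the true value of π.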

Confidence Interval : This differs from point estimation in that a confidence interval estimates a parameter by specifying a range of values within which the parameter is expected to lie, whereas point estimation specifies just a single-value estimate of the parameter.

Bootstrap : It is a computational technique which provides a convenient way to estimate properties of an estimator via resampling. This is done by resampling with replacement from the empirical distribution function, which is itself generated by sampling once from the population (whatever the population's underlying distribution: normal, uniform, Student's t, chi-squared, etc.).

Bayes' Theorem

Conditional probability is often said to be the same as Bayes' theorem. Bayes' formula actually connects two different conditional probabilities, P(A∣B) and P(B∣A), of two events A and B.

Bayes' theorem states that:

P(A | B) = P(B | A) × P(A) / P(B)

Here is a link to an example from Seeing Theory (Brown University) that explains Bayes' theorem.
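The classic way to see the two conditional probabilities differ is a diagnostic-test example. The numbers below are hypothetical, chosen only for illustration:

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
# Hypothetical setup: a disease affects 1% of people; the test detects it
# 95% of the time, but also gives 5% false positives in healthy people.
p_disease = 0.01
p_pos_given_disease = 0.95   # P(B | A): positive test given disease
p_pos_given_healthy = 0.05   # false positive rate

# P(B): total probability of testing positive (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(A | B): probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ≈ 0.161
```

Even with a 95% accurate test, a positive result implies only about a 16% chance of disease, because P(A∣B) and P(B∣A) are very different quantities.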

Regression Analysis

Linear regression is an approach for modeling the linear relationship between two variables.

Ordinary Least Squares : This is a type of linear least squares method for estimating the unknown parameters in a linear regression model.

Linear regression model image from Wikipedia

Correlation : Correlation is a measure of how two variables are linearly related (meaning they change together at a constant rate). This method of regression analysis doesn't make reference to cause and effect. Here's a link to try your hands on a practical example and visualize a correlation relationship.

Analysis of Variance : Variance is a measure of how spread out data is. Analysis of Variance (ANOVA) is a statistical method for testing whether groups of data have the same mean. Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.
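OLS and Pearson correlation both have simple closed-form formulas for the two-variable case, so they can be computed without any library. A sketch on made-up data:

```python
# OLS slope/intercept and Pearson correlation from their closed forms.
from math import sqrt
from statistics import mean

# Made-up data that is roughly linear (y ≈ 2x).
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope = sxy / sxx            # OLS estimate of the slope
intercept = my - slope * mx  # OLS estimate of the intercept
r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
print(round(slope, 2), round(intercept, 2), round(r, 4))
```

For this data the fitted slope is about 1.99 and r is nearly 1, consistent with a strong positive linear relationship; an r near 0 would mean little linear relationship (but, as the article notes, says nothing about cause and effect).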

Phew!!! Going through these mathematical concepts was quite easy and not as difficult as some books make it seem, yeah? Exactly! I felt the same way after reading it too, so I literally went ahead and clapped for myself, ha ha! You can go ahead and do the same with the clap button below if you find me worthy, and of course please share too!