The study of statistics usually begins with measures of central tendency and then moves on to correlation. Correlation describes the relationship between two variables, and a correlation coefficient measures how strong that relationship is. In other words, the correlation coefficient expresses as a single number how closely one variable tracks the other.
Correlation is important in economics, finance, and business. If you work in finance or economics, calculating the coefficient can aid in the analysis of a group of variables. If you are a business owner, knowing this value might help you forecast future sales and market trends for your organization.
Correlation coefficient
The correlation coefficient measures the strength of the relationship between two variables. It indicates how closely the data in a scatterplot fall along a straight line. It is denoted by the letter r. If r = 1 or r = -1, the data are perfectly linear. If r is positive, the correlation is positive, and if r is negative, the correlation is negative. For a linear equation to describe the data well, the absolute value of r should be close to one. If r is near zero, the points do not follow a straight line and the linear relationship is considered weak. There are several types of correlation coefficient, and once you are familiar with the variables or data you are using, you will be able to select the most suitable one. Three types of correlation coefficient are described below.
- Pearson correlation: this coefficient measures the linear relationship between two variables. It is the most commonly used correlation coefficient, and its value lies between -1 and 1. Pearson correlation is symmetric, so it does not distinguish between dependent and independent variables. It is calculated as:
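$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$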
- Spearman correlation: this coefficient measures the relationship between two data sets based on the ranks of their values. It is suited to ordinal variables and does not require the data to be normally distributed.
- Kendall correlation: this coefficient measures the strength of the association between two sets of data. Like Spearman correlation, it is based on the ordering of the values rather than on the values themselves.
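As a rough illustration of the three coefficients above, the sketch below computes each one in plain Python. The function names (pearson_r, average_ranks, spearman_rho, kendall_tau) and the sample data are made up for this example. pearson_r follows the sum formula given above. spearman_rho assumes the common definition as the Pearson correlation of the ranks, with tied values sharing their average rank. kendall_tau assumes the simplest variant (often called tau-a), which counts concordant and discordant pairs and makes no correction for ties.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson r from the sum formula (n, sum of x, y, xy, x^2, y^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho, taken here as the Pearson r of the ranks (assumption)."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1  # both variables move in the same direction
        elif product < 0:
            score -= 1  # the variables move in opposite directions
        pairs += 1
    return score / pairs


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(pearson_r(x, y))     # below 1: the points are not on a straight line
print(spearman_rho(x, y))  # 1.0: the ranks agree perfectly
print(kendall_tau(x, y))   # 1.0: every pair is ordered the same way
```

On this data the two rank-based coefficients reach 1 because the ordering of x and y matches exactly, while Pearson's r stays slightly below 1 because the relationship is curved rather than linear.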
Steps for calculating the correlation coefficient (r)
To calculate the correlation, let:
(xi, yi) = a pair of data values
x̄ = the mean of the xi
ȳ = the mean of the yi
sx = the sample standard deviation of the first coordinates, xi
sy = the sample standard deviation of the second coordinates, yi
- First, calculate the mean of all the first coordinates of the data, xi. This is x̄.
- Next, calculate the mean of all the second coordinates of the data, yi. This is ȳ.
- Calculate the sample standard deviation of the first coordinates, xi. It is denoted by sx.
- Calculate the sample standard deviation of the second coordinates, yi. It is denoted by sy.
- Calculate a standardized value for each xi by using the formula (zx)i = (xi – x̄) / sx.
- Calculate a standardized value for each yi by using the formula (zy)i = (yi – ȳ) / sy.
- Multiply the corresponding standardized values: (zx)i(zy)i.
- Add up all the products from the previous step.
- Divide the sum by n – 1, where n is the total number of points in the set of paired data.
The result is the correlation coefficient r.
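The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name correlation_by_steps and the sample data are invented for the example, and it assumes that sx and sy are sample standard deviations (dividing by n – 1), which is what statistics.stdev computes.

```python
from statistics import mean, stdev


def correlation_by_steps(pairs):
    """Compute r from (x, y) pairs by following the listed steps."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)

    # Means of the first and second coordinates.
    x_bar = mean(xs)
    y_bar = mean(ys)

    # Sample standard deviations (stdev divides by n - 1).
    s_x = stdev(xs)
    s_y = stdev(ys)

    # Standardized values for each coordinate.
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]

    # Multiply the paired standardized values and add up the products.
    total = sum(a * b for a, b in zip(z_x, z_y))

    # Divide by n - 1 to obtain r.
    return total / (n - 1)


# Hypothetical paired data, chosen only to exercise the function.
data = [(1, 1), (2, 3), (4, 3), (5, 6)]
print(correlation_by_steps(data))
```

For points that lie exactly on a rising straight line, such as (1, 3), (2, 5), (3, 7), the function returns 1 up to floating-point rounding, and for points on a falling line it returns -1.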
Conclusion
Correlation coefficients describe the strength and direction of the relationship between variables, and the coefficient is denoted by r. There are three kinds of correlation coefficient: Pearson correlation, Spearman correlation, and Kendall correlation. Pearson correlation measures the linear relationship between two normally distributed random variables. Spearman correlation describes the monotonic relationship between two variables. It is relatively robust to outliers and can be used for ordinal data as well as continuous data. A hypothesis test can be used to check the null hypothesis of no correlation, and a confidence interval provides a range of plausible values for the estimate.