Covariance and correlation are basic statistical tools that describe the relationship between two variables in terms of dependence and association, particularly for linear relationships. Covariance is expressed in the units of the two variables (the product of their units), while correlation is a standardised, unitless measure.
Covariance
Covariance measures how two variables vary together around their expected values (the means of the data). In brief, it captures how far the variables deviate from their means and whether those deviations occur in the same direction. The mean serves as the reference point, so what matters is the position of each observation relative to the mean.
Covariance is calculated by multiplying the corresponding X and Y deviations from their means, summing these products, and averaging them over the pairs.
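The formula itself is not reproduced in the text. A standard form consistent with the description is sketched here, assuming the population convention of dividing by n (a sample estimate divides by n - 1 instead) and writing \bar{X} and \bar{Y} for the means of X and Y:

```latex
\operatorname{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X} \right) \left( Y_i - \bar{Y} \right)
```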
Where n is the number of pairs of X and Y.
Correlation
Correlation is a standardised form of covariance, obtained by dividing the covariance by the product of the standard deviations of the two variables. It is expressed by the Pearson correlation coefficient.
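The formula is not reproduced in the text. The standard form of the Pearson coefficient is sketched here, assuming \sigma_X and \sigma_Y denote the standard deviations of X and Y and \bar{X}, \bar{Y} their means; the divisor (n or n - 1) cancels, so the second expression holds under either convention:

```latex
r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```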
The Pearson correlation coefficient is the covariance of the standardised X and Y variables. It lies between -1 and 1 and is independent of the scale and range of the variables. Its sign matches the sign of the covariance, which is interpreted as follows:
- Positive covariance: the two variables tend to move in the same direction.
- Negative covariance: the two variables tend to move in opposite directions.
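The sign rule in the list above can be checked numerically. The following is a minimal sketch in Python, assuming the population form of covariance (dividing by n); the helper name `covariance` and the data are made up purely for illustration.

```python
def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data, chosen only to illustrate the sign of the result.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]    # tends to increase when x increases
falling = [9, 7, 5, 4, 2]   # tends to decrease when x increases

cov_same = covariance(x, rising)
cov_inverse = covariance(x, falling)

print("same direction:   ", cov_same > 0)
print("inverse direction:", cov_inverse < 0)
```

Only the sign is being read here; the magnitude depends on the units of the data and says little on its own.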
Let us now work through some examples of covariance and correlation to understand the topic in more detail.
Example 1
Ram is an investor. His portfolio tracks the performance of Fortera 300, and he wishes to add the stock of Nokia. Before taking the decision, however, he wants to conduct an appropriate statistical study to measure the relationship between the stock and Fortera 300.
He does not wish to take any unwanted risk in his portfolio. Therefore, he does not wish to buy securities that do not move in the same direction as the portfolio.
- Suggest which technique Ram should use to decide whether to buy the stock.
Ram should calculate the covariance between Fortera 300 and the Nokia stock.
- Perform the appropriate statistical study to compute the same.
Step 1: Data Accumulation
Ram would first have to obtain the prices of both Fortera 300 and the Nokia stock. The figures are summarised in the given table.
Step 2: Calculate the mean (average) price of each set.
Step 3: Find the difference between each value and the mean of its set.
Step 4: Multiply the two deviations in each pair with each other.
Step 5: Substitute the values into the formula and calculate the covariance.
Here, the covariance is found to be positive. This implies a positive relationship: the price of the stock and Fortera 300 tend to move in the same direction.
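The five steps above can be sketched in Python. The prices below are hypothetical and are not the figures from the article's table; the variable names are illustrative, and the sketch assumes the population form of the formula (dividing by n).

```python
# Step 1: gather paired observations (hypothetical prices, for illustration only).
index_prices = [100.0, 102.0, 105.0, 103.0, 110.0]   # stand-in for Fortera 300
stock_prices = [20.0, 21.0, 23.0, 22.0, 24.0]        # stand-in for the Nokia stock
n = len(index_prices)

# Step 2: mean price of each set.
mean_index = sum(index_prices) / n
mean_stock = sum(stock_prices) / n

# Step 3: deviation of each value from the mean of its set.
dev_index = [p - mean_index for p in index_prices]
dev_stock = [p - mean_stock for p in stock_prices]

# Step 4: multiply the paired deviations.
products = [a * b for a, b in zip(dev_index, dev_stock)]

# Step 5: average the products (divide by n - 1 instead for a sample estimate).
cov = sum(products) / n

print(f"covariance = {cov:.2f}")
```

With these made-up prices the covariance comes out positive (4.60), which is the situation Ram is looking for: the stock and the index tend to move in the same direction.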
Example 2
- Calculate a suitable standardised statistical measure for the given data.
- Based on the standardised measure, comment on the given data.
Solution
- First, we calculate the means of both variables, subtract each mean from the observed values, and multiply the paired deviations as follows:
Comment: The covariance between production and the number of customers is found to be 22.46. Since the covariance is positive, there is a positive relationship between the two variables: as production increases, so does the number of customers.
However, in order to be able to understand how strong the relationship is, we need to calculate the correlation.
Based on the standardised measure, the correlation between production and the number of customers is found to be very strong.
Therefore, as production increases, the number of customers tends to increase with it in a close to linear fashion.
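To illustrate why the covariance alone does not reveal how strong the relationship is, the sketch below computes both measures and then changes the units of one variable. The production and customer figures are hypothetical (they do not reproduce the 22.46 above), the helper names are illustrative, and the population form (dividing by n) is assumed.

```python
import math


def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson coefficient: covariance divided by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))   # variance is the covariance of x with itself
    sd_y = math.sqrt(covariance(ys, ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical data, for illustration only.
production = [10.0, 12.0, 15.0, 17.0, 20.0]
customers = [30.0, 33.0, 41.0, 44.0, 52.0]
production_rescaled = [p * 1000 for p in production]   # same data in smaller units

cov_units = covariance(production, customers)
cov_rescaled = covariance(production_rescaled, customers)
r_units = correlation(production, customers)
r_rescaled = correlation(production_rescaled, customers)

print(f"covariance:  {cov_units:.2f} vs {cov_rescaled:.2f}")
print(f"correlation: {r_units:.4f} vs {r_rescaled:.4f}")
```

Rescaling one variable multiplies the covariance by the same factor but leaves the correlation unchanged, which is why the strength of a relationship is read from the correlation rather than from the size of the covariance.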
Conclusion
Covariance and correlation are used to evaluate the relationship between two random variables. Covariance indicates the direction of the relationship, while correlation summarises both its direction and its strength, which shows whether, and how closely, the data are linearly related.