The concept of path analysis was proposed in 1921 by Sewall Wright. Dewey and Lu first applied it to plant selection in 1959. Path analysis rests on the same assumptions as linear regression, together with some restrictions that define the relations among variables. Each variable in a path model is either independent (exogenous) or dependent (endogenous). A variable is exogenous if its variance does not depend on any other variable in the model, and endogenous if its variance depends on one or more other variables. Path analysis is carried out using estimates of the correlation coefficients between the variables.
Path coefficient
The path coefficient measures the direct effect of a causal factor (independent variable) on a dependent variable. Since path coefficients are estimated from correlations, they are standardised. A path coefficient is written with two subscripts, the first for the variable the path leads to and the second for the variable it comes from. For example, $p_{21}$ indicates the path from 1 to 2 in the following path diagram.
A path diagram is a diagram that shows relationships between variables and the causal directions between them. In the given diagram, 1 represents an exogenous or independent variable since it has no arrows pointing towards it.
All other variables, that is, 2, 3, and 4, are endogenous or dependent variables, as they have arrows pointing towards them. In the given model, each endogenous variable is represented by one or more other variables plus an error term, that is, $e_2$, $e_3$, and $e_4$. Since the causal flow is unidirectional in the given path diagram, the model is called recursive.
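The classification described above can be sketched in code. This is a minimal illustration, not part of the original method: the dictionary `paths` and the helper names `classify` and `is_recursive` are hypothetical, and the arrows are an assumption inferred from the text (1 points to 2, 3, and 4; 2 points to 3 and 4; 3 points to 4).

```python
# Hypothetical encoding of the path diagram: each key is a variable and
# its value lists the variables it points to (cause -> effects).
# The arrows are an assumption inferred from the text, not read off the figure.
paths = {1: [2, 3, 4], 2: [3, 4], 3: [4], 4: []}


def classify(paths):
    """Split variables into exogenous (no incoming arrows) and endogenous."""
    targets = {effect for effects in paths.values() for effect in effects}
    exogenous = sorted(v for v in paths if v not in targets)
    endogenous = sorted(v for v in paths if v in targets)
    return exogenous, endogenous


def is_recursive(paths):
    """Return True if the causal flow is one-way, i.e. the graph has no loop."""
    state = {}  # variable -> "visiting" or "done"

    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "visiting":
            return False  # came back to a variable on the current path: a loop
        state[v] = "visiting"
        ok = all(visit(w) for w in paths.get(v, []))
        state[v] = "done"
        return ok

    return all(visit(v) for v in paths)


exogenous, endogenous = classify(paths)
print(exogenous, endogenous, is_recursive(paths))  # [1] [2, 3, 4] True
```

A model with a feedback loop, such as `{1: [2], 2: [1]}`, would make `is_recursive` return `False`.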
Calculating path coefficient
Since we are working with correlations, let us assume that our variables are in z score (standard score) form. Let the equations of the four variables be as follows:
$$z_1 = e_1$$
$$z_2 = p_{21} z_1 + e_2$$
$$z_3 = p_{31} z_1 + p_{32} z_2 + e_3$$
$$z_4 = p_{41} z_1 + p_{42} z_2 + p_{43} z_3 + e_4$$
Since no other variable explains the first variable, it is an independent variable. As explained earlier, the dependent variables are determined partly by other variables and partly by error ($e$) or unexplained causes. Also, it can be seen that each variable is determined in part by the direct paths leading to it. For example, $p_{21}$ is the direct path to 2 from 1.
Calculating the first path coefficient:
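Using the formula for the correlation $r$ in terms of z scores,
$$r_{12} = \frac{1}{N}\sum z_1 z_2$$
Substituting the path equation for $z_2$, that is, $z_2 = p_{21} z_1 + e_2$, we get,
$$r_{12} = \frac{1}{N}\sum z_1 (p_{21} z_1 + e_2) = p_{21}\frac{\sum z_1 z_1}{N} + \frac{\sum z_1 e_2}{N}$$
In the above equation, $\frac{\sum z_1 z_1}{N}$ is the variance of $z_1$. Since $z_1$ is in standard form, this variance is 1. Also, $\frac{\sum z_1 e_2}{N}$ is 0, as it measures the correlation between $z_1$ and $e_2$. (In path analysis, it is assumed that there is no correlation between the error terms and the variables in the model.)
Therefore,
$$r_{12} = p_{21}$$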
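The result for the first path coefficient can be checked numerically. The sketch below is only an illustration: the data are made up, the helper `standardise` is hypothetical, and estimating the path coefficient as the least-squares slope of the second variable on the first is an assumption of this example. With that estimate, the error term is uncorrelated with the first variable by construction.

```python
import numpy as np

# Made-up raw measurements for variables 1 and 2 (illustrative only).
x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])
x2 = np.array([1.0, 3.0, 2.0, 6.0, 8.0, 7.0])


def standardise(x):
    """Convert to z scores using the population standard deviation (divide by N)."""
    return (x - x.mean()) / x.std()


z1, z2 = standardise(x1), standardise(x2)
n = len(z1)

r12 = np.sum(z1 * z2) / n                 # correlation formula in z-score form
p21 = np.sum(z1 * z2) / np.sum(z1 * z1)   # least-squares slope of z2 on z1
e2 = z2 - p21 * z1                        # error term of variable 2

print(np.sum(z1 * z1) / n)   # variance of z1: 1 up to rounding error
print(np.sum(z1 * e2) / n)   # covariance of z1 and e2: 0 up to rounding error
print(r12, p21)              # the two values agree
```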
Calculating the second path coefficient:
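For the third variable, the paths can be calculated from the correlations between variables 1, 2, and 3. Therefore, the following correlation formula can be used:
$$r_{13} = \frac{1}{N}\sum z_1 z_3$$
Substituting the path equation for $z_3$, that is, $z_3 = p_{31} z_1 + p_{32} z_2 + e_3$, we get,
$$r_{13} = p_{31}\frac{\sum z_1 z_1}{N} + p_{32}\frac{\sum z_1 z_2}{N} + \frac{\sum z_1 e_3}{N}$$
Since it is assumed that there is no correlation between the error terms and the variables in the model, $\frac{\sum z_1 e_3}{N}$ will be 0. The equation can be written as,
$$r_{13} = p_{31}\frac{\sum z_1 z_1}{N} + p_{32}\frac{\sum z_1 z_2}{N}$$
After simplification, we get,
$$r_{13} = p_{31} + p_{32} r_{12}$$
Similarly,
$$r_{23} = \frac{1}{N}\sum z_2 z_3$$
After substituting the value of $z_3$ and further simplification, we get,
$$r_{23} = p_{31} r_{12} + p_{32}$$
From $r_{13} = p_{31} + p_{32} r_{12}$, we can get
$$p_{31} = r_{13} - p_{32} r_{12}$$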
Calculating the third path coefficient:
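Substituting the value of $p_{31}$ in $r_{23} = p_{31} r_{12} + p_{32}$, we get,
$$r_{23} = (r_{13} - p_{32} r_{12}) r_{12} + p_{32} = r_{12} r_{13} + p_{32}(1 - r_{12}^2)$$
From this, we can solve for the unknown value as follows:
$$p_{32} = \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2}$$
Substituting this value back into $p_{31} = r_{13} - p_{32} r_{12}$ gives $p_{31}$ in terms of the correlations alone. Therefore, from the above calculations, we have obtained the formulas for three path coefficients,
$$p_{21} = r_{12}, \qquad p_{31} = \frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2}, \qquad p_{32} = \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2}$$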
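The three formulas can be tried out on numbers. This is a minimal sketch under stated assumptions: the function name `path_coefficients` is hypothetical and the correlation values are made up for illustration, not taken from any data set.

```python
def path_coefficients(r12, r13, r23):
    """Path coefficients of the three-variable model from its correlations."""
    denom = 1.0 - r12 ** 2
    if denom == 0:
        raise ValueError("r12 must not be +1 or -1")
    p21 = r12
    p31 = (r13 - r12 * r23) / denom
    p32 = (r23 - r12 * r13) / denom
    return p21, p31, p32


# Illustrative correlations (made-up values).
r12, r13, r23 = 0.5, 0.4, 0.6
p21, p31, p32 = path_coefficients(r12, r13, r23)
print(p21, p31, p32)

# The coefficients should reproduce the two correlations they were solved from:
# r13 = p31 + p32 * r12 and r23 = p31 * r12 + p32.
print(p31 + p32 * r12, p31 * r12 + p32)
```

The last line prints values equal to the chosen correlations up to rounding error, which confirms that the formulas solve the two equations.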
Since the fourth variable ($z_4$) has three paths leading to it, we will have to solve three equations in a similar way to find the remaining unknown path coefficients.
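One possible way to solve the three equations is as a linear system. The sketch below assumes the fourth variable follows the same pattern as the third, that is, it is a weighted sum of variables 1, 2, and 3 plus an error term, so that correlating it with each of its three causes yields the equations written in the comments. The correlation matrix `R` is made up for illustration, and using `numpy.linalg.solve` is a choice of this example rather than a step from the text.

```python
import numpy as np

# Illustrative correlation matrix for variables 1 to 4 (made-up values).
R = np.array([
    [1.0, 0.5, 0.4, 0.3],
    [0.5, 1.0, 0.6, 0.5],
    [0.4, 0.6, 1.0, 0.7],
    [0.3, 0.5, 0.7, 1.0],
])

# The three equations, obtained like those for the third variable:
#   r14 = p41       + p42 * r12 + p43 * r13
#   r24 = p41 * r12 + p42       + p43 * r23
#   r34 = p41 * r13 + p42 * r23 + p43
A = R[:3, :3]   # correlations among the three causes
b = R[:3, 3]    # correlation of each cause with variable 4

p41, p42, p43 = np.linalg.solve(A, b)
print(p41, p42, p43)
```

The same call works for any variable in a recursive model: `A` holds the correlations among its direct causes and `b` their correlations with the variable itself.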
Applications of path coefficient
Path coefficients can be applied in the following areas:
Studying non-additive gene effect
Polysomic inheritance
Solving complex inbreeding problems
Sex-linked inheritance
Theory of evaluation
Analysis of quantitative traits
Linkage
The environmental influence of traits
Characteristics of path coefficient
The following are the characteristics of the path coefficient:
It is a standardised value
It has no unit
It may be greater than unity or less than negative unity
It indicates the direction of the effect
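As a hedged illustration of how a path coefficient can fall outside the range from -1 to 1, consider the three-variable model in which variable 3 depends on variables 1 and 2. The correlations used here are hypothetical values chosen for the example: $r_{12} = 0.9$, $r_{13} = 0.2$, and $r_{23} = 0.6$.

$$p_{32} = \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2} = \frac{0.6 - 0.9 \times 0.2}{1 - 0.81} = \frac{0.42}{0.19} \approx 2.21$$

$$p_{31} = \frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2} = \frac{0.2 - 0.9 \times 0.6}{1 - 0.81} = \frac{-0.34}{0.19} \approx -1.79$$

Both coefficients lie outside the range that a correlation coefficient can take, which happens here because the two causes are strongly correlated with each other.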
Conclusion
Path analysis was proposed in 1921 by Sewall Wright. There are two types of variables in path analysis, dependent (endogenous) and independent (exogenous). A path coefficient shows the effect of an independent variable on a dependent variable.
To calculate a path coefficient, we assume the variables are in standard score form and use the formula for the correlation r in terms of z scores. To simplify the equations, we assume that the variables are uncorrelated with the error terms.
A path coefficient is represented using two subscripts. For example, $p_{21}$ indicates the path