In this diagram, dots represent different values of the variables. Scatter plots use Cartesian coordinates to display the values of (typically two) variables in a data set: the position of each dot on the horizontal and vertical axes indicates its value for each variable. Scatter plots are also known as scattergrams, scatter graphs, and scatter charts.
Scatter Plot Applications and Uses
1. Demonstration of the relationship between two variables
The scatter plot is most commonly used to depict the relationship between two variables and to investigate the nature of that relationship. The relationship may be positive or negative, linear or non-linear, and strong or weak.
Each dot on a scatter plot represents the values of an individual data point, and viewing the data as a whole allows patterns to be detected.
2. Detection of correlational relationships between variables
Another common application of scatter plots is the identification of correlational relationships. Scatter plots conventionally place the independent variable on the horizontal axis and the dependent variable on the vertical axis. This lets the observer estimate a likely vertical value when the horizontal value is known.
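One way to make "estimating the vertical value from the horizontal value" concrete is to fit a straight line through the points and read the estimate off that line. The sketch below assumes the relationship is roughly linear and uses ordinary least squares; the function names `fit_line` and `predict` and the hours/scores data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate the vertical value for a given horizontal value."""
    return slope * x + intercept


# Hypothetical data: hours studied (horizontal) vs. test score (vertical).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 68, 71]

slope, intercept = fit_line(hours, scores)
estimate = predict(6, slope, intercept)  # roughly 76.4 for this data
print(f"slope={slope:.2f}, intercept={intercept:.2f}, estimate at x=6: {estimate:.1f}")
```

An estimate of this kind is only as good as the linear assumption behind it, and it becomes less trustworthy the further the horizontal value lies from the observed data.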
3. Identification of data patterns
Scatter plots can also be used to identify patterns in the data. Data points can be divided into groups according to how closely they cluster together. A scatter plot can also reveal unexpected gaps in the data and makes any outlier points easy to spot.
When a scatter plot is used to identify correlations between variables, the nature of the correlation can be judged from the overall direction in which the points trend:
Positive correlation denotes an upward trend, which may be seen visually on the graph as data points slope upwards from the lower-left corner of the chart towards the upper-right corner.
Negative correlation denotes a downward trend, which appears on the chart as data points sloping downwards from the upper-left corner towards the lower-right corner.
Uncorrelated (null) data shows no clear trend in either direction: the variables are neither positively nor negatively related to one another.
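The three cases above can also be checked numerically. The sketch below computes the Pearson correlation coefficient, a common measure of linear association that the text does not name explicitly, and labels its sign. The `describe` helper and its 0.3 cut-off are illustrative assumptions, not a statistical standard, and the three small data sets are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def describe(r, threshold=0.3):
    """Label the direction of a correlation (threshold is an arbitrary choice)."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none"


# Made-up data sets sharing the same horizontal values.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # trends from lower left to upper right
falling = [6, 4, 5, 4, 2]   # trends from upper left to lower right
flat = [3, 5, 1, 5, 3]      # no overall direction

for name, y in [("rising", rising), ("falling", falling), ("flat", flat)]:
    r = pearson_r(x, y)
    print(f"{name}: r = {r:.2f} ({describe(r)})")
```

Note that the coefficient only measures linear association, so a strong non-linear pattern can still produce a value close to zero. This is one reason to look at the plot as well as the number.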
Common issues when using scatter plots
Overplotting
Overplotting can arise when there is a large number of data points to plot on a single graph. It is the situation in which data points overlap to such an extent that it is difficult to see relationships between points and variables. When many data points are concentrated in a small area, it can be difficult to tell how densely packed they are.
There are a few basic approaches that can ease this problem. One is to plot only a subset of the data points: a random selection should still give a rough idea of the patterns present in the whole data set. Another is to change the appearance of the dots, for example by adding transparency so that overlaps become visible, or by reducing the point size so that fewer overlaps occur. A third option is to switch to a different chart type, such as a heatmap, in which the colour of each bin corresponds to the number of points in that bin. In this context, heatmaps are also referred to as 2-dimensional histograms.
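Two of these remedies, random subsampling and binning into a 2-dimensional histogram, can be sketched without any plotting library. The helper names `subsample` and `bin_counts`, the 8 by 8 grid, and the simulated Gaussian data are assumptions made for illustration. A charting tool would then colour each cell by its count to produce the heatmap.

```python
import random


def subsample(points, k, seed=0):
    """Return up to k points chosen at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, min(k, len(points)))


def bin_counts(points, x_range, y_range, bins):
    """Count points in each cell of a bins-by-bins grid (a 2-D histogram)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Clamp so that points exactly on the upper edge land in the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[row][col] += 1
    return counts


# Simulated data: 10,000 points, far too many to plot as individual dots.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]

sample = subsample(points, 500)                      # remedy 1: plot fewer points
grid = bin_counts(points, (-4, 4), (-4, 4), bins=8)  # remedy 3: counts per bin

print(len(sample), "points kept for plotting")
for row in reversed(grid):  # print the top row of the chart first
    print(" ".join(f"{count:5d}" for count in row))
```

Subsampling keeps the familiar dot-based chart but discards data, whereas binning keeps every point at the cost of showing densities rather than individual values.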
Interpreting correlation as causation
This is a problem with how the plot is interpreted rather than with how it is created. Observing a relationship between two variables in a scatter plot does not, by itself, mean that changes in one variable cause changes in the other. This is the origin of the common statistical phrase “correlation does not imply causation”. The observed relationship may be driven by a third variable that affects both of the plotted variables, the causal link may run in the opposite direction, or the pattern may be purely coincidental.
Example: It would be wrong to compare city statistics for the amount of green space and the number of crimes committed and conclude that one causes the other. This ignores the fact that larger cities with more people tend to have more of both, so the two are correlated simply through this and other factors. If a causal link needs to be demonstrated, additional analysis must be conducted to control or account for the effects of other potential variables and rule out alternative explanations.
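The green-space example can be illustrated with a small simulation. The sketch below is built on invented numbers: each simulated city gets a random population, and its green space and crime counts both scale with that population but are otherwise independent of each other. Dividing by population is used here as one simple way of accounting for city size, and it is an assumption of this sketch rather than a general recipe for causal analysis.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Invented data for 500 cities. Population is the hidden third variable.
populations = [rng.uniform(10_000, 1_000_000) for _ in range(500)]

# Per-person rates are drawn independently, so neither causes the other.
green_rates = [rng.uniform(0.5, 1.5) for _ in populations]
crime_rates = [rng.uniform(0.5, 1.5) for _ in populations]

# City totals both grow with population (units are arbitrary).
green_space = [p * g for p, g in zip(populations, green_rates)]
crimes = [p * c for p, c in zip(populations, crime_rates)]

# Raw totals look strongly related because both track city size.
raw_r = pearson_r(green_space, crimes)

# Per-capita values remove the effect of population.
green_per_capita = [g / p for g, p in zip(green_space, populations)]
crimes_per_capita = [c / p for c, p in zip(crimes, populations)]
per_capita_r = pearson_r(green_per_capita, crimes_per_capita)

print(f"correlation of raw totals:        {raw_r:.2f}")
print(f"correlation of per-capita values: {per_capita_r:.2f}")
```

In this simulation the raw totals are strongly correlated even though, by construction, neither quantity influences the other, and the association largely disappears once city size is accounted for.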
Conclusion
The scatter plot is a fundamental chart type that any visualisation tool or solution should be able to create. Computing a basic linear trend line and colouring points according to the values of a third, categorical variable are both fairly common options. Other options, such as non-linear trend lines and encoding third-variable values by shape, are less commonly available, despite their usefulness. Even without these features, the scatter plot is a useful chart type for exploring the relationship between quantitative variables in your data.