Statistics is the study of the collection, presentation, analysis, and interpretation of data. Much of the early interest in statistics was driven by governments’ need for census data and for information on a variety of economic activities. Today, the need to turn the huge amounts of data available in many applied fields into useful information has prompted both theoretical and practical advances in statistics.
The facts and figures that are collected, analyzed, and summarized for presentation and interpretation are called data. Data falls into two categories: quantitative and qualitative. Qualitative data assigns labels or names to groups of similar items, whereas quantitative data measures how much or how many of something there is. For example, consider a study that collects the gender, age, marital status, and annual income of a group of 100 people: gender and marital status are qualitative data, while age and annual income are quantitative data.
Grouping data becomes important when we deal with large amounts of data. Such data can also be presented with a pictogram or a bar graph. Grouped data is formed by sorting individual observations of a variable into groups, so that a frequency distribution table of those groups can be used to summarize or analyze the data.
Frequency distribution table:-
When the amount of data collected is large, we can organize and analyze it quickly with tally marks, as in the example below.
Example:
Consider the results of a test taken by 50 students in class VII. The maximum mark one can secure in the exam is 50.
35, 45, 31, 26, 42, 18, 28, 30, 22, 20, 33, 39, 40, 32, 19, 16, 33, 38, 46, 43, 22, 37, 27, 17, 11, 34, 41, 23, 8, 13, 18, 32, 44, 19, 8, 25, 27, 10, 30, 22, 40, 39, 17, 25, 9, 15, 20, 30, 24
If we made a frequency distribution table with a separate row for each observation, the table would be very long. Instead, we can group the observations into intervals such as 0 to 10, 10 to 20, and so on, which makes the data easier to understand.
| Groups (marks) | Tally marks | Frequency |
|---|---|---|
| 0 – 10 | \|\|\| | 3 |
| 10 – 20 | \|\|\|\|\| \|\|\|\|\| \| | 11 |
| 20 – 30 | \|\|\|\|\| \|\|\|\|\| \|\|\|\| | 14 |
| 30 – 40 | \|\|\|\|\| \|\|\|\|\| \|\|\|\| | 14 |
| 40 – 50 | \|\|\|\|\| \|\|\| | 8 |
| Total | | 50 |
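The grouping and tallying above can be reproduced in a few lines of code. The following is a minimal Python sketch, assuming half-open classes (each class includes its lower limit and excludes its upper limit, the convention explained below); the names `grouped_frequencies` and `tally` are illustrative, not taken from any library. Note that the list of marks as printed above contains 49 values rather than 50, so the sketch counts 13 observations in the 20 – 30 class instead of the table's 14; one value in that range appears to be missing from the list.

```python
from typing import List, Tuple

# Marks exactly as printed in the example above (49 values).
MARKS = [35, 45, 31, 26, 42, 18, 28, 30, 22, 20, 33, 39, 40, 32, 19, 16, 33,
         38, 46, 43, 22, 37, 27, 17, 11, 34, 41, 23, 8, 13, 18, 32, 44, 19,
         8, 25, 27, 10, 30, 22, 40, 39, 17, 25, 9, 15, 20, 30, 24]


def grouped_frequencies(values: List[int], start: int, width: int,
                        num_classes: int) -> List[Tuple[int, int, int]]:
    """Count values in classes [start, start+width), [start+width, start+2*width), ...

    A value lying on a boundary (e.g. 10) is counted in the higher class (10-20).
    Returns a list of (lower, upper, frequency) tuples.
    """
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table


def tally(count: int) -> str:
    """Tally marks written as groups of five bars, then the remainder."""
    fives, rest = divmod(count, 5)
    return " ".join(["|||||"] * fives + (["|" * rest] if rest else []))


if __name__ == "__main__":
    rows = grouped_frequencies(MARKS, start=0, width=10, num_classes=5)
    for lower, upper, freq in rows:
        print(f"{lower:>2} - {upper:<2} | {tally(freq):<17} | {freq}")
    print("Total:", sum(freq for _, _, freq in rows))
```

Running the script prints one row per class (3, 11, 13, 14, and 8 observations) followed by the total of 49 for the list as printed.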
The distribution shown in the table above is called a grouped frequency distribution. It lets us draw several useful conclusions, such as:
- Most students (28 of the 50) scored between 20 and 40 marks (40% to 80% of the maximum), that is, in the classes 20-30 and 30-40.
- Eight students scored 40 marks or more, that is, at least 80% of the maximum mark.
The groups 0-10, 10-20, 20-30, and so on in the table above are called class intervals (or classes). Notice that 10 appears in both 0-10 and 10-20, and 20 appears in both 10-20 and 20-30. An observation cannot belong to two classes at once, so to avoid ambiguity we adopt the convention that an observation equal to a shared boundary belongs to the higher class. Thus 10 belongs to the class interval 10-20, not 0-10; similarly, 20 belongs to 20-30, not 10-20, and so on.
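The boundary convention can be expressed as a small function. This is a sketch assuming equal-width classes that start at a chosen value (0 by default); `class_interval` is a hypothetical helper name. Floor division maps each value to the class whose lower limit is at or below it, so a value sitting exactly on a boundary lands in the higher class.

```python
def class_interval(value, width, start=0):
    """Return (lower, upper) limits of the class containing `value`.

    Classes are [start, start + width), [start + width, start + 2*width), ...
    so each class includes its lower limit and excludes its upper limit.
    """
    k = (value - start) // width  # 0-based index of the class
    lower = start + k * width
    return (lower, lower + width)


print(class_interval(10, 10))  # (10, 20): 10 goes in 10-20, not 0-10
print(class_interval(20, 10))  # (20, 30): 20 goes in 20-30, not 10-20
print(class_interval(19, 10))  # (10, 20)
```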
In the class 10-20, 10 is the lower class limit and 20 is the upper class limit. The difference between the upper and lower class limits is called the class size (also the class width or class height); for the class 10-20 it is 20 − 10 = 10.
Determination of class size:-
To decide the size of the class intervals to use when grouping data, follow these steps:
- Find the highest and lowest values among the observations.
- Calculate the difference between these two values (the range).
- Decide how many class intervals to use (5 to 20 classes are usually suggested, depending on the number of observations).
- Calculate the class size by dividing the difference between the highest and lowest values by the number of classes.
- If the class size comes out as a decimal, round it up to the next whole number.
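The steps above can be sketched as a short function, assuming the highest and lowest values have already been found; `class_size` is a hypothetical helper name.

```python
import math


def class_size(highest: float, lowest: float, num_classes: int) -> int:
    """Class size = (highest - lowest) / number of classes, rounded up."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    data_range = highest - lowest                # difference of the two values
    return math.ceil(data_range / num_classes)   # divide, then round up


print(class_size(46, 8, 5))  # 38 / 5 = 7.6, rounded up to 8
print(class_size(46, 8, 4))  # 38 / 4 = 9.5, rounded up to 10
```

For the marks listed earlier, the highest value is 46 and the lowest is 8, so the range is 38. Rounding up ensures that the classes together cover at least the whole range. If the number of classes times the class size comes out exactly equal to the range, the highest value falls on the upper limit of the last class and, under the boundary convention above, needs one extra class.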
Histogram:-
The frequency distribution table above can be visualized with a histogram, with the class intervals on the horizontal axis and the frequencies on the vertical axis.
The height of each bar shows the frequency of its class interval. Because consecutive class intervals share their boundaries (there is no gap between classes), the bars are drawn touching each other, with no gaps between them.
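A plotting library is the usual way to draw a histogram; as a dependency-free stand-in, this Python sketch renders the table's frequencies as a vertical text histogram, with class intervals along the horizontal axis and frequency up the vertical axis. The function name `text_histogram` and the character layout are arbitrary choices for illustration.

```python
from typing import List, Sequence


def text_histogram(freqs: Sequence[int], labels: Sequence[str],
                   col_width: int = 7) -> str:
    """Draw a vertical text histogram with one column per class interval.

    Bar heights equal the frequencies, and adjacent columns touch (no gap),
    because consecutive class intervals share their boundaries.
    """
    lines: List[str] = []
    for level in range(max(freqs), 0, -1):  # top row first
        row = "".join("#" * col_width if f >= level else " " * col_width
                      for f in freqs)
        lines.append(f"{level:>3} |{row}")
    lines.append("    +" + "-" * (col_width * len(freqs)))  # horizontal axis
    lines.append("     " + "".join(lab.center(col_width) for lab in labels))
    return "\n".join(lines)


if __name__ == "__main__":
    # Frequencies from the grouped frequency distribution table above.
    labels = ["0-10", "10-20", "20-30", "30-40", "40-50"]
    print(text_histogram([3, 11, 14, 14, 8], labels))
```

With matplotlib, `plt.hist(marks, bins=range(0, 51, 10))` draws the same kind of chart from a list of raw marks (`marks` here is a hypothetical variable). In matplotlib, as in NumPy's `histogram`, the last bin also includes its upper edge, which differs from the boundary convention above only for a mark of exactly 50.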
Conclusion:-
In grouped data, individual observations of a variable are sorted into groups, and the frequency distribution of these groups provides a convenient way to summarize or interpret the data. Grouping takes two main forms: binning a one-dimensional variable, in which individual values are replaced by counts in bins; and grouping a multi-dimensional variable by some of its dimensions (particularly the independent variables), which yields the distribution of the remaining, ungrouped dimensions (especially the dependent variables).