Data analysis is the process of extracting useful information from a dataset by inspecting, cleansing, transforming, and modelling it. Common methodologies include Descriptive Analysis (which summarises the data numerically), Exploratory Analysis (which provides visual insight into the data), Predictive Analysis (which uses historical data to estimate future outcomes), and Inferential Analysis (which draws conclusions about a population from information obtained from a sample).
Data Analysis Types:
Data analysis may be separated into four types depending on the methodology used:
Descriptive Analysis
Exploratory Data Analysis
Predictive Analysis
Inferential Analysis
Descriptive Analysis:
Descriptive analysis is a numerical method of extracting information from data: it summarises the values of the numerical variables. Assume you’re looking at sales data from a vehicle company. In a descriptive analysis, you’ll look for answers to questions such as what the mean, mode, and median selling price of a car type are, how much income was generated by selling a specific model, and so on. Using this form of analysis, we can determine the central tendency and dispersion of the numerical variables in the data. In most practical data science use cases, a descriptive analysis helps you gain a high-level understanding of the data and become familiar with the data set.
The following are some key descriptive analysis terms:
Mean: the sum of the values in a list divided by the number of values (the average)
Mode: the most frequent value in the list
Median: the middle value of the list once it is sorted (the average of the two middle values when the count is even)
Standard deviation: a measure of how far the values typically spread from the mean (the square root of the variance)
Variance: the average squared deviation of the values from the mean (the square of the standard deviation)
Interquartile Range (IQR): the difference between the 75th and 25th percentiles, i.e. the spread of the middle 50% of the values
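As a minimal sketch of these terms, the snippet below computes each of them with Python's standard `statistics` module. The price list is made-up illustration data (not taken from any real vehicle company), and the choice of sample statistics and the "inclusive" quartile method are assumptions; other conventions give slightly different numbers.

```python
import statistics

# Hypothetical selling prices (in thousands) for one car model; made-up data.
prices = [22, 25, 25, 27, 30, 31, 35, 40]

# Measures of central tendency.
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

# Measures of dispersion. These are sample statistics (n - 1 denominator);
# statistics.pstdev / statistics.pvariance treat the data as a full population.
std_dev = statistics.stdev(prices)
variance = statistics.variance(prices)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean_price}, median={median_price}, mode={mode_price}")
print(f"std dev={std_dev:.3f}, variance={variance:.3f}, IQR={iqr}")
```

Note that the IQR is a single number (Q3 minus Q1), and that the variance equals the square of the standard deviation up to floating-point rounding.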
Importance of Descriptive Analysis:
Descriptive statistics make data easier to present and visualise. They summarise the data in a meaningful and intelligible way, so the data set is more straightforward to understand. Analysing raw data directly would be laborious, and trends and patterns would be hard to spot. Raw data on its own also makes it difficult to see what the data is showing.
Exploratory Data Analysis:
In contrast to descriptive analysis, which is a numerical approach, exploratory data analysis is a visual approach to data analysis. We turn to exploratory data analysis once we have a basic understanding of the data at hand through descriptive analysis. Exploratory data analysis may be divided into two parts:
Univariate analysis: analysis of a single variable (exploring the characteristics of one variable)
Multivariate analysis: analysis of several variables together (comparing multiple variables; when we examine the relationship between exactly two variables, for example their correlation, it is called bivariate analysis)
In this visual style of data analysis, we use many types of plots and graphs. Bar plots, histograms, box-and-whisker plots, violin plots, and similar plots can be used to study a single variable (univariate analysis). For multivariate analysis, we use scatter plots, contour plots, multi-dimensional graphs, and other tools.
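The univariate and bivariate ideas above can be sketched without a plotting library: a text histogram stands in for a histogram plot, and the Pearson correlation coefficient summarises what a scatter plot of two variables would show. The data, the `histogram` and `pearson` helper names, and the bin width are all hypothetical choices made for illustration; in practice a plotting library would normally be used instead.

```python
from collections import Counter
from math import sqrt

# Hypothetical, made-up data: engine size (litres) and selling price (thousands).
engine_size = [1.0, 1.2, 1.6, 2.0, 2.5, 3.0]
price = [12, 14, 18, 24, 30, 38]


def histogram(values, bin_width):
    """Univariate: count how many values fall into each bin of the given width."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Bivariate: Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Univariate view of price: one row of '#' marks per bin.
price_bins = histogram(price, bin_width=10)
for start, count in price_bins.items():
    print(f"{start:>3}-{start + 10:<3} {'#' * count}")

# Bivariate view: how strongly engine size and price move together.
r = pearson(engine_size, price)
print(f"correlation(engine_size, price) = {r:.3f}")
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about non-linear patterns, which is one reason the visual inspection described above remains useful.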
Need for Exploratory Data Analysis:
Exploratory data analysis provides a visual representation of the data, which helps identify the data’s features more clearly
It helps us determine which features are most significant, which is very handy when dealing with high-dimensional data (approaches such as PCA and t-SNE help with dimensionality reduction)
It’s a good way to communicate findings to non-technical stakeholders and executives
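To make the dimensionality-reduction point concrete, here is a minimal sketch of the idea behind PCA for just two features, using only the standard library. It computes how much of the total variance each principal component captures, from the eigenvalues of the 2x2 covariance matrix. The data and function names are hypothetical, the features are not standardised (which is often done first when they are on different scales), and real work would typically rely on a numerical library rather than this closed-form shortcut. t-SNE is not covered here.

```python
from math import sqrt

# Hypothetical, made-up measurements of two strongly correlated features.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]


def covariance_2x2(xs, ys):
    """Return the sample covariance matrix entries (var_x, cov_xy, var_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mean_x) ** 2 for a in xs) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in ys) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    return var_x, cov_xy, var_y


def explained_variance_ratio(xs, ys):
    """Share of the total variance captured by each principal component."""
    var_x, cov_xy, var_y = covariance_2x2(xs, ys)
    # Closed-form eigenvalues of the symmetric matrix
    # [[var_x, cov_xy], [cov_xy, var_y]].
    trace = var_x + var_y
    gap = sqrt((var_x - var_y) ** 2 + 4 * cov_xy ** 2)
    first, second = (trace + gap) / 2, (trace - gap) / 2
    return first / trace, second / trace


ratio_pc1, ratio_pc2 = explained_variance_ratio(x, y)
print(f"PC1 explains {ratio_pc1:.1%} of the variance, PC2 {ratio_pc2:.1%}")
```

When the first component captures nearly all of the variance, as with this made-up data, the two features carry largely the same information and one dimension is enough to describe most of the variation.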
Conclusion
The analysis, synthesis, and presentation of findings relating to a data set drawn from a sample or a complete population is referred to as “descriptive statistics.” Frequency Distribution, Measures of Central Tendency, and Measures of Variability are the three primary categories of descriptive statistics. Descriptive statistics also support data visualisation: they let data be presented in a meaningful and intelligible manner, which makes the data set at hand more straightforward to analyse. Data scientists use exploratory data analysis (EDA) to study and investigate data sets and describe their primary properties, frequently using data visualisation techniques. EDA helps data scientists determine how best to manipulate data sources to obtain the answers they want, making it simpler to find patterns, test hypotheses, and verify assumptions.