EDA in Python

Data preparation and exploratory data analysis take a lot of time and effort from data professionals. Wouldn’t it be nice to have a package that enables you to explore your data quickly — in just one line of code?

I’ll show you the best four Python packages that can automate your data exploration and analysis. I’ll go through each one, what it does and how you can use it.

4 Ways to Speed Up Your EDA in Python

  1. DataPrep
  2. Pandas Profiling
  3. SweetViz
  4. AutoViz

 

1. DataPrep

DataPrep lets you prepare your data using a single library with a few lines of code. The DataPrep ecosystem currently consists of three components:

  • Connector 
  • EDA
  • Clean API

The Connector enables simple data collection from web APIs by providing a standard set of operations. The EDA component handles the exploratory data analysis, and the Clean API provides functions for efficiently cleaning and validating data.

For example, using the Philly parking violations data set, we can call plot() to get an overview of the data frame, or plot its correlations with a single line of code using plot_correlation().
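
As a rough sketch of how those two calls fit together, the helper below wraps them in one function. The helper name quick_overview and its column argument are my own for illustration, and I'm assuming the plot and plot_correlation functions exported by dataprep.eda, so check the names against the version you have installed.

```python
def quick_overview(df, column=None):
    """Return DataPrep's overview and correlation plots for a data frame.

    Hypothetical helper for illustration; it is not part of DataPrep.
    """
    # Imported lazily so the helper can be defined before DataPrep is installed.
    from dataprep.eda import plot, plot_correlation

    if column is None:
        overview = plot(df)          # distribution overview of every column
    else:
        overview = plot(df, column)  # closer look at a single column
    correlations = plot_correlation(df)  # correlations between numeric columns
    return overview, correlations
```

In a notebook, each returned object should render as an interactive plot when it is the last expression in a cell.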

You can also generate a detailed report with one line of code using DataPrep. Here’s the create_report() function called on a data frame.

import pandas as pd
from dataprep.eda import create_report

# Load the parking violations data and build the interactive report
df = pd.read_csv("parking_violations.csv")
create_report(df)

You’ll get back an extensive and interactive report for variables and correlations as well as interactions and missing values.


DataPrep reduces the time and effort you need as a data scientist to explore a data set. With just one line of code, you get an overview of the data set, its missing values, its correlations and a statistical description.

To install DataPrep, run:

pip install dataprep

Check out the DataPrep documentation for more information.


 

2. Pandas Profiling

Pandas Profiling generates profile reports from a Pandas DataFrame and enables you to perform similar kinds of EDA to the other packages I’m discussing here. It covers a wide range of use cases and has more tutorials than any of the other packages.

With just one line of code, you can generate an EDA report using Pandas Profiling with descriptive statistics, correlations, missing values, text analysis and more.

Let us call ProfileReport() on the Philly data frame to generate an EDA report.

from pandas_profiling import ProfileReport

# Build the report from the data frame loaded earlier
profile = ProfileReport(df, title="Report")
profile  # displays the report in a notebook

Pandas Profiling generates a similar report with a sleek user interface (UI).
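
If you want to keep the report rather than only view it in a notebook, a minimal sketch like the one below writes it to an HTML file. The helper name save_profile and the default file name are my own assumptions. The to_file() method and the minimal option are part of Pandas Profiling as far as I know, but check them against your installed version.

```python
def save_profile(df, path="profile_report.html", title="Report", minimal=False):
    """Write a Pandas Profiling report for df to an HTML file.

    Hypothetical helper for illustration; it is not part of Pandas Profiling.
    """
    # Imported lazily so the helper can be defined before the package is installed.
    from pandas_profiling import ProfileReport

    # minimal=True skips the more expensive computations, which helps on large data sets.
    profile = ProfileReport(df, title=title, minimal=minimal)
    profile.to_file(path)  # an .html extension produces a standalone HTML report
    return path
```

Calling save_profile(df, minimal=True) on the parking data frame would be a reasonable first pass before running the full report.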

You can install it with the pip package manager by running:

pip install pandas-profiling[notebook]

Make sure to visit the GitHub repository for more tutorials and documentation.


 

3. SweetViz

SweetViz offers in-depth, interactive EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! In addition, SweetViz allows you to compare two data sets, such as training and test data sets for your machine learning projects.

To get a report from SweetViz, you can run the following code on any data frame and it will generate an HTML report.

import sweetviz as sv

# Analyze the data frame and write the report to report.html
analyze_report = sv.analyze(df)
analyze_report.show_html('report.html', open_browser=False)
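
The comparison of training and test data mentioned earlier could look roughly like this. It's a sketch under assumptions: the helper name compare_report, the "Training" and "Test" labels and the output file name are mine, while sv.compare() and its target_feat argument come from SweetViz itself.

```python
def compare_report(train_df, test_df, target=None, path="compare.html"):
    """Write a SweetViz report comparing two data frames side by side.

    Hypothetical helper for illustration; it is not part of SweetViz.
    """
    # Imported lazily so the helper can be defined before SweetViz is installed.
    import sweetviz as sv

    # Each data set is passed as [data_frame, label]; target_feat is optional.
    report = sv.compare([train_df, "Training"], [test_df, "Test"], target_feat=target)
    report.show_html(path, open_browser=False)
    return path
```

Pass the name of your target column as target if you also want SweetViz to run its target analysis.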


 

4. AutoViz

With AutoViz, you can automatically visualize a data set of any size with a single line of code, and in much more detail. Here’s a report generated with AutoViz using the Philly parking data set.

from autoviz.AutoViz_Class import AutoViz_Class

# AutoViz reads the CSV file itself and draws the charts
AV = AutoViz_Class()
df_av = AV.AutoViz('parking.csv')

Note that you don’t even need Pandas to read the data. AutoViz will load it when you provide the path to the data set. Here’s the report we generated with AutoViz.

In AutoViz, you have many more plots (e.g., violin plots, box plots and more) as well as statistical and probability values. However, the UI isn’t as neat as the other packages’ reports, and you don’t have access to interactive plots.
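
If your data is already loaded, AutoViz can also take a data frame instead of a file path. The sketch below handles both cases. The helper name visualize is my own, and I'm assuming the dfte and depVar arguments of AutoViz() behave as described in the AutoViz README.

```python
def visualize(data, target=""):
    """Run AutoViz on a CSV path or on an already loaded data frame.

    Hypothetical helper for illustration; it is not part of AutoViz.
    """
    # Imported lazily so the helper can be defined before AutoViz is installed.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    if isinstance(data, str):
        # A string is treated as the path to a file that AutoViz loads itself.
        return AV.AutoViz(data, depVar=target)
    # Otherwise pass the data frame through dfte and leave the file name empty.
    return AV.AutoViz("", dfte=data, depVar=target)
```

Setting target to the name of a column asks AutoViz to plot the other variables against it.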

To install AutoViz, run the following command:

pip install autoviz


 

The Takeaway

The four packages offer similar functionality that allows you to automate your EDA with simple, intuitive code (often just one line of it!).

That said, of the four packages in this article, DataPrep goes well beyond simple EDA. It can help you ingest data from more sources and get through large data sets faster.
