pyeda31

Category	Status
CI-CD
Testing
Documentation
Repo Status
Package Version
Python Versions

This python package creates an exploratory data analysis utility designed to streamline the initial stages of data exploration and statistic overview. The three core functions are file validation, handling missing values, and generating summary statistics. pyeda31 offers users a practical toolkit for data preprocessing and exploration, enabling them to work more efficiently with CSV datasets across various projects.

Contributors

Catherine Meng, Jessie Zhang, Zheng He

Functions

check_csv
Check if the given file has a CSV file extension and whether it can be read by the pandas library.
missing_value_summary
This function is to provide a summary of missing values in the dataset.
get_summary_statistics
Generate summary statistics for specified columns or all columns if none are provided.

Contributing to the Python Ecosystem

The pyeda31 package complements the Python ecosystem by providing simple and efficient tools for users to implement quick EDA in the first step of their analysis. While there are some other similar Python packages such as Sweetviz (in-depth EDA with a focus on visualization) and perform-eda (providing comprehensive EDA analysis), these tools can be too heavyweight for quick analysis. Instead, our pyeda31 package aims for simplicity and efficiency, enabling users to quickly complete the most basic and important steps, including validating dataset formats, checking for missing values, and generating statistical summaries for columns of interest. t is a lightweight alternative for small-scale tasks or for gaining an initial understanding of the dataset before in-depth research.Users can also combine pyeda31 with other visualization packages for deeper insights.

Installation

$ pip install pyeda31

Usage

pyeda31 can be used to verify the format of data files and perform basic exploratory data analysis as follows:

from pyeda31.check_csv import check_csv
from pyeda31.pymissing_values_summary import missing_values_summary
from pyeda31.data_summary import get_summary_statistics

Check if the given data file is in csv format

data_file_path = "docs/sample_data.csv"  # path to your data file
if not check_csv(data_file_path):
    raise TypeError("The given file either does not have a CSV file extension or cannot be read by the pandas library. Please check the printed error message for more details.")

Check if the data file has a CSV file extension and whether it can be read by the pandas library

df = pd.read_csv(data_file_path)

missing_summary = missing_values_summary(df)
print("Missing Values Summary:")
print(missing_summary)

Get the data summary for either all columns or the specified columns of our dataset (adjustable decimal precision for mean)

get_summary_statistics(df)
get_summary_statistics(df, col=["numeric", "categorical"]) 
get_summary_statistics(df, col=["numeric"], decimal = 1)

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

pyeda31 was created by Catherine Meng, Jessie Zhang, Zheng He. It is licensed under the terms of the MIT license.

Credits

pyeda31 was created with cookiecutter and the py-pkgs-cookiecutter template.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

pyeda31

Contributors

Functions

Contributing to the Python Ecosystem

Installation

Usage

Check if the given data file is in csv format

Check if the data file has a CSV file extension and whether it can be read by the pandas library

Get the data summary for either all columns or the specified columns of our dataset (adjustable decimal precision for mean)

Contributing

License

Credits

Files

README.md

Latest commit

History

README.md

File metadata and controls

pyeda31

Contributors

Functions

Contributing to the Python Ecosystem

Installation

Usage

Check if the given data file is in csv format

Check if the data file has a CSV file extension and whether it can be read by the pandas library

Get the data summary for either all columns or the specified columns of our dataset (adjustable decimal precision for mean)

Contributing

License

Credits