Added example for check_csv function #20

Merged
merged 2 commits into from
Jan 23, 2025
16 changes: 16 additions & 0 deletions README.md
@@ -26,6 +26,22 @@ $ pip install pyeda

## Usage

`pyeda` can be used to verify the format of data files and perform basic exploratory data analysis as follows:
```python
from pyeda.check_csv import check_csv
from pyeda.pymissing_values_summary import missing_values_summary
from pyeda.data_summary import get_summary_statistics

# Check if the given data file is in csv format
data_file_path = "data.csv" # path to your data file
if not check_csv(data_file_path):
    raise TypeError("The given file is not in CSV format. Please check your data file.")

# Check if the data file has any missing values

# Get data summary
```

## Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
175 changes: 131 additions & 44 deletions docs/example.ipynb
@@ -1,45 +1,132 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Example usage\n",
"\n",
"To use `pyeda` in a project:"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"import pyeda\n",
"\n",
"print(pyeda.__version__)"
],
"outputs": [],
"metadata": {}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example usage\n",
"\n",
"Here we will demonstrate how to use `pyeda` to verify the format of data files and perform basic exploratory data analysis.\n",
"\n",
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"from pyeda.check_csv import check_csv\n",
"from pyeda.pymissing_values_summary import missing_values_summary\n",
"from pyeda.data_summary import get_summary_statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a csv file\n",
"\n",
"We'll first create a csv file to work with."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Define file name\n",
"file_name = \"sample_data.csv\"\n",
"\n",
"# Create data with some empty values\n",
"data = [\n",
" [\"Name\", \"Age\", \"City\"],\n",
" [\"Alice\", \"25\", \"New York\"],\n",
" [\"Bob\", \"\", \"Los Angeles\"], # Missing age\n",
" [\"Charlie\", \"30\", \"\"], # Missing city\n",
" [\"Emily\", \"22\", \"Chicago\"], \n",
"]\n",
"\n",
"# Write data to a CSV file\n",
"with open(file_name, mode=\"w\", newline=\"\") as file:\n",
" writer = csv.writer(file)\n",
" writer.writerows(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check if the data file is in the csv format\n",
"\n",
"Before starting the exploratory data analysis, verify that the given file is a CSV file. The `check_csv` function does this by checking the file extension."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"if not check_csv(file_name):\n",
" raise TypeError(\"The given file is not in CSV format. Please check your data file.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check if data file has any missing values\n",
"\n",
"After verifying the data file type, the next step is to check whether the data contains any missing values using `missing_values_summary`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get data summary\n",
"\n",
"Finally, call the `get_summary_statistics` function to get summary statistics for the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
5 changes: 5 additions & 0 deletions docs/sample_data.csv
@@ -0,0 +1,5 @@
Name,Age,City
Alice,25,New York
Bob,,Los Angeles
Charlie,30,
Emily,22,Chicago
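
For reviewers who want to confirm which cells in this sample are empty, here is a small standard-library sketch. It does not use `pyeda`; `count_empty_fields` is a hypothetical helper, and the file contents are inlined so the snippet runs on its own:

```python
import csv
import io

# The five lines of docs/sample_data.csv, inlined for a self-contained run
SAMPLE = """Name,Age,City
Alice,25,New York
Bob,,Los Angeles
Charlie,30,
Emily,22,Chicago
"""


def count_empty_fields(text):
    """Hypothetical helper: count empty cells per column in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value == "":
                counts[name] += 1
    return counts


empty_counts = count_empty_fields(SAMPLE)
print(empty_counts)  # {'Name': 0, 'Age': 1, 'City': 1}
```

The two gaps match the `# Missing age` and `# Missing city` comments in the notebook cell that writes this file.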
9 changes: 5 additions & 4 deletions src/pyeda/check_csv.py
@@ -1,16 +1,17 @@
 import pandas as pd
 
 def check_csv(file_path):
-    """
-    Check if the given file is a CSV file by its extension.
+    """Check if the given file is a CSV file by its extension.
 
     Parameters
     ----------
-    file_path (str): Path to the file.
+    file_path: str
+        Path to the file.
 
     Returns
     -------
-    bool: True if the file is a CSV file, False otherwise.
+    bool
+        True if the file is a CSV file, False otherwise.
 
     Examples
     --------
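
The body of `check_csv` is collapsed in this view, so the following is only a sketch of what an extension-based check could look like, not the actual implementation. `has_csv_extension` is a hypothetical name, and the case-insensitive comparison is an assumption:

```python
from pathlib import Path


def has_csv_extension(file_path):
    """Hypothetical stand-in for an extension-based CSV check.

    Assumption: the comparison ignores case, so "DATA.CSV" also passes.
    """
    return Path(file_path).suffix.lower() == ".csv"


print(has_csv_extension("data.csv"))     # True
print(has_csv_extension("report.xlsx"))  # False
```

A check like this inspects only the file name. It says nothing about whether the contents actually parse as CSV.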