Pointblank is a table validation and testing library for Python. It helps you ensure that your tabular data meets certain expectations and constraints, and it presents the results in a beautiful validation report table.
Let's take a Polars DataFrame and validate it against a set of constraints. We do that by using the Validate class and adding validation steps:
import pointblank as pb

validation = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))  # Use Validate() to start
    .col_vals_gt(columns="d", value=100)        # STEP 1 |
    .col_vals_le(columns="c", value=5)          # STEP 2 | <-- Build up a validation plan
    .col_exists(columns=["date", "date_time"])  # STEP 3 |
    .interrogate()  # This will execute all validation steps and collect intel
)

validation
The rows in the validation report table correspond to the individual validation steps. A key concept is that each validation step can be broken down into atomic test cases (test units), and each test unit is given a pass or fail status based on the validation constraint. You'll see these tallied up in the reporting table (in the UNITS, PASS, and FAIL columns).
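To make the idea of test units concrete, here is a minimal plain-Python sketch. It does not use Pointblank itself: the `tally_units` helper and the sample values are hypothetical and only illustrate how each cell in a column can be treated as one test unit and tallied into units, passes, and failures.

```python
from typing import Callable, Iterable


def tally_units(values: Iterable[float], check: Callable[[float], bool]) -> dict[str, int]:
    """Treat every value as one test unit and count passes and failures."""
    results = [check(v) for v in values]
    n_pass = sum(results)
    return {"units": len(results), "pass": n_pass, "fail": len(results) - n_pass}


# Hypothetical column values (not taken from the `small_table` dataset)
column_d = [3423.29, 99.5, 283.94, 108.34, 12.0]

# Analogous in spirit to a "values greater than 100" validation step
summary = tally_units(column_d, lambda v: v > 100)
print(summary)
```

Each validation step in a real report would carry its own tally like this one, which is why a single step over a column of many rows can report many test units.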
The tabular reporting view is just one way to see the results. You can also obtain fine-grained results of the interrogation as individual step reports or via methods that provide key metrics. It's also possible to use the validation results for downstream processing, such as filtering the input table based on the pass/fail status of the rows.
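As a rough illustration of row-level filtering, the plain-Python sketch below splits rows into those that pass every row-wise check and those that fail at least one. It is a conceptual stand-in, not Pointblank's API: the `split_rows` helper, the `Row` alias, and the sample rows are all hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]


def split_rows(
    rows: list[Row], checks: list[Callable[[Row], bool]]
) -> tuple[list[Row], list[Row]]:
    """Return (passed, failed): a row passes only if every check holds for it."""
    passed: list[Row] = []
    failed: list[Row] = []
    for row in rows:
        target = passed if all(check(row) for check in checks) else failed
        target.append(row)
    return passed, failed


# Hypothetical rows with the same column names as the example above
rows = [
    {"c": 3, "d": 3423.29},
    {"c": 8, "d": 9999.99},
    {"c": 4, "d": 50.0},
    {"c": 2, "d": 843.34},
]

# Row-wise checks mirroring "d > 100" and "c <= 5"
checks = [lambda r: r["d"] > 100, lambda r: r["c"] <= 5]

passed, failed = split_rows(rows, checks)
print(len(passed), "rows passed;", len(failed), "rows failed")
```

Only row-wise checks can drive this kind of split; a check such as column existence has no per-row outcome and so would not contribute to it.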
On the input side, we can use the following types of tables:
- Polars DataFrame
- Pandas DataFrame
- DuckDB table
- MySQL table
- PostgreSQL table
- SQLite table
- Parquet
To make this all work seamlessly, we use Narwhals to work with Polars and Pandas DataFrames. We also integrate with Ibis to enable the use of DuckDB, MySQL, PostgreSQL, SQLite, Parquet, and more! In doing all of this, we can provide an ergonomic and consistent API for validating tabular data from various sources.
Here's a short list of what we think makes Pointblank a great tool for data validation:
- Flexible: We support tables from Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and Parquet
- Beautiful Reports: Generate beautiful HTML table reports of your data validation results
- Functional Output: Easily pull the specific data validation outputs you need for further processing
- Easy to Use: Get started quickly with a straightforward API and clear documentation examples
- Powerful: You can make complex data validation rules with flexible options for composition
There are many interesting examples you can check out on the documentation website. If you have any questions or would like to request a feature, you are always welcome to create an issue or start a discussion!
You can install Pointblank using pip:
pip install pointblank
If you encounter a bug, have usage questions, or want to share ideas to make this package better, please feel free to file an issue.
Please note that the Pointblank project is released with a contributor code of conduct. By participating in this project you agree to abide by its terms.
There are many ways to contribute to the ongoing development of Pointblank. Some contributions can be simple (like fixing typos, improving documentation, filing issues for feature requests or problems, etc.) and others might take more time and care (like answering questions and submitting PRs with code changes). Just know that anything you can do to help would be very much appreciated!
Please read over the contributing guidelines for information on how to get started.
Pointblank is licensed under the MIT license.
This project is primarily maintained by Rich Iannone. Other authors may occasionally assist with some of these duties.