- aiomultiprocess - aiomultiprocess presents a simple interface, while running a full AsyncIO event loop on each child process, enabling levels of concurrency never before seen in a Python application.
- Amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data https://lyft.github.io/amundsen/
- atheris - Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code as well as native extensions written for CPython. Atheris is based on libFuzzer.
- atoti - atoti is a free Python BI analytics platform for Quants, Data Analysts, Data Scientists & Business Users to collaborate better, analyze faster and translate their data into business KPIs.
- bamboolib - A GUI for pandas DataFrames
- baselines - OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.
- BayesianOptimization - Pure Python implementation of Bayesian global optimization with Gaussian processes.
- beakerx - Beaker Extensions for Jupyter Notebook http://BeakerX.com
- BentoML - BentoML is an open-source platform for high-performance ML model serving.
- CacheSQL - CacheSQL is a simple library for making SQL queries with cache functionality. The main audience for this library is data scientists and data analysts who rely on SQLAlchemy to query data from SQL and on pandas to do the heavy lifting in Python.
- Causal ML - Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data.
- causalnex - A Python library that helps data scientists to infer causation rather than observing correlation http://causalnex.readthedocs.io/
- Celluloid - This module makes it easy to adapt your existing visualization code to create an animation.
- Chefboost - Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and Adaboost w/categorical features support for Python.
- Ciphey - Ciphey is an automated decryption tool. Input encrypted text, get the decrypted text back.
- Click - Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It's the "Command Line Interface Creation Kit". It's highly configurable but comes with sensible defaults out of the box.
- CLIP - CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs.
- Code Video Generator - Code Video Generator is a library that uses the Manim animation engine to automatically generate code walkthrough videos.
- creme - creme is a Python library for online machine learning.
- cuML - cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.
- cutecharts - There is no doubt that JavaScript has more advantages in interaction as well as visual effect. Besides, as we all know, Python is an expressive language loved by the data science community. cutecharts.py was born from the idea of combining the strengths of both technologies.
- D2Go - D2Go is a production ready software system from FacebookResearch, which supports end-to-end model training and deployment for mobile platforms.
- dataprep - Dataprep lets you prepare your data using a single library with a few lines of code.
- datasette - A tool for exploring and publishing data http://datasette.readthedocs.io/
- deepchecks - Test Suites for Validating ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.
- DeText - DeText is a Deep Text understanding framework for NLP related ranking, classification, and language generation tasks.
- DoWhy - DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions.
- D-Tale - D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures.
- EconML - EconML is a Python package for estimating heterogeneous treatment effects from observational data via machine learning.
- EfficientWord-Net - OneShot Learning-based hotword detection.
- Elara DB - Elara DB is an easy to use, lightweight NoSQL database written for python that can also be used as a fast in-memory cache for JSON-serializable data. Includes various methods to manipulate data structures in-memory, secure database files and export data.
- Euporie - Euporie is a text-based user interface for running and editing Jupyter notebooks.
- Evidently - Interactive reports to analyze machine learning models during validation or production monitoring.
- Evol - A python grammar for evolutionary algorithms and heuristics
- falcon - The no-nonsense, minimalist web services and app backend framework for Python developers with a focus on reliability and performance at scale https://falcon.readthedocs.io/en/stable/
- FastAPI - FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.
- FastAPI CRUD Router - A dynamic FastAPI router that automatically creates CRUD routes for your models.
- fds - Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc.
- FiftyOne - The open-source tool for building high-quality datasets and computer vision models.
- gazpacho - gazpacho is a simple, fast, and modern web scraping library.
- ggenerator - A simple command line tool for fake dataset generation given a specification defined as a JSON DSL https://pypi.org/project/ggenerator/
- Google Research Football - This repository contains an RL environment based on the open-source game Gameplay Football.
- gpt3-sandbox - The goal of this project is to enable users to create cool web demos using the newly released OpenAI GPT-3 API with just a few lines of Python.
- Great Expectations - Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling.
- guietta - A tool for making simple Python GUIs
- gym - OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
- Hermione - Hermione is an open source library that helps data scientists set up better-organized code more quickly and simply.
- Hoppscotch - A free, fast and beautiful API request builder used by 120k+ developers. https://hoppscotch.io
- Hyperactive - A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.
- Hyperopt-sklearn - Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn https://hyperopt.github.io/hyperopt-sklearn/
- igel - A machine learning tool that allows you to train, test and use models without writing code.
- image-to-latex - Convert images of LaTeX math equations into LaTeX code.
- jukebox - Code for "Jukebox: A Generative Model for Music"
- jupyter-book - Build interactive, publication-quality documents from Jupyter Notebooks http://jupyterbook.org
- kedro - Kedro is an open-source Python framework that applies software engineering best-practice to data and machine-learning pipelines.
- koalas - The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.
- lightly - Lightly is a computer vision framework for self-supervised learning.
- LineaPy - LineaPy is a Python package for capturing, analyzing, and automating data science workflows. At a high level, LineaPy traces the sequence of code execution to form a comprehensive understanding of the code and its context.
- Lip2Wav - Generate high quality speech from only lip movements.
- locust - Scalable user load testing tool written in Python http://locust.io
- lona - Lona is a web application framework, designed to write responsive web apps in full Python.
- lux - Lux is a Python library that makes data science easier by automating aspects of the data exploration process. Lux facilitates faster experimentation with data, even when the user does not have a clear idea of what they are looking for.
- manim - Animation engine for explanatory math videos
- Mava - Mava is a library for building multi-agent reinforcement learning (MARL) systems. Mava provides useful components, abstractions, utilities and tools for MARL and allows for simple scaling for multi-process system training and execution while providing a high level of flexibility and composability.
- MLxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries http://rasbt.github.io/mlxtend/
- NannyML - NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance.
- neupy - NeuPy is a python library for prototyping and building neural networks.
- NeuralDB - Database Reasoning Over Text project for ACL paper.
- NeuralProphet - A Neural Network based Time-Series model.
- Newspaper - Newspaper is an amazing python library for extracting & curating articles.
- OpenChat - Opensource chatting framework for generative models.
- optuna - Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.
- Opytimizer - This package provides an easy-to-go implementation of meta-heuristic optimizations.
- orchest - Orchest is a web-based data science tool that works on top of your filesystem, allowing you to use your editor of choice. With Orchest you get to focus on visually building and iterating on your pipeline ideas.
- pandasgui - A GUI for analyzing Pandas DataFrames.
- panel - Panel provides tools for easily composing widgets, plots, tables, and other viewable objects and controls into custom analysis tools, apps, and dashboards.
- pingouin - Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.
- PlotNeuralNet - LaTeX code for drawing neural networks for reports and presentations.
- PolyFuzz - PolyFuzz performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
- prophet - Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.
- pybaobabdt - The pybaobabdt package provides a python implementation for the visualization of decision trees.
- pycaret - An open source, low-code machine learning library in Python https://www.pycaret.org
- PyCM - PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters.
- PyComp - A Python component factory for automating everything from routine tasks to building Machine Learning models.
- pydantic - Data parsing and validation using Python type hints https://pydantic-docs.helpmanual.io/
- pydash - The kitchen sink of Python utility libraries for doing "stuff" in a functional way.
- pygod - A Python Library for Graph Outlier Detection (Anomaly Detection).
- PyInfra - pyinfra automates infrastructure super fast at massive scale. It can be used for ad-hoc command execution, service deployment, configuration management and more https://pyinfra.com
- pyinstrument - Pyinstrument is a Python profiler. A profiler is a tool to help you 'optimize' your code - make it faster.
- PyMC3 - PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.
- PySyft - A library for encrypted, privacy preserving machine learning https://www.openmined.org/
- Qlib - Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.
- Quarto - Open-source scientific and technical publishing system built on Pandoc.
- Quant DSL - Domain specific language for quantitative analytics in finance and trading.
- Realtime PyAudio FFT - A simple package to do realtime audio analysis in native Python, using PyAudio and Numpy to extract and visualize FFT features from a live audio stream.
- ReBeL - Implementation of ReBeL, an algorithm that generalizes the paradigm of self-play reinforcement learning and search to imperfect-information games. This repository contains an implementation only for the game Liar's Dice.
- RPA Framework - RPA Framework is a collection of open-source libraries and tools for Robotic Process Automation (RPA), and it is designed to be used with both Robot Framework and Python.
- Replicate - Version control for machine learning. Replicate is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage. You can get the data back out using the command-line interface or a notebook.
- samila - Samila is a generative art generator written in Python. Samila lets you create art based on many thousands of points.
- SDV - The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset.
- shap - SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model.
- Shapash - Shapash is a Python library which aims to make machine learning interpretable and understandable by everyone. It provides several types of visualization that display explicit labels that everyone can understand.
- sidetable - sidetable builds simple but useful summary tables of your data https://pbpython.com
- scikit-survival - scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.
- skits - A library for SciKit-learn-Inspired Time Series models.
- spleeter - Deezer source separation library including pretrained models.
- sqlacodegen - This is a tool that reads the structure of an existing database and generates the appropriate SQLAlchemy model code, using the declarative style if possible.
- sqlmodel - SQL databases in Python, designed for simplicity, compatibility, and robustness.
- stock-pandas - The production-ready subclass of `pandas.DataFrame` to support stock statistics and indicators.
- Stories - Stories is a simple way of sharing code snippets with other developers. Download in marketplace
- Streamlit - The fastest way to build custom ML tools https://www.streamlit.io/
- superset - Apache Superset is a Data Visualization and Data Exploration Platform
- SyntheticControlMethods - A Python package for causal inference using Synthetic Controls
- sysidentpy - sysidentpy is a Python module for System Identification using NARMAX models built on top of numpy and is distributed under the 3-Clause BSD license.
- sweetviz - Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code.
- Texthero - Text preprocessing, representation and visualization from zero to hero https://texthero.org
- tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming http://epistasislab.github.io/tpot/
- tsfel - This repository hosts the TSFEL - Time Series Feature Extraction Library python package. TSFEL assists researchers on exploratory feature extraction tasks on time series without requiring significant programming effort.
- tuplex - Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code.
- Unvoiced - Application that converts American Sign Language to Speech.
- Visual Python - Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.
- Vowpal Wabbit - Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
- Wav2Lip - This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
- xarray - xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!
- zero - A high performance and fast Python microservice framework (RPC + PubSub).
- ML Notebooks - A series of code examples for all sorts of machine learning tasks and applications.
- Best-of Machine Learning with Python - A ranked list of awesome machine learning Python libraries. Updated weekly.
- Awesome Dash
- Awesome FastAPI - A curated list of awesome things related to FastAPI.
- Awesome Python Data Science - Probably the best curated list of data science software in Python.
- Awesome Machine Learning - Language: Portuguese
- Open Source Society University
- CursoDataScience - Language: Portuguese
- For Data Science Beginners
- DataSciencePython: Common data analysis and machine learning tasks using Python
- A to Z Resources for Students
- A gallery of interesting Jupyter Notebooks
- All Algorithms implemented in Python
- Data Science Pizza
- Cheat Sheets Data Science
- Data Science - Cheat Sheet