This repository contains machine learning models implemented from scratch using numpy, pandas, and matplotlib, aimed at helping learners understand the internal workings of various machine learning algorithms. If you encounter any issues while using this repository, feel free to open an Issue or improve the project by submitting a Pull Request.
- Implement machine learning algorithms manually and package them in Python for testing.
- Provide comprehensive basic tests.
- Easily create and run custom tests.
Use the requirements.txt file to install the environment's dependencies. We recommend pload, a lightweight tool for managing Python virtual environments. To set up the environment with pload, run:
pload new -m 'MLLabs' -v 3.8.20 -f requirements.txt # v3.8.20 recommended; other versions not tested.
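After the environment is created, a quick way to confirm that the interpreter and the core libraries are in place is a short check script. This is a minimal sketch: it checks only the three libraries named at the top of this README (it does not parse requirements.txt), and the function name check_environment is illustrative, not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Core libraries named in this README (not the full requirements.txt list).
CORE_PACKAGES = ("numpy", "pandas", "matplotlib")


def check_environment(packages=CORE_PACKAGES, expected=(3, 8)):
    """Warn on an untested Python version and report installed package versions.

    Returns a dict mapping each package name to its version string,
    or None when the package is not installed.
    """
    if sys.version_info[:2] != expected:
        print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
              f"only {expected[0]}.{expected[1]} has been tested.")
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```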
The following sections outline how to run the benchmark tests, run custom models, and create and execute your own tests.
First, create a dedicated virtual environment for the project and activate it. To learn more about using pload to create virtual environments, refer to: Creating Virtual Environments with pload.
- Clone the repository:
  git clone --branch main --single-branch https://github.com/HugoPhi/MachineLearningLabs.git
- Install the local library:
  Enter the project directory:
  cd MachineLearningLabs/
  Build and install the library:
  pip install .
  You can verify the installation by running pip list and checking that the hym library is included.
- Run tests:
  For example, to run the test/DecisionTree/watermelon2.0 experiment, execute the following in the project directory:
  python ./test/DecisionTree/watermelon2.0/main.py
  This generates the experiment results.
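To check the installation from Python instead of scanning the pip list output, a small helper can report both whether pip has a record of the distribution and whether import hym would succeed. This sketch assumes the distribution is named hym, as it appears in pip list; the function name hym_status is hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def hym_status(dist_name="hym", module_name="hym"):
    """Return (installed_version, importable) for the locally installed library.

    installed_version is None when pip has no record of the distribution;
    importable is False when importing the module would fail.
    """
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None
    importable = importlib.util.find_spec(module_name) is not None
    return installed, importable


if __name__ == "__main__":
    ver, ok = hym_status()
    print(f"hym version: {ver}, importable: {ok}")
```

Checking both values helps tell apart a missing install (None, False) from running Python outside the project environment.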
You can modify or create your own machine learning models. The project is divided into two main parts: src and test. The src directory stores the source code for the machine learning algorithms, while the test directory stores basic and custom tests for each algorithm. Understanding this structure will help you modify the project more efficiently.
The src directory stores the source code and is organized as follows:
src/
├── hym/
│ ├── __init__.py
│ ├── DecisionTree/
│ │ ├── __init__.py
│ │ ├── DecisionTree.py
│ │ └── ...
│ ├── LinearRegression/
│ └── ...
- hym/: Top-level module containing the implementations of various machine learning algorithms.
- To add a new algorithm category, such as Support Vector Machine, create a SupportVectorMachine/ directory under hym/, and add it to hym/__init__.py as follows:

      from . import DecisionTree
      from . import LinearRegression
      from . import SupportVectorMachine  # New algorithm module

      __all__ = [
          'DecisionTree',
          'LinearRegression',
          'SupportVectorMachine'  # Add new module
      ]

- File Naming Conventions:
  - Algorithm Class Files: Use CamelCase, e.g., BasicDecisionTree.py, Variants.py, for implementing algorithm classes.
  - Helper Class Files: Use snake_case, e.g., node.py, for implementing helper classes.
  - Helper Function Files: utils.py contains utility functions, e.g., data loading, preprocessing, and math functions.
  - Package Initialization File: __init__.py marks the package and its submodules; exported contents are listed in __all__.
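To make these conventions concrete, here is a hypothetical sketch of what a new SupportVectorMachine category might contain. The class name LinearSVM, its fit/predict interface, the helper standardize, and the choice of full-batch subgradient descent on the hinge loss are all assumptions for illustration, not the repository's existing code. In the package, the class would go in a CamelCase file such as SupportVectorMachine/LinearSVM.py, the helper in utils.py, and SupportVectorMachine/__init__.py would export the class (for example with from .LinearSVM import LinearSVM and __all__ = ['LinearSVM']). Here everything sits in one file so it runs on its own.

```python
import numpy as np


# --- helper function: would live in utils.py (hypothetical) ---
def standardize(X):
    """Z-score each column of X; constant columns are only centered."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std


# --- algorithm class: would live in SupportVectorMachine/LinearSVM.py (hypothetical) ---
class LinearSVM:
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    Labels must be -1 or +1.
    """

    def __init__(self, lam=0.01, epochs=200, lr=0.1):
        self.lam = lam        # L2 regularization strength
        self.epochs = epochs  # number of full passes over the data
        self.lr = lr          # initial step size, decayed as lr / sqrt(epoch)
        self.w = None
        self.b = 0.0

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        for epoch in range(1, self.epochs + 1):
            lr = self.lr / np.sqrt(epoch)
            margins = y * (X @ self.w + self.b)
            active = margins < 1  # points inside the margin or misclassified
            # Subgradient of the objective; only active points contribute hinge terms.
            grad_w = self.lam * self.w - (y[active][:, None] * X[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            self.w -= lr * grad_w
            self.b -= lr * grad_b
        return self

    def decision_function(self, X):
        return np.asarray(X, dtype=float) @ self.w + self.b

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)
```

Keeping preprocessing in utils.py rather than inside the class mirrors the split the conventions describe: the algorithm file holds only the model, and shared data handling lives in helpers.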
The test directory stores test code and is structured similarly to src:
test/
├── DecisionTree/
│ ├── iris/
│ │ ├── iris.xlsx
│ │ └── main.py
│ ├── watermelon2.0/
│ │ ├── watermelon2.0.xlsx
│ │ └── main.py
│ └── ...
├── LinearRegression/
└── ...
- Create directories under test/ by algorithm category, matching the structure in src/.
- Inside each algorithm directory, add test cases. Tests for some basic datasets are already provided, and you can add your own experiments alongside them.
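A custom test's main.py typically loads a dataset, splits it, trains a model, and reports a metric. The script below is a hypothetical sketch of that flow, not a copy of the provided tests: MajorityClassifier is a stand-in baseline, and the fit/predict interface it shares with the models you would import from hym is an assumption, so check the existing main.py files for the actual calls. The provided tests ship their data as .xlsx files, which pandas can read with pd.read_excel (this needs an Excel engine such as openpyxl); a tiny inline DataFrame keeps the sketch self-contained.

```python
import numpy as np
import pandas as pd


class MajorityClassifier:
    """Stand-in model that always predicts the most frequent training label.

    Hypothetical: replace with a model from hym; fit/predict is assumed.
    """

    def fit(self, X, y):
        self.label_ = pd.Series(y).mode().iloc[0]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_, dtype=object)


def train_test_split(df, target, test_ratio=0.3, seed=0):
    """Shuffle rows with a fixed seed and split into (X_train, y_train, X_test, y_test)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = int(round(len(df) * test_ratio))
    test, train = df.iloc[idx[:n_test]], df.iloc[idx[n_test:]]
    return (train.drop(columns=target), train[target],
            test.drop(columns=target), test[target])


def run_experiment(model, df, target, **split_kwargs):
    """Fit the model on the training split and return test-set accuracy."""
    X_tr, y_tr, X_te, y_te = train_test_split(df, target, **split_kwargs)
    model.fit(X_tr, y_tr)
    return float((model.predict(X_te) == y_te.to_numpy()).mean())


if __name__ == "__main__":
    # In a real test: df = pd.read_excel("<dataset>.xlsx")
    df = pd.DataFrame({
        "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "label": ["a"] * 7 + ["b"] * 3,
    })
    print(f"accuracy: {run_experiment(MajorityClassifier(), df, 'label'):.2f}")
```

A fixed seed makes the split, and therefore the reported accuracy, reproducible across runs, which helps when comparing a new model against an existing test.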
- setup.py: Contains package build information such as version, dependencies, and author information. The version format follows v[x].[y].[z], where:
  - x: Major updates with breaking API changes.
  - y: Significant new features, such as implementing a new algorithm category.
  - z: Minor updates, including bug fixes and small adjustments.
- README.md: Documents usage and updates. Check it periodically for the latest information.
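The v[x].[y].[z] rules for setup.py can be captured in a small helper. This is a sketch: bump_version is a hypothetical name, and resetting the lower fields to 0 on a major or minor bump follows the common semantic-versioning convention, which the rules above imply but do not spell out. It keeps whatever prefix it is given, with or without the leading v.

```python
import re

VERSION_RE = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def bump_version(version, part):
    """Bump a v[x].[y].[z] version string.

    part: "major" (x, breaking API changes), "minor" (y, significant new
    features such as a new algorithm category), or "patch" (z, bug fixes
    and small adjustments). Lower-order fields reset to 0.
    """
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"not a v[x].[y].[z] version: {version!r}")
    prefix = m.group(1)
    x, y, z = (int(g) for g in m.groups()[1:])
    if part == "major":
        x, y, z = x + 1, 0, 0
    elif part == "minor":
        y, z = y + 1, 0
    elif part == "patch":
        z += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{prefix}{x}.{y}.{z}"
```

For instance, adding a new algorithm category to a hypothetical v1.2.3 would be a minor bump, giving v1.3.0.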
Algorithm Library
- Supervised Learning
  - Linear Regression
  - Logistic Regression
  - Decision Tree
    - ID3
    - C4.5
    - CART
  - Support Vector Machine
  - Neural Networks
- Unsupervised Learning
  - K-means Clustering
  - Principal Component Analysis
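The three decision-tree variants listed above differ chiefly in how they score a candidate split: ID3 uses information gain, C4.5 uses gain ratio (information gain divided by the feature's own entropy), and CART uses the Gini index over binary splits. The functions below are a standalone sketch for categorical features. They are not taken from hym/DecisionTree, and all names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _partition(values, labels):
    """Group labels by the feature value of each sample."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    return list(groups.values())


def info_gain(values, labels):
    """ID3 criterion: entropy reduction from a multiway split on the feature."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in _partition(values, labels))


def gain_ratio(values, labels):
    """C4.5 criterion: information gain normalized by the split's intrinsic value."""
    iv = entropy(values)  # entropy of the feature's own value distribution
    return info_gain(values, labels) / iv if iv > 0 else 0.0


def cart_best_split(values, labels):
    """CART criterion: best binary split 'feature == v' vs 'feature != v'.

    Returns (v, weighted Gini impurity); lower impurity is better.
    """
    n = len(labels)
    best = None
    for v in set(values):
        left = [lab for x, lab in zip(values, labels) if x == v]
        right = [lab for x, lab in zip(values, labels) if x != v]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (v, score)
    return best
```

Gain ratio exists because plain information gain favors features with many distinct values; dividing by the feature's own entropy penalizes such many-way splits.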
Testing
- Supervised Learning
  - Linear Regression
  - Logistic Regression
    - iris
  - Decision Tree
    - watermelon2.0
    - iris
    - ice-cream
    - wine quality
    - house price
  - Support Vector Machine
  - Neural Networks
- Unsupervised Learning
  - K-means Clustering
  - Principal Component Analysis
Add your references here.
This project is licensed under the MIT License. Please refer to the LICENSE file for details.
If you find this project helpful, please consider ⭐️ starring us!