This repository presents a research project on data privacy in machine learning, focusing on masking and encryption as privacy-preserving techniques. The study evaluates how these techniques affect model performance on three datasets: MNIST, Breast Cancer, and Pascal VOC.
- Research Objectives
- Datasets
- Network Architectures
- Results
- Conclusion
- Future Work
- Google Colab
- References
The main objectives of this research include:
- Analyzing existing approaches to data privacy in machine learning.
- Evaluating the impact of privacy-preserving techniques on model performance.
- Exploring the balance between data privacy and model accuracy.
The following datasets were used in this research:
- MNIST: A widely-used dataset containing 70,000 images of handwritten digits.
- Breast Cancer: A dataset with 569 samples, each described by 30 attributes related to breast tumor characteristics.
- Pascal VOC: A dataset for visual object classification, containing over 11,000 annotated images across 20 object categories.
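The smaller datasets above are available through standard Python tooling; for instance, the Breast Cancer dataset ships with scikit-learn. The repository does not state which loaders it uses, so the snippet below is an illustrative sketch, not the project's actual data pipeline:

```python
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer dataset: 569 samples, each with 30 numeric
# features describing breast tumor characteristics.
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)            # (569, 30)
print(data.target_names)  # the two classes: malignant and benign
```

MNIST and Pascal VOC are typically fetched through framework-specific loaders (e.g. `torchvision.datasets`) rather than scikit-learn.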
Different neural network architectures were employed for the different datasets. Below is an example of the data:
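The repository does not spell out its exact masking scheme, so the sketch below assumes the simplest interpretation: zeroing out a randomly chosen fraction of input pixels, matching the 20% and 50% masking levels reported in the results table. The function name `mask_pixels` is illustrative, not taken from the repo:

```python
import numpy as np

def mask_pixels(image: np.ndarray, fraction: float,
                rng: np.random.Generator) -> np.ndarray:
    """Zero out a random `fraction` of pixels (a simple masking scheme)."""
    masked = image.copy()
    flat = masked.reshape(-1)                 # view into the copy
    n_mask = int(fraction * flat.size)
    idx = rng.choice(flat.size, size=n_mask, replace=False)
    flat[idx] = 0.0                           # writes through to `masked`
    return masked

rng = np.random.default_rng(0)
img = rng.random((28, 28))                    # stand-in for one MNIST image
masked = mask_pixels(img, fraction=0.2, rng=rng)
print((masked == 0).mean())                   # close to 0.2
```

Encrypted variants would apply a cipher on top of (or instead of) this masking; the repository's references point to homomorphic encryption, whose setup is library-specific and omitted here.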
The following table summarizes the results of training models on each dataset with different methods (m = masking, e = encryption):
| Dataset | Method | Masking (%) | Accuracy (%) | Epochs |
|---|---|---|---|---|
| MNIST | - | - | 98.93 | 10 |
| MNIST | m | 20 | 98.27 | 10 |
| MNIST | m | 50 | 97.55 | 10 |
| MNIST | e | - | 96.26 | 10 |
| MNIST | m + e | 20 | 94.28 | 10 |
| MNIST | m + e | 50 | 88.24 | 10 |
| Breast Cancer | - | - | 96.49 | 20 |
| Breast Cancer | m | 20 | 93.86 | 20 |
| Breast Cancer | m | 50 | 89.47 | 20 |
| Breast Cancer | e | - | 90.35 | 20 |
| Breast Cancer | m + e | 20 | 85.09 | 20 |
| Breast Cancer | m + e | 50 | 81.58 | 20 |
| Pascal VOC | - | - | 95.97 | 10 |
| Pascal VOC | m | 30 | 93.76 | 10 |
| Pascal VOC | m + e | 30 | 87.71 | 10 |
| Pascal VOC | m + e | 30 | 90.83 | 50 |
The results indicate that privacy-preserving methods reduce model accuracy, but the cost is bounded: the drop ranges from under 1 percentage point (20% masking on MNIST) to roughly 15 points (50% masking plus encryption on Breast Cancer), so the models remain effective in every configuration tested.
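The accuracy cost of combining both techniques can be read directly off the table; the short calculation below uses the baseline and masking-plus-encryption rows (values copied from the table, variable names illustrative):

```python
# Baseline accuracy vs. masking + encryption (20% masking for MNIST and
# Breast Cancer, 30% for Pascal VOC), taken from the results table.
baseline = {"MNIST": 98.93, "Breast Cancer": 96.49, "Pascal VOC": 95.97}
combined = {"MNIST": 94.28, "Breast Cancer": 85.09, "Pascal VOC": 87.71}

drops = {name: baseline[name] - combined[name] for name in baseline}
for name, drop in drops.items():
    print(f"{name}: {drop:.2f} percentage points lost")
```

Note also the last two Pascal VOC rows: extending training from 10 to 50 epochs recovers about 3 points of the combined-method loss.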
This research highlights the importance of developing effective mechanisms to protect data privacy in machine learning while maintaining model performance. The findings underscore the need for a balance between confidentiality and accuracy.
Future research may focus on optimizing model architectures and exploring more advanced encryption and masking techniques to minimize accuracy loss while ensuring robust data privacy.
You can access the MobileNet implementation and experiments through this Google Colab notebook.
- Data Privacy in Machine Learning Systems
- What is Differential Privacy?
- Synthetic Data Overview
- Federated Learning and Differential Privacy
- Understanding Differential Privacy
- High-Accuracy Differentially Private Image Classification
- Adaptive Optimizers with Differential Privacy
- Privacy-Preserving Machine Learning with Fully Homomorphic Encryption
- Homomorphic Encryption in Machine Learning