This project focuses on designing a low-power CNN accelerator tailored for the MNIST dataset. By implementing efficient memory access and resource management techniques, the design minimizes power consumption while achieving high inference performance.
Below is a simplified version of the overall system block diagram:

(system block diagram figure omitted)
Memory Access Minimization in PE Array
To reduce power consumption, the design minimizes external memory access: input data and partial results are held in on-chip buffers and reused across the PE array instead of being re-fetched from external memory.
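As a behavioral illustration of this reuse (a sketch, not the actual RTL), the Python model below performs an output-stationary convolution in which each input pixel is fetched from external memory exactly once into an on-chip buffer and then reused by every output channel; the 28x28 input matches MNIST, but the 3x3 kernel and channel count are illustrative assumptions.

```python
import numpy as np

def conv_layer(inputs, weights):
    """inputs: (H, W); weights: (C_out, K, K) -> (C_out, H-K+1, W-K+1)."""
    H, W = inputs.shape
    C_out, K, _ = weights.shape
    # One external-memory read per input pixel, into the on-chip buffer.
    on_chip = inputs.copy()
    external_reads = H * W

    out = np.zeros((C_out, H - K + 1, W - K + 1))
    for c in range(C_out):            # every channel reuses the same buffer,
        for i in range(H - K + 1):    # so reads do not scale with C_out
            for j in range(W - K + 1):
                out[c, i, j] = np.sum(on_chip[i:i+K, j:j+K] * weights[c])
    return out, external_reads

y, reads = conv_layer(np.ones((28, 28)), np.ones((8, 3, 3)))
print(reads)   # 784: one read per pixel, independent of the 8 output channels
```

Without the on-chip buffer, each of the 8 output channels would re-read the full input, multiplying external traffic accordingly.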
FIFO, MaxPooling, and ReLU Integration
A tightly coupled FIFO, MaxPooling, and ReLU module ensures streamlined data processing while maintaining flexibility for hardware optimization.
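The fused pipeline can be sketched behaviorally as follows; the 2x2 pooling window, row-at-a-time streaming order, and fusion of ReLU into the max operation are assumptions for illustration, not details taken from the RTL.

```python
from collections import deque

def stream_pool_relu(rows, pool=2):
    """Consume feature-map rows as they arrive; a FIFO buffers rows until a
    pool-high stripe is complete, then max-pooling and ReLU are applied in
    the same pass before the result is emitted."""
    fifo = deque()                    # row FIFO between conv and pooling
    for row in rows:
        fifo.append(row)
        if len(fifo) == pool:
            pooled = []
            for j in range(0, len(row) - pool + 1, pool):
                window = [fifo[i][j + k] for i in range(pool) for k in range(pool)]
                pooled.append(max(0, max(window)))   # ReLU fused into the max
            fifo.clear()
            yield pooled

out = list(stream_pool_relu([[-1, 2, 3, -4],
                             [ 5, -6, -7, 8]]))
print(out)   # [[5, 8]]
```

Fusing ReLU with pooling this way avoids a separate pass over the feature map, which is what makes the tight coupling attractive in hardware.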
Shift Buffer Utilization
Shift buffers manage the input data for convolution operations, reducing redundant memory reads and improving computational efficiency.
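One common realization of this idea, sketched below, is a line-buffer/shift-register arrangement: assuming a KxK kernel and raster-order pixel arrival (both assumptions for this sketch), K-1 line buffers plus a KxK window register deliver a full convolution window per pixel while each pixel is read from memory only once.

```python
def sliding_windows(pixels, width, K=3):
    """Line-buffer sketch: pixels arrive in raster order; K-1 line buffers
    plus a KxK shift-register window supply one full convolution window per
    pixel, so each pixel is read from external memory exactly once."""
    lines = [[0] * width for _ in range(K - 1)]   # K-1 on-chip line buffers
    window = [[0] * K for _ in range(K)]          # KxK window registers
    col = 0
    for p in pixels:
        # column entering the window: one tap per line buffer plus the new pixel
        column = [lines[r][col] for r in range(K - 1)] + [p]
        for r in range(K):                        # shift window left, insert column
            window[r] = window[r][1:] + [column[r]]
        for r in range(K - 2):                    # each line buffer moves up one row
            lines[r][col] = lines[r + 1][col]
        lines[K - 2][col] = p
        col = (col + 1) % width
        yield [row[:] for row in window]

wins = list(sliding_windows(range(16), width=4))   # 4x4 test image, values 0..15
print(wins[10])   # [[0, 1, 2], [4, 5, 6], [8, 9, 10]] (rows 0-2, cols 0-2)
```

The first few windows contain warm-up zeros; valid windows begin once K-1 rows and K-1 columns have streamed in.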
Fully Connected (FC) Layer Implementation
The FC layer is implemented with a dedicated computation module that leverages efficient resource allocation and parallelism.
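A behavioral sketch of one plausible arrangement is shown below, splitting the dot product across parallel MAC lanes; the lane count and the interleaved element assignment are assumptions for illustration, not the design's actual allocation.

```python
import numpy as np

def fc_layer(x, W, b, lanes=4):
    """FC sketch: input elements are interleaved across `lanes` parallel MAC
    groups; each lane accumulates a partial output vector, and the partials
    are summed at the end, as a parallel datapath would reduce them."""
    partial = np.zeros((lanes, len(b)))
    for lane in range(lanes):
        for i in range(lane, len(x), lanes):   # elements assigned to this lane
            partial[lane] += x[i] * W[i]       # one multiply-accumulate step
    return partial.sum(axis=0) + b

rng = np.random.default_rng(0)
x, W, b = rng.random(64), rng.random((64, 10)), rng.random(10)
assert np.allclose(fc_layer(x, W, b), x @ W + b)   # matches the reference GEMV
```

With `lanes` MAC units working in parallel, the cycle count per output drops roughly by the lane count, at the cost of a small adder tree for the final reduction.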
The accelerator performs inference on a single MNIST image with minimal latency, and it sustains consistent performance when processing 1000 images, demonstrating scalability and robustness.
Low-Power Design
- Efficient memory access techniques (PE Array + Shift Buffers).
- Optimized control logic for idle-cycle reduction in processing elements.
Resource Utilization
- Reuse of FIFO buffers and PE arrays across multiple operations.
- Minimal external memory bandwidth usage through data locality exploitation.
Scalable Architecture
- Modular design supports easy extension to larger datasets or different model architectures.
- Lightweight implementation suitable for resource-constrained environments.
Hardware-Software Co-Design
- Integration of software control logic for flexible CNN model configuration.
- Custom AXI4 interface for seamless communication between hardware and software.
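The software side of such an interface typically follows a configure/start/poll handshake, sketched below as a behavioral model; the register map, offsets, and bit fields here are hypothetical (not taken from the actual design), and the hardware is modeled as completing instantly.

```python
# Hypothetical register map: offsets and bit positions are illustrative only.
REG_CTRL, REG_STATUS, REG_IMG_ADDR, REG_NUM_IMGS = 0x00, 0x04, 0x08, 0x0C
CTRL_START, STATUS_DONE = 0x1, 0x1

class AxiLiteModel:
    """Software-visible model of an AXI4-Lite slave: the host configures
    the accelerator through registers, sets START, and polls DONE."""
    def __init__(self):
        self.regs = {REG_CTRL: 0, REG_STATUS: 0, REG_IMG_ADDR: 0, REG_NUM_IMGS: 0}

    def write(self, off, val):
        self.regs[off] = val
        if off == REG_CTRL and val & CTRL_START:
            self.regs[REG_STATUS] |= STATUS_DONE   # model completes instantly

    def read(self, off):
        return self.regs[off]

def run_inference(dev, img_addr, num_images):
    dev.write(REG_IMG_ADDR, img_addr)              # where the images live
    dev.write(REG_NUM_IMGS, num_images)            # batch size (e.g. 1000)
    dev.write(REG_CTRL, CTRL_START)                # kick off the accelerator
    while not dev.read(REG_STATUS) & STATUS_DONE:  # poll until finished
        pass
    return dev.read(REG_STATUS)

print(run_inference(AxiLiteModel(), 0x1000, 1000))   # 1 (DONE bit set)
```

On real hardware the same sequence would go through memory-mapped I/O rather than a Python object, but the control flow the software follows is the same.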
This project demonstrates a well-optimized hardware accelerator for MNIST CNN inference with a focus on low-power and high-efficiency design. The techniques implemented here can be extended to more complex deep learning models, making it a valuable reference for future hardware design projects.