A C++ neural network implementation that runs on both CPU and GPU, with GPU execution through CUDA. The framework provides a flexible deep learning architecture with automatic differentiation and CUDA-accelerated tensor operations.
The framework is built around an efficient tensor implementation (sketched after this list) supporting:
- CPU/GPU memory management with automatic device switching
- Basic operations: transpose, dot product, element-wise multiplication
- CUDA-optimized kernels with shared memory utilization
- Memory-aligned storage with proper striding
- Template support for different numeric types (float, double, int)
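As a rough illustration of how such a tensor might be organized, here is a minimal sketch of a templated CPU/GPU tensor using pitched device allocation. All member and method names (`toDevice`, `d_data_`, and so on) are illustrative assumptions, not the framework's actual API:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Illustrative sketch only; copy/move handling omitted for brevity.
template <typename T>
class Tensor {
public:
    Tensor(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), host_(rows * cols) {}

    ~Tensor() { if (d_data_) cudaFree(d_data_); }

    // Stage host data onto the GPU on demand. Pitched allocation pads
    // each row so device accesses stay aligned and coalesced.
    void toDevice() {
        if (!d_data_)
            cudaMallocPitch(reinterpret_cast<void**>(&d_data_), &pitch_,
                            cols_ * sizeof(T), rows_);
        cudaMemcpy2D(d_data_, pitch_, host_.data(), cols_ * sizeof(T),
                     cols_ * sizeof(T), rows_, cudaMemcpyHostToDevice);
    }

    void toHost() {
        if (d_data_)
            cudaMemcpy2D(host_.data(), cols_ * sizeof(T), d_data_, pitch_,
                         cols_ * sizeof(T), rows_, cudaMemcpyDeviceToHost);
    }

    T& operator()(std::size_t r, std::size_t c) { return host_[r * cols_ + c]; }

private:
    std::size_t rows_, cols_;
    std::size_t pitch_ = 0;   // bytes per padded device row
    std::vector<T> host_;     // CPU-side storage (row-major)
    T* d_data_ = nullptr;     // GPU-side storage (pitched)
};
```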
Operation Type | Implementation Details |
---|---|
Matrix Multiplication | Tiled algorithm with shared memory (TILE_SIZE: 32) |
Memory Management | Pitched allocation for optimal memory access |
Device Handling | Automatic CPU/GPU data transfer |
Batch Processing | Vectorized operations for training efficiency |
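For context, this is what the tiled shared-memory matrix multiplication from the table typically looks like. The sketch uses plain row-major (unpitched) indexing for brevity, and the kernel name and signature are assumptions:

```cpp
#define TILE_SIZE 32

// Tiled matmul sketch: C (M x N) = A (M x K) * B (K x N).
// Each block computes one TILE_SIZE x TILE_SIZE tile of C, staging
// tiles of A and B through shared memory to cut global-memory traffic.
__global__ void matmulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    __shared__ float As[TILE_SIZE][TILE_SIZE];
    __shared__ float Bs[TILE_SIZE][TILE_SIZE];

    int row = blockIdx.y * TILE_SIZE + threadIdx.y;
    int col = blockIdx.x * TILE_SIZE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE_SIZE - 1) / TILE_SIZE; ++t) {
        int aCol = t * TILE_SIZE + threadIdx.x;
        int bRow = t * TILE_SIZE + threadIdx.y;
        // Zero-pad out-of-range elements so the inner loop needs no guards.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE_SIZE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```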
Layer | Features |
---|---|
Linear | • Xavier initialization • Configurable input/output dimensions • Forward / backward pass support • Batch size handling |
ReLU | • Activation with no extra memory overhead • Optimized backward pass |
Sigmoid | • Numerically stable implementation • Binary cross-entropy integration |
Softmax | • Stable computation with max subtraction • Cross-entropy integration |
Dropout | • Training/eval mode switching • Configurable drop rate |
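To illustrate the layer abstraction and the max-subtraction trick from the Softmax row, here is a minimal CPU-side sketch; the `Layer` interface shown is an assumption, not the framework's exact API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative layer interface; the real framework's API may differ.
struct Layer {
    virtual std::vector<float> forward(const std::vector<float>& x) = 0;
    virtual std::vector<float> backward(const std::vector<float>& grad) = 0;
    virtual ~Layer() = default;
};

// Stable softmax: subtracting the row max before exp() keeps every
// exponential in (0, 1], so large logits cannot overflow to inf.
std::vector<float> stableSoftmax(const std::vector<float>& logits) {
    float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - maxLogit);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}
```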
Component | Implementation |
---|---|
Optimizers | • SGD with momentum • Configurable weight decay • Gradient clipping |
Scheduler | • ReduceLROnPlateau • Configurable patience & factor |
Loss Functions | • MSE • Binary Cross-Entropy • Categorical Cross-Entropy |
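The optimizer row above implies a per-parameter update along these lines. A minimal sketch of one SGD step with momentum, L2 weight decay, and global-norm gradient clipping (function and parameter names are illustrative):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One SGD step with momentum, L2 weight decay, and global-norm
// gradient clipping. Names and signature are illustrative.
void sgdStep(std::vector<float>& w, const std::vector<float>& grad,
             std::vector<float>& velocity, float lr, float momentum,
             float weightDecay, float clipNorm) {
    // Scale the whole gradient down if its L2 norm exceeds clipNorm.
    float norm = 0.0f;
    for (float g : grad) norm += g * g;
    norm = std::sqrt(norm);
    float scale = (norm > clipNorm) ? clipNorm / norm : 1.0f;

    for (std::size_t i = 0; i < w.size(); ++i) {
        float g = grad[i] * scale + weightDecay * w[i];  // L2 penalty
        velocity[i] = momentum * velocity[i] + g;        // momentum buffer
        w[i] -= lr * velocity[i];
    }
}
```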
- MNIST dataset loader with normalization options
- Tabular data loader for CSV files (a loader sketch follows this list)
- ONNX model import (weights, biases, activation functions)
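A minimal sketch of what the CSV loader might look like, assuming each row is parsed as floats with the last column treated as the label (the function name and format convention are assumptions):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Minimal CSV loader sketch: features in all but the last column,
// label in the last column. Normalization is left to the caller.
bool loadCsv(const std::string& path,
             std::vector<std::vector<float>>& features,
             std::vector<float>& labels) {
    std::ifstream in(path);
    if (!in) return false;
    std::string line;
    while (std::getline(in, line)) {
        std::stringstream row(line);
        std::string cell;
        std::vector<float> values;
        while (std::getline(row, cell, ','))
            values.push_back(std::stof(cell));
        if (values.empty()) continue;
        labels.push_back(values.back());
        values.pop_back();
        features.push_back(std::move(values));
    }
    return true;
}
```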
```cpp
// Initialize model
Model model;
model.setOptimizer(SGD(0.01f, 0.9f, 0.0001f)); // lr, momentum, weight_decay

// Add layers
model.addLayer(std::make_unique<Linear>(784, 128, true)); // GPU enabled
model.addLayer(std::make_unique<ReLU>());
model.addLayer(std::make_unique<Dropout>(0.2f));
model.addLayer(std::make_unique<Linear>(128, 10, true));
model.addLayer(std::make_unique<Softmax>(true));

// Training loop (one step per batch)
Tensor<float> predictions = model.forward(input);
auto [loss, gradients] = CategoricalCrossEntropyLoss(predictions, targets);
model.backward(gradients);
model.step();
```
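The ReduceLROnPlateau scheduler from the components table would hook into this loop on validation loss. Here is a self-contained sketch of the underlying logic; the class shape and names are assumptions, not the framework's actual API:

```cpp
#include <limits>

// Sketch of ReduceLROnPlateau logic: multiply the learning rate by
// `factor` once `patience` consecutive epochs pass without the
// validation loss improving.
class ReduceLROnPlateau {
public:
    ReduceLROnPlateau(float factor, int patience)
        : factor_(factor), patience_(patience) {}

    // Call once per epoch; returns the (possibly reduced) learning rate.
    float step(float valLoss, float lr) {
        if (valLoss < best_) {
            best_ = valLoss;
            badEpochs_ = 0;
        } else if (++badEpochs_ > patience_) {
            lr *= factor_;
            badEpochs_ = 0;
        }
        return lr;
    }

private:
    float best_ = std::numeric_limits<float>::max();
    float factor_;
    int patience_;
    int badEpochs_ = 0;
};
```

Called as `lr = scheduler.step(valLoss, lr)` after each epoch's evaluation, this mirrors the configurable patience/factor behavior listed in the components table.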
Application | Description |
---|---|
MNIST Classification | Digit recognition with dropout and learning rate scheduling |
Iris Classification | Multi-class flower classification |
Breast Cancer Classification | Binary classification with regularization (logistic regression) |
California Housing | Regression with multi-layer architecture |
- CUDA Toolkit
- C++17 compatible compiler
- ONNX runtime libraries
```bash
# Compile with GPU support
./script.sh

# Run with GPU
./output --gpu

# Available logging levels
./output --infer   # Inference logging
./output --back    # Backprop logging
./output --loss    # Loss computation logging
./output --debug   # Detailed debug information
./output --all     # All logging enabled
```