A C++ neural network implementation that runs on both CPU and GPU, with GPU execution through CUDA. The framework provides a flexible deep learning architecture with automatic differentiation and CUDA-accelerated tensor operations.
The framework is built around an efficient tensor implementation (sketched after this list) supporting:
- CPU/GPU memory management with automatic device switching
- Basic operations: transpose, dot product, element-wise multiplication
- CUDA-optimized kernels with shared memory utilization
- Memory-aligned storage with proper striding
- Template support for different numeric types (float, double, int)
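As a rough illustration of how such a tensor might be organized, here is a minimal sketch of a templated CPU/GPU tensor using pitched device allocation. All member and method names (`toDevice`, `d_data_`, and so on) are illustrative assumptions, not the framework's actual API:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Illustrative sketch only; copy/move handling omitted for brevity.
template <typename T>
class Tensor {
public:
    Tensor(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), host_(rows * cols) {}

    ~Tensor() { if (d_data_) cudaFree(d_data_); }

    // Stage host data onto the GPU on demand. Pitched allocation pads
    // each row so device accesses stay aligned and coalesced.
    void toDevice() {
        if (!d_data_)
            cudaMallocPitch(reinterpret_cast<void**>(&d_data_), &pitch_,
                            cols_ * sizeof(T), rows_);
        cudaMemcpy2D(d_data_, pitch_, host_.data(), cols_ * sizeof(T),
                     cols_ * sizeof(T), rows_, cudaMemcpyHostToDevice);
    }

    void toHost() {
        if (d_data_)
            cudaMemcpy2D(host_.data(), cols_ * sizeof(T), d_data_, pitch_,
                         cols_ * sizeof(T), rows_, cudaMemcpyDeviceToHost);
    }

    T& operator()(std::size_t r, std::size_t c) { return host_[r * cols_ + c]; }

private:
    std::size_t rows_, cols_;
    std::size_t pitch_ = 0;   // bytes per padded device row
    std::vector<T> host_;     // CPU-side storage (row-major)
    T* d_data_ = nullptr;     // GPU-side storage (pitched)
};
```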
Operation Type | Implementation Details |
---|---|
Matrix Multiplication | Tiled algorithm with shared memory (TILE_SIZE: 32) |
Memory Management | Pitched allocation for optimal memory access |
Device Handling | Automatic CPU/GPU data transfer |
Batch Processing | Vectorized operations for training efficiency |
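For context, this is what the tiled shared-memory matrix multiplication from the table typically looks like. The sketch uses plain row-major (unpitched) indexing for brevity, and the kernel name and signature are assumptions:

```cpp
#define TILE_SIZE 32

// Tiled matmul sketch: C (M x N) = A (M x K) * B (K x N).
// Each block computes one TILE_SIZE x TILE_SIZE tile of C, staging
// tiles of A and B through shared memory to cut global-memory traffic.
__global__ void matmulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    __shared__ float As[TILE_SIZE][TILE_SIZE];
    __shared__ float Bs[TILE_SIZE][TILE_SIZE];

    int row = blockIdx.y * TILE_SIZE + threadIdx.y;
    int col = blockIdx.x * TILE_SIZE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE_SIZE - 1) / TILE_SIZE; ++t) {
        int aCol = t * TILE_SIZE + threadIdx.x;
        int bRow = t * TILE_SIZE + threadIdx.y;
        // Zero-pad out-of-range elements so the inner loop needs no guards.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE_SIZE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```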
Layer | Features |
---|---|
Linear | • Xavier initialization • Configurable input/output dimensions • Forward / backward pass support • Batch size handling |
ReLU | • Activation with no extra memory overhead • Optimized backward pass |
Sigmoid | • Numerically stable implementation • Binary cross-entropy integration |
Softmax | • Stable computation with max subtraction • Cross-entropy integration |
Dropout | • Training/eval mode switching • Configurable drop rate |
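To illustrate the layer abstraction and the max-subtraction trick from the Softmax row, here is a minimal CPU-side sketch; the `Layer` interface shown is an assumption, not the framework's exact API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative layer interface; the real framework's API may differ.
struct Layer {
    virtual std::vector<float> forward(const std::vector<float>& x) = 0;
    virtual std::vector<float> backward(const std::vector<float>& grad) = 0;
    virtual ~Layer() = default;
};

// Stable softmax: subtracting the row max before exp() keeps every
// exponential in (0, 1], so large logits cannot overflow to inf.
std::vector<float> stableSoftmax(const std::vector<float>& logits) {
    float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - maxLogit);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}
```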
Component | Implementation |
---|---|
Optimizers | • SGD with momentum • Configurable weight decay • Gradient clipping |
Scheduler | • ReduceLROnPlateau • Configurable patience & factor |
Loss Functions | • MSE • Binary Cross-Entropy • Categorical Cross-Entropy |
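The optimizer row above implies a per-parameter update along these lines. A minimal sketch of one SGD step with momentum, L2 weight decay, and global-norm gradient clipping (function and parameter names are illustrative):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One SGD step with momentum, L2 weight decay, and global-norm
// gradient clipping. Names and signature are illustrative.
void sgdStep(std::vector<float>& w, const std::vector<float>& grad,
             std::vector<float>& velocity, float lr, float momentum,
             float weightDecay, float clipNorm) {
    // Scale the whole gradient down if its L2 norm exceeds clipNorm.
    float norm = 0.0f;
    for (float g : grad) norm += g * g;
    norm = std::sqrt(norm);
    float scale = (norm > clipNorm) ? clipNorm / norm : 1.0f;

    for (std::size_t i = 0; i < w.size(); ++i) {
        float g = grad[i] * scale + weightDecay * w[i];  // L2 penalty
        velocity[i] = momentum * velocity[i] + g;        // momentum buffer
        w[i] -= lr * velocity[i];
    }
}
```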
- MNIST dataset loader with normalization options
- Tabular data loader for CSV files (a loader sketch follows this list)
- ONNX model import (weights, biases, activation functions)
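A minimal sketch of what the CSV loader might look like, assuming each row is parsed as floats with the last column treated as the label (the function name and format convention are assumptions):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Minimal CSV loader sketch: features in all but the last column,
// label in the last column. Normalization is left to the caller.
bool loadCsv(const std::string& path,
             std::vector<std::vector<float>>& features,
             std::vector<float>& labels) {
    std::ifstream in(path);
    if (!in) return false;
    std::string line;
    while (std::getline(in, line)) {
        std::stringstream row(line);
        std::string cell;
        std::vector<float> values;
        while (std::getline(row, cell, ','))
            values.push_back(std::stof(cell));
        if (values.empty()) continue;
        labels.push_back(values.back());
        values.pop_back();
        features.push_back(std::move(values));
    }
    return true;
}
```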
```cpp
// Initialize model
Model model;
model.setOptimizer(SGD(0.01f, 0.9f, 0.0001f)); // lr, momentum, weight_decay

// Add layers
model.addLayer(std::make_unique<Linear>(784, 128, true)); // GPU enabled
model.addLayer(std::make_unique<ReLU>());
model.addLayer(std::make_unique<Dropout>(0.2f));
model.addLayer(std::make_unique<Linear>(128, 10, true));
model.addLayer(std::make_unique<Softmax>(true));

// Training loop (one step per batch)
Tensor<float> predictions = model.forward(input);
auto [loss, gradients] = CategoricalCrossEntropyLoss(predictions, targets);
model.backward(gradients);
model.step();
```
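The ReduceLROnPlateau scheduler from the components table would hook into this loop on validation loss. Here is a self-contained sketch of the underlying logic; the class shape and names are assumptions, not the framework's actual API:

```cpp
#include <limits>

// Sketch of ReduceLROnPlateau logic: multiply the learning rate by
// `factor` once `patience` consecutive epochs pass without the
// validation loss improving.
class ReduceLROnPlateau {
public:
    ReduceLROnPlateau(float factor, int patience)
        : factor_(factor), patience_(patience) {}

    // Call once per epoch; returns the (possibly reduced) learning rate.
    float step(float valLoss, float lr) {
        if (valLoss < best_) {
            best_ = valLoss;
            badEpochs_ = 0;
        } else if (++badEpochs_ > patience_) {
            lr *= factor_;
            badEpochs_ = 0;
        }
        return lr;
    }

private:
    float best_ = std::numeric_limits<float>::max();
    float factor_;
    int patience_;
    int badEpochs_ = 0;
};
```

Called as `lr = scheduler.step(valLoss, lr)` after each epoch's evaluation, this mirrors the configurable patience/factor behavior listed in the components table.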
Application | Description |
---|---|
MNIST Classification | Digit recognition with dropout and learning rate scheduling |
Iris Classification | Multi-class flower classification |
Breast Cancer Classification | Binary classification with regularization (logistic regression) |
California Housing | Regression with multi-layer architecture |
- CUDA Toolkit
- C++17 compatible compiler
- ONNX runtime libraries
```bash
# Compile with GPU support
./script.sh

# Run with GPU
./output --gpu

# Available logging levels
./output --infer   # Inference logging
./output --back    # Backprop logging
./output --loss    # Loss computation logging
./output --debug   # Detailed debug information
./output --all     # All logging enabled
```