huwzpf/ParallelMatrixMultiplication

Parallel Matrix Multiplication project for Parallel Programming course

General description

This repository contains implementations of matrix multiplication algorithms for both GPU and CPU with different optimizations.
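As an illustration of what such an optimization can look like on the CPU side, here is a minimal sketch that contrasts an unoptimized triple loop with a cache-blocked (tiled) variant. It is not taken from this repository: the names `Matrix`, `multiply_naive`, `multiply_tiled`, and the default tile size of 32 are assumptions made for the example, and matrices are assumed to be square and stored row-major.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage choice for this sketch: a square n x n matrix
// kept row-major in a flat vector.
using Matrix = std::vector<float>;

// Baseline without optimizations: one dot product per output element.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Tiled variant: work on tile x tile blocks so that the parts of A, B and C
// in use are more likely to stay in cache. The tile size is an assumed
// tuning parameter, not a value from the repository.
Matrix multiply_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                      std::size_t tile = 32) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(tile > 0);
    Matrix c(n * n, 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile) {
        const std::size_t i1 = std::min(i0 + tile, n);
        for (std::size_t k0 = 0; k0 < n; k0 += tile) {
            const std::size_t k1 = std::min(k0 + tile, n);
            for (std::size_t j0 = 0; j0 < n; j0 += tile) {
                const std::size_t j1 = std::min(j0 + tile, n);
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        // Read A[i][k] once and reuse it across the row of the tile.
                        const float a_ik = a[i * n + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the loop order and blocking differ. The `std::min` bounds let the tiled version handle sizes that are not a multiple of the tile size. A suitable tile size depends on the cache sizes of the target machine, so 32 should be treated as a starting point for measurement.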

Performance results

On large matrices, the CUDA implementation (float_cuda_2d_tiled_register_cache_matrix_multiplication.cu) is 17.3% slower than cublasSgemm() from cublas_v2.h and 567% faster than the algorithm without any optimizations.

Time plot

TODO

  • Refactor the code to use a single main function and include the kernels at compile time
  • Add a CPU performance comparison against a library implementation
