
Optimize the grid dimensionality during KANLayer initialization to reduce memory/GPU usage significantly and greatly reduce the initialization time of KANLayer. #378

Open · wants to merge 12 commits into master
Conversation

congyue1977

During KANLayer initialization, the B-spline knot vector is built from the grid_range parameter and is therefore identical across all input dimensions (in_dim). The data stored in the grid is redundant, so it suffices to set the size of the first dimension to 1; subsequent calculations rely on tensor broadcasting automatically, and the grid update process is unaffected.

This optimization significantly reduces CPU/GPU memory usage. After the change, each KANLayer saves (in_dim - 1) * (G + 2k + 1) grid entries; for a network of depth N whose layers share the same input dimension, the total saving is N * (in_dim - 1) * (G + 2k + 1) entries.

Furthermore, this optimization drastically reduces KANLayer initialization time. In testing with a large G (for example G = 100), width [4,100,100,100,1], and k = 3, the KAN took nearly 30 s to start training on an Intel i9-12900K before the change; after the change, training starts in under 1 s.
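
A minimal sketch of the idea, assuming a PyTorch grid layout of shape (in_dim, grid points); the degree-0 basis check and the width/G/k numbers below are illustrative assumptions, not the actual diff in this PR:

```python
import torch

# Hypothetical parameters chosen for illustration; pykan's KANLayer code may differ.
in_dim, G, k = 4, 100, 3
grid_range = [-1.0, 1.0]

# Before: one identical knot vector stored per input dimension.
grid_full = torch.linspace(grid_range[0], grid_range[1], steps=G + 1)
grid_full = grid_full[None, :].expand(in_dim, G + 1).contiguous()        # shape (in_dim, G+1)

# After: store the knot vector once and let broadcasting supply the rest.
grid_shared = torch.linspace(grid_range[0], grid_range[1], steps=G + 1)[None, :]  # shape (1, G+1)

# Elementwise operations against per-dimension inputs broadcast over the
# leading dimension, so downstream B-spline evaluation is unchanged.
# Example: the degree-0 B-spline indicator B_{i,0}(x).
x = torch.rand(64, in_dim).unsqueeze(-1)                                  # (batch, in_dim, 1)
basis_full = (x >= grid_full[:, :-1]) & (x < grid_full[:, 1:])            # (batch, in_dim, G)
basis_shared = (x >= grid_shared[:, :-1]) & (x < grid_shared[:, 1:])      # broadcasts to same shape
assert torch.equal(basis_full, basis_shared)

# Per-layer saving once the grid is extended by k knots on each side:
# (in_dim - 1) * (G + 2k + 1) entries.  For width [4,100,100,100,1] with
# G = 100, k = 3: (3 + 99 + 99 + 99) * 107 = 32,100 fewer stored grid values.
print(grid_full.shape, "->", grid_shared.shape)
```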

KindXiaoming and others added 12 commits July 21, 2024 20:36
…duce memory/GPU usage significantly and greatly reduce the initialization time of KANLayer.
