gemm falling back to the three-loop implementation when invoked as a function template, even for matrices of the same datatype #31

Comments
Thanks for reporting this. I never really thought about invoking BLAS++ this way, where the template parameter is specified explicitly. Usually I expect template parameters to be determined implicitly from the arguments. E.g., I expect users to call
rather than the more explicit calls:
The row-major behavior was a bit surprising, so thanks for pointing that out. It's clear what's happening, because for row-major matrices gemm calls itself using the first convention. We could add template specializations for cases that map to optimized routines (e.g., sgemm). The design uses overloaded routines so that C++ can convert scalar data types, e.g., where alpha and beta are doubles that get implicitly converted to std::complex. This works with overloads, but not with templates.
BTW, don't expect a change in BLAS++ any time soon; we have other things with higher priority. In the meantime, simply omit the template parameters to get the desired behavior.
This is not really a bug, but one might want to prevent users from accidentally doing this, since the performance was really bad and it is not obvious what's going on. To that end, one might use SFINAE to disable the template when all three types are the same. In that case the compiler still finds the overloads when called with identical types, finds the template function when the types are different, and errors out if the template is explicitly called with identical types. It's quite a small change. Here is a minimal complete example where the STL's type_traits are used to hide the template function from the compiler:

```cpp
#include <iostream>
#include <type_traits>

// This template handles data of different types. SFINAE hides it from the
// compiler when the types are the same. This is what we want, since there
// are overloads with faster implementations in that case.
template <typename TA, typename TB, typename TC,
          typename std::enable_if<
              !(std::is_same<TA,TB>{} && std::is_same<TA,TC>{})
          , bool>::type = true>
void gemm(
    TA const *A,
    TB const *B,
    TC *C)
{
    std::cerr << "You called the slow method! TA != TB != TC" << std::endl;
}

/// Overload invoking the faster implementation when the types are the same.
void gemm(
    double const *A,
    double const *B,
    double *C)
{
    std::cerr << "You called the faster implementation" << std::endl;
}

int main()
{
    float *Af = nullptr;
    double *Ad = nullptr;
    double *Bd = nullptr;
    double *Cd = nullptr;

    gemm(Af,Bd,Cd);         // OK, template called: arguments have different types
    gemm(Ad,Bd,Cd);         // OK, fast overload called: arguments have the same type
    gemm<double>(Ad,Bd,Cd); // WRONG: specifying the template parameter forces
                            // the compiler to choose the template even though
                            // there's an explicit overload for this case. SFINAE
                            // removes the template and catches the error at
                            // compile time.
    return 0;
}
```

This program fails to compile because of the explicit `gemm<double>` call: SFINAE has removed the template from the overload set, so no matching function is found. The compiler gives a pretty good error message in this case!
I could see instances where TA = TB = TC but they are not one of the 4 standard types float, double, complex-float, complex-double; e.g., half, double-double, int. The thing to disable or redirect is really just the 4 standard types. Providing specializations isn't all that complicated, but there are quite a few routines in BLAS to update. Incidentally, you can blame Weslley for the Level 3 BLAS templates; initially we had only Level 1 and 2 templates.
The following might be closer:

```cpp
#include <complex>
#include <iostream>
#include <type_traits>

template <typename T, typename ...Ts>
using all_same = std::conjunction<std::is_same<T,Ts>...>;

template <typename T>
struct is_blaspp_type
    : std::integral_constant<
        bool,
           std::is_same<float, typename std::remove_cv<T>::type>::value
        || std::is_same<double, typename std::remove_cv<T>::type>::value
        || std::is_same<std::complex<float>, typename std::remove_cv<T>::type>::value
        || std::is_same<std::complex<double>, typename std::remove_cv<T>::type>::value
    > {};

template <typename T, typename ...Ts>
using all_blaspp_type = std::conjunction<is_blaspp_type<T>, is_blaspp_type<Ts>...>;

// This template is the fallback when a faster implementation is not available.
// SFINAE hides it from the compiler when all 3 types are the same AND all of
// them are one of blaspp's standard types. This is what we want, since there
// are overloads with faster implementations in that case.
template <typename TA, typename TB, typename TC,
          typename std::enable_if<
              !all_same<TA,TB,TC>{} || !all_blaspp_type<TA,TB,TC>{}
          , bool>::type = true>
void gemm(
    TA const *A,
    TB const *B,
    TC *C)
{
    std::cerr << "You called the template implementation. all same? "
              << all_same<TA,TB,TC>::value << " all supported? "
              << all_blaspp_type<TA,TB,TC>::value << std::endl;
}
```
BLAS++ mixes templates for slower, unoptimized fallback implementations with overloads for faster, optimized implementations. Specifying the template type in calls to BLAS++ functions tells the compiler you want the slower template implementation. For more info see: icl-utk-edu/blaspp#31
blaspp has a fallback gemm in line 90 of include/blas/gemm.hh, as shown below. It is a naive three-loop fallback implementation of gemm, intended to be used where the data types of the three matrices are different.

However, calls of the form blas::gemm<double>, where all matrices are of the same data type, somehow get compiled to that fallback function. What's more, when the matrices are in row-major format, the correct specialized BLAS function is invoked. (This is because in that case the code calls the column-major version (line 103), but without specifying the matrix data types, leading to the correct function being used.) For column-major matrices, the three-loop version is called.

This behavior is very unexpected: when calling blas::gemm<double>, most users would expect a specialized kernel for double precision. This could lead to perplexing performance behavior, where row-major gemms are normal and column-major gemms are slow. Although blas::gemm<T> is undocumented and therefore this is not a bug, I think it's better to either use the kernels like the overloaded variants do, or issue a compiler error/warning.

cc @rileyjmurray @burlen