Activity
move matmul multithreading pragma to inner loop
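A hedged sketch of what moving the multithreading pragma to the inner loop can look like, assuming the usual OpenMP-parallelized matmul of a (d, n) weight matrix against an n-vector. Splitting the inner dot product across threads requires a sum reduction; the function name and signature here are illustrative, not the project's actual code, and whether the move helps depends on n, d, and thread count.

```c
#include <stddef.h>

/* Illustrative matmul: out (d,) = W (d,n) @ x (n,). */
static void matmul(float *out, const float *x, const float *w, size_t n, size_t d) {
    for (size_t i = 0; i < d; i++) {
        float val = 0.0f;
        /* Pragma on the inner loop, as the commit above describes:
         * the dot product for one output row is split across threads
         * and combined with a sum reduction. Without -fopenmp the
         * pragma is ignored and the code runs serially. */
        #pragma omp parallel for reduction(+:val)
        for (long j = 0; j < (long)n; j++) {
            val += w[i * n + j] * x[j];
        }
        out[i] = val;
    }
}
```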
fix divide-by-zero for f16, f32 models
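One common source of the divide-by-zero fixed above is blockwise quantization: an all-zero block of f16/f32 weights has a max-abs of 0, so the derived scale is 0 and the value/scale division blows up. A minimal sketch of such a guard, assuming a symmetric int8 scheme; the function name, epsilon, and 127 range are illustrative assumptions, not the project's actual code.

```c
/* Hypothetical guard: clamp a zero quantization scale to a tiny
 * positive value so later divisions by the scale are safe. */
static float safe_scale(float max_abs) {
    float scale = max_abs / 127.0f;      /* symmetric int8 scale */
    return (scale == 0.0f) ? 1e-10f : scale; /* avoid x / 0 */
}
```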
optimize blockwise scale reads in matmuls
Pull request merge
optimize blockwise scale reads in matmuls
Force push
Fix signed arithmetic overflow
Pull request merge
Fix signed arithmetic overflow
Force push
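Signed arithmetic overflow of the kind fixed above typically appears when weight offsets are computed in 32-bit int: once layer * dim * dim exceeds 2^31 - 1, the multiply overflows, which is undefined behavior in C. The standard fix is to widen one operand before multiplying; the function below is an illustrative sketch, not the project's actual code.

```c
#include <stdint.h>

/* Promote to 64-bit before the multiply so large models don't
 * overflow a 32-bit intermediate. */
static int64_t weight_offset(int layer, int dim) {
    return (int64_t)layer * dim * dim;
}
```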
write to multiple shards in output_dir
Apply bias after sigmoid rather than before
allow specifying num of layers to convert
Force push