Raku/Perl6 resources for bioinformatics research by Tao Wang. This GitHub repository serves as an introduction to Raku/Perl6 for bioinformaticians and general programmers, with links to more advanced resources listed where appropriate.
I am a bioinformatics scientist and a Perl diehard. Visit my lab website for more bioinformatics resources: https://qbrc.swmed.edu/labs/wanglab/index.php
- Download and install raku: https://github.com/rakudo/rakudo/releases (see https://perl6.org/downloads/ for installation instructions)
- Depending on how raku is installed, you may need to manually install zef, the raku package manager: https://github.com/ugexe/zef
- Raku tutorial to get you started. I found this tutorial to be very good: https://perl6intro.com/
- Full Raku documentation: https://docs.perl6.org/
- More explanations of Raku grammar: https://perl6advent.wordpress.com/
- Your first raku experience. After you have installed raku, run this in the shell console, and you should get "hello bioinformatics!":
perl6 -e 'say "hello bioinformatics!"'
- Your first raku script. Create a script with the name test.p6 using your favorite text editor. This is its content:
#!/usr/bin/env perl6
my @array[3];
@array[0]="bioinformatics";
@array[1]="is";
@array[2]="awesome!";
say @array;
Execute by running it in the console
perl6 test.p6
And you should get "[bioinformatics is awesome!]"
- Your first raku bug? Debug it! Type the following in the console
perl6-debug-m test.p6
But the Raku IDE has much more professional debugging methods
This section benchmarks the performance of Raku on a simple and barely useful job: parsing a small SAM file. A Perl 5 script is used as a control. An evolving Raku script (with each version documented) is run against the most recent Raku release, and the time it takes for the script to finish is recorded. The purpose of this practice is to show how improvements in the Raku compiler and in coding style affect performance.
Script | Release | SAM | Time (real, s) | Time (user, s) | Time (sys, s) | Comment |
---|---|---|---|---|---|---|
benchmark.pl | v5.22.1 | test1.sam | 0.195 | 0.184 | 0.005 | |
benchmark1.p6 | 2019.03 | test1.sam | 9.975 | 10.454 | 0.153 | |
benchmark1.p6 | 2019.07.1 | test1.sam | 6.376 | 6.704 | 0.176 | |
benchmark2.p6 | 2019.07.1 | test1.sam | 1.888 | 5.698 | 0.178 | credit goes to lizmat! |
benchmark2.p6 | 2019.11 | test1.sam | 1.834 | 5.582 | 0.181 | |
benchmark2.p6 | 2020.01 | test1.sam | 1.756 | 6.455 | 0.262 | |
benchmark.pl | v5.22.1 | test2.sam | 1.935 | 1.889 | 0.042 | |
benchmark2.p6 | 2020.01 | test2.sam | 7.077 | 4.361 | 1.475 | |
benchmark2.p6 | 2020.02 | test2.sam | 6.657 | 1.724 | 1.018 | |
benchmark2.p6 | 2020.05 | test2.sam | 8.129 | 73.737 | 1.440 | |
benchmark2.p6 | 2020.06 | test2.sam | 8.707 | 76.272 | 1.763 | |
benchmark2.p6 | 2020.07 | test2.sam | 7.823 | 63.254 | 1.229 | |
benchmark2.p6 | 2020.08.1 | test2.sam | 8.603 | 71.453 | 1.831 | |
benchmark2.p6 | 2020.09 | test2.sam | 5.885 | 51.270 | 1.255 | |
benchmark2.p6 | 2020.10 | test2.sam | 6.268 | 56.761 | 1.861 | |
benchmark2.p6 | 2020.12 | test2.sam | 6.131 | 56.368 | 1.561 | |
benchmark2.p6 | 2021.02.1 | test2.sam | 6.509 | 56.179 | 1.359 | |
benchmark2.p6 | 2021.03 | test2.sam | 6.261 | 56.393 | 1.810 | |
benchmark2.p6 | 2021.04 | test2.sam | 6.212 | 53.935 | 1.664 | |
benchmark2.p6 | 2021.05 | test2.sam | 6.385 | 56.497 | 1.610 | |
benchmark2.p6 | 2021.06 | test2.sam | 6.736 | 67.202 | 1.742 | |
benchmark2.p6 | 2021.07 | test2.sam | 6.583 | 73.122 | 1.633 | |
benchmark2.p6 | 2021.12 | test2.sam | 5.388 | 47.567 | 1.312 | |
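
To give a feel for the kind of job being timed, here is a minimal sketch of SAM parsing in Raku. It is not the benchmark script from this repository: the sub name `count-by-rname` and the demo records are made up for illustration. It only relies on what the SAM format itself defines, namely that header lines start with `@` and that alignment lines have at least 11 tab-separated fields, the third of which is the reference name (RNAME).

```raku
# Hypothetical sketch, not benchmark1.p6 or benchmark2.p6:
# count alignment records per reference sequence name (RNAME).
sub count-by-rname(@lines) {
    my %count;
    for @lines -> $line {
        next if $line.starts-with('@');    # skip SAM header lines
        my @field = $line.split("\t");     # SAM records are tab-delimited
        next if @field.elems < 11;         # alignment lines have 11 mandatory fields
        %count{@field[2]}++;               # third field (index 2) is RNAME
    }
    return %count;
}

# One header line and two made-up alignment records, for illustration only.
my @demo =
    "\@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t20\t60\t4M\t*\t0\t0\tTTGA\tIIII";

say count-by-rname(@demo);
```

To try it on a real file, pass the lines of that file instead of the demo array, for example `count-by-rname('test1.sam'.IO.lines)`.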
Raku does have a built-in profiler for measuring code performance, but I find its output hard to understand:
perl6 --profile=profile.html benchmark.p6 --sam_file=test1.sam
Bioinformatics involves a lot of math, statistics, machine learning, data science ... Let's see how Raku does in terms of handling numbers and formulas.
- Raku's inherent number processing capabilities
Raku, compared to Perl 5, has incorporated some useful functionality to handle scientific computations in the core of the language. For example:
10.rand # random number generation
my $rat=<4/6> # representing rational numbers as exact fractions to preserve accuracy
5e3 # scientific notation
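
The snippets above can be tried together in one short script. This is only a sketch to show how the built-in number types behave; the variable names and the toy DNA sequence are made up for illustration.

```raku
# Rat literals are reduced and stored as an exact numerator/denominator pair.
my $rat = <4/6>;
say $rat.nude;                 # (2 3)

# Decimal literals are Rats too, so this comparison is exact.
say 0.1 + 0.2 == 0.3;          # True

# Dividing two integers also gives a Rat, e.g. the GC fraction of a toy sequence.
my $seq = "ACGTGC";
my $gc  = $seq.comb(/<[GC]>/).elems / $seq.chars;
say $gc.nude;                  # (2 3)

# Scientific notation produces a floating-point Num.
say 5e3.^name;                 # Num

# 10.rand returns a random Num that is always below 10.
say 10.rand < 10;              # True
```

The exact-fraction behavior is the main difference from Perl 5 here: in a language where `0.1` is a floating-point value, the second comparison would typically be false.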
We finally have an IDE for Raku: https://commaide.com/. Believe it or not, I worked with Perl using just Vim and not any IDE for 8 years...
You can help make Perl great again (or more exactly, make Perl6/Raku great) in so many ways!
- Contribute your own Perl6/Raku modules: https://modules.perl6.org/
- Basic instructions for creating Raku modules: https://docs.perl6.org/language/modules
- I have created a trivial Raku module that is nothing more than a skeleton, but it can serve as a good starting point for writing your own Raku module. Check the "/Make Perl great again/trivial" folder
- If you find bugs in Raku, you can report them here: https://github.com/rakudo/rakudo/issues
- Raku discussion forum, home to many passionate Raku folks: https://webchat.freenode.net/#raku
It seems that Raku itself has mostly completed construction, pending performance optimization. It needs to be roughly 10x faster to be really comparable with Perl. The more important thing is to build the ecosystem. As of now, many essential packages for bioinformatics applications are still missing in Raku. My wishlist for Raku and its packages for 2020:
- Speed up Raku
- Raku packages for math/statistics. Let's watch: https://github.com/pierre-vigier/Perl6-Math-Matrix
- Raku data visualization packages. Something like R ggplot or plotly
- Raku packages for handling image IO (JPG, PNG, etc). Any candidates yet? Important for deep learning
- Light-weight Raku packages for web development. Let's watch: https://github.com/Bailador/Bailador