Running PIA via KNIME
Running PIA via KNIME is less convenient than using the web interface, but it has advantages when you need to apply the same steps to several PIA compilations or want to rerun a workflow once it has been set up. The downside is that you need to run KNIME and PIA on a desktop computer or workstation instead of a (more powerful) server.
PIA is tested to run with at least KNIME 3.1.0.
The easiest way to run PIA in KNIME is to install it from the update repository. Currently, the PIA nodes are in KNIME's trunk repository (http://update.knime.org/community-contributions/trunk).
To install PIA, open "Install New Software" in the Help menu. Click on "available software sites" (directly below the "Add..." button) and then on the "Add..." button in the "Available Software Sites" window. Give the repository a name such as "KNIME trunk" and enter the URL http://update.knime.org/community-contributions/trunk. Once everything is entered, click OK and double-check that the newly created repository is enabled (checked). Click OK in the "Available Software Sites" window and select the trunk repository (or alternatively "All available sites") in the select box. The PIA nodes can be found in the "Bioinformatics & NGS" group or simply by searching for them. Select them, click Next, accept the license, and restart KNIME after the installation has finished.
If all went well, you will see the PIA octopus on the splash screen of KNIME and you will find the PIA nodes under "Community Nodes".
To compile search results into a PIA-XML file, simply use the `PIACompiler` node (found in PIA/general). If your searches are already done, the node needs a GenericKnimeNodes/IO `InputFile(s)` node for input and, for output, either an `OutputFile` or `OutputFolder` node, as shown in the picture below, or a direct connection to another node.
Be aware that this node can require a large amount of RAM, depending on the input files. Set the `vmsettings` parameter accordingly!
You can also connect the output of searches performed by OpenMS. To do this, use the output of `PeptideIndexer` nodes as the input to the `PIACompiler` node. The following picture shows such a workflow, performing an X!Tandem and a Mascot search and combining the results into one PIA-XML file.
As a side note, you can also merge previously performed searches with searches from an OpenMS workflow.
In this step, a parameters XML file is created, which is later applied to the PIA compilation. The pipeline always has to start with a `PIAGeneratePipelineXML` node, which initializes the XML file and gives the pipeline a name. Afterwards you can add filters or the execution points for other actions to the pipeline. The nodes are executed in the order given in the pipeline. This is especially important when, for example, setting a decoy pattern and executing the FDR calculation, which should come afterwards. The pipeline execution does not export any data (the `PIAExport` node is needed for this); it only manipulates the data before it is exported. Note that the pipeline creation in KNIME, and thus every node except `PIACompiler` and `PIAExport`, only creates or extends an XML file with parameters. The execution and the export are both handled by the `PIAExport` node.
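The importance of node order can be illustrated with a small Python sketch that builds an ordered parameter XML. This is not the real PIA parameter schema: the element names (`pipeline`, `step`, `param`), the step names, and the decoy pattern are all hypothetical and only serve to show that steps are stored, and later executed, in the order they were added.

```python
import xml.etree.ElementTree as ET


def new_pipeline(name):
    # Hypothetical root element; mirrors what PIAGeneratePipelineXML does
    # conceptually (initialize the file and name the pipeline).
    return ET.Element("pipeline", {"name": name})


def add_step(pipeline, step, **params):
    # Append a step at the end, so document order equals execution order.
    node = ET.SubElement(pipeline, "step", {"name": step})
    for key, value in params.items():
        ET.SubElement(node, "param", {"name": key, "value": str(value)})
    return pipeline


def step_order(pipeline):
    # Return the step names in the order they will be executed.
    return [s.get("name") for s in pipeline.findall("step")]


pipeline = new_pipeline("protein_inference")
# The decoy pattern must be set before the FDR is calculated.
add_step(pipeline, "SetDecoyPattern", pattern="^DECOY_.*")  # hypothetical step
add_step(pipeline, "CalculateFDR")                          # hypothetical step
add_step(pipeline, "AddFilter", field="psm_fdr", threshold=0.01)

order = step_order(pipeline)
if order.index("SetDecoyPattern") > order.index("CalculateFDR"):
    raise ValueError("decoy pattern must be set before the FDR calculation")

print(ET.tostring(pipeline, encoding="unicode"))
```

In KNIME itself, the same ordering is expressed by how the pipeline nodes are chained between `PIAGeneratePipelineXML` and `PIAExport`.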
Click on the following link for an overview of the PIA KNIME nodes.
A common pipeline for protein inference is shown in the following.
The `PIAExport` node needs as input a PIA XML file, either a saved file or one passed directly from the `PIACompiler`, and a PIA parameters XML file. One node supports export on the three levels of the PIA analysis (PSMs, peptides and proteins). For each of these levels a format can be chosen (mzIdentML, mzTab or CSV), and the output of each level must be connected to either an `Output File` or `Output Folder` node to create the respective export (for most analyses only one export is used).
There can be only one export per level. If you need, for example, a separate export for each input file on the PSM level, you need to create a `PIAExport` node for each of them. If you don't want to export a specific level, leave its format empty.
Be aware that this node can require a large amount of RAM, depending on the input files. Set the `vmsettings` parameter accordingly!
The parameters of the export are explained in more detail in the following:
parameter | explanation |
---|---|
format | choose the format of the export (mzIdentML, mzTab, CSV) |
fileID | for the PSM and peptide level you can choose to export either all data (0) or only the data from one distinct input file (1, 2,...) |
spectralCount | choosing this in the PSM level creates a single line for each PSM's accession (only in CSV export) |
oneAccessionPerLine | similar to spectralCount, but on peptide and protein level exports |
exportPSMs | also export single PSM information (some formats do this always) |
exportPSMSets | also export PSM set information (some formats do this always) |
exportPeptides | also export peptide information (some formats do this always) |
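
As a rough illustration of how `format` and `fileID` interact, the following Python sketch models the settings of one export level and plans one PSM-level export per input file. The class and function names are hypothetical and not part of PIA or KNIME. The rule that a non-zero `fileID` is rejected on the protein level is an assumption derived from the table, which mentions `fileID` only for the PSM and peptide levels.

```python
from dataclasses import dataclass
from typing import List

LEVELS = ("psm", "peptide", "protein")
# An empty format means that the level is not exported.
FORMATS = ("", "mzIdentML", "mzTab", "CSV")


@dataclass
class LevelExport:
    """Hypothetical model of the export settings of one level."""
    level: str
    format: str = ""
    file_id: int = 0  # 0 = all data; 1, 2, ... = one distinct input file

    def validate(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        if self.format not in FORMATS:
            raise ValueError(f"unknown format: {self.format}")
        if self.file_id < 0:
            raise ValueError("fileID must be 0 (all) or a positive file number")
        # Assumption: fileID is only meaningful on PSM and peptide level.
        if self.level == "protein" and self.file_id != 0:
            raise ValueError("fileID applies to PSM and peptide level only")


def per_file_psm_exports(n_files: int, fmt: str = "CSV") -> List[LevelExport]:
    """Plan one PSM-level export per input file.

    Since only one export per level is possible, each entry corresponds
    to its own PIAExport node in the workflow.
    """
    exports = [LevelExport("psm", fmt, file_id=i) for i in range(1, n_files + 1)]
    for export in exports:
        export.validate()
    return exports


for export in per_file_psm_exports(3):
    print(export)
```

With three input files, this yields three settings objects with `file_id` 1, 2 and 3, matching the three `PIAExport` nodes that would be needed in the workflow.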