
Running PIA via KNIME

Julian Uszkoreit edited this page Jan 27, 2016 · 30 revisions

Running PIA via KNIME is less comfortable than using the web interface, but it has advantages when you need to apply the same steps to several PIA compilations or want to rerun a workflow once it has been set up. The downside is that you need to run KNIME and PIA on a desktop computer or workstation instead of a (more powerful) server.

PIA is tested to run with at least KNIME 3.1.0.

Setup

To run PIA in KNIME, the easiest way is to use the repository. Currently, the PIA nodes are in KNIME's trunk repository (http://update.knime.org/community-contributions/trunk).

To install PIA, open "Install New Software" in the Help menu. Click on "available software sites" (directly below the "Add..." button) and then on the "Add..." button in the "Available Software Sites" window. Give the repository a name like "KNIME trunk" and use the URL http://update.knime.org/community-contributions/trunk. When everything is entered, click OK and double-check that the newly created repository is enabled (checked). Click OK in the "Available Software Sites" window and select the trunk repository (or alternatively "All available sites") in the select box. The PIA nodes can be found in the "Bioinformatics & NGS" group or simply by searching for them. Select them, click Next, accept the license, and restart KNIME after the installation has finished.

If all went well, you will see the PIA octopus on the splash screen of KNIME and you will find the PIA nodes under "Community Nodes".

Compiling search results into a PIA-XML file

To compile search results into a PIA-XML file, use the PIACompiler node (found in PIA/general). If your searches are already done, the node needs a GenericKnimeNodes/IO InputFile(s) node as input and either an OutputFile or OutputFolder node as output, as shown in the picture below, or direct pipelining into another node.

Be aware that this node may need a large amount of RAM, depending on the input files. Set the vmsettings parameter accordingly!

PIACompiler with files from prior searches

You can also connect the output of searches performed by OpenMS. To do so, use the output of PeptideIndexer nodes as the input to the PIACompiler node. The following picture shows such a workflow, which performs an X!Tandem and a Mascot search and combines the results into one PIA-XML file.

PIACompiler with files from OpenMS workflow

As a side note, you can also merge previously performed searches with searches from an OpenMS workflow.

Creating a pipeline

In this step a parameters XML file is created, which is later applied to the PIA compilation. The pipeline always has to start with a PIAGeneratePipelineXML node, which initializes the XML file and gives the pipeline a name. Afterwards you can add filters or execution points for other actions to the pipeline. The nodes are executed in the order given in the pipeline. This is especially important when, for example, setting a decoy pattern and executing the FDR calculation, which should come afterwards. The pipeline itself does not export any data (the PIAExport node is needed for this); it only manipulates the data before the actual export. Note that the pipeline creation in KNIME, and thus all nodes except PIACompiler and PIAExport, only create or extend an XML file with parameters; the execution and the export are handled by the PIAExport node.
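The ordering rule can be illustrated with a minimal Python sketch. It is not part of PIA or KNIME: the step labels below are hypothetical placeholders rather than actual PIA node names, and the function only checks that a decoy pattern step appears before the FDR calculation in an ordered list of steps.

```python
from typing import List

# Hypothetical step labels for illustration only; these are NOT PIA node names.
DECOY_STEP = "set_decoy_pattern"
FDR_STEP = "calculate_fdr"


def runs_before(steps: List[str], first: str, then: str) -> bool:
    """Return True if `then` is absent, or if `first` occurs before the first `then`."""
    if then not in steps:
        return True
    # Look only at the steps that precede the first occurrence of `then`.
    return first in steps[: steps.index(then)]


# Two orderings of the same (hypothetical) pipeline steps.
good = ["init_pipeline", DECOY_STEP, FDR_STEP, "add_filter"]
bad = ["init_pipeline", FDR_STEP, DECOY_STEP, "add_filter"]

print(runs_before(good, DECOY_STEP, FDR_STEP))  # True
print(runs_before(bad, DECOY_STEP, FDR_STEP))   # False
```

In the KNIME workflow the same check corresponds to making sure the node that sets the decoy pattern is connected upstream of the node that triggers the FDR calculation.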

Click on the following link for an overview of PIA KNIME nodes.

A common pipeline for protein inference is shown below.

KNIME pipeline for protein inference by PIA

Executing the pipeline and exporting data

The PIAExport node needs a PIA XML file as input, either a saved file or one coming directly from the PIACompiler, and a PIA parameters XML file. One node supports export at the three levels of the PIA analysis (PSMs, peptides and proteins). For each of these levels a format can be chosen (mzIdentML, mzTab or CSV), and the output of each level must be connected to either an Output File or Output Folder node to create the respective export (for most analyses only one export is used).

There can be only one export per level. If you need, for example, a separate export for each input file on the PSM level, you need to create a PIAExport node for each of them. If you don't want to export a specific level, leave its format empty.
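As a rough planning aid, the one-export-per-level rule can be sketched in Python. This is only an illustration under stated assumptions: the function name and the (level, format) request tuples are hypothetical and not part of PIA; the levels and formats are the ones named above. Since each node holds at most one export per level, the number of nodes needed equals the largest number of exports requested for any single level.

```python
from collections import Counter
from typing import Iterable, Tuple

# Levels and formats as described in the text above.
LEVELS = {"PSM", "peptide", "protein"}
FORMATS = {"mzIdentML", "mzTab", "CSV"}


def export_nodes_needed(requests: Iterable[Tuple[str, str]]) -> int:
    """Count the PIAExport nodes needed for the given (level, format) requests.

    Hypothetical helper: an empty format means "do not export this level".
    """
    per_level: Counter = Counter()
    for level, fmt in requests:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if fmt == "":
            continue  # empty format: this level is not exported
        if fmt not in FORMATS:
            raise ValueError(f"unknown format: {fmt}")
        per_level[level] += 1
    # One export per level and node, so the busiest level sets the node count.
    return max(per_level.values(), default=0)


# Example: a separate PSM export for each of three input files, plus one protein export.
requests = [("PSM", "CSV"), ("PSM", "CSV"), ("PSM", "CSV"), ("protein", "mzTab")]
print(export_nodes_needed(requests))  # 3
```

In this example the protein export can share a node with one of the PSM exports, so three PIAExport nodes are enough.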

Be aware that this node may need a large amount of RAM, depending on the input files. Set the vmsettings parameter accordingly!

The parameters of the export are explained in more detail below:

| parameter | explanation |
| --- | --- |
| format | choose the format of the export (mzIdentML, mzTab, CSV) |
| fileID | for the PSM and peptide level you can choose to export either all data (0) or only the data from one distinct input file (1, 2,...) |
| spectralCount | choosing this in the PSM level creates a single line for each PSM's accession (only in CSV export) |
| oneAccessionPerLine | similar to spectralCount, but on peptide and protein level exports |
| exportPSMs | also export single PSM information (some formats do this always) |
| exportPSMSets | also export PSM set information (some formats do this always) |
| exportPeptides | also export peptide information (some formats do this always) |