Can I search without going through the Hub

Weiheng Liao edited this page Nov 29, 2022 · 1 revision

Of course you CAN!~

1 Hub module: This module provides users with a one-stop shop for using patpat_env classes

This is the core module of patpat_env; its classes let users combine the other patpat_env classes. The init function (called as utility.init() in the example below) sets up the runtime environment that patpat_env requires. The QueryHub class queries protein metadata and generates the peptides to be retrieved, while the MapperHub class provides breakpoints for merging and retrieving results from multiple proteomics databases. Both classes are pluggable, so it is easy to insert self-built methods or classes into them, as long as the developer adheres to the interface design.

Typical usage example:

from patpat import hub
from patpat import mapper
from patpat import utility

utility.init()

identifier_ = 'P05067'

q = hub.QueryHub()
q.identifier = identifier_
q.simple_query()

m = hub.MapperHub(config=q.get_query_config(),
                  mappers=[mapper.PrideMapper(), mapper.IProXMapper()],
                  task=None
                  # task=[your task's uuid]
                  )

m.mapping()
result_ = m.export()

The usage of the Hub module shown above is the same as the one described in Quick Start. If anything is unclear, please go back and take a look~
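
The Hub description above says that its classes are pluggable. As a rough, self-contained sketch of what plugging in a self-built class could look like, the toy below does not import patpat at all. `MapperLike`, `EchoMapper` and `run_mappers` are hypothetical names, and the three-method interface (mapping, filtering, export) is only inferred from the calls shown on this page. The real interface may require more, so please check the source before relying on it.

```python
from typing import Protocol


class MapperLike(Protocol):
    """Hypothetical minimal interface, inferred from the calls shown on this page."""

    def mapping(self) -> None: ...
    def filtering(self) -> None: ...
    def export(self) -> dict: ...


class EchoMapper:
    """A toy self-built mapper; it never contacts a database."""

    def __init__(self, name: str, peptides: list[str]) -> None:
        self.name = name
        self.peptides = peptides
        self._hits: dict[str, list[str]] = {}

    def mapping(self) -> None:
        # Pretend every peptide maps to exactly one made-up project ID.
        self._hits = {p: [f"{self.name}-{i}"] for i, p in enumerate(self.peptides)}

    def filtering(self) -> None:
        # Drop very short peptides as a stand-in for "incorrect mappings".
        self._hits = {p: ids for p, ids in self._hits.items() if len(p) >= 7}

    def export(self) -> dict:
        return {self.name: self._hits}


def run_mappers(mappers: list[MapperLike]) -> dict:
    """Run each plugged-in mapper in turn and merge what it exports."""
    merged: dict = {}
    for m in mappers:
        m.mapping()
        m.filtering()
        merged.update(m.export())
    return merged


result = run_mappers([EchoMapper("toyA", ["TCVADESAENCDK", "AK"]),
                      EchoMapper("toyB", ["LLLLLLLK"])])
print(result)
```

Because `run_mappers` only relies on the three method names, any object that offers them can be dropped into the list without subclassing anything.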

2 Retriever module: This module contains classes for interacting with external proteomics databases

The classes in this module use three levels of inheritance. (1) Every subclass inherits from the Retriever base class. (2) Each target database gets its own base class. (3) The subclass that collects a specific kind of data from a database inherits from that database's base class. For example, the base class for the PRIDE database is named GenericPrideRetriever (2), which inherits from the Retriever base class (1), and the subclass that collects protein information from PRIDE is named PrideProteinRetriever (3).
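
To make the three levels concrete, here is a minimal stand-alone sketch of the same layering. It is not patpat's code: the class names `BaseRetriever`, `GenericExampleDbRetriever` and `ExampleDbProteinRetriever`, the `build_url` helper and the URL are all made up for illustration, and only the attribute names (request_word, payloads, response, retrieve) are borrowed from the usage examples on this page.

```python
from abc import ABC, abstractmethod


class BaseRetriever(ABC):
    """Level 1: plays the role of the Retriever base class."""

    def __init__(self) -> None:
        self.request_word = ""  # what to search for
        self.payloads = {}      # query parameters
        self.response = None    # filled in by retrieve()

    @abstractmethod
    def retrieve(self) -> None:
        """Fetch data and store it in self.response."""


class GenericExampleDbRetriever(BaseRetriever):
    """Level 2: one base class per database (compare GenericPrideRetriever)."""

    base_url = "https://db.example.org/api"  # placeholder, not a real endpoint

    def build_url(self, endpoint: str) -> str:
        return f"{self.base_url}/{endpoint}"


class ExampleDbProteinRetriever(GenericExampleDbRetriever):
    """Level 3: collects one kind of data (compare PrideProteinRetriever)."""

    def retrieve(self) -> None:
        # A real subclass would send an HTTP request here; this toy only
        # records what it would have asked for.
        self.response = {
            "url": self.build_url("proteins"),
            "params": {"accession": self.request_word, **self.payloads},
        }


r = ExampleDbProteinRetriever()
r.request_word = "Q9CWY9"
r.payloads["pageSize"] = 200
r.retrieve()
print(r.response)
```

In this sketch the database-wide details live in level 2, so adding another data type for the same database only needs a new level 3 class.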

Typical usage example:

2.1 Search for matching PSMs in the PRIDE database by protein identifier

from patpat import logger
from patpat import retriever

# TestLogger is for testing: it prints log output to the console when a module is used on its own
l = logger.TestLogger()

r = retriever.PrideProteinRetriever()
r.request_word = "Q9CWY9"
r.get_payloads_on_web()
r.payloads["pageSize"] = 200  # Optional: The parameters can be modified
r.retrieve()

output = r.response

2.2 Search for matching PSMs in the PRIDE database by peptide sequence

from patpat import logger
from patpat import retriever

# TestLogger is for testing: it prints log output to the console when a module is used on its own
l = logger.TestLogger()

r = retriever.PridePeptideRetriever()
r.request_word = 'TCVADESAENCDK'
r.get_payloads_on_web()
r.retrieve()

output = r.response

2.3 Get metadata from the iProX database by dataset number

from patpat import retriever

r = retriever.IProXProjectRetriever()
r.request_word = 'PXD006512'
r.retrieve()

output = r.response  

It is easy to see that there are two keys to automatic information acquisition.

  • Getting the peptide sequences to be queried and the dataset numbers (Querier module)
  • Calling the retriever class repeatedly (Mapper module)
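
The second point, calling a retriever once per peptide, can be sketched as a plain loop. The snippet is self-contained and makes no network requests: `FakePeptideRetriever` and `retrieve_all` are hypothetical stand-ins that only mimic the request_word / retrieve() / response pattern used in the examples above, and the "ends in K" rule is invented so that the toy has something to filter.

```python
from typing import Callable


class FakePeptideRetriever:
    """Stand-in that mimics the attribute names used above; no network access."""

    def __init__(self) -> None:
        self.request_word = ""
        self.response = None

    def retrieve(self) -> None:
        # Invented rule: pretend only peptides ending in K have matches.
        if self.request_word.endswith("K"):
            self.response = [f"PSM for {self.request_word}"]
        else:
            self.response = []


def retrieve_all(
    peptides: list[str],
    make_retriever: Callable[[], FakePeptideRetriever],
) -> dict[str, list[str]]:
    """Call a fresh retriever once per peptide and keep the non-empty responses."""
    results = {}
    for peptide in peptides:
        r = make_retriever()
        r.request_word = peptide
        r.retrieve()
        if r.response:
            results[peptide] = r.response
    return results


found = retrieve_all(["TCVADESAENCDK", "AEFVEVTR"], FakePeptideRetriever)
print(found)
```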

Next, let's see how we should use the Mapper module!

3 Mapper module: This module calls the retriever module to implement retrieval

This module provides one class per target database, and each class calls the corresponding retrievers to carry out the retrieval itself. The protein/peptide retriever performs the protein- or peptide-level search, and the project retriever performs the protein-to-project mapping. In addition, each class needs to implement two other functions:

  • a filtering function to remove incorrect mappings;
  • an export function to structure the retrieval results for downstream modules to use.
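
As an illustration of these two extra functions, the sketch below filters a list of raw hits and then restructures them. Everything in it is an assumption made for the example: the hit records and project IDs are made up, the "same organism" rule is just one plausible way to spot an incorrect mapping, and `filter_hits` / `export_hits` are hypothetical names rather than patpat functions.

```python
# Made-up raw hits, as a mapper might hold them after the retrieval step.
raw_hits = [
    {"peptide": "TCVADESAENCDK", "project": "PROJ-A", "organism": "Mus musculus"},
    {"peptide": "TCVADESAENCDK", "project": "PROJ-B", "organism": "Homo sapiens"},
    {"peptide": "AEFVEVTR", "project": "PROJ-A", "organism": "Mus musculus"},
]


def filter_hits(hits: list[dict], organism: str) -> list[dict]:
    """Keep only hits whose organism matches the queried protein's organism."""
    return [h for h in hits if h["organism"] == organism]


def export_hits(hits: list[dict]) -> dict[str, list[str]]:
    """Group the surviving peptides by project for downstream modules."""
    exported: dict[str, list[str]] = {}
    for h in hits:
        exported.setdefault(h["project"], []).append(h["peptide"])
    return exported


output = export_hits(filter_hits(raw_hits, "Mus musculus"))
print(output)
```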

Typical usage example:

from patpat import hub
from patpat import mapper
from patpat import utility

utility.init()
utility.initiate_uniprot_proteome_catalog()

identifier = 'E9PV96'
q = hub.QueryHub()
q.identifier = identifier
q.simple_query()

conf = q.get_query_config()

m = mapper.PrideMapper()

# Add configs
m._identifier = conf['identifier']
m._peptides = conf['peptides']
m._organism = conf['organism']

m.mapping()
m.filtering()

output = m.export()

4 Querier module: This module is used to build the configuration information needed for retrieval

The configuration information required to complete the mapping consists of the protein metadata and the peptide sequences to be retrieved. The necessary protein metadata are the protein identifier, the species or tissue to which the protein belongs, and the protein sequence. The protein sequence is used to generate the peptide sequences to be retrieved.

Typical usage example:

from patpat import querier

p1 = querier.UniProtProteinQuerier()
p1.identifier = 'E9PV96'
p1.query()
_, organism_, fasta_ = p1.get_properties()

p2 = querier.LocalPeptideQuerier()
p2.set_params(sequence=fasta_['sequence'],
              organism=organism_)
p2.query()
digestion_params_, source_, filtered_peptides_ = p2.get_properties()
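
Generating peptides from a protein sequence is usually done by in-silico digestion. The sketch below applies the common trypsin rule (cleave after K or R unless the next residue is P) to a short made-up sequence. It is not patpat's implementation: the function name `tryptic_digest` and its parameters and defaults are hypothetical, and the digestion settings that patpat actually uses are not shown on this page.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0,
                   min_length: int = 7, max_length: int = 30) -> list[str]:
    """Return tryptic peptides of an amino acid sequence (illustrative only)."""
    # Zero-width split after K or R, except when the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` following fragments onto fragment i.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


toy_sequence = "MKRPAAAK" + "TTTTTTTR"  # made-up sequence, not a real protein
strict = tryptic_digest(toy_sequence, min_length=6)
relaxed = tryptic_digest(toy_sequence, missed_cleavages=1, min_length=6)
print(strict)
print(relaxed)
```

The length limits matter because very short peptides tend to match many proteins, so a search on them says little about the protein of interest.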

5 Logger module

The CoreLogger class is used to generate logs and temporary files for a search. It is recommended for use with hub.py and viewer.py.

Typical usage example:

from patpat import logger

uuid = ''

l = logger.CoreLogger(uuid)
l.set_core()
l.set_tmp()

6 Checker module: This module is used to check the connectivity of external proteomics databases

Typical usage example:

from patpat import checker
from patpat import retriever

c = checker.Checker()
c.peptide_retrievers = [
    retriever.PridePeptideRetriever(),
    retriever.IProXPeptideRetriever()
]
c.check()

The end

Can't get enough? Start learning how to extend Patpat!