Search for datasets via MapperHub

Weiheng Liao edited this page Dec 3, 2022 · 4 revisions

Congratulations, you've finally arrived at the search part. Bear with me a little longer! MapperHub is very easy to use, so let's get started!

First, I would like to introduce a few parameters:

  • config: The search config. Pass in the config information generated by QueryHub.
  • mappers: The mappers to use. Selecting a mapper is the same as choosing which public database to search.
  • task: Optional. The UUID of a search task, used to resume that task from its breakpoint. The UUID is available via MapperHub.config after the task has started.

Start by passing in the config information from QueryHub. Here we choose two mappers (PrideMapper and IProXMapper) for searching. The mapper classes are stored in the mapper module. (Please note that each database must be reachable from the machine running Patpat; see the Q&A for details.)

Also, leave task as None, since this is the first time the search task is run.

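from patpat import hub, mapper  # assumes Patpat is importable as the "patpat" package

m = hub.MapperHub(config=q.get_query_config(),
                  mappers=[mapper.PrideMapper(),
                           mapper.IProXMapper(),
                           ],
                  task=None  # first run, so there is no task to resume
                  # task=[your task's uuid]
                  )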

Here, q is the QueryHub instance whose config has already been completed. I assume you've already seen how to build a search config via QueryHub; if not, go back and take another look!

As you can see, the mappers parameter makes the whole search process pluggable. Finally, let's get searching!
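To make the "pluggable" idea concrete, here is a minimal toy sketch of the pattern. It is not Patpat's actual implementation: ToyHub, FakePrideMapper, FakeIProXMapper, the search method, and the "keyword" config key are all hypothetical names invented for illustration. The only point is that the hub loops over whatever mapper objects it is given, so adding a database means adding one more object to the list.

```python
class FakePrideMapper:
    """Hypothetical stand-in for a mapper; it does not contact any database."""
    name = "PRIDE"

    def search(self, config):
        # A real mapper would query its public database here.
        return [{"title": "PRIDE dataset for " + config["keyword"]}]


class FakeIProXMapper:
    """Second hypothetical mapper with the same interface."""
    name = "iProX"

    def search(self, config):
        return [{"title": "iProX dataset for " + config["keyword"]}]


class ToyHub:
    """Toy hub: runs every mapper it was given against the same config."""

    def __init__(self, config, mappers):
        self.config = config
        self.mappers = list(mappers)
        self.results = {}

    def mapping(self):
        # The hub never needs to know which databases exist; it only
        # relies on each mapper exposing `name` and `search`.
        for mapper_ in self.mappers:
            self.results[mapper_.name] = mapper_.search(self.config)

    def export(self):
        # Return a copy so callers cannot mutate the hub's state.
        return dict(self.results)


if __name__ == "__main__":
    toy = ToyHub({"keyword": "example"}, [FakePrideMapper(), FakeIProXMapper()])
    toy.mapping()
    print(toy.export())
```

Dropping FakeIProXMapper from the list, or adding a third class with the same two members, changes which databases are searched without touching ToyHub.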

m.mapping()

When you have finished your search, do not forget to export the results.

result = m.export()

In addition to the returned result, the search writes two files:

  • .json: Contains the complete metadata of the dataset; how much detail it holds depends on the public database. However, four keys are always present in each result:
    • protein: Protein-level search results for this dataset
    • peptides: Peptide-level search results for this dataset
    • summary: Summary for this dataset
    • website: Website for this dataset
  • .tsv: Tabular information that can be viewed in MS Excel, with the following columns:
    • title: Dataset title
    • summary: Dataset summary
    • website: Dataset website

They are stored in patpat_envs/result/<task_uuid>.
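If you would rather inspect these files from Python than from Excel, the following is a minimal sketch using only the standard library. It assumes nothing about the file names beyond the .json and .tsv extensions mentioned above, so it simply loads every matching file in the task directory; load_task_results is a hypothetical helper, not part of Patpat.

```python
import csv
import json
from pathlib import Path


def load_task_results(task_dir):
    """Load every .json and .tsv file found in one task's result directory.

    Returns (metadata, tables): two dicts keyed by file name. JSON files
    are parsed as-is; TSV files become lists of row dicts keyed by header.
    """
    task_dir = Path(task_dir)

    metadata = {}
    for path in sorted(task_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            metadata[path.name] = json.load(fh)

    tables = {}
    for path in sorted(task_dir.glob("*.tsv")):
        # newline="" lets the csv module handle line endings itself.
        with path.open(encoding="utf-8", newline="") as fh:
            tables[path.name] = list(csv.DictReader(fh, delimiter="\t"))

    return metadata, tables
```

Call it as load_task_results("patpat_envs/result/<task_uuid>"), replacing <task_uuid> with the UUID of your own task.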

Have other databases of interest? Want to develop your own search process? Come to Tutorial: Extending Patpat to learn about Patpat's interface!