Merge pull request #363 from microbiomedata/issue-352-proc-inst-enum
processing institution code and data
turbomam authored Jul 25, 2022
2 parents 614426f + 9f5385b commit 3760cf2
Showing 6 changed files with 3,455 additions and 0 deletions.
22 changes: 22 additions & 0 deletions src/schema/nmdc.yaml
@@ -713,6 +713,28 @@ enums:

```diff
         meaning: OBI:0000103
     comments:
       - credit enums come from https://casrai.org/credit/
+  processing_institution_enum:
+    name: processing_institution_enum
+    comments:
+      - This will become the range of processing_institution.omics processing
+      - use ROR meanings like https://ror.org/0168r3w48 for UCSD
+    from_schema: NMDC_enums_roundtrip
+    permissible_values:
+      UCSD:
+        text: UCSD
+        title: University of California, San Diego
+        meaning: https://ror.org/0168r3w48
+      JGI:
+        text: JGI
+        title: Joint Genome Institute
+        meaning: https://ror.org/04xm1d337
+      EMSL:
+        text: EMSL
+        title: Environmental Molecular Sciences Laboratory
+        meaning: https://ror.org/04rc0xn13
+        comments:
+          - replaces Environmental Molecular Science Laboratory
+          - replaces Environmental Molecular Sciences Lab
 
 slots:
   ess dive datasets:
```
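The new enum ties each institution code to a display title and a ROR identifier, and its EMSL entry lists two older spellings that the code `EMSL` replaces. Below is a minimal sketch of how a loader might use this to normalize free-text `processing_institution` values, such as those exported by the script in this commit. It assumes the values are copied by hand from the YAML rather than read from `nmdc.yaml`. The names `PROCESSING_INSTITUTIONS`, `LEGACY_NAMES`, `normalize_institution` and `ror_for` are hypothetical and not part of the NMDC schema tooling.

```python
from typing import Optional

# Hypothetical mirror of processing_institution_enum, copied by hand from the
# YAML above: enum code -> (title, ROR meaning).
PROCESSING_INSTITUTIONS = {
    "UCSD": ("University of California, San Diego", "https://ror.org/0168r3w48"),
    "JGI": ("Joint Genome Institute", "https://ror.org/04xm1d337"),
    "EMSL": ("Environmental Molecular Sciences Laboratory", "https://ror.org/04rc0xn13"),
}

# Older spellings taken from the EMSL "replaces" comments.
LEGACY_NAMES = {
    "Environmental Molecular Science Laboratory": "EMSL",
    "Environmental Molecular Sciences Lab": "EMSL",
}


def normalize_institution(raw: str) -> Optional[str]:
    """Map a free-text institution to an enum code, or None if unrecognized."""
    value = raw.strip()
    # Already a permissible value such as "JGI".
    if value in PROCESSING_INSTITUTIONS:
        return value
    # Exact match on the enum's title.
    for code, (title, _ror) in PROCESSING_INSTITUTIONS.items():
        if value == title:
            return code
    # Fall back to the legacy spellings; None when nothing matches.
    return LEGACY_NAMES.get(value)


def ror_for(code: str) -> str:
    """Return the ROR IRI used as the enum value's meaning."""
    return PROCESSING_INSTITUTIONS[code][1]
```

Values that come back as `None` are the ones that would need a new permissible value or a manual decision before `processing_institution` can use the enum as its range.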
43 changes: 43 additions & 0 deletions util/fetch_omics_processing_set.py
@@ -0,0 +1,43 @@
```python
import csv
import pprint

from pymongo import MongoClient

client = MongoClient(
    "mongodb://<user>:<password>@localhost:27027/?authSource=admin&readPreference=primary&directConnection=true&ssl=false"
)

id_inst_file = "../target/omics_processing_id_inst.tsv"

# on the verbose side for just retrieving a collection
result_filter = {}
result = client["nmdc"]["omics_processing_set"].find(filter=result_filter)

# print(type(result))
# <class 'pymongo.cursor.Cursor'>

# ['GOLD_sequencing_project_identifiers',
# '_id',
# 'add_date',
# 'has_input',
# 'has_output',
# 'id',
# 'mod_date',
# 'name',
# 'ncbi_project_name',
# 'omics_type',
# 'part_of',
# 'principal_investigator',
# 'processing_institution',
# 'type']

id_inst = []
for i in result:
    id_inst.append(
        {"id": i["id"], "processing_institution": i["processing_institution"]}
    )

with open(id_inst_file, 'w') as f:
    csv_writer = csv.DictWriter(f, list(id_inst[0].keys()), delimiter="\t")
    csv_writer.writeheader()
    csv_writer.writerows(id_inst)
```
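The script above embeds the connection string, including credentials, in the source. It fetches every field of every document even though only two are kept. It also fails with `IndexError` on an empty collection (`id_inst[0]`) and with `KeyError` on any document lacking `processing_institution`. Below is a minimal sketch of the same export under different assumptions. The connection string comes from an environment variable (`MONGO_URI`, a hypothetical name), and a projection limits the returned fields. Missing values become empty cells, and the file is opened with `newline=""` as the `csv` module documentation recommends. `write_id_inst` and `FIELDS` are illustrative names, not part of the repository.

```python
import csv
import os
from typing import Iterable, Mapping

# Columns written to the TSV, matching the fields the original script keeps.
FIELDS = ["id", "processing_institution"]


def write_id_inst(docs: Iterable[Mapping], path: str) -> int:
    """Write id/processing_institution pairs to a TSV and return the row count.

    Documents without processing_institution get an empty cell instead of
    raising KeyError, and the header is written even when there are no rows.
    """
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for doc in docs:
            writer.writerow({field: doc.get(field, "") for field in FIELDS})
            count += 1
    return count


def main() -> None:
    # Imported here so write_id_inst can be reused without pymongo installed.
    from pymongo import MongoClient

    # Hypothetical environment variable; keeps credentials out of the source.
    client = MongoClient(os.environ["MONGO_URI"])
    # Projection: ask the server only for the two exported fields.
    cursor = client["nmdc"]["omics_processing_set"].find(
        {}, {"_id": 0, "id": 1, "processing_institution": 1}
    )
    n = write_id_inst(cursor, "../target/omics_processing_id_inst.tsv")
    print(f"wrote {n} rows")


if __name__ == "__main__":
    main()
```

Because `write_id_inst` accepts any iterable of mappings, it can be exercised with a plain list of dicts, without a running MongoDB instance. The output path is still relative, so the script assumes it is run from the `util` directory, as the original does.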