Update environment on 2019-05-02 #45

Merged on May 7, 2019 (2 commits)

dhimmel commented May 2, 2019

Includes hetmatpy update to not load redundant (inverse) node pairs for symmetric metapaths.
Refs #43

dhimmel commented May 2, 2019

I will repopulate the database locally to test that the updated software works.

Error text:
TypeError: combinations_with_replacement() missing required argument 'r' (pos 2)
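
For context, this TypeError is what `itertools.combinations_with_replacement` raises when it is called without its second argument, `r`. The sketch below only illustrates that failure and one way unordered node pairs could be enumerated so that an inverse pair is not listed twice. The `nodes` list and the `unordered_pairs` helper are hypothetical and are not taken from hetmatpy.

```python
from itertools import combinations_with_replacement


def unordered_pairs(nodes):
    """Return each pair once, treating (a, b) and (b, a) as the same pair."""
    # r=2 is required: it is the number of items drawn per combination.
    return list(combinations_with_replacement(nodes, 2))


nodes = ["n0", "n1", "n2"]  # hypothetical node identifiers

try:
    combinations_with_replacement(nodes)  # omitting r reproduces the TypeError
except TypeError as error:
    message = str(error)

pairs = unordered_pairs(nodes)
```

With three nodes this yields six pairs (including self-pairs such as `("n0", "n0")`) rather than the nine ordered pairs a full cross product would give.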
dhimmel commented May 7, 2019

The local database import based on 865b8a3 is complete with the following times logged:

>>> python manage.py populate_database --max-metapath-length=3 --batch-size=25000
_download_hetionet_hetmat(self=<dj_hetmech_app.management.commands.populate_database.Command object at 0x7f2e70bb72e8>) ran in 0:00:00
_populate_metanode_table() ran in 0:00:00
_populate_node_table() ran in 0:00:07
_populate_metapath_table() ran in 0:00:00
_download_path_counts(length=1) ran in 0:00:01
_populate_degree_grouped_permutation_table(length=1) ran in 0:01:06
_download_path_counts(length=2) ran in 0:03:43
_populate_degree_grouped_permutation_table(length=2) ran in 0:09:49
_download_path_counts(length=3) ran in 0:45:29
_populate_degree_grouped_permutation_table(length=3) ran in 0:58:17
_populate_path_count_table() ran in 3 days, 7:31:21
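
To compare these timings across runs, the durations can be parsed back into `timedelta` objects. This is a minimal sketch that assumes every line ends in `ran in [N day(s), ]H:MM:SS`, as in the log above; `parse_duration` is a hypothetical helper, not part of the project.

```python
import re
from datetime import timedelta

# Matches e.g. "ran in 0:45:29" or "ran in 3 days, 7:31:21" at the end of a line.
DURATION = re.compile(r"ran in (?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})$")


def parse_duration(line):
    """Return the elapsed time at the end of a 'ran in ...' log line."""
    match = DURATION.search(line.strip())
    if match is None:
        raise ValueError(f"no duration found in: {line!r}")
    days, hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


log_lines = [
    "_download_path_counts(length=3) ran in 0:45:29",
    "_populate_degree_grouped_permutation_table(length=3) ran in 0:58:17",
    "_populate_path_count_table() ran in 3 days, 7:31:21",
]
total = sum((parse_duration(line) for line in log_lines), timedelta())
print(total)
```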

database_info output:
>>> python manage.py database_info
################################ Metanode Table ################################
11 rows

         identifier abbreviation  n_nodes
            Anatomy            A      402
 Biological Process           BP    11381
 Cellular Component           CC     1391
           Compound            C     1552
            Disease            D      137 

################################## Node Table ##################################
47,031 rows

 id         metanode_id  identifier identifier_type                                            name                                               data
  0                Gene      128239             int                                          IQGAP3  {'url': 'http://identifiers.org/ncbigene/12823...
  1         Side Effect    C1112256             str              Peripheral sensorimotor neuropathy  {'url': 'http://identifiers.org/umls/C1112256'...
  2  Biological Process  GO:0097343             str                            ripoptosome assembly  {'url': 'http://purl.obolibrary.org/obo/GO_009...
  3  Molecular Function  GO:0003884             str                   D-amino-acid oxidase activity  {'url': 'http://purl.obolibrary.org/obo/GO_000...
  4  Biological Process  GO:0045833             str  negative regulation of lipid metabolic process  {'url': 'http://purl.obolibrary.org/obo/GO_004... 

################################ Metapath Table ################################
2,205 rows

abbreviation                                  name           source_id target_id  length  path_count_density  path_count_mean  path_count_max  dwpc_raw_mean  n_similar  p_threshold
         AlD             Anatomy–localizes–Disease             Anatomy   Disease       1            0.065403         0.065403               1       0.003746          1          1.0
         AdG            Anatomy–downregulates–Gene             Anatomy      Gene       1            0.012143         0.012143               1       0.000078          3          1.0
         AeG                Anatomy–expresses–Gene             Anatomy      Gene       1            0.062520         0.062520               1       0.000141          3          1.0
         AuG              Anatomy–upregulates–Gene             Anatomy      Gene       1            0.011621         0.011621               1       0.000083          3          1.0
        BPpG  Biological Process–participates–Gene  Biological Process      Gene       1            0.002347         0.002347               1       0.000031          1          1.0 

######################## DegreeGroupedPermutation Table ########################
37,905,389 rows

 id metapath_id  source_degree  target_degree    n_dwpcs  n_nonzero_dwpcs  nonzero_mean  nonzero_sd
  1         AdG              0              0  428073600                0           NaN         NaN
  2         AdG              0              1  115216800                0           NaN         NaN
  3         AdG              0              2  114558000                0           NaN         NaN
  4         AdG              0              3  104676000                0           NaN         NaN
  5         AdG              0              4  103212000                0           NaN         NaN 

############################### PathCount Table ################################
166,199,174 rows

 id metapath_id  source_id  target_id    dgp_id  path_count      dwpc   p_value
  1     CbGdDpS      40959       3076  21798466           1  4.168708  0.003379
  2     CbGdDpS      41016      26706  21798711           1  4.462495  0.003627
  3     CbGdDpS      41016      44771  21798711           1  4.248844  0.004612
  4     CbGdDpS      26947      38200  21798684           2  5.053734  0.003806
  5     CbGdDpS      26947      18692  21798676           1  4.478117  0.003615 

2,192 completed metapaths of 2,205 total metapaths

Previously, the PathCount table had 174,986,768 rows. Now it has 166,199,174 rows. The size went from 66 GB to 63 GB.
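
As a quick check on the row counts quoted above, the reduction can be computed directly. The two counts are copied from this comment; everything else is plain arithmetic.

```python
previous_rows = 174_986_768  # PathCount rows before this update
current_rows = 166_199_174   # PathCount rows after this update

removed_rows = previous_rows - current_rows
percent_removed = 100 * removed_rows / previous_rows

print(f"{removed_rows:,} rows removed ({percent_removed:.2f}%)")
```

That is 8,787,594 fewer rows, or roughly 5% of the previous table.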

I created an export of the database using:

docker exec dj_hetmech_db \
  pg_dump \
  --host=localhost --username=dj_hetmech --dbname=dj_hetmech \
  --create --clean \
  --compress=8 \
  > hetmech-pg_dump.sql.gz

The documentation for --clean is:

Output commands to clean (drop) database objects prior to outputting the commands for creating them. (Unless --if-exists is also specified, restore might generate some harmless error messages, if any objects were not present in the destination database.)

hetmech-pg_dump.sql.gz is now 5.5 GB with the following sha1sum:

71a9801330fe4204870073fd6b7852421059d5fb  hetmech-pg_dump.sql.gz
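
Anyone downloading the export can compare it against this checksum before loading it. The sketch below computes the SHA-1 in chunks so that the 5.5 GB file is never read into memory at once. `sha1_of_file` and `is_intact` are hypothetical helper names, and the file is assumed to sit in the current directory.

```python
import hashlib

EXPECTED = "71a9801330fe4204870073fd6b7852421059d5fb"  # checksum quoted above


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED):
    """Return True when the file's SHA-1 matches the expected checksum."""
    return sha1_of_file(path) == expected
```

A call such as `is_intact("hetmech-pg_dump.sql.gz")` should return True for an uncorrupted copy.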

dhimmel commented May 7, 2019

@dongbohu I talked with @vincerubinetti and he'll let us know when he is okay with some database downtime. At that point, I plan to merge this PR and reload the database using:

zcat hetmech-pg_dump.sql.gz | psql --user=dj_hetmech --dbname=dj_hetmech --host=HOST

Based on the documentation of --clean, I think the SQL commands will take care of deleting the tables that already exist.


Update: I ran the command above and it finished quickly with error messages like:

ERROR:  relation "dj_hetmech_app_pathcount_source_id_f99ba31a" already exists

The duplicate pathcounts rows in #43 persist. Therefore, I think I need to drop the database using the procedure in #16 (comment) before running this command.

dhimmel commented May 7, 2019

I am now running:

zcat hetmech-pg_dump.sql.gz | psql --user=dj_hetmech --dbname=dj_hetmech --host=HOST > psql-load-pg_dump-log.txt

This seems to be working, but I got the messages:

ERROR:  cannot drop the currently open database
ERROR:  database "dj_hetmech" already exists

These errors are probably because the procedure in #16 (comment) recreates the database.
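
A possible alternative, not tested in this thread: because the dump was made with --create and --clean, the script itself issues DROP DATABASE and CREATE DATABASE, and PostgreSQL refuses to drop the database that the session is connected to. Connecting psql to a different database lets the script drop and recreate dj_hetmech itself. The sketch below only builds the command string. It assumes a `postgres` maintenance database exists on the host and that the dj_hetmech role is allowed to drop and create databases. `build_restore_command` is a hypothetical helper and HOST remains a placeholder.

```python
import shlex


def build_restore_command(dump_path, host, user="dj_hetmech", connect_db="postgres"):
    """Return a shell pipeline that replays a gzip-compressed pg_dump script with psql."""
    psql_args = [
        "psql",
        f"--username={user}",
        # Connect to a database other than the one the script drops and recreates.
        f"--dbname={connect_db}",
        f"--host={host}",
    ]
    return f"zcat {shlex.quote(dump_path)} | {shlex.join(psql_args)}"


command = build_restore_command("hetmech-pg_dump.sql.gz", host="HOST")
print(command)
```

Building the string rather than executing it keeps the sketch side-effect free; the printed pipeline can be reviewed before it is run against a live server.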

dhimmel merged commit 913a55b into greenelab:master on May 7, 2019