Add parquet and sqlite support; add NNLS-based PEP and Q-value calculation #119
Changes from 244 commits
How hard would it be to have this file generated programmatically? (After the xz vulnerability I am trying to have fewer files that are not plain text in repos ...)
I guess it depends on how realistic the data needs to look. If it's just for testing that everything runs through, plus some standard cases, this would be possible. For a more realistic integration test, even the 10k sample might be small.
Would it be OK to separate this out from this PR into a separate issue?
Oh, I wasn't aware that this is due to Comet's output format. Using tabs as the separator inside the protein column of a tab-separated file breaks some of the pandas functionality we rely on: we specify the column names, their types, and a separator, and if pandas' read function then encounters such a row, it throws an error.
I think that pandas behaviour makes sense, because if `\t` is the column separator, then it must not be used inside columns... I would be in favour of a design that explicitly specifies how the input is formatted, with a single reader function per specified format. If other software does not adhere to the given specification, one would need converters that could run as a preprocessing step.
We would probably need a follow-up on this as well. Let me know your thoughts!
I also hate that they do it that way, but alas, I think it's widely used, so I think it's critical to support it (we believe this is a real requirement for us).
As for implementation alternatives (revision: I tried it and it's harder than I thought ...): pandas can read from anything that implements a `.read()` method, so we could do a try/except, where by default it tries to read the standard way, and if that errors out due to the non-uniform number of columns, we wrap the input so that it keeps the "right" number of columns and joins the proteins with a new separator.
I think that would be a pretty low-effort way to support it.
LMK what you think!
Related: UWPR/Comet#66
It seems like Jimmy will add that feature, so we would just need to add a warning pointing to a fix!
He also points to the fact that the Percolator-defined PIN format is tab-delimited between proteins ... https://github.com/percolator/percolator/wiki/Interface#tab-delimited-file-format ...
It's up on stable now: https://uwpr.github.io/Comet/parameters/parameters_202401/pinfile_protein_delimiter.html
Wow, this is amazing.