v1.2.0
Change Log
For Version 1.2.0
- Added glycoworkGUI.py to build the .exe based GUI for important glycowork endpoint functions: GlycoDraw, plot_glycans_excel, and get_differential_expression
- Removed python-louvain as a required dependency for glycowork
glycan_data
loader
- Switched from pkg_resources to importlib for loading tabular data into the package
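As a rough illustration of what such a migration typically looks like, the sketch below reads a file shipped inside an installed package via importlib.resources (Python 3.9+). This is an assumption about the general pattern, not the actual glycowork loader: the helper name read_packaged_text is hypothetical, and the standard-library json package is used only so that the example runs without extra files.

```python
from importlib import resources


def read_packaged_text(package: str, filename: str) -> str:
    """Read a text file that ships inside an installed package (hypothetical helper)."""
    # resources.files() returns a Traversable rooted at the package directory,
    # replacing the older pkg_resources.resource_filename / resource_stream calls.
    resource = resources.files(package).joinpath(filename)
    return resource.read_text(encoding="utf-8")


# Demonstration on a standard-library package so the sketch runs anywhere.
source = read_packaged_text("json", "__init__.py")
print(f"read {len(source)} characters")
```

A tabular file could be handled the same way by passing the opened resource to a CSV reader instead of reading it as plain text.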
stats
- Fixed an issue in TST_grouped_benjamini_hochberg that caused errors if nothing was significantly different in the entire dataset or in any group
- test_inter_vs_intra_grouping is now robust to non-paired data and data with differing sample sizes per condition
- Added replace_outliers_with_IQR_bounds to support outlier treatment in motif.analysis
- Added sequence_richness, shannon_diversity_index, and simpson_diversity_index to calculate diversity indices of glycomics data
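The idea behind IQR-based outlier treatment can be sketched as follows. This is a minimal, generic version and not the glycowork implementation: the function name clip_to_iqr_bounds, the multiplier k = 1.5 (the conventional Tukey fence), and the quantile method are all assumptions, since the change log does not state them.

```python
import statistics


def clip_to_iqr_bounds(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the nearest bound."""
    # Quartiles via linear interpolation over the observed data range
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are kept; values outside are pulled to the fence
    return [min(max(v, lower), upper) for v in values]


abundances = [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]
clipped = clip_to_iqr_bounds(abundances)
print(clipped)
```

Clipping to the bound, rather than dropping the value, keeps the sample size per condition unchanged, which matters for the downstream statistical tests.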
motif
processing
- WURCS handling for universal input now encompasses more monosaccharides
- GlycoCT handling for universal input is now robust to substituent declarations that do not immediately follow their monosaccharide in the GlycoCT string
- Added equal_repeats to check whether two repeating units of a polysaccharide are the same, just shifted
- Modified glycan nomenclature detection in canonicalize_iupac to be less prone to overidentifying Oxford when it’s just numbers etc.
- Added “ß” to the typo detection in canonicalize_iupac and “(-)” as a variation of linkage uncertainty detection
- Made canonicalize_iupac robust to the variation of using {} instead of () for linkages
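The notion of two repeating units being "the same, just shifted" corresponds to a cyclic rotation check. The sketch below illustrates that idea on pre-tokenized, linear repeating units. It is not the equal_repeats implementation: the function name is_shifted_repeat and the token-list representation are hypothetical, and branching is ignored.

```python
def is_shifted_repeat(unit_a, unit_b):
    """Return True if unit_b is a cyclic rotation of unit_a (both are token lists)."""
    if len(unit_a) != len(unit_b):
        return False
    if not unit_a:
        return True
    # Every rotation of unit_a appears as a window of unit_a + unit_a
    doubled = unit_a + unit_a
    n = len(unit_a)
    return any(doubled[i:i + n] == unit_b for i in range(n))


# Hypothetical tokenization: alternating monosaccharides and linkages
repeat_1 = ["Glc", "b1-3", "Gal", "b1-4"]
repeat_2 = ["Gal", "b1-4", "Glc", "b1-3"]   # same unit, shifted by one residue
repeat_3 = ["Gal", "b1-3", "Glc", "b1-4"]   # linkages swapped, a different unit

print(is_shifted_repeat(repeat_1, repeat_2))
print(is_shifted_repeat(repeat_1, repeat_3))
```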
graph
- Removed the required usage of lib in glycan_to_nxGraph, compare_glycans, subgraph_isomorphism, and all downstream functions (lib only remains for stemification and deep learning model training/inference)
- The keyword argument “wildcards_ptm” now also works as intended when providing pre-calculated graphs as input to compare_glycans or subgraph_isomorphism
- Fixed a rare issue in which subgraph_isomorphism, when “count = False”, would sometimes erroneously output “False” because of a greedy approach to evaluating potential matches
tokenization
- Added get_unique_topologies to retrieve all base topologies for a given composition that have been observed for a given taxonomic subset
- Added the “obfuscate_ptm” keyword argument to map_to_basic, to allow for mapping Gal6S to Hex6S rather than the default HexOS, if that is required/advantageous
- Added support for mapping phosphorylated glycans in map_to_basic
draw
- Fixed an issue where cross-ring fragments were not correctly rendered in GlycoDraw
- plot_glycans_excel can now also be used with filepaths to .xlsx files (in addition to .csv files)
- plot_glycans_excel now also supports compact glycan drawing with the “compact” keyword argument
- Improved drawing resolution in plot_glycans_excel
- GlycoDraw will now more strongly make use of nomenclature canonicalization in case of IUPAC dialects (still not 100%; if you suspect you use a dialect of IUPAC, pass your sequences through canonicalize_iupac first)
- If no filepath is specified, GlycoDraw will now also display drawn glycan structures in a non-Jupyter environment (as the classic matplotlib pop-up). Note that this functionality requires the cairosvg dependency (head to https://bojarlab.github.io/glycowork/examples.html#glycodraw-code-snippets if you’re unsure about that)
analysis
- Functions able to use .csv paths as input can now also deal with .xlsx paths as input
- The new “annotate_volcano” keyword argument now allows for the direct insertion of SNFG images within plots from get_volcano without having to subsequently run draw.annotate_figure
- get_pvals_motifs, get_differential_expression, get_glycanova, get_time_series, and get_jtk now use glycan_data.stats.replace_outliers_with_IQR_bounds to auto-smooth outliers
- Moved hotellings_t2 to glycan_data.stats
- All functions compatible with motif-level analysis now accept the “custom_motifs” keyword argument to be passed to annotate_dataset or quantify_motifs if “custom” is included in “feature_set”
- Changed the “mode” keyword argument in get_heatmap to “motifs” as a Boolean argument, like in all other motif.analysis functions
- Added a call to clean_up_heatmap to get_jtk to avoid redundant motifs
- Added get_biodiversity to compare two groups of glycomics datasets with regard to the sequence diversity that is present (similar to comparable analyses for microbiome data)
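For readers unfamiliar with the diversity indices that such a comparison rests on, the sketch below computes richness, the Shannon index, and the Simpson index from one sample's abundances using their textbook definitions. The function names carry a _sketch suffix to mark them as hypothetical stand-ins, not the glycowork functions. The natural logarithm and the 1 - sum(p^2) form of the Simpson index are assumptions, since several conventions exist and the change log does not say which one is used.

```python
import math


def _proportions(abundances):
    """Convert raw abundances to relative abundances, dropping zeros."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return [a / total for a in abundances if a > 0]


def sequence_richness_sketch(abundances):
    """Number of sequences with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_index_sketch(abundances):
    """Shannon index H = -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in _proportions(abundances))


def simpson_index_sketch(abundances):
    """Simpson diversity in the form 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in _proportions(abundances))


# Four equally abundant glycans plus one that was not detected
sample = [25.0, 25.0, 25.0, 25.0, 0.0]
print(sequence_richness_sketch(sample))
print(shannon_index_sketch(sample))
print(simpson_index_sketch(sample))
```

For a perfectly even sample with four detected sequences, the Shannon index equals ln(4) and the Simpson index equals 0.75. Both drop as one sequence comes to dominate.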
regex
- Added filter_dealbreakers to allow for the exclusion of identified matches if they have illegal components beyond the identified match (e.g., the forbidden Fuc in "Fuc-([Gal|GalNAc])?-Gal-([!Fuc]){,1}-GlcNAc"). Before this, the sequence context except the Fuc was extracted and returned.
- Fixed an edge case in filter_matches_by_location in which internal locations sometimes had to handle triple-nested lists, which led to errors
- get_match can now also use glycan graphs, such as those derived from glycan_to_nxGraph, as input
- Added get_match_batch to process a whole list of glycans at once, with some performance improvements via first pre-compiling the pattern
- Fixed an edge case in get_match in which pattern components consisting of a single monosaccharide with a specified linkage (e.g., “Fuca3”) could sometimes erroneously output no matches
- Added motif_to_regex to convert glycan motifs (e.g., in IUPAC-condensed) into a regular expression suitable for get_match. Limited to simple queries for now.
annotate
- get_terminal_structures now has a “size” keyword argument with which users can control the size of the extracted terminal motifs
- get_k_saccharides now has a “terminal” keyword argument with which users can filter to only count motifs at non-reducing ends
- annotate_dataset and functions using it can now add the “terminal2” and “terminal3” options in “feature_set” to also annotate & analyze terminal motifs of size 2 (e.g., Neu5Ac(a2-3)Gal(b1-4)) or size 3 (e.g., Neu5Ac(a2-3)Gal(b1-4)GlcNAc)
network
biosynthesis
- Added the possibility of providing abundances to construct_network that are then stored as node attributes in the network
- Added add_high_man_removal as a post-processing step in construct_network to allow for the addition of reactions removing mannoses from high-Man N-glycans occurring during maturation
- Added estimate_weights and get_edge_weight_by_abundance to estimate reaction capacities from abundances + estimate missing abundances
- Added get_maximum_flow, get_max_flow_path, and get_reaction_flow to calculate maximum flow paths between network root and endpoints as well as aggregate the flow by reaction type
- Added get_differential_biosynthesis as a wrapper function to compare two groups of glycomes/networks with regard to their biosynthesis (differential flow paths or differential reaction flows)
- Fixed an issue in construct_network in which sometimes nodes with outgoing but no incoming connections were not detected as unconnected nodes, leading to incomplete networks
- Added the rescue_glycans decorator to construct_network, to allow for auto-fixing nomenclature variations
- Improved performance of construct_network by reducing wasteful computation
evolution
- Switched get_communities from using python-louvain to the Louvain implementation in networkx