v1.2.0

@Bribak Bribak released this 15 Mar 14:03
· 169 commits to master since this release
30e64cf

Change Log

For Version 1.2.0

  • Added glycoworkGUI.py to build the .exe-based GUI for key glycowork endpoint functions: GlycoDraw, plot_glycans_excel, and get_differential_expression
  • Removed python-louvain as a required dependency for glycowork

glycan_data

loader

  • Switched from pkg_resources to importlib for loading tabular data into the package

stats

  • Fixed an issue in TST_grouped_benjamini_hochberg that caused errors if nothing was significantly different in the entire dataset or in any group
  • test_inter_vs_intra_grouping is now robust to non-paired data and data with differing sample sizes per condition
  • Added replace_outliers_with_IQR_bounds to support outlier treatment in motif.analysis
  • Added sequence_richness, shannon_diversity_index, and simpson_diversity_index to calculate diversity indices of glycomics data
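For readers unfamiliar with the diversity indices named above, here is a minimal pure-Python sketch of how richness, Shannon diversity, and Simpson diversity are commonly defined on an abundance vector. The function names with the `_sketch` suffix are hypothetical and this is not glycowork's implementation. In particular, the Simpson index has several conventions, and the 1 - sum(p^2) form used here is an assumption.

```python
import math


def sequence_richness_sketch(abundances):
    """Count the sequences with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_index_sketch(abundances):
    """Shannon diversity H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def simpson_index_sketch(abundances):
    """Simpson diversity in the 1 - sum(p^2) form (assumed convention)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances if a > 0)


# Made-up relative abundances of four glycans in one sample
even_sample = [25, 25, 25, 25]
richness = sequence_richness_sketch(even_sample)
shannon = shannon_index_sketch(even_sample)
simpson = simpson_index_sketch(even_sample)
```

A perfectly even sample maximizes both indices for a given richness, while a sample dominated by a single sequence drives both toward zero.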

motif

processing

  • WURCS handling for universal input now encompasses more monosaccharides
  • GlycoCT handling for universal input is now robust to substituents that are declared somewhere other than immediately after their monosaccharide in the GlycoCT string
  • Added equal_repeats to check whether two repeating units of a polysaccharide are the same, just shifted
  • Modified glycan nomenclature detection in canonicalize_iupac to be less prone to misidentifying input as Oxford nomenclature when it merely contains numbers or similar
  • Added “ß” to the typo detection in canonicalize_iupac and “(-)” as a variation of linkage uncertainty detection
  • Made canonicalize_iupac robust to the variation of using {} instead of () for linkages
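The idea behind checking whether two repeating units are "the same, just shifted" can be illustrated with a cyclic-rotation test. The sketch below is an assumption-laden simplification and not how equal_repeats is implemented: it treats each repeating unit as a flat list of tokens (monosaccharides and linkages) and ignores branching. The function name `is_shifted_repeat` is hypothetical.

```python
def is_shifted_repeat(unit_a, unit_b):
    """Return True if unit_b is a cyclic rotation of unit_a.

    Both units are flat lists of tokens; branching is ignored
    (a simplifying assumption for this sketch).
    """
    if len(unit_a) != len(unit_b):
        return False
    if not unit_a:
        return True
    # Every rotation of unit_a appears as a window of unit_a + unit_a
    doubled = unit_a + unit_a
    n = len(unit_a)
    return any(doubled[i:i + n] == unit_b for i in range(n))


# The same disaccharide repeat, written from two different starting points
repeat_1 = ["GlcNAc", "b1-4", "GlcA", "b1-3"]
repeat_2 = ["GlcA", "b1-3", "GlcNAc", "b1-4"]
same_polymer = is_shifted_repeat(repeat_1, repeat_2)
```

Comparing against the doubled list avoids generating each rotation explicitly, which keeps the check short and easy to verify.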

graph

  • Removed the required usage of lib in glycan_to_nxGraph, compare_glycans, subgraph_isomorphism, and all downstream functions (lib only remains for stemification and deep learning model training/inference)
  • The keyword argument “wildcards_ptm” now also works as intended when providing pre-calculated graphs as input to compare_glycans or subgraph_isomorphism
  • Fixed a rare issue in which subgraph_isomorphism, when “count = False”, would sometimes erroneously output “False” because of a greedy approach to evaluating potential matches

tokenization

  • Added get_unique_topologies to retrieve all base topologies for a given composition that have been observed for a given taxonomic subset
  • Added the “obfuscate_ptm” keyword argument to map_to_basic, to allow for mapping Gal6S to Hex6S rather than the default HexOS, if that is required/advantageous
  • Added support for mapping phosphorylated glycans in map_to_basic

draw

  • Fixed an issue where cross-ring fragments were not correctly rendered in GlycoDraw
  • plot_glycans_excel can now also be used with filepaths to .xlsx files (in addition to .csv files)
  • plot_glycans_excel now also supports compact glycan drawing with the “compact” keyword argument
  • Improved drawing resolution in plot_glycans_excel
  • GlycoDraw now relies more heavily on nomenclature canonicalization when given IUPAC dialects (coverage is still incomplete; if you suspect you are using a dialect of IUPAC, pass your sequences through canonicalize_iupac first)
  • If no filepath is specified, GlycoDraw will now also display drawn glycan structures in a non-Jupyter environment (as the classic matplotlib pop-up). Note that this functionality requires the cairosvg dependency (head to https://bojarlab.github.io/glycowork/examples.html#glycodraw-code-snippets if you’re unsure about that)

analysis

  • Functions able to use .csv paths as input can now also deal with .xlsx paths as input
  • The new “annotate_volcano” keyword argument now allows for the direct insertion of SNFG images within plots from get_volcano without having to subsequently run draw.annotate_figure
  • get_pvals_motifs, get_differential_expression, get_glycanova, get_time_series, and get_jtk now use glycan_data.stats.replace_outliers_with_IQR_bounds to auto-smooth outliers
  • Moved hotellings_t2 to glycan_data.stats
  • All functions compatible with motif-level analysis now accept the “custom_motifs” keyword argument to be passed to annotate_dataset or quantify_motifs if “custom” is included in “feature_set”
  • Replaced the “mode” keyword argument in get_heatmap with the Boolean “motifs” argument, as in all other motif.analysis functions
  • Added a call to clean_up_heatmap to get_jtk to avoid redundant motifs
  • Added get_biodiversity to compare two groups of glycomics datasets with regard to the sequence diversity that is present (similar to comparable analyses for microbiome data)
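The outlier smoothing mentioned above (replacing outliers with IQR bounds) can be sketched in a few lines of standard-library Python. This is an illustration of the general technique rather than glycowork's code: the function name is hypothetical, and both the 1.5 × IQR multiplier (Tukey's common convention) and the quantile interpolation method are assumptions that the release notes do not specify.

```python
import statistics


def clip_to_iqr_bounds_sketch(values, multiplier=1.5):
    """Replace values outside [Q1 - m*IQR, Q3 + m*IQR] with the nearest bound.

    The multiplier of 1.5 is an assumed default, not taken from glycowork.
    """
    if len(values) < 2:
        return list(values)  # quartiles are undefined for fewer than 2 points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [min(max(v, lower), upper) for v in values]


# Made-up abundances for one glycan across five samples, with one outlier
raw = [1, 2, 3, 4, 100]
smoothed = clip_to_iqr_bounds_sketch(raw)
```

Clipping rather than dropping outliers keeps the sample size per group unchanged, which matters for downstream tests that expect complete data.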

regex

  • Added filter_dealbreakers to allow for the exclusion of identified matches if they have illegal components beyond the identified match (e.g., the forbidden Fuc in "Fuc-([Gal|GalNAc])?-Gal-([!Fuc]){,1}-GlcNAc"). Previously, the sequence context without the Fuc was extracted and returned.
  • Fixed an edge case in filter_matches_by_location in which handling internal locations sometimes involved triple-nested lists, which led to errors
  • get_match can now also use glycan graphs, such as those derived from glycan_to_nxGraph, as input
  • Added get_match_batch to process a whole list of glycans at once, with some performance improvements via first pre-compiling the pattern
  • Fixed an edge case in get_match in which pattern components consisting of a single monosaccharide with a specified linkage (e.g., “Fuca3”) could sometimes erroneously output no matches
  • Added motif_to_regex to convert glycan motifs (e.g., in IUPAC-condensed) into a regular expression suitable for get_match. Limited to simple queries for now.
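The "pre-compile once, match many" idea behind get_match_batch can be illustrated with Python's standard `re` module. Note the assumptions: glycowork's regex module uses its own glycan-specific pattern syntax, so the plain `re` pattern and the helper name `match_batch_sketch` below are stand-ins for illustration only and say nothing about the library's internals.

```python
import re


def match_batch_sketch(pattern, sequences):
    """Compile the pattern once, then reuse it for every sequence."""
    compiled = re.compile(pattern)
    return [compiled.findall(seq) for seq in sequences]


# Plain-regex stand-in: find a1-2 or a1-3 linked fucose in IUPAC-condensed strings
glycans = [
    "Fuc(a1-2)Gal(b1-4)GlcNAc",
    "Gal(b1-4)GlcNAc",
    "Gal(b1-4)[Fuc(a1-3)]GlcNAc",
]
hits = match_batch_sketch(r"Fuc\(a1-[23]\)", glycans)
```

Hoisting the compilation step out of the loop pays off most when pattern preparation is expensive relative to matching a single sequence.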

annotate

  • get_terminal_structures now has a “size” keyword argument with which users can control the size of the extracted terminal motifs
  • get_k_saccharides now has a “terminal” keyword argument with which users can filter to only count motifs at non-reducing ends
  • annotate_dataset and functions using it can now use the “terminal2” and “terminal3” options in “feature_set” to also annotate and analyze terminal motifs of size 2 (e.g., Neu5Ac(a2-3)Gal(b1-4)) or size 3 (e.g., Neu5Ac(a2-3)Gal(b1-4)GlcNAc)

network

biosynthesis

  • Added the possibility of providing abundances to construct_network that are then stored as node attributes in the network
  • Added add_high_man_removal as a post-processing step in construct_network to allow for the addition of reactions removing mannoses from high-Man N-glycans occurring during maturation
  • Added estimate_weights and get_edge_weight_by_abundance to estimate reaction capacities from abundances and to estimate missing abundances
  • Added get_maximum_flow, get_max_flow_path, and get_reaction_flow to calculate maximum flow paths between network root and endpoints as well as aggregate the flow by reaction type
  • Added get_differential_biosynthesis as a wrapper function to compare two groups of glycomes/networks with regard to their biosynthesis (differential flow paths or differential reaction flows)
  • Fixed an issue in construct_network in which sometimes nodes with outgoing but no incoming connections were not detected as unconnected nodes, leading to incomplete networks
  • Added the rescue_glycans decorator to construct_network, to allow for auto-fixing nomenclature variations
  • Improved performance of construct_network by reducing wasteful computation
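To make the maximum-flow idea above concrete, here is a self-contained sketch of the textbook Edmonds-Karp algorithm on a small directed network. It is not glycowork's get_maximum_flow, whose implementation the release notes do not describe. The function name, the placeholder node names, and the capacities are all made up for illustration.

```python
from collections import deque


def max_flow_sketch(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Build residual capacities, adding zero-capacity reverse edges
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        residual.setdefault(u, {})
        for v, cap in neighbors.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for the shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Find the bottleneck capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Made-up reaction capacities between placeholder network nodes
toy_network = {
    "root": {"intermediate_1": 3, "intermediate_2": 2},
    "intermediate_1": {"endpoint": 2},
    "intermediate_2": {"endpoint": 3},
}
total_flow = max_flow_sketch(toy_network, "root", "endpoint")
```

In this toy network each route is limited by its narrowest reaction, so the total flow from root to endpoint is the sum of the two route bottlenecks.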

evolution

  • Switched get_communities from using python-louvain to the Louvain implementation in networkx