Releases: bnosac/udpipe

CRAN Release 0.8.1

15 Feb 18:41

CHANGES IN udpipe VERSION 0.8.1

  • Allow passing a .udpipe filename to udpipe_download_model
  • Update documentation on keywords_collocation
  • Added strsplit.data.frame and paste.data.frame

CRAN Release 0.8

15 Feb 18:40

CHANGES IN udpipe VERSION 0.8

  • The default of udpipe_download_model has changed: it now downloads models built on Universal Dependencies 2.3 instead of the models built on Universal Dependencies 2.0
  • Incorporate models from Universal Dependencies 2.3 released on 2018-11-15
  • Incorporate models from conll18 shared task baseline built on Universal Dependencies 2.2
  • Handle the case where document_term_frequencies.character is used incorrectly with duplicate document identifiers
  • txt_recode now returns x if the length of x is 0
  • Added txt_sentiment
  • Added txt_previousgram

CRAN Release 0.7

10 Sep 12:51

CHANGES IN udpipe VERSION 0.7

  • Allow reconstructing the original text and adding a start/end field in as.data.frame (useful but undocumented feature). Set up mainly for use with the crfsuite R package
  • Added txt_tagsequence
  • Added a general function called udpipe which annotates data in TIF format.
  • Add option in udpipe_download_model to download the model only if it does not exist on disk
  • Loaded models are put into an environment so that users of the udpipe function do not need to take care of loading them

CRAN Release 0.6.1

30 Jul 18:52
  • src/udpipe.cpp: at the request of CRAN, remove dynamic exception specifications, which g++-7 and later complain about, by removing the throw statements
  • add ctb role to authors Milan and Jana in DESCRIPTION

CRAN Release 0.6

30 Jul 18:51
  • Added cbind_morphological and cbind_dependencies
  • Allow to show progress in udpipe_annotate
  • txt_nextgram now does not paste NA's together in case someone would use it with missing text data
  • Add example on only doing pos tagging and dependency parsing and excluding tokenisation
  • Fix gcc8 message: warning: 'char* strncpy(char*, const char*, size_t)' specified bound 15 equals destination size [-Wstringop-truncation]

CRAN Release 0.5

14 Mar 11:16
  • Added txt_recode_ngram for recoding tokens with compound multi-word expressions
  • Fix to make sure as.data.frame.udpipe_connlu also works with data.table version 1.9.6. Fixes issue #16
  • Allow the group argument of keywords_rake to be a character vector of column names
  • Added a vignette on the use of the package to do topic modelling using the POS tags and multi-word expressions
  • Add example of correlation analysis in vignette on 'Basic Analytical Use Cases'
  • dtm_remove_lowfreq now uses minfreq as lower bound

CRAN Release 0.4

07 Feb 13:34
  • Fix R CMD check on clang-UBSAN: UndefinedBehaviorSanitizer (runtime error: reference binding to misaligned address)
  • Add more documentation on required UTF-8 encoding
  • Add as_conllu
  • Add as_word2vec
  • Add as.data.table.udpipe_conllu for convenience
  • Add keywords_rake and keywords_collocation
  • Also exported keywords_collocation and keywords_phrases
  • Add document_term_frequencies_statistics
  • Add boilerplate functions dtm_rowsums and dtm_colsums
  • Make output of keywords_collocation, keywords_rake and keywords_phrases consistent
  • Allow cooccurrence.data.frame to provide a vector of groups
  • Added another vignette

CRAN Release 0.3

15 Jan 13:55
  • Add docusaurus site
  • udpipe_download_model gains an extra argument called udpipe_model_repo to allow downloading models, mainly released under CC-BY-SA, from https://github.com/bnosac/udpipe.models.ud
  • Add udpipe_accuracy
  • Add dtm_rbind and dtm_cbind
  • Add udpipe_read_conllu to simplify creating wordvectors
  • Allow to provide several fields in document_term_frequencies to easily allow to include bigrams/trigrams/... for topic modelling purposes e.g. alongside the textrank package or alongside collocation
  • Adding Serbian + Afrikaans
  • Fixing UBSAN messages (misaligned addresses)
  • If user has R version < 3.3.0, use own startsWith function instead of base::startsWith
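
The startsWith fallback in the last item could look roughly like the following base R sketch. The helper name starts_with_compat is hypothetical and not taken from the udpipe sources, so the package's actual implementation may differ.

```r
# Hypothetical helper (name not from the udpipe sources): use base::startsWith
# when it is available (R >= 3.3.0), otherwise fall back to a manual check.
starts_with_compat <- function(x, prefix) {
  if (getRversion() >= "3.3.0") {
    # base::startsWith was introduced in R 3.3.0
    return(startsWith(x, prefix))
  }
  # Fallback: compare the leading characters of x with the prefix
  substring(x, 1, nchar(prefix)) == prefix
}

starts_with_compat(c("udpipe", "conllu"), "ud")
```

Checking the version at call time keeps a single code path for callers, at the cost of one comparison per call.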

CRAN Release 0.2.2

07 Dec 20:49
  • Fixes once and for all the Solaris compilation issue in ufal::udpipe::multiword_splitter::append_token

CRAN Release 0.2.1

06 Dec 22:34
  • Added phrases to extract POS sequences more easily like noun phrases, verb phrases or any sequence of parts of speech tags and their corresponding words
  • Fix issue in txt_nextgram if n was larger than the number of elements in x
  • Fix heap-use-after-free address sanitiser issue
  • Fix runtime error: null pointer passed as argument 1, which is declared to never be null (e.g. udpipe.cpp: 3338)
  • Another stab at the Solaris compilation issue