Releases: bnosac/udpipe
CRAN Release 0.8.1
CHANGES IN udpipe VERSION 0.8.1
- Allow passing a .udpipe filename to udpipe_download_model
- Update documentation on keywords_collocation
- Added strsplit.data.frame and paste.data.frame
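As a rough illustration of the two data frame helpers added in this release, the sketch below collapses tokens into one string per group and then splits them again. The toy data and column names are made up, and the argument names (term, group, collapse) are assumed to match the installed version of the package.

```r
library(udpipe)

# Hypothetical toy data: one row per token, grouped by sentence.
tokens <- data.frame(
  sentence_id = c(1, 1, 1, 2, 2),
  token = c("the", "quick", "fox", "hello", "world"),
  stringsAsFactors = FALSE
)

# Collapse the tokens of each sentence into a single string.
sentences <- paste.data.frame(tokens, term = "token", group = "sentence_id",
                              collapse = " ")

# Split the collapsed text back into one row per token,
# relying on the default split pattern (assumed to split on whitespace).
tokens_again <- strsplit.data.frame(sentences, term = "token",
                                    group = "sentence_id")
```

Passing several column names to group should work the same way when the data has more than one grouping level, for example a document and a sentence identifier.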
CRAN Release 0.8
CHANGES IN udpipe VERSION 0.8
- The default of udpipe_download_model has changed: it now downloads models built on Universal Dependencies 2.3 instead of the models built on Universal Dependencies 2.0
- Incorporate models from Universal Dependencies 2.3 released on 2018-11-15
- Incorporate models from conll18 shared task baseline built on Universal Dependencies 2.2
- Make sure the case is handled where document_term_frequencies.character is used incorrectly with duplicate document identifiers
- txt_recode now returns x if the length of x is 0
- added txt_sentiment
- added txt_previousgram
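A minimal sketch of two of the text helpers touched in this release, assuming txt_previousgram takes the same n and sep arguments as txt_nextgram. The token vector and the recoding table are invented for illustration.

```r
library(udpipe)

tokens <- c("the", "quick", "brown", "fox")

# Combine each token with the token that precedes it (bigrams).
bigrams <- txt_previousgram(tokens, n = 2, sep = " ")

# Recoding a zero-length vector: per the notes above, the input comes back as is.
empty <- txt_recode(character(0),
                    from = c("NN", "NNS"),
                    to = c("noun", "noun"))
```

The result of txt_previousgram has one element per input token, so it can be stored as an extra column next to the tokens it was built from.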
CRAN Release 0.7
CHANGES IN udpipe VERSION 0.7
- Allow to reconstruct the original text + allow to add a start/end field in as.data.frame (useful but undocumented feature). Set up mainly to be used with the crfsuite R package
- Added txt_tagsequence
- Added a general function called udpipe which annotates data in TIF format.
- Add option in udpipe_download_model to download the model only if it does not exist on disk
- Loaded models are put into an environment so that users of the function udpipe do not need to care about loading
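The general udpipe function can be sketched as follows. This assumes that TIF input means a data frame with doc_id and text columns, that a language name can be passed as object, and that the "download only if missing" option corresponds to the overwrite argument. The example texts are invented, and the annotation step needs network access the first time, so it is guarded here.

```r
library(udpipe)

# Hypothetical input in TIF layout: one row per document.
x <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("The package annotates text.", "It returns one row per token."),
  stringsAsFactors = FALSE
)

if (interactive()) {
  # Fetch the model only when it is not on disk yet (assumed: overwrite = FALSE).
  udpipe_download_model(language = "english", overwrite = FALSE)

  # Tokenise, tag and parse in one call; the loaded model is cached internally.
  anno <- udpipe(x, object = "english")
  head(anno)
}
```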
CRAN Release 0.6.1
- src/udpipe.cpp: at the request of CRAN, remove the dynamic exception specifications, which g++-7 and later complain about, by removing the throw statements
- add ctb role to authors Milan and Jana in DESCRIPTION
CRAN Release 0.6
- Added cbind_morphological and cbind_dependencies
- Allow to show progress in udpipe_annotate
- txt_nextgram now does not paste NA's together in case someone would use it with missing text data
- Add example on only doing pos tagging and dependency parsing and excluding tokenisation
- Fix gcc8 message: warning: 'char* strncpy(char*, const char*, size_t)' specified bound 15 equals destination size [-Wstringop-truncation]
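The txt_nextgram change can be tried on a vector with a gap. This is only a sketch with invented tokens; what is returned around the missing value depends on the installed version.

```r
library(udpipe)

# Hypothetical token vector with a missing value in the middle.
tokens <- c("natural", "language", NA, "processing")

# Combine each token with the token that follows it (bigrams).
bigrams <- txt_nextgram(tokens, n = 2, sep = " ")
print(bigrams)
```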
CRAN Release 0.5
- Added txt_recode_ngram for recoding tokens with compound multi-word expressions
- Fix to make sure as.data.frame.udpipe_connlu also works with data.table version 1.9.6. Fixes issue #16
- Allow the group argument of keywords_rake to be a character vector of column names
- Added a vignette on the use of the package to do topic modelling using the POS tags and multi-word expressions
- Add example of correlation analysis in vignette on 'Basic Analytical Use Cases'
- dtm_remove_lowfreq now uses minfreq as lower bound
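To illustrate the minfreq lower bound, here is a small sketch on a made-up corpus. The data and column names are hypothetical, and the reading that terms occurring at least minfreq times are kept is taken from the note above.

```r
library(udpipe)

# Hypothetical toy corpus: one row per document/term occurrence.
x <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  term = c("apple", "pear", "apple", "kiwi", "apple"),
  stringsAsFactors = FALSE
)

# Count terms per document and cast the counts to a sparse matrix.
dtf <- document_term_frequencies(x, document = "doc_id", term = "term")
dtm <- document_term_matrix(dtf)

# Keep only terms that occur at least 2 times across the corpus.
dtm_small <- dtm_remove_lowfreq(dtm, minfreq = 2)
colnames(dtm_small)
```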
CRAN Release 0.4
- Fix R CMD check on clang-UBSAN: UndefinedBehaviorSanitizer (runtime error: reference binding to misaligned address)
- Add more documentation on required UTF-8 encoding
- Add as_conllu
- Add as_word2vec
- Add as.data.table.udpipe_conllu for convenience
- Add keywords_rake and keywords_collocation
- Exported also keywords_collocation and keywords_phrases
- Add document_term_frequencies_statistics
- Add boilerplate functions dtm_rowsums and dtm_colsums
- Make output of keywords_collocation, keywords_rake and keywords_phrases consistent
- Allow cooccurrence.data.frame to provide a vector of groups
- Added another vignette
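The document/term helpers from this release can be combined roughly as below. The toy corpus is invented, and the exact set of columns added by document_term_frequencies_statistics is not assumed here.

```r
library(udpipe)

# Hypothetical toy corpus: one row per document/term occurrence.
x <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  term = c("apple", "pear", "apple", "kiwi", "apple"),
  stringsAsFactors = FALSE
)

# Term counts per document, with extra weighting statistics added.
dtf <- document_term_frequencies(x, document = "doc_id", term = "term")
stats <- document_term_frequencies_statistics(dtf)

# Sparse document/term matrix and its margins.
dtm <- document_term_matrix(dtf)
terms_total <- dtm_colsums(dtm)  # occurrences of each term over all documents
docs_total <- dtm_rowsums(dtm)   # number of term occurrences in each document
```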
CRAN Release 0.3
- Add docusaurus site
- udpipe_download_model gains an extra argument called udpipe_model_repo to allow downloading models, mainly released under CC-BY-SA, from https://github.com/bnosac/udpipe.models.ud
- Add udpipe_accuracy
- Add dtm_rbind and dtm_cbind
- Add udpipe_read_conllu to simplify creating wordvectors
- Allow to provide several fields in document_term_frequencies to easily allow to include bigrams/trigrams/... for topic modelling purposes e.g. alongside the textrank package or alongside collocation
- Adding Serbian + Afrikaans
- Fixing UBSAN messages (misaligned addresses)
- If user has R version < 3.3.0, use own startsWith function instead of base::startsWith
CRAN Release 0.2.2
- Fixes once and for all the Solaris compilation issue in ufal::udpipe::multiword_splitter::append_token
CRAN Release 0.2.1
- Added phrases to extract POS sequences more easily like noun phrases, verb phrases or any sequence of parts of speech tags and their corresponding words
- Fix issue in txt_nextgram if n was larger than the number of elements in x
- Fix heap-use-after-free address sanitiser issue
- Fix runtime error: null pointer passed as argument 1, which is declared to never be null (e.g. udpipe.cpp: 3338)
- Another stab at the Solaris compilation issue