---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
library(chr)
```
# chr <img src="man/figures/logo.png" width="160px" align="right" />
[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
R package for simple string manipulation
## Description
Clean, wrangle, and parse character [string] vectors using exclusively base
R functions.
## Install
```{r install, eval=FALSE}
## install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
## install chr from github
devtools::install_github("mkearney/chr")
## load chr
library(chr)
```
## Usage
### Detect
**Detect** text patterns (an easy-to-use wrapper for `base::grep()` and `base::grepl()`).
```{r grep}
## return logical vector
chr_detect(letters, "a|b|c|x|y|z")
## return inverted logical values
chr_detect(letters, "a|b|c|x|y|z", invert = TRUE)
## return matching positions
chr_detect(letters, "a|b|c|x|y|z", which = TRUE)
## return inverted matching positions
chr_detect(letters, "a|b|c|x|y|z", which = TRUE, invert = TRUE)
## return matching values
chr_detect(letters, "a|b|c|x|y|z", value = TRUE)
## return inverted matching values
chr_detect(letters, "a|b|c|x|y|z", value = TRUE, invert = TRUE)
```
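Since `chr_detect()` is described as a wrapper for `base::grep()` and `base::grepl()`, the calls above presumably correspond to base R calls along these lines. This is a sketch of the underlying base functions only, not the package's actual implementation; the variable names are illustrative.

```{r base-detect}
## pattern used in the examples above
pat <- "a|b|c|x|y|z"
## logical vector, one element per input
is_match <- grepl(pat, letters)
## matching positions
pos <- grep(pat, letters)
## matching values
vals <- grep(pat, letters, value = TRUE)
## inverted matching values
non_vals <- grep(pat, letters, value = TRUE, invert = TRUE)
vals
```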
### Extract
**Extract** text patterns.
```{r extract}
## some text strings
x <- c("this one is @there
has #MultipleLines https://github.com and
http://twitter.com @twitter",
"this @one #istotally their and
some non-ascii symbols: \u00BF \u037E",
"this one is they're https://github.com",
"this one #HasHashtags #afew #ofthem",
"and more @kearneymw at https://mikew.com")
## extract all URLs
chr_extract_links(x)
## extract all hashtags
chr_extract_hashtags(x)
## extract mentions
chr_extract_mentions(x)
```
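For comparison, pattern extraction can be sketched in base R with `gregexpr()` and `regmatches()`. The hashtag regular expression and the `txt` vector below are assumptions chosen for illustration; the pattern that `chr_extract_hashtags()` actually uses may differ.

```{r base-extract}
## illustrative input (separate from `x` above)
txt <- c("this one #HasHashtags #afew #ofthem", "no tags here")
## assumed pattern: "#" followed by letters, digits, or underscores
tag_pos <- gregexpr("#[[:alnum:]_]+", txt)
## list with one character vector of matches per input string
tags <- regmatches(txt, tag_pos)
tags
```

Strings without a match yield an empty character vector, so the result always has one list element per input.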
### Count
**Count** number of matches.
```{r count}
## count all there/their/they're
chr_count(x, "there|their|they\\S?re", ignore.case = TRUE)
```
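Counting matches can likewise be sketched in base R by measuring the length of each element returned by `regmatches()`. This is one possible approach under the assumption that a count means the number of non-overlapping matches per string; it is not necessarily how `chr_count()` is implemented, and `txt2` is an illustrative input.

```{r base-count}
## illustrative input (separate from `x` above)
txt2 <- c("their dog is over there", "They're here", "nothing")
## locate every match, ignoring case
hits <- gregexpr("there|their|they\\S?re", txt2, ignore.case = TRUE)
## number of matches per string (0 when nothing matches)
n_matches <- lengths(regmatches(txt2, hits))
n_matches
```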
### Remove
**Remove** text patterns.
```{r remove}
## remove URLs
chr_remove_links(x)
## string together functions with magrittr pipe
library(magrittr)
## remove mentions and extra [white] spaces
chr_remove_mentions(x) %>%
  chr_remove_ws()
## remove hashtags
chr_remove_hashtags(x)
## remove hashtags, line breaks, and extra spaces
x %>%
  chr_remove_hashtags() %>%
  chr_remove_linebreaks() %>%
  chr_remove_ws()
## remove links and mentions, then extract words
x %>%
  chr_remove_links() %>%
  chr_remove_mentions() %>%
  chr_extract_words()
```
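The same kind of clean-up can be approximated in base R with `gsub()` and `trimws()`. The regular expressions and the `msg` string below are assumptions for illustration and may not match what the `chr_remove_*()` functions do internally.

```{r base-remove}
## illustrative input with a link, a line break, and repeated spaces
msg <- "see  https://github.com   for\nmore"
## drop anything that looks like an http(s) URL (assumed pattern)
no_links <- gsub("https?://\\S+", "", msg)
## collapse runs of white space (including line breaks) and trim the ends
squeezed <- trimws(gsub("[[:space:]]+", " ", no_links))
squeezed
```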
### Replace
**Replace** text with string.
```{r replace}
## replace their with they're
chr_replace(x, "their", "they're", ignore.case = TRUE)
```
ASCII functions are currently a *work in progress*. For example, replace
non-ASCII symbols with similar ASCII characters.
```{r ascii}
## ascii version
chr_replace_nonascii(x)
```
### n-grams
Create **n-grams** at the character level.
```{r ngrams}
## character vector
x <- c("Acme Pizza, Inc.", "Tom's Sports Equipment, LLC")
## 2 char level ngram
chr_ngram_char(x, n = 2L)
## 3 char level ngram in lower case and stripped of punctuation and white space
chr_ngram_char(x, n = 3L, lower = TRUE, punct = TRUE, space = TRUE)
```
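To make the idea concrete, a character-level n-gram is every run of `n` consecutive characters in a string. The helper below, `char_ngrams()`, is a hypothetical base R sketch of that definition; it is not part of chr, and the return format of `chr_ngram_char()` may differ.

```{r base-ngrams}
## hypothetical helper (not part of chr): n-grams for a single string
char_ngrams <- function(s, n = 2L) {
  n_chars <- nchar(s)
  ## strings shorter than n have no n-grams
  if (n_chars < n) return(character(0))
  ## slide a window of width n across the string
  starts <- seq_len(n_chars - n + 1L)
  substring(s, starts, starts + n - 1L)
}
## 2 char level ngrams of a single string
char_ngrams("Acme", n = 2L)
## apply to each element of a character vector
lapply(c("Acme", "Tom"), char_ngrams, n = 3L)
```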
### Contributions
Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.