Commit
added spelling and updated docs
michbur committed Aug 21, 2019
1 parent c0374b5 commit 652c7b3
Showing 70 changed files with 491 additions and 9,527 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -1,2 +1,3 @@
^.*\.Rproj$
^\.Rproj\.user$
+docs
13 changes: 7 additions & 6 deletions DESCRIPTION
@@ -39,17 +39,18 @@ Imports: pillar (>= 1.4.2),
crayon (>= 1.3.4),
Rcpp (>= 1.0.1),
stringi (>= 1.4.3)
+Suggests: AmyloGram (>= 1.0),
+testthat (>= 2.1.0),
+ape (>= 5.3),
+Biostrings (>= 2.52.0),
+seqinr (>= 3.4-5),
+spelling
License: GPL (>= 2)
URL: https://github.com/michbur/tidysq
BugReports: https://github.com/michbur/tidysq/issues
NeedsCompilation: no
Repository: CRAN
Encoding: UTF-8
+Language: en-US
RoxygenNote: 6.1.1
LinkingTo: Rcpp
-Suggests: AmyloGram (>= 1.0),
-testthat (>= 2.1.0),
-ape (>= 5.3),
-Biostrings (>= 2.52.0),
-seqinr (>= 3.4-5)
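
The DESCRIPTION change above adds `spelling` to Suggests and declares `Language: en-US`. As a rough sketch of the kind of check this enables (assuming the `spelling` package and an en_US dictionary are installed; the sample strings are typos taken from this commit, and the variable names are illustrative only):

```r
# Sketch only: assumes the 'spelling' package is installed; not part of the commit itself.
samples <- c("amino acid as colums and rows names",
             "close and distangly related proteins")

if (requireNamespace("spelling", quietly = TRUE)) {
  # spell_check_text() reports words the dictionary does not recognise
  report <- spelling::spell_check_text(samples, lang = "en_US")
  print(report)
} else {
  message("Package 'spelling' is not installed; skipping the check.")
}
```

For a whole package, `spelling::spell_check_package()` run from the package root should pick up the language from the `Language` field in DESCRIPTION.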

24 changes: 12 additions & 12 deletions R/data.R
@@ -62,7 +62,7 @@ NULL
#' @name BLOSUM50
#' @docType data
#' @format A data frame with with 20 rows and 20 columns.
-#' \describe{Contains an one-letter codes of amino acid as colums and rows names.
+#' \describe{Contains an one-letter codes of amino acid as columns and rows names.
#' You can check three-letter codes and full names of the amino acids in
#' \code{\link{aminoacids_df}}.
#' }
@@ -96,7 +96,7 @@ NULL
#' @name BLOSUM62
#' @docType data
#' @format A data frame with with 20 rows and 20 columns.
-#' \describe{Contains an one-letter codes of amino acid as colums and rows names.
+#' \describe{Contains an one-letter codes of amino acid as columns and rows names.
#' You can check three-letter codes and full names of the amino acids in
#' \code{\link{aminoacids_df}}.
#' }
@@ -105,7 +105,7 @@ NULL
#' BLOSUM matrices are actual percentage identity values of sequences selected for
#' construction of the matrices. BLOSUM62 indicates that the sequences
#' selected for constructing the matrix share an average identity value of 62\%.
-#' BLOSUM62 is miderange matrix between close and distangly related proteins.
+#' BLOSUM62 is miderange matrix between close and distantly related proteins.
#' Matrix made by matblas from blosum62.iij
#' BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#' Cluster Percentage: >= 62
@@ -131,7 +131,7 @@ NULL
#' @name BLOSUM50_enc
#' @docType data
#' @format A data frame with with 21 rows and 21 columns.
-#' \describe{Contains an one-letter codes of amino acid as colums and rows names
+#' \describe{Contains an one-letter codes of amino acid as columns and rows names
#' and also row and column zero vectors representing 'X' (any amino acid).
#' You can check three-letter codes and full names of the amino acids in
#' \code{\link{aminoacids_df}}.
@@ -166,7 +166,7 @@ NULL
#' @name BLOSUM62_enc
#' @docType data
#' @format A data frame with with 21 rows and 21 columns.
-#' \describe{Contains an one-letter codes of amino acid as colums and rows names
+#' \describe{Contains an one-letter codes of amino acid as columns and rows names
#' and also row and column zero vectors representing 'X' (any amino acid).
#' You can check three-letter codes and full names of the amino acids in
#' \code{\link{aminoacids_df}}.
@@ -176,7 +176,7 @@ NULL
#' BLOSUM matrices are actual percentage identity values of sequences selected for
#' construction of the matrices. BLOSUM62 indicates that the sequences
#' selected for constructing the matrix share an average identity value of 62\%.
-#' BLOSUM62 is miderange matrix between close and distangly related proteins.
+#' BLOSUM62 is miderange matrix between close and distantly related proteins.
#' Matrix made by matblas from blosum62.iij
#' BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#' Cluster Percentage: >= 62
@@ -208,7 +208,7 @@ NULL
#' \code{\link{aminoacids_df}}.
#' }
#' @details
-#' The BLOSUM50_pca matrix enables PCA calculation on proteins sequences aligments.
+#' The BLOSUM50_pca matrix enables PCA calculation on proteins sequences alignments.
#' Components are generated by an eigenvector decomposition of the matrix formed
#' from pairwise similarity scores between each pair of sequences. The similarity score model
#' used for creating BLOSUM50_pca matrix is the \code{\link{BLOSUM50}}.
@@ -240,7 +240,7 @@ NULL
#' \code{\link{aminoacids_df}}.
#' }
#' @details
-#' The BLOSUM62_pca matrix enables PCA calculation on proteins sequences aligments.
+#' The BLOSUM62_pca matrix enables PCA calculation on proteins sequences alignments.
#' Components are generated by an eigenvector decomposition of the matrix formed
#' from pairwise similarity scores between each pair of sequences. The similarity score model
#' used for creating BLOSUM62_pca matrix is the \code{\link{BLOSUM62}}.
@@ -427,13 +427,13 @@ NULL
#' \item{FAUJ880112}{Negative charge (Fauchere et al., 1988)}
#' \item{FAUJ880113}{pK-a(RCOOH) (Fauchere et al., 1988)}
#' \item{FINA770101}{Helix-coil equilibrium constant (Finkelstein-Ptitsyn, 1977)}
-#' \item{FINA910101}{Helix initiation parameter at posision i-1 (Finkelstein et
+#' \item{FINA910101}{Helix initiation parameter at position i-1 (Finkelstein et
#' al., 1991)}
-#' \item{FINA910102}{Helix initiation parameter at posision i,i+1,i+2 (Finkelstein
+#' \item{FINA910102}{Helix initiation parameter at position i,i+1,i+2 (Finkelstein
#' et al., 1991)}
-#' \item{FINA910103}{Helix termination parameter at posision j-2,j-1,j
+#' \item{FINA910103}{Helix termination parameter at position j-2,j-1,j
#' (Finkelstein et al., 1991)}
-#' \item{FINA910104}{Helix termination parameter at posision j+1 (Finkelstein et
+#' \item{FINA910104}{Helix termination parameter at position j+1 (Finkelstein et
#' al., 1991)}
#' \item{GARJ730101}{Partition coefficient (Garel et al., 1973)}
#' \item{GEIM800101}{Alpha-helix indices (Geisow-Roberts, 1980)}
2 changes: 1 addition & 1 deletion R/encode.R
@@ -24,7 +24,7 @@
#' in \code{\link{sq}}), which represents encoded sequences.
#'
#' The named vector (ex. \code{c(G = 1, K = 2, P = 2)}) should have all letters
-#' assigned, otherwise unasigned letters will be shown as \code{NA}. If any letter that
+#' assigned, otherwise unassigned letters will be shown as \code{NA}. If any letter that
#' appears in alphabet appears in at least one of sequences, user will be informed about it.
#' Default action is a warning printed in the console, but it can be changed via setting
#' "tidysq_encode_no_given_action" (see details at \code{\link{tidysq-options}}).
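
The named-vector behaviour described in this hunk can be illustrated with plain base R indexing. This is only a sketch of why unassigned letters end up as `NA`; it does not call tidysq's `encode()`, and the variable names are made up for the example:

```r
# Base R illustration of named-vector lookup; this is not tidysq's encode() itself.
encoding <- c(G = 1, K = 2, P = 2)
sequence_letters <- c("G", "K", "A", "P")   # "A" has no assigned value

# Indexing by name yields NA for names missing from the vector
encoded <- unname(encoding[sequence_letters])

# Letters that appear in the sequence but were not given a value
unassigned <- setdiff(unique(sequence_letters), names(encoding))
if (length(unassigned) > 0) {
  message("letters without an assigned value: ", paste(unassigned, collapse = ", "))
}
print(encoded)
```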
2 changes: 1 addition & 1 deletion R/encsq_to_list.R
@@ -9,7 +9,7 @@
#' In a new object you can check what value is assigned to each letter.
#'
#' @return A list with named numeric vectors. Each vector represents values assigned to
-#' according sequnence with \code{\link{encode}} function. Vectors are named and their names
+#' according sequence with \code{\link{encode}} function. Vectors are named and their names
#' are letters of original sequences.
#'
#' @details Function is used to transform an \code{\link{sq}} object with
2 changes: 1 addition & 1 deletion R/fasta.R
@@ -66,7 +66,7 @@ read_fasta <- function(file, type = NULL, is_clean = NULL, non_standard = NULL)
#' @param sq a \code{\link{sq}} object.
#' @param name a \code{\link{character}} vector of length equal to \code{sq} length.
#' @param file a \code{\link{character}} string indicating path to file to write into.
-#' @param nchar a posiitive \code{\link{integer}} value informing about maximum number of
+#' @param nchar a positive \code{\link{integer}} value informing about maximum number of
#' characters to put in each line of file.
#' @export
write_fasta <- function(sq, name, file, nchar = 80) {
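
To make the `nchar` parameter concrete: it caps the number of characters written per line of the file. A minimal base R sketch of that wrapping step follows; `wrap_sequence` is a hypothetical helper written for this illustration and is not taken from tidysq:

```r
# Hypothetical helper, not tidysq code: split one sequence into lines
# of at most `width` characters.
wrap_sequence <- function(sequence, width = 80) {
  total <- nchar(sequence)
  if (total == 0) return("")
  starts <- seq(1, total, by = width)
  substring(sequence, starts, pmin(starts + width - 1, total))
}

lines <- wrap_sequence(strrep("ACGT", 50), width = 80)  # 200 letters in total
print(nchar(lines))
```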
16 changes: 8 additions & 8 deletions R/methods_sq.R
@@ -113,7 +113,7 @@ c.sq <- function(...) {

#' Convert an object to sq
#'
-#' This generic funciton takes an object of arbitrary type and returns a \code{\link{sq}} object
+#' This generic function takes an object of arbitrary type and returns a \code{\link{sq}} object
#' as an output. Default implementation of the method throws an error - there needs to be
#' implemented a method for specified class in order for function to work.
#'
@@ -163,7 +163,7 @@ as.sq.character <- function(x, type = NULL, is_clean = NULL, non_standard = NULL
#'
#' @details This method for class \code{\link{sq}} allows converting sequences from
#' the sq object into a character vector of length equal to the length
-#' of sq. Each element of resulting vector is a seperate sequence.
+#' of sq. Each element of resulting vector is a separate sequence.
#' All attributes of the input sq are lost during the conversion to
#' character vector.
#'
@@ -197,11 +197,11 @@ as.character.sq <- function(x, ...) {
#' is \strong{enc}).
#'
#' @details This method for class \code{sq} allows converting sequences from
-#' the sq object into a matrix. Each row corresponds to the seperate sequence
+#' the sq object into a matrix. Each row corresponds to the separate sequence
#' from the sq object, whereas each column indicates a single position within
#' a sequence. Dimensions of matrix are determined by the number of sequences
#' (rows) and the length of the longest sequence (columns). If a length of
-#' sequence is smaller than the lenght of the longest sequence, the remaining
+#' sequence is smaller than the length of the longest sequence, the remaining
#' columns will be filled with \code{\link{NA}}. All attributes of the input \code{sq} are lost
#' during the conversion to matrix.
#'
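
The matrix layout described in this hunk (one row per sequence, one column per position, shorter sequences padded with `NA`) can be sketched in base R. This works on a plain character vector and is not the package's `as.matrix` method; all names are illustrative:

```r
# Base R sketch of the described layout; does not use tidysq objects.
sequences <- c("ACGT", "AC", "GATTACA")
letters_list <- strsplit(sequences, "")
width <- max(lengths(letters_list))   # length of the longest sequence

# Pad every sequence with NA up to `width`, one column per sequence
padded <- vapply(
  letters_list,
  function(s) c(s, rep(NA_character_, width - length(s))),
  character(width)
)
sequence_matrix <- t(padded)          # rows = sequences, columns = positions
print(dim(sequence_matrix))
```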
@@ -242,7 +242,7 @@ as.matrix.encsq <- function(x, ...) {

#' Check if object has specified type
#'
-#' Function to checks if object is a \code{\link{sq}} object without specyfying type or
+#' Function to checks if object is a \code{\link{sq}} object without specifying type or
#' if it is a \code{\link{sq}} object with specific type.
#' @param x an object to be checked.
#' @return A \code{\link{logical}} value - \code{TRUE} if \code{x} has given type, \code{FALSE}
@@ -325,7 +325,7 @@ is.encsq <- function(x) {
#' Compare sq object
#' @description Compares input \code{\link{sq}} object with another given.
#'
-#' @details \code{`==`} converts left hand side of comparision (\code{x1}) to chracters
+#' @details \code{`==`} converts left hand side of comparison (\code{x1}) to characters
#' vector using \code{\link{as.character}} and checks whether given on the right side
#' object can be compared with \code{\link{sq}} object. Function also check
#' the type of \code{\link{sq}} object with which given object will be compared.
@@ -334,12 +334,12 @@ is.encsq <- function(x) {
#' with usage \code{\link{toupper}}. If right hand side object (\code{x2}) is \code{\link{sq}}
#' it is converted to character vector using also \code{\link{as.character}} function.
#'
-#' When both objects are already converted to character vectors, comparision is done
+#' When both objects are already converted to character vectors, comparison is done
#' elementwise with standard R rules, (e.g. recycling is used). You can check details
#' \code{\link[base:Comparison]{here}}.
#'
#' Comparing sequences as characters vectors cause that various types of sequences
-#' can be compared for example aminoacids with nucleotides sequences so attention
+#' can be compared for example amino acids with nucleotides sequences so attention
#' should be paid which sequences types are compared.
#'
#' @param x1 a \code{\link{sq}} object.
4 changes: 2 additions & 2 deletions R/random_sq.R
@@ -1,11 +1,11 @@
#' Generate random sequences
#'
-#' Generates a \code{\link{sq}} object with specified number of sequences of given lenght
+#' Generates a \code{\link{sq}} object with specified number of sequences of given length
#' and given type.
#'
#' @param n a positive \code{\link{integer}} value - number of sequences to generate.
#' @param len a positive \code{\link{integer}} value - length of each sequence if \code{sd} not
-#' specified and mean length of sequences if \code{sd} spedified.
+#' specified and mean length of sequences if \code{sd} specified
#' @param type a type of generated sq object; possible values are "ami" and "nuc" (see section
#' \emph{sq types} in \code{\link{sq}} documentation for details).
#' @param is_clean a \code{\link{logical}} value - if \code{TRUE}, letters will be drawn from
12 changes: 6 additions & 6 deletions R/sq.R
@@ -105,7 +105,7 @@
#' fact that although there is a standard for format of \emph{fasta} files, sometimes
#' there are other types of symbols which do not fit the standard. Thanks to these types,
#' tidysq can import files with customized alphabets can be imported. Moreover, an user
-#' may want to group amino acids with simillar properties (e.g. for machine learning)
+#' may want to group amino acids with similar properties (e.g. for machine learning)
#' and replace the longer alphabet with symbols of groups. To check details, see
#' \code{\link{read_fasta}}, \code{\link{construct_sq}} and \code{\link{substitute_letters}}.
#'
@@ -127,7 +127,7 @@
#' \strong{ami} or \strong{nuc} object.
#'
#' \strong{Important note:} maximum length of an alphabet is \strong{30 letters}. You are
-#' not allowed to read fasta files or construct from character vectros that have more
+#' not allowed to read fasta files or construct from character vectors that have more
#' than 30 distinct characters in sequences (with exception of reading or constructing
#' \strong{ami} or \strong{nuc} objects - during their construction lowercase letters
#' are automatically converted to uppercase).
@@ -180,16 +180,16 @@
#' and subset them using \code{\link[=sq-extract]{extract operator}}. Alphabet is kept
#' as an attribute of the object.
#'
-#' Raw vectors are the most efficient way of storage - each letter of sequece has asigned
+#' Raw vectors are the most efficient way of storage - each letter of the sequence has assigned
#' an integer (its index in alphabet of \code{sq} object). Those integers in binary format
#' fit in less than 8 bits, but normally are stored on 16 bits. However, thanks to bit
#' packing it is possible to remove unused bits and store numbers more tightly. This
#' operations result in a little time overhead in all operations, because most of them
-#' require unpacking and repacking sequences, but this cost is relatively low in comparision
+#' require unpacking and repacking sequences, but this cost is relatively low in comparison
#' to amount of saved memory.
#'
#' For example - \strong{nuc} \strong{cln} alphabet consist of 6 values: ACGTU-. They are
-#' asigned numbers 1 to 6 respectively. Those numbers in binary format take form: \code{001},
+#' assigned numbers 1 to 6 respectively. Those numbers in binary format take form: \code{001},
#' \code{010}, \code{011}, \code{100}, \code{101}, \code{110}. Each of the letters can
#' be coded with just 3 bits instead of 8 which is demanded by \code{char} - this allows
#' us to save more than 60\% of memory spent on storage of nucleotides sequences.
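
The arithmetic behind the "more than 60%" figure is 1 - 3/8 = 62.5%: three bits per letter instead of eight. The base R sketch below packs the six-letter alphabet from the text into 3-bit codes to show the idea. It is an illustration under assumptions only: `pack_sequence` and `unpack_sequence` are hypothetical names, and tidysq's actual packing is compiled code whose bit layout may differ.

```r
# Illustration only; not tidysq's internal implementation.
alphabet <- c("A", "C", "G", "T", "U", "-")
bits <- 3L   # codes 1..6 fit in three bits (001 .. 110)

pack_sequence <- function(letters_vec, alphabet, bits) {
  codes <- match(letters_vec, alphabet)
  # intToBits() is least-significant-bit first; keep the low `bits` bits per code
  bit_matrix <- vapply(
    codes,
    function(code) as.integer(intToBits(code))[seq_len(bits)],
    integer(bits)
  )
  bit_stream <- as.vector(bit_matrix)
  # packBits(type = "raw") needs a multiple of 8 bits, so pad with zeros
  pad <- (8 - length(bit_stream) %% 8) %% 8
  packBits(c(bit_stream, rep(0L, pad)), type = "raw")
}

unpack_sequence <- function(packed, n, alphabet, bits) {
  bit_stream <- as.integer(rawToBits(packed))[seq_len(n * bits)]
  bit_matrix <- matrix(bit_stream, nrow = bits)
  # Rebuild each code from its bits: sum(bit * 2^position)
  codes <- as.vector(crossprod(bit_matrix, 2^(seq_len(bits) - 1)))
  alphabet[codes]
}

sequence_letters <- c("A", "C", "G", "T", "U", "-", "A", "C")
packed <- pack_sequence(sequence_letters, alphabet, bits)
unpacked <- unpack_sequence(packed, length(sequence_letters), alphabet, bits)

# 8 letters * 3 bits = 24 bits = 3 bytes, versus 8 bytes at one byte per letter
print(length(packed))
```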
@@ -294,7 +294,7 @@ NULL
#' \item If you specify \code{type} as "unt" and won't neither \code{is_clean} nor
#' \code{non_standard}, type will be set to \strong{unt}. Letters won't be converted to uppercase,
#' alphabet will consist of all letters found in sequences.
-#' \item If you do not sepcify neither \code{type} nor \code{is_clean} and specify
+#' \item If you do not specify neither \code{type} nor \code{is_clean} and specify
#' \code{non_standard} parameter, which should be character vector where each element is at least
#' two characters long, all strings as specified will be detected in sequences and treated as
#' letters in constructed \strong{atp} \code{sq}.
2 changes: 1 addition & 1 deletion R/sqapply.R
@@ -11,7 +11,7 @@
#' @param paste_char a \code{\link{logical}} value indicating in which form sequences should be
#' passed to the function \code{fun}; if \code{FALSE} (default), they will be treated as character
#' vectors, if \code{TRUE}, they will be pasted into a single string.
-#' @param use_na_char a \code{\link{logical}} value indicating whether to use printing characater
+#' @param use_na_char a \code{\link{logical}} value indicating whether to use a printing character
#' to represent \code{\link{NA}} values; if \code{TRUE}, letter from option "tidysq_na_print_char"
#' will be used instead of \code{NA} values (default value for this option is "!", for details
#' see \code{\link{tidysq-options}}), otherwise just \code{NA} values will be used; default value
6 changes: 3 additions & 3 deletions R/substitute_letters.R
@@ -1,6 +1,6 @@
#' Substitute letters in a sequence
#'
-#' @description 1) Replace ambigous/extraordinary letters in nucleic or
+#' @description 1) Replace ambiguous/extraordinary letters in nucleic or
#' amino acid sequence, stored in \code{\link{sq}} object, with the ones
#' that are compliant with the IUPAC standard, ones that are user-defined
#' or with \code{NA} values.
@@ -21,7 +21,7 @@
#' @return a \code{\link{sq}} object with \strong{atp} type with replaced alphabet,
#' defined by user.
#'
-#' @details \code{substitute_letters} allows to replace ambigous/extraordinary
+#' @details \code{substitute_letters} allows to replace ambiguous/extraordinary
#' letters in nucleic or amino acid sequence with user-defined or IUPAC
#' symbols. Letters can also be replaced with \code{\link{NA}} values, so that they
#' can be later removed, from the sequence, by \code{\link{clean}} function.
@@ -144,7 +144,7 @@ substitute_letters <- function(sq, encoding) {
inds_fun[s]
}, new_alph)
if (.is_cleaned(sq)) {
-.handle_opt_txt("tidysq_subsitute_letters_cln",
+.handle_opt_txt("tidysq_substitute_letters_cln",
"'sq' object passed to substitute_letters had 'cln' subtype, output doesn't have it")
}

4 changes: 2 additions & 2 deletions R/support_funs.R
@@ -195,8 +195,8 @@ is_null_sq <- function(sq) {
#' a \code{\link{sq}} object to amino acid or nucleotide alphabet. Output list has number of
#' elements equal to length of \code{sq} object and each element is a character vector
#' of elements that appear in according sequence that does not fit destination type. This
-#' function might be used to find specificly which sequences have which letters - user
-#' may want to use this information for example to check input sequeces.
+#' function might be used to find specifically which sequences have which letters - user
+#' may want to use this information for example to check input sequences.
#'
#' You can check which letters are valid for specified type in \code{\link{sq}} class
#' documentation.
10 changes: 5 additions & 5 deletions R/tidysq_options.R
@@ -2,17 +2,17 @@
#'
#' You can get value of an option by calling \code{getOptions(option_name)} and set its value
#' by calling \code{options(option_name = value)}, where \code{option_name} is an option name
-#' (full list of this package included below) and \code{value} is a value to assing to an option.
+#' (full list of this package included below) and \code{value} is a value to assign to an option.
#'
#' @details
#' You can change default behaviour of package using one of following options:
#' \itemize{
#' \item tidysq_bite_na_action (default "warning") - a \code{\link{character}} string specifying
#' in which way to inform user about biting sequences out of the range when using
#' \code{\link{bite}}; possible values: "error", "warning", "message", "none",
-#' \item tidysq_subsitute_letters_cln (default "warning") - a \code{\link{character}} string
+#' \item tidysq_substitute_letters_cln (default "warning") - a \code{\link{character}} string
#' specifying in which way to inform user
-#' about droping \code{cln} subtype of \code{sq} while using \code{\link{substitute_letters}};
+#' about dropping \code{cln} subtype of \code{sq} while using \code{\link{substitute_letters}};
#' possible values: "error", "warning", "message", "none",
#' \item tidysq_typify_small_cap_let (default "warning") - a \code{\link{character}} string
#' specifying in which way to inform user
@@ -22,9 +22,9 @@
#' specifying in which way to inform user
#' about encoding unspecified letters as \code{\link{NA}} if they do appear in sequences in
#' \code{\link{encode}}; possible values: "error", "warning", "message", "none",
-#' \item tidysq_max_pillar_sq_width (default 15) - an \code{\link{integer}} value specyfying
+#' \item tidysq_max_pillar_sq_width (default 15) - an \code{\link{integer}} value specifying
#' pillar_shaft_sq width
-#' \item tidysq_max_print_sequences (default 10) - an \code{\link{integer}} value specyfying
+#' \item tidysq_max_print_sequences (default 10) - an \code{\link{integer}} value specifying
#' maximum number of printed sequences
#' maximum number of printed sequences
#' in \code{\link[=print.sq]{print sq}},
#' \item tidysq_colorful_sq_print (default \code{TRUE}) - a \code{\link{logical}} value if to
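
For readers trying the options listed in this file, here is a small base R sketch of setting and reading one of them. The option name and its allowed values come from the documentation above; note that the base R accessor is `getOption()` (singular), and that setting the option has no visible effect unless tidysq is loaded and reads it.

```r
# Set one of the documented options, keeping the previous value for later restore
old <- options(tidysq_bite_na_action = "message")

# Read the current value back; getOption() returns NULL for unset options
current <- getOption("tidysq_bite_na_action")
print(current)

# Restore whatever was set before
options(old)
```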