countPattern
kevinrue committed May 15, 2024
1 parent 978b9f9 commit 4a75bce
Showing 1 changed file with 29 additions and 2 deletions.
31 changes: 29 additions & 2 deletions episodes/06-biological-sequences.Rmd
@@ -485,10 +485,37 @@ from the BSgenome object.

```{r}
genome$chr1
## equivalent to:
## genome[["chr1"]]
```

For instance, we can extract the sequence of the Y chromosome and assign it
to a new object `chrY`.

```{r}
chrY <- genome[["chrY"]]
```

### Using genome sequences

From this point, genome sequences can be treated much like the biological
strings (e.g. `DNAString`) described earlier, from the
`r BiocStyle::Biocpkg("Biostrings")` package.

For instance, the function `countPattern()` can be used to count the number of
occurrences of a given pattern in a given genome sequence.

```{r}
countPattern(pattern = "CANNTG", subject = chrY, fixed = FALSE)
```

::::::::::::::::::::::::::::::::::::::::: callout

### Note

In the example above, the argument `fixed = FALSE` is used to indicate that the
pattern contains [IUPAC ambiguity codes][external-iupac].

::::::::::::::::::::::::::::::::::::::::::::::::::
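
The effect of `fixed = FALSE` is easier to check by hand on a short sequence.
The following is a minimal sketch, assuming the
`r BiocStyle::Biocpkg("Biostrings")` package is installed; the sequence `toy`
is made up for illustration and is not part of the lesson data.

```{r}
library(Biostrings)

## Hypothetical toy sequence (not from the genome used in this episode).
## It contains two E-box-like motifs: CAGCTG and CACGTG.
toy <- DNAString("ACAGCTGTTCACGTGA")

## fixed = FALSE: each N in the pattern is read as "any nucleotide".
n_ambiguous <- countPattern(pattern = "CANNTG", subject = toy, fixed = FALSE)

## fixed = TRUE (the default): N only matches a literal N in the subject.
n_literal <- countPattern(pattern = "CANNTG", subject = toy, fixed = TRUE)

n_ambiguous
n_literal
```

With the default `fixed = TRUE`, the letter `N` in the pattern is compared
literally, so a subject that contains only `A`, `C`, `G`, and `T` yields no
match for `"CANNTG"`.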

[glossary-s4-class]: reference.html#s4-class
[crossref-s4]: 05-s4.html
[external-iupac]: https://en.wikipedia.org/wiki/Nucleic_acid_notation#IUPAC_notation