differences for PR #30
actions-user committed Jul 10, 2024
1 parent 2719fae commit ed02361
Showing 5 changed files with 1,888 additions and 18 deletions.
87 changes: 87 additions & 0 deletions config.yaml
@@ -0,0 +1,87 @@
#------------------------------------------------------------
# Values for this lesson.
#------------------------------------------------------------

# Which carpentry is this (swc, dc, lc, or cp)?
# swc: Software Carpentry
# dc: Data Carpentry
# lc: Library Carpentry
# cp: Carpentries (to use for instructor training for instance)
# incubator: The Carpentries Incubator
carpentry: 'incubator'

# Overall title for pages.
title: 'Introduction to targets'

# Date the lesson was created (YYYY-MM-DD, this is empty by default)
created: ~

# Comma-separated list of keywords for the lesson
keywords: 'reproducibility, data, targets, R'

# Life cycle stage of the lesson
# possible values: pre-alpha, alpha, beta, stable
life_cycle: 'pre-alpha'

# License of the lesson
license: 'CC-BY 4.0'

# Link to the source repository for this lesson
source: 'https://github.com/joelnitta/targets-workshop'

# Default branch of your lesson
branch: 'main'

# Who to contact if there are any issues
contact: 'joelnitta@gmail.com'

# Navigation ------------------------------------------------
#
# Use the following menu items to specify the order of
# individual pages in each dropdown section. Leave blank to
# include all pages in the folder.
#
# Example -------------
#
# episodes:
# - introduction.md
# - first-steps.md
#
# learners:
# - setup.md
#
# instructors:
# - instructor-notes.md
#
# profiles:
# - one-learner.md
# - another-learner.md

# Order of episodes in your lesson
episodes:
- introduction.Rmd
- basic-targets.Rmd
- cache.Rmd
- lifecycle.Rmd
- organization.Rmd
- packages.Rmd
- files.Rmd
- branch.Rmd
- parallel.Rmd
- quarto.Rmd

# Information for Learners
learners:

# Information for Instructors
instructors:

# Learner Profiles
profiles:

# Customisation ---------------------------------------------
#
# This space below is where custom yaml items (e.g. pinning
# sandpaper and varnish versions) should live


29 changes: 15 additions & 14 deletions files.md
@@ -57,7 +57,7 @@ tar_plan(
``` output
▶ dispatched target some_data
● completed target some_data [0.001 seconds]
▶ ended pipeline [0.047 seconds]
▶ ended pipeline [0.048 seconds]
```

If we inspect the contents of `some_data` with `tar_read(some_data)`, it will contain the string `"Hello World"` as expected.
@@ -82,15 +82,15 @@ tar_plan(

The target `some_data` was skipped, even though the contents of the file changed.

That is because right now, targets is only tracking the **name** of the file, not its contents. We need to use a special function for that, `tar_file()` from the `tarchetypes` package. `tar_file()` will calculate the "hash" of a file---a unique digital signature that is determined by the file's contents. If the contents change, the hash will change, and this will be detected by `targets`.
That is because right now, targets is only tracking the **name** of the file, not its contents. We need to use a special argument for that, `tar_target(format = "file")`. This will cause `targets` to calculate the "hash" of a file---a unique digital signature that is determined by the file's contents. If the contents change, the hash will change, and this will be detected by `targets`.


``` r
library(targets)
library(tarchetypes)

tar_plan(
tar_file(data_file, "_targets/user/data/hello.txt"),
tar_target(data_file, "_targets/user/data/hello.txt", format = "file"),
some_data = readLines(data_file)
)
```
@@ -101,7 +101,7 @@ tar_plan(
● completed target data_file [0 seconds]
▶ dispatched target some_data
● completed target some_data [0 seconds]
▶ ended pipeline [0.065 seconds]
▶ ended pipeline [0.068 seconds]
```

This time we see that `targets` does successfully re-build `some_data` as expected.
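
To make the idea of a hash concrete, here is a small sketch using base R's `tools::md5sum()`. It is only an illustration: `targets` computes its own hashes internally, and the temporary file and the variable names below are made up for this example.

``` r
# Illustration only: tools::md5sum() stands in for the hashing that
# targets performs internally when format = "file" is used.
path <- tempfile(fileext = ".txt")  # hypothetical file for this demo

# Write some contents and record the hash
writeLines("Hello World", path)
hash_before <- unname(tools::md5sum(path))

# Change the contents; the file name stays the same
writeLines("Hello World. How are you?", path)
hash_after <- unname(tools::md5sum(path))

# The name did not change, but the hashes no longer match
hashes_match <- identical(hash_before, hash_after)
hashes_match
```

A comparison like this is what lets a pipeline notice that a file has changed even though its path has not.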
@@ -186,10 +186,10 @@ tar_plan(
▶ dispatched target penguins_data_raw_file
● completed target penguins_data_raw_file [0.001 seconds]
▶ dispatched target penguins_data_raw
● completed target penguins_data_raw [0.205 seconds]
● completed target penguins_data_raw [0.219 seconds]
▶ dispatched target penguins_data
● completed target penguins_data [0.011 seconds]
▶ ended pipeline [0.285 seconds]
● completed target penguins_data [0.012 seconds]
▶ ended pipeline [0.3 seconds]
```

::::::::::::::::::::::::::::::::::
@@ -198,7 +198,7 @@ tar_plan(

## Writing out data

Writing to files is similar to loading in files: we will use the `tar_file()` function. There is one important caveat: in this case, the second argument of `tar_file()` (the command to build the target) **must return the path to the file**. Not all functions that write files do this (some return nothing; these treat the output file as a side-effect of running the function), so you may need to define a custom function that writes out the file and then returns its path.
Writing to files is similar to loading in files: we will use `tar_target(format = "file")`. There is one important caveat: in this case, the `command` argument of `tar_target()` **must return the path to the file**. Not all functions that write files do this (some return nothing; these treat the output file as a side-effect of running the function), so you may need to define a custom function that writes out the file and then returns its path.

Let's do this for `writeLines()`, the R function that writes character data to a file. Normally, its output would be `NULL` (nothing), as we can see here:

@@ -252,24 +252,25 @@ tar_plan(
readLines(!!.x)
),
hello_caps = toupper(hello),
tar_file(
tar_target(
hello_caps_out,
write_lines_file(hello_caps, "_targets/user/results/hello_caps.txt")
write_lines_file(hello_caps, "_targets/user/results/hello_caps.txt"),
format = "file"
)
)
```


``` output
▶ dispatched target hello_file
● completed target hello_file [0 seconds]
● completed target hello_file [0.001 seconds]
▶ dispatched target hello
● completed target hello [0 seconds]
● completed target hello [0.001 seconds]
▶ dispatched target hello_caps
● completed target hello_caps [0 seconds]
▶ dispatched target hello_caps_out
● completed target hello_caps_out [0 seconds]
▶ ended pipeline [0.066 seconds]
▶ ended pipeline [0.063 seconds]
```
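
The definition of `write_lines_file()` is not visible in this hunk. A minimal sketch of such a wrapper, assuming it only needs to write the lines and hand back the path, might look like the following. The argument names and the temporary path are assumptions made for this example, not the lesson's actual code.

``` r
# Hypothetical wrapper: write text to a file, then return the file path
# so the result can be used as a target with format = "file".
write_lines_file <- function(text, file, ...) {
  writeLines(text = text, con = file, ...)  # returns NULL on its own
  file                                      # return the path explicitly
}

# Quick check outside of a pipeline, using a temporary file
out_path <- tempfile(fileext = ".txt")
returned <- write_lines_file("HELLO WORLD", out_path)
readLines(returned)
```

Returning the path is the important part: it gives `targets` something to hash, whereas the `NULL` from a bare `writeLines()` call would not.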

Take a look at `hello_caps.txt` in the `results` folder and verify it is as you expect.
@@ -294,7 +295,7 @@ So this way of writing out results makes your pipeline more robust: we have a gu

::::::::::::::::::::::::::::::::::::: keypoints

- `tarchetypes::tar_file()` tracks the contents of a file
- `tar_target(format = "file")` tracks the contents of a file
- Use `tarchetypes::tar_file_read()` in combination with data loading functions like `read_csv()` to keep the pipeline in sync with your input data
- Use `tarchetypes::tar_file()` in combination with a function that writes to a file and returns its path to write out data

4 changes: 2 additions & 2 deletions md5sum.txt
@@ -8,9 +8,9 @@
"episodes/basic-targets.Rmd" "90190eae899db41c64b69320e3f72365" "site/built/basic-targets.md" "2024-07-10"
"episodes/cache.Rmd" "b487d6d792469641faec63c838541aac" "site/built/cache.md" "2024-07-10"
"episodes/lifecycle.Rmd" "7974a62cc37ac1138647d043fe1e4a26" "site/built/lifecycle.md" "2024-07-10"
"episodes/organization.Rmd" "74df25779b74013eeb6a8ca7b8934efe" "site/built/organization.md" "2024-07-10"
"episodes/organization.Rmd" "0aa3d4eb83806ee1033d90168bc098a6" "site/built/organization.md" "2024-07-10"
"episodes/packages.Rmd" "2c0eb6138ea6685a0ee279c89b381bc4" "site/built/packages.md" "2024-07-10"
"episodes/files.Rmd" "b7f4ef83379a58d5c30d8e011e3b2c0d" "site/built/files.md" "2024-07-10"
"episodes/files.Rmd" "9bbe492d07fc1dfb50ba9962ee8aec45" "site/built/files.md" "2024-07-10"
"episodes/branch.Rmd" "6f1187d6df3310eb042aaae3a44328dc" "site/built/branch.md" "2024-07-10"
"episodes/parallel.Rmd" "f9b7709ceae26b281ea5919835f5260b" "site/built/parallel.md" "2024-07-10"
"episodes/quarto.Rmd" "b854a0a44fd0ec7e503c9e99d21f8fce" "site/built/quarto.md" "2024-07-10"
3 changes: 1 addition & 2 deletions organization.md
@@ -163,7 +163,7 @@ Last error traceback:
doTryCatch(return(expr), name, parentenv, handler)
base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/tmp/R...
base::do.call(base::do.call, base::c(base::readRDS("/tmp/Rtmp5QX2Vf/call...
base::do.call(base::do.call, base::c(base::readRDS("/tmp/Rtmp2buwCZ/call...
(function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
(function (targets_function, targets_arguments, options, envir = NULL, s...
tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
@@ -198,4 +198,3 @@ Striking this balance is more of art than science, and only comes with practice.
- Writing functions is a key skill for `targets` pipelines

::::::::::::::::::::::::::::::::::::::::::::::::
