bfcquery returns inconsistent column types for empty rows #26
I'm a little behind on my R installation and can update if you can't reproduce the problem:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: PureOS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] usethis_1.6.1 tidyr_1.1.0 tibble_3.0.1
[4] stringr_1.4.0 purrr_0.3.4 dplyr_1.0.0
[7] rtracklayer_1.44.4 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
[10] IRanges_2.18.3 S4Vectors_0.22.1 GEOquery_2.52.0
[13] Biobase_2.44.0 BiocGenerics_0.30.0 evolength_0.0.0.9000
[16] testthat_2.3.2
loaded via a namespace (and not attached):
[1] httr_1.4.1 pkgload_1.1.0
[3] bit64_0.9-7 Rdpack_0.11-1
[5] assertthat_0.2.1 BiocFileCache_1.8.0
[7] blob_1.2.1 GenomeInfoDbData_1.2.1
[9] Rsamtools_2.0.3 remotes_2.1.1
[11] sessioninfo_1.1.1 lattice_0.20-41
[13] pillar_1.4.4 RSQLite_2.2.0
[15] backports_1.1.7 glue_1.4.1
[17] limma_3.40.6 digest_0.6.25
[19] XVector_0.24.0 Matrix_1.2-18
[21] XML_3.99-0.3 pkgconfig_2.0.3
[23] devtools_2.3.0 bibtex_0.4.2.2
[25] zlibbioc_1.30.0 processx_3.4.2
[27] BiocParallel_1.18.1 generics_0.0.2
[29] ellipsis_0.3.1 withr_2.2.0
[31] SummarizedExperiment_1.14.1 cli_2.0.2
[33] magrittr_1.5 crayon_1.3.4
[35] memoise_1.1.0 ps_1.3.3
[37] fs_1.4.1 fansi_0.4.1
[39] xml2_1.3.2 pkgbuild_1.0.8
[41] tools_3.6.3 prettyunits_1.1.1
[43] hms_0.5.3 matrixStats_0.56.0
[45] gbRd_0.4-11 lifecycle_0.2.0
[47] DelayedArray_0.10.0 callr_3.4.3
[49] Biostrings_2.52.0 RcppHMM_1.2.2
[51] compiler_3.6.3 rlang_0.4.6
[53] grid_3.6.3 RCurl_1.98-1.2
[55] rstudioapi_0.11 rappdirs_0.3.1
[57] bitops_1.0-6 DBI_1.1.0
[59] curl_4.3 R6_2.4.1
[61] GenomicAlignments_1.20.1 utf8_1.1.4
[63] bit_1.1-15.2 rprojroot_1.3-2
[65] readr_1.3.1 desc_1.2.0
[67] stringi_1.4.6 Rcpp_1.0.4.6
[69] vctrs_0.3.0 dbplyr_1.4.4
[71] tidyselect_1.1.0
Sorry for the long delay. I'm looking into this and I'm not quite sure how to correct it. It seems to be a bug when using dplyr::filter that somehow changes the column types.
If I omit
library(purrr)
library(stringr)
library(BiocFileCache)
path <- tempfile()
bfc <- BiocFileCache(path, ask = FALSE)
files_remote <-
str_c(file.path("ftp://ftp.ncbi.nlm.nih.gov",
"geo/samples/GSM1480nnn/GSM1480327/suppl",
"GSM1480327_K562_PROseq_"),
c("minus", "plus"),
".bw")
map_df(files_remote, bfcquery, x = bfc)
#> # A tibble: 0 × 10
#> # ℹ 10 variables: rid <chr>, rname <chr>, create_time <dbl>, access_time <dbl>,
#> #   rpath <chr>, rtype <chr>, fpath <chr>, last_modified_time <dbl>,
#> #   etag <chr>, expires <dbl>
bfcadd(bfc, files_remote[1])
#> |======================================================================| 100%
#> BFC1
#> "/tmp/RtmpDRIP5H/file2ff2a62a8acdc8/2ff2a64678a220_GSM1480327_K562_PROseq_minus.bw"
map_df(files_remote[1], bfcquery, x = bfc)
#> # A tibble: 1 × 10
#> rid rname create_time access_time rpath rtype fpath last_modified_time etag
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 BFC1 ftp:… 2024-12-06… 2024-12-06… /tmp… web ftp:… NA NA
#> # ℹ 1 more variable: expires <dbl>
map_df(files_remote, bfcquery, x = bfc)
#> Error in `dplyr::bind_rows()`:
#> ! Can't combine `..1$create_time` <character> and `..2$create_time` <double>.
#> Run `rlang::last_trace()` to see where the error occurred.
The column types for create_time and access_time are character vectors when the result is non-empty, and double vectors when it is empty. I expect them to consistently return the same type, perhaps always character vectors, although it's not clear why they are not date or datetime types instead.
Returning inconsistent types throws an error when trying to row-bind multiple queries using purrr::map_df where some of the queries succeed and some of them fail.
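Until the types are consistent, one possible workaround is to coerce the two time columns to character before binding. This is only a minimal sketch, assuming dplyr >= 1.0.0 for across(). normalize_times is a hypothetical helper, and the two tibbles are made-up stand-ins for bfcquery() results (with only three of the ten columns), not real cache output:

```r
library(dplyr)
library(purrr)
library(tibble)

# Made-up stand-ins for bfcquery() results (assumption: real results have
# more columns; only the ones relevant to the type mismatch are shown).
hit_result <- tibble(
  rid = "BFC1",
  create_time = "2024-12-06 00:00:00",
  access_time = "2024-12-06 00:00:00"
)
empty_result <- tibble(
  rid = character(),
  create_time = double(),   # empty results come back as <dbl>
  access_time = double()
)

# Hypothetical helper: force both time columns to character so that
# empty and non-empty results share the same column types.
normalize_times <- function(tbl) {
  mutate(tbl, across(all_of(c("create_time", "access_time")), as.character))
}

# Normalize each result first, then row-bind.
combined <- bind_rows(map(list(hit_result, empty_result), normalize_times))
```

With the real cache, the same helper could presumably be applied to each query result before binding, e.g. map(files_remote, function(f) normalize_times(bfcquery(bfc, f))), but I have not tested that against BiocFileCache itself.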