[R] Error on Table Merging in arrow for R #39038
Comments
Hi @TPDeramus as was asked in your StackOverflow post, a way for us to reproduce this would be best. The error message you're getting is very strange. Normally it looks like this:
If you aren't able to provide a reproducible example, adding a |
Apologies but I am not well versed in the implementations of And it's doubly problematic because this is not always thrown as a typical error. Occasionally (but not always), if passed to a variable, it's saved as a list item containing the error:
But the example I have that works is also a list:
As such, it's hard to debug via However, from what I was able to gather within at least one session of
Interestingly enough, when I made the following changes to the code:
And just didn't assign it to a variable at all, it ran just fine. This seems to happen when Do you think this can be addressed with some call to
Okay scratch that. The error will happen as soon as the second iteration. Probably just a typo from troubleshooting on my part.
Further, the use of

library(arrow)
temp <- open_csv_dataset(sources = cohort_csvs) %>% compute()
Subs <- data.frame(temp %>% distinct(key) %>% collect())
for (Subnum in 1:dim(Subs)[1]) {

Seems to proceed without any errors. Though I am uncertain this will provide what I need in the long run if there are still
I think a reprex is needed here. Even if you can't share your input files, finding a minimal sample of your
Yes unfortunately the data is very much PHI that can't be shared. I appreciate your patience on this.
Had a discussion with the parties involved and the general consensus is that sharing the data is not an option and that due to the size, making sure everything is adequately removed is likely not feasible. However, I was given the okay to set up an interactive meeting for troubleshooting if that's something anyone on the team would be open to.
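Where the data itself cannot leave a secure environment, one option is to share only its structure. The following is a minimal sketch in base R, not something proposed in the thread: `cohort` is a made-up stand-in for one of the real tables, and `describe_columns()` is a hypothetical helper that reports each column's name, class, and storage type without exposing any values.

```r
# Hypothetical stand-in for one of the real (non-shareable) tables.
cohort <- data.frame(
  key = c("a", "b"),
  value = c(1.5, 2.5),
  visit = as.Date(c("2020-01-01", "2020-02-01")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise structure only (no cell values).
describe_columns <- function(df) {
  data.frame(
    column = names(df),
    # Collapse multi-element class vectors such as c("POSIXct", "POSIXt").
    class = unname(vapply(df, function(col) paste(class(col), collapse = "/"), character(1))),
    # typeof() gives the underlying storage type, independent of class.
    type = unname(vapply(df, typeof, character(1))),
    stringsAsFactors = FALSE
  )
}

structure_only <- describe_columns(cohort)
print(structure_only)
```

A summary like this could be pasted into the issue in place of the data, since it contains column metadata only. Whether column names themselves are sensitive is a separate question for the data owners.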
Hmm...it looks like our
https://github.com/apache/arrow/blob/main/r/R/dplyr-join.R#L95-L112
https://github.com/apache/arrow/blob/main/r/R/dplyr-join.R#L181-L239
It seems like https://github.com/apache/arrow/blob/main/r/R/dplyr-join.R#L223 is evaluating to a
Interesting. Any way I could explore this via debugging?
Hi @TPDeramus, to try out @paleolimbot's idea, you should be able to use If you could do these steps and report back that'd be helpful:
I dug into this a little, and don't think this is a problem with arrow. One of the inputs has a somewhat strange column type of

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)
strange_object <- numeric()
class(strange_object) <- NA_character_
df <- data.frame(strange_object)
df |>
as_arrow_table() |>
full_join(df)
#> Error in `map_chr()` at r/R/dplyr.R:122:2:
#> ℹ In index: 1.
#> ℹ With name: strange_object.
#> Caused by error:
#> ! NotImplemented: Function 'coalesce' has no kernel matching input types (numeric(0)
#> attr(,"class")
#> [1] NA, numeric(0)
#> attr(,"class")
#> [1] NA)
#> Backtrace:
#> ▆
#> 1. ├─base::tryCatch(...)
#> 2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 3. │ ├─base (local) tryCatchOne(...)
#> 4. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 5. │ └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#> 6. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 8. ├─base::withCallingHandlers(...)
#> 9. ├─base::saveRDS(...)
#> 10. ├─base::do.call(...)
#> 11. ├─base (local) `<fn>`(...)
#> 12. ├─global `<fn>`(input = base::quote("next-esok_reprex.R"))
#> 13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#> 14. │ └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#> 15. │ └─knitr:::process_file(text, output)
#> 16. │ ├─knitr:::handle_error(...)
#> 17. │ │ └─base::withCallingHandlers(...)
#> 18. │ ├─base::withCallingHandlers(...)
#> 19. │ ├─knitr:::process_group(group)
#> 20. │ └─knitr:::process_group.block(group)
#> 21. │ └─knitr:::call_block(x)
#> 22. │ └─knitr:::block_exec(params)
#> 23. │ └─knitr:::eng_r(options)
#> 24. │ ├─knitr:::in_input_dir(...)
#> 25. │ │ └─knitr:::in_dir(input_dir(), expr)
#> 26. │ └─knitr (local) evaluate(...)
#> 27. │ └─evaluate::evaluate(...)
#> 28. │ └─evaluate:::evaluate_call(...)
#> 29. │ ├─evaluate (local) handle(...)
#> 30. │ │ └─base::try(f, silent = TRUE)
#> 31. │ │ └─base::tryCatch(...)
#> 32. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 33. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 34. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 35. │ ├─base::withCallingHandlers(...)
#> 36. │ ├─base::withVisible(value_fun(ev$value, ev$visible))
#> 37. │ └─knitr (local) value_fun(ev$value, ev$visible)
#> 38. │ └─knitr (local) fun(x, options = options)
#> 39. │ ├─base::withVisible(knit_print(x, ...))
#> 40. │ ├─knitr::knit_print(x, ...)
#> 41. │ └─knitr:::knit_print.default(x, ...)
#> 42. │ └─evaluate (local) normal_print(x)
#> 43. │ ├─base::print(x)
#> 44. │ └─arrow:::print.arrow_dplyr_query(x)
#> 45. │ └─purrr::map_chr(...) at r/R/dplyr.R:122:2
#> 46. │ └─purrr:::map_("character", .x, .f, ..., .progress = .progress)
#> 47. │ ├─purrr:::with_indexed_errors(...)
#> 48. │ │ └─base::withCallingHandlers(...)
#> 49. │ ├─purrr:::call_with_cleanup(...)
#> 50. │ └─arrow (local) .f(.x[[i]], ...)
#> 51. │ ├─base::paste0(...) at r/R/dplyr.R:129:6
#> 52. │ └─expr$type(schm) at r/R/dplyr.R:129:6
#> 53. │ └─arrow:::compute___expr__type(self, schema) at r/R/expression.R:54:6
#> 54. └─base::.handleSimpleError(...) at r/R/arrowExports.R:1152:2
#> 55. └─purrr (local) h(simpleError(msg, call))
#> 56. └─cli::cli_abort(...)
#> 57. └─rlang::abort(...)

Created on 2024-01-03 with reprex v2.0.2

You might be able to detect that column by doing something like:

strange_object <- numeric()
class(strange_object) <- NA_character_
df <- data.frame(strange_object)
vapply(lapply(df, class), function(x) any(is.na(x)), logical(1))
#> strange_object
#> TRUE

Created on 2024-01-03 with reprex v2.0.2
I'll look into that and provide an update when I have a moment. That said, if the object is something preventing joining, do you think there's a relatively straightforward way to edit the schema to fix this? Thanks so much @paleolimbot!
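One possible approach to the question above, sketched under the assumption that the offending columns are ordinary vectors whose only problem is an `NA` class attribute: strip that attribute with `unclass()` on the R side before the data reaches arrow. The helper name `drop_na_class()` is hypothetical, and this has not been checked against the original data.

```r
# Reproduce the strange column from the reprex above.
strange_object <- numeric()
class(strange_object) <- NA_character_
df <- data.frame(strange_object)

# Hypothetical helper: remove the class attribute from any column whose
# class vector contains NA, so the column falls back to its implicit class.
drop_na_class <- function(data) {
  bad <- vapply(lapply(data, class), function(x) any(is.na(x)), logical(1))
  for (nm in names(data)[bad]) {
    data[[nm]] <- unclass(data[[nm]])
  }
  data
}

fixed <- drop_na_class(df)

# Re-run the detection check on the repaired data frame.
still_bad <- vapply(lapply(fixed, class), function(x) any(is.na(x)), logical(1))
```

If `still_bad` is all `FALSE`, the repaired data frame could then be passed to `as_arrow_table()` and `full_join()` as before. That last step is not shown here because it depends on the real inputs.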
Describe the bug, including details regarding any error messages, version, and platform.
Hi developers.
I'm having an issue where I'm trying to use full_join() on two tables (subset from the same data but filtered and operated on and appended to save memory), but it keeps throwing the following error:
Specifically, it looks something like the following:
However, when it reaches past the first part of the loop to the full join, it throws the error regardless of the call used to make the full_join():
It will not, however, throw any error or display issues with left, right, inner, semi, or anti join.
I kind of need all columns to be retained during the joining, even if as NAs.
Any idea what might be causing the issue?
Version info:
OS:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
R Version:
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
RStudio Version:
RStudio Server 2022.07.0 Build 548
Session Info:
Component(s)
R