feat(sql): Adds JsonScanBuilder to daft-scan and read_json to daft-sql #3683

Merged
rchowell merged 2 commits into main from rchowell/df-253-sql-add-read_json-function
Jan 15, 2025

Conversation

rchowell
Contributor

Description

This PR follows the read_csv and read_parquet pattern for adding a table-valued function to SQL. It addresses #3196, but there are a couple of TODOs that could be tackled in later PRs:

  • support for schema as an argument
  • support for mixing named and positional arguments; currently only path is positional and everything else must be named
  • argument validation; the Python API does some validation that is missing on the SQL side
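
To illustrate the argument rule described above (path is the only positional argument, everything else is named, and validation is still a TODO), here is a minimal standalone sketch. It is not the code from this PR: `FnArgs`, `ReadJsonArgs`, `resolve_read_json`, and the `KNOWN_OPTIONS` list are hypothetical names introduced only for illustration, and errors are plain strings rather than Daft's planner error type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for parsed SQL function arguments (not Daft's actual type).
#[derive(Debug, Default)]
pub struct FnArgs {
    pub positional: Vec<String>,
    pub named: HashMap<String, String>,
}

/// Hypothetical result of resolving a `read_json(...)` call.
#[derive(Debug, PartialEq)]
pub struct ReadJsonArgs {
    pub path: String,
    pub options: HashMap<String, String>,
}

/// Named options accepted by this sketch. The list is an assumption made for
/// illustration and is not taken from Daft's implementation.
const KNOWN_OPTIONS: &[&str] = &["infer_schema", "file_path_column"];

pub fn resolve_read_json(args: &FnArgs) -> Result<ReadJsonArgs, String> {
    // The path is the only positional argument, and it is required.
    let path = args
        .positional
        .first()
        .cloned()
        .ok_or_else(|| "path is required for `read_json`".to_string())?;

    // Everything after the path must be passed by name.
    if args.positional.len() > 1 {
        return Err("only `path` may be positional in `read_json`".to_string());
    }

    // A simple form of argument validation: reject unknown option names.
    for name in args.named.keys() {
        if !KNOWN_OPTIONS.contains(&name.as_str()) {
            return Err(format!("unknown argument `{name}` for `read_json`"));
        }
    }

    Ok(ReadJsonArgs {
        path,
        options: args.named.clone(),
    })
}
```

Under these assumptions, a call with no positional argument fails with the "path is required" message, and a call with a second positional argument or an unrecognized named argument is rejected before any scan is built.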

Issues

@github-actions github-actions bot added the feat label Jan 14, 2025

codspeed-hq bot commented Jan 14, 2025

CodSpeed Performance Report

Merging #3683 will not alter performance

Comparing rchowell/df-253-sql-add-read_json-function (74d4104) with main (feab49a)

Summary

✅ 27 untouched benchmarks


codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 58.82353% with 49 lines in your changes missing coverage. Please review.

Project coverage is 77.63%. Comparing base (feab49a) to head (74d4104).
Report is 3 commits behind head on main.

Files with missing lines        Patch %   Lines
src/daft-scan/src/builder.rs    26.86%    49 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3683      +/-   ##
==========================================
- Coverage   77.82%   77.63%   -0.19%     
==========================================
  Files         728      729       +1     
  Lines       89919    90290     +371     
==========================================
+ Hits        69975    70098     +123     
- Misses      19944    20192     +248     
Files with missing lines                        Coverage Δ
src/daft-sql/src/planner.rs                     74.49% <100.00%> (ø)
src/daft-sql/src/table_provider/mod.rs          54.54% <100.00%> (ø)
src/daft-sql/src/table_provider/read_csv.rs     90.90% <ø> (ø)
src/daft-sql/src/table_provider/read_json.rs    100.00% <100.00%> (ø)
src/daft-scan/src/builder.rs                    37.32% <26.86%> (-3.12%) ⬇️

... and 9 files with indirect coverage changes

Comment on lines +43 to +45
let glob_paths: String = args
    .try_get_positional(0)?
    .ok_or_else(|| PlannerError::invalid_operation("path is required for `read_json`"))?;
Contributor


Not necessary in this PR, but I think it'd be nice to also support an array of paths here (we support this in the DataFrame API).

Something like:

select * from read_json(['./file1.json', './file2.json'])

Contributor Author


Here's an issue to support an array of paths for all three functions (read_csv, read_parquet, and read_json).

#3686
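
As a rough sketch of the array-of-paths idea discussed in this thread, the first argument could be modelled as either a single path or a list of paths and then normalized into one list of glob paths. This is only an illustration under assumed names: `PathArg` and `into_glob_paths` are hypothetical and do not come from this PR or from the linked issue.

```rust
/// Hypothetical representation of the first argument to `read_json`:
/// either one path string or an array of path strings.
#[derive(Debug, Clone, PartialEq)]
pub enum PathArg {
    Single(String),
    Many(Vec<String>),
}

impl PathArg {
    /// Normalizes both forms into a non-empty list of glob paths.
    pub fn into_glob_paths(self) -> Result<Vec<String>, String> {
        let paths = match self {
            PathArg::Single(path) => vec![path],
            PathArg::Many(paths) => paths,
        };
        // An empty array is rejected, mirroring the "path is required" check.
        if paths.is_empty() {
            return Err("`read_json` requires at least one path".to_string());
        }
        Ok(paths)
    }
}
```

With this shape, `read_json('./file1.json')` and `read_json(['./file1.json', './file2.json'])` would both reduce to a list of paths, so the code that builds the scan would only need to handle one case.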

@rchowell rchowell merged commit 432714d into main Jan 15, 2025
42 of 43 checks passed
@rchowell rchowell deleted the rchowell/df-253-sql-add-read_json-function branch January 15, 2025 16:42