Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready for review] Icechunk CLI Initial Implementation #716

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
ad4847b
refactor(cli): directory
DahnJ Feb 10, 2025
e166f1e
feat(cli): design doc + config
DahnJ Feb 10, 2025
a819028
docs: remove design doc
DahnJ Feb 11, 2025
d3ec0da
Merge branch 'main' into feat/dj-cli-start
DahnJ Feb 11, 2025
a2778a1
feat(cli): follow proposed structure
DahnJ Feb 11, 2025
6236b0a
Merge branch 'main' into feat/dj-cli-start
DahnJ Feb 11, 2025
1dbd8b5
chore: sync Cargo.lock back with main
DahnJ Feb 11, 2025
3f31866
style: whitespace
DahnJ Feb 11, 2025
5b6c181
feat(cli): CLI with a config
DahnJ Feb 15, 2025
449d984
refactor(cli): add TODOs + minor refactors
DahnJ Feb 15, 2025
3acf5d5
refactor(cli): better RepositoryDefinition structure
DahnJ Feb 15, 2025
94941dd
docs: typo
DahnJ Feb 15, 2025
7c79219
refactor: bin name cli -> icechunk
DahnJ Feb 15, 2025
691dd8c
feat(cli): design-doc-aligned interface
DahnJ Feb 16, 2025
18d4d08
feat(cli): Ancestry with snapshot number limit
DahnJ Feb 18, 2025
4d7afda
feat(cli): config init + config list
DahnJ Feb 18, 2025
5b03fbd
refactor(cli): remove default config - can be generated through confi…
DahnJ Feb 18, 2025
6d096e3
fix(cli): remove module, debug instead of display
DahnJ Feb 18, 2025
c464462
refactor(cli): add a tood
DahnJ Feb 18, 2025
7cdd087
refactor(cli): Use existing S3Credentials enum
DahnJ Feb 22, 2025
1c857e6
refactor(cli): Remove unnecessary pinning
DahnJ Feb 22, 2025
8f8d89f
feat(cli): Save to OS-dependent config dir
DahnJ Feb 23, 2025
38747c1
build(cli): CLI as optional dependency
DahnJ Feb 23, 2025
a02365b
refactor(cli): More readable user input matching
DahnJ Feb 23, 2025
1724fcd
feat(cli): No Python tracebacks + cleaner error handling
DahnJ Feb 24, 2025
4ac073b
feat(cli): Add Tigris, Gcs, Azure
DahnJ Feb 24, 2025
2593843
refactor(cli): Command functions accept config as an argument
DahnJ Feb 24, 2025
6feea11
feat(cli): Cleaner API + help messages
DahnJ Feb 24, 2025
3137ace
refactor(cli): Testable CLI functions + tests
DahnJ Feb 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,4 @@
/devel

.ipynb_checkpoints
.vscode
117 changes: 117 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/docs/icechunk-python/parallel.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ xr.testing.assert_identical(ds, ondisk)

Any task execution framework (e.g. `ProcessPoolExecutor`, Joblib, Lithops, Dask Distributed, Ray, etc.)
can be used instead of the `ThreadPoolExecutor`. However such workloads should account for
Icehunk being a "stateful" store that records changes executed in a write session.
Icechunk being a "stateful" store that records changes executed in a write session.

There are three key points to keep in mind:
1. The `write_task` function *must* return the `Session`. It contains a record of the changes executed by this task.
Expand Down
2 changes: 2 additions & 0 deletions icechunk-python/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ async-trait = "0.1.85"
typetag = "0.2.19"
serde = { version = "1.0.217", features = ["derive", "rc"] }
miette = { version = "7.5.0", features = ["fancy"] }
clap = { version = "4.5", features = ["derive"] }
anyhow = "1.0.95"

[lints]
workspace = true
4 changes: 4 additions & 0 deletions icechunk-python/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -128,3 +128,7 @@ known-first-party = ["dask", "distributed", "icechunk", "xarray", "zarr"]
[tool.ruff.lint.flake8-tidy-imports]
# Disallow all relative imports.
ban-relative-imports = "all"


[project.scripts]
icechunk = "icechunk._icechunk_python:cli_entrypoint"
34 changes: 34 additions & 0 deletions icechunk-python/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ mod streams;

use std::env;

use clap::error::ErrorKind;
use clap::Parser;
use config::{
PyAzureCredentials, PyAzureStaticCredentials, PyCachingConfig,
PyCompressionAlgorithm, PyCompressionConfig, PyCredentials, PyGcsBearerCredential,
Expand All @@ -31,6 +33,37 @@ use repository::{PyDiff, PyGCSummary, PyRepository, PySnapshotInfo};
use session::PySession;
use store::PyStore;

use icechunk::cli::interface::{run_cli, IcechunkCLI};
use pyo3::wrap_pyfunction;

#[pyfunction]
fn cli_entrypoint(py: Python) -> PyResult<()> {
let sys = py.import("sys")?;
let args: Vec<String> = sys.getattr("argv")?.extract()?;
match IcechunkCLI::try_parse_from(args.to_vec()) {
Ok(cli_args) => pyo3_async_runtimes::tokio::get_runtime().block_on(async move {
run_cli(cli_args).await.map_err(|e| {
PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string())
})?;
Ok(())
}),
// TODO (Daniel): Improve error handling & printout
Err(e) => match e.kind() {
ErrorKind::DisplayHelpOnMissingArgumentOrSubcommand
| ErrorKind::MissingRequiredArgument => {
print!("{}", e);
return Ok(());
}
_ => {
println!("{:?}", e.kind());
return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(
e.to_string(),
));
}
},
}
}

#[pyfunction]
fn initialize_logs() -> PyResult<()> {
if env::var("ICECHUNK_NO_LOGS").is_err() {
Expand Down Expand Up @@ -84,6 +117,7 @@ fn _icechunk_python(py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PyDiff>()?;
m.add_function(wrap_pyfunction!(initialize_logs, m)?)?;
m.add_function(wrap_pyfunction!(spec_version, m)?)?;
m.add_function(wrap_pyfunction!(cli_entrypoint, m)?)?;

// Exceptions
m.add("IcechunkError", py.get_type::<IcechunkError>())?;
Expand Down
2 changes: 2 additions & 0 deletions icechunk/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@ tracing-subscriber = { version = "0.3.19", features = [
tracing = "0.1.41"
err-into = "1.0.1"
serde_yaml_ng = "0.10.0"
clap = { version = "4.5", features = ["derive"] }
anyhow = "1.0.95"

[dev-dependencies]
pretty_assertions = "1.4.1"
Expand Down
31 changes: 31 additions & 0 deletions icechunk/default.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
repos:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this file src/bin/icechunk?

Copy link
Contributor Author

@DahnJ DahnJ Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now removed the file altogether (5b03fbd), since it can be generated by icechunk config init

Should we instead decide on a standard location for the config, not relative to the repo? How about $HOME/.config/icechunk/cli_config.yaml, obtained through dirs::config_dir()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that here 8f8d89f

On Mac, this actually defaults to $HOME/Library/Application Support.

localrepo: !LocalFileSystem
path: testfolder
config:
inline_chunk_threshold_bytes: null
unsafe_overwrite_refs: null
get_partial_values_concurrency: null
compression: null
caching: null
storage: null
virtual_chunk_containers: null
manifest: null
s3repo: !S3
location:
bucket: testbucket
prefix: testfolder
object_store_config:
region: eu-west-1
endpoint_url: null
anonymous: false
allow_http: false
credentials: FromEnv
config:
inline_chunk_threshold_bytes: null
unsafe_overwrite_refs: null
get_partial_values_concurrency: null
compression: null
caching: null
storage: null
virtual_chunk_containers: null
manifest: null
8 changes: 8 additions & 0 deletions icechunk/src/bin/icechunk/main.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
use clap::Parser;
use icechunk::cli::interface::{run_cli, IcechunkCLI};

#[tokio::main]
async fn main() {
let cli = IcechunkCLI::parse();
let _ = run_cli(cli).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return an error, something like:

fn main() -> Result<(), Box<dyn Error>>

we will want the status code to be != 0 if some error happened

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved error handling in 1724fcd

}
Loading
Loading