diff --git a/README.md b/README.md index d67ce872..df912f28 100644 --- a/README.md +++ b/README.md @@ -6,26 +6,27 @@ [![docs.rs](https://img.shields.io/docsrs/xvc)](https://docs.rs/xvc/) [![unsafe forbidden](https://img.shields.io/badge/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/) -A fast and robust MLOps tool to manage data and pipelines +Manage your unstructured data next to code in Git repositories and run commands when they change. -## ⌛ When to use xvc? +## ⌛ Why Xvc? -- When you have a photo, audio, media, or document collection to backup/version with Git, but don't want to copy that huge data to all Git clones. -- When you manage a large number of _unstructured_ data, like images, documents, and audio files. -- When you want to version data files, and want to track versions across datasets. -- When you want to store this data in local, SSH-accessible, or S3-compatible cloud storage. -- When you create data pipelines on top of this data and want to run these pipelines when the data, code, or other dependencies change. -- When you want to track which subset of the data you're working with, and how it changes by your operations. -- When you have binary artifacts that you use as dependencies and would like to have a `make` alternative that considers _content changes_ rather than timestamps. +- You have image, audio, media, document or asset files to [track/version/backup][xvc-file-track] along with the code, but [don't want to copy][xvc-file-recheck] that huge data to all Git clones. +- You want to [manage][xvc-file-list] unstructured data in multiple places with +[multiple subsets][xvc-file-copy], some (e.g. data) being read-only and some +(e.g. models, executables) change frequently. +- You want to [store][xvc-storage-new] this data in [local][xvc-storage-new-local], [SSH-accessible][xvc-storage-new-rsync], or [S3-compatible cloud storages][xvc-storage-new-s3] to share along the repository. 
+- You want to [specify commands][xvc-pipeline-step-new] that [run][xvc-pipeline-run] when only input data changes, define [pipelines][xvc-pipeline-new] with steps that run when only their [dependencies][xvc-pipeline-step-dependency] change. + - You want to define these dependencies with [files][xvc-pipeline-step-dependency-file], [globs][xvc-pipeline-step-dependency-glob] spanning multiple files, text file lines defined by [ranges][xvc-pipeline-step-dependency-line] or [regexes][xvc-pipeline-step-dependency-regex], [URLs][xvc-pipeline-step-dependency-url], [parameters][xvc-pipeline-step-dependency-params] in the YAML or JSON files, [SQLite queries][xvc-pipeline-step-dependency-sqlite] + or [any command][xvc-pipeline-step-dependency-generic] that produces output. -## ✳️ What is xvc for? +### ✅ Common Tasks -- (for x = files) Track large files on Git, store them in the cloud, create view-only subsets, retrieve them only when necessary. -- (for x = pipelines) Define and run data -> model pipelines whose dependencies may be files, hyperparameters, regex searches, arbitrary URLs, and more. +
+ 🔽 Installation -## 🔽 Installation - -You can get the binary files for Linux, macOS, and Windows from [releases](https://github.com/iesahin/xvc/releases/latest) page. Extract and copy the file to your `$PATH`. +You can get the binary files for Linux, macOS, and Windows from +[releases](https://github.com/iesahin/xvc/releases/latest) page. Extract and +copy the file to your `$PATH`. Alternatively, if you have Rust [installed], you can build xvc: @@ -35,46 +36,117 @@ $ cargo install xvc [installed]: https://www.rust-lang.org/tools/install -If you want to use Xvc with Python console and Jupyter notebooks, you can also install it with `pip`: +If you want to use Xvc with Python console and Jupyter notebooks, you can also +install it with `pip`: ```shell $ pip install xvc ``` -Note that pip installation doesn't make `xvc` available as a shell command. Please see [xvc.py](https://github.com/iesahin/xvc.py) for usage details. +Note that pip installation doesn't make `xvc` available as a shell command. +Please see [xvc.py] for details. + +[xvc.py]: https://github.com/iesahin/xvc.py + +### Completions -## 🏃🏾 Quicktart +Xvc supports dynamic completions for bash, zsh, elvish, fish and powershell. For example, run the following to add completions for bash: + +```bash +echo "source <(COMPLETE=bash xvc)" >> ~/.bashrc +``` -Xvc seamlessly monitors your files and directories on top of Git. To commence, execute the following command within the repository: +See [completions] section in the docs for others. -```console +[completions]: https://docs.xvc.dev/intro/completions + +
+ +
+ 🚀 + Initialize a directory for Xvc + + +```bash $ git init # if you're not already in a Git repository Initialized empty Git repository in [CWD]/.git/ $ xvc init ``` -This command initializes the `.xvc/` directory and adds a `.xvcignore` file for specifying paths you wish to conceal from Xvc. +This command initializes the `.xvc/` directory and adds a `.xvcignore` file for specifying paths you wish to hide from Xvc. + + > 💡**Tip**: + > Git is **not required** to run Xvc. However running Xvc with Git is usually + > a good idea. Xvc can stage/commit metadata files (under `.xvc/`) used to + > track binary files and you can use branches for versioning as well. By + > default, you won't have to deal with Git commands to commit these metadata + > files. + > + > If you don't want to use Xvc with Git, use `--no-git` option when + > initializing. + +
+ +
+ + 👣 + Add Files for Tracking + Include your data files and directories for tracking: ```shell -$ xvc file track my-data/ --as symlink +$ xvc file track my-data/ ``` -This command calculates content hashes for data (using BLAKE-3, by default) and logs them. The changes are committed to Git, and the files are copied to content-addressed directories within `.xvc/b3`. Additionally, read-only symbolic links to these directories are created. +[This command](https://docs.xvc.dev/ref/xvc-file-track.html) calculates content +hashes for data (using BLAKE-3, by default) and records them. Files are moved +to content-addressed directories under `.xvc/b3`. Then they are copied to the +workspace. + + > 💡**Tip**: + > You can specify different [recheck (checkout) + > methods](https://docs.xvc.dev/ref/xvc-file-recheck/) for files and + > directories depending on your use case. Symlinks and hardlinks to the + > files under Xvc cache don't consume additional space but they are readonly. + > You can also use (copy-on-write) reflinks if your file system supports it + > and Xvc is built with `reflink` feature. + +
+ +
+🫧 + Checkout a subset of files as symlinks + + +You can copy and recheck (checkout) subsets of files from Xvc cache as symlinks +to create multiple _views_. This is useful when you need read-only access +that won't consume additional space. + +```bash +$ xvc file copy my-data/ another-view-to-my-data/ +$ xvc file recheck another-view-to-my-data/ --as symlink +``` + > 💡**Tip**: + > [`xvc file copy`][xvc-file-copy] and [`xvc file move`][xvc-file-move] + > don't require file contents to be available. Xvc works only with their + > metadata and you can organize files without their content copied to + > workspace or cache. + + > 💡**Tip**: + > If you installed [completions] to your shell, Xvc completes file names even + > if they are not available in the workspace. -You can specify different [recheck (checkout) methods](https://docs.xvc.dev/ref/xvc-file-recheck/) for files and directories, depending on your use case. -If you need to track model files that change frequently, you can set recheck method `--as copy` (the default). +
-```shell -$ xvc file track my-models/ --as copy -``` +
+ 🌁 Send files to the cloud services -Configure a cloud storage to share the files you added. +Configure a cloud storage to share the files you track with Xvc. ```shell -$ xvc storage new s3 --name my-storage --region us-east-1 --bucket-name my-xvc-remote +$ xvc storage new s3 --name my-storage --region us-east-1 --bucket-name xvc ``` You can send the files to this storage. @@ -83,8 +155,37 @@ You can send the files to this storage. $ xvc file send --to my-storage ``` -When you (or someone else) want to access these files later, you can clone the Git repository and get the files from the -storage. +You can also send a subset of the files. + +```shell +$ xvc file send 'my-data/training/*' --to my-storage +``` + +Xvc [supports](https://docs.xvc.dev/ref/xvc-storage-new) [external directories](https://docs.xvc.dev/ref/xvc-storage-new-local), [Rsync](https://docs.xvc.dev/ref/xvc-storage-new-rsync), [AWS S3](https://docs.xvc.dev/ref/xvc-storage-new-s3), [Google Cloud Storage](https://docs.xvc.dev/ref/xvc-storage-new-gcs), [MinIO](https://docs.xvc.dev/ref/xvc-storage-new-minio), [Cloudflare R2](https://docs.xvc.dev/ref/xvc-storage-new-r2), [Wasabi](https://docs.xvc.dev/ref/xvc-storage-new-wasabi), [Digital Ocean Spaces](https://docs.xvc.dev/ref/xvc-storage-new-digital-ocean). Please [create an issue](https://github.com/iesahin/xvc/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen) if you want Xvc to support another cloud storage service. + +> 💡**Tip**: +> Xvc also supports any command to upload/download files. If your favorite +> service is not listed or you want to use another tool (s5cmd, rclone, etc.), +> you can specify a [generic](https://docs.xvc.dev/ref/xvc-storage-new-generic) +> storage by supplying shell commands to upload and download. + +> 📌 **Important**: +> Xvc never stores credentials to your connections and expects them to be +> available in the environment. It _never_ makes network requests (for +> tracking, statistics, etc.) without your knowledge. 
You can +> [compile](https://docs.xvc.dev/intro/compile-without-default-features) +> without cloud connection support in case you want to make sure that it +> makes no connections to outside services. + +
+ +
+ 🪣 + Get Files from cloud services + + +When you (or someone else) want to access these files later, you can clone the +Git repository and [get the files][xvc-file-bring] from the storage. ```shell $ git clone https://example.com/my-machine-learning-project @@ -95,90 +196,224 @@ $ xvc file bring my-data/ --from my-storage ``` -This approach ensures convenient access to files from the shared storage when needed. - -You don't have to reconfigure the storage after cloning, but you need to have valid credentials as environment variables -to access the storage. -Xvc never stores any credentials. +This approach ensures convenient access to files from the shared storage when +needed. -If you have commands that depend on data or code elements, you can configure a pipeline. + > 💡**Tip**: + > You don't have to reconfigure the storage after cloning, but you need to + > have valid credentials as environment variables to access the storage. Xvc + > never stores any credentials. -For this example, we'll use [a Python script](https://github.com/iesahin/xvc/blob/main/workflow_tests/templates/README.in/generate_data.py) to generate a data set with random names with random IQ scores. +
-The script uses the Faker library and this library must be available where you run the pipeline. To make it repeatable, we start the pipeline by adding a step that installs dependencies. +
+ 🫖 + Share files from cloud storages for a limited time + + + You can share Xvc tracked files from S3 compatible storages for a specified period. -```console -$ xvc pipeline step new --step-name install-deps --command 'python3 -m pip install --quiet --user -r requirements.txt' +```shell +$ xvc file share --storage my-storage dir-0001/file-0001.bin --duration 1h +https://my-storage.s3.eu-central-1.amazonaws.com/xvc.... ``` -We'll make this this step to depend on `requirements.txt` file, so when the file changes it will make the step run. +You can share the link with others and they will be able to access the file +for one hour. The default period is 24 hours. +
-```console -$ xvc pipeline step dependency --step-name install-deps --file requirements.txt +
+ 🥤Create a data pipeline + +Suppose you have a script to preprocess files in a directory and you want to +run this when the files in `my-data/train` directory change. We first define a +step in the pipeline that will run the script. + +```bash +$ xvc pipeline step new --step-name preprocess --command 'python3 src/preprocess.py' +``` -Xvc allows to create dependencies between pipeline steps. Dependent steps wait for dependencies to finish successfully. +Each command is associated with a step and each step has a command. + +
+ +
+ 🔗 Add a dependency to a pipeline step + +When we want to create a dependency for a command, we use [`xvc pipeline step +dependency`][xvc-pipeline-step-dependency] command with various parameters. + +We want to define two dependencies for the `preprocess` step we created previously. +We'll make the `preprocess` step depend on: + +- The `src/preprocess.py` source file itself, so when we change the script, we'll run the step again + +```bash +$ xvc pipeline step dependency --step-name preprocess --file src/preprocess.py +``` -Now we create a step to run the script and make `install-deps` step a dependency of it. +- `data/raw/*.jpg` files that the script works on. -```console -$ xvc pipeline step new --step-name generate-data --command 'python3 generate_data.py' -$ xvc pipeline step dependency --step-name generate-data --step install-deps +```bash +$ xvc pipeline step dependency -s preprocess --glob 'data/raw/*.jpg' ``` +> ⚠️ Most shells expand globs before running the command, so you need to +> quote globs to pass them as strings without expansion. Xvc expands these +> globs itself. + +
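The quoting advice for globs can be checked without Xvc at all. The following is a minimal sketch, assuming a bash shell; `count_args` and the sample file names are hypothetical stand-ins for any command and its inputs. It shows that an unquoted glob reaches the command as one argument per matching file, while a quoted glob arrives as a single literal string.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with two sample files (names are made up for this demo).
tmp=$(mktemp -d)
mkdir -p "$tmp/data/raw"
touch "$tmp/data/raw/a.jpg" "$tmp/data/raw/b.jpg"
cd "$tmp"

# Hypothetical stand-in for any command: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the glob before the command runs.
unquoted=$(count_args data/raw/*.jpg)
# Quoted: the pattern is passed through unchanged as one string.
quoted=$(count_args 'data/raw/*.jpg')

echo "unquoted=$unquoted quoted=$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

With the quoted form, the receiving program gets the pattern itself and can do its own matching, which is the behavior the warning above relies on.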
+ +
+ 🛝 Run pipeline + After you define the pipeline, you can run it by: -```console +```bash $ xvc pipeline run -[DONE] install-deps (python3 -m pip install --quiet --user -r requirements.txt) -[OUT] [generate-data] CSV file generated successfully. +[DONE] preprocess (python3 src/preprocess.py) +[OUT] [preprocess] +... -[DONE] generate-data (python3 generate_data.py) +[DONE] preprocess (python3 src/preprocess.py) ``` -Xvc allows many kinds of dependnecies, like [files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#file-dependencies), -[groups of files and directories defined by globs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob-dependencies), -[regular expression searches in files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex-dependencies), -[line ranges in files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line-dependencies), -[hyper-parameters defined in YAML, JSON or TOML files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#hyper-parameter-dependencies) -[HTTP URLs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#url-dependencies), -[shell command outputs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#generic-command-dependencies), -and [other steps](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#step-dependencies). +> 💡 Xvc runs pipeline steps in parallel if they are not interdependent. You +> can specify the maximum number of parallel processes. + +
+ +
+ 🪡 + Add fine grained dependencies to steps + + +Xvc allows many kinds of dependencies: + +- Steps can explicitly depend on [other steps][xvc-p-s-d-step] when they are required to run serially. + +- Steps can depend on [single files][xvc-p-s-d-file] or groups of files defined +by [globs][xvc-p-s-d-glob]. For globs, you can also get which files are added, +deleted or updated with [glob-items][xvc-p-s-d-glob-items]. + + > 💡 Similar to Git, Xvc doesn't track directories per se. You can define + > glob dependencies that describe files in a directory, like `dir/*`, when you + > want to track all files in it. + +[xvc-p-s-d-step]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#step-dependency +[xvc-p-s-d-file]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#file-dependency +[xvc-p-s-d-glob]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob-dependency +[xvc-p-s-d-glob-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob-items-dependency + +- You can specify steps to depend only on a subset of lines in a file with +[line ranges][xvc-p-s-d-line] or [regular expressions][xvc-p-s-d-regex]. You +can also get which lines are added, deleted or updated with more granular +[line-items][xvc-p-s-d-line-items] or [regex-items][xvc-p-s-d-regex-items] +dependencies. + +[xvc-p-s-d-regex]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex-dependency +[xvc-p-s-d-regex-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex-items-dependency +[xvc-p-s-d-line]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line-dependency +[xvc-p-s-d-line-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line-items-dependency + +- If you track (hyper)parameters for building/model training process in JSON or +YAML files, you can specify steps to [depend on these parameters][xvc-p-s-d-params].
+ +[xvc-p-s-d-params]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#hyper-parameter-dependencies + +- If you want your steps to run when an HTTP(S) URL's content changes, you can +specify this with [URL dependencies][xvc-p-s-d-url]. + +[xvc-p-s-d-url]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#url-dependencies + +- If you want your step to run when the output from an SQLite query changes, you can specify it with [SQLite dependencies][xvc-p-s-d-sqlite]. + +[xvc-p-s-d-sqlite]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#sqlite-query-dependency + +- If none of the dependency types fits your needs, you can also specify a [command][xvc-p-s-d-generic] that will be run to check if a step is invalidated. + +[xvc-p-s-d-generic]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#generic + +
+ 🖇️ Example to add a dependency when only certain lines in a file change + +Suppose you have a list of IQ scores in a file. + +```csv +Ada Harris,128 +Alan Thompson,125 +Brian Shaffer,122 +Brian Wilson,94 +Dr. Brittany Chang,103 +Brittany Smith,104 +David Brown,113 +Emily Davis,97 +Grace White,130 +James Taylor,101 +Dr. Jane Doe,105 +Jessica Lee,102 +John Smith,110 +Laura Martinez,110 +Dr. Linus Martin,118 +Mallory Johnson,105 +Mallory Payne MD,99 +Margaret Clark,122 +Michael Johnson,92 +Robert Anderson,105 +Sarah Wilson,104 +Sherry Brown,115 +Sherry Leonard,117 +Susan Davis,107 +Dr. Susan Swanson,132 +``` -Suppose you're only interested in the IQ scores of those with _Dr._ in front of their names and how they differ from the rest in the dataset we created. Let's create a regex search dependency to the data file that will show all _doctors_ IQ scores. -```console +We're only interested in the IQ scores of those with _Dr._ in front of +their names. Let's create a regex search dependency to run a command when only +a line with a _Dr._ title is added to the file. + + +Our command will be collecting all lines with an initial _Dr._ to another file. + +```bash $ xvc pipeline step new --step-name dr-iq --command 'echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv ' -$ xvc pipeline step dependency --step-name dr-iq --regex-items 'random_names_iq_scores.csv:/^Dr\..*' +$ xvc pipeline step dependency --step-name dr-iq --regex-items 'iq-scores.csv:/^Dr\..*' ``` -The first line specifies a command, when run writes `${XVC_ADDED_REGEX_ITEMS}` environment variable to `dr-iq-scores.csv` file. -The second line specifies the dependency which will also populate the `$[XVC_ADDED_REGEX_ITEMS]` environment variable in the command. +The first line specifies a command, when run writes `${XVC_ADDED_REGEX_ITEMS}` +environment variable to `dr-iq-scores.csv` file. 
+ +The second line specifies the dependency which will also populate the +`${XVC_ADDED_REGEX_ITEMS}` environment variable in the command. -Some dependency types like [regex items], -[line items] and [glob items] inject environment variables in the commands they are a dependency. -For example, if you have two million files specified with a glob, but want to run a script only on the added files after the last run, you can use these environment variables. +Some dependency types like [regex items][xvc-p-s-d-regex-items], [line +items][xvc-p-s-d-line-items] and [glob items][xvc-p-s-d-glob-items] inject +environment variables into the shells running the step commands. If you have +thousands of files specified by a glob, but want to run a script only on the +added files after the last run, you can use these environment variables. -When you run the pipeline again, a file named `dr-iq-scores.csv` will be created. Note that, as `requirements.txt` didn't change `install-deps` step and its dependent `generate-data` steps didn't run. +When you run the pipeline, a file named `dr-iq-scores.csv` will be created. -```console +```bash $ xvc pipeline run [DONE] dr-iq (echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv ) $ cat dr-iq-scores.csv -Dr. Brian Shaffer,122 -Dr. Brittany Chang,82 -Dr. Mallory Payne MD,70 -Dr. Sherry Leonard,93 -Dr. Susan Swanson,81 +Dr. Brittany Chang,103 +Dr. Jane Doe,105 +Dr. Linus Martin,118 +Dr. Susan Swanson,132 ``` -We are using this feature to get lines starting with `Dr.` from the file and write them to another file. When the file changes, e.g. another record matching the dependency regex added to the `random_names_iq_scores.csv` file, it will also be added to `dr-iq-scores.csv` file. +When the file changes, e.g. when another line matching the dependency regex is added +to the `iq-scores.csv` file, the command will append it to the +`dr-iq-scores.csv` file. -```console -$ zsh -cl 'echo "Dr.
Albert Einstein,144" >> random_names_iq_scores.csv' +```bash +$ zsh -cl 'echo "Dr. John Doe,123" >> iq-scores.csv' $ xvc pipeline run [DONE] dr-iq (echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv ) @@ -189,31 +424,30 @@ Dr. Brittany Chang,82 Dr. Mallory Payne MD,70 Dr. Sherry Leonard,93 Dr. Susan Swanson,81 -Dr. Albert Einstein,144 +Dr. John Doe,123 ``` -Now we want to add a another command that draws a fancy histogram from `dr-iq-scores.csv`. As this new step must wait `dr-iq-scores.csv` file to be ready, we'll define `dr-iq-scores.csv` as an _output_ of `dr-iq` step and set the file as a dependency to this new `visualize` step. - -```console -$ xvc pipeline step output --step-name dr-iq --output-file dr-iq-scores.csv -$ xvc pipeline step new --step-name visualize --command 'python3 visualize.py' -$ xvc pipeline step dependency --step-name visualize --file dr-iq-scores.csv -$ xvc pipeline run -[ERROR] Step visualize finished UNSUCCESSFULLY with command python3 visualize.py - -``` +Note that `${XVC_ADDED_REGEX_ITEMS}` has only the added lines, not all of the +lines the regex matches. So, we can just work on the added elements, without +rerunning the commands for all matching elements. +
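To make the "only the added lines" behavior concrete, here is a rough emulation with standard tools. It is a sketch under stated assumptions, not how Xvc implements regex-items: `added_regex_items` and the `.seen` state file are hypothetical names invented for this illustration, and the sample data is a shortened copy of the list above.

```bash
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Shortened sample of the IQ score list.
printf '%s\n' 'Ada Harris,128' 'Dr. Brittany Chang,103' 'Dr. Jane Doe,105' > iq-scores.csv

# Hypothetical helper (NOT part of Xvc): prints the lines of FILE that match
# REGEX and were not among the matches recorded in STATE by the previous call.
added_regex_items() {
  local file=$1 regex=$2 state=$3
  touch "$state"
  # Current matches, sorted because comm requires sorted input.
  { grep -E "$regex" "$file" || true; } | LC_ALL=C sort > "$state.new"
  # comm -13 keeps the lines that appear only in the second file.
  LC_ALL=C comm -13 "$state" "$state.new"
  # Remember the current matches for the next call.
  mv "$state.new" "$state"
}

# First call: every matching line counts as added.
first=$(added_regex_items iq-scores.csv '^Dr\.' .seen)

# Append one more matching line; the second call reports only that line.
echo 'Dr. John Doe,123' >> iq-scores.csv
second=$(added_regex_items iq-scores.csv '^Dr\.' .seen)

printf 'first run:\n%s\nsecond run:\n%s\n' "$first" "$second"

# Clean up the scratch directory.
cd / && rm -rf "$work"
```

The second call reports a single line even though three lines match the regex, which mirrors why a step command can process just the new records instead of redoing all of them.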
-```console -$ zsh -cl 'xvc pipeline dag | dot -opipeline.png' +
+ 🛃 + Export, edit and import a pipeline with YAML or JSON files + + +Unlike some other tools, Xvc doesn't require (or allow) you to specify pipelines in +YAML files. Nevertheless, you can [export][xvc-p-e] and [import][xvc-p-i] the pipeline to JSON or +YAML to edit in your editor. You can fix typos in commands, remove steps +completely, or duplicate the pipeline with a new name this way. -You can also export and import the pipeline to JSON to edit in your editor. +[xvc-p-e]: https://docs.xvc.dev/ref/xvc-pipeline-export +[xvc-p-i]: https://docs.xvc.dev/ref/xvc-pipeline-import -```console +```bash $ xvc pipeline export --file my-pipeline.json $ cat my-pipeline.json @@ -304,7 +538,7 @@ $ cat my-pipeline.json "Dr. Sherry Leonard,93", "Dr. Albert Einstein,144" ], - "path": "random_names_iq_scores.csv", + "path": "iq-scores.csv", "regex": "^Dr//..*", "xvc_metadata": { "file_type": "File", @@ -348,50 +582,53 @@ $ cat my-pipeline.json } ``` -You can edit the file to change commands, add new dependencies, etc. and import it back to Xvc. +After you edit the file, you can import it to check its +consistency and update the pipeline definition. -```console +```bash $ xvc pipeline import --file my-pipeline.json --overwrite ``` -Lastly, if you noticed that the commands are long to type, there is an `xvc aliases` command that prints a set of aliases for commands. You can source the output in your `.zshrc` or `.bashrc`, and use the following commands instead, e.g., `xvc pipelines run` becomes `pvc run`.
- -```console -$ xvc aliases - -alias xls='xvc file list' -alias pvc='xvc pipeline' -alias fvc='xvc file' -alias xvcf='xvc file' -alias xvcft='xvc file track' -alias xvcfl='xvc file list' -alias xvcfs='xvc file send' -alias xvcfb='xvc file bring' -alias xvcfh='xvc file hash' -alias xvcfco='xvc file checkout' -alias xvcfr='xvc file recheck' -alias xvcp='xvc pipeline' -alias xvcpr='xvc pipeline run' -alias xvcps='xvc pipeline step' -alias xvcpsn='xvc pipeline step new' -alias xvcpsd='xvc pipeline step dependency' -alias xvcpso='xvc pipeline step output' -alias xvcpi='xvc pipeline import' -alias xvcpe='xvc pipeline export' -alias xvcpl='xvc pipeline list' -alias xvcpn='xvc pipeline new' -alias xvcpu='xvc pipeline update' -alias xvcpd='xvc pipeline dag' -alias xvcs='xvc storage' -alias xvcsn='xvc storage new' -alias xvcsl='xvc storage list' -alias xvcsr='xvc storage remove' +
+ +
+ 🎋 + Visualize a pipeline in Graphviz or Mermaid + + +You can get the pipeline in Graphviz DOT format to convert to an image. + +```bash +$ zsh -cl 'xvc pipeline dag --format graphviz | dot -Tpng -opipeline.png' ``` -Please create an issue or discussion for any other kinds of dependencies that you'd like to be included. +You can also ask for a [mermaid] diagram: -I'm planning to add [data label and annotations tracking](https://github.com/iesahin/xvc/discussions/208)), [experiments tracking](https://github.com/iesahin/xvc/discussions/207)), [model tracking](https://github.com/iesahin/xvc/discussions/211)), encrypted cache, server to control all commands from a web interface, and more as my time permits. + +```bash +$ xvc pipeline dag --format mermaid +flowchart TD + n0["preprocess"] + n1["data/*"] --> n0 + n2["train"] + n0["preprocess"] --> n2 + +``` + +[mermaid]: https://mermaid.js.org + +You can embed this output in Markdown files, GitHub PRs or Jupyter notebooks. + +```mermaid +flowchart TD + n0["preprocess"] + n1["data/*"] --> n0 + n2["train"] + n0["preprocess"] --> n2 +``` + +
Please check [`docs.xvc.dev`](https://docs.xvc.dev) for documentation. @@ -473,3 +710,12 @@ Given that I'm working on this for the last two years for pure technical bliss, This software is fresh and ambitious. Although I use it and test it close to real-world conditions, it didn't go under the test of time. **Xvc can eat your files and spit them into the eternal void!** Please take backups. + + +[xvc-file-track]: https://docs.xvc.dev/ref/xvc-file-track +[xvc-file-list]: https://docs.xvc.dev/ref/xvc-file-list +[xvc-file-recheck]: https://docs.xvc.dev/ref/xvc-file-recheck +[xvc-file-send]: https://docs.xvc.dev/ref/xvc-file-send +[xvc-file-bring]: https://docs.xvc.dev/ref/xvc-file-bring +[xvc-file-copy]: https://docs.xvc.dev/ref/xvc-file-copy +[xvc-file-move]: https://docs.xvc.dev/ref/xvc-file-move diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 6a2a1ec7..32e01b2b 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -33,6 +33,7 @@ - [`xvc file carry-in`](./ref/xvc-file-carry-in.md) - [`xvc file send`](./ref/xvc-file-send.md) - [`xvc file bring`](./ref/xvc-file-bring.md) + - [`xvc file share`](./ref/xvc-file-share.md) - [`xvc file move`](./ref/xvc-file-move.md) - [`xvc file copy`](./ref/xvc-file-copy.md) - [`xvc file remove`](./ref/xvc-file-remove.md) diff --git a/book/src/ref/xvc-file-share.md b/book/src/ref/xvc-file-share.md index 04c1cd6b..e40d9f4f 100644 --- a/book/src/ref/xvc-file-share.md +++ b/book/src/ref/xvc-file-share.md @@ -1,4 +1,4 @@ -# xvc file send +# xvc file share ## Synopsis diff --git a/book/src/ref/xvc-pipeline-step-dependency-file.md b/book/src/ref/xvc-pipeline-step-dependency-file.md index 947af167..832036c5 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-file.md +++ b/book/src/ref/xvc-pipeline-step-dependency-file.md @@ -1,4 +1,4 @@ -### File Dependencies +### File This command works only in Xvc repositories. 
diff --git a/book/src/ref/xvc-pipeline-step-dependency-generic.md b/book/src/ref/xvc-pipeline-step-dependency-generic.md index 44689a57..cd88a9cb 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-generic.md +++ b/book/src/ref/xvc-pipeline-step-dependency-generic.md @@ -1,4 +1,4 @@ -### Generic Command Dependencies +### Generic Command This command works only in Xvc repositories. diff --git a/book/src/ref/xvc-pipeline-step-dependency-glob-items.md b/book/src/ref/xvc-pipeline-step-dependency-glob-items.md index 7e0a3cdc..dd0ffcef 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-glob-items.md +++ b/book/src/ref/xvc-pipeline-step-dependency-glob-items.md @@ -1,4 +1,4 @@ -### Glob Items Dependency +### Glob Items A step can depend on multiple files specified with globs. When any of the files change, or a new file is added or removed from the files specified by glob, the step is invalidated. diff --git a/book/src/ref/xvc-pipeline-step-dependency-glob.md b/book/src/ref/xvc-pipeline-step-dependency-glob.md index e3d2103a..fb233919 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-glob.md +++ b/book/src/ref/xvc-pipeline-step-dependency-glob.md @@ -1,4 +1,4 @@ -### Glob Dependencies +### Glob A step can depend on multiple files specified with globs. The difference with this and [glob-items dependency](./xvc-pipeline-step-dependency-glob-items.md) diff --git a/book/src/ref/xvc-pipeline-step-dependency-line-items.md b/book/src/ref/xvc-pipeline-step-dependency-line-items.md index 7311478a..530305c6 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-line-items.md +++ b/book/src/ref/xvc-pipeline-step-dependency-line-items.md @@ -1,4 +1,4 @@ -### Line Item Dependencies +### Line Items You can make your steps to depend on lines of text files. The lines are defined by starting and ending indices. 
diff --git a/book/src/ref/xvc-pipeline-step-dependency-lines.md b/book/src/ref/xvc-pipeline-step-dependency-lines.md index a947f2ac..02e421ea 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-lines.md +++ b/book/src/ref/xvc-pipeline-step-dependency-lines.md @@ -1,4 +1,4 @@ -### Line Dependencies +### Line You can make your steps to depend on lines of text files. The lines are defined by starting and ending indices. diff --git a/book/src/ref/xvc-pipeline-step-dependency-param.md b/book/src/ref/xvc-pipeline-step-dependency-param.md index fbcb7520..82e91a2f 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-param.md +++ b/book/src/ref/xvc-pipeline-step-dependency-param.md @@ -1,4 +1,4 @@ -### (Hyper-)Parameter Dependencies +### (Hyper-)Parameter You may be keeping pipeline-wide parameters in structured text files. You can specify such parameters found in JSON, TOML and YAML files as dependencies. diff --git a/book/src/ref/xvc-pipeline-step-dependency-regex-items.md b/book/src/ref/xvc-pipeline-step-dependency-regex-items.md index 90a6a9fb..0639c030 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-regex-items.md +++ b/book/src/ref/xvc-pipeline-step-dependency-regex-items.md @@ -1,4 +1,4 @@ -### Regex Item Dependencies +### Regex Items You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results changed. diff --git a/book/src/ref/xvc-pipeline-step-dependency-regex.md b/book/src/ref/xvc-pipeline-step-dependency-regex.md index 3bff6b7a..d2d3cc3b 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-regex.md +++ b/book/src/ref/xvc-pipeline-step-dependency-regex.md @@ -1,4 +1,4 @@ -### Regex Dependencies +### Regex You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results changed. 
diff --git a/book/src/ref/xvc-pipeline-step-dependency-sqlite-query.md b/book/src/ref/xvc-pipeline-step-dependency-sqlite-query.md index 9d7d7894..1b2c9140 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-sqlite-query.md +++ b/book/src/ref/xvc-pipeline-step-dependency-sqlite-query.md @@ -1,4 +1,4 @@ -### SQLite Query Dependency +### SQLite Query You can create a step dependency with an SQLite query. When the query results change, the step is invalidated. diff --git a/book/src/ref/xvc-pipeline-step-dependency-step.md b/book/src/ref/xvc-pipeline-step-dependency-step.md index 4d01a33d..bac068c0 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-step.md +++ b/book/src/ref/xvc-pipeline-step-dependency-step.md @@ -1,5 +1,4 @@ - -### Step Dependencies +### Step This command works only in Xvc repositories. diff --git a/book/src/ref/xvc-pipeline-step-dependency-url.md b/book/src/ref/xvc-pipeline-step-dependency-url.md index 4df471bf..04457015 100644 --- a/book/src/ref/xvc-pipeline-step-dependency-url.md +++ b/book/src/ref/xvc-pipeline-step-dependency-url.md @@ -1,4 +1,4 @@ -### URL Dependencies +### URL This command works only in Xvc repositories. 
diff --git a/run-tests.zsh b/run-tests.zsh index 4617647e..7780d33c 100755 --- a/run-tests.zsh +++ b/run-tests.zsh @@ -1,6 +1,6 @@ # LLVM_PROFILE_FILE="${TMPDIR}/xvc-%p-%m.profraw" CARGO_INCREMENTAL=0 RUSTFLAGS="-Cinstrument-coverage" XVC_TRYCMD_TESTS=storage,file,pipeline,core,start TRYCMD=overwrite cargo llvm-cov --features test-ci --lcov --output-path lcov.info -p xvc # --test z_test_docs # rws is for local dev, run-with-secrets script -# XVC_TRYCMD_TESTS=storage,file,pipeline,core,start TRYCMD=overwrite rws cargo test --features test-ci -p xvcr +# XVC_TRYCMD_TESTS=storage,file,pipeline,core,start TRYCMD=overwrite rws cargo test --features test-ci -p xvc # XVC_TRYCMD_TESTS=storage,file,pipeline,core,start TRYCMD=overwrite cargo test --features test-ci -p xvc --test z_test_docs diff --git a/rut-toolchain.toml b/rust-toolchain.toml similarity index 100% rename from rut-toolchain.toml rename to rust-toolchain.toml