Releases: dathere/qsv
0.106.0
This release features the new Polars-powered sqlp command, which allows you to run SQL queries against CSVs.
Initial tests show that its performance is competitive with DuckDB and faster than DataFusion on identical SQL queries, and it just runs rings around pandas sql.
It converts Polars SQL (a subset of ANSI SQL) queries into multi-threaded LazyFrame expressions and then executes them. This is a very powerful feature, allowing you to do joins, aggregations, group bys, etc. on larger-than-memory CSVs. The sqlp command is still experimental and we are looking for feedback on it. Please try it out and let us know what you think.
Added
- sqlp: new command to allow Polars SQL queries against CSVs #1015
Changed
- Bump csv from 1.2.1 to 1.2.2 by @dependabot in #1008
- Bump pyo3 from 0.18.3 to 0.19.0 by @dependabot in #1007
- workflow for creating an MSI installer for qsv by @minhajuddin2510 in #1009
- migrate from once_cell to std::sync::OnceLock #1010
- Bump qsv_docopt from 1.2.2 to 1.3.0 by @dependabot in #1011
- Bump self_update from 0.36.0 to 0.37.0 by @dependabot in #1014
- Bump indicatif from 0.17.4 to 0.17.5 by @dependabot in #1013
- Bump cached from 0.43.0 to 0.44.0 by @dependabot in #1012
- Bump url from 2.3.1 to 2.4.0 by @dependabot in #1016
- WiX changes by @minhajuddin2510 in #1017
- Bump actions/github-script from 5 to 6 by @dependabot in #1018
- Bump regex from 1.8.3 to 1.8.4 by @dependabot in #1019
- Bump hashbrown from 0.13.2 to 0.14.0 by @dependabot in #1020
- Bump tempfile from 3.5.0 to 3.6.0 by @dependabot in #1021
- Bump sysinfo from 0.29.0 to 0.29.1 by @dependabot in #1023
- Bump qsv-dateparser from 0.8.2 to 0.9.0 by @dependabot in #1022
- Bump qsv-sniffer from 0.9.3 to 0.9.4 by @dependabot in #1024
- Bump qsv-stats from 0.9.0 to 0.10.0 3803579
- Bump embedded Luau from 0.577 to 0.579
- Bump data-encoding from 2.3.3 to 2.4.0 2285a12
- cargo update bump several indirect dependencies
- change MSRV to 1.70.0
- pin Rust nightly to 2023-06-06
Full Changelog: 0.105.1...0.106.0
0.105.1
All "unsafe" code has been removed. Instead of explicitly unchecked indexing to skip unnecessary bounds checks, we now use selectively placed asserts, which let the compiler prove that those checks are redundant and typically optimize them away.
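The assert-instead-of-unsafe pattern can be sketched as follows. This is an illustrative example, not code taken from stats, fetch or validate, and the weighted_sum function is hypothetical. A single up-front assert that both slices have the same length lets the optimizer prove that every weights[i] inside the loop is in bounds, so it can usually drop the per-iteration bounds check that get_unchecked would otherwise have been used to skip.

```rust
/// Hypothetical helper illustrating the "assert instead of unsafe" pattern.
/// Returns the dot product of `values` and `weights`.
///
/// Panics if the two slices have different lengths.
pub fn weighted_sum(values: &[f64], weights: &[f64]) -> f64 {
    // One up-front check replaces `unsafe { weights.get_unchecked(i) }`:
    // once the lengths are known to be equal, every `weights[i]` below is
    // provably in bounds, so the optimizer can usually elide that check.
    assert_eq!(values.len(), weights.len(), "length mismatch");
    let mut total = 0.0;
    for i in 0..values.len() {
        total += values[i] * weights[i];
    }
    total
}
```

Because the check still exists in safe form, a violated assumption becomes a clean panic rather than undefined behavior.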
Changed
- stats: remove all unsafes 4a4c010
- fetch & fetchpost: remove unsafe 1826bb3
- validate: remove unsafe 742ccb3
- normalize --user-agent option across all of qsv feff90b & 839b3b7
- bump qsv-dateparser from 0.8.1 to 0.8.2, which also uses chrono 0.4.26
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-29
Fixed
- remove chrono pin to 0.4.24 and upgrade to 0.4.26 which fixed 0.4.25 CI test failures 7636d82
Full Changelog: 0.105.0...0.105.1
0.105.0
Added
- sniff: added --harvest-mode convenience option #997
- sniff: added --quick option on Linux e16df6f
- qsv (pronounced "Quicksilver") now has a tagline: "Hi ho, QuickSilver! Away!" 😄 d32aeb1
Changed
- sniff: if --no-infer is enabled when sniffing a snappy file, just return the snappy mime type #996
- sniff: now returns filesize and last-modified date in errors. 2162659
- stats: minor performance tweaks in hot compute loop f61198c
- qsv binary variants built using older glibc/musl libraries are now published with their respective glibc/musl version suffixes (glibc-2.31/musl-1.1.24) in the filename, instead of just the "older" suffix.
- pin chrono to 0.4.24 as the new 0.4.25 is breaking CI tests cde3623
- Bump calamine from 0.19.1 to 0.20.0 ec7e2df
- Bump actions/setup-python from 4.6.0 to 4.6.1 by @dependabot in #991
- Bump flexi_logger from 0.25.4 to 0.25.5 by @dependabot in #992
- Bump regex from 1.8.2 to 1.8.3 by @dependabot in #993
- Bump csvs_convert from 0.8.3 to 0.8.4 by @dependabot in #994
- Bump log from 0.4.17 to 0.4.18 by @dependabot in #998
- Bump polars from 0.29.0 to 0.30.0 by @dependabot in #999
- Bump tokio from 1.28.1 to 1.28.2 by @dependabot in #1000
- Bump once_cell from 1.17.1 to 1.17.2 by @dependabot in #1003
- Bump indicatif from 0.17.3 to 0.17.4 by @dependabot in #1001
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-28
Removed
- excel: removed kludgy --dates-whitelist option #1005
Fixed
- sniff: fix inconsistent mime type detection #995
Full Changelog: 0.104.1...0.105.0
0.104.1
Added
- added new publishing workflow to build binary variants using older glibc 2.31 instead of glibc 2.35 and musl 1.1.24 instead of musl 1.2.2. This will allow users running on older Linux distros (e.g. Debian, Ubuntu 20.04) to run qsv prebuilt binaries with "older" glibc/musl versions. 1a08b92
Changed
- sniff: improved usage text d2b32ac
- sniff: if sniffing a URL, and the server does not return content-length or last-modified headers, set filesize and last-modified to "Unknown" d4a64ac
- frequency: use SIMD-accelerated UTF-8 validation in hot loop 33406a1
- foreach: use simdutf8 validation df6b4f8
- apply: use simdutf8 validation in decode operation; also tweak it to avoid panics (however unlikely) adf7052
- update install & build instructions with magic
- Bump regex from 1.8.1 to 1.8.2 by @dependabot in #990
- Bump bumpalo from 3.12.2 to 3.13.0
- pin Rust nightly to 2023-05-22
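The pattern behind the frequency, foreach and apply entries above (validate raw bytes as UTF-8 in a hot loop, and handle invalid input without panicking) can be sketched as follows. The field_as_str helper is hypothetical, not qsv's code. qsv uses the SIMD-accelerated simdutf8 crate, whose simdutf8::basic::from_utf8 has the same shape as std::str::from_utf8; this sketch sticks to std so it has no dependencies.

```rust
use std::borrow::Cow;

/// Hypothetical helper: converts a raw CSV field to text without panicking.
/// qsv swaps std's validator for `simdutf8::basic::from_utf8`; std is used
/// here only to keep the example dependency-free.
pub fn field_as_str(field: &[u8]) -> Cow<'_, str> {
    match std::str::from_utf8(field) {
        // Fast path: valid UTF-8 is borrowed as-is, no allocation.
        Ok(s) => Cow::Borrowed(s),
        // Invalid bytes are replaced with U+FFFD instead of unwrapping.
        Err(_) => String::from_utf8_lossy(field),
    }
}
```

Keeping the valid case borrowed means the common path costs only the validation itself, which is exactly the part a SIMD validator speeds up.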
Removed
- sniff: disabled --progressbar option on qsvdp binary variant 1a20edb
Fixed
- updated publishing workflows to properly enable magic feature (for sniff mime type detection) 136211f
Full Changelog: 0.104.0...0.104.1
0.104.0
Added
- sniff: add --no-infer option, only available on Linux. Using this option makes sniff work as a general mime type detector, retrieving the detected mime type, file size (content-length when sniffing a URL), and last modified date. When sniffing a URL with --no-infer, it only sniffs the first downloaded chunk, making it very fast even for very large remote files. This option was designed to facilitate accelerated harvesting and broken/stale link checking on CKAN. #987
- excel: add canonical_filename to metadata #985
- snappy: now accepts URL input #986
- sample: support URL input #989
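Sniffing only the first downloaded chunk is enough because most binary formats announce themselves with a fixed signature ("magic number") in their first few bytes. The sketch below illustrates the idea with a tiny, hypothetical signature table and a made-up sniff_mime function; qsv's sniff delegates to its magic feature, whose signature database is far larger than a hand-written table like this.

```rust
/// Hypothetical helper: guesses a MIME type from the leading bytes of a
/// file, or of the first chunk downloaded from a URL. Only a handful of
/// well-known signatures are listed; unknown input yields `None`.
pub fn sniff_mime(first_chunk: &[u8]) -> Option<&'static str> {
    const SIGNATURES: &[(&[u8], &str)] = &[
        (b"\x1f\x8b", "application/gzip"),
        (b"PK\x03\x04", "application/zip"),
        (b"%PDF-", "application/pdf"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        // Snappy framing format: stream identifier chunk followed by "sNaPpY".
        (b"\xff\x06\x00\x00sNaPpY", "application/x-snappy-framed"),
    ];
    SIGNATURES
        .iter()
        .find(|(magic, _)| first_chunk.starts_with(magic))
        .map(|(_, mime)| *mime)
}
```

Since only a prefix is inspected, the cost is the same whether the input is a small local file or the first chunk of a multi-gigabyte remote download.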
Changed
- Bump qsv-sniffer from 0.9.2 to 0.9.3 by @dependabot in #979
- Bump console from 0.15.5 to 0.15.6 by @dependabot in #980
- Bump jql-runner from 6.0.7 to 6.0.8 by @dependabot in #981
- Bump console from 0.15.6 to 0.15.7 by @dependabot in #988
- Bump embedded Luau from 0.576 to 0.577
- apply select clippy recommendations
- tweaked emojis used in the Available Commands legend: changed 🗜️ to 🤯 to denote memory-intensive commands that load the entire CSV into memory; changed 🪗 to 😣 to denote commands that need additional memory proportional to the cardinality of the columns being processed; 🌐 denotes commands that have web-aware options
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-21
Fixed
- excel: Handle ranges larger than the sheet by @bluepython508 in #984
Full Changelog: 0.103.1...0.104.0
0.103.1
Changed
- Bump reqwest from 0.11.17 to 0.11.18 by @dependabot in #978
- cargo update bump indirect dependencies
Fixed
- fix cargo install failing because it was trying to fetch cargo environment variables that are only set for cargo build, but not for cargo install #977
Full Changelog: 0.103.0...0.103.1
0.103.0
Added
- sniff: on Linux, short-circuit sniffing a remote file when we already know it's not a CSV #976
- stats: now computes variance for dates e3e6782
- stats: now automatically invalidates cached stats across qsv releases 6e929dd
- add magic version to --version option 455c0f2
- added CKAN-aware legend to List of Available Commands
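One way to compute a variance for dates is to map each date to a number first and then apply the usual formula. The sketch below assumes days since the Unix epoch as the unit and the population variance; these notes do not say which unit or estimator stats actually uses, and both helper names are hypothetical. days_from_civil follows Howard Hinnant's well-known proleptic Gregorian date-to-day-count algorithm.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
pub fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // number of 400-year cycles
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let m = i64::from(month);
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + i64::from(day) - 1; // [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}

/// Population variance; `None` for an empty input.
pub fn population_variance(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    Some(xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n)
}
```

With this unit the result is in days squared; for example, 2023-01-01 and 2023-01-03 map to 19358 and 19360, whose population variance is 1.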
Changed
- stats: improve usage text
- stats: use extend_from_slice for readability 23275e2
- validate: do not panic if the input is not UTF-8 532cd01
- sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response 0197591
- luau: update embedded Luau from 0.573 to 0.576
- Update nightly build instructions
- Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in #972
- Bump tokio from 1.28.0 to 1.28.1 by @dependabot in #973
- Bump serde from 1.0.162 to 1.0.163 by @dependabot in #974
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-13
Full Changelog: 0.102.1...0.103.0
0.102.1
0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.
Changed
- stats: refine --infer-boolean option info & update test count de6390b
- tojsonl: refine boolcheck_first_lower_char() fn 241115e
Fixed
- tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.
Full Changelog: 0.102.0...0.102.1
0.102.0
A lot of work was done on sniff to make it not just a CSV dialect detector, but a general-purpose file type detector leveraging 🪄 magic ✨, able to detect mime types even for files on URLs.
sniff can now also use the same data types as stats with the --stats-types option. This was primarily done to support metadata collection when registering CKAN resources: not only during data entry, but also when checking resource links for bitrot and when harvesting metadata from other systems. This way, stats and sniff can be used interchangeably, depending on the response time required and the data quality of the data source.
For example, sniff can be used to quickly infer metadata by downloading just a small sample of a very large data file DURING data entry ("Resource-first upload workflow"), while stats is used later, when the data is actually pushed to the Datastore with Datapusher+, data type inferences need to be guaranteed, and the entire file has to be scanned.
Added
- stats: add --infer-boolean option #967
- sniff: add --stats-types option #968
- sniff: add magic mime-type detection on Linux #970
- sniff: add --user-agent option bd0bf78
- sniff: add last_modified info ef68bff
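The --infer-boolean rule can be sketched as follows, assuming the behavior described in the stats usage text around this release (treat the exact rule as an assumption): a column is inferred as boolean when its cardinality is 2 and the first characters of its two values, compared case-insensitively, form one of the pairs 0/1, f/t or y/n. The infer_boolean function is a hypothetical illustration, not qsv's implementation.

```rust
/// Hypothetical sketch of --infer-boolean: exactly two distinct values
/// whose lowercased first characters form one of the pairs 0/1, f/t or y/n.
pub fn infer_boolean<'a, I>(values: I) -> bool
where
    I: IntoIterator<Item = &'a str>,
{
    // Cardinality check: collect distinct values, bailing out early at 3.
    let mut distinct: Vec<&str> = Vec::new();
    for v in values {
        if !distinct.contains(&v) {
            distinct.push(v);
            if distinct.len() > 2 {
                return false;
            }
        }
    }
    if distinct.len() != 2 {
        return false;
    }
    // Lowercased first character of each value (None for an empty string).
    let first = |s: &str| s.chars().next().map(|c| c.to_ascii_lowercase());
    match (first(distinct[0]), first(distinct[1])) {
        (Some(a), Some(b)) => {
            // Order the pair so that "t"/"f" and "f"/"t" are treated alike.
            matches!((a.min(b), a.max(b)), ('0', '1') | ('f', 't') | ('n', 'y'))
        }
        _ => false,
    }
}
```

Checking cardinality first is what makes the rule cheap: a column with a third distinct value is rejected without inspecting any characters.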
Changed
- make --envlist option allocator-aware f3566dc
- Bump serde from 1.0.160 to 1.0.162 by @dependabot in #962
- Bump robinraju/release-downloader from 1.7 to 1.8 by @dependabot in #960
- Bump flexi_logger from 0.25.3 to 0.25.4 by @dependabot in #965
- Bump sysinfo from 0.28.4 to 0.29.0 by @dependabot in #966
- Bump jql-runner from 6.0.6 to 6.0.7 by @dependabot in #969
- Bump polars from 0.28.0 to 0.29.0 by @dependabot in #971
- apply select clippy recommendations
- cargo update bump indirect dependencies
- change MSRV to 1.69.0
- pin Rust nightly to 2023-05-07
Fixed
- sniff: make sniff give more consistent results #958. Fixes #956
- Bump qsv-sniffer from 0.8.3 to 0.9.1. Replaced all asserts with proper error handling. #961 a7c607a 43d7eaf
- sniff: fixed rowcount calculation when sniffing a URL and the entire file was actually downloaded - ef68bff
Full Changelog: 0.101.0...0.102.0
0.101.0
We're back to the future! The qsv release train is back on track, as we jump to 0.101.0 over the yanked 0.100.0 release, now that the self-update logic has been fixed.
Added
- stats: added more metadata to stats arg cache JSON - 5767e56
- added target-triple to user-agent string, and changed agent name to qsv binary variant 063b080, 70f4ea3, f0fcb05
Changed
- excel: performance, safety & documentation refinements e9a283d, 3800d25, 252b01e, 6a6df0f, 67ccd85, f2908ce, 6d5105d, dbcea39, faa8ef9
- replace: clarify that it works on a field-by-field basis c0e2012
- stats: use extend_from_slice when possible - c71ad4e
- fetch & fetchpost: replace multiple push_fields with a csv from vec - f4e0479
- fetch & fetchpost: Migrate to jql 6 #955
- schema: made bincode reader buffer bigger - 39b4bb5
- index: use increased default buffer size when creating index 60fe7d6
- standardized user_agent processing 4c06301, 010c565
- User agent environment variable; standardized user agent processing #951
- more robust Environment Variables processing #946
- move Environment Variables to its own markdown file 77c167f
- Bump tokio from 1.27.0 to 1.28.0 by @dependabot in #945
- Bump mimalloc from 0.1.36 to 0.1.37 by @dependabot in #944
- Bump mlua from 0.9.0-beta.1 to 0.9.0-beta.2 by @dependabot in #952
- Bump flate2 from 1.0.25 to 1.0.26 by @dependabot in #954
- Bump reqwest from 0.11.16 to 0.11.17 by @dependabot in #953
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-04-30
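A bigger default read buffer, as in the index entry above, plugs in where the reader is constructed; std's BufReader::with_capacity makes that explicit. The sketch below is hypothetical: the 128 KiB capacity and the line_offsets name are assumptions (the notes don't state qsv's actual default), and it simply records the byte offset at which each line starts, the kind of record-number-to-offset mapping an index provides. qsv's real index is built from parsed CSV records and so handles quoted newlines, which this sketch does not.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical read-buffer size; larger buffers mean fewer read() calls.
const BUFFER_CAPACITY: usize = 128 * 1024;

/// Returns the byte offset at which each line starts. Simplified sketch:
/// it splits on every b'\n', so quoted newlines inside CSV fields are NOT
/// handled. If the input ends with a newline, the last offset equals the
/// input length (where a next record would start).
pub fn line_offsets<R: Read>(input: R) -> io::Result<Vec<u64>> {
    let mut rdr = BufReader::with_capacity(BUFFER_CAPACITY, input);
    let mut offsets = vec![0u64];
    let mut pos = 0u64; // absolute offset of the start of the current buffer
    loop {
        let buf = rdr.fill_buf()?;
        if buf.is_empty() {
            break; // EOF
        }
        for (i, &b) in buf.iter().enumerate() {
            if b == b'\n' {
                offsets.push(pos + i as u64 + 1);
            }
        }
        let len = buf.len();
        pos += len as u64;
        rdr.consume(len);
    }
    Ok(offsets)
}
```

Scanning each filled buffer directly, instead of reading line by line into new Strings, keeps the loop allocation-free, so buffer size is the main knob left for throughput.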
Full Changelog: 0.99.1...0.101.0