Releases: gabledata/recap
0.7.4
What's Changed
- Add `create` static methods for readers by @criccomini in #377
Full Changelog: 0.7.3...0.7.4
0.7.3
Make `recap` an implicit namespace. Removed `__init__.py` and updated `pyproject.toml` for PDM namespacing.
Full Changelog: 0.7.0...0.7.3
0.7.0
What's Changed
- Upgrade `proto-schema-parser` to 0.3.0 by @criccomini in #323
- Add `ProtobufConverter.from_recap` by @criccomini in #324
- Add BigQueryReader.to_recap by @criccomini in #325
- Upgrade fakesnow by @criccomini in #326
- Update README.md with more examples and a table by @criccomini in #327
- Replace `venv` with PDM's virtualenv management by @criccomini in #329
- Handle oddly sized floats in AvroConverter.from_recap by @criccomini in #332
- Reorder Recap types in README.md by @criccomini in #333
- Upgrade fakesnow by @criccomini in #334
- Add `optional: true` support to Recap CST by @criccomini in #340
- Exclude `doc` and `default` from aliased types by @criccomini in #341
- Add `$ref` and `$id` support for JSONSchemaConverter by @criccomini in #343
- Add meta schema: a JSON Schema which matches recap schemas by @mjperrone in #346
- Minor cleanup after spec tests by @criccomini in #347
- Add `from_recap` to JSONSchemaConverter by @criccomini in #352
- Add `pdm run` commands for `unit`, `spec`, and `integration` by @criccomini in #353
- Update DEVELOPER.md with latest `pdm run test` stuff by @criccomini in #363
- Add better error message for missing 'type' property by @gwukelic in #357
- Add UUID to Postgres recognized types by @gwukelic in #356
- Validate `logical` built-ins in JSON schema metaschema by @criccomini in #360
- Add `validate` methods for each RecapType by @criccomini in #365
- Use JSON metaschema on recap.build website for spec tests by @criccomini in #366
- Add MySQL Reader by @gwukelic in #358
- Add `format: bytes` support for JSONSchemaConverter.to_recap by @criccomini in #367
- Link README.md to recap.build by @criccomini in #368
- Add JSON schema metaschema validation by @criccomini in #369
- Add HDecimalType stats support in HiveMetastoreReader by @criccomini in #370
- Update README.md by @criccomini in #371
- Add autoflake to remove unused imports by @criccomini in #372
- Make `recap` an implicit namespace package by @criccomini in #374
New Contributors
- @mjperrone made their first contribution in #346
- @gwukelic made their first contribution in #357
Full Changelog: 0.6.0...0.7.0
0.6.0
What's Changed
- Fixes wee little typo by @jakthom in #199
- Update README.md by @joshuacoris in #200
- Add `metadata.Schema` conversion for common serialization types by @criccomini in #201
- Move `Schema` model into `recap.schema.model` module by @criccomini in #202
- Update README.md and index.md to show more `schema()` examples by @criccomini in #203
- Add basic protobuf conversion support by @criccomini in #204
- Add a more robust schema model by @criccomini in #208
- Move away from constraints on base types by @criccomini in #213
- Fix styling by @criccomini in #214
- Add a Recap schema definition language by @criccomini in #215
- Add Recap type spec by @criccomini in #217
- Add UUID type to spec by @criccomini in #219
- Add `nullable` to Recap spec by @criccomini in #220
- Typo fixes for SPEC.md by @gunnarmorling in #221
- Lay groundwork for robust types with self-reference support by @criccomini in #223
- Implement Recap type spec by @criccomini in #224
- Update intro and clarification in SPEC by @criccomini in #225
- Add more documentation to the Recap type spec by @criccomini in #227
- Add a basic `diff` method to diff Recap types by @criccomini in #228
- Create a `Converter` class and move Recap converters to it by @criccomini in #229
- Add `diff` helper to `recap.schema` by @criccomini in #230
- Remove `Field` from type hierarchy by @criccomini in #231
- Refactor Recap to focus on schemas by @criccomini in #232
- Support Python 3.11 by @criccomini in #233
- Add `diff` to the REST API by @criccomini in #234
- Update spec to differentiate between aliases and logical types by @criccomini in #235
- Move Python project into `python` directory by @criccomini in #236
- Add a Java library for Recap by @criccomini in #237
- Remove Avro and Proto from Python by @criccomini in #238
- Add logical support to Python types by @criccomini in #239
- Remove SPEC.md by @criccomini in #241
- Update README.md by @criccomini in #242
- Remove java by @criccomini in #243
- Transition Recap to a metadata gateway by @criccomini in #244
- Replace SQLAlchemyReader with `dbapi` readers by @criccomini in #261
- Remove legacy dependencies from `pyproject.toml` by @criccomini in #262
- Add JSONSchemaConverter by @criccomini in #266
- Test SnowflakeReader by @criccomini in #267
- Add basic ProtobufConverter by @criccomini in #268
- Remove global type registry from RecapType by @criccomini in #270
- Support deeply nested messages in ProtobufConverter by @criccomini in #271
- Remove random prints by @criccomini in #272
- Add integration test dir and kafka docker images by @criccomini in #273
- Move test_types.py into unit test directory by @criccomini in #274
- Add ProtobufConverter support for ConfluentRegistryReader by @criccomini in #276
- Add JSONSchemaConverter support for ConfluentRegistryReader by @criccomini in #277
- Include `extra_attrs["name"]` for StructType fields in ProtobufConverter by @criccomini in #279
- Add hive metastore by @criccomini in #281
- from_dict() bug fix for union list shorthand by @adrianisk in #283
- Parameterize test_union_type by @criccomini in #286
- Add VOID support for HiveMetadataReader by @criccomini in #287
- Add Timestamp, Duration, and NullValue WKTs to ProtobufReader by @criccomini in #288
- Update README.md with code examples by @criccomini in #290
- Minor updates to pyproject.toml by @criccomini in #291
- Default `bytes` to 64KiB for StringType and BytesType by @criccomini in #293
- Add `to_dict`, `clean_dict`, and `alias_dict` to types.py by @criccomini in #295
- Fix `make_nullable` bugs by @criccomini in #296
- Remove `str` from UnionType `types` parameter by @criccomini in #298
- Move `make_nullable` to RecapType instance method by @criccomini in #299
- Fix `black` styles by @criccomini in #301
- Rename `struct` and `convert` for readers and converters by @criccomini in #302
- Add `to_avro` for `AvroConverter` by @criccomini in #304
- Add `alias` support for `AvroConverter.from_recap` by @criccomini in #307
- Include `stats` in HiveMetastoreReader when `include_stats` is set by @criccomini in #308
- Add alias support for `to_recap` AvroReader by @criccomini in #310
- Add `fixed` type for `to_recap` in AvroConverter by @criccomini in #311
- Update self-reference Avro test to be more specific by @criccomini in #312
- Update `pymetastore` to properly handle `date` stats by @criccomini in #313
- Forgot `type` in README.md example by @criccomini in #315
- Support `Message` types in ProtobufConverter by @criccomini in #316
- Support aliases in `ProtobufConverter.to_recap` by @criccomini in #317
- Enforce alias name rules in RecapTypeRegistry by @criccomini in #319
- Minor README update and cleaning unused imports by @criccomini in #321
New Contributors
- @jakthom made their first contribution in #199
- @joshuacoris made their first contribution in #200
- @adrianisk made their first contribution in #283
Full Changelog: 0.5.2...0.6.0
0.5.2
0.5.1
What's Changed
- Exclude Google imports if the dependency is missing by @criccomini in #197
Full Changelog: 0.5.0...0.5.1
0.5.0
0.5.0 is pretty much a complete rewrite. I was really unhappy with how
complicated things had gotten with Recap, especially the Python API. I've
rewritten things to add:
- A very simple REPL
- A FastAPI-like metadata crawling API
- A basic data catalog
- A basic crawler
- A storage layer with a graph-like API
These changes make it much easier to work with Recap in Python. They also lay
the groundwork for complex schema conversion features that I want to write.
What's Changed
- Create unit tests for `catalog/db.py` and add CI for tests by @nahumsa in #149
- Bump version to 0.4.2 by @criccomini in #174
- Remove `TableLocationAnalyzer` Analyzer by @criccomini in #177
- Add black and pylint to ci by @nahumsa in #178
- Rename `fmt` and `lint` to `style` by @criccomini in #179
- Cache BigQuery job counts in BigQueryJobCountAnalyzer by @criccomini in #182
- Remove `latest_queries` from BigQuery analyzers by @criccomini in #183
- Add pyright to style group by @nahumsa in #187
- Fix pyright type hint errors by @criccomini in #188
- Simplify REST API and remove typing by @criccomini in #189
- Replace CatalogPath with URLs by @criccomini in #190
- Replace `dict[str, Any]` with `Metadata` in Recap's catalog API by @criccomini in #194
- Rewrite Recap by @criccomini in #196
New Contributors
Full Changelog: 0.4.1...0.5.0
0.4.1
I'm starting to dig more into BigQuery analyzers. I've added a `latest_queries` analyzer and a `job_counts` analyzer. The `latest_queries` analyzer does what you'd expect: it lists the latest queries for each table or view. The `job_counts` analyzer returns the number of jobs executed by each user on a given table or view. These jobs can be queries, loads, extracts, and so on.
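To make the per-user counting concrete, here is a minimal sketch in plain Python. It is not Recap's analyzer and does not call BigQuery: the `jobs` records, their field names, and the `job_counts` helper are all hypothetical stand-ins for whatever job metadata the real analyzer reads.

```python
from collections import Counter

# Hypothetical job records. The field names are illustrative only and are
# not BigQuery's job schema or Recap's internal model.
jobs = [
    {"user": "alice@example.com", "table": "sales.orders", "job_type": "query"},
    {"user": "alice@example.com", "table": "sales.orders", "job_type": "load"},
    {"user": "bob@example.com", "table": "sales.orders", "job_type": "extract"},
    {"user": "bob@example.com", "table": "sales.customers", "job_type": "query"},
]


def job_counts(jobs, table):
    """Count jobs of any type per user for a single table or view."""
    return Counter(job["user"] for job in jobs if job["table"] == table)


print(dict(job_counts(jobs, "sales.orders")))
```

Every job type counts toward the total here, which matches the description above: queries, loads, and extracts are all jobs.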
Oh, I also fixed a filter/exclude bug in the `crawl` command.
What's Changed
- Refactor documentation by @criccomini in #164
- Add analyzer documentation, add guides, and remove concepts by @criccomini in #165
- Fix links in README.md by @criccomini in #166
- Add README link to Python API by @criccomini in #167
- Fix profile bug by @criccomini in #168
- Bumping to 0.4.1 by @criccomini in #170
- Add job counts and latest queries analyzers for BigQuery by @criccomini in #172
Full Changelog: 0.4.0...0.4.1
0.4.0
Okay, some cool stuff in this one. Recap now supports crawling the local filesystem, remote object stores, HTTP, FTP, and a lot more (all thanks to `fsspec`). I've also added a bunch of analyzers for CSV, TSV, JSON, and Parquet files:
- Frictionless column analyzer
- DuckDB column analyzer
- GenSON JSON schema analyzer
Lastly, I added a Pandas data profiler analyzer that works with local/remote filesystems (CSV, TSV, JSON, Parquet) and databases. It's slow because it doesn't sample right now, but it's pretty flexible. The analyzer returns some cool statistics about data:

    "created_at": {
        "count": 10,
        "max": "2022-10-13T22:40:02.202086",
        "mean": "2022-10-11T16:12:24.884199680",
        "min": "2022-10-10T16:48:01.908706",
        "p25": "2022-10-10T16:49:35.136787712",
        "p50": "2022-10-10T16:55:27.156001024",
        "p75": "2022-10-13T03:13:24.321110528",
        "p95": "2022-10-13T22:39:35.862166016",
        "p99": "2022-10-13T22:39:56.934102016",
        "p999": "2022-10-13T22:40:01.675287552"
    },
    "custom_fields": {
        "count": 10,
        "freq": 5,
        "top": "{}",
        "unique": 6
    },
    "email": {
        "count": 10,
        "freq": 1,
        "top": "test@test.com",
        "unique": 10
    },
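For readers wondering what `count`, `unique`, `top`, and `freq` mean for text columns like `custom_fields` and `email`, here is a rough standard-library sketch. It is an approximation for illustration, not the analyzer's code (which is Pandas-based); the `profile_text_column` helper and the sample values are hypothetical.

```python
from collections import Counter


def profile_text_column(values):
    """Summarize a text column: non-null count, distinct values,
    the most common value (top), and how often it appears (freq)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    if counts:
        top, freq = counts.most_common(1)[0]
    else:
        top, freq = None, 0
    return {
        "count": len(present),
        "unique": len(counts),
        "top": top,
        "freq": freq,
    }


# Hypothetical sample column, not data from a real crawl.
custom_fields = ["{}", "{}", '{"a": 1}', "{}", None]
print(profile_text_column(custom_fields))
# {'count': 4, 'unique': 2, 'top': '{}', 'freq': 3}
```

Null values are left out of `count` in this sketch, which is an assumption about how missing data should be treated.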
What's Changed
- Bump version to 0.4.0 by @criccomini in #140
- Replace `analyze` and `browse` commands with `live` commands by @criccomini in #142
- Add `AnalyzingBrowser` by @criccomini in #145
- Add a `FilesystemBrowser` with `fsspec` by @criccomini in #147
- Fix server bugs discovered from `FilesystemBrowser` by @criccomini in #148
- Add DuckDB `FileColumnAnalyzer`s for CSV, TSV, and Parquet by @criccomini in #150
- Add Frictionless `FileColumnAnalyzer`s for CSV, TSV, and Parquet by @criccomini in #151
- Add a GenSON JSON schema analyzer for JSON files in local or remote f… by @criccomini in #152
- Fix `DatabaseCatalog.rm()` bug by @criccomini in #156
- Add Pandas `profile.py` analyzer for JSON, CSV/TSV, and Parquet by @criccomini in #157
- Add JSON schema inference for Frictionless columns analyzer by @criccomini in #160
- Fix `FilesystemBrowser` directory list bug by @criccomini in #161
Full Changelog: 0.3.1...0.4.0
0.3.1
Shipping off a small release with a couple of new features: `recap analyze` and `recap browse`. Some typo fixes, documentation, and refactoring as well.
What's Changed
- Bump version to 0.3.1 by @criccomini in #125
- Exclude default, none, and unset in REST API by @criccomini in #126
- Add `sqlalchemy.*` to quickstart JSON by @criccomini in #127
- Renamed as_of to time by @khuara17 in #130
- Add `recap browse` command by @criccomini in #133
- Typo fix by @gunnarmorling in #135
- Refactor `typing.Inspector` and add docs by @criccomini in #137
- Typo fix by @gunnarmorling in #136
- Change `api.*` config to `server.uvicorn.*` and add `server.plugins` … by @criccomini in #138
New Contributors
- @khuara17 made their first contribution in #130
- @gunnarmorling made their first contribution in #135
Full Changelog: 0.3.0...0.3.1