Releases: gabledata/recap

0.7.4

17 Aug 19:08

Full Changelog: 0.7.3...0.7.4

0.7.3

17 Aug 15:39

Made recap an implicit namespace package: removed __init__.py and updated pyproject.toml for PDM namespacing.

Full Changelog: 0.7.0...0.7.3

0.7.0

16 Aug 17:38

Full Changelog: 0.6.0...0.7.0

0.6.0

12 Jul 17:37

Full Changelog: 0.5.2...0.6.0

0.5.2

27 Feb 18:14

Full Changelog: 0.5.1...0.5.2

0.5.1

27 Feb 18:13

What's Changed

  • Exclude Google imports if the dependency is missing by @criccomini in #197

Full Changelog: 0.5.0...0.5.1

0.5.0

27 Feb 17:55

0.5.0 is pretty much a complete rewrite. I was really unhappy with how
complicated things had gotten with Recap, especially the Python API. I've
rewritten things to add:

  • A very simple REPL
  • A FastAPI-like metadata crawling API
  • A basic data catalog
  • A basic crawler
  • A storage layer with a graph-like API

These changes make it much easier to work with Recap in Python. They also lay
the groundwork for the complex schema conversion features I want to write.
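To illustrate what "FastAPI-like" means here, a minimal sketch of decorator-based registration, where catalog paths map to crawler functions the way FastAPI maps URL routes to handlers. Everything in this sketch (the metadata decorator, registry, and example path) is hypothetical, not Recap's actual API:

```python
# Hypothetical sketch of a FastAPI-style registration pattern
# (not Recap's actual API): a decorator maps catalog paths to
# crawler functions, the way FastAPI maps URL routes to handlers.
from typing import Callable

registry: dict[str, Callable[[], dict]] = {}

def metadata(path: str):
    """Register a crawler function for a catalog path."""
    def decorator(fn: Callable[[], dict]) -> Callable[[], dict]:
        registry[path] = fn
        return fn
    return decorator

@metadata("/databases/postgresql/schemas/public")
def crawl_public_schema() -> dict:
    # A real crawler would inspect the database here.
    return {"tables": ["users", "orders"]}

print(registry["/databases/postgresql/schemas/public"]())
```

The appeal of the pattern is that adding a new metadata source is just writing a function and decorating it; the framework discovers it through the registry.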

Full Changelog: 0.4.1...0.5.0

0.4.1

03 Feb 18:17

I'm starting to dig more into BigQuery analyzers. I've added a latest_queries analyzer and a job_counts analyzer. The latest_queries analyzer does what you'd expect: it lists the latest queries for each table/view. The job_counts analyzer returns the number of jobs executed by each user on a given table or view. These jobs can be queries, loads, extracts, and so on.
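As a rough illustration of the per-user aggregation job_counts describes, here's a sketch using only the standard library. The record shape (user and type fields) is assumed for illustration, not Recap's actual schema:

```python
# Sketch of the kind of aggregation job_counts performs: count jobs
# per user for a table. Record fields here are illustrative only.
from collections import Counter

jobs = [
    {"user": "alice@example.com", "type": "QUERY"},
    {"user": "alice@example.com", "type": "LOAD"},
    {"user": "bob@example.com", "type": "QUERY"},
]

counts = Counter(job["user"] for job in jobs)
print(dict(counts))  # {'alice@example.com': 2, 'bob@example.com': 1}
```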

Oh, I also fixed a filter/exclude bug in the crawl command.

Full Changelog: 0.4.0...0.4.1

0.4.0

30 Jan 18:54

Okay, some cool stuff in this one. Recap now supports crawling the local filesystem, remote object stores, HTTP, FTP, and a lot more (all thanks to fsspec). I've also added a bunch of analyzers for CSV, TSV, JSON, and Parquet files:

  • Frictionless column analyzer
  • DuckDB column analyzer
  • GenSON JSON schema analyzer

Lastly, I added a Pandas data profiler analyzer that works with local/remote filesystems (CSV, TSV, JSON, Parquet) and databases. It's slow because it doesn't sample right now, but it's pretty flexible. The analyzer returns some cool statistics about data:

    "created_at": {
      "count": 10,
      "max": "2022-10-13T22:40:02.202086",
      "mean": "2022-10-11T16:12:24.884199680",
      "min": "2022-10-10T16:48:01.908706",
      "p25": "2022-10-10T16:49:35.136787712",
      "p50": "2022-10-10T16:55:27.156001024",
      "p75": "2022-10-13T03:13:24.321110528",
      "p95": "2022-10-13T22:39:35.862166016",
      "p99": "2022-10-13T22:39:56.934102016",
      "p999": "2022-10-13T22:40:01.675287552"
    },
    "custom_fields": {
      "count": 10,
      "freq": 5,
      "top": "{}",
      "unique": 6
    },
    "email": {
      "count": 10,
      "freq": 1,
      "top": "test@test.com",
      "unique": 10
    },
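Statistics like these can be approximated with the standard library alone. A hedged sketch, not the profiler's actual implementation (which is built on Pandas):

```python
# Sketch: profiler-style column statistics (count/min/max/percentiles)
# computed with the standard library; Recap's profiler uses Pandas.
from statistics import quantiles

values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # stand-in for a numeric column

profile = {"count": len(values), "min": min(values), "max": max(values)}

# quantiles(n=100) returns the 99 cut points p1..p99; index p-1 is the
# p-th percentile under the "inclusive" (population) method.
cuts = quantiles(values, n=100, method="inclusive")
for p in (25, 50, 75, 95, 99):
    profile[f"p{p}"] = cuts[p - 1]

print(profile)
```

Sampling would address the slowness mentioned above: computing these statistics over a random subset of rows instead of the full column.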

Full Changelog: 0.3.1...0.4.0

0.3.1

24 Jan 01:15

Shipping off a small release with a couple of new features: recap analyze and recap browse. Some typo fixes, documentation, and refactoring as well.

Full Changelog: 0.3.0...0.3.1