Releases: koralium/flowtide
Version 0.11.1
Minor Changes
Catalog support
This release adds AddCatalog support for connectors, which groups connectors together under a catalog.
This is useful when, for instance, two sources have exactly the same table names. They can then be added under different catalogs so that each table name stays unique.
See catalog documentation for more info on how to use it.
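The naming idea can be illustrated with a small sketch. This is not Flowtide's API: ConnectorRegistry, add_catalog, resolve, and the source names are hypothetical, and the Python below only models how a catalog prefix can keep otherwise identical table names apart.

```python
class ConnectorRegistry:
    """Hypothetical registry mapping (catalog, table) to a source name."""

    def __init__(self):
        self._tables = {}

    def add_catalog(self, catalog, tables, source):
        # Register every table of a source under the given catalog name.
        for table in tables:
            key = (catalog, table)
            if key in self._tables:
                raise ValueError(f"duplicate table {catalog}.{table}")
            self._tables[key] = source

    def resolve(self, qualified_name):
        # Expect names of the form "catalog.table".
        catalog, table = qualified_name.split(".", 1)
        return self._tables[(catalog, table)]


registry = ConnectorRegistry()
# Both sources expose a "users" table; the catalog name disambiguates them.
registry.add_catalog("crm", ["users", "orders"], source="sqlserver-crm")
registry.add_catalog("billing", ["users"], source="sqlserver-billing")
```

With this model, "crm.users" and "billing.users" resolve to different sources even though the table name is the same.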
Pull requests
Full Changelog: v0.11.0...v0.11.1
Version 0.11.0
Major Changes
Column-Based Event Format
Most operators have transitioned from treating events as rows with flexbuffer to a column-based format following the Apache Arrow specification. This change has led to significant performance improvements, especially in the merge join and aggregation operators. Transitioning from row-based to column-based events involved a major rewrite of core components, and some operators still use the row-based format, which will be updated in the future.
Not all expressions have been converted to work with column data yet. However, the solution currently handles conversions between formats to maintain backward compatibility. Frequent conversions may result in performance decreases.
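As a rough illustration of the difference between the two layouts, the sketch below pivots row events into columns and back. It is a simplified model only: plain Python lists stand in for Arrow arrays, and the function names are hypothetical rather than taken from Flowtide.

```python
def rows_to_columns(rows, column_names):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row.get(name) for row in rows] for name in column_names}


def columns_to_rows(columns):
    """Pivot columns back into row dicts (the compatibility direction)."""
    names = list(columns)
    length = len(columns[names[0]]) if names else 0
    return [{name: columns[name][i] for name in names} for i in range(length)]


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]
columns = rows_to_columns(rows, ["id", "name"])
```

Each conversion touches every value once, which is why converting back and forth frequently can cost performance.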
The shift to a column format also introduced the use of unmanaged memory for new data structures, for the following reasons:
- 64-byte aligned memory addresses for optimal SIMD/AVX operations.
- Immediate memory return when a page is removed from the LRU cache, instead of waiting for the next garbage collection cycle.
With unmanaged memory, it is now possible to track memory allocation by different operators, providing better insight into memory usage in your stream.
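The two ideas above, aligned addresses and per-operator accounting, can be sketched as follows. This is an illustrative model in Python, not Flowtide's allocator: align_up and AllocationTracker are hypothetical names, and real native allocation is not performed here.

```python
ALIGNMENT = 64  # bytes, the alignment named in the list above


def align_up(address, alignment=ALIGNMENT):
    """Round an address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


class AllocationTracker:
    """Hypothetical per-operator byte counter."""

    def __init__(self):
        self.bytes_by_operator = {}

    def allocate(self, operator, size):
        current = self.bytes_by_operator.get(operator, 0)
        self.bytes_by_operator[operator] = current + size

    def free(self, operator, size):
        # Freed bytes are subtracted right away, modelling memory that is
        # returned as soon as a page leaves the cache.
        self.bytes_by_operator[operator] -= size
```

Because every allocation is attributed to an operator name, the tracker can report how many bytes each operator currently holds.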
B+ Tree Splitting by Byte Size
Previously, the B+ tree determined page sizes based on the number of elements, splitting pages into two equal parts when the max size (e.g., 128 elements) was reached. While this worked for streams with uniform element sizes, it led to size discrepancies in other cases, affecting serialization time and slowing down the stream.
This update introduces page splitting based on byte size, with a default page size of 32KB, ensuring more consistent and predictable page sizes.
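A minimal sketch of splitting by byte size, assuming each element is a bytes value and that a page is split near its byte midpoint once it exceeds the limit. The function name and the exact split rule are assumptions for illustration; the actual B+ tree may choose its split point differently.

```python
MAX_PAGE_BYTES = 32 * 1024  # default page size stated above


def split_by_byte_size(elements, max_page_bytes=MAX_PAGE_BYTES):
    """Split a page near its byte midpoint once it exceeds the limit.

    Returns a list with one page (no split needed) or two pages.
    """
    sizes = [len(e) for e in elements]
    total = sum(sizes)
    if total <= max_page_bytes or len(elements) < 2:
        return [elements]

    # Find the first index where the left side reaches half of the bytes.
    running = 0
    split_at = len(elements) - 1
    for i, size in enumerate(sizes):
        running += size
        if running >= total / 2:
            split_at = i + 1
            break
    # Keep at least one element on each side.
    split_at = max(1, min(split_at, len(elements) - 1))
    return [elements[:split_at], elements[split_at:]]


# 100 small elements followed by 3 large ones: 40,000 bytes in total.
elements = [b"a" * 100] * 100 + [b"b" * 10_000] * 3
pages = split_by_byte_size(elements)
```

In this example a split by element count would produce pages of 5,100 and 34,900 bytes, while the byte-based split produces two pages of 20,000 bytes each.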
Initial SQL Type Validation
This release introduces initial type validation when creating a Substrait plan from SQL. Currently, only SQL Server provides specific type metadata, while sources like Kafka continue to designate columns as 'any' due to varying payload types.
The new validation feature raises exceptions for type mismatches, such as when a boolean column is compared to an integer (e.g., boolColumn = 1). This helps inform users transitioning from SQL Server that bit columns are treated as boolean in Flowtide.
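The kind of check described above can be sketched as follows. The names are hypothetical, and the rule that an 'any' column is never rejected is an assumption made for this sketch, not a statement about how Flowtide validates plans.

```python
class TypeMismatchError(Exception):
    """Raised when two sides of a comparison have incompatible types."""


def literal_type(value):
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, str):
        return "string"
    return "any"


def validate_equality(left_type, right_type):
    """Accept a comparison if the types match or either side is 'any'."""
    if "any" in (left_type, right_type):
        return  # assumption: columns without type metadata are not checked
    if left_type != right_type:
        raise TypeMismatchError(f"cannot compare {left_type} with {right_type}")
```

Under these rules, comparing a boolean column to the literal 1 raises TypeMismatchError, while comparing it to a boolean literal passes.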
New UI
A new UI has been developed, featuring an integrated time series database that enables developers to monitor stream behavior over time. This database’s API aligns with Prometheus standards, allowing for custom queries to investigate potential issues.
The UI retrieves all data through the Prometheus API endpoint, enabling future deployment as a standalone tool connected to a Prometheus server.
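A custom query against a Prometheus-compatible endpoint is usually a range query. The sketch below only builds such a URL using the standard Prometheus HTTP API path /api/v1/query_range; the base URL and the metric name are hypothetical, and whether Flowtide exposes this exact route is an assumption.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step):
    """Build a Prometheus-style range query URL (no request is sent)."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


url = build_range_query_url(
    "http://localhost:5000/prometheus",  # hypothetical base URL
    "rate(example_events_total[1m])",    # hypothetical metric name
    start=1700000000,
    end=1700003600,
    step="15s",
)
```

The start and end values are Unix timestamps, and step sets the resolution of the returned series.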
Minor Changes
Congestion Control Based on Cache Misses
Flowtide processes data in small batches, typically 1-100 events. While this approach works well with in-memory data, cache misses that require persistent storage access can create bottlenecks. This is particularly problematic with multiple chained joins, where sequential data fetching can delay processing.
To address this, the join operator now monitors cache misses during a batch and, when a threshold is reached, splits the processed events and forwards them to the next operator. This change allows operators to access persistent storage in parallel, easing congestion.
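The mechanism can be sketched as a loop that counts cache misses and forwards partial output once a threshold is reached. This is a simplified model, not the join operator's code: the function, its parameters, and the threshold value are hypothetical.

```python
def process_with_congestion_control(events, cache, load_page, emit, miss_threshold=2):
    """Process a batch, forwarding partial output whenever misses pile up.

    cache: dict acting as the in-memory page cache.
    load_page: callable simulating a read from persistent storage.
    emit: callable that forwards a list of results to the next operator.
    """
    pending = []
    misses = 0
    for key in events:
        if key not in cache:
            misses += 1
            cache[key] = load_page(key)
        pending.append((key, cache[key]))
        if misses >= miss_threshold:
            # Forward finished results so the next operator can begin its
            # own storage reads while this one keeps going.
            emit(pending)
            pending = []
            misses = 0
    if pending:
        emit(pending)


cache = {"a": "A"}
emitted = []
process_with_congestion_control(
    ["a", "b", "c", "d", "e"], cache, lambda key: key.upper(), emitted.append
)
```

Here one incoming batch of five events is forwarded as two smaller batches, each cut off after the second cache miss.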
Reduced Number of Pages Written to Persistent Storage
Previously, all B+ tree metadata was written to persistent storage at every checkpoint, including root page IDs. In streams with numerous operators, this led to unnecessary writes.
Now, metadata is only written if changes have occurred, reducing the number of writes and improving storage efficiency.
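A minimal sketch of the write-only-on-change idea, assuming metadata can be compared against the copy written at the previous checkpoint. MetadataStore and its fields are hypothetical and only count writes instead of touching storage.

```python
class MetadataStore:
    """Hypothetical checkpoint writer that skips unchanged metadata."""

    def __init__(self):
        self._last_written = {}
        self.write_count = 0

    def checkpoint(self, tree_metadata):
        # tree_metadata maps a tree name to its metadata, e.g. the root page ID.
        for name, metadata in tree_metadata.items():
            if self._last_written.get(name) == metadata:
                continue  # nothing changed since the previous checkpoint
            # Store a copy so later mutations by the caller are still detected.
            self._last_written[name] = dict(metadata)
            self.write_count += 1


store = MetadataStore()
store.checkpoint({"join": {"root": 1}, "aggregate": {"root": 7}})
store.checkpoint({"join": {"root": 1}, "aggregate": {"root": 7}})
store.checkpoint({"join": {"root": 2}, "aggregate": {"root": 7}})
```

Across the three checkpoints only three writes happen: two at the first checkpoint, none at the second, and one at the third when the join tree's root page changes.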
Pull requests
- Update to the latest substrait version by @Ulimo in #471
- Add stream benchmarks to be able to test performance between versions by @Ulimo in #472
- Allow setting a custom task scheduler by @Ulimo in #473
- Add support for permify as a connector by @Ulimo in #466
- Bug fix: Reusing of dataflowblockoptions caused no possible restarts by @Ulimo in #477
- Bug fixes: Fix aggregation emit list in optimizer and not in by @Ulimo in #476
- Add append tree by @Ulimo in #479
- Fix unit tests by @Ulimo in #480
- Fix some units tests for spicedb and sql server by @Ulimo in #481
- Setup tests by @Ulimo in #482
- Add support for exchange relation and substream relations by @Ulimo in #474
- Improve comparison performance by @Ulimo in #485
- Add support for column store by @Ulimo in #490
- Add support to read number column from sharepoint by @Ulimo in #492
- Add expression compiler for column data by @Ulimo in #491
- Bug fix: Fix bug in binary list when doing updates by @Ulimo in #493
- Fix edge case bug in merge join operator by @Ulimo in #494
- Fix bug in column normalization operator where an empty batch caused exception by @Ulimo in #496
- Fix bug in bitmap list where an insert at an integer border caused error by @Ulimo in #497
- Fix bug in token cache where token was not refreshed correctly by @Ulimo in #498
- Add column aggregate operator by @Ulimo in #499
- Fix bug in addvaluetoelements by @Ulimo in #500
- Change operators from batch manager to global by @Ulimo in #501
- Fix memory not being disposed correctly by @Ulimo in #502
- Add monitoring for memory allocations by @Ulimo in #503
- Update docusaurus to 3.5 by @Ulimo in #504
- Update path to regexp to fix security issue by @Ulimo in #505
- Add boolean and string column based functions by @Ulimo in #511
- Change so column agg operator sends data in smaller batches by @Ulimo in #512
- Add metrics to aggregate and projection by @Ulimo in #513
- Change sql server to use column data instead of row data when reading from sql server by @Ulimo in #507
- Update graph sdk and add trace logging by @Ulimo in #514
- Union and normalization fixes by @Ulimo in #517
- Bug fix: fix after conversion to union column that value get inserted correctly by @Ulimo in #520
- Fix so sql server source uses column and fix so convert to union disposes old column directly by @Ulimo in #521
- Add a built in time series database for monitoring by @Ulimo in #519
- Add RemoveRange to column data by @Ulimo in #522
- Add methods to calculate the actual byte size used by columns and batches by @Ulimo in #518
- Improve performance for map and union columns by @Ulimo in #523
- Change so watermark in multiple targets gets aggregated by @Ulimo in #524
- Change to use realloc instead of malloc when resizing memory by @Ulimo in #525
- Fix so allocation metrics correctly show realloc stats by @Ulimo in #526
- Bump braces from 3.0.2 to 3.0.3 in /src/FlowtideDotNet.AspNetCore/ClientApp by @dependabot in #489
- Bump micromatch from 4.0.5 to 4.0.8 in /src/FlowtideDotNet.AspNetCore/ClientApp by @dependabot in #506
- Diverse fixes to UI and also some minor bug fixes by @Ulimo in #527
- Change so merge join uses microbatches by @Ulimo in #528
- Add support to do splits in B+ tree based on byte size by @Ulimo in #529
- Fix in metrics so readonlyspan of tags are used directly to find the correct metric series by @Ulimo in #530
- Congestion control based on cache misses by @Ulimo in #531
- Bug fixes for metrics and fix dispose in sql and aggregate by @Ulimo in #532
- Change internal node to use native memory for children by @Ulimo in #533
- Change column aggregates to use primitivelist for values by @Ulimo in #534
- Improve performance of search boundries for int64 column by @Ulimo in #535
- Change so normalize key storage uses value container by @Ulimo in #536
- Fix so B+ tree search only uses async if page is not in th...
Version 0.11.0 alpha 22
What's Changed
Full Changelog: v0.11.0-alpha21...v0.11.0-alpha22
Version 0.11.0 alpha 21
What's Changed
- Add between and coalesce column functions by @Ulimo in #565
- Fix issue with watermark in multi input vertex by @Ulimo in #566
- Fix reference remapping when adding a plan as a view by @Ulimo in #568
- Add stop stream command to more gracefully stop a stream by @Ulimo in #569
- Add documentation for general metrics available in the stream by @Ulimo in #571
- Change fasterKV to only take index checkpoints by @Ulimo in #572
Full Changelog: v0.11.0-alpha20...v0.11.0-alpha21
Version 0.11.0 alpha 20
What's Changed
- Add min element size after split for B+ tree by @Ulimo in #561
- Remove metadata write when fetching a state client by @Ulimo in #557
- Implement test cases for storage rules, also fix critical bug where a crash could cause metadata to overwrite data page by @Ulimo in #562
- Add checks in tests that same data is not written multiple times by @Ulimo in #563
- Bump http-proxy-middleware from 2.0.6 to 2.0.7 in /docs by @dependabot in #553
- Bump Microsoft.Extensions.ObjectPool from 8.0.2 to 8.0.10 by @dependabot in #544
Full Changelog: v0.11.0-alpha19...v0.11.0-alpha20
Version 0.11.0 alpha 19
What's Changed
- Add docs for persistent storage rules and describe project structure by @Ulimo in #556
- Add missing throw in sharepoint read by @Ulimo in #558
- Add write to json for columns to easily allow json export by @Ulimo in #559
- Change console sink to work similar to other connectors by @Ulimo in #560
Full Changelog: v0.11.0-alpha18...v0.11.0-alpha19
Version 0.11.0 alpha 18
What's Changed
- Fix so histogram actually takes the sum by @Ulimo in #546
- Improve performance of remove range in bitmaplist by @Ulimo in #548
- Add support for insert range in binary list by @Ulimo in #549
- Add InsertRange to BitmapList by @Ulimo in #550
- Add so benchmark is run on PRs by @Ulimo in #547
- Add insert range in primitivelist and nativelonglist by @Ulimo in #551
- Add CountTrue/FalseInRange for bitmap list by @Ulimo in #552
- Add Insert range to columns and start using it in the stream by @Ulimo in #554
- Do not ignore task cancelled exceptions from sharepoint sdk by @Ulimo in #555
Full Changelog: v0.11.0-alpha17...v0.11.0-alpha18
Version 0.11.0 alpha 17
What's Changed
- Add support to do splits in B+ tree based on byte size by @Ulimo in #529
- Fix in metrics so readonlyspan of tags are used directly to find the correct metric series by @Ulimo in #530
- Congestion control based on cache misses by @Ulimo in #531
- Bug fixes for metrics and fix dispose in sql and aggregate by @Ulimo in #532
- Change internal node to use native memory for children by @Ulimo in #533
- Change column aggregates to use primitivelist for values by @Ulimo in #534
- Improve performance of search boundries for int64 column by @Ulimo in #535
- Change so normalize key storage uses value container by @Ulimo in #536
- Fix so B+ tree search only uses async if page is not in the cache by @Ulimo in #537
- Remade counter to use observable instead. by @Ulimo in #538
- Sql type validation by @Ulimo in #508
- Change iteration operator to use column store by @Ulimo in #540
- Change merge join from deprecated left right keys to keys by @Ulimo in #541
- Fix all current build warnings by @Ulimo in #539
- Add missing file header to openfga tests by @Ulimo in #542
- Fix optimizer for direct field simplification by @Ulimo in #543
Full Changelog: v0.11.0-alpha16...v0.11.0-alpha17
Version 0.11.0 alpha 16
What's Changed
Full Changelog: v0.11.0-alpha15...v0.11.0-alpha16
Version 0.11.0 alpha 15
What's Changed
- Add a built in time series database for monitoring by @Ulimo in #519
- Add RemoveRange to column data by @Ulimo in #522
- Add methods to calculate the actual byte size used by columns and batches by @Ulimo in #518
- Improve performance for map and union columns by @Ulimo in #523
- Change so watermark in multiple targets gets aggregated by @Ulimo in #524
- Change to use realloc instead of malloc when resizing memory by @Ulimo in #525
- Fix so allocation metrics correctly show realloc stats by @Ulimo in #526
- Bump braces from 3.0.2 to 3.0.3 in /src/FlowtideDotNet.AspNetCore/ClientApp by @dependabot in #489
- Bump micromatch from 4.0.5 to 4.0.8 in /src/FlowtideDotNet.AspNetCore/ClientApp by @dependabot in #506
- Diverse fixes to UI and also some minor bug fixes by @Ulimo in #527
Full Changelog: v0.11.0-alpha14...v0.11.0-alpha15