
[pixels-core, daemon, common] support range-based distributed indexing #658

Open
bianhq opened this issue May 20, 2024 · 1 comment
Labels: enhancement (New feature or request)


bianhq commented May 20, 2024

We will first support a single-column range partition key.
As the transaction timestamp is not currently persisted in Pixels, the range index does not support schema versions for now.
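To illustrate the idea of a single-column range partition key, here is a minimal sketch of locating the partition that contains a given key by binary search over range boundaries. The class and method names are hypothetical, not the actual Pixels API.

```java
import java.util.Arrays;

public class RangeLookup
{
    /**
     * boundaries[i] is the inclusive lower bound of partition i+1;
     * keys below boundaries[0] fall into partition 0.
     */
    public static int findPartition(long[] boundaries, long key)
    {
        int pos = Arrays.binarySearch(boundaries, key);
        // binarySearch returns (-(insertionPoint) - 1) when the key is absent
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args)
    {
        long[] bounds = {100, 200, 300};
        System.out.println(findPartition(bounds, 50));   // 0: below the first bound
        System.out.println(findPartition(bounds, 200));  // 2: 200 is the lower bound of partition 2
        System.out.println(findPartition(bounds, 999));  // 3: beyond the last bound
    }
}
```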

@bianhq bianhq added the enhancement New feature or request label May 20, 2024
@bianhq bianhq self-assigned this May 20, 2024
@bianhq bianhq added this to the Real-time CRUD milestone May 20, 2024
bianhq added a commit that referenced this issue May 29, 2024
These RPCs will be used to implement the range partitioned index.
bianhq added a commit that referenced this issue May 31, 2024
* [Issue #498] remove the orders array from StringColumnReader in C++. (#510)

* [Issue #512] Fix the out-of-range issue caused by splitting a row which has the null value. (#513)

* [Issue #514] change pixels c++ column vector from 4k alignment to 32 byte alignment (#515)

* [Issue #516] enhance exception handling in base workers (#518)

* [Issue #519] fix wrong column chunk offsets in a multi-row-group file (#520)

* [Issue #517] load single tbl file into multiple paths (#523)

Previously, we could only load multiple tbl files into multiple paths.

* [Issue #524] using full path as the key in file footer cache (#525)

* [Issue #522] add layout configurations into the file format (#526)

* [Issue #527] Fix the bug that iovecs might free twice (#528)

* [Issue #521] support chunk aligned compact file (#532)

Also fix an argument checking bug in PixelsCompactor.

* [Issue #471] pixels reader c++ code refactor (#533)

PixelsScanFunction was messy. Refactor the code to make it clearer. The changes are:
1. Add readIndex for the column vector.
2. Move all state-change logic into PixelsParallelStateNext. We have verified its correctness, and it does not affect performance.

* [Issue #531] support configurable endianness (#534)

The endianness of Pixels writer is configured by column.chunk.little.endian=true/false.
The endianness is then saved in the ColumnChunkIndex of each column chunk.
The Pixels column readers will check the ColumnChunkIndex to use the right endianness.
This is currently implemented in the Java codebase, and the C++ codebase should at least check and report errors for unsupported endianness.
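As a sketch of what honoring a per-chunk endianness flag looks like on the reader side (the flag here mirrors column.chunk.little.endian; this is illustrative, not the actual Pixels reader code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianAwareRead
{
    // Interpret the first four bytes of a column chunk as an int,
    // using the endianness recorded for that chunk.
    public static int readInt(byte[] chunk, boolean littleEndian)
    {
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        buf.order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        return buf.getInt();
    }

    public static void main(String[] args)
    {
        byte[] bytes = {1, 0, 0, 0};
        System.out.println(readInt(bytes, true));   // 1 when read little-endian
        System.out.println(readInt(bytes, false));  // 16777216 when read big-endian
    }
}
```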

* [Issue #535] change project version to 0.1.0 (#536)

From version 0.1.0-SNAPSHOT to 0.1.0.

* [Issue #535] upgrade project to 0.2.0-SNAPSHOT (#537)

* [Issue #542] make the batch size of pixels c++ reader configurable (#541)

We have verified that the following changes do not affect the performance of the Pixels C++ reader:
1. In this PR, the batch size of the Pixels C++ reader is configurable. The default value is pixelStride.
2. Set the length of the filter mask to batchSize instead of curRGRowCount.
3. Reuse the column vector. We allocate a new column vector only when switching to another pxl file.
4. Add a macro to control whether SIMD is enabled for the Pixels filter.

* [Issue #542] move partitioned from file footer to postscript (#540)

* [Issue #542] rename pixelsFilterMask to PixelsBitMask (#543)

We rename pixelsFilterMask to PixelsBitMask because both the filter mask and the null mask use this class, so a more general name is appropriate.

* [Issue #539] replace lengths with starts for non-encoded string columns. (#544)

The starts array is not encoded by runlength encoding, and it is padded for null values (we will make this configurable in the next step).

* [Issue #546] make string column reader consistent with pixels java reader (#547)

1. Remove starts RLE when dict encoding is not enabled.
2. Assert false if big-endian is used in pxl data.

* [Issue #549] fix dependencies for slf4j (#550)

Add log4j-slf4j-impl as the default binding of slf4j.
Add a unified dependency definition for slf4j-api and exclude slf4j-api dependency in other libs including jetcd, jedis, and hadoop. But we do not add this dependency to the modules by default. It can be added as needed in the future.

* [Issue #549] remove slf4j-api from pom (#551)

slf4j-api is already included in log4j-slf4j-impl.

* [Issue #549] resolve conflict bindings between log4j and slf4j (#552)

Only add the binding when no other bindings are present; otherwise there will be binding conflicts.

* HOTFIX: resolve log4j warnings for pixels-worker-lambda

* [Issue #542] reorganize the pixels c++ reader (#554)

1. Integrate protobuf into Pixels C++; there is no need to install protobuf separately anymore.
2. Remove the pixels-reader directory. Now cpp is the top directory of the C++ code.
3. Add a separate pixels-cxx.properties for the Pixels C++ reader.

* [Issue #542] minor fix for pixels reader c++ (#555)

1. Rename duckdb to pixels-duckdb.
2. Fix minor formatting issues.
3. Change third_party to third-party.

* [Issue #542] pixels c++ reader supports multi-dir read (#557)

1. We offer a convenient Python script in duckdb to distribute the pixels data in a single directory to multiple directories.
2. Support multi-directory read in pixels extension.
3. Update readme.md for SSD array experiment.

* [Issue #542] pixels cxx minor format v2 (#558)

Separate make pull from make deps.
Set the thread number in pixels-cxx.properties.
Add instructions to check whether the OS supports io_uring.

* [Issues #436, #438, #439, #441] an adaptive solution to save cloud query cost (#559)

* [Issue #538] support encoding levels and dictionary encoding without cascading RLE (#561)

1. Support encoding levels in the column writers. The encoding level can be passed from pixels-cli's LOAD command using the -e parameter.
2. Support dictionary encoding without cascading run-length encoding for encoding level 1.
3. Remove adaptive encoding in StringColumnWriter. It was not implemented correctly.
4. Add the dictContent length as the last element into dictStarts. @yuly16 please update the C++ reader accordingly.
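The encoding-level-1 behavior above (dictionary encoding without a cascading run-length pass) can be sketched as follows: each value is mapped to a dense dictionary id, and the id stream is stored as-is. This is an illustration, not the actual StringColumnWriter implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictEncode
{
    /**
     * Encode values into dictionary ids. New values get the next dense id
     * and are appended to dictOut; the returned id array is left unencoded
     * (no cascading RLE).
     */
    public static int[] encode(String[] values, List<String> dictOut)
    {
        Map<String, Integer> dict = new HashMap<>();
        int[] ids = new int[values.length];
        for (int i = 0; i < values.length; ++i)
        {
            Integer id = dict.get(values[i]);
            if (id == null)
            {
                id = dict.size();
                dict.put(values[i], id);
                dictOut.add(values[i]);
            }
            ids[i] = id;
        }
        return ids;
    }
}
```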

* [Issue #556] fix time type and timestamp type (#562)

1. Support precision for the time type. This is optional in early Presto but required by Trino.
2. Fix the stats recorder for time.
3. Fix reading non-encoded timestamp columns. The bug was caused by the wrong precision when casting formats.
4. Revise parsing strings into time and timestamp values.
5. Fix the timezone offset problem for timestamp.

* [Issue #538] fix load command and docs (#563)

Set the correct default value for the encoding level.

* [Issue #564] Fix Pixels C++ Reader filter pushdown segmentation fault (#565)

* [Issue #545] support nulls padding in the file format (#566)

Add a writer option for column writers.
Handle nulls padding in the column writers and readers.
Add a -p parameter to the LOAD command of pixels-cli to specify whether nulls padding is enabled.
Optimize the de-compaction of the isNull bitmap in the column readers; single-table query performance on TPCH.orders is improved by 15%-20%.
Fix a bug related to null records reading in the column writers of Date, Time, and Timestamp.
Fix a bug in the float column writer by adding the FloatColumnWriter.

* HOTFIX: revise docs

* [Issue #548] support endianness in bit compact and decompact (#568)

In bit compact, big endian means the first element in the input array is compacted as the highest bit of the first byte in the result, and vice versa.
In bit de-compact, big endian means the highest bit of the first byte in the input buffer is de-compacted as the first element in the result, and vice versa.
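The two conventions above can be sketched in a few lines: big endian packs the first element into the highest bit of the first byte, little endian into the lowest bit. The method name is illustrative, not the actual Pixels bit-compaction API.

```java
public class BitCompact
{
    // Pack a boolean array into bytes under the given endianness convention.
    public static byte[] compact(boolean[] input, boolean bigEndian)
    {
        byte[] out = new byte[(input.length + 7) / 8];
        for (int i = 0; i < input.length; ++i)
        {
            if (input[i])
            {
                // big endian: element 0 -> bit 7 of byte 0; little endian: bit 0
                int bit = bigEndian ? 7 - (i % 8) : i % 8;
                out[i / 8] |= (byte) (1 << bit);
            }
        }
        return out;
    }

    public static void main(String[] args)
    {
        boolean[] in = {true, false, false, false, false, false, false, false};
        System.out.println(compact(in, true)[0] & 0xff);   // 128: highest bit set
        System.out.println(compact(in, false)[0] & 0xff);  // 1: lowest bit set
    }
}
```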

* [Issue #548] support alignment for isNull bitmap (#569)

Add the following property in pixels.properties to control the alignment of the isNull bitmap:

# the alignment of the start offset of the isnull bitmap in a column chunk,
# it is for the compatibility of DuckDB and its unit is byte.
# for DuckDB, it is only effective when column.chunk.alignment also meets the alignment of the isNull bitmap
isnull.bitmap.alignment=8
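The alignment itself is simple arithmetic: the start offset of the isNull bitmap is rounded up to the next multiple of isnull.bitmap.alignment (8 bytes by default, per the snippet above). A minimal sketch:

```java
public class Alignment
{
    // Round offset up to the next multiple of alignment (alignment > 0).
    public static int alignUp(int offset, int alignment)
    {
        return (offset + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args)
    {
        System.out.println(alignUp(13, 8)); // 16: 13 rounds up to the next multiple of 8
        System.out.println(alignUp(16, 8)); // 16: already aligned
    }
}
```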

* [Issue #491] add worker coordinate service (#571)

1. implement a task queue for task scheduling of a stage in the query execution pipeline;
2. implement a worker registration and management framework to track the status of the workers at each stage.

* [Issue #572] rename columnlet to column chunk and fix dependency and comments (#573)

* [Issue #491] implement plan coordinator initialization (#574)

Users can initialize the plan coordinator of a query plan by invoking initPlanCoordinator on the root operator of the plan.
TaskId is changed from a string to an integer starting from 0 per stage.

* [Issue #491] create plan coordinator from the root operator of the query plan (#575)

* [Issue #579] Fix variable name in StringColumnReader (#578)

* [Issue #491] test and fix worker coordinator (#581)

Add a unit test for the worker coordinator service and fix minor problems (dependency and variable names) in it.

* [Issue #582] specify schema name by parameters in STAT command (#584)

* [Issue #583]: remove file type parameter from QUERY command (#585)

* [Issue #587] clean the code and docs in pixels-amphi (#589)

* [Issue #587] add ignored files (#590)

* [Issue #586] fix scripts and add docs for pixels-scaling-ec2 (#591)

* [Issue #586] fix scripts for minio write-back (#592)

* HOTFIX: Update README.md

* [Issue #593] revise the parameters for the STAT and QUERY commands in pixels-cli. (#594)

Now, the STAT and QUERY commands check layout selection from pixels.properties.

* [pixels-example] make simple writer reader test work for s3 (#596)

We made a simple writer/reader test work for S3: fixed some issues in the test and updated it to write to and read from a test file on S3 instead of the local file system, since Pixels is primarily designed as a cloud-based system.

* [Issue #595] support rate-limited query execution in pixels-cli (#597)

This makes query throughput experiments easier.

* [Issue #586] revise params for pixels-scaling-ec2 (#599)

* HOTFIX: fix doc

* [Issue #588] update the docs for pixels-cache (#601)

The documents of pixels-cache are now consistent with the implementation.

* [Issue #603] support timestamp column in c++ reader (#602)

Add timestamp column for clickbench.

* [Issue #580] add vector column and its writer reader and related metadata handling (#600)

Implements a vector column type, with proper support in pixels writer, pixels reader, and type description.

* [Issue #580] ensure compatibility with JDK8 (#604)

The previous implementation is not compatible with JDK8.

* [Issue #605] fix start-guard script (#606)

* [Issue #572] do not start cache coordinator and cache manager if cache is disabled (#608)

* [docs] revise docs for pixels-turbo (#609)

Clean the main INSTALL.md and revise the README.md of pixels-turbo.

* [Issue #610] resultRowBatch deletion bug (#612)

Fix the bug that resultRowBatch is deleted when it is nullptr.

* [Issue #546] null support (#613)

Support null padding for non-encoding pixels data.

* [docs] add scripts and docs for clickbench evaluation (#614)

* [Issue #616] ensure number of rows in each row group is strictly controlled by the LOAD command (#617)

Previously, the number of rows in each row group was aligned to the default row batch size 1024.

* [Issue #615] fix alignment for isNull decompaction (#620)

When reading multiple column chunks, the column readers did not ensure alignment when decompacting the isNull array.
This may skip some bits in the middle of the isNull bytes of a column chunk and finally read out of the bounds of those bytes.

* [Issue #605] enhance the scripts (#619)

Add config file jvm.options in pixels/scripts/bin
Add config file datanodes in pixels/scripts/sbin. The format of each line is hostname pixels_home[optional]. Users can configure the hostname and Pixels home of all nodes and then start datanodes at once.

* [Issue #605] remove guard daemon and revise scripts (#621)

Remove the guard daemon in Pixels to simplify the start process and maintenance.
Revise the scripts accordingly.

* [Issue #618] automatically filter out empty folder object in the listStatus method of S3 storage (#622)

Previously, we only filtered out the object whose key exactly matched the path. This is incorrect: it filters out the object when the user intends to list a single object, and it fails to filter out the folder object when the user does not append a "/" to the path.
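The corrected rule amounts to dropping S3 "folder" placeholder objects (zero-byte objects whose key ends with "/") from the listing, regardless of the listed path. A minimal sketch over plain key strings; names are illustrative, not the actual listStatus implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ListFilter
{
    // Keep only real objects: S3 folder placeholders have a trailing '/'.
    public static List<String> listStatus(List<String> keys)
    {
        List<String> result = new ArrayList<>();
        for (String key : keys)
        {
            if (!key.endsWith("/"))
            {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<String> keys = List.of("data/", "data/file1.pxl", "data/file2.pxl");
        System.out.println(listStatus(keys)); // [data/file1.pxl, data/file2.pxl]
    }
}
```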

* [docs] add instructions for cluster setup and jvm config (#623)

* [Issue #580] make pixels vector type support trino array type (#625)

* [Issue #624] fix an error in updateLocalCache (#626)

Discard the files with fewer row groups than NumRowGroupInFile in the data layout.

* [Issue #627] fix a bug in pixels C++ when running count queries (#628)

* [Issue #553] Implemented pixels encoding in C++ (#634)

C++ version of pixels encoder based on Java version.

* [Issue #630] drop cached splits and projections indexes when drop table or schema (#632)

Otherwise, if a table with the same schemaTableName but a different schema is created later, the outdated splits and projections cache may cause problems.

* [Issue #631] fix reading commands from file (#633)

* [Issue #553] Pixels Encoder in C++ (#635)

* [Issue #506] get query costs from transaction service (#638)

Add rpc protocol to set query costs as transaction properties.
Get the spent and billed cents from the transaction service.
Return the query costs and the query pending time in the get query result response.

* [Issue #506] support billed cents in pixels-server (#639)

Allow pixels-trino to set scan size in the transaction properties and use it to calculate the billed cents.

* [docs] modify configurations for pixels-trino in the docs

* [Issue #506] implement state manager and watcher (#641)

The state manager and watcher are responsible for setting and monitoring the state of pixels.
The state is stored as a key-value pair in Etcd.

* [Issue #506] support cost cents collection (#642)

Revise the operators in pixels-planner to support getting output from the return value of the execute() method.
Add methods for the VM workers to check the status of the root operator executed in CF workers.
Add methods for the VM cluster to clean the states and intermediate results.
Fix a bug in PartitionedJoinOperator.execute().
Add protocol for the VM cluster to report scan bytes and cost cents more efficiently.

* [Issue #643] add scan operator and fix scan input and aggregation operators (#646)

1. add scan operator for serverless workers.
2. The Pixels and Starling planners accept base tables and generate scan operators for table scans.
3. Fix output collection in aggregation operators; previously it did not collect child output.
4. Fix ScanInput.generateOutputPaths(); previously it only generated outputs ending with even numbers.
5. move outputs from NonPartitionOutput into Output.
6. add comments for worker coordinators.

* [docs] add introduction for pixels-rover (#648)

* [Issue #649] ensure the vm cost can be obtained from the query result (#650)

* [Issue #649] clean query results periodically and fix bugs in query manager (#652)

1. Clean the result of finished queries periodically instead of immediately after the first getQueryResult request.
2. Fix 'Query is not finished' for the repeated getQueryResult requests.
3. Fix dead-lock in popQueryResult of QueryManager.
4. Clean code.

* [Issue #653] fix hosts for file locations (#654)

For locality-insensitive storage, the file locations should have empty hosts. Otherwise, the hosts may be confusing for query engines such as Trino.

* [Issue #580] reformat code and add license header (#655)

* [Issue #649] set cfCostCents default value 0 (#656)

* [Issue #657] add PIXELS_SRC environment variable to compile pixels c++ (#659)

* [Issue #658] implement metadata RPCs for range and range index (#661)

These RPCs will be used to implement the range partitioned index.

* [Issue #226]: retina support (#267)

* [Issue #82]: retina cpp (#277)

Add the CPP source files of retina.

* fix compilation problems.

* rebase and fix compilation problem

---------

Co-authored-by: yuly16 <41314695+yuly16@users.noreply.github.com>
Co-authored-by: Lin Yuan <voidforall@gmail.com>
Co-authored-by: Yijun Ma <35795078+jasha64@users.noreply.github.com>
Co-authored-by: Tiannan Sha <tiannansha@gmail.com>
Co-authored-by: kkzzjx <64601787+kkzzjx@users.noreply.github.com>
Co-authored-by: Qiu Fengshuo <alph000@163.com>
Co-authored-by: Zikai Wang <55374672+khaiwang@users.noreply.github.com>
Co-authored-by: Dongyang Geng <73980116+gengdy1545@users.noreply.github.com>
Co-authored-by: xxchan <37948597+xxchan@users.noreply.github.com>
Co-authored-by: Zhipeng Mao <34649843+mzp0514@users.noreply.github.com>
bianhq added a commit that referenced this issue Jun 8, 2024
By managing file information (e.g., file name and locality), we can index files by their ids in the range index and cache. The file id can also be used to implement a very simple distributed file system.
bianhq added a commit that referenced this issue Jun 22, 2024
…beat coordinator/worker (#675)

This is necessary for implementing the index coordinator/worker.
We also rename pixels data node to pixels worker.
bianhq added a commit that referenced this issue Jun 22, 2024
bianhq added a commit that referenced this issue Jun 26, 2024
…nd compact commands (#683)

The import command imports files into a table. The files must already exist in the table's data path(s).
Currently, queries do not use the file information in metadata. This can be implemented after adding a separate metadata table for the projections in the layout.
bianhq added a commit that referenced this issue Jun 29, 2024
This lets us control access to every file, which is necessary for the range index and real-time updates.
We also adapt the implementation of projections accordingly.
bianhq added a commit that referenced this issue Feb 5, 2025
Remove the index structure from the metadata table RANGE_INDEXES.

bianhq commented Feb 5, 2025

The transaction timestamp is now persisted in etcd. However, we have decided not to use the range index for real-time CRUD.
