Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinot Sparse Data Support #14984

Open
alexch2000 opened this issue Feb 4, 2025 · 0 comments
Open

Pinot Sparse Data Support #14984

alexch2000 opened this issue Feb 4, 2025 · 0 comments

Comments

@alexch2000
Copy link
Contributor

Hey team, we’re building a funnel observability solution using Pinot, where we have a large table with more than 200B rows and around 300+ columns. Of these, only 20–30 columns are commonly used, while the rest are almost always NULL (sparse). Based on my understanding of Pinot’s architecture, each column’s forward index will store some default value for every row, even if the column is NULL most of the time.
What is your recommendation for dealing with a table that has hundreds of mostly NULL columns, especially in terms of segment sizing, ingestion performance, and ongoing query efficiency?

Are there any roadmap items or upcoming features aimed at optimizing storage or performance for extremely sparse columns (e.g., skipping or compressing columns with mostly NULL values)?

Are there any existing best practices or configurations we could apply to make this more efficient (e.g., using defaultNullValue, dictionary vs. raw encoding, or segment-level pruning)?

Pinot always stores column values in a forward index. Forward index never stores null values but have to store a value for each row. Therefore independent of the null handling configuration, Pinot always stores a default value for nulls rows in the forward index. The default value used in a column can be specified in the schema configuration by setting the defaultNullValue field spec. The defaultNullValue depends on the type of data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant