Releases: apache/druid

druid-0.11.0

05 Dec 03:56

Druid 0.11.0 contains over a hundred performance improvements, stability improvements, and bug fixes from almost 40 contributors. This release adds two major security features: TLS support and extension points for authentication and authorization.

Major new features include:

  • TLS (a.k.a. SSL) support
  • Extension points for authentication and authorization
  • Double columns support
  • cachingCost Balancer Strategy
  • jq expression support in JSON parser
  • Redis cache extension
  • GroupBy performance improvements
  • Various improvements to Druid SQL

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.11.0

Documentation for this release is at: http://druid.io/docs/0.11.0/

Highlights

TLS support

Druid now supports TLS, enabling encrypted client and inter-node communications. Please see http://druid.io/docs/0.11.0/operations/tls-support.html for details on configuration and related extensions.

Added by @pjain1 in #4270.

Authentication/authorization extension points

Extension points for authenticating and authorizing requests have been added to Druid. Please see http://druid.io/docs/0.11.0/configuration/auth.html for information on configuration and extension implementation.

The existing Kerberos authentication extension has been updated to implement the new Authenticator interface. If you are using the Kerberos extension, please see the "Kerberos configuration changes" section under "Updating from 0.10.1 and earlier" for more information.

Added by @jon-wei in #4271.

Double columns support

Druid now supports Double type aggregator columns. Please see http://druid.io/docs/0.11.0/querying/aggregations.html for documentation on the new Double aggregators.

Added by @b-slim in #4491.

cachingCost Balancer Strategy

Users upgrading to 0.11.0 are encouraged to try the new cachingCost segment balancing strategy on their coordinators. This strategy offers large performance improvements over the existing cost balancer strategy, and it is planned to become the default strategy in the release following 0.11.0.

This strategy can be selected by setting the following property on coordinators:

druid.coordinator.balancer.strategy=cachingCost

Added by @dgolitsyn in #4731.

jq expression support in JSON parser

Druid's JSON input parser now supports jq expressions using jackson-jq, enabling more input transforms before ingestion. Please see http://druid.io/docs/0.11.0/ingestion/flatten-json.html for more details.

Added by @knoguchi in #4171.
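
The Python sketch below assembles a JSON parseSpec whose flattenSpec mixes a JSONPath field with a jq field. The input layout and the field names (request.headers.userAgent, tags, userAgent, firstTag) are hypothetical, and the "type": "jq" / "expr" field shape is our reading of the flatten-json documentation linked above; verify it against the 0.11.0 docs before relying on it.

```python
import json


def build_parse_spec() -> dict:
    """Build a JSON parseSpec that flattens nested input with JSONPath and jq fields."""
    return {
        "format": "json",
        "timestampSpec": {"column": "timestamp", "format": "auto"},
        "dimensionsSpec": {"dimensions": []},  # empty list: schemaless dimension discovery
        "flattenSpec": {
            "useFieldDiscovery": True,  # keep top-level fields automatically
            "fields": [
                # JSONPath expression over a hypothetical nested input layout
                {"type": "path", "name": "userAgent", "expr": "$.request.headers.userAgent"},
                # jq expression: first element of a hypothetical "tags" array
                {"type": "jq", "name": "firstTag", "expr": ".tags[0]"},
            ],
        },
    }


parse_spec = build_parse_spec()
print(json.dumps(parse_spec, indent=2))
```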

Redis cache extension

A new cache implementation using Redis has been added as an extension by @QiuMM in #4615. Please refer to that pull request for more details.

GroupBy performance improvements

Several new performance optimizations have been added to the GroupBy query by @jihoonson in the following PRs:

#4660 Parallel sort for ConcurrentGrouper
#4576 Array-based aggregation for groupBy query
#4668 Add IntGrouper to avoid unnecessary boxing/unboxing in array-based aggregation

PR #4660 offers a general improvement by parallelizing the sorting of partial results, while PRs #4576 and #4668 offer significant improvements when grouping on a single String column.

SQL improvements

Various improvements and features have been added to Druid SQL by @gianm in the following PRs:

#4750 - TRIM support
#4720 - Rounding for count distinct
#4561 - Metrics for SQL queries
#4360 - SQL expressions support

And much more!

Updating from 0.10.1 and earlier

Please see below for changes between 0.10.1 and 0.11.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.10.1, please see release notes of the relevant intermediate versions for additional notes.

Upgrading coordinators and overlords

The following patch changes the way coordinator->overlord redirects are handled:
#5037

The overlord leader election algorithm has changed in 0.11.0: #4699.

As a result of the two patches above, special care is needed when upgrading Coordinator or Overlord to 0.11.0. All coordinators and overlords must be shut down and upgraded together.

For example, to upgrade Coordinators, you would shut down all Coordinators, upgrade them to 0.11.0, and then start them. Overlords should be upgraded in a similar way.

During the upgrade process, there must not be any time period where a non-0.11.0 coordinator or overlord is running simultaneously with an 0.11.0 coordinator or overlord.

Note that at least one overlord should be brought up as quickly as possible after shutting them all down, so that peons, Tranquility, etc. continue to work after some retries.

Also note that the druid.zk.paths.indexer.leaderLatchPath property is no longer used.

Service name changes

In earlier versions of Druid, / characters in service names defined by druid.service would be replaced by : characters because these service names were used in ZooKeeper paths. Druid 0.11.0 no longer performs these character replacements.

Example 1: If the old configuration had a broker with service name test/broker:
druid.service=test/broker

and a Router was configured assuming that / will be replaced with : in the broker service name,
druid.router.tierToBrokerMap={"hot":"test:broker","_default_tier":"test:broker"}

the Router configuration should be updated to remove that assumption:
druid.router.tierToBrokerMap={"hot":"test/broker","_default_tier":"test/broker"}

Example 2: If the old configuration had an overlord with service name test/overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test/overlord and not test:overlord.

Example 3: If the old configuration had an overlord with service name test:overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test:overlord and not test/overlord.

The following service name-related configurations are also affected and should be updated to exactly match the value of the druid.service property on the node being discovered:

druid.coordinator.asOverlord.overlordService
druid.selectors.coordinator.serviceName
druid.selectors.indexing.serviceName
druid.router.defaultBrokerServiceName
druid.router.coordinatorServiceName
druid.router.tierToBrokerMap

Please see #4992 for more details.
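
To make the rule concrete, here is a small Python sketch contrasting the pre-0.11.0 substitution with the exact-match rule, using Example 1. The helper names (legacy_service_name, matches_service) are hypothetical and not part of Druid; they only model the behavior described above.

```python
def legacy_service_name(service: str) -> str:
    """Model of pre-0.11.0 behavior: '/' in druid.service became ':' for ZooKeeper paths."""
    return service.replace("/", ":")


def matches_service(selector_value: str, discovered_service: str) -> bool:
    """Model of 0.11.0 behavior: a selector must equal druid.service exactly."""
    return selector_value == discovered_service


broker_service = "test/broker"  # druid.service on the broker (Example 1)
old_tier_map = {"hot": "test:broker", "_default_tier": "test:broker"}

# The old Router config only worked because of the implicit '/' -> ':' substitution.
assert all(v == legacy_service_name(broker_service) for v in old_tier_map.values())

# Under 0.11.0 rules every entry is stale and must be rewritten to the literal service name.
stale_tiers = [t for t, v in old_tier_map.items() if not matches_service(v, broker_service)]
new_tier_map = {t: broker_service for t in old_tier_map}
print(stale_tiers)
print(new_tier_map)
```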

Kerberos configuration changes

The Kerberos authentication configuration format has changed as a result of the new interfaces introduced by #4271. Please refer to http://druid.io/docs/0.11.0/development/extensions-core/druid-kerberos.html for the new configuration properties.

Users can point the Kerberos authenticator's authorizerName to an instance of an "allowAll" authorizer to replicate the pre-0.11.0 behavior of a cluster using Kerberos authentication with no authorization.

Lookups API path changes

The paths for the lookups configuration API have changed due to #5058.

Configuration paths that had the form /druid/coordinator/v1/lookups now have the form /druid/coordinator/v1/lookups/config.

Please see http://druid.io/docs/0.11.0/querying/lookups.html for the current API.
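
If client scripts hard-code the old paths, a rewrite along these lines may help. This Python sketch assumes every sub-path below /druid/coordinator/v1/lookups (for example, a tier and a lookup name) keeps its position after the new config segment; check the lookups documentation above for the exact endpoints. The tier and lookup names used below are hypothetical.

```python
OLD_PREFIX = "/druid/coordinator/v1/lookups"
NEW_PREFIX = "/druid/coordinator/v1/lookups/config"


def _has_prefix(path: str, prefix: str) -> bool:
    """True if path equals prefix or continues it at a '/' segment boundary."""
    return path == prefix or path.startswith(prefix + "/")


def migrate_lookup_path(path: str) -> str:
    """Rewrite a pre-0.11.0 lookups configuration path into the 0.11.0 form."""
    if _has_prefix(path, NEW_PREFIX):
        return path  # already in the new form
    if _has_prefix(path, OLD_PREFIX):
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    raise ValueError(f"not a lookups configuration path: {path}")


print(migrate_lookup_path("/druid/coordinator/v1/lookups"))
print(migrate_lookup_path("/druid/coordinator/v1/lookups/__default/country_names"))
```

One caveat of this sketch: a tier literally named config would be mistaken for an already-migrated path.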

Migrating to Double columns

Prior to 0.11.0, the Double* aggregators would store column values on disk as Float while performing aggregations using Double representations.

PR #4491 allows the Double aggregators to store column values on disk as Doubles. Due to concerns related to rolling updates and version downgrades, this behavior is disabled by default and Druid will continue to store Double aggregators on disk as floats.

To enable Double column storage, set the following property in the common runtime properties:

druid.indexing.doubleStorage=double

Users should not set this property during an initial rolling upgrade to 0.11.0, as any nodes running pre-0.11.0 Druid will not be able to handle Double columns created during the upgrade period. Users will also need to reindex any segments with Double columns if downgrading from 0.11.0 to an older version. Please see #4944 and #4605 for more information.
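
The precision difference behind this setting can be seen without Druid. The Python sketch below round-trips a hypothetical metric value through 32-bit and 64-bit IEEE 754 encodings with the standard struct module, approximating float versus double column storage; it does not model Druid's column compression or file format.

```python
import struct


def stored_as_float(value: float) -> float:
    """Round-trip through a 32-bit float, as Double aggregator columns are stored by default."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def stored_as_double(value: float) -> float:
    """Round-trip through a 64-bit double, as with druid.indexing.doubleStorage=double."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


revenue = 123456789.123  # hypothetical doubleSum value written at ingestion time
print(stored_as_float(revenue))   # nearest 32-bit value; the fractional part is lost
print(stored_as_double(revenue))  # unchanged
```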

Scan query changes

The Scan query has been moved from extensions-contrib to core Druid. As part of this migration (#4751), the Scan query's handling of the time column has changed.

The time column is now returned as "__time" rather than "timestamp", it is no longer included unless you specifically ask for it in your "columns", and it is returned as a long rather than a string.

Users can revert the Scan query's time handling to the legacy extension behavior by setting "legacy" : true in their queries, or setting the property druid.query.scan.legacy = true. This is meant to provide a migration path for users that were formerly using the contrib extension.
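
For illustration, the Python sketch below builds a Scan query that explicitly requests "__time" and converts the long timestamp in a result row back to an ISO 8601 string. The datasource, interval, and column names are hypothetical, and treating "__time" as epoch milliseconds is an assumption that should be checked against your own results.

```python
import json
from datetime import datetime, timezone


def scan_query(datasource, interval, columns, legacy=False):
    """Build a Scan query; "__time" must now be listed in columns to be returned."""
    query = {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],
        "resultFormat": "list",
        "columns": list(columns),
    }
    if legacy:
        query["legacy"] = True  # opt back into the old contrib-extension time handling
    return query


def time_to_iso(millis: int) -> str:
    """Convert a long __time value (epoch milliseconds) to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()


query = scan_query("wikipedia", "2018-01-01/2018-01-02", ["__time", "page", "added"])
print(json.dumps(query))
print(time_to_iso(1514764800000))
```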

Extension Interface Changes

Aggregator double column support

The Aggregator interface has gained a getDouble() method, whi...


druid-0.10.1

23 Aug 03:31

Druid 0.10.1 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include:

  • Large performance improvements and additional query metrics for TopN queries
  • The ability to push down limit clauses for GroupBy queries
  • More accurate query timeout handling
  • Hadoop indexing support for the Amazon S3A filesystem
  • Support for ingesting Protobuf data
  • A new Firehose that can read input via HTTP
  • Improved disk space management when indexing from cloud stores
  • Various improvements to coordinator lookups management
  • A new Kafka metrics emitter
  • A new dimension comparison filter
  • Various improvements to Druid SQL

If you are upgrading from a previous version of Druid, please see "Updating from 0.10.0 and earlier" below for upgrade notes, including some backwards incompatible changes.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.1

Documentation for this release is at: http://druid.io/docs/0.10.1/

Highlights

TopN performance improvements

Processing for TopN queries with 1-2 aggregators on historical nodes is now 2-4 times faster. This is accomplished with new runtime inspection logic that generates monomorphic implementations of query processing classes, reducing polymorphism in the TopN query execution path.

Added by @leventov in a series of PRs described in #3798.

Limit clause push down for GroupBy

Druid can now optimize limit clauses in GroupBy queries by distributing the limit application to historical/realtime nodes, applying the limit to partial result sets before they are sent to the broker for merging. This reduces network traffic within the cluster and reduces the merging workload on broker nodes. Please refer to http://druid.io/docs/0.10.1/querying/groupbyquery.html#query-context for more information.

Added in #3873 by @jon-wei.

Hadoop indexing support for Amazon S3A

Amazon's S3A filesystem is now supported for deep storage and as an input source for batch ingestion tasks. Please refer to <> for documentation.

Added in #4116 by @b-slim.

Protobuf 3.0 support and other enhancements

Support for ingesting Protobuf 3.0 data has been added, along with other enhancements such as reading Protobuf descriptors from a URL. Protobuf-supporting code has been moved into its own core extension as well. See http://druid.io/docs/0.10.1/development/extensions-core/protobuf.html for documentation.

Added in #4039 by @knoguchi.

HTTP Firehose

A new Firehose for realtime ingestion that reads data from a list of URLs via HTTP has been added. Please see http://druid.io/docs/latest/ingestion/firehose.html#httpfirehose for documentation.

Added in #4297 by @jihoonson.

Improved disk space management for realtime indexing from cloud stores

The Firehose implementations for Microsoft Azure, Rackspace Cloud Files, Google Cloud Storage, and Amazon S3 now support caching and prefetching of data. These firehoses can now operate on portions of the input data and pull new data as needed, instead of having to fully read the firehose's input to disk.

Please refer to the following links for documentation:
http://druid.io/docs/0.10.1/development/extensions-contrib/azure.html
http://druid.io/docs/0.10.1/development/extensions-contrib/cloudfiles.html
http://druid.io/docs/0.10.1/development/extensions-contrib/google.html
http://druid.io/docs/0.10.1/development/extensions-core/s3.html

Added in #4193 by @jihoonson.

Improvements to coordinator lookups management

Several enhancements have been made to the state management/synchronization logic for query-time lookups, including versioning of lookup specs. Please see http://druid.io/docs/0.10.1/querying/lookups.html for documentation.

Added in #3855 by @himanshug.

Kafka metrics emitter

A new metrics emitter that sends metrics data to Kafka in JSON format has been added. See http://druid.io/docs/0.10.1/development/extensions-contrib/kafka-emitter.html

Added in #3860 by @dkhwangbo.

Column comparison filter

A new column comparison filter has been added. This filter allows the user to compare values across columns within a row, like a "WHERE columnA = columnB" clause in SQL. See http://druid.io/docs/0.10.1/querying/filters.html#column-comparison-filter for documentation.

Added in #3928 by @erikdubbelboer.
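
A rough Python model of the filter follows: it builds the filter JSON and applies the same "all listed columns are equal" test to in-memory rows. The row data and column names are hypothetical, and the model compares plain Python values; it ignores Druid details such as multi-value columns and extraction functions.

```python
def column_comparison(dimensions):
    """Filter spec comparing values across columns within a row."""
    return {"type": "columnComparison", "dimensions": list(dimensions)}


def row_matches(row: dict, spec: dict) -> bool:
    """True when every listed column holds the same value, like WHERE a = b."""
    values = [row.get(dim) for dim in spec["dimensions"]]
    return all(v == values[0] for v in values[1:])


rows = [
    {"shipCountry": "US", "billCountry": "US"},
    {"shipCountry": "US", "billCountry": "CA"},
]
spec = column_comparison(["shipCountry", "billCountry"])
print([row_matches(r, spec) for r in rows])
```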

Druid SQL improvements

Druid 0.10.1 has a number of enhancements to Druid SQL, such as support for lookups (PRs by @gianm):

#4368 - More forgiving Avatica server
#4109 - Support for another form of filtered aggregator
#4085 - Rule to collapse sort chains
#4055 - Add SQL REGEXP_EXTRACT function
#3991 - Make row extractions extensible and add one for lookups
#4028 - Support for coercing to DECIMAL
#3999 - Ability to generate exact distinct count queries

Other performance improvements

Druid 0.10.1 has a number of other performance improvements, including:

#4364 - Uncompress streams without having to download to tmp first, by @niketh
#4315 - Server selector improvement, by @dgolitsyn
#4110 - Remove "granularity" from IngestSegmentFirehose, by @gianm
#4038 - serialize DateTime As Long to improve json serde performance, by @kaijianding

And much more!

Updating from 0.10.0 and earlier

Please see below for changes between 0.10.0 and 0.10.1 that you should be aware of before upgrading. If you're updating from an earlier version than 0.10.0, please see release notes of the relevant intermediate versions for additional notes.

Deprecation of support for Hadoop versions < 2.6.0

To add support for Amazon's S3A filesystem, Druid is now built against Hadoop 2.7.3 libraries, and we are deprecating support for Hadoop versions older than 2.6.0.

For users running a Hadoop version older than 2.6.0, it is possible to continue running Druid 0.10.1 with the older Hadoop version using a workaround.

The user would need to downgrade hadoop.compile.version in the main Druid pom.xml, remove the hadoop-aws dependency from pom.xml in the druid-hdfs-storage core extension, and then rebuild Druid.

Users are strongly encouraged to upgrade their Hadoop clusters to a 2.6.0+ version as of this release, as support for Hadoop <2.6.0 may be dropped completely in future releases.

If users wish to use Hadoop 2.7.3 as the default for ingestion tasks, they should double-check any existing druid.indexer.task.defaultHadoopCoordinates configurations.

Kafka Broker Changes

Due to changes from #4115, the Kafka indexing service is no longer compatible with version 0.9.x Kafka brokers. Users will need to upgrade their Kafka brokers to an 0.10.x version.

Coordinator Lookup Management Changes

#3855 introduces various improvements to coordinator lookup propagation behavior. Please see http://druid.io/docs/0.10.1/querying/lookups.html for details. Note the changes to coordinator HTTP API regarding lookups management.

If lookups are used in your existing deployment, then as part of the upgrade to 0.10.1, all coordinators should be stopped, upgraded, and then started with version 0.10.1 at the same time, rather than being upgraded one at a time. There should never be a situation where one coordinator is running 0.10.0 while another coordinator is running 0.10.1.

During the course of the cluster upgrade, lookup query nodes will report an error starting with "got notice to load lookup [LookupExtractorFactoryContainer{version='null'". This is not actually an error; it is a side effect of the update. See #4603 for details.

Off-heap query-time lookup cache

Please note that the off-heap query-time lookup cache is broken at this time because of an excessive memory use issue, and must not be used:
#3663

Default worker select strategy

Please note that the default worker select strategy has changed from fillCapacity to equalDistribution.

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.10.1/operations/rolling-updates.html should be followed for rolling updates.

Credits

Thanks to everyone who contributed to this release!

@akashdw
@amarjayr
@asdf2014
@asrayousuf
@b-slim
@cesure
@cxmcc
@dclim
@dgolitsyn
@dkhwangbo
@drcrallen
@elloooooo
@erikdubbelboer
@fanjieqi
@Fokko
@freakyzoidberg
@fuji-151a
@gianm
@gkc2104
@himanshug
@hzy001
@JackyWoo
@Jdban
@jeffhartley
@jerchung
@jihoonson
@jon-wei
@kaijianding
@KenjiTakahashi
@knoguchi
@leventov
@licl2014
@logarithm
@niketh
@nishantmonu51
@pjain1
@praveev
@ramiyer
@sascha-coenen
@satishbhor
@sixtus
@sjvs
@skyler-tao
@xanec
@zhihuij
@zwang180

druid-0.10.0

18 Apr 20:34

Druid 0.10.0 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include a built-in SQL layer, numeric dimensions, Kerberos authentication support, a revamp of the "index" task, a new "like" filter, large columns, ability to run the coordinator and overlord as a single service, better performing defaults, and eight new extensions.

If you are upgrading from a previous version of Druid, please see "Updating from 0.9.2 and earlier" below for upgrade notes, including some backwards incompatible changes.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.0

Documentation for this release is at: http://druid.io/docs/0.10.0/

Highlights

Built-in SQL

Druid now includes a built-in SQL server powered by Apache Calcite. Druid provides two SQL APIs: HTTP POST and JDBC. This provides an alternative to Druid's native JSON API that is more familiar to new developers and makes it easier to integrate pre-existing applications that natively speak SQL. Not all Druid and SQL features are supported by the SQL layer in this initial release, but we intend to expand both in future releases.

SQL support can be enabled by setting druid.sql.enable=true in your configuration. See http://druid.io/docs/0.10.0/querying/sql.html for details and documentation.

Added in #3682 by @gianm.
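
As a sketch of the HTTP POST API, the Python below prepares a request with the standard urllib module. The broker address, the /druid/v2/sql/ path, the {"query": ...} body, and the wikipedia datasource are assumptions for illustration; confirm the endpoint and payload in the SQL documentation linked above.

```python
import json
import urllib.request


def build_sql_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Prepare (but do not send) a Druid SQL HTTP POST request."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sql_request("http://localhost:8082", "SELECT COUNT(*) AS cnt FROM wikipedia")
# To run it against a broker started with druid.sql.enable=true:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```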

Numeric dimensions

Druid now supports numeric dimensions at ingestion and query time. Users can ingest long and float columns as dimensions (i.e., treating the numeric columns as part of the ingestion-time grouping key instead of as aggregators, if rollup is enabled). Additionally, Druid queries can now accept any long or float column as a dimension for grouping or for filtering.

There are performance tradeoffs between string and numeric columns. Numeric columns are generally faster to group on than string columns. Numeric columns don't have indexes, so they are generally slower to filter on than string columns.

See http://druid.io/docs/0.10.0/ingestion/schema-design.html#numeric-dimensions for ingestion documentation and http://druid.io/docs/0.10.0/querying/dimensionspecs.html for query documentation.

Added in #3838, #3966, and other patches by @jon-wei.

Kerberos authentication support

A new extension named 'druid-kerberos' adds support for user authentication for Druid nodes using Kerberos. It uses SPNEGO (the Simple and Protected GSSAPI Negotiation Mechanism, https://en.wikipedia.org/wiki/SPNEGO) for authentication via HTTP.

See http://druid.io/docs/0.10.0/development/extensions-core/druid-kerberos.html for documentation on how to configure kerberos authentication.

Added in #3853 by @nishantmonu51.

Index task revamp

The indexing task was rewritten to improve performance, particularly for jobs spanning multiple intervals that generated many shards. The segmentGranularity intervals can now be determined automatically and no longer need to be specified, but ingestion time can be reduced if both intervals and numShards are provided.

Additionally, the indexing task now supports an appendToExisting flag which causes the data to be indexed as an additional shard of the current version rather than as a new version overshadowing the previous version.

See http://druid.io/docs/0.10.0/ingestion/tasks.html#index-task for documentation.

Added in #3611 by @dclim.

Like filter

Druid now includes a "like" filter that enables SQL LIKE-style filtering, such as foo LIKE 'bar%'. The implementation is generally faster than regex filters, and is encouraged over regex filters when possible. In particular, like filters on prefixes such as bar% are significantly faster than equivalent regex filters such as ^bar.*.

See http://druid.io/docs/0.10.0/querying/filters.html#like-filter for documentation.

Added in #3642 by @gianm.
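
The semantics of LIKE patterns, and why a prefix pattern is cheap, can be sketched in Python. This is an illustrative model, not Druid's implementation: like_matches is a hypothetical helper, '%' matches any run of characters and '_' matches exactly one, and escape characters are not handled. The filter JSON uses a hypothetical page dimension.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return "".join(parts)


def like_matches(value: str, pattern: str) -> bool:
    body = pattern[:-1]
    if pattern.endswith("%") and "%" not in body and "_" not in body:
        # Prefix pattern such as 'bar%': a plain startswith check, no regex needed
        return value.startswith(body)
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


like_filter = {"type": "like", "dimension": "page", "pattern": "bar%"}
print(like_matches("barista", like_filter["pattern"]))
```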

Large columns

Druid now supports individual columns larger than 2GB. This feature is not typically required, since general guidance is that segments should generally be 500MB–1GB in size, but is useful in situations where one column is much larger than all the others (for example, large sketches).

This functionality is available to all Druid users and no special configuration is necessary when using the built-in column types. If you have developed a custom metric column type as a Druid extension, you can enable large column support by overriding getSerializer in your ComplexMetricsSerde.

Added in #3743 by @akashdw.

Coordinator/Overlord combination option

Druid deployments can now be simplified by combining the Coordinator and Overlord functions into the Coordinator process. To do this, set druid.coordinator.asOverlord.enabled and druid.coordinator.asOverlord.overlordService appropriately on your Coordinators and then stop your Overlords. The Overlord console will then be available at http://coordinator-host:port/console.html.

This is currently an experimental feature and is off by default. We intend to consider making this the default in a future version of Druid.

See http://druid.io/docs/0.10.0/configuration/coordinator.html for documentation on this feature and configuration options.

Added in #3711 by @himanshug.

Better performing defaults

This release changes two default settings to improve out-of-the-box performance:

  • The buildV9Directly option introduced in Druid 0.9.0 is now enabled by default. This option improves performance of indexing by creating the v9 data format directly rather than creating v8 first and then converting to v9. If necessary, you can roll back to the old code by setting "buildV9Directly" to false in your indexing tasks.

  • The v2 groupBy engine introduced in Druid 0.9.2 is now enabled by default. This new groupBy engine was rewritten from the ground up for better performance and memory management. If necessary, you can roll back to the old engine by setting either "druid.groupBy.query.defaultStrategy" in your runtime.properties, or "groupByStrategy" in your query context, to "v1". See http://druid.io/docs/0.10.0/querying/groupbyquery.html for details on the differences between groupBy v1 and v2.
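
For a per-query rollback, the context flag goes in the query body. A hedged Python sketch of a groupBy query with an optional groupByStrategy context follows; the datasource, interval, and dimension are hypothetical, and only the groupByStrategy key comes from the text above.

```python
import json


def groupby_query(datasource, interval, dimensions, strategy=None):
    """Build a minimal groupBy query; strategy='v1' falls back to the legacy engine."""
    query = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "dimensions": list(dimensions),
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [interval],
    }
    if strategy is not None:
        query["context"] = {"groupByStrategy": strategy}
    return query


legacy = groupby_query("wikipedia", "2017-01-01/2017-01-02", ["page"], strategy="v1")
print(json.dumps(legacy, indent=2))
```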

Other performance improvements

In addition to better performing defaults, Druid 0.10.0 has a number of other performance improvements, including:

  • Concise bitset union, intersection, and iteration optimization (#3883) by @leventov
  • DimensionSelector-based value matching optimization (#3858) by @leventov
  • Search query strategy for choosing index-based vs. cursor-based execution (#3792) by @jihoonson
  • Bitset iteration optimization (#3753) by @leventov
  • GroupBy optimization for granularity "all" (#3740) by @gianm
  • Disable flush after every DefaultObjectMapper write (#3748) by @jon-wei
  • Short-circuiting AND filter (#3676) by @gianm
  • Improved performance of IndexMergerV9 (#3440) by @leventov

New extensions

And much more!

Updating from 0.9.2 and earlier

Please see below for changes between 0.9.2 and 0.10.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.9.2, please see release notes of the relevant intermediate versions for additional notes.

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.10.0/operations/rolling-updates.html should be followed for rolling updates.

Query API changes

Please note the following backwards-incompatible query API changes when updating. Some queries may need to be adjusted to continue to behave as expected.

  • JavaScript query features are now disabled by default for security reasons (#3818). If you use these features, you can re-enable them by setting druid.javascript.enabled=true in your runtime properties. See http://druid.io/docs/0.10.0/development/javascript.html for details, including security considerations.

  • GroupBy queries no longer allow __time as the output name of a dimension, aggregator, or post-aggregator (#3967).

  • Select query pagingSpecs now default to fromNext: true behavior when fromNext is not specified (#3986). Behavior is unchanged for Select queries that did have fromNext specified. If you prefer the old default, then you can change this through the druid.query.select.enableFromNextDefault runtime property. See http://druid.io/docs/0.10.0/querying/select-query.html for details.

  • SegmentMetadata queries no longer include "size" analysis by default (#3773). You can still request "size" analysis by adding "size" to "analysisTypes" at query time.

Deployment and configuration changes

Please note the following deployment-related changes when updating.

  • Druid now requires Java 8 to run (#3914). If you are currently running on Java 7, we suggest upgrading Java first and then Druid.

  • Druid now defaults to the "v2" engine for groupBy rather than the legacy "v1" engine. As part of this, memory usage limits have changed from row-based to byte-based limits, so it is possible that some queries which met resource limits before will now exceed them and fail. You can avoid this by tuning the new groupBy engine appropriately. If necessary, you can roll back to the old engine by setting either "druid.groupBy.query.defaultStrategy" in your runtime.propertie...


druid-0.9.2

01 Dec 21:43

Druid 0.9.2 contains hundreds of performance improvements, stability improvements, and bug fixes from over 30 contributors. Major new features include a new groupBy engine, ability to disable rollup at ingestion time, ability to filter on longs, new encoding options for long-typed columns, performance improvements for HyperUnique and DataSketches, a query cache implementation based on Caffeine, a new lookup extension exposing fine grained caching strategies, support for reading ORC files, and new aggregators for variance and standard deviation.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.9.2

Documentation for this release is here: http://druid.io/docs/0.9.2/

Highlights

New groupBy engine

Druid now includes a new groupBy engine, rewritten from the ground up for better performance and memory management. Benchmarks show a 2–5x performance boost on our test datasets. The new engine also supports strict limits on memory usage and the option to spill to disk when memory is exhausted, avoiding result row count limitations and potential OOMEs generated by the previous engine.

The new engine is off by default, but you can enable it through configuration or query context parameters. We intend to enable it by default in a future version of Druid.

See "implementation details" on http://druid.io/docs/0.9.2/querying/groupbyquery.html#implementation-details for documentation and configuration.

Added in #2998 by @gianm.

Ability to disable rollup

Since its inception, Druid has had a concept of "dimensions" and "metrics" that applied both at ingestion time and at query time. Druid is one of the few databases that support aggregation at data loading time, which we call "rollup". But, for some use cases, ingestion-time rollup is not desired, and it's better to load the original data as-is. With rollup disabled, one row in Druid will be created for each input row.

Query-time aggregation is, of course, still supported through the groupBy, topN, and timeseries queries.

See the "rollup" flag on http://druid.io/docs/0.9.2/ingestion/index.html for documentation. By default, rollup remains enabled.

Added in #3020 by @kaijianding.
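
The difference between the two modes can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Druid's ingestion code: the rollup path truncates millisecond timestamps to a hypothetical hourly query granularity, groups by timestamp plus dimensions, and sums one metric alongside a row count; the non-rollup path keeps one row per input row, as described above.

```python
HOUR_MS = 3_600_000  # hypothetical hourly query granularity, in milliseconds


def ingest(rows, dims, metric, rollup=True, granularity_ms=HOUR_MS):
    """Return stored rows, with or without ingestion-time rollup."""
    if not rollup:
        return [dict(r) for r in rows]  # one stored row per input row
    stored = {}
    for r in rows:
        ts = r["ts"] - r["ts"] % granularity_ms  # truncate to query granularity
        key = (ts,) + tuple(r[d] for d in dims)
        agg = stored.setdefault(key, {"ts": ts, **{d: r[d] for d in dims}, "count": 0, metric: 0})
        agg["count"] += 1  # how many input rows were combined
        agg[metric] += r[metric]
    return list(stored.values())


events = [
    {"ts": 1_000, "page": "A", "added": 5},
    {"ts": 2_000, "page": "A", "added": 7},
    {"ts": 3_000, "page": "B", "added": 1},
]
print(ingest(events, ["page"], "added"))                # 2 rolled-up rows
print(ingest(events, ["page"], "added", rollup=False))  # 3 rows, as ingested
```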

Ability to filter on longs

Druid now supports sophisticated filtering on integer-typed columns, including long metrics and the special __time column. This opens up a number of new capabilities:

Druid does not yet support grouping on longs. We intend to add this capability in a future release.

Added in #3180 by @jon-wei.

New long encodings

Until now, all integer-typed columns in Druid, including long metrics and the special __time column, were stored as 64-bit longs optionally compressed in blocks with LZ4. Druid 0.9.2 adds new encoding options which, in many cases, can reduce file sizes and improve performance:

  • Long encoding option "auto", which potentially uses table or delta encoding to use fewer than 64 bits per row. The "longs" encoding option is the default behavior, which always uses 64 bits.
  • Compression option "none", which is like the old "uncompressed" option, except it offers a speedup by bypassing block copying.

The default remains "longs" encoding + "lz4" compression. In our testing, two options that often yield useful benefits are "auto" + "lz4" (generally smaller than longs + lz4) and "auto" + "none" (generally faster than longs + lz4, file size impact varies). See the PR for full test results.

See "metricCompression" and "longEncoding" on http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html for documentation.

Added in #3148 by @acslk.
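
The intuition behind "auto" can be sketched by counting bits per row. The Python below is a simplified model, not Druid's heuristic: it compares fixed 64-bit "longs", delta encoding (offset from the column minimum), and table encoding (index into the distinct values), with a hypothetical max_table_size cap standing in for whatever limit Druid actually applies.

```python
def bits_for_delta(values):
    """Bits per row when each value is stored as an offset from the column minimum."""
    return max(1, (max(values) - min(values)).bit_length())


def bits_for_table(values):
    """Bits per row when each value is stored as an index into a table of distinct values."""
    return max(1, (len(set(values)) - 1).bit_length())


def choose_encoding(values, max_table_size=256):
    """Pick the cheapest of longs/delta/table; max_table_size is a hypothetical cap."""
    candidates = [("longs", 64), ("delta", bits_for_delta(values))]
    if len(set(values)) <= max_table_size:
        candidates.append(("table", bits_for_table(values)))
    return min(candidates, key=lambda c: c[1])


status_codes = [200, 200, 404, 500, 200]
timestamps = [1514764800000 + i * 1000 for i in range(3600)]  # one hour at 1s resolution
print(choose_encoding(status_codes))
print(choose_encoding(timestamps))
```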

Sketch performance improvements

  • DataSketches speedups of up to 80% from #3471.
  • HyperUnique speedups of 19–30% from #3314, used for "hyperUnique" and "cardinality" aggregators.

New extensions

And much more!

Updating from 0.9.1.1

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.9.2/operations/rolling-updates.html should be followed for rolling updates.

Query time lookups

The druid-namespace-lookup extension, which was deprecated in 0.9.1 in favor of druid-lookups-cached-global, has been removed in 0.9.2. If you are using druid-namespace-lookup, migrate to druid-lookups-cached-global before upgrading to 0.9.2. See our migration guide for details: http://druid.io/docs/0.9.1.1/development/extensions-core/namespaced-lookup.html#transitioning-to-lookups-cached-global

Other notes

Please note the following changes:

  • Druid now ships Guice 4.1.0 rather than 4.0-beta (#3222). This conflicts with the version shipped in some Hadoop distributions, so for Hadoop indexing you may need to adjust your mapreduce.job.classloader or mapreduce.job.user.classpath.first options. In testing we have found this to be an effective workaround. See http://druid.io/docs/0.9.2/operations/other-hadoop.html for details.
  • If you are using Roaring bitmaps, note that compressRunOnSerialization now defaults to true. As a result, segments written will not be readable by Druid 0.8.1 or earlier. If you need segments written by Druid 0.9.2 to be readable by 0.8.1, and you are using Roaring bitmaps, you must set compressRunOnSerialization = false. By default, bitmaps are Concise, not Roaring, so this point will not apply to you unless you overrode that. See #3228 for details.
  • If you use the new long encoding or compression options, segments written by Druid will not be readable by any version older than 0.9.2. If you don't use the new options, segments will remain backwards compatible.
  • If you are using the experimental Kafka indexing service, there is a known issue that may cause task supervision to hang when it tries to stop all running tasks simultaneously during the upgrade process. To prevent this, you can shut down all supervisors and wait for the indexing tasks to complete before updating your overlord. Alternatively, as a workaround, you can set chatThreads in the supervisor tuning configuration to a value greater than the number of running tasks.

Credits

Thanks to everyone who contributed to this release!

@acslk
@AlexanderSaydakov
@ashishawasthi
@b-slim
@chtefi
@dclim
@drcrallen
@du00cs
@ecesena
@erikdubbelboer
@fjy
@Fokko
@gianm
@giaosudau
@guobingkun
@gvsmirnov
@hamlet-lee
@himanshug
@HyukjinKwon
@jaehc
@jianran
@jon-wei
@kaijianding
@leventov
@linbojin
@michaelschiff
@navis
@nishantmonu51
@pjain1
@rajk-tetration
@SainathB
@sirpkt
@vogievetsky
@xvrl
@yuppie-flu

druid-0.9.1.1

29 Jun 18:58

Druid 0.9.1.1 contains only one change since Druid 0.9.1, #3204, which addresses a bug with the Coordinator web console. The full list of changes for the Druid 0.9.1 line is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed

Updating from 0.9.0

Query time lookups

Query time lookup (QTL) functionality has been substantially reworked in this release. Most users will need to update their configurations and queries.

The druid-namespace-lookup extension is now deprecated, and will be removed in a future version of Druid. Users should migrate to the new druid-lookups-cached-global extension. Both extensions can be loaded simultaneously to simplify migration. For details about migrating, see Transitioning to lookups-cached-global in the documentation.

Other notes

Aside from the QTL changes, please note the following changes:

  • The default value for maxRowsInMemory has been set to 75,000 across the board for all forms of ingestion. This is in line with previous defaults for Hadoop tasks and Tranquility-based ingestion. If you were creating realtime index tasks directly (without Tranquility) then this is lower than the previous default of 500,000.
  • The /druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myISO8601Interval} REST endpoint is now deprecated. The new /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?kill=true REST endpoint can be used instead.
  • The druid.indexer.runner.separateIngestionEndpoint property is now deprecated. If you were using this functionality to isolate event-push requests and query serving requests for realtime tasks, you can accomplish something similar with druid.indexer.server.maxChatRequests.
  • For developers of Druid extensions, note that the QueryGranularity constants (ALL, NONE, etc.) have been moved to io.druid.granularity.QueryGranularities in #2980. Query syntax is not affected.
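A minimal sketch of moving a kill call from the deprecated endpoint to the new one. It only builds the URL. Two points are assumptions rather than statements from these notes: that the ISO 8601 interval is written with "_" instead of "/" so it fits in one path segment, and that the request is still sent with HTTP DELETE, as with the deprecated endpoint. The helper name kill_endpoint is hypothetical.

```python
from urllib.parse import quote


def kill_endpoint(coordinator: str, data_source: str, interval: str) -> str:
    """Build the non-deprecated kill URL for one datasource interval.

    Assumption: the interval separator "/" is replaced by "_" so the interval
    can be used as a single path segment.
    """
    path_interval = interval.replace("/", "_")
    return (
        f"{coordinator.rstrip('/')}/druid/coordinator/v1/datasources/"
        f"{quote(data_source, safe='')}/intervals/{quote(path_interval, safe='')}"
        "?kill=true"
    )


# Deprecated form: .../datasources/wiki?kill=true&interval=2016-06-27/2016-06-28
print(kill_endpoint("http://coordinator:8081", "wiki", "2016-06-27/2016-06-28"))
```

Percent-encoding both path segments keeps characters such as ":" in full timestamps from being misread by the server. Check the coordinator API docs for your version before relying on the separator convention.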

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.9.1.1/operations/rolling-updates.html should be followed for rolling updates.

Kafka Supervisor

Druid 0.9.1 is the first version to include the experimental Kafka indexing service, utilizing a new Kafka-type indexing task and a supervisor that runs within the Druid overlord. The Kafka indexing service provides an exactly-once ingestion guarantee and does not have the restriction of events requiring timestamps which fall within a window period. More details about this feature are available in the documentation: http://druid.io/docs/0.9.1.1/development/extensions-core/kafka-ingestion.html.

Note: The Kafka indexing service uses the Java Kafka consumer that was introduced in Kafka 0.9. Because this version changed the protocol, Kafka 0.9 consumers are not compatible with older brokers, so you will need to ensure that your Kafka brokers are version 0.9 or later. Details on upgrading to the latest version of Kafka can be found here: http://kafka.apache.org/documentation.html#upgrade

New Features

#2656 Supervisor for KafkaIndexTask
#2602 implement special distinctcount
#2220 Appenderators, DataSource metadata, KafkaIndexTask
#2424 Enabling datasource level authorization in Druid
#2410 statsd-emitter
#1576 [QTL] Query time lookup cluster wide config

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AFeature

Improvements

#2972 Improved Segment Distribution (new cost function)
#2931 Optimize filter for timeseries, search, and select queries
#2753 More consistent empty-set filtering behavior on multi-value columns
#2727 BoundFilter optimizations, and related interface changes.
#2711 All Filters should work with FilteredAggregators
#2690 Allow filters to use extraction functions
#2577 Implement native in filter

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AImprovement

Bug Fixes

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ABug

Documentation

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ADocumentation

Thanks to everyone who contributed to this release!
@acslk
@b-slim
@binlijin
@bjozet
@dclim
@drcrallen
@du00cs
@erikdubbelboer
@fjy
@gaodayue
@gianm
@guobingkun
@harshjain2
@himanshug
@jaehc
@javasoze
@jisookim0513
@jon-wei
@JonStrabala
@kilida
@lizhanhui
@michaelschiff
@mrijke
@navis
@nishantmonu51
@pdeva
@pjain1
@rasahner
@sascha-coenen
@se7entyse7en
@shekhargulati
@sirpkt
@skilledmonster
@spektom
@xvrl
@yuppie-flu

druid-0.9.1

28 Jun 23:23

Druid 0.9.1 contains hundreds of performance improvements, stability improvements, and bug fixes from over 30 contributors. Major new features include an experimental Kafka Supervisor to support exactly-once consumption from Apache Kafka, support for cluster-wide query-time lookups (QTL), and an improved segment balancing algorithm.

The full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed

Updating from 0.9.0

Query time lookups

Query time lookup (QTL) functionality has been substantially reworked in this release. Most users will need to update their configurations and queries.

The druid-namespace-lookup extension is now deprecated, and will be removed in a future version of Druid. Users should migrate to the new druid-lookups-cached-global extension. Both extensions can be loaded simultaneously to simplify migration. For details about migrating, see Transitioning to lookups-cached-global in the documentation.

Other notes

Aside from the QTL changes, please note the following changes:

  • The default value for maxRowsInMemory has been set to 75,000 across the board for all forms of ingestion. This is in line with previous defaults for Hadoop tasks and Tranquility-based ingestion. If you were creating realtime index tasks directly (without Tranquility) then this is lower than the previous default of 500,000.
  • The /druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myISO8601Interval} REST endpoint is now deprecated. The new /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?kill=true REST endpoint can be used instead.
  • The druid.indexer.runner.separateIngestionEndpoint property is now deprecated. If you were using this functionality to isolate event-push requests and query serving requests for realtime tasks, you can accomplish something similar with druid.indexer.server.maxChatRequests.
  • For developers of Druid extensions, note that the QueryGranularity constants (ALL, NONE, etc.) have been moved to io.druid.granularity.QueryGranularities in #2980. Query syntax is not affected.

Rolling updates

The standard Druid update process described by http://druid.io/docs/0.9.1/operations/rolling-updates.html should be followed for rolling updates.

Kafka Supervisor

Druid 0.9.1 is the first version to include the experimental Kafka indexing service, utilizing a new Kafka-type indexing task and a supervisor that runs within the Druid overlord. The Kafka indexing service provides an exactly-once ingestion guarantee and does not have the restriction of events requiring timestamps which fall within a window period. More details about this feature are available in the documentation: http://druid.io/docs/0.9.1/development/extensions-core/kafka-ingestion.html.

Note: The Kafka indexing service uses the Java Kafka consumer that was introduced in Kafka 0.9. Because this version changed the protocol, Kafka 0.9 consumers are not compatible with older brokers, so you will need to ensure that your Kafka brokers are version 0.9 or later. Details on upgrading to the latest version of Kafka can be found here: http://kafka.apache.org/documentation.html#upgrade

New Features

#2656 Supervisor for KafkaIndexTask
#2602 implement special distinctcount
#2220 Appenderators, DataSource metadata, KafkaIndexTask
#2424 Enabling datasource level authorization in Druid
#2410 statsd-emitter
#1576 [QTL] Query time lookup cluster wide config

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AFeature

Improvements

#2972 Improved Segment Distribution (new cost function)
#2931 Optimize filter for timeseries, search, and select queries
#2753 More consistent empty-set filtering behavior on multi-value columns
#2727 BoundFilter optimizations, and related interface changes.
#2711 All Filters should work with FilteredAggregators
#2690 Allow filters to use extraction functions
#2577 Implement native in filter

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3AImprovement

Bug Fixes

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ABug

Documentation

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.1+is%3Aclosed+label%3ADocumentation

Thanks to everyone who contributed to this release!
@acslk
@b-slim
@binlijin
@bjozet
@dclim
@drcrallen
@du00cs
@erikdubbelboer
@fjy
@gaodayue
@gianm
@guobingkun
@harshjain2
@himanshug
@jaehc
@javasoze
@jisookim0513
@jon-wei
@JonStrabala
@kilida
@lizhanhui
@michaelschiff
@mrijke
@navis
@nishantmonu51
@pdeva
@pjain1
@rasahner
@sascha-coenen
@se7entyse7en
@shekhargulati
@sirpkt
@skilledmonster
@spektom
@xvrl
@yuppie-flu

Druid 0.9.0

13 Apr 18:58

Druid 0.9.0 introduces an update to the extension system that requires configuration changes. In addition, over 400 pull requests were merged between 0.8.3 and 0.9.0. Below we highlight the most important changes in this release.

Full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed

Updating from 0.8.x

Extensions

In Druid 0.9, we have refactored the extension loading mechanism. The main reason for this change is to let Druid load extensions from the local file system without downloading anything from the internet at runtime.

To learn all about the new extension loading mechanism, see Include extensions and Include Hadoop Dependencies. If you are impatient, here is the summary.

The following properties have been deprecated:

  • druid.extensions.coordinates
  • druid.extensions.remoteRepositories
  • druid.extensions.localRepository
  • druid.extensions.defaultVersion

Instead, specify druid.extensions.loadList, druid.extensions.directory and druid.extensions.hadoopDependenciesDir.

druid.extensions.loadList specifies the list of extensions that will be loaded by Druid at runtime. An example would be druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"].

druid.extensions.directory specifies the directory where all the extensions live. An example would be druid.extensions.directory=/xxx/extensions.

Note that the mysql-metadata-storage extension is not packaged in the Druid distribution due to licensing issues. You will have to download it manually from druid.io, decompress it, and then put it in the specified extensions directory.

druid.extensions.hadoopDependenciesDir specifies the directory where all the Hadoop dependencies live. An example would be druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies. Note: The way you specify which Hadoop version to use has not changed, so you just need to make sure the Hadoop version you want to use exists under /xxx/hadoop-dependencies.
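As a small migration aid, the sketch below renders the three new settings as runtime.properties lines and flags any of the deprecated druid.extensions.* keys still present in an existing configuration. The helper names are hypothetical. The loadList value is written as a JSON array, matching the druid.extensions.loadList example in this section.

```python
import json

# The four properties deprecated in 0.9.0, as listed in these release notes.
DEPRECATED_KEYS = {
    "druid.extensions.coordinates",
    "druid.extensions.remoteRepositories",
    "druid.extensions.localRepository",
    "druid.extensions.defaultVersion",
}


def extension_properties(load_list, extensions_dir, hadoop_deps_dir):
    """Render the 0.9.0-style extension settings as runtime.properties lines."""
    return "\n".join([
        f"druid.extensions.loadList={json.dumps(list(load_list))}",
        f"druid.extensions.directory={extensions_dir}",
        f"druid.extensions.hadoopDependenciesDir={hadoop_deps_dir}",
    ])


def deprecated_keys(props):
    """Return the deprecated extension keys found in a dict of properties, sorted."""
    return sorted(key for key in props if key in DEPRECATED_KEYS)


print(extension_properties(
    ["druid-datasketches", "mysql-metadata-storage"],
    "/xxx/extensions",
    "/xxx/hadoop-dependencies",
))
```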

You might now wonder whether you have to put extensions inside /xxx/extensions and /xxx/hadoop-dependencies manually. The answer is no: we have already created them for you. Download the latest Druid tarball at http://druid.io/downloads.html. Unpack it and you will see extensions and hadoop-dependencies folders there. Simply copy them to /xxx/extensions and /xxx/hadoop-dependencies respectively, and you are all set!

If the extension or Hadoop dependency you want to load is not included in the core extensions, you can use pull-deps to download it to your extension directory.

If you want to load your own extension, you can first run mvn install to install it into your local Maven repository, and then use pull-deps to download it to your extension directory.

Please feel free to leave any questions regarding the migration.

Extensions have also been split into core and contrib extensions. Core extensions are maintained by Druid committers and are packaged as part of the download tarball. Contrib extensions are community maintained and can be installed as needed. For more information, please see here.

Ordering of Dimensions

Up to and including Druid 0.8.x, the order of dimensions given at indexing time did not affect the way data was indexed. Rows were ordered first by timestamp, then by dimension values, in the lexicographical order of the dimension names.

As of Druid 0.9.0, Druid respects the given dimension order and orders rows first by timestamp, then by dimension values, in the given dimension order.

This means segment sizes may now vary depending on the order in which dimensions are given. Specifying a dimension with many unique values first may result in worse compression than specifying dimensions with repeating values first.
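To make the ordering rule concrete, here is a toy model on plain dicts, not Druid's indexer. sort_rows is a hypothetical helper. Passing no dimension order mimics the pre-0.9.0 behavior (dimension names sorted lexicographically), while passing a list mimics 0.9.0, which uses the order given. Treating a missing value as an empty string is a simplification.

```python
from __future__ import annotations


def sort_rows(rows: list[dict], dimension_order: list[str] | None = None) -> list[dict]:
    """Order rows by timestamp, then by dimension values.

    dimension_order=None: dimensions compared in lexicographic name order (pre-0.9.0).
    dimension_order=[...]: dimensions compared in the given order (0.9.0 and later).
    """
    if dimension_order is None:
        dims = sorted({key for row in rows for key in row if key != "timestamp"})
    else:
        dims = dimension_order
    # Missing dimension values sort as "" in this simplified model.
    return sorted(rows, key=lambda r: (r["timestamp"], *(r.get(d, "") for d in dims)))


rows = [
    {"timestamp": 1, "page": "b", "country": "us"},
    {"timestamp": 1, "page": "a", "country": "zz"},
]
print([r["page"] for r in sort_rows(rows)])                      # country compared first
print([r["page"] for r in sort_rows(rows, ["page", "country"])])  # page compared first
```

Because rows with equal leading values end up adjacent, putting low-cardinality dimensions first tends to produce longer runs of repeated values, which is why the order can affect compression.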

Min/Max Aggregators no longer supported, use doubleMin/doubleMax instead

As indicated in the 0.8.3 release notes, min/max aggregators have been removed in favor of doubleMin, doubleMax, longMin, and longMax aggregators.

If you have any issues starting up because of this, please see #2749
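If your query or ingestion specs still use the removed types, a small rewrite pass can rename them before upgrading. This is a minimal sketch with a hypothetical helper name. It assumes the old min/max aggregators should map to their double variants, as the heading above suggests. If a column actually holds longs, longMin/longMax may be the better choice.

```python
from copy import deepcopy

# Removed aggregator types and their suggested double-based replacements.
RENAMES = {"min": "doubleMin", "max": "doubleMax"}


def migrate_aggregators(aggregators):
    """Return copies of aggregator specs with removed min/max types renamed."""
    migrated = []
    for agg in aggregators:
        new = deepcopy(agg)  # leave the caller's specs untouched
        if new.get("type") in RENAMES:
            new["type"] = RENAMES[new["type"]]
        migrated.append(new)
    return migrated


specs = [
    {"type": "max", "name": "max_latency", "fieldName": "latency"},
    {"type": "count", "name": "rows"},
]
print(migrate_aggregators(specs))
```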

Configuration changes

druid.indexer.task.baseDir and druid.indexer.task.baseTaskDir now default to the standard Java temporary directory specified by the java.io.tmpdir system property, instead of /tmp.

Other issues to be aware of are listed here:

  • https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3A%22Release+Notes%22
  • https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AIncompatible

New Features

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AFeature

#1719 Add Rackspace Cloud Files Deep Storage Extension
#1858 Support avro ingestion for realtime & hadoop batch indexing
#1873 add ability to express CONCAT as an extractionFn
#1921 Add docs and benchmark for JSON flattening parser
#1936 adding Upper/Lower Bound Filter
#1978 Graphite emitter
#1986 Preserve dimension order across indexes during ingestion
#2008 Regex search query
#2014 Support descending time ordering for time series query
#2043 Add dimension selector support for groupby/having filter
#2076 adding lower and upper extraction fn
#2209 support cascade execution of extraction filters in extraction dimension spec
#2221 Allow change minTopNThreshold per topN query
#2264 Adding custom mapper for json processing exception
#2271 time-descending result of select queries
#2258 acl for zookeeper is added

Improvements

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AImprovement

#984 Use thread priorities. (aka set nice values for background-like tasks)
#1638 Remove Maven client at runtime + Provide a way to load Druid extensions through local file system
#1728 Store AggregatorFactory[] in segment metadata
#1988 support multiple intervals in dataSource inputSpec
#2006 Preserve dimension order across indexes during ingestion
#2047 optimize InputRowSerde
#2075 Configurable value replacement on match failure for RegexExtractionFn
#2079 reduce bytearray copy to minimal optimize VSizeIndexedWriter
#2084 minor optimize IndexMerger's MMappedIndexRowIterable
#2094 Simplifying dimension merging
#2107 More efficient SegmentMetadataQuery
#2111 optimize create inverted indexes
#2138 build v9 directly
#2228 Improve heap usage for IncrementalIndex
#2261 Prioritize loading of segments based on segment interval
#2306 More specific null/empty str handling in IndexMerger

Bug Fixes

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ABug

Documentation

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ADocumentation

#2100 doc update to make it easy to find how to do re-indexing or delta ingestion
#2186 Add intro developer docs
#2279 Some more multitenancy docs
#2364 Add more docs around timezone handling
#2216 Completely rework the Druid getting started process

Thanks to everyone who contributed to this patch!
@fjy
@xvrl
@drcrallen
@pjain1
@chtefi
@liubin
@salsakran
@jaebinyo
@erikdubbelboer
@gianm
@bjozet
@navis
@AlexanderSaydakov
@himanshug
@guobingkun
@abbondanza
@binlijin
@rasahner
@jon-wei
@CHOIJAEHONG1
@loganlinn
@michaelschiff
@himank
@nishantmonu51
@sirpkt
@duilio
@pdeva
@KurtYoung
@mangesh-pardeshi
@dclim
@desaianuj
@stevemns
@b-slim
@cheddar
@jkukul
@AdrieanKhisbe
@liuqiyun
@codingwhatever
@clintropolis
@zhxiaogg
@rohitkochar
@itsmee
@Angelmmiguel
@Noddi
@se7entyse7en
@zhaown
@genevien

Druid 0.8.3 - Stable

26 Jan 23:51

Updating from 0.8.x

  • You must set druid.selectors.coordinator.serviceName to your Coordinator's druid.service value (defaults to druid/coordinator) in common.runtime.properties of all nodes. Realtime handoff will only work if this config is properly set. (See #2015)
  • Instead of the normal rolling update procedure, for this release you should update your Coordinator nodes before updating the overlord. (See #2015)
  • Min/max aggregators are now deprecated and will be removed in Druid 0.9.0. Please use longMin, longMax, doubleMin, or doubleMax aggregators as appropriate.
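Because realtime handoff depends on the first setting above, a pre-upgrade check can compare it against the Coordinator's druid.service. The sketch below is a hypothetical helper. It uses a deliberately simplified properties parser that handles only key=value lines and comments, not ":" separators or line continuations. It falls back to the documented default of druid/coordinator when druid.service is unset.

```python
from __future__ import annotations


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and #/! comments."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def coordinator_selector_ok(common_text: str, coordinator_text: str) -> bool:
    """True if druid.selectors.coordinator.serviceName matches the Coordinator's druid.service."""
    expected = parse_properties(coordinator_text).get("druid.service", "druid/coordinator")
    actual = parse_properties(common_text).get("druid.selectors.coordinator.serviceName")
    return actual == expected


common = "druid.selectors.coordinator.serviceName=druid/coordinator\n"
coordinator = "druid.service=druid/coordinator\ndruid.port=8081\n"
print(coordinator_selector_ok(common, coordinator))
```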

New Features

Improvements

  • #1770 Add segment merge time as a metric
  • #1791 EventReceiverFirehoseMonitor
  • #1824 Add hashCode and equals to UniformGranularitySpec
  • #1889 update server metrics and emitter version
  • #1920 Update curator to 2.9.1
  • #1929 separate ingestion and query thread pool
  • #1960 optimize index merge
  • #1967 Add datasource and taskId to metrics emitted by peons
  • #1973 CacheMonitor - make cache injection optional
  • #2015 Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoffs
  • #2045 Update mmx emitter to 0.3.6
  • #2145 druid.indexer.task.restoreTasksOnRestart configuration

Bug Fixes

  • #1387 Add special handler to allow logger messages during shutdown
  • #1799 Support multiple outer aggregators of same type
  • #1815 Fix Race in jar upload during hadoop indexing
  • #1842 Do not pass druid.indexer.runner.javaOpts to Peon as a property
  • #1867 fixing hadoop test scope dependencies in indexing-hadoop
  • #1888 forward cancellation request to all brokers, fixes #1802
  • #1917 RemoteTaskActionClient: Fix statusCode check.
  • #1932 DataSchema: Exclude metric names from dimension list.
  • #1935 ForkingTaskRunner: Log without buffering.
  • #1940 Move Jackson Guice adapters into io.druid
  • #1954 EC2 autoscaler: avoid hitting aws filter limits
  • #1985 Change LookupExtractionFn cache key to be unique
  • #2036 Disable javadoc linting
  • #1973 Make cache injection optional
  • #2141 Fix some problems with restoring
  • #2227 Update bytebuffer-collections to 0.2.4 (upstream bugfixes in roaring bitmaps)
  • #2240 Fix loadRule when one of the tiers had no available servers
  • #2207 Fix bug for thetaSketch metric not working with select queries
  • #2266 Fix loss in segment announcements when segments do not fit in zNode
  • #2189 add ChatHandlerServerModule to realtime example
  • #2338 Fix tutorial so indexing service can start up

Documentation

  • #1832 add examples for duration and period granularities
  • #1843 "druid.manager.segment" should be "druid.manager.segments"
  • #1854 Fix documentation about lookup
  • #1900 fix doc - correct default value for maxRowsInMemory

Thanks to all the contributors to this release!

@b-slim
@binlijin
@dclim
@drcrallen
@fjy
@gianm
@guobingkun
@himanshug
@nishantmonu51
@pjain1
@xvrl

Druid 0.8.2 - Stable

18 Nov 17:17

Updating from 0.8.1

If you are using union queries, please make sure to update broker nodes prior to updating any historical nodes, realtime nodes, or indexing service.

Otherwise, you can follow standard rolling update procedures.

New Features

  • #1744 Memcached connection pooling
  • #1753 Allow SegmentMetadataQuery to skip cardinality and size calculations
  • #1609 Experimental Kafka simple consumer based firehose
  • #1800 Experimental Hybrid L1/L2 cache

Improvements

  • #1821 cache max data timestamp in QueryableIndexStorageAdapter
  • #1765 Add CPUTimeMetricQueryRunner to ClientQuerySegmentWalker
  • #1776 Modified the Twitter firehose to process more properties
  • #1748 Allow ForkingTaskRunner javaOpts to have quoted arguments which contain spaces
  • #1759 better faster smaller roaring bitmaps
  • #1755 update druid-api for timestamp parsing speedup
  • #1756 improving msging when indexing service is not found
  • #1739 Allow SegmentAnalyzer to read columns from StorageAdapter, allow SegmentMetadataQuery to query IncrementalIndexSegments on realtime node
  • #1732 Add support for a configurable default segment history period for segmentMetadata queries and GET /datasources/ lookups
  • #1695 Allow writing InputRowParser extensions that use hadoop/any libraries
  • #1688 More memcached metrics
  • #1712 Add dimension extraction functionality to SearchQuery
  • #1696 Add CPU time to metrics for segment scanning.
  • #1718 Adds task duration to indexer console for completed tasks.
  • #1725 Don't check for sortedness if we already know GenericIndexedWriter isn't sorted
  • #1699 composing emitter module to use multiple emitters together
  • #1639 New plumber
  • #1604 Allow task to override ForkingTaskRunner tunings and jvm settings
  • #1542 add endpoint to fetch rule history for all datasources
  • #1682 Support parsing of BytesWritable strings in HadoopDruidIndexerMapper
  • #1622 Support for JSON Smile format for EventReceiverFirehoseFactory
  • #1654 Add ability to provide taskResource for IndexTask.

Bug Fixes

  • #1868 Removing parent paths causes watchers of the "announcements" path to get stuck
  • #1855 fix [GreaterThan,LessThan,Equals] HavingSpecs
  • #1862 Add timeout to shutdown request to middle manager for indexing service
  • #1822 support multiple non-consecutive intervals in outer query of nested group-by
  • #1811 Server discovery selector ipv6 friendly
  • #1823 For dataSource inputSpec in hadoop batch ingestion, use configured query granularity for reading existing segments instead of NONE
  • #1818 Add hashCode and equals to stock lookups
  • #1812 Bump server-metrics to 0.2.5 to catch a few fixes.
  • #1806 Fix index exceeded msg to give maxRowCount as well
  • #1801 Fix ClientInfoResource
  • #1795 Try and make AnnouncerTest a bit more predictable
  • #1797 ingest segment firehose ut
  • #1798 Update httpcomponents and aws-sdk
  • #1792 GroupByQueryRunnerTest for hyperUnique finalizing post aggregators
  • #1781 Fix failure in nested groupBy with multiple aggregators with same fie…
  • #1790 Cleanup kafka-extraction-namespace
  • #1782 Add analysisTypes to SegmentMetadataQuery cache key
  • #1730 fix #1727 - Union bySegment queries fix
  • #1783 Separate ListColumnIncluderator cache key parts with nul bytes
  • #1740 fix #1715 - Zombie tasks able to acquire locks after failure
  • #1778 Redirect fixes
  • #1777 fail task if finishjob throws any exception
  • #1775 SQLMetadataConnector: Retry table creation, in case something goes wrong.
  • #1772 RemoteTaskRunner: Fix for starting an overlord before any workers ever existed.
  • #1764 Enable logging for memcached in factory
  • #1760 Update memcached client for better concurrency in metrics.
  • #1761 LocalDataSegmentPusher: Fix for Hadoop + relative paths.
  • #1763 fix NPE and duplicate metric keys
  • #1758 Fix memcached cache provider injection and add test
  • #1747 Account for potential gaps in hydrants in sink initialization, hydrant swapping (e.g. h0, h1, h4)
  • #1751 Soften concurrency requirements on IncrementalIndexTest
  • #1736 IngestSegmentFirehostFactoryTimelineTest for overshadowing of the middle of a segment.
  • #1741 Add better concurrency testing to IncrementalIndexTest
  • #1743 Disable metadata publishing attempt in example script
  • #1697 Better logging of URIExtractionNamespace failures due to missing files
  • #1702 do not have dataSource twice in path to segment storage on hdfs
  • #1710 Add some basic latching to concurrency testing in IncrementalIndexTest
  • #1734 fix broken integration-test
  • #1731 fix NPE with regex extraction function
  • #1700 update indexing in the helper to use multiple persists and merge
  • #1721 fix for "java.io.IOException: No FileSystem for scheme: hdfs" error
  • #1694 Better timing and locking in NamespaceExtractionCacheManagerExecutorsTest
  • #1703 add null check for task context.
  • #1637 Make jetty scheduler threads daemon thread
  • #1658 Hopefully add better timeouts and ordering to JDBCExtractionNamespaceTest
  • #1620 Allow long values in the key or value fields for URIExtractionNamespace
  • #1578 Fix UT and documentation to the extraction filter
  • #1687 do not let user override hadoop job settings explicitly provided by druid code
  • #1689 Update LZ4Transcoder to match Compressed strategy factory type.
  • #1685 Close output streams and channels loudly when creating segments.
  • #1686 Replace funky imports with standard ones.
  • #1683 Remove unused Indexer interface.
  • #1632 Inner Query should build on sub query
  • #1676 fix convert segment task
  • #1672 Migrate TestDerbyConnector to a JUnit @rule
  • #1675 update druid-api for jackson 2.4.6
  • #1668 Code cleanup for CachingClusteredClientTest
  • #1669 Upgrade dependencies
  • #1663 TaskActionToolbox: Remove allowOlderVersions, lift interval constraint
  • #1619 update server metrics
  • [#1661](https://gi...

Druid 0.8.1 - Stable

16 Sep 05:49

Updating from 0.8.0

There should be no update concerns, and standard updating procedures can be followed for rolling updates.

New Features

  • #1259 Experimental Query Time Lookups (QTL): the ability to do limited joins at query time.
    A simple example use case is mapping country codes to country names.
  • #1374 Experimental Hadoop batch re-indexing and Delta ingestion.
    Re-indexing allows you to ingest existing Druid segments using a new schema, for example with certain columns removed or a different granularity. "Delta" ingestion allows appending data to an existing interval in a datasource. See the new dataSource inputSpec and multi inputSpec for more information.
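As a conceptual illustration of the country-code example, the sketch below maps dimension values through a lookup table at query time and then groups on the mapped values. This is a toy model in plain Python, not Druid's QTL implementation. The helper name and its retain_missing/replacement parameters are hypothetical choices for how unmatched keys might be handled.

```python
from collections import Counter


def lookup(values, table, retain_missing=True, replacement=None):
    """Map each dimension value through a lookup table.

    Unmatched values are kept as-is when retain_missing is True,
    otherwise they are replaced with `replacement`.
    """
    out = []
    for value in values:
        if value in table:
            out.append(table[value])
        elif retain_missing:
            out.append(value)
        else:
            out.append(replacement)
    return out


countries = {"US": "United States", "FR": "France"}
events = ["US", "FR", "US", "XX"]
# Group by the looked-up name instead of the raw code, like a limited join.
print(Counter(lookup(events, countries)))
```

The raw data keeps storing compact codes, and the mapping can change without re-indexing, which is the main appeal of doing the join at query time.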

Improvements

  • #1465 Read Hadoop configuration file from HDFS
  • #1472 Support using combiner for Hadoop ingestion
  • #1506 Better support for null input rows during ingestion
  • #1518 More support added for Azure deep store
  • #1550 Add configuration option to print all HTTP requests to log
  • #1563 #1602 Improved merging performance on Broker
  • #1567 #1568 Improved error logging for segment activities
  • #1596 Improved coordinator console, now a separate Maven dependency instead of a giant code dump
  • #1601 Reduced lock contention during segment scan
  • #1603 Improved performance of Lexicographic TopNs
  • #1643 helpful cause explaining why SegmentDescriptorInfo did not exist

Improved test coverage for indexing service, ingestion, and coordinator endpoints

Bug Fixes

  • #1406 Fix groupBy breaking when exceeding max intermediate rows
  • #1441 Fix flush errors being suppressed when closing output streams
  • #1469 Fix inconsistent property names for druid.metadata.* properties
  • #1484 JobHelper.ensurePaths will set properties from config properly
  • #1499 Fix groupBy caching with renamed aggregators
  • #1503 Fix leaking indexing service status nodes in ZK
  • #1534 Fix caching for approximate histograms
  • #1616 Fix dependency error in local index task
  • #1627 Fix realtime tasks getting stuck on shutdown even after status being shown as SUCCESS
  • #1634 Allow IrcFirehoseFactory to shutdown cleanly
  • #1640 Package extensions in release tarball + script to run druid servers
  • #1653 Fix success flag emitted in router query metrics
  • #1659 on kill segment, don't leave version, interval and dataSource dir behind on HDFS
  • #1681 Fix overlapping segments not working for ingest segment firehose

Documentation

  • New documentation for firehoses, evaluating Druid, and plenty of fixes.
  • Improved documentation for working with CDH
  • Added instructions for PostgreSQL metadata store
  • More documentation on how to use ApproximateHistograms

The full list of changes can be found here

Thanks

Special thanks to everyone that contributed (code, docs, etc.) to this release!

@drcrallen
@davideanastasia
@guobingkun
@himanshug
@michaelschiff
@fjy
@krismolendyke
@nishantmonu51
@rasahner
@xvrl
@gianm
@pjain1
@samjhecht
@solimant
@sherry-q
@ubercow
@zhaown
@mvfast
@mistercrunch
@pdeva
@KurtYoung
@onlychoice
@b-slim
@cheddar
@MarConSchneid