PipelineDB 0.8.4 is here, download it now! Some of the highlights of this release are:
Multi-core Scalability Improvements
Previously, all worker processes used a single shared memory queue for IPC, which caused a lot of lock contention on machines with a high number of cores. We reworked the entire IPC infrastructure to use per-process queues so that they're lock-free for consumers. We also now write to these queues in batches rather than one tuple at a time, which greatly reduces lock contention for producers. In our load tests, we've seen a 4x improvement in throughput when running on a 32-core machine. Multi-core performance is something we'll continue to focus on over the next few releases. Please let us know if you face any scalability issues!
As part of this change, the tuple_buffer_blocks configuration parameter has been removed. Instead, you should use continuous_query_ipc_shared_mem to set the size of the shared memory segment per process. The default is 32 MB.
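To make the effect of batching concrete, here is a minimal Python sketch. It is not PipelineDB's implementation (which lives in shared memory between processes); the `WorkerQueue` class, the `send` helper, and the batch sizes are all hypothetical names chosen for illustration. It only counts how often a producer has to take the queue's lock when writing one tuple at a time versus in batches.

```python
import threading
from collections import deque


class WorkerQueue:
    """Hypothetical per-worker queue: many producers, exactly one consumer."""

    def __init__(self):
        self._items = deque()
        self._producer_lock = threading.Lock()  # only producers contend on this
        self.lock_acquisitions = 0

    def put_batch(self, tuples):
        # One lock acquisition covers the whole batch.
        with self._producer_lock:
            self.lock_acquisitions += 1
            self._items.extend(tuples)

    def drain(self):
        # The single consumer takes no lock in this sketch.
        out = []
        while self._items:
            out.append(self._items.popleft())
        return out


def send(queue, tuples, batch_size):
    """Write tuples to the queue in chunks of batch_size."""
    for start in range(0, len(tuples), batch_size):
        queue.put_batch(tuples[start:start + batch_size])


tuples = list(range(1000))

one_at_a_time = WorkerQueue()
send(one_at_a_time, tuples, batch_size=1)

batched = WorkerQueue()
send(batched, tuples, batch_size=100)

print(one_at_a_time.lock_acquisitions, batched.lock_acquisitions)  # 1000 10
```

Both queues end up holding the same 1000 tuples, but the batched producer takes the lock 10 times instead of 1000. Fewer acquisitions means fewer chances for producers to block each other, which is the effect described above.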
Exact Distinct Counting
By default, PipelineDB uses HyperLogLog to estimate the number of distinct values when using count(DISTINCT ...) so that space requirements are constant. For some users this behavior was not ideal, and they requested the ability to get exact counts. In this release we've added a set_agg aggregate function which accumulates values into a set. The cardinality of the set can then be used to determine the number of unique values seen.
CREATE CONTINUOUS VIEW v AS SELECT set_agg(x::int) FROM stream;
INSERT INTO stream (x) VALUES (1), (2), (3);
INSERT INTO stream (x) VALUES (1), (2), (3);
INSERT INTO stream (x) VALUES (3), (4), (5);
SELECT set_cardinality(set_agg) FROM v;
 set_cardinality
-----------------
               5
(1 row)
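The semantics of the example above can be modeled in a few lines of Python. This is only an illustrative sketch of the idea, not how set_agg is implemented internally; the `SetAgg` class and its method names are hypothetical.

```python
class SetAgg:
    """Hypothetical model of set_agg: accumulate every value into a set."""

    def __init__(self):
        self.values = set()

    def accumulate(self, batch):
        # Duplicates are absorbed by the set, so only distinct values remain.
        self.values.update(batch)

    def cardinality(self):
        return len(self.values)


view = SetAgg()
for batch in ([1, 2, 3], [1, 2, 3], [3, 4, 5]):  # the three INSERTs above
    view.accumulate(batch)

print(view.cardinality())  # 5
```

The trade-off is space: the set grows with the number of distinct values seen, whereas a HyperLogLog stays a constant size and returns an estimate.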
You can also use the
exact_count_distinct(...) aggregate function which is an alias for
Write I/O Load Improvement
Disk I/O has been another bottleneck that some of our heavy users are facing. In many cases, the values of certain fields in a view might not change even though new data is being inserted. For instance:
CREATE CONTINUOUS VIEW v AS SELECT bloom_agg(x::int) FROM stream;
INSERT INTO stream (x) VALUES (1);
INSERT INTO stream (x) VALUES (1);
INSERT INTO stream (x) VALUES (1);
INSERT INTO stream (x) VALUES (1);
Only the first insert here changes the value of the Bloom filter from NULL to a filter with 1 in it. The following INSERTs don't change the Bloom filter at all. We now detect such cases and only overwrite the old value if it has changed.
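The check can be sketched in Python as a compare-before-write step. This is a toy model under stated assumptions, not PipelineDB's storage code: the `BloomFilter` and `ViewStore` classes, the 64-bit filter size, and the SHA-256 based hashing are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a single integer bitmask."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def add(self, value):
        # Set one bit per hash function; re-adding a value sets the same bits.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            position = int.from_bytes(digest[:8], "big") % self.num_bits
            self.bits |= 1 << position

    def to_bytes(self):
        return self.bits.to_bytes(self.num_bits // 8, "big")


class ViewStore:
    """Hypothetical stand-in for the view's on-disk row."""

    def __init__(self):
        self.stored = None  # None plays the role of NULL
        self.writes = 0

    def save(self, new_value):
        # Skip the write entirely when the serialized value is unchanged.
        if new_value == self.stored:
            return False
        self.stored = new_value
        self.writes += 1
        return True


store = ViewStore()
bloom = BloomFilter()
results = []
for x in (1, 1, 1, 1):  # the four INSERTs above
    bloom.add(x)
    results.append(store.save(bloom.to_bytes()))

print(results, store.writes)  # [True, False, False, False] 1
```

Four inserts arrive, but only the first one produces a write, matching the behavior described for the bloom_agg example.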