← Back to all posts

PipelineDB 0.8.3


PipelineDB 0.8.3 is here, download it now! Some of the highlights of this release are:

Adhoc Continuous Queries

Support for adhoc continuous queries is in this release. We have added a new ncurses based command line client called padhoc that allows you to perform adhoc queries on streams, and see results update in real time.

This is a useful aid for debugging, or just experimenting with queries in general. Here is a demo of it in action:

padhoc -c 'select x::text, count(*) from stream group by x'

padhoc

Note, this will create a temporary adhoc query that will last only until you shut down the client with Ctrl+C.

Adhoc queries do not currently support sliding windows.

The padhoc client takes most of the same connection options as psql, use padhoc --help to see them. The following keys are available to drive the ncurses GUI, and page through the result set:

UP         Scroll up
DOWN       Scroll down
LEFT       Scroll left
RIGHT      Scroll right
PGDN       Scroll page down
PGUP       Scroll page up
HOME       Scroll left to first column
END        Scroll right to last column
TAB        Scroll right one column
SHIFT+TAB  Scroll left one column
p          Pause updating

Since adhoc continuous queries are a relatively new feature, they're disabled by default. To turn them on set continuous_queries_adhoc_enabled=on in your configuration file.

Variable-length Type Support in Top-K

We added top-K support in the previous release but it only supported fixed-length types (int, char, etc). With this release variable length types (like text) are also supported. We've also changed the interface of the the datastructure a bit. fss_topk(fss) is now a set returning function which returns pairs of (value, frequency) items. We've also added fss_topk_values(fss) and fss_topk_freqs(fss) which return arrays of top-K values and top-K frequencies respectively.

CREATE CONTINUOUS VIEW v AS SELECT fss_agg(key::text, 3) FROM stream;
INSERT INTO stream (key) VALUES ('a'), ('b'), ('c');
INSERT INTO stream (key) VALUES ('a'), ('a'), ('b');
INSERT INTO stream (key) VALUES ('a'), ('a'), ('a');
INSERT INTO stream (key) VALUES ('b'), ('c'), ('d');
INSERT INTO stream (key) VALUES ('x'), ('x'), ('x');

SELECT fss_topk(fss_agg) FROM v;
 fss_topk
----------
 (a,6)
 (b,3)
 (x,3)
(3 rows)

SELECT fss_topk_values(fss_agg), fss_topk_freqs(fss_agg) FROM v;
 fss_topk_values | fss_topk_freqs
-----------------+----------------
 {a,b,x}         | {6,3,3}
(1 row)

Keyed Min/Max

Some users requested support for first and last aggregate functions which are available as an extension here. These aggregates return the first or last value of the group as if there was an ORDER BY key clause inside the first and last aggregate. Since ORDER BY doesn't make any sense within streams, we added keyed_max and keyed_min which let you calculate the same aggregate. The following two queries are semantically equivalent (if all the data inserted into stream were to be present in table).

CREATE CONTINUOUS VIEW v AS SELECT z::int, keyed_max(x::int, y::text), keyed_min(x::int, y::text) FROM stream GROUP BY z;

CREATE TABLE table (x int, y text, z int);
SELECT z::int, last(y::text ORDER BY x::int), first(y::text ORDER BY x::int) FROM table GROUP BY z;

Example usage:

CREATE CONTINUOUS VIEW v AS SELECT z::int, keyed_max(x::int, y::text), keyed_min(x::int, y::text) FROM stream GROUP BY z;
INSERT INTO stream (x, y, z) VALUES (1, 2, 1);
INSERT INTO stream (x, y, z) VALUES (10, 12, 1);
INSERT INTO stream (x, y, z) VALUES (1, 3, 1);
INSERT INTO stream (x, y, z) VALUES (10, 15, 1);
INSERT INTO stream (x, y, z) VALUES (10, 5, 2);
INSERT INTO stream (x, y, z) VALUES (10, 7, 2);
INSERT INTO stream (x, y, z) VALUES (3, 'a', 2);
INSERT INTO stream (x, y, z) VALUES (3, 'abc', 2);
SELECT * FROM v;
 z | keyed_max | keyed_min 
---+-----------+-----------
 1 | 12        | 2
 2 | 5         | a
(2 rows)

Dynamic Sliding Window Step-size

Sliding windows have probably been the most popular feature of PipelineDB. So we've been making usability and performance improvements to them with every release. In this release, we've made it simpler to control the fan-out factor for sliding window queries. Instead of using truncation functions like date_trunc, second etc, you can can now specify a storage parameter called step_factor when creating a sliding window continuous view. This parameter controls the step size of the sliding window query as a percentage of the window size. For example, if you want the step size to be 10%, you could create a view like:

CREATE CONTINUOUS VIEW v WITH (max_age = '15 minute`, step_factor = 10) AS SELECT COUNT(*) FROM stream

What this means is that underneath the hood, PipelineDB will create smaller buckets on the time-axis of width 90 seconds. Being able to tune the step-factor this way simplifies understanding what the storage costs are (in this case, it will be ~10x). The default step factor is 5 and can be changed by setting the sliding_window_step_factor configuration parameter.

Performance & Stability Improvements

Previously, the combine workload for a single continuous view was performed by a dedicated combiner process which was turning out to be a bottleneck. It also meant that you'd need at least N continuous views to make use of N combiner processes. Now, each continuous view's combine workload is spread across all combiner processes which greatly improves throughput on multi-core machines.

Some minor bug fixes:
- We completely reworked the way stream insertions work which fixed a bunch of weird bugs we were seeing with prepared stream insertions.
- pg_dump and pg_restore now work with array aggregates.
- System shutdown under heavy load could tend to be unresponsive--this behavior should be fixed now.
- Fixed a deadlock in combiners when running on many cores.