
PipelineDB 0.9.6


PipelineDB 0.9.6 is here, download it now!

PipelineDB 0.9.6 is primarily a maintenance release, but also includes one important new feature that PipelineDB users have wanted for a while: proper per-row time-to-live (TTL) support for continuous views.

TTL

A very common PipelineDB pattern is continuous aggregation with a timestamp-based column in the aggregation's grouping. For example:

CREATE CONTINUOUS VIEW v AS
  SELECT minute(arrival_timestamp), COUNT(*) FROM some_stream GROUP BY minute;

In many cases it is neither necessary nor desirable to store such data indefinitely, especially since the number of rows grows without bound over time.

It is of course perfectly reasonable to DELETE unneeded rows in some kind of background job. However, this TTL pattern is so common amongst PipelineDB users that we decided to add first-class support for it.

TTL expiration behavior can now be assigned to continuous views via the ttl and ttl_column storage parameters. The autovacuumer will DELETE any rows having a ttl_column value that is older than the interval specified by ttl (relative to wall time).

Here's a version of the previous example that will tell the autovacuumer to delete any rows whose minute column is older than one month:

CREATE CONTINUOUS VIEW v_ttl WITH (ttl = '1 month', ttl_column = 'minute') AS
  SELECT minute(arrival_timestamp), COUNT(*) FROM some_stream GROUP BY minute;

Note that TTL behavior is a hint to the autovacuumer, and thus will not guarantee that rows will be physically deleted exactly when they are expired. If you'd like to guarantee that no TTL-expired rows will be read, you should create a view over the continuous view with a WHERE clause that excludes expired rows at read time.
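
For example, a read-time view over the v_ttl continuous view above might mirror its one-month TTL like this (a minimal sketch; the view name and the exact cutoff expression are just illustrative):

CREATE VIEW v_ttl_fresh AS
  SELECT * FROM v_ttl
  WHERE minute > now() - interval '1 month';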

Many users have historically achieved this automatic cleanup behavior by using sliding-window CVs. But with TTL support now available, it is no longer necessary to incur the performance overhead of sliding windows if all you need is automatic expiration.

This is certainly not to say that sliding-window CVs aren't still very useful. However, if you're grouping on a timestamp-based column in conjunction with a sliding window, it is possible that what you really want is simply TTL expiration.

max_age -> sw

In previous versions, the width of a sliding window was specified as an interval via the max_age storage parameter. With the addition of TTL support, we felt that more distinctive terminology was necessary, so we renamed max_age to sw (sliding window). Our apologies for the breaking change to such a common parameter, but we'd rather get these details right as we approach PipelineDB 1.0.0.
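
For example, a sliding-window continuous view that would previously have been declared with max_age now uses sw instead (the view name and the one-hour window here are purely illustrative):

CREATE CONTINUOUS VIEW recent_counts WITH (sw = '1 hour') AS
  SELECT COUNT(*) FROM some_stream;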

Output Streams for Transforms

PipelineDB 0.9.5 introduced support for output streams for continuous views. PipelineDB 0.9.6 adds output streams to continuous transforms, which simplifies how they can be used in many cases. Previously, a transform that reads from a stream and writes to another stream had to use a THEN EXECUTE PROCEDURE clause invoking pipeline_stream_insert:

CREATE CONTINUOUS TRANSFORM xform AS SELECT x, y, z FROM some_stream
  THEN EXECUTE PROCEDURE pipeline_stream_insert('another_stream');

CREATE CONTINUOUS VIEW v AS SELECT sum(x) FROM another_stream;

Now, we can just omit the THEN EXECUTE PROCEDURE clause and read directly from the transform's output stream:

CREATE CONTINUOUS TRANSFORM xform AS SELECT x, y, z FROM some_stream;

CREATE CONTINUOUS VIEW v AS SELECT sum(x) FROM output_of('xform');

Much better!

Bug Fixes and Perf/Stability Improvements

PipelineDB 0.9.6 is a maintenance release and resolves various minor odds and ends. Here are some of the more noteworthy issues we fixed:

  • Fixed potential deadlock between combiner process and DROP CONTINUOUS VIEW statement
  • Fixed potential buffer overrun when writing to a sliding-window output stream
  • Made pipeline_stream_insert stricter about the schema of rows it accepts
  • Added optimization to combiner process that avoids sliding-window output stream overhead when the stream has no readers
  • Made planner optimization that reduces memory usage for queries against large continuous views
  • Ripped out a lot of dead code

Thanks!

As always, a big thanks to our growing community of PipelineDB users! You are all awesome, and we really enjoy interacting with you and getting your indispensable feedback, which continues to make PipelineDB better for everyone. Please keep pushing the limits and let us know how we can make PipelineDB work harder for you!

The next few releases will complete the last leg of the road to PipelineDB 1.0.0, so stay tuned for some exciting stuff coming up in the very near future :)

Now go download PipelineDB!