
PipelineDB 0.9.9 - One More Release Until PipelineDB is a PostgreSQL Extension


PipelineDB 0.9.9 has been released! Download it here.

Overview

PipelineDB 0.9.9 primarily includes major progress towards becoming a standard PostgreSQL extension. The next release, 1.0.0, will be an extension.

We have admittedly been somewhat behind schedule in our efforts to make PipelineDB an extension, largely because we have invested significant engineering resources into Stride, our fully hosted analytics infrastructure product powered by PipelineDB. While costly in terms of the extension timeline, this engineering investment has made us profitable by a comfortable margin, which will fund the ongoing and indefinite development of PipelineDB.

We understand more than anyone how much everyone is looking forward to PipelineDB becoming an extension, and we want all of our Stride infrastructure to be using PipelineDB 1.0.0 as soon as possible too. So it's happening, and there's just one more release to go.

Let's look at what went into the 0.9.9 release...

Standardization

All major PipelineDB objects (streams, continuous transforms, and continuous views), as well as their supporting infrastructure, have been standardized into regular PostgreSQL objects to facilitate the final steps towards becoming an extension. All PipelineDB-specific syntax is still allowed for convenience, but PipelineDB DDL statements are now internally rewritten to standard PostgreSQL DDL statements. As a result, PipelineDB objects can now also be created directly using standard PostgreSQL syntax.

Here's how each PipelineDB object is now represented in a standardized way:

Streams

Streams are now simply foreign tables. CREATE STREAM <stream> is now internally rewritten to:

CREATE FOREIGN TABLE s (x integer, ...) SERVER pipelinedb;
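Because a stream is now an ordinary foreign table, writing to it is a plain INSERT. The following is a minimal sketch, assuming a hypothetical stream named clicks with illustrative columns; only the SERVER pipelinedb clause comes from the rewrite shown above.

```sql
-- Hypothetical stream; the name and columns are illustrative only.
CREATE FOREIGN TABLE clicks (user_id integer, url text) SERVER pipelinedb;

-- Events are written to the stream with ordinary INSERT statements.
INSERT INTO clicks (user_id, url) VALUES (1, '/home'), (2, '/pricing');
```

Rows written to a stream are not stored in the stream itself; they are only consumed by whatever continuous views and transforms read from it.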

Continuous Views

Continuous views are now represented by regular PostgreSQL views with some associated metadata attached to them. The most important piece of metadata is the action, which for continuous views is now set to materialize:

CREATE VIEW continuous_view WITH (action=materialize, ...) AS SELECT count(*) FROM s;

All regular continuous view options (such as sw, ttl, etc.) are just passed in addition to the action setting.
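As a sketch of how those options combine with the action setting, the statements below define a sliding-window view and a TTL view over the stream s. The view names and interval values are hypothetical, and the quoting of the option values is an assumption carried over from the PipelineDB-specific syntax.

```sql
-- Sliding-window view (hypothetical name): counts only events from the last hour.
CREATE VIEW count_last_hour WITH (action=materialize, sw='1 hour') AS
  SELECT count(*) FROM s;

-- TTL view (hypothetical name): rows become eligible for expiration once
-- their "minute" value is more than one day old.
CREATE VIEW count_by_minute WITH (action=materialize, ttl='1 day', ttl_column='minute') AS
  SELECT minute(arrival_timestamp) AS minute, count(*) FROM s GROUP BY minute;
```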

Continuous Transforms

Continuous transforms are also represented by regular PostgreSQL views, with the action being set to transform:

CREATE VIEW continuous_transform WITH (action=transform) AS SELECT x FROM s;

-- Output functions can be specified via the outputfunc option:
CREATE VIEW continuous_transform WITH (action=transform, outputfunc=pipeline_stream_insert('some_stream')) AS SELECT x FROM s;

Note that the command tag (the response from the server after executing a DDL statement) will now reflect the standard PostgreSQL objects that have been created. So if you're using PipelineDB syntax to create objects, you'll see the command tag for the standard objects that were actually created. For example:

pipeline=# CREATE STREAM s (x integer);
CREATE FOREIGN TABLE
pipeline=#
pipeline=# CREATE CONTINUOUS VIEW v AS SELECT count(*) FROM s;
CREATE VIEW
pipeline=#
pipeline=# CREATE CONTINUOUS TRANSFORM xform AS SELECT x FROM s;
CREATE VIEW

Also note that, as previously mentioned, PipelineDB-specific syntax is still supported, so you don't need to change any of your existing DDL statements for PipelineDB 0.9.9. However, with the 1.0.0 extension release you'll be using this new standardized DDL interface.

Reaper Improvements

This release also contains some important performance improvements to the reaper process. The need for these improvements was discovered by running the large PipelineDB deployments that power Stride, as we now receive very detailed and illuminating performance metrics from all of the databases we run in production. PipelineDB 0.9.9 adds the following reaper improvements:

  • Memory usage has been reduced significantly by running TTL expiration queries in smaller, more granular transactions
  • Reapers will always prefer an index scan when looking for TTL-expired rows if an index exists on the TTL column

Upgrading

PipelineDB 0.9.9 includes some minor catalog changes, so you'll want to either import your existing data into a 0.9.9 database, or use the binary upgrade tool against your existing data directory, which is pretty easy:

$ pipeline-upgrade -b old_version/bin -d old_data_dir -B new_version/bin -D new_data_dir

Odds and Ends

  • Improved Docker image to restart PipelineDB when a configuration file is given (Thanks jcrsilva!)
  • Fixed extraneous WARNING logged for stream-table JOIN plan executions that produce no rows (#1906)
  • Schema dumps now properly include TTL information (#1899)
  • Fixed bug with dumping transforms with no output function (#1877)
  • Fixed error thrown when running read-only queries in hot standby mode (#1867)
  • Fixed bug causing output stream tuples to be misaligned with output stream schema
  • Fixed in-memory FSS representation bug
  • Fixed in-memory Bloom filter representation bug (#1895)
  • Fixed crash with sliding-window output streams (#1911)

Remaining Work and Timeline

PipelineDB 0.9.9 completes a substantial amount of the internal rework and standardization required for PipelineDB to become an extension. The hardest parts of the extension refactor are now finished. The remainder of the extension work involves actually packaging the PipelineDB codebase as an extension, as well as some final query planning/analysis rework.

We are aiming to ship PipelineDB 1.0.0 by the end of the quarter, so before July 1, 2018. We want to be using PipelineDB as an extension for all of our Stride databases, so it is a top priority for us at this point and is quickly becoming a reality.

You'll definitely want to start using PipelineDB 0.9.9 as soon as possible, as it will be by far the easiest way to seamlessly upgrade to PipelineDB 1.0.0 when it becomes available. Please don't hesitate to provide us with your always thoughtful and helpful feedback.

One more release to go!