← Back to all posts

High-Performance Time-Series Aggregation for PostgreSQL 11


TL;DR

  • PipelineDB 1.0 is a PostgreSQL extension for high-performance time-series aggregation via continuous SQL queries.
  • PipelineDB 1.0 is now available for PostgreSQL 11 and Docker. Visit the getting started docs or download a release package to get started.
  • PipelineDB enables realtime reporting use cases at scale, where only summary data is needed
  • The next release of PipelineDB will include partitioning automation

Overview

PipelineDB extends PostgreSQL by allowing you to define continuous views over streams of time-series data. The unique thing about continuous views is that they only store the aggregate output of their query definitions. Individual time-series events are discarded from memory as soon as all continuous views that must read them have done so. Eliminating the necessity of writing all events to disk before running queries over them enables PipelineDB to deliver extremely high performance for time-series aggregation workloads.

And because PipelineDB only stores compact aggregate data on disk, read query latencies are kept to a minimum. Consider a canoncial example:

-- Input stream for raw time-series events
CREATE FOREIGN TABLE stream (ts timestamp, x integer) SERVER pipelinedb;

CREATE VIEW hourly_counts WITH (action=materialize) AS
  SELECT hour(ts), count(*) FROM stream
GROUP BY hour;

-- Time-series events are written to streams via standard INSERT/COPY statements
INSERT INTO stream (ts, x) SELECT now(), x FROM generate_series(1, 1000) x;

The continuous view, hourly_counts, will only ever store a single row for each new hour of timestamps in the input events. The count column is simply updated incrementally in real time as new events are written to the system. PipelineDB extends this simple concept to virtually any query you can express in SQL.

Continuous queries also produce output streams of incremental updates that are made to them, and thus can be chained together into arbitrary topologies of continuous SQL. And since continuous views are stored as regular PostgreSQL relations, their results can be retrieved and analyzed using the full expressiveness of SQL.

Ultimately this empowers users to build high-performance reporting and analytics applications at large scales with relative ease.

PostgreSQL 11

Two weeks ago we released PipelineDB 1.0.0 as a standard PostgreSQL extension with support for PostgreSQL 10. Adoption of PipelineDB 1.0.0 is growing much faster than any previous release of PipelineDB, and there has been significant demand for PostgreSQL 11 support. And today we're very happy to announce that the latest release of PipelineDB now supports PostgreSQL 11.

Since PostgreSQL 10 is not binary compatible with PostgreSQL 11, PipelineDB has separate packages for each major PostgreSQL version. Head over to the installation docs to get started.

Docker

We've also published official PipelineDB Docker images for both PostgreSQL 10 and 11, so you can now easily run PipelineDB on any platform that supports Docker:

$ docker run pipelinedb/pipelinedb-postgresql-11

See the Docker docs to get started with PipelineDB on Docker.

What's Next

Firstly, we want to thank our amazing community of users who have provided indispensable feedback for the PipelineDB 1.0.0 extension release. Thanks to them, we've made a couple of small release updates to 1.0.0 that resolve some relatively minor platform-specific issues that were identified. Other than that, the feedback we've received for the 1.0.0 release has been overwhelmingly positive, and we couldn't be happier about that. Thank you PipelineDB users, we are genuinely grateful to have you :)

From here, we're pushing to get the next release of PipelineDB out as soon as possible. We're intentionally keeping the next release as light as possible, because it will contain one new major feature that users have wanted for PipelineDB for quite a while: partitioning automation.

Since most continuous views include some kind of time-based column in the aggregation grouping (as in the example above), partitioned continuous views will allow users to keep their continuous views physically partitioned as they grow. The interface will look something like this:

CREATE VIEW partitioned_cv WITH (partition_by = 'hour', partition_duration = '1 day') AS
  SELECT hour(ts), count(*) FROM stream
GROUP BY hour;

This would give you one partition for each day of hour(ts). When a new partition is needed, PipelineDB will automatically create it for you with no service interruptions. All that's needed from users is the initial WITH options.

So if you haven't already, please go give PipelineDB a try and let us know what you think!