PipelineDB 1.0 is a PostgreSQL extension for high-performance time-series aggregation via continuous SQL queries.

PipelineDB 1.0 is now available for PostgreSQL 11 and Docker. Visit the getting started docs or download a release package to get started.
- PipelineDB enables real-time reporting use cases at scale, where only summary data is needed
- The next release of PipelineDB will include partitioning automation
PipelineDB extends PostgreSQL by allowing you to define continuous views over streams of time-series data. The unique thing about continuous views is that they only store the aggregate output of their query definitions. Individual time-series events are discarded from memory as soon as all continuous views that must read them have done so. Eliminating the necessity of writing all events to disk before running queries over them enables PipelineDB to deliver extremely high performance for time-series aggregation workloads.
And because PipelineDB only stores compact aggregate data on disk, read query latencies are kept to a minimum. Consider a canonical example:
-- Input stream for raw time-series events
CREATE FOREIGN TABLE stream (ts timestamp, x integer) SERVER pipelinedb;

-- Continuous view that incrementally aggregates events by hour
CREATE VIEW hourly_counts WITH (action=materialize) AS
  SELECT hour(ts), count(*) FROM stream GROUP BY hour;

-- Time-series events are written to streams via standard INSERT/COPY statements
INSERT INTO stream (ts, x) SELECT now(), x FROM generate_series(1, 1000) x;
The continuous view, hourly_counts, will only ever store a single row for each new hour of timestamps in the input events. The count column is simply updated incrementally in real time as new events are written to the system. PipelineDB extends this simple concept to virtually any query you can express in SQL.
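The incremental-update behavior can be pictured outside the database. Below is a minimal Python sketch of the idea (an illustration, not PipelineDB's implementation): each raw event updates a per-hour count and is then discarded, so only the aggregate state survives:

```python
from collections import Counter
from datetime import datetime

# Aggregate state: one entry per hour, analogous to a row in hourly_counts.
hourly_counts = Counter()

def write_event(ts: datetime) -> None:
    """Incrementally update the aggregate, then discard the raw event."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    hourly_counts[hour] += 1  # the count column is updated in place

# Simulate 1000 events arriving within the same hour.
for i in range(1000):
    write_event(datetime(2018, 11, 2, 14, i % 60))

# Only a single row of aggregate state exists for that hour.
print(hourly_counts[datetime(2018, 11, 2, 14)])  # 1000
```

The raw events are never retained, which is what keeps both write throughput high and on-disk state compact.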
Continuous queries also produce output streams of incremental updates that are made to them, and thus can be chained together into arbitrary topologies of continuous SQL. And since continuous views are stored as regular PostgreSQL relations, their results can be retrieved and analyzed using the full expressiveness of SQL.
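Chaining can be sketched the same way: an upstream aggregate emits incremental deltas on its output stream, and a downstream aggregate consumes those deltas rather than the raw events. A hypothetical Python illustration (not PipelineDB's actual output-stream API):

```python
from collections import Counter
from datetime import datetime, date

hourly = Counter()  # upstream continuous view, grouped by hour
daily = Counter()   # downstream view, chained off hourly's output stream

def on_hourly_delta(hour: datetime, delta: int) -> None:
    """Downstream consumer: sees only (group, delta) updates, never raw events."""
    daily[hour.date()] += delta

def write_event(ts: datetime) -> None:
    hour = ts.replace(minute=0, second=0, microsecond=0)
    hourly[hour] += 1
    on_hourly_delta(hour, 1)  # emit the incremental update downstream

for h in (9, 9, 17):
    write_event(datetime(2018, 11, 2, h, 30))

print(daily[date(2018, 11, 2)])  # 3
```

Each stage only ever processes compact incremental updates, which is why topologies of chained continuous views stay cheap.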
Ultimately this empowers users to build high-performance reporting and analytics applications at large scales with relative ease.
Two weeks ago we released PipelineDB 1.0.0 as a standard PostgreSQL extension with support for PostgreSQL 10. Adoption of PipelineDB 1.0.0 is growing much faster than any previous release of PipelineDB, and there has been significant demand for PostgreSQL 11 support. Today we're very happy to announce that the latest release of PipelineDB now supports PostgreSQL 11. Since PostgreSQL 10 is not binary compatible with PostgreSQL 11, PipelineDB has separate packages for each major PostgreSQL version. Head over to the installation docs to get started.
We've also published official PipelineDB Docker images for both PostgreSQL 10 and PostgreSQL 11, so you can now easily run PipelineDB on any platform that supports Docker:
$ docker run pipelinedb/pipelinedb-postgresql-11
See the Docker docs to get started with PipelineDB on Docker.
Firstly, we want to thank our amazing community of users who have provided indispensable feedback for the PipelineDB 1.0.0 extension release. Thanks to them, we've made a couple of small release updates to 1.0.0 that resolve some relatively minor platform-specific issues that were identified. Other than that, the feedback we've received for the 1.0.0 release has been overwhelmingly positive, and we couldn't be happier about that. Thank you PipelineDB users, we are genuinely grateful to have you :)
From here, we're pushing to get the next release of PipelineDB out as soon as possible. We're intentionally keeping the next release as light as possible, because it will contain one new major feature that users have wanted for PipelineDB for quite a while: partitioning automation.
Since most continuous views include some kind of time-based column in the aggregation grouping (as in the example above), partitioned continuous views will allow users to keep their continuous views physically partitioned as they grow. The interface will look something like this:
CREATE VIEW partitioned_cv WITH (partition_by = 'hour', partition_duration = '1 day') AS
  SELECT hour(ts), count(*) FROM stream GROUP BY hour;
This would give you one partition for each day of hour(ts). When a new partition is needed, PipelineDB will automatically create it for you with no service interruptions. All that's needed from users is the initial CREATE VIEW definition.
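To make the partitioning scheme concrete, here is a small Python sketch (an illustration of the bucketing idea, not PipelineDB internals) of how a partition_duration of '1 day' maps each hour(ts) group to a partition, creating partitions lazily as new days appear:

```python
from datetime import datetime, date

partitions = {}  # one physical partition per calendar day

def route(hour_bucket: datetime) -> date:
    """Return the partition key for an hour(ts) group, creating it if needed."""
    key = hour_bucket.date()  # with '1 day', the partition is the calendar day
    if key not in partitions:
        partitions[key] = []  # PipelineDB would create this partition automatically
    return key

route(datetime(2018, 11, 2, 9))
route(datetime(2018, 11, 2, 17))  # same day -> same partition
route(datetime(2018, 11, 3, 1))   # new day -> new partition created

print(len(partitions))  # 2
```

The point is that partition creation is driven entirely by the incoming data, so no manual maintenance is required as a continuous view grows.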