
Announcing Stride - A Realtime Analytics API

We released PipelineDB just over a year ago and have seen consistently strong, increasing adoption during that period. We've been very fortunate to have an awesome and growing community of nice, thoughtful, smart (and patient) users who have helped us learn what organizations on the bleeding edge of data processing are doing, and thus what the future looks like. In our perpetual effort to build that future, it is with tremendous excitement that we're announcing the upcoming developer preview of our newest product: Stride.

Stride is a realtime analytics API built for scale. It leverages PipelineDB extensively as its core infrastructure, but because Stride is an API, we manage 100% of that infrastructure for you. You get a dead-simple API that just works, and a web-based interface for visualization. Much more than a silo to blindly dump data into and query later, Stride enables users to define networks of continuous processes that perform high-throughput aggregations, sliding-window computations, and joins, fire webhooks, run massive retroactive batch queries, and more. All of this can be visualized in the Stride web interface or your own frontends, and your clients can even subscribe to realtime streams of changes over HTTP.

Ultimately, Stride aims to deliver value to an enormous underserved market of companies that need more flexibility and power than off-the-shelf analytics products provide, but don't want to operate complex infrastructure in production. As a fully managed API, Stride is simpler than a database, but richly programmable and therefore very powerful.

The Stride website and docs are the best place to look for more technical information, but we also want to tell you about how we got here. We've learned a couple of important lessons from our experience building and supporting PipelineDB, so important in fact that they compelled us to invest significant resources into building Stride on top of it.

Observation #1: Static silos with on-demand querying are not the future of analytics

It takes more physical energy to consume a piece of data (think datastore) than it does to produce it (think web server logs), which is the fundamental reason why there is much more sophistication at the consumption layer. What this means is that storing data in the same form it is produced is almost always extremely expensive and at a minimum inefficient, particularly if you're not accessing it at the same frequency it's produced.

We're seeing a strong, growing trend of sophisticated organizations that understand this and have begun to leverage PipelineDB's continuous processing model to make their data infrastructure faster, lighter, cheaper, and ultimately more powerful. Continuously run queries, store only their incremental output, and archive everything else because it's going to stay cold. It's rarely worth paying for millisecond-level access to an event that happened a month ago. Seconds, minutes, or even hours are fine if the payoff is that it's essentially free to keep around.
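To make the idea concrete, here is a minimal Python sketch of continuous processing. It is not PipelineDB or Stride code: `HourlyCounter` and its `archive` list are hypothetical names, and the timestamp format is assumed to match the examples later in this post. Each event updates a small running aggregate, while the raw event goes straight to cold storage.

```python
from collections import defaultdict


class HourlyCounter:
    """Keeps only per-hour event counts hot; raw events are archived."""

    def __init__(self):
        self.counts = defaultdict(int)  # small, fast-access aggregate
        self.archive = []               # stand-in for cheap cold storage

    def ingest(self, event):
        # Assumes timestamps look like "2016-09-01 00:13:37".
        hour = event["timestamp"][:13] + ":00:00"
        self.counts[hour] += 1          # incremental update of the query output
        self.archive.append(event)      # raw event is kept, but off the hot path


counter = HourlyCounter()
for ts in ["2016-09-01 00:13:37", "2016-09-01 00:59:59", "2016-09-01 01:00:00"]:
    counter.ingest({"timestamp": ts, "url": "/some/url"})

print(dict(counter.counts))
```

The hot state grows with the number of hours, not the number of events, which is the efficiency argument in a nutshell.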

For an increasing number of companies where data is becoming a principal business driver, efficiency of how data is managed and processed is a first-order consideration, especially when considering the development time required to design, build, scale, and manage data infrastructure. Giving the same value to a datapoint that's a week old and rarely used versus one that's brand new and accessed frequently for a few hours is common, but also an obvious source of significant inefficiency. Stride makes it easy to capture and harvest the time-value of data by continuously distilling data in realtime, and even bounding it with sliding windows, such that the only data that's stored for fast access is exactly what you actually use. Everything else is simply archived until it's needed again.

Observation #2: The vast majority of the action derived from analytics data will eventually be performed at enormous speeds by machines

Taking the idea of continuous processing further is the concept of realtime push. With a continuous processing model, it is trivial to deliver continuous intelligence to other infrastructural or business components by pushing information out to them in realtime via message buses, webhooks, and so on.

We have begun to internally refer to this fascinating intersection of continuous processing and realtime push as machine analytics, and this is where we see the future of analytics going: perpetually running analytical computations and integrating the resulting continuous intelligence with other parts of the organization in realtime. Here are some simple examples of what this actually looks like in practice:

  • Fraud detection - when there are 10 failed login attempts by a user over a 10-second window, POST to the accounts endpoint and disable that account. Send me a text too.
  • Email concise summary analytics reports to leaders when certain thresholds are crossed, at a specific time, or when something noteworthy or abnormal happens.
  • Dynamically optimize product price based on demand, as determined by a purchase stream and up-to-the-second inventory data.
  • Continuously optimize a webpage automatically based on streaming A/B test results.
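The fraud detection example can be sketched in a few lines of Python. This is only an illustration of the sliding-window logic, not how Stride implements it: `FailedLoginMonitor` is a hypothetical name, and the action itself (the POST to the accounts endpoint, the text message) is left out.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flags a user once `threshold` failures fall inside a sliding window."""

    def __init__(self, threshold=10, window_seconds=10.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)  # user -> timestamps of recent failures

    def record_failure(self, user, ts):
        """Record a failed login at time `ts` (seconds); return True if the user should be disabled."""
        window = self.failures[user]
        window.append(ts)
        # Evict failures that have slid out of the window.
        while window and ts - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


monitor = FailedLoginMonitor()
# Ten failures, half a second apart: only the tenth should raise the flag.
alerts = [monitor.record_failure("alice", t * 0.5) for t in range(10)]
print(alerts[-1])
```

A machine consuming that boolean can act on it immediately, which is the whole point: no chart, no human in the loop.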

An interesting thought experiment is to imagine a human examining a chart that some kind of action is ultimately derived from. Is that chart really necessary? If we think about this chart as simply an intuitive representation of a mere instant of a machine's unfathomably powerful and sophisticated thought process, another interesting question arises: can that machine just programmatically employ that action itself?

It is likely that 90% of the action derived from analytics data can be done effectively with great efficiency by machines, and perhaps the remaining 10% will always require the magic of human imagination and intuition. The current state of the art is the inverse, and Stride is going to enable businesses to change that in order to operate more effectively.

Observation #3: APIs are eliminating the need to operate infrastructure

Even something as simple as deploying and managing PipelineDB still comes with some friction of adoption. We're building Stride to deliver PipelineDB's core value--and more--with virtually no friction at all by managing 100% of the infrastructure for you and putting it all behind a dead-simple API. Not only does this make developers' lives substantially easier, but it takes into consideration an undeniable trend in virtually all markets involving software: APIs are becoming the essential building blocks of complex software products.

The datacenter is now the computer; the container is the process; the container orchestration framework is the operating system; and the managed API is the library. The address space of an application now simply spans the network, allowing developers to increasingly focus on solving core problems by integrating with specialized APIs that do everything else well. Infrastructure in particular is a highly profitable thing to outsource this way because it is invariably a core necessity but almost never a core competency.

Let's look at some examples to illustrate the power of the Stride API. One of the most consistently challenging areas in any analytics problem is counting unique values in various ways: cookies, IP addresses, users, etc. It's a hard problem to scale because there are presumably a lot of these things, and they all need to be read together in order to determine which ones to include in the final unique count. Let's create a continuous process with Stride that incrementally keeps track of unique visitors by hour, for every url of our website:

POST https://api.stride.io/process/hourly_url_uniques

{
  "query": "
    SELECT
      hour($timestamp)        AS hour,
      url                     AS url,
      num_uniques(cookie)     AS uniques
    FROM page_views
    GROUP BY hour, url",
  "action": {
    "type": "MATERIALIZE"
  }
}

Since we're using an action of MATERIALIZE, this continuous process will incrementally store the output of the given query such that it can be queried at any time. And since only post-aggregate data is actually stored, querying it is extremely fast. We can use the analyze endpoint to do that:

POST https://api.stride.io/analyze

{
  "query": "SELECT hour, url, uniques FROM hourly_url_uniques"
}

Which gives us a response such as:

[
  {"hour": "2016-09-01 00:00:00", "url": "/some/url", "uniques": 678923},
  {"hour": "2016-09-01 01:00:00", "url": "/some/url", "uniques": 55104}
]

But what if we want to deduplicate those uniques across multiple hours? We can use the combine aggregate for that:

POST https://api.stride.io/analyze

{
  "query": "SELECT url, combine(uniques) FROM hourly_url_uniques GROUP BY url"
}

And now we have a deduplicated count across hours:

  {"url": "/some/url", "uniques": 699147}
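Note that the combined count is not the sum of the hourly counts. If the two sample rows above were the only ones, summing them would give 678,923 + 55,104 = 734,027, while the combined result is 699,147, meaning 34,880 cookies showed up in both hours. The Python sketch below shows why combining has to merge the underlying per-hour state rather than add the numbers. It uses exact sets purely as a stand-in; this post does not describe the representation Stride actually uses, and the page view data is made up.

```python
from collections import defaultdict

# Hypothetical page views: (hour, url, cookie).
page_views = [
    ("2016-09-01 00:00:00", "/some/url", "a"),
    ("2016-09-01 00:00:00", "/some/url", "b"),
    ("2016-09-01 00:00:00", "/some/url", "a"),  # repeat visit, same hour
    ("2016-09-01 01:00:00", "/some/url", "b"),  # returning visitor, next hour
    ("2016-09-01 01:00:00", "/some/url", "c"),
]

# Incrementally maintained state: one set of cookies per (hour, url).
hourly = defaultdict(set)
for hour, url, cookie in page_views:
    hourly[(hour, url)].add(cookie)

hourly_uniques = {key: len(cookies) for key, cookies in hourly.items()}

# Combining unions the per-hour state instead of adding the per-hour counts.
combined = defaultdict(set)
for (hour, url), cookies in hourly.items():
    combined[url] |= cookies

naive_sum = sum(hourly_uniques.values())    # counts cookie "b" twice
deduplicated = len(combined["/some/url"])   # counts each cookie once
print(naive_sum, deduplicated)
```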

And what if we want to notify an HTTP endpoint as soon as any url reaches 100,000 uniques in a given hour? We can create another continuous process that reads from the first one and uses a WEBHOOK action to notify another service whenever our condition is met:

POST https://api.stride.io/process/uniques_notifier

{
  "query": "
    SELECT new.uniques
    FROM hourly_url_uniques
    WHERE old.uniques < 100000 AND new.uniques >= 100000",
  "action": {
    "type": "WEBHOOK",
    "url": "http://my.external.service/notify"
  }
}
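The old/new condition makes the webhook edge-triggered: it fires on the one update that crosses the threshold, not on every update while the count stays above it. Here is a small Python sketch of that logic. The function names are hypothetical, and treating a row that does not exist yet as a count of 0 is our assumption for the sketch, not documented Stride behavior.

```python
def crossed_threshold(old, new, threshold=100000):
    """True only on the update that moves the count from below to at-or-above the threshold."""
    return old < threshold and new >= threshold


def updates_that_fire(counts, threshold=100000):
    """Yield each new count that would trigger a notification.

    `counts` is the sequence of successive values for one (hour, url) row.
    """
    old = 0  # assumption: a row that does not exist yet counts as 0
    for new in counts:
        if crossed_threshold(old, new, threshold):
            yield new
        old = new


fired = list(updates_that_fire([40000, 99999, 100000, 150000, 200000]))
print(fired)
```

Only the update to 100,000 fires; the later updates to 150,000 and 200,000 do not, so the external service is notified once per url per hour rather than on every subsequent page view.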

The best part about all of this is that it's powered by a fully managed API, so you can start doing this kind of stuff in your applications as soon as you have an API token.

Developer Preview

We couldn't be more excited to be releasing Stride, which is ultimately the product of an enormous amount of first-hand experience we've had the privilege of absorbing, put into the context of our vision about the future of an extremely active and exciting space, and executed by a team that loves what it does. For companies that need more power and flexibility than off-the-shelf analytics provide, but aren't interested in building and operating complex infrastructure, we think you'll be thrilled to use Stride.

Stride is entering a developer preview phase in the coming weeks, which you can get access to by signing up here. We'll get back to you shortly as new information becomes available, and we're always excited to answer any questions you may have in the meantime. Please let us know what you think, and stay tuned for more to come soon!