Drew Banin’s recent post, “HTAP Databases,” in The Analytics Engineering Roundup was awesome! I’ll be referencing Drew’s crystal-clear analysis throughout. Additional thanks to Or Avidov, co-founder at Elementary Data, for encouraging me to share why OLTP and OLAP are converging.
The Great Convergence of 2022
We all know the story of the two different kinds of databases. On the one hand, you have the transactional database: good at reading and writing small numbers of rows very quickly, but bad at table scans. On the other hand, you have the analytic database: good at aggregating and filtering data over an entire table quickly, but slow for reads and writes in an application context.
There are - of course - more than two types of databases. You’ve got your key/value stores (Redis), your search-optimized databases (Elasticsearch), your timeseries databases (Influx), your streaming databases (Materialize), your graph databases (Neo4j) and a whole host of other databases that are built to service specific use cases and query patterns. The workloads that these databases can service are the results of very intentional trade-offs around how data is stored, indexed, and processed inside of each database.
—Banin, HTAP Databases
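The point about “how data is stored” can be made concrete with a toy sketch. It assumes a made-up in-memory orders table and compares a row-oriented layout, typical of transactional databases, with a column-oriented layout, typical of analytic databases. It is not how any particular engine is implemented; real systems add pages, compression, and on-disk indexes.

```python
# Toy illustration only: in-memory stand-ins for two storage layouts.

# Row-oriented layout (OLTP-style): each record is stored as one unit.
row_store = [
    (1, "ada", 30.0),
    (2, "grace", 12.5),
    (3, "linus", 57.5),
]
# Index from primary key to position, so point lookups skip a scan.
row_index = {order_id: pos for pos, (order_id, _, _) in enumerate(row_store)}


def insert_order(order_id, customer, amount):
    """Point write: append one record and remember its position."""
    row_index[order_id] = len(row_store)
    row_store.append((order_id, customer, amount))


def get_order(order_id):
    """Point read: one index probe returns the whole record."""
    return row_store[row_index[order_id]]


def total_amount_rows():
    """Aggregate over a row store: every record is visited, all fields unpacked."""
    return sum(amount for _, _, amount in row_store)


# Column-oriented layout (OLAP-style): each column is stored contiguously.
column_store = {
    "id": [1, 2, 3],
    "customer": ["ada", "grace", "linus"],
    "amount": [30.0, 12.5, 57.5],
}


def get_order_columnar(order_id):
    """Point read: locate the position, then gather one value from every column."""
    pos = column_store["id"].index(order_id)  # linear scan; no index here
    return tuple(column_store[col][pos] for col in ("id", "customer", "amount"))


def total_amount_columnar():
    """Aggregate over a column store: only the "amount" column is read."""
    return sum(column_store["amount"])
```

Both layouts give the same answers; what differs is how much data each operation has to touch. The row store serves `get_order` with a single probe, while the column store serves `total_amount_columnar` by reading one contiguous list and ignoring the other columns.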
Take a look at the DB-Engines rankings to see how deep the database rabbit hole goes. While the long list of options might make your head spin, it’s evidence of a rich ecosystem that rewards specialization. That’s a good thing!
For Marvel fans, think of each database strength as an Infinity Stone. Each one is powerful on its own, but used together, in seamless coordination, they are nearly unstoppable.
Let’s have a little fun with the analogy:
- Space Stone - Travel instantly between places. This could be multi-region, redundant databases like CockroachDB or Yugabyte.
- Time Stone - Manipulate time. This could be Postgres, TimescaleDB, InfluxDB, or QuestDB.
- Power Stone - Increase strength. This could be data warehouses like Snowflake, BigQuery, Firebolt, Redshift, or Synapse.
The analogy begins to break down around the Soul Stone, but you get my point: it’s great to harness one Infinity Stone, but it’s stronger to use several at once.
In essence, that’s HTAP (Hybrid Transactional and Analytical Processing) - data engineer-speak for a converged architecture with value greater than the sum of the parts. Gartner says “HTAP architecture enables real-time analytics and situation awareness on live transaction data as opposed to after-the-fact analysis on stale data.”
HTAP is the promised magic: real-life events happen, and real-time analytics and/or ML spit value back to customers. Users feel seen, and the application appears to move at the speed of life. Internal HTAP use cases are great too: executives can make decisions on fresh analysis at the speed of the business. It sounds easy enough, but Accenture’s poll shows that “Only 16% of companies indicated having an agile data supply chain enabling them to serve data to the business at speed.” Traditionally, keeping OLTP and OLAP separate could work. Today is different: data volumes are larger, throughput and concurrency are higher, analytics can go deeper, and the “intelligence layer” tooling has matured.
OLTP → HTAP ← OLAP
Thanos is lucky: he has a gauntlet that unifies multiple Infinity Stones and coordinates them interchangeably. That’s like HTAP, which unifies transactional powers and analytical powers under one gauntlet.
Today, developers use the app layer as Thanos’ gauntlet to balance OLTP and OLAP. Problems arise when databases are replaced. To support a new database, developers need to change app code, rewrite queries, possibly adopt custom libraries, and set up new data pipelines. It would be as if Thanos had to forge a new gauntlet every time he adopted a new Infinity Stone. That process would be slow, wasteful, and generally pretty annoying. Worst of all, Thanos might have to forge that new gauntlet while adversaries are on the attack. Sound like your business?
That wouldn’t work. Yet it’s exactly what IT orgs expect engineers to do: adopt new databases, map their differences, change application code, rewrite queries, and so on. Engineers rinse and repeat this painful cycle whenever leadership gets interested in a new database to solve scale or functionality issues.
As bad as adopting new databases is, the alternative (not ripping off the band-aid) can be worse.
Enter Hydra - the universal gauntlet
Hydra coordinates databases like Thanos’ gauntlet. Everything just works: use Postgres ORMs and Postgres queries, and analytical queries automatically execute against Snowflake. That’s HTAP, with trade-offs handled automatically as one unified system: writes go to Postgres, simple reads are routed to Postgres, and complex analytics are routed to Snowflake.
Want to add a new Infinity Stone (aka database engine)? With Hydra, you can swap databases in and out without any application changes.
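The routing rule described above (writes and simple reads to Postgres, complex analytics to Snowflake) can be sketched as a small classifier. This is a hypothetical keyword heuristic written for illustration, not Hydra’s actual routing logic; the `Engine` enum and `route` function are made-up names.

```python
import re
from enum import Enum


class Engine(Enum):
    POSTGRES = "postgres"    # transactional (OLTP) engine
    SNOWFLAKE = "snowflake"  # analytical (OLAP) engine


# Hypothetical heuristic: leading keywords of statements that modify data.
WRITE_KEYWORDS = {"insert", "update", "delete", "merge",
                  "create", "alter", "drop", "truncate"}

# Hypothetical markers of an "analytical" read: grouping, aggregates, windows.
ANALYTIC_PATTERN = re.compile(
    r"\bgroup\s+by\b|\b(?:sum|avg|count|min|max)\s*\(|\bover\s*\(",
    re.IGNORECASE,
)


def route(sql: str) -> Engine:
    """Pick an engine for one SQL statement using simple keyword heuristics."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].lower() if stripped else ""
    if first_word in WRITE_KEYWORDS:
        return Engine.POSTGRES   # writes always go to the transactional store
    if ANALYTIC_PATTERN.search(stripped):
        return Engine.SNOWFLAKE  # aggregations and scans go to the warehouse
    return Engine.POSTGRES       # simple reads (e.g. lookups by key) stay put
```

For example, `route("SELECT * FROM orders WHERE id = 42")` stays on Postgres, while `route("SELECT customer, SUM(amount) FROM orders GROUP BY customer")` goes to Snowflake. A production router would parse SQL rather than pattern-match it, and it would also have to keep the warehouse in sync with Postgres, which this sketch ignores.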
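Swapping engines without touching application code depends on the application talking to an interface rather than to a vendor’s driver. The sketch below assumes hypothetical `Backend`, `FakeBackend`, and `Gateway` types; the fake backends only echo the SQL they receive instead of connecting to real databases, and the engine names are plain labels. It illustrates the idea, not Hydra’s design.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """Anything that can execute SQL; the application never sees the vendor."""
    name: str

    def execute(self, sql: str) -> str: ...


class FakeBackend:
    """Stand-in for a real database driver; records and echoes each statement."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def execute(self, sql: str) -> str:
        self.log.append(sql)
        return f"{self.name} ran: {sql}"


class Gateway:
    """Hypothetical gateway mapping vendor-neutral roles to backends."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self.backends = backends  # e.g. {"oltp": ..., "olap": ...}

    def swap(self, role: str, backend: Backend) -> None:
        self.backends[role] = backend  # configuration change, not an app change

    def execute(self, role: str, sql: str) -> str:
        return self.backends[role].execute(sql)


def revenue_report(db: Gateway) -> str:
    # Application code names a role, never a specific database product.
    return db.execute("olap", "SELECT SUM(amount) FROM orders")


db = Gateway({"oltp": FakeBackend("postgres"), "olap": FakeBackend("snowflake")})
before = revenue_report(db)
db.swap("olap", FakeBackend("bigquery"))  # swap the analytical engine
after = revenue_report(db)
```

`revenue_report` is identical before and after the swap; only the gateway’s configuration changed. In practice the harder part is SQL dialect differences between engines, which a real system would have to translate.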
Joe Sciarrino, CEO @ Hydra