Welcome to the first installment in our series on Hydra External Tables! Follow us on Twitter, LinkedIn, or Discord to be the first to hear about the next installment in our series and other Hydra announcements.
What is Hydra?
Hydra is an open source Snowflake alternative. Hydra transforms Postgres into an elastic data warehouse with columnar storage and external tables. It's our mission to help start-ups make data-driven decisions.
What is an External Table?
External tables are data that live outside Hydra. You can use External Tables to query and combine data from multiple data sources. In the image below, Hydra tables are represented as "local" and RDS tables as "external". Data in RDS is being fetched and returned through Hydra.
This post will explore how to use Postgres External Tables to select, join, and insert data to and from your Hydra data warehouse.
Why use External Tables with Hydra?
Data warehouses are referred to as an organization’s source of truth because they aggregate data from different sources into a single, central data store to support data analysis. Traditionally, organizations configure data pipelines to copy data from multiple sources into the warehouse. While this approach works well for immutable data like analytics, rapidly changing data quickly becomes out of date. Constantly refreshing this data is expensive, slow, and restrictive.
External tables address the following problems:
- Data copying grows storage and pipeline costs.
- The warehouse’s data recency is limited to the data copy rate and interval.
- Impossible to run ad-hoc queries without copying relevant tables to the warehouse.
With External Tables, queries execute on source databases directly, which greatly reduces the friction and the cost outlined above. However, External Tables fall short when the volume of external data to fetch is large and queries need to be run repeatedly. The tradeoff is whether you want to save the data copying time for ad-hoc queries.
This tutorial performs queries in Hydra and joins an RDS external table. The following is a companion video:
In RDS Postgres, we create
demo_signups as a standard row table:
CREATE TABLE demo_signups ( email TEXT ); INSERT INTO demo_signups (email) VALUES ('email@example.com'), ('firstname.lastname@example.org');
In Hydra, we create
events as a columnar table:
CREATE TABLE events (email TEXT, name TEXT, timestamp TIMESTAMPTZ) USING columnar; INSERT INTO events (email, name, timestamp) VALUES ('email@example.com', 'webinar_signup', CURRENT_TIMESTAMP), ('firstname.lastname@example.org', 'email_signup', CURRENT_TIMESTAMP), ('email@example.com', 'webinar_signup', CURRENT_TIMESTAMP);
Link to the RDS external table
In Hydra, we create an external table linking to the RDS
/* replace ... with your RDS DB info */ CREATE EXTENSION postgres_fdw; CREATE SERVER rds_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '...', port '...', dbname '...'); CREATE USER MAPPING FOR CURRENT_USER SERVER rds_server OPTIONS (user '...', password '...'); CREATE FOREIGN TABLE rds_demo_signups ( email TEXT ) SERVER rds_server OPTIONS (schema_name 'public', table_name 'demo_signups');
Query from RDS external table
In Hydra, check that you can query the external RDS
SELECT * FROM rds_demo_signups;
Perform join from RDS external table
In Hydra, we perform a join with the external RDS
demo_signups table with the columnar
As an example, the following calculates the rate of user conversion from demo signups to webinar signups:
SELECT COUNT(events.email) / COUNT(rds_demo_signups.email) * 100.0 FROM events LEFT JOIN rds_demo_signups ON events.email = rds_demo_signups.email WHERE events.name = 'webinar_signup';
This post explored querying and combining data from an external RDS row table in Hydra. We have shown the ability to query data outside your data warehouse without ETL pipelines.
We will explore Hydra integration patterns in the next few posts. Make sure to follow us on Twitter, LinkedIn, or Discord to get notified. In the meantime, you can learn more about External Tables in our documentation. The Hydra database is open sourced on GitHub; please star it if you like what you see in this post 🙂.