Welcome to the second installment in our series on Hydra External Tables! Follow us on Twitter, LinkedIn, or Discord to be the first to hear about the next installment in our series and other Hydra announcements. Part 1 of the series is available here.
Hydra is an open source Snowflake alternative that transforms Postgres into an elastic data warehouse with columnar storage and external tables. In part 1, we discussed using external tables to access and join data that live outside Hydra. In this post, we are going to discuss the reverse direction - populate data to the Hydra data warehouse from other databases. In this tutorial, we use RDS as the external database to Hydra.
Why populate data to Hydra from foreign databases?
In part 1 of our External Tables series, we showed how External Tables can be used to quickly run ad-hoc analytics queries on data outside of a Hydra data warehouse. When an External Table’s data size is large, the query becomes more expensive. No need to fear - you can also use External Tables to continuously duplicate data to your Hydra data warehouse automatically! For large data volumes, create a synchronous duplicate of the data inside the warehouse for faster query execution. Check out this tutorial to see how!
Data Modeling 101: Star Schema
In the tutorial below, we will adopt a data modeling approach known as the star schema. Let’s run through some basic concepts.
Star schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a star shape. It is a fundamental approach that is widely used to develop or build a data warehouse. It requires classifying model tables as either dimension or fact.
Fact tables record measurements or metrics for a specific event. A fact table contains dimension key columns that relate to dimension tables and numeric measure columns. Dimension tables describe business entities - the things you model. Entities can include products, people, places, and concepts including time itself. A dimension table contains a key column (or columns) that acts as a unique identifier and other descriptive columns. Fact tables are best as columnar tables, while dimension tables may be best as row tables due to their size and rate of updates.
The following diagram shows the star schema that we are going to model for the tutorial:
In this tutorial, we are going to have a
sales fact columnar table, a
customers dimension row table, and a
products dimension row table in Hydra. The
customers row table is auto-synced from RDS Postgres using a trigger where RDS is the canonical source of this table.
The following is a companion video for this tutorial:
In Hydra, we create the
customers table. We then add some sample data to the
customers table is empty and will be auto-populated from RDS at a later step.
CREATE TABLE sales (order_number int, product_key TEXT, email TEXT, order_date TIMESTAMPTZ) USING columnar; CREATE TABLE products (product_key TEXT, name TEXT, category TEXT, color TEXT); CREATE TABLE customers ( email TEXT, country TEXT, state TEXT ); INSERT INTO products (product_key, name, category, color) VALUES ('pixy_7_pro_white', 'Pixy 7 Pro', 'Phone', 'Black'), ('pixy_7_pro_black', 'Pixy 7 Pro', 'Phone', 'White'), ('airphonepods_pro_white', 'AirPhonePods Pro', 'Headphones', 'White'), ('airphone_14_red', 'airPhone 14', 'Phone', 'Red'); INSERT INTO sales (order_number, product_key, email, order_date) VALUES (100, 'pixy_7_pro_white', 'email@example.com', CURRENT_TIMESTAMP), (101, 'pixy_7_pro_black', 'firstname.lastname@example.org', CURRENT_TIMESTAMP), (102, 'airphone_14_red', 'email@example.com', CURRENT_TIMESTAMP);
Link to the Hydra columnar table
In RDS Postgres, we create an external table linking to the Hydra
/* replace ... with your Hydra DB info */ CREATE EXTENSION postgres_fdw; CREATE SERVER hydra_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '...', port '...', dbname '...'); CREATE USER MAPPING FOR CURRENT_USER SERVER hydra_server OPTIONS (user '...', password '...'); CREATE SCHEMA hydra; CREATE FOREIGN TABLE hydra.customers ( email TEXT, country TEXT, state TEXT ) SERVER hydra_server OPTIONS (schema_name 'public', table_name 'customers');
Trigger to sync with the Hydra columnar table
In RDS, we create
customers as a row table and add a trigger to sync with the Hydra
CREATE TABLE customers ( id INT, email TEXT, country TEXT, state TEXT, created_at TIMESTAMPTZ); CREATE OR REPLACE FUNCTION insert_hydra_customers_table () RETURNS TRIGGER AS $BODY$ BEGIN INSERT INTO hydra.customers ("email", "country", "state") VALUES (NEW.email, NEW.country, NEW.state); RETURN NEW; END; $BODY$ LANGUAGE plpgsql; CREATE TRIGGER trigger_insert_hydra_columnar_table AFTER INSERT ON customers FOR EACH ROW EXECUTE PROCEDURE insert_hydra_customers_table(); CREATE OR REPLACE FUNCTION delete_hydra_customers_table () RETURNS TRIGGER AS $BODY$ BEGIN DELETE FROM hydra.customers WHERE email = OLD.email; RETURN OLD; END; $BODY$ LANGUAGE plpgsql; CREATE TRIGGER trigger_delete_hydra_columnar_table AFTER DELETE ON customers FOR EACH ROW EXECUTE PROCEDURE delete_hydra_customers_table(); INSERT INTO customers (id, email, country, state, created_at) VALUES (1, 'firstname.lastname@example.org', 'Canada', 'BC', CURRENT_TIMESTAMP), (2, 'email@example.com', 'US', 'CA', CURRENT_TIMESTAMP), (3, 'firstname.lastname@example.org', 'US', 'TX', CURRENT_TIMESTAMP);
Query from Hydra columnar table
In Hydra, check that the data is populated from RDS to the Hydra
SELECT * FROM customers;
Perform analysis in Hydra
In Hydra, we can perform interesting data analysis like product sales by country:
SELECT COUNT(products.product_key), customers.country FROM sales INNER JOIN products ON products.product_key = sales.product_key INNER JOIN customers on customers.email = sales.email GROUP BY customers.country;
This post explored populating dimension data from RDS Postgres to Hydra using a star schema model. We have shown the ability to auto-sync data outside your data warehouse using a trigger.
We will explore more Hydra integration patterns in the next few posts. Make sure to follow us on Twitter, LinkedIn, or Discord to get notified. In the meantime, you can learn more about External Tables in our documentation. The Hydra database is open sourced on GitHub; please star it if you like what you see in this post 🙂.