This framework runs SQL against one or many psql-compatible hosts, with the SQL run in sequence on each host. If there are many hosts, each host executes its series of SQL in parallel with the others.
To execute across multiple PSQL clusters, invoke the framework once per cluster, in sequence. To perform extracts and/or loads, I recommend wrapping COPY jobs in functions and calling those.
Look at the files in /demo, then at /demo/psql_etl_demo.sh, to see how easy it is to use.
psql, with a ~/.pgpass entry defined for each host. Otherwise, just bash.
clustername_psql_hosts.txt - A list of upstream PSQL hosts to run SQL against. One host per line!
clustername_psql_etl.sql - A list of SQL commands to be run on the PSQL hosts. One SQL transaction per line!
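As a sketch, for a hypothetical cluster named "demo" (the hostnames and function names below are made up for illustration), the two config files might look like:

demo_psql_hosts.txt:

```
db1.example.com
db2.example.com
```

demo_psql_etl.sql:

```sql
SELECT etl_stage_orders();
SELECT etl_publish_orders();
```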
If your SQL is too epic to be called in a single line, turn it into a function, and call that.
Please don't put spaces or punctuation in your clustername. It probably won't work.
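For example, a multi-statement load can be wrapped in a single function so the whole job fits on one line of clustername_psql_etl.sql. This is only a sketch: `staging.orders`, `prod.orders`, and the CSV path are hypothetical names, not part of the framework.

```sql
-- Wrap a multi-step COPY job in one callable function (hypothetical names).
CREATE OR REPLACE FUNCTION etl_load_orders() RETURNS void AS $$
BEGIN
  TRUNCATE staging.orders;
  COPY staging.orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER true);
  INSERT INTO prod.orders SELECT * FROM staging.orders;
END;
$$ LANGUAGE plpgsql;
```

The matching line in clustername_psql_etl.sql is then just `SELECT etl_load_orders();`.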
- Start the job in the directory where your config files are
- Source psql_etl.sh
- Call func_psql_etl "clustername" from Bash.
- If you need to call another cluster, call it again with the next clustername.
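The per-cluster behavior described above (statements in sequence on each host, hosts in parallel) can be sketched as a minimal bash function. This is not the framework's actual implementation, just an illustration of the pattern; the `PSQL_CMD` override is an assumption added here so the loop can be dry-run without a live database.

```shell
#!/usr/bin/env bash
# Minimal sketch of running one cluster's SQL file against every host in
# its hosts file: one psql process per host, launched in parallel.
run_cluster() {
  local cluster="$1"
  local hosts_file="${cluster}_psql_hosts.txt"
  local sql_file="${cluster}_psql_etl.sql"
  local runner="${PSQL_CMD:-psql}"   # swap in 'echo' to dry-run (hypothetical knob)

  while IFS= read -r host; do
    [ -z "$host" ] && continue
    # Credentials come from ~/.pgpass; psql runs the file top to bottom.
    "$runner" -h "$host" -f "$sql_file" &
  done < "$hosts_file"
  wait   # block until every host has finished its SQL series
}
```

Calling `run_cluster demo` then `run_cluster reporting` mirrors the "call it again with the next clustername" flow: clusters run in sequence, hosts within a cluster in parallel.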
See the 'demo' folder for an example implementation.
- Set up a DDL script that creates a psql job logging schema inside Postgres
- Add lines to this script to:
  a. Check if the schema exists (once)
  b. Write to the schema if it does
  c. Not error (but just warn) if it doesn't
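The check/write/warn logic could be sketched as a single DO block. Everything here is assumed for illustration: the schema name `etl_log`, the table `job_runs`, and its columns are hypothetical, not part of the framework.

```sql
-- Log this run if the logging schema exists; warn (don't fail) if it doesn't.
DO $$
BEGIN
  IF EXISTS (SELECT 1 FROM information_schema.schemata
             WHERE schema_name = 'etl_log') THEN
    -- Schema exists: record the run (hypothetical table/columns).
    INSERT INTO etl_log.job_runs (cluster_name, ran_at)
    VALUES ('demo', now());
  ELSE
    -- Schema missing: warn, but let the ETL job continue.
    RAISE WARNING 'etl_log schema not found; skipping job logging';
  END IF;
END
$$;
```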