← Case StudiesProject Summary
01
HealthcareData Infrastructure and Engineering

EHR data unified across five clinical sites.

Multi-source ELT pipeline, dbt transformation layer, and HIPAA-compliant reporting for a multi-site hospital network.

Stack

Google BigQuerydbt CoreCloud Composer (Airflow)Python 3.11Epic FHIR R4 APITerraformGCP IAM + KMSLooker Studio

The Problem

A multi-site hospital network was producing operational reports manually, pulling from Epic, a LabTech instance, and a billing system that did not talk to either. Finance and clinical operations were working from different numbers. The weekly report took four days to produce and was out of date by the time it reached leadership. There was no single source of truth. When numbers disagreed, staff spent time reconciling spreadsheets rather than acting on the data.

The Result

Reporting lag from four days to four hours. Finance and clinical operations running from the same source of truth for the first time. The manual reconciliation process that occupied roughly two analyst-days per week was eliminated entirely.

What we built

The work that shipped.

ELT pipelines from Epic FHIR APIs, LabTech, and the billing platform into BigQuery
dbt transformation layer modeling clinical and financial data into a unified reporting schema
Automated daily reporting replacing the manual extraction and reconciliation process
HIPAA-compliant pipeline with audit logging and PHI controls throughout

Approach

How we designed the system.

Three source systems feed into a single BigQuery warehouse through distinct ingestion patterns suited to each system. Epic data is pulled through the FHIR R4 API using paginated resource requests — Patients, Encounters, Observations, and DiagnosticReports — appended to raw tables with full JSON preserved for reprocessing. LabTech is polled nightly by querying its PostgreSQL instance directly; rows are compared against the prior snapshot using a hash-based change detection approach. The billing system provides a REST endpoint for recent claims and a daily CSV export for full reconciliation; both are loaded and deduped by claim ID. Cloud Composer orchestrates all three ingestion DAGs, with explicit dependency ordering to ensure mart-layer dbt models only run after all raw tables have landed for the day. The dbt transformation layer separates staging, intermediate, and mart models. All PHI access in mart tables is controlled through BigQuery column-level security policies, with field masking applied to the patient dimension for non-clinical consumers.

Full technical report

Want the full
technical detail?

The full write-up includes architecture diagrams, technology selection rationale, implementation phases, and lessons learned. Available for qualified enquiries.

Architecture diagrams for the full system
Technology selection rationale and trade-offs
Implementation phases and key outputs
Lessons learned and what we would do differently

Available under NDA. We respond within two business days.