CaClinics Data Integration & Reporting System

An automated ETL pipeline in Python processing 10,000+ daily records from three sources, with PostgreSQL on AWS EC2 and Power BI dashboards with automated refreshes.

Tech Stack

Python, PostgreSQL, AWS EC2, Power BI, Pandas

About this project

Automated a daily ETL pipeline in Python that processes 10,000+ records from three sources: the MediRecords API, the Dear API, and manually uploaded Zapi CSVs. Deployed PostgreSQL on AWS EC2, organizing the data into separate tables and optimized views for efficient analytics. Created Power BI dashboards whose refreshes are triggered from Python once the ETL run completes, so reports reflect the latest load. Implemented comprehensive error handling with email alerts and built a master scheduler. The architecture handled high-volume data, and the system is prepared for full automation that will eliminate the manual CSV uploads.
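As a rough illustration of how a master scheduler with error handling, email alerts, and a post-ETL refresh might fit together, here is a minimal sketch. It is not the project's actual code: the names `run_pipeline`, `build_alert`, and `RunReport`, the email addresses, and the choice to keep running the remaining steps after one fails are all assumptions made for the example. The refresh and the mail transport are passed in as plain callables, so nothing here talks to Power BI or an SMTP server.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from typing import Callable, Dict, List


@dataclass
class RunReport:
    """Outcome of one scheduler run (hypothetical structure)."""
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)
    refreshed: bool = False


def build_alert(failed: Dict[str, str]) -> EmailMessage:
    """Compose, but do not send, an alert listing the failed steps."""
    msg = EmailMessage()
    msg["Subject"] = "ETL failure: " + ", ".join(failed)
    msg["From"] = "etl@example.com"        # placeholder address
    msg["To"] = "data-team@example.com"    # placeholder address
    msg.set_content("\n".join(f"{name}: {err}" for name, err in failed.items()))
    return msg


def run_pipeline(
    steps: Dict[str, Callable[[], None]],
    refresh: Callable[[], None],
    send_alert: Callable[[EmailMessage], None],
) -> RunReport:
    """Run each ETL step in order, then refresh dashboards or send an alert."""
    report = RunReport()
    for name, step in steps.items():
        try:
            step()
            report.succeeded.append(name)
        except Exception as exc:
            # Assumption: one failing source should not block the others.
            report.failed[name] = f"{type(exc).__name__}: {exc}"

    if report.failed:
        # Skip the refresh so dashboards never show a partial load.
        send_alert(build_alert(report.failed))
    else:
        refresh()
        report.refreshed = True
    return report


if __name__ == "__main__":
    def broken_csv_step() -> None:
        raise FileNotFoundError("zapi_export.csv")  # simulated missing upload

    result = run_pipeline(
        steps={"medirecords": lambda: None, "zapi_csv": broken_csv_step},
        refresh=lambda: print("refresh triggered"),
        send_alert=lambda msg: print(msg["Subject"]),
    )
    print(result.succeeded, result.refreshed)
```

In a real deployment, `refresh` would wrap whatever mechanism triggers the Power BI refresh and `send_alert` would hand the message to a mail server; keeping both as injected callables makes the scheduler easy to test without either service.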
Added March 2026