Problem Statement
The company faced slow ETL processing and poorly controlled costs: its existing workflows did not scale, resource consumption was high, and manual data management tasks consumed significant staff time.
Approach & Solution
To improve ETL performance and reduce costs, we re-engineered the client's workflows using Google Cloud Platform (GCP) services:
- Containerized ETL jobs with Docker and deployed them on GCP Cloud Run, improving scalability and significantly reducing processing times (a minimal service sketch follows this list).
- Implemented data partitioning in BigQuery, cutting storage requirements by 500 GB while maintaining data accuracy and accessibility (see the partitioning sketch below).
- Developed Python automation scripts that process 5,000+ data entries per day, streamlining routine data-processing work (see the batch-script sketch below).
- Automated manual data management tasks with Google Apps Script, removing bottlenecks and increasing team efficiency.
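As a rough sketch of the Cloud Run side (the Flask entry point and the `run_etl`/`extract_transform_load` names are illustrative, not the client's actual code), a containerized ETL job typically exposes an HTTP endpoint that Cloud Run invokes:

```python
# Minimal HTTP entry point for an ETL job on Cloud Run.
# Cloud Run invokes the container over HTTP; Flask serves the request.
import os

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/run", methods=["POST"])
def run_etl():
    # Placeholder for the actual extract/transform/load steps.
    processed = extract_transform_load()
    return jsonify({"rows_processed": processed}), 200

def extract_transform_load() -> int:
    # Hypothetical ETL body; the real jobs read from Cloud Storage
    # and wrote to BigQuery.
    return 0

if __name__ == "__main__":
    # Cloud Run supplies the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```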
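The partitioning itself can be configured through the `google-cloud-bigquery` client; the project, dataset, column names, and 90-day retention below are assumed for illustration. Expiring old partitions is one way such a setup reclaims storage:

```python
# Sketch: creating a date-partitioned BigQuery table with the
# google-cloud-bigquery client. Table and field names are examples.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",  # hypothetical table ID
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Partition by the event_date column so queries scan only the
# relevant partitions and expired partitions are dropped automatically.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
    expiration_ms=90 * 24 * 60 * 60 * 1000,  # assumed 90-day retention
)
table = client.create_table(table)
```

Because queries against a partitioned table prune irrelevant partitions, this approach also lowers per-query cost alongside the storage savings.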
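A simplified sketch of the daily automation scripts, assuming a CSV export as the input format (the file name, columns, and validation rules are hypothetical, not the client's actual logic):

```python
# Sketch of a daily batch script of the kind described above.
import csv
from pathlib import Path

def process_entries(source: Path) -> int:
    """Validate and normalize rows from a daily export file."""
    processed = 0
    with source.open(newline="") as f:
        for row in csv.DictReader(f):
            if not row.get("id"):  # skip malformed rows
                continue
            row["name"] = row.get("name", "").strip().title()
            # ...load the cleaned row into the warehouse here...
            processed += 1
    return processed

if __name__ == "__main__":
    count = process_entries(Path("daily_export.csv"))
    print(f"Processed {count} entries")
```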
Results & Outcomes
The optimized ETL workflows cut data processing times, allowing the client to handle twice the data volume in the same time frame. Better storage practices reduced storage costs and improved overall operational efficiency, and automation freed up manual labor hours so the team could focus on higher-priority initiatives.
Tools & Technologies Used
- GCP BigQuery
- GCP Cloud Run
- Python
- Docker
- GCP Cloud Storage
- Google Apps Script
- Dataform