Optimizing Data Pipelines for a Multinational Food and Beverage Company

Overview

Our client, one of the world’s largest multinational food and beverage companies, wanted to refactor its existing data pipelines by reducing the number of steps, so that data load time is reduced and data mismatches are minimised.

The goal is to minimize data loading times and eliminate data discrepancies by accessing the data directly from the source. As a result, the existing five-step data pipeline will be streamlined into a more efficient two-step process.

Solution

The project is to create pipelines in Azure Data Factory for 29 tables that fetch data directly from the source system, and to discontinue the existing pipelines.

The client utilizes a system known as Multi Intelligence Data Analytics Systems (MIDAS) for report generation, with this database serving as the primary platform for the process. Data is currently being fed into the MIDAS database from the CSNG (source database) through a five-step pipeline, which has been causing data discrepancies.

For all 29 tables in the MIDAS system, mappings to the CSNG system were created and provided to us by another team.
We validated these mappings and sent our open questions back to that team.
We have created temporary pipelines for all the tables in Azure Data Factory, based on our current understanding of the table mappings.
Since the CSNG system does not store IDs (Distributor IDs, Retailer IDs, etc.), the ID creation logic is being provided by another team and will be implemented in our pipelines.
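
The mapping validation step can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the ColumnMapping record, the validate_mappings function, and the table and column names in the usage note are all hypothetical, and the sketch assumes each mapping row links one MIDAS column to one CSNG column and that both schemas are available as table-to-columns dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One row of a hypothetical mapping sheet: MIDAS column <- CSNG column."""
    midas_table: str
    midas_column: str
    csng_table: str
    csng_column: str


def validate_mappings(mappings, midas_schema, csng_schema):
    """Return a list of human-readable issues found in the mapping sheet.

    midas_schema and csng_schema map a table name to a set of column names.
    """
    issues = []
    mapped = set()
    for m in mappings:
        target = (m.midas_table, m.midas_column)
        # A MIDAS column should be fed by exactly one mapping row.
        if target in mapped:
            issues.append(f"duplicate mapping for {m.midas_table}.{m.midas_column}")
        mapped.add(target)
        # Both ends of the mapping must exist in their respective schemas.
        if m.midas_column not in midas_schema.get(m.midas_table, set()):
            issues.append(f"unknown MIDAS column {m.midas_table}.{m.midas_column}")
        if m.csng_column not in csng_schema.get(m.csng_table, set()):
            issues.append(f"unknown CSNG column {m.csng_table}.{m.csng_column}")
    # Any MIDAS column without a mapping becomes a question for the mapping team.
    for table, columns in sorted(midas_schema.items()):
        for column in sorted(columns):
            if (table, column) not in mapped:
                issues.append(f"no mapping for {table}.{column}")
    return issues
```

Called with a made-up MIDAS table such as retailer (columns retailer_id and name) and a single mapping for name, a check like this would report retailer_id as unmapped. That is the kind of gap expected for the ID columns, which CSNG does not store and which depend on the separately supplied ID creation logic.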

Output

Discontinue the existing pipelines.
Reduce data load time.
Transfer data directly from CSNG to the client’s MIDAS server, removing the intermediate steps from the load path.
Optimize the SQL queries.