Case Study: Data Engineering

Data Transformation and Process Automation


The pharmaceutical industry is accelerating adoption of automation technologies to drive efficiencies, advance innovation, and become more competitive. With large amounts of data to leverage, such as healthcare databases, data transformation and process automation are positioned as strong candidates to support automation efforts. Systems maintained by humans that require the integration of several flat files primarily using a spreadsheet program like Excel can be challenging. Manual manipulation of the data is likely to be time-consuming and may produce data quality issues which in turn will lead to numerous restarts of data outputs. Because data ownership is based on individual creators, the reliance on data can be inconsistent for the organization. All in, there are several opportunities for breakages in continuity.

The Challenge

Our client was seeking a solution to improve reporting by automating the current manual process for integrating flat files that would ensure high quality data. The client sought to move from a human-based process to one that would be automated and managed by their IT department. As such, solutions needed to consider the rapid growth of data and minimize future maintenance.

The Solution

Optimus SBR Data collected requirements to understand the current process and to incorporate the organization’s objectives. The aim of the solution was to ensure that the automated ETL (Extract, Transform, Load) workflows designed to combine the data files would not merely replicate the current manual process but would also allow for scalability and the integration of new data sources.

Automated ETL workflows were developed using the KNIME Analytics Platform to clean datasets and perform data validation. When an error was found, workflows would specify the type of error and its location to facilitate resolution by users. The workflows consolidated the different Excel files, applied several business rules provided by the client, and resulted in an optimized, standardized data structure for use in the data visualization tool, Power BI. Finally, Optimus SBR Data delivered documentation ensuring knowledge transfer to owners on how to maintain the workflows.

The Results

  • Reduced Processing Time – ETL workflows transform the data in a matter of seconds, allowing users to focus on high-value activities.
  • Data Quality – Business rules and data validations performed in KNIME ensure consistent data quality that can be trusted.
  • Scalability– Workflows are designed to ingest new data sources with little additional effort.
  • Optimized Data for Data Visualization – The final output is standardized and optimized for analytical and visualization purposes.
Data transformation from a manual, human process to an automated, system-based process builds trust in the data
Automating data transformation reduces processing time from hours to seconds
Workflows are scalable, designed to ingest new data sources without increasing maintenance time
Data output is optimized for analytics in a data visualization tool