How to increase the performance of a corporate data warehouse?

The Customer

The Customer is a company that provides processing services for bank payment card transactions. These include authorization of card transactions, implementation of card, ATM, and terminal identification databases, card personalization, and the processing and clearing of transactions made with payment system cards.

The problem

The project team was tasked with creating a repository of historical data from various source systems, which together generated hundreds of millions of operations (records) per day.

Additionally, the Customer collected and aggregated data manually from three different sources to generate the reports later used for managerial decisions. The collection methodology was inconsistent and varied depending on who performed the operation. As part of the project, Invento Labs specialists designed and implemented a BI platform, consisting of both static and analytical reports, that automatically receives operational data within a short period of time.

The solution

Invento Labs specialists first focused on choosing the tools needed to complete the project.

Invento Labs architects proposed using the MPP GreenPlum distributed database as the historical data warehouse. The proposed software solved a range of problems and provided additional benefits:

  • MPP GreenPlum is free, open-source software, so using it involves no license purchase costs.
  • The MPP GreenPlum architecture distributes data across several servers, which increases the speed of writing and reading data. As a result, the historical data warehouse is no longer a bottleneck when collecting data from different sources.
  • MPP GreenPlum can also duplicate data on different servers, which protects against data loss if one of the servers holding the historical data fails. Placing the servers in two different data centers further increased the system's fault tolerance.
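
The distribution and duplication ideas in the list above can be illustrated with a small sketch. It is a simplified model in plain Python, not GreenPlum's actual hash function or mirror placement logic: the segment count, the `txn_id` key, and all function names are assumptions made for the example. Each row is assigned to a primary segment by hashing a distribution key, and a copy is kept on a different segment so that the loss of one segment loses no data.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen only for this example


def segment_for(distribution_key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a primary segment using a stable hash."""
    return zlib.crc32(str(distribution_key).encode("utf-8")) % num_segments


def mirror_for(primary, num_segments=NUM_SEGMENTS):
    """Put the mirror copy on the next segment, never on the primary itself."""
    return (primary + 1) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Return (primaries, mirrors): rows grouped by the segment that stores them."""
    primaries = defaultdict(list)
    mirrors = defaultdict(list)
    for row in rows:
        seg = segment_for(row[key_field], num_segments)
        primaries[seg].append(row)
        mirrors[mirror_for(seg, num_segments)].append(row)
    return primaries, mirrors


def surviving_rows(primaries, mirrors, failed_segment, num_segments=NUM_SEGMENTS):
    """Rows still readable after one segment fails."""
    rows = []
    # Primary data on every segment that is still alive.
    for seg in range(num_segments):
        if seg != failed_segment:
            rows.extend(primaries.get(seg, []))
    # The failed segment's primary data is recovered from its mirror.
    rows.extend(mirrors.get(mirror_for(failed_segment, num_segments), []))
    return rows


if __name__ == "__main__":
    sample = [{"txn_id": i, "amount": i * 10} for i in range(1000)]
    primaries, mirrors = distribute(sample, "txn_id")
    for seg in sorted(primaries):
        print(f"segment {seg}: {len(primaries[seg])} primary rows")
    print("rows readable after losing segment 0:",
          len(surviving_rows(primaries, mirrors, failed_segment=0)))
```

Because each segment holds only part of the data, reads and writes can proceed on all segments in parallel, and the mirror copy is what allows the full data set to remain readable after a single failure.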

After deploying MPP GreenPlum, the Invento Labs team began adapting the existing data collection, transformation, and aggregation (ELT/ETL) processes to run on the open-source solution Apache Airflow.
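
Apache Airflow describes a pipeline as a directed acyclic graph (DAG) of tasks, where each task runs only after the tasks it depends on. The sketch below shows that ordering idea for an ELT flow over three sources. It deliberately does not use the Airflow API: it relies only on Python's standard `graphlib` module (Python 3.9+), and the source names, records, and task names are hypothetical, not taken from the project.

```python
from graphlib import TopologicalSorter

# Hypothetical raw records from three source systems (illustrative only).
SOURCES = {
    "authorization": [{"card": "A", "amount": 120}, {"card": "B", "amount": 80}],
    "atm": [{"card": "A", "amount": 50}],
    "clearing": [{"card": "B", "amount": 20}, {"card": "C", "amount": 10}],
}


def extract(source_name):
    """Read the raw records of one source system."""
    return list(SOURCES[source_name])


def load(staging, source_name, records):
    """Store raw records in a staging area, unchanged (the 'L' before the 'T')."""
    staging[source_name] = records


def transform(staging):
    """Apply one aggregation rule to all sources: total amount per card."""
    totals = {}
    for records in staging.values():
        for record in records:
            totals[record["card"]] = totals.get(record["card"], 0) + record["amount"]
    return totals


def build_pipeline():
    """Return the task graph as {task: set of tasks it depends on}."""
    deps = {"aggregate": set()}
    for name in SOURCES:
        deps[f"extract_{name}"] = set()
        deps[f"load_{name}"] = {f"extract_{name}"}
        deps["aggregate"].add(f"load_{name}")
    return deps


def run_pipeline():
    """Run every task in dependency order; return the order and the result."""
    staging, extracted, result = {}, {}, {}
    order = list(TopologicalSorter(build_pipeline()).static_order())
    for task in order:
        kind, _, name = task.partition("_")
        if kind == "extract":
            extracted[name] = extract(name)
        elif kind == "load":
            load(staging, name, extracted[name])
        else:  # the single "aggregate" task
            result = transform(staging)
    return order, result


if __name__ == "__main__":
    order, totals = run_pipeline()
    print(" -> ".join(order))
    print(totals)
```

Keeping the aggregation in a single `transform` step is one way to ensure that every report is built with the same methodology, regardless of which source the data came from.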

Several products were selected for reporting and visualization:

Tableau, which is positioned in the “Leaders” quadrant of the Gartner Magic Quadrant, was chosen for analytical reporting. It made it possible to visualize 6 reports covering a variety of business areas, as well as more than 100 different indicators.

Static reports were handled by an open-source product called ReportServer, which can process various report formats (Birt, jxls, etc.) and control access for different user groups through Microsoft Active Directory, which was one of the requirements for the system.

The result

As a result of the implementation of the proposed solution for the source data warehouse and BI platform, the Customer was able to:

  • Ensure the safety and integrity of historical data;
  • Obtain operational data in the form of reports for daily trend analysis and efficient business decision-making;
  • Increase the performance of data collection and data processing systems;
  • Reduce the labor costs of ensuring data safety and preparing data for subsequent reporting;
  • Improve the quality of data provided for analysis through the harmonization of methodologies;
  • Receive automatically generated reports within a few hours of the end of the reporting period; previously, generating them took a significant amount of time.