Data warehousing is a process of managing a data warehouse and data marts. Most of the companies have their own data warehouse to store and monitor real time data as well as historical data. Most common confusion among people is about the difference between database and data warehouse so let’s begin with Understanding the difference between the data warehouse and data mart.
In a database, data usually comes from single source and involves smaller amount of data, while in data warehouse, data comes from various sources and data is of huge volume.
From testing point of view Normalized databases are used in DB testing while demoralized DB is used in data warehouse testing.
Therefore ETL (Extract Transform Load) tool is used to integrate different data sources. ETL tool will work as an integrator, extracting data from different sources; transforming it in preferred format based on the business transformation rules and loading it into target. We will see what are all the steps involved on how to test data warehouse in detail.
Proper Project Planning
The first step in the testing process is planning. One of the most important and challenging task is to understand the scope of testing and understanding of requirements.
Since data comes from different source in most of the cases. Clear high level design is very essential to do end to end testing. Test strategy for project is entirely based on the requirement as to cover all the possible and required data validation.
The ETL process is used to transform data from source and load it based on the requirements. There are several tools available to use for the ETL process like AbInito and Infomatica. Frequency of ETL depends on different projects. It can be daily ,weekly or monthly but whenever there are changes in the code ETL has to be performed.
Below are the several validations we need to perform on the loaded data.
The purpose of record count is to test whether the no of required record from the source is equal to the target records.
Source and target data comparison
We do this to see whether all the required records from data are fetched in the fashion required. For e.g. if from a table called employee, records of employee with more than 5 yrs. of experience are supposed to be fetched and loaded then we need to check in source for the data based on the condition and then compare it with target data.
We have to perform field to field validation in order to do End to End testing.
We have to check for constrains like duplicate check, primary key check, unique key check etc. In some cases these validations are included in ETL jobs. Hence job fails if any of the constraints are violated.
Data warehouse testing needs a proper planning and complete end to end testing which is a big challenge as data is really huge. We can use tools like perfaware to automate the process so that at least regression test cases can be automated.
As data warehouse includes lot of regression testing as small change in development environment creates big differences for testers.
Thanks for visiting QAInsights!
We’re always posting interesting articles on QAInsights. I request you to subscribe so you don’t miss out anything.