Module 1: Analysis and review of High Level Design documents
For Extract, Transform, Load (ETL) testing it is essential to have a High Level Design document, so that the tester has a clear idea of how the data flows and what transformations take place. The High Level Design should include all the technical specifications.
Module 2: Analysis and review of Detailed Design documents
A detailed design document, or Module Definition, should be produced by development for each process. It should include an Excel spreadsheet listing all the source and target tables and fields, along with every transformation rule to be applied to the data during migration. This spreadsheet is generally called the Requirement Mapping Document.
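To make the idea concrete, the rows of a Requirement Mapping Document can be thought of as a list of mapping rules. The sketch below expresses two such rules as Python data; every table name, field name and transformation here is invented for illustration, not taken from any real project.

```python
# A hypothetical slice of a Requirement Mapping Document, expressed as
# a list of mapping rules instead of Excel rows. All names are invented.
mapping_rules = [
    {
        "source_table": "SRC_CUSTOMER",
        "source_field": "CUST_NM",
        "target_table": "DW_CUSTOMER",
        "target_field": "CUSTOMER_NAME",
        "transformation": "TRIM and convert to upper case",
    },
    {
        "source_table": "SRC_CUSTOMER",
        "source_field": "DOB",
        "target_table": "DW_CUSTOMER",
        "target_field": "BIRTH_DATE",
        "transformation": "Convert DD-MON-YY to DATE; reject if null",
    },
]

# Print each mapping in "source -> target: rule" form.
for rule in mapping_rules:
    print(f"{rule['source_table']}.{rule['source_field']} -> "
          f"{rule['target_table']}.{rule['target_field']}: "
          f"{rule['transformation']}")
```

Keeping the mapping in a structured form like this is what lets the tester derive one test case per rule, rather than re-reading the design document for every check.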
Module 3: Creation of Quality Control Plans
The Quality Control Plan (QCP) is the high-level plan for the test phase of the project and includes details such as test environment set-up, risks and objectives, and entrance and exit criteria. The data required in the Oracle test schemas is specified in the QCP so that test data can be set up appropriately before test execution.
Requests for test schema set-up are made to the DBAs, and requests for test data are made to the project developers.
The diagram below gives a better understanding of the modular approach to ETL testing.
Module 4: Creation and Review of Product Test Cases
All QC testers are responsible for creating the Product Test Cases in preparation for test execution. Test cases should be well designed, with clear, easy-to-follow steps so that the QC tester can execute them without constantly referring to the development graphs. When writing test cases, it is important to bear in mind that the person executing the tests may not be the person who designed them.
Module 5: ETL process from source to target data transformation
Data needs to be extracted from the source tables and transformed so that it can be loaded into the target tables. Because of the huge volumes of data involved, the ETL process is typically run either as scripts on Unix or through an ETL tool such as Informatica or Ab Initio. Once the data is ready, we can proceed to the validations.
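The extract-transform-load flow above can be sketched in a few lines of Python. This is only a toy model of what a Unix script or an Informatica/Ab Initio graph would do at scale; the sample data, field names and the trim-and-uppercase rule are all invented for illustration.

```python
import csv
import io

# Invented source data standing in for a source table or extract file.
source_csv = io.StringIO(
    "cust_id,cust_nm,country\n"
    "1,  alice ,IN\n"
    "2,bob,US\n"
)

def extract(fh):
    """Read the source rows into dictionaries."""
    return list(csv.DictReader(fh))

def transform(rows):
    """Apply an example rule: trim and upper-case the customer name."""
    return [{**row, "cust_nm": row["cust_nm"].strip().upper()}
            for row in rows]

def load(rows):
    """Append the transformed rows to a stand-in target table."""
    target_table = []
    target_table.extend(rows)
    return target_table

target = load(transform(extract(source_csv)))
print(target[0]["cust_nm"])  # prints "ALICE"
```

With the flow expressed this way, each stage (extract, transform, load) becomes a natural boundary at which the tester can capture record counts and intermediate data for the later validation steps.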
Module 6: Validation of Source to Target data
After successful execution of the scripts, data validation is performed. The output data is checked against the input data from the source files or source tables, and any transformation performed by the scripts is validated. The validations to be performed depend on the requirements, but a few, such as the duplication check and the primary key check, must always be performed.
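The two checks that always apply, the duplication check and the primary key check, can be sketched as below. The sample rows and the `cust_id` key are invented; in practice these checks would run as SQL against the target table, but the logic is the same.

```python
from collections import Counter

# Invented target rows, including one deliberately duplicated record.
target_rows = [
    {"cust_id": 1, "cust_nm": "ALICE"},
    {"cust_id": 2, "cust_nm": "BOB"},
    {"cust_id": 2, "cust_nm": "BOB"},   # duplicate row
]

def duplicate_check(rows):
    """Return any full rows that appear more than once."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(k) for k, n in counts.items() if n > 1]

def primary_key_check(rows, key):
    """Return any primary-key values that are not unique."""
    counts = Counter(r[key] for r in rows)
    return [k for k, n in counts.items() if n > 1]

print(duplicate_check(target_rows))               # the duplicated row
print(primary_key_check(target_rows, "cust_id"))  # prints [2]
```

Both checks reduce to counting occurrences: a clean load is one where every full row, and every primary-key value, occurs exactly once.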
Module 7: Validation of Transformation Rules through Fudging
It is sometimes necessary to fudge (manipulate) the input data in order to validate transformation rules. For example, if a rule handles null values on input, data would have to be added to the input files or table fields so that the rule is actually exercised.
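A minimal sketch of fudging is shown below: the tester blanks out a field in otherwise clean input so that the null-handling branch of a transformation rule actually fires. The field names and the "default null country to XX" rule are assumptions for illustration only.

```python
import copy

# Invented clean input that would never exercise the null-handling rule.
clean_input = [
    {"cust_id": 1, "country": "IN"},
    {"cust_id": 2, "country": "US"},
]

def fudge_nulls(rows, field, every_nth=2):
    """Blank out `field` on every nth row to force the null path."""
    fudged = copy.deepcopy(rows)
    for i, row in enumerate(fudged, start=1):
        if i % every_nth == 0:
            row[field] = None
    return fudged

def apply_rule(row):
    """Hypothetical rule under test: a null country defaults to 'XX'."""
    return {**row, "country": row["country"] or "XX"}

fudged = fudge_nulls(clean_input, "country")
output = [apply_rule(r) for r in fudged]
print(output[1]["country"])  # prints "XX"
```

Working on a deep copy matters here: the original extract stays untouched, so the fudged run can be compared against an unfudged baseline.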
Module 8: Validation of Counts/Balancing
A crucial part of testing is to record the number of output records and check it against the number of input records. The transformation process may throw out duplicates (in a 'dedup' process), so this must be checked. A general rule of thumb is:
Extract = Dedup + Load
This whole process is known as balancing. The counts, including rejected, duplicated, extracted and loaded counts, should be recorded in the data validation spreadsheet as part of results logging.
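The balancing rule above reduces to a single arithmetic check, sketched below. The count figures are invented sample numbers of the kind that would be logged in the data validation spreadsheet.

```python
# Invented sample counts of the kind logged during results recording.
counts = {
    "extracted": 1000,           # records read from source
    "duplicates_removed": 37,    # thrown out by the dedup process
    "loaded": 963,               # records landed in the target table
}

def is_balanced(c):
    """Check the rule of thumb: Extract = Dedup (removed) + Load."""
    return c["extracted"] == c["loaded"] + c["duplicates_removed"]

print(is_balanced(counts))  # prints True
```

If the check fails, the difference between the two sides tells the tester exactly how many records went missing (or appeared) between extract and load, which is the first number to chase when debugging.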
Geetanjali Bhatia has nearly three years of testing experience. She currently works as a data warehouse tester in the insurance domain at a Mysore-based software firm, and is a data warehouse and data mining expert who is always learning new testing strategies.
She uses social networking sites to connect with other testers and gather knowledge about testing approaches.
She has experience in test strategy and execution, data mining and data validation. She can be reached on LinkedIn.