Businesses rely on accurate and complete information in order to keep historical records and for analysis that is useful for forecasting and strategizing. The problem is that data may be kept in multiple systems and in different formats. The data may also be changing rapidly, as a result of database transactions, so obtaining a snapshot in time may not be feasible. In addition, data quality may not be high enough to achieve these goals.
The Data Warehouse (DW) addresses these problems by collecting the data into one place in which transactions do not occur, so that the data can be analyzed. In assembling the data in one place, Business Intelligence (BI) tools can then be used for analysis and reporting. Questions of interest to the business can be answered, such as “Which of our vendors is most reliable in terms of delivery time?” or “Who was our top customer this year?”
The process central to Data Warehousing, in which data from one, or often several, sources are brought together, transformed as required, and placed in a central target database is known as Extract-Transform-Load (ETL). This central database, the Data Warehouse, is designed specifically for use in reporting and analysis. The Data Warehouse is not transactional, meaning that the data are relatively static, making quick reporting feasible.
In the first step of the process, the Extraction phase, the data are gathered from the different sources. These sources might be relational databases, spreadsheets, flat files, or other formats. In the Transform phase, a set of rules and/or functions is applied to the extracted data. There are a wide variety of possibilities here, including selecting which columns will or will not be loaded, encoding, translation, deduplication, and deriving new calculated values. In the Load phase, the data are mapped and loaded into the DW. At this stage, tools that manage the ETL process should ensure that referential integrity of the data is maintained.
The philosophy behind testing Data Warehousing is different from transactional database testing as performed, for example, for an e-commerce website. Rather than check that interactions with the front end are correctly reflected on the back end, we want to ensure that data completeness and integrity are maintained in the ETL process. To do this, the tester can craft queries for this purpose. Queries can check, for example:
- Row counts in source and target agree
- No duplicates where they are not allowed
- No truncation of data
- Numeric data are of the correct precision
- Null values, allowed or disallowed
- Uniqueness of primary keys
- Data types of columns are correct
In addition to verifying the ETL process, the reporting layer is checked. From a testing standpoint, this layer is most commonly used in user acceptance testing, although a QA analyst could use BI tools to verify metadata correctness.
This is also the point at which performance should be tested to ensure that the length of time it takes for reports to be returned once requested is reasonable.
As a tester, it is important to understand the system as well as possible, so that risks can be identified, giving focus and priority to the testing effort. So what can a tester do to prepare themselves for Data Warehouse testing?
Get Involved Early and Often
If possible, QA should be involved from the inception of the project, in the design phase. Not only will testers gain a better understanding of the technical details of the project, but they will also hear developer conversations as well as have the opportunity to ask questions from a QA perspective. Sometimes these conversations can lead to the identification of risks, or even reconsideration of design.
Study the Source-Target Mapping
Source-target mapping documentation is the most important piece of information the tester has on a Data Warehouse project. The mapping documentation should include, at a minimum, sources and targets, names for all columns, data types and lengths, calculated values, whether nulls are allowed, and how they will be presented in the reporting layer. The mapping document will also provide the necessary information for crafting the test queries that will be used by QA.
Brush Up on SQL
Depending on the tester’s SQL expertise, it can be a good idea to review writing SQL queries before the project begins. The tester should have an excellent grasp on the use of Select statements and Joins.
Data Warehouses are a mission-critical part of helping modern businesses make strategic decisions. The completeness and integrity of the data are paramount; therefore, the testing must be detailed and focussed. With the right strategies and toolkit, the tester can make great contributions to the success of a Data Warehouse project.
Do you and your team want to learn more about Data Warehouse Testing? Contact us! At PQA, we pride ourselves in being Canada’s leading experts in software testing. We would love to speak with you about how we can support your team.
About Jim Peers
Jim Peers is a QA Practitioner at PQA Testing, with more than sixteen years of experience in software development and testing in multiple industry verticals. After working in the scientific research realm for a number of years, Jim moved into software development, holding roles as tester, test architect, developer, team lead, project manager, and product manager. A trusted technical advisor to clients, Jim creates test strategies, and mentors and assists testers on multiple projects.