Testing has long been accepted in the software development world as the key to quality. Testing ensures code behaves as designed, delivers the expected results even after subsequent changes to the original code base occur and that code changes from one developer do not adversely affect the existing code base. Done correctly, it also verifies that the code handles an appropriate range of input data and manages exceptions and invalid data correctly.
In the world of data-centric development, testing has largely been ignored, or at best performed manually. In this series we will examine how to rectify this. First though we need to begin by defining (or for the data centric world redefining) some terms.
The first and most basic question to answer is “What is Code?” At first glance one thinks of code that is compiled into applications, like C# or VB, or code that resides in databases, such as stored procedures and functions. But this can also be extended to the code that creates all of the components of the database, such as the tables and views. This code is known as the Data Definition Language (DDL) and it can be tested and validated as any other code would.
Extending code to the world of Business Intelligence, this definition includes the ETL processes, such as packages in SQL Server Integration Services, which move data between various sources and targets.
Analytic Cubes, such as SQL Server Analysis Services, also need testing, to ensure measures are calculating correctly and any MDX or DAX code is performing as expected. Reports generated by tools such as SQL Server Reporting Services should likewise be included, validating that the final output to the end user is presented correctly and accurately.
In general, code can be any set of instructions to the computer or an application that produces an output. That output could be something direct (an SSIS package that produces a flat file) or indirect (a stored procedure that updates a table with today’s exchange rates). In both cases, though, there is something measurable that can be examined to determine if the code produced the desired outcome.
For testing, code is usually looked at as units of work – a logical grouping of code into a unit that can be executed to produce a verifiable result.
With the definition of code expanded to include all aspects of a data-centric application, it should be said there is another critical piece that needs to be tested: the data itself. It is vital that the data being moved or manipulated via code be tested to ensure its accuracy and reliability.
There are three types of testing that should be done for data centric applications and processes.
Unit tests focus on small units of work (logical grouping of code) in the system under test, and check assumptions about the behavior of that code. Unit tests are generally implemented by the programmer in conjunction with the development of the code. These are tests the developer runs against the code they have just completed to ensure it works as expected. Once completed, the unit tests are kept to form the backbone for regression testing, and to act as a verifiable check on whether the code performs as expected.
Integration tests generally span multiple units of work, and test to make sure large portions of the system work together correctly. Integration tests take the entire code base from all developers and tests to ensure that all code behaves as desired. Integration testing further ensures that code from one developer does not break code from another developer.
Data Testing performs data based comparisons between systems. This ensures that the data in the target system matches precisely the data in the source. Data testing can also include comparing data to previous versions of the same data, or to an expected outcome, to make sure that the data falls into a normal or approved range of values. The systems being compared may be quite disparate.
For example, the data in a file may be compared against a target database. Likewise, data in a database may be aggregated then compared against the aggregated values in a cube. Databases being compared don’t even need to be of the same type, comparisons between Oracle and SQL Server may be done to ensure data in both systems is identical.
There is one other term often heard in the world of testing: regression testing. Regression testing is simply the running of the above suite of tests after new changes or enhancements have been made to previously existing code.
In this article we redefined testing for the data-centric development world. We identified the definition of code as it relates to data centric development, as well as added data to the list of items to be tested. The different types of testing to be performed were then covered.
In the next article we will examine a ground breaking tool for creating these tests, LegiTest.
Want a sneak peak of LegiTest before it officially launches? Learn more about LegiTest and request your exclusive demo: