Pragmatic Works Blog

Next Generation Data Validation, Put an End to Your Bad Data Days

Written by Tim Moolic | Jul 22, 2016

As the COO of a company, Tim understands the important role data plays in a business's success. Whether it's annual reports, forecasting sales or making important decisions for the business's future, Tim relies on data - data he knows is accurate because it's been tested and validated.

Pragmatic Works helps customers all over the world with their data-centric challenges. If it involves data, we’ve pretty much done it. One challenge our customers continue to see, and we experience ourselves, is the random and unexpected bad result. These are results that cause our business users and customers to question the validity of the reports we produce for them. Even with Master Data Management, Data Cleansing and products like DQS, companies continue to wrestle with bad data. But why?

The most common causes of bad data that we see are:

  1. People, Equipment and Process Errors

    These data errors are caused by typos, forgetfulness and equipment failures. This type of failure is challenging to detect because it is unpredictable and often occurs in production after Development and QA have completed their testing.

  2. Manual Testing

    Manually testing production results does not “cause” bad data, but it’s a major reason bad data persists. Manual ad hoc testing is the current standard because testing every production scenario is time consuming, costly and often impossible. Most Developers and QA professionals will agree: when you test 1 of 100 scenarios, there are potentially 99 untested scenarios waiting to bite you (see Real-World Example below).

  3. Third-Party Reliance on SaaS Applications Utilizing APIs

    At Pragmatic Works, we utilize nine different SaaS applications to run our company, with most of them loading a data warehouse for reporting purposes. However, none of the providers notify us when they make changes to their APIs. I added this challenge because we actually experienced an API change that had a serious impact on our accounting system. With the world in the cloud, this is a new and serious bad data risk.
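
One way to catch a silent API change before it reaches the warehouse is to compare each incoming record against the shape you expect. The following is a minimal sketch, not a description of any vendor's API or of our own tooling: the field names, the `schema_drift` function and the sample record are all hypothetical.

```python
# Hypothetical expected shape of one record returned by a SaaS provider's API.
EXPECTED_FIELDS = {"invoice_id": str, "amount": float, "currency": str}


def schema_drift(record, expected=EXPECTED_FIELDS):
    """Return a sorted list of differences between a record and the expected shape."""
    problems = []
    for name, expected_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Fields the provider added or renamed without telling us.
    for name in record.keys() - expected.keys():
        problems.append(f"unexpected field: {name}")
    return sorted(problems)


# Simulated response after an unannounced change: "amount" is now a string
# and "currency" has been renamed to "currency_code".
changed = {"invoice_id": "INV-1", "amount": "19.99", "currency_code": "USD"}
drift = schema_drift(changed)
for problem in drift:
    print(problem)
```

Run on every load, a check like this turns an unannounced API change into a visible failure rather than a wrong number in a report.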

Next Generation Data Validation

We needed a better way to validate our production data and tackle the above problems. I call our approach the "next generation of data validation", for three primary reasons:

  1. The new validation process works continuously against production results, which effectively makes manual testing obsolete by testing 100% of production scenarios as they occur.

  2. The new validation process connects the business user with IT allowing for easy communication between the two roles. It’s the business user who often identifies bad data first, but it’s the IT person responsible for determining the cause and making the fix. Easier and timely communication is required to tackle bad data before it hits a production report.

  3. The new validation process continuously logs and maintains a historical digital record to confirm our data is correct over time. Not only does this CYA, but it’s often a requirement for compliance.
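
To illustrate the third point, here is a rough sketch of what a timestamped validation record might look like. It is an assumption-laden example, not how LegiTest stores results: the `run_check` and `append_to_log` functions, the field names and the "sales-ops" owner are hypothetical, and an in-memory buffer stands in for a file opened in append mode.

```python
import io
import json
from datetime import datetime, timezone


def run_check(name, actual, expected, owner):
    """Compare an actual production value to its expected value and describe the result."""
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "check": name,
        "owner": owner,  # the business user who understands this data
        "actual": actual,
        "expected": expected,
        # A missing expected value is treated as a failure, not a pass.
        "passed": expected is not None and actual == expected,
    }


def append_to_log(result, log):
    """Append one result as a JSON line; earlier entries are never rewritten."""
    log.write(json.dumps(result) + "\n")


log = io.StringIO()  # stand-in for a real append-only log file
append_to_log(run_check("commission_rate", 0.05, 0.05, "sales-ops"), log)
append_to_log(run_check("commission_rate", 0.05, None, "sales-ops"), log)

# Reading the log back gives the historical record of every check.
history = [json.loads(line) for line in log.getvalue().splitlines()]
print(len(history), "checks recorded")
```

Because each entry carries a timestamp and an owner, the same log serves both the compliance record and the conversation between the business user and IT.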

Real-World Example: Commission Validation Using the New Process

Below is an example of bad data that our new process identified. At Pragmatic Works, we use automation to continuously validate the production results in our commission and payroll reports. By continuously testing the production results, we capture failures that would have escaped Development and QA testing.   

This is a real-world example of human error and manual test failure (causes 1 and 2 from above). If you look closely at the test result (Figure A below), you will see that the expected result values are blank. This is because the person responsible did not follow the process and neglected to update the database when the new salesperson was hired. Additionally, manual testing would not have caught this error for two reasons:

  1. Manual testing cannot test data that doesn’t exist.

  2. We have 17 salespeople and 314 active SKUs, for a total of 5,338 scenarios (17 × 314) to be tested. It is highly unlikely this salesperson and SKU would have been selected as a sample to be manually tested.
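
The arithmetic behind the second reason, and the kind of exhaustive check that catches the first, can be sketched in a few lines. This is only an illustration under stated assumptions: the salesperson and SKU identifiers, the placeholder 5% rate, the `expected_rate` table and the sample size of 50 are all made up for the example.

```python
from itertools import product

# Hypothetical identifiers matching the counts in the example above.
salespeople = [f"rep{i:02d}" for i in range(1, 18)]  # 17 salespeople
skus = [f"SKU{i:03d}" for i in range(1, 315)]        # 314 active SKUs

# Hypothetical table of expected commission rates, one per (salesperson, SKU).
expected_rate = {pair: 0.05 for pair in product(salespeople, skus)}

# Simulate the process failure: the new hire was never added to the database.
new_hire = "rep17"
for sku in skus:
    del expected_rate[(new_hire, sku)]

# An automated check can walk every scenario and flag missing expected values.
scenarios = list(product(salespeople, skus))
missing = [pair for pair in scenarios if pair not in expected_rate]

print(len(scenarios))  # 17 * 314 = 5338
print(len(missing))    # 314, one per SKU for the new hire

# Chance that a manual sample of a given size (assumed here to be 50)
# happens to include one specific scenario.
sample_size = 50
chance = sample_size / len(scenarios)
print(f"{chance:.2%}")
```

Checking every pair costs the machine almost nothing, while a manual sample of that size would include any one specific scenario less than 1% of the time.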

Figure A 

Improved Communication Between the Business and IT

Let’s face it: it is very difficult to look at data and tell if it is valid if you are unfamiliar with its purpose. Our new validation strategy improves communication by assigning ownership, at the time of test creation, to the people who understand the data. This allows for test results that are easy to interpret by all of the roles involved, and faster remediation.

In the above example, the business user can easily interpret the bad data as missing data (note the blanks under expected results). They can also see the order number, product SKU and salesperson impacted without sending a ticket to support, and if authorized, can easily edit the order before the mistake makes its way into a payroll report.  

The worst-case scenario in my example would be a very unhappy salesperson who missed her mortgage payment because her commission didn’t come in, and a lot of headaches for the IT team. What has been your worst-case “BAD DATA DAY”? Tell me your stories in the comments below. I’d love to see if the above process would have made it easier.

To help ease your data woes, Pragmatic Works has created LegiTest. LegiTest is a comprehensive tool for testing all your data-centric applications, in an easy-to-use, automated platform.