After business rules and algorithms are applied to a data set for the first time, whether for a Key Performance Indicator (KPI) assessment process or for any other analytic purpose, a quality review should occur. A data quality review specific to the KPI assessment process will examine whether the business rules were outlined correctly, the fidelity of the algorithm used, and the point in time of the data.
After each team member has had the opportunity to generate the values associated with the KPI in question, the team should come together to deliberate on each other’s findings. The first question is whether everyone generated the same value. If not, the values should be compared. Is there a large difference? Document the value generated by each team member, the order in which each team member completed the process, and the length of time each team member took to calculate the value. Once this information has been collected for every team member, the team should meet to begin the data quality review process by examining the business rules used.
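The record-keeping described above can be sketched in a few lines of Python. This is only an illustration: the `Submission` fields, the analyst names, the values, and the `tolerance` parameter are all hypothetical and are not taken from the example data set.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One team member's result for the KPI under review (hypothetical fields)."""
    member: str
    kpi_value: float       # value the member calculated for the KPI
    completion_order: int  # 1 = first to finish
    minutes_taken: float   # time spent calculating the value


# Made-up submissions, for illustration only.
submissions = [
    Submission("Analyst A", 0.50, 1, 42.0),
    Submission("Analyst B", 0.50, 3, 65.0),
    Submission("Analyst C", 0.47, 2, 51.0),
]


def summarize(subs, tolerance=1e-9):
    """Report whether all members agree and how far apart the values are."""
    values = [s.kpi_value for s in subs]
    spread = max(values) - min(values)
    return {"all_agree": spread <= tolerance, "spread": spread}


summary = summarize(submissions)
print(summary["all_agree"], round(summary["spread"], 2))
```

A spread of zero means the team can move straight to confirming the rules. Anything larger is the cue to compare how each member built their subset of data.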
Business Rules Outlined Correctly
A great way to begin the discussion for each component of the quality review process would be for the team to compare notes and see whether each member arrived at the same value for the KPI under discussion. If…
For every red car produced in the 2017 model year, the amount of carbon dioxide emitted will be 50% lower than the same red model of car manufactured from the 2016 model year.
The data set used to calculate the example KPI above contains multiple vehicle types and colors, so the team should focus on how the subset of data was created. The example KPI is specific to two categorical variables: vehicle type (car) and color (red). As a result, only red cars should be selected from the data set for any calculations related to this example KPI. As long as these two filters are the only ones applied, the order in which they are applied should not matter (see Figure 1).
Figure 1: Portion of Example Data Set Dealing with Vehicle Attributes.
It does not matter whether the data set is filtered by the color of the vehicle first and then by vehicle type (results shown in yellow in the column named “Color First”). The same results occur when the data set is filtered by vehicle type first and then by color (results shown in blue in the column named “Car First”). The label YES means that the line of data is included.
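To see why the order of these two filters does not matter, here is a minimal Python sketch. The rows and the column names (`vehicle_type`, `color`) are hypothetical stand-ins for the table in Figure 1, not the actual data.

```python
# Hypothetical rows standing in for the vehicle attributes in Figure 1.
vehicles = [
    {"id": 1, "vehicle_type": "car", "color": "red"},
    {"id": 2, "vehicle_type": "truck", "color": "red"},
    {"id": 3, "vehicle_type": "car", "color": "blue"},
    {"id": 4, "vehicle_type": "car", "color": "red"},
    {"id": 5, "vehicle_type": "suv", "color": "black"},
]


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


# "Color First": filter by color, then by vehicle type.
color_first = filter_rows(filter_rows(vehicles, "color", "red"), "vehicle_type", "car")

# "Car First": filter by vehicle type, then by color.
car_first = filter_rows(filter_rows(vehicles, "vehicle_type", "car"), "color", "red")

# Both orders keep exactly the same rows.
same_subset = [r["id"] for r in color_first] == [r["id"] for r in car_first]
print(same_subset)
```

Each filter only asks a yes/no question of a single row, so applying both is the same as requiring both conditions at once, whichever comes first.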
A problem arises if the data set was previously used for another KPI and a filter from that work was applied and never removed, as shown in Figure 2. Here the table includes an additional column for the type of interior, either leather or cloth. If the interior filter is not removed from the data set before calculating the example KPI, the result will be wrong. This problem is highlighted in green in Figure 2. If the data set is first filtered by interior type (which has nothing to do with the example KPI), then the subset of data created to analyze the example KPI will not contain all the data needed. I encourage you to download this example table and play around with these different filtering orders.
Figure 2: Portion of Data Set Dealing with Vehicle Attributes including Distractors
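The leftover-filter problem can also be reproduced with a small Python sketch. As before, the rows, the column names, and the `select` helper are hypothetical and only mimic the structure of Figure 2.

```python
# Hypothetical rows mimicking Figure 2, which adds an interior column.
vehicles = [
    {"id": 1, "vehicle_type": "car", "color": "red", "interior": "leather"},
    {"id": 2, "vehicle_type": "car", "color": "red", "interior": "cloth"},
    {"id": 3, "vehicle_type": "truck", "color": "red", "interior": "leather"},
    {"id": 4, "vehicle_type": "car", "color": "blue", "interior": "cloth"},
    {"id": 5, "vehicle_type": "car", "color": "red", "interior": "cloth"},
]


def select(rows, **criteria):
    """Keep the rows that match every column=value pair in `criteria`."""
    return [row for row in rows if all(row[k] == v for k, v in criteria.items())]


# Correct subset for the example KPI: red cars only.
expected = select(vehicles, vehicle_type="car", color="red")

# A filter left over from a different KPI (leather interiors) is still active.
leftover = select(vehicles, interior="leather")
actual = select(leftover, vehicle_type="car", color="red")

# Red cars that silently dropped out of the analysis.
missing_ids = sorted({r["id"] for r in expected} - {r["id"] for r in actual})
print(len(expected), len(actual), missing_ids)
```

Comparing the row count of the subset against a fresh filter of the full data set, as done here, is one simple way for the team to catch a filter that was never cleared.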
In Part 2 of this Blog topic we examine the fidelity of the algorithm used and the point in time of the data.