Thursday, May 8, 2014

Default Records in Dimension tables

At times, we face data quality issues while we are poulating the fact tables in the datawarehouses.

We refresh the data in the DW at various intervals. We found scenarios where we populate dimension tables once a Day and populate fact tables every hour so that organization can analyze what's going on at hourly level if required.

In the above scenario, there are some instances where a dimension record is missing but the fact related to that dimensional record is coming in for processing. (Got a new customer at 10.00 AM, this customer will flow into DW in tomorrow's dimension loading ETL, where as the transaction belongs to the customer will be reaching the DW by 11.30 AM). In this case, the ETL program (look up to get respective dimension key) will yield no record. What do we do in this scenario? We have to reject that record because of FK violation, if so then we will end up getting descripency between OLTP and DW. In this case, we accept the record into fact but associate a default dimensional record from the customer dimension so that we dont miss on the values for reconcilation.

As part of the house cleaning process, or self healing process of bad data, we identify the fact records which got associated to default dimension records and find the right dimensional DW key and update the same in the fact table. This can be a process which runs once in a week to make sure we all the right dimensional association to fact.