Archive for the ‘SCD’ Category

Before getting into the how to details of the implementation, let’s start by defining what early arriving facts are. An early arriving fact is what happens, when a fact record arrives at the data warehouse before its full dimension context. Or in other words, when we lookup the surrogate key in our dimension table using the fact table natural key, the dimension member doesn’t exist yet.

In many cases when we load our data warehouse, we would assume that dimensions and facts would arrive at the same time. Or at least we would expect that dimension members would exist for the fact records being loaded. Although the reasons why we have to be able to handle early arriving facts are many, multiple source systems and/or different load frequences seems to be the dominating factors. We might i.e. be loading our dimensions using a full load strategy with one load schedule, while using an incremental load strategy for our fact table with a close to real-time frequency.

For the sake of this post, consider a company selling products to customers using invoices to capture the sale. We can illustrate using the following simplified star schema

How to enable early arriving facts in timeXtender

In the following we will implement early arriving fact handling for the above scenario. Please be aware of the following default behaviour in timeXtender

  • Unless early arriving fact handling has been explicitly added to a fact table, a missing dimension key is assumed to be an error
  • The relationship settings determines if an error is of severity warning or error (foreign key behavior)
  • Relationship settings are inherited from the project default, when a relationship is created

 The steps 1-5 shown below are for a single dimension, and should be repeated for every dimension.

 Step 1 – Open the Add Related Records Dialog

On the dimension table, right-click and select the Add Related Records context menu option. This will open the dialog shown below, in which the remaining configuration will take place.

Step 2 – Table level parameters

  • In the Create Records from table drop down list, select the source table for the records to be created in this dimension table.
  • Set the Record Condition to Not Exists.
  • Set the Data Destination Table to the Raw instance of the table.

These parameters instructs timeXtender to create distinct keys in the raw instance of the dimension table, when the dimension key from the fact table doesn’t exist in the dimension table.

Step 3 – Field mapping

  • Map the natural key fields from the fact table to the corresponding fields in the dimension table
  • Map any additional attributes from the fact table to the corresponding fields on the dimension table
  • Define default values for remaining fields on the dimension table

Step 4 – Define the match condition

In the Conditions section, define the joins between the fact table and the dimension table. These are used with the record condition to determine if there is a match or not and what to do about it. These joins should be the same as when we perform the lookup of our surrogate key in the dimension table. In the example above the join is made on a single field, the customer key.

Step 5 – Confirm

Click the OK button to confirm and close the dialog.

As it can be seen from the screen shot below, a new group Table Transformations appears in the timeXtender UI under the receiving dimension table.

Dimension processing

Since we created our missing dimension keys in the raw table instance, it is up to the SCD Type 1 and 2 settings for the dimension table to determine what to do with those extra records. In most cases we simply replace the dummy/default/missing attributes on the dimension with the true values coming from the source (type 1 updates). As an alternative we can choose to create the missing dimension keys in the valid table instance, but then we asume full responsibility for the insert i.e. making sure there are no violations of the primary key.

Kimball talks about early arriving facts in his 2004 design tips.

Read Full Post »

The ability to track historically accurate values for dimension attributes, is a fundamental requirement in ETL and data warehousing. This is commonly referred to as slowly changing dimensions (SCD), of which the  most common variants are type 1 and 2. If you’re not already familiar with the concept of SCD, the books, articles and design tips from the Kimball Group are highly recommendable.

In short, the techniques used in handling changes are

Type 1: Overwrite the attribute
Type 2: Add a new dimension row

In a pure type 1 scenario, you will only have one instance of a record, with the attributes being overwritten as changes are brought in from the source systems.

If you’re not already using surrogate keys for your dimensions, now would be a good time to make that change. Once you enable SCD type 2 attributes, multiple instances of the same logical record will exist over time, thus invalidating the business key to be used as key for dimension members.

How to (manually) implement SCD on SQL Server

There are countless ways of implementing on SQL Server, including T-SQL stored procedures, custom components and built-in SSIS components . The predominating implementations are both SSIS based, using either the built-in Slowly Changing Dimension Transformation or the Kimball Method SCD component. The later is a huge improvement over the first one, especially when it comes to performance. This post will not discuss how to configure SCD using these components, but as the documentation and videos at the links above will show, it is a bit complex and requires some experience in both SSIS and T-SQL.

How to enable SCD type 1 and 2 in timeXtender

The default behavior for any table in timeXtender is what we call snapshot load. The table is truncated and fully loaded from the source(s) at each execution, thus no attempt is made to track historical values. But don’t worry, it will take you less than 5 minutes to enable type 1 and/or type 2 changes to be tracked.

To handle type 1 and 2 changes do the following

  1. Enable Slowly Changing Dimensions under Advanced Settings for the table
  2. Define the business (natural) key
  3. Choose type 1 attributes
  4. Choose type 2 attributes
SCD Table Settings

The configuration of the natural key and SCD attributes is done in a single, easy to use dialog

SCD Attribute Settings
And that’s all you need to do to enable SCD type 1 and 2 on a table in timeXtender. Hit the deploy button and the code to handle the actual data will be generated on SQL Server.

Read Full Post »

%d bloggers like this: