Feeds:
Posts
Comments

Archive for the ‘SQL’ Category

In this fourth and final part in the incremental load series, I will explain the sliding window incremental load strategy. As mentioned in the first post in the series, this is a hybrid incremental load strategy. It can combine source based incremental load, target based incremental load, custom selection rules in scenarios where it might not be possible to reliably identify new, modified or deleted records in the source and at the same time, a full extract from the source as implemented by target based incremental load is too much data.

A couple of reasons when this might come in handy are

  • The field values used for incremental selection are not assigned incrementally
  • The field values used for incremental selection can be manually modified (user entered value vs. system field)
  • Long running transactions on the source system cause transactions to be “committed behind me”

Common to all sliding window load strategies are, that I need to decide a safe window that would prevent gaps in data. As mentioned in other posts, one of the worst things that can happen to a data warehouse is gaps in data, becase this will slowly but surely lead to loss of confident in the data warehouse.

Example 1 – 90 days window from last max date

In this scenario I will be using the invoice date from the last extraction as fix point for a sliding window of 90 days. I will use the subtraction feature under source based incremental load, to dynamically subtract the 90 days from whatever value the last maximum value of the invoice date was.

Using a sliding window like this, is often a very effective way to combine a well performing incremental load strategy, while still ensuring that any records that might be modified “back in time” is still covered. In this scenario, it is assumed that no records are modified further back than 90 days from the last transaction date, as defined by the invoice date field.

Example 2 – 90 days from current date

In this scenario I would like to use the current date as the fix point for a 90 days sliding window. The business rules has established that no invoices can be created or modified further back than 90 days backwards from the current date. I do not know in advance how many of the records in this window may already exist, but it is assumed that processing 90 days of data will be much faster than processing the entire table.

This is using a table level selection rule, that is essentially just a part of the SQL WHERE clause used when extracting from the source.

The custom selection rule above will cause the table SELECT statement to be implemented as

SELECT [HotelID], [Number],[InvoiceDate],[ExchangeRate]
FROM [dbo].[Invoices]
WHERE [InvoiceDate] >=DATEADD(D,-90,getdate())

The selection rule will appear under the source table

Combine source- and target based incremental load strategies

It is also possible to implement a hybrid load strategy using both source based incremental load and target based incremental load. If both load strategies are used on the same table, the following separation of responsibilities is used

  • Source based incremental selection rules controls the extraction, thus determines the contents of the raw table instance
  • Target based incremental keys and events controls the destination, thus determines which records are inserted, updated and deleted

Warning: Be very careful if you consider combining source based incremental load for extraction, and delete event handling on target based incremental load. Keeping in mind that records that does not exist in the raw table will be deleted in the valid table, this can lead to unintended behavior.

Read Full Post »

%d bloggers like this: