Feeds:
Posts
Comments

Archive for the ‘Integration Services (SSIS)’ Category

A very common question we get at timeXtender, is why SSIS has to be installed on the same machine as timeXtender. The easy and straight forward answer to this is “as per design by Microsoft”. If you would like a few more details about this limitation, please read on.

SSIS vs. DTS

Data Transformation Services (DTS) is the predecessor to SSIS. The DTS component was a redistributable client component, and did not require a separate server license to be used from any client machine.

SSIS is not redistributable and this has been the case since it first came out with SQL Server 2005. You need a full server license on any machine on which you want to run a SSIS package. If you follow the reference link below, you will see the following

“Although installing BI Development Studio lets you design packages, running packages outside the development environment requires the installation of Integration Services. Subject to the terms of your SQL Server licensing agreement, you may be able to run Setup again to add Integration Services to your installation. Integration Services is now a server component and is not redistributable to client computers in the same manner that the components required for SQL Server 2000 Data Transformation Services (DTS) were redistributable.”

Remote Execution (Lack of)

There are a number of ways to execute a SSIS package (DTEXEC.EXE, DTEXECUI.EXE and SQL Server Agent). But SSIS does not have a remote execution feature, that would allow execution of a package from a remote machine, such as an application server with tools like timeXtender. We can quite simply conclude, that the machine that executes a SSIS package, needs to have SSIS installed.

SSIS and timeXtender

SSIS does not have to be installed on every machine running timeXtender. SSIS does need to be installed on every machine that needs to deploy and execute SSIS packages. SSIS needs to be installed on the following

  • The machine running the timeXtender scheduler
  • The machine running the timeXtender server (multiple environments)
  • The machine that reads the source schema (unless using ADO.NET)

References

One of the few places the license model has been documented by Microsoft, is in The Product Level Is Insufficient topic. This is the error message you will receive, if you attempt to run a SSIS package without a license.

Read Full Post »

In this fourth and final part in the incremental load series, I will explain the sliding window incremental load strategy. As mentioned in the first post in the series, this is a hybrid incremental load strategy. It can combine source based incremental load, target based incremental load, custom selection rules in scenarios where it might not be possible to reliably identify new, modified or deleted records in the source and at the same time, a full extract from the source as implemented by target based incremental load is too much data.

A couple of reasons when this might come in handy are

  • The field values used for incremental selection are not assigned incrementally
  • The field values used for incremental selection can be manually modified (user entered value vs. system field)
  • Long running transactions on the source system cause transactions to be “committed behind me”

Common to all sliding window load strategies are, that I need to decide a safe window that would prevent gaps in data. As mentioned in other posts, one of the worst things that can happen to a data warehouse is gaps in data, becase this will slowly but surely lead to loss of confident in the data warehouse.

Example 1 – 90 days window from last max date

In this scenario I will be using the invoice date from the last extraction as fix point for a sliding window of 90 days. I will use the subtraction feature under source based incremental load, to dynamically subtract the 90 days from whatever value the last maximum value of the invoice date was.

Using a sliding window like this, is often a very effective way to combine a well performing incremental load strategy, while still ensuring that any records that might be modified “back in time” is still covered. In this scenario, it is assumed that no records are modified further back than 90 days from the last transaction date, as defined by the invoice date field.

Example 2 – 90 days from current date

In this scenario I would like to use the current date as the fix point for a 90 days sliding window. The business rules has established that no invoices can be created or modified further back than 90 days backwards from the current date. I do not know in advance how many of the records in this window may already exist, but it is assumed that processing 90 days of data will be much faster than processing the entire table.

This is using a table level selection rule, that is essentially just a part of the SQL WHERE clause used when extracting from the source.

The custom selection rule above will cause the table SELECT statement to be implemented as

SELECT [HotelID], [Number],[InvoiceDate],[ExchangeRate]
FROM [dbo].[Invoices]
WHERE [InvoiceDate] >=DATEADD(D,-90,getdate())

The selection rule will appear under the source table

Combine source- and target based incremental load strategies

It is also possible to implement a hybrid load strategy using both source based incremental load and target based incremental load. If both load strategies are used on the same table, the following separation of responsibilities is used

  • Source based incremental selection rules controls the extraction, thus determines the contents of the raw table instance
  • Target based incremental keys and events controls the destination, thus determines which records are inserted, updated and deleted

Warning: Be very careful if you consider combining source based incremental load for extraction, and delete event handling on target based incremental load. Keeping in mind that records that does not exist in the raw table will be deleted in the valid table, this can lead to unintended behavior.

Read Full Post »

The ability to track historically accurate values for dimension attributes, is a fundamental requirement in ETL and data warehousing. This is commonly referred to as slowly changing dimensions (SCD), of which the  most common variants are type 1 and 2. If you’re not already familiar with the concept of SCD, the books, articles and design tips from the Kimball Group are highly recommendable.

In short, the techniques used in handling changes are

Type 1: Overwrite the attribute
Type 2: Add a new dimension row

In a pure type 1 scenario, you will only have one instance of a record, with the attributes being overwritten as changes are brought in from the source systems.

If you’re not already using surrogate keys for your dimensions, now would be a good time to make that change. Once you enable SCD type 2 attributes, multiple instances of the same logical record will exist over time, thus invalidating the business key to be used as key for dimension members.

How to (manually) implement SCD on SQL Server

There are countless ways of implementing on SQL Server, including T-SQL stored procedures, custom components and built-in SSIS components . The predominating implementations are both SSIS based, using either the built-in Slowly Changing Dimension Transformation or the Kimball Method SCD component. The later is a huge improvement over the first one, especially when it comes to performance. This post will not discuss how to configure SCD using these components, but as the documentation and videos at the links above will show, it is a bit complex and requires some experience in both SSIS and T-SQL.

How to enable SCD type 1 and 2 in timeXtender

The default behavior for any table in timeXtender is what we call snapshot load. The table is truncated and fully loaded from the source(s) at each execution, thus no attempt is made to track historical values. But don’t worry, it will take you less than 5 minutes to enable type 1 and/or type 2 changes to be tracked.

To handle type 1 and 2 changes do the following

  1. Enable Slowly Changing Dimensions under Advanced Settings for the table
  2. Define the business (natural) key
  3. Choose type 1 attributes
  4. Choose type 2 attributes
SCD Table Settings

The configuration of the natural key and SCD attributes is done in a single, easy to use dialog

SCD Attribute Settings
And that’s all you need to do to enable SCD type 1 and 2 on a table in timeXtender. Hit the deploy button and the code to handle the actual data will be generated on SQL Server.

Read Full Post »

%d bloggers like this: