
Archive for July, 2011

With the constantly increasing data volumes that need to be handled in data warehouses, SQL Server data compression can be a good way to save on storage – and perhaps even gain throughput capacity.

As with most of my other posts, the purpose is not to cover the how-to details of a manual implementation in SQL Server, nor the technical differences between row and page compression. The purpose of this post is to make you aware that data compression exists, what it does and how it is implemented in timeXtender.

What is data compression?

At the risk of being too simplistic: if you think of data compression as being to the contents of a database what zip is to the file system, you are pretty close. It does exactly what the name indicates – it compresses the data and thereby saves storage, just like zipping a file.

There are two types of data compression: row compression and page compression. Row compression saves storage by using a variable length storage format even for fixed length data types such as CHAR, and by not storing zero and NULL values. Page compression uses a more complex technique that, among other things, exploits redundancy in the data.

Using data compression

Considering the nature of the two database types, page compression is well suited for data warehouse databases, whereas row compression is better suited for staging databases due to the many changes (updates) normally performed in the data cleansing process.

For ease of use, there are only two parameters in timeXtender that control the use of data compression:

  • Table classification: Any table classified as a large dimension table or a large fact table uses data compression if supported by the target
  • Deployment target: The deployment target for the staging or data warehouse database needs to be an Enterprise Edition of SQL Server 2008 or later

And just to clarify: a staging database table will always use row compression, whereas a data warehouse table will always use page compression.
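timeXtender generates and runs the required statements for you during deployment, but for the curious, a minimal sketch of the underlying T-SQL looks like this (the table names are hypothetical):

-- Row compression for a staging table
ALTER TABLE dbo.Customer_Staging REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page compression for a data warehouse table
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);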

[Screenshot: the Data Compression setting]

[Screenshot: the Deployment Target setting]

Storage savings

Our field experience shows the following average storage savings:

Compression type        Average storage savings
Row (staging)           40-60 %
Page (data warehouse)   50-90 %
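If you want to estimate the effect on your own tables before enabling compression, SQL Server 2008 ships with a stored procedure for exactly that purpose. A minimal example (the table name is hypothetical):

-- Estimate storage savings for page compression on a single table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactSales',
    @index_id = NULL,          -- all indexes
    @partition_number = NULL,  -- all partitions
    @data_compression = 'PAGE';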

In addition we have seen performance gains that can be explained by the reduced disk IO. If the general bottleneck is IO and there is available memory on the server, you can expect a performance gain.

A word of caution

When you first use data compression in a database, SQL Server changes the internal storage of the database. This will prevent you from restoring a backup of a compression enabled database onto a SQL Server instance running an edition that does not support data compression.
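If in doubt, you can ask the database itself which edition-specific features it uses before attempting a restore elsewhere – data compression will be listed here once it is in use:

-- Lists Enterprise Edition features persisted in the current database (SQL Server 2008+)
SELECT feature_name, feature_id
FROM sys.dm_db_persisted_sku_features;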

Software prerequisites

Data compression is a feature that requires SQL Server Enterprise Edition, and thus also requires an Enterprise Edition license of timeXtender. It is supported from SQL Server 2008 and timeXtender version 4.5.
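A quick way to verify the edition and version of a target instance:

-- Returns e.g. 'Enterprise Edition' and a 10.0.x build number for SQL Server 2008
SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;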

References

If you are interested in the behind-the-scenes details of data compression, SQL Server Central is a good source. Search for data compression (and make sure not to confuse it with backup compression).


Inspired by a question I got the other day about data types and data type conversion in timeXtender, I guess it is about time to cover that subject in this blog. Although the title of this post talks about data type conversion, I will spend just as much time explaining data type handling in general. This includes a general description of how source objects are transformed into the common, standardized SQL Server structure we implement with our code generation.

Staging database

With the exception of custom tables, every table in the staging database is derived directly from its source table, making it logically a 1:1 copy of the source.

Object names (tables and fields) are kept as in the source, unless they contain characters that are illegal in the destination SQL Server. Any such character will normally be replaced with an underscore to make the object name valid.

The data type of the source will automatically be converted into the SQL Server equivalent chosen by timeXtender. With the exception of ODBC and text file data sources, this data type mapping cannot be changed.

Custom tables and fields

In both staging and data warehouse databases, it is possible to add custom tables, just as it is possible to add custom fields to any table. For those fields you decide the data type, using what I will call abstract data types. By this I mean that you do not specify actual SQL data types; instead, the parameters you specify determine the SQL data types used behind the scenes.

Tip: If you hover the mouse over a field in the tree view in timeXtender, a tooltip will show the actual SQL data type for the field.

 

SQL data type    Field parameters
varchar(50)      Data Type: varchar, Length: 50
nvarchar(20)     Data Type: nvarchar, Length: 20
decimal(38,3)    Data Type: decimal, Precision: 38, Scale: 3

 

Staging to data warehouse mapping

When a field is first copied to a data warehouse table, it inherits the data type from its source, normally a staging database. Unlike the fields derived from a data source table, the data type of these fields can be changed. To edit the data type of a field, simply right-click the field and choose Edit.

The following shows a field that was copied from a staging field of type varchar and has been changed to type datetime.

[Screenshot: the Data Type Edit dialog]

 

Although the data type conversion is not visually clear in this screenshot, the following shows a field of data type varchar being copied from a staging database table into a field of type datetime in a data warehouse table.

When a field is copied from a staging database to a data warehouse like this, most implicit conversions are done automatically.

[Screenshot: data movement between fields of different data types]

 

Mapping to another field

The following shows a custom transformation used to copy a field from a character based data type (varchar) to a field of type datetime. In this example it is a simple assignment of a value, but you can also use CONVERT and CAST expressions if needed, as sketched below.

[Screenshot: a custom transformation]
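If the varchar field holds dates in the ISO yyyymmdd format, an explicit style code makes the conversion unambiguous. A minimal sketch, assuming a hypothetical field name:

-- Style 112 = ISO yyyymmdd, e.g. '20110705' becomes 2011-07-05 00:00:00
CONVERT(datetime, [OrderDateText], 112)

-- CAST works as well, but relies on the session's language and date format settings
CAST([OrderDateText] AS datetime)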

 

Same field transformation

Although the transformations that can be done on a field are virtually unlimited, the one limitation that does exist is that the data type cannot be changed. As an example, the following shows a date only transformation on a datetime field. It removes the time part of a datetime value such as 2011-07-05 22:54 and stores it in the same field as 2011-07-05 00:00, still persisted as datetime.

[Screenshot: the DateOnly transformation]
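For reference, a classic T-SQL idiom for this kind of date only logic – not necessarily the exact code timeXtender generates – looks like this (the field name is hypothetical):

-- Strips the time part: 2011-07-05 22:54 becomes 2011-07-05 00:00:00
DATEADD(DAY, DATEDIFF(DAY, 0, [TransactionDate]), 0)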


I am sure that most data warehouse developers have experienced the challenge of answering a common question like “what data source fields are we using for this measure?” and sometimes you can even add “and what are we doing to it in the ETL process?”

If you are confident that you can just turn to your complete, always up to the minute documentation, please feel free to skip the rest of this post. If you are one of the many others who either do not have complete end to end documentation, or whose documentation might not be completely updated, the tracing feature of timeXtender will come in handy.

Using the trace down feature

To answer a question like the above for any measure in timeXtender:

  1. Right-click the measure
  2. In the context menu, select the menu option Tracing.Trace Down.Full Trace

This will perform a full trace from the measure, all the way through to every data source field being used. As you may have noticed on the context menu, you can also limit the trace to 1, 2, 3 or a manually specified number of levels.

A data lineage diagram explained

The following data lineage diagram is the result of a trace down from a business function based measure. The diagram shows:

  1. The standard measures used as parameters
  2. The data warehouse fields used for the standard measures
  3. The staging fields feeding the data warehouse table (fields)
  4. That the Currency Rate field is looked up on another table
  5. The source tables and fields being used

[Screenshot: the data lineage diagram]

A note on scripts

The trace feature does not parse SQL or MDX scripts.

Transformations performed in user defined functions, stored procedures and script actions do not show up in the trace diagram.

Custom transformations do show up as a transformation on the field to which they are attached, but the trace does not follow any other fields used in the SQL transformation.

Always use business functions when possible

The ability to produce informative traces is yet another reason why you should always use business functions for calculated measures. Even simple calculations like adding two measures will benefit from tracing, in addition to all the other benefits outlined in this post about business functions.


One of the most challenging parts of building cubes is mastering the MDX language, as it really does not come naturally to most of us. It is a very powerful language with almost unlimited possibilities when it comes to creating calculations. I normally say that “you can split the atom in MDX, you just need to figure out how to do it”. And that “how to do it” is the hard part – learning to master the language.

I work with a lot of very smart people in our network of partners and customers – people with years of experience in ETL, data warehousing and cubes. Only very few of them would consider themselves experts, or even proficient, when it comes to MDX.

I am no different from most of these people, so using myself as an example, I can divide my experience with MDX calculations into the following:

  • Simple calculations like year to date, adding/dividing/subtracting a couple of measures. The syntax is easy, I can write them off the top of my head.
  • A few frequently used, more advanced measures, using MDX functions that require more than a single parameter. The syntax is known/remembered, I can write them off the top of my head.
  • A lot of previously used calculations of diverse complexity, for instance combining multiple MDX functions. I remember part of the syntax, but probably not the exact sequence of parameters and most likely not all the functions being used. I will look at previous projects, copy the calculation and adjust it according to my current need.
  • New semi advanced to advanced calculations. I may have a solution strategy, but will most certainly not remember the syntax for all the functions needed, and may need functions I have never used before. I will look in my books and use my friend Google. Using a trial and error approach, I will get the calculation right and be proud :-)

At timeXtender the product vision has always been driven by our commitment to speed in every aspect of building and maintaining data warehouses and cubes. Clearly we needed to find a more efficient way to deal with MDX calculations – to make it easier to reuse a calculation once sweat, blood and tears have been invested in finding the right, well performing formula.

So what exactly is a business function?

A business function is best considered as a parameterized wrapper around an MDX statement used as a calculated measure in a SQL Server Analysis Services (SSAS) cube.

Use business functions to:

  • Encapsulate complex MDX and expose it using parameters
  • Enable easy reuse of calculations within a single project, across multiple projects and by multiple people
  • Ensure consistent use of calculations (a single, reusable definition)
  • Lower the barrier for using MDX
  • Eliminate broken MDX caused by e.g. renaming a dimension or measure

Create a business function

The basis for any business function is a working MDX formula that will be entered and parameterized in the business function definition. I will walk through the process using the following example.

I have created a calculated measure for a sales cube, which is used to calculate how every member contributes to its parent. In this specific example, I am using a hierarchy on a customer dimension that shows my customers via a drill down path of all customers -> country -> city -> customer name.

It will show, for example, the percentage with which each country contributes to the total. When looking at a specific country, it will show how each city in that country contributes to the country total.

The MDX statement for the measure looks like this:

// Returns each member's share of its parent's total; at the top level,
// where the parent tuple evaluates to 0, it returns 1 (100 %)
IIF(([Measures].[Net Amount], [Customers - Invoice].[Customers by geography].CurrentMember.Parent) = 0,
    1,
    [Measures].[Net Amount]
    /
    ([Measures].[Net Amount], [Customers - Invoice].[Customers by geography].CurrentMember.Parent)
)

As it turns out, this is actually quite a generic calculation that can be used for any measure with any dimension hierarchy. Consequently, I will create a reusable business function that can be used every time I need a measure showing how a value contributes to the totals within a dimension.

A business function definition

I will create a business function named Relative Dimension Contribution, using the MDX statement above. By examining the statement, I have identified two parameters:

  • The measure (value) for which the calculation is done
  • The dimension hierarchy used for the calculation

The parameters were renamed from the concrete statement above, using more generic names suitable for a standard, reusable function. I have also added a description to the business function, making it easier for the next person to understand how to use it.

[Screenshot: the business function definition dialog]

As you may have noticed, there is a library path at the bottom of the business function definition dialog. This allows me to organize the business functions into a logical structure, again to make them easier to reuse for myself and others.
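I will not reproduce timeXtender's actual parameter markup here, but conceptually the stored definition is the original MDX with the concrete names replaced by the two parameters – roughly along these lines, where the {Measure} and {Dimension Hierarchy} placeholders are illustrative rather than the real syntax:

// Relative Dimension Contribution – conceptual, parameterized sketch
IIF(({Measure}, {Dimension Hierarchy}.CurrentMember.Parent) = 0,
    1,
    {Measure}
    /
    ({Measure}, {Dimension Hierarchy}.CurrentMember.Parent)
)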

Using a business function

Now that I have a business function, I will use it as the basis for a new measure, which is needed to show another measure, Physical Value, over the Products by group hierarchy on the Products dimension.

The steps required to add a business function based measure are:

  1. Right-click on the measures folder in a cube. On the context menu, select the Add business function option
  2. Find the appropriate business function in the list
  3. Specify a name for the new measure
  4. Map parameters to the business function

[Screenshot: the business function with its parameters mapped]

Sensitivity in object names

Business functions utilize the meta model of timeXtender through a simple markup language that replaces the parameters in the script with the values specified for them. This not only makes it significantly easier to change a parameter (without having to carefully walk through the code for every small change); it also means that a simple, common operation like renaming a measure, dimension, hierarchy etc. does not break the MDX code.

In comparison, consider what would happen to all manually written calculated measures should someone rename any element used in the MDX statement. A calculated measure is simply a script persisted as text, and is thus very sensitive to every small change made to object names.

Exporting business functions

Please be aware that business functions are stored at the repository level but consumed at the project level, making a business function available to all projects within a repository. They can be moved using the export and import features, which export to an XML document.

Business function library

At timeXtender we have utilized the power of business functions and created a library of business functions based on our own field experience and valuable input from partners and customers. The library currently contains 28 reusable business functions and can be obtained via the partner portal. Customers with a maintenance agreement can get the library through their timeXtender implementation partner, or may request it by email at support@timeXtender.com.

The following shows the import dialog for the business function library:

[Screenshot: the business function library import dialog]

