
Archive for June, 2012

My last post touched the sensitive topic of Data Vault vs. Star Schemas and, as expected, it was like sticking my hands into a hornets' nest. It is clearly a topic that stirs up some strong emotions. The traffic on my blog went crazy (probably aided by Dan Linstedt's post on his blog), and I am expecting a call from WordPress trying to sell me an upgrade to a dedicated server :-)

As one of the comments on the last post stated, there seems to be agreement that star schemas are best for reporting. If the Data Vault is to be considered the back-end Data Warehouse, I guess we would then be discussing the work done between the data sources and the dimensional model.

A couple of the keywords from the discussion were repeatable, automatable and generatable, along with words like auditable, flexibility and adaptability. I do not know many organizations that would not like to have all of the above from everything they do.

Assumption: A dimensional model can be automated too

I will try to dig a little deeper into understanding Data Vault and the added value over a pure Kimball star schema dimensional model. The assumption I am making here is that Data Vault is not seen as the only way to automate the generation of a dimensional model.

At timeXtender we promote an agile approach (methodology) and automation to speed up implementation. The latter is what our software does. It is a metadata-driven design and build tool, centered around a dimensional model, including staging, data cleansing and optionally also cubes and much more.
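To make "metadata-driven" a bit more concrete, here is a minimal sketch of the general idea: describe a table once as metadata, then generate both the staging DDL and the load statement from that description, repeatably. The names and structure below are purely illustrative and are not how timeXtender is actually implemented:

```python
# Minimal sketch of metadata-driven generation (hypothetical example).
# One metadata record describes a table; DDL and load SQL are generated
# from it, so the build is repeatable and regenerable.

TABLE_META = {
    "name": "Customer",
    "source": "crm.dbo.Customer",
    "columns": [
        ("CustomerKey", "INT"),
        ("CustomerName", "NVARCHAR(100)"),
        ("Country", "NVARCHAR(50)"),
    ],
}

def staging_ddl(meta):
    """Generate a CREATE TABLE statement for the staging area."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in meta["columns"])
    return f"CREATE TABLE stg.{meta['name']} (\n  {cols}\n);"

def load_sql(meta):
    """Generate the INSERT ... SELECT that copies data from the source."""
    cols = ", ".join(name for name, _ in meta["columns"])
    return (f"INSERT INTO stg.{meta['name']} ({cols})\n"
            f"SELECT {cols} FROM {meta['source']};")

print(staging_ddl(TABLE_META))
print(load_sql(TABLE_META))
```

The point is simply that once the design lives in metadata, regenerating the warehouse after a change is a re-run, not a rewrite.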

There also seems to be agreement on automation and agility, both of which aim at delivering fast and frequently to the business.

Data Vault: You never have to go back to the source

This is a very frequent statement from the Data Vault community.

I am going to assume that behind that statement lie a number of assumptions: that the entire source model was discovered in the first attempt, that no new sources have been added, that the sources have not been modified since last time and, perhaps most importantly, that a change in the reporting model is not related to a process change in the source system(s).

Data Vault: You can change your mind without impacting the data model

This is also sometimes presented as Data Vault being the only forgiving data model, one that allows you to make mistakes and change your mind. One of the common examples used for this is a change of keys in a Kimball Type 2 history-enabled dimension. If you have such a dimension based on a Data Vault, you can change the keys and simply recreate the dimension with a historically correct result.

I completely agree that this is one of the challenges in a Kimball model, as it is difficult to recreate history, especially because the Data Warehouse is often the only place this history is tracked. This is probably also the one scenario where I see Data Vault modeling adding true value to dimensional modeling.
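As a rough sketch of why this works: if the full attribute history is kept per business key (as a Data Vault satellite does), then changing the dimension's key definition just means replaying that history with the new definition. The names and structures below are hypothetical, for illustration only:

```python
from collections import defaultdict
from datetime import date

# Illustrative satellite-style history (hypothetical names/structure):
# every attribute change per business key is stored with its load date,
# so the full history survives any later redesign of the dimension.
history = [
    {"customer_id": 42, "name": "Acme",    "country": "DK", "load_date": date(2011, 1, 1)},
    {"customer_id": 42, "name": "Acme",    "country": "BE", "load_date": date(2011, 6, 1)},
    {"customer_id": 7,  "name": "Initech", "country": "US", "load_date": date(2011, 3, 1)},
]

def rebuild_type2(history, key_fn):
    """Replay raw history into Type 2 rows for a given key definition.

    key_fn defines the dimension's natural key; change it and re-run to
    recreate the dimension with historically correct validity ranges.
    """
    by_key = defaultdict(list)
    for rec in sorted(history, key=lambda r: r["load_date"]):
        by_key[key_fn(rec)].append(rec)

    rows = []
    for key, recs in by_key.items():
        for current, nxt in zip(recs, recs[1:] + [None]):
            rows.append({
                "natural_key": key,
                "valid_from": current["load_date"],
                # open-ended for the latest version of each member
                "valid_to": nxt["load_date"] if nxt else None,
                "name": current["name"],
                "country": current["country"],
            })
    return rows

# Original key: customer_id alone. Changed your mind? Re-run with e.g.
# key_fn=lambda r: (r["customer_id"], r["country"]) and the dimension is
# rebuilt from the same history, with no round trip to the source.
for row in rebuild_type2(history, key_fn=lambda r: r["customer_id"]):
    print(row)
```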

But again, I am assuming that some of the same assumptions as above lie behind it. I have also dealt with changes like that countless times, both in my time as a customer and from the vendor side of the ecosystem. In my experience, a change like that almost always happens concurrently with a change of a business process that involves the way the source is used, and very often also with a change or extension of the source system, such as adding new fields or tables.

Where do the hours go in Data Vault?

What I am trying to get at with the above is that, in my humble opinion, there are some major and understated assumptions behind some of the key benefits of the Data Vault model.

So what I am wondering about is this: with the level of automation and the ease with which "round trips between the Data Vault and the star schema" are made, where do the hours go in Data Vault?

A word on comments

I really appreciate the discussion my last post caused, and I have of course also read the posts that Dan published on his blog afterwards. Once again I welcome your thoughts and comments, but I would appreciate keeping the tone mature. If you cannot comment in a non-hostile, non-demeaning way, I will not waste my time responding.


I am back in Belgium to deliver some training and do a bit of consultancy. Since that leaves me in yet another hotel room, I might as well share some thoughts on something I have noticed over the past year or so, and which I also noticed during my last visit here.

Thus, in continuation of my last post, I will share one more observation from the Data Warehouse automation event in Belgium: Data Vault is a hot topic in the Benelux region and was part of almost every presentation. This is in distinct contrast to what I experience during my travels in the rest of Europe, the USA, Canada, South Africa, Iceland and India, where it is hard to find a customer, BI consultant or even anyone at a BI conference who has ever heard of Data Vault.

Data Vault, a secret revolution?

I was first introduced to Data Vault a couple of years ago, and I have to admit that I did not really see the magic. I even have to admit that, although I now see its place in some scenarios after intense reading and seeing presentations at various conferences, I still tend to see it as overhead that adds little value in most cases.

But before you read on, let me state something that is apparently quite important when it comes to Data Vault: if you have not made the effort to become a certified Data Vault modeler, you know nothing. It seems they share some secrets during the training that make everything clear.

Frequently I have heard people in the Data Vault community explain the lack of wider acceptance of Data Vault by saying that people talk and write less about Data Vault than about Kimball, and that Data Vault has only been around for 10-11 years, while Kimball has had many more years of speaking and marketing. I am personally not buying that explanation. If something positioned as being as revolutionary to data warehousing as Data Vault has been around for that long without major acceptance, it might not be that revolutionary at all.

And when you engage in discussions, critical questions tend to be answered with the "you are not certified…" line again, if you do not instantly agree with the conceptual one-liner benefits. I have even found a discussion with Dan Linstedt where he makes the same argument. He is pretty much saying: get certified or shut up. This behavior is of course not exactly encouraging anyone to write or talk about it, so I wonder why they wonder.

And with this long, disclaimer-like introduction, I just want to make it clear that I am not a certified Data Vault modeler. This will of course leave me wide open to attacks from the Data Vault believers, since I am clearly not a big fan.

So what is Data Vault?

There is a really interesting discussion thread on the Kimball forum, where Dan Linstedt is personally engaged in a discussion about Data Vault vs. Dimensional Modeling. It is a pretty long thread, with a lot of assumptions and technical details flying through the air. And by the way, Thomas Kejser, who is really giving Linstedt a run for his money, is probably one of the most experienced and knowledgeable people in databases, BI, DW and SQL Server specifically.

You can find a definition of Data Vault here, but I think this statement from the above discussion thread says it all:

One point that is very difficult to argue over is end-user access. Why? Because the Data Vault is built to be a back-end data warehouse, that feeds star schemas, cubes, and other data modeled data marts on the front end. With this in mind, the Data Vault is tuned for high speed loading, AND high speed querying (but, of course – it is NOT tuned for ad-hoc querying) – it is tuned for specific processes (consistent, repeatable processes that use the same indexes and the same data sets over and over again) to produce the front-ends.

So to put it simply, Data Vault is something you do in addition to the dimensional model. It sits between the data sources and the dimensional model, which makes it hard to consider it anything other than overhead on top of the existing practices.
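For readers who have never seen one, a Data Vault stores business keys in hubs, relationships in links and descriptive history in satellites, and the star schema is then derived from those tables downstream. A minimal sketch of the three building blocks, with names and fields simplified for illustration rather than taken from any full specification:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch of the three Data Vault building blocks
# (names and fields are simplified, not a complete spec).

@dataclass
class Hub:
    """Holds nothing but the business key and when/where we first saw it."""
    business_key: str
    load_date: date
    record_source: str

@dataclass
class Link:
    """Connects two or more hubs, i.e. a relationship or transaction."""
    hub_keys: tuple          # business keys of the connected hubs
    load_date: date
    record_source: str

@dataclass
class Satellite:
    """Descriptive attributes for a hub or link, historized by load date."""
    parent_key: str
    attributes: dict
    load_date: date
    record_source: str

# One customer, one product, and the order linking them, with the
# customer's descriptive history attached separately:
customer = Hub("CUST-42", date(2012, 6, 1), "crm")
product  = Hub("PROD-7",  date(2012, 6, 1), "erp")
order    = Link(("CUST-42", "PROD-7"), date(2012, 6, 2), "erp")
cust_sat = Satellite("CUST-42", {"name": "Acme", "country": "BE"},
                     date(2012, 6, 1), "crm")
# Dimensions and facts are generated from these tables downstream,
# which is the "in addition to" layering described above.
```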

Data Vault, agile and speed of delivery

I can (and have) built and loaded Star Schemas from a Data Vault within 45 minutes of receiving a 2 page requirements document filled out by business users / customers.

This statement from Mr. Linstedt, in the same thread as above, explains a lot when it comes to agile and speed of delivery. At the DW automation conference, the power of Data Vault was underlined by delivery not in days or weeks, but actually in minutes. I guess in all fairness they should have added: we can deliver in minutes, after the Data Vault has been built and loaded.

There is actually nothing unique about being able to deliver that fast. Using timeXtender as my rapid design and deployment tool, I frequently build and load star schemas from the original data source(s), via a staging area, within 45 minutes of receiving requirements from the business.

And just to be very clear about the assumptions and prerequisites: I only need a working connection to the data source(s) and someone with source system domain knowledge to help find the information.

What do you think?

In the next post, I will let you know about the one situation where I think Data Vault modeling adds value to dimensional modeling. In the meantime, please feel free to share your thoughts on Data Vault vs. dimensional modeling.

The war between Inmon and Kimball has been going on for years (it seems like Inmon is the only one still fighting). Now we have Inmon vs. Kimball vs. Data Vault. With Inmon's kind words about Data Vault, it appears it might even be Inmon and Data Vault in the red corner against Kimball in the blue corner?

But what do you think: is Data Vault all buzz and no real usage, or is it the magic bullet we have all been waiting for? Is it replacing Kimball, or does it complement Kimball? Please let me know what you think.

P.S. I do hope that someone from the Data Vault community finds this post via a search engine. If you do, it would be refreshing to see some constructive information about the benefits of Data Vault, applied to real-world scenarios with multiple data sources, disparate keys and what have you, integrated into a data warehouse.

