I am back in Belgium to deliver some training and do a bit of consultancy. Since that leaves me at yet another hotel room, I might as well share some thoughts on something I have noticed over the past year or so and which I also noticed during my last visit here.
Thus in continuation of my last post, I will share one more observation from the Data Warehouse automation event in Belgium: Data Vault is a hot topic in the Benelux region and was part of almost every presentation. This is a distinct contrast to what I experience during my travels to the rest of Europe, USA, Canada, South Africa, Iceland and India where it is hard to find a customer, BI consultant or even anyone at a BI conference, that has ever heard of Data Vault.
Data Vault, a secret revolution?
I was first introduced to Data Vault a couple of years ago, and have to admit that I did not really see the magic. And I even have to admit, that although I now see its place in some scenarios after intense reading, seeing presentations at various conferences etc., I still tend to see it as overhead that is adding little value in most cases.
But before you read on, let me state something that is apparently quite important when it comes to Data Vault. If you have not made the effort to become a certified Data Vault modeler, you know nothing. It seems like they are sharing some secrets during the training that makes everything clear.
Frequently I have heard people in the Data Vault community explain the lack of wider acceptance of Data Vault with people talking and writing less about Data Vault than of Kimball. And then Data Vault has only been around for 10-11 years, while Kimball has had many more years of speaking and marketing. I am personally not buying that explanation. If something as revolutionary to data warehousing as Data Vault is being positioned, has been around for that long without major acceptance, it might not be as revolutionary at all?
And when you engage in discussions, critical questions tend to be answered with the you are not certified… again, if you do not instantly agree with the conceptual one liner benefits. I have even found a discussion with Dan Linstedt, where he makes the same argument. He is pretty much saying, get certified or shot up. This behavior is of course not exactly encouraging anyone to write or talk about it, so I wonder why they wonder.
And with this long disclaimer-like introduction, I just want to make it clear that I am not a certified Data Vault modeler. This will of course leave me wide open for attacks from the Data Vault believers, since I am clearly not a big fan.
So what is Data Vault?
There is a really interesting discussion thread on the Kimball forum, where Dan Linstedt is personally engaged in a discussion about Data Vault vs. Dimensional Modeling. It is a pretty long thread with a lot of assumptions and technical details flying through the air. And by the way, Thomas Kejser who is really giving Linstedt a run for his money, is probably one of the most experienced, knowledgeable persons in databases, BI, DW and SQL Server specifically.
You can find a definition of Data Vault here, but I think this statement from the above discussion thread says it all
One point that is very difficult to argue over is end-user access. Why? Because the Data Vault is built to be a back-end data warehouse, that feeds star schemas, cubes, and other data modeled data marts on the front end. With this in mind, the Data Vault is tuned for high speed loading, AND high speed querying (but, of course – it is NOT tuned for ad-hoc querying) – it is tuned for specific processes (consistent, repeatable processes that use the same indexes and the same data sets over and over again) to produce the front-ends.
So to put it a bit simple, Data Vault is something you do in addition to the dimensional model. It sits in between data sources and the dimensional model, which makes it hard to consider it as anything else as overhead to the existing practices.
Data Vault, agile and speed of delivery
I can (and have) built and loaded Star Schemas from a Data Vault within 45 minutes of receiving a 2 page requirements document filled out by business users / customers.
This statement from Mr. Linstedt in the same thread as above, explains a lot when it comes to agile and speed of delivery. At the DW automation conference, the power of Data Vault was underlined by not delivering in days or weeks, but actually minutes. I guess in all fairness they should have added – we can deliver in minutes, after the Data Vault has been built and loaded.
There is actually nothing unique in being able to deliver that fast. Using timeXtender as my rapid design and deployment tool, I am frequently building and loading star schemas from the original data source(s), via a staging area, within 45 minutes of receiving requirements from the business.
And just to be very clear on the assumptions and prerequisites, I only need to have a working connection to the data source(s) and someone with source system domain knowledge, to help find the information.
What do you think?
I will let you know in the next post, the one situation where I think Data Vault modeling is adding value to dimensional modeling. In the mean time, please feel free to share your thoughts on Data Vault vs. dimensional modeling.
The war has been going on between Inmon and Kimball for years (it seems like Inmon is the only one still fighting). Now we have Inmon vs. Kimball vs. Data Vault. With Inmon’s kind words about Data Vault, it appears as it might even be Inmon and Data Vault in the red corner against Kimball in the blue corner?
But what do you think, is Data Vault all buzz and no real usage, or is it the magic bullet we have all been waiting for? Is it replacing Kimball or does it compliment Kimball? Please let me know what you think.
P.S. I do hope that someone from the Data Vault community finds this post via some search engine. If you do, it would be refreshing with some constructive information about the benefits of Data Vault, applied to real world scenarios with multiple data sources, disparate keys and what have we not, integrated into a data warehouse.