My last post touched the sensitive topic of Data Vault vs. Star Schemas and it was, as expected, like putting my hand into a hornets' nest. It is clearly a topic that stirs up some strong emotions. The traffic on my blog went crazy (probably aided by Dan Linstedt's post on his blog) and I am expecting a call from WordPress, trying to sell me an upgrade to a dedicated server :-)

As one of the comments on the last post stated, there seems to be agreement that star schemas are best for reporting. If the Data Vault is to be considered the back-end Data Warehouse, I guess we would then be discussing the work done between the data sources and the dimensional model.

A couple of the keywords from the discussion were repeatable, automatable and generatable, along with words like auditable, flexible and adaptable. I do not know many organizations that would not like to have all of the above from everything they do.

Assumption: A dimensional model can be automated too

I will try to dig a little deeper into understanding Data Vault and the value it adds over a pure Kimball star schema dimensional model. The assumption I am making here is that Data Vault is not seen as the only way to automate the generation of a dimensional model.

At timeXtender we promote an agile approach (methodology) and automation to speed up implementation. The latter is what our software does. It is a metadata-driven design and build tool, centered around a dimensional model, including staging, data cleansing and optionally also cubes and much more.

It seems that there is also agreement on automation and agile, both of which aim at delivering fast and frequently to the business.

Data Vault: You never have to go back to the source

This is a very frequent statement from the Data Vault community.

I am going to assume that behind that statement lie a number of assumptions: that the entire source model was discovered in the first attempt, that no new sources have been added, that the sources have not been modified since last time and, perhaps more importantly, that a change in the reporting model is not related to a process change in the source system(s).

Data Vault: You can change your mind without impacting the data model

This is also sometimes presented as Data Vault being the only forgiving data model, one that allows you to make mistakes and change your mind. One of the common examples used for this is a change of keys in a Kimball Type 2 history-enabled dimension. If you have such a dimension based on a Data Vault, you can change the keys and simply recreate the dimension with a historically correct result.

I completely agree that this is one of the challenges in a Kimball model, as it is difficult to recreate history, especially because the Data Warehouse is often the only place this history is tracked. This is probably also the one scenario where I see Data Vault modeling adding true value to dimensional modeling.
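To illustrate the mechanics, here is a minimal sketch of the idea; the table, column names and data are made up for illustration. As long as the full change history is preserved together with load dates, the way a Data Vault satellite preserves it, a Type 2 dimension can be regenerated from scratch at any time, even after changing your mind about its design:

    from datetime import date

    # Hypothetical satellite-style change history: every change to a customer
    # is kept together with the date it was loaded. Names and data are made up.
    history = [
        {"customer_key": "C1", "segment": "SMB",        "load_date": date(2011, 1, 1)},
        {"customer_key": "C1", "segment": "Enterprise", "load_date": date(2011, 6, 1)},
        {"customer_key": "C2", "segment": "SMB",        "load_date": date(2011, 3, 1)},
    ]

    def rebuild_type2(history):
        """Regenerate a Kimball Type 2 dimension from the preserved change history."""
        dimension = []
        for key in sorted({row["customer_key"] for row in history}):
            # Order each key's changes by load date and derive validity ranges
            changes = sorted((r for r in history if r["customer_key"] == key),
                             key=lambda r: r["load_date"])
            for current, nxt in zip(changes, changes[1:] + [None]):
                dimension.append({
                    "customer_key": key,
                    "segment": current["segment"],
                    "valid_from": current["load_date"],
                    "valid_to": nxt["load_date"] if nxt else None,  # None = current row
                })
        return dimension

    for row in rebuild_type2(history):
        print(row)

The point is not the code itself, but that the dimension becomes a disposable projection of the preserved history, so a design mistake in the dimension destroys nothing.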

But again, I am assuming that some of the same assumptions as above lie behind it. I have also dealt with changes like that countless times, both in my time as a customer and on the vendor side of the ecosystem. In my experience, a change like that almost always happens concurrently with a change of a business process that involves the way the source is used, and very often also with a change or extension of the source system, such as adding new fields or tables.

Where do the hours go in Data Vault?

What I am trying to get at with the above is that, in my humble opinion, there are some major and understated assumptions behind some of the key benefits of the Data Vault model.

So what I am wondering about is this: with the level of automation and the ease with which “round trips between the Data Vault and the star schema” go, where do the hours go in Data Vault?

A word on comments

I really appreciate the discussion my last post caused, and I have of course also read the posts that Dan published on his blog afterwards. Once again I welcome your thoughts and comments, but I would appreciate keeping the tone mature. If you cannot comment in a non-hostile, non-demeaning way, I will not waste my time responding.

I am back in Belgium to deliver some training and do a bit of consultancy. Since that leaves me in yet another hotel room, I might as well share some thoughts on something I have noticed over the past year or so, and which I noticed again during my last visit here.

Thus, in continuation of my last post, I will share one more observation from the Data Warehouse automation event in Belgium: Data Vault is a hot topic in the Benelux region and was part of almost every presentation. This is in distinct contrast to what I experience during my travels in the rest of Europe, the USA, Canada, South Africa, Iceland and India, where it is hard to find a customer, BI consultant or even anyone at a BI conference who has ever heard of Data Vault.

Data Vault, a secret revolution?

I was first introduced to Data Vault a couple of years ago, and I have to admit that I did not really see the magic. And I even have to admit that although I now see its place in some scenarios, after intense reading, seeing presentations at various conferences etc., I still tend to see it as overhead that adds little value in most cases.

But before you read on, let me state something that is apparently quite important when it comes to Data Vault: if you have not made the effort to become a certified Data Vault modeler, you know nothing. It seems they share some secrets during the training that make everything clear.

Frequently I have heard people in the Data Vault community explain the lack of wider acceptance of Data Vault by the fact that people talk and write less about Data Vault than about Kimball. And then there is the argument that Data Vault has only been around for 10-11 years, while Kimball has had many more years of speaking and marketing. I am personally not buying that explanation. If something positioned as being as revolutionary to data warehousing as Data Vault has been around for that long without major acceptance, might it not be that revolutionary after all?

And when you engage in discussions, critical questions tend to be met with the “you are not certified” argument again, if you do not instantly agree with the conceptual one-liner benefits. I have even found a discussion with Dan Linstedt where he makes the same argument. He is pretty much saying: get certified or shut up. This behavior is of course not exactly encouraging anyone to write or talk about it, so I wonder why they wonder.

And with this long, disclaimer-like introduction, I just want to make it clear that I am not a certified Data Vault modeler. This will of course leave me wide open to attacks from the Data Vault believers, since I am clearly not a big fan.

So what is Data Vault?

There is a really interesting discussion thread on the Kimball forum, where Dan Linstedt is personally engaged in a discussion about Data Vault vs. Dimensional Modeling. It is a pretty long thread with a lot of assumptions and technical details flying through the air. And by the way, Thomas Kejser, who is really giving Linstedt a run for his money, is probably one of the most experienced and knowledgeable people when it comes to databases, BI, DW and SQL Server specifically.

You can find a definition of Data Vault here, but I think this statement from the above discussion thread says it all:

One point that is very difficult to argue over is end-user access. Why? Because the Data Vault is built to be a back-end data warehouse, that feeds star schemas, cubes, and other data modeled data marts on the front end. With this in mind, the Data Vault is tuned for high speed loading, AND high speed querying (but, of course – it is NOT tuned for ad-hoc querying) – it is tuned for specific processes (consistent, repeatable processes that use the same indexes and the same data sets over and over again) to produce the front-ends.

So to put it simply, Data Vault is something you do in addition to the dimensional model. It sits in between the data sources and the dimensional model, which makes it hard to consider it anything other than overhead on top of existing practices.
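For readers who have never seen a Data Vault model, the core constructs are hubs (just the business keys), links (relationships between hubs) and satellites (the descriptive attributes, historized with load dates). The following is a deliberately simplified sketch of my own; the entity names are made up, and real implementations carry additional metadata such as record sources:

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class HubCustomer:          # one row per business key, nothing else
        customer_key: str
        load_date: datetime

    @dataclass
    class LinkCustomerOrder:    # one row per relationship between hubs
        customer_key: str
        order_key: str
        load_date: datetime

    @dataclass
    class SatCustomer:          # descriptive attributes, one row per detected change
        customer_key: str
        name: str
        segment: str
        load_date: datetime

    # A change in the source never updates anything; it simply appends a new
    # satellite row with a later load date.
    rows = [
        SatCustomer("C1", "Acme", "SMB", datetime(2011, 1, 1)),
        SatCustomer("C1", "Acme", "Enterprise", datetime(2011, 6, 1)),
    ]

New information arrives as inserts stamped with a load date, which is what makes the loading pattern so repeatable. It is also why a dimensional model still has to be generated on top of it before the business can query anything.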

Data Vault, agile and speed of delivery

I can (and have) built and loaded Star Schemas from a Data Vault within 45 minutes of receiving a 2 page requirements document filled out by business users / customers.

This statement from Mr. Linstedt, in the same thread as above, explains a lot when it comes to agile and speed of delivery. At the DW automation conference, the power of Data Vault was underlined by delivery times measured not in days or weeks, but in minutes. I guess in all fairness they should have added: we can deliver in minutes, after the Data Vault has been built and loaded.

There is actually nothing unique in being able to deliver that fast. Using timeXtender as my rapid design and deployment tool, I am frequently building and loading star schemas from the original data source(s), via a staging area, within 45 minutes of receiving requirements from the business.

And just to be very clear on the assumptions and prerequisites, I only need to have a working connection to the data source(s) and someone with source system domain knowledge, to help find the information.

What do you think?

I will let you know in the next post about the one situation where I think Data Vault modeling adds value to dimensional modeling. In the meantime, please feel free to share your thoughts on Data Vault vs. dimensional modeling.

The war between Inmon and Kimball has been going on for years (it seems like Inmon is the only one still fighting). Now we have Inmon vs. Kimball vs. Data Vault. With Inmon’s kind words about Data Vault, it appears that it might even be Inmon and Data Vault in the red corner against Kimball in the blue corner?

But what do you think: is Data Vault all buzz and no real usage, or is it the magic bullet we have all been waiting for? Is it replacing Kimball, or does it complement Kimball? Please let me know what you think.

P.S. I do hope that someone from the Data Vault community finds this post via some search engine. If you do, it would be refreshing to see some constructive information about the benefits of Data Vault, applied to real-world scenarios with multiple data sources, disparate keys and what have you, integrated into a data warehouse.

About a month ago, I saw a presentation on agile BI with a couple of interesting statements and a disturbing conclusion. The conclusion was that agile BI is not about delivering faster (or cheaper), but about delivering in arbitrarily smaller increments to end users. I will address this conclusion at the end of this post.

One of the interesting statements was that work in progress is a liability, which is actually a pretty good way of illustrating both why we have to deliver in smaller increments and exactly why we have to deliver faster.

A definition of work in progress

In business accounting, an asset is something the business owns that has value. The opposite is a liability, something the business owes to someone else. So the question is whether work in progress is an asset or a liability.

Work in progress (WIP) is defined as something that is only partly completed. It is often used in manufacturing to describe the materials, machine hours and man hours that have been consumed (invested) to make a product. Until the finished product has left the assembly line, or the last stage in the manufacturing process, it is considered work in progress. Commonly this is treated as intermediate inventory: an asset, something the business owns that has value.

In the specific context of DW/BI, consider work in progress to be all the work done to analyze, design, develop and test the data warehouse, ETL processes, dashboards and reports. In other words, everything done between identifying a business requirement and delivering it in a fully functional state to the business.

Dealing with volatile “products”

Although work in progress is traditionally seen as an asset, as described above, this is not always the case. For lack of a better example, consider your business requirements as if you were processing or transporting fresh fruit, and then reconsider whether this intermediate inventory is still an asset, and if so, for how long. It should be rather clear to everyone that it will remain an asset only for a short period of time, after which it no longer has any value.

If you agree that the business is constantly affected by all kinds of internal and external factors that directly drive business requirements towards the DW/BI platform, then this is exactly how you should think of your business requirements. Business requirements are volatile, and the investment you make to meet them is a liability that you have to minimize.

Why delivering fast is always important

In throughput accounting, you can increase your Return On Investment (ROI) by reducing your investment in inventory (work in progress) and reducing operating expenses. But minimizing the liability of work in progress by simply reducing the investment, without looking at the speed of delivery, is very unlikely to increase throughput, which is the really interesting factor in increasing ROI.
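A toy calculation, with numbers made up purely for illustration, shows the difference between only cutting the investment and also increasing throughput:

    def roi(throughput, operating_expense, investment):
        # Throughput accounting: ROI = (throughput - operating expense) / investment
        return (throughput - operating_expense) / investment

    baseline   = roi(throughput=100, operating_expense=60, investment=200)  # 0.20
    # Only reducing the investment tied up in work in progress...
    less_wip   = roi(throughput=100, operating_expense=60, investment=150)  # ~0.27
    # ...versus also delivering faster, so that throughput itself goes up
    faster_too = roi(throughput=130, operating_expense=60, investment=150)  # ~0.47

    print(baseline, less_wip, faster_too)

Reducing the inventory alone improves the ratio somewhat, but the big move comes from raising throughput, which in DW/BI terms means delivering value to the business faster.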

Simply focusing on smaller increments without the time-to-market factor is like saying that it does not matter when you deliver to the business, as long as it is done in small increments with the smallest possible investment. Try telling your business that the next time they stop by with another urgent business requirement. And if the business really does not care when you deliver, you should challenge them on whether it is even worth doing at all.

Keep in mind that one of the twelve principles behind the Agile Manifesto is:

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Today I had an email correspondence with one of my good partners in Belgium about his initial experience with user stories for requirements gathering. He had just finished reading User Stories Applied by Mike Cohn, and despite being initially skeptical, he has already used user stories a couple of times for smaller things.

I have had similar talks with others who had to overcome a barrier to change. To some this is just a matter of getting used to something new; to others it actually appears as a frightening wall to overcome. I guess it is quite common to have a certain level of hesitation or resistance when it comes to doing things in a different way than we are used to.

This makes me wonder. What drives you towards agile data warehousing despite this initial skepticism, hesitation or perhaps even fear?

Let me know what you think and get the discussion going.

 

It has been quite some time since I last posted on this blog, one of the reasons being that we have been very busy building our new TX 2012, which was released last week. We also launched a new support portal a while ago at support.timextender.com, which takes the place of most of the how-to topics I used to post here. The support portal has a knowledge base, forums, a new ticketing system and lots of other stuff.

I am sitting in Toronto Pearson International Airport, on my way home from yet another eventful trip. It concludes the last few weeks of traveling, during which I have had the pleasure of speaking at the DWH Automation Conference in Belgium on how to achieve rapid return on data warehouse investments, followed by the PASS SQL Rally conference in Dallas, TX, where we had a booth on the expo floor, and finally here at Microsoft in Toronto, Canada. Microsoft presented their latest and greatest on SQL Server 2012, and I was asked to talk about Data Warehousing – The Kimball Methodology & Evolving Trends, followed by a 1½-day hands-on workshop teaching agile requirements gathering, design and implementation to a group of consultants and end users.

I have literally spoken to hundreds of people from IT and business: developers, analysts, data modelers, report developers, SQL programmers. They are looking towards agile, and some have already started. And perhaps more importantly, I have not spoken to a single business person who does not buy into the agile message. It is a pleasure to be part of the agile movement, helping to break down the notorious reputation of data warehousing as being too complex, too expensive and too slow to implement.

In Dallas our neighbors on the expo floor were WhereScape. And although we have very different products, and not least strategic focus, we do share the passion for agile DW/BI, and we had some very interesting discussions about the resistance most of the major consulting firms have towards rapid tools.

We often hear comments from these firms like “it looks really easy to use, so we doubt it can be used to handle the complexity of our projects”.

And this leads me to a comment from one of the organizers of the DW Automation conference in Belgium, made after the conference: “After another successful DWH Automation event, I keep wondering: how long can ‘body shops’ keep this tide out?!? The tools are ready!”

And I could not agree more. We are past the point where we have to discuss whether tools like timeXtender can handle the job, regardless of customer size or complexity. And we have a track record to prove it.

The real driver behind the resistance, towards both agile as such and rapid tools in particular, seems to be that they are protecting their services business. Most managers in these firms do not seem to like short iterations, as these make it more difficult to plan their utilization of resources. I guess this must mean that customers are paying the price to satisfy their need for nice, long project iterations.

Comments are of course welcome :-)

In this third part of the series on agile dimensional modeling, I will talk about requirements gathering using user stories.

 

So what is a user story?

A user story is, simply put, a representation of a business requirement. Ron Jeffries has defined user stories using the three C's of user stories:

  • Card: User stories are written on cards (or post-it notes) called story cards. The card has just enough text to identify the requirement and remember what the user story is. Cards are used for planning, and notes are added to capture priority and cost/estimation (a simple representation is sketched after this list)
  • Conversation: The requirement itself is communicated from the business to the data warehouse developer through conversation. This verbal exchange of thoughts, opinions and feelings is often considered the most important part of user stories
  • Confirmation: This adds the acceptance tests that document the details used to determine when a story is complete. The business communicates how they will confirm that the story has been delivered
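To make the card and confirmation parts concrete, here is a minimal sketch of a story card as a simple data structure. The fields are my own illustration rather than any fixed standard, and the conversation, by its nature, is not something you can capture in a data structure:

    from dataclasses import dataclass, field

    @dataclass
    class StoryCard:
        # Card: just enough text to identify and remember the requirement
        title: str
        # Planning notes added to the card along the way
        priority: int = 0
        estimate_days: float = 0.0
        notes: list = field(default_factory=list)
        # Confirmation: acceptance tests agreed with the business
        acceptance_tests: list = field(default_factory=list)

    card = StoryCard(
        title="As a sales manager, I want revenue by region "
              "so that I can compare territories",
        priority=1,
        estimate_days=1.0,
        acceptance_tests=["Revenue per region matches the ERP sales report"],
    )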

 

A story card for data warehousing can look like this:
Story Card 01

 

Writing user stories

There is no fixed way to write a user story. A story can be informal or formal and can, in practice, range from physical index cards on paper to dedicated IT systems. How you (or your business stakeholders) write user stories is not significant; whatever works for you is fine.

The size of a user story is not fixed, but in my experience a user story should:

  • Be small enough to be completed in one day
  • Deliver some business value
  • Be demonstrable

The following is an example of a story card representing a user story that clearly needs to be broken into smaller stories:

Story Card 02

The following shows a story card with details added to the bottom of the card:

Story Card 03

A formal way of writing a user story can look like this:

Story Card 04

 

User stories are written by the business

User stories represent business requirements. It is the responsibility of the business to prioritize user stories. User stories are prioritized based on the business value they add. User stories are written in business language, so that the business understands the stories.

The word business is used so many times in the previous paragraph that it should be very clear who should write the user stories: the business.

 

Next steps

In the following posts I will introduce sprints and the data warehouse backlog, and show how to use prototyping as an efficient way to get from user stories to a confirmed dimensional model, using timeXtender as the rapid design and deployment tool.

 

References

If you want to know more about user stories, I can recommend the book User Stories Applied by Mike Cohn (ISBN 0321205685).

This post is part of the agile dimensional modeling series, which promotes the use of agile to speed up business-value-driven data warehouse development.

The importance of the why

The first and most important thing to understand is always the why. The what, when, how etc. do not make much sense unless we know why we are doing whatever it is that we are doing. In business intelligence and data warehousing, the why should always be: because it adds value to the business.

Everything we do is done because it adds business value, and when we need to prioritize what to do when, it is prioritized based on what adds the most value to the business.

Requirements gathering

Once we understand that what we do should always be business value driven, it becomes obvious that we need to talk to the business to learn what actually adds the most business value.

The primary purpose of business requirements gathering is to understand what the business is trying to accomplish. It is the most critical step in data warehousing, as this is when we learn through conversations with the business what we need to build for them to achieve the business goals.

With the agile approach I am promoting, requirements gathering is an ongoing process, unlike in most conventional project management methodologies. In a conventional approach, requirements gathering is often seen as a (lengthy) one-time step that determines what will be done for the next many months.

In today’s volatile business environment, we simply cannot afford to effectively tell the business that “you are only allowed to come up with requirements every six months, because we have already decided what we are doing for the rest of this project”. The answer is to work in short iterations and build our data warehouses incrementally.

 

Conventional requirements gathering

I have never worked with any organization that did not already have a handful of very urgent business requirements the moment I stepped through the door.

With a conventional approach to data warehousing, these low-hanging fruits, or easy wins as I call them, will not be implemented until they have been very structurally considered along with all the other, less important requirements.

Sadly, you will spend weeks or, more likely, months analyzing requirements across the entire enterprise before it becomes clear that you should have done these easy wins before doing anything else – including the last few months of analyzing requirements.

I love the Kimball dimensional modeling approach and its way of incrementally building the data warehouse. It is the most robust and scalable model for data warehousing. In my opinion, the requirements gathering phase is the biggest weakness of the Kimball lifecycle, as it results in a very slow implementation of at least the first increment.

 

Incremental requirements gathering

What we should be doing, and what is fortunately being done more and more, is using some of the key elements from agile methodologies like SCRUM and XP.

As you will learn from subsequent posts, I highly recommend using user stories to capture business requirements. User stories are written by the business itself, so you as an organization can easily capture new requirements concurrently with building a data warehouse increment.

User stories are prioritized by business value, and the data warehouse is built in time-boxed increments of 1-2 weeks’ duration called sprints. As the business requirements that go into a sprint are decided before every sprint, this is a very good way to ensure that you constantly consider what adds the most value to the business right now.
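As a minimal sketch of the mechanics (in practice the scope of a sprint is decided by the business in the sprint planning conversation, not by an algorithm), filling a sprint amounts to taking the highest-value stories that fit within the time box. The story names, values and estimates below are made up:

    # Hypothetical backlog: (story, business value, estimate in days)
    backlog = [
        ("Revenue by region",        8, 1.0),
        ("Customer churn dashboard", 5, 2.0),
        ("Inventory ageing report",  3, 1.0),
        ("Margin per product line",  9, 1.5),
    ]

    def plan_sprint(backlog, capacity_days):
        """Fill a time-boxed sprint with the highest-value stories that fit."""
        sprint, remaining = [], capacity_days
        for story, value, days in sorted(backlog, key=lambda s: -s[1]):
            if days <= remaining:
                sprint.append(story)
                remaining -= days
        return sprint

    print(plan_sprint(backlog, capacity_days=5.0))
    # ['Margin per product line', 'Revenue by region', 'Customer churn dashboard']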

User stories are placeholders for business requirements and contain just enough text to serve planning and to remember what the story is about. With this approach you do not have to spend a lot of time on detailed analysis of a requirement until the business decides it is one of the current priorities for an upcoming sprint.

 

Next step

The next post will introduce user stories as a way to capture business requirements.
