Building a Data Warehouse? Jump Off a Cliff!
Want to Seriously Throw Good Money After Bad? Start a BI Project!
by Ian Nicholson
On a group thread on LinkedIn recently I witnessed several consultants arguing over which
ETL tool is best for building data warehouses.
They all discussed the same, tired old approach they’ve been doing for years, completely
oblivious to a few key stats:-
According to Gartner, 76% of the budget of any BI project is sunk into the data
Also, according to Gartner, 70-80% of BI projects fail.
Of those that succeed, many are over time or over budget.
And yet these same consultants are continuing to use those same ‘best’ ETL tools and
approaches to ‘deliver’ BI projects, just like lemmings jumping off a cliff.
So what’s going wrong? Let’s look at what’s currently happening...
You decide to start your BI project. It is deemed that a Business Analyst (1st consultant)
talks to the business to ‘get an understanding’ of their requirements.
The BA give her results to a Solution Architect (2nd consultant) who creates the design and
gets a Data Modeller (3rd Consultant, but possibly the same guy) to design a data model.
Now the likelihood is, he will design a set of star-schemas - because that’s what you’ll need
for the reports, right? Also, they can be delivered straight away for a quick win with the
So the Data Modeller gives his design to a DBA (4th Consultant) to create the schema in the
data warehouse server, and to the ETL developer (5th Consultant) to write the ETL code to
populate the schema in the data warehouse. Once the schema is present, a report is written
by the BI Developer (6th Consultant).
Of course, all of this is overseen by the Project Manager (7th Consultant).
Now, fast forward a few months to the first deliverable being revealed to the business. More often than not, the solution isn’t what the user
expected, wanted or signed-for. Changes are required. Why? Usually, a business user doesn’t understand what is possible or what is required
in a data warehouse. Only when they see a result do the pennies drop, and they realise they really needed something else. Either that, or
during the time it took to deliver the report, the world moved on and focus changed.
Whatever the reason, you must go back through that convoluted process above to make the changes. Now, did I mention that when the ETL
Developer is writing his code, he has a problem. If he needs to get data from lots of data sources, he must do some further ‘manipulation’ in
order to shoehorn it into a star-schema (a star-schema, by the way, is a table design made up of facts (things we want to measure: Invoice
amount, quantity etc) surrounded by dimensions (how we want to sort and slice that data: Customer, Region, Product, Date, Sales Rep).
Put simply, you cannot take data from multiple sources and just put it into a star-schema. A lot things must happen first. Bill Inmon, the
Father of Data Warehousing, has written numerous books on this science, but in a nutshell he says you should make the data generic and put
it into its Third Normal Form. Few people know how to do this properly, and this is where, I believe, that 76% of money and 80% failure rate
begins. Your ETL Developer writes a lot of convoluted code on he understands which becomes a nightmare when changes are required.
So how do we fix this? Well, I should be clear that Software Development Life Cycle (SDLC) as a development approach does NOT work for BI
projects, since they tend to be fluid in nature and more of a journey than a destination. Agile development methodologies yield far more
success in BI projects.
Current ETL development tools are too limited and cumbersome in their approach. There is little if no automation available from these
products. They are development tools, originally designed just to move data from one place to another, not build data warehouses. ETL tools
do not build data warehouses. People using ETL tools build data warehouses, and ETL tools simply cannot deliver in an Agile fashion.
Nowadays there is a new breed of Data Warehouse Automation tools available that do the job much faster - giving deliverables in hours or
days, not weeks or months. In doing so, they slash development times and costs to a fraction of what they are currently.
Data Warehouse Automation work by automatically designing the required schema and generating the code to populate it. It does this by
analysing the model of the data sources, and applying the teachings of Bill Inmon to the design process.
Using Data Warehouse Automation tools save time, money, deliver value sooner and virtually eliminate risk of project failure.
Call or email us for further information on how we can assist you: Info@DataTransformed.com.au.
Copyright © 2013 Data Transformed Consulting Services Pty Ltd
ACN: 162 629 183
ABN: 96 162 629 183
“Plans fail for the lack of counsel,
but with many advisers they succeed.”
“Insanity: doing the same thing over and
over again and expecting different results.”