Critical ETL Mistakes When Building Your BI Solution

Table of Contents

    Let’s face it, typically you need an ETL (Extract, Transform and Load) process to build your BI solution. By some accounts, this can be a considerable effort; some industry estimates indicate that it takes up to 80% of any data warehouse or BI budget. While there are many well-documented strategies and lessons-learned out there, some of them drag you through the technical weeds of what tools to buy or the details of building robust ETL solutions. It’s tempting to skip these steps or not give them full consideration during ETL development. But if you do, you could be setting yourself up for failure.

    One thing that would be to your advantage is to be aware of the sand traps ahead: what do you need to avoid in the process to ensure success? Here are five critical mistakes when building BI solutions:

    Mistake #1: Not engaging your data consumer

    • “If you build it, they will come”—while it’s a poetic and popular idea, this could be an ominous assumption. Just building it doesn’t mean that users will come to your BI solution. You need to define and develop a clear purpose and objective in collaboration with your data consumer so that it hits the bullseye to fulfill their needs. You can’t assume that just because you’ve got data that users will embrace the new BI.
    • Our data consumers sometimes know more than we do. As a technical resource, we are able to dive into the data details very quickly. But the users who have worked with the data and systems can often provide invaluable insights about what to expect and possible pitfalls and landmines that can result.
    • Be sure to get buy-in from the start. If you’re engaged with the data consumer, getting buy-in shows you value their opinion and have a desire to build something of value to them. It speaks volumes that you’re not just going through the motions to build another “thing” that they might or might not use. Importantly, it gives them a sense of ownership.

    Mistake #2: Not profiling your data source

    • Trust but verify. While, as we’ve said, users sometimes know more than we do, it can also be foolhardy to take their word for it and leave it at that. If the data doesn’t match what you’ve been told, maybe something has been missed. You could be extracting data from the wrong system, applying incorrect assumptions, or have an issue with the data source that was previously unknown.
    • Until you look, you won’t know what ETL you need to build. Knowing the makeup of your data sources will arm you with information about precisely what you have and do not have. After profiling your data sources, you might find that you need to go elsewhere to get the data your users need, or you might realize that gaps need to be filled in. It’s good to know these kinds of things early in the process, as budgets and scope could run over later in the ETL leg of the project.

    Mistake #3: Not knowing your destination/target

    • Identify your data sources. If you don’t know where you’re going, how do you know what you’re going to need to get there? Do I need a car or a boat or a plane? If you choose the wrong starting point, you could end up at the wrong destination or only get halfway to your destination.
    • Determine your data rules and normalizations. Knowing which KPIs you’ll be calculating and the inputs to calculate them will help you define what needs to be done from an ETL perspective. Building an ETL prior to understanding your data rules and normalizations could result in project delays and rework.  Having both the source and target data content defined will guide the amount of complexity that you’ll need to build into your ETL.

    Mistake #4: Not building with the future in mind

    • A sustainable ETL system is able to adapt to change. A good ETL design will not only take current requirements into consideration, but it can also easily incorporate new requirements with little or no rework. New data source formats and data layouts that arise in the future may be a barrier to BI adoption if the work effort to implement additional sources is too great.
    • Greater expense in the long run. If you don’t build with the future in mind, the project may accrue greater expenses to maintain, which could have been avoided.

    Mistake #5: Not designing for data quality from the start

    Users won’t trust the data if they don’t have some awareness of the quality of the data being loaded. Don’t skimp on data quality checks throughout the ETL lifecycle. Checks and balances to reconcile source-to-target data loads should not only be visible to the technical ETL flow, but they should also include some level of metrics that allow your data consumers to be confident that the data is complete. Additionally, make sure you have some logging methodology or approach that will you to track down errors and data quality concerns.


    ClicData All-in-One Business Intelligence Solution: ETL, Data Warehouse & Data Visualization

    More than just pretty dashboards, ClicData is a comprehensive and powerful Business Intelligence platform. We include a data warehouse and ETL features to automatically and easily cleanse, combine, transform, merge or standardize your data. You can signup for a 14-day free trial right now.