Data Models Optimization

Data Model Optimizations For Better Performances

Have you ever tried to make this Sales dashboard your CEO is raving about a bit faster to load? Did you ever wonder if you make the best usage of the multiple set-up options of your ClicData account?

Maybe it’s time to start an optimization project, from the Data Model to Dashboard creation’s best practices.

Who wouldn’t want to save some GB and provide the best user experience to his favorite colleague, when he’s consulting your caringly crafted dashboard?

What is a Data Model?

A Data Model describes how your different data sources and consequent datasets are used in relation to each other, but also how the data is shaped, stored, refreshed and used.

A good model will enable you to get the best performances for your dashboards, but also for the consumption of your storage space, refresh power and maintenance time.

Usually, we mention Data models in relation to databases. Let’s see how to apply some of the best practices to your ClicData account, step by step.

Best optimization practices

On the data side

What defines a good Data model when it comes to datasets? As often, it depends… on your own very specific situation.

Think of your data sources, the overall project and sharing objectives. Let’s consider some best practices that may apply to your case.

1. Pull only the data you need

Wherever you can, limit the data pulled to the only columns and rows you really need for reporting and ETL purposes. There is no need to overload your account with unused data, as it will slow down data processing and all dependent calculations.

Also, it will save some storage.

2. Create custom datasets from databases

Use the query editor to create the perfect dataset, tailored to your needs. Think of who will use these datasets and whether they will need to perform transformations on the data.

If yes, go for chunked datasets, for example, 1 dataset per table from your database. This will allow ETL actions to be most efficient because performed on smaller datasets. Tables can then be joined via a Merge.

Blog Custom Datasets

If no transformation is needed, go for a dataset directly combining different tables from your database, creating one dataset per type of usage, e.g. Finance, Sales, CRM, Marketing, etc.

Blog Sql Query

This will result in the same table as with the above-mentioned Merge:

Blog Data Table

Always try to go for the most granular level of data that you will need for your visualizations. Aggregations can then be built upon datasets.

For example, use daily metrics if you need Daily metrics, Day over Day evolution metrics or Weekly metrics. Use Monthly grain if daily monitoring is of no use.

3. Use appropriate Data types

With the June 2019 product release, you can finetune data types for each column of your data. This will be particularly useful when working with non-database sources, as these datasets might not be optimized for performances yet.

Use numerical values whenever you can. Using Text will consume more storage space and will be slower to proceed when evaluated.

Blog Appropriate Data Types

In this case scenario, the column [Convertible] can either have ‘yes’ or ‘no’ as a value. Consider lowering the maximum size of this column to 3 characters rather than 250.

4. Take advantage of Cache feature

The cache will take up storage but will also increase performances theatrically. Datasets such as Views, Fusions, and Merges created via the ETL can be cached.

Whenever a dataset times out, consider caching the datasets that feed into, especially the larger ones.

Using the Dependency Viewer, check which data feeds directly into the final dataset and apply to cache accordingly.

Blog Cache Feature Clicdata

In this case, caching ‘Sales Continent & Brand’ and ‘Sales – Japan’ which feed directly into ‘Sales Continent & Brand View’ will ensure top performances for the latter.

If ‘Sales Continent & Brand View’ is heavily transformed as well and depending widgets in the dashboard are slow to render, cache this View too. Learn more about why and when to cache your data.

Blog Cache Data Clicdata

Also, don’t forget about low-hanging fruit optimizations for your storage. You know, those that we forget most often?

5. Clean the trash

Obviously…

6. Check out stored versions

This feature is very useful for non-persistent datasets, such as for example data available through Facebook’s API and the Facebook connector.

Do you need to keep track of the 10 last versions of a dataset? Maybe not! If the data is still available in the source, don’t worry about keeping history in ClicData.

Blog Data Storage Clicdata
This account stores more History and Trash than Active data

On the Refresh schedules

Refreshing data sources is vital to maintain an efficient dashboard set, displaying accurate and up to date data to users.

Here are some best practices to keep your Schedules workspace tidy and performant, while saving on your refresh quota. Try to implement them right away when setting up your automated refresh schedules, even if you feel that you can always come back to this later and optimize. Do it right from the start!

1. Optimize for time frames and activity days

Are your dashboards consulted 24 / 7? Probably not. Your data only needs to be refreshed when the final visualizations are consumed by users. Think of the cadence that is the most appropriate to your business and consumption context of dashboards.

A typical set up would be a refresh scheduled only during working hours and days. In the case scenario of hourly refreshes run every hour, this simple optimization will save 77% of refresh quota if you switch to working hours only.

Refresh your data up to every minute during this period if you need to monitor business in real time, for example via a screen displayed in the office. Counting sales in real time will become addictive and stimulating for your team!

Refresh your data once a day if dashboards are consulted once a day.

2. Enable your users to refresh data whenever it’s really needed

To give users – even Viewers – the ability to refresh data in real time from the dashboard directly, set up an interaction button that will refresh a schedule when hit.

This way you can even consider reducing the automated refreshes schedule and offer a better-personalized experience to end users.

Blog Data Refresh Clicdata

3. Consider heavy datasets

Working with millions of rows and dozens of columns can become challenging because it can take up to minutes to proceed with the refresh. Long refresh times need to be considered in the Schedules set up as well.

Always evaluate how long a dataset will take to refresh before setting up a 1-minute cadence schedule! Keep track of the load time using the task logs.

Leave time for the refresh to be finished before running the next schedule.

Blog Manage Data Refreshes

Take advantage of the Busy Days / Time graphic in the scheduler to aim for lower activity times during the day to ensure quickest possible refresh time. Click on a day to drill down to minutes.

At 2:02 AM UTC very few refresh jobs are launched.

Blog Data Refresh Management

4. Keep your workspace clean

Group multiple data refresh tasks that feed into 1 dashboard or topic into one Schedule. This way, all the data will get refreshed at the same time, and consequent data combinations will be correct.

In an e-commerce website case scenario, update Orders and Customers data at the same time, say every 10 minutes. Products can be updated separately, for example, once a day, unless the catalog changes at a quicker rate (on a marketplace website for example, where merchants feed the catalog continuously).

On the dashboard side

Finally, let’s not forget to look at the Dashboards after all the optimization work on Data and Schedules!

We’ve collected some bests practice to help you save time while building and maintaining them, but also to build quick loading visualizations.

1. Calculated metrics

When creating calculated metrics always consider if you could create them on the Data side, using ClicData’s ETL, e.g. by adding a calculated column to a View. If yes, choose this option for performance’s sake.

This will reduce calculation time when loading the dashboard, even more, if you cache the View.

2. Use SQL in calculated columns on a dashboard

Did you know that you can use SQL in your dashboard? Now you do!

In case scenarios of calculations that do not need conditional filtering, simply write your calculations as you would do in a calculated column on the Data side. Add single quotes around the SQL formula.

For example, replace:

DataAggregate(‘Orders’,’OrderPrice,’sum)/DataAggregate(‘Orders’,’OrderID’,count)

with

‘sum([OrderPrice])/count([OrderID])’

Bonus when using SQL, the formula is evaluated in the context of the widget, with its categories, series, and filters’ setup. Using the DataAggregate function you would need to include contextual filters to the formula.

Learn more about SQL, DataAggregate and other contextual formulas.

4. Prefer filtered widgets to All data displayed

Set up widgets to be filtered by default to the smallest Dataset necessary.

Loading all data in each widget will take longer to display than loading already filtered data. Only when selecting different values in the filter, will the corresponding data be loaded and processed.

Blog Filtered Widgets Clicdata

An elegant way to pre-filter data while providing a good user experience is to default filter to a User parameter, for example via a dropdown list that other widgets depend on.

4. Smaller dashboards with interactions

Avoid supersonic dashboards displaying all your KPIs at once.

Consider building a set of dashboards linked to each other using buttons, providing a website-like experience. You can create a drill down report series using this method.

Experience this with the live Sales dashboard template navigation menu built out of designed Button widgets.

Blog Drill Down Dashboards Clicdata

5. Dashboard formula

Dashboard level formulas are useful when handling values or metrics used repeatedly across widgets, for example for filtering purposes.

Let’s consider the case scenario of multiple widgets that are all filtered on a dynamic date, the first day of the current Month. Rather than adding this filter formula to each widget, create a dashboard formula and refer to it in widgets.

The formula will be evaluated only once.

Blog Dashboard Level Formulas

There are a LOT of ways to optimize performances in ClicData. Creating a robust data model that will help you balance between best use of storage, efficient refresh schedules management and dashboard display time will make all the difference to your daily work, your end users and probably your subscription price.

Start wherever it makes the most sense to you! Do the easiest optimization work first and gradually move forwards step by step to the toughest part.

If you feel you need advice to make the best choices in your business scenario, don’t hesitate to reach out to our Support team or via the tickets system and Support chat. We will be happy to help you optimize your data model!

Share this on