A Guide to Building Performant Real-Time Data Models | by Marie Truong | Aug, 2023

Marie Truong
Towards Data Science
Photo by Lukas Blazek on Unsplash

Data has become a critical tool for decision-making. To be actionable, data needs to be cleaned, transformed, and modeled.

This process is often part of an ELT pipeline that runs at a given frequency, for example daily.

On the other hand, stakeholders sometimes need access to the most recent data so they can adjust and react quickly.

For example, if there is a huge drop in the number of users of a website, they need to be aware of this issue quickly and be given the necessary information to understand the problem.

The first time I was asked to build a dashboard with real-time data, I connected it directly to the raw real-time table and provided some simple KPIs, like the number of users and crashes. For monthly graphs and deeper analysis, I created another dashboard connected to our data model, which was updated daily.

This strategy was not optimal: I was duplicating logic between the data warehouse and the BI tool, so it was harder to maintain. Moreover, the real-time dashboard could only perform well with a few days of data, so stakeholders had to switch to the historical one to check earlier dates.

I knew we had to do something about it. We needed real-time data models without compromising performance.

In this article, we’ll explore different solutions to build real-time models, and their pros and cons.

An SQL view is a virtual table that contains the result of a query. Unlike tables, views do not store data. They are defined by a query that is executed every time someone queries the view.

Here is an example of a view definition:

CREATE VIEW orders_aggregated AS (
    SELECT
        order_date,
        COUNT(DISTINCT order_id) AS orders,
        COUNT(DISTINCT customer_id) AS customers
    FROM orders
    GROUP BY order_date
);

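Views stay up to date even when new rows are added to the underlying table, because the defining query runs again on every read. However, if the table is big, views can become very slow: no results are stored, so everything is recomputed each time.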

They should be the first option to try out if you are working on a small project.
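To make the behavior of views concrete, here is a minimal sketch that reuses the orders_aggregated definition from above. It assumes an in-memory SQLite database as a stand-in for a real warehouse, and the sample rows are made up for illustration. It reads the view, inserts a new order into the base table, and reads the view again without any refresh step.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 10, '2023-08-01'),
        (2, 11, '2023-08-01');

    -- Same view as in the article: only the query is stored, not its result.
    CREATE VIEW orders_aggregated AS
    SELECT
        order_date,
        COUNT(DISTINCT order_id) AS orders,
        COUNT(DISTINCT customer_id) AS customers
    FROM orders
    GROUP BY order_date;
""")


def read_view(connection):
    """Query the view; the underlying aggregation is re-executed on every call."""
    return connection.execute(
        "SELECT order_date, orders, customers "
        "FROM orders_aggregated ORDER BY order_date"
    ).fetchall()


before = read_view(conn)

# A new order arrives in the base table; the view itself is not touched.
conn.execute("INSERT INTO orders VALUES (3, 10, '2023-08-01')")

after = read_view(conn)

print(before)
print(after)
```

The second read already counts the new order because the aggregation runs at query time. That is also why the cost of each read grows with the size of the orders table.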
