Data Culture

ETL vs ELT: What’s the Difference (and Which is Better)?

ETL vs ELT

Michael Segner

Michael writes about data engineering, data quality, and data teams.

The acronym ETL (Extract, Transform, Load) has been around for decades, but ELT is rapidly gaining ground. What do those letters stand for and why are there discussions on ETL vs ELT? The letters in both abbreviations stand for Extract, Load, and Transform. But their different ordering is crucial to understanding these processes.

Although these two acronyms hold significant clues to the basic principles of each process – with ETL data is transformed before its loaded, while ELT transforms it after loading – there’s quite a bit more going on below the surface here.

Here’s an analogy that might highlight the differences between ETL vs ELT. You can think of ETL as a restaurant where a waiter gets the exact specifications of what the consumer wants. The meal is prepared to that criteria and once out the door, it’s not as easy to modify, you are likely cooking up an entirely new entree. 

ELT is more like having amorphous clay at your disposal. The clay can take any shape it wants at any time, but its flexibility is both an advantage and a weakness. Experimentation is easy and the creation process is more agile, but if you aren’t careful you can continually twist and mold the clay to the point it’s unrecognizable and brittle.

In this post, we’ll look at some of the key differences between ETL vs ELT, when each option might be more suitable, and ultimately look to determine circumstances in which ETL or ELT is better for businesses than the other.

First, let’s briefly cover each letter of this acronym in the context of these processes:

  • Extract: This refers to your data being pulled from your data sources, whether those are structured relational databases or unstructured data
  • Transform: Data is cleaned up, processed, and/or converted, sometimes with the aim of making it consistent with other data you have stored
  • Load: Data is transferred, or loaded, to where it will be stored and can then analyzed using business intelligence tools

However, those basic definitions listed above apply to the E, T, and L of both ETL and ELT, so let’s dig a little deeper into how each works.

What Is ETL?

The idea behind the ETL methodology is that data from disparate sources is converted to be more consistent before being stored in a database. It’s been used since the 1970s, particularly in conjunction with on-premise databases that have storage and compute limitations.

Recently, it has made a comeback as some data leaders have advocated that building a modern data warehouse should involve gathering data consumer requirements upfront and pre-modeling data tailored to those requirements as a means of avoiding upstream data quality issues and improving overall usability.

Extraction: At the beginning of this process, an ETL tool is used to pull raw data from the sources you have available. Coming from disparate sources, the size and format of this data can vary considerably, as can the moment it’s extracted.

Transform: With your existing data already stored in a particular format, new data must be directed to a processing server or staging area where it can be transformed to ensure compliance with data you already have in storage or in accordance with your business use case. This is often done with custom coding within the pipeline itself.

Load: Once data has been processed so it conforms to your existing data regime, it can be stored in your data warehouse. Data does not move from where it’s being processed to where it’s being stored until it has been successfully transformed.

ETL can be used as part of both cloud and on-premise solutions, although it’s most associated with the latter, and has been the status quo for collating data in one place. ELT, however, has taken part of this process and turned it on its head…

What Is ELT?

Thanks to the rise of cloud-based systems, orchestrators like Airflow, transformation tools like dbt, and the increased use of unstructured data, some data teams have favored switching from ELT to ELT. The key difference?

With ELT, raw data is loaded directly into the storage tier (data warehouse or data lake) before transformation.

Instead of relying on a processing server or staging area, data is cleaned and processed within the data warehouse. And, because raw data is stored, it can be transformed as many times (and in as many ways) as necessary.

With an emphasis on using cloud storage and SaaS solutions, ELT is all about bringing speed and flexibility to big data analysis that just wasn’t possible using on-premise solutions even a few years ago.

Differences Between ETL and ELT

This means that the following two things, flipsides of the same coin, are true:

  1. ELT provides access to raw data from within the data warehouse or data lake
  2. ETL stores information in the data warehouse that has already been transformed

With ETL, data is transformed before being loaded. That process takes time, which makes data entry slower than ELT. Without the need to transform data first, ELT allows for rapid (or even simultaneous) loading then transformation of data.

The retention of raw data means that ELT maintains big data sets that are extremely rich, and can be queried in all sorts of different ways. By contrast, data stored using ETL is likely to be narrower and less flexible.

ETLELT
Extraction
Raw data extracted from
disparate sources

Raw data extracted from
disparate sources
Transformation
Raw data is transformed
on secondary server or
staging area

Data is transformed within
the data warehouse or
system
Loading
Data loaded into data
warehouse/system after
transformation

Raw data is loaded directly
into the data warehouse
or system
Data TypesStructured

Structured, semi-structured,
unstructured
Volume
Best suited for smaller
data sets that require
complex transformation

Useful for large data sets
that need to be loaded
quickly
Privacy
Pre-loading
transformation can
address privacy
concerns
Requires additional
safeguards and security

Data Lake
Compatibility

Negates many of the
flexibility advantages
Yes

ETL vs ELT Pros & Cons

ELT draws many benefits from the scalability and processing power of cloud storage but, if you’re still using on-premise solutions, deploying it effectively might prove more difficult. Beyond that, each approach offers advantages and disadvantages to consider.

The speed of ELT, which offers the possibility of simultaneous loading and transformation, is one obvious mark in its favor. If you need to load and analyze large amounts of data, and keep raw data available for future analysis, ELT is a good choice.

If you typically collect and store data in one particular format, from a number of sources that rarely varies, ETL might be a better option. If your transformation processes are particularly complex, you might find doing that on the fly using ELT cumbersome.

ETLELT
Latency 
High (as transformation
must be completed
before storage)

Low (with zero or minimal
processing before storage)

Flexibility
Low (as sources and
transformations must
be defined early)

High (as new data sources
and formats can be integrated
on the fly)
Storage Requirements
Low (as data is stored
post-transformation)

High (as large volumes of
raw data are stored)
Security & Compliance
Easy (as sensitive data
can be removed or
encrypted prior to
storage in data
warehouse) 
Difficult (as more steps
may be required to
ensure compliance
with protocols)
Speed
Pre-load
transformation can
be intensive process

Prizes speed and efficiency,
with simultaneous loading
and transformation
Cost
Can be high, due to
ongoing maintenance
and storage

Low, or can be, due to
scalable cloud/SaaS
platforms
Maturity

Well-documented
methodology with
tons of information
and best practices
available
Relatively new concept
that can be tricky to
implement

When to use ETL vs ELT

If your organization is on a legacy solution, this becomes a very simple decision. Because ELT is only available with cloud architecture, ETL is really your only choice.

However, if you’re already on a cloud platform like Snowflake or Databricks —or you’re planning to migrate—your choice becomes a little more nuanced. In this scenario, whether you choose ETL vs ELT will depend largely on your needs and priorities. If ingesting as much data as possible to potentially transform and leverage at a later time (for example, if you plan on developing an internal AI model) ELT will likely be the better choice. However, if your primary concern is compliance, a pipeline approach like ETL would allow you to be more selective about the data that’s ultimately ingested into and available for use in your warehouse.

Will ELT replace ETL?

When it comes to ETL vs ELT, “which is better” really comes down to the external factors you have to consider and needs of the business. 

ELT is a potential replacement for ETL. But, ELT will only replace ETL if the use case requires it. If not, then ETL will continue to be the process of choice for your data team.

Summary: ETL vs ELT, Which Is Better?

On the face of it, using ELT and capitalizing on all the advantages listed above might seem like a no-brainer. But things are rarely that simple. Although ELT is new(er) and exciting, there are plenty of reasons that ETL has been the standard in data processing for so long.

For intensive transformations that rarely vary, data that contains a wealth of personal identifying information (PII), or systems built around legacy architecture, ETL remains a strong choice…even if that may change as ELT continues to evolve.

Interested in learning more about how data observability can monitor for anomalies or other issues in your data warehouse – and beyond? Fill out the form below to schedule a time.

Our promise: we will show you the product.