The acronym ETL (Extract, Transform, Load) has been around for decades, but ELT is rapidly gaining ground. What do those letters stand for and why are there discussions on ETL vs ELT? The letters in both abbreviations stand for Extract, Load, and Transform. But their different ordering is crucial to understanding these processes.
Although these two acronyms hold significant clues to the basic principles of each process – with ETL data is transformed before its loaded, while ELT transforms it after loading – there’s quite a bit more going on below the surface here.
Here’s an analogy that might highlight the differences between ETL vs ELT. You can think of ETL as a restaurant where a waiter gets the exact specifications of what the consumer wants. The meal is prepared to that criteria and once out the door, it’s not as easy to modify, you are likely cooking up an entirely new entre.
ELT is more like having amorphous clay at your disposal. The clay can take any shape it wants at any time, but its flexibility is both an advantage and a weakness. Experimentation is easy and the creation process is more agile, but if you aren’t careful you can continually twist and mold the clay to the point it’s unrecognizable and brittle.
In this post, we’ll look at some of the key differences between ETL vs ELT, when each option might be more suitable, and ultimately look to determine circumstances in which ETL or ELT is better for businesses than the other.
First, let’s briefly cover each letter of this acronym in the context of these processes:
- Extract: This refers to your data being pulled from your data sources, whether those are structured relational databases or unstructured data
- Transform: Data is cleaned up, processed, and/or converted, sometimes with the aim of making it consistent with other data you have stored
- Load: Data is transferred, or loaded, to where it will be stored and can then analyzed using business intelligence tools
However, those basic definitions listed above apply to the E, T, and L of both ETL and ELT, so let’s dig a little deeper into how each works.
What Is ETL?
The idea behind the ETL methodology is that data from disparate sources is converted to be more consistent before being stored in a database. It’s been used since the 1970s, particularly in conjunction with on-premise databases that have storage and compute limitations.
Recently, it has made a comeback as some data leaders have advocated that building a modern data warehouse should involve gathering data consumer requirements upfront and pre-modeling data tailored to those requirements as a means of avoiding upstream data quality issues and improving overall usability.
Extraction: At the beginning of this process, an ETL tool is used to pull raw data from the sources you have available. Coming from disparate sources, the size and format of this data can vary considerably, as can the moment it’s extracted.
Transform: With your existing data already stored in a particular format, new data must be directed to a processing server or staging area where it can be transformed to ensure compliance with data you already have in storage or in accordance with your business use case. This is often done with custom coding within the pipeline itself.
Load: Once data has been processed so it conforms to your existing data regime, it can be stored in your data warehouse. Data does not move from where it’s being processed to where it’s being stored until it has been successfully transformed.
ETL can be used as part of both cloud and on-premise solutions, although it’s most associated with the latter, and has been the status quo for collating data in one place. ELT, however, has taken part of this process and turned it on its head…
What Is ELT?
Thanks to the rise of cloud-based systems, orchestrators like Airflow, transformation tools like dbt, and the increased use of unstructured data, some data teams have favored switching from ELT to ELT. The key difference?
With ELT, raw data is loaded directly into the storage tier (data warehouse or data lake) before transformation.
Instead of relying on a processing server or staging area, data is cleaned and processed within the data warehouse. And, because raw data is stored, it can be transformed as many times (and in as many ways) as necessary.
With an emphasis on using cloud storage and SaaS solutions, ELT is all about bringing speed and flexibility to big data analysis that just wasn’t possible using on-premise solutions even a few years ago.
Differences Between ETL and ELT
This means that the following two things, flipsides of the same coin, are true:
- ELT provides access to raw data from within the data warehouse or data lake
- ETL stores information in the data warehouse that has already been transformed
With ETL, data is transformed before being loaded. That process takes time, which makes data entry slower than ELT. Without the need to transform data first, ELT allows for rapid (or even simultaneous) loading then transformation of data.
The retention of raw data means that ELT maintains big data sets that are extremely rich, and can be queried in all sorts of different ways. By contrast, data stored using ETL is likely to be narrower and less flexible.
Raw data extracted from
Raw data extracted from
Raw data is transformed
on secondary server or
Data is transformed within
the data warehouse or
Data loaded into data
Raw data is loaded directly
into the data warehouse
Best suited for smaller
data sets that require
Useful for large data sets
that need to be loaded
safeguards and security
Negates many of the
ETL vs ELT Pros & Cons
ELT draws many benefits from the scalability and processing power of cloud storage but, if you’re still using on-premise solutions, deploying it effectively might prove more difficult. Beyond that, each approach offers advantages and disadvantages to consider.
The speed of ELT, which offers the possibility of simultaneous loading and transformation, is one obvious mark in its favor. If you need to load and analyze large amounts of data, and keep raw data available for future analysis, ELT is a good choice.
If you typically collect and store data in one particular format, from a number of sources that rarely varies, ETL might be a better option. If your transformation processes are particularly complex, you might find doing that on the fly using ELT cumbersome.
High (as transformation
must be completed
Low (with zero or minimal
processing before storage)
Low (as sources and
be defined early)
High (as new data sources
and formats can be integrated
on the fly)
Low (as data is stored
High (as large volumes of
raw data are stored)
|Security & Compliance
Easy (as sensitive data
can be removed or
encrypted prior to
storage in data
|Difficult (as more steps
may be required to
be intensive process
Prizes speed and efficiency,
with simultaneous loading
Can be high, due to
Low, or can be, due to
tons of information
and best practices
|Relatively new concept
that can be tricky to
Summary: ETL vs ELT, Which Is Better?
On the face of it, using ELT and capitalizing on all the advantages listed above might seem like a no-brainer. But things are rarely that simple. Although ELT is new(er) and exciting, there are plenty of reasons that ETL has been the standard in data processing for so long.
For intensive transformations that rarely vary, data that contains a wealth of personal identifying information (PII), or systems built around legacy architecture, ETL remains a strong choice…even if that may change as ELT continues to evolve.
When it comes to ETL vs ELT, “which is better” really comes down to the external factors you have to consider and needs of the business.
Interested in learning more about how data observability can monitor for anomalies or other issues in your data warehouse – and beyond? Fill out the form below to schedule a time.
Our promise: we will show you the product.