This article is part of the “Jinja in dbt” series. If you’re new to Jinja templating, start with What is Jinja Templating Engine and How to Use It in dbt.

One of the most powerful features of dbt is its integration with Jinja for dynamically discovering and managing columns from a source or an upstream model. This capability eliminates manual column maintenance and makes your models more resilient to schema changes.

This post explores how to dynamically manage columns in dbt, from simple variable references to full column discovery, pattern-based selection, and conditional handling of missing columns.

Why use Jinja for column selection?

Dynamic column management means your dbt models can automatically detect and work with columns from source tables or other models at compile time, without requiring manual updates to your SQL code. This is particularly useful when dealing with wide tables, frequently changing schemas, or when you want to apply transformations to all columns without explicitly naming them.

Reference columns with variables

Let’s start with the simplest form of dynamic referencing: using variables.

Here is an example of a dbt model with defined variables:


{% set key_column = 'customer_id' %}
{% set date_column = 'order_date' %}
{% set amount_column = 'total_amount' %}

select
    {{ key_column }},
    {{ date_column }},
    {{ amount_column }},
    {{ amount_column }} * 1.08 as {{ amount_column }}_with_tax
from {{ ref('raw_orders') }}
where {{ date_column }} >= '2024-01-01'

When compiled, this becomes the following SQL:

select
    customer_id,
    order_date,
    total_amount,
    total_amount * 1.08 as total_amount_with_tax
from analytics.dbt_staging.raw_orders
where order_date >= '2024-01-01'

This approach is useful when the same column names appear in multiple places and might need to change based on configuration or source system differences.
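If the same names should also be configurable per project or environment, dbt’s var() function pairs naturally with this pattern. Here is a minimal sketch, assuming a key_column variable is defined in dbt_project.yml (the variable name and its default are illustrative):

{# Falls back to 'customer_id' if the variable is not set #}
{% set key_column = var('key_column', 'customer_id') %}

select
    {{ key_column }}
from {{ ref('raw_orders') }}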

Iterate over a list of columns

You can store multiple column names in a Jinja list and process them in a batch, applying the same transformation to each column in a single loop.

This dbt model applies the same aggregations to numeric fields:


{% set dimension_columns = ['customer_id', 'product_id', 'sales_rep_id', 'region'] %}
{% set metric_columns = ['quantity', 'revenue', 'profit'] %}

select
    -- Reference dimension columns
    {% for col in dimension_columns %}
    {{ col }},
    {% endfor %}

    -- Transform metric columns
    {% for col in metric_columns %}
    sum({{ col }}) as total_{{ col }},
    avg({{ col }}) as avg_{{ col }}{% if not loop.last %},{% endif %}
    {% endfor %}

from {{ ref('sales_data') }}
group by
    {% for col in dimension_columns %}
    {{ col }}{% if not loop.last %},{% endif %}
    {% endfor %}  
  

And here is how this statement will be compiled in dbt:

select
    -- Reference dimension columns
    customer_id,
    product_id,
    sales_rep_id,
    region,

    -- Transform metric columns
    sum(quantity) as total_quantity,
    avg(quantity) as avg_quantity,
    sum(revenue) as total_revenue,
    avg(revenue) as avg_revenue,
    sum(profit) as total_profit,
    avg(profit) as avg_profit

from analytics.dbt_staging.sales_data
group by
    customer_id, product_id, sales_rep_id, region
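A side note on formatting: loops like these leave extra blank lines and indentation in the compiled SQL. Jinja’s whitespace-control markers ({%- and -%}) trim the whitespace around a tag if you want tighter output. For example, this variant of the group by loop compiles without the stray newlines:

group by
    {%- for col in dimension_columns %}
    {{ col }}{% if not loop.last %},{% endif %}
    {%- endfor %}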

Dynamically discover columns with the adapter.get_columns_in_relation() function

The adapter.get_columns_in_relation() function is your primary tool for discovering table structures when working with Jinja. This function queries your database’s metadata (typically the information schema) to retrieve column details at compile time.

In this dbt model, the adapter.get_columns_in_relation() function gets all columns from the upstream model:


-- Get all columns from the upstream model
{% set columns = adapter.get_columns_in_relation(ref('raw_customers')) %}

select
    {% for column in columns %}
        {{ column.name }}{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_customers') }}

Assuming your raw_customers table has columns id, name, email, and created_at, this would compile to:

select
    id,
    name,
    email,
    created_at
from analytics.dbt_staging.raw_customers

This approach automatically generates a column list based on the actual table structure, ensuring your model stays in sync with schema changes.
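If all you need is the plain column list, optionally minus a few columns, the star macro from the popular dbt_utils package wraps this same pattern, assuming dbt_utils is installed via packages.yml:

select
    {{ dbt_utils.star(from=ref('raw_customers'), except=['created_at']) }}
from {{ ref('raw_customers') }}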

Select columns based on a naming pattern

You can also dynamically create different groups of columns based on their naming patterns, like a common prefix or suffix.

For example, in this dbt model we group columns based on their name’s suffix to apply the same aggregation:


{% set all_columns = adapter.get_columns_in_relation(ref('customer_survey')) %}
{% set rating_columns = [] %}
{% set comment_columns = [] %}

{% for column in all_columns %}
    {% if column.name.endswith('_rating') %}
        {% do rating_columns.append(column.name) %}
    {% elif column.name.endswith('_comment') %}
        {% do comment_columns.append(column.name) %}
    {% endif %}
{% endfor %}

select
    survey_id,
    respondent_id,
    
    -- Average all rating columns (only if any exist)
    {% if rating_columns %}
    (
        {% for col in rating_columns %}
        coalesce({{ col }}, 0){% if not loop.last %} + {% endif %}
        {% endfor %}
    ) / {{ rating_columns | length }} as avg_rating,
    {% else %}
    NULL as avg_rating,
    {% endif %}
    
    -- Concatenate non-null comments (only if any exist)
    {% if comment_columns %}
    concat(
        {% for col in comment_columns %}
        coalesce({{ col }}, ''){% if not loop.last %}, ' | ', {% endif %}
        {% endfor %}
    ) as combined_feedback
    {% else %}
    NULL as combined_feedback
    {% endif %}
    
from {{ ref('customer_survey') }}

Compiled SQL (assuming columns like service_rating, product_rating, service_comment, product_comment exist in the upstream table):

select
    survey_id,
    respondent_id,
    
    -- Average all rating columns
    (
        coalesce(service_rating, 0) + 
        coalesce(product_rating, 0)
    ) / 2 as avg_rating,
    
    -- Concatenate non-null comments
    concat(
        coalesce(service_comment, ''), ' | ', 
        coalesce(product_comment, '')
    ) as combined_feedback
    
from analytics.dbt_staging.customer_survey

Transform columns dynamically

You can use Jinja not only to dynamically select columns, but also to dynamically apply the same transformation to multiple columns, provided they exist in your source.

Let’s look at this example:


{% set numeric_columns = ['revenue', 'cost', 'profit', 'tax'] %}
{% set all_columns = adapter.get_columns_in_relation(ref('financial_data')) %}
{% set column_names = all_columns | map(attribute='name') | list %}
{% set existing_numeric = [] %}

{% for col in numeric_columns %}
    {% if col in column_names %}
        {% do existing_numeric.append(col) %}
    {% endif %}
{% endfor %}

select
    transaction_id,
    {% for col in existing_numeric %}
    round({{ col }}, 2) as {{ col }}_rounded,
    {% endfor %}
    created_date
from {{ ref('financial_data') }}

The third line in this model takes the list of column objects (all_columns) returned by adapter.get_columns_in_relation(ref('financial_data')) and extracts just the name of each column into a new list (column_names). Here we use the Jinja filter map(attribute='name') to iterate over the all_columns list and retrieve the name attribute from each column object.

The list filter then converts the lazy result of map into a regular list. The final output is a flat list containing the names (as strings) of all columns in the financial_data table. Note also that the loop always appends a trailing comma after each rounded column, because created_date follows the loop in every case.
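If the filter chain feels opaque, it is equivalent to this explicit loop:

{% set column_names = [] %}
{% for column in all_columns %}
    {% do column_names.append(column.name) %}
{% endfor %}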

Assuming only revenue, cost, and profit columns exist in the source table, this compiles to:

select
    transaction_id,
    round(revenue, 2) as revenue_rounded,
    round(cost, 2) as cost_rounded,
    round(profit, 2) as profit_rounded,
    created_date
from analytics.dbt_staging.financial_data

Use column metadata

The adapter.get_columns_in_relation() function returns rich metadata for each column, including its data type. You can use this information for type-specific transformations.

Let’s take this dbt model where different transformations are applied based on the column data type:


{% set columns = adapter.get_columns_in_relation(ref('customer_data')) %}

select
    -- Generate type-specific transformations
    {% for column in columns %}
        {% if column.data_type in ('varchar', 'text', 'string') %}
            trim(lower({{ column.name }})) as {{ column.name }}_clean
        {% elif column.data_type in ('int', 'bigint', 'numeric') %}
            coalesce({{ column.name }}, 0) as {{ column.name }}_filled
        {% elif column.data_type in ('timestamp', 'datetime') %}
            date_trunc('day', {{ column.name }}) as {{ column.name }}_date
        {% else %}
            {{ column.name }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('customer_data') }}

Assuming your table has columns name (varchar), age (int), signup_date (timestamp), and is_processed (boolean), this compiles to:

select
    trim(lower(name)) as name_clean,
    coalesce(age, 0) as age_filled,
    date_trunc('day', signup_date) as signup_date_date,
    is_processed
from analytics.dbt_staging.customer_data
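One caveat: the data_type strings are adapter-specific (Postgres reports character varying where other warehouses report varchar, for example), so string comparisons like the ones above may need adjusting per adapter. dbt’s column objects also expose convenience checks such as is_string() and is_number() that abstract over some of these differences; a minimal sketch of the same model using them:

{% set columns = adapter.get_columns_in_relation(ref('customer_data')) %}

select
    {% for column in columns %}
        {% if column.is_string() %}
            trim(lower({{ column.name }})) as {{ column.name }}_clean
        {% elif column.is_number() %}
            coalesce({{ column.name }}, 0) as {{ column.name }}_filled
        {% else %}
            {{ column.name }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('customer_data') }}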

Alias columns dynamically

You can also use Jinja’s string manipulation filters to create systematic column aliases.

Here is an example of a dbt model that iterates over a fixed list of expected column names (raw_columns), checks each one against the columns actually present in the source (available_columns), and renames the matches with the prefix clean_ while stripping underscores:


-- Define a fixed list of raw column names we expect to find in the raw_customers table
{% set raw_columns = ['first_name', 'last_name', 'email_address', 'phone_number'] %}

-- Retrieve all column metadata from the referenced source model 'raw_customers'
{% set source_columns = adapter.get_columns_in_relation(ref('raw_customers')) %}

-- Extract just the column names into a list for easier checking later
{% set available_columns = source_columns | map(attribute='name') | list %}

select
    customer_id,
    {% for col in raw_columns %}
        {% if col in available_columns %}
    {{ col }} as clean_{{ col.replace('_', '') }}
        {% else %}
    NULL as clean_{{ col.replace('_', '') }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_customers') }}

When compiled, the SQL query will look like this:

select
    customer_id,
    first_name as clean_firstname,
    last_name as clean_lastname,
    email_address as clean_emailaddress,
    phone_number as clean_phonenumber
from analytics.dbt_staging.raw_customers
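The .replace() call above is a Python string method; Jinja also ships equivalent filters (replace, lower, upper, trim) that chain cleanly when you combine several transformations. A small illustration with a hypothetical column name:

{% set col = 'Email_Address' %}

select
    {{ col }} as clean_{{ col | lower | replace('_', '') }}  -- clean_emailaddress
from {{ ref('raw_customers') }}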

Filter and group columns

You can use Jinja to create more sophisticated logic by filtering and grouping columns based on their characteristics, such as data type or name.

For example, this dbt model dynamically builds a SQL query for the table sales_data by first retrieving all its columns, then dividing them into two separate lists (dimensions and measures), and, finally, aggregating numeric columns.


-- Retrieve all columns and their metadata from the referenced model 'sales_data'
{% set all_columns = adapter.get_columns_in_relation(ref('sales_data')) %}

-- Initialize an empty list to later store dimension (categorical) columns
{% set dimension_columns = [] %}

-- Initialize an empty list to later store measure (numeric or aggregatable) columns
{% set measure_columns = [] %}

-- Define a list of columns to exclude from processing or transformations
-- Typically these are metadata or system-generated fields
{% set excluded_columns = ['created_at', 'updated_at', 'batch_id'] %}

{% for column in all_columns %}
    {% if column.name not in excluded_columns %}
        {% if column.data_type in ('varchar', 'text', 'string', 'boolean') %}
            {% do dimension_columns.append(column.name) %}
        {% elif column.data_type in ('int', 'bigint', 'numeric', 'float', 'decimal') %}
            {% do measure_columns.append(column.name) %}
        {% endif %}
    {% endif %}
{% endfor %}

select
    -- Dimension columns (no aggregation)
    {% for dim in dimension_columns %}
    {{ dim }},
    {% endfor %}
    
    -- Measure columns (with aggregation)
    {% for measure in measure_columns %}
    sum({{ measure }}) as total_{{ measure }},
    avg({{ measure }}) as avg_{{ measure }}{% if not loop.last %},{% endif %}
    {% endfor %}
    
from {{ ref('sales_data') }}
group by 
    {% for dim in dimension_columns %}
    {{ dim }}{% if not loop.last %},{% endif %}
    {% endfor %}
    

If your table has dimensions region and product_category, and measures revenue and quantity, this compiles to:

select
    -- Dimension columns (no aggregation)
    region,
    product_category,
    
    -- Measure columns (with aggregation)
    sum(revenue) as total_revenue,
    avg(revenue) as avg_revenue,
    sum(quantity) as total_quantity,
    avg(quantity) as avg_quantity
    
from analytics.dbt_staging.sales_data
group by 
    region,
    product_category

Select columns based on the environment

To create consistent naming conventions across different data sources or environments, use a Jinja dictionary to map environment-specific physical column names to stable logical names.


{% set column_mapping = {
    'prod': {
        'customer_key': 'customer_id',
        'order_key': 'order_id',
        'purchase_date': 'order_date'
    },
    'dev': {
        'customer_key': 'cust_id',
        'order_key': 'ord_id', 
        'purchase_date': 'purchase_timestamp'
    }
} %}

{% set env_columns = column_mapping.get(target.name, {}) %}
{% set source_columns = adapter.get_columns_in_relation(ref('raw_orders')) %}
{% set available_columns = source_columns | map(attribute='name') | list %}

select
    {% for logical_name, physical_name in env_columns.items() %}
        {% if physical_name in available_columns %}
    {{ physical_name }} as {{ logical_name.replace('_key', '_id') }}
        {% else %}
    NULL as {{ logical_name.replace('_key', '_id') }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_orders') }}

In this case compiled SQL will look different in development vs production environments:

Compiled SQL (in production environment):

select
    customer_id as customer_id,
    order_id as order_id,
    order_date as purchase_date
from analytics.dbt_staging.raw_orders

Compiled SQL (in development environment):

select
    cust_id as customer_id,
    ord_id as order_id,
    purchase_timestamp as purchase_date
from analytics.dbt_staging.raw_orders
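Note that column_mapping.get(target.name, {}) silently returns an empty mapping for any target not listed in the dictionary, which would compile to a select with no columns. If you would rather fail loudly, dbt’s exceptions.raise_compiler_error can guard against this; a minimal sketch:

{% set env_columns = column_mapping.get(target.name, {}) %}
{% if not env_columns %}
    {{ exceptions.raise_compiler_error("No column mapping defined for target: " ~ target.name) }}
{% endif %}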

Handle missing columns gracefully

When using adapter.get_columns_in_relation(), your code may expect certain columns to exist, but they might be missing due to schema evolution or variations between environments. To handle missing columns, you can use Jinja logic to check for a column’s existence before referencing it. This prevents your dbt runs from failing with an “invalid identifier” error when an upstream column is removed or renamed.

Step 1: Get existing column names

First, obtain the list of columns from your relation using the adapter.get_columns_in_relation() function, and transform it into a simple list of column names.


{% set source_relation = ref('raw_orders') %}
{% set columns = adapter.get_columns_in_relation(source_relation) %}
{% set column_names = columns | map(attribute='name') | list %}

Step 2: Conditionally reference columns

When writing your SELECT statement, use an if statement to output the real column if it exists, or substitute it with a NULL value otherwise.


select
    order_id,
    customer_id,
    {% if 'email' in column_names %}
    email,
    {% else %}
    NULL as email,
    {% endif %}
    {% if 'phone_number' in column_names %}
    phone_number,
    {% else %}
    NULL as phone_number,
    {% endif %}
    order_date
from {{ source_relation }}

If the phone_number column is not present in the upstream table, here is how this conditional logic compiles in dbt:

select
    order_id,
    customer_id,
    email,
    NULL as phone_number,
    order_date
from analytics.dbt_staging.raw_orders

If all fields are present, the compiled SQL will look as follows:

select
    order_id,
    customer_id,
    email,
    phone_number,
    order_date
from analytics.dbt_staging.raw_orders

Step 3: Handle a list of desired columns dynamically

If you have a list of optional columns you want to include, you can loop through them and apply the same conditional logic.


{% set desired_columns = ['discount_amount', 'promo_code', 'sales_rep', 'shipping_cost'] %}
{% set source_relation = ref('raw_orders') %}
{% set columns = adapter.get_columns_in_relation(source_relation) %}
{% set column_names = columns | map(attribute='name') | list %}

select
    order_id,
    customer_id,
    {% for col in desired_columns %}
        {% if col in column_names %}
            {{ col }}
        {% else %}
            NULL as {{ col }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ source_relation }}

Let’s say the raw_orders table has these columns: order_id, customer_id, promo_code, shipping_cost. In this case, the final SQL query compiles as follows:

select
    order_id,
    customer_id,
    NULL as discount_amount,
    promo_code,
    NULL as sales_rep,
    shipping_cost
from analytics.dbt_staging.raw_orders

This approach makes your model resilient to schema changes, and missing columns are returned as NULL (or any other default value) instead of causing query failures.
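If NULL is not a sensible stand-in for every column, you can pair the optional-column list with a dictionary of per-column defaults. A sketch, where the column_defaults mapping is illustrative:

{% set column_defaults = {'discount_amount': '0', 'promo_code': "'NONE'"} %}

select
    order_id,
    customer_id,
    {% for col in desired_columns %}
        {% if col in column_names %}
            {{ col }}
        {% else %}
            {{ column_defaults.get(col, 'NULL') }} as {{ col }}
        {% endif %}
        {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ source_relation }}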

Best practices for dynamic column selection

  • Always run dbt compile to verify that your Jinja generates the expected SQL before executing your models (see the example after this list).
  • When using adapter.get_columns_in_relation(), always account for the possibility that columns might not exist. Use conditional logic to provide sensible defaults or NULL values.
  • Choose descriptive names for your variables and lists (dimension_columns, columns_to_exclude) to make your code self-documenting.
  • Use Jinja comments {# … #} to explain complex column selection or transformation logic.
  • Consider performance: dynamic column discovery queries the information schema at compile time. While generally fast, be mindful of the number of tables you query in large projects.
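For the first point, a typical workflow is to compile a single model and inspect the generated SQL (the model and project names here are illustrative):

dbt compile --select my_dynamic_model
cat target/compiled/my_project/models/my_dynamic_model.sql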

Important considerations

  • Physical relations only: adapter.get_columns_in_relation() queries the database catalog, so it only works for relations that already exist (sources, tables, views). If your upstream reference is an ephemeral dbt model, the function won’t be able to collect its columns.
  • Compile-time execution: Column discovery happens at compile time. Therefore, it cannot see columns from CTEs defined in the same model.
  • Environment testing: Always test your logic across different environments (dev, prod) to ensure it handles all variations correctly.
  • Schema evolution: While this helps manage schema changes, consider the downstream impact of adding NULL columns or changing data types on your BI tools and other dependent models.

Conclusion

Dynamic column selection transforms how you handle schema evolution and reduces manual maintenance in your dbt projects. By mastering these patterns, you can create robust models that adapt to changing data structures.

Want to see more Jinja use cases in dbt? Check out the other articles in the series.
