Create a docker image for dbt task

essura
New Contributor II

Hi there,

We are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages).

It currently seems to work, but, for example, downloading artifacts is not working, and there are some complaints in the log4j output about missing R.

Is it possible to see the built-in dbt task Dockerfile, or at least get an outline of the directories, environment variables, and dependencies it uses?

/Esben

2 REPLIES

Kaniz
Community Manager

Hi @essura, setting up a Docker image for your dbt execution is a great approach.

Let’s dive into the details.

  1. Prebuilt Docker Images:

    • dbt Core and all adapter plugins maintained by dbt Labs are available as Docker images. These images are distributed via GitHub Packages in a public registry.
    • Using a prebuilt Docker image to install dbt Core in production has several benefits:
      • It already includes dbt-core, one or more database adapters, and pinned versions of all their dependencies.
      • It simplifies installation and development locally if you don’t have a Python environment set up.
      • Note that running dbt in this manner can be slower if your operating system differs from the system that built the Docker image.
    • You can install an image using the docker pull command:
      docker pull ghcr.io/dbt-labs/<db_adapter_name>:<version_tag>
      
    • For example, to pull the latest version of dbt Core with the PostgreSQL adapter, you can use:
      docker pull ghcr.io/dbt-labs/dbt-postgres:latest
      
  2. Running a dbt Docker Image in a Container:

    • The entry point for dbt Docker images is the command dbt.
    • You can bind-mount your project directory to /usr/app within the container and use dbt as normal.
    • Here’s an example of running a dbt Docker image:
      docker run \
        --network=host \
        --mount type=bind,source=path/to/project,target=/usr/app \
        --mount type=bind,source=path/to/profiles.yml,target=/root/.dbt/profiles.yml \
        <dbt_image_name> \
        ls
      
    • Notes:
      • Bind-mount sources must be absolute paths.
      • Adjust the docker networking settings based on your data warehouse or database host.
  3. Custom Docker Images:

    • If the pre-made images don’t fit your use case, you can build custom images using the provided Dockerfile and README; a minimal sketch of such a custom image is shown after this list.
    • The Dockerfile supports building images for:
      • All adapters maintained by dbt Labs.
      • One or more third-party adapters.
      • Other system architectures.
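
For reference, here is a minimal sketch of what such a custom image could look like. It assumes the ghcr.io/dbt-labs/dbt-core base image, the third-party dbt-databricks adapter, and a project copied into /usr/app/dbt/; these are illustrative assumptions only, not the Dockerfile of the built-in dbt task:

      # Illustrative custom image; the base tag, adapter, and paths are assumptions
      FROM ghcr.io/dbt-labs/dbt-core:latest

      # Add a third-party adapter on top of dbt-core (dbt-databricks as an example)
      RUN python -m pip install --no-cache-dir dbt-databricks

      # Bake the dbt project into the image instead of bind-mounting it at run time
      COPY . /usr/app/dbt/
      WORKDIR /usr/app/dbt/

      # Resolve dbt packages at build time; packages pulled from private repos
      # need git credentials available during the build (e.g. via build secrets)
      RUN dbt deps

      # The official dbt images already use dbt as the entrypoint; restated for clarity
      ENTRYPOINT ["dbt"]

You could then build it with docker build -t my-dbt . and run it with the same docker run pattern shown above, pointing dbt at the baked-in project (the image name my-dbt is hypothetical).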

Remember to adjust the paths and configurations according to your specific setup.

If you encounter any issues, feel free to ask for further assistance! 🚀🐳

essura
New Contributor II

This looks kind of like what ChatGPT proposed to me as a solution.

My issue is that I'm trying to use our custom container in a dbt task (a workflow task). The container is working, but we can't, for example, use the "download artifact" feature, probably because the path is configured to some temporary execution path. We also have some runtime issues in the logs concerning missing R. What is R used for in the dbt task?

 
