Data Engineering

Create a docker image for dbt task

essura
New Contributor II

Hi there,

We are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages).

It seems to work currently, but downloading artifacts, for example, is not working, and there are some complaints in log4j about R being missing.

Is it possible to see the Dockerfile for the built-in dbt task? Or at least get an outline of the directories, environment variables, and dependencies it uses?

/Esben

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @essura, setting up a Docker image for your dbt execution is a great approach.

Let's dive into the details.

  1. Prebuilt Docker Images:

    • dbt Core and all adapter plugins maintained by dbt Labs are available as Docker images. These images are distributed via GitHub Packages in a public registry.
    • Using a prebuilt Docker image to install dbt Core in production has several benefits:
      • It already includes dbt-core, one or more database adapters, and pinned versions of all their dependencies.
      • It simplifies installation and development locally if you don't have a Python environment set up.
      • Note that running dbt in this manner can be slower if your operating system differs from the system that built the Docker image.
    • You can install an image using the docker pull command:
      docker pull ghcr.io/dbt-labs/<db_adapter_name>:<version_tag>
      
    • For example, to pull the latest version of dbt Core with the PostgreSQL adapter, you can use:
      docker pull ghcr.io/dbt-labs/dbt-postgres:latest
      
  2. Running a dbt Docker Image in a Container:

    • The entry point for dbt Docker images is the command dbt.
    • You can bind-mount your project directory to /usr/app within the container and use dbt as normal.
    • Here's an example of running a dbt Docker image:
      docker run \
        --network=host \
        --mount type=bind,source=path/to/project,target=/usr/app \
        --mount type=bind,source=path/to/profiles.yml,target=/root/.dbt/profiles.yml \
        <dbt_image_name> \
        ls
      
    • Notes:
      • Bind-mount sources must be absolute paths.
      • Adjust the docker networking settings based on your data warehouse or database host.
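Since bind-mount sources have to be absolute, one way to avoid mistakes is to resolve the paths before calling docker run. The following is only a minimal sketch, not an official dbt or Databricks script: the function names (abs_dir, abs_file, dbt_run_command) are made up for illustration, the image name is whatever you pulled or built, and mounting the profiles file at /root/.dbt/profiles.yml assumes dbt runs as root and looks in its default profiles directory. The function prints the command instead of running it, so you can inspect it first.

```shell
#!/bin/sh
# Sketch only: build a `docker run` command whose bind-mount sources are absolute.
# All function names here are hypothetical helpers, not part of dbt or Docker.
set -e

# Resolve a (possibly relative) path to an existing directory into an absolute path.
abs_dir() {
  (CDPATH= cd -- "$1" && pwd)
}

# Resolve a (possibly relative) path to a file into an absolute path.
abs_file() {
  printf '%s/%s\n' "$(abs_dir "$(dirname "$1")")" "$(basename "$1")"
}

# Usage: dbt_run_command <project_dir> <profiles_file> <image> [dbt args...]
# Prints the command rather than executing it. Paths containing spaces are
# not quoted in the printed output, so quote them yourself before running.
dbt_run_command() (
  project_dir=$(abs_dir "$1")
  profiles_file=$(abs_file "$2")
  image=$3
  shift 3
  printf 'docker run --network=host'
  printf ' --mount type=bind,source=%s,target=/usr/app' "$project_dir"
  # Assumption: dbt inside the container reads /root/.dbt/profiles.yml.
  printf ' --mount type=bind,source=%s,target=/root/.dbt/profiles.yml' "$profiles_file"
  printf ' %s' "$image" "$@"
  printf '\n'
)
```

For example, dbt_run_command ./my_project ./profiles.yml <dbt_image_name> ls prints a command equivalent to the one above, with both sources expanded relative to the current directory.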
  3. Custom Docker Images:

    • If the pre-made images don't fit your use case, you can build custom images using the provided Dockerfile and README.
    • The Dockerfile supports building images for:
      • All adapters maintained by dbt Labs.
      • One or more third-party adapters.
      • Other system architectures.
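As a rough illustration of the custom build step, the sketch below assembles a docker build command for one stage of a multi-stage Dockerfile. It relies only on standard docker build flags (--tag, --target, --build-arg). The function name dbt_build_command is hypothetical, and the stage name and build arguments you pass are assumptions: check the Dockerfile and README you are building from for the names it actually defines. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: assemble a `docker build` command for one stage of a Dockerfile.
# dbt_build_command is a hypothetical helper, not part of dbt or Docker.
set -e

# Usage: dbt_build_command <image_tag> <target_stage> <build_context> [KEY=VALUE ...]
# Each trailing KEY=VALUE is passed through as a --build-arg; the keys must
# match ARG names declared in the Dockerfile you are building.
dbt_build_command() (
  image_tag=$1
  target=$2
  context=$3
  shift 3
  printf 'docker build --tag %s --target %s' "$image_tag" "$target"
  for build_arg in "$@"; do
    printf ' --build-arg %s' "$build_arg"
  done
  printf ' %s\n' "$context"
)
```

Calling dbt_build_command my-dbt <target_name> <path/to/dockerfile/dir> prints the command so you can review it before running it against your own Dockerfile.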

Remember to adjust the paths and configurations according to your specific setup.

If you encounter any issues, feel free to ask for further assistance! 🚀🐳

essura
New Contributor II

This looks kind of like what ChatGPT proposed as a solution to me.

My issue is that I'm trying to use our custom container in a dbt task (a workflow task). The container works, but we can't, for example, use "download artifact", probably because the artifact path is configured to some temporary execution path. We also see runtime errors in the logs about R being missing, but what is R used for in a dbt task?

 
