Generate and export dbt documentation from the Workflow dbt task to S3

Anonymous
Not applicable

I'm testing the Databricks Jobs feature with a dbt task and wanted to know if you have any advice on managing dbt documentation.

I can use "dbt run" commands to run my models and then "dbt docs generate" to generate the documentation. But is it possible to export the generated files to GitHub or to a file system like AWS S3?
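For reference, a dbt task in a Databricks job takes a list of dbt commands, so docs generation can be chained after the run. A minimal sketch of the dbt_task block of a Jobs API 2.1 job definition; the warehouse ID and schema values are placeholders:

```python
# Sketch of the dbt_task block in a Jobs API 2.1 job definition.
# warehouse_id and schema are placeholders, not real values.
dbt_task = {
    "dbt_task": {
        "commands": [
            "dbt deps",
            "dbt run",
            "dbt docs generate",  # writes index.html etc. to target/
        ],
        "warehouse_id": "<sql-warehouse-id>",
        "schema": "analytics",
    }
}
```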

4 REPLIES

Anonymous
Not applicable

Hi,

Thanks for your answer. I used this documentation to set up my Databricks jobs, but it doesn't mention how to manage the dbt-generated documentation. I don't know whether that feature is already implemented by Databricks or whether there is a workaround.

Jfoxyyc
Valued Contributor

Hi @Kaniz Fatma,

The documentation mentions:

  • Automatic archiving of the artifacts from job runs, including logs, results, manifests, and configuration.

When a dbt task runs, do the logs, manifests, and index.html automatically go back to the attached repo?

Is there a way to run slim CI with the dbt task? Can we use pre-commit? It would be good to be able to inspect the manifest, capture the models that have changed, shallow-clone them, test their transforms, and, if they succeed, run them in prod.
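As an illustration of the manifest-inspection idea (not a built-in Databricks feature), here is a minimal Python sketch that diffs two dbt manifest.json files by model checksum. Note that dbt's own state:modified selector does this natively and also catches config and macro changes:

```python
import json

def changed_models(old_manifest_path, new_manifest_path):
    """Return names of models whose file checksum differs between two manifests."""
    with open(old_manifest_path) as f:
        old_nodes = json.load(f)["nodes"]
    with open(new_manifest_path) as f:
        new_nodes = json.load(f)["nodes"]
    changed = []
    for unique_id, node in new_nodes.items():
        if node.get("resource_type") != "model":
            continue
        prev = old_nodes.get(unique_id)
        # New model, or model contents changed since the old manifest
        if prev is None or prev["checksum"]["checksum"] != node["checksum"]["checksum"]:
            changed.append(node["name"])
    return changed

print(changed_models("state/manifest.json", "target/manifest.json"))
```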

Jfoxyyc
Valued Contributor

You can use the Jobs API: hit jobs/runs/get-output and look at dbt_output.artifacts_link to get an HTTP link to download a tar.gz file containing all the artifacts. You can then unpack the tar.gz and store those files in ADLS or S3.
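A minimal sketch of that flow in Python, assuming a personal access token in DATABRICKS_TOKEN and a hypothetical my-dbt-docs bucket; the endpoint and the dbt_output fields follow the Jobs API 2.1 runs/get-output response:

```python
import io
import os
import tarfile

import boto3
import requests

HOST = "https://<workspace-url>"        # placeholder workspace URL
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token
RUN_ID = 123456                         # run id of the finished dbt task

# Fetch the run output; dbt task runs include a dbt_output block.
resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get-output",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
)
resp.raise_for_status()
dbt_output = resp.json()["dbt_output"]

# artifacts_link is a download URL for a tar.gz of the artifacts;
# artifacts_headers, if present, are headers to send with that request.
archive = requests.get(
    dbt_output["artifacts_link"],
    headers=dbt_output.get("artifacts_headers", {}),
)
archive.raise_for_status()

# Unpack the tar.gz in memory and copy each file to S3.
s3 = boto3.client("s3")
with tarfile.open(fileobj=io.BytesIO(archive.content), mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            s3.put_object(
                Bucket="my-dbt-docs",  # hypothetical bucket
                Key=f"dbt-artifacts/{member.name}",
                Body=tar.extractfile(member).read(),
            )
```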

136039
New Contributor II

How can I access these target files from the task itself? I am trying to use dbt's state modifiers to detect models that changed, and to run models only when the source freshness has changed. Is there an easy way to store and use these state files in S3 or the Databricks workspace? We are also using Databricks Asset Bundles to deploy our workflows and code, so maybe there's a way to use that for this problem?
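One possible workaround (a sketch, not a confirmed pattern; the bucket name is hypothetical): sync the state files with S3 in Python tasks that run before and after the dbt task in the same job, then point dbt's --state flag at the downloaded copy:

```python
import os

import boto3

BUCKET = "my-dbt-state"  # hypothetical bucket
STATE_FILES = ["manifest.json", "sources.json"]

s3 = boto3.client("s3")

def pull_state(dest="state"):
    """Download the previous run's artifacts so dbt can diff against them."""
    os.makedirs(dest, exist_ok=True)
    for name in STATE_FILES:
        s3.download_file(BUCKET, name, os.path.join(dest, name))

def push_state(src="target"):
    """Upload this run's artifacts as the new comparison baseline."""
    for name in STATE_FILES:
        s3.upload_file(os.path.join(src, name), BUCKET, name)

# Between pull_state() and push_state(), the dbt task can select on state, e.g.:
#   dbt run --select state:modified --state ./state
#   dbt build --select source_status:fresher+ --state ./state
```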
