How to use partial_parse.msgpack with workflow dbt task?
12-28-2022 06:23 PM
I'm looking for direction on how to get the dbt task in Workflows to use the partial_parse.msgpack file so that it skips parsing files that haven't changed. I download my artifacts after each run, and the partial_parse.msgpack file is saved back to ADLS.
What is the dbt task doing? Which folder is the git repo specified in "project directory" loaded into? I'm assuming I can use an init script to load the file from ADLS and place it in whichever folder is /{project directory}/target/?
Labels: dbt
01-02-2023 10:39 AM
Hi, could you please confirm your expectation and the use case? Do you want the file to be saved somewhere else?
01-03-2023 12:50 AM
The use case is that I'd like the dbt task in a Databricks workflow to use the partial_parse.msgpack file, so it can take advantage of partial parsing and not have to parse files that haven't changed. This significantly reduces run time because dbt doesn't spend time re-parsing unchanged files.
I download the file from the artifacts after each run and store it in an ADLS location, but I don't know how to get the dbt task to use it. I don't want to store it in the git repo. I would prefer to have the dbt task copy the file from ADLS somehow. If that were possible, we could also copy manifest.json and run state:modified tasks for Slim CI.
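To make the copy step concrete, here is a minimal sketch of what I have in mind. It assumes the ADLS location is reachable as an ordinary file path (for example through a mount) and that the script can run before dbt starts parsing. The thread does not establish either point, and the function name `restore_artifacts` and all paths are hypothetical.

```python
import shutil
from pathlib import Path


def restore_artifacts(artifact_dir, project_dir, names=("partial_parse.msgpack",)):
    """Copy saved dbt artifacts into <project_dir>/target/ and return the copied paths.

    Assumption: artifact_dir is a locally readable path (e.g. a mounted ADLS
    container). Where the dbt task checks out the project is not known here,
    so project_dir must be supplied by the caller.
    """
    source = Path(artifact_dir)
    target = Path(project_dir) / "target"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in names:
        src = source / name
        if not src.is_file():
            # Nothing saved yet (e.g. first run): skip, dbt will do a full parse.
            continue
        dst = target / name
        shutil.copy2(src, dst)  # copy contents and preserve timestamps
        copied.append(dst)
    return copied
```

It would be called with something like `restore_artifacts("/dbfs/mnt/dbt-artifacts/latest", "/path/to/project")`, where both paths are placeholders. For the Slim CI case, manifest.json would probably be better copied to a separate directory and passed to dbt with `--state`, since dbt writes a fresh manifest.json into target/ on each run.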