Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error Databricks Bundle Deploy with changes in the wheel file

jeremy98
Honored Contributor

Hello Community,
Suddenly I'm hitting an error: when I deploy a new bundle to Databricks after changing a Python script, the cluster continues to point to the old version of the script uploaded by the Databricks Asset Bundle. Why is this?

 

1 ACCEPTED SOLUTION


denis-dbx
Databricks Employee

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on artifacts that enables automated wheel patching to break the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the built wheel so that the new code is always deployed.

Note that this only works with local wheels that are uploaded as part of "bundle deploy". So you need to download this wheel into your project so that it can be patched by "bundle deploy" before upload, which breaks the cache.

artifacts:
  mywheel:
    type: whl
    dynamic_version: true
    files:
    - source: dist/pipelines-0.0.1-py3-none-any.whl

resources:
  jobs:
    sync_tables_gold_with_pg:
      name: sync_tables_gold_with_pg

      tasks:
        - task_key: sync_tables_gl_and_pg
          job_cluster_key: job_new_cluster
          existing_cluster_id: 1224-151003-c62b2avz
          notebook_task:
            notebook_path: ../notebook/sync_gold_tables_to_postgres.ipynb
            source: WORKSPACE
            base_parameters:
              env: ${bundle.target}
          libraries:
            - whl: dist/pipelines-0.0.1-py3-none-any.whl
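Conceptually, dynamic_version breaks the cache by giving every build a unique version suffix, so the wheel filename the cluster installs from changes on each deploy. A minimal Python sketch of that idea (the patch_wheel_version helper below is hypothetical, for illustration only, and is not the CLI's actual implementation):

```python
import time

def patch_wheel_version(filename: str, build_id: str) -> str:
    # Split "name-version-tags.whl" at the first two dashes and append a
    # PEP 440 local-version suffix ("+<build_id>") to the version part.
    name, version, rest = filename.split("-", 2)
    return f"{name}-{version}+{build_id}-{rest}"

# Each deploy yields a distinct artifact name, so a cluster that cached
# "pipelines-0.0.1-py3-none-any.whl" cannot silently reuse the stale copy.
print(patch_wheel_version("pipelines-0.0.1-py3-none-any.whl", str(int(time.time()))))
```
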






9 REPLIES

jeremy98
Honored Contributor

It seems the problem is using an existing cluster: deploying again and again still points to the first upload of the same script. Is this a Databricks bug?

Alberto_Umana
Databricks Employee

Hi @jeremy98,

What is the error that you are hitting?

jeremy98
Honored Contributor

Hello Alberto,

Yes, essentially: I was deploying a workflow that runs on an existing cluster and is triggered manually inside Databricks, and I encountered an issue. After updating a Python script used by a notebook task, the job still pointed to the initial version of the script instead of the updated one, even though I redeployed the DAB.

Let me know if this makes things clearer.

Alberto_Umana
Databricks Employee

Thanks for the comments! 

Just to confirm: are you following this deployment sequence?

databricks bundle validate
databricks bundle deploy -t dev
databricks bundle run -t dev hello_job

Also, it looks like the existing cluster is not picking up the new version. You can either restart the cluster manually or automate the restart as part of your deployment workflow.
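A sketch of what that automated restart could look like, assuming the cluster ID from the job definition, the dev target, and a Databricks CLI already configured with workspace credentials (adjust all three to your environment):

```shell
#!/usr/bin/env sh
# Validate and deploy the bundle, then restart the existing all-purpose
# cluster so it re-installs libraries from the freshly uploaded wheel.
set -e
databricks bundle validate
databricks bundle deploy -t dev
databricks clusters restart 1224-151003-c62b2avz
```

This is only a workaround for the caching behavior; the dynamic_version flag described in the accepted solution removes the need for manual restarts.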

jeremy98
Honored Contributor

Hello Alberto,
Thanks for your answer. Yes, basically I followed those steps, though sometimes I skipped validate before deploy (using -t stg instead of the dev target), and I ran the last command manually through the portal instead.

I restarted it, but again, after three deploys for example, the cluster doesn't pick up the correct version of the .py script.

 

Alberto_Umana
Databricks Employee

Hi @jeremy98,

Could you please send me your databricks.yml file to review it?

 

jeremy98
Honored Contributor

Hello,
This is the YAML file I used. By the way, since I can't keep a cluster up and running permanently, I had the problem that on every new deploy the cluster retained the first deploy's version... seems weird.

resources:
  jobs:
    sync_tables_gold_with_pg:
      name: sync_tables_gold_with_pg

      tasks:
        - task_key: sync_tables_gl_and_pg
          job_cluster_key: job_new_cluster
          existing_cluster_id: 1224-151003-c62b2avz
          notebook_task:
            notebook_path: ../notebook/sync_gold_tables_to_postgres.ipynb
            source: WORKSPACE
            base_parameters:
              env: ${bundle.target}
          libraries:
            - whl: ${workspace.root_path}/files/dist/pipelines-0.0.1-py3-none-any.whl

 


jeremy98
Honored Contributor

AMAZING :), thx denis!