Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to use custom whl file + pypi repo with a job cluster in asset bundles?

VicS
Visitor

I tried looking through the documentation, but it is confusing at best and missing important parts at worst. Is there any place where the entire syntax and ALL options for asset bundle YAMLs are described?

I found this https://docs.databricks.com/en/dev-tools/bundles/index.html, but aside from rudimentary examples, it is incredibly difficult to figure out which key should go where in the YAML... Sometimes it was easier to look at the Terraform documentation for Databricks jobs/tasks and work backwards from there than to read the Databricks documentation.

I cannot figure out how to use a job cluster with custom-built whl files. I build one whl file on the fly and pull another as a dependency from a private PyPI server.

I was able to do it with a general-purpose cluster that I created beforehand, but the job cluster never seems to install the libraries.

With a general-purpose cluster:

      - task_key: ingestion
        existing_cluster_id: ${var.existing_cluster_id}
        python_wheel_task:
          package_name: my_package
          entry_point: my_package_ep
        libraries:
          - whl: ./dist/*.whl
          - pypi:
              package: another-package==1.0.0
              repo: https://pkgs.dev.azure.com/xxxx/xx/_packaging/xxx/pypi/simple/

I tried various places where I thought the "libraries" key would make sense, but due to the lack of documentation I was not able to figure it out. It worked neither under the task nor in the environment.

      environments:
        - environment_key: myenv
          spec:
            client: "1"
            dependencies:
              - whl: ./dist/*.whl
              - pypi:
                  package: pyspark-framework==${var.pyspark_framework_version}
                  repo: https://pkgs.dev.azure.com/xxx/xxxx/_packaging/xxxx/pypi/simple/

      tasks:
        - task_key: mytask
          environment_key: myenv
          python_wheel_task:
            package_name: mypackage
            entry_point: mypackage_ep
            libraries:
              - whl: ./dist/*.whl
              - pypi:
                  package: pyspark-framework==${var.pyspark_framework_version}
                  repo: https://pkgs.dev.azure.com/xxx/xxxx/_packaging/xxxx/pypi/simple/

Can anyone tell me how to properly add my whl files (local dist + from pypi) to a job_cluster?
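For reference, this is the shape I would expect to work for a job cluster, with "libraries" at the task level next to "job_cluster_key" (untested sketch; the new_cluster values are placeholders, not my actual settings):

      job_clusters:
        - job_cluster_key: my_job_cluster
          new_cluster:
            spark_version: "15.4.x-scala2.12"
            node_type_id: "Standard_DS3_v2"
            num_workers: 1

      tasks:
        - task_key: ingestion
          job_cluster_key: my_job_cluster
          python_wheel_task:
            package_name: my_package
            entry_point: my_package_ep
          libraries:
            - whl: ./dist/*.whl
            - pypi:
                package: another-package==1.0.0
                repo: https://pkgs.dev.azure.com/xxxx/xx/_packaging/xxx/pypi/simple/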

1 REPLY

VicS
Visitor

I also tried the following, but couldn't get it to work with my custom PyPI index. Any help is appreciated.

      environments:
        - environment_key: myenv
          spec:
            client: "1"
            dependencies:
              - another-package==1.0.0@https://pkgs.dev.azure.com/xx/xxx/_packaging/xxx/pypi/simple/
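(For what it's worth, I suspect the package==1.0.0@<index-url> form mixes two different pip syntaxes: a PEP 508 direct reference has the shape "name @ <url>", cannot carry a version specifier, and must point at a concrete wheel or sdist file rather than a simple index. If that reading is right, a direct reference would look something like the following, with a hypothetical wheel URL as a placeholder:)

      dependencies:
        - another-package @ https://<host>/<path-to-artifact>/another_package-1.0.0-py3-none-any.whl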
