How to specify entry_point for python_wheel_task?

jpwp
New Contributor III

Can someone provide me an example for a python_wheel_task and what the entry_point field should be?

The jobs UI help popup says this about "entry_point":

"Function to call when starting the wheel, for example: main. If the entry point does not exist in the meta-data of the wheel distribution, the function will be called directly using `$packageName.$entryPoint()`."
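That documented fallback, `$packageName.$entryPoint()`, is essentially a dynamic import plus an attribute call. A minimal sketch of that behaviour, using the stdlib names `math`/`sqrt` purely as stand-ins for a real package name and function:

```python
import importlib

# Stand-in values: "math" and "sqrt" play the role of the wheel's
# package_name and entry_point fields here.
package_name = "math"
entry_point = "sqrt"

# The documented fallback `$packageName.$entryPoint()` amounts to:
module = importlib.import_module(package_name)
result = getattr(module, entry_point)(9.0)
print(result)  # 3.0
```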

However, an entry point in Python is a combination of a group and a name, e.g. in my setup.py:

entry_points = {
    'my_jobs': [
        'a_job = module_name:job_function'
    ]
},

Here the group is "my_jobs" and the name is "a_job". For Databricks, should I set entry_point to `a_job` or `my_jobs.a_job`, or does Databricks require a specific group name for wheels that are run as tasks?

I couldn't find any documentation online to clarify this.
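For reference, setuptools entry points really are (group, name, value) triples, which is what makes the field name ambiguous. A quick sketch with `importlib.metadata.EntryPoint`, mirroring the setup.py example above (`a_job = module_name:job_function` under the `my_jobs` group):

```python
from importlib.metadata import EntryPoint

# Mirrors 'a_job = module_name:job_function' registered under 'my_jobs'
ep = EntryPoint(name="a_job", value="module_name:job_function", group="my_jobs")

print(ep.group)   # my_jobs
print(ep.name)    # a_job
print(ep.module)  # module_name
print(ep.attr)    # job_function
```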

2 ACCEPTED SOLUTIONS

jpwp
New Contributor III

The correct answer to my question is that the "entry_point" in the Databricks API has nothing to do with a Python wheel's official entry points. It is just a dotted Python path to a Python function, e.g. `mymodule.myfunction`.
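Read together with the UI help text quoted in the question, that suggests a config sketch along these lines (package and function names are made up for illustration; with these values Databricks would end up calling `mymodule.myfunction()`):

```json
{
  "python_wheel_task": {
    "package_name": "mymodule",
    "entry_point": "myfunction"
  },
  "libraries": [
    { "whl": "dbfs:/FileStore/my-lib.whl" }
  ]
}
```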


hectorfi
New Contributor III

One thing to note when working with entry points is that if the name is too long, it may not work on Databricks. That was the cause of my issue.


11 REPLIES

Kaniz
Community Manager

Hi @Joel Pitt​ ! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

Kaniz
Community Manager

Hi @Joel Pitt​ ,

I believe this should be achievable by specifying the libraries field (see docs).

Can you try something like this?:

{
  "existing_cluster_id": <cluster_id>,
  "python_wheel_task": {
    "package_name": <package_name>,
    "entry_point": <entry_point>
  },
  "libraries": [
    { "whl": "dbfs:/FileStore/my-lib.whl" }
  ]
}

jpwp
New Contributor III

Hi Kaniz - I'm afraid that doesn't answer the question. I am asking about the expected value for the entry_point field. I am not trying to attach an additional library; I am trying to run a python_wheel_task.

Kaniz
Community Manager

Hi @Joel Pitt​ , Please go through this documentation. Let me know if this helps.


Anonymous
Not applicable

@Joel Pitt​ - Let us know if either of Kaniz's resources helps you. If they do, would you be happy to mark that answer as best? That helps other members find the solutions more quickly.


Kaniz
Community Manager

Hi @jpwp , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.


hectorfi
New Contributor III

Just in case anyone comes here in the future, this is roughly how Databricks executes these entry points. How do I know? I have banged my head against this wall for a couple of hours already.

from importlib import metadata
import importlib

package_name = "some.package"
entry_point = "my-entry-point"

available_entry_points = metadata.distribution(package_name).entry_points
entry = [ep for ep in available_entry_points if ep.name == entry_point]

if entry:
    entry[0].load()()
else:
    # Fall back to importing the package and calling the attribute directly,
    # i.e. the documented `$packageName.$entryPoint()` behaviour
    module = importlib.import_module(package_name)
    getattr(module, entry_point)()

If you cannot see your entry point using the following, then you (we) are out of luck.

from importlib import metadata
from pprint import pprint

package_name = "my.package"
pprint(metadata.distribution(package_name).entry_points)

My current working theory is that the user installing the package is not the same user running the execution. Or for some weird reason the metadata is not available at the job runtime...
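One way to test that theory on the cluster itself is to dump every distribution the runtime can see, along with its entry point names - a debugging sketch along the lines of the snippet above:

```python
from importlib import metadata

def list_entry_points():
    # Map each installed distribution to the names of its entry points,
    # to check what metadata is actually visible at job runtime.
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        names = [ep.name for ep in dist.entry_points]
        if name and names:
            result[name] = names
    return result

if __name__ == "__main__":
    for pkg, names in sorted(list_entry_points().items()):
        print(pkg, "->", names)
```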

GabMorin
New Contributor II

How do you build the wheel? I got it working with poetry like so:

entrypoint.py somewhere in your codebase:

def entrypoint():
    print("Works")

pyproject.toml:

[tool.poetry]
name = "package"
version = "1.0.0"
description = "package"
packages = [{include = "src"}, ]  # assuming you have the /src structure

[tool.poetry.scripts]
my_entrypoint = "src.entrypoint:entrypoint"  # before the : is the path to the file, after it is the function name

Job config:
  package_name: 'package' --> taken from pyproject.toml
  entry_point: 'my_entrypoint' --> the part before the `=` on your [tool.poetry.scripts] line

(assuming you installed the wheel on the cluster)

I also pulled my hair out over this and am now bald.


FYI, my full setup is micromamba -> poetry -> gitlab -> pulumi -> databricks ☠️


Kaniz
Community Manager

Hi @hectorfi, It's great to hear that your query has been successfully resolved. Thank you for your contribution.
