Databricks Community

jpwp · ‎01-09-2022

Can someone provide me an example for a python_wheel_task and what the entry_point field should be?

The jobs UI help popup says this about "entry_point":

"Function to call when starting the wheel, for example: main. If the entry point does not exist in the meta-data of the wheel distribution, the function will be called directly using `$packageName.$entryPoint()`."

However, an entry point in Python is a combination of a group and a name. For example, in my setup.py:

entry_points = {
    'my_jobs': [
        'a_job = module_name:job_function'
    ]
},

Here the group is "my_jobs" and the name is "a_job". For Databricks, should I set entry_point to `a_job` or `my_jobs.a_job`, or does Databricks require a specific group name for wheels that are run as tasks?

I couldn't find any documentation online to clarify this.

jpwp · ‎02-01-2022

The correct answer to my question is that the "entry_point" in the Databricks API has nothing to do with a Python wheel's official "entry_points". It is just a dotted Python path to a Python function, e.g. `mymodule.myfunction`
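To make the "dotted path" reading concrete, here is a minimal sketch of turning a string such as `mymodule.myfunction` into a callable. The helper name `resolve_dotted_path` is hypothetical and this is not Databricks' actual code; it only illustrates what "a dotted Python path to a function" means, using a standard-library function as a stand-in.

```python
from importlib import import_module


def resolve_dotted_path(dotted_path):
    """Return the object named by a dotted path such as 'mymodule.myfunction'."""
    # Split on the last dot: everything before it is the module, the rest is the attribute
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = import_module(module_name)
    return getattr(module, attr_name)


# Standard-library stand-in for `mymodule.myfunction`
join = resolve_dotted_path("os.path.join")
```

A bare name with no dot (such as `main`) has no module part, so this sketch rejects it rather than guessing which package to import.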

hectorfi · ‎09-29-2023

Just in case anyone comes here in the future, this is roughly how Databricks executes these entry points. How do I know? I have banged my head against this wall for a couple of hours already.

from importlib import import_module, metadata

package_name = "some.package"
entry_point = "my-entry-point"

# Look the name up among the entry points declared in the wheel's metadata
available_entry_points = metadata.distribution(package_name).entry_points
entry = [ep for ep in available_entry_points if ep.name == entry_point]

if entry:
    # Found in the metadata: load the referenced function and call it
    entry[0].load()()
else:
    # Fallback: import the package and call the attribute named by entry_point,
    # i.e. the equivalent of <package-name>.<entry-point>()
    getattr(import_module(package_name), entry_point)()

If you cannot see your entry point using the following, then you (we) are out of luck.

from importlib import metadata
from pprint import pprint

package_name = "my.package"
pprint(metadata.distribution(package_name).entry_points)

My current working theory is that the user installing the package is not the same user running the execution. Or for some weird reason the metadata is not available at the job runtime...
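The check above can be wrapped in a small helper that also shows which group each entry point belongs to. This is a sketch only: `list_entry_points` is a hypothetical name, and it relies on nothing beyond the standard `importlib.metadata` API. It returns an empty list when the distribution is not visible in the current environment.

```python
from importlib import metadata


def list_entry_points(package_name):
    """Return (group, name, value) for every entry point a distribution declares."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed (or not visible) in this environment
        return []
    return sorted((ep.group, ep.name, ep.value) for ep in dist.entry_points)
```

Calling `list_entry_points("my.package")` from a notebook attached to the same cluster shows what the runtime can see. The first element of each tuple is the group (for example `my_jobs` from the setup.py in the question), the second is the name.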

GabMorin · ‎09-29-2023

How do you build the wheel? I got it working with poetry like so:

entrypoint.py somewhere in your codebase:

def entrypoint():
    print("Works")

pyproject.toml:

[tool.poetry]
name = "package"
version = "1.0.0"
description = "package"
packages = [{include = "src"}, ]  # assuming you have the /src structure

[tool.poetry.scripts]
my_entrypoint = "src.entrypoint:entrypoint"  # before the colon: dotted module path; after it: function name

Job config:
package_name: 'package' --> the `name` from pyproject.toml
entry_point: 'my_entrypoint' --> the key before the `=` on your [tool.poetry.scripts] line

(assuming you installed the wheel on the cluster)
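For anyone defining the job programmatically, the job config above might translate into a task definition along these lines. This is a sketch, assuming the Jobs API field names `package_name` and `entry_point` (the spelling used in the original question); the `task_key` value is a placeholder, and the wheel itself still has to be attached to the task or installed on the cluster.

```python
import json

# Hypothetical task definition mirroring the job config above.
# "run_my_entrypoint" is a placeholder task key, not a value from this thread.
python_wheel_task_settings = {
    "task_key": "run_my_entrypoint",
    "python_wheel_task": {
        "package_name": "package",        # `name` under [tool.poetry]
        "entry_point": "my_entrypoint",   # key under [tool.poetry.scripts]
    },
}

print(json.dumps(python_wheel_task_settings, indent=2))
```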

I also pulled my hair out over this and am now bald.

FYI, my full setup is micromamba -> poetry -> gitlab -> pulumi -> databricks ☠️

hectorfi · ‎10-23-2023

One thing to note when working with entry points is that if the name is too long, it may not work on Databricks. That was the cause of my issue.

jpwp · ‎01-10-2022

Hi Kaniz - I'm afraid that doesn't answer the question. I am asking about the expected value for the entry_point field. I am not trying to use an additional library, I am trying to run a python_wheel_task.

Anonymous · ‎02-01-2022

@Joel Pitt - Let us know if either of Kaniz's resources helps you. If they do, would you be happy to mark that answer as best? That helps other members find the solutions more quickly.

VictorS · ‎07-09-2024

You are a hero for supplying a full example - especially the validation part is great. Thanks dude!

MRMintechGlobal · ‎09-18-2024

Just want to confirm: my project uses PDM, not Poetry, and as such uses

[project.entry-points.packages]

rather than

[tool.poetry.scripts]

The bundle is failing to run on the cluster because it can't find the entry point. Is this expected behavior?

MRMintechGlobal · ‎09-18-2024

My issue appears to have been uploading wheels with identical version numbers during development.

I've added dynamic versioning to the packages using git hash and timestamp to ensure the latest is installed and runs.

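import subprocess
import time

__version__ = "1.0.0"  # placeholder; use your package's real base version


def get_version(version=__version__):
    try:
        git_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()
        # Append the git hash and a timestamp as a local version label so each build is unique
        version += f"+{git_hash}-{int(time.time())}"
    except Exception as e:
        # Not in a git checkout (or git is missing): fall back to the base version
        print(e)
    return version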
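To confirm that the freshly built wheel is the one actually running, the job can log the installed version at startup. A minimal sketch, assuming only the standard `importlib.metadata.version` call; `installed_version` and `main` are hypothetical names, and "package" stands in for your wheel's distribution name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def main():
    # "package" is a placeholder for your wheel's distribution name
    print(f"Running package version: {installed_version('package')}")
```

If the printed version does not carry the git hash and timestamp of the latest build, the cluster is still using a previously installed wheel.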

How to specify entry_point for python_wheel_task?
