01-09-2022 04:12 PM
Can someone provide me an example for a python_wheel_task and what the entry_point field should be?
The jobs UI help popup says this about "entry_point":
"Function to call when starting the wheel, for example: main. If the entry point does not exist in the meta-data of the wheel distribution, the function will be called directly using `$packageName.$entryPoint()`."
However, an entry point in Python is a combination of a group and a name, e.g. in my setup.py:
entry_points = {
    'my_jobs': [
        'a_job = module_name:job_function'
    ]
},
Here the group is "my_jobs" and the name is "a_job". For Databricks, should I set entry_point to `a_job` or `my_jobs.a_job`, or does Databricks require a specific group name for wheels that are run as tasks?
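To make the group/name distinction concrete, here is a small sketch using the standard library's `importlib.metadata.EntryPoint`, which understands the same `name = module:attr` syntax used in `setup.py`. It builds the `a_job` entry point from the example by hand, so nothing needs to be installed. It assumes Python 3.9+ (for the `module` and `attr` properties) and does not call `ep.load()`, because `module_name` is only an example name.

```python
from importlib.metadata import EntryPoint

# The setup.py line 'a_job = module_name:job_function' inside the 'my_jobs' group
ep = EntryPoint(name="a_job", value="module_name:job_function", group="my_jobs")

print(ep.group)   # the key in the entry_points dict: my_jobs
print(ep.name)    # the part before "=": a_job
print(ep.module)  # the module that would be imported: module_name
print(ep.attr)    # the function looked up on that module: job_function
```

The group is a separate field, not a prefix of the name, so a lookup by name alone would see `a_job`, not `my_jobs.a_job`.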
I couldn't find any documentation online to clarify this.
02-01-2022 11:57 AM
The correct answer to my question is that the "entry_point" in the Databricks API has nothing to do with a Python wheel's official entry points. It is just a dotted Python path to a Python function, e.g. `mymodule.myfunction`.
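Resolving a dotted path such as `mymodule.myfunction` to a callable takes only a few lines with `importlib`. This is a sketch of the idea, not Databricks' actual code; `resolve_dotted` is a hypothetical helper name, and the example uses `os.path.join` only because it is always importable.

```python
import importlib
from typing import Any, Callable


def resolve_dotted(path: str) -> Callable[..., Any]:
    """Import the module part of a dotted path and return the named attribute."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.function', got {path!r}")
    # Everything before the last dot is the module, the last part is the function
    module = importlib.import_module(module_path)
    return getattr(module, attr)


join = resolve_dotted("os.path.join")
print(join("a", "b"))
```

Splitting on the last dot means nested modules such as `package.sub.module.function` work too.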
09-29-2023 08:23 AM
Just in case anyone comes here in the future, this is roughly how Databricks executes these entry points. How do I know? I have banged my head against this wall for a couple of hours already.
from importlib import import_module, metadata
package_name = "some.package"
entry_point = "my-entry-point"
# Look the entry point up by name in the wheel's installed metadata
available_entry_points = metadata.distribution(package_name).entry_points
entry = [ep for ep in available_entry_points if ep.name == entry_point]
if entry:
    # Found in the metadata: load the referenced function and call it
    entry[0].load()()
else:
    # Not in the metadata: fall back to `import <package-name>`
    # followed by `<package-name>.<entry-point>()`
    module = import_module(package_name)
    getattr(module, entry_point)()
If you cannot see your entry point using the following, then you (we) are out of luck.
from importlib import metadata
from pprint import pprint
package_name = "my.package"
pprint(metadata.distribution(package_name).entry_points)
My current working theory is that the user who installs the package is not the same user who runs the job, or that, for some reason, the package metadata is not available at job runtime.
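One way to test that theory is to run a small diagnostic inside the job itself (for example at the start of your entry point function, or from a notebook attached to the same cluster) and compare it with what you see where the wheel was installed. This sketch uses only `sys` and `importlib.metadata`; `describe_distribution` is a hypothetical helper, and `"my.package"` stands in for your distribution name.

```python
import sys
from importlib import metadata
from pprint import pprint


def describe_distribution(dist_name: str) -> dict:
    """Report which interpreter is running and what metadata it can see for dist_name."""
    info = {"python": sys.executable, "installed": False, "location": None, "entry_points": []}
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # The running interpreter cannot see the distribution at all
        return info
    info["installed"] = True
    info["location"] = str(dist.locate_file(""))
    # (group, name, value) for every entry point recorded in the wheel's metadata
    info["entry_points"] = [(ep.group, ep.name, ep.value) for ep in dist.entry_points]
    return info


if __name__ == "__main__":
    pprint(describe_distribution("my.package"))
```

If `python` or `location` differs between the install step and the job run, the job is most likely using a different environment than the one the wheel went into.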
09-29-2023 01:31 PM - edited 09-29-2023 01:50 PM
How do you build the wheel? I got it working with Poetry like so:
entrypoint.py somewhere in your codebase:
def entrypoint():
    print("Works")
pyproject.toml:
[tool.poetry]
name = "package"
version = "1.0.0"
description = "package"
packages = [{include = "src"}, ] # assuming you have the /src structure
[tool.poetry.scripts]
my_entrypoint = "src.entrypoint:entrypoint" # before the colon: dotted module path; after it: function name
Job config:
package_name: 'package' --> the name from pyproject.toml
entry_point: 'my_entrypoint' --> the key before the `=` of your entry point line in pyproject.toml
(assuming you installed the wheel on the cluster)
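For reference, the same job configuration expressed as a Jobs API task definition might look like the sketch below, written as a Python dict. In the Jobs API the field inside `python_wheel_task` is spelled `entry_point`. The `task_key`, cluster ID and wheel path are placeholders, not values from this thread.

```python
import json

# Hypothetical task definition; replace the placeholders with your own values
task = {
    "task_key": "run_package",                   # placeholder
    "existing_cluster_id": "<your-cluster-id>",  # placeholder
    "python_wheel_task": {
        "package_name": "package",       # name from [tool.poetry]
        "entry_point": "my_entrypoint",  # key under [tool.poetry.scripts]
        "parameters": [],                # optional command-line style arguments
    },
    "libraries": [{"whl": "<path-to>/package-1.0.0-py3-none-any.whl"}],  # placeholder path
}

print(json.dumps(task, indent=2))
```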
I also pulled my hair out over this and am now bald.
FYI, my full setup is micromamba -> Poetry -> GitLab -> Pulumi -> Databricks
10-23-2023 04:49 AM
One thing to note when working with entry points is that if the entry point name is too long, it may not work on Databricks. That was the cause of my issue.
01-10-2022 10:54 AM
Hi Kaniz - I'm afraid that doesn't answer the question. I am asking about the expected value for the entry_point field. I am not trying to use an additional library, I am trying to run a python_wheel_task.
02-01-2022 08:03 AM
@Joel Pitt - Let us know if either of Kaniz's resources helps you. If they do, would you be happy to mark that answer as best? That helps other members find the solutions more quickly.
07-09-2024 02:43 AM
You are a hero for supplying a full example - especially the validation part is great. Thanks dude!
09-18-2024 01:30 AM
Just want to confirm: my project uses PDM, not Poetry, and as such uses
[project.entry-points.packages]
rather than
[tool.poetry.scripts]
and the bundle fails to run on the cluster because it can't find the entry point. Is this expected behavior?
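One difference worth being aware of: the two tables do not put the entry point in the same group. Poetry's `[tool.poetry.scripts]` (like PEP 621's `[project.scripts]`) is recorded under the `console_scripts` group, whereas `[project.entry-points.packages]` creates a group literally named `packages`. The sketch below maps an already-parsed `pyproject.toml` (the kind of dict `tomllib.load` returns) to the groups that end up in the wheel metadata. `entry_point_groups` is a hypothetical helper, and whether the Databricks lookup cares about the group is not something this thread settles.

```python
def entry_point_groups(pyproject: dict) -> dict:
    """Return {group: [entry point names]} for the tables a wheel build turns into entry points."""
    project = pyproject.get("project", {})
    poetry = pyproject.get("tool", {}).get("poetry", {})
    sources = [
        ("console_scripts", project.get("scripts", {})),
        ("gui_scripts", project.get("gui-scripts", {})),
        ("console_scripts", poetry.get("scripts", {})),
    ]
    # [project.entry-points.<group>] keeps <group> as the entry point group name
    sources += list(project.get("entry-points", {}).items())
    groups: dict = {}
    for group, table in sources:
        if table:
            groups.setdefault(group, []).extend(table)  # table keys are entry point names
    return groups


poetry_style = {"tool": {"poetry": {"scripts": {"my_entrypoint": "src.entrypoint:entrypoint"}}}}
pdm_style = {"project": {"entry-points": {"packages": {"my_entrypoint": "src.entrypoint:entrypoint"}}}}

print(entry_point_groups(poetry_style))  # {'console_scripts': ['my_entrypoint']}
print(entry_point_groups(pdm_style))     # {'packages': ['my_entrypoint']}
```

Both layouts register the same name, `my_entrypoint`; only the group differs.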
09-18-2024 01:38 AM
My issue appears to have been uploading wheels with identical version numbers during development.
I've added dynamic versioning to the packages, using the git hash and a timestamp, to ensure the latest build is the one installed and run.
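import subprocess
import time
__version__ = "1.0.0"  # placeholder: your package's base version
def get_version(version=__version__):
    # Append "+<short git hash>-<unix timestamp>" as a PEP 440 local version label
    try:
        git_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()
        version += f"+{git_hash}-{int(time.time())}"
    except Exception as e:
        # Not a git checkout, or git is unavailable: keep the base version
        print(e)
    return version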