How to specify entry_point for python_wheel_task?

jpwp
New Contributor III

Can someone provide me an example for a python_wheel_task and what the entry_point field should be?

The jobs UI help popup says this about "entry_point":

"Function to call when starting the wheel, for example: main. If the entry point does not exist in the meta-data of the wheel distribution, the function will be called directly using `$packageName.$entryPoint()`."
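That documented fallback, `$packageName.$entryPoint()`, is essentially a dynamic import plus an attribute call. A minimal sketch of that behaviour, using the stdlib names `math`/`sqrt` purely as stand-ins for a real package name and function:

```python
import importlib

# Stand-in values: "math" and "sqrt" play the role of the wheel's
# package_name and entry_point fields here.
package_name = "math"
entry_point = "sqrt"

# The documented fallback `$packageName.$entryPoint()` amounts to:
module = importlib.import_module(package_name)
result = getattr(module, entry_point)(9.0)
print(result)  # 3.0
```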

However, an entry point in Python is a combination of a group and a name, e.g. in my setup.py:

entry_points = {
    'my_jobs': [
        'a_job = module_name:job_function'
    ]
},

Here the group is "my_jobs" and the name is "a_job". For Databricks, should I set entry_point to `a_job` or `my_jobs.a_job`, or does Databricks require a specific group name for wheels that are run as tasks?

I couldn't find any documentation online to clarify this.
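For reference, setuptools entry points really are (group, name, value) triples, which is what makes the field name ambiguous. A quick sketch with `importlib.metadata.EntryPoint`, mirroring the setup.py example above (`a_job = module_name:job_function` under the `my_jobs` group):

```python
from importlib.metadata import EntryPoint

# Mirrors 'a_job = module_name:job_function' registered under 'my_jobs'
ep = EntryPoint(name="a_job", value="module_name:job_function", group="my_jobs")

print(ep.group)   # my_jobs
print(ep.name)    # a_job
print(ep.module)  # module_name
print(ep.attr)    # job_function
```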

2 ACCEPTED SOLUTIONS

jpwp
New Contributor III

The correct answer to my question is that the "entry_point" in the Databricks API has nothing to do with a Python wheel's official entry points. It is just a dotted Python path to a Python function, e.g. `mymodule.myfunction`.
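Read together with the UI help text quoted in the question, that suggests a config sketch along these lines (package and function names are made up for illustration; with these values Databricks would end up calling `mymodule.myfunction()`):

```json
{
  "python_wheel_task": {
    "package_name": "mymodule",
    "entry_point": "myfunction"
  },
  "libraries": [
    { "whl": "dbfs:/FileStore/my-lib.whl" }
  ]
}
```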


hectorfi
New Contributor III

One thing to note when working with entry points is that if the name is too long, it may not work on Databricks. That was the cause of my issue.


11 REPLIES

Kaniz
Community Manager

Hi @Joel Pitt​ ! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

Kaniz
Community Manager

Hi @Joel Pitt​ ,

I believe this should be achievable by specifying the libraries field (see docs).

Can you try something like this?:

{
  "existing_cluster_id": <cluster_id>,
  "python_wheel_task": {
    "package_name": <package_name>,
    "entry_point": <entry_point>
  },
  "libraries": [
    { "whl": "dbfs:/FileStore/my-lib.whl" }
  ]
}

jpwp
New Contributor III

Hi Kaniz - I'm afraid that doesn't answer the question. I am asking about the expected value for the entry_point field. I am not trying to attach an additional library; I am trying to run a python_wheel_task.

Kaniz
Community Manager

Hi @Joel Pitt​ , Please go through this documentation. Let me know if this helps.


Anonymous
Not applicable

@Joel Pitt​ - Let us know if either of Kaniz's resources helps you. If they do, would you be happy to mark that answer as best? That helps other members find the solutions more quickly.


Kaniz
Community Manager

Hi @jpwp , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.


hectorfi
New Contributor III

Just in case anyone comes here in the future, this is roughly how Databricks executes these entry points. How do I know? I have banged my head against this wall for a couple of hours already.

from importlib import metadata
import importlib

package_name = "some.package"
entry_point = "my-entry-point"

available_entry_points = metadata.distribution(package_name).entry_points
entry = [ep for ep in available_entry_points if ep.name == entry_point]

if entry:
    entry[0].load()()
else:
    # Fall back to importing the package and calling the attribute directly,
    # i.e. the documented `$packageName.$entryPoint()` behaviour
    module = importlib.import_module(package_name)
    getattr(module, entry_point)()

If you cannot see your entry point using the following, then you (we) are out of luck.

from importlib import metadata
from pprint import pprint

package_name = "my.package"
pprint(metadata.distribution(package_name).entry_points)

My current working theory is that the user installing the package is not the same user running the execution. Or for some weird reason the metadata is not available at the job runtime...
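One way to test that theory on the cluster itself is to dump every distribution the runtime can see, along with its entry point names - a debugging sketch along the lines of the snippet above:

```python
from importlib import metadata

def list_entry_points():
    # Map each installed distribution to the names of its entry points,
    # to check what metadata is actually visible at job runtime.
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        names = [ep.name for ep in dist.entry_points]
        if name and names:
            result[name] = names
    return result

if __name__ == "__main__":
    for pkg, names in sorted(list_entry_points().items()):
        print(pkg, "->", names)
```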

GabMorin
New Contributor II

How do you build the wheel? I got it working with poetry like so:

entrypoint.py somewhere in your codebase:

def entrypoint():
    print("Works")

pyproject.toml:

[tool.poetry]
name = "package"
version = "1.0.0"
description = "package"
packages = [{include = "src"}, ]  # assuming you have the /src structure

[tool.poetry.scripts]
my_entrypoint = "src.entrypoint:entrypoint"  # before the : is the path to the file, after it is the function name

Job config:
  package_name: 'package' --> taken from pyproject.toml
  entry_point: 'my_entrypoint' --> the part before the `=` on your [tool.poetry.scripts] line

(assuming you installed the wheel on the cluster)

I also pulled my hair out over this and am now bald.


FYI, my full setup is micromamba -> poetry -> gitlab -> pulumi -> databricks ☠️


Kaniz
Community Manager

Hi @hectorfi, It's great to hear that your query has been successfully resolved. Thank you for your contribution.
