01-13-2025 11:02 AM - edited 01-13-2025 11:03 AM
Hello @Alberto_Umana, consider that outside the workflow I can install the library, but when I ran the workflow through DABs I still got this error:
CalledProcessError: Command 'pip --disable-pip-version-check install '/Workspace/Shared/test-sync-lib/.internal/data_pipelines-0.0.1-py3-none-any.whl'' returned non-zero exit status 1.
Looking at the error in detail, I still get:
ERROR: Package 'data-pipelines' requires a different Python: 3.10.12 not in '<4.0,>=3.11'
That sounds strange, since I set the environment field, which in theory should be inherited by each task automatically.
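For anyone reading along, here is a minimal sketch of the check behind that error message. It is not pip's actual implementation; the helper names `parse_version` and `satisfies` are hypothetical, and it only handles plain dotted versions with the bounds quoted in the error ('<4.0,>=3.11').

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '3.10.12' into (3, 10, 12)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower="3.11", upper="4.0"):
    """Return True if lower <= version < upper.

    The default bounds mirror the '<4.0,>=3.11' constraint from the error.
    This is a simplified stand-in for pip's real specifier handling.
    """
    v = parse_version(version)
    return parse_version(lower) <= v < parse_version(upper)


if __name__ == "__main__":
    # Check the interpreter that is actually running this script.
    current = ".".join(str(n) for n in sys.version_info[:3])
    print(f"interpreter {current}: compatible={satisfies(current)}")
```

With the version from the error, `satisfies("3.10.12")` is false, which matches pip refusing to install the wheel on a 3.10 interpreter.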
01-13-2025 11:44 AM
Hi @jeremy98,
I think it has to do with the serverless version used outside the workflow versus the one used in DABs, since the Python version differs between them. Please see: https://docs.databricks.com/en/release-notes/serverless/index.html. The two versions ship different Python versions, which might cause dependency issues. I am not sure how to specify the serverless version in DABs; I will check internally.
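One way to confirm the mismatch would be to run the same small script interactively and as a task in the DABs job, then compare the output. This is only a sketch using the Python standard library; the function name `runtime_fingerprint` is made up for illustration.

```python
import json
import platform
import sys


def runtime_fingerprint():
    """Collect basic facts about the interpreter running this code."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    # Print as JSON so the two runs are easy to compare side by side.
    print(json.dumps(runtime_fingerprint(), indent=2))
```

If the two runs report different `python_version` values, that would be consistent with the wheel installing in one context and failing in the other.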
01-14-2025 12:01 AM
Good morning,
Thanks for the answer. Yes, please let me know, because I found a way to declare the environment at the higher level, but it still doesn't seem to be picked up inside each task. If I look at the task structure, the environment is set there, but it doesn't work.
01-14-2025 02:48 AM
@Alberto_Umana, one of my colleagues did it using a spark_python_task... maybe this only works for certain types of files?
01-14-2025 04:44 AM
Hi @jeremy98,
When you mentioned using a spark_python_task, did it work with serverless too?
01-14-2025 04:54 AM - edited 01-14-2025 04:55 AM
Hello,
tasks:
  - task_key: batch_inference
    description: "Trigger batch inference processing"
    spark_python_task:
      python_file: "../py_scripts/infge.py"  # Path to your Python script
      parameters: ["--function", "run_batch_workflow", "--env", "${bundle.target}"]
    environment_key: default  # Reference the environment specification
    timeout_seconds: 6000  # 100 minutes timeout for the task
environments:
  - environment_key: default
    spec:
      client: "1"
      dependencies:
        - azure-batch==14.2.0
        - azure-identity==1.19.0
        - azure-keyvault-secrets==4.9.0
He did it this way, but the only libraries he needs are the ones you see in dependencies, so his case is different from mine.
Is it also true that these libraries are installed only once, even though we are in serverless mode?
01-15-2025 12:12 AM - edited 01-15-2025 12:12 AM
Hello @Alberto_Umana, any news?
01-27-2025 01:38 AM
Ping @Alberto_Umana