GraphFrames and DLT
08-20-2024 07:06 AM
I am trying to run a DLT job that uses GraphFrames, which is included in the standard ML image. I am using it successfully on my job compute instances, but I'm running into problems when I try to use it in a DLT job. Here are my overrides for the standard job compute policy:
{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}
However, when I run the DLT job, I get the following error:
ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())
As far as I know, GraphFrames is not pip-installable. The primary installation instructions use Maven coordinates, because the Python package relies on underlying Java/Scala code.
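For anyone trying to reproduce this, a quick import probe can confirm whether the package is visible to the Python environment the pipeline actually runs in. This is only a minimal sketch using the standard library; the helper name `module_available` is hypothetical, and it only checks the Python side, not whether the underlying JAR is on the cluster classpath.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper for illustration; it does not import the module
    and says nothing about JVM-side dependencies.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or broken module specs
        return False


# Run in the same environment as the failing pipeline code
print("graphframes importable:", module_available("graphframes"))
```

If this reports False in the DLT pipeline but True on job compute, the pipeline is most likely not running on the ML image despite the policy override.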
Will DLT pipelines support GraphFrames?
Related but unresolved question.
09-01-2024 12:26 PM
@Retired_mod - is there any chance I can get a definitive answer to this question? I know I can %pip install in DLT jobs, but GraphFrames requires a Maven-style install because it depends on underlying Java/Scala modules and JAR files. A related question is whether there is a plan for DLT to support the ML instance (which has GraphFrames installed).
Thank you.
09-21-2024 11:25 AM
Crickets .....