GraphFrames and DLT

lprevost
Contributor III

I am trying to run a DLT job that uses GraphFrames, which is included in the standard ML runtime image. It works in my job compute clusters, but I'm running into problems using it in a DLT job. Here are my overrides to the standard job compute policy:

{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}
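To make explicit what the overrides above actually say, here is a small sketch that parses them and reports the allowed cluster types and the default runtime. It only reads the policy text. Whether a DLT pipeline cluster actually honors the `spark_version` default is the open question in this thread, so the check below does not tell you which runtime the pipeline ends up on. The helper name `summarize_policy` is hypothetical, and treating a `-ml` suffix as "ML runtime" is a crude assumption that happens to fit the value used here.

```python
import json

# The policy overrides from the post above, as a JSON string.
POLICY_JSON = """
{
  "spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts-ml"},
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": ["all-purpose", "job", "dlt"]
  }
}
"""


def summarize_policy(policy_json: str) -> dict:
    """Hypothetical helper: summarize what the policy allows and defaults to."""
    policy = json.loads(policy_json)
    cluster_type = policy.get("cluster_type", {})
    spark_version = policy.get("spark_version", {})
    allowed = cluster_type.get("values", [])
    default_runtime = spark_version.get("defaultValue")
    return {
        "allowed_cluster_types": allowed,
        "dlt_allowed": "dlt" in allowed,
        "default_runtime": default_runtime,
        # Assumption: a "-ml" suffix on the runtime key marks an ML runtime.
        "default_is_ml": str(default_runtime or "").endswith("-ml"),
    }


if __name__ == "__main__":
    print(summarize_policy(POLICY_JSON))
```

Note that `defaultValue` is only a default: the policy permits `dlt` clusters, but nothing in it forces a DLT pipeline onto the ML runtime.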

However, when I run the DLT job, I get the following error:

ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())
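The error means the Python side of GraphFrames is not importable on the pipeline's cluster. A minimal sketch for checking this at the top of a notebook, before the failing import, assuming only the standard library (the function name is hypothetical):

```python
import importlib.util


def graphframes_python_module_available() -> bool:
    """Return True if the `graphframes` Python package can be found.

    This checks only the Python wrapper. It does not tell you whether the
    GraphFrames JVM classes (the Java/Scala JAR) are on the cluster classpath.
    """
    # find_spec returns None for a top-level module that cannot be located,
    # without actually importing it.
    return importlib.util.find_spec("graphframes") is not None


if __name__ == "__main__":
    if graphframes_python_module_available():
        print("graphframes Python module found")
    else:
        print("graphframes Python module missing; "
              "`import graphframes` would raise ModuleNotFoundError")
```

Even when this returns True, GraphFrame calls can still fail at runtime if the matching JAR is absent, which is why the module check alone is not enough.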

As far as I know, GraphFrames is not pip-installable. The primary installation instructions give Maven coordinates, because the Python package wraps an underlying Java/Scala library.
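For context on what "Maven coordinates" means here: on plain Spark, a package like this is usually pulled in through the `spark.jars.packages` setting, which takes a `group:artifact:version` string. The sketch below builds such a string assuming the spark-packages naming style `graphframes:graphframes:<version>-spark<major.minor>-s_<scala binary>` and the spark-packages repository URL. The helper names and the version numbers in the usage line are hypothetical, and whether DLT lets you apply these settings to a pipeline cluster is not established in this thread.

```python
def graphframes_spark_package(graphframes_version: str,
                              spark_version: str,
                              scala_version: str) -> str:
    """Hypothetical helper: build a GraphFrames coordinate, spark-packages style.

    Assumed format: graphframes:graphframes:<gf>-spark<major.minor>-s_<scala binary>
    """
    spark_major_minor = ".".join(spark_version.split(".")[:2])  # "3.5.0" -> "3.5"
    scala_binary = ".".join(scala_version.split(".")[:2])       # "2.12.15" -> "2.12"
    return (f"graphframes:graphframes:{graphframes_version}"
            f"-spark{spark_major_minor}-s_{scala_binary}")


def graphframes_spark_conf(graphframes_version: str,
                           spark_version: str,
                           scala_version: str) -> dict:
    """Hypothetical helper: Spark settings that would fetch the JAR at startup."""
    return {
        "spark.jars.packages": graphframes_spark_package(
            graphframes_version, spark_version, scala_version),
        # Assumption: the artifact is hosted on the spark-packages repository.
        "spark.jars.repositories": "https://repos.spark-packages.org",
    }


if __name__ == "__main__":
    # Example versions only; match them to your actual runtime.
    print(graphframes_spark_conf("0.8.3", "3.5.0", "2.12.15"))
```

This is the JVM half of the install; `%pip install` only ever covers the Python half.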

Will DLT pipelines support GraphFrames?

Related but unresolved question.

lprevost
Contributor III

@Retired_mod, any chance I can get a definitive answer to this question? I know I can use %pip install in DLT jobs, but GraphFrames requires a Maven-style install because it relies on underlying Java/Scala JAR files. A related question: is there a plan for DLT to support the ML runtime (which has GraphFrames preinstalled)?

Thank you.

lprevost
Contributor III

Crickets .....