
GraphFrames and DLT

lprevost
Contributor II

I am trying to run a DLT job that uses GraphFrames, which is included in the standard ML runtime image. I use it successfully on my job compute instances, but I run into problems when I try to use it in a DLT job. Here are my overrides for the standard job compute policy:

 

{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}
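To double-check what these overrides actually declare, here is a minimal sketch that parses the policy text with Python's standard `json` module. The helper names `allowed_cluster_types` and `default_runtime` are hypothetical, not part of any Databricks API, and the check only reflects what the policy text says, not which runtime a DLT pipeline ends up using.

```python
import json

# The policy overrides from the post, embedded as text for illustration.
POLICY_OVERRIDES = """
{
  "spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts-ml"},
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": ["all-purpose", "job", "dlt"]
  }
}
"""


def allowed_cluster_types(policy_text: str) -> list:
    """Return the allowlisted cluster types, or an empty list if none are declared."""
    rule = json.loads(policy_text).get("cluster_type", {})
    if rule.get("type") != "allowlist":
        return []
    return list(rule.get("values", []))


def default_runtime(policy_text: str):
    """Return the default spark_version declared in the policy, if any."""
    return json.loads(policy_text).get("spark_version", {}).get("defaultValue")


if __name__ == "__main__":
    print("cluster types:", allowed_cluster_types(POLICY_OVERRIDES))
    print("default runtime:", default_runtime(POLICY_OVERRIDES))
```

This confirms that the policy allows the `dlt` cluster type and defaults to an ML runtime, which is the configuration the question is about.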

 

However, when I run the DLT job, I get the following error:

ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())
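One way to narrow this down is to check, from inside the failing environment, whether the Python interpreter can locate the module at all. The sketch below uses only the standard library; the helper name `module_available` is hypothetical. It only tests for the Python wrapper, so a "found" result would not prove that the underlying JVM JAR is on the cluster.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current Python path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent package,
        # or for already-imported modules with no spec.
        return False


if __name__ == "__main__":
    status = "found" if module_available("graphframes") else "missing"
    print(f"graphframes: {status}")
```

Running the same check on the job compute where GraphFrames works and on the DLT compute where it fails would show whether the two environments differ in their installed Python packages.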

 

As far as I know, GraphFrames is not pip-installable. The primary installation instructions use Maven coordinates, because the Python package relies on underlying Java/Scala code.

Will DLT pipelines support GraphFrames?

Related but unresolved question.

2 REPLIES

lprevost
Contributor II

@Retired_mod - any chance I can get a definitive answer to this question? I know I can %pip install in DLT jobs, but GraphFrames requires a Maven-style install because it relies on underlying Java/Scala modules (JAR files). A related question is whether there is a plan for DLT to support the ML runtime (which has GraphFrames installed).

Thank you.

lprevost
Contributor II

Crickets .....
