GraphFrames and DLT

lprevost
Contributor

I am trying to run a DLT job that uses GraphFrames, which is included in the Databricks Runtime for Machine Learning (ML) image. I am using it successfully on my job compute clusters, but I'm running into problems using it in a DLT job. Here are my overrides for the standard job compute policy:

{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}

However, when I run the DLT job, I get the following error:

ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())


As far as I know, GraphFrames is not pip-installable. The primary installation instructions give Maven coordinates, because the Python package wraps an underlying Java/Scala library.
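One way to get a clearer failure than the bare `ModuleNotFoundError` is to check for the Python module at the top of the pipeline code. The sketch below is a minimal illustration, not a fix: the helper names `graphframes_available` and `require_graphframes` are hypothetical, and the check only covers the Python side. It cannot confirm that the GraphFrames JAR is on the Spark classpath, which is the part that needs the Maven install.

```python
import importlib.util


def graphframes_available(module_name: str = "graphframes") -> bool:
    """Hypothetical helper: True if the Python module can be found by this interpreter.

    This does not import the module and does not check for the JVM-side JAR.
    """
    return importlib.util.find_spec(module_name) is not None


def require_graphframes(module_name: str = "graphframes") -> None:
    """Hypothetical helper: raise a descriptive error if the Python module is missing."""
    if not graphframes_available(module_name):
        raise ModuleNotFoundError(
            f"'{module_name}' is not importable on this cluster. GraphFrames ships "
            "with the ML runtime; on other runtimes both the Python package and the "
            "Maven JAR must be installed.",
            name=module_name,
        )


if __name__ == "__main__":
    # On a DLT cluster without GraphFrames this would print False.
    print(graphframes_available())
```

Calling `require_graphframes()` before `from graphframes import GraphFrame` makes the missing dependency obvious in the pipeline's event log rather than surfacing as a generic import failure.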

Will DLT pipelines support GraphFrames?

Related but unresolved question.


lprevost
Contributor

@Retired_mod, any chance I can get a definitive answer to this question? I know I can %pip install packages in DLT jobs, but GraphFrames requires a Maven-style install because it depends on underlying Java/Scala JAR files. A related question is whether there are plans for DLT to support the ML runtime (which has GraphFrames installed).

Thank you.

lprevost
Contributor

Crickets .....
