Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Workaround for GraphFrames not working on Delta Live Table?

amartinez
New Contributor III

According to this page, the GraphFrames package has been included in the Databricks Runtime since at least 11.0. However, trying to run a connected components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI

I installed it with pip using a magic command, but it seems that the package is not included in the cluster itself. Is there a workaround to get GraphFrames working on the Delta Live Tables runtime? I tried adding the Maven coordinates to the cluster definition, but it seems DLT does not support Maven libraries.

6 REPLIES

-werners-
Esteemed Contributor III

Have you tried an ML cluster? I think that is the key.

amartinez
New Contributor III

How would I specify that I want an ML cluster? According to the Delta Live Tables documentation, I should not specify a runtime version.

-werners-
Esteemed Contributor III

The doc you mention is specifically for the Machine Learning runtime.

DLT does not use that runtime and, as you correctly pointed out, you cannot define a runtime for DLT. So my previous answer is not an option. Sorry about that.

Right now, the only way to install libraries is with pip.

According to this doc, pip should work.

But as DLT is still pretty new, it is possible that GraphFrames is not yet supported.

Senthil1
Contributor

As of now, DLT is built specifically for data engineering work.

For now, use MLflow for tracking and monitoring, with an ML cluster.

lprevost
Contributor

I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that GraphFrames is not installed in the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use this inside a DLT job?
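
Nobody in this thread confirms a way to load GraphFrames in DLT. For graphs small enough to collect onto the driver, one possible stopgap is to compute connected components without GraphFrames at all. Below is a minimal sketch in plain Python using union-find; the function name `connected_components` and the edge-list input format are hypothetical, and this is not a replacement for the distributed GraphFrames algorithm.

```python
def connected_components(edges):
    """Map each vertex to a component label (the smallest vertex in its component).

    `edges` is an iterable of (src, dst) pairs; this input shape is an assumption.
    """
    parent = {}

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller vertex as root so labels are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for src, dst in edges:
        union(src, dst)
    return {v: find(v) for v in parent}


if __name__ == "__main__":
    labels = connected_components([(1, 2), (2, 3), (4, 5)])
    for vertex, component in sorted(labels.items()):
        print(vertex, component)
```

If this fits the data size, the resulting mapping could be turned back into a DataFrame with `spark.createDataFrame(list(labels.items()), ["id", "component"])`. Whether collecting the edges to the driver is acceptable inside a DLT pipeline depends on the graph size, so treat this only as a fallback.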
