11-28-2022 05:22 PM
According to this page, the GraphFrames package has been included in the Databricks Runtime since at least 11.0. However, trying to run a connected components algorithm inside a Delta Live Tables notebook yields the error java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI
I installed graphframes with pip using a magic command, but it seems that the package is not included in the cluster itself. Is there a workaround to get GraphFrames working on the Delta Live Tables runtime? I tried adding the Maven coordinates in the cluster definition, but it seems DLT does not support Maven libraries.
11-29-2022 12:44 AM
Have you tried an ML cluster? I think that is the key.
11-29-2022 07:53 AM
How would I specify that I want an ML cluster? According to the Delta Live Tables documentation, I should not specify a runtime version.
11-29-2022 08:06 AM
The doc you mention is specifically for the Machine Learning runtime.
DLT does not use that runtime and, as you correctly asked, you cannot define a runtime for DLT. So my previous answer is not an option. Sorry about that.
Right now, the only way to install libs is by using pip.
According to this doc pip should work.
But as DLT is still pretty new, it is possible that graphframes is not yet supported.
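If GraphFrames does turn out to be unavailable in DLT, one possible fallback for graphs small enough to collect to the driver is to compute connected components without the JVM package at all. The following is only a minimal sketch using a union-find structure in plain Python. It is not the GraphFrames API and does not scale like the distributed algorithm. The function name connected_components and its inputs are hypothetical, and it assumes vertex ids are comparable so that the smallest id in each component can serve as the label.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical helper for illustration only; not part of GraphFrames.
    `vertices` is an iterable of ids, `edges` an iterable of (src, dst) pairs.
    """
    # Every vertex starts as the root of its own single-element component.
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, shortening the path along the way (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        # Register endpoints that were not listed in `vertices`.
        parent.setdefault(src, src)
        parent.setdefault(dst, dst)
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Keep the smaller id as the root so labels are deterministic.
            if root_dst < root_src:
                root_src, root_dst = root_dst, root_src
            parent[root_dst] = root_src

    return {v: find(v) for v in parent}


labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(labels)
```

In a pipeline, the edge list would have to be collected from a DataFrame first, which is only reasonable for small graphs.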
11-30-2022 05:10 AM
As of now, DLT is built specifically for data engineering work.
11-30-2022 05:12 AM
For now, use MLflow with an ML cluster for tracking and monitoring.
06-28-2024 08:35 AM
I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that graphframes is not installed in the cluster. I'm using it successfully in test notebooks with the ML version of the cluster. Is there a way to use it inside a DLT job?