
Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

toproximahk
New Contributor II

Thanks to the Databricks community for maintaining such a valuable platform.

I would like to ask whether there is a planned timeline for upgrading the GraphFrames library. We’ve noticed that the latest release on GitHub is v0.9.3, while Databricks Runtime 17.3 LTS for Machine Learning still uses v0.8.4-db1-spark3.5.

We’re particularly interested in recent updates such as the early stopping feature in Pregel (PR #550):

feat: add early stopping to Pregel by SemyonSinchenko · Pull Request #550 · graphframes/graphframes ...

Releases · graphframes/graphframes

Databricks Runtime 17.3 LTS for Machine Learning | Databricks on AWS

Thanks!

4 REPLIES

toproximahk
New Contributor II

For PySpark.

 

nayan_wylde
Esteemed Contributor

I don't see any published dates for it, but you can try this workaround if you need access to the latest GraphFrames features.

Manual installation: You can manually install the GraphFrames v0.9.3 JAR on your cluster.

Sem-Sinchenko
New Contributor II

You can try adding the Maven dependency to your cluster manually. For example, for Spark 3.5.x it would be:

io.graphframes:graphframes-spark3_2.12:0.10.0

Then add the PyPI dependency graphframes-py. Adding the Maven coordinates should download and install all the JVM dependencies.

But this most likely won't work on DBR ML runtimes, because the classpath would then contain two differently named GraphFrames JARs that share the same namespace, and it is hard to predict which one will be resolved at runtime. I think the best option is to use the generic runtime instead of DBR ML.
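
One way to spot the conflict described above is to scan the JAR names visible to the cluster for more than one GraphFrames artifact. The following is only a minimal sketch, not a Databricks or GraphFrames API: `find_graphframes_jars`, `has_conflict`, and the sample file names are hypothetical, and how you obtain the real list of JARs depends on your environment.

```python
from typing import Iterable, List


def find_graphframes_jars(jar_names: Iterable[str]) -> List[str]:
    """Return every JAR file name that looks like a GraphFrames artifact."""
    return sorted(
        name
        for name in jar_names
        if name.lower().endswith(".jar") and "graphframes" in name.lower()
    )


def has_conflict(jar_names: Iterable[str]) -> bool:
    """True when more than one GraphFrames JAR is present."""
    return len(find_graphframes_jars(jar_names)) > 1


# Hypothetical file names, for illustration only.
sample = [
    "graphframes_2.13-0.8.4-db1-spark3.5.jar",
    "graphframes-spark3_2.12-0.10.0.jar",
    "scala-library-2.13.jar",
]
print(find_graphframes_jars(sample))
print(has_conflict(sample))
```

If the check reports more than one match, the classpath resolution concern above applies and a runtime without a bundled GraphFrames is the safer place to test.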

Louis_Frolio
Databricks Employee

Greetings @toproximahk, thanks for the kind words and for the detailed pointers.

 

What’s in Databricks Runtime 17.3 LTS ML today

  • The preinstalled GraphFrames JAR in Databricks Runtime 17.3 LTS for Machine Learning is org.graphframes:graphframes_2.13:0.8.4-db1-spark3.5 on both CPU and GPU clusters, as listed in the Java/Scala libraries section of the 17.3 LTS ML release notes.
  • This same GraphFrames version is also listed for 17.1 and 17.2 ML, indicating no change across recent 17.x ML releases.
  • Databricks Runtime 17.3 LTS is powered by Apache Spark 4.0.0, which is relevant when considering compatibility with any newer GraphFrames artifacts.

Is there a published upgrade timeline to 0.9.x?

  • There is no publicly documented timeline to upgrade the GraphFrames library in the 17.3 LTS ML runtime; neither the 17.3 LTS ML release notes nor the runtime versions/compatibility page mention an upgrade plan for GraphFrames.
  • As of the latest docs, all 17.x ML release notes continue to list the preinstalled GraphFrames as 0.8.4-db1-spark3.5 (17.0, 17.1, 17.2, 17.3), and there’s no change log entry pointing to a move to 0.9.x.

About the Pregel “early stopping” change you referenced

  • I wasn’t able to retrieve the GitHub PR and releases pages via the document reader you shared; however, the 17.3 LTS ML docs don’t indicate that new Pregel functionality from GraphFrames 0.9.x is included in the preinstalled runtime package today.

Options if you need 0.9.x functionality sooner

  • If you want to experiment, you can try attaching a newer GraphFrames JAR to a test cluster. Because 17.3 LTS uses Spark 4.0.0, you’ll want to carefully validate Scala/Spark compatibility of any GraphFrames artifact you bring in; the runtime’s built-in GraphFrames is compiled for Spark 3.5, which is why we recommend validating before relying on it in production.
  • If you prefer to stick to preinstalled libraries, consider whether your workload can remain on the features available in the 0.8.4-db1 build while you monitor runtime release notes for future updates; Databricks will reflect any change to the bundled GraphFrames version in the ML runtime release notes.
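
The compatibility check mentioned in the first option can be sketched in a few lines. This is a minimal illustration that assumes the artifact follows the `graphframes-spark<major>_<scala>` naming seen in the Maven coordinate quoted earlier in this thread. `is_compatible` is a hypothetical helper, not part of GraphFrames or Databricks, and the Scala 2.13 value used for the Spark 4.0.0 case is an assumption taken from the `_2.13` suffix of the preinstalled artifact.

```python
import re

# Assumed artifact naming: graphframes-spark<spark major>_<scala binary version>
_ARTIFACT = re.compile(r"^graphframes-spark(\d+)_(\d+\.\d+)$")


def is_compatible(coordinate: str, spark_version: str, scala_binary: str) -> bool:
    """Compare a Maven coordinate against the cluster's Spark and Scala versions."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    artifact = parts[1]
    match = _ARTIFACT.match(artifact)
    if match is None:
        raise ValueError(f"unrecognized artifact name: {artifact!r}")
    spark_major, scala = match.groups()
    # Only the Spark major version and the Scala binary version are compared.
    return spark_version.split(".")[0] == spark_major and scala_binary == scala


coord = "io.graphframes:graphframes-spark3_2.12:0.10.0"
print(is_compatible(coord, "3.5.0", "2.12"))  # Spark 3.5.x cluster
print(is_compatible(coord, "4.0.0", "2.13"))  # Spark 4.0.0 cluster (Scala version assumed)
```

A matching name is only a first filter. It does not replace running your actual workload against the artifact on a test cluster.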
 
Hope this helps, Louis.