How are you deploying graphs?
10-15-2024 12:13 AM
Hi all, I have a couple of use cases that may benefit from using graphs. I'm interested in whether anyone has graph databases in Production and, if so, whether you're using GraphFrames, Neo4j or something else? What is the architecture you have these running on? Any high-level diagrams would be super helpful - thanks so much in advance!
James
10-15-2024 04:25 AM
Yes, I have.
I have a case where we use GraphFrames (BFS). It works fine, as long as you stick to the graph algorithms that ship with GraphFrames, and that can be quite limiting.
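For anyone who hasn't used it, a minimal GraphFrames BFS looks roughly like this (PySpark; the vertex/edge schemas here are just illustrative, not our actual data):

```python
# Minimal GraphFrames BFS sketch. Assumes the graphframes package is
# installed on the cluster; vertices/edges are toy data.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")],
    ["id", "name"],
)
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows")],
    ["src", "dst", "relationship"],
)

g = GraphFrame(vertices, edges)

# Shortest paths from Alice to Carol, capped at 5 hops.
paths = g.bfs(fromExpr="name = 'Alice'", toExpr="name = 'Carol'", maxPathLength=5)
paths.show(truncate=False)
```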
We also have a case where we started with GraphFrames but decided to go with repeated joins (not recursive joins, but a join on a join on a join etc. up to x levels). That works too of course, but it's ugly as hell. To be refactored into something better 🙂
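The repeated-joins pattern is roughly this kind of thing (a hypothetical PySpark sketch; the column names and hop limit are made up for illustration):

```python
# Expand reachable nodes up to max_hops levels by self-joining an edges
# DataFrame. Hypothetical sketch: src/dst columns and max_hops are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

edges = spark.createDataFrame(
    [("a", "b"), ("b", "c"), ("c", "d")],
    ["src", "dst"],
)

max_hops = 3

# 1-hop frontier, then join the frontier back onto edges each round.
frontier = edges.select(F.col("src").alias("start"), F.col("dst").alias("node"))
reachable = frontier.withColumn("hops", F.lit(1))

for hop in range(2, max_hops + 1):
    e = edges.alias("e")
    f = frontier.alias("f")
    frontier = (
        f.join(e, F.col("f.node") == F.col("e.src"))
        .select(F.col("f.start").alias("start"), F.col("e.dst").alias("node"))
        .distinct()
    )
    reachable = reachable.unionByName(frontier.withColumn("hops", F.lit(hop)))

reachable.show()
```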
And then we have a beast in GraphX where we had to apply a greedy graph algorithm.
That couldn't be done in GraphFrames, so GraphX to the rescue.
I do not recommend it though. It does what it has to do, but unless you use it very frequently, the code is very hard to interpret and not user-friendly at all.
So, graphs in Spark can be done, but frankly I would try to use plain Python or something like that if possible.
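For example, if the graph fits on a single node, something like networkx is far easier to read than GraphX (a toy sketch, not our actual workload; greedy colouring just stands in for "a greedy algorithm"):

```python
# Single-node alternative: collect the edges to the driver and run a
# greedy algorithm locally with networkx. Toy data for illustration only.
import networkx as nx

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
G = nx.Graph(edges)

# networkx ships many algorithms GraphFrames lacks, e.g. greedy colouring.
colouring = nx.greedy_color(G, strategy="largest_first")
print(colouring)
```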
There was once an announcement of Cypher (Neo4j) coming to Spark, but that kind of disappeared.
02-08-2025 08:52 AM
Can you please share how Neo4j can be integrated with Databricks on AWS / Azure, besides GraphFrames or GraphX? Any architecture would be helpful too, thanks.
02-14-2025 05:47 AM
Up to now, the way to go is GraphX or GraphFrames.
There is also the possibility of using Python libraries or others (single-node, that is), perhaps even Arrow-based.
Another option is to load the data into a graph database and then move the results back to Databricks after processing (see the sketch below).
But the latter two are a no-go when you're talking about substantial amounts of data.
It's sad that graph algorithms don't get more love from Databricks.
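On the earlier Neo4j question: the usual route for that round trip is the Neo4j Spark Connector. A rough sketch, assuming the connector library is installed on the cluster and with placeholder URL, credentials and query:

```python
# Round-trip between Databricks and Neo4j via the Neo4j Spark Connector.
# Assumes the connector is installed on the cluster; URL, credentials and
# the Cypher query are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

neo4j_options = {
    "url": "neo4j+s://<your-instance>.databases.neo4j.io",
    "authentication.basic.username": "neo4j",
    "authentication.basic.password": "<password>",
}

# Write a DataFrame of people as :Person nodes.
people = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
(people.write.format("org.neo4j.spark.DataSource")
    .options(**neo4j_options)
    .option("labels", ":Person")
    .option("node.keys", "id")
    .mode("overwrite")
    .save())

# Read processed results back into Databricks with a Cypher query.
result = (spark.read.format("org.neo4j.spark.DataSource")
    .options(**neo4j_options)
    .option("query", "MATCH (p:Person) RETURN p.id AS id, p.name AS name")
    .load())
result.show()
```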

