How are you deploying graphs?
10-15-2024 12:13 AM
Hi all, I have a couple of use cases that may benefit from using graphs. I'm interested in whether anyone has graph databases in Production and, if so, whether you're using GraphFrames, Neo4j or something else? What is the architecture you have these running on? Any high-level diagrams would be super helpful - thanks so much in advance!
James
10-15-2024 04:25 AM
Yes, I have.
I have a case where we use GraphFrames (BFS). It works fine, as long as you stick to the graph algorithms that ship with GraphFrames, and that can be quite limiting.
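For anyone who hasn't used it, a minimal GraphFrames BFS looks roughly like this (PySpark; the vertex/edge schemas here are just illustrative, not our actual data):

```python
# Minimal GraphFrames BFS sketch. Assumes the graphframes package is
# installed on the cluster; vertices/edges are toy data.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")],
    ["id", "name"],
)
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows")],
    ["src", "dst", "relationship"],
)

g = GraphFrame(vertices, edges)

# Shortest paths from Alice to Carol, capped at 5 hops.
paths = g.bfs(fromExpr="name = 'Alice'", toExpr="name = 'Carol'", maxPathLength=5)
paths.show(truncate=False)
```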
We also have a case where we started with GraphFrames but decided to go with repeated joins (not recursive joins, but a join on a join on a join etc. up to x levels). That works too of course, but it's ugly as hell. To be refactored into something better 🙂
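The repeated-joins pattern is roughly this kind of thing (a hypothetical PySpark sketch; the column names and hop limit are made up for illustration):

```python
# Expand reachable nodes up to max_hops levels by self-joining an edges
# DataFrame. Hypothetical sketch: src/dst columns and max_hops are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

edges = spark.createDataFrame(
    [("a", "b"), ("b", "c"), ("c", "d")],
    ["src", "dst"],
)

max_hops = 3

# 1-hop frontier, then join the frontier back onto edges each round.
frontier = edges.select(F.col("src").alias("start"), F.col("dst").alias("node"))
reachable = frontier.withColumn("hops", F.lit(1))

for hop in range(2, max_hops + 1):
    e = edges.alias("e")
    f = frontier.alias("f")
    frontier = (
        f.join(e, F.col("f.node") == F.col("e.src"))
        .select(F.col("f.start").alias("start"), F.col("e.dst").alias("node"))
        .distinct()
    )
    reachable = reachable.unionByName(frontier.withColumn("hops", F.lit(hop)))

reachable.show()
```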
And then we have a beast in GraphX where we had to apply a greedy graph algorithm.
That couldn't be done in GraphFrames, so GraphX to the rescue.
I do not recommend it though. It does what it has to do, but unless you use it very frequently, the code is very hard to interpret and not user-friendly at all.
So, graphs in Spark can be done, but frankly I would try to use plain Python or something like that if possible.
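For example, if the graph fits on a single node, something like networkx is far easier to read than GraphX (a toy sketch, not our actual workload; greedy colouring just stands in for "a greedy algorithm"):

```python
# Single-node alternative: collect the edges to the driver and run a
# greedy algorithm locally with networkx. Toy data for illustration only.
import networkx as nx

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
G = nx.Graph(edges)

# networkx ships many algorithms GraphFrames lacks, e.g. greedy colouring.
colouring = nx.greedy_color(G, strategy="largest_first")
print(colouring)
```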
There was once an announcement of Cypher (Neo4j) coming to Spark, but that kind of disappeared.
02-08-2025 08:52 AM
Can you please share how Neo4j can be integrated with Databricks on AWS / Azure, besides GraphFrames or GraphX? Any architecture would be helpful too, thanks.
02-14-2025 05:47 AM
Up to now, the way to go is GraphX or GraphFrames.
There is also the possibility of using Python libraries or others (single-node, that is), perhaps even Arrow-based.
Another option is to load the data into a graph database and then move the results back to Databricks after processing (see the sketch below).
But the latter two are a no-go when you're talking about substantial amounts of data.
It's sad that graph algorithms don't get more love from Databricks.
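On the earlier Neo4j question: the usual route for that round trip is the Neo4j Spark Connector. A rough sketch, assuming the connector library is installed on the cluster and with placeholder URL, credentials and query:

```python
# Round-trip between Databricks and Neo4j via the Neo4j Spark Connector.
# Assumes the connector is installed on the cluster; URL, credentials and
# the Cypher query are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

neo4j_options = {
    "url": "neo4j+s://<your-instance>.databases.neo4j.io",
    "authentication.basic.username": "neo4j",
    "authentication.basic.password": "<password>",
}

# Write a DataFrame of people as :Person nodes.
people = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
(people.write.format("org.neo4j.spark.DataSource")
    .options(**neo4j_options)
    .option("labels", ":Person")
    .option("node.keys", "id")
    .mode("overwrite")
    .save())

# Read processed results back into Databricks with a Cypher query.
result = (spark.read.format("org.neo4j.spark.DataSource")
    .options(**neo4j_options)
    .option("query", "MATCH (p:Person) RETURN p.id AS id, p.name AS name")
    .load())
result.show()
```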

