Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How are you deploying graphs?

JamesDryden
New Contributor II

Hi all, I have a couple of use cases that may benefit from graphs. I'm interested in whether anyone has graph databases in production and, if so, whether you're using GraphFrames, Neo4j or something else. What architecture are these running on? Any high-level diagrams would be super helpful. Thanks so much in advance!

James

3 replies

-werners-
Esteemed Contributor III

Yes, I have.
I have a case where we use GraphFrames (using BFS). It works fine, as long as you stick to the graph algorithms that ship with GraphFrames, and that can be quite limiting.
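For context on what a BFS query returns: GraphFrames' `bfs` takes a source expression, a target expression and a `maxPathLength`, and returns the shortest path(s) between matching vertices. The sketch below reproduces that idea in plain Python on a tiny in-memory edge list so it runs without a cluster. The `bfs_paths` function and its parameters are hypothetical, and it only approximates GraphFrames' semantics (for example, zero-length paths where a vertex is both source and target are ignored).

```python
from collections import deque  # noqa: F401  (kept for readers who adapt this to a queue-based BFS)


def bfs_paths(edges, is_source, is_target, max_path_length=10):
    """Return all shortest paths (as vertex lists, at least one hop long)
    from any vertex matching is_source to any vertex matching is_target.

    Hypothetical helper: expands one hop per level and stops at the first
    level where a target is reached, similar in spirit to GraphFrames' bfs.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    vertices = set(adjacency) | {dst for _, dst in edges}
    frontier = [[v] for v in sorted(vertices) if is_source(v)]
    # Vertices reached at earlier levels; revisiting them cannot give a shorter path.
    visited = {path[0] for path in frontier}

    for _ in range(max_path_length):
        next_frontier = []
        for path in frontier:
            for nxt in adjacency.get(path[-1], []):
                if nxt not in visited:
                    next_frontier.append(path + [nxt])
        hits = [p for p in next_frontier if is_target(p[-1])]
        if hits:
            return hits
        if not next_frontier:
            break
        visited |= {p[-1] for p in next_frontier}
        frontier = next_frontier
    return []


edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]
print(bfs_paths(edges, lambda v: v == "a", lambda v: v == "c"))
```

Both two-hop routes from `a` to `c` are returned because they tie for shortest; with `max_path_length=1` the search gives up before reaching `c` and returns an empty list.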
We also have a case where we started with GraphFrames but decided to go with repeated joins (not recursive joins, but a join on a join on a join, etc., up to x levels). That works too, of course, but it's ugly as hell. To be refactored into something better 🙂
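The "join on a join on a join" approach unrolls a recursive traversal into a fixed number of self-joins of the edge table. Below is a minimal plain-Python sketch of that pattern, assuming edges are `(src, dst)` pairs and using a hypothetical `expand_by_joins` helper. In Spark, each iteration would correspond to one more `DataFrame.join` of the previous level against the edge table on `previous.dst == edges.src`.

```python
def expand_by_joins(edges, max_levels):
    """Unrolled self-join over an edge table (hypothetical helper).

    Level 1 is the edge table itself; level k joins level k-1 back onto the
    edges (previous.end == edge.src). Returns (start, end, depth) rows for
    every level up to max_levels.
    """
    # Index the edge table by src, much like the build side of a hash join.
    by_src = {}
    for src, dst in edges:
        by_src.setdefault(src, []).append(dst)

    level = [(src, dst, 1) for src, dst in edges]
    result = list(level)
    for depth in range(2, max_levels + 1):
        level = [
            (start, nxt, depth)
            for start, end, _ in level
            for nxt in by_src.get(end, [])
        ]
        if not level:  # nothing left to join against
            break
        result.extend(level)
    return result


rows = expand_by_joins([("a", "b"), ("b", "c"), ("c", "d")], max_levels=3)
for row in rows:
    print(row)
```

Because the depth is fixed up front, anything deeper than `max_levels` is silently missed, which is part of why this approach gets unwieldy as requirements change.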
And then we have a beast in GraphX where we had to apply a greedy graph algorithm.
This couldn't be done in GraphFrames, so GraphX to the rescue.
I do not recommend it, though. It does what it has to do, but unless you use it very frequently, the code is very hard to interpret and not user-friendly at all.
So, graphs in Spark can be done, but frankly, I would try to use plain Python or something like that if possible.
There once was an announcement of Cypher (Neo4j's query language) coming to Spark, but that kind of disappeared.

Mantsama4
Contributor III

Can you guys please share how Neo4j can be integrated with Databricks on AWS / Azure, besides GraphFrames or GraphX? Any architecture would be helpful too, thanks.

Mantu S

-werners-
Esteemed Contributor III

Up to now, the way to go is GraphX or GraphFrames.
There is also the possibility of using Python libraries or others (single-node, that is), perhaps even Arrow-based.
Another option is to load the data into a graph database and then move the results back to Databricks after processing.
But the latter two are a no-go when you're talking about substantial amounts of data.
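As an illustration of the single-node route, assume the edge list has already been collected to the driver (for example with `DataFrame.toPandas()`, which can use Arrow when `spark.sql.execution.arrow.pyspark.enabled` is set) and is small enough to fit in memory. The union-find sketch below computes connected components, one of the algorithms GraphFrames also ships as `connectedComponents`, without Spark. The `connected_components` helper is hypothetical.

```python
def connected_components(edges):
    """Label each vertex with the smallest vertex id in its (weakly) connected
    component, using union-find with path halving (hypothetical helper)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving shortens future lookups
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as the root so component labels are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for src, dst in edges:
        union(src, dst)
    return {v: find(v) for v in parent}


print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

This only works while the edge list fits in driver memory, which is exactly the limitation mentioned above for substantial data volumes.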

It's sad graph algorithms do not get more love from Databricks.
