Calling Delta Tables using JDBC

Sid1805
New Contributor II

Hi team,

If we kill clusters, will the connection details change every time?

If yes, is there a way we can mask this so that end users are not impacted due to any changes in the clusters?

Also, if I want to call a Delta table from an API using JDBC, should I use the SQL endpoint or a cluster JDBC connection (the REST API is still not GA)? What difference does it make? Performance-wise I understand the SQL endpoint would win, but are there any other major differences?

I'm new to the Databricks world, so apologies for my ignorance!

Please guide.

1 ACCEPTED SOLUTION

Anonymous
Not applicable

@Siddharth Krishna:

If you delete a cluster in Databricks, the connection details change, because the JDBC connection string embeds the cluster's ID; a new cluster gets a new ID, and you will need to update your connection details to point at it.

To avoid impacting end users when clusters change, you can put a load balancer or proxy server in front of your Databricks clusters, or have clients resolve the connection string from configuration rather than hardcoding it. Either way, traffic is directed to the current cluster without exposing the underlying cluster details, so you can swap or replace clusters without impacting end users.

Regarding calling a Delta table from an API using JDBC: the SQL endpoint is generally the better choice. It is optimized for concurrent SQL workloads, and its JDBC connection details stay stable across stops and restarts, so clients rarely need re-pointing.

In terms of major differences between the two, connecting over JDBC to an all-purpose cluster requires more setup and ties clients to a specific cluster, while the SQL endpoint is easier to use and presents a single, stable connection point.
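
As a minimal sketch (not the only way to do this), here is how a Java client might query a Delta table through a SQL endpoint with the Databricks JDBC driver. The host, HTTP path, token, and table name are placeholders; reading them from environment variables is one simple way to keep endpoint details out of client code. Note that the jdbc:databricks:// scheme is the current driver's; older Simba-based driver versions used jdbc:spark://.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DeltaTableQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details, read from the environment so that
        // swapping endpoints never requires a client code change.
        String host = System.getenv("DATABRICKS_HOST");          // e.g. adb-....azuredatabricks.net
        String httpPath = System.getenv("DATABRICKS_HTTP_PATH"); // e.g. /sql/1.0/warehouses/<endpoint-id>
        String token = System.getenv("DATABRICKS_TOKEN");        // a personal access token

        // AuthMech=3 selects token authentication: UID is the literal
        // string "token" and PWD is the personal access token.
        String url = "jdbc:databricks://" + host + ":443/default"
                + ";transportMode=http;ssl=1"
                + ";AuthMech=3;UID=token;PWD=" + token
                + ";httpPath=" + httpPath;

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // my_delta_table is a placeholder table name.
             ResultSet rs = stmt.executeQuery("SELECT * FROM my_delta_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}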

Sid1805
New Contributor II

@Suteja Kanuri: thanks for your inputs, it's a clear explanation. Trying my luck with Azure as the cloud provider: do you have any references for how the clusters are configured behind the load balancer, given that the cluster IPs would be dynamic every time a cluster is terminated? Our use case is calling Delta tables over JDBC from a Java-based API, while also saving cost during non-business hours by shutting the clusters down. Wouldn't this re-pointing exercise in the LB backend be a pain, or is there something I'm missing between your lines? 🙂 thanks!

Sid1805
New Contributor II

@Suteja Kanuri: I did try terminating and restarting a given cluster, and the JDBC connection remains the same. Is the understanding correct that re-pointing is only needed when we delete the cluster itself? Thanks!

Anonymous
Not applicable

@Siddharth Krishna:

When you terminate and restart a cluster in Databricks, the JDBC connection details remain the same: the connection string points at the workspace hostname and the cluster ID, and both persist across a restart. This means that any JDBC connections to the cluster will continue to work without any changes.

However, if you delete the cluster and create a new one, the new cluster gets a new cluster ID, so the connection details change and you will need to update the JDBC connection string in your application to point to the new cluster.
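
To make that concrete, this is the typical shape of an all-purpose cluster's JDBC URL (all values here are made-up placeholders): the HTTP path carries the workspace/org ID and the cluster ID rather than an IP address, which is why a restart does not invalidate it.

// Typical shape of an all-purpose cluster JDBC URL (placeholder values).
// The httpPath embeds the org ID and the cluster ID -- not an IP address --
// so it survives terminate/restart; only deleting and recreating the
// cluster (which issues a new cluster ID) changes it.
String clusterUrl =
        "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default"
        + ";transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=<personal-access-token>"
        + ";httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcde123";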

It's important to note that even if you restart a cluster, any running Spark jobs or applications may be interrupted and will need to be restarted. So, it's always a good practice to ensure that your applications are designed to handle such interruptions and failures.
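
As one sketch of that practice, a client could wrap connection attempts in a simple retry with backoff, so that a cluster which is restarting (and briefly unreachable) does not immediately fail the caller. connectWithRetry below is a hypothetical helper, not part of any Databricks API; it assumes maxAttempts is at least 1.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class RetryingConnector {
    // Hypothetical helper: retry DriverManager.getConnection with a linear
    // backoff. A restarting cluster can take minutes to come back, so the
    // delay grows with each attempt. Assumes maxAttempts >= 1.
    public static Connection connectWithRetry(String url, int maxAttempts)
            throws SQLException, InterruptedException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return DriverManager.getConnection(url);
            } catch (SQLException e) {
                last = e;                        // remember the failure
                Thread.sleep(attempt * 30_000L); // back off before retrying
            }
        }
        throw last; // all attempts failed
    }
}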

Anonymous
Not applicable

Hi @Siddharth Krishna,

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
