Reading Databricks tables

arkiboys
Contributor

Hello,
I have created Databricks tables in the hive_metastore databases.
To read these tables with a select * query inside a Databricks notebook, I have to make sure the Databricks cluster is started.
My question is about reading the Databricks tables from a C# website.
1- Should the cluster be running all the time in Databricks so that C# can access the tables?
2- Is there a way for C# to query the tables and spin up the cluster if that is required to read the data?
3- Do I have to set up a SQL warehouse for C# to read the tables?
4- How can I prepare a close-to-real estimate of how much reading this data from C# will cost?

Thank you

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @arkiboys

Cluster Availability:

  • No, the Databricks cluster does not need to be running all the time for your C# application to access the tables.
  • Databricks provides a REST API that allows you to submit queries and retrieve results programmatically. You can start a cluster dynamically when needed, execute your query, and then shut down the cluster afterward.
  • Keep in mind that a cluster takes time to start, so starting and stopping it frequently can hurt both response time and cost. Consider configuring auto-termination, so that an idle cluster shuts itself down after a period of inactivity, and autoscaling, so that the number of workers follows the workload.

Dynamic Cluster Spin-Up:

  • Yes, there is a way to spin up a Databricks cluster dynamically when needed.
  • You can create a script or function in your C# application that triggers the cluster start-up when a query needs to be executed. After the query completes, you can shut down the cluster.
  • Use the Databricks REST API to manage cluster lifecycle programmatically.
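
As a rough sketch of the start-on-demand flow described above, the Python below calls the Clusters API endpoints `/api/2.0/clusters/get` and `/api/2.0/clusters/start`. The host, token, and cluster ID are placeholders, and the helper names (`build_request`, `send`, `ensure_running`, `start_cluster_if_needed`) are made up for illustration. Python is used only to keep the sketch short; the same HTTP calls can be issued from C# with `HttpClient`.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


def build_request(host: str, token: str, method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated REST request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"https://{host}{path}",
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}


def ensure_running(get_state: Callable[[], str], start: Callable[[], None],
                   timeout_seconds: float = 600.0, poll_seconds: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Request a start if the cluster is terminated, then poll until RUNNING."""
    waited = 0.0
    start_requested = False
    while True:
        state = get_state()
        if state == "RUNNING":
            return
        if state in ("ERROR", "UNKNOWN"):
            raise RuntimeError(f"cluster is in state {state}")
        if state == "TERMINATED" and not start_requested:
            start()
            start_requested = True
        if waited >= timeout_seconds:
            raise TimeoutError("cluster did not reach RUNNING in time")
        sleep(poll_seconds)
        waited += poll_seconds


def start_cluster_if_needed(host: str, token: str, cluster_id: str) -> None:
    """Wire the helpers to the real endpoints (performs network calls)."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})

    def get_state() -> str:
        return send(build_request(host, token, "GET",
                                  f"/api/2.0/clusters/get?{query}"))["state"]

    def start() -> None:
        send(build_request(host, token, "POST", "/api/2.0/clusters/start",
                           {"cluster_id": cluster_id}))

    ensure_running(get_state, start)


# Example call (all three values are placeholders):
# start_cluster_if_needed("<workspace-host>", "<access-token>", "<cluster-id>")
```

The polling logic takes `get_state` and `start` as plain callables so it can be exercised without a workspace. With auto-termination configured on the cluster, the application only needs the start half of the lifecycle.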

SQL Warehouse:

  • A separate SQL warehouse is not strictly required for reading tables; they can also be read through a regular cluster.
  • However, if you want to use Databricks as a SQL query engine (much as you would query a MySQL server) and get the output into your C# application, consider running your queries against the tables over an ODBC connection.
  • This approach gives you more control over the SQL query output.
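
To make the ODBC route a little more concrete, here is a minimal sketch that assembles a connection string for the Databricks (Simba Spark) ODBC driver with personal access token authentication. The key names follow the driver's commonly documented settings, but treat the driver name and every value as assumptions to check against your installed driver and your compute's connection details. `build_odbc_connection_string` is a hypothetical helper. Python is used only for brevity; a C# application could pass a string of the same shape to `System.Data.Odbc.OdbcConnection`.

```python
def build_odbc_connection_string(host: str, http_path: str, token: str,
                                 driver: str = "Simba Spark ODBC Driver") -> str:
    """Assemble a DSN-less ODBC connection string (nothing is connected here)."""
    parts = {
        "Driver": driver,        # assumed name of the installed ODBC driver
        "Host": host,            # workspace hostname, without https://
        "Port": "443",
        "HTTPPath": http_path,   # copied from the cluster or warehouse details
        "SSL": "1",
        "ThriftTransport": "2",  # HTTP transport
        "AuthMech": "3",         # user name and password authentication
        "UID": "token",          # literal word "token" for token auth
        "PWD": token,            # the personal access token itself
    }
    for key, value in parts.items():
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values only; substitute your own workspace details.
connection_string = build_odbc_connection_string(
    host="<workspace-host>",
    http_path="<http-path-from-connection-details>",
    token="<access-token>",
)
```

The `HTTPPath` value decides which compute serves the query, so the same code works whether it points at a cluster or at a SQL warehouse.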

Cost Estimation:

  • To estimate the cost of reading data from Databricks via your C# application, consider the following factors:
    • Cluster cost: Calculate the cost of running the cluster (based on instance type, duration, and number of nodes).
    • Data transfer cost: If you’re transferring large amounts of data between Databricks and your C# application, consider the associated costs.
    • Storage cost: If your Databricks tables are backed by storage (e.g., Azure Data Lake), factor in storage costs.
  • Monitor usage and review billing details in your Databricks workspace to get close-to-real estimates.
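
The cluster-cost factor above can be turned into a back-of-the-envelope calculation. The sketch below assumes the usual split between Databricks charges (billed in DBUs) and the cloud provider's VM charges. Every rate in it is a made-up placeholder, not a real price, and `ComputeProfile`, `hourly_cost`, and `monthly_estimate` are hypothetical names. Storage and data transfer are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeProfile:
    node_count: int                 # driver plus workers
    dbu_per_node_hour: float        # depends on the instance type
    dbu_price: float                # currency per DBU, depends on plan and workload
    vm_price_per_node_hour: float   # cloud provider charge per node


def hourly_cost(profile: ComputeProfile) -> float:
    """Databricks (DBU) charge plus VM charge for one hour of uptime."""
    per_node = (profile.dbu_per_node_hour * profile.dbu_price
                + profile.vm_price_per_node_hour)
    return profile.node_count * per_node


def monthly_estimate(profile: ComputeProfile, active_hours_per_day: float,
                     days: int = 30) -> float:
    """Uptime-based estimate; idle time before auto-termination counts as uptime."""
    return hourly_cost(profile) * active_hours_per_day * days


# Hypothetical numbers purely for illustration.
example = ComputeProfile(node_count=3, dbu_per_node_hour=0.75,
                         dbu_price=0.55, vm_price_per_node_hour=0.20)
print(round(monthly_estimate(example, active_hours_per_day=4), 2))  # about 220.5
```

The main lever is uptime rather than the number of queries: a cluster that the website keeps awake all day costs the same whether it serves ten reads or ten thousand.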

Additionally, explore Databricks documentation and examples for best practices on integrating with e...2.

2 REPLIES

arkiboys
Contributor

thank you
