Databricks Community

arkiboys · ‎02-15-2024

Hello,
I have created Databricks tables in the hive_metastore databases.
To read these tables using a select * query inside a Databricks notebook, I have to make sure the Databricks cluster is started.
My question is about reading the Databricks tables from a C# website.
1- Should the cluster be on all the time in Databricks so that C# can access the tables?
2- Is there a way for C# to hit the tables and spin up the cluster if required to read data?
3- Do I have to set up a SQL warehouse for C# to read the tables?
4- How can I prepare a close-to-real estimate of how much reading this data from C# will cost?

Thank you

Kaniz_Fatma · ‎02-16-2024

Hi @arkiboys,

Cluster Availability:

No, the Databricks cluster does not need to be on all the time for your C# application to access the tables.
Databricks provides a REST API that allows you to submit queries and retrieve results programmatically. You can start a cluster dynamically when needed, execute your query, and then shut down the cluster afterward.
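Keep in mind that starting and stopping clusters frequently affects both latency and cost, since a stopped cluster usually takes a few minutes to start. Consider setting an auto-termination timeout so that an idle cluster shuts down on its own, and autoscaling so that its size follows the workload.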

Dynamic Cluster Spin-Up:

Yes, there is a way to spin up a Databricks cluster dynamically when needed.
You can create a script or function in your C# application that triggers the cluster start-up when a query needs to be executed. After the query completes, you can shut down the cluster.
Use the Databricks REST API to manage cluster lifecycle programmatically.
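
As a rough sketch of the lifecycle calls described above, the snippet below builds the two HTTP requests involved: one to start a terminated cluster (POST /api/2.0/clusters/start) and one to read its state (GET /api/2.0/clusters/get). It is written in Python for brevity; the same requests could be sent from C# with HttpClient. The workspace URL, cluster ID, and token are placeholders, and the helper names are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request


def build_start_request(workspace_url, cluster_id, token):
    """Build a POST request asking Databricks to start a terminated cluster."""
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/clusters/start",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_state_request(workspace_url, cluster_id, token):
    """Build a GET request that returns the cluster's details, including its state."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/clusters/get?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def is_ready(cluster_info):
    """Return True once the parsed cluster details report the RUNNING state."""
    return cluster_info.get("state") == "RUNNING"


# Hypothetical usage (needs a real workspace, cluster ID, and token):
#   req = build_start_request("https://<workspace-host>", "<cluster-id>", "<token>")
#   urllib.request.urlopen(req)
#   ...then poll build_state_request(...) until is_ready(...) is True.
```

In practice the application would poll the state request every few seconds with a timeout, and only send the query once the cluster is running.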

SQL Warehouse:

Databricks does not require a SQL warehouse for reading tables; an all-purpose cluster also exposes a JDBC/ODBC endpoint.
However, if you want to use Databricks as a compute engine (similar to MySQL) and get output into your C# application, consider creating tables in Databricks and running queries via an ODBC connection.
This approach gives you more control over the SQL query output.
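
To make the ODBC route a little more concrete, here is a minimal sketch that assembles a DSN-less connection string, assuming the Databricks (Simba Spark) ODBC driver is installed. The key names follow that driver's connection-string format as I understand it, so check them against the driver documentation for your version. The function name and all argument values are placeholders for illustration. It is shown in Python; in C# the resulting string would be passed to an ODBC connection class such as System.Data.Odbc.OdbcConnection.

```python
def build_odbc_connection_string(driver, host, http_path, token):
    """Assemble a DSN-less ODBC connection string for a Databricks endpoint.

    Assumptions: personal-access-token authentication (AuthMech=3 with the
    literal user name "token") over HTTPS on port 443.
    """
    parts = {
        "Driver": driver,          # installed driver name or library path
        "Host": host,              # workspace server hostname
        "Port": "443",
        "HTTPPath": http_path,     # taken from the cluster's JDBC/ODBC settings
        "SSL": "1",
        "ThriftTransport": "2",    # HTTP transport
        "AuthMech": "3",           # user name + password style authentication
        "UID": "token",
        "PWD": token,              # the personal access token itself
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical usage:
#   build_odbc_connection_string(
#       "Simba Spark ODBC Driver", "<server-hostname>", "<http-path>", "<token>")
```

Keep the token out of source control, for example by loading it from the application's secret store at runtime.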

Cost Estimation:

To estimate the cost of reading data from Databricks via your C# application, consider the following factors:
- Cluster cost: Calculate the cost of running the cluster (based on instance type, duration, and number of nodes). This covers both the Databricks DBU charge and the cloud provider's VM charge.
- Data transfer cost: If you’re transferring large amounts of data between Databricks and your C# application, consider the associated costs.
- Storage cost: If your Databricks tables are backed by storage (e.g., Azure Data Lake), factor in storage costs.
Monitor usage and review billing details in your Databricks workspace to get close-to-real estimates.
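
The cluster-cost factor above can be turned into a back-of-the-envelope calculation. The sketch below assumes that every node uses the same instance type and that compute is billed as a cloud VM charge plus a Databricks DBU charge per node-hour; storage and data transfer are left out. The function name is hypothetical and the prices in the example are made-up placeholders, not real rates.

```python
def estimate_cluster_cost(hours, nodes, vm_price_per_node_hour,
                          dbu_per_node_hour, price_per_dbu):
    """Rough compute cost: cloud VM charge plus Databricks DBU charge."""
    vm_cost = hours * nodes * vm_price_per_node_hour
    dbu_cost = hours * nodes * dbu_per_node_hour * price_per_dbu
    return vm_cost + dbu_cost


# Example with placeholder numbers: 3 nodes running 2 hours,
# $0.50 per node-hour for the VM, 1.0 DBU per node-hour, $0.40 per DBU.
daily = estimate_cluster_cost(2, 3, 0.50, 1.0, 0.40)
print(f"{daily:.2f}")  # prints 5.40
```

Multiplying the daily figure by the expected number of active days gives a monthly estimate that can then be compared against the actual billing data.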

Additionally, explore Databricks documentation and examples for best practices on integrating with e...2.
