Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

reading databricks tables

arkiboys
Contributor

Hello,
Currently I have created Databricks tables in the hive_metastore.databases.
To read these tables using a select * query inside a Databricks notebook, I have to make sure the Databricks cluster is started.
My question is about reading the Databricks tables from a C# website.
1- Should the cluster be on all the time in Databricks so that C# can access the tables?
2- Is there a way for C# to hit the tables and spin up the cluster if required to read data?
3- Do I have to set up a SQL warehouse for C# to read the tables?
4- How can I prepare a close-to-real estimate of how much reading this data from C# will cost?

Thank you

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @arkiboys,

Cluster Availability:

  • No, the Databricks cluster does not need to be on all the time for your C# application to access the tables.
  • Databricks provides a REST API that allows you to submit queries and retrieve results programmatically. You can start a cluster dynamically when needed, execute your query, and then shut down the cluster afterward.
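  • Keep in mind that a cluster can take a few minutes to start, so starting and stopping it frequently adds latency to the first query and can affect cost. Consider setting an auto-termination period so an idle cluster shuts down on its own; autoscaling only adjusts the number of workers and does not start or stop the cluster.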
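
One REST route for submitting a query is the SQL Statement Execution API. Note that it runs statements on a SQL warehouse (identified by `warehouse_id`), not on an all-purpose cluster. Below is a minimal sketch in Python; the same HTTP call could be made from C# with `HttpClient`. The host, token, and warehouse ID are placeholders you would supply, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, sql, wait_timeout="30s"):
    """Build (but do not send) a POST request for the SQL Statement Execution API."""
    body = {
        "warehouse_id": warehouse_id,   # the SQL warehouse that runs the query
        "statement": sql,
        "wait_timeout": wait_timeout,   # how long the call waits for a result
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_from_response(payload):
    """Return the result rows if the statement finished, or None if it is still running."""
    state = payload.get("status", {}).get("state")
    if state == "SUCCEEDED":
        return payload.get("result", {}).get("data_array", [])
    if state in ("PENDING", "RUNNING"):
        return None
    raise RuntimeError(f"statement ended in state {state}")


def run_query(host, token, warehouse_id, sql):
    """Send the request. Needs network access and real credentials."""
    req = build_statement_request(host, token, warehouse_id, sql)
    with urllib.request.urlopen(req) as resp:
        return rows_from_response(json.load(resp))
```

If `rows_from_response` returns `None`, the statement is still running and its status has to be polled until it finishes; that part is left out of this sketch.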

Dynamic Cluster Spin-Up:

  • Yes, there is a way to spin up a Databricks cluster dynamically when needed.
  • You can create a script or function in your C# application that triggers the cluster start-up when a query needs to be executed. After the query completes, you can shut down the cluster.
  • Use the Databricks REST API to manage cluster lifecycle programmatically.
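
As a rough illustration of the start-on-demand idea above, here is a sketch that checks the cluster state through the Clusters API and starts the cluster if it is terminated. It is written in Python for brevity; a C# application would make the same HTTP calls. The function names, polling interval, and retry limit are my own assumptions, not part of any Databricks SDK.

```python
import json
import time
import urllib.request


def api_call(host, token, method, path, body=None):
    """Call a Databricks REST endpoint and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"https://{host}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8")
        return json.loads(text) if text else {}


def ensure_running(cluster_id, call, sleep=time.sleep, poll_seconds=15, max_polls=80):
    """Start the cluster if it is terminated, then poll until it reports RUNNING.

    `call` is any function with the signature call(method, path, body=None).
    """
    for _ in range(max_polls):
        state = call("GET", f"/api/2.0/clusters/get?cluster_id={cluster_id}")["state"]
        if state == "RUNNING":
            return
        if state == "TERMINATED":
            call("POST", "/api/2.0/clusters/start", {"cluster_id": cluster_id})
        elif state in ("ERROR", "UNKNOWN"):
            raise RuntimeError(f"cluster is in state {state}")
        sleep(poll_seconds)  # PENDING, RESTARTING, etc.: wait and check again
    raise TimeoutError("cluster did not reach RUNNING in time")
```

To use it against a real workspace, bind the host and token first, for example `call = functools.partial(api_call, host, token)`, then `ensure_running(cluster_id, call)`. To shut the cluster down afterward, `POST /api/2.0/clusters/delete` with the same `cluster_id` terminates it (it does not permanently delete it), although an auto-termination setting is usually simpler than doing this after every query.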

SQL Warehouse:

  • A SQL warehouse is not strictly required for reading tables; an all-purpose cluster can also serve queries.
  • To use Databricks as a query engine (similar to MySQL) and get the output into your C# application, run queries against your tables over an ODBC connection to either a cluster or a SQL warehouse.
  • This approach gives you more control over the SQL query output.
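
For the ODBC route, a DSN-less connection string for the Databricks ODBC driver typically looks like the one built below. This is only a sketch: the driver name depends on how the driver was installed on your machine, and the function name is hypothetical. The HTTP path comes from the JDBC/ODBC connection details of the cluster or SQL warehouse, and the password is a personal access token.

```python
def databricks_odbc_connection_string(host, http_path, token,
                                      driver="Simba Spark ODBC Driver"):
    """Assemble a DSN-less ODBC connection string for Databricks.

    `driver` is an assumption; use the name (or library path) of the
    ODBC driver as it is registered on your system.
    """
    parts = {
        "Driver": driver,
        "Host": host,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",  # HTTP transport
        "AuthMech": "3",         # user name / password authentication
        "UID": "token",          # literal word "token" when using an access token
        "PWD": token,            # the personal access token itself
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

The resulting string is what you would hand to an ODBC client, for example `OdbcConnection` from `System.Data.Odbc` in C#.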

Cost Estimation:

  • To estimate the cost of reading data from Databricks via your C# application, consider the following factors:
    • Cluster cost: Calculate the cost of running the cluster (based on instance type, duration, and number of nodes).
    • Data transfer cost: If you’re transferring large amounts of data between Databricks and your C# application, consider the associated costs.
    • Storage cost: If your Databricks tables are backed by storage (e.g., Azure Data Lake), factor in storage costs.
  • Monitor usage and review billing details in your Databricks workspace to get close-to-real estimates.
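
To turn the cluster cost factor into a number, a back-of-the-envelope calculation can help. For classic (non-serverless) compute, Databricks bills in DBUs while the cloud provider bills the virtual machines separately, so the sketch below adds the two. All rates in the example are made-up placeholders; take the real ones from your pricing page and billing data.

```python
def estimate_monthly_cost(dbu_per_hour, usd_per_dbu, vm_usd_per_hour,
                          hours_per_day, days=30):
    """Rough monthly compute cost: DBU charges plus cloud VM charges.

    dbu_per_hour    -- DBUs the whole cluster consumes per hour
    usd_per_dbu     -- price of one DBU for your plan and workload type
    vm_usd_per_hour -- cloud provider price for all cluster nodes per hour
    hours_per_day   -- how long the cluster actually runs each day
    """
    hours = hours_per_day * days
    dbu_cost = dbu_per_hour * usd_per_dbu * hours
    vm_cost = vm_usd_per_hour * hours
    return round(dbu_cost + vm_cost, 2)


# Hypothetical numbers: 4 DBU/hour at $0.55 per DBU, $1.20/hour of VMs,
# cluster up 2 hours a day for 30 days.
print(estimate_monthly_cost(4, 0.55, 1.20, 2))  # 204.0
```

Storage and data transfer costs come on top of this and are billed by the cloud provider.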

Additionally, explore Databricks documentation and examples for best practices on integrating with e...2.

2 REPLIES

arkiboys
Contributor

thank you
