Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Delta sharing vs CosmosDB

Phani1
Valued Contributor II

 

Hi All,

We have a situation where we write data to CosmosDB and create JSON data for a transaction table, which includes a mini statement in JSON format.

Now, we want to introduce Delta Sharing and share the transaction table. The Java application will access the Delta table to generate the JSON data, and there could be hundreds of API calls to the shared table.

We would like to understand the costs involved in this scenario when comparing Delta Sharing (compute costs, egress costs, etc.) to CosmosDB. We would also like to know whether there are any performance issues when the shared table is accessed concurrently through API calls to generate JSON data.

Regards,

Phani

 

3 REPLIES

Walter_C
Databricks Employee

Costs:

  1. Egress Costs:

    • Delta Sharing: Within the same region, Delta Sharing incurs no egress costs. However, sharing data across different regions or clouds may result in egress fees charged by the cloud provider. Databricks supports sharing from Cloudflare R2, which incurs no egress fees. Tools and recommendations are available to monitor and avoid egress fees (source: "Monitor and manage Delta Sharing egress costs (for providers)").
    • CosmosDB: Egress costs apply when data is read from CosmosDB and transferred out of the Azure region. The exact costs depend on the amount of data transferred and the regions involved.
  2. Computation Costs:

    • Delta Sharing: The data provider incurs storage costs for Delta or Parquet files, but there are no compute costs on the provider's side for sharing data. The recipient incurs data processing costs when they access and process the shared data (source: "Delta Sharing Cost Breakdown").
    • CosmosDB: Costs are associated with the provisioned throughput (RU/s) for read and write operations, as well as storage costs.

Performance Issues:

  • Delta Sharing: There can be performance issues when accessing the Delta Sharing table concurrently through API calls. For example, if the number of API requests exceeds the predefined rate limit, it can result in "Too Many Requests" errors (HTTP 429). 
  • CosmosDB: CosmosDB is designed to handle high throughput and low latency for concurrent operations. However, performance can be impacted if the provisioned throughput is not sufficient for the workload.
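
If the Java application does hit HTTP 429 responses, one common way to cope is to retry with exponential backoff. The following is a minimal sketch under stated assumptions, not a Databricks API: the class `BackoffRetry`, its method names, and the delay values are hypothetical, and the request is modelled as any function that returns an HTTP status code.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: retries a request while it keeps returning HTTP 429.
class BackoffRetry {

    // Delay for a 0-based retry attempt: base * 2^attempt, capped at maxDelayMs.
    static long backoffDelayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxDelayMs);
    }

    // Calls the request up to maxAttempts times and returns the last status code seen.
    static int callWithRetry(IntSupplier request, int maxAttempts,
                             long baseDelayMs, long maxDelayMs) {
        int status = request.getAsInt();
        for (int attempt = 0; status == 429 && attempt < maxAttempts - 1; attempt++) {
            try {
                Thread.sleep(backoffDelayMs(attempt, baseDelayMs, maxDelayMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

A caller would wrap its own HTTP call in the `IntSupplier`, for example `BackoffRetry.callWithRetry(() -> sendRequest(), 5, 200, 5000)`, where `sendRequest` is a placeholder for whatever client the application uses. Backoff only smooths over short bursts; a sustained load above the rate limit would still fail.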

Phani1
Valued Contributor II

Thanks for your reply.

Right now, the team is transferring data from Databricks to Cosmos DB and then using REST APIs to access that data. They handle about 100 requests per minute, with some tables needing around 100 requests per second because of high transaction volume.

However, the customer wants to move away from Cosmos DB because writing the data takes a lot of effort and it is becoming too expensive. They are looking for alternative ways to access the Delta table data using REST APIs through Java. Can you please suggest an approach?

Walter_C
Databricks Employee

You could use a JDBC connection to connect to a cluster or SQL warehouse, and from there run your SQL commands to query the Delta tables: https://docs.databricks.com/en/integrations/jdbc/index.html
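
As a rough illustration of the JDBC route, here is a minimal Java sketch. It assumes the Databricks JDBC driver is on the classpath and that authentication uses a personal access token; the class name `DeltaTableJdbcClient`, the table `main.default.transactions`, and the columns `transaction_id` and `account_id` are hypothetical placeholders, not names from this thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical client: queries a Delta table through a SQL warehouse over JDBC.
class DeltaTableJdbcClient {

    // Builds a JDBC URL from the workspace host and the warehouse HTTP path.
    // AuthMech=3 with UID=token selects personal-access-token authentication.
    static String buildJdbcUrl(String host, String httpPath) {
        return "jdbc:databricks://" + host + ":443;httpPath=" + httpPath
                + ";AuthMech=3;UID=token";
    }

    // Returns the transaction ids for one account (table and columns are assumptions).
    static List<String> fetchTransactionIds(String url, String token, String accountId)
            throws SQLException {
        Properties props = new Properties();
        props.put("PWD", token); // the token is passed as the password, kept out of the URL

        String sql = "SELECT transaction_id FROM main.default.transactions WHERE account_id = ?";
        List<String> ids = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, props);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, accountId); // bind the parameter instead of concatenating it
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("transaction_id"));
                }
            }
        }
        return ids;
    }
}
```

A REST endpoint in the Java service could call `fetchTransactionIds` and serialize the result to JSON. This sketch opens a new connection per call for simplicity; at request rates like those described above, reusing connections through a pool would likely be needed.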
