Delta sharing vs CosmosDB

Phani1
Valued Contributor II

 

Hi All,

We currently write data to CosmosDB and generate JSON data for a transaction table, which includes a mini statement in JSON format.

Now we want to introduce Delta Sharing and share the transaction table. A Java application will read the shared Delta table to generate the JSON data, and there could be hundreds of API calls against the shared table.

We would like to understand the costs involved in this scenario when comparing Delta Sharing (compute costs, egress costs, etc.) with CosmosDB. We also want to know whether there are any performance issues when the shared table is accessed concurrently through API calls to generate the JSON data.

Regards,

Phani

 

1 REPLY

Walter_C
Databricks Employee

Costs:

  1. Egress Costs:

    • Delta Sharing: Within the same region, Delta Sharing incurs no egress costs. However, sharing data across different regions or clouds may result in egress fees charged by the cloud provider. Databricks supports sharing from Cloudflare R2, which incurs no egress fees. Tools and recommendations are available to monitor and avoid egress fees (source: "Monitor and manage Delta Sharing egress costs (for providers)").
    • CosmosDB: Egress costs apply when data is read from CosmosDB and transferred out of the Azure region. The exact costs depend on the amount of data transferred and the regions involved.
  2. Computation Costs:

    • Delta Sharing: The data provider incurs storage costs for Delta or Parquet files, but there are no compute costs on the provider's side for sharing data. The recipient incurs data processing costs when they access and process the shared data (source: "Delta Sharing Cost Breakdown").
    • CosmosDB: Costs are associated with the provisioned throughput (RU/s) for read and write operations, as well as storage costs.
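
To make the comparison above concrete, here is a rough sketch of the arithmetic, not a pricing calculator. All prices are inputs you would look up for your own cloud, region, and contract; the function and parameter names are hypothetical. It assumes CosmosDB provisioned throughput is billed per 100 RU/s per hour, and it leaves out the recipient-side compute that the Java application would pay for when processing shared data.

```python
def delta_sharing_monthly_cost(gb_read_per_month, egress_price_per_gb,
                               stored_gb, storage_price_per_gb,
                               same_region=True):
    """Provider-side estimate: storage plus egress (zero within one region).

    Recipient compute is intentionally not modelled here.
    """
    egress = 0.0 if same_region else gb_read_per_month * egress_price_per_gb
    storage = stored_gb * storage_price_per_gb
    return egress + storage


def cosmosdb_monthly_cost(provisioned_ru_per_s, price_per_100_ru_hour,
                          stored_gb, storage_price_per_gb,
                          hours_per_month=730):
    """Estimate for provisioned throughput plus storage.

    Assumes billing in units of 100 RU/s per hour; egress is not modelled.
    """
    throughput = (provisioned_ru_per_s / 100) * price_per_100_ru_hour * hours_per_month
    storage = stored_gb * storage_price_per_gb
    return throughput + storage
```

Plugging in your own read volume, RU/s setting, and current list prices gives a first-order comparison; it will not capture autoscale, serverless, or reserved-capacity pricing.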

Performance Issues:

  • Delta Sharing: Concurrent API access to a shared table can run into rate limiting. If the number of API requests exceeds the predefined rate limit, the server responds with "Too Many Requests" errors (HTTP 429).
  • CosmosDB: CosmosDB is designed to handle high throughput and low latency for concurrent operations. However, performance can be impacted if the provisioned throughput is not sufficient for the workload.
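
One common way to cope with HTTP 429 responses on the client side is to retry with exponential backoff and jitter. The sketch below is a generic illustration of that pattern, not the Delta Sharing client's own behaviour: `TooManyRequests`, `call_with_backoff`, and the `request` callable are hypothetical names, and the delay values are arbitrary starting points.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("HTTP 429 Too Many Requests")
        # Seconds suggested by the server (e.g. a Retry-After header), if any.
        self.retry_after = retry_after


def call_with_backoff(request, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call request() and retry on TooManyRequests with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TooManyRequests as err:
            # Give up and re-raise once the attempt budget is exhausted.
            if attempt == max_attempts - 1:
                raise
            # Double the delay on every attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Honour a longer wait if the server asked for one.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            # Add up to 50% jitter so concurrent callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

With hundreds of concurrent calls, backoff only smooths bursts. Caching generated JSON or batching reads in the application reduces the request count itself.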
