Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need help to insert huge data into cosmos db from azure data lake storage using databricks

manasa
Contributor

I am trying to insert 6 GB of data into Cosmos DB using the OLTP connector.

Container RUs: 40,000

Cluster config: (see attached screenshot)

# Write options for the Cosmos DB Spark OLTP connector
cfg = {
  "spark.cosmos.accountEndpoint" : cosmosdbendpoint,
  "spark.cosmos.accountKey" : cosmosdbmasterkey,
  "spark.cosmos.database" : cosmosdatabase,
  "spark.cosmos.container" : cosmosdbcontainer,
}

# Register the Cosmos catalog and enable bulk ingestion
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosdbendpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosdbmasterkey)
spark.conf.set("spark.cosmos.write.bulk.enabled", "true")

json_df.write.format("cosmos.oltp").options(**cfg).mode("append").save()
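For anyone tuning a similar load: passing the write-path options directly on the writer (rather than only via `spark.conf`) ensures they reach the connector, and repartitioning the DataFrame increases the number of concurrent bulk writers. This is a sketch, not a drop-in fix; `spark.cosmos.write.bulk.enabled` and `spark.cosmos.write.strategy` are documented connector options, but the partition count of 128 is an arbitrary illustration you should tune to your cluster.

```python
# Sketch: write-path options passed on the writer itself, plus
# repartitioning for more parallel bulk streams. The endpoint/key/
# database/container variables are the same ones defined above;
# 128 partitions is an arbitrary example value.
write_cfg = {
    "spark.cosmos.accountEndpoint": cosmosdbendpoint,
    "spark.cosmos.accountKey": cosmosdbmasterkey,
    "spark.cosmos.database": cosmosdatabase,
    "spark.cosmos.container": cosmosdbcontainer,
    "spark.cosmos.write.bulk.enabled": "true",
    "spark.cosmos.write.strategy": "ItemOverwrite",  # upsert semantics
}

(json_df
 .repartition(128)  # more partitions -> more concurrent bulk writers
 .write
 .format("cosmos.oltp")
 .options(**write_cfg)
 .mode("append")
 .save())
```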

It is taking around 3 hours to load this into Cosmos DB.

1. Is increasing RUs the only way to decrease the execution time?

2. Other than the OLTP connector, is there any way to insert bulk data in less time?

3. How do I calculate the RUs needed based on data size?
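On question 3, a rough estimate can be made from document count and per-write cost. The per-unit costs below are assumptions (a ~1 KB insert typically costs on the order of 5-10 RUs); measure the actual request charge for your documents in the Azure portal for real numbers.

```python
# Back-of-the-envelope RU estimate. AVG_DOC_KB and RU_PER_1KB_WRITE
# are assumptions for illustration, not measured values.
DATA_SIZE_GB = 6
AVG_DOC_KB = 1            # assumed average document size
RU_PER_1KB_WRITE = 6      # rough rule of thumb: ~5-10 RUs per 1 KB insert
PROVISIONED_RU_S = 40_000

total_docs = DATA_SIZE_GB * 1024 * 1024 // AVG_DOC_KB
total_rus = total_docs * RU_PER_1KB_WRITE
best_case_seconds = total_rus / PROVISIONED_RU_S

print(f"{total_docs:,} docs, {total_rus:,} RUs, "
      f"best case ~{best_case_seconds / 60:.1f} min")
```

Under these assumptions the throughput-limited floor is roughly 16 minutes, so a 3-hour load suggests the bottleneck is parallelism or throttling/retries rather than raw data volume.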

4 REPLIES

Kaniz_Fatma
Community Manager

Hi @Manasa Kalluri, this article explains how to read data from and write data to Azure Cosmos DB using Azure Databricks.

manasa
Contributor

Hi @Kaniz Fatma, my problem is not with the resources. I tried everything mentioned in the article, but I still need to insert bulk data in less time (definitely not 3 hours for 6 GB of data), so I am looking for a more optimized approach.

SteveMeckstroth
New Contributor II

You have probably found a solution, but for others that end up here I got dramatic improvements using the Mongo connector to CosmosDB: https://www.mongodb.com/docs/spark-connector/current/write-to-mongodb/
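For readers taking this route, a minimal write via the MongoDB Spark Connector (v10+) might look like the sketch below. This assumes the Cosmos DB account was created with the API for MongoDB; the connection URI, database, and collection names are placeholders you would replace with values from the Azure portal.

```python
# Sketch: bulk write through the MongoDB Spark Connector (v10+),
# targeting a Cosmos DB account that uses the API for MongoDB.
# mongo_uri, "mydb", and "mycollection" are placeholders.
mongo_uri = "mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true"

(json_df.write
 .format("mongodb")
 .option("spark.mongodb.write.connection.uri", mongo_uri)
 .option("spark.mongodb.write.database", "mydb")
 .option("spark.mongodb.write.collection", "mycollection")
 .mode("append")
 .save())
```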

ImAbhishekTomar
New Contributor III

Did anyone find a solution for this? I'm also using a similar cluster and RUs, and data ingestion is taking a lot of time...
