Need help inserting a large amount of data into Cosmos DB from Azure Data Lake Storage using Databricks
07-15-2022 01:30 AM
I am trying to insert 6 GB of data into Cosmos DB using the OLTP connector.
Container RUs: 40000
Cluster config:
cfg = {
"spark.cosmos.accountEndpoint" : cosmosdbendpoint,
"spark.cosmos.accountKey" : cosmosdbmasterkey,
"spark.cosmos.database" : cosmosdatabase,
"spark.cosmos.container" : cosmosdbcontainer,
}
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosdbendpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosdbmasterkey)
spark.conf.set("spark.cosmos.write.bulk.enabled", "true")
json_df.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()
It is taking around 3 hours for me to load the data into Cosmos DB.
1. Is increasing the RUs the only way to reduce the execution time?
2. Other than the OLTP connector, are there other ways to bulk insert the data in less time?
3. How do I calculate the required RUs based on the data size?
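On question 3, a rough back-of-envelope estimate can be sketched as below. This is only a sketch under stated assumptions: the average document size (1 KB) and the cost per insert (5.5 RU) are illustrative guesses, not numbers measured on this container. The real charge depends on document size and indexing policy, and can be read from the request charge reported for a sample write. The function names are hypothetical.

```python
import math


def estimate_load_seconds(total_bytes, avg_doc_bytes, ru_per_write, provisioned_ru_per_s):
    """Lower-bound load time if the provisioned RU/s were the only bottleneck."""
    doc_count = math.ceil(total_bytes / avg_doc_bytes)
    total_ru = doc_count * ru_per_write
    return total_ru / provisioned_ru_per_s


def required_ru_per_s(total_bytes, avg_doc_bytes, ru_per_write, target_seconds):
    """RU/s needed to finish within target_seconds, ignoring all other overhead."""
    doc_count = math.ceil(total_bytes / avg_doc_bytes)
    return doc_count * ru_per_write / target_seconds


if __name__ == "__main__":
    six_gb = 6 * 1024**3
    # ASSUMPTIONS: 1 KB average document, 5.5 RU per insert (illustrative only).
    seconds = estimate_load_seconds(six_gb, 1024, 5.5, 40000)
    print(f"Estimated minimum load time: {seconds / 60:.1f} minutes")
    print(f"RU/s needed for a 1 hour load: {required_ru_per_s(six_gb, 1024, 5.5, 3600):.0f}")
```

Under these assumptions the RU budget alone would allow the load to finish in minutes rather than hours. If the real per-write charge is close to the assumed one, that would suggest the 3 hours comes from something other than the provisioned RUs, for example partition key skew or too few Spark partitions writing in parallel. If the real charge is much higher, adjusting the indexing policy may matter more than raising RUs.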
07-15-2022 05:45 AM
Hi @Kaniz Fatma, my problem is not with the resources. I tried everything mentioned in the article, but I need to insert the bulk data in less time (definitely not 3 hours for 6 GB of data). So I am looking for an optimized approach.
02-11-2023 07:06 PM
You have probably found a solution, but for others who end up here: I got dramatic improvements using the Mongo connector to write to Cosmos DB: https://www.mongodb.com/docs/spark-connector/current/write-to-mongodb/
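For anyone who wants to try this route, a minimal sketch of such a write might look like the following. It assumes the Cosmos DB account exposes the API for MongoDB (the `cosmos.oltp` format in the original post suggests a NoSQL API account, where this would not apply) and that version 10.x of the MongoDB Spark connector is installed on the cluster, where the format name is `mongodb`; older connector releases use different names. The helper names and the placeholder connection values are hypothetical.

```python
def mongo_write_options(connection_uri, database, collection):
    """Build writer options for the MongoDB Spark connector (10.x option names assumed)."""
    return {
        "connection.uri": connection_uri,
        "database": database,
        "collection": collection,
    }


def write_with_mongo_connector(df, options):
    """Append a Spark DataFrame through the MongoDB connector instead of cosmos.oltp."""
    df.write.format("mongodb").options(**options).mode("append").save()


# Usage on a Databricks cluster (placeholder values, not real endpoints):
# opts = mongo_write_options("<cosmos-mongo-connection-string>", "<database>", "<collection>")
# write_with_mongo_connector(json_df, opts)
```

Whether this is faster than the OLTP connector will depend on the account, the indexing policy, and the cluster, so it is worth timing on a small sample first.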
02-22-2023 01:31 PM
Did anyone find a solution for this? I'm also using a similar cluster and RU setting, and data ingestion is taking a lot of time.