Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AlexWeh
by New Contributor II
  • 13621 Views
  • 1 reply
  • 2 kudos

Universal Azure Credential Passthrough

At the moment, Azure Databricks supports Azure AD login for the workspace and single-user clusters with Azure Data Lake Storage credential passthrough. But this can only be used for Data Lake Storage. Is there already a way, or are...

Latest Reply
polivbr
New Contributor II
  • 2 kudos

I have exactly the same issue. I need to call a protected API within a notebook but have no access to the current user's access token. I've had to resort to nasty workarounds involving installing and running the Azure CLI from within the not...

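For readers hitting the same wall, here is a minimal sketch of the kind of Azure CLI workaround described in the reply above, assuming the Azure CLI is installed and logged in on the cluster; the resource URI and API endpoint are hypothetical placeholders, not anything Databricks provides.

# Sketch of the Azure CLI workaround mentioned above: obtain an Azure AD access
# token by shelling out to `az`, then call a protected API with it.
# The resource URI and endpoint below are hypothetical placeholders.
import json
import subprocess
import requests

API_RESOURCE = "api://my-protected-api"                  # hypothetical app ID URI
API_URL = "https://my-protected-api.example.com/data"    # hypothetical endpoint

# `az account get-access-token` prints JSON containing an accessToken field.
result = subprocess.run(
    ["az", "account", "get-access-token", "--resource", API_RESOURCE, "--output", "json"],
    capture_output=True, text=True, check=True,
)
token = json.loads(result.stdout)["accessToken"]

response = requests.get(API_URL, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
print(response.json())
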
repcak
by New Contributor III
  • 2351 Views
  • 1 reply
  • 2 kudos

Init Scripts with mounted azure data lake storage gen2

I'm trying to access an init script stored on Azure Data Lake Storage Gen2 mounted to DBFS. I mounted the storage to dbfs:/mnt/storage/container/script.sh and when I try to access it I get an error: Cluster scoped init script dbfs:/mnt/storage/containe...

Latest Reply
User16752239289
Databricks Employee
  • 2 kudos

I do not think an init script saved under a mount point works, and we do not suggest that. If you specify abfss, then the cluster needs to be configured so that it can authenticate and access the ADLS Gen2 folder. Otherwise, the cluster will no...

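As an illustration of the reply above, here is a rough sketch of the OAuth (service principal) settings that let a cluster read directly from abfss:// instead of a mount. The storage account, tenant, and secret scope names are placeholders, and for a cluster-scoped init script these entries belong in the cluster's Spark config rather than in a notebook.

# Sketch: service principal (OAuth) configuration for direct abfss:// access.
# Storage account, tenant, and secret scope names are placeholders.
storage_account = "mystorageaccount"
tenant_id = "<tenant-id>"
client_id = dbutils.secrets.get("my-scope", "sp-client-id")          # hypothetical secret scope
client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")

suffix = f"{storage_account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# With the same keys in the cluster's Spark config, the init script can be
# referenced as abfss://container@mystorageaccount.dfs.core.windows.net/script.sh
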
manasa
by Contributor
  • 4585 Views
  • 3 replies
  • 1 kudos

Need help to insert huge data into cosmos db from azure data lake storage using databricks

I am trying to insert 6 GB of data into Cosmos DB using the OLTP connector.
Container RUs: 40000
Cluster config:
cfg = { "spark.cosmos.accountEndpoint" : cosmosdbendpoint, "spark.cosmos.accountKey" : cosmosdbmasterkey, "spark.cosmos.database" : cosmosd...

Latest Reply
ImAbhishekTomar
New Contributor III
  • 1 kudos

Did anyone find a solution for this? I'm also using a similar cluster and RU setting, and data ingestion is taking a lot of time…

2 More Replies
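For context, a minimal sketch of the write path this thread is about, using the Cosmos DB Spark 3 OLTP connector with bulk writes enabled. The variable names mirror the config shown in the question, the database/container values are placeholders, and actual throughput still depends on the container's provisioned RUs.

# Sketch: bulk write into Cosmos DB with the Spark 3 OLTP connector.
# Assumes the azure-cosmos-spark connector is installed on the cluster;
# database and container values are placeholders in the spirit of the question.
cfg = {
    "spark.cosmos.accountEndpoint": cosmosdbendpoint,
    "spark.cosmos.accountKey": cosmosdbmasterkey,
    "spark.cosmos.database": cosmosdatabase,          # hypothetical, mirrors the question
    "spark.cosmos.container": cosmoscontainer,        # hypothetical, mirrors the question
    "spark.cosmos.write.strategy": "ItemOverwrite",   # upsert semantics
    "spark.cosmos.write.bulk.enabled": "true",        # batch requests to use RUs efficiently
}

(df.write
   .format("cosmos.oltp")
   .options(**cfg)
   .mode("APPEND")
   .save())
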
Netty
by New Contributor III
  • 4764 Views
  • 5 replies
  • 7 kudos

Resolved! What's the easiest way to upsert data into a table? (Azure ADLS Gen2)

I had been trying to upsert rows into a table in Azure Blob Storage (ADLS Gen 2) based on two partitions (sample code below).

insert overwrite table new_clicks_table partition(client_id, mm_date)
select click_id
  ,user_id
  ,click_timestamp_gmt
  ,ca...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 7 kudos

The below code might help you.

Python:
(df.write
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")
   .saveAsTable("default.people10m")
)

SQL:
SET spark.sql.sources.partitionOverwriteMode=dynamic;
INSERT OVERWRITE TABLE default.people10m...

4 More Replies
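If the goal is a true upsert rather than rewriting whole partitions, a Delta Lake MERGE is another common approach. The sketch below assumes new_clicks_table is a Delta table and that click_id plus the two partition columns identify a row; new_clicks_df is a hypothetical DataFrame of incoming rows.

# Sketch: upsert with Delta Lake MERGE instead of INSERT OVERWRITE.
# Assumes new_clicks_table is a Delta table; join keys are illustrative.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "new_clicks_table")

(target.alias("t")
   .merge(new_clicks_df.alias("s"),
          "t.click_id = s.click_id AND t.client_id = s.client_id AND t.mm_date = s.mm_date")
   .whenMatchedUpdateAll()
   .whenNotMatchedInsertAll()
   .execute())
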
enavuio
by New Contributor II
  • 1559 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table to Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is created with:

CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table(
    ComponentInfo STRUCT<ComponentHost: STRING, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Reply
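The thread does not record a resolution; a commonly suggested approach for slow counts over many small JSON files is to materialize the data once into Delta (or Parquet) and query that copy instead. A minimal sketch, with the path and table name as placeholders:

# Sketch: counts over ~200K small JSON files are dominated by file listing and
# JSON parsing; materializing once into Delta makes later queries much cheaper.
# The path and table name are placeholders.
raw = spark.read.json("abfss://container@account.dfs.core.windows.net/json-files/")

(raw.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("component_info_delta"))

spark.table("component_info_delta").count()
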
Aran_Oribu
by New Contributor II
  • 4254 Views
  • 5 replies
  • 2 kudos

Resolved! Create and update a csv/json file in ADLSG2 with Eventhub in Databricks streaming

Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

So the Event Hub creates files (JSON/CSV) on ADLS. You can read those files into Databricks with the spark.read.csv/json method. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...

4 More Replies
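To round out the reply above, a short sketch of the wildcard batch read it describes, plus a streaming variant using Databricks Auto Loader for continuous processing; the paths, checkpoint location, and target table name are placeholders.

# Sketch: batch read of the captured JSON files with wildcards (as in the reply),
# and a streaming variant with Auto Loader. Paths and table names are placeholders.

# Batch: read every JSON file two directory levels down in one go.
df = spark.read.json("/mnt/datalake/bronze/directory/*/*.json")

# Streaming: pick up new files continuously as they land.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/datalake/checkpoints/bronze_directory/schema")
          .load("/mnt/datalake/bronze/directory/"))

(stream.writeStream
   .option("checkpointLocation", "/mnt/datalake/checkpoints/bronze_directory")
   .toTable("bronze_events"))
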
Hubert-Dudek
by Esteemed Contributor III
  • 17140 Views
  • 3 replies
  • 26 kudos

How to connect your Azure Data Lake Storage to Azure Databricks Standard Workspace 👉 Private link. In your storage accounts please go to “Networ...

How to connect your Azure Data Lake Storage to Azure Databricks Standard Workspace 👉 Private link. In your storage accounts please go to “Networking” -> “Private endpoint connections” and click Add Private Endpoint. It is important to add private links in ...

Latest Reply
Anonymous
Not applicable
  • 26 kudos

@Hubert Dudek - Have I told you lately that you're the best!?!

2 More Replies