Data Engineering

Forum Posts

by Zacp • New Contributor II

06-20-2024 8:19:33 PM

1624 Views
1 reply
0 kudos

Delta Sharing Azure Databricks Private Storage

I am looking for details on how Delta Sharing is configured on Azure. If your storage account is private, do you need to whitelist the recipient's IP address? The recipient is not using Databricks.

Data Engineering

by dollyb • Contributor II

04-15-2024 8:54:36 AM

3778 Views
2 replies
0 kudos

Resolved! Differences between Spark SQL and Databricks

Hello, I'm using a local Docker Spark 3.5 runtime to test my Databricks Connect code. However, I've come across a couple of cases where my code works in one environment but not the other. A concrete example: I'm reading data from BigQuery via spark....

Data Engineering

Latest Reply

daniel_sahal
Databricks MVP

04-18-2024 10:41:26 PM

0 kudos

@dollyb That's because when you add another dependency on Databricks, it doesn't know which one it should use. By default it uses the built-in com.google.cloud.spark.bigquery.BigQueryRelationProvider. What you can do is provide the whole packag...

by Sweta • New Contributor II

06-21-2024 9:03:33 AM

1426 Views
0 replies
0 kudos

Optimized option to write updates to Aurora PostgresDB from Databricks/spark

Hello all, we want to update our Postgres tables from our Spark Structured Streaming workflow on Databricks. We are using the foreachBatch utility to write to this sink. I want to understand an optimized way to do this at near-real-time latency avoidi...
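
One common pattern for applying updates from a foreachBatch handler is a PostgreSQL upsert (INSERT ... ON CONFLICT). The sketch below only builds the statement text; it is not necessarily the optimized approach the poster is after. The table and column names are hypothetical, and the %s placeholders assume a psycopg2-style driver.

```python
def build_upsert_sql(table, key_cols, value_cols):
    """Build a PostgreSQL upsert statement with driver-style %s placeholders.

    Assumption: key_cols are covered by a unique or primary-key constraint,
    which ON CONFLICT needs in order to detect existing rows.
    """
    cols = key_cols + value_cols
    col_list = ", ".join(cols)
    placeholders = ", ".join(["%s"] * len(cols))
    conflict_target = ", ".join(key_cols)
    # EXCLUDED refers to the row that was proposed for insertion.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in value_cols)
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) DO UPDATE SET {updates}"
    )


# Hypothetical table and columns, for illustration only.
UPSERT_SQL = build_upsert_sql("events", ["id"], ["status", "updated_at"])
```

Inside the handler, the statement would typically be executed in batches per partition over a small number of connections, rather than once per row.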

Data Engineering

by thiagoawstest • Contributor

06-20-2024 12:13:13 PM

1587 Views
1 reply
0 kudos

Azure DevOps - Entra ID - AWS Databricks

Hi, I need to integrate Azure DevOps repos with AWS Databricks, but not via a personal access token. I need to do it via a service principal integrated with Azure Entra ID. On Azure Databricks, when I go to create a service principal, "Entra ID application ID" appears, but in ...

Data Engineering

AWS

by christian_chong • New Contributor III

06-21-2024 6:45:59 AM

2085 Views
1 reply
0 kudos

Resolved! unity catalog with external table and column masking

Hi everybody, I am facing an issue with Spark Structured Streaming. Here is a sample of my code: df = spark.readStream.load(f"{bronze_table_path}") df.writeStream \ .format("delta") \ .option("checkpointLocation", f"{silver_checkpoint}") \ .option("me...

Data Engineering

Latest Reply

christian_chong
New Contributor III

06-21-2024 7:08:50 AM

0 kudos

My first message was not well formatted. I wrote: df = spark.readStream.load(f"{bronze_table_path}") df.writeStream \ .format("delta") \ .option("checkpointLocation", f"{silver_checkpoint}") \ .option("mergeSchema", "true") \ .trigger(availabl...

by philipkd • New Contributor III

02-22-2024 10:15:26 PM

2768 Views
1 reply
0 kudos

Cannot get past Query Data tutorial for Azure Databricks

I created a new workspace on Azure Databricks, and I can't get past this first step in the tutorial: DROP TABLE IF EXISTS diamonds; CREATE TABLE diamonds USING CSV OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", hea...

Data Engineering

Latest Reply

dollyb
Contributor II

06-21-2024 6:22:35 AM

0 kudos

Struggling with this as well. So using dbfs:/ with a CREATE TABLE statement works on AWS, but not on Azure?

by Devsql • New Contributor III

05-20-2024 5:27:55 AM

6021 Views
1 reply
0 kudos

Measure size of all tables in Azure databricks

Hi Team, I am currently trying to find the size of all tables in my Azure Databricks workspace, because I want to understand current data loading trends so I can plan a data forecast (i.e. in the last 2 months approx. 100 GB of data came in, so in the next 2-3 months there ...

Data Engineering

Latest Reply

Devsql
New Contributor III

06-21-2024 2:47:58 AM

0 kudos

Hi @Retired_mod, 1- Regarding this issue, I found the link below: https://kb.databricks.com/sql/find-size-of-table#:~:text=You%20can%20determine%20the%20size,stats%20to%20return%20the%20size Now, to try the above link, I need to decide: Delta-Table Vs Non-De...
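
For the Delta-table case, one way to collect sizes is to run DESCRIBE DETAIL per table and read its sizeInBytes column, which reports the size of the table's latest snapshot. This is a minimal sketch, assuming an active `spark` session and a list of table names supplied by the caller; the helper names are hypothetical, and non-Delta tables need a different approach.

```python
def table_sizes(spark, table_names):
    """Return {table_name: size_in_bytes} for Delta tables.

    Assumption: every name in table_names refers to a Delta table,
    because DESCRIBE DETAIL only reports sizeInBytes for Delta.
    """
    sizes = {}
    for name in table_names:
        # DESCRIBE DETAIL returns a single row of table metadata.
        detail = spark.sql(f"DESCRIBE DETAIL {name}").first()
        sizes[name] = detail["sizeInBytes"]
    return sizes


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 1024 ** 3
```

Running this on a schedule and storing the results would give the month-over-month trend needed for a forecast.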

by yvuignie • Contributor

06-18-2024 1:54:37 AM

2208 Views
1 reply
0 kudos

Asset Bundles webhook not working

Hello, the webhook notifications in Databricks jobs defined in asset bundles are not taken into account and are therefore not created. For instance, this is not working: resources: jobs: job1: name: my_job webhook_notifications: on...

Data Engineering

Latest Reply

yvuignie
Contributor

06-21-2024 2:29:02 AM

0 kudos

Hello @Retired_mod, thank you for your help. However, we did check the job configuration multiple times. If we substitute 'webhook_notifications' with 'email_notifications' it works, so the syntax is correct. Here is a sample of our configuration: For the...

by N_M • Contributor

12-07-2023 12:32:50 AM

2454 Views
1 reply
0 kudos

Access historical injected data of COPY INTO command

Dear Community, I'm using the COPY INTO command to automate the staging of files that I get in an S3 bucket into specific Delta tables (with some transformation on the fly). The command works smoothly, and files are indeed inserted only once (writing i...

Data Engineering

by ChingizK • New Contributor III

04-12-2024 10:24:10 AM

5845 Views
2 replies
1 kudos

Resolved! Workflow Failure Alert Webhooks for OpsGenie

I'm trying to set up a Workflow Job Webhook notification to send an alert to the OpsGenie REST API on job failure. We've set up Teams & Email successfully. We've created the webhook, and when I configure "On Failure" I can see it in the JSON/YAML view. How...

Data Engineering

jobs

opsgenie

webhooks

Workflows

Latest Reply

portoedu
New Contributor III

06-20-2024 12:40:32 PM

1 kudos

Hi guys, I found a workaround: create an email integration in OpsGenie, then create a Databricks notification destination with that email address.

by AdventureAce • New Contributor III

06-20-2024 2:28:35 PM

1166 Views
0 replies
0 kudos

Short-lived token from Unity Catalog

What is this short-lived token shared by Unity Catalog in steps 4 and 5 here? And how does the cloud storage authenticate the token generated by Unity Catalog?

Data Engineering

by Pálmi • New Contributor II

06-18-2024 1:29:44 AM

2476 Views
2 replies
1 kudos

IoT hub with kafka connector - how to decode the enqueued timestamp and device id

I'm reading data from the default endpoint of an IoT hub in Azure using the Kafka connector in Databricks. Most data items are straightforward, but I haven't been able to properly decode the device id and the timestamp. For example, the key-value map...

Data Engineering

Latest Reply

Erik
Valued Contributor III

06-20-2024 12:46:48 PM

1 kudos

https://github.com/Azure/azure-event-hubs-for-kafka/issues/56#issuecomment-1432006831

by aozero • New Contributor II

06-14-2024 6:12:58 AM

2777 Views
3 replies
0 kudos

Deleting data programmatically from databricks live delta tables

Hello all, I am relatively new to data engineering and am working on a project that requires me to programmatically delete data from Delta Live Tables. However, I found that simply stopping the streaming job and deleting rows from the Delta tables caused th...

Data Engineering

Latest Reply

aozero
New Contributor II

06-19-2024 10:34:29 AM

0 kudos

Hi @shan_chandra, a full refresh brings back the deleted data, since it still exists in the pubsub source.

by Eiki • New Contributor

06-20-2024 7:38:24 AM

1158 Views
1 reply
0 kudos

How to use the same job cluster in different job runs inside one workflow

I created a Workflow with notebooks and some job runs, but I would like to use only one job cluster for all of the job runs, without creating a new job cluster for each one, because I don't want to increase the execution time with each new job cluster ...

Data Engineering

Latest Reply

brockb
Databricks Employee

06-20-2024 8:15:28 AM

0 kudos

Hi, if I understand correctly, you are hoping to reduce overall job execution time by reducing the cloud service provider instance provisioning time. Is that correct? If so, you may want to consider: Using a Pool of instances: https://docs.databricks.c...

by diego_poggioli • Contributor

06-17-2024 1:52:11 AM

4938 Views
1 reply
0 kudos

Streaming foreachBatch _jdf jvm attribute not supported

I'm trying to perform a merge inside a streaming foreachBatch using the command: microBatchDF._jdf.sparkSession().sql(self.merge_query). Streaming runs fine if I use a Personal cluster, but if I use a Shared cluster, streaming fails with the following ...

Data Engineering

Latest Reply

holly
Databricks Employee

06-20-2024 7:46:53 AM

0 kudos

Can you share what runtime your cluster is using? This error doesn't surprise me: Unity Catalog Shared clusters have many security limitations, but the list is shrinking over time. https://docs.databricks.com/en/compute/access-mode-limitations.html#s...
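
A possible way around the `_jdf` limitation is to reach the session through the public `DataFrame.sparkSession` property, available in recent PySpark releases, instead of the private JVM handle. The sketch below assumes that property exists on the runtime in use; the table name, view name, and join column are hypothetical, and whether this resolves the error on a given Shared cluster depends on the runtime version.

```python
# Sketch only: target_table, updates, and id are hypothetical names.
MERGE_QUERY = """
MERGE INTO target_table AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def upsert_batch(micro_batch_df, batch_id):
    """foreachBatch handler that avoids the private _jdf attribute."""
    # Expose the micro-batch under the name the MERGE statement references.
    micro_batch_df.createOrReplaceTempView("updates")
    # Public accessor for the session that owns this micro-batch,
    # so the temp view registered above is visible to the query.
    micro_batch_df.sparkSession.sql(MERGE_QUERY)


# Usage (needs a running stream, so shown as a comment):
# stream_df.writeStream.foreachBatch(upsert_batch).start()
```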
