Data Engineering

Forum Posts

by abhijitnag • New Contributor II

12-31-2023 10:41:50 AM

489 Views
2 replies
0 kudos

Materialized View creation not supported from DLT Pipeline

Hi Team, I have a very basic scenario: I am using my custom catalog and want a materialized view to be created from a DLT table at the end of the pipeline. The SQL used is below, where "loom_data_transform" is a streaming table. But pipeline...

Data Engineering

Delta Live Table

dlt

Unity Catalog

Latest Reply

warsamebashir
New Contributor II

12-31-2023 5:12:32 PM

0 kudos

Hey @abhijitnag, are you sure your loom_data_transform was created as a STREAMING table? Docs: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html

by naveenprasanth • New Contributor

12-29-2023 7:55:30 AM

713 Views
1 reply
1 kudos

Issue with Reading MongoDB Data in Unity Catalog Cluster

I am encountering an issue while trying to read data from MongoDB in a Unity Catalog Cluster using PySpark. I have shared my code below: from pyspark.sql import SparkSession database = "cloud" collection = "data" Scope = "XXXXXXXX" Key = "XXXXXX-YYY...

Data Engineering

mongodb

spark config

Spark Connector package

Unity Catalog

Latest Reply

Wojciech_BUK
Contributor III

12-29-2023 3:29:52 PM

1 kudos

A few points: 1. Check that you installed exactly the same driver version as the one you point to in code (2.12:3.2.0); it has to match 100 percent: org.mongodb.spark:mongo-spark-connector_2.12:3.2.0 2. I have seen people configuring the connection to Atlas in two way...

by dzmitry_tt • New Contributor

12-29-2023 4:09:52 AM

698 Views
1 reply
0 kudos

DeltaRuntimeException: Keeping the source of the MERGE statement materialized has failed repeatedly.

I'm using Autoloader (in Azure Databricks) to read parquet files and write their data into the Delta table. schemaEvolutionMode is set to 'rescue'. In foreach_batch I do: 1) Transform the read dataframe; 2) Create a temp view based on the read dataframe and merg...

Data Engineering

autoloader

MERGE

streaming

Latest Reply

Wojciech_BUK
Contributor III

12-29-2023 6:33:51 AM

0 kudos

Hmm, you can't have duplicated data in the source dataframe/batch, but it should error out with a different error, like "Cannot perform Merge as multiple source rows matched and attempted to modify the same target row...". Also this behaviour after rerun is str...
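
To rule out the duplicate-source-row condition mentioned in this reply, a batch can be reduced to one row per merge key before the MERGE runs. The following is only a plain-Python sketch of that idea, not code from the thread: the function name, the `id` key, and the `updated_at` ordering column are all hypothetical, and in a real foreach_batch function the same logic would be applied to the Spark dataframe instead of a list of dicts.

```python
def latest_row_per_key(rows, key, order_by):
    """Keep one row per key value: the row with the largest order_by value.

    `key` and `order_by` are hypothetical column names used for illustration.
    """
    latest = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only when a newer one for the same key appears.
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


# Illustrative micro-batch containing two source rows for id 1.
batch = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 1, "value": "b", "updated_at": 2},
    {"id": 2, "value": "c", "updated_at": 1},
]

deduped = latest_row_per_key(batch, key="id", order_by="updated_at")
for row in deduped:
    print(row)
```

After this step each merge key appears at most once in the source, so a single target row can no longer be matched by several source rows from the same batch.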

by deng77 • New Contributor III

01-17-2023 11:50:06 AM

18670 Views
10 replies
2 kudos

Resolved! Using current_timestamp as a default value in a delta table

I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table. Is it possible to add a column to an existing de...

Data Engineering

Latest Reply

pvignesh92
Honored Contributor

03-10-2023 2:38:41 AM

2 kudos

-- Alter the table to use the GENERATED ALWAYS functionality for the created_at column
ALTER TABLE example_table ADD COLUMN created_at TIMESTAMP GENERATED ALWAYS AS CURRENT_TIMESTAMP();

@Michael Burch Hi, did you try using the GENERATED ALWAYS feature? ...

by EDDatabricks • Contributor

12-28-2023 2:51:27 AM

801 Views
1 reply
0 kudos

Slow stream static join in Spark Structured Streaming

Situation: Records are streamed from an input Delta table via a Spark Structured Streaming job. The streaming job performs the following: read from input Delta table (readStream); static join on small JSON; static join on big Delta table; write to three Delta...

Data Engineering

Azure Databricks

optimization

Spark Structured Streaming

Stream static join

Latest Reply

Wojciech_BUK
Contributor III

12-28-2023 1:53:49 PM

0 kudos

The machines you are using are quite small; please take into consideration that a lot of the machine's memory is occupied by other processes: https://kb.databricks.com/clusters/spark-shows-less-memory It is not a good idea to broadcast huge data fra...

by Erik • Valued Contributor II

01-05-2022 5:17:51 AM

7101 Views
6 replies
3 kudos

Resolved! How to run code-formatting on the notebooks

Has anyone found a nice way to run code-formatting (like black) on the notebooks **in the workspace**? My current workflow is to commit the file, pull it locally, format, repush and pull. It would be nice if there was some relatively easy way to run blac...

Data Engineering

Latest Reply

MartinPlay01
New Contributor II

12-28-2023 5:49:43 AM

3 kudos

Hi Erik, I don't know if you are aware of this feature, but there is currently an option to format the code in your Databricks notebooks using the black code formatter. You just need to either have a version of your DBR equal to or greater than 11.2 ...

by XClar_40456 • New Contributor

05-02-2023 8:14:13 AM

1027 Views
2 replies
1 kudos

Resolved! Are there system tables that are customer accessible for setting up job run health monitoring in GCP Databricks?

Is Overwatch still an active project? Is there anything equivalent for GCP Databricks, or are there any plans for Overwatch to be available in GCP?

Data Engineering

Latest Reply

SriramMohanty
New Contributor III

12-28-2023 5:36:06 AM

1 kudos

Yes, Overwatch supports GCP.

by dvmentalmadess • Valued Contributor

03-31-2023 6:32:36 AM

2755 Views
9 replies
1 kudos

Resolved! Data Explorer minimum permissions

What are the minimum permissions required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in the Data Explorer? Or can anyone with workspace access browse objects and, ...

Data Engineering

Latest Reply

bearded_data
New Contributor II

12-27-2023 10:07:32 AM

1 kudos

Hi all - @LandanG I wanted to bump this thread to see if there was any traction on giving us the ability to expose the table metadata to users (using USE <object> permission) while not allowing the users to SELECT from the tables themselves? I thin...

by rt-slowth • Contributor

12-27-2023 4:54:24 PM

286 Views
0 replies
0 kudos

Help design my streaming pipeline

### Data Source
- AWS RDS
- Database migration tasks have been created using AWS DMS
- Relevant CDC information is being stored in a specific bucket in S3
### Data frequency
- Once a day (but not sure when, sometime after 6pm)
### Development environment
- d...

Data Engineering

by Sas • New Contributor II

12-25-2023 9:39:51 PM

713 Views
2 replies
0 kudos

Resolved! Table is being dropped when cluster terminates in community edition

Hi Expert, I have created an external table in Databricks Community Edition. The table is an external table. But when the cluster is terminated, I am not able to query the table any more. What is the reason? What do I need to do so that the table is not dropped? Table...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

12-27-2023 2:26:14 PM

0 kudos

This is expected behavior for Community Edition clusters. Upon termination of the cluster, the data is purged.

by RabahO • New Contributor III

12-26-2023 6:45:12 AM

549 Views
1 reply
0 kudos

Handling data close to SCD2 with Delta tables

Hello, stack used: pyspark and delta tables. I'm working with some data that looks a bit like SCD2 data. Basically, the data has columns that represent an id, a rank column and other information. Here's an example: login, email, business_timestamp => the...

Data Engineering

Latest Reply

Wojciech_BUK
Contributor III

12-26-2023 2:19:43 PM

0 kudos

Your problem is exactly like SCD2. You just add one more column with a valid-to date (optionally you can add an is-actual flag to tag current records). You can use the DLT APPLY CHANGES syntax, or alternatively a MERGE statement. On top of that table you can bu...
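
The valid-to column and current-record flag described in this reply can be illustrated with a small plain-Python sketch. It is not the DLT APPLY CHANGES or MERGE implementation the reply refers to; it only shows the derivation rule, assuming that each version of a login stays valid until the next version's business_timestamp. The column names `valid_to` and `is_current` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def add_validity(rows):
    """Add hypothetical valid_to / is_current columns to versioned rows.

    Each version of a login is assumed valid until the next version's
    business_timestamp; the newest version gets valid_to=None.
    """
    ordered = sorted(rows, key=itemgetter("login", "business_timestamp"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("login")):
        versions = list(group)
        # Pair every version with its successor (None for the last one).
        for current, nxt in zip(versions, versions[1:] + [None]):
            result.append({
                **current,
                "valid_to": nxt["business_timestamp"] if nxt else None,
                "is_current": nxt is None,
            })
    return result


# Illustrative rows; ISO date strings sort correctly as plain text.
rows = [
    {"login": "alice", "email": "a@new.example", "business_timestamp": "2023-06-01"},
    {"login": "bob", "email": "b@example.com", "business_timestamp": "2023-03-01"},
    {"login": "alice", "email": "a@old.example", "business_timestamp": "2023-01-01"},
]

history = add_validity(rows)
for row in history:
    print(row)
```

Queries for the current state then filter on the flag, while point-in-time queries compare a date against business_timestamp and valid_to.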

by Databricks_POC • New Contributor II

12-20-2021 1:14:14 AM

13727 Views
6 replies
6 kudos

Resolved! I want to compare two data frames. In output I wish to see unmatched Rows and the columns identified leading to the differences.

Data Engineering

Latest Reply

bhargavi1
New Contributor II

04-28-2022 1:53:19 AM

6 kudos

@vinita shinde have you cracked this code?

by lorenz • New Contributor III

06-28-2023 7:21:26 AM

5091 Views
3 replies
1 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Data Engineering

Latest Reply

jcozar
Contributor

12-26-2023 5:50:08 AM

1 kudos

Hi, first of all thank you all in advance! I am very interested in this topic! My question goes beyond what is described here. Like @Pektas, I am using Debezium to send data from Postgres to a Kafka topic (in fact, Azure EventHub). My question...

by Aidin • New Contributor II

12-22-2023 10:57:23 AM

3331 Views
4 replies
0 kudos

BINARY data type

Hello everyone. I'm trying to understand how the BINARY data type works in Spark SQL. According to examples in the documentation, using cast or literal 'X' should return the HEX representation of the binary data type, but when I try the same code, I see base6...

Data Engineering

Latest Reply

Wojciech_BUK
Contributor III

12-23-2023 9:07:20 AM

0 kudos

If you are confused, please look at this thread; it explains that Databricks uses base64 as the binary default. This is not documented but can be traced at the source code level. https://stackoverflow.com/questions/75753311/not-getting-binary-value-in-datab...
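
The difference between the two renderings can be seen outside Spark with a short Python sketch. It only shows that hex and base64 are two text encodings of the same bytes; the sample value is illustrative and the snippet makes no claim about how Databricks renders binary columns internally.

```python
import base64

# Five illustrative bytes: the ASCII encoding of "Spark".
raw = bytes.fromhex("537061726B")

as_hex = raw.hex().upper()                         # hexadecimal text form
as_base64 = base64.b64encode(raw).decode("ascii")  # base64 text form

print(as_hex)
print(as_base64)

# Both strings decode back to the same underlying bytes.
assert bytes.fromhex(as_hex) == base64.b64decode(as_base64) == raw
```

Seeing a base64 string where a hex string was expected therefore points to a display difference, not to different stored data.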

by sahesh1320 • New Contributor

12-22-2023 9:16:35 AM

312 Views
1 reply
0 kudos

Shutdown Cluster in script if there is any failure

I am working on an incremental load from SQL Server to Delta Lake tables stored in ADLS Gen2. During the script I need to write logic to shut down the DB cluster on failure (there needs to be logging added to ensure that shutdown happens promptly to pr...

Data Engineering

Latest Reply

Wojciech_BUK
Contributor III

12-22-2023 11:00:28 AM

0 kudos

If you run your notebook via a workflow, an error happens, and there are no retries on the job, then the job cluster will be terminated immediately after the failure. You can add a Python try/except block, and if an error occurs, you catch the error and log somewhere bef...
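
The try/except pattern from this reply might look roughly like the sketch below. It assumes the error is re-raised after logging so that the job run is still reported as failed, which the truncated reply does not confirm. The logger name, `run_with_logging`, and `failing_load` are hypothetical, and no Databricks API is called.

```python
import logging

# Hypothetical logger name; a real job would configure its own handlers.
logger = logging.getLogger("incremental_load")


def run_with_logging(step, *args, **kwargs):
    """Run one step of the load, log any failure, then re-raise it.

    Re-raising is an assumption of this sketch: it lets the surrounding
    job see the failure instead of finishing as if nothing went wrong.
    """
    try:
        return step(*args, **kwargs)
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Step %s failed", getattr(step, "__name__", repr(step)))
        raise


def failing_load():
    """Hypothetical stand-in for a step of the incremental load."""
    raise RuntimeError("source unavailable")


try:
    run_with_logging(failing_load)
except RuntimeError as err:
    print(f"job would fail here: {err}")
```

Keeping the logging in one wrapper means every step of the load reports failures the same way before the run stops.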

Databricks

Forum Posts

Load multiple delta tables at once from Sql server

Starting Serverless sql cluster on GCP

"Can't login to databricks socket is closed" when ...

Temporary views no longer working for Share Comput...

Does DLT use one single SparkSession?