Data Engineering

Forum Posts

by Krisna_91 • New Contributor

3 hours ago

21 Views
0 replies
0 kudos

Certification Coupons

I completed one training before June 15th. How can I become eligible for a voucher?

by YoshikiFujiwara • New Contributor II

4 hours ago

36 Views
0 replies
0 kudos

Unity Catalog External Location with Amazon S3 Access Points, session policy behavior and workarounds

Context: I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with...

by alejandro_jaram • Visitor

6 hours ago

53 Views
1 replies
0 kudos

DLT pipelines failing out of memory (serverless)

I have a Delta Live Tables (DLT) pipeline that runs weekly. Normally, it takes 8 minutes to complete, but since last Friday (June 19), it has been running for hours until it encounters an out-of-memory error. This pipeline is responsible for c...

Latest Reply

bala_sai
New Contributor

5 hours ago

0 kudos

I think this is more of an incremental refresh issue than a generic serverless memory issue. Since the pipeline completes in around 20 minutes with a full refresh, but the normal weekly run goes on for hours and then fails with OOM, I would first recom...

by Surya2 • New Contributor III

8 hours ago

63 Views
1 replies
0 kudos

Auto CDC Delete Propagation Issue: Streaming CDF Reads Don't Capture Delete Events from Auto CDC

Summary: I'm exploring GDPR delete propagation through a medallion architecture (Bronze → Silver → Gold) using Auto CDC with Change Data Feed. Delete events propagate successfully from Landing → Bronze, but fail to propagate from Bronze → Silver → Gold...

Latest Reply

Louis_Frolio
Databricks Employee

6 hours ago

0 kudos

Hi @Surya2, nice write-up. The symptom you're describing, where updates propagate cleanly but deletes quietly disappear, is a common one, and the good news is that the pattern you're after is fully supported. The break is almost certainly in how you...

by Yogasathyandrun • New Contributor

Saturday

159 Views
1 replies
0 kudos

Detecting Photon fallback in-cluster + safe right-sizing from system tables

I'm prototyping a cluster cost / right-sizing advisor and wanted to get a reality check from people running Databricks at real scale before I sink more time into it. The main thing I'm chasing is Photon fallback. Photon quietly drops to the JVM on uns...

Latest Reply

Louis_Frolio
Databricks Employee

6 hours ago

0 kudos

Hey @Yogasathyandrun, I did some digging and would like to share some thoughts that I hope you find useful. You've mapped the boundary here more accurately than most people do, so let me give you a quick reality check on your four sticking points...

by thedatacrew • Databricks Partner

Wednesday

160 Views
3 replies
2 kudos

Adhoc Table Refresh in Lakeflow Spark Declarative Pipelines (SDP)

Hi, it is currently not possible to specify a list of tables to refresh and their refresh policies (full/normal) in a Lakeflow Job. It can be done via the REST API, but it's messy. For example, if you need some tables or views refreshed more regularly, ...

Latest Reply

Yogasathyandrun
New Contributor

13 hours ago

2 kudos

This is a real limitation in the current Lakeflow / DLT job model. Today, a pipeline is treated as the unit of refresh, not individual tables inside it. That means: you can run or fully refresh a pipeline, but you cannot define different refresh policies...

2 More Replies

by Databrickissue • Visitor

yesterday

53 Views
1 replies
0 kudos

DLT Issue

I have one DLT pipeline in Databricks. When I schedule the pipeline, the data does not show up. However, when I run the pipeline manually, the data is displayed properly.

Latest Reply

Yogasathyandrun
New Contributor

13 hours ago

0 kudos

A few details would help narrow this down. When the scheduled run executes: Does the pipeline update show Succeeded or Failed? In the pipeline Event Log, do you see rows being processed/written? Is your manual run a normal update or a Full Refresh? Is the...

by Ericsson • New Contributor II

12-01-2021 8:45:17 AM

6983 Views
4 replies
1 kudos

SQL week format issue: it's not showing result as 01 (ww)

Hi folks, I have a requirement to show the week number in ww format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.
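
One way to read the request above is that the week number should always be two digits (01 rather than 1). A minimal Python sketch of that zero-padding, assuming ISO week numbering (Monday start, week 1 is the first week with more than 3 days), which is how Spark documents weekofyear. The function name week_ww is hypothetical and not from the thread.

```python
from datetime import date, timedelta


def week_ww(day: date, offset_days: int = 0) -> str:
    """Return the ISO week number of (day + offset_days) as a two-digit string."""
    shifted = day + timedelta(days=offset_days)
    iso_week = shifted.isocalendar()[1]  # index 1 of the tuple is the ISO week number
    return f"{iso_week:02d}"  # pad with a leading zero: 1 -> "01"


# The post date (2021-12-01) plus 35 days lands on 2022-01-05.
print(week_ww(date(2021, 12, 1), offset_days=35))
```

In Spark SQL the same padding can probably be had by wrapping the expression in lpad(..., 2, '0'), though that is a suggestion rather than something confirmed in the thread.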

Latest Reply

Aidutchinso
Visitor

14 hours ago

1 kudos

"I've been exploring different communities lately, and honestly, connecting with people who share your interests makes all the difference. Whether it's diving deep into data engineering discussions or just having random conversations on platforms lik...

3 More Replies

by samgon • New Contributor III

06-17-2025 2:21:01 AM

7251 Views
5 replies
6 kudos

Resolved! Study materials for Certified Data Engineer Professional Certification?

Can anyone recommend high-quality study materials or resources (courses, documentation, practice exams, etc.) that helped you prepare for the Professional-level exam?

Latest Reply

williamandrew
New Contributor II

14 hours ago

6 kudos

Recently achieved this certification and it feels great to see all the hard work pay off. Consistent practice, hands-on learning, and quality study resources made a huge difference. For anyone preparing, I found this resource helpful: https://linkly....

4 More Replies

by deepak05 • Contributor

01-22-2024 11:02:11 PM

43262 Views
12 replies
13 kudos

Resolved! I Got 70.00% on Databricks Certified Data Engineer Professional Exam but Failed....

Hi everyone, today I took the Databricks exam. I got 64 questions and my result was exactly 70.00% (per Databricks, the pass percentage is 70 or above), but the status still showed Failed and I couldn't get certified. Can anyone help me on...

Latest Reply

halliekohler
Visitor

15 hours ago

13 kudos

Congratulations on this achievement! Reaching this milestone feels incredibly rewarding. I had a similar experience, and quality practice resources from https://linkly.link/2l2Hb were very helpful throughout my preparation journey.

11 More Replies

by lachu • New Contributor

yesterday

102 Views
2 replies
0 kudos

SDP continuous mode

Hi, I was doing a POC and hence used open source Spark and Kafka in a Docker container and got it working. The sample code ingests data from Kafka, but it runs only in batch mode; I am not able to continuously ingest the Kafka stream. Question: Can w...

Latest Reply

bala_sai
New Contributor

yesterday

0 kudos

Yes, we can build a continuous streaming pipeline using open source Spark. The main thing is to use Spark Structured Streaming, not a normal batch read. For Kafka streaming, we need to use spark.readStream, then write using writeStream, and keep the ...
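
A minimal sketch of the readStream / writeStream pattern described above, assuming open source PySpark with the spark-sql-kafka connector on the classpath. The function name start_kafka_stream, the broker address, the topic, and the checkpoint path are all hypothetical, and the console sink is only for illustration.

```python
def start_kafka_stream(spark, bootstrap_servers, topic, checkpoint_dir):
    """Start a streaming query that reads a Kafka topic and prints rows to the console.

    `spark` is an existing SparkSession; the returned object is the streaming query handle.
    """
    events = (
        spark.readStream                      # streaming read, not spark.read (batch)
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")
        .load()
    )
    # Kafka delivers key and value as binary, so cast them to strings for display.
    decoded = events.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    return (
        decoded.writeStream
        .format("console")
        .option("checkpointLocation", checkpoint_dir)  # lets the query resume from saved offsets
        .start()
    )
```

Usage would look like query = start_kafka_stream(spark, "localhost:9092", "events", "/tmp/checkpoints/events") followed by query.awaitTermination(), which blocks the driver so the stream keeps ingesting instead of exiting after one pass.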

1 More Replies

by genie • New Contributor

yesterday

87 Views
1 replies
0 kudos

Genie Code hallucinates CLI commands

I wanted to run some SQL commands programmatically and decided to use Genie Code to help me, but it came up with unsupported, non-existent commands.

Latest Reply

Yogasathyandrun
New Contributor

yesterday

0 kudos

The command shown in the screenshot appears to be hallucinated. databricks sql-statements execute is not a valid Databricks CLI command. It looks like Genie combined concepts from the SQL Statement Execution API with CLI syntax that doesn't actually e...
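
For comparison, here is a rough sketch of what a call to the SQL Statement Execution API mentioned above could look like from Python, using only the standard library. It builds the request without sending it. The function name build_statement_request and the host, token, and warehouse ID values are placeholders, and the 30s wait_timeout is just one choice, so treat this as an illustration and not the reply's recommendation.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build (but do not send) a POST request for the SQL Statement Execution API."""
    payload = {
        "statement": statement,
        "warehouse_id": warehouse_id,
        "wait_timeout": "30s",  # how long the call waits for a result before returning
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
req = build_statement_request(
    "https://example.cloud.databricks.com", "MY_TOKEN", "my-warehouse-id", "SELECT 1"
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen and parsing the JSON response.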

by Maxrb • New Contributor III

yesterday

190 Views
4 replies
3 kudos

Resolved! Autoloader [FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH]

Hi, I am using Auto Loader to load Parquet files into my Unity Catalog with the following settings: .option("cloudFiles.format", "parquet") .option("cloudFiles.inferColumnTypes", "true") .option("cloudFiles.schemaEvolutionMode", "addNewColumnsWithTypeWi...

Latest Reply

Yogasathyandrun
New Contributor

yesterday

3 kudos

What you're seeing comes down to where the type mismatch is detected. For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself. In your example, the exi...

3 More Replies

by shan-databricks • Databricks Partner

Sunday

102 Views
3 replies
0 kudos

How to store credentials in Databricks and assign them to job parameters

I am using SQL Server, Postgres, and MongoDB as data sources, connecting through Spark and JDBC connector. I would like to store the credentials and connection details in Databricks, pass them as job parameters, and need guidance on possible approach...

Latest Reply

Yogasathyandrun
New Contributor

yesterday

0 kudos

I'd think about this as a separation of concerns: secrets are for sensitive values (usernames, passwords, tokens, connection URIs), while job parameters are for runtime values (connection name, database, schema, table, query, collection, source system). In mo...
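
A small sketch of that split, assuming the job parameter carries only a connection name and that the name doubles as the secret scope. The helper build_jdbc_options, the secret key names (host, username, password), and the Postgres URL shape are illustrative assumptions, not something stated in the thread. The secret lookup is passed in as a function so the sketch runs anywhere.

```python
from typing import Callable, Dict


def build_jdbc_options(
    job_params: Dict[str, str],
    get_secret: Callable[[str, str], str],
) -> Dict[str, str]:
    """Combine non-sensitive job parameters with secrets into Spark JDBC options."""
    scope = job_params["connection"]       # runtime value: which connection to use
    host = get_secret(scope, "host")       # sensitive values are looked up, never passed in
    port = job_params.get("port", "5432")
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{job_params['database']}",
        "dbtable": job_params["table"],
        "user": get_secret(scope, "username"),
        "password": get_secret(scope, "password"),
    }


# Stand-in secret store for local runs (hypothetical values).
fake_store = {
    ("pg_sales", "host"): "db.example.internal",
    ("pg_sales", "username"): "svc_reader",
    ("pg_sales", "password"): "not-a-real-password",
}
options = build_jdbc_options(
    {"connection": "pg_sales", "database": "sales", "table": "public.orders"},
    lambda scope, key: fake_store[(scope, key)],
)
print(options["url"])
```

On Databricks, the lookup function could be a thin wrapper such as lambda scope, key: dbutils.secrets.get(scope=scope, key=key), and the resulting dictionary could be passed to spark.read.format("jdbc").options(**options).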

2 More Replies

by Nick_Hughes • New Contributor III

05-16-2023 3:43:03 AM

17365 Views
5 replies
1 kudos

Best way to generate fake data using underlying schema

Hi, we are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply

savlahanish27
Databricks Partner

yesterday

1 kudos

The core problem you're facing is that Delta Lake doesn't enforce foreign key constraints, so most datagen tools generate each table independently and your joins produce no meaningful overlap.The solution is to generate a shared key pool first - a si...
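
The shared key pool idea can be sketched in plain Python as below. The table names, columns, and the generate_tables helper are made up for illustration, and this is not tied to any particular datagen library; the point is only that the child table draws its foreign keys from the same pool the parent table was built from.

```python
import random


def generate_tables(num_customers=5, num_orders=20, seed=42):
    """Generate a parent and a child table whose join keys come from one shared pool."""
    rng = random.Random(seed)  # seeded so test data is reproducible

    # 1. Build the key pool first.
    key_pool = [f"CUST-{i:04d}" for i in range(1, num_customers + 1)]

    # 2. The parent table uses every key exactly once.
    customers = [
        {"customer_id": key, "segment": rng.choice(["retail", "wholesale"])}
        for key in key_pool
    ]

    # 3. The child table samples its foreign keys from the same pool,
    #    so every order is guaranteed to join to a customer.
    orders = [
        {
            "order_id": n,
            "customer_id": rng.choice(key_pool),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for n in range(1, num_orders + 1)
    ]
    return customers, orders


customers, orders = generate_tables()
known_ids = {c["customer_id"] for c in customers}
print(all(o["customer_id"] in known_ids for o in orders))
```

The same approach extends to more tables by building one pool per key and reusing it wherever that key appears.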

4 More Replies
