Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Surya2
by New Contributor III
  • 122 Views
  • 2 replies
  • 1 kudos

Resolved! Auto CDC Delete Propagation Issue: Streaming CDF Reads Don't Capture Delete Events from Auto CDC

Summary: I'm exploring GDPR delete propagation through a medallion architecture (Bronze → Silver → Gold) using Auto CDC with Change Data Feed. Delete events propagate successfully from Landing → Bronze, but fail to propagate from Bronze → Silver → Gold...

Latest Reply
Surya2
New Contributor III
  • 1 kudos

Hi Louis @Louis_Frolio, thank you very much for your comprehensive troubleshooting guidance. The references you shared, particularly the technical blog post on "Propagating Deletes...", were extremely helpful and contained information I had missed earli...

1 More Replies
NathanG
by Visitor
  • 59 Views
  • 1 reply
  • 0 kudos

Lakeflow Connect - Pending ‘full refresh’ process that needs to be removed in gateway pipeline.

Hello, we have the following issue that we have been unable to resolve. Gateway pipeline: gw-replication-spain. Managed ingestion pipeline: pip-replication-spain. Source: SQL Server. Table: Gestiones. Target table: repl.00_landing.gestiones (deleted due to s...

Latest Reply
Yogasathyandrun
New Contributor
  • 0 kudos

Based on the events you've shared, it does appear that the gateway is recognizing the configuration change (Tables removed: Gestiones) but is still attempting to process a previously initiated snapshot request for that table. A few things stand out: Th...

Yogasathyandrun
by New Contributor
  • 214 Views
  • 2 replies
  • 3 kudos

Resolved! Detecting Photon fallback in-cluster + safe right-sizing from system tables

I'm prototyping a cluster cost / right-sizing advisor and wanted to get a reality-check from people running Databricks at real scale before I sink more time into it. The main thing I'm chasing is Photon fallback. Photon quietly drops to the JVM on uns...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Hey @Yogasathyandrun, I did some digging and would like to share some thoughts that you hopefully find useful. You've mapped the boundary here more accurately than most people do, so let me give you a quick reality check on your four sticking points...

1 More Replies
Krisna_91
by New Contributor
  • 67 Views
  • 1 reply
  • 0 kudos

Certification Coupons

I completed one training before June 15th. How can I become eligible for a voucher?

Latest Reply
Sumit_7
Esteemed Contributor
  • 0 kudos

@Krisna_91 You need to complete one of the modules of the learning path within the given range: June 15 to July 6.

YoshikiFujiwara
by New Contributor II
  • 83 Views
  • 0 replies
  • 0 kudos

Unity Catalog External Location with Amazon S3 Access Points, session policy behavior and workarounds

Context: I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with...

alejandro_jaram
by New Contributor
  • 91 Views
  • 1 reply
  • 0 kudos

DLT pipelines failing out of memory (serverless)

I have a Delta Live Tables (DLT) pipeline that runs weekly. Normally, it takes 8 minutes to complete, but since last Friday (June 19), it has been running for hours until it encounters an out-of-memory error. This pipeline is responsible for c...

Latest Reply
bala_sai
New Contributor
  • 0 kudos

I think this is more like an incremental refresh issue than a generic serverless memory issue. Since the pipeline completes in around 20 minutes with a full refresh, but the normal weekly run runs for hours and then fails with OOM, I would first recom...

lachu
by New Contributor
  • 119 Views
  • 4 replies
  • 0 kudos

SDP continuous mode

Hi, I was doing a POC, so I used open-source Spark and Kafka in a Docker container and got it working. The sample code ingests data from Kafka, but it runs only in batch mode; I am not able to continuously ingest the Kafka stream. Question: Can w...

Latest Reply
lachu
New Contributor
  • 0 kudos

Sample code that I used:
from pyspark import pipelines as dp
from pyspark.sql import DataFrame, SparkSession, functions as f
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DecimalType
spark = SparkSession.active()
@dp...

3 More Replies
thedatacrew
by Databricks Partner
  • 177 Views
  • 3 replies
  • 2 kudos

Adhoc Table Refresh in Lakeflow Spark Declarative Pipelines (SDP)

Hi, it is currently not possible to specify a list of tables to refresh and their refresh policies (full/normal) in a Lakeflow Job. It can be done via the REST API, but it's messy. For example, if you need some tables or views refreshed more regularly, ...

Latest Reply
Yogasathyandrun
New Contributor
  • 2 kudos

This is a real limitation in the current Lakeflow / DLT job model. Today, a pipeline is treated as the unit of refresh, not individual tables inside it. That means: You can run or fully refresh a pipeline. But you cannot define different refresh policies...

2 More Replies
Databrickissue
by New Contributor
  • 75 Views
  • 1 reply
  • 0 kudos

DLT Issue

I have one DLT pipeline in Databricks. When I schedule the pipeline, the data is not showing. However, when I run the pipeline manually, the data is displayed properly.

Latest Reply
Yogasathyandrun
New Contributor
  • 0 kudos

A few details would help narrow this down. When the scheduled run executes: Does the pipeline update show Succeeded or Failed? In the pipeline Event Log, do you see rows being processed/written? Is your manual run a normal update or a Full Refresh? Is the...

Ericsson
by New Contributor II
  • 6997 Views
  • 4 replies
  • 1 kudos

SQL week format issue: it's not showing result as 01 (ww)

Hi folks, I have a requirement to show the week number in ww format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.

[Screenshot: result]
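
The question above asks for a two-digit week number ("01" rather than "1"). As a minimal sketch of the idea, not the accepted answer from the thread, the zero-padding can be illustrated in plain Python. It assumes ISO 8601 week numbering, which is what Spark's weekofyear follows; the helper name week_ww and its offset_days parameter are hypothetical and only mirror the +35 day shift in the query.

```python
from datetime import date, timedelta


def week_ww(d: date, offset_days: int = 0) -> str:
    """Return the ISO 8601 week number of (d + offset_days) as a two-character string.

    Hypothetical helper for illustration; it mirrors
    weekofyear(date_add(<date>, offset_days)) plus zero-padding.
    """
    shifted = d + timedelta(days=offset_days)
    iso_week = shifted.isocalendar()[1]  # ISO week number, 1..53
    return f"{iso_week:02d}"             # pad to two digits, e.g. 1 -> "01"


# 2023-11-27 plus 35 days is 2024-01-01, a Monday in ISO week 1.
padded = week_ww(date(2023, 11, 27), offset_days=35)
print(padded)
```

In Spark SQL itself, the same padding could presumably be obtained by wrapping the weekofyear(...) expression in lpad(..., 2, '0'); treat that as an assumption to verify against your runtime.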
Latest Reply
Aidutchinso
New Contributor
  • 1 kudos

"I've been exploring different communities lately, and honestly, connecting with people who share your interests makes all the difference. Whether it's diving deep into data engineering discussions or just having random conversations on platforms lik...

3 More Replies
samgon
by New Contributor III
  • 7331 Views
  • 5 replies
  • 6 kudos

Resolved! Study materials for Certified Data Engineer Professional Certification?

Can anyone recommend high-quality study materials or resources (courses, documentation, practice exams, etc.) that helped you prepare for the Professional-level exam?

Data Engineering
dataengineering
Latest Reply
williamandrew
New Contributor II
  • 6 kudos

Recently achieved this certification and it feels great to see all the hard work pay off. Consistent practice, hands-on learning, and quality study resources made a huge difference. For anyone preparing, I found this resource helpful: https://linkly....

4 More Replies
deepak05
by Contributor
  • 43339 Views
  • 12 replies
  • 13 kudos

Resolved! I Got 70.00% on Databricks Certified Data Engineer Professional Exam but Failed....

Hi everyone, today I took the Databricks exam. I got 64 questions and my result was exactly 70.00% (as per Databricks, the pass percentage is 70 or above), but the status still showed Failed and I couldn't get certified. Can anyone help me on...

Latest Reply
halliekohler
New Contributor
  • 13 kudos

Congratulations on this achievement! Reaching this milestone feels incredibly rewarding. I had a similar experience, and quality practice resources from https://linkly.link/2l2Hb were very helpful throughout my preparation journey.

11 More Replies
genie
by New Contributor
  • 106 Views
  • 1 reply
  • 0 kudos

Genie Code hallucinates CLI commands

I want to run some SQL commands programmatically and decided to use Genie Code to help me, but it came up with unsupported and non-existent commands.

[Screenshot: genie_0-1782127873093.png]
Latest Reply
Yogasathyandrun
New Contributor
  • 0 kudos

The command shown in the screenshot appears to be hallucinated. databricks sql-statements execute is not a valid Databricks CLI command. It looks like Genie combined concepts from the SQL Statement Execution API with CLI syntax that doesn't actually e...
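
Since the reply points to the SQL Statement Execution API, here is a minimal sketch of what a programmatic call could look like in Python using only the standard library. It only builds the HTTP request and does not send it. The workspace URL, token, and warehouse ID are placeholders, and the helper name build_statement_request is hypothetical; the endpoint path /api/2.0/sql/statements and the warehouse_id, statement, and wait_timeout fields are taken from the public REST API and should be checked against the current documentation.

```python
import json
import urllib.request


def build_statement_request(host: str, token: str, warehouse_id: str,
                            statement: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the SQL Statement Execution API.

    Hypothetical helper for illustration; all argument values are supplied by the caller.
    """
    payload = {
        "warehouse_id": warehouse_id,  # SQL warehouse that runs the statement
        "statement": statement,        # SQL text to execute
        "wait_timeout": "30s",         # wait up to 30 seconds for a synchronous result
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with a real workspace URL, token, and warehouse ID.
req = build_statement_request(
    "https://example.cloud.databricks.com", "dapi-PLACEHOLDER", "abc123", "SELECT 1"
)
# To actually submit the statement: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the example runnable without credentials and makes the payload easy to inspect before anything reaches the workspace.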

Maxrb
by New Contributor III
  • 238 Views
  • 4 replies
  • 3 kudos

Resolved! Autoloader [FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH]

Hi, I am using Auto Loader to load Parquet files into my Unity Catalog with the following settings:
.option("cloudFiles.format", "parquet")
.option("cloudFiles.inferColumnTypes", "true")
.option("cloudFiles.schemaEvolutionMode", "addNewColumnsWithTypeWi...

Latest Reply
Yogasathyandrun
New Contributor
  • 3 kudos

What you're seeing comes down to where the type mismatch is detected. For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself. In your example, the exi...

3 More Replies