Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

maninegi05
by Visitor
  • 9 Views
  • 1 reply
  • 0 kudos

DLT Pipeline Stopped working

Hello, our DLT pipelines have suddenly started failing with: LookupError: Traceback (most recent call last): result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn( ...

Latest Reply
Khaja_Zaffer
Contributor III
  • 0 kudos

There may have been an internal update on the Databricks side. Check your pipeline channel: in the DLT pipeline settings (under Advanced > Channel), confirm whether it is set to "Preview". Switch to "Current" for a more stable engine version, then...

Malthe
by Contributor II
  • 47 Views
  • 4 replies
  • 0 kudos

Resolved! Can't enable "variantType-preview" using DLTs

Using create_streaming_table and passing table properties as follows, I get an error running the pipeline for the first time: > Your table schema requires manually enablement of the following table feature(s): variantType-preview. I'm using this code: c...

Latest Reply
Malthe
Contributor II
  • 0 kudos

There's a workaround available in most situations which is to first create the table without the VARIANT column, run the pipeline at least once, and then add the column in a subsequent refresh.

3 More Replies
raghvendrarm1
by New Contributor
  • 12 Views
  • 0 replies
  • 0 kudos

Results from the spark application to driver

I have read many articles but am still not clear on this: the executors complete the execution of tasks and hold the results. 1. The results (output data) from all executors are transported to the driver in all cases, or executors persist it if tha...

Upendra_Dwivedi
by Contributor
  • 2423 Views
  • 1 reply
  • 1 kudos

Databricks APP OBO User Authorization

Hi All, we are using the on-behalf-of-user authorization method for our app, and the x-forwarded-access-token expires after some time, so we have to redeploy our app to rectify the issue. I am not sure what the issue is or how we can keep the token aliv...

Latest Reply
jamesl
Databricks Employee
  • 1 kudos

Hi @Upendra_Dwivedi, are you still facing this issue? The x-forwarded-access-token your app receives is the current user’s access token that Databricks forwards in HTTP headers for on‑behalf‑of‑user access. You should read it from the request on eac...
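
The advice to read the token from the incoming request can be sketched roughly as below. This is a minimal illustration, not the Databricks Apps API: a plain mapping stands in for whatever headers object your web framework provides, and `get_user_token` and `TOKEN_HEADER` are hypothetical names introduced here.

```python
from typing import Mapping, Optional

# Header name taken from the thread; everything else here is illustrative.
TOKEN_HEADER = "x-forwarded-access-token"


def get_user_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token from this request's headers, if present.

    The lookup is case-insensitive because HTTP header names are
    case-insensitive and frameworks differ in how they normalize them.
    """
    for name, value in headers.items():
        if name.lower() == TOKEN_HEADER:
            # Treat an empty header value the same as a missing header.
            return value or None
    return None
```

Calling a helper like this inside each request handler, instead of keeping a token in a module-level variable, means the app always works with the token attached to the current request.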

Mous92i
by New Contributor
  • 154 Views
  • 3 replies
  • 2 kudos

Resolved! Liquid Clustering With Merge

Hello, I’m facing severe performance issues with a MERGE INTO in Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
Mous92i
New Contributor
  • 2 kudos

Thanks for your response

2 More Replies
databricksero
by New Contributor
  • 279 Views
  • 8 replies
  • 3 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I’m running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor
  • 3 kudos

@databricksero Explicit Schema Definition: When calling spark.createDataFrame(pdf_cleaned), explicitly provide the schema even if the DataFrame is empty. With an explicit schema Spark does not have to infer the types, which prevents the “cannot infer schema from empty dataset” erro...

7 More Replies
deng_dev
by New Contributor III
  • 11252 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementExce

Hi! We are creating a table in a streaming job every micro-batch using the spark.sql('create or replace table ... using delta as ...') command. This query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...

Latest Reply
sahilchavan
New Contributor II
  • 0 kudos

Hi @deng_dev, did you discover any way to raise this error gracefully? I'm facing the same error when running the Kinesis stream. Although I'm aware of what the error is, my intent is to raise and log the error gracefully.

Brahmareddy
by Esteemed Contributor
  • 126 Views
  • 1 reply
  • 4 kudos

I Tried Teaching Databricks About Itself — Here’s What Happened

Hi All, How are you doing today? I wanted to share something interesting from my recent Databricks work: I’ve been playing around with an idea I call “Real-Time Metadata Intelligence.” Most of us focus on optimizing data pipelines, query performance,...

Latest Reply
ruicarvalho_de
New Contributor II
  • 4 kudos

I like the core idea. You are mining signals the platform already emits. I would start with rules first: track small files ratio and average file size trend, watch skew per partition and shuffle bytes per input gigabyte. Compare job time to input size to c...
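
As a rough illustration of the first rule, the small files ratio and average file size can be computed from a list of file sizes. This is a sketch under assumptions: `small_file_stats` is a hypothetical helper, the 32 MiB threshold is an arbitrary placeholder rather than a Databricks recommendation, and how you obtain the file sizes is left to your environment.

```python
from typing import Dict, Sequence

# Assumed cutoff for what counts as a "small" file; tune for your tables.
SMALL_FILE_BYTES = 32 * 1024 * 1024


def small_file_stats(
    file_sizes: Sequence[int], threshold: int = SMALL_FILE_BYTES
) -> Dict[str, float]:
    """Return the share of files below `threshold` and the mean file size in bytes."""
    if not file_sizes:
        # No files: report zeros rather than dividing by zero.
        return {"small_files_ratio": 0.0, "avg_file_size": 0.0}
    small = sum(1 for size in file_sizes if size < threshold)
    return {
        "small_files_ratio": small / len(file_sizes),
        "avg_file_size": sum(file_sizes) / len(file_sizes),
    }
```

Recording these two numbers per table on each run gives the trend the reply suggests watching; a rising ratio with a falling average size is the signal to look at.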

Bhavana_Y
by New Contributor
  • 63 Views
  • 1 reply
  • 1 kudos

Learning Path for Spark Developer Associate

Hello Everyone, Happy to be a part of the Virtual Journey!! I enrolled in Associate Spark Developer and completed the learning path in Databricks Academy. Can anyone please confirm whether completing the learning path is enough to obtain the 50% off voucher for certifi...

[Attachment: Screenshot (15).png]
Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Bhavana_Y! To be eligible for the incentives, you’ll need to complete one of the pathways mentioned in the Learning Festival post. Based on your screenshot, it looks like you’ve completed all four modules of LEARNING PATHWAY 7: APACHE SPARK DE...

donlxz
by New Contributor III
  • 152 Views
  • 4 replies
  • 3 kudos

Resolved! deadlock occurs with use statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
donlxz
New Contributor III
  • 3 kudos

I’ll try making adjustments on the Informatica side. Thank you for your help.

3 More Replies
Jonathan_
by New Contributor II
  • 185 Views
  • 4 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that after many transformations/aggregations/joins of the data, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow even with small data (ex...

Latest Reply
tarunnagar
New Contributor
  • 6 kudos

This is a pretty common issue with PySpark when working on large DAGs with lots of joins and transformations. As the DAG grows, Spark has to maintain a huge execution plan, and performance can drop due to shuffling, serialization, and memory overhead...

3 More Replies
mikvaar
by New Contributor III
  • 627 Views
  • 8 replies
  • 5 kudos

DAB + DLT destroy fails due to ownership/permissions mismatch

Hi all, We are running into an issue with Databricks Asset Bundles (DAB) when trying to destroy a DLT pipeline. Setup is as follows: Two separate service principals: Deployment SP: used by Azure DevOps for deploying bundles. Run_as SP: used for running t...

Tags: Data Engineering, Databricks, Databricks Asset Bundles, DevOps
Latest Reply
denis-dbx
Databricks Employee
  • 5 kudos

We just released https://github.com/databricks/cli/releases/tag/v0.273.0 with a mitigation for this; the error should disappear if you upgrade. Please try it and let us know how it goes. Terraform fix is in https://github.com/databricks/terraform-provid...

7 More Replies
Dimitry
by Contributor III
  • 34 Views
  • 1 reply
  • 0 kudos

Serverless - can't parallelize UDF in applyInPandas

Hi all, Serverless V3 solved an error of mismatching Python versions between driver and worker which I had on V2 (can't remember the exact wording). So I'd been running this on classic compute so far. Today I tried on serverless to a partial success - un...

Latest Reply
Dimitry
Contributor III
  • 0 kudos

I was wrong in interpreting the results. threading.get_native_id() does not work on serverless as it does on classic, so different threads return the same ID. The time it takes to execute the test is obviously less than 40 seconds, if it was running on a sin...
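
The timing argument in this reply can be reproduced without relying on thread IDs: if the wall-clock time is well below the sum of the individual task durations, the tasks overlapped. The sketch below uses a local thread pool purely for illustration; it is not Spark or applyInPandas, and `slow_task` and `measure_parallelism` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def slow_task(seconds: float) -> float:
    """Stand-in for a slow UDF call: sleep, then report how long it slept."""
    time.sleep(seconds)
    return seconds


def measure_parallelism(n_tasks: int, seconds: float, workers: int) -> Tuple[float, float]:
    """Run n_tasks sleeping tasks and return (wall-clock time, serial total)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        durations = list(pool.map(slow_task, [seconds] * n_tasks))
    elapsed = time.perf_counter() - start
    return elapsed, sum(durations)
```

With several workers the first number comes out clearly smaller than the second; with a single worker the two are roughly equal, which is the comparison the reply is making.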

bunny1174
by New Contributor
  • 111 Views
  • 2 replies
  • 1 kudos

Spark Streaming Loading 1k to 5k rows only delta table

Hi Team, I have 4-5 million files in S3, only around 1.5 GB of data with 9 million records. When I try to use Auto Loader to read the data using a read stream and write to a Delta table, the processing takes too much time; it is loading from 1k t...

Latest Reply
Prajapathy_NKR
  • 1 kudos

@bunny1174 It is a common issue that small files get created during streaming. Since you are using the Delta file format, I would suggest two solutions: 1. Try using liquid clustering. This auto-compacts small files into bigger chunks, mostly of 1...

1 More Replies
SuMiT1
by New Contributor III
  • 449 Views
  • 9 replies
  • 4 kudos

Flattening the json in databricks

I have chatbot data. I read an ADLS JSON file in Databricks and stored the output in a DataFrame. In that table, two columns contain JSON data, but their data type is string: 1. content 2. metadata. Now I have to flatten the data, but I am not getting how to do tha...

Latest Reply
Prajapathy_NKR
  • 4 kudos

@szymon_dybczak your solution was crisp. @SuMiT1 since you have mentioned your JSON is dynamic, get one of your JSON bodies into a variable: json_body = df.select("content").first()[0] then get the schema of the JSON: schema = schema_of_json(json_...
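
To make the target shape concrete, the flattening itself can be sketched in plain Python, independent of Spark. This is not the approach from the thread (which uses schema_of_json on the DataFrame); `flatten_json` is a hypothetical helper, and the dotted key naming is an assumption made for the example.

```python
import json
from typing import Any, Dict


def flatten_json(text: str, sep: str = ".") -> Dict[str, Any]:
    """Parse a JSON string and flatten nested objects and arrays into one dict.

    Nested keys are joined with `sep`; array elements use their index as the key.
    """
    flat: Dict[str, Any] = {}

    def walk(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}{sep}{key}" if prefix else key)
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{prefix}{sep}{index}" if prefix else str(index))
        else:
            # Scalar leaf (string, number, bool, or null): store under the full path.
            flat[prefix] = value

    walk(json.loads(text), "")
    return flat
```

For instance, the string '{"a": {"b": 1}, "c": [10, {"d": "x"}]}' flattens to the keys a.b, c.0 and c.1.d, one per leaf value.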

8 More Replies
