Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yuta666
by Visitor
  • 29 Views
  • 1 reply
  • 0 kudos

Auto Loader on UC Volumes stopped resolving wildcards

The following spark.readStream / cloudFiles configuration was confirmed working on 2026-04-30, but stopped working on 2026-05-26. No code or config changes were made between these dates, so I assume something was changed implicitly on the Databricks si...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @yuta666, Thanks for sharing the details. Since the same cloudFiles configuration worked for you previously, and you did not make any code or config changes between 2026-04-30 and 2026-05-26, this does look like a likely regression rather than exp...

thedatacrew
by Databricks Partner
  • 3149 Views
  • 7 replies
  • 0 kudos

Delta Live Tables - skipChangeCommits in SQL

Hi, Could anyone tell me if the skipChangeCommits option is supported in SQL mode? I can use it successfully using Python, but it doesn't look like it is supported by SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
moritzmeister
Databricks Employee
  • 0 kudos

This is now supported: CREATE OR REFRESH STREAMING TABLE basic_st AS SELECT * FROM STREAM samples.nyctaxi.trips WITH (SKIPCHANGECOMMITS); Supported in runtime 17.3 and later. Documentation: https://docs.databricks.com/aws/en/ldp/developer/sql-dev#create-...
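For comparison, the Python form the question mentions is commonly written as a stream read with the skipChangeCommits option set. A minimal sketch, assuming a Delta source table and an active SparkSession passed in as `spark`; the function name is hypothetical and the table name is reused from the SQL statement above:

```python
def read_skipping_change_commits(spark, table_name="samples.nyctaxi.trips"):
    """Return a streaming DataFrame that skips commits which update or delete rows.

    `spark` is assumed to be an active SparkSession, as in a Databricks notebook.
    The function name and default table are illustrative only.
    """
    return (
        spark.readStream
        .option("skipChangeCommits", "true")  # Delta streaming source option
        .table(table_name)
    )
```

In a notebook this would be called as `read_skipping_change_commits(spark)` and the result passed to a streaming write or returned from a pipeline table function.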

ccsalt
by New Contributor II
  • 271 Views
  • 4 replies
  • 1 kudos

Inconsistent Cluster Log Persistence to Volume/S3 (stderr, stdout, log4j-active.log)

Saving logs from an all-purpose cluster to Volume or S3 is not consistent, because stderr, stdout, and log4j-active.log get overwritten when the cluster is restarted between minutes 01 and 59. Tested case: A job is configured to start every 20 minutes,...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @ccsalt, This is a known limitation. Log rotation (renaming to log4j-YYYY-MM-DD-HH.log.gz) only happens on the hour boundary. The active log file log4j-active.log always has the same name and is overwritten if a cluster restart happens within one...
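To make the hour-boundary behaviour concrete, the sketch below models which span of log lines would still sit only in log4j-active.log when a restart happens. It is a simplified model, not Databricks code: it assumes rotation fires exactly on the hour, assumes the archive is stamped with the hour passed in, and uses hypothetical timestamps and function names.

```python
from datetime import datetime


def rotated_log_name(hour: datetime) -> str:
    """Format the archive name pattern quoted in the reply for a given hour."""
    return hour.strftime("log4j-%Y-%m-%d-%H.log.gz")


def unrotated_window(started: datetime, restarted: datetime):
    """Return the (from, to) span whose lines exist only in log4j-active.log.

    Assumption: rotation fires exactly on the hour, so anything written since
    the later of cluster start and the last hour boundary is still unrotated.
    """
    last_boundary = restarted.replace(minute=0, second=0, microsecond=0)
    return max(started, last_boundary), restarted


# Hypothetical case: cluster started at 10:00 and restarted at 10:20.
start = datetime(2026, 1, 5, 10, 0)
restart = datetime(2026, 1, 5, 10, 20)
at_risk_from, at_risk_to = unrotated_window(start, restart)
print("only in active log:", at_risk_from, "->", at_risk_to)
print("archive name pattern:", rotated_log_name(start))
```

Under this model, a job that restarts the cluster every 20 minutes leaves a rotated archive only for runs that happen to span an hour boundary.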

Tracy_
by Databricks Partner
  • 18962 Views
  • 7 replies
  • 0 kudos

Incorrect reading csv format with inferSchema

Hi All, There is a CSV with a column ID (format: 8 digits followed by "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as double and trims the "D". Is there any way (apart from inferSchema=False) to get correct ...
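One common way to keep the inferred types for the other columns while leaving ID as text is to infer once, then re-read with an explicit schema that overrides only that column. A minimal sketch, assuming an active SparkSession passed in as `spark`, a header row, and a column literally named ID; the function name and parameters are hypothetical:

```python
def read_csv_keep_string(spark, path, string_columns=("ID",)):
    """Read a CSV with inferred types, but force selected columns to STRING.

    `spark` is assumed to be an active SparkSession; `string_columns` lists
    the columns that must not be type-inferred (hypothetical default: ID).
    """
    # Pass 1: let Spark infer the schema so the other columns keep useful types.
    inferred = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(path)
        .schema
    )
    # Build a DDL schema string, overriding the columns that must stay text.
    ddl = ", ".join(
        f"`{field.name}` "
        + ("STRING" if field.name in string_columns else field.dataType.simpleString())
        for field in inferred.fields
    )
    # Pass 2: re-read with the explicit schema, so no inference touches ID.
    return spark.read.option("header", "true").schema(ddl).csv(path)
```

The trade-off is a second scan of the file for inference; for large inputs, writing the schema out by hand once avoids that cost.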

Latest Reply
nagarajudevu
  • 0 kudos

I am getting the following error when working in Databricks Free Edition, newly logged in. The code I wrote: from pyspark.sql.functions import * data = "workspace.default.asl.csv" df = spark.read.format("csv").option("header","true").option("inferSchemacc...

Ashley1
by Contributor
  • 3217 Views
  • 5 replies
  • 2 kudos

Resolved! Turn off AI assistance in notebooks

Hi, has anyone found a way that the AI assistant can be turned off in notebooks? I would be happy to keep code introspection but I find I'm more often hitting escape than accepting the AI's suggestions (or removing the code it has suggested when I ac...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Ashley1, There are a few different levels where you can control the AI assistance behavior in notebooks. Here is a breakdown: USER-LEVEL: DISABLE AI AUTOCOMPLETE (INLINE SUGGESTIONS) This is the setting that controls the "ghost text" inline code ...

Schaubi
by New Contributor II
  • 763 Views
  • 2 replies
  • 2 kudos

Resolved! Lakehouse Federation Join Pushdown

Hi, I experimented a little bit with Lakehouse Federation. I created a connection and foreign catalog that references a SQL Server and activated the public preview feature for Join Pushdowns. After finishing my experiments, it seems to me that the fea...

Data Engineering
join-pushdown
lakehouse-federation
sql-server
Latest Reply
StefanSch
New Contributor II
  • 2 kudos

Hi, I have experimented a bit with join pushdowns and found that intra-schema joins are not pushed down if a table from another schema is joined in between. Example: In the following example the join between x1 and x2 is pushed down ...

der
by Valued Contributor
  • 248 Views
  • 5 replies
  • 2 kudos

Resolved! spark.databricks.sql.excel.enabled false at cluster level

The native Databricks Excel data source is GA: https://www.reddit.com/r/databricks/comments/1t4un82/native_excel_support_is_now_ga/ https://docs.databricks.com/aws/en/query/formats/excel However, as long as it is not possible to read from another address than...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @der, Most likely because spark.databricks.sql.excel.enabled is a Databricks SQL/session-level internal config, not a SparkConf setting. This specific key appears to be read from the Spark SQL session config, so setting it after the notebook sessi...

shan-databricks
by Databricks Partner
  • 101 Views
  • 1 reply
  • 2 kudos

Ingestion Gateway DDL Objects Missing - Lakeflow Connect

I am facing the issue below and need a solution to proceed further. Category: Error. Message: DDL objects missing on table 'DB.dbo.client'. Execute the DDL objects script and full refresh the table on the Ingestion Pipeline. Error message: 'Reason:- Catalog is no...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @shan-databricks, The INGESTION_GATEWAY_DDL_OBJECTS_MISSING error means that while CDC is enabled on DB.dbo.client, the Lakeflow-specific DDL support objects (triggers, stored procedures) that allow Databricks to track schema changes (DDL events l...

NageshPatil
by New Contributor III
  • 799 Views
  • 5 replies
  • 1 kudos

Resolved! Lakeflow partial data ingestion for first load

Hi Team, I am ingesting 10 tables from Azure SQL through Lakeflow Connect. I have created the gateway and ingestion pipelines using the Databricks SDK. I start the ingestion pipeline only when the gateway is in Running status with resources. I observed...

Latest Reply
NageshPatil
New Contributor III
  • 1 kudos

Hi, I finally found a solution that works smoothly to capture the full snapshot on the initial run. Here is the step-by-step approach I implemented: Create a Status Check Function: I wrote a custom function that queries the event_log for a given Pipelin...

Bank_Kirati
by New Contributor III
  • 152 Views
  • 2 replies
  • 0 kudos

Cross-region S3 reads suddenly fail with 400 Bad Request — eu-west-1 metastore to af-south-1 bucket

What changed: A production daily job that has worked unchanged for ~8 months started failing on 2026-05-18 ~23:46 UTC. The notebook does a plain spark.read.json("s3://BUCKET/...") against a bucket in af-south-1. The metastore is in eu-west-1. Same code...

Latest Reply
sameer_yasser
New Contributor II
  • 0 kudos

Your debugging is really thorough and you've already done the hard work of isolating this. The 400 with an empty body (no proper S3 error code like InvalidArgument) on an opt-in region is almost always one thing: SigV4 signing region mismatch. af-sou...
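To illustrate why the signing region matters, the sketch below derives the AWS Signature Version 4 signing key with the standard library only. The region is mixed into the key, so a request signed for one region cannot validate against an endpoint expecting another. This is an illustration of the SigV4 key derivation, not a diagnosis of the job above; the secret key is the placeholder used in AWS documentation, and the date stamp and function name are assumptions for the example.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder credentials and date, for illustration only.
SECRET = "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY"
key_eu = sigv4_signing_key(SECRET, "20260518", "eu-west-1")
key_af = sigv4_signing_key(SECRET, "20260518", "af-south-1")

# Same secret, same day, different region: the signing keys do not match.
print("keys match:", key_eu == key_af)
```

Any signature computed with the eu-west-1 key will therefore be rejected by a bucket endpoint that verifies against af-south-1.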

plankton
by New Contributor
  • 679 Views
  • 11 replies
  • 6 kudos

Resolved! R plots not rendering

Has anyone been experiencing the issue of R plots not rendering in notebooks, starting a few days ago? It's not related to sparklyr or plotly, or specific data types, or anything. For example in base R: plot(1:3, 5:7) calculates without error, but does ...

Latest Reply
plankton
New Contributor
  • 6 kudos

Looks like the issue has been resolved. Thanks everyone for chiming in and thanks 'bricks for whatever you did to resolve this. Plankton out!

seefoods
by Valued Contributor
  • 392 Views
  • 1 reply
  • 1 kudos

DQX - datacontract cli

Hello all, can someone tell me whether I can combine DQX (Databricks) rule checks with the datacontract CLI? If so, could you share your ideas? https://gpt.datacontract.com/sources/cli.datacontract.com/ Cordially,

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @seefoods, Just came across this post. In case you are still looking for an answer, I see these as complementary rather than overlapping tools. A practical approach would be to keep the data contract as the source of truth in datacontract.yaml, us...

IM_01
by Contributor III
  • 282 Views
  • 4 replies
  • 2 kudos

Lakeflow SDP partition error

Hi, I was trying to log an exception in Lakeflow SDP. First I create an empty streaming dataframe in case of an exception and write the log into an audit table, as shown below: try: raise Exception("testexception") return df except Exception as e: df=...

Latest Reply
IM_01
Contributor III
  • 2 kudos

Hi Amira, As the flows run in parallel, a file-based logger might throw an exception, so I was thinking of logging to a table instead, as I do not want an exception in any one flow to fail the entire pipeline.

Shivaprasad
by Contributor
  • 210 Views
  • 4 replies
  • 1 kudos

Resolved! Can we create a materialized view in Databricks using an all-purpose cluster?

I was unable to create a materialized view in Databricks using an all-purpose cluster. I wanted to check whether we need a serverless cluster to create an MV.

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Shivaprasad, You generally should not create a standalone materialised view from an all-purpose cluster. Databricks documents that CREATE MATERIALIZED VIEW is supported from a Pro or Serverless SQL warehouse, or within a pipeline. For standalone ...

batch_bender
by New Contributor II
  • 192 Views
  • 3 replies
  • 0 kudos

Resolved! Does liquid clustering preserve auditable tenant separation in a shared Delta table architecture?

We’re evaluating a multi-tenant Databricks architecture and considering Liquid Clustering on shared Delta tables. Our concern is that tenant SLAs require data separation for audit/compliance purposes. I’m trying to understand whether Liquid Clusterin...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @batch_bender, I think the key distinction is between data layout for performance and isolation as a control boundary. My view is that Liquid Clustering should not be presented as a tenant-isolation mechanism. The official docs describe it as a da...
