Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nkrom
by Visitor
  • 70 Views
  • 4 replies
  • 0 kudos

Renaming a folder in ADLS is taking a lot of time

Hi, I have two folders, customer and customer_01, in an ADLS location. Now I need to rename customer_01 to customer and customer to customer_01, and both of these folders have lots of files. If I use dbutils.fs.mv it's taking a lot of time, like 7 hours; something is...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Hi @Nkrom, I am happy to share the Azure REST API method! Using the Azure Python SDK is the absolute fastest way to do this, but you can choose any other programming language. ADLS Gen2 uses a "Hierarchical Namespace" (HNS). When you use the Azure SDK ...
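A minimal sketch of the SDK approach for the swap in the question, assuming the `azure-storage-file-datalake` package and an account with the hierarchical namespace enabled, where a directory rename is a metadata operation on the directory itself. The functions take an already-authenticated `FileSystemClient`; the temporary directory name is a hypothetical placeholder.

```python
# Sketch: swap two ADLS Gen2 directories with three renames.
# Assumes an HNS-enabled account and the azure-storage-file-datalake package;
# the temporary directory name passed in is a hypothetical placeholder.


def plan_swap(a: str, b: str, tmp: str) -> list[tuple[str, str]]:
    """Return the (source, destination) renames that swap directories a and b."""
    # Both names are in use, so one directory is parked under a temporary
    # name first instead of renaming onto an existing path.
    return [(a, tmp), (b, a), (tmp, b)]


def swap_directories(fs_client, a: str, b: str, tmp: str) -> None:
    """Apply the swap through a FileSystemClient from azure-storage-file-datalake."""
    for src, dst in plan_swap(a, b, tmp):
        # rename_directory expects the new name as "{file_system}/{path}".
        fs_client.get_directory_client(src).rename_directory(
            new_name=f"{fs_client.file_system_name}/{dst}"
        )
```

In practice `fs_client` would come from `DataLakeServiceClient(account_url=..., credential=...).get_file_system_client("<container>")`, for example `swap_directories(fs_client, "customer", "customer_01", "customer_swap_tmp")`. Because each step is a single rename, a failure partway through leaves the data under one of the three names, which is easy to check and finish by hand.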

Mario_D
by New Contributor III
  • 74 Views
  • 3 replies
  • 3 kudos

Resolved! Upstream column lineage missing from API call after some time

I ran the following piece of code on 2 occasions.
table_name = "full path of table"
lineage = w.api_client.do("GET", f"/api/2.0/lineage-tracking/column-lineage", body={"table_name": table_name, "column_name": "column_x"})
u_lineage_df = spark.createDataFra...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @Mario_D, from what I can gather, this can happen, and it's usually less about a restriction on calling the API itself and more about how lineage was captured or what the caller is allowed to see. A few common reasons are: the caller no longer has...

Pat
by Esteemed Contributor
  • 55 Views
  • 1 replies
  • 0 kudos

Sync Tables: Unity Catalog to Lakebase - Materialized Views triggered mode

Hi, I am facing some issues syncing Materialized Views into Lakebase with triggered mode, and I wonder whether this is actually possible.
create materialized view some_catalog.some_schema.some_table tblproperties( 'delta.enableChangeDataFeed' = '...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Hi @Pat, your research on this is spot on! I was really curious about this problem, so I tested it in my own environment to find the exact solution, and I replicated your exact issue. MV creation: Sync to Postgres (Snapshot): Here is exactly why t...

dpc
by Contributor III
  • 2080 Views
  • 7 replies
  • 9 kudos

Resolved! Disabling a task in a Databricks job

Hello, I have jobs that perform a number of activities. Sometimes I want to disable one or more of these activities. I can do that easily in an app like ADF. Reading around, I cannot find an easy way to do this, although what I've read suggests that it was...

Latest Reply
rgabo
Visitor
  • 9 kudos

Hi folks, this is Gabor from Databricks here! We've recently launched Disabled tasks, which lets you disable a task indefinitely so that it won't run the next time the job runs on a schedule. Simply specify "disabled": true in the job's configuration ...
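A minimal sketch of setting that flag programmatically on a job settings dict, for example before submitting it to the Jobs API. The placement of `"disabled"` on each entry of the `"tasks"` list is an assumption, as are the job name and task keys used in the usage line.

```python
import copy


def disable_tasks(job_settings: dict, task_keys: set[str]) -> dict:
    """Return a copy of a job settings dict with the named tasks disabled.

    Assumption: the "disabled" flag sits on each task object next to
    "task_key"; check the Jobs API reference for the exact placement.
    """
    updated = copy.deepcopy(job_settings)  # leave the caller's dict untouched
    for task in updated.get("tasks", []):
        if task.get("task_key") in task_keys:
            task["disabled"] = True
    return updated
```

Usage, with a hypothetical job: `disable_tasks({"name": "nightly_etl", "tasks": [{"task_key": "ingest"}, {"task_key": "export"}]}, {"export"})` marks only the `export` task as disabled. Copying the dict keeps the original settings available for re-enabling the task later.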

Sameera_Naureen
by New Contributor
  • 42 Views
  • 0 replies
  • 0 kudos

Internship

I am an aspiring data science student. I am very interested in Databricks, and I am looking for internships. If anyone knows how to apply, please help.

johschmidt42
by New Contributor II
  • 2723 Views
  • 4 replies
  • 1 kudos

Autoloader cloudFiles.maxFilesPerTrigger ignored with .trigger(availableNow=True)?

Hi, I'm using the Auto Loader feature to read streaming data from Delta Lake files and process them in a batch. The trigger is set to availableNow to include all new data from the checkpoint offset, but I limit the number of Delta files for the batch ...

Latest Reply
ShamenParis
New Contributor II
  • 1 kudos

Hi @johschmidt42, this is a great question, but the mystery actually lies in the very first line of your read configuration: spark_session.readStream.format(source="delta"). Because you are using .format("delta") instead of .format("cloudFiles"), you a...
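A minimal sketch of the read side when staying on the Delta source: the limit is passed with the Delta streaming option name `maxFilesPerTrigger`, since `cloudFiles.*` keys are Auto Loader options. The function name and source path are hypothetical; `spark` is an active `SparkSession`.

```python
def read_delta_rate_limited(spark, source_path: str, max_files: int):
    """Build a Delta streaming read that caps the files taken per micro-batch.

    With .format("delta") the Delta source reads its own options, so the
    limit is "maxFilesPerTrigger"; "cloudFiles.*" keys belong to Auto Loader
    (.format("cloudFiles")) and are not the Delta source's options.
    """
    return (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", max_files)
        .load(source_path)
    )
```

The write side can keep `.trigger(availableNow=True)`. Unlike the older `once` trigger, `availableNow` is designed to process everything available in possibly several micro-batches while respecting source rate limits such as this one, so the cap shapes batch size rather than the total amount read.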

mzare
by New Contributor II
  • 134 Views
  • 2 replies
  • 0 kudos

Resolved! Lakeflow SDP equivalent of whenNotMatchedBySource

I have a Lakeflow Connect SCD1 pipeline for SQL Server where I get a mirror of what's live in the source database at the point of ingestion. Now I want to implement a downstream SCD2 table capturing changes for each ingest...

Data Engineering
create_auto_cdc_flow
foreachBatch
lakeflow connect
sdp
Latest Reply
sameer_yasser
New Contributor II
  • 0 kudos

This is exactly the scenario apply_changes_from_snapshot was designed for. It compares consecutive full snapshots and automatically derives inserts, updates, and deletes, treating a row's absence from the newer snapshot as a delete, so no delete indicator column is needed.
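To illustrate what "deletes by absence" means, here is a conceptual pure-Python sketch of comparing two consecutive snapshots keyed by primary key. It is not how `apply_changes_from_snapshot` is implemented; the dict-based snapshot shape and the sample rows are assumptions for illustration.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Classify keys of two consecutive full snapshots into change types.

    Each snapshot maps a primary key to that row's non-key values.
    Conceptual sketch only, not the apply_changes_from_snapshot internals.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    # A key that disappears from the newer snapshot is treated as a delete.
    deletes = sorted(k for k in previous if k not in current)
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


previous = {1: ("Ann", "Oslo"), 2: ("Bo", "Rome")}
current = {1: ("Ann", "Paris"), 3: ("Cy", "Lima")}
changes = diff_snapshots(previous, current)
# changes["inserts"] holds key 3, changes["updates"] key 1, changes["deletes"] [2]
```

In an SCD2 target, an update would typically close the current version of key 1 and open a new one, while the delete closes key 2's open version without adding a replacement.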

ashraf1395
by Honored Contributor
  • 1561 Views
  • 2 replies
  • 1 kudos

Fetching the catalog and schema set in the DLT pipeline configuration

I have a DLT pipeline, and the notebook running in the pipeline has some requirements. I want to get the catalog and schema set in my DLT pipeline. Reason: I have to specify my volume file paths etc., and my volume is on the sa...

Latest Reply
SP_6721
Honored Contributor II
  • 1 kudos

Hi @ashraf1395, can you try this to get the catalog and schema set by your DLT pipeline in the notebook?
catalog = spark.conf.get("pipelines.catalog")
schema = spark.conf.get("pipelines.schema")
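Building on that suggestion, a minimal sketch of turning the pipeline's catalog and schema into a Unity Catalog volume path, which is what the question needs them for. The conf keys are taken from the reply and not independently verified here; the volume name and subfolders are hypothetical.

```python
def pipeline_volume_path(conf_get, volume: str, *parts: str) -> str:
    """Build a /Volumes path in the pipeline's catalog and schema.

    conf_get is expected to behave like spark.conf.get; the keys
    "pipelines.catalog" and "pipelines.schema" come from the suggested
    reply and are an assumption, not a verified contract.
    """
    catalog = conf_get("pipelines.catalog")
    schema = conf_get("pipelines.schema")
    # Unity Catalog volumes are addressed as /Volumes/<catalog>/<schema>/<volume>/...
    return "/".join(["/Volumes", catalog, schema, volume, *parts])
```

Inside the pipeline notebook this would be called as `pipeline_volume_path(spark.conf.get, "raw_files", "landing")`, where `raw_files` is a hypothetical volume. Passing the getter in keeps the path logic testable outside a pipeline.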

yuta666
by New Contributor
  • 166 Views
  • 2 replies
  • 1 kudos

Auto Loader on UC Volumes stopped resolving wildcards

The following spark.readStream / cloudFiles configuration was confirmed working on 2026-04-30, but stopped working on 2026-05-26. No code or config changes were made between these dates, so I assume something was changed implicitly on the Databricks si...

Latest Reply
saravjeet
Databricks Partner
  • 1 kudos

We are facing a similar issue, not limited to Auto Loader but also affecting DLT pipelines and classic ETL jobs. The behavior is intermittent: jobs run fine and then fail unexpectedly, though they typically succeed on retry if retries are enabled. We t...

mnissen1337
by New Contributor II
  • 142 Views
  • 1 replies
  • 0 kudos

Resolved! Serverless compute outbound IP whitelisting for external API calls

I'm trying to understand the networking implications of moving some logic to Databricks Serverless / SDP. My current setup is a notebook running as a job on classic compute, and this works because outbound traffic goes through a NAT Gateway, so we can...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @mnissen1337, Your understanding is basically right. With classic compute, the workload runs in your own VNet/VPC, so using your own NAT Gateway to present a stable public egress IP is a standard pattern. With serverless, the compute runs in the D...

vvanag
by New Contributor III
  • 199 Views
  • 5 replies
  • 1 kudos

Resolved! Rendering HTML in ipywidgets output

Hi, I am experimenting with ipywidgets on Databricks (company Azure account and Free Edition). I have some HTML that I would like to render, so typically what happens is that simple things like:
import ipywidgets as widgets
output = widgets.Output()
wit...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @vvanag, What you’re seeing is expected to some extent in Databricks Notebooks. Databricks supports ipywidgets, but it doesn’t guarantee full Jupyter or Colab parity, and there are a few documented limitations on how widgets render and behave in n...

HarshVardhan1
by Databricks Partner
  • 77 Views
  • 1 replies
  • 1 kudos

Oracle HIVE Metadata to Databricks UC migration

Dear Databricks Member, Could you please recommend the best solution for the following use case? Use Case: The Hive Metastore is currently used to manage metadata and is backed by an Oracle database. This metastore stores critical information such as...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @HarshVardhan1, The best approach for your scenario is usually not to migrate the Oracle database itself into Unity Catalog. Instead, the supported pattern is to migrate the Hive Metastore objects into Unity Catalog using Databricks migration tool...

peter_hoeltschi
by Databricks Partner
  • 311 Views
  • 3 replies
  • 1 kudos

SCD2 table migration using LakeFlow

A source SQL DB of an operational system delivers daily snapshots to a legacy DWH with SCD2 logic enabled. Now for a migration to Databricks. Let's look at the table "customer" (SCD2; with customer_id, valid_from and valid_to columns). On migration d...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

Hello @peter_hoeltschi! I think your workspace or user permissions only allow SQL/serverless compute, not classic clusters, because having enterprise pay-as-you-go at the account level does not automatically mean every user can create all-purp...

thedatacrew
by Databricks Partner
  • 3220 Views
  • 7 replies
  • 0 kudos

Delta Live Tables - skipChangeCommits in SQL

Hi, could anyone tell me if the skipChangeCommits option is supported in SQL mode? I can use it successfully in Python, but it doesn't look like it is supported in SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
moritzmeister
Databricks Employee
  • 0 kudos

This is now supported:
CREATE OR REFRESH STREAMING TABLE basic_st AS SELECT * FROM STREAM samples.nyctaxi.trips WITH (SKIPCHANGECOMMITS);
Supported in runtime 17.3 and later. Documentation: https://docs.databricks.com/aws/en/ldp/developer/sql-dev#create-...
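For comparison with the Python route the question mentions, a minimal sketch of the same read through the Delta streaming option `skipChangeCommits`. The function name is hypothetical; in a pipeline this read would sit inside a table-defining function.

```python
def read_skipping_change_commits(spark, table_name: str):
    """Stream from a Delta table while ignoring transactions that update or
    delete existing rows, via the Delta streaming option "skipChangeCommits".
    """
    return spark.readStream.option("skipChangeCommits", "true").table(table_name)
```

For the sample table in the reply this would be `read_skipping_change_commits(spark, "samples.nyctaxi.trips")`, mirroring the `WITH (SKIPCHANGECOMMITS)` clause on the SQL side.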

ccsalt
by New Contributor II
  • 308 Views
  • 4 replies
  • 1 kudos

Inconsistent Cluster Log Persistence to Volume/S3 (stderr, stdout, log4j-active.log)

Saving logs from an all-purpose cluster to a Volume or S3 is not consistent, because stderr, stdout, and log4j-active.log get overwritten when the cluster is restarted between minutes 01 and 59. Tested case: A job is configured to start every 20 minutes,...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @ccsalt, this is a known limitation. Log rotation (renaming to log4j-YYYY-MM-DD-HH.log.gz) only happens on the hour boundary. The active log file, log4j-active.log, always has the same name and is overwritten if a cluster restart happens within one...
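A simplified model of that explanation, using the 20-minute schedule from the question: if a restart lands in the same clock hour as the previous start, no hour boundary was crossed, so the active log was never rotated before being overwritten. This ignores runs that stay up across the hour and is an illustration, not the cluster's actual logging logic.

```python
from datetime import datetime


def rotated_log_name(ts: datetime) -> str:
    """Hourly archive name in the pattern quoted in the reply."""
    return ts.strftime("log4j-%Y-%m-%d-%H.log.gz")


def restart_overwrites_active_log(previous_start: datetime, restart: datetime) -> bool:
    """True when both times fall in the same clock hour (simplified model)."""

    def hour_floor(t: datetime) -> datetime:
        return t.replace(minute=0, second=0, microsecond=0)

    return hour_floor(previous_start) == hour_floor(restart)


# 10:00 and 10:20 share an hour: the first run's active log is lost.
# 10:40 and 11:00 cross a boundary: the earlier hour can be archived.
```

Under this model, only one start per hour survives in the hourly archives on a 20-minute schedule, which matches the inconsistency described in the question.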
