Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

bricks_2026
by New Contributor III
  • 1150 Views
  • 4 replies
  • 1 kudos

Resolved! Issue while handling Deletes and Inserts in Structured Streaming

Hello, we have a framework that reads the CDF logs from the source table and then merges them into the target table. The logic is implemented so that, if there are multiple commit_versions in the source table, a window function is applied to iden...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @bricks_2026, I recommend moving to AUTO CDC, which handles the merge and window logic of the CDF flow automatically. You need SCD Type 1 to keep only the last operation. Check out these docs: Stop hand-coding change data capture pipeli...
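For readers who keep the hand-written framework instead of AUTO CDC, the "last operation only" rule of SCD Type 1 can be sketched in plain Python. This is a minimal illustration, not the AUTO CDC implementation: it assumes the standard Delta CDF metadata columns `_change_type` and `_commit_version`, a hypothetical primary key named `id`, and an in-memory dict standing in for the target table.

```python
from typing import Any, Dict, Iterable

# Delta CDF _change_type values kept for SCD Type 1. update_preimage rows hold
# the old values, so they are skipped.
KEEP_TYPES = {"insert", "update_postimage", "delete"}


def latest_change_per_key(rows: Iterable[Dict[str, Any]], key: str) -> Dict[Any, Dict[str, Any]]:
    """Keep only the row with the highest _commit_version for each key.

    Equivalent in spirit to filtering on
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY _commit_version DESC) = 1.
    Ties within the same commit version are NOT resolved: the first row seen wins.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        if row["_change_type"] not in KEEP_TYPES:
            continue
        current = latest.get(row[key])
        if current is None or row["_commit_version"] > current["_commit_version"]:
            latest[row[key]] = row
    return latest


def apply_scd1(target: Dict[Any, Dict[str, Any]], changes: Dict[Any, Dict[str, Any]]) -> None:
    """Merge deduplicated changes into an in-memory target keyed by primary key."""
    for k, row in changes.items():
        if row["_change_type"] == "delete":
            target.pop(k, None)  # delete: drop the key if present
        else:
            # insert / update: upsert the row without CDF metadata columns
            # (assumes no business column name starts with "_").
            target[k] = {c: v for c, v in row.items() if not c.startswith("_")}


# Hypothetical CDF rows for a table keyed by "id".
cdf_rows = [
    {"id": 1, "name": "a", "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "name": "a", "_change_type": "delete", "_commit_version": 2},
    {"id": 2, "name": "b", "_change_type": "insert", "_commit_version": 2},
    {"id": 2, "name": "b", "_change_type": "update_preimage", "_commit_version": 3},
    {"id": 2, "name": "c", "_change_type": "update_postimage", "_commit_version": 3},
]
target = {1: {"id": 1, "name": "a"}}
apply_scd1(target, latest_change_per_key(cdf_rows, "id"))
print(target)  # {2: {'id': 2, 'name': 'c'}}
```

The delete at commit 2 wins over the earlier insert for id 1, which is the ordering a per-key window has to preserve when deletes and inserts for the same key arrive in one micro-batch.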

3 More Replies
NageshPatil
by New Contributor III
  • 675 Views
  • 4 replies
  • 1 kudos

Lakeflow partial data ingestion for first load

Hi Team, I am ingesting 10 tables from Azure SQL through Lakeflow Connect. I created the gateway and ingestion pipelines using the Databricks SDK, and I start the ingestion pipeline only when the gateway is in Running status with resources. I observed...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, the recommended approach is simply to run the pipeline multiple times during the initial load until all the data is captured. You can also monitor the snapshot-completed events in the gateway completed log before triggering the ingestion, but t...

3 More Replies
hskimskydd
by Databricks Partner
  • 446 Views
  • 2 replies
  • 0 kudos

INTERNAL_ERROR occurred while converting Iceberg format table to Delta format using Spark

I used Apache Spark to write an Iceberg table to Amazon S3. I then ran the code below to convert the Iceberg table to Delta, and the following exception occurred:
```python
spark.sql('convert to delta iceberg.`s3a://BUCKET/path/to/table_name/`')
```
```te...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hi @hskimskydd, thanks for sharing the full stack trace; that helps a lot. What you're seeing looks like a bug at first glance, but a few things are working together under the surface. The NullPointerException coming out of DelegatingCatalogEx...

1 More Replies
leopold_cudzik
by New Contributor II
  • 548 Views
  • 4 replies
  • 0 kudos

Expensive join in a Spark Declarative Pipeline against a Lakeflow Connect table

I'm trying to resolve an issue and would like some expert opinion on what the right solution actually is. I have a SQL Server database with CDC enabled and two tables: big_source (hundreds of millions of rows) and small_source, containing a small amount...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi Leopold, a broadcast join is definitely the first thing to try. It copies the smaller table to each node on which the larger table is being processed, which avoids multiple shuffles. Here is an example of how this is done in SQL. CREATE OR R...
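The mechanism behind that advice can be illustrated without Spark. In Spark itself the join is requested with the `BROADCAST` join hint in SQL or `pyspark.sql.functions.broadcast` in Python; the plain-Python sketch below only mimics what happens under the hood of an inner broadcast hash join. The table names, the join key `k`, and the "partitions" of the big table are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def build_broadcast_table(small: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Build phase: hash the small side once; this is what gets copied to every node."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    return dict(table)


def probe_partition(partition: Iterable[Row], table: Dict[Any, List[Row]], key: str) -> Iterator[Row]:
    """Probe phase: each big-side partition is joined locally, so its rows never move."""
    for row in partition:
        for match in table.get(row[key], []):  # inner join: unmatched rows are dropped
            yield {**row, **match}


# Hypothetical data: a tiny dimension and a "big" table split into two partitions.
small_source = [{"k": 1, "label": "one"}, {"k": 2, "label": "two"}]
big_partitions = [
    [{"k": 1, "v": 10}, {"k": 3, "v": 30}],
    [{"k": 2, "v": 20}, {"k": 1, "v": 11}],
]
broadcast_table = build_broadcast_table(small_source, "k")
joined = [r for part in big_partitions for r in probe_partition(part, broadcast_table, "k")]
print(joined)
```

The design point is that only the small side is replicated; the big side is streamed through the hash table partition by partition, which is why the technique stops paying off once the "small" table no longer fits comfortably in each executor's memory.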

3 More Replies
Luisbct
by New Contributor II
  • 519 Views
  • 2 replies
  • 1 kudos

Resolved! Declarative Automation Bundles Volume creation fails with CATALOG_DOES_NOT_EXIST on first deploy

Hi everyone, I'm working with DAB and running into a deployment ordering issue. On my first deploy, I get this error: Error: cannot create resources.volumes.raw_data: Catalog 'mycatalog_prod' does not exist. (404 CATALOG_DOES_NOT_EXIST) Endpoint: ...

Latest Reply
Luisbct
New Contributor II
  • 1 kudos

It works now. Thanks a lot for your help!

1 More Replies
RolandoCM2020
by New Contributor II
  • 486 Views
  • 2 replies
  • 1 kudos

Job with a cluster defined in DAB YML has error [UC_NOT_ENABLED] on cluster

Hello, the error is: [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster. We have a job that uses a cluster defined in YAML as: small_cluster_id: description: "Small cluster, singleNode, for longer jobs" type: complex default: ...

Data Engineering
DAB
UC
YML
Latest Reply
RahulPathakDBX
Databricks Employee
  • 1 kudos

UC_NOT_ENABLED is a cluster configuration issue, not a user permissions issue. Databricks throws UC_NOT_ENABLED when the compute is not Unity Catalog enabled, i.e., its access mode is not Standard or Dedicated (data_security_mode not USER_IS...
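Following that reply, a quick pre-deploy sanity check is to inspect the cluster spec's `data_security_mode` before the job runs. The minimal sketch below does this on a plain dictionary. The mapping of enum values to "Standard" and "Dedicated" is an assumption based on the Clusters API `data_security_mode` field and should be verified against current documentation; the specs themselves are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumption: data_security_mode values (Clusters API) that correspond to the
# Unity Catalog access modes named in the reply. Verify against current docs.
UC_ENABLED_MODES = {
    "USER_ISOLATION": "Standard",
    "DATA_SECURITY_MODE_STANDARD": "Standard",
    "SINGLE_USER": "Dedicated",
    "DATA_SECURITY_MODE_DEDICATED": "Dedicated",
}


def uc_access_mode(cluster_spec: Dict[str, Any]) -> Optional[str]:
    """Return "Standard" or "Dedicated" for a UC-enabled spec, otherwise None.

    This sketch treats a missing or unrecognized data_security_mode
    (e.g. NONE or a LEGACY_* mode) as not UC-enabled.
    """
    return UC_ENABLED_MODES.get(cluster_spec.get("data_security_mode"))


# Hypothetical single-node job cluster specs, as they might appear in a bundle.
specs = {
    "without_mode": {"num_workers": 0},
    "dedicated": {"num_workers": 0, "data_security_mode": "SINGLE_USER"},
}
for name, spec in specs.items():
    mode = uc_access_mode(spec)
    print(name, "->", mode or "not UC-enabled: expect UC_NOT_ENABLED")
```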

1 More Replies
thackman
by Databricks Partner
  • 961 Views
  • 3 replies
  • 2 kudos

Resolved! Intermittent failure with Python import statements after upgrading to DBR 18.0

We have a Python module (WidgetUtil.py) in the same folder as our notebook. For the past few years we have used a simple import statement to load it. Starting with DBR 18.0, the import fails intermittently (25% of the time) when running...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, this is a known issue with the WSFS FUSE layer in DBR 18.x. A fix has been developed but may not be fully rolled out yet. The most reliable workaround is to package your .py modules as a wheel and install them via %pip install, which bypasses FUSE ent...

2 More Replies
Malthe
by Valued Contributor II
  • 499 Views
  • 4 replies
  • 0 kudos

429 XHR requests against jobs endpoint (RESOURCE_EXHAUSTED)

The Databricks UI sends thousands of repeated requests of the form /ajax-api/2.0/jobs/get?include_acls=true&job_id=<redacted> and gets a 429 Too Many Requests response. It seems to rotate through a list of job IDs. It just keeps trying ...

Latest Reply
Malthe
Valued Contributor II
  • 0 kudos

Just notify your engineers, they'll know how to handle this and won't need more information. It's a no-brainer to fix this.

3 More Replies
ashraf1395
by Honored Contributor
  • 2572 Views
  • 4 replies
  • 0 kudos

Getting an error while using Live.target_table in a DLT pipeline

I have created a target table in the same DLT pipeline, but when I read that table in a different block of the notebook with Live.table_path, it cannot be read. Here is my code: Block 1: Creating a streaming table # Define metadata tables catalog = sp...

Latest Reply
IM_01
Contributor III
  • 0 kudos

Hi @ashraf1395, were you able to add expectations to the append flow table? If so, could you please share the approach?

3 More Replies
Daniel_dlh
by New Contributor II
  • 740 Views
  • 3 replies
  • 0 kudos

Resolved! Declarative Automation Bundles: Replace variables in an SQL file

Hi all, I want to deploy a workflow that has a SQL task. The SQL in this task needs to be parameterized (e.g., the catalog name depends on the environment). I have this so far: In .src/mysql.sql: SELECT * FROM {{ catalog }}.schema.table; And my re...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Daniel_dlh, I recently responded to a similar query on this topic; you may want to check it. At a high level, you're combining two separate mechanisms. Bundle variables (variables: + ${var.name}) are resolved only in the bundle config (YA...

2 More Replies
harshgrewal27
by New Contributor II
  • 827 Views
  • 3 replies
  • 2 kudos

Databricks serverless Queue based on Serverless Environment

In my Databricks Workflows, I had a job that was using Environment 3 of serverless compute with Performance Optimized enabled, because we wanted the job to execute quickly when triggered. There can be around 10-20 concurrent runs, but noticed maybe of ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 2 kudos

Hi @harshgrewal27, did you compare the query profiles of a run in Env 3 vs Env 4? Check for:
• Amount of data being processed per run
• Longest-running stages
This might also explain the execution time difference between Env 3 and Env 4.
Best regards,

2 More Replies
moto-charles
by New Contributor II
  • 1036 Views
  • 2 replies
  • 0 kudos

Resolved! CVE-2023-51385 and CVE-2023-38408 in Runtime 17.3 LTS in Azure Gov Databricks

My org runs Databricks in Azure Gov and recently upgraded from Runtime 17.1 to 17.3 LTS. Around the same time as the upgrade, our security team found 17 CVEs, two of which are related to OpenSSH. We have already contacted Microsoft and they ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @moto-charles, for an authoritative statement on CVE-2023-51385 and CVE-2023-38408 in your specific workspace and region (Azure Gov), the best path is to open a support ticket from your Azure Databricks workspace. That allows Databricks Support an...

1 More Replies
GeKo
by Contributor
  • 964 Views
  • 3 replies
  • 3 kudos

Resolved! How to reliably get the timestamp of the last write/delete activity on a Unity Catalog table

Hi, I'd like to know the best and most reliable way to discover when the data in a Unity Catalog table was last modified (meaning rows were added or deleted). Either Python or SQL is fine, as long as I can use it within a scheduled job to run it period...

Latest Reply
GeKo
Contributor
  • 3 kudos

Many thanks for answering, @SFDataEng and @Louis_Frolio. I'll try the recommendation from Lou first, then methods 3 and 4 from SFDataEng if required.
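One possible approach to the original question (not necessarily the one recommended in the thread, whose details are truncated here) is to read the table's Delta history, for example the rows returned by `DESCRIBE HISTORY <table>`, and take the newest entry whose `operation` actually adds or removes data. The sketch below works on plain dictionaries shaped like those rows. The set of "non-data" operation names is an assumption, the history rows are hypothetical, and history only reaches back as far as the table's log retention.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional

# Assumption: DESCRIBE HISTORY operation names that do not add or remove rows.
# Any operation not listed here is conservatively treated as a data change.
NON_DATA_OPERATIONS = {
    "OPTIMIZE",
    "VACUUM START",
    "VACUUM END",
    "SET TBLPROPERTIES",
    "UNSET TBLPROPERTIES",
    "ADD COLUMNS",
    "CHANGE COLUMN",
    "ADD CONSTRAINT",
    "DROP CONSTRAINT",
}


def last_data_change(history: Iterable[Dict[str, Any]]) -> Optional[datetime]:
    """Return the timestamp of the newest history entry that changed data, or None."""
    latest: Optional[datetime] = None
    for entry in history:
        if entry["operation"] in NON_DATA_OPERATIONS:
            continue  # maintenance/metadata commit: ignore
        if latest is None or entry["timestamp"] > latest:
            latest = entry["timestamp"]
    return latest


# Hypothetical history rows (newest first, as DESCRIBE HISTORY lists them).
history = [
    {"version": 3, "timestamp": datetime(2024, 5, 3, 2, 0), "operation": "OPTIMIZE"},
    {"version": 2, "timestamp": datetime(2024, 5, 2, 9, 30), "operation": "DELETE"},
    {"version": 1, "timestamp": datetime(2024, 5, 1, 8, 0), "operation": "WRITE"},
    {"version": 0, "timestamp": datetime(2024, 5, 1, 7, 0), "operation": "CREATE TABLE"},
]
print(last_data_change(history))  # 2024-05-02 09:30:00
```

Filtering on operation names keeps the check cheap enough for a scheduled job; a stricter variant could additionally inspect `operationMetrics` to skip writes that touched zero rows.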

2 More Replies
Dimitry
by Valued Contributor
  • 901 Views
  • 5 replies
  • 1 kudos

Resolved! How to specify home catalog for table trigger in YAML

I have a job that is triggered by changes in a table. The table is located in the home catalog. I have multiple environments that are not predefined (they are created on the fly). Some process writes into a table, and then the job starts processing this table. I have a...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

@Dimitry Thanks for the detailed question. This is a common challenge when working with Databricks Asset Bundles (DABs) across dynamically provisioned workspaces where the catalog name varies per environment. You are right that there is currently no ...

4 More Replies
397973
by New Contributor III
  • 319 Views
  • 1 replies
  • 1 kudos

Resolved! Is it unusual that I need to start a compute cluster to sync with Git?

I would guess it's unusual, but I want to hear from others before I nag my managers about it. In Databricks (which I access in a web browser), we have a compute cluster specifically for Git; you need to start it to push code or even to change branches. This is separa...

Latest Reply
MoJaMa
Databricks Employee
  • 1 kudos

It means you are on the old classic Git Proxy that helped establish connectivity from the Databricks Control Plane to your on-prem Git Server. If your Git Server was cloud-based you would not need the proxy cluster. That being said, the new way is th...
