Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dbernstein_tp
by New Contributor III
  • 227 Views
  • 3 replies
  • 2 kudos

Lakeflow Connect CDC error, broken links

I get this error about database validation when setting up a Lakeflow Connect CDC pipeline (see screenshot). The two links mentioned in the message are broken; they return a "404 - Content Not Found" when I try to open them.

[Screenshot attached]
Latest Reply
dbernstein_tp
New Contributor III

@Advika Thank you. My reason for this post was to alert the SQL Server ingestion team to this bug in the interface. I will file a report about this (I didn't know I could do that) and a few other issues with the feature that I've found recently.

2 More Replies
sk007
by New Contributor
  • 1128 Views
  • 4 replies
  • 2 kudos

Resolved! Lakeflow Connect - Postgres connector

Hi, I was wondering what the ETA is for the Lakeflow Connect connector for PostgreSQL (even in public/private preview)?

Latest Reply
Louis_Frolio
Databricks Employee

Ask your workspace administrator whether they have disabled access to it. Louis

3 More Replies
ashishCh
by New Contributor II
  • 246 Views
  • 2 replies
  • 1 kudos

Facing CANNOT_OPEN_SOCKET error after job cluster fails to upscale to target nodes

This error pops up in my Databricks workflow 1 out of 10 times, and every time it occurs I see the message below in the event logs: "Compute upsize complete, but below target size. The current worker count is 1, out of a target of 3." And right after this my...

[Screenshots attached]
Latest Reply
iyashk-DB
Databricks Employee

@ashishCh  The [CANNOT_OPEN_SOCKET] failures stem from PySpark’s default, socket‑based data transfer path used when collecting rows back to Python (e.g., .collect(), .first(), .take()), where the local handshake to a JVM‑opened ephemeral port on 127....
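To make the failure mode concrete, here is a minimal sketch (table names are hypothetical, and the `spark` session is the notebook global) of the kind of driver-side collection the reply refers to, and the usual way to shrink it: keep large results on the cluster and collect only small, bounded aggregates.

```python
# Sketch only: table names are hypothetical. Actions like .collect(),
# .first(), and .take() stream rows from the JVM to Python over a local
# socket on an ephemeral port; the handshake can fail while the cluster
# is still resizing, surfacing as CANNOT_OPEN_SOCKET.
df = spark.read.table("main.bronze.events")  # hypothetical source

# Risky on large data: pulls every row through the socket to the driver.
# rows = df.collect()

# Safer: keep the heavy lifting on the cluster, persist the result, and
# collect only a small, bounded sample.
summary = df.groupBy("event_type").count()
summary.write.mode("overwrite").saveAsTable("main.bronze.event_counts")
for row in summary.limit(20).collect():
    print(row["event_type"], row["count"])
```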

1 More Replies
maddan80
by New Contributor II
  • 2431 Views
  • 5 replies
  • 3 kudos

Oracle Essbase connectivity

Team, I wanted to understand the best way of connecting to Oracle Essbase to ingest data into Delta Lake.

Latest Reply
hyaqoob
New Contributor II

I am currently working with Essbase 21c and I need to pull data from Databricks through a SQL query. I was able to successfully set up a JDBC connection to Databricks, but when I try to create a data source using a SQL query, it gives me an error: "[Data...

4 More Replies
Dimitry
by Valued Contributor
  • 266 Views
  • 4 replies
  • 1 kudos

DataFrame from SQL query glitches when grouping - what is going on!?

I have a query with some grouping, and I'm using spark.sql to run it: skus = spark.sql('with cte as (select... group by all) select *, .. from cte group by all'). It displays as a table, as expected. I want to split this table into batches for processing, ...

[Screenshots attached]
Latest Reply
Coffee77
Contributor III

Try this code, customized the way you need: instead of using the monotonically_increasing_id function directly, use row_number over the previous result. This will ensure sequential "small" numbers. This was indeed the exact solution I used to sol...
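A minimal sketch of that pattern, with hypothetical names (a spark.range stand-in replaces the original grouped query): monotonically_increasing_id() is unique but not sequential, so row_number() over it yields dense 1..N ids suitable for batching.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Stand-in for the original grouped query result.
df = spark.range(0, 10_000).toDF("value")

# monotonically_increasing_id() jumps between partitions; row_number()
# over it produces compact, sequential numbers.
df = df.withColumn("mono_id", F.monotonically_increasing_id())
df = df.withColumn("row_num", F.row_number().over(Window.orderBy("mono_id")))

batch_size = 1_000  # hypothetical batch size
batch_1 = df.filter((F.col("row_num") > 0) & (F.col("row_num") <= batch_size))
```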

3 More Replies
Aviraldb
by New Contributor
  • 202 Views
  • 3 replies
  • 0 kudos

Moving files from Volume to Workspace

Hello Team, I am trying to move some files from a volume to the workspace:
%sh databricks fs cp dbfs:/Volumes/workspace/default/delc/generated_scripts/*.py Workspace/Shared/Delc_Project/scripts/
I tried all ways. Please help me to move them. @DataBricks @Louis_Frolio ...

Latest Reply
Prajapathy_NKR
Contributor

@Aviraldb please try the below way:
%sh cp /dbfs/Volumes/workspace/default/delc/generated_scripts/*.py /Workspace/Shared/Delc_Project/scripts/
Hope it helps.
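If the shell route is awkward, a Python sketch along the same lines (paths mirror the %sh example above; dbutils support for file:/Workspace targets depends on your runtime, so treat this as an assumption to verify):

```python
# Copy the generated .py files from the UC volume to workspace files.
src_dir = "dbfs:/Volumes/workspace/default/delc/generated_scripts/"
dst_dir = "file:/Workspace/Shared/Delc_Project/scripts/"

for f in dbutils.fs.ls(src_dir):
    if f.name.endswith(".py"):
        dbutils.fs.cp(f.path, dst_dir + f.name)
```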

2 More Replies
Suheb
by Contributor
  • 301 Views
  • 2 replies
  • 5 kudos

Resolved! What strategies have you found most effective for optimizing ETL pipelines built on the Databricks Lakehouse?

If you are building data pipelines in Databricks (where data is Extracted, Transformed, and Loaded), what tips, methods, or best practices do you use to make those pipelines run faster, cheaper, and more efficiently?

Latest Reply
bianca_unifeye
Contributor

When I think about optimising ETL on the Databricks Lakehouse, I split it into four layers: data layout, Spark/SQL design, platform configuration, and operational excellence. And above all: you are not building pipelines for yourself, you are building...
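As one concrete illustration of the "data layout" and "platform configuration" layers, a hedged sketch (table and column names are hypothetical):

```python
# Data layout: compact small files and co-locate rows on columns that are
# frequently filtered together.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (order_date, customer_id)")

# Platform configuration: Adaptive Query Execution is on by default in
# recent runtimes, but it is worth confirming on older clusters.
spark.conf.set("spark.sql.adaptive.enabled", "true")
```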

1 More Replies
Naveenkumar1811
by New Contributor III
  • 316 Views
  • 4 replies
  • 2 kudos

Resolved! SkipChangeCommit to True Scenario on Data Loss Possibility

Hi Team, I have the below scenario: I have a Spark Streaming job with a processing-time trigger of 3 seconds, running continuously 365 days a year. We are performing a weekly delete job on the source of this streaming job based on a custom retention policy. It is a D...
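For reference, a minimal sketch of the option the title refers to (table names and checkpoint path are hypothetical; the trigger mirrors the scenario above). Per the Delta docs, skipChangeCommits makes the stream ignore commits that update or delete existing rows, while append-only commits are still processed:

```python
stream = (
    spark.readStream
    .option("skipChangeCommits", "true")   # ignore delete/update commits
    .table("main.raw.events")              # hypothetical source table
)

query = (
    stream.writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/events")
    .trigger(processingTime="3 seconds")   # matches the 3-second trigger above
    .toTable("main.bronze.events")         # hypothetical target; starts the query
)
```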

Latest Reply
Naveenkumar1811
New Contributor III

Hi Szymon/Raman, my question was about the commit it performs for the insert/append via my streaming versus the delete operation by the weekly maintenance job... Is there a way that both transactions would fall into the same commit? I need to understand that por...

3 More Replies
Shivaprasad
by Contributor
  • 141 Views
  • 1 reply
  • 0 kudos

In a Databricks custom app, how can I retrieve Genie parameters and use them in the app?

I have created a Databricks custom app and it is working. I need to pass parameters from Genie to the custom app. Can someone suggest how I can achieve this?

Latest Reply
stbjelcevic
Databricks Employee

You can pass values between a Genie space and your Databricks App using the Genie Conversation API and by adding the Genie space as an app resource: https://docs.databricks.com/aws/en/dev-tools/databricks-apps/genie Do you want the parameters to orig...
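A rough sketch of the API route, assuming the app authenticates via the Databricks SDK and the Genie space id is exposed to the app as a resource (the env var name is hypothetical, and the method names are worth verifying against the current SDK docs):

```python
import os
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up the app's credentials
space_id = os.environ["GENIE_SPACE_ID"]  # hypothetical: wired via app resource

# Start a Genie conversation and wait for the reply.
message = w.genie.start_conversation_and_wait(space_id, "What were sales last week?")
print(message.content)
```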

hobrob
by New Contributor
  • 182 Views
  • 2 replies
  • 0 kudos

UDFs for working with date ranges

Hi bricklayers, originally from a Teradata background and relatively new to Databricks, I was in need of brushing up on my Python and GitHub CI/CD skills, so I've spun up a repo for a project I'm calling Terabricks. The aim is to provide a space for mak...
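On the date-range theme, one pattern worth capturing in such a repo is expanding a start/end range into one row per day, roughly what Teradata's EXPAND ON does. A sketch with hypothetical column names, using built-in functions rather than a Python UDF:

```python
from pyspark.sql import functions as F

ranges = (
    spark.createDataFrame(
        [("a", "2024-01-01", "2024-01-03")],
        ["id", "start_date", "end_date"],
    )
    .withColumn("start_date", F.to_date("start_date"))
    .withColumn("end_date", F.to_date("end_date"))
)

# One row per day in each range.
days = ranges.withColumn(
    "day",
    F.explode(F.sequence("start_date", "end_date", F.expr("interval 1 day"))),
)
days.show()
```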

Latest Reply
Raman_Unifeye
Contributor III

Fantastic initiative @hobrob. I used Teradata for a good 5+ years, though pre-2014/15, so I will be closely following this and am very happy to contribute to it. Thanks.

1 More Replies
oye
by New Contributor II
  • 290 Views
  • 4 replies
  • 3 kudos

Resolved! Using a cluster of type SINGLE_USER to run parallel python tasks in one job

Hi, I have set up a job with multiple Spark Python tasks running in parallel. I have only set up one job cluster: single node, data security mode SINGLE_USER, using Databricks Runtime version 14.3.x-scala2.12. These parallel Spark Python tasks share so...

Latest Reply
Raman_Unifeye
Contributor III

@oye - The variable scope is local to the individual task and does not interfere with other tasks, even if the underlying cluster is the same. In fact, the issue is normally the other way round: if we have to share a variable across tasks, then the solu...
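The truncated solution likely refers to job task values, the standard mechanism for passing small values between tasks in a job. A minimal sketch (the task key and values are hypothetical):

```python
# In the upstream task: publish a small value for downstream tasks.
dbutils.jobs.taskValues.set(key="batch_date", value="2025-11-25")

# In a downstream task: read it back; debugValue is used when running
# the notebook interactively outside a job.
batch_date = dbutils.jobs.taskValues.get(
    taskKey="upstream_task",  # hypothetical upstream task key
    key="batch_date",
    default="1970-01-01",
    debugValue="1970-01-01",
)
```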

3 More Replies
carolsun08
by New Contributor
  • 990 Views
  • 2 replies
  • 0 kudos

Repairing a job is useful, but disabled if the job is triggered by another Run Job task

Hi, I regularly use the repair job option to rerun a subset of a larger data pipeline job. However, this option is disabled if I start this job from another "orchestration job" via Run Job tasks. The "repair" button is disabled in my case. This restr...

Latest Reply
Brahmareddy
Esteemed Contributor

Hi @carolsun08, how are you doing today? Yeah, it's a bit frustrating that the repair job option is disabled when running jobs through Run Job tasks in an orchestration job. While there's no official confirmation on when this might change, it would d...

1 More Replies
YS1
by Contributor
  • 295 Views
  • 4 replies
  • 2 kudos

Resolved! Impact of Updating DAB Root Path on Databricks Job Run History

Hello, I'm using Databricks Asset Bundles (DABs) to orchestrate several workflows, and I'd like to update the root path where the bundle is stored. Before making this change, I want to understand its impact: will changing the bundle root path remove or ...

Latest Reply
Coffee77
Contributor III

As long as you keep the same job key name in the DAB, the root folder should not have any impact. History is associated with the job irrespective of whether it was deployed via DAB or updated manually. KR.

3 More Replies
200649021
by New Contributor
  • 173 Views
  • 0 replies
  • 0 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...
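A minimal sketch matching that description (the volume path and schema are hypothetical):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("airport_code", StringType()),
    StructField("name", StringType()),
    StructField("country", StringType()),
])

# Stream new CSV files as they land in the volume.
stream = (
    spark.readStream
    .schema(schema)
    .option("header", "true")
    .csv("/Volumes/main/default/airports/")  # hypothetical volume path
)

counts = stream.groupBy("country").agg(F.count("*").alias("airport_count"))

query = (
    counts.writeStream
    .outputMode("complete")
    .format("memory")
    .queryName("airport_counts")
    .start()
)
```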

dpc
by Contributor
  • 463 Views
  • 6 replies
  • 9 kudos

Resolved! Disabling a task in a Databricks job

Hello, I have jobs that perform a number of activities. Sometimes I want to disable one or more of these activities. I can do that easily in an app like ADF. Reading around, I cannot find an easy way to do this, although what I've read suggests that it was...

Latest Reply
dpc
Contributor

Thanks. I like the UI option. It's not a permanent disable but it will be good enough

5 More Replies
