Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ManojkMohan
by Honored Contributor II
  • 315 Views
  • 6 replies
  • 4 kudos

Resolved! Accessing Databricks data in Salesforce via zero copy

I have uploaded clickstream data as shown below. Do I have to mandatorily share via Delta Sharing for values to be exposed in Salesforce? At the Salesforce end I have confirmed that I have a working connector where I am able to see sample data, but u...

Latest Reply
Rash_Databrick
  • 4 kudos

Hi Team, please help me. My task is to connect Databricks and Salesforce Data Cloud with zero copy, where we need Databricks data in Salesforce Data Cloud. Also, just to mention, my Databricks workspace + ADLS storage is on a private endpoint. Any hel...
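If Delta Sharing does turn out to be the required path, the Unity Catalog side is only a few statements. A minimal sketch, assuming hypothetical catalog/table/recipient names (the recipient setup for the Salesforce Data Cloud zero-copy connector may differ):

    # Sketch only; names are hypothetical and assume Unity Catalog is enabled
    spark.sql("CREATE SHARE IF NOT EXISTS clickstream_share")
    spark.sql("ALTER SHARE clickstream_share ADD TABLE main.web.clickstream")

    # Open-sharing recipient; the activation credentials are exchanged out of band
    spark.sql("CREATE RECIPIENT IF NOT EXISTS salesforce_data_cloud")
    spark.sql("GRANT SELECT ON SHARE clickstream_share TO RECIPIENT salesforce_data_cloud")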

  • 4 kudos
5 More Replies
kyeongmin_baek
by New Contributor II
  • 77 Views
  • 4 replies
  • 1 kudos

AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE when starting SQL Server Ingestion pipeline

Dear Community, I'm seeing a compute error when running a Databricks ingestion pipeline (Lakeflow managed ingestion) on AWS. Cloud: AWS. Region: ap-northeast-2. Source: SQL Server ingestion pipeline. When I start the ingestion pipeline, it fails with the f...

Latest Reply
kyeongmin_baek
New Contributor II
  • 1 kudos

Thank you for your response. I have an additional question. When creating a SQL Server ingestion pipeline using the Databricks Connector, is it possible to edit the compute instance type settings? I am currently configuring this in the Databricks UI, bu...

  • 1 kudos
3 More Replies
dikla
by Visitor
  • 53 Views
  • 3 replies
  • 1 kudos

Resolved! Issues Creating Genie Space via API: Join Specs Are Not Persisted

Hi, I'm experimenting with the new API to create a Genie Space. I'm able to successfully create the space, but the join definitions are not created, even though I'm passing a join_specs object in the same format returned by GET /spaces/{id} for an exis...

Latest Reply
dikla
Visitor
  • 1 kudos

@Raman_Unifeye Thanks for the detailed explanation; that really helps clarify why my join specs weren't being persisted. Do you know if support for persisting join_specs, sql_snippets, and measures via the API is planned for an upcoming...

  • 1 kudos
2 More Replies
dvd_lg_bricks
by New Contributor
  • 103 Views
  • 6 replies
  • 3 kudos

Questions About Workers and Executors Configuration in Databricks

Hi everyone, sorry, I'm new here. I'm considering migrating to Databricks, but I need to clarify a few things first. When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of execut...

Latest Reply
dvd_lg_bricks
New Contributor
  • 3 kudos

I mean: while we're at it @szymon_dybczak or @Raman_Unifeye, is there a place where all available Databricks configuration parameters are documented? I have some pipelines that rely on special settings, such as changing the serializer, enabling Apac...
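For the specific settings mentioned, a minimal sketch of where they typically go, assuming the standard open-source Spark config keys (the exhaustive list lives in the Apache Spark configuration docs, with Databricks-specific ones in the cluster Spark config reference):

    # Session-scoped SQL configs can be set at runtime from a notebook
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    # JVM-level settings such as the serializer cannot be changed once the cluster is running;
    # they belong in the cluster's Spark config (UI or Clusters API), for example:
    #   spark.serializer org.apache.spark.serializer.KryoSerializer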

  • 3 kudos
5 More Replies
Richard3
by New Contributor
  • 135 Views
  • 4 replies
  • 3 kudos

IDENTIFIER in SQL Views not supported?

Dear community, we are phasing out the dollar param `${catalog_name}` because it has been deprecated since runtime 15.2. We use this parameter in many queries, and it should now be replaced by the IDENTIFIER clause. In the query below, where we retrieve data...

Latest Reply
mnorland
Valued Contributor
  • 3 kudos

There are two options you may want to consider: (1) switch from views to SQL UDTFs in certain cases, or (2) for each session, dynamically recreate the view using CREATE VIEW via EXECUTE IMMEDIATE or via Python string templating, as in the sketch below:
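A minimal sketch of the second option, with hypothetical catalog, schema, and view names; the catalog comes from a widget or job parameter instead of the deprecated ${catalog_name}:

    # Hypothetical names throughout; sketch only
    dbutils.widgets.text("catalog_name", "dev")
    catalog_name = dbutils.widgets.get("catalog_name")

    # Python string templating
    spark.sql(f"""
        CREATE OR REPLACE VIEW {catalog_name}.reporting.v_orders AS
        SELECT * FROM {catalog_name}.sales.orders
    """)

    # EXECUTE IMMEDIATE, keeping the statement text in a session variable
    spark.sql("DECLARE OR REPLACE VARIABLE stmt STRING")
    spark.sql(
        f"SET VAR stmt = 'CREATE OR REPLACE VIEW {catalog_name}.reporting.v_orders "
        f"AS SELECT * FROM {catalog_name}.sales.orders'"
    )
    spark.sql("EXECUTE IMMEDIATE stmt")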

  • 3 kudos
3 More Replies
prashant151
by New Contributor II
  • 70 Views
  • 1 replies
  • 1 kudos

Using Init Script to execute Python notebook at all-purpose cluster level

Hi, we have setup.py in my Databricks workspace. This script is executed in other transformation scripts using %run /Workspace/Common/setup.py, which consumes a lot of time. This setup.py internally calls other utility notebooks using %run: %run /Workspace/Co...

Latest Reply
Raman_Unifeye
Contributor III
  • 1 kudos

@prashant151 - Unlike legacy (pre-UC) clusters, you cannot directly run a Databricks notebook (like setup.py) from a cluster init script, because init scripts only support shell commands, not %run or notebook execution. You will need to refactor your...
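One common refactor, sketched under the assumption that the shared logic can live in plain .py modules rather than notebooks: keep the helpers as workspace files and import them once, instead of chaining %run calls. The module and function names below are hypothetical.

    import sys

    # Workspace files are importable once their folder is on sys.path (recent DBR versions)
    sys.path.append("/Workspace/Common")

    import setup_lib  # hypothetical module replacing the setup.py notebook

    # A plain import is far cheaper than a chain of %run notebook executions
    setup_lib.create_widgets(dbutils)
    setup_lib.apply_spark_confs(spark)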

  • 1 kudos
venkatesh557
by Visitor
  • 37 Views
  • 1 replies
  • 0 kudos

Is there a supported method to register a custom PySpark DataSource so that it becomes visible in the Ingestion UI?

I built a custom connector using the PySpark DataSource API (DataSource V2). The connector works programmatically, but it does not appear in the Databricks Ingestion UI (Add Data → Connectors) the way the Salesforce connector does. Is there a supported method t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @venkatesh557, unfortunately the answer is no - there isn't a supported way for you to "register" an arbitrary PySpark DataSource V2 so that it appears as a tile in the Databricks Add data → Connectors (Ingestion) UI right now.
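For completeness, the programmatic registration path (which makes the source usable from code, though not as an Ingestion UI tile) looks roughly like this. A sketch assuming a DBR version with the Python Data Source API available; the source name and toy rows are hypothetical.

    from pyspark.sql.datasource import DataSource, DataSourceReader

    class MyReader(DataSourceReader):
        def read(self, partition):
            # Yield tuples matching the declared schema (toy data for the sketch)
            yield (1, "a")
            yield (2, "b")

    class MyDataSource(DataSource):
        @classmethod
        def name(cls):
            return "my_custom_source"  # hypothetical format name

        def schema(self):
            return "id INT, value STRING"

        def reader(self, schema):
            return MyReader()

    # Session-scoped registration; the source will not show up in the Add data UI
    spark.dataSource.register(MyDataSource)
    df = spark.read.format("my_custom_source").load()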

  • 0 kudos
tak0519
by New Contributor II
  • 258 Views
  • 6 replies
  • 6 kudos

Resolved! How can I pass parameters from DABs to something (like notebooks)?

I'm implementing DABs, Jobs, and Notebooks. For configuration management, I set parameters in databricks.yml, but I can't get the parameters in the notebook after executing a job successfully. What I implemented and steps to the issue: created "dev-catalog" in the web U...

Latest Reply
Taka-Yayoi
Databricks Employee
  • 6 kudos

Hi @tak0519, I think I found the issue! Don't worry - your DABs configuration looks correct. The problem is actually about how you're verifying the results, not the configuration itself. What's happening: in your last comment, you mentioned: "Manuall...
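For reference, the notebook-side plumbing usually looks like the sketch below. It assumes a hypothetical parameter named catalog that the bundle passes through the notebook task's base_parameters in databricks.yml; job runs receive the bundle-supplied value, while the default only applies to interactive runs.

    # Hypothetical parameter name; declared in databricks.yml under the task, e.g.
    #   notebook_task:
    #     base_parameters:
    #       catalog: ${var.catalog}
    dbutils.widgets.text("catalog", "dev-catalog")  # default for interactive use
    catalog = dbutils.widgets.get("catalog")        # DAB/job-supplied value at run time

    spark.sql(f"USE CATALOG `{catalog}`")
    print(f"Using catalog: {catalog}")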

  • 6 kudos
5 More Replies
anhnnguyen
by New Contributor II
  • 143 Views
  • 6 replies
  • 2 kudos

Materialized view always loads the full table instead of incremental

My Delta tables are stored in HANA data lake files and I have an ETL configured like below: @dp.materialized_view(temporary=True) def source(): return spark.read.format("delta").load("/data/source") @dp.materialized_view def sink(): return spark.re...

Latest Reply
anhnnguyen
New Contributor II
  • 2 kudos

One more note: I'm not using Unity Catalog here; not sure if it's relevant.
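That is likely relevant, as an assumption on my side: incremental refresh of materialized views generally requires the inputs to be Unity Catalog Delta tables whose changes the refresh planner can track, and reading straight from a file path with spark.read.format("delta").load(...) tends to force a full recompute. A sketch of referencing the source by table name instead, with hypothetical names and an import path that may differ by runtime:

    # Sketch only; assumes the path is first registered as a UC table, e.g.
    #   CREATE TABLE main.bronze.source USING DELTA LOCATION '/data/source'
    from pyspark import pipelines as dp  # Lakeflow Declarative Pipelines Python API (assumed import path)

    @dp.materialized_view
    def sink():
        # A governed table reference gives the planner lineage it can use for incremental refresh
        return spark.read.table("main.bronze.source")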

  • 2 kudos
5 More Replies
GANAPATI_HEGDE
by New Contributor III
  • 225 Views
  • 3 replies
  • 0 kudos

Unable to configure custom compute for DLT pipeline

I am trying to configure a cluster for the pipeline as shown above; however, DLT keeps using the small cluster as usual. How do I resolve this?

Latest Reply
GANAPATI_HEGDE
New Contributor III
  • 0 kudos

I updated my CLI and deployed the job; I still don't see the cluster updates in the pipeline.

  • 0 kudos
2 More Replies
hgm251
by New Contributor II
  • 304 Views
  • 3 replies
  • 3 kudos

badrequest: cannot create online table is being deprecated. creating new online table is not allowed

Hello! This seems so sudden that we cannot create online tables anymore. Is there a workaround to be able to create online tables temporarily, as we need more time to move to synced tables? #online_tables

Latest Reply
nayan_wylde
Esteemed Contributor
  • 3 kudos

Yes, the Databricks online tables (legacy) are being deprecated, and after January 15, 2026, you will no longer be able to access or create them. See https://docs.databricks.com/aws/en/machine-learning/feature-store/migrate-from-online-tables. Here are a few ...

  • 3 kudos
2 More Replies
pooja_bhumandla
by New Contributor III
  • 218 Views
  • 3 replies
  • 1 kudos

Best Practice for Updating Data Skipping Statistics for Additional Columns

Hi Community, I have a scenario where I've already calculated Delta statistics for the first 32 columns after enabling the data skipping property. Now I need to include 10 more frequently used columns that were not part of the original 32. Goal: I want ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @pooja_bhumandla, updating either of the two options below does not automatically recompute statistics for existing data. Rather, it affects how statistics are collected when adding or updating data in the table going forward: - delta.dataSkippingNumInd...
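For existing data, a minimal sketch of the usual sequence, with a hypothetical table and column list; the ANALYZE step is what backfills file-level statistics for data written before the property change (available on recent DBR versions):

    # Hypothetical table and columns; dataSkippingStatsColumns pins stats collection
    # to an explicit column list instead of the first N columns
    spark.sql("""
        ALTER TABLE main.sales.events
        SET TBLPROPERTIES (
            'delta.dataSkippingStatsColumns' = 'event_ts,customer_id,country,channel'
        )
    """)

    # Recompute statistics for existing files so the newly listed columns are covered
    spark.sql("ANALYZE TABLE main.sales.events COMPUTE DELTA STATISTICS")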

  • 1 kudos
2 More Replies
absan
by Contributor
  • 83 Views
  • 4 replies
  • 6 kudos

How to integrate a unique PK expectation into an LDP pipeline graph

Hi everyone, I'm working on an LDP and need help ensuring a downstream table only runs if a primary-key uniqueness validation check passes. In something like dbt this is very easy to configure, but with LDP it seems to require creating a separate view. Addi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

I know your solution is quite popular (I just don't get the SELECT MAX(load_date) part). Another option is to use AUTO CDC even if you don't have CDC, as there is a KEY option. If MAX(load_date) means that the last snapshot is the most essential thing for you, please check...
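For the separate-view approach mentioned earlier in the thread, a minimal sketch with hypothetical dataset and key names, written against the dlt module: the check dataset produces one row per key, and a failing expectation aborts the update, so downstream tables never materialize on top of duplicated keys.

    import dlt
    from pyspark.sql import functions as F

    @dlt.view(name="orders_pk_check")
    @dlt.expect_or_fail("pk_is_unique", "row_count = 1")
    def orders_pk_check():
        # One row per key; any key appearing more than once fails the whole update
        return (
            dlt.read("orders_silver")  # hypothetical upstream dataset
            .groupBy("order_id")
            .agg(F.count(F.lit(1)).alias("row_count"))
        )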

  • 6 kudos
3 More Replies
hidden
by New Contributor II
  • 81 Views
  • 3 replies
  • 0 kudos

replicate the behaviour of DLT create auto cdc flow

I want to custom-write the behaviour of the DLT create auto CDC flow. How can we do it?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

And you need to handle dozens of exceptions, such as late-arriving data, duplicate data, data in the wrong order, etc.
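A minimal sketch of what a hand-rolled replacement tends to look like, using Structured Streaming foreachBatch plus a Delta MERGE; table names, the op column, and the sequence column are hypothetical. Note it only deduplicates within a micro-batch, which is exactly where the edge cases above (late or out-of-order events across batches) start to bite.

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    def apply_cdc_batch(batch_df, batch_id):
        # Keep only the latest event per key within this batch
        latest = (
            batch_df
            .withColumn("rn", F.row_number().over(
                Window.partitionBy("id").orderBy(F.col("sequence_num").desc())))
            .filter("rn = 1")
            .drop("rn")
        )
        target = DeltaTable.forName(spark, "main.silver.target")  # hypothetical target table
        (target.alias("t")
            .merge(latest.alias("s"), "t.id = s.id")
            .whenMatchedDelete(condition="s.op = 'DELETE'")
            .whenMatchedUpdateAll(condition="s.op != 'DELETE'")
            .whenNotMatchedInsertAll(condition="s.op != 'DELETE'")
            .execute())

    (spark.readStream.table("main.bronze.cdc_events")  # hypothetical CDC feed
        .writeStream
        .foreachBatch(apply_cdc_batch)
        .option("checkpointLocation", "/Volumes/main/ops/checkpoints/cdc_target")
        .start())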

  • 0 kudos
2 More Replies
