Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shubham007
by Databricks Partner
  • 1665 Views
  • 9 replies
  • 2 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error while import)

Hi community experts, I am getting the error "cannot import name 'recon' from 'databricks.labs.lakebridge.reconcile.execute'" when importing modules, as shown in the attached screenshot. I am following the steps in your partner training module "Lakebridge f...

error_recon.png
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @shubham007, they refactored that module last month, which is why it stopped working. The Lakebridge for SQL Source System Migration module was probably recorded before that change. And why did they make the change? It is explained here: Split recon...

8 More Replies
Dimitry
by Valued Contributor
  • 2833 Views
  • 11 replies
  • 3 kudos

Resolved! Unreliable file events on Azure Storage (SFTP) for job trigger

Hi all, I have a job triggered by a file event on an external location. The location and job triggers work fine when uploading a file via the Azure Portal. I need an SFTP trigger, so I went into Event Grid, found the subscription for the storage account on ...

Dimitry_2-1756857231122.png Dimitry_1-1756857151591.png
Latest Reply
Dimitry
Valued Contributor
  • 3 kudos

Update: it appears that even uploading via the UI no longer triggers it. It did trigger weeks ago. I have just uploaded a file in the UI and saw this message in the storage queue: {"topic":"/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Storage/st...

10 More Replies
shubham007
by Databricks Partner
  • 640 Views
  • 1 reply
  • 0 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error in Data and Schema Validation)

Hi community experts, I am getting an error during Data and Schema Validation with the Reconciler, as shown in the attached screenshots. Please help resolve this issue. Output:

shubham007_0-1756969442961.png shubham007_1-1756969493574.png shubham007_0-1756969649236.png shubham007_1-1756969678992.png
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @shubham007, as stated in the other thread, I think this error could be related to a misconfiguration on your side. Lakebridge is trying to find the following table in your SQL Server instance -> None.SalesLT.customer. But look at which database the reconciliat...

Nabbott
by New Contributor
  • 1861 Views
  • 1 reply
  • 2 kudos

Databricks Genie

I have curated silver and gold tables in Advana that feed downstream applications. Other organizations also create tables for their own use. Can Databricks Genie query across tables from different pipelines within the same organization and across mul...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Can you explain the landscape a bit more? The term "pipelines" means something specific in Databricks. You mention "across multiple organizations": what does that mean? Are you using Unity Catalog? Are all the tables/data in Unity? Please elab...

yinan
by New Contributor III
  • 1096 Views
  • 4 replies
  • 4 kudos

Resolved! Does the free version of Databricks not support external storage data sources?

1. Can the data I use with the free version of Databricks on Azure only be stored on Azure, AWS, or Google Cloud Storage? 2. Assuming the network is connected, can the paid version be used to access other publicly stored data (i.e., custom storage spac...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

Not sure if this is a cheeky way to get around bringing files in: https://community.databricks.com/t5/data-engineering/connect-to-azure-data-lake-storage-using-databricks-free-edition/m-p/127900#M48116 but I answered a similar thing on a different po...

3 More Replies
ManoramTaparia
by New Contributor II
  • 753 Views
  • 1 reply
  • 1 kudos

Identify updated rows during incremental refresh in DLT Materialized Views

Hello, every time I run a Delta Live Tables materialized view on serverless, I get a log of "COMPLETE RECOMPUTE". I realised I was using current_timestamp as a column in my MV to identify rows that got updated in the last refresh, but that make...

Latest Reply
ck7007
Contributor II
  • 1 kudos

@ManoramTaparia The issue is that current_timestamp() makes your MV non-deterministic, forcing complete recomputes. Here's how to fix it. Solution: use the source table's change tracking. Option 1: leverage the source table's timestamp column with @dlt.table(name...
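The fix this reply points at (replacing current_timestamp() with the source table's own timestamp column) boils down to high-water-mark logic. A minimal plain-Python sketch of that idea, with illustrative row shapes rather than the actual DLT API:

```python
# Sketch of the high-water-mark idea behind the fix: flag rows as "updated"
# by comparing the source table's own timestamp column against the previous
# refresh's high-water mark, instead of stamping every row with
# current_timestamp() (which makes the materialized view non-deterministic
# and forces complete recomputes). Row shapes are illustrative, not DLT.
def updated_since(rows, high_water_mark):
    """rows: iterable of (key, source_updated_at) pairs."""
    return [row for row in rows if row[1] > high_water_mark]

rows = [("a", 5), ("b", 12), ("c", 9)]
print(updated_since(rows, 8))  # [('b', 12), ('c', 9)]
```

Because the result depends only on the source data and the stored mark, the same inputs always produce the same output, which is the property an incrementally refreshable MV needs.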

yinan
by New Contributor III
  • 1260 Views
  • 5 replies
  • 2 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 2 kudos

Hello @yinan, good day! Databricks, being a cloud-based platform, does not have direct built-in support for reading data from a truly air-gapped (completely offline, no network connectivity) Cloudera Distribution for Hadoop (CDH) environment. In such...

4 More Replies
Kurgod
by New Contributor II
  • 534 Views
  • 2 replies
  • 0 kudos

Using Databricks to transform a Cloudera lakehouse on-prem without bringing the data to the cloud

I am looking for a solution to connect Databricks to a Cloudera lakehouse hosted on-prem and transform the data using Databricks without bringing the data into Databricks Delta tables or cloud storage. Once the transformation is done, the data needs to be ...

Latest Reply
BR_DatabricksAI
Databricks Partner
  • 0 kudos

Hello, what is your data volume? You can connect using JDBC/ODBC, but this process will be slow if the data volume is too high. Alternatively, if your Cloudera storage is in HDFS, you can also connect through the HDFS API.
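For the JDBC route the reply mentions, a hedged sketch of how the connection options might be assembled for an on-prem Hive endpoint. The host, port, database, and driver class are assumptions for illustration; check your cluster's actual JDBC URL and make sure the matching driver jar is installed on the Databricks cluster.

```python
# Sketch: Spark JDBC options for an on-prem Cloudera Hive endpoint.
# All connection details below are placeholders, not values from the thread.
def cloudera_jdbc_options(host, port, database, user, password):
    return {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "user": user,
        "password": password,
    }

opts = cloudera_jdbc_options("cdh-gw.internal", 10000, "sales", "etl_user", "***")
# On a Databricks cluster you would then read with something like:
# df = spark.read.format("jdbc").options(**opts, dbtable="sales.orders").load()
```

As the reply notes, pulling large volumes this way is slow; JDBC reads are single-stream unless you also configure partitioning options.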

1 More Replies
azam-io
by Databricks Partner
  • 1434 Views
  • 4 replies
  • 2 kudos

How can I structure pipeline-specific job params separately in a Databricks Asset Bundle?

Hi all, I am working with Databricks Asset Bundles and want to separate environment-specific job params (for example, for "env" and "dev") for each pipeline within my bundle. I need each pipeline to have its own job param values for different environ...

Latest Reply
Michał
New Contributor III
  • 2 kudos

Hi azam-io, were you able to solve your problem? Are you trying to have different parameters depending on the environment, or different parameter values? I think targets would allow you to specify different parameters per environment/target. As fo...
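One way to express "different job param values per target", as the reply suggests, is to override the job's parameters under each entry in `targets`. A hypothetical sketch; the job name `my_pipeline_job` and the `env` parameter are illustrative, not from the thread:

```yaml
# databricks.yml (sketch): per-target overrides for a job parameter.
bundle:
  name: my_bundle

targets:
  dev:
    resources:
      jobs:
        my_pipeline_job:
          parameters:
            - name: env
              default: dev
  prod:
    resources:
      jobs:
        my_pipeline_job:
          parameters:
            - name: env
              default: prod
```

Deploying with `databricks bundle deploy -t dev` would then pick up the dev values; bundle variables with per-target overrides are another option when many jobs share the same parameter.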

3 More Replies
seefoods
by Valued Contributor
  • 2839 Views
  • 2 replies
  • 1 kudos

Resolved! Asset bundles

Hello guys, I am working on asset bundles and want to make them generic for all teams (analytics, data engineering). Could someone share best practices for this purpose? Cordially,

Latest Reply
Michał
New Contributor III
  • 1 kudos

Hi seefoods, were you able to achieve that generic asset bundle setup? I've been working on something potentially similar, and I'd be happy to discuss it and share experiences. While what I have works for a few teams, it is focused on declar...

1 More Replies
korijn
by New Contributor II
  • 1456 Views
  • 4 replies
  • 0 kudos

Git integration inconsistencies between git folders and job git

It's a little confusing and limiting that Git integration support is inconsistent between the two options available. Sparse checkout is only supported when using a workspace Git folder, and checking out by commit hash is only supported when using ...

Latest Reply
_J
Databricks Partner
  • 0 kudos

Same here, this could be a good improvement for the jobs layer!

3 More Replies
IONA
by New Contributor III
  • 1983 Views
  • 6 replies
  • 7 kudos

Resolved! Getting data from the Spark query profiler

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 7 kudos

Hi @IONA, as @Louis_Frolio correctly suggested, there is no native way to get stats from the JDBC/ODBC Spark UI. 1. You can try the query history system table, but it has a limited number of metrics: %sql SELECT * FROM system.query.history 2. You can use /a...
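Building on the reply's first suggestion, a sketch of a more targeted pull from the query history system table. This assumes Unity Catalog system tables are enabled in the workspace, and the column names below are assumptions for illustration; check your workspace's actual schema with `DESCRIBE system.query.history` first.

```python
# Sketch: rank recent statements by duration from the query history
# system table mentioned above. Column names are assumptions; verify
# against DESCRIBE system.query.history in your workspace.
history_sql = """
SELECT statement_id,
       executed_by,
       total_duration_ms
FROM system.query.history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY total_duration_ms DESC
LIMIT 20
"""

# Inside a Databricks notebook:
# display(spark.sql(history_sql))
```

Note this table covers SQL warehouse/statement history; per-session JDBC/ODBC grid data from the Spark UI itself is only reachable through the UI or its REST endpoints, as the rest of the reply goes on to describe.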

5 More Replies
Yulei
by New Contributor III
  • 34512 Views
  • 7 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I have been seeing the issue "Could not reach driver of cluster <some_id>" with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
osingh
Contributor
  • 1 kudos

It seems like a temporary connectivity or cluster initialization glitch, so if anyone else runs into this, try re-running the job before diving into deeper troubleshooting; it might just work! Hope this helps someone save time.

6 More Replies
ChristianRRL
by Honored Contributor
  • 1011 Views
  • 1 reply
  • 1 kudos

Resolved! Can schemaHints dynamically handle nested json structures? (Part 2)

Hi there, I'd like to follow up on a prior post: https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures/m-p/130209/highlight/true#M48731. Basically I'm wondering what's the best way to set *both* d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I am not aware of schemaHints supporting wildcards for now. It would be awesome to have, though, I agree. So I think you are stuck with what was already proposed in your previous post, or with exploding the JSON or other transformations.
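Since wildcards aren't supported, each nested field has to be hinted by its explicit dotted path. A sketch of the Auto Loader option set this implies; the source path and the `payload.metrics.value` field are illustrative, not from the thread:

```python
# Sketch: Auto Loader options combining inferColumnTypes (so most columns
# are still inferred) with an explicit schema hint for one nested field
# addressed by its full dotted path. Field and path names are illustrative.
def autoloader_options(hints):
    return {
        "cloudFiles.format": "json",
        "cloudFiles.inferColumnTypes": "true",
        "cloudFiles.schemaHints": hints,
    }

opts = autoloader_options("payload.metrics.value DOUBLE")
# df = (spark.readStream.format("cloudFiles")
#         .options(**opts)
#         .load("/Volumes/main/raw/events/"))
```

Multiple hints can be comma-separated in the same string, but every nested field must be spelled out individually, which is exactly the limitation the reply describes.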

minhhung0507
by Valued Contributor
  • 778 Views
  • 1 reply
  • 1 kudos

Could not reach driver of cluster

I am running a pipeline job in Databricks and it failed with the following message: "Run failed with error message Could not reach driver of cluster 5824-145411-p65jt7uo". This message is not very descriptive, and I am not able to identify the root ca...

minhhung0507_0-1756870994085.png
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @minhhung0507, typically this error appears when there is high load on the driver node. Another reason could be high garbage collection on the driver node, as well as high memory and CPU usage that leads to throttling and prevents the driv...
