Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nmtc9to5
by New Contributor II
  • 70 Views
  • 2 replies
  • 0 kudos

Trigger a full refresh in a Lakeflow Connect pipeline with a job

Hello everyone, I'm implementing a Lakeflow Connect pipeline that is orchestrated by a Lakeflow job. My question is: is there a way to configure a job parameter so that when it receives the value "true", a full refresh is performed in the pipeline and ...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can solve this with conditional tasks in the job. Use If/Else conditional tasks in the main job configuration to run a full or incremental refresh based on the parameter. This lets you route to different pipeline tasks based on the parameter va...

  • 0 kudos
1 More Replies
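
As a rough sketch of the If/Else routing described in the reply above, the task list below branches on a job parameter. The parameter name `full_refresh`, the task keys, and the pipeline ID placeholder are hypothetical. The field names (`condition_task`, `pipeline_task`, `full_refresh`, `depends_on` with `outcome`) follow the Jobs API task schema as I understand it, so check them against the current documentation and confirm that your Lakeflow Connect pipeline accepts a full refresh triggered this way.

```python
import json


def build_refresh_tasks(pipeline_id: str) -> list[dict]:
    """Build job task definitions that branch on a `full_refresh` job parameter.

    All task keys and the parameter name are illustrative, not taken from the thread.
    """
    return [
        {
            # If/Else task: compares the job parameter with the string "true".
            "task_key": "check_full_refresh",
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{job.parameters.full_refresh}}",
                "right": "true",
            },
        },
        {
            # Runs only when the condition evaluates to true.
            "task_key": "run_full_refresh",
            "depends_on": [{"task_key": "check_full_refresh", "outcome": "true"}],
            "pipeline_task": {"pipeline_id": pipeline_id, "full_refresh": True},
        },
        {
            # Runs only when the condition evaluates to false.
            "task_key": "run_incremental",
            "depends_on": [{"task_key": "check_full_refresh", "outcome": "false"}],
            "pipeline_task": {"pipeline_id": pipeline_id, "full_refresh": False},
        },
    ]


tasks = build_refresh_tasks("<your-pipeline-id>")
print(json.dumps(tasks, indent=2))
```

The resulting list would go under `tasks` in the job settings, alongside a job-level parameter named `full_refresh` with a default of "false".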
mnissen1337
by Contributor
  • 126 Views
  • 2 replies
  • 0 kudos

Resolved! How does Databricks handle registration and discovery of custom PySpark data sources in SDPs?

I'm working with Databricks declarative pipelines and have defined a custom PySpark data source (CDS) in its own standalone Python module. I include this module as part of the pipeline resources. What I find interesting is that, even without explicit...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

That is a great observation! You aren't actually triggering a hidden "auto-discovery" feature for custom data sources. Instead, what you are seeing is a byproduct of how Spark Declarative Pipelines (SDPs) evaluate pipeline resources. To answer your sp...

  • 0 kudos
1 More Replies
prafullkatiyar
by New Contributor
  • 76 Views
  • 1 reply
  • 0 kudos

Documentation issue: Invalid JSON example in Lakeflow Connect multi-destination pipeline

Hi Community, I am trying to implement multi-destination pipelines and came across this code in the Databricks documentation: Create multi-destination pipelines | Databricks on AWS. The key 'table' is repeated in all examples - I believe JSON objects should not...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

You are absolutely correct! Great catch. The JSON example provided in the documentation is structurally invalid. Because standard JSON parsers do not support duplicate keys within the same dictionary object, the second "table" key will silently overwr...

  • 0 kudos
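
To see the overwrite behaviour for yourself, here is a minimal sketch using Python's standard `json` module. The document string and its `table` / `name` values are made up for illustration and are not the example from the documentation page. By default `json.loads` keeps only the last value for a repeated key. The `reject_duplicates` hook is a hypothetical helper that turns the silent overwrite into an error.

```python
import json

# Hypothetical payload with a repeated "table" key (not the documentation's example).
doc = '{"table": {"name": "a"}, "table": {"name": "b"}}'

# Default behaviour: the parser accepts the input and the last value wins.
parsed = json.loads(doc)
print(parsed)


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently dropping repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Strict behaviour: the same input now fails loudly.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
    print(strict_error)
```

If the intent is several tables, a JSON array of objects, each with its own `table` entry, avoids the collision entirely.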
praful
by New Contributor II
  • 6575 Views
  • 6 replies
  • 1 kudos

Recover Lost Notebook

Hi Team, I was using Databricks Community Edition for learning purposes. I had an account https://community.cloud.databricks.com/?o=6822095545287159 where I stored all my learning notebooks. Unfortunately, this account suddenly stopped working, and I ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The workspace ID you shared appears to belong to a workspace that is still running. If you have lost login access to this workspace, the team you contacted by email should be able to assist. I will add the following doc for s...

  • 1 kudos
5 More Replies
shan-databricks
by Databricks Partner
  • 82 Views
  • 2 replies
  • 0 kudos

How to promote Lakeflow Connect and Spark Declarative Pipeline to a higher environment

I have five data ingestion jobs using Lakeflow Connect from SQL Server to Databricks, along with a Spark Declarative Pipeline to load data from bronze to silver. I need guidance on promoting these pipelines and jobs from Dev to Stage and Prod, and ho...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

Promoting data workflows across environments is a critical step, and setting it up correctly from the start will save you a lot of operational headaches. The recommended and most robust methodology for managing the lifecycle of both Lakeflow Connect j...

  • 0 kudos
1 More Replies
GabeMatch
by New Contributor
  • 169 Views
  • 1 reply
  • 0 kudos

Lakeflow Connect native connectors (TikTok, Meta Ads, Google Ads) - one table per account

We want to leverage these connectors to pull in marketing spend data. But the docs seem to say that the destination must be unique based on accounts. For TikTok, we have a hundred accounts... each account will have a destination table for each object. ...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

I share your pain. I am sitting on 710 landing tables for this reason (one of our services is templated website hosting and this is all Google Tag data). I tried all kinds of tricks in the UI to try and land data into a single landing table but it would...

  • 0 kudos
thackman
by Databricks Partner
  • 1509 Views
  • 5 replies
  • 3 kudos

Resolved! Intermittent failure with Python import statements after upgrading to DBR 18.0

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have been using a simple import statement to use it. Starting with DBR 18.0, the import fails intermittently (25% of the time) when running...

Latest Reply
freddyT
Visitor
  • 3 kudos

Hello, we also have the same issue. The error is sporadic (on subsequent executions it passes, then it fails again). It is either "cannot import name ..." or "no module ...". We did not change anything, and the issue started a few hours ago. We ar...

  • 3 kudos
4 More Replies
DazzaiDe
by New Contributor III
  • 251 Views
  • 2 replies
  • 0 kudos

DAB best practices suggestion

We're currently setting up Databricks Asset Bundles (DAB) with a CI/CD pipeline using Azure DevOps. Our planned development workflow is as follows: Main branch → Developer creates a feature branch → Implement changes → Create a Pull Request → Senior de...

Latest Reply
savlahanish27
Databricks Partner
  • 0 kudos

The workflow itself is solid. A few things worth tightening at each stage, based on running this in production with Azure DevOps: At the PR stage, run databricks bundle validate as a required check before merge is allowed. It catches wrong field names...

  • 0 kudos
1 More Replies
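
One way the PR-stage check from the reply above could be wired up is a small wrapper script that the pipeline runs and that fails the build on a non-zero exit code. This is only a sketch: the target name "dev" and both function names are hypothetical, and it assumes the Databricks CLI is installed and authenticated on the build agent.

```python
import subprocess
import sys


def validate_command(target: str) -> list[str]:
    """Build the CLI invocation for validating a bundle against one target."""
    return ["databricks", "bundle", "validate", "--target", target]


def run_validate(target: str) -> int:
    """Run the validation and return the CLI's exit code (0 means the bundle is valid)."""
    result = subprocess.run(validate_command(target), check=False)
    return result.returncode


if __name__ == "__main__":
    # "dev" is a placeholder; pass the target defined in your databricks.yml.
    sys.exit(run_validate(sys.argv[1] if len(sys.argv) > 1 else "dev"))
```

Calling the CLI directly from a script step in the Azure DevOps pipeline works just as well; the wrapper only helps if you want to add your own checks around it.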
YoshikiFujiwara
by New Contributor II
  • 305 Views
  • 3 replies
  • 0 kudos

Resolved! Unity Catalog External Location with Amazon S3 Access Points, session policy behavior and workarounds

Context: I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with...

Latest Reply
AlfieJames
New Contributor
  • 0 kudos

Following this, I'm curious if anyone has gotten this working.

  • 0 kudos
2 More Replies
DineshOjha
by New Contributor III
  • 178 Views
  • 3 replies
  • 1 kudos

Views in DR environment

Hi Team, We are currently using the Databricks deep clone feature to clone our tables to our Databricks DR environment. When we deploy our jobs, they run in production and the tables get cloned to DR. But the views don't get cloned, as deep clone doesn't ...

Latest Reply
DineshOjha
New Contributor III
  • 1 kudos

Thank you, Ashwin. We have implemented a CI/CD pipeline to deploy the jobs, but it doesn't run the jobs. The jobs that load the data and create the tables need to run at specific times, so that is handled by a separate scheduling tool. In such a scenario, ...

  • 1 kudos
2 More Replies
AustinBen
by New Contributor
  • 129 Views
  • 1 reply
  • 1 kudos

Streaming Amazon DocumentDB to Databricks in near real time - what's the best approach?

Hi everyone, I'm looking for advice from anyone who has implemented near real-time ingestion from Amazon DocumentDB into Databricks. Our current architecture is: Application → Amazon DocumentDB; Python AWS Lambda functions capture changes from DocumentDB; L...

Latest Reply
anagilla
Databricks Employee
  • 1 kudos

The best pattern I can think of is to put a streaming bus between DocumentDB and Databricks and consume it with Structured Streaming. You are most of the way there already. Lowest-disruption path, since you already capture changes in Lambda: Repoint...

  • 1 kudos
Shivani_Komma99
by New Contributor
  • 185 Views
  • 4 replies
  • 1 kudos

Resolved! Unable to see a folder in DBFS

Hi Team, We have a few scripts stored in a folder on a DBFS path. Recently, we've noticed that when we navigate to this path manually, the folder appears to be empty, and we are unable to see the scripts. However, the jobs that reference and access the...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Shivani_Komma99, Thanks for flagging this. Based on the behaviour you described, this appears to be consistent with a DBFS browser UI issue/regression rather than a problem with the underlying files, especially since the files are still accessibl...

  • 1 kudos
3 More Replies
aditi_mokashi
by New Contributor
  • 94 Views
  • 0 replies
  • 0 kudos

Urgent: Installing Lakebridge on Databricks

Hi, I want to install Databricks Lakebridge in my Databricks environment and use the analyze and transpile commands through a Python script. The use case is that we need to create an automated pipeline that will migrate the existing scripts from snowfla...

darek554
by New Contributor
  • 146 Views
  • 1 reply
  • 0 kudos

Code on cluster runs indefinitely

Hello. I've created a custom cluster (m4.large). When I try to execute some code on this cluster, the behaviour is as follows:
- The cluster starts and shows a running status
- I run code, for example print("Hello")
- The code runs indefinitely
- I click interrupt, it st...

Latest Reply
Yogasathyandrun
New Contributor II
  • 0 kudos

The fact that print("Hello") eventually works but SELECT 1 never completes suggests the cluster may be running but not fully initialized for Spark workloads. A few things I’d check first:
- Cluster Event Log for any provisioning or startup errors.
- Spark U...

  • 0 kudos
animeshjain
by New Contributor
  • 260 Views
  • 3 replies
  • 0 kudos

Bundle deployment overwrites artifacts while jobs are running - best practices?

Hi everyone, I'm using #Declarative Automation Bundles (DAB) to deploy data pipelines, and I've run into an issue with concurrent job runs and deployment. What happened: I started a job that depends on a wheel file built by the bundle (timestamped artifa...

Latest Reply
sudhaktr
New Contributor II
  • 0 kudos

Do you have source_linked_deployment set as false? That's probably causing it.

  • 0 kudos
2 More Replies