Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GabeMatch
by New Contributor
  • 188 Views
  • 3 replies
  • 0 kudos

Lakeflow Connect native connectors (TikTok Ads, Meta Ads, Google Ads) - one table per account

We want to leverage these connectors to pull in marketing spend data. But the docs seem to say that the destination must be unique per account. For TikTok, we have a hundred accounts, so each account would have a destination table for each object. ...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can follow the steps below. Accept the landing tables in the Bronze layer: create a dedicated schema such as marketing_landing or meta_landing. Allow the managed ingestion pipeline to create all the destination tables there (ads_account1, ads_account2, etc.). Keep...

  • 0 kudos
2 More Replies
hanifmusa
by Visitor
  • 41 Views
  • 1 reply
  • 0 kudos

Getting error hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: The specified blob d

I am exporting Parquet files (partitioned by id) in append mode. However, I occasionally encounter errors, while at other times the job completes successfully. Apache Spark Exception: Exception thrown in awaitResult: hadoop_azure_shaded.com.microsoft.azu...

Latest Reply
balajij8
Contributor III
  • 0 kudos

It's generally due to race conditions when Spark checks for existing partition files before writing, combined with Azure Blob Storage's eventual consistency mode. You can follow the steps below. 1. Switch to Delta Lake - You can use Delta Lake format instead of Par...

  • 0 kudos
alejandro_jaram
by New Contributor II
  • 352 Views
  • 4 replies
  • 0 kudos

Resolved! DLT pipelines failing out of memory (serverless)

I have a Delta Live Tables (DLT) pipeline that runs weekly. Normally, it takes 8 minutes to complete, but since last Friday (June 19), it has been running for hours until it encounters an out-of-memory error. This pipeline is responsible for c...

Latest Reply
alejandro_jaram
New Contributor II
  • 0 kudos

Hey, I found that another engineer added joins to a federated query table in the metric view definitions. I converted it to a streaming table to use CDF, and the run time is now reduced. I need to improve the joins to reduce latency as much as possible, but I’m not ...

  • 0 kudos
3 More Replies
hasan_sayyed
by New Contributor II
  • 38 Views
  • 1 reply
  • 0 kudos

Aspiring Data Engineer offering free project support in exchange for mentorship 🚀

Hi everyone, I am looking to accelerate my growth as a Data Engineer and am seeking hands-on guidance from an experienced professional. To gain real-world experience, I am offering my technical services completely free of charge. What I bring to the ta...

Latest Reply
hasan_sayyed
New Contributor II
  • 0 kudos

Check my GitHub, where I build small projects while learning: https://github.com/hasansayyed13?tab=repositories

  • 0 kudos
Nmtc9to5
by New Contributor II
  • 75 Views
  • 2 replies
  • 0 kudos

Trigger a full refresh in a Lakeflow Connect pipeline with a job

Hello everyone, I'm implementing a Lakeflow Connect pipeline that is orchestrated by a Lakeflow job. My question is: Is there a way to configure a job parameter so that when it receives the value "true" a full refresh is performed in the pipeline and ...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can solve it using conditional tasks in the job. Use If/Else conditional tasks in the main job configuration to run a full or incremental refresh based on the parameter. This lets you route to different pipeline tasks based on the parameter va...

  • 0 kudos
1 More Replies
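
The If/Else routing described in the reply above could be sketched roughly as follows. This is a minimal illustration, not the poster's configuration: the job name, task keys, and the `full_refresh` parameter name are hypothetical, and the field names (`condition_task`, `depends_on` with `outcome`, `pipeline_task.full_refresh`) follow my reading of the Jobs API and should be checked against the current Databricks documentation before use.

```python
import json


def build_refresh_job(pipeline_id: str) -> dict:
    """Build a job spec that routes to a full or incremental pipeline run.

    All names below are illustrative assumptions, not taken from the thread.
    """
    return {
        "name": "lakeflow-connect-refresh",  # hypothetical job name
        # Job-level parameter; callers pass "true" to request a full refresh.
        "parameters": [{"name": "full_refresh", "default": "false"}],
        "tasks": [
            {
                # If/Else task: compares the job parameter with "true".
                "task_key": "check_full_refresh",
                "condition_task": {
                    "op": "EQUAL_TO",
                    "left": "{{job.parameters.full_refresh}}",
                    "right": "true",
                },
            },
            {
                # Runs only when the condition evaluates to true.
                "task_key": "run_full_refresh",
                "depends_on": [
                    {"task_key": "check_full_refresh", "outcome": "true"}
                ],
                "pipeline_task": {"pipeline_id": pipeline_id, "full_refresh": True},
            },
            {
                # Runs only when the condition evaluates to false.
                "task_key": "run_incremental",
                "depends_on": [
                    {"task_key": "check_full_refresh", "outcome": "false"}
                ],
                "pipeline_task": {"pipeline_id": pipeline_id, "full_refresh": False},
            },
        ],
    }


if __name__ == "__main__":
    # "<pipeline-id>" is a placeholder; substitute the real pipeline ID.
    print(json.dumps(build_refresh_job("<pipeline-id>"), indent=2))
```

Both branches point at the same pipeline, so the only difference between the two paths is the refresh flag.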
mnissen1337
by Contributor
  • 144 Views
  • 2 replies
  • 0 kudos

Resolved! How does Databricks handle registration and discovery of custom PySpark data sources in SDPs?

I'm working with Databricks declarative pipelines and have defined a custom PySpark data source (CDS) in its own standalone Python module. I include this module as part of the pipeline resources. What I find interesting is that, even without explicit...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

That is a great observation! You aren't actually triggering a hidden "auto-discovery" feature for custom data sources. Instead, what you are seeing is a byproduct of how Spark Declarative Pipelines (SDPs) evaluate pipeline resources. To answer your sp...

  • 0 kudos
1 More Replies
prafullkatiyar
by New Contributor
  • 98 Views
  • 1 reply
  • 0 kudos

Documentation issue: Invalid JSON example in Lakeflow Connect multi-destination pipeline

Hi Community, I am trying to implement multi-destination pipelines and came across this code in the Databricks documentation: Create multi-destination pipelines | Databricks on AWS. The key 'table' is repeated in all examples - I believe JSON objects should not...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

You are absolutely correct! Great catch. The JSON example provided in the documentation is structurally invalid. Because standard JSON parsers do not support duplicate keys within the same dictionary object, the second "table" key will silently overwr...

  • 0 kudos
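
The overwrite behavior mentioned in the reply above can be checked with a minimal Python sketch. The payload here is a made-up stand-in, not the example from the Databricks documentation, and `reject_duplicates` is a hypothetical helper name; the sketch only shows how Python's standard `json` module treats a repeated key and one way to detect it.

```python
import json

# Hypothetical payload: the key "table" appears twice in the same object.
raw = '{"table": {"source_table": "orders"}, "table": {"source_table": "customers"}}'

# json.loads keeps only the last value for a repeated key, without any warning.
parsed = json.loads(raw)
print(parsed)


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# With the hook installed, the same payload fails loudly.
try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)
```

A check like this is useful when validating a pipeline spec before submitting it, since the silently dropped entry would otherwise only show up as a missing table.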
praful
by New Contributor II
  • 6581 Views
  • 6 replies
  • 1 kudos

Recover Lost Notebook

Hi Team, I was using Databricks Community Edition for learning purposes. I had an account https://community.cloud.databricks.com/?o=6822095545287159 where I stored all my learning notebooks. Unfortunately, this account suddenly stopped working, and I ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

The workspace ID you shared appears to belong to a workspace that is still in a running state. If you lost login access to this workspace, the team you reached over email will be able to assist. I will add the following doc for s...

  • 1 kudos
5 More Replies
shan-databricks
by Databricks Partner
  • 98 Views
  • 2 replies
  • 0 kudos

How to promote Lakeflow Connect and Spark Declarative Pipeline to a higher environment

I have five data ingestion jobs using Lakeflow Connect from SQL Server to Databricks, along with a Spark Declarative Pipeline to load data from bronze to silver. I need guidance on promoting these pipelines and jobs from Dev to Stage and Prod, and ho...

Latest Reply
aliyasingh
New Contributor III
  • 0 kudos

Promoting data workflows across environments is a critical step, and setting it up correctly from the start will save you a lot of operational headaches. The recommended and most robust methodology for managing the lifecycle of both Lakeflow Connect j...

  • 0 kudos
1 More Replies
thackman
by Databricks Partner
  • 1520 Views
  • 5 replies
  • 5 kudos

Resolved! Intermittent failure with Python import statements after upgrading to DBR 18.0

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have been using a simple import statement to use it. Starting with DBR 18.0 the import fails intermittently (25% of the time) when running...

Latest Reply
freddyT
New Contributor
  • 5 kudos

Hello, we have the same issue. The error is sporadic (it passes on the next executions, then fails again). It is either "cannot import name ..." or "no module ...". We did not change anything, and we have had this issue for the past few hours. We ar...

  • 5 kudos
4 More Replies
DazzaiDe
by New Contributor III
  • 265 Views
  • 2 replies
  • 0 kudos

DAB best practices suggestion

We're currently setting up Databricks Asset Bundles (DAB) with a CI/CD pipeline using Azure DevOps. Our planned development workflow is as follows: Main branch → Developer creates a feature branch → Implement changes → Create a Pull Request → Senior de...

Latest Reply
savlahanish27
Databricks Partner
  • 0 kudos

The workflow itself is solid. A few things worth tightening at each stage, based on running this in production with Azure DevOps: At the PR stage, run databricks bundle validate as a required check before merge is allowed. It catches wrong field names...

  • 0 kudos
1 More Replies
YoshikiFujiwara
by New Contributor II
  • 324 Views
  • 3 replies
  • 0 kudos

Resolved! Unity Catalog External Location with Amazon S3 Access Points: session policy behavior and workarounds

Context: I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with...

Latest Reply
AlfieJames
New Contributor
  • 0 kudos

Following this, I'm curious if anyone has gotten this working.

  • 0 kudos
2 More Replies
DineshOjha
by New Contributor III
  • 184 Views
  • 3 replies
  • 1 kudos

Views in DR environment

Hi Team, We are currently using the Databricks deep clone feature to clone our tables to the Databricks DR environment. When we deploy our jobs, they run in production and the tables get cloned to DR. But the views don't get cloned, as deep clone doesn't ...

Latest Reply
DineshOjha
New Contributor III
  • 1 kudos

Thank you, Ashwin. We have implemented a CI/CD pipeline to deploy the jobs, but it doesn't run them. The jobs that load the data and create the tables need to run at specific times, so that is handled by a separate scheduling tool. In such a scenario, ...

  • 1 kudos
2 More Replies