Data Engineering

Forum Posts

Sorted by:

by fundat • New Contributor II

a week ago

136 Views
2 replies
2 kudos

Resolved! Course - Introduction to Apache Spark

Hi,In the course Introduction to Apache Spark; according to Apache Spark Runtime Architecture; Page 6 of 15. It says that :The cluster manager allocates resources and assigns tasks......Workers perform tasks assigned by the driverCan you help me plea...

Data Engineering

136 Views
2 replies
2 kudos

a week ago

View Replies

Latest Reply

BS_THE_ANALYST
Esteemed Contributor III

a week ago

2 kudos

Hi @fundat Perhaps the picture is useful here:Give this blog a read, I think this will answer some of your questions: https://medium.com/@knoldus/understanding-the-working-of-spark-driver-and-executor-4fec0e669399 .All the best,BS

2 kudos

a week ago

1 More Replies

by jigar191089 • New Contributor III

05-16-2025 8:06:32 AM

4790 Views
12 replies
0 kudos

Multiple concurrent jobs using interactive cluster

Hi All,I have notebook in Databricks. This notebook is executed from azure datafactory pipeline having a databricks notebook activity with linkedservice connected to an interactive cluster.When multiple concurrent runs of this pipeline are created, I...

Data Engineering

azure

Databricks

interactive cluster

4790 Views
12 replies
0 kudos

05-16-2025 8:06:32 AM

View Replies

Latest Reply

Louis_Frolio
Databricks Employee

7 hours ago

0 kudos

Greetings @jigar191089 , I did some digging and here are some ideas to think about. This smells like a shared-state/import-path issue on an interactive cluster under concurrency. What likely happened Your notebook imports Python modules from /dbf...

0 kudos

7 hours ago

11 More Replies

by jano • New Contributor III

7 hours ago

9 Views
0 replies
0 kudos

DABs with multi github sources

I want to deploy a dabs that has dev using a github branch and prod using a github release tag. I can't seem to find a way to make this part dynamic based on the target. Things I've tried:- Setting the git varaible in the databricks.yml- making the g...

Data Engineering

9 Views
0 replies
0 kudos

7 hours ago

by biancadoesdata1 • New Contributor II

8 hours ago

18 Views
0 replies
0 kudos

Webinars

Hi! My colleagues and I at Unifeye are hosting a series of regular webinars focused on Databricks content. In November, we’re running four sessions covering Geospatial, Governance, AI, and Delta Sharing, featuring Databricks architects as guest speak...

Data Engineering

18 Views
0 replies
0 kudos

8 hours ago

by Hsn • Visitor

18 hours ago

49 Views
3 replies
1 kudos

Suggest about data engineer

Hey, I'm Hasan Sayyed, currently pursuing SYBCA. I want to become a Data Engineer, but as a beginner, I’ve wasted some time learning other languages and technologies due to a lack of proper knowledge about this field. If someone could guide and teach...

Data Engineering

49 Views
3 replies
1 kudos

18 hours ago

View Replies

Latest Reply

biancadoesdata1
New Contributor II

9 hours ago

1 kudos

Hi Hasan. Great to see your motivation! Here’s a good way to start your journey into data engineering:Master SQL, it’s the foundation of everything in data.Enroll in the Databricks Academy (free) and take the beginner courses like “Get Started with D...

1 kudos

9 hours ago

2 More Replies

by mkwparth • New Contributor III

Thursday

58 Views
2 replies
1 kudos

Resolved! DLT | Communication lost with driver | Cluster was not reachable for 120 seconds

Hey Community, I'm facing this error, It says that "com.databricks.pipelines.common.errors.deployment.DeploymentException: Communication lost with driver. Cluster 1030-205818-yu28ft9s was not reachable for 120 seconds" This issue occurred in producti...

Data Engineering

58 Views
2 replies
1 kudos

Thursday

View Replies

Latest Reply

nayan_wylde
Esteemed Contributor

10 hours ago

1 kudos

This is actually a known intermittent issue in Databricks, particularly with streaming or Delta Live Tables (DLT) pipelines.This isn’t a logical failure in your code — it’s an infrastructure-level timeout between the Databricks control plane and the ...

1 kudos

10 hours ago

1 More Replies

by CaptainJack • New Contributor III

Thursday

60 Views
1 replies
0 kudos

Pull workspace url and workspace name using databricks-sdk / programaticaly in notebook

1. How could I pull workspace url (https://adb-XXXXX.XX.....net) 2. How could I get workspace name visible in top right corner.I know that easies solution is dbutils.notebook.entry_point.... browserHostName but unfortunetly it is not working in job c...

Data Engineering

60 Views
1 replies
0 kudos

Thursday

View Replies

Latest Reply

AbhaySingh
Databricks Employee

12 hours ago

0 kudos

Can you give this a shot? Not sure if you've a hard requirement of using SDK. workspace_url = spark.conf.get('spark.databricks.workspaceUrl') Getting name is more tricky. You could potentially get it from tags if there is a tagging strategy in place...

0 kudos

12 hours ago

by deano2025 • New Contributor II

Friday

49 Views
1 replies
0 kudos

Databricks asset bundles CI/CD design for github actions

We are wanting to use Databricks asset bundles and deploy code changes and tests using github actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...

Data Engineering

asset bundles

49 Views
1 replies
0 kudos

Friday

View Replies

Latest Reply

AbhaySingh
Databricks Employee

12 hours ago

0 kudos

Have you read about following approach before? Repository Structure Options 1. Monorepo with Multiple Bundles repo-root/ ├── .github/ │ └── workflows/ │ ├── bundle-ci.yml │ └── bundle-deploy.yml ├── bundles/ │ ├...

0 kudos

12 hours ago

by Mathias_Peters • Contributor II

12 hours ago

16 Views
0 replies
0 kudos

Reading MongoDB collections into an RDD

Hi, for a Spark job which does some custom computation, I need to access data from a MongoDB collection and access the elements as of type Document. The reason for this is, that I want to apply some custom type serialization which is already implemen...

Data Engineering

16 Views
0 replies
0 kudos

12 hours ago

by JanFalta • New Contributor

Friday

41 Views
1 replies
0 kudos

Data Masking

Hi all,I need some help on this masking problem. If you create a view with used masking function based on table.The user reading this view has to have read access to underlying table. So theoretically, he can access unmasked data in the table.I would...

Data Engineering

41 Views
1 replies
0 kudos

Friday

View Replies

Latest Reply

AbhaySingh
Databricks Employee

13 hours ago

0 kudos

Are you on Unity catalog? Databricks has a solution for this through Unity Catalog Column Masking (also called Dynamic Views or Column-Level Security). https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/filters-and-mask...

0 kudos

13 hours ago

by bhawana-pandey • New Contributor III

yesterday

55 Views
1 replies
0 kudos

Looking for reference DABs bundle yaml and resources for Databricks app deployment (FastAPI redirect

Looking for example databricks.yml and bundle resources for deploying a FastAPI Databricks app using DABs from one environment to another. Deployment works but FastAPI redirects to localhost after deployment, though the homepage loads fine. Need refe...

Data Engineering

55 Views
1 replies
0 kudos

yesterday

View Replies

Latest Reply

AbhaySingh
Databricks Employee

13 hours ago

0 kudos

This is a great place to start: https://apps-cookbook.dev/resources/ Happy to answer specifics as they come after you've reviewed that resource.

0 kudos

13 hours ago

by 02CSE33 • Visitor

13 hours ago

12 Views
0 replies
0 kudos

Migrating SQL Server Tables and Views to Databricks using Lakebridge

We have a requirement to carry out migration of few 100 tables which are present in SQL Server to Databricks Delta Table. We intend to explore Lakebridge capability for carrying out a PoC for this. We also want to migrate few historic records say las...

Data Engineering

12 Views
0 replies
0 kudos

13 hours ago

by aonurdemir • Contributor

a week ago

116 Views
2 replies
4 kudos

Resolved! Broken s3 file paths in File Notifications for auto loader

Suddenly at "2025-10-23T14:12:48.409+00:00", coming file paths from file notification queue started to be urlencoded. Hence, our pipeline gets file not found exception. I think something has changed suddenly and broke notification system. Here are th...

Data Engineering

116 Views
2 replies
4 kudos

a week ago

View Replies

Latest Reply

K_Anudeep
Databricks Employee

Wednesday

4 kudos

Hello @aonurdemir, Could you please re-run your pipeline now and check? This issue should be mitigated now. It is due to a recent internal bug that led to the unexpected handling of file paths with special characters. You should set ignoreMissingFile...

4 kudos

Wednesday

1 More Replies

by kfoster • Contributor

09-28-2022 8:07:48 AM

5922 Views
8 replies
7 kudos

Azure DevOps Repo - Invalid Git Credentials

I have a Repo in Databricks connected to Azure DevOps Repositories.The repo has been working fine for almost a month, until last week. Now when I try to open the Git settings in Databricks, I am getting "Invalid Git Credentials". Nothing has change...

Data Engineering

5922 Views
8 replies
7 kudos

09-28-2022 8:07:48 AM

View Replies

Latest Reply

klaas
New Contributor II

17 hours ago

7 kudos

I had a similar problem. I could fix following these steps:in the Azure Devops repository: User Settings -> Personal access tokens -> + New tokenin Databricks: Settings -> User -> Linked accounts -> Azure Devops (Personal access token)You could also...

7 kudos

17 hours ago

7 More Replies

by Danish11052000 • New Contributor

17 hours ago

20 Views
0 replies
0 kudos

Missing warehouse id/metadata for the system compute warehouse table

I ran the following queries for a specific warehouse_id = '54a93d2138433216' SELECT * FROM system.billing.usage WHERE usage_metadata.warehouse_id = '54a93d2138433216';SELECT * FROM system.compute.warehouse_events WHERE warehouse_id = '54a93d213843321...

Data Engineering

20 Views
0 replies
0 kudos

17 hours ago

Databricks Community

Forum Posts

Resolved! Course - Introduction to Apache Spark

Multiple concurrent jobs using interactive cluster

DABs with multi github sources

Webinars

Suggest about data engineer

Resolved! DLT | Communication lost with driver | Cluster was not reachable for 120 seconds

Pull workspace url and workspace name using databricks-sdk / programaticaly in notebook

Databricks asset bundles CI/CD design for github actions

Reading MongoDB collections into an RDD

Data Masking

Looking for reference DABs bundle yaml and resources for Databricks app deployment (FastAPI redirect

Migrating SQL Server Tables and Views to Databricks using Lakebridge

Resolved! Broken s3 file paths in File Notifications for auto loader

Azure DevOps Repo - Invalid Git Credentials

Missing warehouse id/metadata for the system compute warehouse table

Join Us as a Local Community Builder!

Course - Introduction to Apache Spark

DLT | Communication lost with driver | Cluster was...

Broken s3 file paths in File Notifications for aut...

Reading empty json file in serverless gives error

Drop Delta Log seems not to be working