Data Engineering

Forum Posts

by jainshasha • New Contributor II

a week ago

236 Views
12 replies
2 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks. All of them are configured with a job cluster, each with a different name. All 20 workflows are scheduled to run at the same time. But even after configuring a different job cluster in each of them, they run sequentially w...

Data Engineering

Latest Reply

emora
New Contributor II

18 seconds ago

2 kudos

Honestly, you shouldn't have any kind of limitation executing different workflows. I did a test case in my Databricks workspace, and if your workflows each use a job cluster you shouldn't hit a limitation. But I did all my tests in Azure, and just for you to kn...

11 More Replies

by namankhamesara • New Contributor II

yesterday

20 Views
1 reply
0 kudos

Error while running Databricks modules

Hi Databricks Community, I am following the course at https://customer-academy.databricks.com/learn/course/1266/data-engineering-with-databricks?generated_by=575333&hash=6edddab97f2f528922e2d38d8e4440cda4e5302a provided by Databricks. In this, when I am ...

Data Engineering

databrickscommunity

Latest Reply

Kaniz
Community Manager

17m ago

0 kudos

Hi @namankhamesara, Thank you for reaching out! It appears there might be an issue with accessing the data for your course. To expedite your request and resolve this issue promptly, please list your concerns on our ticketing portal. Our support staff...

by Shazam • New Contributor

Wednesday

75 Views
1 reply
0 kudos

Ingestion time clustering - Initial load

As per the available info, ingestion time clustering makes use of the time a file is written or ingested in Databricks. In a use case where there is a new Delta table and an ETL that runs on a regular schedule (say daily) inserting records, I am able to ...

Data Engineering

Latest Reply

Kaniz
Community Manager

28m ago

0 kudos

Hi @Shazam, Great questions! Let’s break down each scenario: Initial Data Migration: When migrating data from an existing platform to Databricks, you might have a large initial load of records. In this case, ingestion time clustering can still be...

by Anske • New Contributor II

2 weeks ago

167 Views
6 replies
1 kudos

Resolved! DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest), using the following code: dlt.apply_changes( target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...

Data Engineering

Delta Live Tables

Latest Reply

Kaniz
Community Manager

yesterday

1 kudos

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table. Let’s troubleshoot this together! Pipeline Update Process: Whe...

5 More Replies

by MohammadWasi • Visitor

yesterday

16 Views
0 replies
0 kudos

I can list the file using dbutils but am not able to read files in Databricks

I can list the file using dbutils but am not able to read files in Databricks. Please find the screenshot below. I am able to see the file using dbutils.fs.ls, but when I try to read this file using read_excel, it shows me an error like "FileNotFound...

Data Engineering

Databricks

by ashraf1395 • New Contributor

yesterday

16 Views
0 replies
0 kudos

Starting Serverless SQL cluster on GCP

Hello there, I am trying to start a serverless Databricks SQL cluster in GCP. I am following this Databricks doc: https://docs.gcp.databricks.com/en/admin/sql/serverless.html I have checked that all my requirements are fulfilled for activating the clus...

Data Engineering

by halox6000 • New Contributor III

yesterday

37 Views
1 reply
0 kudos

How do I stop PySpark from outputting text

I am using a tqdm progress bar to monitor the amount of data records I have collected via API. I am temporarily writing them to a file in the DBFS, then uploading to a Spark DataFrame. Each time I write to a file, I get a message like 'Wrote 8873925 ...

Data Engineering

Latest Reply

Kaniz
Community Manager

yesterday

0 kudos

Hi @halox6000, To stop the progress bar output from tqdm, you can use the disable argument. Set it to True to silence any tqdm output. In fact, it will not only hide the display but also skip the progress bar calculations entirely. Here’s an examp...

by MrD • New Contributor

Wednesday

77 Views
1 reply
0 kudos

Issue with autoscaling the cluster

Hi All, My job is breaking because the cluster is not able to autoscale. Below is the log. Can it be due to AWS VMs not spinning up, or due to an issue with the Databricks configuration? Has anyone faced this before? TERMINATING Compute terminated. Reason:...

Data Engineering

Latest Reply

koushiknpvs
New Contributor III

yesterday

0 kudos

Hey MrD, I faced this issue while running Azure VMs. A restart and reattaching the cluster helped me. Please let me know if that works for you.

by smukhi • New Contributor

Friday

113 Views
2 replies
0 kudos

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

As of this morning we started receiving the following error message on a Databricks job with a single PySpark Notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...

Data Engineering

Latest Reply

smukhi
New Contributor

yesterday

0 kudos

As advised, I double-checked that no code or cluster configuration was changed (even got a second set of eyes on it that confirmed the same). I was able to find a "fix" which puts a bandaid on the issue: I was able to pinpoint that the issue seems to...

1 More Replies

by Wolfoflag • New Contributor II

yesterday

36 Views
1 reply
0 kudos

Threads vs Processes (Parallel Programming) Databricks

Hi Everyone, I am trying to implement parallel processing in Databricks, and all the resources online point to using ThreadPool from Python's multiprocessing.pool library or the concurrent.futures library. These libraries offer methods for creating async...

Data Engineering

Latest Reply

Wojciech_BUK
Contributor III

yesterday

0 kudos

I am not a super expert, but I have been using Databricks for a while and I can say that when you use any Python library like asyncio, ThreadPool and so on, this is good only for some maintenance things, small API calls, etc. When you want to leverage s...
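
A minimal sketch of the thread-pool pattern this reply refers to, assuming the work is a handful of small I/O-bound calls. `fetch_status` is a hypothetical stand-in for a small API call (it only sleeps) and is not a Databricks API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_status(job_id: int) -> str:
    """Hypothetical stand-in for a small API call; the sleep imitates network latency."""
    time.sleep(0.05)
    return f"job-{job_id}: ok"


job_ids = [1, 2, 3, 4]

# Threads suit I/O-bound waits like this: the calls overlap instead of queueing up.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in the same order as the inputs.
    results = list(pool.map(fetch_status, job_ids))
```

Threads share one Python process on the driver, so this kind of pool helps with waiting on I/O rather than with CPU-heavy data processing.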

by digui • New Contributor

07-15-2022 1:58:57 PM

1975 Views
4 replies
0 kudos

Issues when trying to modify log4j.properties

Hi y'all. I'm trying to export metrics and logs to AWS CloudWatch, but while following their tutorial to do so, I ended up facing this error when trying to initialize my cluster with an init script they provided. This is the part where the script fail...

Data Engineering

Latest Reply

cool_cool_cool
New Contributor II

yesterday

0 kudos

@digui Did you figure out what to do? We're facing the same issue; the script works for the executors. I was thinking of adding an if that checks whether log4j.properties exists and modifies it only if it does.
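
The guard described in this reply could look roughly like the sketch below. It is written in Python for illustration only, since the original init script is not shown in this preview, and both the file location and the appended line are hypothetical placeholders.

```python
import tempfile
from pathlib import Path


def append_if_exists(properties_file: Path, extra_lines: str) -> bool:
    """Append extra_lines to properties_file only if the file already exists.

    Returns True when the file was modified, False when it was skipped.
    """
    if not properties_file.is_file():
        # Nothing to patch on this node, so skip instead of failing.
        return False
    with properties_file.open("a", encoding="utf-8") as handle:
        handle.write(extra_lines)
    return True


# Demo in a temporary directory; the file name mirrors the post, the content is made up.
conf_dir = Path(tempfile.mkdtemp())
missing = conf_dir / "missing" / "log4j.properties"
present = conf_dir / "log4j.properties"
present.write_text("log4j.rootCategory=INFO, console\n", encoding="utf-8")

skipped = append_if_exists(missing, "# hypothetical extra setting\n")
patched = append_if_exists(present, "# hypothetical extra setting\n")
```

Returning a flag instead of raising keeps the script from failing on nodes where the file is absent, which is the behaviour the reply is after.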

3 More Replies

by Menegat • Visitor

yesterday

43 Views
1 reply
0 kudos

VACUUM seems to be deleting Autoloader's log files.

Hello everyone, I have a workflow setup that updates a few Delta tables incrementally with autoloader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week. The issue I'm facing is that...

Data Engineering

Latest Reply

Kaniz
Community Manager

yesterday

0 kudos

Hi @Menegat, It seems you’re encountering an issue with your Delta tables during incremental updates. Let’s dive into this and explore potential solutions. Delta Live Tables and Incremental Updates: Delta Live Tables allow for incremental updates...

by georgef • Visitor

yesterday

41 Views
1 reply
0 kudos

Cannot import relative Python paths

Hello, some variations of this question have been asked before, but there doesn't seem to be an answer for the following simple use case: I have the following file structure in a Databricks Asset Bundles project: src --dir1 ----file1.py --dir2 ----file2...

Data Engineering

Latest Reply

Kaniz
Community Manager

yesterday

0 kudos

Hi @georgef, It appears that you’re encountering issues with importing modules within a Databricks Asset Bundles (DABs) project. Let’s explore some potential solutions to address this problem. Bundle Deployment and Import Paths: When deploying a ...
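
One common way to make sibling directories importable is to put their shared root on `sys.path`. The sketch below rebuilds a throwaway copy of the `src/dir1/file1.py` layout from the question and imports from it. The helper name, the file contents, and the temporary location are assumptions for illustration, and this is not necessarily the fix proposed in the truncated reply.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_to_sys_path(root: Path) -> None:
    """Put root at the front of sys.path once, so its sub-packages are importable."""
    root_str = str(root.resolve())
    if root_str not in sys.path:
        sys.path.insert(0, root_str)


# Throwaway copy of the layout from the post; the module body is made up.
src = Path(tempfile.mkdtemp()) / "src"
(src / "dir1").mkdir(parents=True)
(src / "dir1" / "__init__.py").write_text("")
(src / "dir1" / "file1.py").write_text("def greet():\n    return 'hello from file1'\n")

add_to_sys_path(src)

# With src on sys.path, dir1 resolves as a top-level package from any module.
file1 = importlib.import_module("dir1.file1")
message = file1.greet()
```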

by ChingizK • New Contributor II

4 weeks ago

300 Views
1 reply
0 kudos

Workflow Failure Alert Webhooks for OpsGenie

I'm trying to set up a Workflow Job Webhook notification to send an alert to the OpsGenie REST API on job failure. We've set up Teams & Email successfully. We've created the Webhook and when I configure "On Failure" I can see it in the JSON/YAML view. How...

Data Engineering

jobs

opsgenie

webhooks

Workflows

Latest Reply

Kaniz
Community Manager

yesterday

0 kudos

Hi @ChingizK, Configuring the payload for OpsGenie Webhook integration is essential to ensure that the data sent to OpsGenie meets your requirements. Let’s walk through the steps: Create a Webhook Integration in OpsGenie: Go to Settings > Integra...

by Databricks143 • New Contributor III

10-03-2023 9:54:43 PM

4360 Views
6 replies
0 kudos

Recursive CTE in Databricks SQL

Hi Team, how do I write a recursive CTE in Databricks SQL? Please let me know if anyone has a solution for this.

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

yesterday

0 kudos

It is still not supported. Not sure when it will be (if ever).
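
Where recursive CTEs are unavailable, one common workaround is to unroll the recursion into a loop that repeats the recursive step until no new rows appear. The sketch below shows that idea in plain Python on a made-up employee/manager table. It illustrates the technique only, is not a Databricks feature, and all names and data are hypothetical.

```python
# Hypothetical (employee, manager) rows; None marks the root of the hierarchy.
edges = [("alice", None), ("bob", "alice"), ("carol", "bob"), ("dave", "alice")]


def expand_hierarchy(rows):
    """Return {employee: level}, mimicking an anchor step plus repeated recursive steps."""
    # Anchor member: rows without a manager sit at level 0.
    levels = {emp: 0 for emp, mgr in rows if mgr is None}
    frontier = set(levels)
    depth = 0
    while frontier:
        depth += 1
        # Recursive member: join the previous level back to the table,
        # skipping rows already placed so a cycle cannot loop forever.
        frontier = {emp for emp, mgr in rows if mgr in frontier and emp not in levels}
        for emp in frontier:
            levels[emp] = depth
    return levels


levels = expand_hierarchy(edges)
```

The loop stops once a pass adds no new rows, which plays the role of the termination condition in a recursive CTE.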

5 More Replies
