Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor
  • 433 Views
  • 1 reply
  • 0 kudos

Databricks cell-level code parallel execution through the Python threading library

Hi Team, We are currently planning to implement Databricks cell-level code parallel execution through the Python threading library. We are interested in understanding the resource consumption and allocation process from the cluster. Are there any pot...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Phani1, Implementing Databricks cell-level code parallel execution through the Python threading library can be beneficial for performance, but there are some considerations to keep in mind. Let’s break it down: Resource Consumption and Alloca...
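The approach under discussion can be sketched with the standard threading library. This is a minimal illustration with made-up task names, not Databricks-specific code: all threads share the driver's Python process, so the GIL limits CPU-bound Python work, and threads mainly pay off when each task spends its time waiting on Spark jobs or I/O.

```python
import threading

results = {}

def run_task(name, n):
    # stand-in for a notebook cell's workload (in practice, often a Spark action)
    results[name] = sum(range(n))

# launch three independent tasks concurrently
threads = [
    threading.Thread(target=run_task, args=(f"task_{i}", 1000 * (i + 1)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all tasks before reading results
```

When each thread triggers a Spark action, the cluster schedules those concurrent jobs across its executors, so resource consumption is governed by the cluster's scheduler rather than by the Python threads themselves.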

Fresher
by New Contributor II
  • 287 Views
  • 1 reply
  • 0 kudos

Users are deleted/unsynced from Azure AD to Databricks

In Azure AD, it shows that users are synced to Databricks, but in Databricks the user is shown as not being part of the group. The user is missing from only one group; he is still part of the remaining groups. All the syncing worked fine until yesterday. I don't know ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Fresher, It sounds like you’re experiencing an issue with user synchronization between Azure AD and Databricks. Let’s troubleshoot this together! Here are some steps you can take to resolve the issue: Check SCIM Provisioning Configuration: En...

chloeh
by New Contributor II
  • 278 Views
  • 1 reply
  • 0 kudos

Chaining window aggregations in SQL

In my SQL data transformation pipeline, I'm doing chained/cascading window aggregations: for example, I want to do average over the last 5 minutes, then compute average over the past day on top of the 5 minute average, so that my aggregations are mor...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @chloeh, You’re working with a Spark SQL data transformation pipeline involving chained window aggregations. Let’s look at your code snippet and see if we can identify the issue. First, let’s break down the steps you’ve implemented: You’re read...
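The chaining pattern being discussed can be sketched outside Spark. The snippet below uses SQLite's window functions (available via the standard library) purely to illustrate the shape; the table and column names are made up, and in Spark SQL the same CTE-chaining applies, usually with time-range frames rather than row counts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ts INTEGER, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)",
                [(i, float(i)) for i in range(10)])

rows = con.execute("""
    WITH smoothed AS (          -- first-level aggregation (e.g. the 5-minute average)
        SELECT ts,
               AVG(value) OVER (
                   ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS short_avg
        FROM readings
    )
    SELECT ts,                  -- second-level aggregation computed on top of the first
           AVG(short_avg) OVER (
               ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
           ) AS long_avg
    FROM smoothed
    ORDER BY ts
""").fetchall()
```

The key point is that the second window reads the output column of the first via the CTE, which is what makes the aggregations cascade instead of both operating on the raw rows.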

jainshasha
by New Contributor III
  • 1361 Views
  • 12 replies
  • 2 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks, each with a job cluster with a different name. All 20 workflows are scheduled to run at the same time, but even with a different job cluster configured in each of them, they run sequentially w...

Latest Reply
emora
New Contributor II
  • 2 kudos

Honestly, you shouldn't have any kind of limitation executing different workflows. I did a test case in my Databricks, and if your workflows use a job cluster you shouldn't have a limitation. But I did all my tests in Azure, and just for you to kn...

11 More Replies
namankhamesara
by New Contributor II
  • 411 Views
  • 1 reply
  • 0 kudos

Error while running Databricks modules

Hi Databricks Community, I am following https://customer-academy.databricks.com/learn/course/1266/data-engineering-with-databricks?generated_by=575333&hash=6edddab97f2f528922e2d38d8e4440cda4e5302a this course provided by Databricks. In this, when I am ...

Data Engineering
databrickscommunity
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @namankhamesara, Thank you for reaching out! It appears there might be an issue with accessing the data for your course. To expedite your request and resolve this issue promptly, please list your concerns on our ticketing portal. Our support staff...

Shazam
by New Contributor
  • 295 Views
  • 1 reply
  • 0 kudos

Ingestion time clustering -Initial load

As per the info available, ingestion time clustering makes use of the time a file is written or ingested into Databricks. In a use case where there is a new Delta table and an ETL which runs in a timely fashion (say daily) inserting records, I am able to ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Shazam, Great questions! Let’s break down each scenario: Initial Data Migration: When migrating data from an existing platform to Databricks, you might have a large initial load of records. In this case, ingestion time clustering can still be...

Anske
by New Contributor III
  • 912 Views
  • 6 replies
  • 1 kudos

Resolved! DLT apply_changes applies only deletes and inserts, not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code: dlt.apply_changes(    target = "cdctest",    source = "cdctest_cdc_enriched",    keys = ["ID"],    sequence_by...

Data Engineering
Delta Live Tables
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table. Let’s troubleshoot this together! Pipeline Update Process: Whe...

5 More Replies
halox6000
by New Contributor III
  • 321 Views
  • 1 reply
  • 0 kudos

How do I stop PySpark from outputting text

I am using a tqdm progress bar to monitor the amount of data records I have collected via API. I am temporarily writing them to a file in the DBFS, then uploading to a Spark DataFrame. Each time I write to a file, I get a message like 'Wrote 8873925 ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @halox6000, To stop the progress bar output from tqdm, you can use the disable argument. Set it to True to silence any tqdm output. In fact, it will not only hide the display but also skip the progress bar calculations entirely. Here’s an examp...
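tqdm's disable flag silences the bar itself. For messages printed by other Python code (such as the 'Wrote ... bytes' lines mentioned in the question), one standard-library option is temporarily redirecting stdout, sketched below; note this only captures Python-side prints, not output that comes from the JVM.

```python
import contextlib
import io

def noisy_write():
    # stand-in for a chatty call that prints as a side effect
    print("Wrote 8873925 bytes")
    return 42

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_write()  # the message lands in the buffer, not the notebook
```

After the with block, result is still available and buffer.getvalue() holds the suppressed text, so nothing is lost if it is later needed for debugging.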

MrD
by New Contributor
  • 304 Views
  • 1 reply
  • 0 kudos

Issue with autoscaling the cluster

Hi All, My job is breaking because the cluster is not able to autoscale. Below is the log. Could this be due to AWS VMs not spinning up, or an issue with the Databricks configuration? Has anyone faced this before? TERMINATING Compute terminated. Reason:...

Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

Hey MrD, I faced this issue while running Azure VMs. A restart and re-attaching the cluster helped me. Please let me know if that works for you.

smukhi
by New Contributor II
  • 808 Views
  • 2 replies
  • 0 kudos

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

As of this morning we started receiving the following error message on a Databricks job with a single PySpark notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...

Latest Reply
smukhi
New Contributor II
  • 0 kudos

As advised, I double-confirmed that no code or cluster configuration was changed (I even got a second set of eyes on it that confirmed the same). I was able to find a "fix" which puts a band-aid on the issue: I was able to pinpoint that the issue seems to...

1 More Replies
Wolfoflag
by New Contributor II
  • 617 Views
  • 1 reply
  • 0 kudos

Threads vs Processes (Parallel Programming) Databricks

Hi Everyone, I am trying to implement parallel processing in Databricks, and all the resources online point to using ThreadPool from Python's multiprocessing.pool library or the concurrent.futures library. These libraries offer methods for creating async...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

I am not a super expert, but I have been using Databricks for a while, and I can say that when you use any Python library like asyncio, ThreadPool and so on, it is good only for some maintenance things, small API calls, etc. When you want to leverage s...
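The division of labor described above can be sketched as follows; the fetch function is a made-up stand-in for a small I/O-bound API call, the kind of driver-side work where a thread pool fits, while heavy data processing is better left to Spark's own distributed execution.

```python
from multiprocessing.pool import ThreadPool

def fetch(item_id):
    # placeholder for a small API request; returns a fake payload
    return {"id": item_id, "ok": True}

# a small pool of threads runs the I/O-bound calls concurrently on the driver
with ThreadPool(processes=4) as pool:
    payloads = pool.map(fetch, range(8))
```

Because these calls mostly wait on the network, the GIL is not a bottleneck here; the same code with CPU-heavy work in fetch would see little speedup from threads.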

digui
by New Contributor
  • 2434 Views
  • 4 replies
  • 0 kudos

Issues when trying to modify log4j.properties

Hi y'all. I'm trying to export metrics and logs to AWS CloudWatch, but while following their tutorial to do so, I ended up facing this error when trying to initialize my cluster with an init script they provided. This is the part where the script fail...

Latest Reply
cool_cool_cool
New Contributor II
  • 0 kudos

@digui Did you figure out what to do? We're facing the same issue; the script works for the executors. I was thinking of adding an if that checks whether log4j.properties exists and modifies it only if it does.

3 More Replies
Menegat
by New Contributor
  • 327 Views
  • 1 reply
  • 0 kudos

VACUUM seems to be deleting Autoloader's log files.

Hello everyone, I have a workflow setup that updates a few Delta tables incrementally with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week. The issue I'm facing is that...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Menegat, It seems you’re encountering an issue with your Delta tables during incremental updates. Let’s dive into this and explore potential solutions. Delta Live Tables and Incremental Updates: Delta Live Tables allow for incremental updates...

georgef
by New Contributor
  • 314 Views
  • 1 reply
  • 0 kudos

Cannot import relative python paths

Hello, Some variations of this question have been asked before, but there doesn't seem to be an answer for the following simple use case: I have the following file structure on a Databricks Asset Bundles project: src --dir1 ----file1.py --dir2 ----file2...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @georgef, It appears that you’re encountering issues with importing modules within a Databricks Asset Bundles (DABs) project. Let’s explore some potential solutions to address this problem. Bundle Deployment and Import Paths: When deploying a ...
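A common workaround for cross-directory imports (not an official DABs feature; the dir1/file1.py layout below just mirrors the post, built in a temp directory so the sketch is self-contained) is to put the project root on sys.path before importing across sibling directories:

```python
import os
import sys
import tempfile

# recreate the post's layout: <root>/dir1/file1.py defining a helper
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "dir1"))
with open(os.path.join(root, "dir1", "file1.py"), "w") as f:
    f.write("def helper():\n    return 'ok'\n")

# make the root importable; in a bundle this would be the src/ directory
sys.path.append(root)
from dir1.file1 import helper  # resolves as a namespace package, no __init__.py needed
```

In a deployed bundle the equivalent step usually derives the root from the running file's location (for example via os.path.dirname) rather than hard-coding a path.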

lindsey
by New Contributor
  • 845 Views
  • 1 reply
  • 0 kudos

"Error: cannot read mws credentials: invalid Databricks Account configuration" on TF Destroy

I have a terraform project that creates a workspace in Databricks, assigns it to an existing metastore, then creates external location/storage credential/catalog. The apply works and all expected resources are created. However, without touching any r...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @lindsey, It seems you’re encountering an issue with Terraform and Databricks when trying to destroy resources. Let’s explore some potential solutions to address this problem: Resource Order in Terraform Configuration: Ensure that the databric...
