Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jeremy98
by Honored Contributor
  • 1307 Views
  • 9 replies
  • 0 kudos

Resolved! Error Databricks Bundle Deploy with changes in the wheel file

Hello Community, Suddenly I have an error: when I deploy a new bundle to Databricks after changing the Python script, the cluster continues to point to an old version of the script uploaded by the Databricks Asset Bundle. Why is this?

Latest Reply
denis-dbx
Databricks Employee
  • 0 kudos

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact that enables automated wheel patching to break the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...

8 More Replies
Tommabip
by New Contributor III
  • 424 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following:
- create a policy where I specify env variables and libraries
- create a cluster that inherits from that policy and uses the env variables specified in the policy.
I saw in the decume...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...

2 More Replies
Alex_Persin
by New Contributor III
  • 6733 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...
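Before changing anything, it can help to confirm how large the shared-memory mount actually is on a node. The following is a minimal diagnostic sketch only: it reports the size of a mounted file system and does not resize it. The function name `shm_report` and the 64 MB threshold parameter are illustrative assumptions taken from the figure quoted in the post.

```python
import shutil


def shm_report(path="/dev/shm", required_bytes=64 * 1024 * 1024):
    """Return (total_bytes, free_bytes, is_large_enough) for the file system at `path`.

    `required_bytes` defaults to the 64 MB Docker default mentioned in the post;
    the mount counts as large enough only if it is strictly bigger than that.
    """
    usage = shutil.disk_usage(path)  # works for any mounted path, tmpfs included
    return usage.total, usage.free, usage.total > required_bytes
```

Calling `shm_report()` inside the container on a worker (for example from an init script or a notebook cell) shows whether the mount is still at the default size.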

Latest Reply
stevewb
New Contributor II
  • 8 kudos

Bump again... does anyone have a solution for this?

5 More Replies
valde
by New Contributor
  • 206 Views
  • 1 replies
  • 0 kudos

Window function VS groupBy + map

Let's say we have an RDD like this: RDD(id: Int, measure: Int, date: LocalDate). We want to apply some function that compares 2 consecutive measures by date and outputs a number, and we want to get the sum of those numbers by id. The function is b...

Latest Reply
Renu_
Contributor
  • 0 kudos

Hi @valde, those two approaches give the same result, but they don’t work the same way under the hood. SparkSQL uses optimized window functions that handle things like shuffling and memory more efficiently, often making it faster and lighter. On the o...
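To make the computation under discussion concrete, here is a plain-Python sketch of the "group by id, then map over consecutive pairs" logic. It is not Spark code, and the comparison function (`cur - prev`) is a hypothetical stand-in, since the original post is truncated before it defines one. A window-function version would compute the same pairs by reading the previous measure with `lag` over a window partitioned by id and ordered by date.

```python
from collections import defaultdict
from datetime import date


def sum_consecutive(rows, compare):
    """rows: iterable of (id, measure, date). Returns {id: sum of compare(prev, cur)}."""
    by_id = defaultdict(list)
    for row_id, measure, day in rows:
        by_id[row_id].append((day, measure))

    totals = {}
    for row_id, items in by_id.items():
        items.sort()  # order each group by date (ties fall back to measure)
        measures = [m for _, m in items]
        # Pair every measure with its successor and sum the comparison results.
        totals[row_id] = sum(compare(a, b) for a, b in zip(measures, measures[1:]))
    return totals


rows = [
    (1, 10, date(2024, 1, 1)),
    (1, 15, date(2024, 1, 3)),
    (1, 12, date(2024, 1, 2)),
    (2, 7, date(2024, 1, 1)),
]
result = sum_consecutive(rows, lambda prev, cur: cur - prev)
print(result)  # {1: 5, 2: 0}
```

An id with a single row has no consecutive pair, so its sum is 0 here; whether that is the desired behavior depends on the real comparison function.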

Nathant93
by New Contributor III
  • 1098 Views
  • 2 replies
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them. The join is fairly complex, as it imitates a WHERE NOT EXISTS / TOP 1 SQL query.

Latest Reply
pk13
New Contributor II
  • 0 kudos

Hello @VZLA, recently I have been getting the exact same error. It has a "Caused by" as below:
```
Caused by: kafkashaded.org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
```
Stacktrace:
ERROR: Some ...

1 More Replies
eenaagrawal
by New Contributor
  • 421 Views
  • 1 replies
  • 0 kudos
Latest Reply
SP_6721
New Contributor III
  • 0 kudos

Hi @eenaagrawal, there isn't a specific built-in integration in Databricks to interact directly with SharePoint. However, you can accomplish this by leveraging libraries like Office365-REST-Python-Client, which enable interaction with SharePoint's RE...

rahuja
by Contributor
  • 1327 Views
  • 2 replies
  • 0 kudos

Resolved! Cloning Git Repository in Databricks via Rest API Endpoint using Azure Service principal

Hello, I have written a Python script that uses the Databricks REST API(s). I am trying to clone/update an Azure DevOps repository inside Databricks using an Azure service principal. I am able to retrieve the credential_id for the service principal I am usin...

Latest Reply
rahuja
Contributor
  • 0 kudos

@nicole_lu_PM So sorry for coming back to this issue after such a long time. I looked into it, and it seems the concept of an OBO token applies when we use Databricks with AWS as our cloud provider. In the case of Azure, most of the commen...

1 More Replies
ShashiPrakash
by New Contributor II
  • 475 Views
  • 2 replies
  • 1 kudos

Resolved! Unity Catalog Table in Databricks Asset Bundle

I am looking to deploy Unity Catalog schemas and tables via Databricks Asset Bundle (DAB). We can do schema evolution of tables via notebooks as well, but we already have 1000+ notebooks, and implementing this via notebooks would take considerable effort, hence I was look...

Latest Reply
ShashiPrakash
New Contributor II
  • 1 kudos

Thanks for the prompt response, @saurabh18cs. Yes, that was the alternative I was considering. I believe it will be the warehouses group command, which I will explore. Would you be able to share any best-practice document for managing the SQL project file, w...

1 More Replies
RobCox
by New Contributor II
  • 328 Views
  • 2 replies
  • 0 kudos

DAB - Common cluster configs possible?

I've been trying various solutions and perhaps I'm just thinking about this the wrong way. We're migrating over from Synapse, where we're used to having a defined set of DBX cluster profiles to run our jobs against. These are all job clusters created v...

Latest Reply
saurabh18cs
Honored Contributor
  • 0 kudos

Hi, you can also parametrize your job clusters:

job_clusters:
  - job_cluster_key: Job_cluster
    new_cluster:
      spark_version: ${var.spark_version}
      spark_conf: ${var.spark_configuration}
      azure_attributes:
        ...

1 More Replies
ShivangiB
by New Contributor III
  • 396 Views
  • 3 replies
  • 0 kudos

Zorder and Liquid Clustering Performance while reading and writing data

When I write to a liquid clustering table, it takes more time compared to Z-order.

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

We are trying to understand the overall behavior of liquid clustering.

2 More Replies
DatabricksQuery
by New Contributor
  • 154 Views
  • 1 replies
  • 0 kudos

Databricks Job Listener Concept for Tracking Personal Jobs

Hello everyone, I want to know whether there is any listener mechanism in Databricks that can track the configuration of Databricks jobs deployed through CI/CD. With the help of this listener, we can track our custom jobs that are not part of the CI/CD process. This way,...

Latest Reply
saurabh18cs
Honored Contributor
  • 0 kudos

Hi, I don't think Databricks provides a built-in listener mechanism to track changes to job configurations directly. However, you can implement a custom solution to monitor and track changes to Databricks jobs deployed through CI/CD pipelines using ...

khishore
by Contributor
  • 4824 Views
  • 9 replies
  • 6 kudos

Resolved! I haven't received my certificate or the badge for Databricks Certified Data Engineer Associate

Hi @Lindsay Olson @Kaniz Fatma, I cleared my Databricks Certified Data Engineer Associate exam on 29 October 2022 but haven't received my badge or certificate yet. Can you please help? Thanks.

Latest Reply
gokul2
New Contributor III
  • 6 kudos

Hi @Lindsay Olson @Kaniz Fatma, I cleared my Databricks Certified Data Engineer Associate exam on 01 December 2024. You shared my certificate to this mail ID (927716@congizant.com) on December 2, but my organization has blocked external sites, ki...

8 More Replies
chethankumar
by New Contributor III
  • 1562 Views
  • 4 replies
  • 1 kudos

How to execute SQL statement using terraform

Is there a way to execute SQL statements using Terraform? I can see it is possible using the API as below, https://docs.databricks.com/api/workspace/statementexecution/executestatement but I want to know whether there is a direct way to run it like the code below provi...

Latest Reply
KartikeyaJain
New Contributor III
  • 1 kudos

The official Databricks provider in Terraform only allows you to create SQL queries, not execute them. To actually run queries, you can either:
- Use the http provider to make API calls to the Databricks REST API to execute SQL queries.
- Alternatively, if...
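Whatever tool issues the call, the REST request itself has the same shape. The sketch below builds (but does not send) such a request in Python, assuming the Statement Execution endpoint is `POST /api/2.0/sql/statements` with `warehouse_id` and `statement` fields in the JSON body; check the API page linked in the question before relying on it. The host, token, and warehouse ID are placeholders, and `build_statement_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build a POST request for the SQL Statement Execution API without sending it."""
    url = f"{host.rstrip('/')}/api/2.0/sql/statements"  # assumed endpoint path
    body = json.dumps({"warehouse_id": warehouse_id, "statement": statement})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
req = build_statement_request(
    "https://example.cloud.databricks.com", "dapi-PLACEHOLDER", "abc123", "SELECT 1"
)
# urllib.request.urlopen(req) would submit the statement to a real workspace.
```

A Terraform `http` data source or a CI step would send the same method, URL, headers, and body.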

3 More Replies
naga93
by New Contributor
  • 613 Views
  • 1 replies
  • 0 kudos

How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio

Hello, I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6. The ...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi naga93, how are you doing today? As per my understanding, you’ve done a great job navigating all the tricky parts of Delta + Unity Catalog + Dremio integration! You're absolutely right to set minReaderVersion to 2 and disable deletion vectors to m...

surajitDE
by New Contributor III
  • 509 Views
  • 1 replies
  • 0 kudos

How can we change from GC to G1GC in serverless

My DLT jobs are experiencing throttling due to the following error message:
[GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] 9035507K->3742053K(17431552K), 0.1463381 secs] [Times: user=0.29 sys=0.00, real=0.14 secs]
I came across s...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi surajitDE, how are you doing today? As per my understanding, you're absolutely right to look into the GC (Garbage Collection) behavior. When you're seeing messages like GCLocker Initiated GC and frequent young gen collections, it usually means your...
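To read the numbers in the quoted log line, a small parser helps. This is a sketch that handles only the `PSYoungGen` line format shown in the question (the young generation of the Parallel Scavenge collector); the regular expression and the field names in the returned dictionary are illustrative assumptions, not a general GC-log parser.

```python
import re

# The log line quoted in the question, without the trailing [Times: ...] part.
GC_LINE = (
    "[GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] "
    "9035507K->3742053K(17431552K), 0.1463381 secs]"
)

PATTERN = re.compile(
    r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\] "
    r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]"
)


def parse_gc_line(line):
    """Extract young-generation and heap figures from one PSYoungGen log line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    young_before, young_after, young_cap, _, heap_after, heap_cap = map(
        int, m.groups()[:6]
    )
    return {
        "young_fill_before": young_before / young_cap,  # fraction full before GC
        "young_reclaimed_kb": young_before - young_after,
        "heap_fill_after": heap_after / heap_cap,  # fraction of heap used after GC
        "pause_secs": float(m.group(7)),
    }


stats = parse_gc_line(GC_LINE)
```

For this line the young generation was roughly 96% full before the collection, about 5.3 million KB were reclaimed, and the whole heap was about 21% occupied afterwards, with a pause of 0.146 seconds.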

