Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tommabip
by New Contributor III
  • 587 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following: (1) create a policy where I specify env variables and libraries; (2) create a cluster that inherits from that policy and uses the env variables specified in the policy. I saw in the decume...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...
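As a rough illustration of what "fixed environment variables in a policy" looks like, here is a minimal Python sketch that builds a policy definition in which each variable is pinned with a `fixed` rule under a `spark_env_vars.<NAME>` key. The variable names and values are hypothetical, and the helper `build_policy_definition` is not part of any Databricks SDK; the resulting JSON string is the kind of value one would pass as the policy's definition.

```python
import json


def build_policy_definition(env_vars):
    """Return a cluster-policy definition (JSON string) that fixes each env var."""
    definition = {
        f"spark_env_vars.{name}": {"type": "fixed", "value": value}
        for name, value in env_vars.items()
    }
    return json.dumps(definition, indent=2, sort_keys=True)


# Hypothetical variable names and values, for illustration only.
policy_json = build_policy_definition({"ENVIRONMENT": "dev", "LOG_LEVEL": "INFO"})
print(policy_json)
```

Whether a cluster created through Terraform picks these values up automatically is exactly the discrepancy discussed in the thread, so treat this only as a sketch of the policy side.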

  • 2 kudos
2 More Replies
Alex_Persin
by New Contributor III
  • 6946 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on Databricks worker nodes with custom Docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...
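Before changing anything, it can help to confirm how much shared memory a node actually has. The following is a small diagnostic sketch using only the Python standard library; the helper names are made up for illustration, and it only reports the size of the mount, it does not resize it.

```python
import os
import shutil


def tmpfs_size_mb(path="/dev/shm"):
    """Return (total, free) size in MiB of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total / mib, usage.free / mib


def is_large_enough(path, required_mb):
    """True if the filesystem at `path` has at least `required_mb` MiB free."""
    _, free_mb = tmpfs_size_mb(path)
    return free_mb >= required_mb


if os.path.exists("/dev/shm"):
    total_mb, free_mb = tmpfs_size_mb("/dev/shm")
    print(f"/dev/shm: {total_mb:.0f} MiB total, {free_mb:.0f} MiB free")
```

Running this on a worker (for example inside a task) would show whether the 64MB default described above is in effect.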

Latest Reply
stevewb
New Contributor II
  • 8 kudos

Bump again... does anyone have a solution for this?

  • 8 kudos
5 More Replies
valde
by New Contributor
  • 249 Views
  • 1 reply
  • 0 kudos

Window function VS groupBy + map

Let's say we have an RDD like this: RDD(id: Int, measure: Int, date: LocalDate). Let's say we want to apply some function that compares 2 consecutive measures by date and outputs a number, and we want to get the sum of those numbers by id. The function is b...

Latest Reply
Renu_
Contributor III
  • 0 kudos

Hi @valde, those two approaches give the same result, but they don’t work the same way under the hood. Spark SQL uses optimized window functions that handle things like shuffling and memory more efficiently, often making it faster and lighter. On the o...
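To make the computation being compared concrete, here is a plain-Python sketch of the "groupBy + map" logic: group rows by id, order each group by date, compare consecutive measures, and sum the results. The `compare` function is a hypothetical stand-in (absolute difference), since the original function is not shown in the post. In Spark, the window-function version would typically express the same thing with `lag` over a window partitioned by id and ordered by date.

```python
from collections import defaultdict
from datetime import date


def compare(prev_measure, next_measure):
    # Hypothetical comparison; the function from the original post is not shown.
    return abs(next_measure - prev_measure)


def sum_of_consecutive_comparisons(rows):
    """rows: iterable of (id, measure, date). Returns {id: summed comparison}."""
    by_id = defaultdict(list)
    for row_id, measure, day in rows:
        by_id[row_id].append((day, measure))

    totals = {}
    for row_id, items in by_id.items():
        items.sort()  # order by date within each id
        measures = [m for _, m in items]
        # Pair each measure with its successor and sum the comparisons.
        totals[row_id] = sum(compare(a, b) for a, b in zip(measures, measures[1:]))
    return totals


rows = [
    (1, 10, date(2024, 1, 1)),
    (1, 13, date(2024, 1, 2)),
    (1, 11, date(2024, 1, 3)),
    (2, 5, date(2024, 1, 1)),
]
print(sum_of_consecutive_comparisons(rows))  # {1: 5, 2: 0}
```

Note that the grouped version materializes every row of an id in memory before sorting, which is one reason the two approaches can behave differently at scale even though they return the same totals.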

  • 0 kudos
Nathant93
by New Contributor III
  • 1265 Views
  • 2 replies
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them. The join is fairly complex, as it imitates a WHERE NOT EXISTS TOP 1 SQL query.

Latest Reply
pk13
New Contributor II
  • 0 kudos

Hello @VZLA, recently I have been getting the exact same error. It has a "Caused by" as below:
```
Caused by: kafkashaded.org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
```
Stacktrace: ERROR: Some ...

  • 0 kudos
1 More Replies
eenaagrawal
by New Contributor
  • 744 Views
  • 1 reply
  • 0 kudos
Latest Reply
SP_6721
Contributor
  • 0 kudos

Hi @eenaagrawal, there isn't a specific built-in integration in Databricks to directly interact with SharePoint. However, you can accomplish this by leveraging libraries like Office365-REST-Python-Client, which enable interaction with SharePoint's RE...

  • 0 kudos
rahuja
by Contributor
  • 1439 Views
  • 2 replies
  • 0 kudos

Resolved! Cloning Git Repository in Databricks via Rest API Endpoint using Azure Service principal

Hello, I have written a Python script that uses the Databricks REST API(s). I am trying to clone/update an Azure DevOps repository inside Databricks using an Azure service principal. I am able to retrieve the credential_id for the service principal I am usin...

Latest Reply
rahuja
Contributor
  • 0 kudos

@nicole_lu_PM So sorry for coming back to this issue after such a long time. I looked into it, and it seems that this concept of an OBO token applies when we use Databricks with AWS as our cloud provider. In case of Azure most of the commen...

  • 0 kudos
1 More Replies
ShashiPrakash
by New Contributor II
  • 706 Views
  • 2 replies
  • 1 kudos

Resolved! Unity Catalog Table in Databricks Asset Bundle

I am looking to deploy Unity Catalog schemas and tables via Databricks Asset Bundle (DAB). We can do schema evolution of tables via notebooks as well, but we already have 1000+ notebooks and implementing this via notebooks would be a significant effort, hence I was look...

Latest Reply
ShashiPrakash
New Contributor II
  • 1 kudos

Thanks for the prompt response @saurabh18cs. Yes, that was the alternative I was considering. I believe it will be the warehouses group command, which I will explore. Will you be able to share any best practice document to manage the SQL project file, w...

  • 1 kudos
1 More Replies
RobCox
by New Contributor II
  • 409 Views
  • 2 replies
  • 0 kudos

DAB - Common cluster configs possible?

I've been trying various solutions and perhaps I'm just thinking about this the wrong way. We're migrating over from Synapse, where we're used to having a defined set of DBX cluster profiles to run our jobs against; these are all job clusters created v...

Latest Reply
saurabh18cs
Honored Contributor
  • 0 kudos

Hi, you can also parametrize your job clusters:

job_clusters:
  - job_cluster_key: Job_cluster
    new_cluster:
      spark_version: ${var.spark_version}
      spark_conf: ${var.spark_configuration}
      azure_attributes:
        ...

  • 0 kudos
1 More Replies
ShivangiB
by New Contributor III
  • 524 Views
  • 3 replies
  • 0 kudos

Zorder and Liquid Clustering Performance while reading and writing data

When I write to a liquid clustering table, it takes more time compared to Z-order.

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

We are trying to understand the overall behavior of liquid clustering.

  • 0 kudos
2 More Replies
DatabricksQuery
by New Contributor
  • 248 Views
  • 1 reply
  • 0 kudos

Databricks Job Listener Concept for Tracking Personal Jobs

Hello everyone, I want to know if any listener mechanism in Databricks can track the configuration of Databricks jobs deployed through CI/CD. With the help of this listener, we can track our custom jobs that are not part of the CI/CD process. This way,...

Latest Reply
saurabh18cs
Honored Contributor
  • 0 kudos

Hi, I don't think Databricks provides a built-in listener mechanism to track changes to job configurations directly. However, you can implement a custom solution to monitor and track changes to Databricks jobs deployed through CI/CD pipelines using ...

  • 0 kudos
khishore
by Contributor
  • 5046 Views
  • 9 replies
  • 6 kudos

Resolved! I haven't received my certificate or the badge for Databricks Certified Data Engineer Associate

Hi @Lindsay Olson @Kaniz Fatma, I cleared my Databricks Certified Data Engineer Associate exam on 29 October 2022, but haven't received my badge or certificate yet. Can you please help? Thanks.

Latest Reply
gokul2
New Contributor III
  • 6 kudos

Hi @Lindsay Olson @Kaniz Fatma, I cleared my Databricks Certified Data Engineer Associate exam on 01 December 2024. You shared my certificate to this mail id (927716@congizant.com) on December 2, but my organization has blocked external sites, ki...

  • 6 kudos
8 More Replies
chethankumar
by New Contributor III
  • 1783 Views
  • 4 replies
  • 1 kudos

How to execute SQL statement using terraform

Is there a way to execute SQL statements using Terraform? I can see it is possible using the API, as below: https://docs.databricks.com/api/workspace/statementexecution/executestatement but I want to know whether there is a straightforward way to run it, like the code below provi...

Latest Reply
KartikeyaJain
New Contributor III
  • 1 kudos

The official Databricks provider in Terraform only allows you to create SQL queries, not execute them. To actually run queries, you can either: Use the http provider to make API calls to the Databricks REST API to execute SQL queries. Alternatively, if...
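For reference, the REST call that such an approach would make can be sketched in Python. This example only builds the request for the Statement Execution API and does not send it. The host, token, and warehouse ID are placeholders, the helper name is hypothetical, and the exact path and body fields should be checked against the API reference linked in the question.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build (but do not send) a POST request for the Statement Execution API."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_statement_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    warehouse_id="<warehouse-id>",
    statement="SELECT 1",
)
# urllib.request.urlopen(request) would actually submit the statement.
```

The same URL, headers, and JSON body are what a generic HTTP client invoked from Terraform would need to supply.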

  • 1 kudos
3 More Replies
naga93
by New Contributor
  • 746 Views
  • 1 reply
  • 0 kudos

How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio

Hello, I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6. The ...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi naga93, how are you doing today? As per my understanding, you’ve done a great job navigating all the tricky parts of Delta + Unity Catalog + Dremio integration! You're absolutely right to set minReaderVersion to 2 and disable deletion vectors to m...

  • 0 kudos
surajitDE
by New Contributor III
  • 603 Views
  • 1 reply
  • 0 kudos

How can we change from GC to G1GC in serverless

My DLT jobs are experiencing throttling due to the following error message: [GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] 9035507K->3742053K(17431552K), 0.1463381 secs] [Times: user=0.29 sys=0.00, real=0.14 secs] I came across s...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi surajitDE, how are you doing today? As per my understanding, you're absolutely right to look into the GC (Garbage Collection) behavior. When you're seeing messages like GCLocker Initiated GC and frequent young gen collections, it usually means your...
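To read the log line from the question more easily, here is a small Python sketch that parses it and reports how much memory the collection freed. The regular expression is written only for this `PSYoungGen` line format (the young-generation collector used by the JVM's parallel GC) and is an assumption, not a general GC log parser.

```python
import re

GC_LINE = (
    "[GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] "
    "9035507K->3742053K(17431552K), 0.1463381 secs]"
)

# before->after(capacity) for the young generation, then for the whole heap.
PATTERN = re.compile(
    r"\[PSYoungGen: (?P<young_before>\d+)K->(?P<young_after>\d+)K"
    r"\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>[\d.]+) secs"
)


def parse_gc_line(line):
    """Return sizes in KiB and the pause in seconds, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {k: int(v) for k, v in match.groupdict().items() if k != "pause"}
    fields["pause"] = float(match["pause"])
    return fields


gc = parse_gc_line(GC_LINE)
young_freed_kib = gc["young_before"] - gc["young_after"]
heap_used_after = gc["heap_after"] / gc["heap_cap"]
print(f"young gen freed: {young_freed_kib} KiB, pause: {gc['pause']} s")
print(f"heap occupancy after GC: {heap_used_after:.1%}")
```

For this particular line the young generation was almost full before the collection and nearly empty afterwards, with a pause of about 0.15 seconds.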

  • 0 kudos
prasadvaze
by Valued Contributor II
  • 8432 Views
  • 4 replies
  • 6 kudos

Resolved! Limit on number of result rows displayed on Databricks SQL UI

Databricks SQL UI currently limits the query results display to 64,000 rows. When will this limit go away? Using SSMS I get 40MM rows of results in the UI, and my users won't switch to Databricks SQL for this reason.

Latest Reply
User16765136105
New Contributor III
  • 6 kudos

Hi @prasad vaze​ - We do have a feature in the works that will increase this limit. If you reach out to your Databricks contact they can give you more details regarding dates and the preview.

  • 6 kudos
3 More Replies
