Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jeremy98
by Contributor III
  • 12 Views
  • 1 replies
  • 0 kudos

how to install the package using --index-url

Hi community, I created a job using Databricks Asset Bundles, but I'm unsure how to install this dependency the right way. When I tested the job, it didn't seem to install the torch library properly.

jeremy98_0-1744215217654.png
Latest Reply
jeremy98
Contributor III
  • 0 kudos

I tried to do it manually and it works, but through Databricks Asset Bundles it doesn't. In the end I used: dependencies: - torch==2.5.1 - --index-url https://download.pytorch.org/whl/cpu It says: Error: file doesn't exi...

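A note on the pip side of this: `--index-url` is a pip option, and pip accepts option lines inside a requirements file. One hedged workaround (an assumption about the setup, not a verified Asset Bundles recipe) is to keep the flag in a requirements.txt and point the job's dependencies at that file:

```text
# requirements.txt -- pip reads option lines alongside requirements
--index-url https://download.pytorch.org/whl/cpu
torch==2.5.1
```

With plain pip this is equivalent to `pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu`; whether a given bundle environment honors the flag line is worth testing, given the error discussed above.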
Dnirmania
by Contributor
  • 95 Views
  • 2 replies
  • 0 kudos

Read file from AWS S3 using Azure Databricks

Hi Team, I am currently working on a project to read CSV files from an AWS S3 bucket using an Azure Databricks notebook. My ultimate goal is to set up an Auto Loader in Azure Databricks that reads new files from S3 and loads the data incrementally. Howe...

Dnirmania_0-1744106993274.png
Latest Reply
Dnirmania
Contributor
  • 0 kudos

Thank you, @Brahmareddy, for your response. I updated the code based on your suggestion, but I'm still encountering the same error message. I even made my S3 bucket public, but no luck. Interestingly, I was able to read a CSV file from the S3 bucket...

1 More Replies
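For context on this thread: on Azure there is no AWS instance profile, so S3 access typically needs explicit credentials via the Hadoop `fs.s3a.*` options. A minimal sketch (key names are the standard s3a ones; the values and bucket path are placeholders, not verified against this poster's setup):

```python
# Sketch: S3 credentials for an Azure Databricks cluster via Hadoop s3a options.
# The values below are placeholders (assumptions), never hard-code real keys --
# prefer reading them from a Databricks secret scope.

S3_CONF = {
    "fs.s3a.access.key": "<aws-access-key-id>",
    "fs.s3a.secret.key": "<aws-secret-access-key>",
    "fs.s3a.endpoint": "s3.amazonaws.com",
}

def spark_s3_options(conf):
    """Prefix each Hadoop key with 'spark.hadoop.' for use in cluster Spark config."""
    return {f"spark.hadoop.{k}": v for k, v in conf.items()}

# Usage on a cluster (not executed here):
#   for k, v in S3_CONF.items():
#       spark._jsc.hadoopConfiguration().set(k, v)
#   df = (spark.readStream.format("cloudFiles")          # Auto Loader
#         .option("cloudFiles.format", "csv")
#         .load("s3a://my-bucket/input/"))
```

If these are set in the cluster's Spark config instead, the `spark.hadoop.` prefix form shown by `spark_s3_options` is the one to use.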
NikosLoutas
by New Contributor II
  • 15 Views
  • 1 replies
  • 0 kudos

Materialized Views Compute

When creating a Materialized View (MV) without a schedule, there seems to be a cost associated with the MV once it is created, even if it is not queried. The question is: once the MV is created, is there already a "hot" compute ready for use in case a...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

When a Materialized View (MV) is created in Databricks without a refresh schedule, there is no “hot” compute automatically kept ready for ad-hoc refreshes. However, the MV incurs costs associated with storage (vendor cost) because it physically store...

guest0
by New Contributor
  • 62 Views
  • 1 replies
  • 0 kudos

Spark UI Simulator Not Accessible

Hello, the Spark UI Simulator has not been accessible for the last few days. I was able to refer to it last week, at https://www.databricks.training/spark-ui-simulator/index.html. I already have access to Partner Academy (if that is relevant).  <Error...

Data Engineering
simulator
spark-ui
Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @guest0! You can refer to this post, which addresses the same issue and outlines a potential workaround. If the issue persists, I recommend raising a ticket with the Databricks Support Team.

jeremy98
by Contributor III
  • 878 Views
  • 9 replies
  • 0 kudos

Error Databricks Bundle Deploy with changes in the wheel file

Hello Community, suddenly I get an error when deploying a new bundle to Databricks after changing a Python script: the cluster continues to point to an old version of the .py script uploaded by the Databricks Asset Bundle. Why is this?

Latest Reply
denis-dbx
Databricks Employee
  • 0 kudos

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact that enables automated wheel patching to break the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...

8 More Replies
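The flag mentioned in the reply looks roughly like this in databricks.yml (a sketch based on the reply; the artifact name is a placeholder and exact field placement may differ by CLI version):

```yaml
artifacts:
  my_wheel:            # placeholder artifact name
    type: whl
    path: .
    dynamic_version: true   # opt-in wheel-version patching, available from v0.245.0
```

With this set, each deploy patches the wheel's version suffix so clusters stop serving a cached older build.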
Vasu_Kumar_T
by New Contributor
  • 7 Views
  • 0 replies
  • 0 kudos

BladeBridge Analyzer out-of-memory issue

We are running the BladeBridge Analyzer and it runs out of memory. We tried increasing the RAM, but it still gives the same error. We cannot run the analyzer against a subset of the metadata, as that would not generate a comprehensive report with how th...

Tommabip
by Visitor
  • 26 Views
  • 3 replies
  • 1 kudos

Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following: create a policy where I specify env variables and libraries; create a cluster that inherits from that policy and uses the env variables specified in the policy. I saw in the docume...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...

2 More Replies
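For reference, a cluster policy that pins an environment variable uses a "fixed" rule on the `spark_env_vars.*` attribute path, roughly as below (variable name and value are placeholders; whether Terraform-created clusters inherit this automatically, as UI-created ones do, is exactly the discrepancy discussed in the reply):

```json
{
  "spark_env_vars.MY_ENV": {
    "type": "fixed",
    "value": "production"
  }
}
```

When creating the cluster through Terraform or the API, it may be necessary to set the same `spark_env_vars` explicitly on the cluster spec rather than relying on the policy to inject them.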
Alex_Persin
by New Contributor III
  • 6282 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...

Latest Reply
stevewb
New Contributor II
  • 8 kudos

Bump again... does anyone have a solution for this?

5 More Replies
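One direction commonly suggested for this (an assumption, not verified here against custom Docker images; test before relying on it) is a cluster-scoped init script that remounts the shared-memory tmpfs with a larger size:

```bash
#!/bin/bash
# Init-script sketch: enlarge /dev/shm on each node at cluster start.
# 4g is an arbitrary placeholder; size it to your dataloader's needs.
sudo mount -o remount,size=4g /dev/shm
```

If the remount is not permitted inside the container environment, a fallback worth testing is setting the PyTorch DataLoader to `num_workers=0` (no shared-memory transfer), at a throughput cost.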
valde
by New Contributor
  • 33 Views
  • 1 replies
  • 0 kudos

Window function VS groupBy + map

Let's say we have an RDD like this: RDD(id: Int, measure: Int, date: LocalDate). Say we want to apply some function that compares two consecutive measures by date, outputs a number, and we want to get the sum of those numbers by id. The function is b...

Latest Reply
Renu_
New Contributor III
  • 0 kudos

Hi @valde, those two approaches give the same result, but they don't work the same way under the hood. Spark SQL uses optimized window functions that handle things like shuffling and memory more efficiently, often making it faster and lighter. On the o...

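To make the computation in question concrete, here is a pure-Python sketch of the same logic (the pairwise function is a placeholder simple difference, since the original post truncates before defining it), with the Spark window-function equivalent in comments:

```python
# Pure-Python model of: per id, sort by date, apply a function to each pair of
# consecutive measures, then sum the results per id.
from itertools import groupby
from operator import itemgetter

def consecutive_sum(rows, pair_fn=lambda a, b: b - a):
    """rows: iterable of (id, measure, date) tuples. Returns {id: sum of pair_fn}."""
    out = {}
    for key, grp in groupby(sorted(rows), key=itemgetter(0)):
        # order each id's measures by date before pairing
        measures = [m for _, m, _ in sorted(grp, key=itemgetter(2))]
        out[key] = sum(pair_fn(a, b) for a, b in zip(measures, measures[1:]))
    return out

rows = [
    (1, 10, "2024-01-01"), (1, 13, "2024-01-02"), (1, 20, "2024-01-03"),
    (2, 5, "2024-01-01"), (2, 2, "2024-01-02"),
]
print(consecutive_sum(rows))  # {1: 10, 2: -3}

# The window-function form in Spark would look roughly like:
#   w = Window.partitionBy("id").orderBy("date")
#   (df.withColumn("prev", F.lag("measure").over(w))
#      .withColumn("diff", F.col("measure") - F.col("prev"))
#      .groupBy("id").agg(F.sum("diff")))
```

As the reply notes, the window form lets Spark's optimizer plan the shuffle, whereas a groupBy + map over collected lists materializes each group in executor memory.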
ShivangiB
by New Contributor III
  • 13 Views
  • 1 replies
  • 0 kudos

Not Able To Access GCP storage bucket from Databricks

While running: df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load('path') followed by df.show(), I get the error: java.io.IOException: Invalid PKCS8 data. Cluster Spark config: spark.hadoop.fs.gs.auth.service....

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

Troubleshooting and resolution for java.io.IOException: Invalid PKCS8 data. This error typically occurs when there is an issue with the private key format or its storage in Databricks secrets. Based on the provid...

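"Invalid PKCS8 data" usually means the private key handed to the GCS connector is not an unencrypted PKCS8 PEM body. A hedged sanity check (pure Python, placeholder key text) that can be run on the stored secret value before blaming the connector:

```python
import base64

def looks_like_pkcs8_pem(key_text: str) -> bool:
    """Rough check that a string is an unencrypted PKCS8 PEM private key.

    Common failure modes: literal '\\n' sequences instead of real newlines,
    a PKCS1 key ('BEGIN RSA PRIVATE KEY'), or the whole service-account JSON
    stored where only the private_key field belongs.
    """
    key_text = key_text.strip()
    if not (key_text.startswith("-----BEGIN PRIVATE KEY-----")
            and key_text.endswith("-----END PRIVATE KEY-----")):
        return False
    # splitting on '-----' leaves the base64 body at index 2
    body = key_text.split("-----")[2].strip().replace("\n", "")
    try:
        base64.b64decode(body, validate=True)
        return True
    except Exception:
        return False
```

If the check fails on the secret's value, re-store the private_key field from the service-account JSON with real newlines preserved.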
Nathant93
by New Contributor III
  • 764 Views
  • 2 replies
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them; the join is fairly complex, as it imitates a WHERE NOT EXISTS / TOP 1 SQL query.

Latest Reply
pk13
New Contributor II
  • 0 kudos

Hello @VZLA, recently I am getting the exact same error. It has a "Caused by" as below: ```Caused by: kafkashaded.org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.``` Stacktrace - ERROR: Some ...

1 More Replies
eenaagrawal
by New Contributor
  • 23 Views
  • 1 replies
  • 0 kudos
Latest Reply
SP_6721
New Contributor
  • 0 kudos

Hi @eenaagrawal, there isn't a specific built-in integration in Databricks to directly interact with SharePoint. However, you can accomplish this by leveraging libraries like Office365-REST-Python-Client, which enable interaction with SharePoint's RE...

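As an alternative to the library mentioned in the reply, SharePoint's REST API can also be called directly. A standard-library sketch that only builds the request (the site URL, list name, and token are placeholders/assumptions; sending it requires a valid Azure AD token):

```python
# Sketch: build a GET request against SharePoint's REST API with the stdlib.
import urllib.request

def sharepoint_list_items_request(site_url: str, list_title: str, token: str):
    """Build (but do not send) a request for a SharePoint list's items."""
    url = f"{site_url}/_api/web/lists/getbytitle('{list_title}')/items"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json;odata=verbose",
            "Authorization": f"Bearer {token}",
        },
    )

req = sharepoint_list_items_request(
    "https://contoso.sharepoint.com/sites/demo",  # placeholder site
    "Documents",
    "<token>",
)
print(req.full_url)
# Sending would be: urllib.request.urlopen(req)  -- needs a real token
```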
HaripriyaP
by New Contributor
  • 776 Views
  • 2 replies
  • 0 kudos

Multiple Notebooks Migration from one workspace to another without using Git.

Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult to do for multiple notebooks and folders, I need an alternate solution. Please reply as so...

Latest Reply
rabia_farooq
  • 0 kudos

@daniel_sahal, this link says "page not found".

1 More Replies
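Without Git, the Workspace API (which the Databricks CLI's workspace export/import commands wrap) can move notebooks in bulk. A standard-library sketch that builds the export call for one notebook; host, token, and path are placeholders, and a real migration would recurse over a directory listing first:

```python
# Sketch: build (not send) a Workspace API export request for a notebook.
import urllib.parse
import urllib.request

def workspace_export_request(host: str, token: str, path: str, fmt: str = "SOURCE"):
    """GET /api/2.0/workspace/export -- fmt SOURCE keeps notebooks re-importable."""
    qs = urllib.parse.urlencode({"path": path, "format": fmt})
    return urllib.request.Request(
        f"{host}/api/2.0/workspace/export?{qs}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = workspace_export_request(
    "https://adb-123.azuredatabricks.net",  # placeholder workspace URL
    "<pat>",
    "/Users/me/notebook1",
)
print(req.full_url)
```

The corresponding import on the target workspace is a POST to /api/2.0/workspace/import with the exported, base64-encoded content.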
rahuja
by Contributor
  • 1144 Views
  • 2 replies
  • 0 kudos

Resolved! Cloning Git Repository in Databricks via Rest API Endpoint using Azure Service principal

Hello, I have written a Python script that uses the Databricks REST APIs. I am trying to clone/update an Azure DevOps repository inside Databricks using an Azure Service Principal. I am able to retrieve the credential_id for the service principal I am usin...

Latest Reply
rahuja
Contributor
  • 0 kudos

@nicole_lu_PM So sorry for coming back to this issue after such a long time. But I looked into it, and it seems the concept of an OBO token applies when we use Databricks with AWS as our cloud provider. In the case of Azure, most of the commen...

1 More Replies
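For readers landing on this thread: the "clone" operation discussed is a POST to the Repos API. A standard-library sketch that builds (not sends) the call; host, token, repo URL, and path are placeholders, and authenticating as an Azure service principal means using its Azure AD token in the Authorization header rather than a PAT:

```python
# Sketch: build a Repos API request that clones a repo into the workspace.
import json
import urllib.request

def create_repo_request(host: str, token: str, git_url: str, path: str):
    """Build a POST to /api/2.0/repos (not sent here)."""
    body = json.dumps({
        "url": git_url,
        "provider": "azureDevOpsServices",  # provider for Azure DevOps repos
        "path": path,
    }).encode()
    return urllib.request.Request(
        f"{host}/api/2.0/repos",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = create_repo_request(
    "https://adb-123.azuredatabricks.net",              # placeholder host
    "<aad-token>",
    "https://dev.azure.com/org/proj/_git/repo",         # placeholder repo
    "/Repos/sp-user/repo",
)
```

Updating an existing repo to a branch is the analogous PATCH to /api/2.0/repos/{repo_id} with a "branch" field.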
