Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi community, I created a job using a Databricks Asset Bundle, but I'm not sure how to install this dependency the right way, because when I was testing the job, it didn't seem to install the torch library properly.
I tried to do it manually and it works; through the Databricks Asset Bundle it doesn't. In the end I used: dependencies:
- torch==2.5.1
- --index-url https://download.pytorch.org/whl/cpu
It says: Error: file doesn't exi...
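For reference, here is a hedged sketch of how such a dependency list can sit in the bundle's job definition. The field names assume the current serverless environment schema in databricks.yml (job and task names are placeholders), and one possible cause of a "file doesn't exist" error is the bundle treating the bare "--index-url ..." entry as a local file path, so moving the index URL into a requirements file is a sometimes-suggested workaround:

```yaml
# Sketch only - validate against your CLI version with `databricks bundle validate`.
resources:
  jobs:
    my_torch_job:
      tasks:
        - task_key: train
          environment_key: default
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - "torch==2.5.1"
              # If a custom index is needed, one option is to reference a
              # requirements file (placeholder path) that contains the
              # --index-url line instead of listing it here:
              # - "-r /Workspace/Users/me@example.com/requirements.txt"
```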
Hi Team, I am currently working on a project to read CSV files from an AWS S3 bucket using an Azure Databricks notebook. My ultimate goal is to set up Auto Loader in Azure Databricks to read new files from S3 and load the data incrementally. Howe...
Thank you, @Brahmareddy, for your response. I updated the code based on your suggestion, but I'm still encountering the same error message. I even made my S3 bucket public, but no luck. Interestingly, I was able to read a CSV file from the S3 bucket...
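For readers hitting the same setup, the Auto Loader options for a CSV stream can be sketched as a plain dict (the `cloudFiles.*` keys come from the Auto Loader docs; the bucket and schema paths are placeholders). The actual `spark.readStream` call is shown only as a comment, since it needs a cluster:

```python
# Sketch of typical Auto Loader options for reading CSV from S3.
# Paths and bucket names below are hypothetical placeholders.

def autoloader_csv_options(schema_location):
    """Options for an Auto Loader CSV stream (keys from the Auto Loader docs)."""
    return {
        "cloudFiles.format": "csv",                    # source file format
        "cloudFiles.schemaLocation": schema_location,  # where inferred schema is stored
        "header": "true",                              # CSV files have a header row
    }

opts = autoloader_csv_options("s3://my-bucket/_schemas/orders")

# On a cluster you would then do (sketch):
# df = (spark.readStream.format("cloudFiles")
#         .options(**opts)
#         .load("s3://my-bucket/landing/orders/"))
```

Note that reading S3 from Azure Databricks still requires valid AWS credentials on the cluster (e.g. access keys in the Spark config), which is a separate issue from the options above.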
When creating a Materialized View (MV) without a schedule, there seems to be a cost associated with the MV once it is created, even if it is not queried. The question is: once the MV is created, is there already a "hot" compute ready for use in case a...
When a Materialized View (MV) is created in Databricks without a refresh schedule, there is no “hot” compute automatically kept ready for ad-hoc refreshes. However, the MV incurs costs associated with storage (vendor cost) because it physically store...
Hello, the Spark UI Simulator has not been accessible for the past few days. I was able to refer to it last week, at https://www.databricks.training/spark-ui-simulator/index.html. I already have access to Partner Academy (if that is relevant). <Error...
Hello @guest0!
You can refer to this post, which addresses the same issue and outlines a potential workaround. If the issue persists, I recommend raising a ticket with the Databricks Support Team.
Hello Community, I suddenly have an error: when I deploy a new bundle to Databricks after changing the Python script, the cluster continues to point to an old version of the .py script uploaded by the Databricks Asset Bundle. Why is this?
We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact that enables automated wheel patching to break the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...
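The flag described above lives in the bundle's artifacts block; a minimal sketch (the artifact name, path, and build command are placeholders):

```yaml
# Requires Databricks CLI v0.245.0 or later.
artifacts:
  my_wheel:
    type: whl
    path: ./my_package            # directory containing pyproject.toml / setup.py
    build: python -m build --wheel
    dynamic_version: true         # opt-in: patches the wheel version suffix on deploy
```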
We are running the BladeBridge Analyzer and we are running out of memory. We tried increasing the RAM, but it still gives the same error. We cannot run the analyzer against a subset of the metadata, as that would not generate a comprehensive report with how th...
Hi, I'm trying to create a Terraform script that does the following:
- create a policy where I specify env variables and libraries
- create a cluster that inherits from that policy and uses the env variables specified in the policy
I saw in the docume...
You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...
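As a hedged illustration of one workaround (resource and field names from the Databricks Terraform provider; the policy JSON mirrors the cluster-policy definition format, and the node type and Spark version are placeholders): since Terraform sends only the attributes you declare, the policy's fixed values can simply be repeated on the cluster resource.

```hcl
resource "databricks_cluster_policy" "with_env" {
  name = "env-policy"
  definition = jsonencode({
    "spark_env_vars.MY_ENV" = { "type" = "fixed", "value" = "prod" }
  })
}

resource "databricks_cluster" "example" {
  cluster_name  = "policy-cluster"
  policy_id     = databricks_cluster_policy.with_env.id
  spark_version = "15.4.x-scala2.12" # placeholder
  node_type_id  = "Standard_DS3_v2"  # placeholder
  num_workers   = 1

  # Repeat the policy's fixed values explicitly on the cluster.
  spark_env_vars = { MY_ENV = "prod" }
}
```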
PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...
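A minimal stdlib check for the limit described above (assuming a Linux container): `os.statvfs` reports the size of the tmpfs mounted at /dev/shm, so a 64 MiB result matches Docker's default.

```python
import os

def shm_size_bytes(path="/dev/shm"):
    """Total size in bytes of the filesystem at `path`, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks

size = shm_size_bytes()
if size is not None and size <= 64 * 1024 * 1024:
    print("Shared memory is at Docker's 64 MiB default: raise it with --shm-size "
          "(or set the DataLoader's num_workers=0 as a workaround).")
```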
Getting the below error while running a Python script which connects to an Azure SQL DB: Database connection error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)"). Can some on...
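A small stdlib diagnostic for this error: unixODBC resolves driver names through odbcinst.ini, so listing the sections of that file shows which names SQLDriverConnect can actually find (the ini path is the common unixODBC default; adjust if `odbcinst -j` reports another location).

```python
import configparser
import os

def installed_odbc_drivers(ini_path="/etc/odbcinst.ini"):
    """Return driver names registered in odbcinst.ini (empty list if missing)."""
    if not os.path.exists(ini_path):
        return []
    cfg = configparser.ConfigParser()
    cfg.read(ini_path)
    return cfg.sections()
```

If "ODBC Driver 17 for SQL Server" is not in the returned list, the driver itself (Microsoft's msodbcsql17 package) needs to be installed on the cluster, e.g. via an init script.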
Let's say we have an RDD like this: RDD(id: Int, measure: Int, date: LocalDate). Let's say we want to apply some function that compares 2 consecutive measures by date and outputs a number, and we want to get the sum of those numbers by id. The function is b...
Hi @valde, those two approaches give the same result, but they don't work the same way under the hood. Spark SQL uses optimized window functions that handle things like shuffling and memory more efficiently, often making it faster and lighter. On the o...
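The logic both approaches implement can be sketched as a plain-Python reference model (not Spark code): group rows by id, sort each group by date, apply the pairwise function to consecutive measures, and sum. Since the original comparison function is truncated in the post, a simple difference is assumed here as a stand-in.

```python
from collections import defaultdict
from datetime import date

def sum_consecutive(rows, f=lambda prev, cur: cur - prev):
    """rows: iterable of (id, measure, date) tuples.
    Returns {id: sum of f over consecutive measures ordered by date}.
    `f` is a placeholder for the (truncated) comparison function."""
    by_id = defaultdict(list)
    for rid, measure, d in rows:
        by_id[rid].append((d, measure))
    out = {}
    for rid, pairs in by_id.items():
        pairs.sort()  # order by date, like a window's ORDER BY
        measures = [m for _, m in pairs]
        out[rid] = sum(f(a, b) for a, b in zip(measures, measures[1:]))
    return out

rows = [
    (1, 10, date(2024, 1, 1)),
    (1, 13, date(2024, 1, 2)),
    (1, 20, date(2024, 1, 3)),
    (2, 5,  date(2024, 1, 1)),
    (2, 9,  date(2024, 1, 2)),
]
print(sum_consecutive(rows))  # {1: 10, 2: 4}
```

In Spark SQL the same shape is a `lag(measure) OVER (PARTITION BY id ORDER BY date)` followed by a grouped sum; the RDD version has to do the grouping and sorting by hand.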
Troubleshooting and Resolution for java.io.IOException: Invalid PKCS8 data
The error java.io.IOException: Invalid PKCS8 data typically occurs when there is an issue with the private key format or its storage in Databricks secrets. Based on the provid...
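One quick way to narrow down "Invalid PKCS8 data" is to inspect the PEM header of the stored secret: PKCS#1 keys start with "BEGIN RSA PRIVATE KEY" while unencrypted PKCS#8 keys start with "BEGIN PRIVATE KEY". A stdlib sketch of that check:

```python
def pem_key_format(pem_text):
    """Classify a PEM private key by its header line. Consumers expecting
    unencrypted PKCS#8 will reject the other two formats."""
    if "-----BEGIN PRIVATE KEY-----" in pem_text:
        return "pkcs8-unencrypted"
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return "pkcs8-encrypted"
    if "-----BEGIN RSA PRIVATE KEY-----" in pem_text:
        return "pkcs1"
    return "unknown"
```

If the key turns out to be PKCS#1 or encrypted, `openssl pkcs8 -topk8 -nocrypt` can convert it to unencrypted PKCS#8 before storing it as a secret. Also check that the secret was stored without extra whitespace or lost newlines, a common cause of this error.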
Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them; the join is fairly complex, as it imitates a WHERE NOT EXISTS top-1 SQL query.
Hello @VZLA, recently I have been getting the exact same error. It has a "caused by" as below:
```
Caused by: kafkashaded.org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
```
Stacktrace - ERROR: Some ...
Hi @eenaagrawal, there isn't a specific built-in integration in Databricks to directly interact with SharePoint. However, you can accomplish this by leveraging libraries like Office365-REST-Python-Client, which enable interaction with SharePoint's RE...
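Under the hood such libraries talk to SharePoint's REST endpoints; the URL shape for downloading a file can be sketched with the stdlib (the tenant and document path below are hypothetical, and authentication headers are omitted entirely):

```python
from urllib.parse import quote

def sharepoint_file_url(site_url, server_relative_path):
    """Build the SharePoint REST endpoint that returns a file's raw bytes.
    Auth (e.g. a bearer token) must be added to the actual request."""
    return (f"{site_url.rstrip('/')}/_api/web/"
            f"GetFileByServerRelativeUrl('{quote(server_relative_path)}')/$value")

url = sharepoint_file_url("https://contoso.sharepoint.com/sites/data",   # hypothetical tenant
                          "/sites/data/Shared Documents/report.csv")
```

Using Office365-REST-Python-Client instead of raw URLs spares you from handling the auth flow and response parsing yourself.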
Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult to do for multiple notebooks and folders, I need an alternate solution. Please reply as so...
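Without Git, the Workspace API's export/import endpoints (or the CLI's `databricks workspace export-dir` / `import-dir` commands) can move notebooks between workspaces. A stdlib sketch of the export request URL (the host is a placeholder; endpoint path and format values come from the Workspace API reference):

```python
from urllib.parse import urlencode

def workspace_export_url(host, path, fmt="SOURCE"):
    """GET URL for /api/2.0/workspace/export. `fmt` can be SOURCE, DBC, HTML,
    or JUPYTER; an 'Authorization: Bearer <token>' header must be added to
    the actual request."""
    query = urlencode({"path": path, "format": fmt})
    return f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"

url = workspace_export_url("https://adb-123.azuredatabricks.net",   # placeholder host
                           "/Users/someone@example.com/my_notebook", "DBC")
```

Exporting a folder as DBC and importing it into the target workspace preserves the folder structure, which avoids the per-notebook manual steps.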
Hello, I have written a Python script that uses the Databricks REST APIs. I am trying to clone/update an Azure DevOps repository inside Databricks using an Azure service principal. I am able to retrieve the credential_id for the service principal I am usin...
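For context, registering a Git credential goes through POST /api/2.0/git-credentials; the request body can be sketched as below. The field names and the "azureDevOpsServices" provider value come from the Git credentials API, but whether your workspace accepts an Azure AD access token in place of a PAT for a service principal is an assumption to verify:

```python
import json

def git_credential_payload(token, username="unused"):
    """JSON body for POST /api/2.0/git-credentials (Azure DevOps).
    `token` is a PAT, or possibly an AAD access token for a service
    principal (assumption - verify for your workspace)."""
    return json.dumps({
        "git_provider": "azureDevOpsServices",
        "git_username": username,
        "personal_access_token": token,
    })
```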
@nicole_lu_PM So sorry for coming back to this issue after such a long time. But I looked into it, and it seems this concept of an OBO token applies when we use Databricks with AWS as our cloud provider. In the case of Azure, most of the commen...