Data Engineering

Forum Posts

by Alex_Persin • New Contributor III

10-28-2021 2:59:06 AM

8255 Views
6 replies
8 kudos

How can the shared memory size (/dev/shm) be increased on Databricks worker nodes with custom Docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64 MB, which is too small to ...
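For anyone wanting to confirm the limit described above, here is a minimal sketch that reports the size of the file system mounted at /dev/shm using only the Python standard library. The helper name `shm_size_mb` is hypothetical, and the code only measures the mount; it does not change its size.

```python
import os
import shutil
from typing import Optional


def shm_size_mb(path: str = "/dev/shm") -> Optional[float]:
    """Return the total size in MiB of the file system that holds `path`,
    or None if the path does not exist (for example on non-Linux hosts)."""
    if not os.path.isdir(path):
        return None
    total_bytes = shutil.disk_usage(path).total
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    size = shm_size_mb()
    if size is None:
        print("/dev/shm is not available on this host")
    else:
        print(f"/dev/shm total size: {size:.0f} MiB")
```

With plain Docker the size is normally set when the container starts, via `docker run --shm-size=...`. Whether an equivalent setting is available for Databricks worker containers is the open question in this thread.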

Latest Reply

stevewb
New Contributor III

04-09-2025 6:54:01 AM

8 kudos

Bump again... does anyone have a solution for this?

5 More Replies

by MCosta • New Contributor III

08-20-2021 10:23:46 AM

15088 Views
10 replies
19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland. Debugging in Databricks is awkward. We ended up do...

Latest Reply

petern
New Contributor II

03-04-2024 1:06:47 PM

19 kudos

Has this been solved yet? Is there a mature way to debug code on Databricks? I'm running into the same kind of issue. The variable explorer and pdb can be used, but it's not really the same.

9 More Replies

by imgaboy • New Contributor III

03-08-2022 10:11:01 AM

5318 Views
4 replies
3 kudos

Resolved! pySpark Dataframe to DeepLearning model

I have a large time series with many measuring stations recording the same 5 variables (Temperature, Humidity, etc.). I want to predict a future moment with a time series model, for which I pass the data from all the measuring stations to the Deep Learning...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-08-2022 11:02:04 AM

3 kudos

df.groupBy("date").pivot("Node").agg(first("Temp"))

This converts the data to a classic crosstab, so pivot will help. Example above.

3 More Replies

by User16752240150 • Databricks Employee

06-04-2021 12:34:03 PM

2229 Views
1 reply
0 kudos

What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?

I've read this article, which covers:
Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt). Only random/grid search
parallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)
"Di...

Latest Reply

sean_owen
Databricks Employee

06-17-2021 5:00:45 PM

0 kudos

It's actually pretty simple: use hyperopt, but use "Trials" not "SparkTrials". You get parallelism from Spark, not from the tuning process.

Databricks Community
