Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Alex_Persin
by New Contributor III
  • 5796 Views
  • 5 replies
  • 7 kudos

How can the shared memory size (/dev/shm) be increased on Databricks worker nodes with custom Docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...
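A commonly cited workaround (not confirmed as the answer in this thread) is a cluster-scoped init script that remounts /dev/shm with a larger size. The 4g value and the script itself are illustrative assumptions, not a documented Databricks setting:

```shell
#!/bin/bash
# Illustrative init script: remount the tmpfs at /dev/shm with a larger size
# so PyTorch dataloader workers have room to share tensors.
# 4g is an example value; whether this survives for container-backed workers
# depends on how Databricks Container Services mounts /dev/shm.
sudo mount -o remount,size=4g /dev/shm
```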

Latest Reply
sashme
New Contributor II
  • 7 kudos

Bumping this one; I'm interested in the topic too. Is there a known solution yet?

4 More Replies
MCosta
by New Contributor III
  • 11399 Views
  • 10 replies
  • 19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland. Debugging in Databricks is awkward. We ended up do...

Latest Reply
petern
New Contributor II
  • 19 kudos

Has this been solved yet with a mature way to debug code on Databricks? I'm running into the same kind of issue. The variable explorer and pdb can be used, but it's not really the same.

9 More Replies
imgaboy
by New Contributor III
  • 3285 Views
  • 4 replies
  • 3 kudos

Resolved! pySpark Dataframe to DeepLearning model

I have a large time series with many measuring stations recording the same 5 data (Temperature, Humidity, etc.) I want to predict a future moment with a time series model, for which I pass the data from all the measuring stations to the Deep Learning...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

df.groupBy("date").pivot("Node").agg(first("Temp"))

This converts the data to a classic crosstab, so pivot will help. Example above.
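As a self-contained sketch of the same reshaping, here is the equivalent pivot in pandas rather than PySpark (the column names date/Node/Temp come from the reply; the sample data is invented for illustration):

```python
import pandas as pd

# Long-format readings: one row per (date, measuring station) observation.
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"],
    "Node": ["A", "B", "A", "B"],
    "Temp": [1.0, 2.0, 3.0, 4.0],
})

# Mirrors df.groupBy("date").pivot("Node").agg(first("Temp")) in PySpark:
# one row per date, one column per station, taking the first Temp value.
wide = df.pivot_table(index="date", columns="Node", values="Temp", aggfunc="first")
print(wide)
```

Each row of `wide` is then a fixed-width feature vector per timestamp, which is the shape a deep-learning time-series model typically expects.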

3 More Replies
User16752240150
by New Contributor II
  • 1412 Views
  • 1 reply
  • 0 kudos

What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?

I've read this article, which covers:
  • Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt); only random/grid search
  • Parallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)
  • "Di...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

It's actually pretty simple: use hyperopt, but use "Trials" not "SparkTrials". You get parallelism from Spark, not from the tuning process.
