cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Alex_Persin
by New Contributor III
  • 4646 Views
  • 4 replies
  • 6 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However in a docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...

  • 4646 Views
  • 4 replies
  • 6 kudos
Latest Reply
OxFF
New Contributor II
  • 6 kudos

Recently stumbled on this problem. It seems like it basically makes impossible usage of compute with custom docker images for any pytorch-based real life computer vision ML experiments. Which is unfortunate. +1 for requesting followup and possible al...

  • 6 kudos
3 More Replies
MCosta
by New Contributor III
  • 9715 Views
  • 10 replies
  • 19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland. Debugging in Databricks is awkward. We ended up do...

  • 9715 Views
  • 10 replies
  • 19 kudos
Latest Reply
petern
New Contributor II
  • 19 kudos

Has this been solved yet; a mature way to debug code on databricks. I'm running in the same kind of issue.Variable explorer can be used and pdb, but not the same really..

  • 19 kudos
9 More Replies
imgaboy
by New Contributor III
  • 2691 Views
  • 4 replies
  • 3 kudos

Resolved! pySpark Dataframe to DeepLearning model

I have a large time series with many measuring stations recording the same 5 data (Temperature, Humidity, etc.) I want to predict a future moment with a time series model, for which I pass the data from all the measuring stations to the Deep Learning...

image image
  • 2691 Views
  • 4 replies
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

df.groupBy("date").pivot("Node").agg(first("Temp"))It is converting to classic crosstable so pivot will help. Example above.

  • 3 kudos
3 More Replies
User16752240150
by New Contributor II
  • 1185 Views
  • 1 replies
  • 0 kudos

What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?

I've read this article, which covers:Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt). Only random/grid searchparallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)"Di...

  • 1185 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

It's actually pretty simple: use hyperopt, but use "Trials" not "SparkTrials". You get parallelism from Spark, not from the tuning process.

  • 0 kudos
Labels