Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

User16826990884
by New Contributor III
  • 1599 Views
  • 2 replies
  • 0 kudos

Rollback cluster changes

Is it possible to rollback changes made to a cluster? The problem I'm trying to solve is to recover from an accidental change made by a user on a cluster that affects interactive and job runs. Cluster policies help, but the policy still provides the ...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@User16826990884 Along with what @sajith_appukutt mentioned, we can achieve this via version control for cluster configurations: store cluster configurations in JSON files in GitHub or another version control system. In case of accidental changes, you c...
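One way to picture the version-control approach described above is a small drift check that compares a configuration file kept in Git with the configuration currently applied to the cluster. The sketch below is only an illustration under stated assumptions: `diff_cluster_config` is a hypothetical helper, both dictionaries are made-up sample data, and fetching the live configuration (for example through the Clusters API) is left out.

```python
import json


def diff_cluster_config(saved: dict, current: dict, prefix: str = "") -> dict:
    """Return {dotted_key: (saved_value, current_value)} for every setting that differs.

    Hypothetical helper for illustration; nested dictionaries such as
    spark_conf are compared key by key.
    """
    changes = {}
    for key in sorted(set(saved) | set(current)):
        path = f"{prefix}{key}"
        old, new = saved.get(key), current.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.update(diff_cluster_config(old, new, prefix=f"{path}."))
        elif old != new:
            changes[path] = (old, new)
    return changes


# Sample data (assumption): the version-controlled JSON file ...
saved = json.loads("""
{
  "cluster_name": "shared-ml",
  "num_workers": 4,
  "autotermination_minutes": 60,
  "spark_conf": {"spark.sql.shuffle.partitions": "200"}
}
""")

# ... and the configuration after an accidental edit.
current = {
    "cluster_name": "shared-ml",
    "num_workers": 8,
    "autotermination_minutes": 60,
    "spark_conf": {"spark.sql.shuffle.partitions": "400"},
}

drift = diff_cluster_config(saved, current)
for setting, (old, new) in drift.items():
    print(f"{setting}: saved={old!r} current={new!r}")
```

If the check reports drift, rolling back would mean re-applying the saved JSON; how that is done depends on the tooling in use.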

1 More Replies
Anonymous
by Not applicable
  • 1306 Views
  • 1 replies
  • 7 kudos

Train machine learning models: How can I take my ML lifecycle from experimentation to production?

Note: the following guide is primarily for Python users. For other languages, please view the following links:
  • Table batch reads and writes
  • Create a table in SQL
  • Visualizing data with DBSQL
This step-by-step guide will get your data...

Latest Reply
Priyag1
Honored Contributor II
  • 7 kudos

I learned a lot from your post. It is very clear. Thank you. Keep sharing posts like this; they will be helpful.

Gilg
by Contributor II
  • 7167 Views
  • 1 replies
  • 0 kudos

Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeout

Hi Team,

When creating a new cluster in a workspace within a VNET, I receive this error:

Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeout
Cluster terminated. Reason: Bootstrap Timeout

Cheers,
Gil

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Gil Gonong: The error message you are receiving suggests that the creation of the new cluster has failed due to a bootstrap timeout. The bootstrap process is responsible for setting up the initial configuration of the cluster, and if it takes too l...

fa
by New Contributor III
  • 4212 Views
  • 6 replies
  • 7 kudos

How are dashboards served and what would happen to them if the cluster attached to the notebook terminates?

I have two dashboards in presentation mode both from notebooks being run on the same compute cluster. Last night the cluster terminated due to idle time and in the morning one of my dashboards was fine but the other one was set to the default stab di...

Latest Reply
Manoj12421
Valued Contributor II
  • 7 kudos

If your query was scheduled, it automatically started the cluster at the scheduled time. Or it might be that the portion that is still visible doesn't need to be generated, so it looks like it's working but it is just left over from the prior...

5 More Replies
rubenteixeira
by New Contributor III
  • 5447 Views
  • 3 replies
  • 1 kudos

Permission denied: Lightning Logs

I'm doing parameter tuning for a NeuralProphet model (the image shows the parameters and the training code). When I try to parallelize the training, it gives me a permission error. Why can't I access the folder '/databricks/spark/work/*'? Do I ne...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, could you please check the cluster-level permissions and let us know if that helps? Please refer to: https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions

2 More Replies
llvu
by New Contributor III
  • 3121 Views
  • 3 replies
  • 2 kudos

How to solve cluster break down due to GC when training a pyspark.ml Random Forest

I am trying to train and optimize a random forest. At first the cluster handles the garbage collection fine, but after a couple of hours the cluster breaks down as Garbage Collection has gone up significantly. The train_df has a size of 6,365,018 reco...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Caching is expensive: it saves the data to memory, and to disk if there is no more space left in memory. I know that, in theory, it should improve performance, but it can make things worse. I would just put scaled_train_data = pipeline_data.transform(tra...

2 More Replies
Somi
by New Contributor III
  • 1380 Views
  • 3 replies
  • 0 kudos

No saved model after stopping the cluster.

I have saved a Keras model in some directories in DBFS so that I can load it and retrain it with more data, etc. The problem is that when the cluster stops and restarts, it seems those directories and the model are no longer available there, and it starts training a new mod...

Latest Reply
Somi
New Contributor III
  • 0 kudos

Hi @Vidula Khanna, I figured it out by replacing the os library module with the dbutils utilities. They seem to be more compatible with DBFS.
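A possible reason for the difference, offered as an assumption since the original code is not shown in the thread: Python's local file APIs (os, open) only see DBFS through the `/dbfs` mount on the driver, whereas dbutils accepts `dbfs:/` URIs directly. A path that lacks the `/dbfs` prefix is written to the driver's local disk and is lost when the cluster terminates. The helper below, `to_fuse_path`, is hypothetical and only sketches the path translation.

```python
def to_fuse_path(dbfs_uri: str) -> str:
    """Translate a 'dbfs:/...' URI into the '/dbfs/...' path seen by local file APIs.

    Hypothetical helper for illustration; it only rewrites the string and
    does not touch the file system.
    """
    scheme = "dbfs:/"
    if not dbfs_uri.startswith(scheme):
        raise ValueError(f"expected a path starting with {scheme!r}, got {dbfs_uri!r}")
    return "/dbfs/" + dbfs_uri[len(scheme):].lstrip("/")


# Example directory (assumption): where a model might be stored.
model_dir = to_fuse_path("dbfs:/models/keras/v1")
print(model_dir)
```

A save call that goes through local file APIs would then be pointed at `model_dir` rather than at a bare `/models/...` path.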

2 More Replies
Vik1
by New Contributor II
  • 3842 Views
  • 4 replies
  • 2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a Snowflake quer...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Vivek Ranjan, checking in. If Joseph's answer helped, would you let us know and mark the answer as best? It would be really helpful for the other members to find the solution more quickly. Thanks!

3 More Replies
User16826988699
by New Contributor
  • 27952 Views
  • 2 replies
  • 2 kudos

Resolved! Problem with spinning up a cluster on a new workspace

Error: Please check network connectivity from the data plane to the control plane.
{ "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0457092c), status: INSTANCE_INITIALIZING, workerEnvId:...

Latest Reply
User16725394280
Contributor II
  • 2 kudos

Can you please get the system logs from the AWS EC2 console as soon as the cluster fails? System logs for the failed instance are accessible from the AWS console for up to an hour after the shutdown. The AWS console clears the references of terminated clusters ...
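Because the error in this thread asks to check network connectivity from the data plane to the control plane, a basic TCP reachability test can help narrow things down while waiting for the logs. This is a minimal sketch, not an official diagnostic: `can_reach` is a hypothetical helper, and it would need to run on a machine in the same subnet as the cluster nodes, since a cluster that fails to bootstrap cannot run it itself.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper for illustration; DNS failures, refused connections,
    and timeouts all count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach` with the workspace hostname and the default port 443 returns False when a security group, route, or firewall rule blocks outbound HTTPS. A True result only shows that the TCP handshake works; it does not rule out other causes of a bootstrap timeout.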

1 More Replies
User16789201666
by Databricks Employee
  • 2479 Views
  • 4 replies
  • 0 kudos

How do you control the cost of provisioning a cluster?

How do you govern the cost of running clusters in Databricks so you're not sticker shocked?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Using interactive clusters less and job clusters more can be one way, among others.

3 More Replies