Machine Learning

by rubenteixeira • New Contributor III

01-12-2023 8:47:29 AM

6402 Views
4 replies
1 kudos

Permission denied: Lightning Logs

I'm doing parameter tuning for a NeuralProphet model (you can see in the image the parameters and code for training). When I try to parallelize the training, it gives me a Permission Error. Why can't I access the folder '/databricks/spark/work/*'? Do I ne...

Latest Reply

susanameiras
New Contributor II

02-18-2025 11:52:17 AM

1 kudos

Hi Ruben! I am facing exactly the same error running a similar approach when using runtime 16.2 ML. I didn't have this issue when using runtime 12.2 LTS ML or 13.3 ML. Did you find a solution? Many thanks!

3 More Replies

by User16826990884 • New Contributor III

06-25-2021 11:59:36 AM

2195 Views
2 replies
0 kudos

Rollback cluster changes

Is it possible to roll back changes made to a cluster? The problem I'm trying to solve is to recover from an accidental change made by a user on a cluster that affects interactive and job runs. Cluster policies help, but the policy still provides the ...

Latest Reply

Panda
Valued Contributor

10-14-2024 5:02:01 AM

0 kudos

@User16826990884 Along with what @sajith_appukutt mentioned, we can achieve this via version control for cluster configurations: store cluster configurations in JSON files in GitHub or another version control system. In case of accidental changes, you c...
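
The version-control idea in this reply can be sketched roughly as follows. This is a minimal illustration, not the reply author's actual method: the helper names `save_config`, `load_config`, and `config_drift` are hypothetical, and how the live configuration is fetched from or re-applied to the workspace is deliberately left out.

```python
import json
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Write a cluster configuration as stable, diff-friendly JSON."""
    # Sorted keys and fixed indentation keep the file layout stable,
    # so version-control diffs only show real changes.
    path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")


def load_config(path: Path) -> dict:
    """Read a previously committed cluster configuration."""
    return json.loads(path.read_text())


def config_drift(saved: dict, live: dict) -> dict:
    """Return {key: (saved_value, live_value)} for each top-level key that differs."""
    keys = sorted(set(saved) | set(live))
    return {
        key: (saved.get(key), live.get(key))
        for key in keys
        if saved.get(key) != live.get(key)
    }
```

With the committed file as the source of truth, a non-empty result from `config_drift` shows which settings were changed, and the saved values are what would need to be restored.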

1 More Replies

by Anonymous • Not applicable

09-07-2022 8:52:16 AM

1450 Views
1 reply
7 kudos

Train machine learning models: How can I take my ML lifecycle from experimentation to production?

Note: the following guide is primarily for Python users. For other languages, please view the following links: • Table batch reads and writes • Create a table in SQL • Visualizing data with DBSQL

This step-by-step guide will get your data...

Latest Reply

Priyag1
Honored Contributor II

05-03-2023 11:20:07 PM

7 kudos

I learned a lot from your post. It is very clear. Thank you. Keep sharing posts like this; they will be helpful.

by Gilg • Contributor II

04-17-2023 12:29:23 PM

7609 Views
1 reply
0 kudos

Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeout

Hi Team,

When creating a new cluster in a workspace within a VNET, I receive this error:

Failed to add 1 container to the cluster. will attempt retry: false. reason: bootstrap timeout
Cluster terminated. Reason: Bootstrap Timeout

Cheers.
Gil

Latest Reply

Anonymous
Not applicable

04-20-2023 7:39:58 PM

0 kudos

@Gil Gonong: The error message you are receiving suggests that the creation of the new cluster has failed due to a bootstrap timeout. The bootstrap process is responsible for setting up the initial configuration of the cluster, and if it takes too l...

by fa • New Contributor III

12-07-2022 10:18:05 AM

4817 Views
6 replies
7 kudos

How are dashboards served and what would happen to them if the cluster attached to the notebook terminates?

I have two dashboards in presentation mode, both from notebooks running on the same compute cluster. Last night the cluster terminated due to idle time, and in the morning one of my dashboards was fine but the other one was set to the default stab di...

Latest Reply

Manoj12421
Valued Contributor II

02-08-2023 11:14:08 AM

7 kudos

If your query was scheduled, it automatically started the cluster at the scheduled time. Or it might be that the portion that is still visible doesn't need to be generated, so it looks like it's working but it is just left over from the prior...

5 More Replies

by llvu • New Contributor III

01-05-2023 2:14:35 AM

3537 Views
3 replies
2 kudos

How to solve cluster break down due to GC when training a pyspark.ml Random Forest

I am trying to train and optimize a random forest. At first the cluster handles the garbage collection fine, but after a couple of hours the cluster breaks down as garbage collection has gone up significantly. The train_df has a size of 6,365,018 reco...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-05-2023 2:38:34 AM

2 kudos

Caching is expensive: it tries to save the data to memory, and to disk if there is no more space left in memory. I know that, in theory, it should improve things, but it can make them worse. I would just put scaled_train_data = pipeline_data.transform(tra...

2 More Replies

by Somi • New Contributor III

06-24-2022 11:07:35 AM

1688 Views
3 replies
0 kudos

No saved model after stopping the cluster.

I have saved a Keras model in some directories in DBFS so I can load it and retrain it with more data, etc. The problem is that when the cluster stops and restarts, it seems those directories and the model are no longer available, and it starts training a new mod...

Latest Reply

Somi
New Contributor III

09-02-2022 12:58:50 PM

0 kudos

Hi @Vidula Khanna, I figured it out by replacing the OS library module with dbutils utilities. They seem to be more compatible with DBFS.
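
One plausible reading of this fix (an assumption, since the thread excerpt does not show the original code) is that local file APIs such as Python's `os` module only reach DBFS through the `/dbfs` mount on the driver, so a path written without that prefix lands on cluster-local disk and disappears when the cluster stops. The sketch below uses a hypothetical helper, `dbfs_uri_to_local_path`, to translate a `dbfs:/` URI into the mounted path that local file APIs expect.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Map 'dbfs:/a/b' to '/dbfs/a/b'; return any other path unchanged.

    Hypothetical helper for illustration only. It assumes DBFS is
    mounted at /dbfs on the machine where the code runs.
    """
    prefix = "dbfs:/"
    if uri.startswith(prefix):
        # Drop the scheme and any extra leading slashes, then re-root under /dbfs.
        return "/dbfs/" + uri[len(prefix):].lstrip("/")
    return uri
```

For example, a model directory referred to as `dbfs:/models/keras_model` would be opened through local file APIs as `/dbfs/models/keras_model`.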

2 More Replies

by Vik1 • New Contributor II

01-21-2022 9:16:42 AM

4233 Views
4 replies
2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a Snowflake quer...

Latest Reply

Anonymous
Not applicable

04-22-2022 7:23:05 AM

2 kudos

Hey there @Vivek Ranjan, checking in. If Joseph's answer helped, would you let us know and mark the answer as best? It would be really helpful for the other members to find the solution more quickly. Thanks!

3 More Replies

by User16826988699 • New Contributor

02-01-2022 11:01:10 AM

29104 Views
2 replies
2 kudos

Resolved! Problem with spinning up a cluster on a new workspace

Error: Please check network connectivity from the data plane to the control plane. { "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0457092c), status: INSTANCE_INITIALIZING, workerEnvId:...

Latest Reply

User16725394280
Contributor II

02-09-2022 2:46:27 AM

2 kudos

Can you please get the system logs from the AWS EC2 console as soon as the cluster fails? System logs for the failed instance will be accessible from the AWS console up to an hour after the shutdown. The AWS console clears the references of terminated clusters ...

1 More Replies

by User16789201666 • Databricks Employee

06-07-2021 11:19:33 AM

2863 Views
4 replies
0 kudos

How do you control the cost of provisioning a cluster?

How do you govern the cost of running clusters in Databricks so you're not sticker shocked?

Latest Reply

User16826994223
Honored Contributor III

06-08-2021 12:17:35 AM

0 kudos

Using interactive clusters less and job clusters more can be one of the ways, among others.

3 More Replies
