Topics with Label: Cluster Configuration

Forum Posts

by Spencer_Kent • New Contributor III

06-07-2023 4:57:19 PM

15780 Views
10 replies
3 kudos

Shared cluster configuration that permits `dbutils.fs` commands

My workspace has a couple of different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...

insufficient_permissions_on_shared_cluster

Data Engineering

Latest Reply

jacovangelder
Honored Contributor

07-02-2024 10:44:45 PM

3 kudos

Can you not use a No Isolation Shared cluster with Table access controls enabled on workspace level?

by ramravi • Contributor II

01-03-2023 1:19:32 AM

2446 Views
3 replies
4 kudos

Issue while reading data from Kafka topic to Spark structured streaming

py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql.SQLContext.readStream() is not whitelisted on class class org.apache.spark.sql.SQLContext

I already disabled ACL for the cluster using "...

Data Engineering

Latest Reply

jose_gonzalez
Databricks Employee

04-06-2023 11:27:50 AM

4 kudos

Hi @Ravi Teja, just a friendly follow-up. Do you still need help? If you do, please share more details, such as your DBR version and whether you are using a Standard or High Concurrency cluster.

by arz • New Contributor

09-17-2022 12:59:39 PM

2093 Views
0 replies
0 kudos

PySpark job with joins & write parquet operation fails with FetchFailedException

I'm working on a task where I transform a dataset and re-save it to an S3 bucket. This involves joining the dataset to two others, dropping fields from the initial dataset which overlapped with fields from the other two, hashing certain fields with p...

Data Engineering

by Serhii • Contributor

08-18-2022 9:23:59 AM

3269 Views
1 reply
1 kudos

Resolved! Behaviour of cluster launches in multi-task jobs

We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration, we specify cluster configuration and provide ...

Data Engineering

Latest Reply

User16873043099
Contributor

08-18-2022 10:22:33 AM

1 kudos

Tasks within the same multi task job can reuse the clusters. A shared job cluster allows multiple tasks in the same job to use the cluster. The cluster is created and started when the first task using the cluster starts and terminates after the last ...
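
A minimal sketch of how such a shared job cluster can be declared, assuming the Databricks Jobs API 2.1 request shape: a top-level `job_clusters` list whose entries are referenced from each task by `job_cluster_key`. The job name, cluster key, runtime version, node type, and file paths are hypothetical placeholders, and the `tasks_by_cluster` helper is also hypothetical; it only checks which tasks would share which cluster.

```python
import json

# Hypothetical values: runtime version, node type, and file paths are placeholders.
SHARED_CLUSTER_KEY = "shared_job_cluster"

job_spec = {
    "name": "example-multitask-job",
    "job_clusters": [
        {
            "job_cluster_key": SHARED_CLUSTER_KEY,
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": SHARED_CLUSTER_KEY,
            "spark_python_task": {"python_file": "dbfs:/path/to/ingest.py"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            # Same key as above, so this task reuses the cluster started for "ingest".
            "job_cluster_key": SHARED_CLUSTER_KEY,
            "spark_python_task": {"python_file": "dbfs:/path/to/transform.py"},
        },
    ],
}


def tasks_by_cluster(spec):
    """Group task keys by the job cluster they run on; fail on undefined keys."""
    defined = {c["job_cluster_key"] for c in spec.get("job_clusters", [])}
    groups = {}
    for task in spec["tasks"]:
        key = task.get("job_cluster_key")
        if key is not None and key not in defined:
            raise ValueError(f"task {task['task_key']!r} references undefined cluster {key!r}")
        groups.setdefault(key, []).append(task["task_key"])
    return groups


if __name__ == "__main__":
    print(json.dumps(tasks_by_cluster(job_spec)))
```

Both tasks map to one key here, which matches the reply's description: one cluster starts with the first task that uses it and is shared by the later tasks.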

by baatchus • New Contributor III

02-22-2022 3:36:24 AM

4287 Views
4 replies
0 kudos

Resolved! Parameterize Azure storage account name in Spark cluster config in Databricks

Wondering if there is a way to parameterize the Azure storage account name part of the Spark cluster config in Databricks? I have a working example where the values reference secret scopes: spark.hadoop.fs.azure.account.oauth2.client.id.<azurestorageacc...
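
One way to approach this, sketched under assumptions: the cluster Spark config is static `key value` text, so the account name can be treated as a parameter in whatever generates that text (for example, a script that submits the config through the Clusters API). The helpers below are hypothetical. The `fs.azure.account.*` property names follow the ABFS OAuth pattern from the Azure Databricks documentation, `{{secrets/<scope>/<key>}}` is the Databricks secret-reference syntax for Spark config values, and the account, tenant, scope, and key names are made up.

```python
def secret_ref(scope: str, key: str) -> str:
    """Return a Databricks secret reference such as {{secrets/my-scope/my-key}}."""
    return "{{secrets/" + scope + "/" + key + "}}"


def adls_oauth_spark_conf(storage_account: str, tenant_id: str, scope: str,
                          client_id_key: str, client_secret_key: str) -> dict:
    """Build ABFS OAuth Spark config entries for one storage account.

    All arguments are hypothetical inputs; only the property-name pattern is fixed.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    prefix = "spark.hadoop.fs.azure.account"
    return {
        f"{prefix}.auth.type.{suffix}": "OAuth",
        f"{prefix}.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # Secret values stay in the secret scope; only references are written.
        f"{prefix}.oauth2.client.id.{suffix}": secret_ref(scope, client_id_key),
        f"{prefix}.oauth2.client.secret.{suffix}": secret_ref(scope, client_secret_key),
        f"{prefix}.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def to_cluster_conf_text(conf: dict) -> str:
    """Render entries as space-separated 'key value' lines, one per property."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = adls_oauth_spark_conf("mystorageacct", "00000000-0000-0000-0000-000000000000",
                                 "my-scope", "sp-client-id", "sp-client-secret")
    print(to_cluster_conf_text(conf))
```

Calling the function again with a second account name produces an independent set of entries, so the account name never has to be hand-edited inside each key.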

Data Engineering

Latest Reply

Anonymous
Not applicable

03-06-2022 2:10:02 PM

0 kudos

Fantastic! Thanks for letting us know!

by jpwp • New Contributor III

01-10-2022 5:07:28 PM

5355 Views
0 replies
0 kudos

Adding a dependent library to a Job task permanently adds it to the entire cluster?

Why does adding a dependent library to a Job task also permanently add it to the entire cluster? I am using Python wheels, and even when I remove the dependent library from a Job task, the wheel is still part of the cluster configuration. If I then upd...

Data Engineering

by Ryan_Chynoweth • Esteemed Contributor

12-21-2021 2:39:05 PM

1530 Views
0 replies
0 kudos

Azure_DAAM

Attached to this post is an ADLS Gen2 access recommendation for achieving ideal security and governance over your data. The best practice involves leveraging Cluster ACLs, cluster configuration, and secret ACLs to handle user access over you...

Data Engineering

by User16765131552 • Contributor III

06-25-2021 9:59:52 AM

523 Views
0 replies
0 kudos

docs.databricks.com

Best practices: Cluster configuration | Databricks on AWS
Learn best practices when creating and configuring Databricks clusters.
https://docs.databricks.com/clusters/cluster-config-best-practices.html

Data Engineering

by aladda • Databricks Employee

06-23-2021 9:13:35 PM

1477 Views
1 reply
0 kudos

Resolved! What type of cluster configuration should one use to run Optimize on a Delta Table

Data Engineering

Latest Reply

aladda
Databricks Employee

06-23-2021 9:15:49 PM

0 kudos

Optimize merges small files into larger ones and can involve shuffling and the creation of large in-memory partitions. Thus it's recommended to use a memory-optimized executor configuration to prevent spilling to disk. In addition, use of autoscaling wil...
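
To make "merges small files into larger ones" concrete, here is a simplified, hypothetical illustration of bin-packing compaction, which is how the Delta Lake documentation describes `OPTIMIZE`. It is not Delta's actual implementation; the `plan_compaction` function and the 1024 MB target size are assumptions chosen for the example.

```python
def plan_compaction(file_sizes_mb, target_mb=1024):
    """Greedily group file sizes (in MB) into bins of at most target_mb.

    A file already larger than target_mb ends up alone in its own bin.
    """
    bins = []
    current, current_size = [], 0
    for size in sorted(file_sizes_mb):
        # Close the current bin when the next file would push it past the target.
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


if __name__ == "__main__":
    # 250 small files of 10 MB each.
    print([len(b) for b in plan_compaction([10] * 250)])  # [102, 102, 46]
```

In this simplified model each bin becomes one rewritten file, so all the data in a bin is handled together. That is the kind of large partition the reply has in mind when it recommends memory-optimized executors.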

by User16826992666 • Valued Contributor

06-15-2021 11:34:40 AM

3669 Views
1 reply
0 kudos

Resolved! How do I know which worker type to choose when creating my cluster?

I am new to using Databricks and want to create a cluster, but there are many different worker types to choose from. How do I know which worker type is the right type for my use case?

Data Engineering

Latest Reply

sajith_appukutt
Honored Contributor II

06-18-2021 3:50:09 PM

0 kudos

For Delta workloads where you could benefit from caching, it is recommended to use storage-optimized instances that come with NVMe SSDs. For other workloads, it would be a good idea to check Ganglia metrics to see whether your workload is CPU/memory ...
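
The reply's guidance can be written down as a small decision helper. This is a hypothetical sketch: the Delta caching branch follows the reply, while the mappings for CPU-bound, memory-bound, and unclear workloads (the part of the reply that is cut off) are assumptions based on what compute-optimized, memory-optimized, and general-purpose instance families are meant for. The `bottleneck` argument stands for whatever you observe in the cluster metrics.

```python
from typing import Optional


def suggest_worker_family(uses_delta_cache: bool, bottleneck: Optional[str] = None) -> str:
    """Suggest an instance family for cluster workers (hypothetical heuristic).

    uses_delta_cache: True if the workload repeatedly reads Delta data and
        would benefit from local disk caching.
    bottleneck: "cpu", "memory", or None, as observed in cluster metrics
        such as Ganglia. These mappings are assumptions, not official guidance.
    """
    if bottleneck not in (None, "cpu", "memory"):
        raise ValueError(f"unknown bottleneck: {bottleneck!r}")
    if uses_delta_cache:
        # From the reply: storage-optimized instances with NVMe SSDs.
        return "storage optimized"
    if bottleneck == "cpu":
        return "compute optimized"
    if bottleneck == "memory":
        return "memory optimized"
    # Assumption: with no clear bottleneck, start general purpose and re-measure.
    return "general purpose"
```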

Databricks Community
