Data Engineering

Forum Posts

sage5616
by Valued Contributor
  • 5054 Views
  • 5 replies
  • 5 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via DBeaver.

I am able to connect to the cluster, browse its Hive catalog, and see tables/views and columns/datatypes. Running a simple select statement from a view on a parquet file produces this error and no other results: "SQL Error [500540] [HY000]: [Databricks][Dat...

Latest Reply
sage5616
Valued Contributor
  • 5 kudos

Update: I have tried SQL Workbench/J and encountered exactly the same error(s) as with DBeaver. I have also tried JetBrains DataGrip, and it worked flawlessly: able to connect, browse the databases, and query tables/views. https://docs.microsoft.com/en...

4 More Replies
KumarShiv
by New Contributor III
  • 2874 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script that consumes more than 100 GB of data, performs some aggregations on it, and at the end I simply try to write/display data from a DataFrame. Then I get this issue (assertion failed: Invalid shuffle partition specs: ). Please hel...

DB_Issue
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated, but "display" is not, so you can debug by displaying each dataframe at the end of each cell.

4 More Replies
sage5616
by Valued Contributor
  • 4687 Views
  • 2 replies
  • 3 kudos

Resolved! Running local python code with arguments in Databricks via dbx utility.

I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility, to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Co...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can pass parameters using dbx launch --parameters. If you want to define them in the deployment template, please try to follow the Databricks API 2.1 schema exactly: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate (for examp...

1 More Replies
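For illustration, a deployment entry for such a job following the Jobs API 2.1 task schema might look like the sketch below (the job name, file path, and parameter values are placeholders, and the surrounding deployment-file layout depends on your dbx version):

```json
{
  "name": "my-local-job",
  "spark_python_task": {
    "python_file": "file://my_package/entrypoint.py",
    "parameters": ["--env", "test", "--run-date", "2022-01-01"]
  }
}
```

The `parameters` array is what `sys.argv` receives inside the script, so argument parsing can be tested the same way locally and on the cluster.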
junaid
by New Contributor
  • 4732 Views
  • 1 reply
  • 0 kudos

We are seeing a "BOOTSTRAP_TIMEOUT" issue in a new workspace.

When attempting to deploy/start a Databricks cluster on AWS through the UI, the following error consistently occurs: Bootstrap Timeout: [id: InstanceId(i-093caac78cdbfa7e1), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-335698072713...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Junaid Ahmed​, nice to meet you, and thank you for asking this question. We have had a similar issue in the past, and it got a best answer too. Please see this community thread with the same question, and let us know if that helps you.

Alex0101
by New Contributor II
  • 2461 Views
  • 3 replies
  • 0 kudos

Resolved! Can Python futures utilise all cluster nodes?

I used Python futures to call a function multiple times concurrently; however, I am not sure whether all nodes are utilised, or how to make sure it uses all cluster nodes. Can you confirm: if I create a cluster with 5 workers, each with 8 memory cores, for example....

Latest Reply
Keyuri
New Contributor II
  • 0 kudos

You can create an init script and then add it during cluster start-up.

2 More Replies
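A note on the question itself: plain `concurrent.futures` executes only in the driver's Python process; Spark does not automatically spread Python futures across worker nodes, so to use all workers the work has to go through Spark APIs (e.g. distributed DataFrame/RDD operations). A minimal driver-local sketch, with a placeholder `task` function standing in for the poster's:

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    """Stand-in for the function being called concurrently."""
    return x * x

# ThreadPoolExecutor (and ProcessPoolExecutor) run on the driver node only.
# This gives concurrency on one machine, not distribution across the cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(task, range(10)))

print(results)
```

This pattern is still useful on Databricks for driver-side concurrency, such as firing off several independent Spark jobs at once, since each submitted job is then distributed by Spark itself.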
Confused
by New Contributor III
  • 15236 Views
  • 2 replies
  • 1 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use the Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

For your first question, https://docs.databricks.com/libraries/index.html#python-environment-management and https://docs.databricks.com/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-commands may help. Again, you can convert t...

1 More Replies
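One common approach is a cluster init script that writes the machine-wide pip config before libraries are installed. A sketch, where `<org>` and `<feed>` are placeholders for your own Azure Artifacts values (feed authentication, e.g. via a secret scope, is a separate concern):

```shell
#!/bin/bash
# Cluster init script: make an Azure Artifacts feed the default pip index.
# /etc/pip.conf is pip's global (per-machine) config location on Linux.
cat > /etc/pip.conf <<'EOF'
[global]
index-url = https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
EOF
```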
William_Scardua
by Valued Contributor
  • 1173 Views
  • 1 reply
  • 2 kudos

Resolved! Best way to encrypt PII data

Hi guys, I have around 600 GB per load. In your opinion, what is the best way to encrypt PII data in terms of performance (library, cluster type, etc.)? Thank you, William

Latest Reply
Prabakar
Esteemed Contributor III
  • 2 kudos

Hello @William Scardua​, please check whether this blog helps you: https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

Michael_Galli
by Contributor II
  • 6334 Views
  • 7 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

@Michael Galli​ I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out the below if you are interest...

6 More Replies
Michael_Galli
by Contributor II
  • 2284 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running in one single Databricks interactive cluster (Azure E8-series driver, 1-10 E4-series workers autoscaling). Each notebook reads data and does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is dynamically saved to disk if there is no room in memory, so I don't see it as an issue. However, the best practice is to use the "unpersist()" method in your code after caching. As in the example below my answer, the cache/persist method ...

wgsing
by New Contributor
  • 2025 Views
  • 4 replies
  • 0 kudos

Resolved! Databricks Cluster create fail

I am facing a problem creating a cluster in Databricks. The error is as below: Message: Cluster terminated. Reason: Unexpected launch failure. An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the proble...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Giin Sing Wong​, just a friendly follow-up: is this issue still happening, or were you able to resolve it by increasing your account's quota? Please let us know.

3 More Replies
Suman
by New Contributor III
  • 1510 Views
  • 5 replies
  • 3 kudos

Resolved! Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It is throwing the error below: "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Suman Chowdhury​ , Change Data Feed is only available in Databricks Runtime 8.4 and above.

4 More Replies
Juniper_AIML
by New Contributor
  • 1667 Views
  • 2 replies
  • 1 kudos

Resolved! How to set up an instance profile for initializing a Databricks cluster using Docker?

I was trying to start the Databricks cluster from a docker image. I followed the setup instructions, excluding the additional steps to set up the IAM role and instance profile, as I was facing issues. The image is stored on AWS ECR in a public repo...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Aman Gaurav​, please check the requirements below to avail of Databricks Container Services. Note: Databricks Runtime for Machine Learning and Databricks Runtime for Genomics do not support Databricks Container Services. Databricks Runtime 6.1...

1 More Replies
DoD
by New Contributor III
  • 1082 Views
  • 2 replies
  • 1 kudos

Resolved! Why are R scripts inside of Databricks notebooks creating writeLines errors?

I recently posted this on Stack Overflow. I'm using R in Databricks. RStudio runs fine and executes from the Databricks cluster. I would like to transition from RStudio to notebooks. When I start the cluster, R seems to run fine from notebooks. ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Paul Evangelista​, thank you for letting us know. You did great! Would you be happy to mark your answer as best so that others can find your solution more easily?

1 More Replies
FemiAnthony
by New Contributor III
  • 2899 Views
  • 6 replies
  • 5 kudos

Resolved! /dbfs is empty

Why does /dbfs seem to be empty in my Databricks cluster? If I run %sh ls /dbfs, I get no output. I am looking for the databricks-datasets subdirectory, but I can't find it under /dbfs.

Latest Reply
FemiAnthony
New Contributor III
  • 5 kudos

Thanks @Prabakar Ammeappin​ 

5 More Replies
User16869510359
by Esteemed Contributor
  • 1611 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

G1GC can solve problems in some cases where garbage collection is a bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

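For reference, a sketch of enabling G1GC via the cluster's Spark config (further tuning flags depend on the workload, as the linked blog discusses):

```
spark.executor.extraJavaOptions -XX:+UseG1GC
spark.driver.extraJavaOptions -XX:+UseG1GC
```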