Data Engineering

Forum Posts

sage5616
by Valued Contributor
  • 5054 Views
  • 5 replies
  • 5 kudos

Resolved! SQL Error when querying any tables/views on a Databricks cluster via DBeaver.

I am able to connect to the cluster, browse its Hive catalog, and see tables/views and columns/datatypes. Running a simple select statement from a view on a parquet file produces this error and no other results: "SQL Error [500540] [HY000]: [Databricks][Dat...

Latest Reply
sage5616
Valued Contributor
  • 5 kudos

Update: I have tried SQL Workbench/J and encountered exactly the same error(s) as with DBeaver. I have also tried JetBrains DataGrip, and it worked flawlessly: able to connect, browse the databases, and query tables/views. https://docs.microsoft.com/en...

4 More Replies
KumarShiv
by New Contributor III
  • 2874 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script that consumes more than 100 GB of data, performs some aggregations on it, and at the end I simply try to write/display data from a DataFrame. Then I get this issue (assertion failed: Invalid shuffle partition specs: ). Please hel...

DB_Issue
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated, but "display" is not, so you can debug by displaying each dataframe at the end of each cell.

4 More Replies
sage5616
by Valued Contributor
  • 4687 Views
  • 2 replies
  • 3 kudos

Resolved! Running local python code with arguments in Databricks via dbx utility.

I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility, to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Co...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can pass parameters using dbx launch --parameters. If you want to define them in the deployment template, please try to follow the Databricks API 2.1 schema exactly: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate (for examp...

1 More Replies
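For illustration, a deployment entry for such a job following the Jobs API 2.1 task schema might look like the sketch below (the job name, file path, and parameter values are placeholders, and the surrounding deployment-file layout depends on your dbx version):

```json
{
  "name": "my-local-job",
  "spark_python_task": {
    "python_file": "file://my_package/entrypoint.py",
    "parameters": ["--env", "test", "--run-date", "2022-01-01"]
  }
}
```

The `parameters` array is what `sys.argv` receives inside the script, so argument parsing can be tested the same way locally and on the cluster.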
junaid
by New Contributor
  • 4732 Views
  • 1 reply
  • 0 kudos

We are seeing a "BOOTSTRAP_TIMEOUT" issue in a new workspace.

When attempting to deploy/start a Databricks cluster on AWS through the UI, the following error consistently occurs: Bootstrap Timeout: [id: InstanceId(i-093caac78cdbfa7e1), status: INSTANCE_INITIALIZING, workerEnvId: WorkerEnvId(workerenv-335698072713...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Junaid Ahmed​, nice to meet you, and thank you for asking this question. We have had a similar issue in the past, and it got a best answer too. Please see this community thread with the same question, and let us know if that helps you.

Alex0101
by New Contributor II
  • 2461 Views
  • 3 replies
  • 0 kudos

Resolved! Can Python futures utilise all cluster nodes?

I used Python futures to call a function multiple times concurrently; however, I am not sure whether all nodes are utilised, or how to make sure it uses all cluster nodes. Can you confirm: if I create a cluster with 5 workers, each with 8 memory cores, for example....

Latest Reply
Keyuri
New Contributor II
  • 0 kudos

You can create an init script and then add it during cluster start-up.

2 More Replies
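A note on the question itself: plain `concurrent.futures` executes only in the driver's Python process; Spark does not automatically spread Python futures across worker nodes, so to use all workers the work has to go through Spark APIs (e.g. distributed DataFrame/RDD operations). A minimal driver-local sketch, with a placeholder `task` function standing in for the poster's:

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    """Stand-in for the function being called concurrently."""
    return x * x

# ThreadPoolExecutor (and ProcessPoolExecutor) run on the driver node only.
# This gives concurrency on one machine, not distribution across the cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(task, range(10)))

print(results)
```

This pattern is still useful on Databricks for driver-side concurrency, such as firing off several independent Spark jobs at once, since each submitted job is then distributed by Spark itself.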
Confused
by New Contributor III
  • 15236 Views
  • 2 replies
  • 1 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use the Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

For your first question, https://docs.databricks.com/libraries/index.html#python-environment-management and https://docs.databricks.com/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-commands may help. Again, you can convert t...

1 More Replies
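One common approach is a cluster init script that writes the machine-wide pip config before libraries are installed. A sketch, where `<org>` and `<feed>` are placeholders for your own Azure Artifacts values (feed authentication, e.g. via a secret scope, is a separate concern):

```shell
#!/bin/bash
# Cluster init script: make an Azure Artifacts feed the default pip index.
# /etc/pip.conf is pip's global (per-machine) config location on Linux.
cat > /etc/pip.conf <<'EOF'
[global]
index-url = https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
EOF
```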
William_Scardua
by Valued Contributor
  • 1173 Views
  • 1 reply
  • 2 kudos

Resolved! Best way to encrypt PII data

Hi guys, I have around 600 GB per load. In your opinion, what is the best way to encrypt PII data in terms of performance (library, cluster type, etc.)? Thank you, William

Latest Reply
Prabakar
Esteemed Contributor III
  • 2 kudos

Hello @William Scardua​, please check whether this blog helps you: https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html

Michael_Galli
by Contributor II
  • 6334 Views
  • 7 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

@Michael Galli​ I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out the below if you are interest...

6 More Replies
Michael_Galli
by Contributor II
  • 2284 Views
  • 1 reply
  • 1 kudos

Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks are running in one single Databricks interactive cluster (Azure E8-series driver, 1-10 E4-series workers autoscaling). Each notebook reads data and does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache is dynamically saved to disk if there is no room in memory, so I don't see it as an issue. However, the best practice is to use the "unpersist()" method in your code after caching. As in the example below my answer, the cache/persist method ...

wgsing
by New Contributor
  • 2025 Views
  • 4 replies
  • 0 kudos

Resolved! Databricks Cluster create fail

I am facing a problem creating a cluster in Databricks. The error is as below: Message: Cluster terminated. Reason: Unexpected launch failure. An unexpected error was encountered while setting up the cluster. Please retry and contact Databricks if the proble...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Giin Sing Wong​, just a friendly follow-up: is this issue still happening, or were you able to resolve it by increasing your account's quota? Please let us know.

3 More Replies
Suman
by New Contributor III
  • 1510 Views
  • 5 replies
  • 3 kudos

Resolved! Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It is throwing the error below: "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Suman Chowdhury​ , Change Data Feed is only available in Databricks Runtime 8.4 and above.

4 More Replies
Juniper_AIML
by New Contributor
  • 1667 Views
  • 2 replies
  • 1 kudos

Resolved! How to set up an instance profile for initializing a Databricks cluster using Docker?

I was trying to start the Databricks cluster from a docker image. I followed the setup instructions, excluding the additional steps to set up the IAM role and instance profile, as I was facing issues. The image is stored on AWS ECR in a public repo...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Aman Gaurav​, please check the requirements below to avail of Databricks Container Services. Note: Databricks Runtime for Machine Learning and Databricks Runtime for Genomics do not support Databricks Container Services. Databricks Runtime 6.1...

1 More Replies
DoD
by New Contributor III
  • 1082 Views
  • 2 replies
  • 1 kudos

Resolved! Why are R scripts inside of Databricks notebooks creating writeLines errors?

I recently posted this on Stack Overflow. I'm using R in Databricks. RStudio runs fine and executes from the Databricks cluster. I would like to transition from RStudio to notebooks. When I start the cluster, R seems to run fine from notebooks. ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Paul Evangelista​, thank you for letting us know. You did great! Would you be happy to mark your answer as best so that others can find your solution more easily?

1 More Replies
FemiAnthony
by New Contributor III
  • 2899 Views
  • 6 replies
  • 5 kudos

Resolved! /dbfs is empty

Why does /dbfs seem to be empty in my Databricks cluster? If I run %sh ls /dbfs, I get no output. I am looking for the databricks-datasets subdirectory, but I can't find it under /dbfs.

Latest Reply
FemiAnthony
New Contributor III
  • 5 kudos

Thanks @Prabakar Ammeappin​ 

5 More Replies
User16869510359
by Esteemed Contributor
  • 1611 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

G1GC can solve problems in some cases where garbage collection is a bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

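For reference, a sketch of enabling G1GC via the cluster's Spark config (further tuning flags depend on the workload, as the linked blog discusses):

```
spark.executor.extraJavaOptions -XX:+UseG1GC
spark.driver.extraJavaOptions -XX:+UseG1GC
```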