Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mado
by Valued Contributor II
  • 1095 Views
  • 2 replies
  • 2 kudos

Can I use a cluster created in Data Science & Engineering persona to run SQL commands in the SQL persona?

Hi, I have created a single-node cluster in the Data Science & Engineering persona (Standard_DS3_v2). I don't have enough vCPUs to create a SQL warehouse. Is there any way I can use the cluster to run a query in the SQL persona?

Latest Reply
Rajeev45
New Contributor III
  • 2 kudos

Hi Mado. Yes, you can use the cluster to run SQL queries in a notebook. Please refer to the following page for more details: https://docs.databricks.com/getting-started/quick-start.html#tutorial-query-data-with-notebooks https://docs.databricks.com/getting-st...

  • 2 kudos
1 More Replies
AyushModi038
by New Contributor III
  • 3849 Views
  • 6 replies
  • 0 kudos

Library mismatch in same cluster different file

Continuing from the issues encountered in this discussion: https://community.databricks.com/s/feed/0D58Y00009tCiQTSA0 I have a bizarre issue. Here are 2 screenshots taken a few seconds apart. Same cluster, same command, executed 6 seconds apar...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ayush Modi, I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may not be available for every issue. I suggest pr...

  • 0 kudos
5 More Replies
Arunsundar
by New Contributor III
  • 2791 Views
  • 4 replies
  • 4 kudos

The possibility of finding the workload dynamically and spin up the cluster based on the workload

Hi Team, good morning. I would like to understand whether it is possible to determine the workload automatically through code (data load from a file to a table, determine the file size, a kind of benchmark that we can check), based on which we can ...
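A minimal sketch of the idea in this question, assuming the input file is reachable through a local path (for example a mounted path) and that the cluster is created from a spec containing the Clusters API `autoscale` block. The size thresholds, the worker counts, and the helper names `autoscale_for_size` and `cluster_spec_for_file` are hypothetical placeholders, not Databricks recommendations; they would need tuning against real benchmarks.

```python
import os

GIB = 1024 ** 3

# Hypothetical tiers: (upper size limit in bytes, (min_workers, max_workers)).
SIZE_TIERS = [
    (1 * GIB, (1, 2)),
    (50 * GIB, (2, 8)),
]
# Hypothetical bounds used for anything larger than the last tier.
LARGEST_TIER = (4, 16)


def autoscale_for_size(size_bytes: int) -> dict:
    """Pick autoscale bounds from the input volume using the tiers above."""
    for limit, (low, high) in SIZE_TIERS:
        if size_bytes <= limit:
            return {"min_workers": low, "max_workers": high}
    low, high = LARGEST_TIER
    return {"min_workers": low, "max_workers": high}


def cluster_spec_for_file(path: str) -> dict:
    """Build the autoscale part of a cluster spec from a file's size on disk."""
    return {"autoscale": autoscale_for_size(os.path.getsize(path))}
```

The returned dictionary covers only the `autoscale` part of a spec; the node type, runtime version, and everything else would still have to be supplied by the caller.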

Latest Reply
pvignesh92
Honored Contributor
  • 4 kudos

Hi @Arunsundar Muthumanickam, when you say workload, I believe you might be handling various volumes of data between Dev and Prod environments. If you are using a Databricks cluster and do not have much idea of how the volumes might turn out in differ...

  • 4 kudos
3 More Replies
Fed
by New Contributor III
  • 1492 Views
  • 1 replies
  • 2 kudos

Resolved! Ray as a cluster library instead of notebook-scoped library

This article rightly suggests installing `ray` with `%pip`, although it fails to mention that installing it as a cluster library won't work. The reason, I think, is that `setup_ray_cluster` will use `sys.executable` (i.e. `/local_disk0/.ephemeral_nfs/en...

Latest Reply
Fed
New Contributor III
  • 2 kudos

Ugly, but this seems to work for now:
import sys
import os
import shutil
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

shutil.copy( "/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/ray", os.path.dirname(sys.execu...

  • 2 kudos
Kash
by Contributor III
  • 3057 Views
  • 4 replies
  • 0 kudos

Creating a spot only single-node job compute cluster policy

Hi there, I need some help creating a new cluster policy that uses a single spot-instance server to complete a job. I want to set this up as job compute to reduce costs and use only 1 spot instance. The jobs I need to ETL are very short and c...
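As a rough illustration of what such a policy might look like, here is a sketch that builds a policy definition as a Python dictionary and serializes it to JSON. The attribute paths follow the cluster policy definition format as I understand it (`type`/`value` pairs keyed by attribute path) and should be checked against the current policy reference before use. The function name `spot_single_node_job_policy` and the example node type `m5.large` are hypothetical, and a real single-node policy may need further attributes.

```python
import json


def spot_single_node_job_policy(node_type: str) -> dict:
    """Sketch of a policy pinning job compute to one spot node (assumed paths)."""
    def fixed(value):
        # A "fixed" rule forces the attribute to exactly this value.
        return {"type": "fixed", "value": value}

    return {
        # Restrict the policy to job compute.
        "cluster_type": fixed("job"),
        # Single node: no workers, single-node profile and resource tag.
        "num_workers": fixed(0),
        "spark_conf.spark.databricks.cluster.profile": fixed("singleNode"),
        "custom_tags.ResourceClass": fixed("SingleNode"),
        # Spot only: no on-demand nodes, including the driver.
        "aws_attributes.availability": fixed("SPOT"),
        "aws_attributes.first_on_demand": fixed(0),
        # Hypothetical: pin the instance type chosen by the caller.
        "node_type_id": fixed(node_type),
    }


policy_json = json.dumps(spot_single_node_job_policy("m5.large"), indent=2)
```

With spot-only availability the node can be reclaimed mid-run, so this shape suits short, retryable jobs like the ones described in the question.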

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Avkash Kana, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 0 kudos
3 More Replies
jerry747847
by New Contributor III
  • 6426 Views
  • 6 replies
  • 11 kudos

Resolved! When to increase maximum bound vs when to increase cluster size?

Hello experts, for the question below, I am trying to understand why option C was selected instead of B, as B would also have resolved the issue. Question 40: A data analyst has noticed that their Databricks SQL queries are running too slowly. They claim ...

Latest Reply
JRL
New Contributor II
  • 11 kudos

On a SQL server, there are wait states. Wait states occur when several processors (vCPUs) are processing and several threads are working through the processors. A longer-running thread that has dependencies can cause the thread that may have begun o...

  • 11 kudos
5 More Replies
whh99
by New Contributor II
  • 1712 Views
  • 2 replies
  • 1 kudos

Given user id, what API can we use to find out which cluster the user is connected to?

I want to know which cluster a user is connected to in Databricks. It would be great if we could also get the duration for which the user is connected.

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

You can track activity by enabling audit logs. I'm not sure which cloud provider you're using, but for Azure, for example, you can find a manual here: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/account-settings/audit-logs
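Once audit logs are delivered, a small filter along these lines could approximate the answer. This is a sketch under assumptions: it expects each record to be a parsed JSON object with `timestamp`, `userIdentity.email`, and `requestParams` fields, and it assumes that cluster-related events carry a `clusterId` inside `requestParams`, which is not true of every action. The function name `clusters_used_by` and the sample records are hypothetical.

```python
from collections import defaultdict


def clusters_used_by(records, user_email):
    """Return {cluster_id: (first_seen, last_seen)} for one user's events."""
    events = defaultdict(list)
    for rec in records:
        if rec.get("userIdentity", {}).get("email") != user_email:
            continue
        # Assumption: cluster-related events expose requestParams.clusterId.
        cluster_id = (rec.get("requestParams") or {}).get("clusterId")
        if cluster_id:
            events[cluster_id].append(rec["timestamp"])
    return {cid: (min(ts), max(ts)) for cid, ts in events.items()}


# Hypothetical sample records, for illustration only.
sample_records = [
    {"timestamp": 1000, "serviceName": "notebook", "actionName": "attachNotebook",
     "userIdentity": {"email": "user@example.com"},
     "requestParams": {"clusterId": "cluster-a"}},
    {"timestamp": 4000, "serviceName": "notebook", "actionName": "detachNotebook",
     "userIdentity": {"email": "user@example.com"},
     "requestParams": {"clusterId": "cluster-a"}},
    {"timestamp": 2000, "serviceName": "notebook", "actionName": "attachNotebook",
     "userIdentity": {"email": "other@example.com"},
     "requestParams": {"clusterId": "cluster-b"}},
]

usage = clusters_used_by(sample_records, "user@example.com")
```

The gap between the first and last timestamp is only a rough proxy for how long the user was connected, since audit logs record discrete events rather than sessions.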

  • 1 kudos
1 More Replies
ivanychev
by Contributor II
  • 1314 Views
  • 2 replies
  • 0 kudos

Resolved! When will Databricks on AWS support c6i/m6i/r6i EC2 instance types?

The instances are almost 1.5 years old now and provide better efficiency than the 5th generation.

Latest Reply
LandanG
Honored Contributor
  • 0 kudos

@Sergey Ivanychev​ those instance types are under development and should be GA very soon. No official date AFAIK

  • 0 kudos
1 More Replies
ty2
by New Contributor II
  • 2078 Views
  • 3 replies
  • 1 kudos

Resolved! How to start my cluster

I tried to stop my_cluster from Compute using the admin role. However, using the same account, I could not restart my_cluster. The details are as follows. What should I do?

20230121-my_cluster_not_start
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

It seems this is Community Edition, where this feature is disabled. Delete this cluster and create a new one.

  • 1 kudos
2 More Replies
labtech
by Valued Contributor II
  • 3886 Views
  • 4 replies
  • 18 kudos

Resolved! Resource limit when creating a cluster in Databricks on the AWS platform

Hi team, could you please help check my case? I always fail at this step. Thanks

Latest Reply
labtech
Valued Contributor II
  • 18 kudos

Thanks for all your answers. The problem came from the AWS side. I don't know why, on the first ticket, they said that the issue didn't come from AWS.

  • 18 kudos
3 More Replies
martcerv
by New Contributor II
  • 2735 Views
  • 4 replies
  • 2 kudos

Cloud provider launch failure

When I try to create a cluster, I get this error message:
Details
AWS API error code: InvalidGroup.NotFound
AWS error message: The security group 'sg-0ded75eefd66bf421' does not exist in VPC 'vpc-0ec7da3d5977f6ec9'
And when I inspect the security groups ...

Latest Reply
AminChad_22427
New Contributor II
  • 2 kudos

Hi, I am running into a similar issue, but in my case the security group has been deleted by mistake. Is there a way to make Databricks recreate the missing group? @Kaniz Fatma, where can the CreateSecurityGroup command be run? Does it change the securi...

  • 2 kudos
3 More Replies
auser85
by New Contributor III
  • 3326 Views
  • 1 replies
  • 1 kudos

How to incorporate these GC options into my Databricks Cluster? (spark.executor.extraJavaOptions)

I want to try incorporating these options into my Databricks cluster:
spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+G1SummarizeConcMark
If I put them under Compute -> Cluster -> Co...
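For a cluster defined through a JSON spec rather than the UI, one way these settings might be expressed is through the `spark_conf` map of the Clusters API, which takes Spark property names as keys and strings as values. The sketch below only builds that map; the helper name `with_gc_options` is hypothetical, and whether the JVM accepts each flag depends on the Java version bundled with the runtime, which is not verified here.

```python
# Flags copied from the question; acceptance depends on the runtime's JVM.
GC_FLAGS = "-XX:+UseG1GC -XX:+G1SummarizeConcMark"


def with_gc_options(cluster_spec: dict, flags: str = GC_FLAGS) -> dict:
    """Return a copy of the spec with driver and executor JVM options set."""
    spec = dict(cluster_spec)
    conf = dict(spec.get("spark_conf", {}))
    conf["spark.driver.extraJavaOptions"] = flags
    conf["spark.executor.extraJavaOptions"] = flags
    spec["spark_conf"] = conf
    return spec


# Hypothetical starting spec with one existing Spark setting.
example_spec = with_gc_options({"spark_conf": {"spark.speculation": "true"}})
```

The helper copies the spec and its `spark_conf` map instead of mutating them, so the same base spec can be reused for clusters with different JVM options.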

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Andrew Fogarty, I think this is only for the spark-submit command, not for the cluster UI. Please have a look at this doc: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html
spark.executor.extraJavaOptions
A string of extra JVM...

  • 1 kudos
Aviral-Bhardwaj
by Esteemed Contributor III
  • 1559 Views
  • 0 replies
  • 31 kudos

Databricks New Runtime Version is Available Now: PySpark memory profiling. Memory profiling is now enabled for PySpark user-defined functions. This pr...

A new Databricks Runtime version is available now. PySpark memory profiling: memory profiling is now enabled for PySpark user-defined functions. This provides information on memory increment, memory usage, and number of occurrences for each line of code...

andrew0117
by Contributor
  • 4945 Views
  • 4 replies
  • 4 kudos

Resolved! Will I be charged by Databricks if I leave the cluster on but not running?

Or does Databricks only charge you when you are actually running the cluster, no matter how long you keep the cluster idle? Thanks!

Latest Reply
labtech
Valued Contributor II
  • 4 kudos

If you do not configure your cluster to auto-terminate after a period of idle time, then yes, you will be charged for that.
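For clusters created from a JSON spec, the idle timeout mentioned here corresponds, as far as I know, to the `autotermination_minutes` field of the Clusters API. The sketch below just sets that field on a spec; the helper name `with_auto_termination` and the 30-minute default are hypothetical, and the allowed range should be checked against the API reference.

```python
def with_auto_termination(cluster_spec: dict, idle_minutes: int = 30) -> dict:
    """Return a copy of the spec that terminates after idle_minutes of inactivity."""
    if idle_minutes <= 0:
        # This sketch always wants a real timeout, so reject zero and negatives.
        raise ValueError("idle_minutes must be positive")
    spec = dict(cluster_spec)
    spec["autotermination_minutes"] = idle_minutes
    return spec


# Hypothetical cluster name, for illustration only.
example = with_auto_termination({"cluster_name": "etl-dev"}, idle_minutes=20)
```

A shorter timeout reduces idle charges at the cost of more frequent cluster restarts.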

  • 4 kudos
3 More Replies