Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I've also run into the same issue: a customised Docker image does not expose DATABRICKS_RUNTIME_VERSION as an environment variable. I believe there are still many issues with how customised Docker images are handled on Databricks clusters. Can anyone from Databricks help answer this?
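For reference, a minimal way to check from a notebook or script whether the variable is actually present inside the container (on some custom images it may simply be missing, as described above):

```python
import os

# Returns None on images where DATABRICKS_RUNTIME_VERSION is not set.
runtime_version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
print(runtime_version)
```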
I am working on writing a large amount of data from Databricks to an external SQL Server using a JDBC connection. I keep getting timeout/connection-lost errors, but digging deeper it appears to be a memory problem. I am wondering what cluster configura...
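A minimal sketch of a JDBC write that bounds memory pressure by capping write parallelism and batching rows; the server URL, table name, and secret scope names are placeholders, and the exact numbers would need tuning for the cluster and target database:

```python
# Sketch: write a DataFrame to SQL Server over JDBC with bounded parallelism.
# URL, table, and credentials below are placeholders.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

(df
 .repartition(8)                      # cap the number of concurrent JDBC connections/tasks
 .write
 .format("jdbc")
 .option("url", jdbc_url)
 .option("dbtable", "dbo.target_table")
 .option("user", dbutils.secrets.get("my-scope", "sql-user"))
 .option("password", dbutils.secrets.get("my-scope", "sql-password"))
 .option("batchsize", 10000)          # rows per JDBC batch insert
 .mode("append")
 .save())
```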
I am currently working with a VNET-injected Databricks workspace. At the moment I have mounted an ADLS Gen2 resource on the Databricks cluster. When running notebooks on a single node that read, transform, and write data we do not encounter any probl...
I am looking for something, preferably similar to the Windows Task Manager used for monitoring CPU, memory, and disk usage on a local desktop.
Some important things to look at in the Ganglia UI CPU, memory, and server load charts to spot the problem: CPU chart: User %, Idle %. A high user % indicates heavy CPU usage in the cluster. Memory chart: Use %, Free %, Swap %. If you see the purple line ove...
Hi, We have two workspaces on Databricks, prod and dev. On prod, if we create a new all-purpose cluster through the web interface and go to Environment in the Spark UI, the spark.master setting is correctly set to the host IP. This results in a...
I ran into the same issue after choosing the default cluster setup on first setup: when I went to edit the cluster to add an instance profile, I was not able to save without fixing this. Thanks for the tip.
It took me quite some time to find the option to create a cluster in High Concurrency mode; it was hidden in the new UI. What should be the way to access the data with TAC? What is the equivalent mode to work with TAC? Does it mean that we are being pu...
A Databricks cluster is a set of computation resources that performs the heavy lifting of all of the data workloads you run in Databricks. Databricks provides a number of options when you create and configure clusters to help you get the best perform...
@Doug Harrigan Thanks for your question! @Prabakar Ammeappin linked above to our Docs page that mentions a bit more about the recent (April) version update/change: "This release fixes an issue that removed the Swap cluster button from the Databrick...
Hello all, and thanks. After enabling serving for a model, I go to edit the corresponding Job Cluster to configure its init_script, but when I try to save the changes (Confirm and restart) it throws the following error: Error: Cannot edit cluster 0503-141315-hu3wd4i...
Sorry for the delay in responding. A partner was finally able to fix the problem; he can now edit the cluster and add the init_script without issues. Thank you!
We have been trying to update some library versions by uninstalling the old versions and installing new ones. However, the old libraries continue to get installed on cluster startup despite not showing up in the "libraries" tab of the cluster page. W...
The issue seemed to go away on its own. At some point the libraries page started showing what was getting installed to the cluster, and removing libraries from the page caused them to stop getting installed on cluster startup. I'm guessing there was ...
Hello. I am trying to understand High Availability in Databricks. I understand that Databricks uses Kubernetes as the cluster manager and to manage Docker containers. And while Databricks runs on top of AWS, Azure, or GCP, is HA automatically provisioned when I st...
Hi, I've been encountering the following error when I try to start a cluster, but the status page says everything is fine. Is something happening, or are there other steps I can try? Time: 2022-03-13 14:40:51 EDT. Message: Cluster terminated. Reason: Unexpected...
Hi @Rachel Kelley, We had some internal service interruptions that caused this issue. Our engineering team has applied the fix and cluster startup now works as expected. Sincere apologies for the inconvenience caused here. Regards, Darshan
We are trying to configure our environment so that when our cluster starts up, it checks whether our Azure storage account container is mounted and, if it is not, mounts it. We can do this fine in a notebook; however, we have had no luck doing this through an in...
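A minimal notebook sketch of the "mount only if missing" check described above, assuming hypothetical ADLS Gen2 account, container, and secret scope names and a service principal with OAuth credentials:

```python
# Sketch: mount the container only if it is not already mounted.
# Storage account, container, tenant ID, and secret names are placeholders.
mount_point = "/mnt/raw"
source = "abfss://raw@mystorageacct.dfs.core.windows.net/"

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Only mount if the mount point is not already present.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
```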
Hi all, Environment: Nodes: Standard_E8s_v3; Databricks Runtime: 9.0; .NET for Apache Spark 2.0.0. I'm invoking spark-submit to run a .NET Spark job hosted in Azure Databricks. The job is written in C#/.NET with its only transformation and action reading a C...
Hi @Timothy Lin, I would recommend not using spark.stop() or System.exit(0) in your code: they explicitly stop the Spark context, but the graceful shutdown and handshake with Databricks' job service does not happen.
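A minimal sketch of the recommended pattern: do the work, let the job service tear the context down, and (optionally) return a result via dbutils.notebook.exit instead of forcing an exit. The result value here is just an illustration:

```python
# Do the work; do NOT call spark.stop() or System.exit(0) afterwards.
result = spark.range(10).count()

# Let the notebook/job end naturally; the job service handles shutdown.
# Optionally return a value to the caller/job run:
dbutils.notebook.exit(str(result))
```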
Hi All, I'm just wondering when exactly billing starts for a Databricks cluster. Is the starting time included? If cluster creation takes 3 minutes and query execution only 2, will I pay for 2 or 5? Thanks in advance! MC
Billing for Databricks DBUs starts when the Spark context becomes available. Billing from the cloud provider starts when the request for compute is received and the VMs start up.
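A rough back-of-the-envelope illustration of that split for the 3-minute startup / 2-minute query example above; the per-minute rates are made up, and real pricing depends on VM type, DBU SKU, and region:

```python
# Hypothetical rates, purely for illustration.
vm_rate_per_min = 0.02   # cloud provider charge while the VMs are up
dbu_rate_per_min = 0.05  # DBU charge once the Spark context is available

startup_min = 3          # cluster creation / VMs starting
query_min = 2            # Spark context available, query running

cloud_cost = vm_rate_per_min * (startup_min + query_min)   # billed for roughly the full 5 minutes
dbu_cost = dbu_rate_per_min * query_min                    # billed for roughly the 2 query minutes
print(f"cloud: ${cloud_cost:.2f}, DBUs: ${dbu_cost:.2f}")
```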
Databricks Runtime 10.2 Beta is available as of yesterday. More details here: https://docs.databricks.com/release-notes/runtime/10.2.html. New features and improvements: use Files in Repos with Spark Streaming; Databricks Utilities adds an update mount comma...
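The update-mount utility referenced in those notes, dbutils.fs.updateMount, takes the same arguments as dbutils.fs.mount. A minimal sketch with placeholder storage account, container, and secret scope names:

```python
# Sketch: repoint an existing mount at a new source without unmounting first.
# Storage account, container, and secret names below are placeholders.
dbutils.fs.updateMount(
    source="abfss://raw@newstorageacct.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs={
        "fs.azure.account.key.newstorageacct.dfs.core.windows.net":
            dbutils.secrets.get("my-scope", "storage-key"),
    },
)
```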