Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am trying to install the "pycaret" library on a cluster using a whl file, but it sometimes creates a dependency conflict (not always; sometimes it works too). My questions are: 1 - How to install libraries on a cluster only a single time (Maybe from ...
@Retired_mod What about question #1, which is what subsequent comments to this thread have been referring to? To recap the question: is it possible for "cluster-installed" libraries to be cached in such a way that they aren't completely reinstalled ev...
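For what it's worth, cluster-scoped libraries only need to be attached once; Databricks then reinstalls them automatically on every cluster (re)start rather than caching the installed environment. A minimal sketch of attaching a wheel through the Libraries API (the workspace URL, token, cluster ID, and wheel path below are placeholders, and the pycaret wheel filename is only illustrative):

import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

# Attach the wheel as a cluster library; it is reinstalled whenever the cluster restarts.
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": "0123-456789-abcde123",
        "libraries": [{"whl": "dbfs:/FileStore/wheels/pycaret-3.3.2-py3-none-any.whl"}],
    },
)
resp.raise_for_status()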
Hello all, Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that joined result is what I am trying to write to a Delta table. The table ...
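For context, the pattern being described is roughly the following (a sketch only; the table names, join key, and write mode are placeholders, not the poster's actual schema):

# Inner join two source tables and persist the result as a Delta table.
left = spark.table("source_db.orders")
right = spark.table("source_db.customers")

joined = left.join(right, on="customer_id", how="inner")

(joined.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("target_db.orders_enriched"))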
I have a daily running notebook that occasionally fails with the error:"Run result unavailable: job failed with error message Unexpected failure while waiting for the cluster Some((xxxxxxxxxxxxxxx) )to be readySome(: Cluster xxxxxxxxxxxxxxxx is in un...
Cluster 'xxxxxxx' was terminated. Reason: WORKER_SETUP_FAILURE (SERVICE_FAULT). Parameters: databricks_error_message:DBFS Daemomn is not reachable., gcp_error_message:Unable to reach the colocated DBFS Daemon. Can anyone help me with how we can resolve this...
I was creating a Delta table from an ADLS JSON input file, but the job was running for a long time while creating the Delta table from the JSON. Below is my cluster configuration. Is the issue related to the cluster config? Do I need to upgrade the cluster config? The cluster ...
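Before upgrading the cluster, it is worth checking whether schema inference is the bottleneck: by default spark.read.json makes an extra pass over the input just to infer the schema. A hedged sketch with an explicit schema (the ADLS path, fields, and target table are placeholders):

from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType

# Declaring the schema up front avoids the inference pass over all the JSON files.
schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
    StructField("event_time", TimestampType()),
])

df = (spark.read
    .schema(schema)
    .json("abfss://<container>@<storage-account>.dfs.core.windows.net/input/"))

df.write.format("delta").mode("overwrite").saveAsTable("bronze.events")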
I am a bit lazy and am trying to manually recreate a cluster from one workspace in another one. The cluster was created some time ago. Looking at the configuration, the access mode field is "custom". When trying to create a new cluster, I do not...
Hi All - I am new to Databricks and trying to create my first workflow. For some reason, the cluster created is of type "custom". I have not mentioned that anywhere in my asset bundle. Due to this, I cannot get the Unity Catalog features. Could ...
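In my experience the "custom" type usually appears when the cluster spec has no explicit access mode (data_security_mode), which Unity Catalog needs. A rough sketch of creating a cluster with an explicit access mode via the Clusters API - the workspace URL, token, node type, runtime version, and the SINGLE_USER choice are all assumptions to adapt; in an asset bundle the same data_security_mode field would go in the job cluster's new_cluster block:

import requests

host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.1/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_name": "uc-enabled-cluster",
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
        # An explicit access mode keeps the cluster from showing up as "custom"
        # and is required for Unity Catalog (SINGLE_USER or USER_ISOLATION).
        "data_security_mode": "SINGLE_USER",
    },
)
resp.raise_for_status()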
Currently, I am running a cluster that is set to terminate after 60 minutes of inactivity. However, in one of my notebooks, one of the cells is still running. How can I prevent this from happening if I want my notebook to run overnight without monito...
If a cell is already running (I assume it's a streaming operation), then I don't think that means the cluster is inactive; the cluster should stay running while a cell is running on it. On the other hand, if you want to keep running your clusters for ...
Yes! A cluster's owner/creator can be changed with the REST API - POST /api/2.1/clusters/change-owner
Request body sample:
{
  "cluster_id": "string",
  "owner_username": "string"
}
Ref: Clusters API | Change cluster owner
Hope this helps!
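For completeness, a minimal sketch of calling that endpoint from a notebook or script (the workspace URL, token, cluster ID, and username are placeholders to replace):

import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"  # needs permission to manage the cluster

resp = requests.post(
    f"{host}/api/2.1/clusters/change-owner",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": "0123-456789-abcde123", "owner_username": "new.owner@example.com"},
)
resp.raise_for_status()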
Does Databricks have support for writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in Databricks, OR if you have a...
Please note that the issue referenced above, [Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs), is for Delta Lake OSS. As noted in that issue as well as in Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...
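Whichever storage layer handles the commit, a writer that loses a commit race sees a concurrency exception it can simply retry. A rough sketch of that pattern, assuming the delta-spark Python bindings (and their delta.exceptions classes) are available on the cluster:

import time
# Assumption: these exception classes come from the delta-spark Python package.
from delta.exceptions import ConcurrentAppendException, ConcurrentWriteException

def append_with_retry(df, table_name, max_retries=5):
    """Append to a Delta table, retrying when a concurrent writer wins the commit race."""
    for attempt in range(max_retries):
        try:
            df.write.format("delta").mode("append").saveAsTable(table_name)
            return
        except (ConcurrentAppendException, ConcurrentWriteException):
            time.sleep(2 ** attempt)  # back off, then retry the whole write
    raise RuntimeError(f"Could not append to {table_name} after {max_retries} attempts")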
Hi, I'm trying to run the notebooks but nothing happens. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and in 'Compute', pressing play there on the clusters gives the e...
This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start
I am having the same issue (Azure Databricks). I have a compute cluster analytics-compute-cluster running in Single User access mode. The Event Log for the cluster says the cluster is running and the "Driver is healthy". I have Manage permissi...
My workspace has a couple different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the configuration of the shared cluster so that I can actually us...
Whenever my cluster is terminated, I lose my whole database (I'm not sure if it's related, but I made those databases in Delta format). And since the cluster is terminated after 2 hours of not being used, I wake up with no database every morning. I don't wa...
As the files are still in DBFS, you can just recreate the references to your tables and continue the work, with something like this:
db_name = "mydb"
from pathlib import Path
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
tables_dirs = dbutils.fs....
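A self-contained sketch of how that idea can be completed - it restates the setup lines above and assumes each subdirectory under the database path is a Delta table directory; the loop and CREATE TABLE statement are illustrative, not the original reply's exact code:

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

# Recreate the database, then register every Delta directory found under it as a table.
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

for entry in dbutils.fs.ls(path_db):
    table_name = entry.name.rstrip("/")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {db_name}.{table_name} "
        f"USING DELTA LOCATION '{entry.path}'"
    )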
I am currently working with a VNET-injected Databricks workspace. At the moment I have mounted an ADLS Gen2 resource on the Databricks cluster. When running notebooks on a single node that read, transform, and write data, we do not encounter any probl...
My goal is to have table access control in the Data Science & Engineering workspace. So I enabled access control on my cluster using the config "spark.databricks.acl.dfAclsEnabled": "true", and my cluster is now shown as Table ACLs enabled (shield ...
Here is my use case: https://community.databricks.com/t5/data-engineering/structured-streaming-using-delta-as-source-and-delta-as-sink-and/td-p/67825 And I get this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.Datase...
How can I fix the IP address of my Azure cluster so that I can whitelist it to run my job daily from my Python notebook? Or can I find out the IP address to perform the whitelisting? Thanks
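If the immediate goal is just to discover which outbound IP the cluster currently uses (note that without a stable egress setup such as a NAT gateway or fixed public IP, this address is not guaranteed to stay the same), one quick check from a notebook cell:

import requests

# Prints the public IP that outbound requests from the driver currently appear to come from.
print(requests.get("https://api.ipify.org", timeout=10).text)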