Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hello, I have two workspaces, each pointing to a VPC in AWS. In one of the accounts we needed to remove a subnet. After removing it, we get the InvalidSubnetID.NotFound AWS error when starting the cluster. I checked in the Manager Account, and the network is poin...
Hi, I just explored the serverless feature in Databricks and am wondering how I can track the cost associated with it. Is it stored in system tables? If yes, where can I find it? Also, how can I prove that its cost is relatively low compared to classic ...
Hi, I recently came across File Trigger in Databricks and find it mostly similar to Auto Loader. My first question is: why use file triggers when we already have Auto Loader? In which scenarios should I go with file triggers, and in which with Auto Loader? Can you please differentiate?
Hi Community. Recently I went through the AI Functions and was amazed by the results. I just wanted to know whether we can use our custom endpoints (instead of Databricks foundation models) and still leverage these AI Functions (ai_classify, ai_mask, etc.) https://...
AWS by the way, if that matters. We have an old production table that has been running in the background for a couple of years, always with auto-optimize and auto-compaction turned off. Since then, it has written many small files (like 10,000 an hour...
Sometimes, if a Delta table has only a few commit versions, no checkpoint files are created for the table. The checkpoint file is responsible for triggering the log cleanup activities. In case you observe that there are no checkpoint files available for th...
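To check whether a table has any checkpoint files at all, one option is to list its `_delta_log` directory and look at the file names. Below is a minimal sketch, assuming you already have the file names as plain strings (for example from `dbutils.fs.ls` on the log path). The helper name `checkpoint_versions` is hypothetical, and the pattern only covers the classic single-file and multi-part checkpoint names.

```python
import re

# Classic checkpoint:    00000000000000000010.checkpoint.parquet
# Multi-part checkpoint: 00000000000000000010.checkpoint.0000000001.0000000002.parquet
_CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.[^/]+)?\.parquet$")


def checkpoint_versions(file_names):
    """Return the sorted table versions that have at least one checkpoint file."""
    versions = set()
    for name in file_names:
        match = _CHECKPOINT_RE.match(name)
        if match:
            versions.add(int(match.group(1)))
    return sorted(versions)


# Hypothetical listing of a _delta_log directory (file names only).
listing = [
    "00000000000000000009.json",
    "00000000000000000010.json",
    "00000000000000000010.checkpoint.parquet",
    "_last_checkpoint",
]
print(checkpoint_versions(listing))  # [10]
```

An empty result means no checkpoint matching these name patterns was found in the listing you passed in.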
Is anyone familiar with installing the Datadog agent on clusters? We're not having much luck. We honestly might not be having the init script run since we're not seeing it in the log, but we can get just a generic "hello world" init script to run a...
Responding here with the solution I found. Hopefully it'll help anyone with similar issues. First, the Datadog install script is practically a matryoshka doll: the script creates another script, which creates a YAML file. One of the consequences of that...
Hello, I have a Job A that runs a Job B. Job A defines a globalTempView, and I would like to somehow access it in the child job. Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does someone know of a...
Hi @ranged_coop, yes, we are using the same job compute for different workflows. But I think different tasks are like different Docker containers, which is why it becomes an issue. It would be nice if you could explain a bit about the approach yo...
Hi, I got this error "com.databricks.WorkflowException: com.databricks.common.client.DatabricksServiceHttpClientException: DEADLINE_EXCEEDED" at the start of a job workflow run on an interactive cluster. It's a job that has been ...
I'm worried about how much the Databricks AI assistant will cost me. I need to understand what I'll be charged for, especially when I give a prompt to the AI Assistant pane, and how it operates in the background.
Hi Mates! I'm trying to get some data from a SQL Server using a query; the query has a WITH statement, but I'm getting the following error: raise convert_exception(pyspark.errors.exceptions.connect.SparkConnectGrpcException: (com.microsoft.sqlserver.jdb...
Hi @Jreco, you need to use the prepareQuery option and then the query option, like below:
url = "jdbc:sqlserver://server_name:1433;database=db_name"
df = spark.read \
    .format("jdbc") \
    .option("url", url) \
    .option("prepareQuery", "with cte as ( SELECT ...
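For anyone skimming: the idea in the reply above is that the WITH clause goes into `prepareQuery`, and only the final SELECT goes into `query`. Here is a minimal sketch of that split, assuming a runtime where Spark's JDBC source supports `prepareQuery`. The helper name `split_cte_options` and the CTE body are hypothetical placeholders, not the original poster's query.

```python
def split_cte_options(url, cte_prefix, final_select):
    """Build JDBC reader options that keep the WITH clause out of `query`."""
    return {
        "url": url,
        # Prefix that Spark places in front of the generated query.
        "prepareQuery": cte_prefix,
        # The statement that actually selects from the CTE.
        "query": final_select,
    }


# Placeholder CTE for illustration only.
opts = split_cte_options(
    url="jdbc:sqlserver://server_name:1433;database=db_name",
    cte_prefix="WITH cte AS (SELECT 1 AS id)",
    final_select="SELECT id FROM cte",
)

# On a cluster with a session available as `spark` (not executed here):
# df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the two parts in separate options matters because Spark wraps the `query` text in a subquery, and a WITH clause inside a subquery is what SQL Server tends to reject.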
I am having the same issue (Azure Databricks). I have a compute cluster, analytics-compute-cluster, running in Single User access mode. The Event Log for the cluster says the cluster is running and the "Driver is healthy". I have Manage permissi...
Hello all, I'm trying to create a connection from Databricks to Information Design Tool using an access token generated for a Databricks Service Principal. While testing the connection I'm getting this error: [Databricks][JDBCDriver](500593) Communication...
The end goal is to apply OPTIMIZE and ZORDER to the table. However, one of the columns to be Z-ordered doesn't have stats collected. Running ANALYZE generates the error below. Query: ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STATISTICS FOR COLUMNS my_col_1, my_c...
Hi Databricks Community, we faced a strange error today where the error below was returned when a notebook was being run. It only happens on Git-connected notebooks, and on rerun it succeeds. What is the issue?
One of our Databricks workflow jobs fails occasionally with the error below; after re-running, it works fine without any issue. What is the exact reason for the issue, and how can we fix it? Error: Unexpected failure while waiting for the cluster to be ...
These are cloud-provider-related errors, and the error message itself does not give us many details. Based on the error message, and given that you have enough CPU/VM quota available, I think the issue might be due to the storage creation stage in ...