Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". As I understand it, this should update the min/max values for a column when you run it for all columns or for a single column. One way to verify it from what I ...
Hello @vlado101
The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks computes statistics for every column of a table. These statistics are persisted in the metastore and help the query optimizer make decisions such as ...
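As a quick way to check whether the column-level statistics were actually written, something like the following should work (my_table and the column id are placeholders, not names from the original post):

spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS FOR ALL COLUMNS")
# DESCRIBE EXTENDED <table> <column> surfaces per-column stats
# (min, max, num_nulls, distinct_count) once they have been computed
spark.sql("DESCRIBE EXTENDED my_table id").show(truncate=False)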
Reading the change data feed from your Delta table with Structured Streaming lets you run incremental streaming aggregations, such as counts and sums.
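A minimal sketch of that pattern, assuming the change data feed is already enabled on the source table (the table name source_table is a placeholder):

# Read the Delta change data feed as a stream; requires
# delta.enableChangeDataFeed = true on the source table
cdf = (spark.readStream
       .format("delta")
       .option("readChangeFeed", "true")
       .table("source_table"))

# Incremental aggregation over the change records; a streaming groupBy
# like this needs an appropriate output mode when written to a sink
counts = cdf.groupBy("_change_type").count()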
I have an SFTP server I need to routinely download Excel files from and put into GCP cloud storage buckets. Every variation of the file path, to either my GCP path or just the built-in DBFS file system, gives an error of "[Errno 2] No such file or d...
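A pattern that often comes up for this (not from the thread itself, so treat the host, credentials, and paths as placeholders) is to download to the driver's local disk first and then copy into the bucket, since SFTP libraries cannot write to dbfs:/ or gs:// paths directly:

import paramiko

# Download from SFTP to the driver's local filesystem
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")  # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/remote/report.xlsx", "/tmp/report.xlsx")
sftp.close()
transport.close()

# Copy from local disk into the GCS bucket; note the explicit file:/ scheme
dbutils.fs.cp("file:/tmp/report.xlsx", "gs://my-bucket/raw/report.xlsx")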
Hi everybody, sharing data with an access token and the Databricks connector works fine in Power BI (Desktop). Now we wanted to switch to Delta Sharing. We set up a Delta share to distribute data via open sharing to anyone outside our organization. Unity Cata...
Hi everybody, for anybody running into the same issue: it is a bug in the current Power BI version (2.121.644.0). I reverted to the April release (2.116.404.0), which works as expected.
I'm creating a new job in Databricks using the databricks-cli:

databricks jobs create --json-file ./deploy/databricks/config/job.config.json

with the following JSON:

{
"name": "Job Name",
"new_cluster": {
"spark_version": "4.1.x-scala2.1...
This is an old post but still relevant for future readers, so I'll answer how it is done. You need to add a base_parameters field in the notebook_task config, like the following:
"notebook_task": {
"notebook_path": "...",
"base_parameters": {
...
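For completeness (this part is not from the original answer): inside the notebook, values passed through base_parameters surface as widgets, so reading a hypothetical parameter named env would look like this:

# Define the widget with a default so the notebook also runs interactively
dbutils.widgets.text("env", "dev")
# At job run time, the value from base_parameters overrides the default
env = dbutils.widgets.get("env")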
Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud and I was using the trial. It is quite difficult to debug if the Spark UI is only semi-accessible. Part of the results are visible in raw HTML, but all ...
Hello everyone, I tried to change a Databricks Runtime cluster from 12.2 LTS ML to 13.3 LTS ML; however, I got this error: Failed to add 1 container to the compute. Will attempt retry: false. Reason: Global init script failure. Global init script Instal...
Hi @lndlzy, Based on the information, your error is related to a global init script failure when changing the Databricks Runtime cluster from 12.2 LTS ML to 13.3 LTS ML. This error indicates that the global init script failed with a non-zero exit ...
Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id. See the code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways, which I've...
Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution.
Does anybody know any in-notebook or JAR code to pull cluster tags from the runtime environment? Something like dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user'), but for the cluster name?
Did you find any documentation for spark.conf.get properties? I am trying to get some metadata about the environment my notebook is running in (specifically cluster custom tags), but cannot find any information besides a couple of forum posts.
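Not official documentation, but the clusterUsageTags namespace is what those forum posts point at; a sketch of what is commonly reported to work (treat the exact keys as assumptions):

# Built-in cluster metadata is exposed as Spark confs under
# spark.databricks.clusterUsageTags.*
cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
# Custom tags are reported to come back as a JSON-like string of key/value pairs
all_tags = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")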
I have a process that should run the same notebook with varying parameters, which translates to a job with queueing and concurrency enabled. When the first executions are triggered, the job runs work as expected, i.e., if the job has a max concurrency se...
Hi @Kaniz, we double-checked everything; the resources are sufficient and all settings are properly set. I'll reach out to support by filing a new ticket. Thank you for your help.
I have this datetime string in my dataset: '2023061218154258' and I want to convert it to a datetime using the code below. However, the format that I expect to work doesn't work, namely yyyyMMddHHmmssSS. This code reproduces the issue: from pyspark.sq...
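The code is cut off here, but a common workaround (my assumption, not necessarily the thread's accepted answer) is to parse only the 14 seconds-precision digits, since Spark 3's strict parser rejects a trailing SS fraction in this layout:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2023061218154258",)], ["ts_str"])
# Parse just yyyyMMddHHmmss; the last two digits (the fractional part)
# are dropped here, or could be re-attached separately if needed
df = df.withColumn(
    "ts", F.to_timestamp(F.substring("ts_str", 1, 14), "yyyyMMddHHmmss")
)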
I'm trying to move data from database A to database B on Snowflake. There's no permission issue, since using the Python package snowflake.connector works. Databricks Runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12). Insert into database B fail...
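The post is truncated here, but for reference, a write through the Spark Snowflake connector typically has this shape (every connection value below is a placeholder, not the poster's config):

# Placeholder connection options for the Spark Snowflake connector
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "user",
    "sfPassword": "secret",
    "sfDatabase": "DATABASE_B",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# Append the DataFrame into the target table in database B
(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "TARGET_TABLE")
   .mode("append")
   .save())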
Hello! We're currently building a file ingestion pipeline using Delta Live Tables and Auto Loader. The bronze tables have roughly the following schema: file_name | file_upload_date | colA | colB (well, there are actually 250+ columns...
Hi Team, I am looking for some advice on perf-tuning my bronze layer using DLT. I have the following code, very simple and yet very effective:

@dlt.create_table(name="bronze_events",
    comment = "New raw data ingested from storage account ...
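For readers landing here, the overall shape of such a bronze table (Auto Loader inside DLT) is roughly the following; the path, file format, and table name are placeholder assumptions, not the poster's actual config:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_events_sketch",
           comment="Sketch of an Auto Loader bronze ingest")
def bronze_events_sketch():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/events/")
            .withColumn("ingest_time", F.current_timestamp()))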
Hi @Gilg
You mentioned that the micro-batch time has recently been around 12 minutes. Do we also see jobs/stages taking 12 minutes in the Spark UI? If so, then processing the file itself takes 12 minutes. If not, the 12 minutes is spent on ...