Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". As I understand it, this should update the min/max values for a column when you run it for all columns or for a single column. One way to verify it from what I ...
Hello @vlado101
The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks computes statistics for every column of a table. These statistics are persisted in the metastore and help the query optimizer make decisions such as ...
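As a quick way to check whether the column-level statistics were actually written, something like the following should work (my_table and the column id are placeholders, not names from the original post):

spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS FOR ALL COLUMNS")
# DESCRIBE EXTENDED <table> <column> surfaces per-column stats
# (min, max, num_nulls, distinct_count) once they have been computed
spark.sql("DESCRIBE EXTENDED my_table id").show(truncate=False)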
Reading the change data feed from your Delta table with Structured Streaming lets you run incremental streaming aggregations, such as counts and sums.
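A minimal sketch of that pattern, assuming the change data feed is already enabled on the source table (the table name source_table is a placeholder):

# Read the Delta change data feed as a stream; requires
# delta.enableChangeDataFeed = true on the source table
cdf = (spark.readStream
       .format("delta")
       .option("readChangeFeed", "true")
       .table("source_table"))

# Incremental aggregation over the change records; a streaming groupBy
# like this needs an appropriate output mode when written to a sink
counts = cdf.groupBy("_change_type").count()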
I have an SFTP server I need to routinely download Excel files from and put into GCP cloud storage buckets. Every variation of the file path, to either my GCP path or just the built-in DBFS file system, gives an error of "[Errno 2] No such file or d...
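A pattern that often comes up for this (not from the thread itself, so treat the host, credentials, and paths as placeholders) is to download to the driver's local disk first and then copy into the bucket, since SFTP libraries cannot write to dbfs:/ or gs:// paths directly:

import paramiko

# Download from SFTP to the driver's local filesystem
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")  # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/remote/report.xlsx", "/tmp/report.xlsx")
sftp.close()
transport.close()

# Copy from local disk into the GCS bucket; note the explicit file:/ scheme
dbutils.fs.cp("file:/tmp/report.xlsx", "gs://my-bucket/raw/report.xlsx")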
Hi everybody, sharing data with an access token and the Databricks connector works fine in Power BI (Desktop). Now we wanted to switch to Delta Sharing. We set up a Delta share to distribute data via open sharing to anyone outside our organization. Unity Cata...
Hi everybody, for anybody running into the same issue: it is a bug in the current Power BI version (2.121.644.0). I reverted to the April release (2.116.404.0), which works as expected.
I'm creating a new job in Databricks using the databricks-cli:

databricks jobs create --json-file ./deploy/databricks/config/job.config.json

with the following JSON:

{
"name": "Job Name",
"new_cluster": {
"spark_version": "4.1.x-scala2.1...
This is an old post but still relevant for future readers, so I'll answer how it is done. You need to add a base_parameters field in the notebook_task config, like the following:
"notebook_task": {
"notebook_path": "...",
"base_parameters": {
...
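For completeness (this part is not from the original answer): inside the notebook, values passed through base_parameters surface as widgets, so reading a hypothetical parameter named env would look like this:

# Define the widget with a default so the notebook also runs interactively
dbutils.widgets.text("env", "dev")
# At job run time, the value from base_parameters overrides the default
env = dbutils.widgets.get("env")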
Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud and I was using the trial. It is quite difficult to debug if the Spark UI is only semi-accessible. Part of the results are visible in raw HTML, but all ...
Hello everyone, I tried to change a Databricks Runtime cluster from 12.2 LTS ML to 13.3 LTS ML; however, I got this error: Failed to add 1 container to the compute. Will attempt retry: false. Reason: Global init script failure. Global init script Instal...
Hi @lndlzy, Based on the information, your error is related to a global init script failure when changing the Databricks Runtime cluster from 12.2 LTS ML to 13.3 LTS ML. This error indicates that the global init script failed with a non-zero exit ...
Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id. See the code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways, which I've...
Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution.
Does anybody know any in-notebook or JAR code to pull cluster tags from the runtime environment? Something like dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user'), but for the cluster name?
Did you find any documentation for spark.conf.get properties? I am trying to get some metadata about the environment my notebook is running in (specifically cluster custom tags), but cannot find any information besides a couple of forum posts.
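Not official documentation, but the clusterUsageTags namespace is what those forum posts point at; a sketch of what is commonly reported to work (treat the exact keys as assumptions):

# Built-in cluster metadata is exposed as Spark confs under
# spark.databricks.clusterUsageTags.*
cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
# Custom tags are reported to come back as a JSON-like string of key/value pairs
all_tags = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")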
I have a process that should run the same notebook with varying parameters, which translates to a job with queueing and concurrency enabled. When the first executions are triggered, the job runs work as expected, i.e., if the job has a max concurrency se...
Hi @Kaniz, we double-checked everything; the resources are sufficient and all settings are properly set. I'll reach out to support by filing a new ticket. Thank you for your help.
I have this datetime string in my dataset: '2023061218154258' and I want to convert it to a datetime using the code below. However, the format that I expect to work doesn't work, namely yyyyMMddHHmmssSS. This code reproduces the issue: from pyspark.sq...
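The code is cut off here, but a common workaround (my assumption, not necessarily the thread's accepted answer) is to parse only the 14 seconds-precision digits, since Spark 3's strict parser rejects a trailing SS fraction in this layout:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2023061218154258",)], ["ts_str"])
# Parse just yyyyMMddHHmmss; the last two digits (the fractional part)
# are dropped here, or could be re-attached separately if needed
df = df.withColumn(
    "ts", F.to_timestamp(F.substring("ts_str", 1, 14), "yyyyMMddHHmmss")
)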
I'm trying to move data from database A to database B on Snowflake. There's no permission issue, since using the Python package snowflake.connector works. Databricks Runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12). Insert into database B fail...
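The post is truncated here, but for reference, a write through the Spark Snowflake connector typically has this shape (every connection value below is a placeholder, not the poster's config):

# Placeholder connection options for the Spark Snowflake connector
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "user",
    "sfPassword": "secret",
    "sfDatabase": "DATABASE_B",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# Append the DataFrame into the target table in database B
(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "TARGET_TABLE")
   .mode("append")
   .save())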
Hello! We're currently building a file ingestion pipeline using Delta Live Tables and Auto Loader. The bronze tables have roughly the following schema: file_name | file_upload_date | colA | colB (well, there are actually 250+ columns...
Hi Team, I am looking for some advice on perf-tuning my bronze layer using DLT. I have the following code, very simple and yet very effective:

@dlt.create_table(name="bronze_events",
    comment = "New raw data ingested from storage account ...
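For readers landing here, the overall shape of such a bronze table (Auto Loader inside DLT) is roughly the following; the path, file format, and table name are placeholder assumptions, not the poster's actual config:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_events_sketch",
           comment="Sketch of an Auto Loader bronze ingest")
def bronze_events_sketch():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/events/")
            .withColumn("ingest_time", F.current_timestamp()))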
Hi @Gilg
You mentioned that the micro-batch time has recently been around 12 minutes. Do we also see jobs/stages taking 12 minutes in the Spark UI? If so, then processing the file itself takes 12 minutes. If not, the 12 minutes is spent on ...