Data Engineering

Forum Posts

Sorted by:

by lndlzy • New Contributor II

09-10-2023 11:38:54 PM

2351 Views
3 replies
0 kudos

Resolved! ADD_NODES_FAILED Cluster Does Not Start

Hello everyone, I tried to change a Databricks Runtime Cluster from 12.2 LTS ML to 13.3 LTS ML, however I got this error: Failed to add 1 container to the compute. Will attempt retry: false. Reason: Global init script failureGlobal init script Instal...

Data Engineering

2351 Views
3 replies
0 kudos

09-10-2023 11:38:54 PM

View Replies

Latest Reply

Kaniz
Community Manager

09-11-2023 3:49:10 AM

0 kudos

Hi @lndlzy, Based on the information, your error is related to a global init script failure when changing the Databricks Runtime Cluster from 12.2 LTS ML to 13.3 LTS ML. This error indicates that the worldwide init script failed with a non-zero exit ...

0 kudos

09-11-2023 3:49:10 AM

2 More Replies

by TimReddick • New Contributor III

09-14-2023 10:40:14 AM

3928 Views
7 replies
2 kudos

Using run_job_task in Databricks Asset Bundles

Do Databrick Asset Bundles support run_job_task tasks?I've made various attempts to add a run_job_task with a specified job_id. See my the code_snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways which I've...

Data Engineering

Databrick Asset Bundles

run_job_task

3928 Views
7 replies
2 kudos

09-14-2023 10:40:14 AM

View Replies

Latest Reply

kyle_r
New Contributor II

10-11-2023 12:56:05 PM

2 kudos

Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution.

2 kudos

10-11-2023 12:56:05 PM

6 More Replies

by User16765131552 • Contributor III

06-25-2021 10:45:10 AM

2507 Views
3 replies
0 kudos

Resolved! Pull Cluster Tags

Does anybody know any in-notebook or JAR code to pull cluster tags from the runtime environment? Something like... dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user')but for the cluster name?

Data Engineering

2507 Views
3 replies
0 kudos

06-25-2021 10:45:10 AM

View Replies

Latest Reply

DatBoi
Contributor

10-11-2023 11:37:28 AM

0 kudos

Did you find any documentation for spark.conf.get properties? I am trying to get some metadata about the environment my notebook is running in (specifically cluster custom tags)? But cannot find any information beside a couple of forum posts.

0 kudos

10-11-2023 11:37:28 AM

2 More Replies

by arielmoraes • New Contributor III

10-09-2023 10:36:37 AM

1150 Views
3 replies
1 kudos

Resolved! Job Concurrency Queue not working as expected

I have a process that should run the same notebook with varying parameters, thus translating to a job with queue and concurrency enabled. When the first executions are triggered the Jobs Runs work as expected, i.e. if the job has a max concurrency se...

Data Engineering

1150 Views
3 replies
1 kudos

10-09-2023 10:36:37 AM

View Replies

Latest Reply

arielmoraes
New Contributor III

10-11-2023 10:54:30 AM

1 kudos

Hi @Kaniz, we double-checked everything, the resources are enough and all settings are properly set. I'll reach out the support by filing a new ticket. Thank you for your help.

1 kudos

10-11-2023 10:54:30 AM

2 More Replies

by b_1 • New Contributor II

06-20-2023 1:51:36 AM

632 Views
2 replies
1 kudos

to_timstamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

I have this datetime string in my dataset: '2023061218154258' and I want to convert it to datetime, using below code. However the format that I expect to work, doesn't work, namely: yyyyMMddHHmmssSS. This code will reproduce the issue:from pyspark.sq...

Data Engineering

632 Views
2 replies
1 kudos

06-20-2023 1:51:36 AM

View Replies

Latest Reply

b_1
New Contributor II

10-11-2023 9:20:05 AM

1 kudos

Is there anybody who has the same issue or knows that this is in fact an issue?

1 kudos

10-11-2023 9:20:05 AM

1 More Replies

by orso • New Contributor III

10-10-2023 3:14:01 AM

3024 Views
1 replies
0 kudos

Resolved! Java - FAILED_WITH_ERROR when saving to snowflake

I'm trying to move data from database A to B on Snowflake. There's no permission issue since using the Python package snowflake.connector works Databricks runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)Insert into database B fail...

Data Engineering

3024 Views
1 replies
0 kudos

10-10-2023 3:14:01 AM

View Replies

Latest Reply

orso
New Contributor III

10-11-2023 4:01:20 AM

0 kudos

Found the problem. The sub-roles didn't have grants to the warehouse.I hope it will help someone one day

0 kudos

10-11-2023 4:01:20 AM

by erigaud • Honored Contributor

10-05-2023 1:19:20 AM

2619 Views
5 replies
5 kudos

Resolved! DLT overwrite part of the table

Hello !We're currently building a pipeline of file ingestion using a Delta Live Tables pipeline and autoloader. The bronze tables are pretty much the following schema : file_name | file_upload_date | colA | colB (Well, there are actually 250+ columns...

Data Engineering

2619 Views
5 replies
5 kudos

10-05-2023 1:19:20 AM

View Replies

Latest Reply

Tharun-Kumar
Honored Contributor II

10-11-2023 2:50:41 AM

5 kudos

@erigaud Using jobs/workflows would be the right choice for this.

5 kudos

10-11-2023 2:50:41 AM

4 More Replies

by Gilg • Contributor II

10-05-2023 8:07:03 PM

1041 Views
4 replies
2 kudos

DLT: Autoloader Perf

Hi Team,I am looking for some advice to perf tune my bronze layer using DLT.I have the following code very simple and yet very effective. @dlt.create_table(name="bronze_events", comment = "New raw data ingested from storage account ...

Data Engineering

1041 Views
4 replies
2 kudos

10-05-2023 8:07:03 PM

View Replies

Latest Reply

Tharun-Kumar
Honored Contributor II

10-11-2023 2:49:25 AM

2 kudos

Hi @Gilg You mentioned that micro-batch time is around 12 minutes recently. Do we also see jobs/stages with 12 minutes in the spark ui. If that is the case, then the processing of the file itself takes 12 minutes. If not, the 12 minutes is spent on ...

2 kudos

10-11-2023 2:49:25 AM

3 More Replies

by Kaviana • New Contributor III

10-09-2023 5:01:07 PM

1007 Views
2 replies
0 kudos

internal server error when creating workspace

I tried to create a workspace and it is not generated either automatically or manually. The strange thing is that it stopped working after a certain time. It seems like an internal Databricks error but it is not known if it is like that or a bug, wha...

Data Engineering

1007 Views
2 replies
0 kudos

10-09-2023 5:01:07 PM

View Replies

Latest Reply

Kaniz
Community Manager

10-10-2023 10:55:32 PM

0 kudos

Hi @Kaviana , Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

0 kudos

10-10-2023 10:55:32 PM

1 More Replies

by N_M • New Contributor III

10-10-2023 1:03:28 AM

910 Views
2 replies
0 kudos

Resolved! Unzip multipart files

Hi all,Due to file size and file transfer limitation, we are receiving huge files compressed and split, in the format FILE.z01, FILE.z02,...,FILE.zipHowever, I can't find a way to unzip multipart files using databricks.I tried already some of the ...

Data Engineering

bash

unzip

910 Views
2 replies
0 kudos

10-10-2023 1:03:28 AM

View Replies

Latest Reply

Kaniz
Community Manager

10-10-2023 10:53:29 PM

0 kudos

Hi @N_M , Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your qu...

0 kudos

10-10-2023 10:53:29 PM

1 More Replies

by phoebe_dt • New Contributor

10-09-2023 6:13:53 AM

1961 Views
2 replies
1 kudos

Access denied error to s3 bucket in Databricks notebook

When running a databricks notebook connected to an s3 cluster I randomly but frequently experience the following error: java.nio.file.AccessDeniedException: s3://mybucket: getFileStatus on s3://mybucket: com.amazonaws.services.s3.model.AmazonS3Except...

Data Engineering

access denied

AWS

databricks notebook

1961 Views
2 replies
1 kudos

10-09-2023 6:13:53 AM

View Replies

Latest Reply

Kaniz
Community Manager

10-10-2023 10:50:41 PM

1 kudos

Hi @phoebe_dt , Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

1 kudos

10-10-2023 10:50:41 PM

1 More Replies

by Monika_Bagyal • New Contributor

10-10-2023 3:19:01 PM

2218 Views
1 replies
0 kudos

Access denied error while reading file from S3 to spark

I'm seeing the access denied error from spark cluster while reading s3 file into notebook.Running on personal single user compute with LTS 13.3 ML.configs setup looks like this:spark.conf.set("spark.hadoop.fs.s3a.access.key", access_id)spark.conf.set...

Data Engineering

2218 Views
1 replies
0 kudos

10-10-2023 3:19:01 PM

View Replies

Latest Reply

Kaniz
Community Manager

10-10-2023 10:37:08 PM

0 kudos

Hi @Monika_Bagyal , The "Access Denied" error you are seeing is likely due to insufficient permissions to read the S3 bucket. The configurations you've set up are correct for accessing S3 using temporary AWS credentials, but the credentials themse...

0 kudos

10-10-2023 10:37:08 PM

by Gilg • Contributor II

10-07-2023 1:13:32 PM

1072 Views
3 replies
1 kudos

APPLY_CHANGES late arriving data

Hi Team,I have a DLT pipeline that uses APPLY_CHANGES to our Silver tables. I am using Id as keys and timestamp to know the sequence of the incoming data. Question: How does APPLY_CHANGES handles late arriving data?i.e., for silver_table_1, the data ...

Data Engineering

1072 Views
3 replies
1 kudos

10-07-2023 1:13:32 PM

View Replies

Latest Reply

Kaniz
Community Manager

10-09-2023 12:31:12 AM

1 kudos

Hi @Gilg , The APPLY_CHANGES function in Databricks Delta Live Tables handles late arriving data using a specified SEQUENCE BY column, which in your case is the timestamp. It uses this column to propagate appropriate sequencing values to the __START_...

1 kudos

10-09-2023 12:31:12 AM

2 More Replies

by PradyumnJoshi • New Contributor

10-04-2023 11:58:54 PM

875 Views
2 replies
0 kudos

Resolved! Databricks Academy - Advanced Data Engineering - Notebook Error while loading configurations

Hi Databricks Academy team,I am getting below errors while running classroom setup command in Databricks Academy - Advanced data engineering course Notebooks in databricks community edition. Please help me resolve it. #databricksacademy #advanceddat...

Data Engineering

875 Views
2 replies
0 kudos

10-04-2023 11:58:54 PM

View Replies

Latest Reply

User16847923431
Contributor II

10-10-2023 9:12:48 AM

0 kudos

Hi, all. Our apologies - the Advanced Data Engineering with Databricks course will not run on Databricks Community Edition. If you would like a lab environment to run this course on, please see the new paid lab subscription available via the Databric...

0 kudos

10-10-2023 9:12:48 AM

1 More Replies

by dng • New Contributor III

11-30-2022 4:39:44 PM

3605 Views
8 replies
11 kudos

Databricks JDBC Driver v2.6.29 Cloud Fetch failing for Windows Operating System

Hi everyone, I've been stuck for the past two days on this issue with my Databricks JDBC driver and I'm hoping someone can give me more insight into how to troubleshoot. I am using the Databricks JDBC driver in RStudio and the connection was working ...

Data Engineering

3605 Views
8 replies
11 kudos

11-30-2022 4:39:44 PM

View Replies

Latest Reply

Prabakar
Esteemed Contributor III

01-30-2023 9:09:08 AM

11 kudos

@Debbie Ng From your message I see there was a windows update and this failure started. based on the conversation you tried latest version of the driver and still you face the problem. I believe this is something related to the Java version compatib...

11 kudos

01-30-2023 9:09:08 AM

7 More Replies

User

Count

1601

736

343

284

247

Databricks

Forum Posts

Resolved! ADD_NODES_FAILED Cluster Does Not Start

Using run_job_task in Databricks Asset Bundles

Resolved! Pull Cluster Tags

Resolved! Job Concurrency Queue not working as expected

to_timstamp function in non-legacy mode does not parse this format: yyyyMMddHHmmssSS

Resolved! Java - FAILED_WITH_ERROR when saving to snowflake

Resolved! DLT overwrite part of the table

DLT: Autoloader Perf

internal server error when creating workspace

Resolved! Unzip multipart files

Access denied error to s3 bucket in Databricks notebook

Access denied error while reading file from S3 to spark

APPLY_CHANGES late arriving data

Resolved! Databricks Academy - Advanced Data Engineering - Notebook Error while loading configurations

Databricks JDBC Driver v2.6.29 Cloud Fetch failing for Windows Operating System

Best way to parse Google Analytics data in Databri...

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...