Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Sricharan05 (New Contributor III)
  • 353 Views
  • 4 replies
  • 2 kudos

Databricks Certified Associate Developer exam got suspended. Requesting support for the same.

Request #00482566. Hello Team, I had a pathetic experience while attempting my 1st Databricks certification. I had some network issues and lighting issues. My test was stopped in the middle and I was connected with the proctor for review. As r...

Latest Reply
Sricharan05 (New Contributor III)

Hi @Kaniz @Sujitha @APadmanabhan @Cert-Team @Cert-Bricks @Cert-TeamOPS, I have been waiting for more than 40 hours since I raised my ticket. I still haven't received any response from the support team or anyone else. Can you please escalate this issue...

3 More Replies
by zero234 (New Contributor III)
  • 2283 Views
  • 3 replies
  • 2 kudos

Resolved! I have created a materialized view using a Delta Live Tables pipeline and it is not appending data

I have created a materialized view using a Delta Live Tables pipeline; for some reason it is overwriting data every day. I want it to append data to the table instead of doing a full refresh. Suppose I had 8 million records in the table and if I run the...

Latest Reply
kulkpd (Contributor)

@zero234, adding some suggestions based on answers from @Kaniz_Fatma. Important point to note here: "To define a materialized view in Python, apply @table to a query that performs a static read against a data source. To define a streaming table, apply...
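A minimal DLT sketch of that distinction, assuming a Delta source named source_table (all names here are hypothetical): a @dlt.table over a static read is a materialized view and is fully recomputed on each update, while one over a streaming read is a streaming table and appends incrementally.

    import dlt

    # Materialized view: static read, fully recomputed on every pipeline update
    @dlt.table(name="daily_summary_mv")
    def daily_summary_mv():
        return spark.read.table("source_table")

    # Streaming table: streaming read, processes only new records (appends)
    @dlt.table(name="events_streaming")
    def events_streaming():
        return spark.readStream.table("source_table")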

2 More Replies
by alonisser (Contributor)
  • 375 Views
  • 2 replies
  • 1 kudos

Since moving to DBR 14.3 with Python jobs, I don't see the stack trace for exceptions

The logs don't even contain the error line I see (I downloaded all log files from the UI and checked them). How can I see the stack trace? It's essential for debugging certain issues.

Latest Reply
alonisser (Contributor)

Thanks for the answer, but I fail to see what it has to do with my question. It's not a "general Python error"; I run lots of Python jobs on Databricks clusters and know how to run Python jobs and dependencies. I'm pointing to a specific issue ...
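Until the underlying behavior is resolved, one workaround sketch (my suggestion, not from the thread) is to catch exceptions at the job entry point and print the full traceback explicitly so it lands in the driver log:

    import logging
    import traceback

    logger = logging.getLogger("job")

    def main():
        raise ValueError("example failure")  # stand-in for the real job logic

    if __name__ == "__main__":
        try:
            main()
        except Exception:
            # Write the full stack trace to the driver log, then re-raise
            # so the job run still fails and surfaces the error.
            logger.error("Job failed:\n%s", traceback.format_exc())
            raise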

1 More Replies
by dzsuzs (New Contributor II)
  • 347 Views
  • 2 replies
  • 1 kudos

OOM Issue in Streaming with foreachBatch()

I have a stateless streaming application that uses foreachBatch. This function executes between 10 and 400 times each hour based on custom logic. The logic within foreachBatch includes: collect() on very small DataFrames (a few megabytes) --> driver mem...

Latest Reply
xorbix_rshiva (New Contributor III)

From the information you provided, your issue might be resolved by setting a watermark on the streaming DataFrame. The purpose of watermarks is to set a maximum time for records to be retained in state. Without a watermark, records in your state will...
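A minimal watermark sketch for context; the source table, timestamp column, sink table, and checkpoint path are all hypothetical:

    # Bound the state by declaring how late events may arrive
    stream = (spark.readStream.table("events")
                   .withWatermark("event_time", "10 minutes"))

    def process_batch(batch_df, batch_id):
        # Keep per-batch work small; avoid holding references to old DataFrames
        batch_df.write.mode("append").saveAsTable("events_out")

    (stream.writeStream
           .foreachBatch(process_batch)
           .option("checkpointLocation", "/tmp/checkpoints/events")
           .start())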

1 More Replies
by shanebo425 (New Contributor III)
  • 562 Views
  • 2 replies
  • 0 kudos

Saving Widgets to Git

We use Databricks widgets in our Python notebooks to pass parameters in jobs, but also for when we are running the notebooks manually (outside of a job context) for various reasons. We're a small team, but I've noticed that when I create a notebook an...

Latest Reply
daniel_sahal (Esteemed Contributor)

@shanebo425 You can add your widgets to the code, e.g.:

    dbutils.widgets.text("test", "")
    dbutils.widgets.get("test")

Remember that the cell with the widget needs to be run for the widget to actually be visible in the notebook.
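Spelling out the pattern (the widget names and defaults below are hypothetical): widgets defined in a code cell are part of the notebook source, so they get versioned in Git along with everything else.

    # First cell of the notebook: widget definitions live in code and are committed to Git
    dbutils.widgets.text("env", "dev", "Environment")
    dbutils.widgets.dropdown("mode", "incremental", ["incremental", "full"], "Load mode")

    # Later cells read the values; job runs can override them via notebook parameters
    env = dbutils.widgets.get("env")
    mode = dbutils.widgets.get("mode")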

1 More Replies
by avrm91 (New Contributor III)
  • 5127 Views
  • 3 replies
  • 1 kudos

Resolved! XML DLT Autoloader - Ingestion of XML Files

I want to ingest multiple XML files with varying but similar structures without defining a schema. For example: <?xml version="1.0" encoding="ISO-8859-1"?> <LIEFERUNG> <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH </NAME> <ST...

Latest Reply
avrm91 (New Contributor III)

@Kaniz_Fatma Thanks a lot. I found an issue in the from_xml function. I posted above: SELECT from_xml(CONCAT('<ABSENDER>', ABSENDER, '</ABSENDER>'), schema_of_xml(' <ABSENDER> <RZLZ>R00000001</RZLZ> <NAME>Informatik GmbH</NAME> <STRASSE>M...
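For reference, a minimal Auto Loader sketch for schema-less XML ingestion, assuming a runtime with native XML support; the paths and the rowTag value are hypothetical:

    df = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "xml")
            .option("rowTag", "ABSENDER")  # XML element treated as one row
            .option("cloudFiles.inferColumnTypes", "true")
            .option("cloudFiles.schemaLocation", "/tmp/schemas/absender")
            .load("/landing/xml/"))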

2 More Replies
by daindana (New Contributor III)
  • 3880 Views
  • 8 replies
  • 3 kudos

Resolved! How to preserve my database when the cluster is terminated?

Whenever my cluster is terminated, I lose my whole database (I'm not sure if it's related, but I made those databases with the Delta format). And since the cluster is terminated after 2 hours of inactivity, I wake up with no database every morning. I don't wa...

Latest Reply
dhpaulino (New Contributor II)

As the files are still in DBFS, you can just recreate the references to your tables and continue working, with something like this:

    db_name = "mydb"
    from pathlib import Path
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
    tables_dirs = dbutils.fs....
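A sketch of what the full pattern presumably looks like (the listing loop and the CREATE TABLE statement are my reconstruction, not the original code):

    db_name = "mydb"
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

    spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
    # Re-register each table directory as a reference to the existing Delta files
    for table_dir in dbutils.fs.ls(path_db):
        table_name = table_dir.name.rstrip("/")
        spark.sql(
            f"CREATE TABLE IF NOT EXISTS {db_name}.{table_name} "
            f"USING DELTA LOCATION '{table_dir.path}'"
        )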

7 More Replies
by lnsnarayanan (New Contributor II)
  • 7203 Views
  • 8 replies
  • 11 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option. But when I connect to a new cluster, "SHOW DATABASES" only returns the default database....

Latest Reply
dhpaulino (New Contributor II)

As the files are still in DBFS, you can just recreate the references to your tables and continue working, with something like this (the same approach sketched in the previous thread):

    db_name = "mydb"
    from pathlib import Path
    path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
    tables_dirs = dbutils.fs.l...

7 More Replies
by DBUser2 (New Contributor II)
  • 337 Views
  • 2 replies
  • 0 kudos

Simba Spark ODBC driver .NET core compatibility

Hi, is the Simba Spark ODBC driver (2.08.00.1002) compatible with .NET Core?

Latest Reply
NandiniN (Honored Contributor)

Hi @DBUser2, I checked the official doc https://www.databricks.com/spark/odbc-drivers-download; we currently provide the Simba Apache Spark ODBC Connector 2.8.0. In the archives it is also available back to 2.6.15: https://www.databricks.com/spark/odbc-d...

1 More Replies
by v01d (New Contributor III)
  • 927 Views
  • 2 replies
  • 0 kudos

Databricks Auto Loader authorization exception

Hello, I'm trying to run Auto Loader with the notifications=true option (Azure ADLS) and get an unclear authorization error. The exception log is attached. It looks like all the required permissions are granted to the service principal:

[Attachment: Screenshot_2024-06-01_at_14_32_06.png]
Latest Reply
Kaniz_Fatma (Community Manager)

Hi @v01d, there can be three probable causes. The service principal used for authentication may lack the necessary permissions. Confirm that the service principal has the required permissions on the ADLS; specifically, ensure that it has Read permissio...
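For orientation, a minimal file-notification sketch with the service-principal options Auto Loader expects on Azure (every value below is a placeholder). Note that notification mode additionally needs rights to create Event Grid subscriptions and storage queues, which is a common source of this kind of error:

    df = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.useNotifications", "true")  # event-driven file discovery
            .option("cloudFiles.clientId", "<sp-client-id>")
            .option("cloudFiles.clientSecret", "<sp-client-secret>")
            .option("cloudFiles.tenantId", "<tenant-id>")
            .option("cloudFiles.subscriptionId", "<subscription-id>")
            .option("cloudFiles.resourceGroup", "<resource-group>")
            .load("abfss://container@account.dfs.core.windows.net/landing/"))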

1 More Replies
by AkasBala (New Contributor III)
  • 1714 Views
  • 3 replies
  • 0 kudos

Primary key not working as expected on Unity Catalog Delta tables

Hi @Chetan Kardekar, I noticed that you had commented on primary keys on Delta tables. Has that feature already been released in Databricks Premium? I have a Unity Catalog and I created a table with a primary key, though it doesn't act like a primary key...

Latest Reply
Anonymous (Not applicable)

Hi @Bala Akas, hope all is well! Just wanted to check in whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
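For context, the likely explanation (my understanding, not confirmed in the thread): primary key constraints on Unity Catalog tables are informational only; they document intent for tools and the optimizer but are not enforced, so duplicate keys are not rejected. A hypothetical example (catalog, schema, and table names made up):

    spark.sql("""
        CREATE TABLE main.default.customers (
            id BIGINT NOT NULL,
            name STRING,
            CONSTRAINT customers_pk PRIMARY KEY (id)
        )
    """)
    # The constraint is recorded in the catalog, but inserting two rows
    # with the same id still succeeds:
    spark.sql("INSERT INTO main.default.customers VALUES (1, 'a'), (1, 'b')")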

2 More Replies
by paranoid_jvm (New Contributor)
  • 653 Views
  • 1 reply
  • 0 kudos

Spark tasks getting stuck on one executor

Hi All, I am running a Spark job on a cluster with 8 executors with 8 cores each. The job involves execution of a UDF. The job processes a few hundred thousand rows. When I run the job, each executor is assigned 8 tasks. Usually the job succeeds in les...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @paranoid_jvm, timeout exceptions can occur when the executor is under memory constraints or facing out-of-memory (OOM) issues while processing data. This can impact the garbage-collection process, causing further delays. Consider increasing the ex...
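Another common mitigation when a UDF workload pins one executor is to rebalance partitions first; a sketch (the UDF, column name, and partition count below are hypothetical):

    from pyspark.sql import functions as F

    # Spread rows evenly across the cluster before the expensive UDF,
    # so one skewed partition cannot pin all the work on a single executor.
    balanced = df.repartition(64)  # roughly executors * cores; tune for the workload
    result = balanced.withColumn("scored", my_udf(F.col("payload")))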

by Erik (Valued Contributor II)
  • 4873 Views
  • 9 replies
  • 10 kudos

Resolved! How to use dbx for local development.

Databricks Connect is a program which allows you to run Spark code locally, while the actual execution happens on a Spark cluster. Notably, it allows you to debug and step through the code locally in your own IDE. Quite useful. But it is now being...

Latest Reply
FeliciaWilliam (Contributor)

I found answers to my questions here
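For readers landing on this thread today: the newer Databricks Connect flow looks roughly like the sketch below, assuming databricks-connect is installed and a CLI profile is configured (the profile name is hypothetical):

    from databricks.connect import DatabricksSession

    # Build a Spark session whose queries execute remotely on a Databricks cluster
    spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()

    df = spark.range(10)
    print(df.count())  # runs remotely, but is debuggable from a local IDE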

8 More Replies
by Himanshu4 (New Contributor II)
  • 950 Views
  • 4 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4 (New Contributor II)

Hi Raphael, can we fetch job details from one workspace and create a new job in another workspace with the same "job id" and configuration?
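A rough sketch of cloning a job definition across workspaces with the Jobs 2.1 REST API (hosts, tokens, and the job id below are placeholders). As far as I know, the destination workspace always assigns a fresh job_id; the configuration can be copied, but the original id cannot be forced:

    import requests

    SRC = "https://src-workspace.cloud.databricks.com"
    DST = "https://dst-workspace.cloud.databricks.com"

    # Fetch the job's settings from the source workspace
    job = requests.get(
        f"{SRC}/api/2.1/jobs/get",
        headers={"Authorization": "Bearer <src-token>"},
        params={"job_id": 12345},
    ).json()

    # Create a job with the same settings in the destination workspace;
    # the response carries the newly assigned job_id
    created = requests.post(
        f"{DST}/api/2.1/jobs/create",
        headers={"Authorization": "Bearer <dst-token>"},
        json=job["settings"],
    ).json()
    print(created["job_id"])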

3 More Replies
by fury-kata (New Contributor II)
  • 540 Views
  • 2 replies
  • 0 kudos

ModuleNotFoundError when running with foreachBatch in serverless mode

I am using notebooks to do some transformations. I installed a new whl: %pip install --force-reinstall /Workspace/<my_lib>.whl %restart_python Then I successfully import the installed lib: from my_lib.core import test. However, when I run my code with fo...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @fury-kata, make sure that the path to your custom module is correctly added to the Python path (sys.path). You mentioned installing the .whl file, so ensure that the installation path is accessible from your Databricks notebook. Verify that th...
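A workaround sketch along those lines (the library path and module names are hypothetical): foreachBatch may execute in a worker process that does not inherit the notebook's sys.path, so re-assert the path and import inside the function:

    import sys

    def process_batch(batch_df, batch_id):
        lib_path = "/Workspace/libs"  # hypothetical install location of the wheel's modules
        if lib_path not in sys.path:
            sys.path.append(lib_path)
        from my_lib.core import test  # import after the path fix, inside the function
        test(batch_df)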

1 More Replies