Data Engineering

Forum Posts

by Nis • New Contributor II

06-06-2024 5:50:30 AM

10560 Views
11 replies
6 kudos

Resolved! Can we commit offsets in Spark Structured Streaming on Databricks?

We are storing offset details in the checkpoint location. Is there a way to commit offsets once we consume messages from Kafka?

Latest Reply

raphaelblg
Databricks Employee

01-09-2025 10:21:12 AM

6 kudos

@dmytro yes, but this feature is currently in Private Preview. Please submit a support case in https://help.databricks.com/s/ if you have interest in trying out this new feature.
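
Background for this thread: Structured Streaming records Kafka progress in the checkpoint's offsets log and does not commit offsets to a Kafka consumer group. The sketch below is a minimal, stdlib-only reader for that log. It assumes the layout `<checkpoint>/offsets/<batchId>` (a version line such as `v1`, a JSON metadata line, then one JSON line per source, with Kafka sources written as `{"topic": {"partition": offset}}`) plus a matching `<checkpoint>/commits/<batchId>` file for each finished batch. The helper names are hypothetical, and the file format is an internal detail that could change between runtime versions.

```python
import json
import os
from typing import Dict, Set


def _batch_ids(path: str) -> Set[int]:
    # Batch files are named by their integer id; ignore .crc and temp files.
    if not os.path.isdir(path):
        return set()
    return {int(name) for name in os.listdir(path) if name.isdigit()}


def latest_kafka_offsets(checkpoint_dir: str) -> Dict[str, Dict[int, int]]:
    """Return {topic: {partition: offset}} for the latest committed batch.

    Assumes the (internal) checkpoint layout described above.
    """
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    commits_dir = os.path.join(checkpoint_dir, "commits")
    # Only trust batches that have both a planned-offsets file and a commit file.
    committed = _batch_ids(offsets_dir) & _batch_ids(commits_dir)
    if not committed:
        return {}
    latest = max(committed)

    with open(os.path.join(offsets_dir, str(latest))) as f:
        lines = [line.strip() for line in f if line.strip()]

    result: Dict[str, Dict[int, int]] = {}
    # lines[0] is the version, lines[1] is batch metadata; the rest are sources.
    for line in lines[2:]:
        try:
            source = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip placeholders that are not JSON
        if not isinstance(source, dict):
            continue
        for topic, partitions in source.items():
            # Kafka sources map topic -> {partition: offset}; other sources
            # (Delta, files) use scalar fields and are skipped here.
            if isinstance(partitions, dict):
                result[topic] = {int(p): int(o) for p, o in partitions.items()}
    return result
```

If you need the offsets in Kafka itself (for lag monitoring, say), pushing these values to a consumer group with a Kafka admin client would be a separate step you run yourself; Spark would keep resuming from its checkpoint regardless.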

by chaosBEE • New Contributor II

06-20-2024 5:40:40 AM

4399 Views
5 replies
1 kudos

StructField Metadata Dictionary - What are the possible keys?

I have a Delta Live Table that is published to Unity Catalog. In the Python notebook, I am defining the schema with a series of StructFields, for example: StructField( "columnName", StringType(), True, metadata = { 'comme...

Latest Reply

ipreston
New Contributor III

01-09-2025 11:10:54 AM

1 kudos

Bump, I've got the same issue. It looks like there was a partial reply from Kaniz, but I can't see it in this thread.
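
For reference, Spark stores a column's description under the `comment` metadata key (the same key a DDL `COMMENT` clause sets), which is the key most likely to matter for Unity Catalog column comments. One way to see which keys a table actually carries is to inspect its schema JSON, e.g. the string returned by `df.schema.json()`. The helpers below are a hypothetical, stdlib-only sketch, and the sample schema is made up for illustration.

```python
import json
from typing import Dict, List


def metadata_keys_by_column(schema_json: str) -> Dict[str, List[str]]:
    """Map each top-level column name to the sorted metadata keys it carries."""
    schema = json.loads(schema_json)
    return {
        field["name"]: sorted(field.get("metadata", {}).keys())
        for field in schema.get("fields", [])
    }


def column_comments(schema_json: str) -> Dict[str, str]:
    """Return {column: comment} for columns whose metadata has a 'comment' key."""
    schema = json.loads(schema_json)
    return {
        field["name"]: field["metadata"]["comment"]
        for field in schema.get("fields", [])
        if "comment" in field.get("metadata", {})
    }


# Hypothetical schema JSON in the shape produced by StructType.json().
sample = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "columnName", "type": "string", "nullable": True,
         "metadata": {"comment": "Customer display name"}},
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
    ],
})
print(column_comments(sample))  # {'columnName': 'Customer display name'}
```

Running the first helper against a real table's schema is a quick way to discover any additional keys that Delta or Unity Catalog may have attached.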

by BNV • New Contributor II

01-08-2025 9:04:10 AM

2380 Views
10 replies
0 kudos

Translating SQL Value Function For XML To Databricks SQL

Trying to translate this line of a SQL query that evaluates XML to Databricks SQL: SELECT MyColumn.value('(/XMLData/Values/ValueDefinition[@colID="10"]/@Value)[1]', 'VARCHAR(max)') as Color
The XML looks like this: <XMLData><Values><ValueDefinition c...

Latest Reply

hari-prasad
Valued Contributor II

01-08-2025 11:13:04 PM

0 kudos

Yes, Databricks now supports parsing XML natively in Databricks Runtime 14.3 and above; on earlier versions you could use the spark-xml library JARs to parse it. You can still use xpath functions when one of the columns in a dataset holds an XML value. As @BNV is look...
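
Spark SQL's `xpath_string(xml, path)` returns the text of the first node matching an XPath expression, which covers the `(...)[1]` plus `.value(...)` pair in the T-SQL query. To sanity-check the XPath logic locally, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample XML (a `ValueDefinition` element with `colID` and `Value` attributes, and the value `Red`) is hypothetical, since the original snippet is truncated.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def value_for_col_id(xml_text: str, col_id: str) -> Optional[str]:
    """Return the Value attribute of the first ValueDefinition with a matching colID.

    Mirrors the XPath (/XMLData/Values/ValueDefinition[@colID="10"]/@Value)[1].
    col_id must not contain quote characters.
    """
    root = ET.fromstring(xml_text)  # root element is <XMLData>
    # ElementTree supports [@attr='value'] predicates; find() returns the first match.
    node = root.find(f"./Values/ValueDefinition[@colID='{col_id}']")
    return None if node is None else node.get("Value")


sample = (
    "<XMLData><Values>"
    '<ValueDefinition colID="9" Value="Small"/>'
    '<ValueDefinition colID="10" Value="Red"/>'
    "</Values></XMLData>"
)
print(value_for_col_id(sample, "10"))  # Red
```

In Databricks SQL, the corresponding expression would presumably be `xpath_string(MyColumn, '/XMLData/Values/ValueDefinition[@colID="10"]/@Value') AS Color`; it is worth verifying against real rows.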

by aliacovella • Contributor

01-09-2025 7:10:28 AM

1011 Views
1 replies
1 kudos

Resolved! How can I dedupe a table created from a Kinesis change data capture feed?

Here I have a table named organizations_silver that was built from a bronze table created from a Kinesis change data capture feed. @dlt.table(name="kinesis_raw_stream", table_properties={"pipelines.reset.allowed": "false"}) def kinesis_raw_stream(): ...

Latest Reply

Alberto_Umana
Databricks Employee

01-09-2025 9:25:33 AM

1 kudos

Hello @aliacovella, it looks like there are duplicate records in your source table that match the same target record. This is the case because your source table, organizations_silver, contains duplicates due to the append-only nature of the Kinesis...
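
A Delta MERGE fails when several source rows match the same target row, so the usual fix is to keep only the latest change per key before merging. In DLT, `dlt.apply_changes(...)` with `keys=` and `sequence_by=` is intended for exactly this, and in plain PySpark the same effect typically comes from `row_number()` over a window partitioned by the key and ordered by sequence descending. The stdlib sketch below shows the underlying idea only; the field names `id` and `seq` are hypothetical stand-ins for the table's key column and the Kinesis sequence number or event timestamp.

```python
from typing import Any, Dict, Iterable, List


def latest_per_key(
    records: Iterable[Dict[str, Any]], key: str = "id", seq: str = "seq"
) -> List[Dict[str, Any]]:
    """Keep only the record with the highest sequence value for each key.

    Same idea as ROW_NUMBER() OVER (PARTITION BY key ORDER BY seq DESC) = 1.
    On ties, the record seen first is kept.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[seq] > latest[k][seq]:
            latest[k] = rec
    # Return in key order so the output is deterministic.
    return [latest[k] for k in sorted(latest)]


# Hypothetical CDC records: id 1 changed twice, so only seq 105 should survive.
changes = [
    {"id": 1, "seq": 100, "name": "Acme"},
    {"id": 2, "seq": 101, "name": "Globex"},
    {"id": 1, "seq": 105, "name": "Acme Corp"},
]
print(latest_per_key(changes))
```

After this step each key appears at most once in the source, so each target row is matched by at most one source row.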

by krishnachaitany • Databricks Partner

09-30-2021 11:43:31 AM

6758 Views
3 replies
4 kudos

Resolved! Spot instance in Azure Databricks

When I run a job with spot instances enabled, I would like to know how many workers are spot instances and how many are on-demand instances for a given job run. In order to identify the spot instances we got for any...

Latest Reply

drumcircle
Databricks Partner

01-09-2025 8:17:32 AM

4 kudos

This remains a challenge using system tables.

by TimB • New Contributor III

09-07-2023 2:30:58 PM

8621 Views
2 replies
0 kudos

Create external table using multiple paths/locations

I want to create an external table from more than a single path. I have configured my storage credentials and added an external location, and I can successfully create a table using the following code: create table test.base.Example using csv options ( h...

Latest Reply

NandiniN
Databricks Employee

01-09-2025 7:19:23 AM

0 kudos

You do not have to create all the partition folders yourself. You just need to specify the parent folder, like: CREATE OR REPLACE TABLE <catalog>.<schema>.<table-name> USING <format> PARTITIONED BY (<partition-column-list>) LOCATION 's3://<bucket-path...
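
When a table's LOCATION points at a parent folder, Spark discovers partition values from Hive-style `column=value` directory names beneath it, escaping special characters in values as `%XX`. The sketch below illustrates that convention by parsing a file path; the helper name `partition_values` and the example bucket path are hypothetical.

```python
from typing import Dict
from urllib.parse import unquote


def partition_values(file_path: str, table_root: str) -> Dict[str, str]:
    """Extract Hive-style partition columns from a file path under table_root.

    e.g. <root>/year=2024/month=01/part-0000.csv -> {"year": "2024", "month": "01"}
    """
    root = table_root.rstrip("/") + "/"
    if not file_path.startswith(root):
        raise ValueError(f"{file_path} is not under {table_root}")
    relative = file_path[len(root):]
    values: Dict[str, str] = {}
    # Every directory segment except the final file name may be key=value.
    for segment in relative.split("/")[:-1]:
        if "=" in segment:
            name, _, raw = segment.partition("=")
            values[name] = unquote(raw)  # undo %XX escaping of special characters
    return values


print(partition_values("s3://bucket/tbl/year=2024/month=01/part-0000.csv", "s3://bucket/tbl"))
```

The practical consequence is that files only show up as partitions of one table when they sit below that single parent folder in this layout.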

by CDICSteph • New Contributor

01-13-2024 4:30:22 PM

5272 Views
5 replies
0 kudos

Permission denied listing external volume when using VS Code Databricks extension

Hey, I'm using the Databricks extension for VS Code (Databricks Connect v2). When using dbutils to list an external volume defined in UC like so: dbutils.fs.ls("/Volumes/dev/bronze/rawdatafiles/") I get this error: "databricks.sdk.errors.mapping.PermissionD...

Latest Reply

NandiniN
Databricks Employee

01-09-2025 7:07:24 AM

0 kudos

Great, thanks for confirming. This feature was under development early last year and is now available.

by SeliLi_52097 • New Contributor III

01-07-2023 5:32:42 PM

6172 Views
5 replies
5 kudos

Databricks Academy webpage showing insecure connection (in Chrome)

When I tried to visit the Databricks Academy website https://customer-academy.databricks.com, Chrome showed an insecure connection warning. This happened on 8 January 2023 (AEDT) around 12:30 pm.

Latest Reply

barendlinders
New Contributor II

01-09-2025 6:54:18 AM

5 kudos

Certificate has expired again...

by gfar • New Contributor II

04-12-2023 8:47:16 AM

23662 Views
13 replies
5 kudos

Is it possible to connect QGIS to Databricks using ODBC?

I can connect ArcGIS to Databricks using ODBC, but using the same ODBC DSN for QGIS I get an error: "Unable to initialize ODBC connection to DSN". Has anyone got this working?

Latest Reply

fgoulet
New Contributor III

01-09-2025 6:29:47 AM

5 kudos

That should probably help, but when I tried it my table had 0 rows, while the same table loaded with the full schema analyzed has 835. I still have testing to do, but with that, you can now choose a single file to add using the connection string ODBC:token/yo...

by rgomez • New Contributor

12-23-2024 4:02:49 PM

2502 Views
2 replies
2 kudos

Install notebook dependency via terraform for serverless notebook tasks

I am trying to install a wheel file as a dependency for a serverless notebook task via terraform. According to https://docs.databricks.com/en/compute/serverless/dependencies.html , dependencies in serverless notebooks can be configured via the base e...

Latest Reply

Walter_C
Databricks Employee

12-23-2024 6:28:29 PM

2 kudos

Currently, the databricks_job resource in Terraform does not support configuring the environment for notebook tasks directly. You can upload the YAML file and configure the environment as mentioned in https://docs.databricks.com/en/compute/serverless...

by TjommeV-Vlaio • New Contributor III

01-09-2025 2:03:59 AM

4914 Views
10 replies
0 kudos

Which process is eating up my driver memory?

Hi, We're running DBR 14.3 on a shared multi-node cluster. When checking the metrics of the driver, I see that Memory utilization and Memory swap utilization increase a lot and almost never decrease. Even if no processes are running any...

Latest Reply

-werners-
Esteemed Contributor III

01-09-2025 5:36:54 AM

0 kudos

At the OS level you will not see individual notebooks; you will see the memory consumption of the Spark application (that is, all notebooks combined). For per-job detail there is the Spark UI. I'd look for collect() and broadcast() statements, Python code outside of Spark, tons of graphics...
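
As a rough first pass before digging into the Spark UI, you could scan exported notebook source for the driver-heavy calls mentioned above. The helper below is a hypothetical sketch built on Python's `ast` module; the set of call names (`collect`, `toPandas`, `broadcast`) is an assumption and will also flag harmless uses.

```python
import ast
from typing import List, Tuple

# Call names that typically pull data onto the driver (assumed list; extend as needed).
DRIVER_HEAVY_CALLS = {"collect", "toPandas", "broadcast"}


def find_driver_heavy_calls(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line number, call name) pairs for suspicious calls in Python source."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both df.collect() (Attribute) and broadcast(df) (Name).
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in DRIVER_HEAVY_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


src = "rows = df.collect()\nsmall = broadcast(lookup)\npdf = df.limit(10).toPandas()\ndf.count()\n"
print(find_driver_heavy_calls(src))
```

Notebook cells that start with magic commands such as `%sql` are not valid Python and would need to be stripped before parsing.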

by jeremy98 • Honored Contributor

01-09-2025 2:58:17 AM

5957 Views
3 replies
0 kudos

Resolved! Problem with installing Python WHEEL in an existing cluster

Hi community, I was running a workflow with several tasks on an existing cluster, but I was getting a configuration error: run failed with error message Library installation failed for library d...

Latest Reply

Walter_C
Databricks Employee

01-09-2025 4:59:25 AM

0 kudos

You can do it by following steps in https://docs.databricks.com/en/compute/serverless/dependencies.html

by Ajay-Pandey • Databricks MVP

02-10-2023 5:05:14 AM

9996 Views
5 replies
5 kudos

Support of running multiple cells at a time in databricks notebook Hi all, Now databricks notebook supports parallel run of commands in a single notebo...

Support for running multiple cells at a time in Databricks notebooks. Hi all, Databricks notebooks now support running commands in parallel within a single notebook, which helps run ad hoc queries simultaneously without creating a separate notebook. Once you run...

Latest Reply

SunilUIIT
Databricks Partner

01-09-2025 4:59:15 AM

5 kudos

Hi Team, I am observing that the functionality is not working as expected in the Trial workspace of Databricks. Is there a setting that needs to be enabled to allow independent SQL cells in a Databricks notebook to run in parallel, while dependent cel...

by amarnathpal • New Contributor III

01-08-2025 10:27:11 PM

3117 Views
4 replies
0 kudos

Resolved! Integrating PySpark DataFrame into SQL Dashboard for Enhanced Visualization

I have created a DataFrame in a notebook using PySpark and am considering creating a fully-featured dashboard in SQL. My question is whether I need to first store the DataFrame as a table in order to use it in the dashboard, or if it's possible to di...

Latest Reply

hari-prasad
Valued Contributor II

01-09-2025 4:58:29 AM

0 kudos

Sorry, I vaguely remember that we used to create persistent views on DataFrames earlier. Currently, Spark does not let you create a persistent view on a DataFrame; instead, you have to save it as a table to use it in a SQL warehouse. # Assuming there is an ex...

by RiyazAliM • Honored Contributor

01-07-2025 7:04:23 PM

1650 Views
3 replies
1 kudos

Requirement to remove/skip column(s) in the downstream tables/views while PII data masking

Hi there, as a compliance measure, I'm tasked with masking PII data from bronze to silver and in all downstream tables and views. I suggested that my clients use row filters and column masks as described in the documentation. However, when a user who...

Latest Reply

Walter_C
Databricks Employee

01-09-2025 4:57:14 AM

1 kudos

You are right; in this case we might need to open a feature request through https://docs.databricks.com/en/resources/ideas.html#ideas

Databricks Community

Databricks to Salesforce Core (Not cloud)

Databricks optimization for query perfomance and p...

Parametrize the DLT pipeline for dynamic loading o...

File Arrival Trigger - Multiple tables

Issue while handling Deletes and Inserts in Struct...