Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

ConfusedZebra
by New Contributor III
  • 1995 Views
  • 3 replies
  • 0 kudos

HTML form within notebooks

Hi all, I'm trying to make a small form in Databricks notebooks. I can't currently use Apps, so I want an interim solution. I can successfully make the form using HTML, and it displays correctly, but I cannot extract the values or use them, e.g. a form with thr...

Latest Reply
ConfusedZebra
New Contributor III
  • 0 kudos

Thanks both. Notebooks are a little too intimidating for some users, so we are trying to make them look and feel a bit more like what those users are used to. Ideally we would build an app, but Apps aren't available in our area yet, so we need an interim solut...
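As an interim alternative to raw HTML (whose input values can't be read back from `displayHTML` output), Databricks widgets behave like a simple form whose values notebook code can read. A minimal sketch, assuming illustrative field names; `dbutils` is only available on a Databricks cluster:

```python
# Sketch: use Databricks widgets as a form whose values Python can read.
# Field names below are illustrative assumptions, not from the original post.

def build_form_fields():
    """Pure helper: (name, default) pairs for the hypothetical form."""
    return [("first_name", ""), ("team", "Data Eng"), ("notes", "")]

def create_form(dbutils):
    # Runs only on Databricks, where `dbutils` is predefined in notebooks.
    for name, default in build_form_fields():
        dbutils.widgets.text(name, default)

def read_form(dbutils):
    # Returns the current widget values as a plain dict.
    return {name: dbutils.widgets.get(name) for name, _ in build_form_fields()}
```

On a cluster, `create_form(dbutils)` renders the input bar at the top of the notebook and `read_form(dbutils)` returns the entered values as a dict.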

2 More Replies
jura
by New Contributor II
  • 3157 Views
  • 3 replies
  • 1 kudos

SQL Identifier clause

Hi, I was trying to prepare some dynamic SQL to create a table using the IDENTIFIER clause and the WITH AS clause, but it seems I'm stuck on some bug. Could someone verify it or tell me that I am doing something wrong? The code is running on a SQL Warehouse T...

Data Engineering
identifier
Latest Reply
vinay_yogeesh
New Contributor II
  • 1 kudos

Hey, I am stuck with the same issue. Did you find any workaround? I am trying to run DESCRIBE & ALTER commands with IDENTIFIER() via databricks-sql-connector. Did you figure out how to run the IDENTIFIER statements?
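One pattern that tends to sidestep string-concatenation bugs here is passing the table name as a named parameter into `IDENTIFIER()`. A minimal sketch, assuming a runtime where `spark.sql` accepts named parameters via `args` (Spark 3.4+) and that DESCRIBE accepts the IDENTIFIER clause; the statement and names are illustrative:

```python
# Sketch: parameterize the object name through IDENTIFIER() instead of
# building the SQL string by hand. The statement below is an assumption.

def describe_stmt():
    """Pure helper: a DESCRIBE statement with a named parameter marker."""
    return "DESCRIBE TABLE IDENTIFIER(:tbl)"

def run_describe(spark, table_name):
    # On Databricks: spark.sql supports named parameters via `args`
    # (Spark 3.4+); the same marker style works with the parameterized
    # execute() of databricks-sql-connector.
    return spark.sql(describe_stmt(), args={"tbl": table_name})
```

The same `IDENTIFIER(:tbl)` marker can be reused for ALTER and CREATE statements, keeping the dynamic part out of the SQL text itself.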

2 More Replies
lprevost
by Contributor III
  • 2007 Views
  • 3 replies
  • 0 kudos

Using WorkspaceClient -- run a saved query

I've saved a query on my SQL warehouse which has a parameter called :list_parameter. I've found my query ID as follows: from databricks.sdk import WorkspaceClient w = WorkspaceClient() for query in w.queries.list(): print(f"query: {query.displ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @lprevost The WorkspaceClient provides APIs to manage Query objects, but it doesn't provide an API to run them. If you need to run the query from a notebook, you can pass the query text into `spark.sql`, which returns a Spark DataFrame. I hope this help...
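The workaround described above can be sketched as: fetch the saved query's text with the SDK, then execute it from the notebook with `spark.sql`. The query name and parameter value below are illustrative assumptions; `query_text`/`display_name` are the SDK Query fields this sketch assumes:

```python
# Sketch: run a saved SQL-warehouse query from a notebook by fetching its
# text via the SDK and handing it to spark.sql. Names are illustrative.

def find_query_text(queries, display_name):
    """Pure helper: pick a saved query's SQL text by display name."""
    for q in queries:
        if q.display_name == display_name:
            return q.query_text
    return None

def run_saved_query(spark, display_name):
    # Runs only on Databricks; the SDK import is kept inside the function.
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    text = find_query_text(w.queries.list(), display_name)
    # Named parameters like :list_parameter can be supplied via `args`
    # (Spark 3.4+); the value here is an assumed example.
    return spark.sql(text, args={"list_parameter": "a,b,c"})
```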

2 More Replies
abhijeet_more
by New Contributor II
  • 2413 Views
  • 2 replies
  • 1 kudos

Resolved! DLT pipeline with generated identity column

I have a CSV file which I am looking to read into a streaming table. I always want to add a generated identity column as a surrogate key. I found a few blogs which say we can achieve this by explicitly specifying the schema. However, I have around 40-odd fiel...
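One way to avoid hand-writing a 40-column schema is to derive the column list from the CSV's inferred schema and prepend the identity column programmatically. A minimal sketch under that assumption; the column and table names are illustrative:

```python
# Sketch: build a DDL schema string from inferred (name, type) pairs and
# prepend a generated identity column. The id column name is an assumption.

def schema_with_identity(fields, id_col="sk_id"):
    """Pure helper: DDL schema string with an identity column first."""
    cols = [f"{id_col} BIGINT GENERATED ALWAYS AS IDENTITY"]
    cols += [f"{name} {dtype}" for name, dtype in fields]
    return ", ".join(cols)

# In a DLT pipeline this could be used roughly like (Databricks only):
# import dlt
# inferred = [(f.name, f.dataType.simpleString()) for f in
#             spark.read.csv(path, header=True, inferSchema=True).schema]
# @dlt.table(schema=schema_with_identity(inferred))
# def bronze(): ...
```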

Latest Reply
abhijeet_more
New Contributor II
  • 1 kudos

Thank you @koji_kawamura. This was helpful.

1 More Replies
jeremy98
by Honored Contributor
  • 6251 Views
  • 7 replies
  • 0 kudos

Optimizing .collect() Usage in Spark

Hi all! I'm facing an issue with driver memory after deploying a cluster with 14 GB of memory. My code utilizes the cluster's compute power continuously (it never shuts down, as I cannot otherwise communicate with the Azure PostgreSQL database at the m...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

I would expect both the Python process on the driver and Spark's JVM to release memory once you are done with each chunk of data. Otherwise, this sounds like a memory leak. If you suspect this is a problem in the JVM, you can look at heap dumps - the...
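The chunking idea above can be sketched by replacing `df.collect()` (which pulls every row into driver memory at once) with `toLocalIterator()`, which streams partitions to the driver one at a time. The batch size and writer callback are illustrative assumptions:

```python
# Sketch: stream rows to the driver in bounded batches instead of collect().

def batched(iterator, size):
    """Pure helper: group any iterator into lists of at most `size` items."""
    batch = []
    for item in iterator:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def write_in_chunks(df, writer, size=10_000):
    # On Databricks: toLocalIterator streams partitions to the driver,
    # so only one chunk is resident in driver memory at a time.
    for chunk in batched(df.toLocalIterator(), size):
        writer(chunk)   # e.g. a batched INSERT into Azure PostgreSQL
        del chunk       # let Python free the batch before the next one
```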

6 More Replies
knutasm
by New Contributor III
  • 10007 Views
  • 7 replies
  • 7 kudos

Run Delta Live Tables as service principal

How do you run a Delta Live Tables pipeline in production? It uses the owner's (creator's) permissions for writing to tables, and I can't change the owner of a UC-enabled pipeline after creation. I don't want regular users to have write access to prod ta...

Latest Reply
ashwini0723
New Contributor II
  • 7 kudos

@knutasm I have built a solution for this. The way to create a DLT pipeline using an SPN is to write code in which a new DLT pipeline is created via the Databricks API, with the owner specified as a service principal in the API request, as shown below. Below m...
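A rough sketch of that API-based approach: POST to the pipelines endpoint with a `run_as` block naming the service principal. The endpoint path is the standard pipelines API; the payload field values (and whether your workspace accepts `run_as` at creation time) are assumptions to verify against the API reference:

```python
# Sketch: create a DLT pipeline via the REST API with a service principal
# as the run-as identity. Field values are illustrative assumptions.

def pipeline_payload(name, notebook_path, sp_application_id):
    """Pure helper: JSON body for POST /api/2.0/pipelines."""
    return {
        "name": name,
        "libraries": [{"notebook": {"path": notebook_path}}],
        "run_as": {"service_principal_name": sp_application_id},
    }

def create_pipeline(host, token, payload):
    # Network call; runs only against a real workspace.
    import requests
    r = requests.post(f"{host}/api/2.0/pipelines",
                      headers={"Authorization": f"Bearer {token}"},
                      json=payload)
    r.raise_for_status()
    return r.json()
```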

6 More Replies
Ian_Neft
by New Contributor II
  • 13975 Views
  • 4 replies
  • 1 kudos

Data Lineage in Unity Catalog not Populating

I have been trying to get data lineage to populate with the simplest of queries on a Unity-enabled catalog with a Unity-enabled cluster. I am essentially running the example provided, with more data, to see how it works with various aggregates dow...

Latest Reply
pmahawar
New Contributor II
  • 1 kudos

Cluster running in shared mode with Databricks Runtime 15.4 LTS. UC is set up as per the Databricks guide. I can see the system tables and everything, but data is not populating in the table_lineage table. The EventHub firewall port 9093 is also open. Enabled runtime settin...

3 More Replies
HeyRam
by New Contributor II
  • 975 Views
  • 1 reply
  • 1 kudos

Resolved! Lab material for "Apache Spark Developer Learning Plan"

Hi, I just finished the course "Introduction to Python for Data Science and Data Engineering". The instructor talks about the lab material, but nowhere in the tabs on the left-hand side am I able to find any link to the lab material. I am ...

Latest Reply
Advika_
Databricks Employee
  • 1 kudos

Hello @HeyRam! Did you take the Self-paced course? To clarify, lab materials are not available in self-paced courses. To access them, you have two options: Enroll in the ILT (Instructor-Led Training) course - This will grant you access to the labs fo...

Anish_2
by New Contributor III
  • 1661 Views
  • 3 replies
  • 0 kudos

Removal of Delta Live Tables

Hello Team, I have removed the definition of a table from a Delta Live Tables pipeline, but the table is still present in Unity Catalog. The event log gives the message below: Materialized View '`catalog1`.`schema1`.`table1`' is no longer defined in the pipeline a...

Data Engineering
Delta Live Table
Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @Anish_2, how are you doing today? I agree with @KaranamS's answer. Databricks marks the table as inactive instead of removing it to prevent accidental data loss, allowing you to restore it if needed. Once inactive, the table remains in Unity Catalo...

2 More Replies
N38
by New Contributor III
  • 2921 Views
  • 11 replies
  • 4 kudos

DLT Pipeline event_log error - invalid pipeline name / The Spark SQL phase analysis failed

I am trying the queries below using both a SQL warehouse and a shared cluster on Databricks Runtime (15.4/16.1) with Unity Catalog: SELECT * FROM event_log(table(my_catalog.myschema.bronze_employees)); SELECT * FROM event_log("6b317553-5c5a-40d5-9541-1a5...

Latest Reply
ron99
New Contributor II
  • 4 kudos

Hi, I am also facing the same issue. Is there any ETA for a fix?

10 More Replies
phoebe_dt
by New Contributor
  • 5854 Views
  • 1 reply
  • 0 kudos

Access denied error to s3 bucket in Databricks notebook

When running a Databricks notebook connected to S3, I randomly but frequently experience the following error: java.nio.file.AccessDeniedException: s3://mybucket: getFileStatus on s3://mybucket: com.amazonaws.services.s3.model.AmazonS3Except...

Data Engineering
access denied
AWS
databricks notebook
S3
Latest Reply
pg289
New Contributor II
  • 0 kudos

Hey! I'm getting this same error connecting to S3-compatible storage (MinIO). Were you able to resolve this? Thanks!

mkolonay
by New Contributor II
  • 1471 Views
  • 2 replies
  • 0 kudos

Widget management in Jupyter (ipynb) notebooks

Hello! We use widgets to set values for which catalogs and schemas to use in our solutions. These values change as we change environments: feature, dev, uat, prod. In dev, uat, and prod, the values are set in workflows, so the widget values are not an is...

Latest Reply
mkolonay
New Contributor II
  • 0 kudos

Thanks for the quick response. Your suggestion is essentially what we are currently doing. We have a config file for each environment that is used to create all the widgets. In the first cell of our notebooks, we call a script to create the widgets us...
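The per-environment config approach described above can be sketched as: derive widget name/default pairs from the environment's config, then create the widgets with `dbutils` in the notebook's first cell. The config keys and values are illustrative assumptions:

```python
# Sketch: create notebook widgets from a per-environment config.
# The environments and keys below are illustrative assumptions.

ENV_CONFIG = {
    "dev":  {"catalog": "dev_catalog",  "schema": "dev_schema"},
    "prod": {"catalog": "prod_catalog", "schema": "prod_schema"},
}

def widget_specs(env, config=ENV_CONFIG):
    """Pure helper: sorted (name, default) pairs for the given environment."""
    return sorted(config[env].items())

def create_widgets(dbutils, env):
    # Runs only on Databricks, where `dbutils` is predefined.
    for name, default in widget_specs(env):
        dbutils.widgets.text(name, default)
```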

1 More Replies
hnnhhnnh
by New Contributor II
  • 3043 Views
  • 3 replies
  • 1 kudos

Windowing/join function on streaming data failing in DLT

I have a DLT pipeline and want to update a particular column's value from other records. Trying it with a windowed first_value, I get the error below. Approach #1: join. I tried a self-join initially, failing because joins are not possible in DLT, with the error below: org...

Latest Reply
hnnhhnnh
New Contributor II
  • 1 kudos

Hi @jhonm_839, thanks for your response to the post. I appreciate your time. I have a watermark on the DataFrame in place and used last(), min(), etc. aggregate functions; it is still failing. final_df = df.withWatermark("kafka_ts", "1 hour").withColumn("s_featur...
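In streaming, ordered window functions like first_value over a window spec generally aren't supported; a watermarked groupBy with a max_by-style aggregate is a common substitute. A sketch under that assumption, reusing the `kafka_ts` and `s_feature` names from the snippet above; the grouping key is an assumed example:

```python
# Sketch: take "the value from the newest row per key" with max_by instead
# of an ordered-window first_value, which streaming does not support.

def pick_latest_expr(value_col, ts_col):
    """Pure helper: SQL expression selecting the value from the newest row."""
    return f"max_by({value_col}, {ts_col}) AS {value_col}"

# On Databricks, roughly (column names partly assumed):
# from pyspark.sql import functions as F
# agg = (df.withWatermark("kafka_ts", "1 hour")
#          .groupBy("entity_id", F.window("kafka_ts", "1 hour"))
#          .agg(F.expr(pick_latest_expr("s_feature", "kafka_ts"))))
```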

2 More Replies
the_dude
by New Contributor II
  • 1484 Views
  • 3 replies
  • 0 kudos

How are .whl files executed for Python wheel tasks?

Hello, we package a Poetry-managed project into a .whl and run it as a Python wheel task. Naturally, many of the dependencies referenced by the .whl file are already present on the Databricks cluster. Is this detected by the task setup (in its virtual...

Latest Reply
Nik_Vanderhoof
Contributor
  • 0 kudos

Hi David, I can't speak exactly to how Poetry handles the dependency resolution of libraries that are already installed, or how that interacts with the Databricks Runtime. However, I can offer you some advice on how my team handles this situation. It'...

2 More Replies
udi_azulay
by New Contributor II
  • 4603 Views
  • 6 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview) that works well on 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
MAJVeld
New Contributor II
  • 1 kudos

I can indeed confirm that adding some additional table properties to the @dlt decorator in the DLT pipeline definition resolved the earlier issues. Thanks for pointing this out.
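The fix confirmed above can be sketched as declaring the Variant table feature in the table properties of the `@dlt.table` decorator. The exact feature key below is an assumption based on Delta's preview-feature naming and should be checked against the error message's feature name:

```python
# Sketch: table properties enabling the Variant type feature for a DLT
# table. The feature key is an assumption, not confirmed by the thread.

def variant_table_properties():
    """Pure helper: table properties declaring the Variant preview feature."""
    return {"delta.feature.variantType-preview": "supported"}

# In the pipeline, roughly (Databricks only):
# import dlt
# @dlt.table(table_properties=variant_table_properties())
# def bronze_variant(): ...
```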

5 More Replies