Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

data-engineer-d
by Contributor
  • 1992 Views
  • 3 replies
  • 4 kudos

Parametrize the DLT pipeline for dynamic loading of many tables

I am trying to ingest hundreds of tables with CDC, and I want to create a generic/dynamic pipeline that accepts parameters (e.g. table_name, schema, file path) and runs the logic on them. However, I am not able to find a way to pass parameters to p...

Data Engineering
Delta Live Tables
Latest Reply
Gilg
Contributor II
  • 4 kudos

If you have different folders for each of your source tables, you can leverage Python loops to iterate over the folders. To do this, define a create_pipeline function that takes table_name, schema, and path as parameters. Inside t...

2 More Replies
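The factory-function pattern Gilg describes can be sketched in plain Python. The table names and paths below are hypothetical, and in a real DLT pipeline the inner function would be decorated with `@dlt.table` and return a `spark.readStream` of the source path; the sketch only shows why a factory function is needed (it binds each table's parameters at call time, avoiding Python's late binding of loop variables inside closures):

```python
# Hypothetical metadata describing the tables to ingest.
TABLES = [
    {"table_name": "orders", "schema": "sales", "path": "/mnt/raw/orders"},
    {"table_name": "customers", "schema": "sales", "path": "/mnt/raw/customers"},
]

def create_pipeline(table_name, schema, path):
    """Factory that captures its arguments per call, so each generated
    function keeps its own table_name/schema/path."""
    def load():
        # In a real pipeline this body would be decorated with
        # @dlt.table(name=f"{schema}_{table_name}") and return
        # spark.readStream.format("cloudFiles").load(path).
        return f"{schema}.{table_name} <- {path}"
    load.__name__ = f"load_{schema}_{table_name}"
    return load

# One generated loader per metadata entry.
loaders = [create_pipeline(**cfg) for cfg in TABLES]
```

Defining `load()` directly inside a `for` loop instead would make every closure see only the last loop value; routing each iteration through `create_pipeline(...)` is what makes the generated tables independent.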
felix_counter
by New Contributor III
  • 11585 Views
  • 4 replies
  • 0 kudos

How to authenticate databricks provider in terraform using a system-managed identity?

Hello, I want to authenticate the databricks provider using a system-managed identity in Azure. The identity resides in a different subscription than the Databricks workspace. According to the "authentication" section of the databricks provider docume...

Data Engineering
authentication
databricks provider
managed identity
Terraform
Latest Reply
FarBo
New Contributor III
  • 0 kudos

@felix_counter I think I have your answer. To create a databricks provider to manage your workspace using an SPN, you need to create the provider like this: provider "databricks" { alias = "workspace" host = <your workspace URL> azure_...

3 More Replies
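FarBo's truncated fragment, expanded into a minimal sketch for the managed-identity case. This assumes the `azure_use_msi` flag of the Databricks Terraform provider; the host URL and the workspace resource reference are placeholders you would replace with your own values:

```hcl
provider "databricks" {
  alias = "workspace"
  host  = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL

  # Placeholder reference; point this at the workspace's Azure resource ID,
  # which may live in a different subscription than the identity.
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id

  # Authenticate with the system-assigned managed identity instead of an SPN secret.
  azure_use_msi = true
}
```

With a system-assigned identity there is no client secret to configure; the identity just needs an RBAC role on the workspace resource.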
jura
by New Contributor II
  • 1323 Views
  • 2 replies
  • 1 kudos

SQL Identifier clause

Hi, I was trying to prepare some dynamic SQL to create a table using the IDENTIFIER clause and a WITH ... AS clause, but it seems I'm stuck on a bug. Could someone verify it or tell me that I am doing something wrong? The code is running on a SQL Warehouse T...

Data Engineering
identifier
Latest Reply
raphaelblg
Honored Contributor
  • 1 kudos

Hello @jura, I'm Raphael and I'll be helping you out. The approach below should work: USE CATALOG dev; CREATE OR REPLACE TABLE IDENTIFIER("bronze.jura_test") AS SELECT ... Please let me know the outcome and feel free to ask any further questions. ...

1 More Replies
FarBo
by New Contributor III
  • 4781 Views
  • 4 replies
  • 5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

3 More Replies
Tam
by New Contributor III
  • 10606 Views
  • 2 replies
  • 2 kudos

Delta Table on AWS Glue Catalog

I have set up a Databricks cluster to work with the AWS Glue Catalog by setting spark.databricks.hive.metastore.glueCatalog.enabled to true. However, when I create a Delta table in the Glue Catalog, the schema reflected in the AWS Glue Catalog is incorrec...

Latest Reply
monometa
New Contributor II
  • 2 kudos

Hi, could you please refer to something or explain in more detail your point about querying Delta Lake files directly instead of through the AWS Glue catalog and why it was highlighted as a best practice?

1 More Replies
NDK_1
by New Contributor II
  • 896 Views
  • 1 reply
  • 0 kudos

I would like to create a schedule in Databricks that runs a job on the 1st working day of every month

I would like to create a schedule in Databricks that runs a job on the first working day of every month (working days referring to Monday through Friday). I tried using Cron syntax but didn't have any luck. Is there any way we can schedule this in Da...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@NDK_1 - Cron syntax won't allow a combination of day-of-month and day-of-week. You can try creating two different schedules - one for the first day and one for the second day of the month - and then add custom logic to check whether it is a working day and then trigg...

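The two-schedule idea can also be collapsed into one job: schedule it for the first three calendar days of the month (e.g. a Quartz expression like `0 0 6 1-3 * ?`) and have the job exit early unless today is actually the first weekday. A minimal sketch of that guard; the first working day is the 1st if it is a weekday, otherwise the following Monday, which is at latest the 3rd:

```python
from datetime import date

def is_first_working_day(d: date) -> bool:
    """True only on the first Mon-Fri day of d's month."""
    if d.weekday() >= 5:      # Saturday/Sunday are never working days
        return False
    if d.day == 1:            # the 1st itself is a weekday
        return True
    # If the 1st fell on a weekend, the first working day is the
    # following Monday, which can be at latest the 3rd.
    return d.weekday() == 0 and d.day <= 3

# Example: June 2024 starts on a Saturday, so Monday June 3rd qualifies.
print(is_first_working_day(date(2024, 6, 3)))
```

The job body then becomes `if not is_first_working_day(date.today()): return`, and a single daily-window schedule covers every month, including ones that start on a weekend.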
Constantine
by Contributor III
  • 11068 Views
  • 3 replies
  • 6 kudos

Resolved! CREATE TEMP TABLE FROM CTE

I have written a CTE in Spark SQL WITH temp_data AS (   ......   )   CREATE VIEW AS temp_view FROM SELECT * FROM temp_view; I get a cryptic error. Is there a way to create a temp view from CTE using Spark SQL in databricks?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

In the CTE you can't do a CREATE. It expects an expression in the form expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...

2 More Replies
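The key point is the ordering: the WITH clause goes inside the CREATE VIEW body, never the other way around. The same shape works in Spark SQL (`CREATE OR REPLACE TEMP VIEW temp_view AS WITH temp_data AS (...) SELECT ...`); the runnable sketch below uses Python's built-in sqlite3 only to demonstrate the placement with standard SQL, with a made-up table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INT)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,)])

# The CTE lives *inside* the CREATE VIEW body; a CREATE statement
# cannot appear inside a WITH clause.
conn.execute("""
    CREATE VIEW temp_view AS
    WITH temp_data AS (SELECT amount * 2 AS doubled FROM sales)
    SELECT doubled FROM temp_data
""")
print(conn.execute("SELECT SUM(doubled) FROM temp_view").fetchone()[0])  # 60
```

The original snippet failed because it put CREATE VIEW after the WITH clause, where the parser expects a query expression.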
test_123
by New Contributor
  • 586 Views
  • 1 replies
  • 0 kudos

Autoloader not detecting changes/updated values for xml file

If I update a value in the XML, Autoloader does not detect the change; the same happens when I delete/remove a column or property in the XML. Could you please help me fix this issue?

Latest Reply
Walter_C
Honored Contributor
  • 0 kudos

It seems that the issue you're experiencing with Autoloader not detecting changes in XML files might be related to how Autoloader handles schema inference and evolution. Autoloader can automatically detect the schema of loaded XML data, allowing you...

SyedGhouri
by New Contributor III
  • 7388 Views
  • 2 replies
  • 0 kudos

Cannot create jobs with jobs api - Azure databricks - private network

Hi, I'm trying to deploy the Databricks jobs from the dev to the prod environment. I have jobs in the dev environment and, using Azure DevOps, I deployed the jobs in code format to the prod environment. Now when I use the POST method to create the job programmatica...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@SyedGhouri You need to setup self-hosted Azure DevOps Agent inside your VNET.

1 More Replies
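In pipeline-config form, daniel_sahal's suggestion amounts to pointing the deployment stage at an agent pool that lives inside the VNET, so the Jobs API call reaches the private workspace URL. A hedged sketch; the pool name is a placeholder, and the `databricks jobs create --json` invocation assumes the newer Databricks CLI:

```yaml
# azure-pipelines.yml (fragment)
# Run deployment steps on a self-hosted agent inside the VNET so the
# private workspace endpoint is reachable; Microsoft-hosted agents are not.
pool:
  name: SelfHostedVNetPool   # placeholder: your self-hosted agent pool

steps:
  - script: databricks jobs create --json @job.json
    displayName: Create Databricks job via Jobs API
```

Microsoft-hosted agents sit outside your network, which is why the POST to a private-link workspace times out until the agent is moved inside the VNET (or a peered network).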
pshuk
by New Contributor III
  • 2139 Views
  • 2 replies
  • 0 kudos

Copying files from dev environment to prod environment

Hi, is there a quick and easy way to copy files between different environments? I have copied a large number of files to my dev environment (Unity Catalog) and want to copy them over to the production environment. Instead of doing it from scratch, can I j...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

If you want to copy files in Azure, ADF is usually the fastest option (for example, TBs of CSV or Parquet files). If you want to copy tables, just use CLONE. If it is files with code, just use Repos and branches.

1 More Replies
MarinD
by New Contributor II
  • 1572 Views
  • 2 replies
  • 0 kudos

Asset bundle pipelines - target schema and catalog

Do asset bundles support Unity Catalog as a destination for DLT pipelines? How do I specify the catalog and target schema?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @MarinD, Delta Live Tables (DLT) pipelines can indeed use Unity Catalog as a destination. Here’s how you can specify the catalog and target schema: Create a DLT Pipeline with Unity Catalog: When creating a DLT pipeline, in the UI, select “Uni...

1 More Replies
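The UI steps in the reply can also be expressed directly in the bundle. A minimal sketch with placeholder names, assuming the `catalog` and `target` fields of the DLT pipeline spec in `databricks.yml`:

```yaml
# databricks.yml (fragment) -- names and paths are placeholders
resources:
  pipelines:
    my_dlt_pipeline:
      name: my_dlt_pipeline
      catalog: main        # Unity Catalog catalog for pipeline output
      target: bronze       # target schema inside that catalog
      libraries:
        - notebook:
            path: ./pipelines/ingest.py
```

Setting `catalog` switches the pipeline to Unity Catalog mode (instead of `storage` for Hive-metastore pipelines), and `target` names the schema the tables are published into.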
William_Scardua
by Valued Contributor
  • 1796 Views
  • 1 reply
  • 1 kudos

Resolved! How to format decimals without rounding

Hi guys, I need to format the decimal values but I can't round them. Any idea? Thank you.

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @William_Scardua, In Databricks, you can format decimal values without rounding them using a couple of approaches. Let’s explore some options: Using substring: You can use the substring function to extract a specific number of decimal places f...

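The substring approach in the reply amounts to truncating toward zero rather than rounding. A driver-side sketch of the same idea using Python's decimal module (the function name is made up; in Spark SQL you would get the equivalent with a floor/power-of-ten expression, since format_number rounds):

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places: int) -> Decimal:
    """Cut a value to `places` decimal places without rounding."""
    q = Decimal(1).scaleb(-places)  # e.g. places=2 -> Decimal('0.01')
    return Decimal(str(value)).quantize(q, rounding=ROUND_DOWN)

print(truncate(3.14159, 2))  # 3.14
print(truncate(2.999, 2))    # 2.99 -- plain rounding would give 3.00
```

Going through `Decimal(str(value))` avoids binary-float surprises like `Decimal(2.999)` carrying extra digits, and ROUND_DOWN guarantees the displayed value is never larger than the original.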
dbx-user7354
by New Contributor III
  • 2192 Views
  • 3 replies
  • 1 kudos

PySpark DataFrames orderBy only orders within partitions when using multiple workers

I came across a PySpark issue when sorting a DataFrame by a column. It seems that PySpark only orders the data within partitions when using multiple workers, even though it shouldn't. from pyspark.sql import functions as F import matplotlib.pyplot...

Latest Reply
MarkusFra
New Contributor III
  • 1 kudos

@Kaniz_Fatma Sorry if I have to ask again, but I am a bit confused by this. I thought that PySpark's `orderBy()` and `sort()` do a shuffle operation before the sorting for exactly this reason. There is another command, `sortWithinPartitions()`, that does ...

2 More Replies
ac0
by New Contributor III
  • 976 Views
  • 1 reply
  • 0 kudos

Get size of metastore specifically

Currently my Databricks Metastore is in the same location as the data for my production catalog. We are moving the data to a separate storage account. In advance of this, I'm curious whether there is a way to determine the size of the metastore itself...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ac0,  Let’s explore how you can determine the size of your Databricks Metastore and estimate the storage requirements for the Azure Storage Account hosting the metastore. Metastore Size: The metastore in Unity Catalog is the top-level contain...

