Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

-werners-
by Esteemed Contributor III
  • 4195 Views
  • 3 replies
  • 0 kudos

git integration with volumes?

Volumes in Unity Catalog are said to be ideal for "storing library and config files of arbitrary formats such as .whl or .txt centrally and providing secure access across workspaces to it". So basically this is the same as what we had with DBFS, but with decent access...

Latest Reply
Riverara
New Contributor II
  • 0 kudos

I totally get where you’re coming from! Managing config files with access control is crucial, but the lack of Git integration in Unity volumes can be a bit frustrating. In my experience, finding a smooth workflow can take time. I’ve found using tools...

2 More Replies
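Because Unity Catalog volumes can be read and written as ordinary file paths under /Volumes/<catalog>/<schema>/<volume>/ on compute with access, one possible workaround for the missing Git integration is to keep the config files in a Git folder and copy them into the volume as a deployment step. The sketch below assumes that approach; it is not a built-in Git-to-volume sync, and the file patterns and paths are hypothetical placeholders.

```python
import shutil
from pathlib import Path

# Hypothetical defaults; adjust to the file types you actually keep in Git.
CONFIG_PATTERNS = ("*.txt", "*.json", "*.yaml", "*.whl")


def sync_configs(repo_dir, volume_dir, patterns=CONFIG_PATTERNS):
    """Copy files matching `patterns` from a Git-backed folder into a volume folder.

    Returns the copied paths relative to `repo_dir`, sorted.
    """
    src, dst = Path(repo_dir), Path(volume_dir)
    copied = []
    for pattern in patterns:
        for file in src.rglob(pattern):
            if not file.is_file():
                continue
            rel = file.relative_to(src)
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            # copyfile copies contents only (no permission/timestamp metadata).
            shutil.copyfile(file, target)
            copied.append(rel.as_posix())
    return sorted(copied)


# Example with hypothetical paths (a Git folder and a Unity Catalog volume):
# sync_configs("/Workspace/Users/me@example.com/my-repo/config",
#              "/Volumes/main/shared/config")
```

Git stays the source of truth for versioning and review, while the volume provides the governed, cross-workspace access the post describes.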
liu
by Databricks Partner
  • 1477 Views
  • 4 replies
  • 0 kudos

When formatting dates using the yyyyMMddHHmmssSSS pattern, an error occurred

An error occurred while converting a timestamp in the yyyyMMddHHmmssSSS format:
from pyspark.sql.functions import to_timestamp_ntz, col, lit
df = spark.createDataFrame([("20250730090833000")], ["datetime"])
df2 = df.withColumn("dateformat", to_t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @liu, I think it could be related to the following bug in Java. I suspect that internally to_timestamp_ntz uses DateTimeFormatter. [JDK-8031085] DateTimeFormatter won't parse dates with custom format "yyyyMMddHHmmssSSS" - Java Bug System. Now what's inte...

3 More Replies
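If the failure really is the adjacent-value parsing issue tracked as JDK-8031085, one common way around it is to avoid parsing all 17 digits with a single pattern and instead split the string into the 14-digit seconds part and the 3-digit milliseconds part. The sketch below shows that idea in plain Python with fixed-width slicing; it illustrates the workaround and is not how to_timestamp_ntz works internally. Expressing the same split with PySpark string functions before parsing is an assumption to verify on your runtime.

```python
from datetime import datetime


def parse_compact_millis(value: str) -> datetime:
    """Parse a 'yyyyMMddHHmmssSSS' string (17 digits) into a naive datetime.

    The seconds part and the milliseconds part are parsed separately, so no
    single pattern has to handle the adjacent fractional digits.
    """
    if len(value) != 17 or not value.isdigit():
        raise ValueError(f"expected 17 digits, got {value!r}")
    base = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    millis = int(value[14:])
    return base.replace(microsecond=millis * 1000)


print(parse_compact_millis("20250730090833000"))  # → 2025-07-30 09:08:33
```

The result has no time zone attached, which matches the "no time zone" semantics of to_timestamp_ntz.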
Carl_B
by New Contributor II
  • 2046 Views
  • 1 replies
  • 1 kudos

Resolved! HuggingFace bert-large-uncased gives NameError

Hello, I am trying to run the LLM bert-large-uncased from HuggingFace. I have downloaded transformers from GitHub and installed the various packages. I am now getting an error message: NameError: name 'torch' is not defined. Not sure what the pr...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @Carl_B, the error indicates that PyTorch is not installed in your environment. Try the following.
Install PyTorch
Option 1: Using pip
pip install torch torchvision torchaudio
Option 2: Using conda (if you're using Anaconda/Miniconda)
conda install py...
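NameError: name 'torch' is not defined means the name torch was never bound in the running Python process, which fits the diagnosis that PyTorch is missing or that the import never ran. A small standard-library pre-flight check like the one below can confirm which packages are importable before loading the model; the package list is an assumption based on the thread. On Databricks, after a notebook-scoped %pip install, restarting Python (for example with dbutils.library.restartPython()) may be needed before the new package is visible.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that are not importable here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["torch", "transformers"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("torch and transformers are importable; make sure `import torch` runs before use")
```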

Raj_DB
by Contributor
  • 7201 Views
  • 5 replies
  • 1 kudos

Performance Issue – Writing Large Dataset to ADLS from Oracle via JDBC

Hi there, I am currently working on a notebook where I pull data from an Oracle database using an Oracle SQL script with a JDBC connection. Due to the large dataset size and joins in my query, I’ve implemented the /*+ parallel(n) */ hint, which works...

Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

I can suggest a few tweaks to the compute. The current D series is good enough, but since we are handling huge data, please try bumping the minimum workers from 1 to at least 4 and changing the VM type to a bigger one, such as Standard_E64ds_v5. If not, try to use a...

4 More Replies
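Independently of the compute changes suggested above, Spark's JDBC source only reads in parallel when it is told how to split the query: partitionColumn, lowerBound, upperBound and numPartitions make it issue one range query per partition, and fetchsize raises the rows fetched per round trip (the Spark documentation notes that Oracle's driver defaults to a low fetch size of 10). The helper below only assembles the option dictionary; the URL, query, column name and bounds are hypothetical. Spark does not allow the query option together with partitionColumn, so the SQL is wrapped as a subquery in dbtable.

```python
def oracle_jdbc_options(url, sql, partition_column, lower, upper,
                        num_partitions=16, fetch_size=10_000):
    """Build options for spark.read.format("jdbc") that split one Oracle query
    into `num_partitions` range queries on `partition_column`.

    lower/upper only decide the partition stride; rows outside the range are
    still read (by the first and last partitions), not filtered out.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return {
        "url": url,
        # Wrapped as a subquery because 'query' cannot be combined with 'partitionColumn'.
        "dbtable": f"({sql}) src",
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


opts = oracle_jdbc_options(
    url="jdbc:oracle:thin:@//db-host:1521/SERVICE",    # hypothetical
    sql="SELECT o.order_id, o.amount FROM orders o",   # hypothetical query
    partition_column="ORDER_ID",
    lower=1,
    upper=50_000_000,
)
# In a notebook (not executed here; credentials elided):
# df = spark.read.format("jdbc").options(**opts).option("user", ...).option("password", ...).load()
```

The partition column should be numeric, date or timestamp and reasonably evenly distributed; otherwise a few partitions end up doing most of the work.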
nick-monda
by New Contributor III
  • 1898 Views
  • 7 replies
  • 0 kudos

Cannot Remove Table from Delta Share

Hello! We have some production data pipelines using Delta Shares that previously worked but broke today, and I can't find a new solution. I create a Delta Share:
CREATE SHARE IF NOT EXISTS `local_new_2022`
Then, like normal, I add a table to th...

Latest Reply
nick-monda
New Contributor III
  • 0 kudos

We are using the databricks_sql python sdk:
```python
from databricks import sql as databricks_sql
connection = databricks_sql.connect(
    server_hostname=server_hostname,
    http_path=http_path,
    access_token=access_token)
with connection.cursor() as...
```

6 More Replies
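The reply is cut off before the statement being executed, so the following is only a sketch of how the removal could be issued through the same connector, using Databricks SQL's ALTER SHARE ... REMOVE TABLE with backtick-quoted identifiers. The share name comes from the question; the table name and connection parameters are hypothetical, and the connector calls are left as comments because they need a live SQL warehouse.

```python
def quote_ident(part: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def remove_table_sql(share: str, table: str) -> str:
    """Build ALTER SHARE ... REMOVE TABLE for a catalog.schema.table name.

    Assumes no dots inside the individual name parts.
    """
    qualified = ".".join(quote_ident(p) for p in table.split("."))
    return f"ALTER SHARE {quote_ident(share)} REMOVE TABLE {qualified}"


stmt = remove_table_sql("local_new_2022", "main.sales.orders")  # table name is hypothetical
print(stmt)  # → ALTER SHARE `local_new_2022` REMOVE TABLE `main`.`sales`.`orders`

# With databricks-sql-connector (not executed here):
# from databricks import sql as databricks_sql
# with databricks_sql.connect(server_hostname=..., http_path=..., access_token=...) as connection:
#     with connection.cursor() as cursor:
#         cursor.execute(stmt)
```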
Dharshan15
by New Contributor II
  • 867 Views
  • 2 replies
  • 3 kudos

Unity Catalog not detected until I manually restarted the cluster

I just spent a lot of time setting up Unity Catalog with external locations, access connectors, and all the right permissions. My cluster had Dedicated access mode, Unity Catalog had been assigned properly, and everything was configured correctly. Bu...

Latest Reply
Advika
Community Manager
  • 3 kudos

Appreciate you posting this, @Dharshan15! It’s something that can easily be overlooked. After enabling Unity Catalog, the cluster needs to be restarted for the changes to take effect. You can also share this feedback directly with the Databricks team...

1 More Replies
prasannag
by New Contributor
  • 536 Views
  • 1 replies
  • 0 kudos

How to programmatically get the non-DLT pipeline logs?

We are triggering jobs that run notebooks. The notebooks consist of Python and SQL queries. We need to read the error messages when a notebook is triggered by a job. Thanks, Prasanna

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

Use some logging to send the error logs to a Delta table or file. Can you please share the code? I want to see what you are trying to do.
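Following the suggestion to send errors to a Delta table or file, one way to do it is to wrap each notebook step, hand the exception message and traceback to a sink, and re-raise so the job run is still marked as failed. The sketch below uses a plain list as the sink; the step name and the table name in the commented sink are hypothetical.

```python
import traceback
from datetime import datetime, timezone


def run_step(name, func, sink):
    """Run `func`; if it raises, pass an error record to `sink`, then re-raise.

    Re-raising keeps the job run marked as failed in the Jobs UI.
    """
    try:
        return func()
    except Exception as exc:
        sink({
            "step": name,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
            "logged_at": datetime.now(timezone.utc).isoformat(),
        })
        raise


def load_orders():  # hypothetical failing step
    raise ValueError("table not found")


records = []
try:
    run_step("load_orders", load_orders, sink=records.append)
except ValueError:
    pass  # swallowed only for this demo

print(records[0]["step"], records[0]["message"])  # → load_orders table not found

# In a notebook the sink could append to a Delta table instead (not executed here;
# table name hypothetical):
# sink = lambda row: spark.createDataFrame([row]).write.mode("append").saveAsTable("ops.job_errors")
```

Writing the record inside the except block, before re-raising, matters: once the exception propagates, later notebook cells do not run.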

AlbertWang
by Valued Contributor
  • 3003 Views
  • 5 replies
  • 3 kudos

Resolved! Problems and questions with deploying Lakeflow Declarative Pipeline using Databricks Bundles

Hi all, I ran into some problems and have some questions about deploying a Lakeflow Declarative Pipeline using Databricks Bundles. Could anyone kindly help? Below is my current bundle resource file for the pipeline:
resources:
  pipelines:
    dbr_d365_crm_p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @AlbertWang, I think some of those issues could be related to your Databricks Asset Bundles version. For example, the glob setting is in Beta. It could be available in the UI but not in your version of the Databricks CLI. The same applies to root_path: As o...

4 More Replies
junaid-databrix
by New Contributor III
  • 1135 Views
  • 1 replies
  • 3 kudos

Data Engineering Associate Exam Course Content Not Aligned With Exam Guide

Recently, Databricks changed the exam guide content for the Data Engineering Associate Certification exam. The sections under Exam Outline now list Databricks Asset Bundles too, which are part of neither the e-learning course material nor the ongoing instruct...

Latest Reply
Advika
Community Manager
  • 3 kudos

Thanks for bringing this to light, @junaid-databrix! Could you please raise a ticket with the Databricks Support team? That will help ensure this reaches the appropriate team for review. While creating the ticket, kindly mention which certification ex...

drag7ter
by Contributor
  • 1879 Views
  • 2 replies
  • 0 kudos

Delta sharing json predicate doesn't work

I'm trying to push predicates via the python delta_sharing package: https://github.com/delta-io/delta-sharing/tree/main/python/delta_sharing and the Delta Sharing protocol: https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#json-predicates-for-filter...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @drag7ter, predicate hints are just hints, not enforced filters. They do not guarantee that the returned data will be filtered. Check the thread below for a detailed discussion. Also check whether your Delta Sharing server has the evaluate predicate hints flag set to true...

1 More Replies
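Because JSON predicate hints are best-effort (the server may ignore them and return extra rows, as the reply explains), a safe pattern is to send the hint and then apply the same filter to whatever comes back. The sketch below builds an equality predicate in the shape described in the protocol's JSON predicates section (ops equal, column, literal) and re-applies the filter locally on plain rows. The column name and value are hypothetical, and how the JSON string is passed to the connector (recent python delta_sharing versions appear to accept a jsonPredicateHints argument on load_as_pandas) should be checked against your installed version.

```python
import json


def equal_predicate(column: str, value: str, value_type: str = "string") -> str:
    """JSON predicate hint for `column = value`."""
    return json.dumps({
        "op": "equal",
        "children": [
            {"op": "column", "name": column, "valueType": value_type},
            {"op": "literal", "value": value, "valueType": value_type},
        ],
    })


def apply_equal(rows, column, value):
    """Re-apply the filter client-side, since the server may ignore the hint."""
    return [row for row in rows if row.get(column) == value]


hint = equal_predicate("country", "DE")  # hypothetical column and value
# Not executed here; check the parameter name for your connector version:
# pdf = delta_sharing.load_as_pandas(table_url, jsonPredicateHints=hint)
# pdf = pdf[pdf["country"] == "DE"]  # same filter, enforced locally

rows = [{"country": "DE", "n": 1}, {"country": "FR", "n": 2}]
print(apply_equal(rows, "country", "DE"))  # → [{'country': 'DE', 'n': 1}]
```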
carlos_tasayco
by Contributor
  • 1509 Views
  • 3 replies
  • 0 kudos

Resolved! flattening json in dlt pipeline

Hi, I have JSON files in my bronze schema. I flatten them into a DataFrame and then create materialized views in a DLT pipeline. However, in production this takes a lot of time (over 3 hours) even though it is not a lot of data; the biggest materiali...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @carlos_tasayco, following up on my earlier question about whether you used any joins: was this a cross join between two large tables?

2 More Replies
Pratikmsbsvm
by Contributor
  • 1510 Views
  • 3 replies
  • 0 kudos

Databricks Workflow Orchestration for Pipeline

Hello, I am using Databricks for the first time. Could someone please help me with how to orchestrate the pipeline in the attached diagram? Kindly share the steps to implement the orchestration and what we have to consider. Thanks a lot

Latest Reply
junaid-databrix
New Contributor III
  • 0 kudos

The diagram you have shared is a bit confusing: from Azure there is a data pull into the Bronze layer, and data from the same source is also being pulled into the Silver layer. However, following the Medallion architecture, typically the raw data is ingested into...

2 More Replies
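For the orchestration question, one common approach is a single Databricks job whose tasks follow the Medallion layers mentioned in the reply, with depends_on expressing the order (bronze, then silver, then gold). The sketch below builds such a job definition as a Python dict in the shape used by the Jobs API 2.1 create endpoint. The job name, task keys and notebook paths are hypothetical, and compute settings are omitted, so it would need cluster or serverless configuration before being submitted.

```python
import json


def medallion_job(name, notebooks):
    """Build a Jobs API 2.1-style job spec where each task depends on the previous one.

    `notebooks` is an ordered list of (task_key, notebook_path) pairs.
    Compute settings are intentionally left out of this sketch.
    """
    tasks = []
    previous = None
    for task_key, path in notebooks:
        task = {"task_key": task_key, "notebook_task": {"notebook_path": path}}
        if previous is not None:
            task["depends_on"] = [{"task_key": previous}]
        tasks.append(task)
        previous = task_key
    return {"name": name, "tasks": tasks}


job = medallion_job(
    "sales_medallion",  # hypothetical names and paths
    [
        ("bronze_ingest", "/Workspace/pipelines/bronze_ingest"),
        ("silver_clean", "/Workspace/pipelines/silver_clean"),
        ("gold_aggregate", "/Workspace/pipelines/gold_aggregate"),
    ],
)
print(json.dumps(job, indent=2))
```

A linear chain is the simplest case; tasks that only share a parent can each list that parent in depends_on and then run in parallel.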
ashokpola1
by New Contributor III
  • 4528 Views
  • 7 replies
  • 5 kudos

Resolved! Are there any student discounts or coupons for the Databricks Data Engineer Associate certification

I’m planning to take the Databricks Data Engineer Associate certification exam, and I wanted to ask if there are any official discounts, coupons, or student offers available to help reduce the exam fee. I’m a student right now, so any discount or prom...

Latest Reply
ashokpola1
New Contributor III
  • 5 kudos

Thank you

6 More Replies
my_super_name
by New Contributor III
  • 3715 Views
  • 3 replies
  • 4 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field types. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...

Latest Reply
Mathias_Peters
Contributor II
  • 4 kudos

Hi, we are having similar issues with schema hints formulated in fully qualified DDL, e.g. "a STRUCT<b INT>" etc. Did you find a solution? Also, did you specify the schema hint using the dot-notation, e.g. "a.b INT" before ingesting any data or after...

2 More Replies
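For comparison, the two hint styles mentioned in the reply (a full STRUCT type versus dot notation on the nested field) would both be passed through the cloudFiles.schemaHints option. The sketch below only assembles reader options for each style so they can be tried side by side; paths and field names are hypothetical, and which style behaves better for nested fields missing from the first file is the open question of the thread, so no behavior is implied. Each experiment gets its own schemaLocation because Auto Loader persists the inferred schema there and reuses it on restart.

```python
def autoloader_options(schema_location, schema_hints, file_format="json"):
    """Options for spark.readStream.format("cloudFiles") with schema hints."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaHints": schema_hints,
    }


# Two ways of hinting a nested field `a.b` (names and paths hypothetical):
struct_style = autoloader_options("/Volumes/main/raw/_schemas/events_struct", "a STRUCT<b INT>")
dot_style = autoloader_options("/Volumes/main/raw/_schemas/events_dot", "a.b INT")

# In a notebook (not executed here; source path elided):
# df = spark.readStream.format("cloudFiles").options(**dot_style).load(source_path)
```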
pooja_bhumandla
by Databricks Partner
  • 938 Views
  • 1 replies
  • 0 kudos

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

Hi Databricks Community, I’m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases. I wanted to share my findings to better understand: Why ZORDER or Deletion Vectors help...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @pooja_bhumandla, thanks for such a nice and detailed description of your case; it really helps in understanding the scenario. Regarding your questions: 1) The overall operation could become more complex due to: a) deletion vector creation and maintenance, b...
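On a partitioned target, one factor that often matters independently of ZORDER or deletion vectors is whether the MERGE condition lets Delta prune partitions: an explicit predicate on the partition column in the ON clause limits which files MERGE has to scan. The sketch below builds such a statement; the table names, key and partition column are hypothetical, and the partition values would normally be collected from the source batch.

```python
import re

# Only simple literals (dates, ids) are inlined; anything else is rejected
# rather than escaped.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_\-:. ]+")


def merge_with_partition_pruning(target, source, key, partition_col, partitions):
    """Build a Delta MERGE whose ON clause also restricts the target partitions."""
    values = sorted(set(str(p) for p in partitions))
    if not values:
        raise ValueError("need at least one partition value")
    for v in values:
        if not _SAFE_VALUE.fullmatch(v):
            raise ValueError(f"unsupported partition value: {v!r}")
    in_list = ", ".join(f"'{v}'" for v in values)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        f"  AND t.{partition_col} = s.{partition_col}\n"
        f"  AND t.{partition_col} IN ({in_list})\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


stmt = merge_with_partition_pruning(
    "main.sales.orders", "updates", "order_id", "order_date",  # hypothetical names
    ["2025-07-31", "2025-07-30", "2025-07-30"],
)
print(stmt)
# In a notebook (not executed here): spark.sql(stmt)
```

UPDATE SET * and INSERT * assume the source and target share the same columns; with schema differences the column lists would have to be spelled out.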
