Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DivyaKumar
by New Contributor
  • 150 Views
  • 1 replies
  • 0 kudos

Databricks to Dataverse migration via ADF copy data

Hi team, I need to load data from Databricks Delta tables to Dataverse tables, and I have one unique ID column which I am ensuring via mapping. Its datatype is GUID in Dataverse and string in the Delta table. I ensured that column holds unique values. Sinc...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

That is not a valid GUID. Dataverse will check this: http://guid.us/test/guid

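For the GUID check discussed in this thread, a minimal PySpark sketch along these lines can flag rows whose key column is not a well-formed GUID before mapping it to Dataverse; the table and column names below are hypothetical.

```python
# Sketch only: assumes a Databricks notebook where `spark` is available.
from pyspark.sql.functions import col

GUID_RE = r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"

df = spark.table("main.bronze.source_table")          # hypothetical source table
bad_guids = df.filter(~col("unique_id").rlike(GUID_RE))  # rows Dataverse would reject
print("rows that are not valid GUIDs:", bad_guids.count())
```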
Brahmareddy
by Esteemed Contributor
  • 364 Views
  • 1 replies
  • 4 kudos

How Databricks Helped Me See Data Engineering Differently

Over the years working as a data engineer, I’ve started to see my role very differently. In the beginning, most of my focus was on building pipelines—extracting, transforming, and loading data so it could land in the right place. Pipelines were the g...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 4 kudos

@Brahmareddy thanks for this! Think you've nailed it on the head there. If the stakeholders trust the data and there's integrity, governance, and a single source of truth, you've got a recipe for a great product! Love this take @Brahmareddy. Really...

noorbasha534
by Valued Contributor II
  • 465 Views
  • 6 replies
  • 4 kudos

Resolved! Figure out stale tables/folders being loaded by auto-loader

Hello all, We have a pipeline which uses Auto Loader to load data from cloud object storage (ADLS) to a Delta table. We use directory listing at the moment, and there are around 20,000 folders to be verified in ADLS every 30 mins to check for new data...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 4 kudos

@Krishna_S I didn't know about file detection modes, that's very cool! @noorbasha534 according to the documentation, there is a piece around RocksDB: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/#how-does-auto-loader-...

5 More Replies
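As a rough illustration of the file notification mode mentioned in this thread (paths, file format, and table names are placeholders, and notification mode requires extra cloud permissions), an Auto Loader stream might be switched like this:

```python
# Sketch only: file notification mode avoids relisting ~20,000 folders on every run.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")   # file notification instead of directory listing
      .option("cloudFiles.schemaLocation", "abfss://container@account.dfs.core.windows.net/_schemas/events")
      .load("abfss://container@account.dfs.core.windows.net/landing/"))

(df.writeStream
   .option("checkpointLocation", "abfss://container@account.dfs.core.windows.net/_checkpoints/events")
   .trigger(availableNow=True)
   .toTable("main.bronze.events"))
```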
billfoster
by New Contributor II
  • 25700 Views
  • 10 replies
  • 7 kudos

How can I learn Databricks

I am currently enrolled in a data engineering boot camp. We go over various technologies: Azure, PySpark, Airflow, Hadoop, NoSQL, SQL, Python. But not over something like Databricks. I am in contact with lots of recent graduates who landed a job. Almo...

Latest Reply
SprunkiRetake
New Contributor II
  • 7 kudos

Yes, I often refer to the helpful tutorials at https://www.youtube.com/c/AdvancingAnalytics?reload=9&app=desktop

9 More Replies
devagya
by New Contributor
  • 1090 Views
  • 3 replies
  • 1 kudos

Infor Data Lake to Databricks

I'm working on this project which involves moving data from Infor to Databricks. Infor is somewhat of an enterprise solution. I could not find many resources on this. I could not even find any free trial option on their site. If anyone has experience w...

Latest Reply
Shirlzz
New Contributor II
  • 1 kudos

I specialise in data migration with Infor. What is your question: how to connect Databricks to the Infor data lake through the data fabric?

2 More Replies
leireroman
by New Contributor III
  • 2917 Views
  • 2 replies
  • 2 kudos

Resolved! DBR 16.4 LTS - Spark 3.5.2 is not compatible with Delta Lake 3.3.1

I'm migrating to Databricks Runtime 16.4 LTS, which is using Spark 3.5.2 and Delta Lake 3.3.1 according to the documentation: Databricks Runtime 16.4 LTS - Azure Databricks | Microsoft Learn. I've upgraded my conda environment to use those versions, bu...

Latest Reply
SamAdams
Contributor
  • 2 kudos

@leireroman I encountered the same and used an override (like a pip constraints.txt file or a PDM resolution override specification) to make sure my local development environment matched the runtime.

1 More Replies
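To illustrate the kind of check the reply describes, here is a small sketch that compares locally installed package versions against the DBR 16.4 LTS versions quoted in the thread; the expected versions are taken from the post and may need adjusting.

```python
# Sketch: verify the local environment matches the runtime before pinning it with
# a constraints file or a resolver override. Versions below come from the thread.
from importlib.metadata import version, PackageNotFoundError

expected = {"pyspark": "3.5.2", "delta-spark": "3.3.1"}
for pkg, want in expected.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        got = "not installed"
    status = "OK" if got == want else f"differs from runtime ({want})"
    print(f"{pkg}: {got} -> {status}")
```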
adrianhernandez
by New Contributor III
  • 228 Views
  • 2 replies
  • 1 kudos

Resolved! Folder execute permissions

Hello, After reading multiple posts, going through online forums, and even asking AI, I still don't have an answer for my questions. On the latest Databricks with Unity Catalog, what happens if I give users Execute permissions on a folder? Can they view the co...

Latest Reply
adrianhernandez
New Contributor III
  • 1 kudos

Thanks for your response. That's what I imagined, although I could not confirm it, as my current project uses Unity Catalog and we are not allowed to run many commands, including ACL-related PySpark code.

1 More Replies
bbastian
by New Contributor
  • 130 Views
  • 1 replies
  • 0 kudos

[VARIANT_SIZE_LIMIT] Cannot build variant bigger than 16.0 MiB in parse_json

I have a table coming from PostgreSQL, with one column containing JSON data in string format. We have been using parse_json to convert that to a variant column. But lately it is failing with the SIZE_LIMIT error. When I isolated the row which gave er...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @bbastian, unfortunately, as of now there is a strict limitation regarding size: a variant column cannot contain a value larger than 16 MiB. Variant support in Delta Lake | Databricks on AWS. And tbh you cannot compare the size of this JSON string to ...

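To isolate the offending rows described in this thread, a hedged sketch using try_parse_json (which returns NULL instead of failing) might look like the following; the catalog, schema, table, and column names are hypothetical.

```python
# Sketch: find JSON strings that cannot be converted to a variant, e.g. values over
# the 16 MiB variant limit. Assumes a Databricks notebook where `spark` is available.
oversized = spark.sql("""
    SELECT id, length(json_col) AS json_chars
    FROM my_catalog.my_schema.source_table
    WHERE json_col IS NOT NULL
      AND try_parse_json(json_col) IS NULL
    ORDER BY json_chars DESC
""")
oversized.show(truncate=False)
```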
sslyle
by New Contributor III
  • 7597 Views
  • 9 replies
  • 5 kudos

Resolved! Combining multiple Academy profiles

I have this profile @gmail.com; my personal professional profile. I also have a @mycompany.com profile. How do I combine both so I can leave my current job for a better life without losing the accolades I've accumulated under my @mycompany.com login giv...

Latest Reply
jChantoHdz
New Contributor II
  • 5 kudos

I have the same question; how can this be done?

8 More Replies
gzr58l
by New Contributor
  • 143 Views
  • 1 replies
  • 0 kudos

How to set up the Lakeflow HTTP connector with M2M authentication

I am getting the following error about content-type, with no option to pick a different content-type, when configuring the Lakeflow connector: The OAuth token exchange failed with HTTP status code 415 Unsupported Media Type. The returned server response ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @gzr58l, are you configuring a custom Lakeflow connector or an external connection in Databricks? Also, consider using a service principal or a personal access token (PAT) for authentication as a temporary workaround.

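For reference, here is a minimal sketch of the OAuth M2M client-credentials exchange a client typically performs against a workspace; a 415 usually points at the token request body not being sent as application/x-www-form-urlencoded. The host, client ID, and secret are placeholders.

```python
# Sketch only: exchange service-principal credentials for a workspace OAuth token.
import requests

resp = requests.post(
    "https://<workspace-host>/oidc/v1/token",
    auth=("<service-principal-client-id>", "<oauth-secret>"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
print(resp.json()["access_token"][:16], "...")
```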
JeffSeaman
by New Contributor III
  • 608 Views
  • 8 replies
  • 1 kudos

Resolved! JDBC error trying a get schemas call.

Hi Community, I have a free demo version and can create a JDBC connection and get metadata (schema, table, and column structure info). Everything works as described in the docs, but when working with someone who has a paid version of Databricks, the s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

@JeffSeaman , please let us know if any of my suggestions help get you on the right track. If they do, kindly mark the post as "Accepted Solution" so others can benefit as well. Cheers, Louis.

7 More Replies
jakesippy
by New Contributor III
  • 605 Views
  • 7 replies
  • 14 kudos

Resolved! How to get pipeline update duration programmatically

I'm looking to track how much time is being spent running updates for my DLT pipelines. When querying the list pipeline updates REST API endpoint, I can see start and end times being returned; however, these fields are not listed in the documentation. ...

Latest Reply
jakesippy
New Contributor III
  • 14 kudos

Originally went with the approach of exporting to and reading from the event log table, which has been helpful for getting other metrics as well. Also found today that there is a new system table in public preview which exposes the durations I was ...

6 More Replies
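As a rough version of the event-log approach mentioned in the reply, the sketch below approximates each update's duration from its earliest and latest event timestamps; the event log table name is hypothetical and the result is only an approximation.

```python
# Sketch: approximate DLT update durations from an exported event log table.
durations = spark.sql("""
    SELECT origin.update_id                                      AS update_id,
           min(timestamp)                                        AS first_event,
           max(timestamp)                                        AS last_event,
           timestampdiff(SECOND, min(timestamp), max(timestamp)) AS approx_duration_s
    FROM my_catalog.my_schema.pipeline_event_log
    GROUP BY origin.update_id
    ORDER BY first_event DESC
""")
durations.show(truncate=False)
```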
VIRALKUMAR
by Contributor II
  • 9275 Views
  • 5 replies
  • 0 kudos

How to Determine the Cost for Each Query Run Against SQL Warehouse Serverless?

Hello everyone. First of all, I would like to thank Databricks for enabling system tables for customers. It does help a lot. I am working on a cost optimization topic, particularly SQL warehouse serverless. I am not sure all of you have tried system...

Latest Reply
skumarraj
New Contributor II
  • 0 kudos

Can you share the query that you used?

4 More Replies
aranjan99
by Contributor
  • 422 Views
  • 4 replies
  • 2 kudos

Databricks Pipeline SDK missing fields

Looking at the Databricks Java SDK for pipeline events, I see that the REST API returns a details field that has the same information as the event log details. But this is not surfaced in the SDK; it should be a small change to add it. Is that something which c...

Latest Reply
ManojkMohan
Honored Contributor
  • 2 kudos

The start and end time fields in the Pipeline Updates API are currently present in the Databricks REST API but are not yet supported (i.e., not included or mapped) in the Databricks Java SDK as of September 2025. This means: You can see these fields (s...

3 More Replies
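Until the SDK maps those fields, one workaround (sketched below with a placeholder host, token, and pipeline ID) is to call the REST endpoint directly and inspect whatever the raw response contains.

```python
# Sketch: list pipeline updates via the REST API and print the raw payload,
# including any fields the SDK does not yet surface.
import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <token>"}
pipeline_id = "<pipeline-id>"

resp = requests.get(
    f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
    headers=headers,
    params={"max_results": 25},
)
resp.raise_for_status()
for update in resp.json().get("updates", []):
    print(update)
```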
manish24101981
by New Contributor
  • 1692 Views
  • 1 replies
  • 1 kudos

Resolved! DLT or Databricks for CDC and NRT

We are currently delivering a large-scale healthcare data migration project involving: One-time historical migration of approx. 80 TB of data, already completed and loaded into Delta Lake. CDC merge logic is already developed and validated using Apache...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

For cost-sensitive, large-scale healthcare data streaming scenarios, using Delta Live Tables (DLT) for both CDC and streaming (Option C) is generally the most scalable, manageable, and cost-optimized approach. DLT offers native support for structured...

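As a sketch of the DLT CDC flow the reply recommends (dataset, key, operation, and sequencing column names are all hypothetical):

```python
# Sketch: declarative CDC in Delta Live Tables; assumes a streaming bronze dataset
# of CDC rows with an operation column and a change timestamp defined in the same pipeline.
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("patients_silver")

dlt.apply_changes(
    target="patients_silver",
    source="patients_cdc_bronze",
    keys=["patient_id"],
    sequence_by=col("change_ts"),            # orders out-of-order CDC events
    apply_as_deletes=col("op") == "DELETE",  # rows flagged as deletes
    stored_as_scd_type=1,
)
```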
