Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by Contributor II
  • 92 Views
  • 3 replies
  • 0 kudos

Merge with schema evolution fails because of upper case columns

The following is a minimal reproducible example of what I'm facing right now: %sql CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table (id INT); INSERT INTO edw_nprd_aen.bronze.test_table VALUES (1); SELECT * FROM edw_nprd_aen.bronze.test_tab...

Latest Reply
css-1029
New Contributor
  • 0 kudos

Hi @Dhruv-22, it's actually not a bug. Let me explain what's happening. The root cause: the issue stems from how schema evolution works with Delta Lake's MERGE statement, combined with Spark SQL's case-insensitivity settings. Here's the key insight: spark...
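For context, a minimal sketch of the interaction this reply points at, reusing the table from the post but with hypothetical upper-case source columns; this is not the poster's exact repro:

```python
# Hypothetical sketch: schema evolution enabled for MERGE, then merging a source
# whose columns differ from the target only in case ("ID" vs "id"). Spark SQL's
# default case-insensitivity resolves the join condition, but schema evolution
# may still try to add the differently-cased column, which is the behavior
# discussed above.
spark.sql("SET spark.databricks.delta.schema.autoMerge.enabled = true")

spark.sql("""
    MERGE INTO edw_nprd_aen.bronze.test_table AS t
    USING (SELECT 1 AS ID, 'x' AS NEW_COL) AS s
    ON t.id = s.ID
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```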

2 More Replies
bsr
by New Contributor II
  • 908 Views
  • 4 replies
  • 4 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs: DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads: DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...

Latest Reply
WAHID
New Contributor II
  • 4 kudos

@iyashk-DB We are currently using DBR version 17.3 LTS, and the issue is still occurring. Do you know when the fix is expected to be applied? We need this information to decide whether we should wait for the fix or proceed with the workaround you propo...
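Until the fix lands, one possible workaround sketch is to raise the level of the Python logger emitting those lines; the logger name below is an assumption inferred from the log format in the post, not an official Databricks setting:

```python
import logging

# The "DEBUG:ThreadMonitor:..." lines match Python's default logging format
# (level:name:message), which suggests a logger named "ThreadMonitor".
# Raising its level should suppress the DEBUG flood.
logging.getLogger("ThreadMonitor").setLevel(logging.WARNING)
```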

3 More Replies
rijin-thomas
by New Contributor II
  • 249 Views
  • 4 replies
  • 3 kudos

Mongo Db connector - Connection timeout when trying to connect to AWS Document DB

I am on Databricks Runtime 14.3 LTS, Spark 3.5.0, Scala 2.12, and mongodb-spark-connector_2.12:10.2.0. Trying to connect to DocumentDB using the connector, all I get is a connection timeout. I tried using PyMongo, which works as expected and I can ...

Latest Reply
Sanjeeb2024
Contributor III
  • 3 kudos

Hi @rijin-thomas - can you please allow the CIDR block of the Databricks account VPC in the AWS DocumentDB security group (the executor connectivity noted by @bianca_unifeye)?

3 More Replies
SaugatMukherjee
by New Contributor III
  • 208 Views
  • 2 replies
  • 1 kudos

Structured streaming for iceberg tables

According to https://iceberg.apache.org/docs/latest/spark-structured-streaming/, we can stream from Iceberg tables. I have ensured that my source table is Iceberg version 3, but no matter what I do, I get "Iceberg does not support streaming reads". Looki...

Latest Reply
SaugatMukherjee
New Contributor III
  • 1 kudos

Hi, Iceberg streaming is possible in Databricks. One does not need to change to Delta Lake. In my previous attempt, I used "load" while reading the source Iceberg table. One should instead use "table". "load" apparently takes a path and not a ta...
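A short sketch of the fix described in this reply, with placeholder table and checkpoint names; the exact options your source needs may differ:

```python
# Read the Iceberg source as a streaming table by name (per the reply, .load()
# expects a path while .table() takes a table name). All names are placeholders.
stream_df = (
    spark.readStream
    .format("iceberg")
    .table("my_catalog.my_schema.my_iceberg_table")
)

query = (
    stream_df.writeStream
    .option("checkpointLocation", "/Volumes/my_catalog/my_schema/checkpoints/iceberg_stream")
    .toTable("my_catalog.my_schema.target_table")  # starts the streaming query
)
```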

1 More Replies
AcrobaticMonkey
by New Contributor III
  • 122 Views
  • 2 replies
  • 2 kudos

Salesforce Connector SCD2 - Get new record with isDeleted = true on deletion

Hi all, I'm using the Databricks Salesforce connector to ingest tables with history tracking enabled (SCD Type 2). When records are deleted in Salesforce, the connector closes the existing record by setting the end date. The isDeleted flag remains fals...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @AcrobaticMonkey, I put on my researcher hat and dug into our internal docs. Here is what I found. Short answer: this isn’t configurable today. The connector’s SCD Type 2 behavior “closes” a record by setting __END_AT and does not emit a ...
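Since the connector behavior itself isn't configurable, one hedged downstream workaround sketch is to infer deletions from the SCD2 table: any key whose versions are all closed can be treated as deleted. Only __END_AT comes from the reply; the table and key column names are placeholders:

```python
# Hypothetical downstream query: keys with no open SCD2 version (every row has a
# non-null __END_AT) are treated as hard-deleted in Salesforce.
deleted_keys = spark.sql("""
    SELECT Id
    FROM my_catalog.salesforce.account_scd2
    GROUP BY Id
    HAVING COUNT_IF(__END_AT IS NULL) = 0
""")
```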

1 More Replies
pooja_bhumandla
by New Contributor III
  • 140 Views
  • 2 replies
  • 0 kudos

Behavior of Zstd Compression for Delta Tables Across Different Databricks Runtime Versions

Hi all, for ZSTD compression, as per the documentation, any table created with DBR 16.0 or newer (or Apache Spark 3.5+) uses Zstd as the default compression codec instead of Snappy. I explicitly set the table property to Zstd: spark.sql("""ALTER TABLE m...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

@pooja_bhumandla New files written by DBR 15.4 (or any pre‑16.0 runtime) will still use Zstd as long as the table property delta.compression.codec = 'zstd' remains set on the table. When we explicitly run: ALTER TABLE my_table SET TBLPROPERTIES ('delt...
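A short sketch of the property change being discussed, using the property name quoted in the reply and a placeholder table name; existing files keep their original codec until they are rewritten:

```python
# Set the codec property named in the reply, then verify it. Only files written
# after this point (or rewritten, e.g. by OPTIMIZE) should use Zstd.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    SET TBLPROPERTIES ('delta.compression.codec' = 'zstd')
""")

spark.sql("SHOW TBLPROPERTIES my_catalog.my_schema.my_table").show(truncate=False)
```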

1 More Replies
tonkol
by New Contributor II
  • 98 Views
  • 1 reply
  • 0 kudos

Migrate on-premise delta tables to Databricks (Azure)

Hi there, we've decided to migrate our on-premise delta lake to Azure Databricks. Because of networking constraints I can only "push" the data from on-prem to the cloud. What would be the best way to replicate all tables: schema + partitioning i...

Latest Reply
mukul1409
New Contributor II
  • 0 kudos

The correct solution is not SQL based. Delta tables are defined by the contents of the delta log directory, not by CREATE TABLE statements. That is why SHOW CREATE TABLE cannot reconstruct partitions, properties or constraints. The only reliable migrat...
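A hedged sketch of the file-based approach this reply points toward, assuming the table directory (data files plus _delta_log) has already been pushed to ADLS by an external copy tool; paths and names are placeholders:

```python
# Registering the copied directory as an external table lets the _delta_log
# provide schema, partitioning, table properties and history, which is exactly
# what SHOW CREATE TABLE on the source could not reconstruct.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.bronze.my_table
    USING DELTA
    LOCATION 'abfss://lake@mystorageaccount.dfs.core.windows.net/bronze/my_table'
""")
```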

Sainath368
by Contributor
  • 344 Views
  • 4 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory-listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in structured streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Contributor III
  • 2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream and 1 uni...
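For reference, a sketch of the per-stream setup this reply describes (one stream per table, one directory and checkpoint per stream); cloudFiles.useManagedFileEvents is the option quoted in the reply, while the paths, table name and file format are placeholders:

```python
# Each stream like this one gets its own managed file-events queue, per the
# reply above.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useManagedFileEvents", "true")
    .load("abfss://landing@mystorageaccount.dfs.core.windows.net/events/table_a/")
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/my_catalog/bronze/checkpoints/table_a")
    .toTable("my_catalog.bronze.table_a")
)
```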

3 More Replies
ganesh_raskar
by New Contributor II
  • 212 Views
  • 5 replies
  • 0 kudos

Installing Custom Packages on Serverless Compute via Databricks Connect

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute. Works: traditional compute cluster, custom package pre-i...

Data Engineering
data-engineering
databricks-connect
Latest Reply
Sanjeeb2024
Contributor III
  • 0 kudos

Hi @ganesh_raskar - if you can share which custom package you are using, along with the exact code and error, I can try to replicate it at my end and explore a suitable option.

4 More Replies
hnnhhnnh
by New Contributor II
  • 174 Views
  • 1 reply
  • 0 kudos

How to handle type widening (int→bigint) in DLT streaming tables without dropping the table

Setup: bronze source table (external to DLT, CDF & type widening enabled). Source table properties: delta.enableChangeDataFeed: "true", delta.enableDeletionVectors: "true", delta.enableTypeWidening: "true", delta.minReaderVersion: "3", delta.minWrite...

Latest Reply
mukul1409
New Contributor II
  • 0 kudos

Hi @hnnhhnnh, DLT streaming tables that use apply changes do not support widening the data type of key columns, such as changing an integer to a bigint, after the table is created. Even though Delta and Unity Catalog support type widening in general, DL...

ismaelhenzel
by Contributor III
  • 253 Views
  • 1 reply
  • 1 kudos

Resolved! Declarative Pipelines - Dynamic Overwrite

Regarding the limitations of declarative pipelines—specifically the inability to use replaceWhere—I discovered through testing that materialized views actually support dynamic overwrites. This handles several scenarios where replaceWhere would typica...

Latest Reply
omsingh
New Contributor III
  • 1 kudos

This is a really interesting find, and honestly not something most people expect from materialized views. Under the hood, MVs in Databricks declarative pipelines are still Delta tables. So when you set partitionOverwriteMode=dynamic and partition by a...
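A hedged sketch of the pattern under discussion, written as a materialized view in a declarative pipeline; the partitionOverwriteMode=dynamic conf comes from the thread, while the table, partition column and refresh window are placeholders:

```python
import dlt
from pyspark.sql import functions as F

# With dynamic partition overwrite, a refresh should rewrite only the partitions
# present in the returned result (here, the last few days), covering several
# cases where replaceWhere would otherwise be used.
@dlt.table(
    partition_cols=["event_date"],
    spark_conf={"spark.sql.sources.partitionOverwriteMode": "dynamic"},
)
def daily_sales_mv():
    return (
        spark.read.table("my_catalog.bronze.sales")
        .where(F.col("event_date") >= F.date_sub(F.current_date(), 3))
    )
```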

Joost1024
by New Contributor III
  • 963 Views
  • 6 replies
  • 4 kudos

Resolved! Read Array of Arrays of Objects JSON file using Spark

Hi Databricks Community! This is my first post in this forum, so I hope you can forgive me if it's not according to the forum best practices. After lots of searching, I decided to share the peculiar issue I'm running into with this community. I try to lo...

Latest Reply
Joost1024
New Contributor III
  • 4 kudos

I guess I was a bit overenthusiastic in accepting the answer. When I run the following on the single-object array of arrays (as shown in the original post), I get a single row with column "value" and value null: from pyspark.sql import functions as F,...
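In case it helps the thread, a hedged alternative sketch (not the accepted answer): read the whole document as text and parse it with an explicit array-of-arrays schema, then explode twice. The file path and the struct fields are assumptions:

```python
from pyspark.sql import functions as F

# Read the whole JSON document as one text value, then parse it with an
# explicit array<array<struct<...>>> schema and flatten both array levels.
raw = spark.read.text(
    "/Volumes/my_catalog/my_schema/raw/nested.json", wholetext=True
)

schema = "array<array<struct<id:int,name:string>>>"  # placeholder fields

parsed = (
    raw.select(F.from_json("value", schema).alias("outer"))
       .select(F.explode("outer").alias("inner"))
       .select(F.explode("inner").alias("obj"))
       .select("obj.*")
)
parsed.show()
```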

5 More Replies
Shimon
by New Contributor II
  • 388 Views
  • 2 replies
  • 0 kudos

Jackson version conflict

Hi, I am trying to implement the Spark TableProvider API and I am experiencing a JAR conflict (I am using the 17.3 runtime): com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.15.2 requires Jackson Databind version >= 2.15.0 and < 2.1...

Latest Reply
Shimon
New Contributor II
  • 0 kudos

For now we are trying to contact Databricks; in the worst case scenario we were planning to shade the dependencies we need. Would love to hear what has worked for you. Best, Shimon

1 More Replies