Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by ChristianRRL (Honored Contributor)
  • 41 Views
  • 2 replies
  • 2 kudos

Resolved! Serverless Compute Spark Version Flexibility?

Hi there, I'm wondering what determines the Serverless Compute spark version? Is it based on the current DBR LTS? And is there a way to modify the spark version for serverless compute? For example, when I check the spark version for our serverless com...

Latest Reply
Databricks77
  • 2 kudos

Serverless compute always runs on the latest runtime version. You cannot choose it like you can on standard compute.
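For anyone who wants to confirm this, a minimal sketch for checking the version from a notebook or Databricks Connect session attached to serverless compute:

```python
# Print the Spark version the current serverless session is running on.
# It reflects whatever runtime Databricks currently provides for serverless;
# unlike classic compute, it cannot be pinned to a specific DBR release.
print(spark.version)
```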

1 More Replies
by ChristianRRL (Honored Contributor)
  • 89 Views
  • 4 replies
  • 7 kudos

Resolved! Testing Spark Declarative Pipeline in Docker Container > PySparkRuntimeError

Hi there, I saw via an announcement last year that Spark Declarative Pipelines (previously DLT) was being open sourced into Apache Spark, and I see that this is now true as of Apache Spark 4.1: Spark Declarative Pipelines Programming Guide. I'm trying ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 7 kudos

Hi @ChristianRRL, in addition to @osingh's answers, check out this old but good blog post about how to structure the pipeline's code to enable a dev and test cycle: https://www.databricks.com/blog/applying-software-development-devops-best-practices-d...
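For readers following along, a hypothetical sketch of what a pipeline source file can look like in open-source Spark Declarative Pipelines. The module path (pyspark.pipelines imported as dp) and the decorator name are assumptions on my part; verify them against the Programming Guide linked in the original post, and note that the file is meant to be executed by the pipeline runner, not as a plain script.

```python
# Assumed API, to be checked against the Spark Declarative Pipelines docs.
from pyspark import pipelines as dp
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()  # the pipeline runner provides the active session

@dp.materialized_view
def clean_orders():
    # Placeholder source table; any batch source readable by Spark works here.
    return spark.read.table("samples.tpch.orders")
```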

3 More Replies
by Dhruv-22 (Contributor II)
  • 96 Views
  • 3 replies
  • 0 kudos

Merge with schema evolution fails because of upper case columns

The following is a minimal reproducible example of what I'm facing right now: %sql CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table (id INT); INSERT INTO edw_nprd_aen.bronze.test_table VALUES (1); SELECT * FROM edw_nprd_aen.bronze.test_tab...

Latest Reply
css-1029
New Contributor
  • 0 kudos

Hi @Dhruv-22, it's actually not a bug. Let me explain what's happening. The root cause: the issue stems from how schema evolution works with Delta Lake's MERGE statement, combined with Spark SQL's case-insensitivity settings. Here's the key insight: spark...
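For readers reproducing this, a hedged sketch of the pattern under discussion. The temp view and its upper-cased column are hypothetical, and the snippet reproduces the scenario rather than resolving it:

```python
# Spark resolves column names case-insensitively by default.
print(spark.conf.get("spark.sql.caseSensitive"))  # typically "false"

# Hypothetical source whose join column differs from the target only in case (ID vs id)
# and which carries a new column that schema evolution should add.
spark.sql("CREATE OR REPLACE TEMP VIEW src AS SELECT 1 AS ID, 'a' AS new_col")

spark.sql("""
  MERGE WITH SCHEMA EVOLUTION INTO edw_nprd_aen.bronze.test_table AS t
  USING src AS s
  ON t.id = s.ID
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```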

2 More Replies
by bsr (New Contributor II)
  • 956 Views
  • 4 replies
  • 4 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs: ```DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads: DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...

Latest Reply
WAHID
New Contributor II
  • 4 kudos

@iyashk-DB We are currently using DBR version 17.3 LTS, and the issue is still occurring. Do you know when the fix is expected to be applied? We need this information to decide whether we should wait for the fix or proceed with the workaround you propo...
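For anyone needing an interim mitigation, a minimal sketch that assumes the messages come from a standard Python logger named ThreadMonitor (as the DEBUG:ThreadMonitor: prefix suggests). This is a generic logging workaround, not a confirmed Databricks fix:

```python
import logging

# Raise the threshold for the ThreadMonitor logger so its DEBUG output is suppressed,
# while leaving every other logger untouched. Run this at the start of the job or notebook.
logging.getLogger("ThreadMonitor").setLevel(logging.WARNING)
```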

3 More Replies
by rijin-thomas (New Contributor II)
  • 251 Views
  • 4 replies
  • 3 kudos

MongoDB connector - Connection timeout when trying to connect to AWS DocumentDB

I am on Databricks Runtime LTS 14.3 (Spark 3.5.0, Scala 2.12) with mongodb-spark-connector_2.12:10.2.0. I'm trying to connect to DocumentDB using the connector and all I get is a connection timeout. I tried using PyMongo, which works as expected, and I can ...

Latest Reply
Sanjeeb2024
Contributor III
  • 3 kudos

Hi @rijin-thomas - Can you please allow the CIDR block of the Databricks account VPC in the AWS DocumentDB security group (the executor connectivity point stated by @bianca_unifeye)?
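For context, a hedged sketch of the connector read being discussed. The URI, database, and collection are placeholders, and the option names follow the MongoDB Spark connector 10.x style mentioned in the question:

```python
# Hypothetical read against DocumentDB via the MongoDB Spark connector 10.x.
# A timeout here while PyMongo works from the driver usually means the executors,
# not the driver, cannot reach the cluster endpoint (security group / CIDR rules).
df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb://<user>:<password>@<docdb-endpoint>:27017/?tls=true")
      .option("database", "mydb")
      .option("collection", "mycollection")
      .load())
df.show()
```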

3 More Replies
by SaugatMukherjee (New Contributor III)
  • 217 Views
  • 2 replies
  • 1 kudos

Structured streaming for iceberg tables

According to this https://iceberg.apache.org/docs/latest/spark-structured-streaming/ , we can stream from Iceberg tables. I have ensured that my source table is Iceberg version 3, but no matter what I do, I get an error that Iceberg does not support streaming reads. Looki...

Latest Reply
SaugatMukherjee
New Contributor III
  • 1 kudos

Hi, Iceberg streaming is possible in Databricks. One does not need to change to Delta Lake. In my previous attempt, I used "load" while reading the source Iceberg table. One should instead use "table". Load apparently takes a path and not a ta...
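A minimal sketch of the fix described above; the table and checkpoint names are placeholders:

```python
# Read the Iceberg source as a table (name) rather than with load() (path), per the resolution above.
stream_df = spark.readStream.table("my_catalog.my_schema.my_iceberg_table")

# Writing out is unchanged; any streaming sink works.
(stream_df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/iceberg_stream")
    .toTable("my_catalog.my_schema.target_table"))
```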

1 More Replies
by AcrobaticMonkey (New Contributor III)
  • 127 Views
  • 2 replies
  • 2 kudos

Salesforce Connector SCD2 - Get new record with isDeleted = true on deletion

Hi all, I'm using the Databricks Salesforce connector to ingest tables with history tracking enabled (SCD Type 2). When records are deleted in Salesforce, the connector closes the existing record by setting the end date. The isDeleted flag remains fals...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @AcrobaticMonkey, I put on my researcher hat and dug into our internal docs. Here is what I found. Short answer: this isn’t configurable today. The connector’s SCD Type 2 behavior “closes” a record by setting __END_AT and does not emit a ...
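As a stopgap for downstream consumers, a hedged sketch of how deletions could be inferred from the SCD2 table itself. The __END_AT column name follows the reply; the table name, Id key, and the inference rule are hypothetical illustrations only:

```python
# Hypothetical: treat a key whose records are all closed (no open row with NULL __END_AT)
# as deleted in Salesforce, and take the latest close time as the deletion time.
spark.sql("""
  SELECT Id, MAX(__END_AT) AS inferred_deleted_at
  FROM my_catalog.bronze.salesforce_account_scd2
  GROUP BY Id
  HAVING MAX(CASE WHEN __END_AT IS NULL THEN 1 ELSE 0 END) = 0
""").show()
```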

1 More Replies
by pooja_bhumandla (New Contributor III)
  • 148 Views
  • 2 replies
  • 0 kudos

Behavior of Zstd Compression for Delta Tables Across Different Databricks Runtime Versions

Hi all, for ZSTD compression, as per the documentation, any table created with DBR 16.0 or newer (or Apache Spark 3.5+) uses Zstd as the default compression codec instead of Snappy. I explicitly set the table property to Zstd: spark.sql("""ALTER TABLE m...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

@pooja_bhumandla New files written by DBR 15.4 (or any pre-16.0 runtime) will still use Zstd as long as the table property delta.compression.codec = 'zstd' remains set on the table. When we explicitly run: ALTER TABLE my_table SET TBLPROPERTIES ('delt...
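To make the property handling concrete, a minimal sketch; the table name is a placeholder and the property name follows the thread:

```python
# Pin the compression codec on the table, then confirm the property stuck.
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.compression.codec' = 'zstd')")
spark.sql("SHOW TBLPROPERTIES my_table").filter("key = 'delta.compression.codec'").show(truncate=False)
```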

1 More Replies
by tonkol (New Contributor II)
  • 101 Views
  • 1 replies
  • 0 kudos

Migrate on-premise delta tables to Databricks (Azure)

Hi there, we've decided to migrate our on-premise Delta Lake to Azure Databricks. Because of networking, I can only "push" the data from on-prem to the cloud. What would be the best way to replicate all tables: schema + partitioning i...

Latest Reply
mukul1409
New Contributor II
  • 0 kudos

The correct solution is not SQL-based. Delta tables are defined by the contents of the Delta log directory, not by CREATE TABLE statements. That is why SHOW CREATE TABLE cannot reconstruct partitions, properties, or constraints. The only reliable migrat...
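One hedged illustration of the file-based approach this reply points toward, assuming the whole table directory (data files plus _delta_log) has already been pushed to ADLS. Paths and names are placeholders, not a prescription:

```python
# Register the copied Delta directory as a table; the _delta_log, not DDL,
# carries the schema, partitioning, properties, and constraints.
spark.sql("""
  CREATE TABLE my_catalog.my_schema.my_table
  USING DELTA
  LOCATION 'abfss://container@account.dfs.core.windows.net/delta/my_table'
""")
```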

by Sainath368 (Contributor)
  • 346 Views
  • 4 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory-listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in structured streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Contributor III
  • 2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream, and 1 uni...
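For reference, a minimal sketch of one such stream; the source format, paths, and table names are placeholders, while the managed-file-events option follows the reply:

```python
# One Auto Loader stream per table, each pointed at its own directory,
# with managed file events enabled as discussed above.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useManagedFileEvents", "true")
    .load("abfss://landing@account.dfs.core.windows.net/tables/orders/")
    .writeStream
    .option("checkpointLocation", "/Volumes/cat/sch/checkpoints/orders")
    .toTable("cat.sch.orders_bronze"))
```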

3 More Replies
by ganesh_raskar (New Contributor II)
  • 224 Views
  • 5 replies
  • 0 kudos

Installing Custom Packages on Serverless Compute via Databricks Connect

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute. What works: a traditional compute cluster with the custom package pre-i...

Labels: Data Engineering, data-engineering, databricks-connect
Latest Reply
Sanjeeb2024
Contributor III
  • 0 kudos

Hi @ganesh_raskar - If you can share which custom package you're using, along with the exact code and error, I can try to replicate it at my end and explore a suitable option.

4 More Replies
by hnnhhnnh (New Contributor II)
  • 175 Views
  • 1 replies
  • 0 kudos

How to handle type widening (int→bigint) in DLT streaming tables without dropping the table

Setup: Bronze source table (external to DLT, CDF & type widening enabled): # Source table properties: # delta.enableChangeDataFeed: "true" # delta.enableDeletionVectors: "true" # delta.enableTypeWidening: "true" # delta.minReaderVersion: "3" # delta.minWrite...

Latest Reply
mukul1409
New Contributor II
  • 0 kudos

Hi @hnnhhnnh, DLT streaming tables that use apply changes do not support widening the data type of key columns, such as changing an integer to a bigint, after the table is created. Even though Delta and Unity Catalog support type widening in general, DL...
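For contrast with the DLT limitation described above, a hedged sketch of what type widening looks like on a plain Delta table outside an apply-changes pipeline; the table and column names are placeholders:

```python
# On a regular Delta table (not a DLT apply-changes target), type widening can be
# enabled and a column widened in place.
spark.sql("ALTER TABLE cat.sch.bronze_source SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")
spark.sql("ALTER TABLE cat.sch.bronze_source ALTER COLUMN id TYPE BIGINT")
```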

by ismaelhenzel (Contributor III)
  • 259 Views
  • 1 replies
  • 1 kudos

Resolved! Declarative Pipelines - Dynamic Overwrite

Regarding the limitations of declarative pipelines—specifically the inability to use replaceWhere—I discovered through testing that materialized views actually support dynamic overwrites. This handles several scenarios where replaceWhere would typica...

Latest Reply
osingh
Contributor
  • 1 kudos

This is a really interesting find, and honestly not something most people expect from materialized views. Under the hood, MVs in Databricks declarative pipelines are still Delta tables. So when you set partitionOverwriteMode=dynamic and partition by a...
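For readers who want to see the behavior in isolation, a minimal sketch of dynamic partition overwrite on a plain partitioned Delta table, outside any pipeline definition; table names are placeholders:

```python
# Placeholder source holding only the partitions that should be refreshed.
updates = spark.read.table("cat.sch.staged_updates")

# Dynamic partition overwrite: only partitions present in `updates` are replaced;
# every other partition of the target table is left untouched.
(updates.write
    .format("delta")
    .mode("overwrite")
    .option("partitionOverwriteMode", "dynamic")
    .saveAsTable("cat.sch.partitioned_target"))
```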

by Joost1024 (New Contributor III)
  • 975 Views
  • 6 replies
  • 4 kudos

Resolved! Read Array of Arrays of Objects JSON file using Spark

Hi Databricks Community! This is my first post in this forum, so I hope you can forgive me if it's not according to the forum best practices. After lots of searching, I decided to share with this community the peculiar issue I'm running into. I try to lo...

Latest Reply
Joost1024
New Contributor III
  • 4 kudos

I guess I was a bit overenthusiastic in accepting the answer. When I run the following on the single-object array of arrays (as shown in the original post), I get a single row with column "value" and value null. from pyspark.sql import functions as F,...
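For anyone landing here with the same shape of data, a hedged sketch of one way to handle a top-level array of arrays of objects: read the file as a single text value and parse it with an explicit schema. The schema and path are hypothetical, and this is not necessarily the approach ultimately accepted in the thread:

```python
from pyspark.sql import functions as F, types as T

# Hypothetical schema for the innermost objects.
item = T.StructType([
    T.StructField("id", T.LongType()),
    T.StructField("name", T.StringType()),
])
doc_schema = T.ArrayType(T.ArrayType(item))

# Read the whole file as one string, parse it, then explode both array levels
# so each innermost object becomes its own row.
raw = spark.read.text("/tmp/data/array_of_arrays.json", wholetext=True)
flat = (raw
        .select(F.from_json("value", doc_schema).alias("doc"))
        .select(F.explode("doc").alias("inner"))
        .select(F.explode("inner").alias("obj"))
        .select("obj.*"))
flat.show()
```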

5 More Replies