Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

halsgbs
by New Contributor
  • 15 Views
  • 4 replies
  • 2 kudos

Warehouse ID specified in job yaml file for sql tasks

My goal is to trigger an alert I have through a job, and it seems I have to specify the warehouse id within the job yaml file itself. We have different environments with different warehouse ids, and the issue is that if I specify the warehouse id in ...

Latest Reply
halsgbs
New Contributor
  • 2 kudos

Thank you! Looks like the alert_id also needs to be parameterised, and I was wondering if it's possible to use a job parameter to do so? If I can use the alert name then that would be great, but I believe it has to be the alert ID, which will be differen...
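One way to avoid hard-coding per-environment alert IDs is to resolve the ID from the alert's display name at run time. A minimal sketch is below; it assumes the newer SQL Alerts list endpoint (/api/2.0/sql/alerts) and its display_name/id fields, which you should verify against the Alerts API version in your workspace. The alert name and environment variables are placeholders.

```python
# Sketch: resolve an alert ID from its display name at run time, so only the stable
# alert name needs to differ (or not differ at all) between environments.
# Assumes the /api/2.0/sql/alerts list endpoint and its "results"/"display_name"/"id"
# fields; verify against your workspace's Alerts API version. Pagination is ignored.
import os
import requests

workspace_url = os.environ["DATABRICKS_HOST"]   # e.g. https://adb-123456.7.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]
alert_name = "my_env_alert"                     # hypothetical alert display name

resp = requests.get(
    f"{workspace_url}/api/2.0/sql/alerts",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

matches = [a["id"] for a in resp.json().get("results", []) if a.get("display_name") == alert_name]
print(matches[0] if matches else "alert not found")
```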

3 More Replies
ShivangiB1
by New Contributor III
  • 129 Views
  • 6 replies
  • 0 kudos

Sql server setup for lakeflow sql server connector to create ingestion

When I execute the below command, the change instance is getting created but without the lakeflow prefix. I read the documentation and it mentioned that to track schema evolution we need to have the prefix. Can I please get some assistance? Command used: EXEC d...

Latest Reply
ShivangiB1
New Contributor III
  • 0 kudos

And when I altered the table, I got the below warning: WARNING: Table [dbo].[test_table] has a pre-existing capture instance named 'dbo_test_table' that was not created by lakeflow. Lakeflow will preserve this instance and create its own instance alongside...

5 More Replies
rijin-thomas
by New Contributor
  • 171 Views
  • 1 reply
  • 0 kudos

Mongo Db connector - Connection timeout when trying to connect to AWS Document DB

I am on Databricks Runtime 14.3 LTS (Spark 3.5.0, Scala 2.12) with mongodb-spark-connector_2.12:10.2.0. Trying to connect to Document DB using the connector, and all I get is a connection timeout. I tried using PyMongo, which works as expected and I can ...

Latest Reply
bianca_unifeye
Contributor
  • 0 kudos

If PyMongo works but the Spark connector times out, the issue is almost always JVM TLS configuration or executor-level network access, not credentials or the database itself. TLS handling (most common cause): The MongoDB Spark connector runs on the JV...
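A minimal sketch along those lines: the connector reads through the JVM driver, so the DocumentDB CA certificate must be visible to the JVM trust store on the driver and executors, and TLS options go into the connection URI. Host, credentials, paths, database, and collection below are placeholders, and the exact URI parameters DocumentDB needs should be confirmed against the AWS documentation.

```python
# Sketch: reading Amazon DocumentDB with the MongoDB Spark connector (10.x).
# The connector runs on the JVM, so the DocumentDB CA bundle must be in a trust store
# visible to driver AND executors, e.g. via cluster Spark config (illustrative paths):
#   spark.driver.extraJavaOptions   -Djavax.net.ssl.trustStore=/dbfs/certs/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
#   spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder host/credentials; retryWrites=false is commonly required for DocumentDB.
docdb_uri = (
    "mongodb://app_user:app_password@docdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017/"
    "?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
)

df = (
    spark.read.format("mongodb")
    .option("connection.uri", docdb_uri)
    .option("database", "appdb")       # placeholder database
    .option("collection", "events")    # placeholder collection
    .load()
)
df.printSchema()
```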

ChrisRose
by Visitor
  • 26 Views
  • 6 replies
  • 1 kudos

Result Difference Between View and Manually Run View Query

I am experiencing an issue where a view does not display the correct results, but running the view query manually in either a new notebook or the SQL Editor displays different, correct results. I have tried switching the compute resource in the noteb...

Latest Reply
bianca_unifeye
Contributor
  • 1 kudos

There are 2 fixes that I can think of. Option A: make first_value deterministic: first_value(Customer_ID, true) OVER (PARTITION BY customer_name ORDER BY submitted ASC, event_id ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) U...
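Option A spelled out as a runnable query: ignore NULLs in first_value and add a tie-breaker to the window ordering so the result is deterministic across compute types. The table name (customer_events) is illustrative; Customer_ID, customer_name, submitted, and event_id come from the snippet above.

```python
# Sketch of Option A in full: deterministic first_value with ignoreNulls=true and a
# tie-breaker column (event_id) in the window ordering. Table name is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

result = spark.sql("""
    SELECT
        customer_name,
        first_value(Customer_ID, true) OVER (
            PARTITION BY customer_name
            ORDER BY submitted ASC, event_id ASC
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS first_customer_id
    FROM customer_events
""")
result.show()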

5 More Replies
AcrobaticMonkey
by New Contributor II
  • 18 Views
  • 1 reply
  • 0 kudos

Salesforce Connector SCD2 - Get new record with isDeleted = true on deletion

Hi all, I'm using the Databricks Salesforce connector to ingest tables with history tracking enabled (SCD Type 2). When records are deleted in Salesforce, the connector closes the existing record by setting the end date. The isDeleted flag remains fals...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @AcrobaticMonkey , I put on my researcher hat and dug into our internal docs. Here is what I found:  Short answer: this isn’t configurable today. The connector’s SCD Type 2 behavior “closes” a record by setting __END_AT and does not emit a ...
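Since the connector only closes records via __END_AT, a delete flag can still be derived downstream. A minimal sketch, under the assumption that a key with no remaining open version was deleted in Salesforce; the table name (salesforce_bronze.account_history) and key column (Id) are hypothetical.

```python
# Sketch: infer deletions downstream from the SCD2 history, since the connector
# closes records via __END_AT rather than emitting IsDeleted = true.
# A key with no open (__END_AT IS NULL) version is treated as deleted.
# Table and key column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

deleted_keys = spark.sql("""
    SELECT Id
    FROM salesforce_bronze.account_history
    GROUP BY Id
    HAVING SUM(CASE WHEN __END_AT IS NULL THEN 1 ELSE 0 END) = 0
""")
deleted_keys.show()
```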

SaugatMukherjee
by New Contributor III
  • 11 Views
  • 1 reply
  • 0 kudos

Structured streaming for iceberg tables

According to this https://iceberg.apache.org/docs/latest/spark-structured-streaming/, we can stream from Iceberg tables. I have ensured that my source table is Iceberg version 3, but no matter what I do, I get an error that Iceberg does not support streaming reads. Looki...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @SaugatMukherjee , I did some research and this is what I found.  You’re running into a real (and documented) Databricks limitation here: managed Iceberg tables cannot be used as a streaming source today. That’s true even though upstream Ap...

Malthe
by Contributor III
  • 15 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to update DLT-based materialized view if clustering key is missing

If we set up a materialized view with a clustering key, and then update the definition such that this key is no longer part of the table, Databricks complains: Run ALTER TABLE ... CLUSTER BY ... to repair Delta clustering metadata. But this is not poss...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Malthe , Currently, there is no supported way to repair broken clustering metadata in Delta materialised views if you remove the clustering key from the definition, other than dropping and recreating the materialised view. Additionally, a full...

bsr
by New Contributor II
  • 476 Views
  • 3 replies
  • 3 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs:```DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads: DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...
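One workaround to try while waiting for an official switch: if the messages come through Python's standard logging module under the ThreadMonitor logger name (an assumption, not a documented DBR setting), raising that logger's level from inside the job should quiet them. A minimal sketch:

```python
# Sketch: raise the ThreadMonitor logger above DEBUG from inside the job, assuming
# the messages are emitted via Python's standard logging module under that name.
# This is a workaround to try, not an official DBR 17.3.3 setting.
import logging

logging.getLogger("ThreadMonitor").setLevel(logging.INFO)

# If the messages persist, list registered loggers to find the actual emitter name.
for name in list(logging.root.manager.loggerDict):
    if "ThreadMonitor" in name:
        print(name, logging.getLogger(name).getEffectiveLevel())
```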

Latest Reply
bsr
New Contributor II
  • 3 kudos

Thanks for the quick response!

2 More Replies
ruicarvalho_de
by New Contributor III
  • 77 Views
  • 7 replies
  • 0 kudos

Databricks API - Get Dashboard Owner?

Hi all! I'm trying to identify the owner of a dashboard using the API. Here's a code snippet as an example: import json dashboard_id = "XXXXXXXXXXXXXXXXXXXXXXXXXX" url = f"{workspace_url}/api/2.0/lakeview/dashboards/{dashboard_id}" headers = {"Authoriz...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

@ruicarvalho_de I don't think we have a direct way to fetch the owner of the dashboard through the API. We can try using an API call to retrieve the dashboard IDs and then use the following API call: /api/2.0/permissions/{workspace_object_type}/{workspace...
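A sketch following that suggestion: query the permissions endpoint for the dashboard and inspect the access control list for the managing principal. It assumes "dashboards" is the workspace_object_type the permissions API uses for Lakeview dashboards; workspace URL, token, and dashboard ID are placeholders.

```python
# Sketch: list who holds permissions (e.g. CAN_MANAGE) on a Lakeview dashboard via the
# permissions API, as suggested above. Assumes "dashboards" is the correct
# workspace_object_type; workspace_url, token and dashboard_id are placeholders.
import requests

workspace_url = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"
dashboard_id = "XXXXXXXXXXXXXXXXXXXXXXXXXX"

resp = requests.get(
    f"{workspace_url}/api/2.0/permissions/dashboards/{dashboard_id}",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for entry in resp.json().get("access_control_list", []):
    principal = entry.get("user_name") or entry.get("group_name") or entry.get("service_principal_name")
    levels = [p.get("permission_level") for p in entry.get("all_permissions", [])]
    print(principal, levels)
```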

6 More Replies
sophi8876
by New Contributor
  • 47 Views
  • 1 reply
  • 0 kudos

Small thing, but does anyone else hate manually formatting values for SQL IN clauses?

This feels like one of those tiny annoyances that adds up. I often copy a list of IDs from logs / Excel / emails and need to drop them into a MySQL or Trino IN (...) clause. Half the time I end up manually adding quotes or commas, or fixing formattin...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hey @sophi8876, I generally use IDEs like IntelliJ or VS Code and write a regex to add commas and quotes; it works for me and is fast enough as well.
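For what it's worth, a tiny Python helper does the same job if you'd rather not reach for a regex each time; the sample IDs are illustrative.

```python
# Sketch: turn a pasted block of IDs (one per line, straight from logs/Excel/email)
# into a quoted, comma-separated SQL IN (...) list.
raw = """
1001
1002
1003
"""

ids = [line.strip() for line in raw.splitlines() if line.strip()]
in_clause = "IN (" + ", ".join(f"'{i}'" for i in ids) + ")"
print(in_clause)   # IN ('1001', '1002', '1003')
```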

manjeetgahlawat
by Visitor
  • 14 Views
  • 1 reply
  • 1 kudos

DLT Pipeline issue

Hello everyone, I have set up a DLT pipeline and while running it for the first time, I am getting the below issue: NoSuchElementException: key not found: test_bronze_dlt. test_bronze_dlt is my DLT table name that is expected to...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @manjeetgahlawat , NoSuchElementException: key not found: test_bronze_dlt occurs when the table/view in the pipeline references a LIVE dataset named test_bronze_dlt, but DLT cannot find a dataset with that exact name in the pipeline graph. (So ...
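A minimal sketch of the pattern the reply describes: the dataset name a downstream table reads must exactly match a dataset declared in the same pipeline, otherwise DLT raises "key not found". The source table and the silver table name below are placeholders.

```python
# Sketch: referenced dataset names must match a dataset defined in the same pipeline;
# otherwise DLT raises NoSuchElementException: key not found: <name>.
# Source and silver table names are placeholders.
import dlt
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

@dlt.table(name="test_bronze_dlt")           # the name other tables must reference
def test_bronze_dlt():
    return spark.read.table("samples.nyctaxi.trips")

@dlt.table(name="test_silver_dlt")
def test_silver_dlt():
    # must use the exact dataset name declared above
    return dlt.read("test_bronze_dlt").where(F.col("trip_distance") > 0)
```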

Lon_Fortes
by New Contributor III
  • 8775 Views
  • 4 replies
  • 1 kudos

Resolved! How can I check that column on a delta table has a "NOT NULL" constraint or not?

Title pretty much says it all - I'm trying to determine whether a column on my existing delta table was defined as NOT NULL or not. It does not show up in any of the metadata (describe detail, describe history, show tblproperties). Thanks in...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

@muki, you can run SHOW CREATE TABLE <catalog>.<schema>.<table>, and in its output you can also see the constraints applied.
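A quick sketch of that check from a notebook; the table name is a placeholder.

```python
# Sketch: inspect the table's CREATE statement for NOT NULL column constraints.
# main.default.my_table is a placeholder table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

ddl = spark.sql("SHOW CREATE TABLE main.default.my_table").collect()[0][0]
print(ddl)

not_null_columns = [line.strip() for line in ddl.splitlines() if "NOT NULL" in line]
print(not_null_columns)
```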

3 More Replies
yzhang
by Contributor
  • 2387 Views
  • 10 replies
  • 3 kudos

iceberg with partitionedBy option

I am able to create a Unity Catalog Iceberg-format table: df.writeTo(full_table_name).using("iceberg").create() However, if I add the partitionedBy option I get an error: df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_dat...

Latest Reply
LazyGenius
New Contributor III
  • 3 kudos

@Sanjeeb2024 If your question is for me, then I will say it depends on the use case! If you have very large data to be ingested into the table, then you would prefer creating the table first and then ingesting data into it using simultaneous jobs.
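A sketch of that create-then-ingest pattern: define the partitioned Iceberg table up front with SQL, then append to it from the ingestion job(s). Whether PARTITIONED BY is accepted for Unity Catalog managed Iceberg tables on your runtime should be verified against current docs; catalog/schema/table names are placeholders.

```python
# Sketch: create the partitioned Iceberg table first, then ingest with append jobs.
# Support for PARTITIONED BY on UC managed Iceberg tables should be verified for your
# runtime; table names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
full_table_name = "main.default.events_iceberg"

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {full_table_name} (
        event_id    STRING,
        payload     STRING,
        ingest_date DATE
    )
    USING iceberg
    PARTITIONED BY (ingest_date)
""")

df = spark.table("main.default.events_staging")   # placeholder source
df.writeTo(full_table_name).append()
```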

9 More Replies
pooja_bhumandla
by New Contributor III
  • 40 Views
  • 2 replies
  • 0 kudos

Behavior of Zstd Compression for Delta Tables Across Different Databricks Runtime Versions

Hi all, For ZSTD compression, as per the documentation, any table created with DBR 16.0 or newer (or Apache Spark 3.5+) uses Zstd as the default compression codec instead of Snappy. I explicitly set the table property to Zstd: spark.sql("""ALTER TABLE m...

Latest Reply
JAHNAVI
Databricks Employee
  • 0 kudos

@pooja_bhumandla New files written by DBR 15.4 (or any pre-16.0 runtime) will still use Zstd as long as the table property delta.compression.codec = 'zstd' remains set on the table. When we explicitly run: ALTER TABLE my_table SET TBLPROPERTIES ('delt...

1 More Replies
pooja_bhumandla
by New Contributor III
  • 42 Views
  • 3 replies
  • 1 kudos

Collecting Delta Stats for Columns Used in Filters Beyond Default First 32 Columns

Hi community, When using Delta Lake, data skipping relies on column statistics (min/max values). By default, we collect stats for the first 32 columns in the table (based on position) and 4 special columns. This gives roughly 36 columns with stats. Howe...

Latest Reply
Sanjeeb2024
Contributor III
  • 1 kudos

Hi @pooja_bhumandla - If your table is a managed table, it's better to enable predictive optimization; this way Databricks will automatically run ANALYZE and collect the stats.
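Alongside predictive optimization, the set of columns used for data-skipping stats can also be pinned explicitly. A sketch, assuming the delta.dataSkippingStatsColumns table property and the ANALYZE ... COMPUTE DELTA STATISTICS command are available on your runtime (verify against the docs); table and column names are placeholders.

```python
# Sketch: explicitly choose which columns get data-skipping statistics instead of
# relying on the first-32-columns default, then recompute stats for existing files.
# Assumes delta.dataSkippingStatsColumns and ANALYZE ... COMPUTE DELTA STATISTICS are
# available on your runtime; table/column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE main.default.my_table
    SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'order_date,customer_id,region')
""")

# Recompute statistics for data files written before the property change.
spark.sql("ANALYZE TABLE main.default.my_table COMPUTE DELTA STATISTICS")
```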

2 More Replies
