cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

YFL
by New Contributor III
  • 5113 Views
  • 12 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a delta table. I see that in query progress logs, there is some information about the last version and the last file in the version for the end offset, but this don't give ...

  • 5113 Views
  • 12 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman​ I hope all is well.Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 6 kudos
11 More Replies
ptambe
by New Contributor III
  • 4003 Views
  • 7 replies
  • 3 kudos

Resolved! Is Concurrent Writes from multiple databricks clusters to same delta table on S3 Supported?

Does databricks have support for writing to same Delta Table from multiple clusters concurrently. I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in databricks OR if you have a...

  • 4003 Views
  • 7 replies
  • 3 kudos
Latest Reply
dennyglee
New Contributor III
  • 3 kudos

Please note, the issue noted above [Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs) is for Delta Lake OSS. As noted in this issue as well as Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...

  • 3 kudos
6 More Replies
Gary_Irick
by New Contributor III
  • 6855 Views
  • 10 replies
  • 12 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this:ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

  • 6855 Views
  • 10 replies
  • 12 kudos
Latest Reply
talenik
New Contributor III
  • 12 kudos

Hi @Kaniz_Fatma , I have few queries on Directory Names with Column Mapping. I have this delta table on ADLS and I am trying to read it, but I am getting below error. How can we read delta tables with column mapping enabled with pyspark?Can you pleas...

  • 12 kudos
9 More Replies
MRTN
by New Contributor III
  • 8222 Views
  • 4 replies
  • 3 kudos

Resolved! Feature request delta tables : drop duplicate rows

A deltaTable.dropDuplicates(columns) would be a very nice feature, simplifying the complex procedures that are suggested online. Or am I missing any existing procedures that can be done withouth merge operations or similar?

  • 8222 Views
  • 4 replies
  • 3 kudos
Latest Reply
MRTN
New Contributor III
  • 3 kudos

I created a feature request in the delta table project: [Feature Request] data deduplication on existing delta table · Issue #1767 · delta-io/delta (github.com)

  • 3 kudos
3 More Replies
deng77
by New Contributor III
  • 30447 Views
  • 11 replies
  • 2 kudos

Resolved! Using current_timestamp as a default value in a delta table

I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table. Is it possible to add a column to an existing de...

  • 30447 Views
  • 11 replies
  • 2 kudos
Latest Reply
Vaibhav1000
New Contributor II
  • 2 kudos

Can you please provide information on the additional expenses related to using this feature compared to not utilizing it at all?

  • 2 kudos
10 More Replies
Chris_Konsur
by New Contributor III
  • 14802 Views
  • 4 replies
  • 6 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I try to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

  • 14802 Views
  • 4 replies
  • 6 kudos
Latest Reply
sachin_tirth
New Contributor II
  • 6 kudos

Hi Team, I am facing the same issue. When we try to load data to table in production batch getting error as table not in delta format. there is no recent change in table. and we are not trying any create or replace table. this is existing table in pr...

  • 6 kudos
3 More Replies
Christine
by Contributor II
  • 6475 Views
  • 9 replies
  • 5 kudos

Resolved! pyspark dataframe empties after it has been saved to delta lake.

Hi, I am facing a problem that I hope to get some help to understand. I have created a function that is supposed to check if the input data already exist in a saved delta table and if not, it should create some calculations and append the new data to...

  • 6475 Views
  • 9 replies
  • 5 kudos
Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi,im also having similar issue ..does creating temp view and reading it again after saving to a table works?? /

  • 5 kudos
8 More Replies
BeginnerBob
by New Contributor III
  • 24865 Views
  • 6 replies
  • 3 kudos

Convert Date to YYYYMMDD in databricks sql

Hi,I have a date column in a delta table called ADate. I need this in the format YYYYMMDD.In TSQL this is easy. However, I can't seem to be able to do this without splitting the YEAR, MONTH and Day and concatenating them together.Any ideas?

  • 24865 Views
  • 6 replies
  • 3 kudos
Latest Reply
JayDoubleYou42
New Contributor II
  • 3 kudos

I'll share I'm having a variant of the same issue. I have a varchar field in the form YYYYMMDD which I'm trying to join to another varchar field from another table in the form of MM/DD/YYYY. Does anyone know of a way to do this in SPARK SQL without s...

  • 3 kudos
5 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 1545 Views
  • 3 replies
  • 7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping. Hi all,Now databricks started supporting column rename and drop.Column mapping requires the following Delta protocols:Reader version 2 or above.Writer version 5 or above.Blog URL##Available in D...

  • 1545 Views
  • 3 replies
  • 7 kudos
Latest Reply
Poovarasan
New Contributor III
  • 7 kudos

Above mentioned feature is not working in the DLT pipeline. if the scrip has more than 4 columns 

  • 7 kudos
2 More Replies
zyang
by Contributor
  • 9317 Views
  • 13 replies
  • 13 kudos

Option "delta.columnMapping.mode","name" introduces unexpected result

Hi, I am trying to write and create a delta table by enable "delta.columnMapping.mode","name", and the partition is date. But I found that when I enable this option, the partition folder name is not date any more while it is some random two letters.A...

image
  • 9317 Views
  • 13 replies
  • 13 kudos
Latest Reply
CkoockieMonster
New Contributor II
  • 13 kudos

Hello, I'm a bit late to the party, but I'll put that for posterity:There's a way to rename your weird two letter named folders and still have your table working, but it violates the good practices guidelines suggested by Data Bricks, and I don't thi...

  • 13 kudos
12 More Replies
Mado
by Valued Contributor II
  • 8188 Views
  • 3 replies
  • 0 kudos

Resolved! How to enforce delta table column to have unique values?

Hi,I have defined a delta table with a primary key:%sql   CREATE TABLE IF NOT EXISTS test_table_pk ( table_name STRING NOT NULL, label STRING NOT NULL, table_location STRING NOT NULL,   CONSTRAINT test_table_pk_col PRIMARY KEY(table_name) ...

image
  • 8188 Views
  • 3 replies
  • 0 kudos
Latest Reply
SteveL
New Contributor II
  • 0 kudos

I'm with you.  But it DOES make sense because DBx databases are not application databases.  DBx is not intended to be used like this.  DBx databases are repositories for any ingested abstract data.  To manage the ingestion is purpose-built databases ...

  • 0 kudos
2 More Replies
sanjay
by Valued Contributor II
  • 9334 Views
  • 8 replies
  • 0 kudos

error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

Hi,I have pipeline running. I have updated one file in delta table which is already processed. Now I am getting errorcom.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If ...

  • 9334 Views
  • 8 replies
  • 0 kudos
Latest Reply
Sanjeev_Chauhan
New Contributor II
  • 0 kudos

Hi Sanjay, You can try adding .option("overwriteSchema", "true")

  • 0 kudos
7 More Replies
Leszek
by Contributor
  • 7054 Views
  • 1 replies
  • 2 kudos

IDENTITY columns generating every other number when merging

Hi,I'm doing merge to my Delta Table which has IDENTITY column:Id BIGINT GENERATED ALWAYS AS IDENTITYInserted data has in the id column every other number, like this:Is this expected behavior? Is there any workaround to make number increasing by 1?

image
  • 7054 Views
  • 1 replies
  • 2 kudos
Latest Reply
Dataspeaksss
New Contributor II
  • 2 kudos

Were you able to resolve it? I'm facing the same issue.

  • 2 kudos
DJey
by New Contributor III
  • 8202 Views
  • 5 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my Exisiting Delta Table looks like below:Now I have an incremental data with an additional column i.e. owner:Dataframe Name --> scdDFBelow is the code snippet to merge Incremental Dataframe to targetTable, but the new...

image image image image
  • 8202 Views
  • 5 replies
  • 2 kudos
Latest Reply
DJey
New Contributor III
  • 2 kudos

@Vidula Khanna​  Enabling the below property resolved my issue:spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled",True) Thanks v much!

  • 2 kudos
4 More Replies
Greg
by New Contributor III
  • 1352 Views
  • 1 replies
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have 1 delta table that I continuously append events into, and a 2nd delta table that I continuously merge into (streamed from the 1st table) that has unique ID's where properties are updated from the events (An ID represents a unique thing that ge...

  • 1352 Views
  • 1 replies
  • 4 kudos
Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solved this problem?

  • 4 kudos
Labels