Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MRTN
by New Contributor III
  • 7314 Views
  • 4 replies
  • 3 kudos

Resolved! Feature request delta tables : drop duplicate rows

A deltaTable.dropDuplicates(columns) would be a very nice feature, simplifying the complex procedures that are suggested online. Or am I missing an existing procedure that works without merge operations or similar?

Latest Reply
MRTN
New Contributor III
  • 3 kudos

I created a feature request in the delta table project: [Feature Request] data deduplication on existing delta table · Issue #1767 · delta-io/delta (github.com)

3 More Replies
deng77
by New Contributor III
  • 27462 Views
  • 11 replies
  • 2 kudos

Resolved! Using current_timestamp as a default value in a delta table

I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table. Is it possible to add a column to an existing de...

Latest Reply
Vaibhav1000
New Contributor II
  • 2 kudos

Can you please provide information on the additional expenses related to using this feature compared to not utilizing it at all?

10 More Replies
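
For the question above, a minimal sketch of one possible approach, assuming a Databricks runtime that supports Delta column defaults. The table name `events` and column name `inserted_at` are hypothetical, and `spark` is assumed to be an existing SparkSession. The column is added first and the default is set in a separate statement; rows that already exist keep NULL in the new column.

```python
from typing import List


def default_timestamp_statements(table_name: str, column_name: str = "inserted_at") -> List[str]:
    """Build the SQL statements that add a timestamp column with a default.

    Assumption: the runtime supports the Delta column-defaults table feature.
    """
    return [
        # 1. Add the new column (existing rows will hold NULL).
        f"ALTER TABLE {table_name} ADD COLUMN {column_name} TIMESTAMP",
        # 2. Enable the column-defaults table feature on this table.
        f"ALTER TABLE {table_name} SET TBLPROPERTIES "
        "('delta.feature.allowColumnDefaults' = 'supported')",
        # 3. Future inserts that omit the column get the insert-time timestamp.
        f"ALTER TABLE {table_name} ALTER COLUMN {column_name} SET DEFAULT current_timestamp()",
    ]


def apply_default_timestamp(spark, table_name: str, column_name: str = "inserted_at") -> None:
    """Run the statements through an existing SparkSession (passed in by the caller)."""
    for statement in default_timestamp_statements(table_name, column_name):
        spark.sql(statement)
```

In a notebook this would be called as `apply_default_timestamp(spark, "events")`. Enabling a table feature upgrades the table protocol, so it is worth trying on a copy of the table first.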
Chris_Konsur
by New Contributor III
  • 13159 Views
  • 4 replies
  • 6 kudos

Resolved! Error: The associated location ... is not empty but it's not a Delta table

I try to create a table but I get this error: AnalysisException: Cannot create table ('`spark_catalog`.`default`.`citation_all_tenants`'). The associated location ('dbfs:/user/hive/warehouse/citation_all_tenants') is not empty but it's not a Delta t...

Latest Reply
sachin_tirth
New Contributor II
  • 6 kudos

Hi Team, I am facing the same issue. When we try to load data into the table in a production batch, we get an error saying the table is not in Delta format. There has been no recent change to the table, and we are not running any create or replace table. This is an existing table in pr...

3 More Replies
Christine
by Contributor II
  • 5741 Views
  • 9 replies
  • 5 kudos

Resolved! pyspark dataframe empties after it has been saved to delta lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check whether the input data already exists in a saved delta table and, if not, run some calculations and append the new data to...

Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies
BeginnerBob
by New Contributor III
  • 21859 Views
  • 6 replies
  • 3 kudos

Convert Date to YYYYMMDD in databricks sql

Hi, I have a date column in a delta table called ADate. I need this in the format YYYYMMDD. In TSQL this is easy. However, I can't seem to do this without splitting the YEAR, MONTH and Day and concatenating them together. Any ideas?

Latest Reply
JayDoubleYou42
New Contributor II
  • 3 kudos

I'm having a variant of the same issue. I have a varchar field in the form YYYYMMDD that I'm trying to join to a varchar field in another table in the form MM/DD/YYYY. Does anyone know of a way to do this in Spark SQL without s...

5 More Replies
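
For both date questions above, a short sketch of the format conversion. `date_format` and `to_date` are built-in Spark SQL functions, so no splitting and concatenating should be needed. `ADate` comes from the post; the aliases and column names `a.ymd` and `b.mdy` are hypothetical. The Python helpers are only a plain-Python counterpart of the same conversion, useful for checking expected values.

```python
from datetime import date, datetime

# Spark SQL expression for the original question: DATE column -> 'YYYYMMDD' string.
DATE_TO_YYYYMMDD_SQL = "date_format(ADate, 'yyyyMMdd')"

# Spark SQL join condition for the follow-up: 'YYYYMMDD' string vs 'MM/DD/YYYY' string.
# `a.ymd` and `b.mdy` are hypothetical column names.
JOIN_CONDITION_SQL = "a.ymd = date_format(to_date(b.mdy, 'MM/dd/yyyy'), 'yyyyMMdd')"


def date_to_yyyymmdd(value: date) -> str:
    """Plain-Python counterpart of date_format(value, 'yyyyMMdd')."""
    return value.strftime("%Y%m%d")


def mdy_to_yyyymmdd(text: str) -> str:
    """Plain-Python counterpart of the to_date + date_format round trip."""
    return datetime.strptime(text, "%m/%d/%Y").strftime("%Y%m%d")


if __name__ == "__main__":
    print(date_to_yyyymmdd(date(2024, 3, 7)))
    print(mdy_to_yyyymmdd("03/07/2024"))
```

Note that Spark's datetime patterns are case-sensitive: `MM` is month and `mm` is minute, `yyyy` is the calendar year.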
Ajay-Pandey
by Esteemed Contributor III
  • 1411 Views
  • 3 replies
  • 7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: reader version 2 or above, writer version 5 or above. Blog URL ## Available in D...

Latest Reply
Poovarasan
New Contributor III
  • 7 kudos

The above-mentioned feature is not working in the DLT pipeline if the script has more than 4 columns.

2 More Replies
zyang
by Contributor
  • 7925 Views
  • 13 replies
  • 13 kudos

Option "delta.columnMapping.mode","name" introduces unexpected result

Hi, I am trying to write and create a delta table with "delta.columnMapping.mode","name" enabled, and the partition is date. But I found that when I enable this option, the partition folder name is no longer the date; instead it is some random two letters. A...

Latest Reply
CkoockieMonster
New Contributor II
  • 13 kudos

Hello, I'm a bit late to the party, but I'll put this here for posterity: there's a way to rename your weird two-letter folders and still have your table working, but it violates the good-practice guidelines suggested by Databricks, and I don't thi...

12 More Replies
Mado
by Valued Contributor II
  • 7388 Views
  • 3 replies
  • 0 kudos

Resolved! How to enforce delta table column to have unique values?

Hi, I have defined a delta table with a primary key: %sql CREATE TABLE IF NOT EXISTS test_table_pk (table_name STRING NOT NULL, label STRING NOT NULL, table_location STRING NOT NULL, CONSTRAINT test_table_pk_col PRIMARY KEY(table_name) ...

Latest Reply
SteveL
New Contributor II
  • 0 kudos

I'm with you.  But it DOES make sense because DBx databases are not application databases.  DBx is not intended to be used like this.  DBx databases are repositories for any ingested abstract data.  To manage the ingestion is purpose-built databases ...

2 More Replies
sanjay
by Valued Contributor II
  • 8266 Views
  • 8 replies
  • 0 kudos

error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

Hi, I have a pipeline running. I updated one file in a delta table that had already been processed. Now I am getting the error com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If ...

Latest Reply
Sanjeev_Chauhan
New Contributor II
  • 0 kudos

Hi Sanjay, You can try adding .option("overwriteSchema", "true")

7 More Replies
Gary_Irick
by New Contributor III
  • 5993 Views
  • 9 replies
  • 12 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this: ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

Latest Reply
Kaniz_Fatma
Community Manager
  • 12 kudos

Hi @Gary_Irick, @gongasxavi, @Pete_Cotton, @aleks1601. Certainly, let’s address your questions regarding Delta table partition directories and column mapping. Directory Names with Column Mapping: When you enable column mapping in a Delta tabl...

8 More Replies
Leszek
by Contributor
  • 6924 Views
  • 1 reply
  • 2 kudos

IDENTITY columns generating every other number when merging

Hi, I'm doing a merge into my Delta table, which has an IDENTITY column: Id BIGINT GENERATED ALWAYS AS IDENTITY. The inserted data has every other number in the id column, like this: Is this expected behavior? Is there any workaround to make the number increase by 1?

image
Latest Reply
Dataspeaksss
New Contributor II
  • 2 kudos

Were you able to resolve it? I'm facing the same issue.

DJey
by New Contributor III
  • 7465 Views
  • 5 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta table looks like below: Now I have incremental data with an additional column, i.e. owner: Dataframe Name --> scdDF. Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...

image image image image
Latest Reply
DJey
New Contributor III
  • 2 kudos

@Vidula Khanna Enabling the below property resolved my issue: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled",True) Thanks v much!

4 More Replies
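
To illustrate the accepted fix, a minimal sketch of a MERGE with automatic schema evolution. This is not the poster's code: it assumes the merge is expressed in SQL, that `spark` is an existing SparkSession, and that the temp view name `incremental_updates` and the key column are hypothetical. The configuration key is the one quoted in the reply. With it enabled, `UPDATE SET *` and `INSERT *` can carry a new source column (such as `owner`) into the target table.

```python
def build_merge_sql(target_table: str, source_view: str, key_column: str) -> str:
    """Build a MERGE statement that updates and inserts all columns."""
    return (
        f"MERGE INTO {target_table} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key_column} = s.{key_column} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def merge_with_schema_evolution(spark, target_table: str, source_df, key_column: str) -> None:
    """Enable schema auto-merge for this session, then merge the DataFrame."""
    # Setting quoted in the thread; it applies to the current Spark session.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
    # Hypothetical view name used only to reference the DataFrame from SQL.
    source_df.createOrReplaceTempView("incremental_updates")
    spark.sql(build_merge_sql(target_table, "incremental_updates", key_column))
```

The setting has to be in place before the merge runs; setting it after a failed merge does not retroactively add the column.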
Greg
by New Contributor III
  • 1242 Views
  • 1 reply
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have 1 delta table that I continuously append events into, and a 2nd delta table that I continuously merge into (streamed from the 1st table) that has unique IDs whose properties are updated from the events (an ID represents a unique thing that ge...

Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solve this problem?

Maksym
by New Contributor III
  • 6280 Views
  • 4 replies
  • 7 kudos

Resolved! Databricks Autoloader is getting stuck and does not pass to the next batch

I have a simple job scheduled every 5 min. Basically it listens to cloudfiles on a storage account and writes them into a delta table, extremely simple. The code is something like this: df = (spark .readStream .format("cloudFiles") .option('cloudFil...

Latest Reply
lassebe
New Contributor II
  • 7 kudos

I had the same issue: files would randomly not be loaded. Setting `.option("cloudFiles.useIncrementalListing", False)` seemed to do the trick!

3 More Replies
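
A sketch of where the option from the reply would sit in an Auto Loader reader, assuming `spark` is an existing SparkSession on Databricks. The function names, the file format and both paths are hypothetical; only `cloudFiles.useIncrementalListing` comes from the thread. Disabling incremental listing makes Auto Loader do a full directory listing, which may be slower on large directories.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Collect the Auto Loader options in one place so they are easy to review."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Workaround from the reply: always list the full directory.
        "cloudFiles.useIncrementalListing": "false",
    }


def read_with_autoloader(spark, source_path: str, file_format: str, schema_location: str):
    """Return a streaming DataFrame over new files under source_path."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_location))
        .load(source_path)
    )
```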
Matt_L
by New Contributor III
  • 5318 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I’m doing something wrong here. I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...

2 More Replies
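
A sketch of setting both retention properties named in the answer together. The table name is hypothetical, and the `24 HOURS` value for `delta.deletedFileRetentionDuration` is an assumption that mirrors the log retention value quoted in the reply; the reply itself does not say which value was chosen.

```python
def retention_sql(
    table_name: str,
    log_retention: str = "24 HOURS",
    deleted_file_retention: str = "24 HOURS",  # assumed value, not stated in the thread
) -> str:
    """Build one ALTER TABLE statement that sets both Delta retention properties."""
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_retention}', "
        f"'delta.deletedFileRetentionDuration' = '{deleted_file_retention}')"
    )


def apply_retention(spark, table_name: str) -> None:
    """Run the statement through an existing SparkSession (passed in by the caller)."""
    spark.sql(retention_sql(table_name))
```

A short deleted-file retention also shortens how far back time travel and slow concurrent readers can safely go, so the value is a trade-off rather than a default to copy.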