Data Engineering
Forum Posts

BeginnerBob
by New Contributor III
  • 15183 Views
  • 6 replies
  • 3 kudos

Convert Date to YYYYMMDD in databricks sql

Hi, I have a date column in a delta table called ADate. I need this in the format YYYYMMDD. In T-SQL this is easy. However, I can't seem to do this without splitting out the YEAR, MONTH and DAY and concatenating them together. Any ideas?

Latest Reply
JayDoubleYou42
New Contributor II
  • 3 kudos

I'll share that I'm having a variant of the same issue. I have a varchar field in the form YYYYMMDD which I'm trying to join to another varchar field from another table in the form MM/DD/YYYY. Does anyone know of a way to do this in Spark SQL without s...

5 More Replies
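
A common way to do this, for anyone finding this thread: Spark SQL's built-in `date_format` formats a DATE column directly, and `to_date` can normalize the two varchar formats mentioned in the reply before joining. Table and column names other than `ADate` below are hypothetical:

```sql
-- Format the DATE column ADate as a YYYYMMDD string
SELECT date_format(ADate, 'yyyyMMdd') AS adate_yyyymmdd
FROM some_delta_table;

-- Join two varchar date columns stored in different formats
-- by converting both sides to DATE first
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
  ON to_date(a.date_str, 'yyyyMMdd') = to_date(b.date_str, 'MM/dd/yyyy');
```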
Ajay-Pandey
by Esteemed Contributor III
  • 699 Views
  • 3 replies
  • 7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: Reader version 2 or above; Writer version 5 or above. Blog URL##Available in D...

Latest Reply
Poovarasan
New Contributor II
  • 7 kudos

The above-mentioned feature is not working in the DLT pipeline if the script has more than 4 columns.

2 More Replies
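
The flow from the linked docs can be sketched as follows; `my_table` and the column names are placeholders, and the protocol versions are the ones quoted in the post:

```sql
-- Enable column mapping (this upgrades the table protocol and is not reversible)
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.minReaderVersion'   = '2',
  'delta.minWriterVersion'   = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Rename and drop columns without rewriting the underlying data files
ALTER TABLE my_table RENAME COLUMN old_name TO new_name;
ALTER TABLE my_table DROP COLUMN obsolete_col;
```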
zyang
by Contributor
  • 4835 Views
  • 13 replies
  • 13 kudos

Option "delta.columnMapping.mode","name" introduces unexpected result

Hi, I am trying to write and create a delta table with "delta.columnMapping.mode" = "name" enabled, partitioned by date. But I found that when I enable this option, the partition folder name is no longer the date; it is some random two letters. A...

Latest Reply
CkoockieMonster
New Contributor II
  • 13 kudos

Hello, I'm a bit late to the party, but I'll put this here for posterity: there's a way to rename your weird two-letter-named folders and still have your table working, but it violates the good-practice guidelines suggested by Databricks, and I don't thi...

12 More Replies
Mado
by Valued Contributor II
  • 5345 Views
  • 3 replies
  • 0 kudos

Resolved! How to enforce delta table column to have unique values?

Hi, I have defined a delta table with a primary key:

%sql
CREATE TABLE IF NOT EXISTS test_table_pk (
  table_name STRING NOT NULL,
  label STRING NOT NULL,
  table_location STRING NOT NULL,
  CONSTRAINT test_table_pk_col PRIMARY KEY(table_name) ...

Latest Reply
SteveL
New Contributor II
  • 0 kudos

I'm with you. But it DOES make sense, because DBx databases are not application databases. DBx is not intended to be used like this. DBx databases are repositories for any ingested abstract data. Managing the ingestion is for purpose-built databases ...

2 More Replies
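
Context for readers: primary key constraints on Databricks are informational and are not enforced at write time, so uniqueness is usually guaranteed by how you load the table. One common pattern, sketched with the table from the question and a hypothetical `new_rows` source, is MERGE:

```sql
-- Upsert so table_name stays unique: update existing keys, insert new ones
MERGE INTO test_table_pk AS t
USING new_rows AS s
  ON t.table_name = s.table_name
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```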
MRTN
by New Contributor III
  • 5494 Views
  • 3 replies
  • 3 kudos

Resolved! Feature request delta tables : drop duplicate rows

A deltaTable.dropDuplicates(columns) would be a very nice feature, simplifying the complex procedures that are suggested online. Or am I missing any existing procedures that can be done without merge operations or similar?

Latest Reply
MRTN
New Contributor III
  • 3 kudos

I created a feature request in the delta table project: [Feature Request] data deduplication on existing delta table · Issue #1767 · delta-io/delta (github.com)

2 More Replies
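
Until something like `deltaTable.dropDuplicates(columns)` exists, one workaround without MERGE is to rewrite the table from a deduplicated read. This is a sketch that assumes the table is small enough to rewrite; `events`, `id`, and `updated_at` are hypothetical names:

```sql
-- Keep one row per id, preferring the most recent, then replace the table
CREATE OR REPLACE TABLE events AS
SELECT * EXCEPT (rn)
FROM (
  SELECT *,
         row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM events
)
WHERE rn = 1;
```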
sanjay
by Valued Contributor II
  • 5661 Views
  • 8 replies
  • 0 kudos

error after updating delta table com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update

Hi, I have a pipeline running. I updated one file in a delta table which was already processed, and now I am getting the error com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If ...

Latest Reply
Sanjeev_Chauhan
New Contributor II
  • 0 kudos

Hi Sanjay, You can try adding .option("overwriteSchema", "true")

7 More Replies
deng77
by New Contributor III
  • 16637 Views
  • 10 replies
  • 2 kudos

Resolved! Using current_timestamp as a default value in a delta table

I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table. Is it possible to add a column to an existing de...

Latest Reply
pvignesh92
Honored Contributor
  • 2 kudos

-- Alter the table to use the GENERATED ALWAYS functionality for the created_at column
ALTER TABLE example_table ADD COLUMN created_at TIMESTAMP GENERATED ALWAYS AS CURRENT_TIMESTAMP();

@Michael Burch​ Hi, did you try using the GENERATED ALWAYS feature? ...

9 More Replies
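
On newer runtimes there is also a hedged alternative to generated columns: column DEFAULT values, which must be enabled as a table feature first. A sketch, reusing `example_table` from the reply; note the default applies only to inserts that omit the column, and existing rows stay NULL:

```sql
-- Opt the table in to column defaults (upgrades the writer protocol)
ALTER TABLE example_table SET TBLPROPERTIES (
  'delta.feature.allowColumnDefaults' = 'supported'
);

-- Add the column, then give it an insertion-time default
ALTER TABLE example_table ADD COLUMN created_at TIMESTAMP;
ALTER TABLE example_table ALTER COLUMN created_at SET DEFAULT current_timestamp();
```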
Gary_Irick
by New Contributor III
  • 4219 Views
  • 9 replies
  • 12 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this:

ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

Latest Reply
Kaniz
Community Manager
  • 12 kudos

Hi @Gary_Irick, @gongasxavi, @Pete_Cotton, @aleks1601. Certainly, let's address your questions regarding Delta table partition directories and column mapping. Directory Names with Column Mapping: When you enable column mapping in a Delta tabl...

8 More Replies
Leszek
by Contributor
  • 2344 Views
  • 1 replies
  • 2 kudos

IDENTITY columns generating every other number when merging

Hi, I'm doing a merge into my Delta Table which has an IDENTITY column: Id BIGINT GENERATED ALWAYS AS IDENTITY. The inserted data has every other number in the id column, like this: Is this expected behavior? Is there any workaround to make the numbers increase by 1?

Latest Reply
Dataspeaksss
New Contributor II
  • 2 kudos

Were you able to resolve it? I'm facing the same issue.

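
For posterity: identity columns in Delta guarantee unique, increasing values but not consecutive ones; writers reserve ranges of values, so gaps like every-other-number are expected. The step can be declared explicitly, but consecutive IDs are still not guaranteed. A sketch with a placeholder table:

```sql
CREATE TABLE ids_demo (
  Id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  payload STRING
);
```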
DJey
by New Contributor III
  • 4445 Views
  • 5 replies
  • 2 kudos

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta Table looks like below: Now I have incremental data with an additional column, i.e. owner: Dataframe Name --> scdDF. Below is the code snippet to merge the incremental Dataframe into targetTable, but the new...

Latest Reply
DJey
New Contributor III
  • 2 kudos

@Vidula Khanna​ Enabling the below property resolved my issue: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", True). Thanks very much!

4 More Replies
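
The session-level equivalent in SQL, for anyone applying the same fix: enable schema auto-merge before the MERGE, and new source columns such as `owner` are added to the target automatically. The MERGE below is a sketch that assumes `scdDF` has been registered as a temp view and joins on a hypothetical `id` key:

```sql
-- Enable automatic schema evolution for MERGE in this session
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO targetTable AS t
USING scdDF AS s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```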
Greg
by New Contributor III
  • 940 Views
  • 1 replies
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have one delta table that I continuously append events into, and a second delta table that I continuously merge into (streamed from the first table) that has unique IDs whose properties are updated from the events (an ID represents a unique thing that ge...

Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solve this problem?

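
For anyone else hitting this: frequent MERGE updates rewrite data files, and the old versions are kept for time travel, so storage usually only shrinks after tightening the retention windows and running VACUUM. A sketch with illustrative values; `target_table` is a placeholder, and shorter retention reduces your time-travel window:

```sql
ALTER TABLE target_table SET TBLPROPERTIES (
  'delta.logRetentionDuration'         = 'interval 7 days',
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);

-- Physically delete files no longer referenced and older than the retention window
VACUUM target_table;
```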
Christine
by Contributor
  • 3844 Views
  • 8 replies
  • 5 kudos

Resolved! pyspark dataframe empties after it has been saved to delta lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check if the input data already exists in a saved delta table and, if not, it should create some calculations and append the new data to...

Latest Reply
SharathE
New Contributor II
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

7 More Replies
Maksym
by New Contributor III
  • 4936 Views
  • 4 replies
  • 7 kudos

Resolved! Databricks Autoloader is getting stuck and does not pass to the next batch

I have a simple job scheduled every 5 min. Basically it listens for cloud files on a storage account and writes them into a delta table; extremely simple. The code is something like this:

df = (spark
  .readStream
  .format("cloudFiles")
  .option('cloudFil...

Latest Reply
lassebe
New Contributor II
  • 7 kudos

I had the same issue: files would randomly not be loaded. Setting `.option("cloudFiles.useIncrementalListing", False)` seemed to do the trick!

3 More Replies
Matt_L
by New Contributor III
  • 4515 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I'm doing something wrong here. I'm streaming from Kafka -> Delta on EMR/S3FS, and am seeing ever-increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration = '24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...

2 More Replies
suresh1122
by New Contributor III
  • 7956 Views
  • 11 replies
  • 7 kudos

Dataframe takes an unusually long time (around 2 hrs) to save as a delta table using SQL for a very small dataset with 30k rows. Is there a solution for this problem?

I am trying to save a dataframe to a delta table after a series of data manipulations using UDF functions. I tried using this code:

(df
  .write
  .format('delta')
  .mode('overwrite')
  .option('overwriteSchema', 'true')
  .saveAsTable('output_table'))

but this...

Latest Reply
Lakshay
Esteemed Contributor
  • 7 kudos

You should also look into the SQL plan to check whether the writing phase is indeed the part that is taking time. Since Spark works on lazy evaluation, there might be some other phase that is taking the time.

10 More Replies