Hi,I have a date column in a delta table called ADate. I need this in the format YYYYMMDD.In TSQL this is easy. However, I can't seem to be able to do this without splitting the YEAR, MONTH and Day and concatenating them together.Any ideas?
I'll share I'm having a variant of the same issue. I have a varchar field in the form YYYYMMDD which I'm trying to join to another varchar field from another table in the form of MM/DD/YYYY. Does anyone know of a way to do this in SPARK SQL without s...
Rename and drop columns with Delta Lake column mapping. Hi all,Now databricks started supporting column rename and drop.Column mapping requires the following Delta protocols:Reader version 2 or above.Writer version 5 or above.Blog URL##Available in D...
Hi, I am trying to write and create a delta table by enable "delta.columnMapping.mode","name", and the partition is date. But I found that when I enable this option, the partition folder name is not date any more while it is some random two letters.A...
Hello, I'm a bit late to the party, but I'll put that for posterity:There's a way to rename your weird two letter named folders and still have your table working, but it violates the good practices guidelines suggested by Data Bricks, and I don't thi...
Hi,I have defined a delta table with a primary key:%sql
CREATE TABLE IF NOT EXISTS test_table_pk (
table_name STRING NOT NULL,
label STRING NOT NULL,
table_location STRING NOT NULL,
CONSTRAINT test_table_pk_col PRIMARY KEY(table_name)
...
I'm with you. But it DOES make sense because DBx databases are not application databases. DBx is not intended to be used like this. DBx databases are repositories for any ingested abstract data. To manage the ingestion is purpose-built databases ...
A deltaTable.dropDuplicates(columns) would be a very nice feature, simplifying the complex procedures that are suggested online. Or am I missing any existing procedures that can be done withouth merge operations or similar?
I created a feature request in the delta table project: [Feature Request] data deduplication on existing delta table · Issue #1767 · delta-io/delta (github.com)
Hi,I have pipeline running. I have updated one file in delta table which is already processed. Now I am getting errorcom.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: Detected a data update. This is currently not supported. If ...
I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table. Is it possible to add a column to an existing de...
-- Alter the table to use the GENERATED ALWAYS functionality for the created_at column
ALTER TABLE example_table
ADD COLUMN created_at TIMESTAMP GENERATED ALWAYS AS CURRENT_TIMESTAMP();@Michael Burch​ Hi , Did you try using GENERATED ALWAYS feature. ...
I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this:ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...
Hi @Gary_Irick, @gongasxavi , @Pete_Cotton , @aleks1601 ,
Certainly, let’s address your questions regarding Delta table partition directories and column mapping.
Directory Names with Column Mapping:
When you enable column mapping in a Delta tabl...
Hi,I'm doing merge to my Delta Table which has IDENTITY column:Id BIGINT GENERATED ALWAYS AS IDENTITYInserted data has in the id column every other number, like this:Is this expected behavior? Is there any workaround to make number increasing by 1?
Hi All, I have a scenario where my Exisiting Delta Table looks like below:Now I have an incremental data with an additional column i.e. owner:Dataframe Name --> scdDFBelow is the code snippet to merge Incremental Dataframe to targetTable, but the new...
I have 1 delta table that I continuously append events into, and a 2nd delta table that I continuously merge into (streamed from the 1st table) that has unique ID's where properties are updated from the events (An ID represents a unique thing that ge...
Hi, I am facing a problem that I hope to get some help to understand. I have created a function that is supposed to check if the input data already exist in a saved delta table and if not, it should create some calculations and append the new data to...
I have a simple job scheduled every 5 min. Basically it listens to cloudfiles on storage account and writes them into delta table, extremely simple. The code is something like this:df = (spark
.readStream
.format("cloudFiles")
.option('cloudFil...
Using OSS Delta, hopefully this is the right forum for this question:Hey all, I could use some help as I feel like I’m doing something wrong here.I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing ever-increasingly slow batches.When looking...
Found the answer through the Slack user group, courtesy of an Adam Binford.I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...
I am trying to save a dataframe after a series of data manipulations using Udf functions to a delta table. I tried using this code( df .write .format('delta') .mode('overwrite') .option('overwriteSchema', 'true') .saveAsTable('output_table'))but this...
You should also look into the sql plan if the writing phase is indeed the part that is taking time. Since spark works on lazy evaluation, there might be some other phase that might be taking time