Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tinai_long
by New Contributor III

Resolved! How to refresh a single table in Delta Live Tables?

Suppose I have a Delta Live Tables framework with 2 tables: Table 1 ingests from a JSON source, Table 2 reads from Table 1 and runs some transformation. In other words, the data flow is JSON source -> Table 1 -> Table 2. Now if I find some bugs in the...

  • 9038 Views
  • 10 replies
  • 4 kudos
Latest Reply
cpayne_vax
New Contributor III

Answering my own question: nowadays (February 2024) this can all be done via the UI. When viewing your DLT pipeline there is a "Select tables for refresh" button in the header. If you click this, you can select individual tables, and then in the botto...

  • 4 kudos
9 More Replies
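
The same selective refresh can also be triggered programmatically. A minimal sketch using the Pipelines REST API's refresh_selection field; the workspace URL, pipeline ID, and token below are placeholders.

    import requests

    host = "https://<your-workspace>.cloud.databricks.com"        # placeholder
    pipeline_id = "<pipeline-id>"                                  # placeholder
    headers = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

    # Start an update that refreshes only the selected tables,
    # leaving the rest of the pipeline untouched.
    resp = requests.post(
        f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        headers=headers,
        json={"refresh_selection": ["table_2"]},
    )
    resp.raise_for_status()
    print(resp.json())  # includes the update_id of the triggered refresh
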
Tahseen0354
by Valued Contributor

Resolved! Getting "Job aborted due to stage failure" SparkException when trying to download full result

I have generated a result using SQL. But whenever I try to download the full result (1 million rows), it throws a SparkException. I can download the preview result but not the full result. Why? What happens under the hood when I try to download ...

  • 22380 Views
  • 9 replies
  • 5 kudos
Latest Reply
ac567
New Contributor II

Job aborted due to stage failure: Task 6506 in stage 46.0 failed 4 times, most recent failure: Lost task 6506.3 in stage 46.0 (TID 12896) (10.**.***.*** executor 12): java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (a...

  • 5 kudos
8 More Replies
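
Downloading a million rows funnels the whole result through the driver, which is what triggers the direct-buffer OutOfMemoryError above. A common workaround, sketched below with placeholder table and path names, is to write the result to storage instead and fetch the files from there.

    # Re-run the query and land the full result in cloud storage.
    df = spark.sql("SELECT * FROM my_catalog.my_schema.my_query_result")  # placeholder query

    (df.coalesce(1)                       # one output file; omit for very large results
       .write.mode("overwrite")
       .option("header", "true")
       .csv("/Volumes/my_catalog/my_schema/exports/full_result"))  # placeholder path
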
brendanc19
by New Contributor III

Resolved! Does cancelling a job run rollback any actions performed by query plan?

If I were to stop a rather large job run, say halfway through execution, will any actions performed on our Delta tables persist, or will they be rolled back? Are there any other risks that I need to be aware of in terms of cancelling a job run halfway t...

  • 3927 Views
  • 6 replies
  • 2 kudos
Latest Reply
fabian_r
New Contributor II

Hi, as of 2024, is there any way to ensure transaction control across tables in the Delta protocol for failing jobs?

  • 2 kudos
5 More Replies
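
To summarize the thread: Delta commits are atomic per table, so cancelling a run mid-flight keeps every transaction that already committed and discards only the in-progress one; there is no automatic cross-table rollback. Individual tables can be rewound by hand, as in this sketch (table name and version number are placeholders).

    # Inspect recent commits to find the version that preceded the cancelled run.
    spark.sql("DESCRIBE HISTORY my_schema.my_table LIMIT 5").show()

    # Rewind the table to that version (42 is a placeholder).
    spark.sql("RESTORE TABLE my_schema.my_table TO VERSION AS OF 42")
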
labromb
by Contributor

How to pass configuration values to a Delta Live Tables job through the Delta Live Tables API

Hi Community, I have successfully run a job through the API but would need to be able to pass parameters (configuration) to the DLT workflow via the API. I have tried passing JSON in this format: { "full_refresh": "true", "configuration": [ ...

  • 12463 Views
  • 10 replies
  • 4 kudos
Latest Reply
Edthehead
Contributor II

You cannot pass parameters from a Databricks job to a DLT pipeline, at least not yet. You can see from the DLT REST API that there is no option for it to accept any parameters. But there is a workaround. With the assumption tha...

  • 4 kudos
9 More Replies
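
A sketch of one such workaround, assuming it follows the common pattern of editing the pipeline's configuration map first and then starting an update so the pipeline code can read the value back with spark.conf.get(); host, token, and pipeline ID are placeholders.

    import requests

    host = "https://<your-workspace>.cloud.databricks.com"        # placeholder
    pipeline_id = "<pipeline-id>"                                  # placeholder
    headers = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

    # Read the current pipeline spec and patch its configuration map.
    spec = requests.get(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers).json()["spec"]
    spec["configuration"] = {**spec.get("configuration", {}), "my_param": "2024-01-01"}
    requests.put(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers, json=spec).raise_for_status()

    # Then trigger an update; inside the pipeline, read the value with
    # spark.conf.get("my_param").
    requests.post(f"{host}/api/2.0/pipelines/{pipeline_id}/updates", headers=headers, json={}).raise_for_status()
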
Data_Engineer3
by Contributor III

Default maximum spark streaming chunk size in delta files in each batch?

When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

  • 2812 Views
  • 5 replies
  • 0 kudos
Latest Reply
NandiniN
Databricks Employee

See the doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html. Also, what is the challenge while using foreachBatch?

  • 0 kudos
4 More Replies
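
Per the linked doc, the Delta streaming source sizes each micro-batch with maxFilesPerTrigger (default 1000 files) and, optionally, maxBytesPerTrigger. A minimal sketch with placeholder paths:

    stream = (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", 500)    # cap each micro-batch at 500 files
        .option("maxBytesPerTrigger", "1g")   # soft cap on bytes per micro-batch
        .load("/mnt/delta/events")            # placeholder source path
    )

    (stream.writeStream.format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events")  # placeholder
        .start("/mnt/delta/events_out"))      # placeholder sink path
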
SRK
by Contributor III

How to handle schema validation for JSON files using Databricks Auto Loader?

Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...

  • 3397 Views
  • 5 replies
  • 7 kudos
Latest Reply
maddy08
New Contributor II

Just to clarify, are you reading from Kafka and writing into ADLS as JSON files? That is, does each message from Kafka become one JSON file in ADLS?

  • 7 kudos
4 More Replies
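
A minimal Auto Loader sketch for the JSON-from-landing pattern discussed above; paths are placeholders. cloudFiles.schemaLocation persists the inferred schema across runs, and schemaEvolutionMode "rescue" routes unexpected fields into _rescued_data instead of failing the stream.

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/events")  # placeholder
        .option("cloudFiles.schemaEvolutionMode", "rescue")
        .load("/mnt/landing/events")  # placeholder
    )

    (df.writeStream.format("delta")
        .option("checkpointLocation", "/mnt/landing/_checkpoints/events")  # placeholder
        .start("/mnt/bronze/events"))  # placeholder
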
brickster_2018
by Databricks Employee
  • 6939 Views
  • 5 replies
  • 1 kudos
Latest Reply
Panda
Valued Contributor

@brickster_2018 @elgeo CRC ensures correctness, recovery, and consistency:
- Checksum verification
- Read validation
- Consistent transactions
- Recovery mechanism
- Commit optimization
It (CRC) ensures data isn't corrupted during storage or transfer, verifies consisten...

  • 1 kudos
4 More Replies
AsfandQ
by New Contributor III

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

  • 16121 Views
  • 7 replies
  • 6 kudos
Latest Reply
Personal1
New Contributor II

I still get the error when I try any method. The column names with spaces throw the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. df1.write.format("delta") \ .mo...

  • 6 kudos
6 More Replies
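
One workaround reported for this error is to make newly created tables default to column mapping, which requires the newer reader/writer protocol versions, before writing. A sketch, assuming the csv_dataframe DataFrame from the post and a placeholder path:

    # Session-level defaults applied to tables created after this point.
    spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
    spark.conf.set("spark.databricks.delta.properties.defaults.minReaderVersion", "2")
    spark.conf.set("spark.databricks.delta.properties.defaults.minWriterVersion", "5")

    # Column names with spaces are now mapped to safe physical names on write.
    csv_dataframe.write.format("delta").save("/path/to/table")  # placeholder path
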
DJey
by New Contributor III

Resolved! MergeSchema Not Working

Hi All, I have a scenario where my existing Delta table looks like below: Now I have incremental data with an additional column, i.e. owner: DataFrame name --> scdDF. Below is the code snippet to merge the incremental DataFrame into targetTable, but the new...

  • 13898 Views
  • 6 replies
  • 2 kudos
Latest Reply
Amin112
New Contributor II

In Databricks Runtime 15.2 and above, you can specify schema evolution in a merge statement using SQL or Delta table APIs:
MERGE WITH SCHEMA EVOLUTION INTO target
USING source
ON source.key = target.key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN I...

  • 2 kudos
5 More Replies
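
On runtimes older than 15.2 that lack MERGE WITH SCHEMA EVOLUTION, a commonly used alternative is the session-level autoMerge setting together with a regular MERGE; a sketch with placeholder table names:

    # Allow MERGE to add new source columns to the target schema.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

    spark.sql("""
        MERGE INTO target t
        USING source s
        ON s.key = t.key
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
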
Valentin1
by New Contributor III

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps:
- Incrementally load data from Table A as a batch.
- If the pipeline has previously...

  • 7368 Views
  • 6 replies
  • 3 kudos
Latest Reply
lprevost
Contributor

I totally agree that this is a gap in the Databricks solution. This gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases) is that I have slowly changing data coming into structured folders via ...

  • 3 kudos
5 More Replies
bgerhardi
by New Contributor III

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. What worries me is this FAQ on identity columns (Delta Live Tables frequently asked questions | Databricks on AWS); it seems to suggest that we basically can't cre...

  • 9765 Views
  • 12 replies
  • 13 kudos
Latest Reply
Pelle123
New Contributor II

I am wondering if there is an answer to this question now.

  • 13 kudos
11 More Replies
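
Identity columns are supported on standard Delta tables (Databricks Runtime 10.4 and above), so one workaround is to generate surrogate keys outside DLT; a hedged sketch with a placeholder table:

    spark.sql("""
        CREATE TABLE IF NOT EXISTS dim_customer (
            customer_sk BIGINT GENERATED ALWAYS AS IDENTITY,  -- surrogate key
            customer_id STRING,
            name        STRING
        ) USING DELTA
    """)
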
YFL
by New Contributor III

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...

  • 6628 Views
  • 11 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

  • 6 kudos
10 More Replies
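
A rough way to estimate the lag the question asks about: compare the reservoirVersion recorded in the query's endOffset against the source table's latest version. The field layout of the offset JSON can vary by runtime, so treat this sketch (with a placeholder path) as an approximation.

    import json
    from delta.tables import DeltaTable

    progress = query.lastProgress              # 'query' is the running StreamingQuery
    end_offset = progress["sources"][0]["endOffset"]
    if isinstance(end_offset, str):            # the offset may arrive as a JSON string
        end_offset = json.loads(end_offset)
    processed_version = end_offset["reservoirVersion"]

    latest_version = (
        DeltaTable.forPath(spark, "/mnt/delta/source_table")  # placeholder path
        .history(1)
        .select("version")
        .collect()[0][0]
    )
    print(f"approximate version lag: {latest_version - processed_version}")
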
ptambe
by New Contributor III

Resolved! Are concurrent writes from multiple Databricks clusters to the same Delta table on S3 supported?

Does Databricks have support for writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know if there is any solution for https://github.com/delta-io/delta/issues/41 implemented in Databricks, OR if you have a...

  • 4884 Views
  • 6 replies
  • 3 kudos
Latest Reply
dennyglee
Databricks Employee

Please note, the issue noted above ("[Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs)") is for Delta Lake OSS. As noted in this issue as well as Issue 324, as of this writing, S3 lacks putIfAbsent transactional consistency. For Del...

  • 3 kudos
5 More Replies
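
On Databricks itself, concurrent S3 writes are handled by its own commit service, as the reply notes. For Delta Lake OSS, multi-cluster S3 writes are supported via the DynamoDB-backed LogStore; a configuration sketch with placeholder table and region values:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")  # placeholder
        .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")     # placeholder
        .getOrCreate()
    )
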
Gary_Irick
by New Contributor III

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this: ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...

  • 9324 Views
  • 9 replies
  • 10 kudos
Latest Reply
talenik
New Contributor III

Hi @Retired_mod, I have a few queries on directory names with column mapping. I have this Delta table on ADLS and I am trying to read it, but I am getting the below error. How can we read Delta tables with column mapping enabled with PySpark? Can you pleas...

  • 10 kudos
8 More Replies
Oliver_Angelil
by Valued Contributor II

Resolved! Confusion about Data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalogue vs Azure Blob

Hi there. It seems there are many different ways to store / manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to relevant pages):
- in a Lakehouse
- in Delta Lake
- on Azure Blob storage
- in the D...

  • 9020 Views
  • 9 replies
  • 6 kudos
Latest Reply
Rahul_S
New Contributor II

Informative.

  • 6 kudos
8 More Replies