Data Engineering

Forum Posts

ThomazRossito
by New Contributor II
  • 9 Views
  • 0 replies
  • 0 kudos

Lakehouse Federation - Databricks

In the world of data, innovation is constant, and the most recent revolution comes with Lakehouse Federation: a fusion between data lakes and data warehouses that takes data manipulation to a new level. This advancement...

Data Engineering
data engineer
Lakehouse
SQL Analytics
Jotav93
by New Contributor
  • 71 Views
  • 1 reply
  • 0 kudos

Move a Delta table from a non-UC metastore to a UC metastore, preserving history

Hi, I am using Azure Databricks and we recently enabled UC in our workspace. We have some tables in our non-UC metastore that we want to move to a UC-enabled metastore. Is there any way we can move these tables without losing the Delta table history...

Data Engineering
delta
unity
Latest Reply
ThomazRossito
New Contributor II
  • 0 kudos

Hello, it is possible to get the expected result with dbutils.fs.cp("Origin location", "Destination location", True) and then creating the table with the LOCATION set to the destination location. Hope this helps.
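
For reference, a minimal sketch of that approach, with hypothetical source and destination paths and a placeholder UC table name; copying the whole table directory (including _delta_log) is what preserves the history:

# Copy the entire table directory, including _delta_log, so the Delta history travels with the data.
dbutils.fs.cp("dbfs:/mnt/legacy/events", "abfss://data@myaccount.dfs.core.windows.net/uc/events", True)

# Register the copied directory as an external table in Unity Catalog.
spark.sql("""
CREATE TABLE my_catalog.my_schema.events
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/uc/events'
""")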

brian_zavareh
by New Contributor III
  • 1285 Views
  • 5 replies
  • 4 kudos

Resolved! Optimizing Delta Live Table Ingestion Performance for Large JSON Datasets

I'm currently facing challenges with optimizing the performance of a Delta Live Table pipeline in Azure Databricks. The task involves ingesting over 10 TB of raw JSON log files from an Azure Data Lake Storage account into a bronze Delta Live Table la...

Data Engineering
autoloader
bigdata
delta-live-tables
json
Latest Reply
standup1
New Contributor II
  • 4 kudos

Hey @brian_zavareh, see this document; I hope it can help: https://learn.microsoft.com/en-us/azure/databricks/compute/cluster-config-best-practices. Just keep in mind that there's some extra cost on the Azure VM side; check your Azure Cost Analysis for...
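
As a companion to the cluster-sizing link, a minimal Auto Loader bronze table sketch for this kind of JSON ingestion; the landing path and the throttling value are assumptions, not settings from the thread:

import dlt

@dlt.table(name="bronze_logs", comment="Raw JSON logs ingested with Auto Loader")
def bronze_logs():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.maxBytesPerTrigger", "10g")  # hypothetical cap to keep micro-batches manageable
        .load("abfss://logs@myaccount.dfs.core.windows.net/raw/")  # placeholder landing path
    )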

4 More Replies
Spenyo
by New Contributor II
  • 110 Views
  • 1 reply
  • 1 kudos

Delta table size not shrinking after Vacuum

Hi team. Every day we overwrite the last X months of data in our tables, so each day generates a larger amount of history. We don't use time travel, so we don't need it. What we did: SET spark.databricks.delta.retentionDurationCheck.enabled = false ALT...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Spenyo, consider increasing the retention duration if you need to retain historical data for longer periods. If you're not using time travel, you can set a retention interval of at least 7 days to strike a balance between history retention and st...
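
A minimal sketch of the shortened-retention route, assuming a placeholder table name and that losing time travel beyond the interval is acceptable:

# Only needed when retaining less than the 7-day default safety window.
spark.sql("SET spark.databricks.delta.retentionDurationCheck.enabled = false")

spark.sql("""
ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 7 days',
  'delta.deletedFileRetentionDuration' = 'interval 1 days'
)
""")

# Physically remove data files older than the retention window.
spark.sql("VACUUM my_schema.my_table RETAIN 24 HOURS")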

SyedSaqib
by New Contributor II
  • 143 Views
  • 2 replies
  • 0 kudos

Delta Live Table : [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view

Hi, I have a Delta Live Table workflow with storage enabled for cloud storage to a blob store. Syntax of the bronze table in the notebook: @dlt.table(spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}, table_properties = {"quality": "bron...

Latest Reply
SyedSaqib
New Contributor II
  • 0 kudos

Hi Kaniz, thanks for replying back. I am using Python for Delta Live Table creation, so how can I set these configurations? "When creating the table, add the IF NOT EXISTS clause to tolerate pre-existing objects. Consider using the OR REFRESH clause." Answe...
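
For what it's worth, in Python the @dlt.table decorator already behaves like CREATE OR REFRESH, so this error often comes down to the target name already existing outside the pipeline or being defined twice within it. A sketch based on the snippet in the question, with placeholder names:

import dlt

@dlt.table(
    name="bronze_events",  # must be unique across the whole pipeline
    spark_conf={"spark.databricks.delta.schema.autoMerge.enabled": "true"},
    table_properties={"quality": "bronze"},
)
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/placeholder/landing/path")  # placeholder path
    )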

1 More Replies
Anandsingh
by New Contributor
  • 149 Views
  • 1 reply
  • 0 kudos

Writing to multiple files/tables from data held within a single file through autoloader

I have a requirement to read and parse JSON files using Auto Loader, where the incoming JSON file has multiple sub-entities. Each sub-entity needs to go into its own Delta table. Alternatively, we can write each entity's data to individual files. We can use D...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

I think using DLT's medallion architecture should be helpful in this scenario. You can write all the incoming data to one bronze table and one silver table, and then have multiple gold tables based on the value of the sub-entities.
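
A minimal sketch of that fan-out, assuming a hypothetical entity_type column and example sub-entity names:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_raw")
def bronze_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/placeholder/landing")  # placeholder path
    )

for entity in ["orders", "customers"]:  # hypothetical sub-entities
    @dlt.table(name=f"gold_{entity}")
    def gold_entity(entity=entity):  # default arg pins the loop variable
        return dlt.read_stream("bronze_raw").where(F.col("entity_type") == entity)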

Kavi_007
by New Contributor III
  • 947 Views
  • 7 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm able to see history beyond 7 days. I tried restarting the cluster, but it's still not working. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III
  • 1 kudos

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...
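
That distinction is also why history is still visible here: DESCRIBE HISTORY reads the transaction log, which VACUUM never touches (log retention is governed separately by delta.logRetentionDuration). A sketch with a placeholder table name:

spark.sql("VACUUM main.sales.orders RETAIN 168 HOURS")  # removes only stale data files
spark.sql("DESCRIBE HISTORY main.sales.orders").show()  # still lists old versions from the log,
                                                        # but time travel to vacuumed versions will fail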

6 More Replies
kurokaj
by New Contributor
  • 163 Views
  • 1 reply
  • 0 kudos

DLT Autoloader stuck in reading Avro files from Azure blob storage

I have a DLT pipeline joining data from streaming tables to metadata of Avro files located in Azure blob storage. The Avro files are loaded using Auto Loader. Up until 25.3 (about 20:00 UTC) the pipeline worked fine, but then it suddenly got stuck in ini...

Data Engineering
autoloader
AVRO
dlt
LTS
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @kurokaj, if the schema of the input data changes while an update is running, the update may be logged as CANCELED and automatically retried. Ensure that there haven't been any unexpected schema changes in your Avro files during the problematic ...
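
If schema drift turns out to be the cause, one way to keep the stream from stalling is to pin Auto Loader's evolution behavior; a sketch with a placeholder path:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "avro")
    .option("cloudFiles.schemaEvolutionMode", "rescue")  # unexpected columns land in _rescued_data instead of failing the stream
    .load("abfss://container@myaccount.dfs.core.windows.net/avro/")  # placeholder path
)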

EDDatabricks
by Contributor
  • 185 Views
  • 1 reply
  • 0 kudos

Expected size of managed Storage Accounts

Dear all, we are monitoring the size of the managed storage accounts associated with our deployed Azure Databricks instances. We have 5 Databricks instances for specific components of our platform, replicated across 4 environments (DEV, TEST, PREPROD, PROD). Dur...

Data Engineering
Filesize
LOGS
Managed Storage Account
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EDDatabricks, let's address your questions regarding Azure-managed storage accounts. What do these storage accounts contain? An Azure storage account contains various data objects, including: Blobs: used for storing unstructured data like ima...

vijay_boopathy
by New Contributor
  • 259 Views
  • 1 replies
  • 0 kudos

Hive vs Delta

I'm curious about your experiences with Hive and Delta Lake. What are the advantages of using Delta over Hive, and in what scenarios would you recommend choosing Delta for data processing tasks? I'd appreciate any insights or recommendations based on...

Latest Reply
Walter_C
Valued Contributor II
  • 0 kudos

Delta Lake offers several advantages over Hive. One of the key benefits is its design for petabyte-scale data lakes with streaming and fast access at the forefront. This makes it more suitable for near-real-time streams, unlike Hive. Delta Lake also ...

sharma_kamal
by New Contributor III
  • 387 Views
  • 2 replies
  • 1 kudos

Resolved! Getting errors while reading data from URL

I'm encountering some issues while trying to read a public dataset from a URL using Databricks. Here's the code snippet (along with errors) I'm working with. I'm confused about the Delta format error here. When I read data from a URL, how would it have a D...

Latest Reply
MuthuLakshmi
New Contributor III
  • 1 kudos

@sharma_kamal Please disable the formatCheck in the notebook and check whether you can read the data. The configuration command %sql SET spark.databricks.delta.formatCheck.enabled=false will disable the format check for Delta tables in Databricks. Databrick...
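
A sketch of that workaround end to end, with a placeholder URL; note that Spark itself doesn't read http(s) paths, so the fetch goes through pandas first:

spark.sql("SET spark.databricks.delta.formatCheck.enabled = false")

import pandas as pd

pdf = pd.read_csv("https://example.com/public_dataset.csv")  # placeholder public dataset
df = spark.createDataFrame(pdf)
display(df)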

1 More Replies
hyedesign
by New Contributor II
  • 313 Views
  • 3 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like:

from pyspark.sql import functions as F

def upsert_source_one(self
    df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
hyedesign
New Contributor II
  • 0 kudos

Using sample data sets. Here is the full code. This error does seem to be related to runtime version 15.

df_source = spark.readStream.format("delta").table("`cat1`.`bronze`.`officer_info`")
df_orig_state = spark.read.format("delta").table("`sample-db`....
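
For comparison, a minimal foreachBatch upsert that follows the same shape as the tutorial pattern; the silver table name and the id key column are placeholders, not the poster's exact schema:

from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # Merge each micro-batch into the target table.
    target = DeltaTable.forName(batch_df.sparkSession, "cat1.silver.officer_info")  # placeholder target
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")  # assumed key column
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream.format("delta").table("cat1.bronze.officer_info")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/placeholder/checkpoint")
    .start()
)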

2 More Replies
EDDatabricks
by Contributor
  • 434 Views
  • 2 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, we have a PySpark streaming job (DBR 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR 14.3) that runs every 15 minutes and in some cases may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EDDatabricks, thank you for providing the details about your PySpark streaming and batch jobs operating on a Delta table. The concurrency issue you're encountering seems to be related to the deletion of records from your Delta table (TableA) du...
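
One common mitigation for this class of conflict is to keep the batch job's predicate explicitly disjoint from the partitions the stream is appending to; a sketch, assuming TableA is partitioned by a hypothetical event_date column:

# Restricting the delete to closed partitions means its read set cannot
# overlap the partitions currently receiving streaming appends.
spark.sql("""
DELETE FROM TableA
WHERE event_date <= date_sub(current_date(), 1)
""")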

1 More Replies
Mkk1
by New Contributor
  • 294 Views
  • 1 reply
  • 0 kudos

Joining tables across DLT pipelines

How can I join a silver table (S1) from a DLT pipeline (D1) to another silver table (S2) from a different DLT pipeline (D2)? #DLT #DeltaLiveTables

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Mkk1, to join a silver table from one Delta Live Tables (DLT) pipeline to another silver table from a different DLT pipeline, you can follow these steps: Read the silver tables: in your DLT pipeline code, read the silver tables you want to jo...
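
A minimal sketch of that join, assuming hypothetical Unity Catalog names for the tables published by D1 and D2 and an assumed id join key; tables from other pipelines are read through the catalog rather than dlt.read:

import dlt

@dlt.table(name="joined_silver")
def joined_silver():
    s1 = spark.read.table("main.pipeline_d1.silver_one")  # placeholder: S1, published by D1
    s2 = spark.read.table("main.pipeline_d2.silver_two")  # placeholder: S2, published by D2
    return s1.join(s2, "id")  # assumed join key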

JoseMacedo
by New Contributor II
  • 252 Views
  • 3 replies
  • 0 kudos

How to cache on 500 billion rows

Hello! I'm using a serverless SQL warehouse on Databricks and I have a dataset in a Delta table that has 500 billion rows. I'm trying to filter it down to around 7 billion rows and then cache that dataset to use in other queries and make them run faster. When I ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I missed the 'serverless SQL' part. CACHE is for Spark; I don't think it works for serverless SQL. Here is how caching works on DBSQL.
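
When a cache isn't available, one workaround with a similar effect is to materialize the filtered subset once and point the other queries at the smaller table; a sketch with placeholder names and filter (the inner statement can also be run directly on the SQL warehouse):

spark.sql("""
CREATE OR REPLACE TABLE analytics.big_table_filtered AS
SELECT *
FROM analytics.big_table
WHERE event_date >= '2024-01-01'  -- placeholder filter that keeps ~7B of the 500B rows
""")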

2 More Replies