Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1183 Views
  • 1 reply
  • 0 kudos

Resolved! How much space does the metadata for a Delta table take up?

If you have a lot of transactions in a table, it seems like the Delta log keeping track of all those transactions would get pretty large. Does the size of the metadata become a problem over time?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, the size of the metadata can become a problem over time, though not because of performance but because of storage costs. Delta performance will not degrade due to the size of the metadata, but your cloud storage bill can increase. By default, Delta h...

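A minimal sketch of how that retention is controlled, assuming a hypothetical Delta table named events (the interval values are illustrative, not recommendations): delta.logRetentionDuration bounds how long transaction-log entries are kept, and delta.deletedFileRetentionDuration bounds how long removed data files survive before VACUUM can delete them.

    spark.sql("""
        ALTER TABLE events SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 30 days',          -- keep log entries for 30 days
            'delta.deletedFileRetentionDuration' = 'interval 7 days'    -- keep tombstoned data files for 7 days
        )
    """)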
User16830818524
by New Contributor II
  • 929 Views
  • 1 reply
  • 0 kudos

Is it possible to read a Delta table directly using Koalas?

Can I read a Delta table directly using Koalas or do I need to read using Spark and then convert the Spark dataframe to a Koalas dataframe?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, you can use the "read_delta" function; see the Koalas documentation.

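For reference, a minimal sketch (the table path is hypothetical):

    import databricks.koalas as ks

    # Read the Delta table straight into a Koalas DataFrame, no Spark-to-Koalas conversion needed
    kdf = ks.read_delta("/mnt/delta/events")

    # read_delta also supports time travel, e.g. a specific table version
    kdf_v0 = ks.read_delta("/mnt/delta/events", version="0")

On newer runtimes Koalas has been folded into pyspark.pandas, where the equivalent call is pyspark.pandas.read_delta.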
User16826992666
by Valued Contributor
  • 1199 Views
  • 1 reply
  • 0 kudos

Resolved! If I create a shallow clone of a Delta table, then add data to the clone, where is that data stored?

Since a shallow clone only copies the metadata of the original table, I'm wondering where new data would end up. Is it even possible to add data to a shallow clone? Is the data written back to the original source file location?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Shallow clones are really useful for short-lived use cases such as testing and experimentation. A shallow clone duplicates the metadata from the source table, and any new data added goes to the location specified when creating the clone. > Is the da...

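A small sketch of that behavior, with hypothetical table names; DESCRIBE DETAIL shows the clone's own location, which is where appended files land:

    spark.sql("CREATE TABLE events_clone SHALLOW CLONE events")

    # Appends go to events_clone's storage location, not back to the source table's directory
    new_rows = spark.createDataFrame([(1, "2023-01-01")], ["id", "event_date"])
    new_rows.write.format("delta").mode("append").saveAsTable("events_clone")

    spark.sql("DESCRIBE DETAIL events_clone").select("location").show(truncate=False)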
User16826992666
by Valued Contributor
  • 1505 Views
  • 1 reply
  • 0 kudos

Resolved! If I create a clone of a Delta table, does it stay in sync with the original table?

Basically wondering what happens to the clone when updates are made to the original Delta table. Will the changes apply to the cloned table as well?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The clone is not a replica, so updates made to the original Delta table won't be applied to the clone. However, shallow clones reference data files in the source directory. If you run vacuum on the source table, clients will no longer be able t...

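A quick way to confirm this, as a sketch with a hypothetical single-column table:

    spark.sql("CREATE TABLE events (id BIGINT) USING delta")
    spark.sql("INSERT INTO events VALUES (1), (2)")
    spark.sql("CREATE TABLE events_clone SHALLOW CLONE events")

    spark.sql("INSERT INTO events VALUES (3)")          # write to the source after cloning
    assert spark.table("events_clone").count() == 2     # the clone still sees only the original two rows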
User16826992666
by Valued Contributor
  • 3561 Views
  • 2 replies
  • 0 kudos

Resolved! Can multiple streams write to a Delta table at the same time?

Wondering if there are any dangers to doing this, and if it's a best practice. I'm concerned there could be conflicts, but I'm not sure how Delta would handle it.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

> Can multiple streams write to a Delta table at the same time?
Yes, Delta uses optimistic concurrency control and configurable isolation levels.
> I'm concerned there could be conflicts but I'm not sure how Delta would handle it.
Write operations can resul...

1 More Replies
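As an illustration, two independent streams can append to the same Delta path as long as each has its own checkpoint location (the paths and the rate source are hypothetical stand-ins):

    s1 = (spark.readStream.format("rate").load()
          .writeStream.format("delta")
          .option("checkpointLocation", "/tmp/ckpt/stream1")   # one checkpoint per stream
          .start("/tmp/delta/events"))

    s2 = (spark.readStream.format("rate").load()
          .writeStream.format("delta")
          .option("checkpointLocation", "/tmp/ckpt/stream2")   # never shared between streams
          .start("/tmp/delta/events"))

Blind appends like these rarely conflict; conflicts mostly arise when concurrent writers update, delete, or compact the same files.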
User16826994223
by Honored Contributor III
  • 867 Views
  • 1 reply
  • 0 kudos

Delta concurrent write issue

What is the concurrency issue in Delta? If we try to write to the same Delta table at the same time, it sometimes fails. How do we mitigate that?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Delta Lake uses optimistic concurrency control to provide transactional guarantees between writes.
Read: Reads (if needed) the latest available version of the table to identify which files need to be modified (that is, rewritten).
Write: Stages all th...

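One common mitigation is to catch the conflict exception and retry with backoff. A hedged sketch, assuming a plain append to a hypothetical path and the delta-spark Python package's delta.exceptions module:

    import time
    from delta.exceptions import ConcurrentAppendException

    def append_with_retry(df, path, max_retries=3):
        for attempt in range(max_retries):
            try:
                df.write.format("delta").mode("append").save(path)
                return
            except ConcurrentAppendException:
                time.sleep(2 ** attempt)   # back off, then let Delta re-validate against the new table version
        raise RuntimeError(f"Write to {path} failed after {max_retries} attempts")

Partitioning the table so concurrent jobs touch disjoint partitions also reduces the chance of a conflict in the first place.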
User16826992666
by Valued Contributor
  • 1685 Views
  • 1 reply
  • 0 kudos

Resolved! Should I use Z Ordering on my Delta table every time I run Optimize?

Wondering if it always makes sense, or if there are some situations where you might only want to run OPTIMIZE.

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

It's a good idea to run OPTIMIZE at the end of each batch job to avoid a small-files situation. Z-Ordering is optional and can be applied to a few non-partition columns that are used frequently in read operations.
ZORDER BY -> Colocate column information in the sam...

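In SQL form, the two operations look like this (the table and column names are hypothetical):

    spark.sql("OPTIMIZE events")                           # compaction only: rewrites small files into larger ones
    spark.sql("OPTIMIZE events ZORDER BY (event_date)")    # compaction plus co-locating rows by a frequent filter column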
jose_gonzalez
by Moderator
  • 23435 Views
  • 1 reply
  • 0 kudos

Resolved! What's the difference between mode("append") and mode("overwrite") on my Delta table

I would like to know the difference between .mode("append") and .mode("overwrite") when writing my Delta table.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Mode "append" atomically adds new data to an existing Delta table and "overwrite" atomically replaces all of the data in a table.

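A minimal illustration against a hypothetical path:

    df = spark.range(5)
    df.write.format("delta").mode("append").save("/tmp/delta/demo")     # adds 5 rows to whatever is already there
    df.write.format("delta").mode("overwrite").save("/tmp/delta/demo")  # table now contains only these 5 rows

Both are single atomic commits to the Delta log, and overwrite keeps the previous version available for time travel until it is vacuumed.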
jose_gonzalez
by Moderator
  • 5727 Views
  • 1 reply
  • 0 kudos

Resolved! How can I read a specific Delta table part file?

Is there a way to read a specific part file of a Delta table? When I try to read the Parquet file as Parquet, I get an error in the notebook that I'm using the incorrect format as it's part of a Delta table. I just want to read a single Parquet file, not ...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

To read as Parquet, you need to disable the Delta format check by setting the following Spark setting to false:
>> SET spark.databricks.delta.formatCheck.enabled=false
OR
>> spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
It's not recommended to re...

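Putting that together as a sketch (the part-file path is a hypothetical placeholder):

    # Not recommended outside debugging: this bypasses Delta's transaction log
    spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
    df = spark.read.parquet("/mnt/delta/events/part-00000-xxxx.snappy.parquet")
    spark.conf.set("spark.databricks.delta.formatCheck.enabled", "true")   # re-enable when done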
jose_gonzalez
by Moderator
  • 5314 Views
  • 1 reply
  • 0 kudos

Resolved! How to get the size of my Delta table

I would like to know how to get the total size of my Delta table.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

The following KB shows a step-by-step example of how to get the size of a Delta table: https://kb.databricks.com/sql/find-size-of-table.html

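One way to get it, as a sketch on a hypothetical table, is reading sizeInBytes from DESCRIBE DETAIL:

    size_in_bytes = (spark.sql("DESCRIBE DETAIL events")
                     .select("sizeInBytes")
                     .collect()[0][0])
    print(f"Table size: {size_in_bytes / (1024 ** 3):.2f} GiB")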
User16789201666
by Contributor II
  • 1026 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16789201666
Contributor II
  • 0 kudos

There isn't a problem purging old data. When using Auto Loader, it will take into account new data being added.

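For context, a minimal Auto Loader sketch (the source format and paths are hypothetical): the cloudFiles source checkpoints which input files it has already processed, so removing aged-out source files does not disturb the stream.

    stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")       # format of the incoming raw files
              .load("/mnt/raw/events")
              .writeStream
              .format("delta")
              .option("checkpointLocation", "/mnt/chk/events")
              .start("/mnt/delta/events"))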
jose_gonzalez
by Moderator
  • 1111 Views
  • 1 reply
  • 0 kudos

How often should I vacuum my Delta table?

I would like to know how often I need to vacuum my Delta table to clean up old files.

Latest Reply
RonanStokes_DB
New Contributor III
  • 0 kudos

The requirements for VACUUM will depend on your application needs and the rate of arrival of new data. Vacuuming removes old versions of data. If you need to be able to query earlier versions of data many months after the original ingest time, then i...

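For reference, the command itself (the table name is hypothetical; the default retention window is 7 days):

    spark.sql("VACUUM events")                      # removes files outside the default 7-day window
    spark.sql("VACUUM events RETAIN 720 HOURS")     # keep 30 days of history for time travel instead

Retention below the default requires extra care (and a safety-check override), since it can break time travel and in-flight readers.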
jose_gonzalez
by Moderator
  • 1746 Views
  • 2 replies
  • 0 kudos

How to partition my Delta table?

I would like to follow best practices to partition my Delta table. Should I partition by unique ID or date?

Latest Reply
RonanStokes_DB
New Contributor III
  • 0 kudos

Depending on the amount of data per partition, you may also want to consider partitioning by week, month, or quarter. The partitioning decision is often tied to the tiering model of data storage. For a Bronze ingest layer, the optimal partitioning is ...

1 More Replies
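A sketch of date partitioning on ingest, with hypothetical column and path names (spark is the ambient session in a Databricks notebook):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("2023-01-15 10:00:00", 42)], ["ts", "value"])

    (df.withColumn("event_date", F.to_date("ts"))    # derive the partition column
       .write.format("delta")
       .partitionBy("event_date")                    # swap for a coarser column (week/month) if daily partitions are small
       .mode("append")
       .save("/mnt/delta/events"))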
Anonymous
by Not applicable
  • 3196 Views
  • 1 reply
  • 0 kudos

Resolved! Backfill Delta table

What is the recommended way to backfill a Delta table using a series of smaller date-partitioned jobs?

Latest Reply
User16783855117
Contributor II
  • 0 kudos

Another approach you might consider is creating a template notebook that queries a known date range via widgets, for example two date widgets: start time and end time. From there you could use Databricks Jobs to update these parameters for each ru...

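A sketch of that template-notebook pattern, with hypothetical widget names and paths; a Databricks Job can then pass a different start_time/end_time pair for each backfill run:

    dbutils.widgets.text("start_time", "2023-01-01")
    dbutils.widgets.text("end_time", "2023-01-08")

    start = dbutils.widgets.get("start_time")
    end = dbutils.widgets.get("end_time")

    (spark.read.format("parquet").load("/mnt/raw/events")
       .where(f"event_date >= '{start}' AND event_date < '{end}'")   # one date slice per job run
       .write.format("delta").mode("append").save("/mnt/delta/events"))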