Data Engineering

Forum Posts

qwerty1
by Contributor
  • 367 Views
  • 0 replies
  • 0 kudos

Why am I not able to view all table properties?

We have a live streaming table created using the command CREATE OR REFRESH STREAMING LIVE TABLE foo TBLPROPERTIES ( "pipelines.autoOptimize.zOrderCols" = "c1,, c2, c3, c4", "delta.randomizeFilePrefixes" = "true" ); But when I run the show table propert...

hello_world
by New Contributor III
  • 2128 Views
  • 3 replies
  • 6 kudos

Resolved! What exactly is Z Ordering and Bloom Filter?

I have gone through the documentation but still cannot understand it. How is Bloom filter indexing a column different from Z-ordering a column? Can somebody explain what exactly happens when these two techniques are applied?

Latest Reply
Rishabh264
Honored Contributor II
  • 6 kudos

Hey @Daniel Sahal, 1 - A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. Refer to this code snippet to create a Bloom filter: CREATE BLOOMFILTER INDEX ON [TAB...

  • 6 kudos
2 More Replies
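To make the Bloom filter idea in the reply above concrete, here is a minimal pure-Python sketch. It is a toy, not the index Databricks builds: the `BloomFilter` class, the file names, and the sample values are all hypothetical. It illustrates the one property data skipping relies on: a filter can say "definitely not in this file", and it never misses a value that was actually added.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash probes into a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, value):
        # Derive k positions by salting the value with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


# One filter per (hypothetical) data file, built over a text column.
files = {
    "file_a": ["alice@example.com", "bob@example.com"],
    "file_b": ["carol@example.com", "dave@example.com"],
}
filters = {}
for name, values in files.items():
    bf = BloomFilter()
    for v in values:
        bf.add(v)
    filters[name] = bf


def files_to_scan(value):
    """Return the files that cannot be ruled out for an equality lookup."""
    return [name for name, bf in filters.items() if bf.might_contain(value)]


print(files_to_scan("carol@example.com"))
```

The lookup always includes `file_b`, because a Bloom filter has no false negatives. Other files are skipped unless a false positive occurs, which becomes more likely as the bit array fills up.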
numersoz
by New Contributor III
  • 1915 Views
  • 5 replies
  • 10 kudos

Is ZORDER required after table overwrite?

Hi, After appending new values to a delta table, I need to delete duplicate rows. After deleting duplicate rows using PySpark, I overwrite the table (keeping the schema). My question is, do I have to do ZORDER again? Another question, is there another wa...

Latest Reply
DeepakMakwana74
New Contributor III
  • 10 kudos

Hi @Nurettin Ersoz, try loading the data incrementally so that duplicates are avoided; you can run a full load once if your data has updates.

  • 10 kudos
4 More Replies
NOOR_BASHASHAIK
by Contributor
  • 2035 Views
  • 5 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears, I was trying to check what Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp value (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @NOOR BASHA SHAIK, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

  • 4 kudos
4 More Replies
leos1
by New Contributor II
  • 777 Views
  • 2 replies
  • 0 kudos

Resolved! Question regarding ZORDER option of OPTIMIZE

Is the order of the columns in ZORDER important? For example, does ZORDER BY (product, site) and ZORDER BY (site, product) produce the same results?

Latest Reply
leos1
New Contributor II
  • 0 kudos

thanks for the quick reply

  • 0 kudos
1 More Replies
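As a rough illustration of the column-order question above, the sketch below computes a textbook Z-order (Morton) code by interleaving the bits of two integer columns. This is an assumption-laden toy: `interleave_bits` is a hypothetical helper, and nothing here confirms how `OPTIMIZE ... ZORDER BY` is implemented or whether both orderings give identical file layouts.

```python
def interleave_bits(a, b, bits=8):
    """Morton code: alternate the bits of a and b, with a's bit above b's."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (2 * i + 1)  # a fills the odd positions
        code |= ((b >> i) & 1) << (2 * i)      # b fills the even positions
    return code


# Hypothetical encoded values: product = 3, site = 5.
print(interleave_bits(3, 5))  # 27 (binary 011011)
print(interleave_bits(5, 3))  # 39 (binary 100111)
```

In this toy, swapping the arguments changes the individual codes, but the bits of both columns still alternate, so neither column is relegated to a pure tie-breaker the way the second column is in a plain multi-column sort.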
Anonymous
by Not applicable
  • 1061 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

If you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY. You can specify multiple columns for ZORDER BY as a comma-separated list. However,...

  • 0 kudos
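The effect of Z-ordering on several predicate columns can be sketched with a small simulation. The assumptions are loud ones: a 16 x 16 grid of integer rows `(x, y)`, 16 rows per "file", and file skipping based purely on per-file min/max values. The helpers `interleave_bits`, `split_into_files`, and `files_scanned` are hypothetical and do not reproduce Delta Lake's actual statistics or file sizing.

```python
from itertools import product


def interleave_bits(a, b, bits=4):
    """Morton code: alternate the bits of a and b, with a's bit above b's."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (2 * i + 1)
        code |= ((b >> i) & 1) << (2 * i)
    return code


def split_into_files(rows, rows_per_file=16):
    """Cut an ordered list of rows into fixed-size chunks ("files")."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_scanned(files, col, value):
    """Count files whose min/max range on column `col` could hold `value`."""
    return sum(
        1 for f in files
        if min(r[col] for r in f) <= value <= max(r[col] for r in f)
    )


rows = list(product(range(16), range(16)))  # every (x, y) pair once

# Layout 1: plain sort by x, then y.
by_x = split_into_files(sorted(rows))
# Layout 2: sort by the interleaved Z-value of (x, y).
by_z = split_into_files(sorted(rows, key=lambda r: interleave_bits(r[0], r[1])))

print(files_scanned(by_x, 0, 5), files_scanned(by_x, 1, 5))  # 1 16
print(files_scanned(by_z, 0, 5), files_scanned(by_z, 1, 5))  # 4 4
```

Sorting on `x` alone is ideal for filters on `x` but useless for filters on `y`. The Z-ordered layout gives up some skipping on `x` in exchange for skipping on `y` as well, which is the trade-off to weigh when adding more columns to ZORDER BY.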
User16826992666
by Valued Contributor
  • 1184 Views
  • 1 reply
  • 0 kudos

Resolved! Should I use Z Ordering on my Delta table every time I run Optimize?

Wondering if it always makes sense or if there are some situations where you might only want to run optimize

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

It's a good idea to run OPTIMIZE at the end of each batch job to avoid a small-files situation. Z-ordering is optional and can be applied to a few non-partition columns that are frequently used in read operations. ZORDER BY -> Colocate column information in the sam...

  • 0 kudos