cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Anonymous
by Not applicable
  • 1115 Views
  • 4 replies
  • 0 kudos

Objective is to make table unique at ID using group by , concat_ws and collect_list ,combining distinct values in one row.

Objective is to make table unique at ID. Table structure is as in attached image.Query used is : selectID,concat_ws(' & ' , collect_list(Distinct Gender)) as Genderfrom tablegroup by IDIt can be possible if we can order values within collect_list and...

  • 1115 Views
  • 4 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rishabh Shanker​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 0 kudos
3 More Replies
VinayEmmadi
by New Contributor
  • 3642 Views
  • 1 replies
  • 0 kudos

How does hash shuffle join work in Spark?

Hi All, I am trying to understand the internals shuffle hash join. I want to check if my understanding of it is correct. Let’s say I have two tables t1 and t2 joined on column country (8 distinct values). If I set the number of shuffle partitions as ...

  • 3642 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Vinay Emmadi​ : In Spark, a hash shuffle join is a type of join that is used when joining two data sets on a common key. The data is first partitioned based on the join key, and then each partition is shuffled and sent to a node in the cluster. The ...

  • 0 kudos
Tom_Jones
by New Contributor II
  • 10918 Views
  • 3 replies
  • 1 kudos

How to explode an array column and repack the distinct values into one array in DB SQL?

Hi, I am new to DB SQL. I have a table where the array column (cities) contains multiple arrays and some have multiple duplicate values. I need to unpack the array values into rows so I can list the distinct values. The following query works for this...

  • 10918 Views
  • 3 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

try to use SQL windows functions here

  • 1 kudos
2 More Replies
Zii
by New Contributor II
  • 1880 Views
  • 2 replies
  • 2 kudos

Resolved! Delta Live Tables Quality check for distinct Values

Hi All, I have been having an issue identifying how to do a uniqueness check for the quality check. Below is an example. @dlt.expect("origin_not_dup", "origin is distinct from origin") def harmonized_data(): df=dlt.read("raw_data") for col in...

  • 1880 Views
  • 2 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Ramzi Alashabi​ , We haven’t heard from you on the last response from me, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will...

  • 2 kudos
1 More Replies
Artem_Yevtushen
by New Contributor III
  • 1282 Views
  • 1 replies
  • 2 kudos

Show all distinct values per column in dataframe Problem Statement:I want to see all the distinct values per column for my entire table, but a SQL que...

Show all distinct values per column in dataframeProblem Statement:I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write.Use this code to show th...

collect set table
  • 1282 Views
  • 1 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Artem Yevtushenko​ - This is great! Thank you for sharing!

  • 2 kudos
Labels