Data Engineering

Forum Posts

Sorted by:

by Anonymous • Not applicable

02-08-2023 3:45:09 AM

2462 Views
4 replies
0 kudos

Objective is to make table unique at ID using group by , concat_ws and collect_list ,combining distinct values in one row.

Objective is to make table unique at ID. Table structure is as in attached image.Query used is : selectID,concat_ws(' & ' , collect_list(Distinct Gender)) as Genderfrom tablegroup by IDIt can be possible if we can order values within collect_list and...

Data Engineering

2462 Views
4 replies
0 kudos

02-08-2023 3:45:09 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-10-2023 1:37:37 AM

0 kudos

Hi @Rishabh Shanker Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

0 kudos

04-10-2023 1:37:37 AM

3 More Replies

by VinayEmmadi • New Contributor

01-25-2023 10:57:45 AM

8732 Views
1 replies
1 kudos

How does hash shuffle join work in Spark?

Hi All, I am trying to understand the internals shuffle hash join. I want to check if my understanding of it is correct. Let’s say I have two tables t1 and t2 joined on column country (8 distinct values). If I set the number of shuffle partitions as ...

Data Engineering

8732 Views
1 replies
1 kudos

01-25-2023 10:57:45 AM

View Replies

Latest Reply

Anonymous
Not applicable

03-08-2023 8:17:39 PM

1 kudos

@Vinay Emmadi : In Spark, a hash shuffle join is a type of join that is used when joining two data sets on a common key. The data is first partitioned based on the join key, and then each partition is shuffled and sent to a node in the cluster. The ...

1 kudos

03-08-2023 8:17:39 PM

by test_user • New Contributor II

12-23-2022 6:07:04 AM

32836 Views
3 replies
1 kudos

How to explode an array column and repack the distinct values into one array in DB SQL?

Hi, I am new to DB SQL. I have a table where the array column (cities) contains multiple arrays and some have multiple duplicate values. I need to unpack the array values into rows so I can list the distinct values. The following query works for this...

Data Engineering

32836 Views
3 replies
1 kudos

12-23-2022 6:07:04 AM

View Replies

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-23-2022 8:35:31 PM

1 kudos

try to use SQL windows functions here

1 kudos

12-23-2022 8:35:31 PM

2 More Replies

by Zii • New Contributor II

05-02-2022 8:34:29 AM

3622 Views
0 replies
1 kudos

Delta Live Tables Quality check for distinct Values

Hi All, I have been having an issue identifying how to do a uniqueness check for the quality check. Below is an example. @dlt.expect("origin_not_dup", "origin is distinct from origin") def harmonized_data(): df=dlt.read("raw_data") for col in...

Data Engineering

3622 Views
0 replies
1 kudos

05-02-2022 8:34:29 AM

by Artem_Y • Databricks Employee

10-13-2021 5:45:06 PM

2686 Views
1 replies
2 kudos

Show all distinct values per column in dataframe Problem Statement:I want to see all the distinct values per column for my entire table, but a SQL que...

Show all distinct values per column in dataframeProblem Statement:I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write.Use this code to show th...

Data Engineering

2686 Views
1 replies
2 kudos

10-13-2021 5:45:06 PM

View Replies

Latest Reply

Anonymous
Not applicable

10-14-2021 10:25:19 AM

2 kudos

@Artem Yevtushenko - This is great! Thank you for sharing!

2 kudos

10-14-2021 10:25:19 AM

Databricks Community

Objective is to make table unique at ID using group by , concat_ws and collect_list ,combining distinct values in one row.

How does hash shuffle join work in Spark?

How to explode an array column and repack the distinct values into one array in DB SQL?

Delta Live Tables Quality check for distinct Values

Show all distinct values per column in dataframe Problem Statement:I want to see all the distinct values per column for my entire table, but a SQL que...