Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2303 Views
  • 4 replies
  • 0 kudos

Objective is to make table unique at ID using group by, concat_ws and collect_list, combining distinct values in one row.

Objective is to make the table unique at ID. The table structure is as in the attached image. The query used is: select ID, concat_ws(' & ', collect_list(distinct Gender)) as Gender from table group by ID. It can be possible if we can order values within collect_list and...
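One way to get a single, deterministic row per ID is to deduplicate and sort the values before joining them. In Spark SQL this is commonly written as concat_ws(' & ', sort_array(collect_set(Gender))). The plain-Python sketch below only models that behaviour; the sample rows and the function name combine_distinct are hypothetical and are not the poster's data or a Spark program.

```python
from collections import defaultdict

def combine_distinct(rows, sep=" & "):
    """Return {id: joined distinct values}, sorted so the result is deterministic."""
    grouped = defaultdict(set)
    for row_id, value in rows:
        if value is not None:  # Spark's collect_set/collect_list also skip NULLs
            grouped[row_id].add(value)
    # Sorting mirrors sort_array(...); joining mirrors concat_ws(...)
    return {row_id: sep.join(sorted(values)) for row_id, values in grouped.items()}

# Hypothetical sample rows of (ID, Gender)
rows = [(1, "M"), (1, "F"), (1, "M"), (2, "F"), (3, "M"), (3, None)]
result = combine_distinct(rows)
print(result)
```

Without the explicit sort, the order of collected values within a group is not guaranteed, which is why the same ID can otherwise show up as both "M & F" and "F & M".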

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rishabh Shanker, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

3 More Replies
Harun
by Honored Contributor
  • 9353 Views
  • 2 replies
  • 0 kudos

Issue with Pyspark GroupBy GroupedData

Hi guys, I am working on streaming data movement from bronze to silver. My bronze table has an entity_name column, and based on that column I need to create multiple silver tables. I tried the approach below, but it is failing with error "'...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Harun Raseed Basheer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

1 More Replies
tusworten
by New Contributor II
  • 6437 Views
  • 3 replies
  • 3 kudos

Spark SQL Group by duplicates, collect_list in array of structs and evaluate rows in each group.

I'm a beginner working with Spark SQL in the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID like this: .withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID"))) I added a ROWN...
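As a rough illustration of what row_number() over a window partitioned by ENTITY and ENTITY_DOC and ordered by ID produces, the plain-Python sketch below numbers rows within each group. The sample records and the helper name add_row_number are hypothetical; this models the numbering only and is not Spark code.

```python
from itertools import groupby
from operator import itemgetter

def add_row_number(rows):
    """Number rows 1..n within each (ENTITY, ENTITY_DOC) group, ordered by ID."""
    partition_key = itemgetter("ENTITY", "ENTITY_DOC")
    # Sort by partition key first, then by ID inside each partition
    ordered = sorted(rows, key=lambda r: (partition_key(r), r["ID"]))
    numbered = []
    for _, group in groupby(ordered, key=partition_key):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, "ROWNUMBER": n})
    return numbered

# Hypothetical sample records
rows = [
    {"ID": 3, "ENTITY": "A", "ENTITY_DOC": "X"},
    {"ID": 1, "ENTITY": "A", "ENTITY_DOC": "X"},
    {"ID": 2, "ENTITY": "B", "ENTITY_DOC": "Y"},
]
numbered = add_row_number(rows)
for row in numbered:
    print(row)
```

Rows with ROWNUMBER greater than 1 are the duplicates within a group, so filtering on ROWNUMBER == 1 keeps the row with the lowest ID per group.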

Latest Reply
tusworten
New Contributor II
  • 3 kudos

Hi @Kaniz Fatma, your answer didn't solve my problem, but it was useful for learning more about UDFs, which I did not know about.

2 More Replies
SindhuG
by New Contributor
  • 1098 Views
  • 0 replies
  • 0 kudos

Hi all, I need to extract rows of dates from a dataframe based on a list of values (e.g. dates) located in a CSV file. Can anyone please help me? I have tried the groupby function but am not able to get the expected result. Thanks in advance.

My dataframe looks like this. df = Date column2 column3 Machine 1-jan-2020 A 2-jan-2020 --- A 18-jan-2020 A 11-jan-2020 B 12-jan-2020 B 6-feb-2020 C 7-feb-2020 --- C 14-feb-2020 C. The date details CSV file looks like this. D = Machine Selected Date A 15-jan-2020 C 12-f...

LaurentThiebaud
by New Contributor
  • 5989 Views
  • 1 reply
  • 0 kudos

Sort within a groupBy with dataframe

Using a Spark DataFrame, e.g. myDf.filter(col("timestamp").gt(15000)).groupBy("groupingKey").agg(collect_list("aDoubleValue")), I want collect_list to return the result ordered according to "timestamp", i.e. I want the GroupBy results...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Laurent Thiebaud, please use the format below to sort within a groupBy: import org.apache.spark.sql.functions._ df.groupBy("columnA").agg(sort_array(collect_list("columnB")))
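Note that sort_array(collect_list("columnB")) orders the collected values by their own value, not by another column such as "timestamp". A commonly used pattern for ordering by timestamp is to collect (timestamp, value) pairs as structs, sort those, and then keep only the value field. The plain-Python sketch below models that idea under stated assumptions: the sample rows, the 15000 threshold taken from the question's filter, and the function name collect_ordered are illustrative only, and this is not Spark code.

```python
from collections import defaultdict

def collect_ordered(rows, min_timestamp=15000):
    """Group values by key, ordered by timestamp rather than by the value itself."""
    grouped = defaultdict(list)
    for key, timestamp, value in rows:
        if timestamp > min_timestamp:  # mirrors filter(col("timestamp").gt(15000))
            grouped[key].append((timestamp, value))
    # Sorting the (timestamp, value) pairs orders by timestamp first,
    # then the timestamp is dropped so only the values remain.
    return {key: [value for _, value in sorted(pairs)]
            for key, pairs in grouped.items()}

# Hypothetical sample rows of (groupingKey, timestamp, aDoubleValue)
rows = [
    ("g1", 20000, 3.5),
    ("g1", 16000, 9.0),
    ("g1", 10000, 1.0),  # dropped by the timestamp filter
    ("g2", 18000, 2.0),
    ("g1", 17000, 0.5),
]
ordered = collect_ordered(rows)
print(ordered)
```

Sorting the values alone would give 0.5, 3.5, 9.0 for g1, whereas ordering by timestamp gives 9.0, 0.5, 3.5, which is what the question asks for.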
