Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

KKo
by Contributor III
  • 12693 Views
  • 3 replies
  • 2 kudos

Resolved! Union multiple dataframes in a loop, with different schemas

Within a loop I have a few dataframes created. I can union them without an issue if they have the same schema, using df_unioned = reduce(DataFrame.unionAll, df_list). Now my problem is how to union them if one of the dataframes in df_list has a different nu...
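For context, a minimal PySpark sketch of the working same-schema case described above; df1, df2, df3 and df_list are hypothetical stand-ins for the dataframes built in the loop:

from functools import reduce
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataframes with identical schemas, standing in for the ones built in the loop.
df1 = spark.createDataFrame([(1, "a")], ["id", "val"])
df2 = spark.createDataFrame([(2, "b")], ["id", "val"])
df3 = spark.createDataFrame([(3, "c")], ["id", "val"])
df_list = [df1, df2, df3]

# Works as long as every dataframe has exactly the same schema.
df_unioned = reduce(DataFrame.unionAll, df_list)
df_unioned.show()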

Latest Reply
anoopunni
New Contributor II
  • 2 kudos

Hi, I have come across the same scenario. Using reduce() and unionByName we can implement the solution as below:

val lstDF: List[DataFrame] = List(df1, df2, df3, df4, df5)
val combinedDF = lstDF.reduce((df1, df2) => df1.unionByName(df2, allowMissingColumns = true))
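For readers on PySpark, a minimal sketch of the same approach with reduce() and unionByName; the dataframes and df_list below are hypothetical, and allowMissingColumns requires Spark 3.1+:

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataframes whose columns differ.
df1 = spark.createDataFrame([(1, "a")], ["id", "val"])
df2 = spark.createDataFrame([(2, 3.5)], ["id", "score"])
df_list = [df1, df2]

# unionByName with allowMissingColumns=True fills the missing columns with nulls.
combined = reduce(
    lambda left, right: left.unionByName(right, allowMissingColumns=True),
    df_list,
)
combined.show()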

2 More Replies
User16826994223
by Honored Contributor III
  • 1897 Views
  • 1 reply
  • 1 kudos

Spark dataframe parquet vs delta: row counts don't match

I have data written in Delta on ADLS. As I understand it, Delta also stores its internal files in Parquet format, but when I read the data in different formats I get different record counts: spark.read.parquet() or spark.read.format('delta').load(). df = spark.read.for...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

I think you have written to the Delta location twice using overwrite mode. But Delta is a versioned data format: when you use overwrite, it doesn't delete the previous data, it just writes new files and doesn't delete the old files immediately; they are just marked as deleted...
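For illustration, a minimal PySpark sketch of this behaviour, assuming a Databricks/Delta environment; the path /tmp/delta_demo is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/delta_demo"  # hypothetical location

# Overwrite the same Delta location twice.
spark.range(100).write.format("delta").mode("overwrite").save(path)
spark.range(100).write.format("delta").mode("overwrite").save(path)

# The Delta reader follows the transaction log, so only the current version is counted.
print(spark.read.format("delta").load(path).count())  # 100

# A plain Parquet read ignores the log, so it also counts the files
# that the second overwrite only marked as removed.
print(spark.read.parquet(path).count())  # typically 200

# The old files stay on storage until VACUUM removes those older than
# the retention period (7 days by default).
spark.sql(f"VACUUM delta.`{path}`")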
