Data Engineering

Forum Posts

KKo
by Contributor III
  • 7166 Views
  • 4 replies
  • 2 kudos

Resolved! Union Multiple dataframes in loop, with different schema

Within a loop I have a few dataframes created. I can union them without an issue if they have the same schema, using (df_unioned = reduce(DataFrame.unionAll, df_list)). Now my problem is how to union them if one of the dataframes in df_list has different nu...

Latest Reply
anoopunni
New Contributor II
  • 2 kudos

Hi, I have come across the same scenario. Using reduce() and unionByName, we can implement the solution as below:
val lstDF: List[DataFrame] = List(df1, df2, df3, df4, df5)
val combinedDF = lstDF.reduce((df1, df2) => df1.unionByName(df2, allowMissingColumns = tru...
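Since the original question uses PySpark, here is a rough Python counterpart of the Scala reply above. It is only a sketch: the helper name `union_by_name` is made up for illustration, and it assumes Spark 3.1 or later, where `DataFrame.unionByName` accepts `allowMissingColumns`.

```python
from functools import reduce


def union_by_name(dfs):
    """Union DataFrames by column name (hypothetical helper).

    Columns missing on either side are filled with nulls.
    """
    dfs = list(dfs)
    if not dfs:
        raise ValueError("need at least one DataFrame")
    # Fold the list pairwise; allowMissingColumns needs Spark 3.1+.
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        dfs,
    )


# Usage inside a Spark session, with df_list built in the loop:
# df_unioned = union_by_name(df_list)
```

Unlike unionAll, which matches columns by position, unionByName matches them by name, so the dataframes in df_list do not need the same column order or the same number of columns.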

3 More Replies
User16826994223
by Honored Contributor III
  • 1320 Views
  • 1 replies
  • 1 kudos

Spark DataFrame Parquet vs Delta: row counts don't match

I have data written in Delta on ADLS. As I understand it, Delta also stores its data internally as Parquet files, but when I read the files in the two formats I get different record counts: spark.read.parquet() or spark.read.format('delta').load() df = spark.read.for...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

I think you have written to the Delta table twice using overwrite mode. But Delta is a versioned data format: when you use overwrite, it doesn't delete the previous data, it just writes new files. It doesn't delete the old files immediately either; they are just marked as delete...
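To see why the two counts differ, it can help to check which Parquet files the latest table version still references. spark.read.parquet() reads every Parquet file in the directory, while the Delta reader only reads files that the transaction log lists as live. The sketch below replays the JSON commits under _delta_log in plain Python. The function name `live_files` is made up, and it assumes a locally accessible path and a log that has not been checkpointed or cleaned up, since it ignores checkpoint files.

```python
import json
from pathlib import Path


def live_files(table_path):
    """Return data file paths referenced by the latest table version.

    Hypothetical helper: replays only the JSON commits and ignores checkpoints.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named by zero-padded version, so name order is version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live
```

Any Parquet file in the table directory that is not in this set belongs to an older version. Those files are the ones that inflate the count when the directory is read as plain Parquet.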
