Hi All,
I would like to get some ideas on how to improve performance on a DataFrame with around 10M rows.
Storage: ADLS Gen2
df1 = source1, format: Parquet (~10M rows)
df2 = source2, format: Parquet (~10M rows)
df = df1 inner join df2
I'm trying to join the two sources above, aggregate the result, and write it back to ADLS, but even df.count() on the joined DataFrame takes forever.