Data Engineering

Forum Posts

How can i read parquet file compressed by snappy?

Hi All, I wanted to read a parquet file compressed by snappy into a Spark RDD. The input file name is: part-m-00000.snappy.parquet. I have used sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy") val inputRDD = sqlContext.parquetFile(args(0)) whe...

by Mallesh, New Contributor
  • 7240 Views
  • 1 reply
  • 0 kudos
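
A minimal sketch of how that file could be read, assuming a Spark 1.x-era SQLContext as in the quoted snippet; the path part-m-00000.snappy.parquet comes from the question, while the object and app names are illustrative. Parquet records the compression codec in the file metadata, so Spark detects snappy automatically on read and the compression setting is only needed when writing.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadSnappyParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadSnappyParquet"))
    val sqlContext = new SQLContext(sc)

    // Snappy is detected from the Parquet footer; no codec config is required for reads.
    val df = sqlContext.read.parquet(args(0)) // e.g. part-m-00000.snappy.parquet
    val inputRDD = df.rdd                     // RDD[Row], if an RDD is specifically needed

    df.printSchema()
    println(s"rows: ${inputRDD.count()}")

    sc.stop()
  }
}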

How to avoid empty/null keys in DataFrame groupby?

Hi, I have a Spark job which does a group by, and I can't avoid it because of my use case. I have a large dataset, around 1 TB, which I need to process/update in a DataFrame. Now my job shuffles huge data and slows things down because of the shuffling and group by. One r...

by UmeshKacha, New Contributor II
  • 3911 Views
  • 3 replies
  • 0 kudos
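
A minimal sketch of filtering null/empty keys out before the groupBy so those rows never enter the shuffle; the column names "key" and "value" and the input/output paths are illustrative assumptions, not taken from the post.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.sum

object GroupByWithoutEmptyKeys {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GroupByWithoutEmptyKeys"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read.parquet(args(0))

    // Drop rows whose grouping key is null or blank before the groupBy,
    // so they are excluded from the shuffle entirely.
    val withKeys = df.filter("key IS NOT NULL AND trim(key) != ''")

    val aggregated = withKeys
      .groupBy("key")
      .agg(sum("value").as("total"))

    aggregated.write.mode("overwrite").parquet(args(1))

    sc.stop()
  }
}

If the slowdown comes from a few heavily skewed keys rather than from null/empty ones, salting the key before the groupBy is a common alternative.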

How do I clear all output results in a notebook?

I'm building notebooks for tutorial sessions and I want to clear all the output results from the notebook before distributing it to the participants. This functionality exists in Jupyter but I can't find it in Databricks. Any pointers?

by KendraVant, New Contributor II
  • 3445 Views
  • 5 replies
  • 0 kudos