Data Engineering

Forum Posts

by Santosh09
New Contributor II
  • 3561 Views
  • 5 replies
  • 3 kudos

Resolved! Writing a Spark DataFrame to ADLS takes a very long time when the DataFrame contains text data.

When a Spark DataFrame holds text data with a struct-type schema, Spark takes too much time to write, save, or push the data to ADLS or a SQL database, or to download it as CSV.

Latest Reply
User16764241763
Honored Contributor

@shiva Santosh Have you checked the count of the DataFrame that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, the explode can result in one-to-many rows, so it is better to check the DataFrame count and see whether Spark runs out of memory (OOMs) in the workspace. (A minimal sketch of this check follows below this thread.)

  • 3 kudos
4 More Replies
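To make the advice in that reply concrete, here is a minimal PySpark sketch, assuming a toy DataFrame and a hypothetical ADLS path (both invented for illustration): it compares the row count before and after an explode, since a large blow-up in rows is a common reason a write to ADLS becomes very slow or the job runs out of memory.

# Minimal sketch (toy data, hypothetical path): check how much an explode
# inflates the row count before writing to ADLS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the text/struct DataFrame from the question.
df = spark.createDataFrame(
    [(1, ["a", "b", "c"]), (2, ["d", "e"])],
    ("id", "tokens"),
)

exploded = df.select("id", explode("tokens").alias("token"))

# Compare counts before and after the explode; a large increase explains
# slow writes and possible OOMs.
print("rows before explode:", df.count())
print("rows after explode:", exploded.count())

# Hypothetical ADLS path -- replace with your own storage account/container.
# exploded.write.mode("overwrite").parquet(
#     "abfss://container@account.dfs.core.windows.net/output/exploded"
# )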
by Kaniz
Community Manager
  • 1073 Views
  • 1 replies
  • 0 kudos
Latest Reply
saipujari_spark
Valued Contributor

We can add a new column using the withColumn() method of the DataFrame, like below: from pyspark.sql.functions import lit; df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ("c1", "c2")); df_new_col = df.withColumn("c3", lit(0)); df_new_col.... (A complete, runnable version of this snippet follows below.)

  • 0 kudos
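For reference, here is a self-contained version of the snippet quoted in that reply. It assumes a SparkSession entry point (the modern replacement for the sqlContext used in the original) and shows lit() filling the new column c3 with a constant value.

# Add a constant column "c3" with withColumn() and lit().
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ("c1", "c2"))

df_new_col = df.withColumn("c3", lit(0))  # every row gets the literal 0
df_new_col.show()
# +---+---+---+
# | c1| c2| c3|
# +---+---+---+
# |  1|  a|  0|
# |  2|  b|  0|
# +---+---+---+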
by dshosseinyousef
New Contributor II
  • 7327 Views
  • 2 replies
  • 0 kudos

How to calculate a quantile on grouped data in a Spark DataFrame

I have the following Spark DataFrame:
agent_id / payment_amount
a / 1000
b / 1100
a / 1100
a / 1200
b / 1200
b / 1250
a / 10000
b / 9000
My desired output would be something like:
agent_id / 95_quantile
a / whatever the 95th quantile is for a...

Latest Reply
Weiluo__David_R
New Contributor II

For those of you who haven't run into this SO thread http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe, it's pointed out there that one workaround is to use the Hive UDF "percentile_approx". Please see th... (A short sketch of this approach follows below this thread.)

  • 0 kudos
1 More Replies
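A minimal sketch of that percentile_approx workaround, using the sample data from the question above; the 0.95 argument requests an approximate 95th percentile per agent_id (exact behavior may vary slightly across Spark versions).

# Approximate 95th percentile of payment_amount per agent_id via percentile_approx.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", 1000), ("b", 1100), ("a", 1100), ("a", 1200),
     ("b", 1200), ("b", 1250), ("a", 10000), ("b", 9000)],
    ("agent_id", "payment_amount"),
)

quantiles = df.groupBy("agent_id").agg(
    F.expr("percentile_approx(payment_amount, 0.95)").alias("95_quantile")
)
quantiles.show()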
by dshosseinyousef
New Contributor II
  • 4836 Views
  • 2 replies
  • 0 kudos

How to extract the year and week number from a column in a Spark DataFrame?

I have the following Spark DataFrame:
sale_id / created_at
1 / 2016-05-28T05:53:31.042Z
2 / 2016-05-30T12:50:58.184Z
3 / 2016-05-23T10:22:18.858Z
4 / 2016-05-27T09:20:15.158Z
5 / 2016-05-21T08:30:17.337Z
6 / 2016-05-28T07:41:14.361Z
I need to add a year-wee...

Latest Reply
theodondre
New Contributor II

THIS IS HOW THE DOCUMENTATION LOOKS. (A sketch using year() and weekofyear() follows below this thread.)

  • 0 kudos
1 More Replies
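A minimal sketch for that question, assuming Spark 3.x so that to_timestamp() parses the ISO 8601 strings with the trailing Z; it derives year and week-of-year columns and combines them into a single year-week label (the label format itself is an assumption, since the original post is truncated).

# Derive year and week number from the created_at timestamps.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "2016-05-28T05:53:31.042Z"), (2, "2016-05-30T12:50:58.184Z"),
     (3, "2016-05-23T10:22:18.858Z")],
    ("sale_id", "created_at"),
)

result = (
    df.withColumn("created_ts", F.to_timestamp("created_at"))
      .withColumn("year", F.year("created_ts"))
      .withColumn("week", F.weekofyear("created_ts"))
      .withColumn("year_week", F.format_string("%d-%02d", F.col("year"), F.col("week")))
)
result.show(truncate=False)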