Spark data frame with text data when schema is in Struct type spark is taking too much time to write / save / push data to ADLS or SQL Db or download as csv.
@shiva Santosh​ Have to checked the count of the dataframe that you are trying to save to ADLS?As @Joseph Kambourakis​ mentioned the explode can result in 1-many rows, better to check data frame count and see if Spark OOMs in the workspace.
We can add a new column using the withColumn() method of the data frame, like belowfrom pyspark.sql.functions import lit
df = sqlContext.createDataFrame(
[(1, "a"), (2, "b")], ("c1", "c2"))
df_new_col = df.withColumn("c3", lit(0))
df_new_col....
I have the following sparkdataframe :
agent_id/ payment_amount
a /1000
b /1100
a /1100
a /1200
b /1200
b /1250
a /10000
b /9000
my desire output would be something like
<code>agen_id 95_quantile
a whatever is95 quantile for a...
For those of you who haven't run into this SO thread http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe, it's pointed out there that one work-around is to use HIVE UDF "percentile_approx". Please see th...
I have the following sparkdataframe :
sale_id/ created_at
1 /2016-05-28T05:53:31.042Z
2 /2016-05-30T12:50:58.184Z
3/ 2016-05-23T10:22:18.858Z
4 /2016-05-27T09:20:15.158Z
5 /2016-05-21T08:30:17.337Z
6 /2016-05-28T07:41:14.361Z
i need t add a year-wee...