Topics with Label: Sparkdataframe

Forum Posts

by Santosh09 • New Contributor II

01-18-2022 3:07:25 AM

3561 Views
5 replies
3 kudos

Resolved! Writing a Spark DataFrame to ADLS takes a huge amount of time when the DataFrame contains text data.

When a Spark DataFrame holds text data and its schema is a struct type, Spark takes too much time to write, save, or push the data to ADLS or a SQL database, or to download it as CSV.

Data Engineering

Latest Reply

User16764241763
Honored Contributor

03-14-2022 8:27:45 AM

3 kudos

@shiva Santosh Have you checked the count of the DataFrame that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, the explode can turn one row into many rows, so it is better to check the DataFrame count and see whether Spark runs out of memory (OOM) in the workspace.

by Kaniz • Community Manager

09-22-2021 1:57:14 PM

1073 Views
1 reply
0 kudos

How do I add a new column to a Spark DataFrame (using PySpark)?

Data Engineering

Latest Reply

saipujari_spark
Valued Contributor

09-22-2021 3:39:58 PM

0 kudos

We can add a new column using the withColumn() method of the DataFrame, like below:

from pyspark.sql.functions import lit
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ("c1", "c2"))
df_new_col = df.withColumn("c3", lit(0))
df_new_col....

by dshosseinyousef • New Contributor II

09-22-2016 1:29:26 AM

7327 Views
2 replies
0 kudos

How to calculate a quantile on grouped data in a Spark DataFrame

I have the following Spark DataFrame:

agent_id / payment_amount
a / 1000
b / 1100
a / 1100
a / 1200
b / 1200
b / 1250
a / 10000
b / 9000

My desired output would be something like:

agent_id / 95_quantile
a / whatever is the 95 quantile for a...

Data Engineering

Latest Reply

Weiluo__David_R
New Contributor II

12-30-2016 10:17:54 AM

0 kudos

For those of you who haven't run into this SO thread (http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe), it's pointed out there that one workaround is to use the Hive UDF "percentile_approx". Please see th...
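To make the question concrete, here is a minimal plain-Python sketch of what a per-group 95th percentile means on the sample rows from the post. It is not Spark code: it uses the nearest-rank definition, which is an assumption chosen for illustration, and the helper name `nearest_rank_percentile` is hypothetical. Spark's `percentile_approx` is an approximate aggregate and may return slightly different values, especially on large data.

```python
import math
from collections import defaultdict

# Sample rows from the question: (agent_id, payment_amount)
rows = [
    ("a", 1000), ("b", 1100), ("a", 1100), ("a", 1200),
    ("b", 1200), ("b", 1250), ("a", 10000), ("b", 9000),
]


def nearest_rank_percentile(values, p):
    """Return the nearest-rank percentile of values, for 0 < p <= 1."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Group the payment amounts by agent.
groups = defaultdict(list)
for agent_id, amount in rows:
    groups[agent_id].append(amount)

# Compute the 95th percentile for each agent.
quantiles = {
    agent: nearest_rank_percentile(amounts, 0.95)
    for agent, amounts in groups.items()
}
print(quantiles)
```

In Spark, the same grouping would typically be expressed with a groupBy on agent_id combined with the percentile_approx SQL function that the reply mentions.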

by dshosseinyousef • New Contributor II

09-20-2016 12:48:29 AM

4836 Views
2 replies
0 kudos

How to extract year and week number from a column in a Spark DataFrame?

I have the following Spark DataFrame:

sale_id / created_at
1 / 2016-05-28T05:53:31.042Z
2 / 2016-05-30T12:50:58.184Z
3 / 2016-05-23T10:22:18.858Z
4 / 2016-05-27T09:20:15.158Z
5 / 2016-05-21T08:30:17.337Z
6 / 2016-05-28T07:41:14.361Z

I need to add a year-wee...

Data Engineering

Latest Reply

theodondre
New Contributor II

12-19-2016 8:45:24 AM

0 kudos

This is what the documentation looks like.
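Returning to the sample timestamps in the question, the following is a minimal plain-Python sketch of extracting a year and week number from each value. It is not Spark code: it assumes ISO 8601 week numbering and the exact timestamp layout shown in the post, and the helper name `year_week` is hypothetical. In PySpark, the `year` and `weekofyear` functions in `pyspark.sql.functions` do comparable work on a timestamp column, although the calendar year and the ISO year can differ around New Year.

```python
from datetime import datetime


def year_week(created_at):
    """Parse a timestamp such as 2016-05-28T05:53:31.042Z into (ISO year, ISO week)."""
    parsed = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ")
    iso = parsed.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return (iso[0], iso[1])


# Sample created_at values from the question.
samples = [
    "2016-05-28T05:53:31.042Z",
    "2016-05-30T12:50:58.184Z",
    "2016-05-23T10:22:18.858Z",
]

for ts in samples:
    iso_year, iso_week = year_week(ts)
    # Format as a year-week label, zero-padding the week number.
    print(f"{ts} -> {iso_year}-{iso_week:02d}")
```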
