Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by TrinaDe, New Contributor II
  • 3754 Views
  • 2 replies
  • 1 kudos

How can we join two PySpark dataframes side by side (without using join, i.e. the equivalent of pd.concat() in pandas)? I am trying to combine two extremely large dataframes, each on the order of 50 million rows.

My two dataframes look like new_df2_record1 and new_df2_record2, and the expected output dataframe I want is like new_df2. The code I have tried is the following: if I print the top 5 rows of new_df2, it gives the output as expected, but I cannot pri...

Latest Reply
TrinaDe
New Contributor II
  • 1 kudos

The code in a more legible format:
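A minimal sketch of the usual approach (an assumption, not necessarily the poster's exact code): Spark has no direct equivalent of pd.concat(axis=1), so give each DataFrame a synthetic row index and join the two on it.

# Assumed sketch: column-wise concat of two DataFrames via a synthetic row index.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([("a", 1), ("b", 2)], ["c1", "c2"])
df2 = spark.createDataFrame([("x", 9.5), ("y", 3.1)], ["c3", "c4"])

def with_row_index(df, name="row_idx"):
    # monotonically_increasing_id() is unique but not consecutive, so rank it
    # with row_number() to get a dense 0..n-1 index usable as a join key.
    w = Window.orderBy(F.monotonically_increasing_id())
    return df.withColumn(name, F.row_number().over(w) - 1)

new_df2 = (
    with_row_index(df1)
    .join(with_row_index(df2), on="row_idx", how="inner")
    .drop("row_idx")
)
new_df2.show()

Note that an unpartitioned window shuffles all rows into a single partition, which is expensive at 50 million rows; rdd.zipWithIndex() is a common alternative at that scale.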

1 More Replies
by amitdatabricksc, New Contributor II
  • 6317 Views
  • 2 replies
  • 0 kudos

AttributeError: 'NoneType' object has no attribute 'repartition'

I am using a framework and I have a query where I am doing df = seg_df.select("*").write.option("compression", "gzip"), and I am getting the error below. When I don't do the write.option, I am not getting the error. Why is it giving me a repartition error? Wh...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @AMIT GADHAVI, could you provide more details? For example: what is your data source, and how do you repartition?
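One plausible cause (an assumption, since the full query isn't shown): DataFrame write actions such as save() return None, so assigning their result to df and later calling df.repartition(...) raises exactly this AttributeError. A sketch:

# Hypothetical reproduction; seg_df comes from the post, everything else is assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
seg_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# .write returns a DataFrameWriter, but .save() is an action that returns None.
df = seg_df.select("*").write.option("compression", "gzip").mode("overwrite").save("/tmp/out")
print(df)  # None, so df.repartition(10) would now raise the AttributeError

# One possible fix: keep the DataFrame and the write step separate.
df = seg_df.select("*")
df.repartition(10).write.option("compression", "gzip").mode("overwrite").save("/tmp/out")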

1 More Replies
by MartinB, Contributor III
  • 8725 Views
  • 4 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert a Spark dataframe to a pandas dataframe via df.toPandas() when it contains a datetime value in the distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating the validity of rows. If a row is currently valid, this is indicated by valid_to=9999-12-31 00:00:00. Example: Loading this into a Spark dataframe works fine...

Latest Reply
shan_chandra
Esteemed Contributor
  • 3 kudos

Currently, out-of-bounds timestamps are not supported in PyArrow/pandas. Please refer to the associated JIRA issue below: https://issues.apache.org/jira/browse/ARROW-5359?focusedCommentId=17104355&page=com.atlassian.jira.plugin.system.issuetabpanels%3...
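For context, pandas nanosecond timestamps top out at pd.Timestamp.max (2262-04-11 23:47:16.854775807), so the SCD2 sentinel 9999-12-31 overflows during conversion. A minimal sketch (assumed, not from the thread) of a common workaround: clamp the sentinel before calling toPandas().

# Assumed sketch: clamp out-of-range valid_to values so toPandas() succeeds.
import datetime
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, datetime.datetime(2020, 1, 1), datetime.datetime(9999, 12, 31))],
    ["id", "valid_from", "valid_to"],
)

# Cap valid_to at a pandas-representable ceiling before converting.
cap = F.lit("2262-04-11").cast("timestamp")
pdf = df.withColumn("valid_to", F.least(F.col("valid_to"), cap)).toPandas()
print(pdf)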

3 More Replies
by Anonymous, Not applicable
  • 795 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @User16143885715632505170! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Or else I will follow up shortly with a...

  • 0 kudos