Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16776430979
by New Contributor III
  • 1041 Views
  • 1 replies
  • 0 kudos

How to optimize conversion between PySpark and Arrow?

Seems like you can convert between dataframes and Arrow objects by using Pandas as an intermediary, but there are some limitations (e.g. it collects all records in the DataFrame to the driver and should be done on a small subset of the data, you hit ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

User16776430979
by New Contributor III
  • 1916 Views
  • 1 replies
  • 0 kudos

How to optimize and convert a Spark DataFrame to Arrow?

Example use case: When connecting a sample Plotly Dash application to a large dataset to test its performance, I need the file to be in either HDF5 or Arrow format. According to this doc: Optimize conversion between PySpark and pandas DataF...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @josephine.ho! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

vaio
by New Contributor II
  • 5117 Views
  • 6 replies
  • 0 kudos

Convert String to Timestamp

I have a dataset with one column of string type ('2014/12/31 18:00:36'). How can I convert it to timestamp type with PySpark?

Latest Reply
gideon
New Contributor II
  • 0 kudos

Hope you don't mind if I ask you to elaborate further for a sharper understanding? See my basketball court layout at https://www.recreationtipsy.com/basketball-court/

5 More Replies
SohelKhan
by New Contributor II
  • 13679 Views
  • 5 replies
  • 0 kudos

Resolved! Pyspark DataFrame: Converting one column from string to float/double

PySpark 1.6: I have two columns in a DataFrame, both of which are loaded as strings: DF = rawdata.select('house name', 'price'). I want to convert DF.price to float. DF = rawdata.select('hous...

Latest Reply
AidanCondron
New Contributor II
  • 0 kudos

Slightly simpler:
df_num = df.select(df.employment.cast("float"), df.education.cast("float"), df.health.cast("float"))
This works with multiple columns; three are shown here.

4 More Replies