<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pyspark DataFrame: Converting one column from string to float/double in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29861#M21562</link>
    <description>&lt;P&gt;The &lt;CODE&gt;cast&lt;/CODE&gt; function converts a column to a different data type, so you shouldn't need a UDF for this. If rawdata is a DataFrame, this should work:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import col

df = rawdata.select(col('house name'), rawdata.price.cast('float').alias('price'))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast" target="test_blank"&gt;https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 11 Mar 2016 18:03:34 GMT</pubDate>
    <dc:creator>raela</dc:creator>
    <dc:date>2016-03-11T18:03:34Z</dc:date>
    <item>
      <title>Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29857#M21558</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;PySpark 1.6: DataFrame: Converting one column from string to float/double&lt;/P&gt;
&lt;P&gt;I have two columns in a DataFrame, both of which are loaded as strings.&lt;/P&gt;
&lt;P&gt;DF = rawdata.select('house name', 'price')&lt;/P&gt;
&lt;P&gt;I want to convert DF.price to float.&lt;/P&gt;
&lt;P&gt;DF = rawdata.select('house name', float('price')) #did not work&lt;/P&gt;
&lt;P&gt;DF[DF.price = float(DF.price)) # did not work&lt;/P&gt;
&lt;P&gt;DF.price = DF.price.astype(float) # pandas-style syntax did not work&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Could you please help me convert it within the DataFrame?&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;I know how to convert it in the RDD: DF.map(lambda x: float(x.price))&lt;/P&gt;
&lt;P&gt;&lt;B&gt;But I am trying to do all of the conversion in the DataFrame.&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;Note: My platform does not have the same interface as the Databricks platform, where you can change the column type while loading the file.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Feb 2016 16:34:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29857#M21558</guid>
      <dc:creator>SohelKhan</dc:creator>
      <dc:date>2016-02-22T16:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29858#M21559</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;You can use a UDF to do that. Unfortunately, there's no builtin for this type conversion.&lt;/P&gt;
&lt;P&gt;sqlContext.udf.register("float", lambda x: float(x))&lt;/P&gt;
&lt;P&gt;from pyspark.sql.functions import expr&lt;/P&gt;
&lt;P&gt;DF = rawdata.select('house name', expr(float('price')))&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Feb 2016 05:55:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29858#M21559</guid>
      <dc:creator>zjffdu</dc:creator>
      <dc:date>2016-02-25T05:55:58Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29859#M21560</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I fixed it as follows:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def string_to_float(x):
  return float(x)

udfstring_to_float = udf(string_to_float, StringType())
rawdata.withColumn("name", udfstring_to_float("numberfloat"))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Out[8]: DataFrame[name: string, number_int: int, numberfloat: double] &lt;/P&gt;</description>
      <pubDate>Sat, 27 Feb 2016 23:28:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29859#M21560</guid>
      <dc:creator>SohelKhan</dc:creator>
      <dc:date>2016-02-27T23:28:40Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29860#M21561</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Thanks for the suggestion. Sorry though, it did not work.&lt;/P&gt;
&lt;P&gt;from pyspark.sql.functions import udf &lt;/P&gt;
&lt;P&gt;sqlContext.udf.register("float",lambda x:float(x)) &lt;/P&gt;
&lt;P&gt;from pyspark.sql.functions import expr &lt;/P&gt;
&lt;P&gt;DF = rawdata.select('name', expr(float('numberfloat')))&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
&amp;lt;ipython-input-13-243d7c9f050e&amp;gt; in &amp;lt;module&amp;gt;()
      4 from pyspark.sql.functions import expr
      5
----&amp;gt; 6 DF = rawdata.select('name', expr(float('numberfloat')))

ValueError: could not convert string to float: numberfloat&lt;/CODE&gt;&lt;/PRE&gt;
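&lt;P&gt;The ValueError above appears to come from Python rather than Spark: float('numberfloat') is the Python builtin, so it runs on the literal column name before expr is ever called. A minimal sketch in plain Python, no Spark required; the sql_text name is illustrative only, and handing it to expr is an assumption about how one would use it:&lt;/P&gt;

```python
# float() here is the Python builtin. It is applied to the literal text
# 'numberfloat', not to the values stored in the column of that name.
try:
    float('numberfloat')
    message = None
except ValueError as err:
    message = str(err)

print(message)

# expr() takes a SQL expression written as a string, so the cast has to live
# inside the string. sql_text is an illustrative name, not from the thread.
sql_text = 'CAST(numberfloat AS FLOAT)'
print(sql_text)
```

&lt;P&gt;With Spark available, rawdata.select('name', expr(sql_text)) would be the corresponding call, assuming rawdata has a numberfloat column.&lt;/P&gt;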
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Feb 2016 06:36:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29860#M21561</guid>
      <dc:creator>SohelKhan</dc:creator>
      <dc:date>2016-02-29T06:36:12Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29861#M21562</link>
      <description>&lt;P&gt;The &lt;CODE&gt;cast&lt;/CODE&gt; function converts a column to a different data type, so you shouldn't need a UDF for this. If rawdata is a DataFrame, this should work:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import col

df = rawdata.select(col('house name'), rawdata.price.cast('float').alias('price'))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast" target="test_blank"&gt;https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 18:03:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29861#M21562</guid>
      <dc:creator>raela</dc:creator>
      <dc:date>2016-03-11T18:03:34Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark DataFrame: Converting one column from string to float/double</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29862#M21563</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Slightly simpler:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df_num = df.select(df.employment.cast("float"),
                   df.education.cast("float"),
                   df.health.cast("float"))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This works with multiple columns, three shown here.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2017 16:31:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-dataframe-converting-one-column-from-string-to-float/m-p/29862#M21563</guid>
      <dc:creator>AidanCondron</dc:creator>
      <dc:date>2017-01-11T16:31:34Z</dc:date>
    </item>
  </channel>
</rss>

