<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Couldn't convert string to float when fit model in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26773#M18785</link>
    <description>&lt;P&gt;Hi @Enrico Cascavilla,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Were you able to find a solution, or are you still looking for help? If you did find a solution, please mark it as best.&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jun 2022 16:23:03 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-06-07T16:23:03Z</dc:date>
    <item>
      <title>Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26760#M18772</link>
      <description>&lt;P&gt;Hi, I am very new to Databricks and I am trying to run quick experiments to work out the best practice for me, my colleagues and the company.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I pull the data from Snowflake:&lt;/P&gt;&lt;P&gt;df = spark.read \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.format("snowflake") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.options(**options) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option('query', query) \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.load()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I check the data types of the features with printSchema()&lt;/P&gt;&lt;P&gt;and convert to pandas with&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.to_pandas_on_spark()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and I hit the &lt;U&gt;FIRST PROBLEM&lt;/U&gt;: all the columns become 'object' type.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I convert the columns to float/int&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and I run a simple RandomForest classifier:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;from sklearn.ensemble import RandomForestClassifier as srf&lt;/P&gt;&lt;P&gt;model = srf()&lt;/P&gt;&lt;P&gt;X = df[['col_float']]&lt;/P&gt;&lt;P&gt;y = df['label']&lt;/P&gt;&lt;P&gt;model.fit(X, y)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and here I hit the &lt;U&gt;SECOND PROBLEM&lt;/U&gt;: I keep receiving this error:&lt;/P&gt;&lt;P&gt;ValueError: could not convert string to float: 'col_float'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have been looking at different tutorials and trying different things. I think it might be something silly because I am new to Databricks, but I am wasting so much time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Has anyone had the same issue, or does anyone know what is happening?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:50:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26760#M18772</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-01T11:50:05Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26761#M18773</link>
      <description>&lt;P&gt;Can you check &lt;A href="https://stackoverflow.com/questions/33481572/pyspark-topandas-results-in-object-column-where-expected-numeric-one" alt="https://stackoverflow.com/questions/33481572/pyspark-topandas-results-in-object-column-where-expected-numeric-one" target="_blank"&gt;this SO topic&lt;/A&gt;?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:57:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26761#M18773</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-03-01T11:57:36Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26762#M18774</link>
      <description>&lt;P&gt;Hi, thanks for replying. I did check, but nothing changed.&lt;/P&gt;&lt;P&gt;I still have both problems: when I convert to pandas, everything is still an object,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and after I convert the column I still get that ValueError.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 13:02:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26762#M18774</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-01T13:02:58Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26763#M18775</link>
      <description>&lt;P&gt;Can you check what types the df has &lt;B&gt;before&lt;/B&gt; converting it to pandas?&lt;/P&gt;&lt;P&gt;Then &lt;A href="https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/types.html" alt="https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/types.html" target="_blank"&gt;check here&lt;/A&gt; how those types translate to pandas.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 14:09:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26763#M18775</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-03-01T14:09:42Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26764#M18776</link>
      <description>&lt;P&gt;It is a pyspark.sql.dataframe.DataFrame.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To convert to pandas I have tried&lt;/P&gt;&lt;P&gt;df.to_pandas_on_spark()&lt;/P&gt;&lt;P&gt;df.toPandas()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import pyspark.pandas as ps&lt;/P&gt;&lt;P&gt;ps.DataFrame(df)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All of them give the same result, with everything becoming an object.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But why, even after I convert the columns to float, do I still get the error that a string can't be converted to float?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 15:27:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26764#M18776</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-01T15:27:56Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26765#M18777</link>
      <description>&lt;P&gt;Clearly the conversion is not what you expect.&lt;/P&gt;&lt;P&gt;What I mean is: can you check the schema of the dataframe (the PySpark dataframe) and see what column types it has?&lt;/P&gt;&lt;P&gt;Depending on those types, pandas will either cast the columns or put them into object type.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 15:30:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26765#M18777</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-03-01T15:30:12Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26766#M18778</link>
      <description>&lt;P&gt;The schema of the Spark dataframe is perfectly fine; the features have different types (date, string, decimal).&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 16:26:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26766#M18778</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-01T16:26:51Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26767#M18779</link>
      <description>&lt;P&gt;Date translates to object,&lt;/P&gt;&lt;P&gt;string translates to object,&lt;/P&gt;&lt;P&gt;decimal translates to object&lt;/P&gt;&lt;P&gt;(see the link I posted).&lt;/P&gt;&lt;P&gt;This is normal behavior.&lt;/P&gt;&lt;P&gt;You should convert the object type in pandas:&lt;/P&gt;&lt;P&gt;&lt;A href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html" alt="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html" target="_blank"&gt;https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 16:33:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26767#M18779</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-03-01T16:33:02Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26768#M18780</link>
      <description>&lt;P&gt;OK, I understand the transformation to pandas now, thank you :).&lt;/P&gt;&lt;P&gt;But since everything was in object format, I had already converted all the columns to the correct format using astype(format),&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;so when I run df.dtypes I see the correct format,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;but when I try to fit a model it still gives me the ValueError: could not convert string to float: 'name of the first feature'&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 17:06:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26768#M18780</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-01T17:06:19Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26769#M18781</link>
      <description>&lt;P&gt;Could it be the commas and thousands separators?&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/39125665/cannot-convert-string-to-float-in-pandas-valueerror" alt="https://stackoverflow.com/questions/39125665/cannot-convert-string-to-float-in-pandas-valueerror" target="_blank"&gt;https://stackoverflow.com/questions/39125665/cannot-convert-string-to-float-in-pandas-valueerror&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 08:15:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26769#M18781</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-03-02T08:15:02Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26770#M18782</link>
      <description>&lt;P&gt;This is the weird thing. The column has already been converted to float, as you can see when you call dtypes, so if I try one of these methods to check for commas or anything else, it says&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"Cannot call StringMethods on type FloatType"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;but I get the same error when I try to fit the model. To keep it simple, I am trying to fit a model with only one feature.&lt;/P&gt;&lt;P&gt;To me it seems that the error is about the name of the column, as if it is trying to fit the column name itself. Usually the ValueError shows the string/value that cannot be converted to float, and in this case it gives me the name of the column.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 09:33:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26770#M18782</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-02T09:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26771#M18783</link>
      <description>&lt;P&gt;I can add that if I convert the data types in Spark first:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if I use toPandas() --&amp;gt; it works&lt;/P&gt;&lt;P&gt;if I use to_pandas_on_spark() --&amp;gt; same error&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 13:04:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26771#M18783</guid>
      <dc:creator>enri_casca</dc:creator>
      <dc:date>2022-03-02T13:04:30Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26772#M18784</link>
      <description>&lt;P&gt;Did you figure this one out, @Enrico Cascavilla?&lt;/P&gt;</description>
      <pubDate>Wed, 04 May 2022 16:16:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26772#M18784</guid>
      <dc:creator>Dan_Z</dc:creator>
      <dc:date>2022-05-04T16:16:04Z</dc:date>
    </item>
    <item>
      <title>Re: Couldn't convert string to float when fit model</title>
      <link>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26773#M18785</link>
      <description>&lt;P&gt;Hi @Enrico Cascavilla,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Were you able to find a solution, or are you still looking for help? If you did find a solution, please mark it as best.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 16:23:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/couldn-t-convert-string-to-float-when-fit-model/m-p/26773#M18785</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-06-07T16:23:03Z</dc:date>
    </item>
  </channel>
</rss>

