<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic read percentage values in spark ( no casting ) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34122#M24903</link>
    <description>&lt;P&gt;I have an xlsx file with a single column:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;percentage&lt;/P&gt;&lt;P&gt;30%&lt;/P&gt;&lt;P&gt;40%&lt;/P&gt;&lt;P&gt;50%&lt;/P&gt;&lt;P&gt;-10%&lt;/P&gt;&lt;P&gt;0.00%&lt;/P&gt;&lt;P&gt;0%&lt;/P&gt;&lt;P&gt;0.10%&lt;/P&gt;&lt;P&gt;110%&lt;/P&gt;&lt;P&gt;99.99%&lt;/P&gt;&lt;P&gt;99.98%&lt;/P&gt;&lt;P&gt;-99.99%&lt;/P&gt;&lt;P&gt;-99.98%&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I read this using Apache Spark, the output I get is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|percentage|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|       0.3|&lt;/P&gt;&lt;P&gt;|       0.4|&lt;/P&gt;&lt;P&gt;|       0.5|&lt;/P&gt;&lt;P&gt;|      -0.1|&lt;/P&gt;&lt;P&gt;|       0.0|&lt;/P&gt;&lt;P&gt;|       0.0|&lt;/P&gt;&lt;P&gt;|     0.001|&lt;/P&gt;&lt;P&gt;|       1.1|&lt;/P&gt;&lt;P&gt;|    0.9999|&lt;/P&gt;&lt;P&gt;|    0.9998|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The expected output is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|percentage|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|       30%|&lt;/P&gt;&lt;P&gt;|       40%|&lt;/P&gt;&lt;P&gt;|       50%|&lt;/P&gt;&lt;P&gt;|      -10%|&lt;/P&gt;&lt;P&gt;|     0.00%|&lt;/P&gt;&lt;P&gt;|        0%|&lt;/P&gt;&lt;P&gt;|     0.10%|&lt;/P&gt;&lt;P&gt;|      110%|&lt;/P&gt;&lt;P&gt;|    99.99%|&lt;/P&gt;&lt;P&gt;|    99.98%|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My code:&lt;/P&gt;&lt;P&gt;import org.apache.spark.sql.SparkSession&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val spark = SparkSession&lt;/P&gt;&lt;P&gt;    .builder&lt;/P&gt;&lt;P&gt;    .appName("trimTest")&lt;/P&gt;&lt;P&gt;    .master("local[*]")&lt;/P&gt;&lt;P&gt;    .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt;    .format("com.crealytics.spark.excel")&lt;/P&gt;&lt;P&gt;    .option("header", "true")&lt;/P&gt;&lt;P&gt;    .option("maxRowsInMemory", 1000)&lt;/P&gt;&lt;P&gt;    .option("inferSchema", "true")&lt;/P&gt;&lt;P&gt;    .load("data/percentage.xlsx")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.printSchema()&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't want to use casting or set inferSchema to false. I want a way to read percentage values as percentages, not as doubles or strings.&lt;/P&gt;</description>
    <pubDate>Wed, 01 Dec 2021 13:11:00 GMT</pubDate>
    <dc:creator>sarvesh</dc:creator>
    <dc:date>2021-12-01T13:11:00Z</dc:date>
    <item>
      <title>read percentage values in spark ( no casting )</title>
      <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34122#M24903</link>
      <description>&lt;P&gt;I have an xlsx file with a single column:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;percentage&lt;/P&gt;&lt;P&gt;30%&lt;/P&gt;&lt;P&gt;40%&lt;/P&gt;&lt;P&gt;50%&lt;/P&gt;&lt;P&gt;-10%&lt;/P&gt;&lt;P&gt;0.00%&lt;/P&gt;&lt;P&gt;0%&lt;/P&gt;&lt;P&gt;0.10%&lt;/P&gt;&lt;P&gt;110%&lt;/P&gt;&lt;P&gt;99.99%&lt;/P&gt;&lt;P&gt;99.98%&lt;/P&gt;&lt;P&gt;-99.99%&lt;/P&gt;&lt;P&gt;-99.98%&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I read this using Apache Spark, the output I get is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|percentage|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|       0.3|&lt;/P&gt;&lt;P&gt;|       0.4|&lt;/P&gt;&lt;P&gt;|       0.5|&lt;/P&gt;&lt;P&gt;|      -0.1|&lt;/P&gt;&lt;P&gt;|       0.0|&lt;/P&gt;&lt;P&gt;|       0.0|&lt;/P&gt;&lt;P&gt;|     0.001|&lt;/P&gt;&lt;P&gt;|       1.1|&lt;/P&gt;&lt;P&gt;|    0.9999|&lt;/P&gt;&lt;P&gt;|    0.9998|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The expected output is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|percentage|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;|       30%|&lt;/P&gt;&lt;P&gt;|       40%|&lt;/P&gt;&lt;P&gt;|       50%|&lt;/P&gt;&lt;P&gt;|      -10%|&lt;/P&gt;&lt;P&gt;|     0.00%|&lt;/P&gt;&lt;P&gt;|        0%|&lt;/P&gt;&lt;P&gt;|     0.10%|&lt;/P&gt;&lt;P&gt;|      110%|&lt;/P&gt;&lt;P&gt;|    99.99%|&lt;/P&gt;&lt;P&gt;|    99.98%|&lt;/P&gt;&lt;P&gt;+----------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My code:&lt;/P&gt;&lt;P&gt;import org.apache.spark.sql.SparkSession&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val spark = SparkSession&lt;/P&gt;&lt;P&gt;    .builder&lt;/P&gt;&lt;P&gt;    .appName("trimTest")&lt;/P&gt;&lt;P&gt;    .master("local[*]")&lt;/P&gt;&lt;P&gt;    .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt;    .format("com.crealytics.spark.excel")&lt;/P&gt;&lt;P&gt;    .option("header", "true")&lt;/P&gt;&lt;P&gt;    .option("maxRowsInMemory", 1000)&lt;/P&gt;&lt;P&gt;    .option("inferSchema", "true")&lt;/P&gt;&lt;P&gt;    .load("data/percentage.xlsx")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.printSchema()&lt;/P&gt;&lt;P&gt;df.show(10)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't want to use casting or set inferSchema to false. I want a way to read percentage values as percentages, not as doubles or strings.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 13:11:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34122#M24903</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-12-01T13:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: read percentage values in spark ( no casting )</title>
      <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34123#M24904</link>
      <description>&lt;P&gt;The output is actually correct: this is how Excel stores percentages (what you see in Excel is just cell formatting). Spark does the same: 100% = 1.&lt;/P&gt;&lt;P&gt;If you want to display the value as a percentage, for example in a dashboard, you just need to multiply by 100 and concatenate a % sign. Use one of the two lines below, not both (chaining them would multiply by 100 twice):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// numeric percentage (30 instead of 0.3); the int cast truncates fractional percentages
.withColumn("rate", (col("rate") * 100).cast("int"))
// or: display string such as "30%"
.withColumn("rate", concat((col("rate") * 100).cast("int"), lit("%")))&lt;/CODE&gt;&lt;/PRE&gt;</description>
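A minimal sketch of the display-formatting idea from the reply above, written against the "percentage" column from the question rather than "rate". The object name PercentDisplay, the helper withDisplay, the output column "percentage_display", and the choice of two decimal places are all assumptions for illustration, not part of the thread. It keeps the numeric column untouched and adds a separate string column for display, so the fractional values that an int cast would truncate (0.10%, 99.99%) survive. It cannot recover the per-cell distinction between "0%" and "0.00%", since that is Excel cell formatting and is not part of the value Spark reads.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, format_number, lit}

object PercentDisplay {
  // Hypothetical helper: adds a display-only string column next to the numeric one.
  // format_number rounds to the given number of decimals and returns a string.
  def withDisplay(df: DataFrame): DataFrame =
    df.withColumn(
      "percentage_display",
      concat(format_number(col("percentage") * 100, 2), lit("%"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("percentDisplay")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the doubles that spark-excel infers from the percentage cells.
    val df = Seq(0.3, -0.1, 0.001, 1.1, 0.9999).toDF("percentage")

    withDisplay(df).show()
    spark.stop()
  }
}
```

Keeping the double column alongside the string means aggregations and comparisons still work on real numbers, and the string is only used where the value is shown to a person.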
      <pubDate>Wed, 01 Dec 2021 13:32:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34123#M24904</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-12-01T13:32:10Z</dc:date>
    </item>
    <item>
      <title>Re: read percentage values in spark ( no casting )</title>
      <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34124#M24905</link>
      <description>&lt;P&gt;Affirmative. This is how Excel stores percentages; what you see is just cell formatting.&lt;/P&gt;&lt;P&gt;Databricks notebooks do not (yet?) have a way to format the output.&lt;/P&gt;&lt;P&gt;But it is easy to put a BI tool on top of Databricks, where you can change the formatting.&lt;/P&gt;&lt;P&gt;In my opinion, that is how it should be done.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 13:42:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34124#M24905</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-12-01T13:42:43Z</dc:date>
    </item>
    <item>
      <title>Re: read percentage values in spark ( no casting )</title>
      <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34125#M24906</link>
      <description>&lt;P&gt;Casting is not what I want. Suppose I get a big Excel file with millions of rows: casting will make it super slow.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 13:51:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34125#M24906</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-12-01T13:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: read percentage values in spark ( no casting )</title>
      <link>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34126#M24907</link>
      <description>&lt;P&gt;Not necessarily. Millions of rows is not that much. For Excel it is, but not for Spark.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Dec 2021 13:54:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-percentage-values-in-spark-no-casting/m-p/34126#M24907</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-12-01T13:54:10Z</dc:date>
    </item>
  </channel>
</rss>

