<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ANALYZE TABLE showing NULLs for all statistics in Spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21689#M14820</link>
    <description>&lt;P&gt;Can you share what &lt;CODE&gt;newtitanic&lt;/CODE&gt; is in your example? I think you did something similar to:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("create table newtitanic as select * from titanic")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Something like this works for me, but the issue is that I first create a temp view and then create a second table from it, which persists the data a second time.&lt;/P&gt;</description>
    <pubDate>Sun, 04 Dec 2022 07:12:25 GMT</pubDate>
    <dc:creator>chhavibansal</dc:creator>
    <dc:date>2022-12-04T07:12:25Z</dc:date>
    <item>
      <title>ANALYZE TABLE showing NULLs for all statistics in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21685#M14816</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;// Load the CSV with a header row and inferred column types
val df2 = spark.read
    .format("csv")
    .option("sep", ",")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("src/main/resources/datasets/titanic.csv")

// Register the DataFrame as a temp view and cache it
df2.createOrReplaceTempView("titanic")
spark.table("titanic").cache()

// Compute column statistics on the view, then inspect the Name column
spark.sql("Analyze table titanic compute statistics for all columns")
spark.sql("desc extended titanic Name").show(100, false)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I have created a spark session, imported a dataset and then trying to register it as a temp table, upon using analyze command i gett all statistics value as NULL for all column.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;+--------------+----------+
|info_name     |info_value|
+--------------+----------+
|col_name      |Name      |
|data_type     |string    |
|comment       |NULL      |
|min           |NULL      |
|max           |NULL      |
|num_nulls     |NULL      |
|distinct_count|NULL      |
|avg_col_len   |NULL      |
|max_col_len   |NULL      |
|histogram     |NULL      |
+--------------+----------+&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Can someone suggest what I am doing wrong?&lt;/P&gt;&lt;P&gt;I noticed that if I create a new table from the view instead:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("create table newtitanic as select * from titanic")
spark.sql("Analyze table newtitanic compute statistics for all columns")
spark.sql("desc extended newtitanic Name").show(130, false)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;then the statistics are populated for all columns.&lt;/P&gt;</description>
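A possible explanation for the NULLs, offered as an assumption rather than a confirmed answer: on Spark 3.x, ANALYZE TABLE on a cached temporary view appears to attach the column statistics to the cached plan, while DESC EXTENDED reads column statistics from catalog metadata, which a temporary view does not carry. The sketch reproduces the post's steps on a local session and then prints the statistics of the cached plan. The three stand-in rows, the app name, and the local master setting are hypothetical and not part of the original post; whether the optimized plan exposes the column statistics this way may depend on the Spark version, so treat the last step as something to verify.

```scala
import org.apache.spark.sql.SparkSession

// Local session for the sketch (assumption: the post already has one named `spark`).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-temp-view-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows for titanic.csv; the real file is not reproduced here.
val df2 = Seq(("Alice", 22.0), ("Bob", 38.0), ("Carol", 26.0)).toDF("Name", "Age")
df2.createOrReplaceTempView("titanic")

// Same steps as in the post: cache the view, then analyze all columns.
spark.table("titanic").cache()
spark.sql("ANALYZE TABLE titanic COMPUTE STATISTICS FOR ALL COLUMNS")

// DESC EXTENDED looks up catalog metadata; for a temp view the column stats are expected to be NULL.
spark.sql("DESC EXTENDED titanic Name").show(100, false)

// Assumption: the computed statistics are attached to the cached plan instead.
val planStats = spark.table("titanic").queryExecution.optimizedPlan.stats
println(s"rowCount=${planStats.rowCount}, sizeInBytes=${planStats.sizeInBytes}")
planStats.attributeStats.foreach { case (attr, colStat) =>
  println(s"${attr.name}: distinctCount=${colStat.distinctCount}, nullCount=${colStat.nullCount}")
}
```

If the per-column lines print non-empty counts while DESC EXTENDED still shows NULL, that would support the idea that the statistics exist but are simply not surfaced through the catalog for temp views.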
      <pubDate>Fri, 18 Nov 2022 19:08:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21685#M14816</guid>
      <dc:creator>chhavibansal</dc:creator>
      <dc:date>2022-11-18T19:08:00Z</dc:date>
    </item>
    <item>
      <title>Re: ANALYZE TABLE showing NULLs for all statistics in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21686#M14817</link>
      <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;I have tested this and it works fine for me. Can you please share the dataset link so that we can test it and give you a better solution?&lt;/P&gt;&lt;P&gt;Here is a screenshot of the result I got:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1148iD84AF36BE857C90C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2022 06:15:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21686#M14817</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2022-12-04T06:15:54Z</dc:date>
    </item>
    <item>
      <title>Re: ANALYZE TABLE showing NULLs for all statistics in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21687#M14818</link>
      <description>&lt;P&gt;Hi @Aviral Bhardwaj,&lt;/P&gt;&lt;P&gt;Thank you for the answer.&lt;/P&gt;&lt;P&gt;My question is specifically about running ANALYZE TABLE followed by DESCRIBE EXTENDED on the temp view I created. You are using the right dataset, as your screenshot shows. I have shared the full sequence of commands that leads to the NULL statistics.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2022 06:56:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21687#M14818</guid>
      <dc:creator>chhavibansal</dc:creator>
      <dc:date>2022-12-04T06:56:20Z</dc:date>
    </item>
    <item>
      <title>Re: ANALYZE TABLE showing NULLs for all statistics in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21688#M14819</link>
      <description>&lt;P&gt;@Chhavi Bansal,&lt;/P&gt;&lt;P&gt;This is happening because you are describing the Name column specifically. See this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1147iA7EFF6B32F9200FC/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I hope this gives you some idea.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aviral Bhardwaj&lt;/P&gt;</description>
      <pubDate>Sun, 04 Dec 2022 07:07:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21688#M14819</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2022-12-04T07:07:32Z</dc:date>
    </item>
    <item>
      <title>Re: ANALYZE TABLE showing NULLs for all statistics in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21689#M14820</link>
      <description>&lt;P&gt;Can you share what &lt;CODE&gt;newtitanic&lt;/CODE&gt; is in your example? I think you did something similar to:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("create table newtitanic as select * from titanic")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Something like this works for me, but the issue is that I first create a temp view and then create a second table from it, which persists the data a second time.&lt;/P&gt;</description>
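If the concern is the extra temp view plus CREATE TABLE AS SELECT round trip, one alternative (a sketch, not something confirmed in the thread) is to write the DataFrame straight to a managed table with DataFrameWriter.saveAsTable and analyze that table. This still persists the data to the warehouse directory, because, as far as we know, the column statistics that DESC EXTENDED reports are stored with catalog table metadata. The table name titanic_tbl, the stand-in rows, and the local session settings are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Local session for the sketch (assumption: the post already has one named `spark`).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-saved-table-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows for titanic.csv.
val df2 = Seq(("Alice", 22.0), ("Bob", 38.0), ("Carol", 26.0)).toDF("Name", "Age")

// Write the DataFrame directly as a managed table (hypothetical name titanic_tbl),
// skipping the temp view and the CREATE TABLE AS SELECT step.
df2.write.mode(SaveMode.Overwrite).saveAsTable("titanic_tbl")

// Column statistics for catalog tables are kept in the table metadata,
// so DESC EXTENDED is expected to report them after ANALYZE.
spark.sql("ANALYZE TABLE titanic_tbl COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("DESC EXTENDED titanic_tbl Name").show(100, false)
```

For a string column such as Name, min and max may still show NULL even after analysis; distinct_count, num_nulls, avg_col_len and max_col_len are the fields to check.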
      <pubDate>Sun, 04 Dec 2022 07:12:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/analyze-table-showing-nulls-for-all-statistics-in-spark/m-p/21689#M14820</guid>
      <dc:creator>chhavibansal</dc:creator>
      <dc:date>2022-12-04T07:12:25Z</dc:date>
    </item>
  </channel>
</rss>

