11-18-2022 11:08 AM
// Assumes an existing SparkSession named `spark`.
val df2 = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("src/main/resources/datasets/titanic.csv")
// Register the DataFrame as a temp view and mark it for caching.
df2.createOrReplaceTempView("titanic")
spark.table("titanic").cache()
// Compute column statistics, then describe the Name column.
spark.sql("ANALYZE TABLE titanic COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("DESC EXTENDED titanic Name").show(100, false)
I created a Spark session, loaded a dataset, and registered it as a temp view. After running the ANALYZE TABLE command, DESC EXTENDED shows NULL for every statistic on every column.
+--------------+----------+
|info_name     |info_value|
+--------------+----------+
|col_name      |Name      |
|data_type     |string    |
|comment       |NULL      |
|min           |NULL      |
|max           |NULL      |
|num_nulls     |NULL      |
|distinct_count|NULL      |
|avg_col_len   |NULL      |
|max_col_len   |NULL      |
|histogram     |NULL      |
+--------------+----------+
Can someone suggest what I am doing wrong?
One thing I noticed is that if I create a new table from the temp view:
spark.sql("create table newtitanic as select * from titanic")
spark.sql("Analyze table newtitanic compute statistics for all columns")
spark.sql("desc extended newtitanic Name").show(130, false)
then the statistics are returned for all columns.
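A possible explanation, stated as an assumption and not confirmed anywhere in this thread: for a cached temp view, Spark 3.x appears to attach the computed statistics to the cached plan instead of to catalog metadata, and a temp view has no catalog statistics for DESC EXTENDED to display. The sketch below is one way to check this by reading the statistics off the optimized plan. The object name, app name, and local master are hypothetical, and `queryExecution` is a developer-facing API whose behavior may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a sketch for inspecting plan-level statistics.
object CachedViewStats {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this check.
    val spark = SparkSession.builder()
      .appName("cached-view-stats")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("src/main/resources/datasets/titanic.csv")

    // Same sequence as in the question: temp view, cache, analyze.
    df.createOrReplaceTempView("titanic")
    spark.table("titanic").cache()
    spark.sql("ANALYZE TABLE titanic COMPUTE STATISTICS FOR ALL COLUMNS")

    // Read the statistics carried by the optimized plan of the view.
    // Assumption: the cached relation exposes them here after ANALYZE.
    val stats = spark.table("titanic").queryExecution.optimizedPlan.stats
    println(s"rowCount = ${stats.rowCount}")
    stats.attributeStats.foreach { case (attr, colStat) =>
      println(s"${attr.name}: distinctCount=${colStat.distinctCount}, nullCount=${colStat.nullCount}")
    }

    spark.stop()
  }
}
```

If the printed column statistics are populated while DESC EXTENDED still shows NULL, that would be consistent with the statistics living on the cached plan only.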
12-03-2022 10:15 PM
12-03-2022 10:56 PM
Hi @Aviral Bhardwaj
Thank you for the answer.
My question is more about running the ANALYZE TABLE command followed by DESCRIBE EXTENDED on the temp view that was created. You are using the right dataset, as shown in the screenshot. I have shared the full sequence of commands that leads to the NULL statistics.
12-03-2022 11:07 PM
12-03-2022 11:12 PM
Can you share what *newtitanic* is? I think you would have done something similar to this:
spark.sql("create table newtitanic as select * from titanic")
Something like this works for me, but the issue is that I first create a temp view and then create a table from it, which would be persisted in memory.
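If the extra temp view step is the concern, one option might be to write the DataFrame straight to a catalog table with `saveAsTable` and run ANALYZE TABLE on that. This is a minimal sketch, not a confirmed fix from this thread: the object name, app name, local master, and the `overwrite` save mode are all assumptions made for illustration, and the table is written in the session's default format to the warehouse directory.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a sketch that skips the temp view entirely.
object TitanicTableStats {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session with the default catalog is acceptable.
    val spark = SparkSession.builder()
      .appName("titanic-table-stats")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .format("csv")
      .option("sep", ",")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("src/main/resources/datasets/titanic.csv")

    // Write directly to a catalog table; "overwrite" is an assumption so
    // that the sketch can be rerun without a "table already exists" error.
    df.write.mode("overwrite").saveAsTable("newtitanic")

    // Statistics computed here are stored with the table's catalog metadata.
    spark.sql("ANALYZE TABLE newtitanic COMPUTE STATISTICS FOR ALL COLUMNS")
    spark.sql("DESC EXTENDED newtitanic Name").show(100, false)

    spark.stop()
  }
}
```

This keeps the same ANALYZE and DESC EXTENDED calls as the snippets in the question and only changes how the table is created.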