09-23-2021 12:36 AM
09-23-2021 09:14 AM
Assume that "df" is a DataFrame. The following code (with comments) shows various options for describing a DataFrame.
# get a row count
df.count()
# get an approximate count (faster than .count()); a timeout in milliseconds is required
df.rdd.countApprox(timeout=1000)
# print the schema (shape of your df)
df.printSchema()
# get the columns as a list
df.columns
# get the columns and types as tuples in a list
df.dtypes
04-18-2022 02:10 AM
Thank you @Ryan Chynoweth for your answer.
04-18-2022 01:43 AM
To obtain the shape of a DataFrame in PySpark, get the number of rows with "DF.count()" and the number of columns with "len(DF.columns)". The code below prints the shape of the PySpark DataFrame "DF".
print((DF.count(), len(DF.columns)))
04-18-2022 02:10 AM
Thank you @Feras Seder for the answer.