Data Engineering

How to find the size or shape of a DataFrame in PySpark?

Kaniz
Community Manager
 
1 ACCEPTED SOLUTION


Ryan_Chynoweth
Honored Contributor III

Assume that "df" is a DataFrame. The code below, with comments, shows several ways to describe it.

# get an exact row count (an action: triggers a full scan)
df.count()
 
# get an approximate count, faster than .count(); the timeout (in milliseconds) is required
df.rdd.countApprox(timeout=1000)
 
# print the schema (column names and types, i.e. the "shape" of your df)
df.printSchema()
 
# get the column names as a list
df.columns
 
# get (column name, type) tuples in a list
df.dtypes


4 REPLIES


Thank you @Ryan Chynoweth for your answer.

Feras
New Contributor II

To obtain the shape of a DataFrame in PySpark, get the number of rows with "DF.count()" and the number of columns with "len(DF.columns)". The code below prints the shape of the PySpark DataFrame "DF".

print((DF.count(), len(DF.columns)))

Kaniz
Community Manager

Thank you @Feras Seder for the answer.
