Length Value of a column in pyspark

RohiniMathur
New Contributor II

Hello,

I am using PySpark 2.12.

After creating a DataFrame, can we measure the length of a value for each row?

For example, I am measuring the length of the value in column 2.

Input file:

|TYCO|1303|
|EMC |120989|
|VOLVO|102329|
|BMW|130157|
|FORD|004|

The output I am trying to get in the DataFrame is:

|TYCO|1303|4
|EMC |120989|6
|VOLVO|102329|6
|BMW|130157|6
|FORD|004|3

Please suggest if it is possible.

ACCEPTED SOLUTION

lee
Contributor

You can use the length function from pyspark.sql.functions for this:

from pyspark.sql.functions import length

# Recreate the sample data as a DataFrame with two string columns
mock_data = [('TYCO', '1303'), ('EMC', '120989'), ('VOLVO', '102329'), ('BMW', '130157'), ('FORD', '004')]
df = spark.createDataFrame(mock_data, ['col1', 'col2'])

# Add a column holding the character length of col2
df2 = df.withColumn('length_col2', length(df.col2))
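
For reference, calling show() on the result should produce the lengths the question asks for; the table below is a sketch of the expected output for the sample data above:

df2.show()
# +-----+------+-----------+
# | col1|  col2|length_col2|
# +-----+------+-----------+
# | TYCO|  1303|          4|
# |  EMC|120989|          6|
# |VOLVO|102329|          6|
# |  BMW|130157|          6|
# | FORD|   004|          3|
# +-----+------+-----------+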

Published notebook to show full example:

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3249...
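
If the data actually lives in a pipe-delimited file as shown in the question, the same length call works after reading the file. A minimal sketch, assuming a hypothetical file path, no header row, and that the leading and trailing '|' delimiters produce empty first and last columns:

from pyspark.sql.functions import length

# Read the pipe-delimited file; the path is a placeholder for illustration
raw = spark.read.option('sep', '|').csv('/path/to/input_file.txt')

# Skip the empty column created by the leading '|' and
# add the character length of the second real column
result = raw.select(
    raw['_c1'].alias('col1'),
    raw['_c2'].alias('col2'),
    length(raw['_c2']).alias('length_col2'),
)
result.show()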
