
Aviral-Bhardwaj
Esteemed Contributor III

Understanding Rename in Databricks

There are multiple ways to rename Spark DataFrame columns or expressions.

We can rename columns or expressions using alias as part of select.

We can add a column, or give a name to the result of an expression, using withColumn on the DataFrame.

We can rename one column at a time using withColumnRenamed on the DataFrame.

We typically use withColumn to perform row-level transformations and then give a name to the result. If we give the same name as an existing column, that column is replaced with the new one.

If we only want to rename a column, it is better to use withColumnRenamed.

If we want to apply any transformation, we need to use select or withColumn.

We can rename a bunch of columns using toDF.

First, let's look at what withColumn and withColumnRenamed do.

withColumn

help(MyFirstDataFrame.withColumn)
 
#RESULT
 
Help on method withColumn in module pyspark.sql.dataframe:
 
withColumn(colName: str, col: pyspark.sql.column.Column) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` by adding a column or replacing the
    existing column that has the same name.
    
    The column expression must be an expression over this :class:`DataFrame`; attempting to add
    a column from some other :class:`DataFrame` will raise an error.
    
    .. versionadded:: 1.3.0
    
    Parameters
    ----------
    colName : str
        string, name of the new column.
    col : :class:`Column`
        a :class:`Column` expression for the new column.
    
    Notes
    -----
    This method introduces a projection internally. Therefore, calling it multiple
    times, for instance, via loops in order to add multiple columns can generate big
    plans which can cause performance issues and even `StackOverflowException`.
    To avoid this, use :func:`select` with the multiple columns at once.
    
    Examples
    --------
    >>> df.withColumn('age2', df.age + 2).collect()
    [Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]

withColumnRenamed

help(MyFirstDataFrame.withColumnRenamed)
 
#RESULT
 
Help on method withColumnRenamed in module pyspark.sql.dataframe:
 
withColumnRenamed(existing: str, new: str) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` by renaming an existing column.
    This is a no-op if schema doesn't contain the given column name.
    
    .. versionadded:: 1.3.0
    
    Parameters
    ----------
    existing : str
        string, name of the existing column to rename.
    new : str
        string, new name of the column.
    
    Examples
    --------
    >>> df.withColumnRenamed('age', 'age2').collect()
    [Row(age2=2, name='Alice'), Row(age2=5, name='Bob')]

Now let's try both in practice on our DataFrame.

withColumn - We can create a whole new column

e.g. withColumn(NewColumnName, ColumnExpression)

MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus') \
.withColumn('NewColFirstName',MyFirstDataFrame.FirstName).show()
 
#RESULT
 
+---------+--------+---+-----------+------+-----+---------------+
|FirstName|LastName|age|       city|Salary|Bonus|NewColFirstName|
+---------+--------+---+-----------+------+-----+---------------+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|            Ram|
|    Shyam|      NC| 30|    Chennai|  3000|  300|          Shyam|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|          Rohan|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|         Ritesh|
+---------+--------+---+-----------+------+-----+---------------+
 
# The trailing backslash is only a line continuation for readability; you can write the call on one line instead
 
# Now the same thing using [] to reference the column
 
MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus').\
withColumn('NewColFirstName',MyFirstDataFrame['FirstName']).show()
 
#Result
+---------+--------+---+-----------+------+-----+---------------+
|FirstName|LastName|age|       city|Salary|Bonus|NewColFirstName|
+---------+--------+---+-----------+------+-----+---------------+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|            Ram|
|    Shyam|      NC| 30|    Chennai|  3000|  300|          Shyam|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|          Rohan|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|         Ritesh|
+---------+--------+---+-----------+------+-----+---------------+
 
# I hope it is now clear what withColumn does: both forms add the same new column

withColumnRenamed - We can rename an existing column, e.g. withColumnRenamed(OldColumnName, NewColumnName)

# Our original DataFrame
MyFirstDataFrame.select("*").show()
 
+---------+--------+---+-----------+------+-----+
|FirstName|LastName|age|       city|Salary|Bonus|
+---------+--------+---+-----------+------+-----+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|
|    Shyam|      NC| 30|    Chennai|  3000|  300|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|
+---------+--------+---+-----------+------+-----+
 
 
MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus').\
withColumnRenamed('FirstName', 'UsersFirstName').\
withColumnRenamed('LastName', 'userLastName').\
withColumnRenamed('age', 'UserAge').show()
 
#Result
 
+--------------+------------+-------+-----------+------+-----+
|UsersFirstName|userLastName|UserAge|       city|Salary|Bonus|
+--------------+------------+-------+-----------+------+-----+
|           Ram|      Tiwari|     25|  Bangalore|  2000|  100|
|         Shyam|          NC|     30|    Chennai|  3000|  300|
|         Rohan|        Jaki|     45|     Andhra|  1500|  150|
|        Ritesh|      sharma|     35|Rameshwaram|  2500|  250|
+--------------+------------+-------+-----------+------+-----+

Finally, we can use the alias function to change a column name. The code is below.

# Our original DataFrame
MyFirstDataFrame.select("*").show()
 
+---------+--------+---+-----------+------+-----+
|FirstName|LastName|age|       city|Salary|Bonus|
+---------+--------+---+-----------+------+-----+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|
|    Shyam|      NC| 30|    Chennai|  3000|  300|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|
+---------+--------+---+-----------+------+-----+
 
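# concat, col and lit come from pyspark.sql.functions
from pyspark.sql.functions import col, concat, lit

# Rename three columns with alias, then build a full name from the renamed columns
MyFirstDataFrame. \
    select(
        MyFirstDataFrame['FirstName'].alias('UsersFirstName'),
        MyFirstDataFrame['LastName'].alias('UserLastName'),
        MyFirstDataFrame['age'].alias('UserAge')
    ). \
    withColumn('User_Full_Name', concat(col('UsersFirstName'), lit(', '), col('UserLastName'))). \
    show()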
 
#Result
 
+--------------+------------+-------+--------------+
|UsersFirstName|UserLastName|UserAge|User_Full_Name|
+--------------+------------+-------+--------------+
|           Ram|      Tiwari|     25|   Ram, Tiwari|
|         Shyam|          NC|     30|     Shyam, NC|
|         Rohan|        Jaki|     45|   Rohan, Jaki|
|        Ritesh|      sharma|     35|Ritesh, sharma|
+--------------+------------+-------+--------------+

2 REPLIES

Ajay-Pandey
Esteemed Contributor III

Very informative, Thanks for sharing

Aviral-Bhardwaj
Esteemed Contributor III

thank you sir
