Understanding Rename in Databricks
There are multiple ways to rename Spark DataFrame columns or expressions.
We can rename columns or expressions using alias as part of select.
We can add or rename columns or expressions using withColumn on top of the DataFrame.
We can rename one column at a time using withColumnRenamed on top of the DataFrame.
We typically use withColumn to perform row-level transformations and then give the result a name. If we give the same name as an existing column, that column is replaced with the new one.
If we just want to rename a column, it is better to use withColumnRenamed.
If we want to apply any transformation, we need to use select or withColumn.
We can rename a bunch of columns at once using toDF.
First, let us look at what withColumn and withColumnRenamed do.
withColumn
help(MyFirstDataFrame.withColumn)
#RESULT
Help on method withColumn in module pyspark.sql.dataframe:
withColumn(colName: str, col: pyspark.sql.column.Column) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
Returns a new :class:`DataFrame` by adding a column or replacing the
existing column that has the same name.
The column expression must be an expression over this :class:`DataFrame`; attempting to add
a column from some other :class:`DataFrame` will raise an error.
.. versionadded:: 1.3.0
Parameters
----------
colName : str
string, name of the new column.
col : :class:`Column`
a :class:`Column` expression for the new column.
Notes
-----
This method introduces a projection internally. Therefore, calling it multiple
times, for instance, via loops in order to add multiple columns can generate big
plans which can cause performance issues and even `StackOverflowException`.
To avoid this, use :func:`select` with the multiple columns at once.
Examples
--------
>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]
withColumnRenamed
help(MyFirstDataFrame.withColumnRenamed)
#RESULT
Help on method withColumnRenamed in module pyspark.sql.dataframe:
withColumnRenamed(existing: str, new: str) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
Returns a new :class:`DataFrame` by renaming an existing column.
This is a no-op if schema doesn't contain the given column name.
.. versionadded:: 1.3.0
Parameters
----------
existing : str
string, name of the existing column to rename.
new : str
string, new name of the column.
Examples
--------
>>> df.withColumnRenamed('age', 'age2').collect()
[Row(age2=2, name='Alice'), Row(age2=5, name='Bob')]
Now let us try both on our DataFrame.
withColumn: we can create a whole new column
e.g. withColumn(NewColumnName, ColumnExpression)
MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus') \
.withColumn('NewColFirstName',MyFirstDataFrame.FirstName).show()
#RESULT
+---------+--------+---+-----------+------+-----+---------------+
|FirstName|LastName|age|       city|Salary|Bonus|NewColFirstName|
+---------+--------+---+-----------+------+-----+---------------+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|            Ram|
|    Shyam|      NC| 30|    Chennai|  3000|  300|          Shyam|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|          Rohan|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|         Ritesh|
+---------+--------+---+-----------+------+-----+---------------+
# The backslash is only a line continuation to keep the code readable; you can remove it and write the call on one line
# Now the same thing using [] (bracket) notation
MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus').\
withColumn('NewColFirstName',MyFirstDataFrame['FirstName']).show()
#Result
+---------+--------+---+-----------+------+-----+---------------+
|FirstName|LastName|age|       city|Salary|Bonus|NewColFirstName|
+---------+--------+---+-----------+------+-----+---------------+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|            Ram|
|    Shyam|      NC| 30|    Chennai|  3000|  300|          Shyam|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|          Rohan|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|         Ritesh|
+---------+--------+---+-----------+------+-----+---------------+
# As you can see, the dot notation and the [] notation give the same result
withColumnRenamed: we can rename an existing column (no new column is created), e.g. withColumnRenamed(OldColumnName, NewColumnName)
# Our original DataFrame
MyFirstDataFrame.select("*").show()
+---------+--------+---+-----------+------+-----+
|FirstName|LastName|age|       city|Salary|Bonus|
+---------+--------+---+-----------+------+-----+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|
|    Shyam|      NC| 30|    Chennai|  3000|  300|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|
+---------+--------+---+-----------+------+-----+
MyFirstDataFrame.select('FirstName','LastName','age','city','Salary','Bonus'). \
withColumnRenamed('FirstName', 'UsersFirstName'). \
withColumnRenamed('LastName', 'userLastName'). \
withColumnRenamed('age', 'UserAge').show()
#Result
+--------------+------------+-------+-----------+------+-----+
|UsersFirstName|userLastName|UserAge|       city|Salary|Bonus|
+--------------+------------+-------+-----------+------+-----+
|           Ram|      Tiwari|     25|  Bangalore|  2000|  100|
|         Shyam|          NC|     30|    Chennai|  3000|  300|
|         Rohan|        Jaki|     45|     Andhra|  1500|  150|
|        Ritesh|      sharma|     35|Rameshwaram|  2500|  250|
+--------------+------------+-------+-----------+------+-----+
Finally, we can use alias inside select to rename columns. The code is below.
# Our original DataFrame
MyFirstDataFrame.select("*").show()
+---------+--------+---+-----------+------+-----+
|FirstName|LastName|age|       city|Salary|Bonus|
+---------+--------+---+-----------+------+-----+
|      Ram|  Tiwari| 25|  Bangalore|  2000|  100|
|    Shyam|      NC| 30|    Chennai|  3000|  300|
|    Rohan|    Jaki| 45|     Andhra|  1500|  150|
|   Ritesh|  sharma| 35|Rameshwaram|  2500|  250|
+---------+--------+---+-----------+------+-----+
# concat, col and lit must be imported before use
from pyspark.sql.functions import col, concat, lit
MyFirstDataFrame. \
select(
    MyFirstDataFrame['FirstName'].alias('UsersFirstName'),
    MyFirstDataFrame['LastName'].alias('UserLastName'),
    MyFirstDataFrame['age'].alias('UserAge')
). \
withColumn('User_Full_Name', concat(col('UsersFirstName'), lit(', '), col('UserLastName'))). \
show()
#Result
+--------------+------------+-------+--------------+
|UsersFirstName|UserLastName|UserAge|User_Full_Name|
+--------------+------------+-------+--------------+
|           Ram|      Tiwari|     25|   Ram, Tiwari|
|         Shyam|          NC|     30|     Shyam, NC|
|         Rohan|        Jaki|     45|   Rohan, Jaki|
|        Ritesh|      sharma|     35|Ritesh, sharma|
+--------------+------------+-------+--------------+
AviralBhardwaj