02-14-2022 02:40 AM
This must be trivial, but I must have missed something.
I have a dataframe (test1) and want to round all the columns listed in a list of columns (col_list).
Here is the code I am running:
col_list = ['measure1', 'measure2', 'measure3']
for i in col_list:
    rounding = test1\
        .withColumn(i, round(col(i),0))
display(rounding)
As a result, only the last column has its values rounded.
What am I missing to have all the measures rounded?
Data for testing:
car     model  measure1  measure2  measure3
Nissan  aa     1.11      0.3       34
Toyota  bb     1.12      0.4       111
BMW     cc     1.13      0.5       1.9
Labels: Loop, Pyspark, Pyspark Databricks
Accepted Solutions
02-14-2022 05:27 AM
Because every iteration of the for loop starts again from test1, each pass discards the result of the previous one. In the last iteration column measure3 is selected, so the variable rounding ends up assigned a new dataframe in which only measure3 has been rounded.
Try the following code:
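from pyspark.sql.functions import col, round  # note: shadows Python's built-in round

rounding = test1
for i in col_list:
    # build on the previous result so earlier roundings are kept
    rounding = rounding\
        .withColumn(i, round(col(i),0))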
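To see why the original loop kept only the last column, here is a rough plain-Python analogy that runs without Spark. The `with_column` helper and the dict-of-lists "dataframe" are hypothetical stand-ins, not PySpark APIs; they only mimic the fact that `withColumn` returns a new DataFrame and leaves the original untouched. Python's built-in `round` is used as a placeholder and breaks ties differently from Spark's `round`, so treat the values as illustrative.

```python
# Hypothetical stand-in for DataFrame.withColumn: returns a NEW dict of
# columns and leaves the input unchanged, the way Spark DataFrames behave.
def with_column(df, name, func):
    new_df = dict(df)
    new_df[name] = [func(v) for v in df[name]]
    return new_df


# Toy version of test1 holding only the numeric columns from the question.
test1 = {
    "measure1": [1.11, 1.12, 1.13],
    "measure2": [0.3, 0.4, 0.5],
    "measure3": [34.0, 111.0, 1.9],
}
col_list = ["measure1", "measure2", "measure3"]

# Pattern from the question: every iteration starts again from test1,
# so only the change made in the final iteration survives.
for i in col_list:
    broken = with_column(test1, i, round)

# Pattern from the answer: each iteration builds on the previous result.
fixed = test1
for i in col_list:
    fixed = with_column(fixed, i, round)

print(broken["measure1"])  # [1.11, 1.12, 1.13]  (not rounded)
print(broken["measure3"])  # [34, 111, 2]        (only the last column changed)
print(fixed["measure1"])   # [1, 1, 1]
print(fixed["measure3"])   # [34, 111, 2]
```

The same reasoning applies to the real dataframe: test1 itself never changes, so the loop has to reassign the variable it reads from.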
02-14-2022 06:50 AM
You're absolutely right. Thanks!