Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-18-2018 04:18 AM
Thanks! This seems to work for me; although I was nervous about using monotonically_increasing_id() because the ids could be different for corresponding rows in the two dataframes if any parts of them were generated in different partitions. But I guess if df2 is generated from df1 the chances of that happening in practice are fairly low...? Anyway, I use a "left_outer" join to make sure I keep the rows I want in the dataframe I consider most important.