Tags to run S3 lifecycle rules

andresalvati
New Contributor II

Hello,

Is it possible to apply S3 tags when writing a DataFrame with PySpark? Or is the only option to write the DataFrame first and then use boto3 to tag all the files?

More information about S3 object tagging is here: Amazon S3 Object Tagging.

Thank you.

1 REPLY

Kaniz
Community Manager

Hi @andresalvati,

The typical approach is to write the DataFrame first and then use the AWS SDK (boto3 for Python) to set S3 object tags on the resulting files individually.

Here's a general outline of how you could do this using boto3:

  1. Write the DataFrame to S3 using PySpark's DataFrameWriter. For example:
df.write.csv("s3://your-bucket/your-path")
  2. After writing the data, use boto3 to set the desired S3 object tags. Here's a simplified example:
import boto3

# Initialize S3 client
s3 = boto3.client('s3')

# Specify your bucket and path
bucket_name = 'your-bucket'
path = 'your-path/'

# List objects in the path
objects = s3.list_objects_v2(Bucket=bucket_name, Prefix=path)
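
A minimal sketch of the tagging step itself, continuing from the listing above, might look like this (the lifecycle tag key and value are placeholders; substitute whatever your lifecycle rule filters on):

# Tag every object under the prefix (the tag key/value below are placeholders)
for obj in objects.get('Contents', []):
    s3.put_object_tagging(
        Bucket=bucket_name,
        Key=obj['Key'],
        Tagging={'TagSet': [{'Key': 'lifecycle', 'Value': 'expire-30-days'}]}
    )

Keep in mind that put_object_tagging replaces the object's entire tag set, and list_objects_v2 returns at most 1,000 keys per response, so use a paginator (s3.get_paginator('list_objects_v2')) for larger outputs. Once the files are tagged, an S3 lifecycle rule can filter on that tag.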