08-22-2024 12:22 AM
from pyspark.sql.functions import col, round, sum
# Step 1: Read the data from the source table
invoice_df = spark.table("invoice_tbl")
# Step 2: Perform the transformation
# Aggregate the data by country and invoice_date
aggregated_df = invoice_df.groupBy("country", "invoice_date") \
.agg(round(sum(col("quantity") * col("unit_price")), 2).alias("total_sales"))
# Step 3: Write the result as a Parquet file
# Define the output path
parquet_path = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/gold/location/country_wise_daily_sales.parquet"
# Save the DataFrame to Parquet format
aggregated_df.write.format("parquet").mode("overwrite").save(parquet_path)
print("Aggregated data has been saved in Parquet format.")

@PraveenReddy21 Try this. The code writes the Parquet files for the gold-layer external table.
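
If it helps, here is a minimal sketch of how the output path and an external table definition over it could be put together. The helper names (`build_abfss_path`, `build_external_table_ddl`), the container `gold`, the storage account `mystorageacct`, and the table name `gold_country_wise_daily_sales` are all made up for illustration and are not from the code above. It assumes the `abfss://` scheme with the `dfs.core.windows.net` endpoint and the Spark SQL `CREATE TABLE ... USING PARQUET LOCATION ...` form.

```python
def build_abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss:// URI for ADLS Gen2 (hypothetical helper)."""
    # The ABFS driver addresses the storage account through the dfs endpoint.
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def build_external_table_ddl(table_name: str, location: str) -> str:
    """Build a Spark SQL statement for an external Parquet table (hypothetical helper)."""
    # The schema is left out here; Spark infers it from the Parquet files at the location.
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING PARQUET "
        f"LOCATION '{location}'"
    )


# Placeholder values, replace with your own container, account, and path.
path = build_abfss_path(
    "gold", "mystorageacct", "/sales/country_wise_daily_sales.parquet"
)
ddl = build_external_table_ddl("gold_country_wise_daily_sales", path)
print(ddl)
```

In a notebook, the statement would then be run with `spark.sql(ddl)` after the write has finished, so the table points at the files that were just saved.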
Rishabh Pandey