Re: i created External database but unable to tr...

Rishabh-Pandey · ‎08-22-2024

# Import the functions module under an alias so that Spark's round() and sum()
# do not shadow Python's built-ins of the same name
from pyspark.sql import functions as F

# Step 1: Read the data from the source table
invoice_df = spark.table("invoice_tbl")

# Step 2: Perform the transformation
# Aggregate total sales (quantity * unit_price) by country and invoice_date
aggregated_df = invoice_df.groupBy("country", "invoice_date").agg(
    F.round(F.sum(F.col("quantity") * F.col("unit_price")), 2).alias("total_sales")
)

# Step 3: Write the result as a Parquet file
# Define the output path (abfss:// URIs use the dfs endpoint, not the blob endpoint)
parquet_path = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/gold/location/country_wise_daily_sales.parquet"

# Save the DataFrame to Parquet format
aggregated_df.write.format("parquet").mode("overwrite").save(parquet_path)

print("Aggregated data has been saved in Parquet format.")

@PraveenReddy21 Try this. The code writes the aggregated data as Parquet files to the gold-layer location that the external table points to.
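
Note that the write above only produces files at the storage path; it does not register anything in the metastore. If you also want to query the result by name, one option is to create an external table over that location. Below is a minimal sketch, not a definitive implementation. The helper names and the table name `gold.country_wise_daily_sales` are made up for illustration, and it assumes the `gold` schema already exists and that your workspace has access to the storage location.

```python
def build_external_table_ddl(table_name: str, location: str) -> str:
    """Build a CREATE TABLE statement over existing Parquet files.

    Hypothetical helper: Spark infers the schema from the files at `location`.
    """
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING PARQUET LOCATION '{location}'"
    )


def register_external_table(spark, table_name: str, location: str) -> str:
    """Run the DDL on the given Spark session and return the statement."""
    ddl = build_external_table_ddl(table_name, location)
    spark.sql(ddl)
    return ddl


# In a Databricks notebook, after the write has finished (table name is an assumption):
# register_external_table(spark, "gold.country_wise_daily_sales", parquet_path)
# spark.table("gold.country_wise_daily_sales").show(5)
```

Because the table is external, dropping it later removes only the metastore entry and leaves the Parquet files in place.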

Rishabh Pandey
