Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I created an external database but am unable to write a table to the storage account (Blob container: Gold)

PraveenReddy21
New Contributor III

Hi,

I have completed the Bronze and Silver layers. Now I am trying to save a table to the Gold container, but I am unable to store it there.

I created an external database.

I want to store the data as PARQUET, but that is not supported; only DELTA is.

Only MANAGED LOCATION is supported, and I am unable to specify a location directly.

Can anyone please help? This is urgent.

Regards,

Praveen.

Accepted Solutions

Rishabh-Pandey
Esteemed Contributor
from pyspark.sql import functions as F  # namespaced import avoids shadowing Python's built-in round and sum

# Step 1: Read the data from the source table
invoice_df = spark.table("invoice_tbl")

# Step 2: Perform the transformation
# Aggregate the data by country and invoice_date
aggregated_df = invoice_df.groupBy("country", "invoice_date").agg(
    F.round(F.sum(F.col("quantity") * F.col("unit_price")), 2).alias("total_sales")
)

# Step 3: Write the result as Parquet files
# Define the output path (abfss:// URIs use the dfs endpoint, not the blob endpoint)
parquet_path = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/gold/location/country_wise_daily_sales.parquet"

# Save the DataFrame in Parquet format (Spark writes a directory of part files at this path)
aggregated_df.write.format("parquet").mode("overwrite").save(parquet_path)

print("Data has been saved in Parquet format.")

@PraveenReddy21 Try this. The code writes the Gold layer data as Parquet files to the external storage location.

Rishabh Pandey
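
A note on the output path used above: as far as I know, abfss:// URIs take the form abfss://<container>@<storage-account>.dfs.core.windows.net/<path>, with the container before the @ and the dfs endpoint as the host. Below is a minimal sketch of a helper that assembles such a URI. The function name abfss_uri and the sample container, account, and folder names are hypothetical and not taken from this thread.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Assemble an abfss:// URI for a path in an ADLS Gen2 container.

    Hypothetical helper for illustration only; it does no network access
    and does not check that the container or account actually exists.
    """
    if not container or not account:
        raise ValueError("container and account are both required")
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    clean = path.strip("/")  # drop leading/trailing slashes so they are not doubled
    return f"{base}/{clean}" if clean else base


# Sample values are placeholders, not real resources.
gold_path = abfss_uri("gold", "mystorageacct", "/sales_db/country_wise_daily_sales")
print(gold_path)
```

The resulting string can be passed to save() in place of a hand-typed path, which makes it harder to swap the container and account or to pick the wrong endpoint.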

Rishabh-Pandey
Esteemed Contributor

Hi @PraveenReddy21, can you share a screenshot of the code you are using to save the table in Parquet format, and describe the steps you are following? This does not normally happen, so I suspect there is an issue somewhere in your flow.

Rishabh Pandey

Hi, please find the code below:

CREATE DATABASE IF NOT EXISTS sales_dbdb
MANAGED LOCATION 'abfss://unity-catalog-storage@dbstorage7wauw2kjcu3u6.dfs.core.windows.net/107512296614625'

and
create table sales_dbdb.country_wise_daily_salesale
using delta as
select country, invoice_date, round(sum(quantity*unit_price),2) as total_sales from invoice_tbl group by country, invoice_date

I want to create the table in PARQUET format and write it to the Gold Blob container.

Hi  Rishabh,

First I tried this, but it did not work:

create database if not exists sales_db
location "/mnt/lakehouse/gold/sales_db"
 
Then I tried the following:

%sql
CREATE DATABASE IF NOT EXISTS sales_dbdb
LOCATION 'abfss://<storagelocation>.dfs.core.windows.net/<container>/sales_dbdb'

and
create table sales_dbdb.country_wise_daily_salesale
using PARQUEST as
select country, invoice_date, round(sum(quantity*unit_price),2) as total_sales from invoice_tbl group by country, invoice_date

This is not working.

If you have the steps, please share them.

Can we create the database directly in the Blob container?

Thank you.

Praveen.

Why are you using PARQUEST? Can you replace it with parquet and let me know exactly what error you get? Also, as I understand it, you have finished the Bronze and Silver layer tables and now want to create the Gold layer and Gold table. Before drawing a conclusion, I have a few questions.

1-Are the Bronze and Silver tables managed or external tables?

2-Are the Bronze and Silver tables Delta or Parquet tables?

3-Do you want to create an external Gold table in Parquet format?

Rishabh Pandey
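
For the CREATE TABLE attempt quoted earlier, here is a sketch of what the statement could look like with the keyword spelled PARQUET and an explicit LOCATION, which is how Spark SQL defines an external (unmanaged) table. The helper name parquet_ctas and the folder name in the location are hypothetical, and the placeholders must be filled in. It also assumes the workspace is allowed to write to that path; on Unity Catalog, as far as I know, that requires a configured external location.

```python
def parquet_ctas(table: str, location: str, select_sql: str) -> str:
    """Build a CREATE TABLE ... USING PARQUET ... AS SELECT statement.

    Hypothetical helper for illustration; it only formats a string and
    does not validate the table name, the location, or the query.
    """
    return (
        f"CREATE TABLE {table}\n"
        f"USING PARQUET\n"
        f"LOCATION '{location}'\n"
        f"AS {select_sql}"
    )


# Same aggregation as in the thread.
select_sql = (
    "SELECT country, invoice_date, "
    "ROUND(SUM(quantity * unit_price), 2) AS total_sales "
    "FROM invoice_tbl GROUP BY country, invoice_date"
)

stmt = parquet_ctas(
    "sales_dbdb.country_wise_daily_salesale",
    # Placeholders left as-is; replace with your own container and account.
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/sales_dbdb/country_wise_daily_sales",
    select_sql,
)
print(stmt)

# In a Databricks notebook, where `spark` is predefined, the statement
# could then be run with: spark.sql(stmt)
```

Printing the statement first makes it easy to check the spelling of the format and the location before anything is written to storage.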

Hi,

I am using Parquet format only.

Bronze and Silver are both Parquet tables; they are not managed tables.

---

3-Do you want to create an external Gold table in Parquet format? Yes, I want to create the table in the Gold Blob container.

PraveenReddy21
New Contributor III

Thank you, Rishabh.
