cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Adding new field to Delta live Table

Madalian
New Contributor III

Hi Experts,

I have on Bronze layer all delta merge files (Parquet) format.

I am converting these files into delta live tables in silver layer. While doing so, I am unable to add current time stamp column.

Following is the script:

from pyspark.sql.functions import *
from pyspark.sql.types import *
import dlt
currentdate= spark.sql("select string(current_timestamp)").collect()[0][0]
@Dlt.table
def t_stream_silver_00_XXXX():
return (
spark.readStream
.format("delta")
.option("cloudFiles.format", "parquet")
.option(key = "SILVER_LOADING_TIME", value =currentdate)
.load("abfss://bronze-prod-XXXXX@adlsg2SAPPPPpsi001.dfs.core.windows.net/PROD/XXXX/Delta/00_XXXX/")
)

it didt not throw any error at the same time no column being added. Can you please help?

1 REPLY 1

Kaniz_Fatma
Community Manager
Community Manager

Hi @MadalianLet’s address the issue you’re facing while converting your Parquet files from the Bronze layer to Delta live tables in the Silver layer.

  1. Column Addition: It appears that you’re trying to add a current timestamp column named SILVER_LOADING_TIME to your Delta table. However, despite no errors being thrown, the column is not being added as expected.

  2. Best Practices for Bronze Layer: Before we dive into the specific issue, let’s discuss some best practices related to the Bronze layer in the Medallion architecture:

  3. Troubleshooting the Issue: Let’s focus on resolving the issue by adding the timestamp column. Here are a few steps to consider:

    • Check the Data: First, verify that your Parquet files in the Bronze layer indeed contain the data you expect. Ensure that the relevant timestamp information is present.
    • Column Name Case Sensitivity: Delta Lake is case-sensitive when it comes to column names. Make sure that the column name you’re using (SILVER_LOADING_TIME) matches the actual column name in your Parquet files.
    • Column Data Type: Confirm that the data type of the SILVER_LOADING_TIME column aligns with your expectations (e.g., timestamp or string).
  4. Debugging Approach: To debug further, consider the following steps:

    • Inspect Schema: Load the Parquet files directly (outside the Delta table creation) and inspect the schema. Ensure that the timestamp column is present and has the correct data type.
    • Explicit Column Addition: Instead of relying on the option(key = "SILVER_LOADING_TIME", value = currentdate) approach, explicitly add the timestamp column using withColumn during DataFrame creation.
  5. Example Code Modification: Here’s an adjusted version of your script that explicitly adds the timestamp column:

    from pyspark.sql.functions import *
    from pyspark.sql.types import *
    import dlt
    
    currentdate = spark.sql("select current_timestamp").collect()[0][0]
    
    @Dlt.table
    def t_stream_silver_00_XXXX():
        return (
            spark.readStream
                .format("delta")
                .option("cloudFiles.format", "parquet")
                .load("abfss://bronze-prod-XXXXX@adlsg2SAPPPPpsi001.dfs.core.windows.net/PROD/XXXX/Delta/00_XXXX/")
                .withColumn("SILVER_LOADING_TIME", currentdate)
        )
    
  6. Further Investigation: If the issue persists, consider checking the logs or monitoring the execution to identify any additional clues. Also, ensure that the Delta Lake version you’re using supports the features you’re trying to utilize.

Remember that the Bronze layer primarily focuses on quick change data capture (CDC) and historical a...34.

Feel free to adapt the code snippet above and explore other debugging options to resolve the issue.

If you encounter any specific errors or need further assistance, feel free to ask! 😊

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group