
Delta write stream to different folders dynamically based on input file

Krishna264
New Contributor

I have a root folder, and files are being ingested into its subfolders. I want to build a workflow that writes the stream to a different folder depending on the file being ingested.

2 REPLIES

Anonymous
Not applicable

@Krishnamoorthy Natarajan: Please try using the foreachBatch() method to apply custom processing to the output data of each micro-batch. Sample code is below.

from pyspark.sql.functions import input_file_name
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
 
# Define your schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
 
# Define your streaming data source
input_path = "/mnt/input-folder/*/*/*.csv"
df = spark.readStream.schema(schema).option("maxFilesPerTrigger", 1).csv(input_path).withColumn("input_file", input_file_name())
 
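# Define the foreachBatch function to write to Delta
def write_to_delta(batch_df, epoch_id):
    # Get the input file path; with maxFilesPerTrigger=1 each micro-batch holds a single file
    first_row = batch_df.select("input_file").first()
    if first_row is None:
        # Empty micro-batch: nothing to write
        return
    input_file = first_row[0]
 
    # Define the output path from the two parent folders of the input file
    parts = input_file.split("/")
    output_path = "/mnt/output-folder/" + parts[-3] + "/" + parts[-2]
 
    # Write the data to Delta
    batch_df.write.format("delta").mode("append").save(output_path)
 
# Apply the foreachBatch function on the output data
# (the checkpoint path below is a placeholder; a checkpoint location lets the stream resume after a restart)
df.writeStream.foreachBatch(write_to_delta).option("checkpointLocation", "/mnt/output-folder/_checkpoints").start().awaitTermination()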
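
The path derivation in the sample above can be pulled out into a small pure function, which makes it easy to check without a cluster. The following is only a sketch and not part of the original reply: output_path_for and write_by_folder are hypothetical names, and it assumes the same layout as above (two folder levels under /mnt/input-folder, output under /mnt/output-folder). It also covers the case where a micro-batch contains more than one file, for example if maxFilesPerTrigger is raised, by writing each distinct input file's rows to its own folder.

```python
def output_path_for(input_file, output_root="/mnt/output-folder"):
    """Map an input file path to an output folder named after its two parent folders."""
    parts = input_file.rstrip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"expected at least two parent folders in {input_file!r}")
    return "/".join([output_root.rstrip("/"), parts[-3], parts[-2]])


def write_by_folder(batch_df, epoch_id):
    """foreachBatch handler: append each input file's rows to its own Delta folder."""
    # Imported lazily so the path helper above can be used without Spark installed.
    from pyspark.sql.functions import col

    # The batch is scanned once per file, so keep it cached while we write.
    batch_df.persist()
    try:
        # Distinct source files present in this micro-batch (assumed to be a small list).
        rows = batch_df.select("input_file").distinct().collect()
        for row in rows:
            input_file = row["input_file"]
            (batch_df.filter(col("input_file") == input_file)
                .write.format("delta").mode("append")
                .save(output_path_for(input_file)))
    finally:
        batch_df.unpersist()
```

This handler would be passed to foreachBatch in place of write_to_delta. An empty micro-batch yields no distinct files, so the loop simply does nothing.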

Anonymous
Not applicable

Hi @Krishnamoorthy Natarajan

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!
