Process logging optimisation
11-15-2023 07:25 AM
I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to append to an existing Delta table. This seems to slow the process down significantly; I think each append to the table is blocking the other appends.
Is there a best practice for logging to one table from multiple processes? If I were using SQL Server I'd consider setting up a message queue. It is acceptable for log entries to be inserted out of sequence, as we can sort by timestamp, but it is essential that all log entries are inserted even if the parent process is terminated.
As the table will only ever be appended to, I don't need the time travel functionality, if that makes a difference.
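For reference, this is roughly the pattern I'm using (a minimal sketch; the table and column names here are just placeholders):

```python
# Sketch of the current logging pattern: each parallel notebook run builds a
# small log DataFrame and appends it to a shared Delta table.
from datetime import datetime
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# One or more log rows produced by this run (hypothetical schema).
log_df = spark.createDataFrame([
    Row(run_id="run_42", step="load_source", status="OK",
        log_ts=datetime.utcnow()),
])

# Every concurrent run does an append like this to the same table, which is
# where the slowdown appears once many runs write at the same time.
(log_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("logging.process_log"))
```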
Thanks
- Labels:
  - Delta Lake
  - Spark
12-11-2023 07:32 AM
I don't think this relates to my issue.

