Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

process logging optimisation

BriGuy
New Contributor II

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to append to an existing Delta table. This seems to slow the process down significantly. I think each append to the table is blocking the other appends.

Is there a best practice for logging to one table from multiple processes? If I were using SQL Server, I'd consider setting up a message queue. It is acceptable for log entries to be inserted out of sequence, as we can sort by timestamp, but it is essential that all log entries are inserted even if the parent process is terminated.

As the table will only ever be appended to, I don't need the time-travel functionality, if that makes a difference.
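The message-queue idea mentioned above can be sketched in plain Python, with stdlib threads standing in for the parallel notebook runs and a single writer thread standing in for the one process that appends to the table. This is only an illustration of the single-writer pattern, not Databricks or Spark API; all names are made up:

```python
import queue
import threading

def run_logging(num_workers=4, entries_per_worker=5):
    """Fan many producers into one writer via a thread-safe queue."""
    log_queue = queue.Queue()   # thread-safe buffer for log entries
    written_rows = []           # stand-in for the shared log table
    SENTINEL = None             # marker telling the writer to stop

    def worker(worker_id):
        # Each parallel task enqueues its log rows instead of
        # appending to the shared table directly.
        for seq in range(entries_per_worker):
            log_queue.put((worker_id, seq))

    def writer():
        # A single writer drains the queue, so table appends
        # never contend with each other.
        while True:
            item = log_queue.get()
            if item is SENTINEL:
                break
            written_rows.append(item)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    workers = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    log_queue.put(SENTINEL)     # all producers done; stop the writer
    writer_thread.join()
    return written_rows
```

Entries may arrive interleaved across workers, but every entry that was enqueued is written exactly once, which matches the requirement that ordering is flexible while completeness is not.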

Thanks

1 REPLY

BriGuy
New Contributor II

I don't think this relates to my issue.
