Process logging optimisation
11-15-2023 07:25 AM
I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to append to an existing Delta table. This seems to slow the process down significantly; I think each append to the table is blocking the other appends.
Is there a best practice for logging to one table from multiple processes? If I were using SQL Server I'd consider setting up a message queue. It is acceptable for log entries to be inserted out of sequence, as we can sort by timestamp, but it is essential that all log entries are inserted even if the parent process is terminated.
As the table will only ever be appended to, I don't need the time travel functionality, if that makes a difference.
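For reference, this is roughly the pattern I'm using (a minimal sketch; the table and column names here are just placeholders):

```python
# Sketch of the current logging pattern: each parallel notebook run builds a
# small log DataFrame and appends it to a shared Delta table.
from datetime import datetime
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# One or more log rows produced by this run (hypothetical schema).
log_df = spark.createDataFrame([
    Row(run_id="run_42", step="load_source", status="OK",
        log_ts=datetime.utcnow()),
])

# Every concurrent run does an append like this to the same table, which is
# where the slowdown appears once many runs write at the same time.
(log_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("logging.process_log"))
```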
Thanks
- Labels:
  - Delta Lake
  - Spark
12-11-2023 07:32 AM
I don't think this relates to my issue.

