Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

process logging optimisation

BriGuy
New Contributor II

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to append to an existing Delta table. This seems to slow the process down significantly. I think each append to the table is blocking the other appends.

Is there a best practice for logging to one table from multiple processes? If I were using SQL Server, I'd consider setting up a message queue. It is acceptable for log entries to be inserted out of sequence, as we can sort by timestamp, but it is essential that all log entries are inserted even if the parent process is terminated.

As the table will only ever be appended to, I don't need the time-travel functionality, if that makes a difference.
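The message-queue idea mentioned above can be sketched in plain Python, with stdlib threads standing in for the parallel notebook runs and a single writer thread standing in for the one process that appends to the table. This is only an illustration of the single-writer pattern, not Databricks or Spark API; all names are made up:

```python
import queue
import threading

def run_logging(num_workers=4, entries_per_worker=5):
    """Fan many producers into one writer via a thread-safe queue."""
    log_queue = queue.Queue()   # thread-safe buffer for log entries
    written_rows = []           # stand-in for the shared log table
    SENTINEL = None             # marker telling the writer to stop

    def worker(worker_id):
        # Each parallel task enqueues its log rows instead of
        # appending to the shared table directly.
        for seq in range(entries_per_worker):
            log_queue.put((worker_id, seq))

    def writer():
        # A single writer drains the queue, so table appends
        # never contend with each other.
        while True:
            item = log_queue.get()
            if item is SENTINEL:
                break
            written_rows.append(item)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    workers = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    log_queue.put(SENTINEL)     # all producers done; stop the writer
    writer_thread.join()
    return written_rows
```

Entries may arrive interleaved across workers, but every entry that was enqueued is written exactly once, which matches the requirement that ordering is flexible while completeness is not.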

Thanks

1 REPLY

BriGuy
New Contributor II

I don't think this relates to my issue.
