09-14-2022 02:46 AM
Hey Guys,
While training, I noticed two things that might be causing the error.
First, after a training session crashed, the GPU memory was almost full (checked with the nvidia-smi command).
Second, the Ganglia metrics showed swap usage above the total memory of the cluster.
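For anyone who wants to repeat the GPU memory check from a notebook rather than by eye, here is a rough sketch. It assumes nvidia-smi is on the PATH of the node; the helper names parse_gpu_memory and query_gpu_memory are made up for illustration and are not part of any library.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (MiB) into (used, total, fraction_used) tuples.

    Expects the output of nvidia-smi with --format=csv,noheader,nounits,
    one line per GPU.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total, used / total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi on the local node and return per-GPU memory usage."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() right after a crashed run should show whether memory is still held, which would suggest a leftover process is keeping the GPU allocated.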
In my use case I use make_reader from petastorm to read a Petastorm dataset, and its default workers_count is 10. When I changed workers_count to 4, I didn't get any error.
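This is roughly what the change looks like, as a minimal sketch rather than my exact training code. The dataset URL is a placeholder, and reader_kwargs and count_rows are hypothetical helpers written only for this example; the one real setting is workers_count on make_reader.

```python
def reader_kwargs(workers_count=4):
    """Build keyword arguments for petastorm's make_reader.

    make_reader defaults to workers_count=10; fewer workers should mean
    fewer row groups being decoded and held in host memory at once.
    """
    if not isinstance(workers_count, int) or workers_count < 1:
        raise ValueError("workers_count must be a positive integer")
    return {"workers_count": workers_count}


def count_rows(dataset_url, workers_count=4):
    """Read the dataset once with a reduced worker pool and count its rows."""
    # Imported lazily so the helper above can be used without petastorm installed.
    from petastorm import make_reader

    with make_reader(dataset_url, **reader_kwargs(workers_count)) as reader:
        return sum(1 for _ in reader)


# Example call (placeholder path, not a real dataset):
# count_rows("file:///tmp/my_petastorm_dataset", workers_count=4)
```

I can't say this is the proper fix. It only lowers how much the reader loads in parallel, which fits the swap usage I saw but doesn't prove it was the cause.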
I haven't figured out whether I'm actually right, or what the right way to overcome this is.
I would like to hear your opinion.
Thanks!