We looked deeper into your issue.
First of all, the "Websocket connection lost." message is not related to the problem. There were just temporary network problems a few times during the experiment’s lifetime, but Neptune handled them just fine. We should probably add a message saying that the connection was restored, so those warnings won’t be confusing.
Secondly, if you look at the monitoring tab in your experiment’s dashboard, you can see that around the 7th hour of the experiment’s lifetime GPU utilization dropped to 0%, and from that moment on CPU utilization stayed almost constant at about 12.5%.
12.5% of 8 cores is exactly 100% of one core. So it looks like your code ran into an infinite loop: it stopped doing any calculations on the GPU and therefore stopped producing any output.
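
In case the arithmetic is useful, here is a minimal sketch of the conversion from machine-wide CPU utilization to the equivalent number of fully busy cores. The function name `busy_cores` is just a hypothetical helper for illustration and is not part of Neptune; the values 12.5% and 8 cores are the ones from your monitoring tab.

```python
def busy_cores(total_cpu_percent: float, num_cores: int) -> float:
    """Convert machine-wide CPU utilization (in percent) into the
    equivalent number of fully busy cores.

    Hypothetical helper for illustration only, not a Neptune API.
    """
    return total_cpu_percent / 100.0 * num_cores


# Values read from the monitoring tab: about 12.5% utilization on 8 cores.
print(busy_cores(12.5, 8))  # 1.0, i.e. one core running at 100%
```

A steady reading equivalent to exactly one busy core, combined with 0% GPU utilization, is what a single thread spinning in a loop would typically look like, although it does not prove that this is what happened.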