I’m not an expert and might be wrong, but here’s what I think happened:
We use Kubernetes for scheduling experiments in the cloud, but the logs suggest that it was the Linux kernel of the underlying virtual machine that killed your process.
Our tools reported it as a “failed experiment” because, from our point of view, the process simply ended with a non-zero exit code; that misclassification is a bug we need to fix on our side.
Kubernetes usually sends a different signal to inform us that an experiment used too much RAM and was killed.
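If it helps, here’s a rough sketch of how an OOM kill can be told apart from an ordinary failure by looking at the exit status (train.py is a made-up entry point, and this is just how I’d check it, not our actual detection code): the kernel’s OOM killer sends SIGKILL, which surfaces either as signal 9 or as exit code 137 (128 + 9).

```python
import signal
import subprocess

# Launch the experiment (train.py is only a stand-in name here).
proc = subprocess.run(["python", "train.py"])

if proc.returncode == -signal.SIGKILL:
    # On POSIX, Python reports death-by-signal as a negative return code.
    print("killed by SIGKILL -- consistent with the kernel OOM killer")
elif proc.returncode == 128 + signal.SIGKILL:
    # Shells and some schedulers fold the signal into exit code 137.
    print("exit code 137 (128 + SIGKILL) -- consistent with an OOM kill")
else:
    print(f"exited with code {proc.returncode}")
```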
As for the monitoring tab, you should have a graph there showing memory usage, but as you found in the logs, the hardware_metric_reporting_thread failed to send the final memory usage statistics.
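For context, a reporting thread like that usually just samples memory in the background, which is why a SIGKILL loses the final datapoint: the process dies instantly between sends, with no chance to flush. A minimal sketch (all names here are made up, not our actual hardware_metric_reporting_thread):

```python
import resource
import threading

def report_memory(stop: threading.Event, interval: float = 5.0) -> None:
    # Sample peak RSS every few seconds. SIGKILL cannot be caught or
    # handled, so anything not yet sent when the OOM killer fires is lost.
    while not stop.wait(interval):
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kB on Linux
        send_metric("memory_peak_kb", peak_kb)

def send_metric(name: str, value: float) -> None:
    print(f"{name}={value}")  # stand-in for the real reporting backend

stop_event = threading.Event()
threading.Thread(target=report_memory, args=(stop_event,), daemon=True).start()
# ... the experiment would run here; on OOM the whole process dies at once ...
```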
Regarding worker pricing:
In your case, you get the same results with an m worker as with an m-8k80 worker, since you don’t use the GPUs (and m is cheaper).
For example, every xs worker has 13 GB RAM, and every m worker gives you 52 GB.
If you want to increase RAM, I suggest using an l (lowercase L) worker for 104 GB RAM, or even an xl worker for 208 GB RAM.
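To make the size choice concrete, here’s a tiny helper I sketched (not part of any official tooling, and the 52 GB for m is my reading of the size progression) that picks the smallest worker with enough RAM:

```python
# Worker sizes mentioned above, in GB of RAM; not an official API.
WORKER_RAM_GB = {"xs": 13, "m": 52, "l": 104, "xl": 208}

def pick_worker(required_gb: float) -> str:
    """Return the smallest worker that fits required_gb of RAM
    (assuming price scales with size, smallest is also cheapest)."""
    for name, ram_gb in sorted(WORKER_RAM_GB.items(), key=lambda kv: kv[1]):
        if ram_gb >= required_gb:
            return name
    raise ValueError(f"no worker has {required_gb} GB of RAM")

print(pick_worker(60))  # -> "l"
```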
I’ll let our team know that the pricing page needs to be made clearer.