I'm also interested in what they did to kube-scheduler. They say the scheduler is a big source of latency because of its "one-pod-at-a-time" behaviour, which doesn't work for "ultra-scale" or big ML workloads. But then they say:
"However, we achieved consistently a high throughput of 500 pods/second even at the 100K node scale by carefully tailoring scheduler plugins based on the workload and optimizing node filtering/scoring parameters."
How does this work??? What did they do??? Again, frustratingly light on details.
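The post doesn't say which knobs they actually turned, but "tailoring scheduler plugins" and "optimizing node filtering/scoring parameters" map onto standard levers in `KubeSchedulerConfiguration`. A sketch of what that tuning could look like — the specific plugin choices and values here are my assumptions, not theirs:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# Score only a sample of feasible nodes instead of all of them.
# At 100K nodes, scoring every node per pod would dominate latency.
percentageOfNodesToScore: 5
# Number of goroutines used in the filter/score passes (default 16).
parallelism: 32
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          # Illustrative: drop scoring work that adds little for a
          # homogeneous batch/ML workload. Actual choices would depend
          # on the workload, which the post doesn't detail.
          - name: ImageLocality
          - name: InterPodAffinity
```

Even with tuning like this, the scheduler is still placing one pod per scheduling cycle; the throughput gain comes from making each cycle cheap enough that 500 cycles/second is feasible.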