Stanford Researchers Introduce Sophia: A Scalable Second-Order Optimizer For Language Model Pre-Training