The auto-regressive attention at the heart of the Transformer, and programs like it, becomes a scaling nightmare. A rece