Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
BibTeX
@misc{kaufman2025morello,
  title         = {Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression},
  author        = {Samuel J. Kaufman and Ren{\'e} Just and Rastislav Bodik},
  year          = {2025},
  month         = {May},
  day           = {3},
  eprint        = {2505.01637},
  archivePrefix = {arXiv},
  primaryClass  = {cs.PL},
  url           = {https://arxiv.org/abs/2505.01637},
  doi           = {10.48550/arXiv.2505.01637},
}
Abstract
High-throughput neural network inference requires coordinating many optimization decisions, including parallel tiling, microkernel selection, and data layout. The product of these decisions forms a search space of programs that is typically intractably large. Existing approaches (e.g., auto-schedulers) often address this problem by heuristically sampling this space. In contrast, we introduce a dynamic-programming-based approach that explores more of the search space by iteratively decomposing large program specifications into smaller specifications reachable through a set of rewrites, then composing a final program from the rewrites that minimize an affine cost model. To reduce memory requirements, we employ a novel memoization table representation, which indexes specifications by coordinates in Z_{≥0} and compresses identical, adjacent solutions. This approach can visit a much larger set of programs than prior work. To evaluate the approach, we developed Morello, a compiler which lowers specifications roughly equivalent to a few-node XLA computation graph to x86. Notably, we found that an affine cost model is sufficient to surface high-throughput programs. For example, Morello synthesized a collection of matrix multiplies.
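The two central ideas in the abstract, a dynamic program over decomposable specifications and a memoization table that compresses identical adjacent solutions, can be illustrated with a toy sketch. Everything below is an illustrative assumption, not Morello's actual implementation: the "specification" is just a 1-D workload of size n, the only rewrites are lowering to a microkernel (when the workload fits in one tile) or splitting into two smaller specifications, and summed per-kernel costs stand in for the affine cost model.

```python
from bisect import bisect_right

TILE = 4         # widest workload the hypothetical microkernel handles
KERNEL_COST = 1  # cost charged per microkernel invocation

class RleTable:
    """Memo table indexed by coordinates in Z_{>=0}, merging identical
    adjacent solutions into runs. A sketch of the paper's spatial
    compression idea, not its actual data structure."""

    def __init__(self):
        self.runs = []  # sorted, disjoint (start, end_exclusive, value)

    def get(self, key):
        i = bisect_right(self.runs, (key, float("inf"), None)) - 1
        if i >= 0:
            start, end, value = self.runs[i]
            if start <= key < end:
                return value
        return None

    def put(self, key, value):
        # Assumes keys arrive in increasing order, so a new entry can
        # only ever extend the most recent run.
        if self.runs and self.runs[-1][1] == key and self.runs[-1][2] == value:
            start, _, v = self.runs[-1]
            self.runs[-1] = (start, key + 1, v)
        else:
            self.runs.append((key, key + 1, value))

def solve(max_n):
    """Fill the memo table bottom-up for all specs of size 1..max_n,
    taking the cheapest rewrite at each size."""
    memo = RleTable()
    for n in range(1, max_n + 1):
        if n <= TILE:
            best = KERNEL_COST                        # rewrite: lower to microkernel
        else:
            best = min(memo.get(a) + memo.get(n - a)  # rewrite: split the workload
                       for a in range(1, n))
        memo.put(n, best)
    return memo

memo = solve(12)
# The optimal cost is flat across each group of four adjacent sizes,
# so twelve solutions compress into only three runs.
```

Adjacent specifications often share an optimal solution (here, every size in 1..4 costs one kernel, every size in 5..8 costs two, and so on), which is what makes run-length compression of the table effective.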