An Optimizing Compiler for Recommendation Model Inference

AlibabaResearch/recom

RECom

RECom is an ML compiler that aims to accelerate the expensive embedding column processing during the inference of deep recommendation models. Key features of RECom:

  • We propose an inter-subgraph parallelism-oriented fusion method that generates efficient GPU code to process massive embedding columns in parallel.
  • We recognize the shape computation problems that arise in dynamic shape scenarios and adopt an approach based on symbolic expressions to solve them.
  • We develop an embedding column optimization module to eliminate redundant computations.
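To make the last point concrete, here is a minimal sketch of one kind of redundancy an embedding column can contain: the same ID appearing several times in a batch, so that the same table row is fetched repeatedly. This is an illustrative assumption, not a description of RECom's optimization module; the function names and the toy `table` are hypothetical.

```python
from collections import OrderedDict

def lookup_naive(table, ids):
    """Fetch one embedding row per ID, repeating work for duplicate IDs."""
    return [table[i] for i in ids]

def lookup_deduplicated(table, ids):
    """Fetch each distinct ID once, then map rows back to the original order."""
    unique_ids = list(OrderedDict.fromkeys(ids))      # distinct IDs, first-seen order
    position = {id_: k for k, id_ in enumerate(unique_ids)}
    unique_rows = [table[i] for i in unique_ids]      # one fetch per distinct ID
    rows = [unique_rows[position[i]] for i in ids]    # gather back to input order
    return rows, len(unique_ids)

# Hypothetical toy embedding table: ID -> embedding vector.
table = {7: [0.1, 0.2], 9: [0.3, 0.4], 11: [0.5, 0.6]}
ids = [7, 9, 7, 7, 11, 9]

rows, fetches = lookup_deduplicated(table, ids)
```

Both functions return the same rows; the deduplicated version touches the table once per distinct ID instead of once per occurrence.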

Currently, RECom is implemented in C++ as a TensorFlow add-on built on TensorFlow Addons. It uses the SymEngine library for the symbolic expression computations that handle dynamic shapes.
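The idea of handling dynamic shapes with symbolic expressions can be sketched in a few lines. RECom itself relies on SymEngine; the hand-rolled `Dim` class below is a hypothetical stand-in used only for illustration, and the symbols `B` (batch size) and `N` (IDs per sample) are assumed names. It shows how two shapes can be proven to hold the same number of elements before the batch size is known, and how a concrete size is obtained once the symbols are bound at run time.

```python
from collections import Counter

class Dim:
    """A dimension as a monomial: an integer coefficient times named symbols."""

    def __init__(self, coeff=1, symbols=()):
        self.coeff = coeff
        self.symbols = Counter(symbols)   # symbol name -> power

    def __mul__(self, other):
        other = other if isinstance(other, Dim) else Dim(other)
        return Dim(self.coeff * other.coeff, self.symbols + other.symbols)

    __rmul__ = __mul__

    def __eq__(self, other):
        other = other if isinstance(other, Dim) else Dim(other)
        return self.coeff == other.coeff and self.symbols == other.symbols

    def evaluate(self, bindings):
        """Substitute concrete values for the symbols."""
        value = self.coeff
        for name, power in self.symbols.items():
            value *= bindings[name] ** power
        return value

def num_elements(shape):
    """Symbolic product of all dimensions in a shape."""
    total = Dim()
    for d in shape:
        total = total * d
    return total

B = Dim(symbols=("B",))   # batch size, unknown at compile time
N = Dim(symbols=("N",))   # IDs per sample, unknown at compile time

lookup_shape = [B * N, 16]   # flattened lookup result, embedding width 16
pooled_input = [B, N, 16]    # the same data viewed per sample

# The reshape is valid for every B and N because the element counts match symbolically.
reshape_ok = num_elements(lookup_shape) == num_elements(pooled_input)

# Once a request arrives, the symbols are bound and the buffer size is computed.
buffer_elems = num_elements(lookup_shape).evaluate({"B": 32, "N": 5})
```

A real symbolic engine also handles sums, divisions, and simplification; the monomial form here is just enough to show the compile-time check and the run-time evaluation.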

The optimization workflow of RECom.

Getting Started

Performance

We evaluate RECom on four real-world in-house production recommendation models at Alibaba and two synthesized models. Experimental results show that RECom significantly outperforms the three TensorFlow baselines for all models at all batch sizes. On average, RECom achieves end-to-end inference latency speedups of 6.61×, 51.45×, and 8.96× over TF-CPU, TF-GPU, and TF-CPU-GPU, respectively.

End-to-end Performance of RECom and TensorFlow baselines. The vertical axes are latency in the log scale.

Citation

RECom is a collaborative research project between Alibaba Group and Renmin University of China. The work is described in our ASPLOS'23 paper.

If you use this codebase or otherwise find our work valuable, please cite:

@inproceedings{pan2023recom,
  title={RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns},
  author={Pan, Zaifeng and Zheng, Zhen and Zhang, Feng and Wu, Ruofan and Liang, Hao and Wang, Dalin and Qiu, Xiafei and Bai, Junjie and Lin, Wei and Du, Xiaoyong},
  booktitle={Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4},
  pages={268--286},
  year={2023}
}
