Inference Acceleration for Large Language Models on CPUs

Nov 25, 2024

In this paper, we explore the use of CPUs to accelerate inference for large language models.

Related Research & Thoughts