Abstract: As DNNs are developing rapidly, the computational and memory burden imposed on hardware systems grows exponentially. This becomes even more severe for large language models (LLMs) and ...