
Jinyang Du, Ruihao Gong#, Linghan Ai, Zining Wang, Yunke Peng, Yao Wang, Lei Yan, Xuefei Wang, Yaoyuan Wang, Jinyang Guo, Dahua Lin, Xianglong Liu (# corresponding author)
Findings of the Association for Computational Linguistics (ACL Findings) 2026 First Author
Half-S revisits FP4 scaling for heavy-tailed LLM tensors and proposes a minimal scale correction that improves quantization grid utilization for practical near-lossless 4-bit training.

Jinyang Du, Jinyang Guo, Yifu Ding, Xianglong Liu
IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 2025 First Author
This work designs low-bit FlashAttention operators with Triton, using operator fusion and mixed-precision execution to improve long-context quantized inference efficiency.

Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Yong Yang, Shiqiao Gu, Haotong Qin, Jinyang Guo, Dahua Lin, Michele Magno, Xianglong Liu
Neural Networks 2025 Journal
This survey reviews low-bit quantization for large language models across number formats, system support, and algorithmic strategies, connecting practical toolchains with directions for efficient LLM deployment.