2026

Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training

Jinyang Du, Ruihao Gong#, Linghan Ai, Zining Wang, Yunke Peng, Yao Wang, Lei Yan, Xuefei Wang, Yaoyuan Wang, Jinyang Guo, Dahua Lin, Xianglong Liu (# corresponding author)

Findings of the Association for Computational Linguistics (ACL Findings) 2026 First Author

Half-S revisits FP4 scaling for heavy-tailed LLM tensors and proposes a minimal scale correction that improves quantization grid utilization for practical near-lossless 4-bit training.

2025

Low-bit FlashAttention Accelerated Operator Design Based on Triton

Jinyang Du, Jinyang Guo, Yifu Ding, Xianglong Liu

IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 2025 First Author

This work designs low-bit FlashAttention operators with Triton, using operator fusion and mixed-precision execution to improve long-context quantized inference efficiency.

A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Yong Yang, Shiqiao Gu, Haotong Qin, Jinyang Guo, Dahua Lin, Michele Magno, Xianglong Liu

Neural Networks 2025 Journal

This survey reviews low-bit quantization for large language models across three dimensions: basic number formats, system support, and algorithmic strategies, connecting practical toolchains to efficient LLM deployment.
