Compact Language Models via Pruning and Knowledge Distillation
The paper "Compact Language Models via Pruning and Knowledge Distillation" […]
GBRL: Gradient Boosting Reinforcement Learning
Abstract: The paper Gradient Boosting Reinforcement Learning introduces gradient boosting reinforcement learning […]
DoRA: Weight-Decomposed Low-Rank Adaptation
The paper DoRA: Weight-Decomposed Low-Rank Adaptation […]
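Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
On integer quantization for deep learning inference, the paper "Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation" […]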
MambaVision: A Novel Hybrid Mamba-Transformer Vision Backbone
NVIDIA recently released MambaVision, a novel hybrid Mamba-Transformer vision backbone […]