Deepseek

Analyzing DeepSeek's Newly Open-Sourced FlashMLA with ChatGPT o3-mini-high

IT, Tech / NullThought

Analyzing DeepSeek's newly open-sourced FlashMLA with ChatGPT o3-mini-high. Uploading the FlashMLA […]

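NSA (Native Sparse Attention): A Natively Trainable Sparse Attention Architecture for Efficient Long-Sequence Modeling Through Hardware-Aligned Optimization and Algorithmic Innovation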

Tech / NullThought

Paper: Native Sparse Attention: Hardware-Aligned and Nativel […]

Overview of the DeepSeek-R1 Technical Report

Tech / NullThought

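In recent years, the rapid development of large language models (LLMs) has given them increasingly strong capabilities in tasks such as reasoning, code generation, and scientific computing, gradually narrowing […]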

rStar-Math: Effectively Improving the Performance of Small Language Models (SLMs) on Mathematical Reasoning Tasks

Tech, Science / NullThought

Paper: rStar-Math: Small LLMs Can Master Math Reasoning with […]

Overview of the DeepSeek-V3 Technical Report

Tech / NullThought

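1. Introduction. In recent years, large language models (LLMs) have developed rapidly, and progress toward artificial general intelligence (AGI) keeps accelerating. DeepS […]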

Having Large Models Play the Monkey

Tech, Science / NullThought

Paper: Large Language Monkeys: Scaling Inference Compute wit […]
