NSA (Native Sparse Attention): a natively trainable sparse attention architecture that achieves efficient long-sequence modeling through hardware-aligned optimization and algorithmic innovation
The paper Native Sparse Attention: Hardware-Aligned and Nativel […]
The paper Ultra-Sparse Memory Network proposes a new neural network architecture called UltraMem […]
UltraMem: significantly improving Transformer performance with large-scale Ultra-Sparse Memory Layers
The paper Token Statistics Transformer: Linear-Time Attention v […]
The paper Lossless Compression of Vector IDs for Approximate Ne […]
A lossless compression approach using Asymmetric Numeral Systems (ANS) and Wavelet Trees to optimize vector IDs and graph structures in Approximate Nearest Neighbor Search (ANNS)
In recent years, robotics and embodied artificial intelligence (AI) have made notable progress, particularly in imitation learni […]
DINO-WM: a world model built on pretrained visual features that enables zero-shot planning