Computer Vision (CV)

RAG-Anything: A Unified RAG Framework "for All Modalities"

Tech / NullThought

The paper RAG-Anything: All-in-One RAG Framework proposes RAG-Anything […]

Reducto: A Product That Feeds Structured Data to LLMs and RAG

IT, Tech / NullThought

Today I tried Reducto's product, which can be used for RAG, for example for chunking specialized documents. About Reducto the company: I.

Paper2Video: Automatically Generating Academic Presentation Videos from Papers

Tech / NullThought

The paper Paper2Video: Automatic Video Generation from Scientif

Diffusion Transformer (DiT)

Tech / NullThought

Diffusion models for image generation have long used a convolutional U-Net as their backbone, but the paper Scalable Diffusion Models

An Old Paper on Convolutional Networks: The Inception Architecture

Tech / NullThought

The paper Going Deeper with Convolutions was published in 2014. Its study verified that, through dense components

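TexTok: A Text-Conditioned Image Tokenization Framework. Embeddings of the image's text description are introduced into both the encoder (Tokenizer) and the decoder (Detokenizer) as a semantic condition that guides image compression and reconstruction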

Tech / NullThought

Proposed in the paper Language-Guided Image Tokenization for Generation, T
