The Facts About How True and False News Spreads – 思空，简观

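Unfortunately, false news (or, put another way, wrong information) really does spread farther, faster, deeper, and more broadly than true news.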

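In March 2018, three MIT researchers, Soroush Vosoughi, Deb Roy, and Sinan Aral, published a research paper in Science, "The spread of true and false news online", which lays out some facts about how true and false news spreads.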

The main findings are as follows:
1. Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends, or financial information.
In short: falsehood spreads noticeably farther, faster, deeper, and more broadly than the truth, and false political news most of all.
2. We found that false news was more novel than true news, which suggests that people were more likely to share novel information.
In short: false news tends to be more novel (fresher) than true news, which suggests that people like to share novel information.
3. Whereas false stories inspired fear, disgust, and surprise in replies, true stories inspired anticipation, sadness, joy, and trust.
In short: false stories tend to provoke fear, disgust, and surprise, while true stories provoke anticipation, sadness, joy, and trust.
4. Contrary to conventional wisdom, robots accelerated the spread of true and false news at the same rate, implying that false news spreads more than the truth because humans, not robots, are more likely to spread it.
In short: people often assume that bot accounts amplify false news, but bots accelerate true and false news equally, so the human factor is what drives the extra spread of false news.

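The data sample comes from Twitter: about 126,000 stories spread between 2006 and 2017, shared by about 3 million people more than 4.5 million times. It is fair to say the sample is both representative enough and large enough.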

The charts in the paper show the contrast between the spread of true and false news at a glance.
The basic concepts used in those charts are:
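1. CCDF, Complementary Cumulative Distribution Function.
Complementary cumulative distribution function: for a random variable X, the probability that X takes a value greater than a, that is, P(X > a). For example: among all the cascades studied, the fraction whose depth is greater than 10.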
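2. Cascade Depth, the number of retweet hops from the origin tweet over time, where a hop is a retweet by a new unique user.
Cascade depth: the number of retweet levels a message passes through, where each level is a retweet by a new, distinct user.
3. Cascade Size, the number of users involved in the cascade over time.
Cascade size: the total number of users involved in spreading the message.
4. Maximum Breadth, the maximum number of users involved in the cascade at any depth.
Maximum breadth: the largest number of users found at any single retweet level of the cascade.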
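5. Structural Virality, a measure that interpolates between content spread through a single, large broadcast and that which spreads through multiple generations, with any one individual directly responsible for only a fraction of the total spread.
Structural virality: as I understand it, this is a scale value that sits somewhere between "spread in few steps but widely" (one large broadcast) and "spread over many generations, each reaching only a few users". In this paper the values range from 1 to 100; the larger the value, the stronger the penetration.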
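
To make the five concepts above concrete, here is a minimal Python sketch that computes them for a toy cascade. It is not the paper's code. The input format (a list of hypothetical `(parent_user, child_user)` retweet pairs forming a tree), the function names, and the choice to count the original poster in the cascade size are all assumptions made for illustration. Structural virality is taken here to be the average shortest-path distance between all pairs of users in the tree, which is also an assumption, since this post does not spell out a formula.

```python
from collections import defaultdict, deque


def bfs_distances(neighbors, start):
    """Hop counts from start to every node reachable through neighbors."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def cascade_metrics(edges, root):
    """Summarize one retweet tree given (parent_user, child_user) pairs."""
    children = defaultdict(list)    # directed: who retweeted from whom
    undirected = defaultdict(list)  # same tree, walkable in both directions
    for parent, child in edges:
        children[parent].append(child)
        undirected[parent].append(child)
        undirected[child].append(parent)

    depth_of = bfs_distances(children, root)  # hops from the origin tweet
    users = list(depth_of)

    per_level = defaultdict(int)  # number of users at each depth
    for d in depth_of.values():
        per_level[d] += 1

    # Assumed definition: mean shortest-path distance over all user pairs.
    total = sum(sum(bfs_distances(undirected, u).values()) for u in users)
    n = len(users)
    virality = total / (n * (n - 1)) if n > 1 else 0.0

    return {
        "depth": max(depth_of.values()),
        "size": n,  # includes the original poster (an assumption)
        "max_breadth": max(per_level.values()),
        "structural_virality": virality,
    }


def ccdf(values, threshold):
    """Empirical CCDF: fraction of observations strictly above threshold."""
    return sum(v > threshold for v in values) / len(values)


# Toy cascade: a posts; b and c retweet a; d retweets b; e retweets d.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]
print(cascade_metrics(toy_edges, "a"))
# depth 3, size 5, max_breadth 2, structural_virality 2.0

# Made-up cascade depths: 2 of the 5 are deeper than 10.
print(ccdf([1, 2, 3, 12, 15], 10))  # 0.4
```

Under this definition, a pure broadcast in which everyone retweets the origin directly keeps structural virality below 2, while long retweet chains push it higher, which matches the intuition that a larger value means deeper penetration.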