AI Image Generation: A Study Survey of Deep Diffusion Models

Created by think, last modified by think

Overview

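Many people have recently been stunned by what AI image generation can do, and some call 2022 the first year of AI art. Diffusion models play the leading role in this wave.

Leading AI companies abroad, including OpenAI, Google, Facebook, and Microsoft, have all published related research and prototypes. OpenAI's DALL·E 2 generates multiple 1024×1024 high-resolution images from a short text prompt. Shortly after DALL·E 2 was announced, Google released Imagen, a text-to-image model that produces photorealistic images of the scene described by the input text. Just a few days ago, Stability.Ai publicly released the latest version of its text-to-image model Stable Diffusion, whose generated images reach commercial-grade quality. Diffusion models have been a growing focus of generative modeling since the DDPM paper (Ho et al., UC Berkeley) was published in 2020. OpenAI's subsequent ADM-G and GLIDE models then brought diffusion models to a much wider audience.

Many researchers hold that diffusion-based text-to-image models need fewer parameters yet produce higher-quality images, and that they look set to replace GANs. The mathematics behind diffusion models, however, puts many researchers off; it is widely considered much harder to understand than that of VAEs or GANs. Research on the topic inside China is still sparse. This article aims to collect the relevant material.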

List of AI art tools

AI art tools: the front-runners in text-to-image generation

Disco Diffusion

Midjourney

DALL·E

  • DALL·E 2 is a new AI system from OpenAI that can create realistic images and art from a natural-language description.
  • Beta access application link

Imagen

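  • From Google, positioned against DALL·E and billed as offering "unprecedented photorealism and a deep level of language understanding"
  • Imagen is an AI system that creates photorealistic images from input text
  • unprecedented photorealism × deep level of language understanding
  • Official site
  • Paper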

Parti

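  • From Google. Released after Imagen, it is presented as stronger still, with higher resolution and richer detail; the model scales up to 20 billion parameters
  • The Pathways Autoregressive Text-to-Image model (Parti) is an autoregressive text-to-image model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Parti and Imagen are complementary in exploring two different families of generative models: autoregressive and diffusion, respectively.
  • Official site
  • Paper
  • GitHub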

Make-A-Scene

NÜWA

Stable Diffusion

Tiamat

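How AI painting works

How AI generates an image

Suppose we have 100,000 photos of real human faces covering a range of skin tones, poses, and expressions. How could an AI generate a face that does not exist? One idea:

  • Feed a 512x512 face image X into a model (the Encoder) to obtain a 1x256 floating-point vector z
  • A second model (the Decoder) restores the vector z to a 512x512 face image, denoted X'
  • Training repeatedly adjusts the parameters of the Encoder and the Decoder to reduce the difference between X and X'

A VAE architecture diagram serves as an approximate illustration:

[Figure: VAE architecture diagram]

Once training is complete, we have the vector z that the Encoder produces for every input image X = {x1, x2, ..., xn}. Visualized, the result looks roughly like the figure below:

[Figure: visualization of the encoded vectors z]

Each point in the figure is a real face image after encoding by the Encoder; the points form differently colored clusters according to skin tone, pose, or gender. If we sample at random from this space, or interpolate between several points, we obtain a new z. Passing that z through the Decoder generates a face that does not exist in reality.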
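The encode, interpolate, decode idea can be sketched in a few lines. This is only a toy illustration under stated assumptions: the Encoder and Decoder are replaced by a linear pair fitted with PCA (via SVD), the "images" are 64-dimensional synthetic vectors rather than 512x512 faces, and the latent size is 4 rather than 256. All names (`encode`, `decode`, `X`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 200 flattened 8x8 "images" lying close to a 4-dimensional subspace.
basis = rng.normal(size=(4, 64))
codes = rng.normal(size=(200, 4))
X = codes @ basis + 0.01 * rng.normal(size=(200, 64))

# Fit a linear Encoder/Decoder pair with PCA; a real system would train neural networks.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:4]  # shape (4, 64)

def encode(x):
    """Map image(s) to latent vector(s) z."""
    return (x - mean) @ components.T

def decode(z):
    """Map latent vector(s) z back to image space."""
    return z @ components + mean

# Reconstruction error between X and X' (what training would minimize).
recon_error = np.mean((decode(encode(X)) - X) ** 2)

# Generate a new sample: interpolate between two real latents, then decode.
z_a, z_b = encode(X[0]), encode(X[1])
z_new = 0.5 * z_a + 0.5 * z_b
x_new = decode(z_new)
print(f"reconstruction MSE: {recon_error:.6f}, new sample shape: {x_new.shape}")
```

Because this stand-in is linear, the decoded interpolation is just the average of the two reconstructions; the interesting behavior of a real VAE comes from its nonlinear Decoder.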

The same principle extends beyond faces. For example, given a mixed dataset of cars, forests, boats, and so on, the z vectors obtained this way look roughly like this when visualized:

[Figure: visualization of z for the mixed dataset]

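Approaches to image generation

Besides the VAE mentioned above, there are several other ways to generate images. Only GANs and diffusion models are covered here.

GAN

Generative adversarial networks (GANs) have been among the most popular and most interesting generative models of recent years; the idea is simple yet ingenious. Before explaining it, allow me a made-up story:

Suppose you are immortal: every time you are beheaded, you come back to life with your memories intact.

It is 1980 in China, and you are destitute. You decide to get rich by using your drawing skills to forge banknotes. You buy a few sheets of white paper at the village entrance and commit the shape and design of the 100-yuan note to memory. After drawing a hundred notes you are worth ten thousand yuan, and you spend them at the old man's shop by the village entrance and with the market vendors; none of them can tell the notes are fake.

One day a vendor deposits his money at the bank, the banknote detector spots the flaw, and you are arrested. Game over.

You are reborn. Having learned your lesson, you buy on the black market a banknote detector identical to the county bank's and spend your days studying how it spots fakes and refining your craft. At last the detector stops beeping at your notes. Overjoyed, you flood the county with counterfeits once more.

Your business keeps growing until it draws the attention of the local police, after a clerk notices anomalies in the currency circulation statistics. The authorities realize that the existing detection technology is badly outdated; using your counterfeits as test samples, they soon roll out a more advanced detector nationwide. You are shot once again for counterfeiting.

...

Over countless rounds of attack and defense and many cycles of rebirth, your notes become ever harder to tell apart by eye. Every detector except the newest, 99th-generation model is helpless against them; they have become a kind of fake that is infinitely close to the real thing.

That is the core idea of a GAN. Stated more formally:

  1. We are given a real-world dataset of 512x512 images, a generator (G), and a discriminator (D). G produces fake images (the forged notes); D judges whether an image is real and outputs a 0/1 binary classification (the banknote detector).
  2. Randomly initialize a 1x128 vector z; G takes z as input and generates a 512x512 image X'. Draw a random image X from the real dataset, send both X and X' to D, and let D judge which is real and which is fake. The outcome of that judgment is fed back to G.
  3. G's goal is to generate images ever closer to those in the real dataset in order to fool D, while D learns to tell which of the incoming images are real and which are fake.

Through this ongoing contest, the images G generates come to approximate the real dataset more and more closely.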
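The contest described above is usually written as a two-player minimax game. The formula below is the objective from the original GAN paper (Goodfellow et al., 2014), not something stated in this article; the symbols p_data (the real data distribution) and p_z (the distribution of the random vector z) are introduced here for illustration.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is rewarded for assigning high probability to real images and low probability to generated ones, while G is rewarded for making D(G(z)) large.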

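There is an interesting premise here: early in training D does not know how to tell real from fake either, and it learns step by step from G's forgeries. It is precisely because D can make mistakes that G has loopholes to exploit. If D were as advanced as the 99th-generation detector from the very start, G might never find a way in and would simply give up (training collapses).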

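Diffusion Model

Unlike the adversarial approach of GANs, a diffusion model works as follows: keep adding Gaussian noise to a real image until its distribution is simply Gaussian, then reverse the process and reconstruct the image starting from Gaussian noise.

Let x0 be a real image. Adding Gaussian noise T times yields a sequence (x1, x2, ..., xT-1, xT) that grows progressively noisier until xT is indistinguishable from pure Gaussian noise. The model learns the "denoising" step xt -> xt-1, as shown in the figure:

[Figure: forward noising and reverse denoising process]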
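The forward (noising) half of this process can be sketched directly. The snippet assumes the standard DDPM formulation, in which x_t can be sampled from x_0 in closed form as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with a linear beta schedule from 1e-4 to 0.02 over T = 1000 steps. Those schedule values and the function names are assumptions taken from common practice, not from this article.

```python
import numpy as np

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in closed form; t is 1-indexed (1..T)."""
    a_bar = alpha_bars[t - 1]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = np.ones((64, 64))  # stand-in for a real image
x_mid = q_sample(x0, 500, alpha_bars, rng)   # partly noised
x_T = q_sample(x0, 1000, alpha_bars, rng)    # almost pure Gaussian noise
print(f"mean/std at t=500:  {x_mid.mean():.3f} / {x_mid.std():.3f}")
print(f"mean/std at t=1000: {x_T.mean():.3f} / {x_T.std():.3f}")
```

By the last step almost none of the original signal remains, which is why sampling can start from pure Gaussian noise.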

How does AI paint from text?

Up to this point, whether we use a GAN or a diffusion model, we only have a pure image generator that has nothing to do with text. To build a model that paints from a textual description, something is still needed to connect the two. At present, that something is CLIP.

CLIP

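CLIP (Contrastive Language–Image Pre-training) is a model proposed by OpenAI in 2021. It positions itself as "Connecting Text and Images": put simply, it associates an image with text that describes its content, so that the better an image and a sentence match, the higher their similarity under CLIP.

[Figure: CLIP overview]

CLIP trains an image encoder and a text encoder jointly with a contrastive objective: the embeddings of an image and its own caption are pulled together, and those of mismatched pairs are pushed apart. In the figure above, for example, the image contains a dog, so a caption containing "dog" ends up with an embedding close to the image's. CLIP was trained on roughly 400 million image-text pairs collected from the internet, which removes the dependence on manually labeled classification datasets.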
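To make "similarity under CLIP" concrete, here is a small sketch of cosine similarity between image and text embeddings, together with a symmetric contrastive loss of the kind CLIP is trained with. The embeddings are random toy vectors rather than real CLIP outputs, and the function names and the temperature of 0.07 are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def similarity_matrix(img_emb, txt_emb):
    """Cosine similarity between every image and every text, shape (n_img, n_txt)."""
    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy where the i-th image is paired with the i-th text."""
    logits = similarity_matrix(img_emb, txt_emb) / temperature
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 32))                        # toy text embeddings
img_matched = txt + 0.1 * rng.normal(size=(4, 32))    # each image close to its own caption
img_shuffled = img_matched[::-1]                      # pairs deliberately mismatched

loss_matched = contrastive_loss(img_matched, txt)
loss_shuffled = contrastive_loss(img_shuffled, txt)
print(f"matched loss: {loss_matched:.4f}, shuffled loss: {loss_shuffled:.4f}")
```

A well-aligned batch gives a low loss because each image is most similar to its own caption; mismatched pairs give a high loss.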

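Start painting

Since we can compute the CLIP similarity between any image and any sentence, we can use a GAN or diffusion model to do AI painting as follows:

  1. Given a sentence, obtain its embedding embT from the CLIP model.
  2. Use the GAN/diffusion model to generate an initial image (for example, from noise), and obtain its embedding embI from CLIP.
  3. Compute the similarity of (embT, embI) and keep modifying the image iteratively so that embI moves closer to embT.

No model is trained in this process: the parameters of CLIP and of the GAN/diffusion model stay fixed. The only work is to repeat an iteration like the one in the figure so that the image matches the text better and better.

[Figure: CLIP-guided iterative generation]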
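The three steps can be sketched as a toy optimization loop. Everything here is a stand-in: the "image encoder" is a fixed random linear map `W` rather than CLIP, the "image" is a 64-dimensional vector, and the image is updated by plain gradient ascent on cosine similarity. Real systems such as guided diffusion apply the CLIP gradient inside each denoising step instead of editing pixels directly, so treat this only as an illustration of "frozen encoders, iteratively updated image".

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_emb = 64, 16

# Hypothetical frozen "image encoder": a fixed random linear map standing in for CLIP.
W = rng.normal(size=(d_emb, d_img)) / np.sqrt(d_img)
emb_t = rng.normal(size=d_emb)  # step 1: stand-in for the text embedding embT

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_grad_wrt_image(x):
    """Gradient of cosine(W @ x, emb_t) with respect to the image x."""
    u = W @ x
    nu, nv = np.linalg.norm(u), np.linalg.norm(emb_t)
    grad_u = emb_t / (nu * nv) - (u @ emb_t) * u / (nu ** 3 * nv)
    return W.T @ grad_u

x = rng.normal(size=d_img)        # step 2: start from a noise "image"
start_sim = cosine(W @ x, emb_t)
for _ in range(500):              # step 3: iterate; the encoder weights never change
    x = x + 2.0 * cosine_grad_wrt_image(x)
final_sim = cosine(W @ x, emb_t)
print(f"similarity before: {start_sim:.3f}, after: {final_sim:.3f}")
```

Only `x` changes during the loop, which mirrors the point that neither CLIP nor the generator is retrained.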

Below is a modified Disco Diffusion (the model is guided diffusion + CLIP):

Disco Diffusion: running the code

Disco Diffusion

Disco Diffusion is a tool, published on Google Colab, for creating digital art with deep learning. It can be run directly in Google Colab or deployed and run locally. Disco Diffusion turns the given prompts (descriptions) from text into images; in other words, it "paints" the scene the text describes.

Prompt: A digital painting of cyberpunk city by beeple, mist, trending on artstation, V-Ray.
[Figure: image generated from the prompt above]

Prompt: Spaceship about to landing on a cornfield, steampunk, clouds in the sky, by Greg Rutkowski, concept art.
[Figure: image generated from the prompt above]

Disco Diffusion video tutorials

Google Colab platform

https://www.bilibili.com/video/BV1BY4y1Y7zX

Very detailed tutorial

https://www.bilibili.com/video/BV1b5411X7MM

Getting started: running the Colab notebook

  1. Open: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb

  2. Save it as your own copy

  3. Run the setup cells and confirm that GPU information appears in the output

    [Figure: setup output showing GPU information]

  4. Change nothing else and simply keep running the remaining cells to see the result
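For step 3, a quick way to double-check that the runtime actually sees a GPU is to call `nvidia-smi` from a cell. The helper below is a hypothetical convenience, not part of the Disco Diffusion notebook; it assumes an NVIDIA runtime where the `nvidia-smi` tool is on the PATH, and it returns None otherwise.

```python
import shutil
import subprocess

def gpu_summary():
    """Return the `nvidia-smi -L` GPU list as text, or None if no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    output = result.stdout.strip()
    return output if result.returncode == 0 and output else None

summary = gpu_summary()
print(summary if summary else "No GPU detected; in Colab, check the runtime type setting.")
```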

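Understanding Diffusion Models: A Unified Perspective

Understanding Diffusion Models

The paper "Understanding Diffusion Models: A Unified Perspective", from Google Research, lays out the mathematics behind diffusion models in great detail, so that other researchers can follow along and understand what diffusion models are and how they work.

As for how mathematical the paper is, the author puts it this way: the math behind these models is shown in excruciating detail.

The paper has six parts, covering generative models; the ELBO, VAEs, and hierarchical VAEs; variational diffusion models; score-based generative models; and more.

Generative models

Given observed samples x from a distribution of interest, the goal of a generative model is to learn to model the true data distribution p(x). Once the model is learned, we can generate new samples from it. Under some formulations, the learned model can also be used to evaluate the likelihood of observed or sampled data.

The literature has several well-known directions, which the paper introduces only at a high level. GANs model the sampling procedure of a complex distribution, which is learned adversarially. Likelihood-based generative models learn to assign high likelihood to the observed data samples; they include autoregressive models, normalizing flows, and VAEs. In energy-based modeling, a distribution is learned as an arbitrarily flexible energy function that is then normalized. Score-based generative models are closely related: instead of modeling the energy function itself, they learn the score of the energy-based model as a neural network.

The paper explores and reviews diffusion models, which, as it shows, have both likelihood-based and score-based interpretations.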

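Variational diffusion models

Put simply, a variational diffusion model (VDM) can be viewed as a Markovian hierarchical variational autoencoder (MHVAE) with three key restrictions (or assumptions):

  • The latent dimension is exactly equal to the data dimension.
  • The structure of the latent encoder at each timestep is not learned; it is predefined as a linear Gaussian model, that is, a Gaussian distribution centered on the output of the previous timestep.
  • The Gaussian parameters of the latent encoders vary over time such that the distribution of the latent at the final timestep T is a standard Gaussian.

[Figure: visual representation of a variational diffusion model]

In addition, the Markov property between hierarchical transitions is explicitly retained from the standard Markovian hierarchical VAE. The paper then expands on the implications of each of the three assumptions.

From the first assumption, with some abuse of notation, both the true data sample and the latent variables can be written as x_t, where t = 0 denotes the true data sample and t ∈ [1, T] denotes the corresponding latent, with the hierarchy indexed by t. The VDM posterior is the same as the MHVAE posterior but can now be rewritten as: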
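Written out in the notation just introduced, and assuming the standard form used in the cited paper, the posterior factorizes over timesteps as:

```latex
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
```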

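From the second assumption, the distribution of each latent variable in the encoder is a Gaussian centered on the previous latent in the hierarchy. Unlike in an MHVAE, the structure of the encoder at each timestep is not learned; it is fixed as a linear Gaussian model whose mean and standard deviation can be preset as hyperparameters or learned as parameters. Mathematically, the encoder transition is: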
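Assuming the variance-preserving parameterization of the cited paper, in which a single coefficient α_t sets both the mean scaling and the noise variance, the transition can be written as:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\bigl(x_t;\ \sqrt{\alpha_t}\,x_{t-1},\ (1 - \alpha_t)\,\mathbf{I}\bigr)
```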

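From the third assumption, α_t evolves over time according to a fixed or learnable schedule such that the distribution of the final latent, p(x_T), is a standard Gaussian. The joint distribution of the MHVAE can then be updated to give the joint distribution of the VDM: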
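In the same notation, and again following the cited paper's standard form, the joint distribution reads as below, where p_θ denotes the learned denoising transitions:

```latex
p(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_{\theta}(x_{t-1} \mid x_t),
\qquad \text{where } p(x_T) = \mathcal{N}(x_T;\ \mathbf{0},\ \mathbf{I})
```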

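Taken together, these assumptions describe a steady noisification of an image over time: the image is progressively corrupted with Gaussian noise until it eventually becomes identical to pure Gaussian noise.

Like any HVAE, a VDM can be optimized by maximizing the Evidence Lower Bound (ELBO), which can be derived as: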
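The bound, together with one standard decomposition of it into reconstruction, prior matching, and denoising matching terms, can be written as follows. The paper derives more than one decomposition; the form shown here is the widely used one that involves q(x_{t-1} | x_t, x_0), so treat the choice as an assumption rather than a transcription of the original equation.

```latex
\log p(x_0)
  \;\geq\; \mathbb{E}_{q(x_{1:T} \mid x_0)}\!\left[\log \frac{p(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
  = \mathbb{E}_{q(x_1 \mid x_0)}\bigl[\log p_{\theta}(x_0 \mid x_1)\bigr]
  - D_{\mathrm{KL}}\bigl(q(x_T \mid x_0)\,\|\,p(x_T)\bigr)
  - \sum_{t=2}^{T} \mathbb{E}_{q(x_t \mid x_0)}\Bigl[D_{\mathrm{KL}}\bigl(q(x_{t-1} \mid x_t, x_0)\,\|\,p_{\theta}(x_{t-1} \mid x_t)\bigr)\Bigr]
```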

The interpretation of the ELBO is illustrated in Figure 4:

[Figure 4: interpretation of the ELBO]

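Three equivalent interpretations

As shown earlier, a variational diffusion model can be trained simply by learning a neural network that predicts the original natural image x_0 from an arbitrarily noised version x_t and its time index t. However, x_0 has two other equivalent parameterizations, which lead to two further interpretations of a VDM.

The first uses the reparameterization trick. In deriving the form of q(x_t|x_0), Equation 69 of the paper can be rearranged as: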
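Assuming the usual closed form of q(x_t | x_0), with the cumulative product ᾱ_t = α_1 α_2 ⋯ α_t and source noise ε_0 ~ N(0, I), the rearrangement solves for x_0:

```latex
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon_0
\quad\Longrightarrow\quad
x_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_0}{\sqrt{\bar{\alpha}_t}}
```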

Substituting this into the previously derived true denoising transition mean µ_q(x_t, x_0) gives:
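Starting from the standard posterior mean (stated here as an assumption about the earlier derivation) and replacing x_0 by its expression in terms of x_t and ε_0, the result simplifies to:

```latex
\mu_q(x_t, x_0)
  = \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})\,x_t + \sqrt{\bar{\alpha}_{t-1}}\,(1 - \alpha_t)\,x_0}{1 - \bar{\alpha}_t}
  = \frac{1}{\sqrt{\alpha_t}}\,x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}\,\sqrt{\alpha_t}}\,\epsilon_0
```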

The approximate denoising transition mean $µ_θ(x_t, t)$ can therefore be set to:
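Mirroring the true mean term by term, with a network ε̂_θ(x_t, t) predicting the source noise in place of ε_0 (the hat notation is an assumption):

```latex
\mu_{\theta}(x_t, t)
  = \frac{1}{\sqrt{\alpha_t}}\,x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}\,\sqrt{\alpha_t}}\,\hat{\epsilon}_{\theta}(x_t, t)
```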

and the corresponding optimization problem becomes:
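Because both transitions are Gaussians with the same variance σ_q²(t), the KL divergence reduces to a scaled squared distance between the means, which in turn becomes a noise-prediction loss. The form below assumes that shared-variance setup:

```latex
\arg\min_{\theta}\; D_{\mathrm{KL}}\bigl(q(x_{t-1} \mid x_t, x_0)\,\|\,p_{\theta}(x_{t-1} \mid x_t)\bigr)
  = \arg\min_{\theta}\; \frac{1}{2\sigma_q^2(t)}\,
    \frac{(1 - \alpha_t)^2}{(1 - \bar{\alpha}_t)\,\alpha_t}\,
    \bigl\lVert \epsilon_0 - \hat{\epsilon}_{\theta}(x_t, t) \bigr\rVert_2^2
```

In other words, learning to predict x_0 is equivalent, up to a per-timestep weight, to learning to predict the noise that was added.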

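To derive the third common interpretation of variational diffusion models, we turn to Tweedie's formula. It states that the true mean of an exponential-family distribution, given samples drawn from it, can be estimated by the maximum likelihood estimate of the samples (the empirical mean) plus a correction term involving the score of the estimate.

Mathematically, for a Gaussian variable z ∼ N(z; µ_z, Σ_z), Tweedie's formula states: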
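For the Gaussian case just described, the formula is commonly written as:

```latex
\mathbb{E}\bigl[\mu_z \mid z\bigr] = z + \Sigma_z\,\nabla_z \log p(z)
```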

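Score-based generative models

It has been shown that a variational diffusion model can be learned simply by optimizing a neural network $s_θ(x_t, t)$ to predict the score function $∇ log p(x_t)$. The score term in that derivation, however, arrives via Tweedie's formula, which does not necessarily give much intuition or insight into what the score function is or why it is worth modeling.

Fortunately, that intuition can be found in another class of generative models, score-based generative models. The VDM formulation derived earlier does indeed have an equivalent score-based generative modeling formulation, so one can switch between the two interpretations at will.

To see why optimizing a score function makes sense, we revisit energy-based models. An arbitrarily flexible probability distribution can be written in the form: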
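In the usual energy-based notation (assumed here), f_θ(x) is an arbitrarily flexible energy function, often a neural network, and Z_θ is the normalizing constant that makes the density integrate to one:

```latex
p_{\theta}(x) = \frac{1}{Z_{\theta}}\,e^{-f_{\theta}(x)}
```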

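One way to avoid computing or modeling the normalizing constant is to use a neural network $s_θ(x)$ to learn the score function $∇ log p(x)$ of the distribution $p(x)$. This is motivated by the observation that taking the derivative of the log of both sides of Equation 152 yields: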
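Assuming the energy-based form p_θ(x) = e^{-f_θ(x)} / Z_θ for Equation 152, the normalizing constant does not depend on x and therefore drops out of the gradient:

```latex
\nabla_x \log p_{\theta}(x)
  = \nabla_x \log\!\left(\frac{1}{Z_{\theta}}\,e^{-f_{\theta}(x)}\right)
  = \nabla_x \log \frac{1}{Z_{\theta}} + \nabla_x \log e^{-f_{\theta}(x)}
  = -\nabla_x f_{\theta}(x)
  \approx s_{\theta}(x)
```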

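which can be freely represented as a neural network without involving any normalization constant. The score model can be optimized by minimizing the Fisher divergence with the ground-truth score function: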
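In its standard form, the Fisher divergence between the learned score and the true score is the expected squared distance between them under the data distribution:

```latex
\mathbb{E}_{p(x)}\Bigl[\bigl\lVert s_{\theta}(x) - \nabla \log p(x) \bigr\rVert_2^2\Bigr]
```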

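Intuitively, the score function defines a vector field over the entire space that the data x inhabits, pointing towards the modes, as depicted in Figure 6.

[Figure 6: vector field defined by the score function]

Finally, the paper establishes an explicit connection between variational diffusion models and score-based generative models, in terms of both the training objective and the sampling procedure.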
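The sampling side of the score-based view can be illustrated with Langevin dynamics, which repeatedly moves a sample along the score and adds a little noise. This sketch makes strong simplifying assumptions: the target is a one-dimensional Gaussian whose score is known exactly (no neural network), and the step size and step count are arbitrary illustrative choices.

```python
import numpy as np

def score_gaussian(x, mu=3.0, sigma=1.0):
    """Exact score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2

def langevin_sample(score_fn, x_init, step=0.01, n_steps=2000, rng=None):
    """Run x <- x + step * score(x) + sqrt(2 * step) * noise for n_steps."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x

rng = np.random.default_rng(0)
x_init = rng.uniform(-10.0, 10.0, size=5000)  # arbitrary starting points
samples = langevin_sample(score_gaussian, x_init, rng=rng)
print(f"sample mean: {samples.mean():.3f}, sample std: {samples.std():.3f}")
```

Wherever the chains start, the score pulls them toward the mode at 3, and the injected noise keeps them spread out like the target distribution instead of collapsing onto the mode.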

Original paper

https://arxiv.org/pdf/2208.11970.pdf

Diffusion Model study code

Diffusion Model study notes and articles

Diffusion Model paper reading

What are Diffusion Models

https://www.bilibili.com/video/BV1cW4y1z7pp

Probabilistic Diffusion Model

Probabilistic diffusion models: theory and a detailed walkthrough of the complete PyTorch code

https://www.bilibili.com/video/BV1b541197HX

OpenAI Diffusion Model

Improved diffusion model: a line-by-line walkthrough of the PyTorch code

https://www.bilibili.com/video/BV1sG411s7vV

A lecture on Diffusion Probabilistic Models

Google Brain Jascha Sohl-Dickstein

https://www.bilibili.com/video/BV1f541197Gr

Diffusion Models Beat GANs on Image Synthesis

Slides: https://github.com/scilearner/papernotclear

Paper: http://arxiv.org/abs/2105.05233

https://www.bilibili.com/video/BV1HS4y1n7N6

Diffusion models

https://www.bilibili.com/video/BV1cW4y1z7pp

Derivation of the DDPM formulas

https://www.bilibili.com/video/BV11N4y157pd

DiffusionCLIP: text-guided diffusion models for robust image manipulation

https://www.bilibili.com/video/BV1fS4y1i7Vu/?spm_id_from=autoNext

DrawBench Prompts

Prompts Category
A red colored car. Colors
A black colored car. Colors
A pink colored car. Colors
A black colored dog. Colors
A red colored dog. Colors
A blue colored dog. Colors
A green colored banana. Colors
A red colored banana. Colors
A black colored banana. Colors
A white colored sandwich. Colors
A black colored sandwich. Colors
An orange colored sandwich. Colors
A pink colored giraffe. Colors
A yellow colored giraffe. Colors
A brown colored giraffe. Colors
A red car and a white sheep. Colors
A blue bird and a brown bear. Colors
A green apple and a black backpack. Colors
A green cup and a blue cell phone. Colors
A yellow book and a red vase. Colors
A white car and a red sheep. Colors
A brown bird and a blue bear. Colors
A black apple and a green backpack. Colors
A blue cup and a green cell phone. Colors
A red book and a yellow vase. Colors
A horse riding an astronaut. Conflicting
A pizza cooking an oven. Conflicting
A bird scaring a scarecrow. Conflicting
A blue coloured pizza. Conflicting
Hovering cow abducting aliens. Conflicting
A panda making latte art. Conflicting
A shark in the desert. Conflicting
An elephant under the sea. Conflicting
Rainbow coloured penguin. Conflicting
A fish eating a pelican. Conflicting
One car on the street. Counting
Two cars on the street. Counting
Three cars on the street. Counting
Four cars on the street. Counting
Five cars on the street. Counting
One dog on the street. Counting
Two dogs on the street. Counting
Three dogs on the street. Counting
Four dogs on the street. Counting
Five dogs on the street. Counting
One cat and one dog sitting on the grass. Counting
One cat and two dogs sitting on the grass. Counting
One cat and three dogs sitting on the grass. Counting
Two cats and one dog sitting on the grass. Counting
Two cats and two dogs sitting on the grass. Counting
Two cats and three dogs sitting on the grass. Counting
Three cats and one dog sitting on the grass. Counting
Three cats and two dogs sitting on the grass. Counting
Three cats and three dogs sitting on the grass. Counting
A triangular purple flower pot. A purple flower pot in the shape of a triangle. DALL-E
A triangular orange picture frame. An orange picture frame in the shape of a triangle. DALL-E
A triangular pink stop sign. A pink stop sign in the shape of a triangle. DALL-E
A cube made of denim. A cube with the texture of denim. DALL-E
A sphere made of kitchen tile. A sphere with the texture of kitchen tile. DALL-E
A cube made of brick. A cube with the texture of brick. DALL-E
A collection of nail is sitting on a table. DALL-E
A single clock is sitting on a table. DALL-E
A couple of glasses are sitting on a table. DALL-E
An illustration of a large red elephant sitting on a small blue mouse. DALL-E
An illustration of a small green elephant standing behind a large red mouse. DALL-E
A small blue book sitting on a large red book. DALL-E
A stack of 3 plates. A blue plate is on the top, sitting on a blue plate. The blue plate is in the middle, sitting on a green plate. The green plate is on the bottom. DALL-E
A stack of 3 cubes. A red cube is on the top, sitting on a red cube. The red cube is in the middle, sitting on a green cube. The green cube is on the bottom. DALL-E
A stack of 3 books. A green book is on the top, sitting on a red book. The red book is in the middle, sitting on a blue book. The blue book is on the bottom. DALL-E
An emoji of a baby panda wearing a red hat, green gloves, red shirt, and green pants. DALL-E
An emoji of a baby panda wearing a red hat, blue gloves, green shirt, and blue pants. DALL-E
A fisheye lens view of a turtle sitting in a forest. DALL-E
A side view of an owl sitting in a field. DALL-E
A cross-section view of a brain. DALL-E
A vehicle composed of two wheels held in a frame one behind the other, propelled by pedals and steered with handlebars attached to the front wheel. Descriptions
A large motor vehicle carrying passengers by road, typically one serving the public on a fixed route and for a fare. Descriptions
A small vessel propelled on water by oars, sails, or an engine. Descriptions
A connection point by which firefighters can tap into a water supply. Descriptions
A machine next to a parking space in a street, into which the driver puts money so as to be authorized to park the vehicle for a particular length of time. Descriptions
A device consisting of a circular canopy of cloth on a folding metal frame supported by a central rod, used as protection against rain or sometimes sun. Descriptions
A separate seat for one person, typically with a back and four legs. Descriptions
An appliance or compartment which is artificially kept cool and used to store food and drink. Descriptions
A mechanical or electrical device for measuring time. Descriptions
An instrument used for cutting cloth, paper, and other thin material, consisting of two blades laid one on top of the other and fastened in the middle so as to allow them to be opened and closed by a thumb and finger inserted through rings on the end of their handles. Descriptions
A large plant-eating domesticated mammal with solid hoofs and a flowing mane and tail, used for riding, racing, and to carry and pull loads. Descriptions
A long curved fruit which grows in clusters and has soft pulpy flesh and yellow skin when ripe. Descriptions
A small domesticated carnivorous mammal with soft fur, a short snout, and retractable claws. It is widely kept as a pet or for catching mice, and many breeds have been developed. Descriptions
A domesticated carnivorous mammal that typically has a long snout, an acute sense of smell, nonretractable claws, and a barking, howling, or whining voice. Descriptions
An organ of soft nervous tissue contained in the skull of vertebrates, functioning as the coordinating center of sensation and intellectual and nervous activity. Descriptions
An American multinational technology company that focuses on artificial intelligence, search engine, online advertising, cloud computing, computer software, quantum computing, e-commerce, and consumer electronics. Descriptions
A large keyboard musical instrument with a wooden case enclosing a soundboard and metal strings, which are struck by hammers when the keys are depressed. The strings' vibration is stopped by dampers when the keys are released and can be regulated for length and volume by two or three pedals. Descriptions
A type of digital currency in which a record of transactions is maintained and new units of currency are generated by the computational solution of mathematical problems, and which operates independently of a central bank. Descriptions
A large thick-skinned semiaquatic African mammal, with massive jaws and large tusks. Descriptions
A machine resembling a human being and able to replicate certain human movements and functions automatically. Descriptions
Paying for a quarter-sized pizza with a pizza-sized quarter. Gary Marcus et al.
An oil painting of a couple in formal evening wear going home get caught in a heavy downpour with no umbrellas. Gary Marcus et al.
A grocery store refrigerator has pint cartons of milk on the top shelf, quart cartons on the middle shelf, and gallon plastic jugs on the bottom shelf. Gary Marcus et al.
In late afternoon in January in New England, a man stands in the shadow of a maple tree. Gary Marcus et al.
An elephant is behind a tree. You can see the trunk on one side and the back legs on the other. Gary Marcus et al.
A tomato has been put on top of a pumpkin on a kitchen stool. There is a fork sticking into the pumpkin. The scene is viewed from above. Gary Marcus et al.
A pear cut into seven pieces arranged in a ring. Gary Marcus et al.
A donkey and an octopus are playing a game. The donkey is holding a rope on one end, the octopus is holding onto the other. The donkey holds the rope in its mouth. A cat is jumping over the rope. Gary Marcus et al.
Supreme Court Justices play a baseball game with the FBI. The FBI is at bat, the justices are on the field. Gary Marcus et al.
Abraham Lincoln touches his toes while George Washington does chin-ups. Lincoln is barefoot. Washington is wearing boots. Gary Marcus et al.
Tcennis rpacket. Misspellings
Bzaseball galove. Misspellings
Rbefraigerator. Misspellings
Dininrg tablez. Misspellings
Pafrking metr. Misspellings
A smafml vessef epropoeilled on watvewr by ors, sauls, or han engie. Misspellings
A sjmall domesticated carnivorious mammnal with sof fuh,y a sthort sout, and retracwtablbe flaws. It iw widexly kept as a pet or for catchitng mic, ad many breeds zhlyde beefn develvoked. Misspellings
An instqrumemnt used for cutting cloth, paper, axdz othr thdin mteroial, consamistng of two blades lad one on tvopb of the other and fhastned in tle mixdqdjle so as to bllow them txo be pened and closed by thumb and fitngesr inserted tgrough rings on kthe end oc thei vatndlzes. Misspellings
A domesticated carnivvorous mzammal that typicbally hfaas a lons sfnout, an acxujte sense off osmell, noneetractaaln crlaws, anid xbarkring,y howlingu, or whining rvoiche. Misspellings
A ldarge keybord msical instroument lwith a woden case enmclosig a qsouvnkboajrd and mfgtal strivgf, which are strucrk b hammrs when the nels are depresdsmed.f lhe strsingsj' vibration ie stopped by damperds when the keys re released and can bce regulavewdd for lengh and vnolume y two or three pedalvs. Misspellings
A train on top of a surfboard. Positional
A wine glass on top of a dog. Positional
A bicycle on top of a boat. Positional
An umbrella on top of a spoon. Positional
A laptop on top of a teddy bear. Positional
A giraffe underneath a microwave. Positional
A donut underneath a toilet. Positional
A hair drier underneath a sheep. Positional
A tennis racket underneath a traffic light. Positional
A zebra underneath a broccoli. Positional
A banana on the left of an apple. Positional
A couch on the left of a chair. Positional
A car on the left of a bus. Positional
A cat on the left of a dog. Positional
A carrot on the left of a broccoli. Positional
A pizza on the right of a suitcase. Positional
A cat on the right of a tennis racket. Positional
A stop sign on the right of a refrigerator. Positional
A sheep to the right of a wine glass. Positional
A zebra to the right of a fire hydrant. Positional
Acersecomicke. Rare Words
Jentacular. Rare Words
Matutinal. Rare Words
Peristeronic. Rare Words
Artophagous. Rare Words
Backlotter. Rare Words
Octothorpe. Rare Words
A church with stained glass windows depicting a hamburger and french fries. Reddit
Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna. Reddit
A baby fennec sneezing onto a strawberry, detailed, macro, studio light, droplets, backlit ears. Reddit
A photo of a confused grizzly bear in calculus class. Reddit
An ancient Egyptian painting depicting an argument over whose turn it is to take out the trash. Reddit
A fluffy baby sloth with a knitted hat trying to figure out a laptop, close up, highly detailed, studio lighting, screen reflecting in its eyes. Reddit
A tiger in a lab coat with a 1980s Miami vibe, turning a well oiled science content machine, digital art. Reddit
A 1960s yearbook photo with animals dressed as humans. Reddit
Lego Arnold Schwarzenegger. Reddit
A yellow and black bus cruising through the rainforest. Reddit
A medieval painting of the wifi not working. Reddit
An IT-guy trying to fix hardware of a PC tower is being tangled by the PC cables like Laokoon. Marble, copy after Hellenistic original from ca. 200 BC. Found in the Baths of Trajan, 1506. Reddit
35mm macro shot a kitten licking a baby duck, studio lighting. Reddit
McDonalds Church. Reddit
Photo of an athlete cat explaining it's latest scandal at a press conference to journalists. Reddit
Greek statue of a man tripping over a cat. Reddit
An old photograph of a 1920s airship shaped like a pig, floating over a wheat field. Reddit
Photo of a cat singing in a barbershop quartet. Reddit
A painting by Grant Wood of an astronaut couple, american gothic style. Reddit
An oil painting portrait of the regal Burger King posing with a Whopper. Reddit
A keyboard made of water, the water is made of light, the light is turned off. Reddit
Painting of Mona Lisa but the view is from behind of Mona Lisa. Reddit
Hyper-realistic photo of an abandoned industrial site during a storm. Reddit
A screenshot of an iOS app for ordering different types of milk. Reddit
A real life photography of super mario, 8k Ultra HD. Reddit
Colouring page of large cats climbing the eifel tower in a cyberpunk future. Reddit
Photo of a mega Lego space station inside a kid's bedroom. Reddit
A spider with a moustache bidding an equally gentlemanly grasshopper a good day during his walk to work. Reddit
A photocopy of a photograph of a painting of a sculpture of a giraffe. Reddit
A bridge connecting Europe and North America on the Atlantic Ocean, bird's eye view. Reddit
A maglev train going vertically downward in high speed, New York Times photojournalism. Reddit
A magnifying glass over a page of a 1950s batman comic. Reddit
A car playing soccer, digital art. Reddit
Darth Vader playing with raccoon in Mars during sunset. Reddit
A 1960s poster warning against climate change. Reddit
Illustration of a mouse using a mushroom as an umbrella. Reddit
A realistic photo of a Pomeranian dressed up like a 1980s professional wrestler with neon green and neon orange face paint and bright green wrestling tights with bright orange boots. Reddit
A pyramid made of falafel with a partial solar eclipse in the background. Reddit
A storefront with 'Hello World' written on it. Text
A storefront with 'Diffusion' written on it. Text
A storefront with 'Text to Image' written on it. Text
A storefront with 'NeurIPS' written on it. Text
A storefront with 'Deep Learning' written on it. Text
A storefront with 'Google Brain Toronto' written on it. Text
A storefront with 'Google Research Pizza Cafe' written on it. Text
A sign that says 'Hello World'. Text
A sign that says 'Diffusion'. Text
A sign that says 'Text to Image'. Text
A sign that says 'NeurIPS'. Text
A sign that says 'Deep Learning'. Text
A sign that says 'Google Brain Toronto'. Text
A sign that says 'Google Research Pizza Cafe'. Text
New York Skyline with 'Hello World' written with fireworks on the sky. Text
New York Skyline with 'Diffusion' written with fireworks on the sky. Text
New York Skyline with 'Text to Image' written with fireworks on the sky. Text
New York Skyline with 'NeurIPS' written with fireworks on the sky. Text
New York Skyline with 'Deep Learning' written with fireworks on the sky. Text
New York Skyline with 'Google Brain Toronto' written with fireworks on the sky. Text
New York Skyline with 'Google Research Pizza Cafe' written with fireworks on the sky. Text

Sources

{link}