CLIP

13 Posts

Different Media, Similar Embeddings: ImageBind, the AI model that binds data from seven data types at once

The ability of OpenAI’s CLIP to produce similar embeddings of a text phrase and a matching image opened up applications like classifying images according to labels that weren’t in the training set. A new model extends this capability to seven data types.
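The core idea can be sketched in a few lines: pick the label whose text embedding is closest to the image embedding. This toy numpy version uses made-up vectors rather than real CLIP encoders, purely to illustrate the matching step.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings: in CLIP, an image encoder and a text encoder
# map their inputs into the same vector space, so matching pairs land close.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=8)
label_embs = {
    "a photo of a dog": image_emb + 0.1 * rng.normal(size=8),  # near match
    "a photo of a car": rng.normal(size=8),                    # unrelated
}

# Zero-shot classification: pick the label whose embedding is most similar.
best = max(label_embs, key=lambda t: cosine_sim(image_emb, label_embs[t]))
print(best)  # → "a photo of a dog"
```

Because the labels are just text, the candidate set can include classes the model never saw during training.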

Like Diffusion but Faster: The Paella model for fast image generation, explained

The ability to generate realistic images without waiting would unlock applications from engineering to entertainment and beyond. New work takes a step in that direction.

Text-Driven Video Alteration: Gen-1 uses text prompts to modify videos.

On the heels of systems that generate video directly from text, new work uses text to adjust the imagery in existing videos. Researchers unveiled Gen-1...

Ensemble Models Simplified: New Machine Learning Research Simplifies Ensembles

A CLIP model whose weights were the mean of an ensemble of fine-tuned models performed as well as the ensemble and better than its best-performing constituent.
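Averaging the weights of several fine-tuned copies of the same architecture is straightforward when the checkpoints share identical parameter shapes. A minimal sketch, using plain dictionaries of numpy arrays as stand-ins for real checkpoints:

```python
import numpy as np

def average_weights(state_dicts):
    """Average corresponding parameters across fine-tuned models."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Three hypothetical fine-tuned checkpoints of the same architecture.
models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([2.0])},
]
soup = average_weights(models)
print(soup["w"], soup["b"])  # → [3. 4.] [1.]
```

Unlike a conventional ensemble, the averaged model runs a single forward pass at inference, so it costs no more than any one of its constituents.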

Text-to-Image Goes Viral: Inside Craiyon, Formerly Known as DALL·E Mini

A homebrew re-creation of OpenAI’s DALL·E model is the latest internet sensation. Craiyon has been generating around 50,000 user-prompted images daily, thanks to its ability to produce visual mashups like Darth Vader ice fishing and photorealistic Pokémon characters.

Yale Song: Foundation models for vision

Large models pretrained on immense quantities of text have proven to provide strong foundations for solving specialized language tasks. My biggest hope for AI in 2022 is...

Multimodal AI Takes Off: Multimodal models such as CLIP and DALL·E are taking over AI.

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive strides...

Artistry Is Obsolete: Is AI Making Human Artists Obsolete?

Is human creativity being replaced by the synthetic equivalent? The fear: AI is cranking out increasingly sophisticated visual, musical, and literary works. AI-generated media will flood the market, squeezing out human artists and depriving the world of their creativity.

Crawl the Web, Absorb the Bias: NLP Models Absorb Biases from Web Training Data

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale — the web — is polluted with bias and antisocial expressions. A new study examines the issue.

Weak Foundations Make Weak Models: Foundation AI Models Pass Flaws to Fine-Tuned Variants

A new study examines a major strain of recent research: huge models pretrained on immense quantities of uncurated, unlabeled data and then fine-tuned on a smaller, curated corpus.

CLIP Art: Creating AI art by pairing CLIP with GAN models

Creative engineers are combining deep learning systems to produce a groundswell of generated imagery. Researchers, hackers, and artists are producing new works by pairing CLIP, a pretrained image classifier, with a generative adversarial network (GAN).
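The pairing works by treating CLIP as a critic: optimize the GAN's latent vector so the generated image's embedding moves toward a text prompt's embedding. The sketch below stands in a linear map for the generator and random vectors for CLIP embeddings, so only the optimization loop itself is faithful to the technique:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-ins (assumptions): a linear "generator" mapping a 4-dim latent to
# an 8-dim image embedding, and a fixed target text embedding.
G = rng.normal(size=(8, 4))
text_emb = rng.normal(size=8)

z = rng.normal(size=4)  # GAN latent vector to optimize
lr = 0.05
for _ in range(200):
    # Gradient ascent on dot(G @ z, text_emb) minus a small norm penalty,
    # nudging the generated embedding toward the text embedding.
    grad = G.T @ text_emb - 0.1 * z
    z += lr * grad

image_emb = G @ z
sim = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
print(round(sim, 2))
```

In the real systems, the generator is a pretrained GAN, the embeddings come from CLIP's image and text encoders, and the gradient flows through both networks via backpropagation rather than a closed form.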

I Know It When I See It: Zero-shot detection for objects not in training data.

Object detectors typically detect only items that were labeled in their training data. A new method liberates them to locate and recognize a much wider variety of objects.

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: CLIP and DALL·E.
