CLIP

13 Posts

Different Media, Similar Embeddings: ImageBind, the AI model that binds data from seven data types at once

The ability of OpenAI’s CLIP to produce similar embeddings of a text phrase and a matching image opened up applications like classifying images according to labels that weren’t in the training set. A new model extends this capability to seven data types.
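The core idea can be sketched in a few lines: pick the label whose text embedding is closest to the image embedding. This toy numpy version uses made-up vectors rather than real CLIP encoders, purely to illustrate the matching step.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings: in CLIP, an image encoder and a text encoder
# map their inputs into the same vector space, so matching pairs land close.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=8)
label_embs = {
    "a photo of a dog": image_emb + 0.1 * rng.normal(size=8),  # near match
    "a photo of a car": rng.normal(size=8),                    # unrelated
}

# Zero-shot classification: pick the label whose embedding is most similar.
best = max(label_embs, key=lambda t: cosine_sim(image_emb, label_embs[t]))
print(best)  # → "a photo of a dog"
```

Because the labels are just text, the candidate set can include classes the model never saw during training.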

Like Diffusion but Faster: The Paella model for fast image generation, explained

The ability to generate realistic images without waiting would unlock applications from engineering to entertainment and beyond. New work takes a step in that direction.

Text-Driven Video Alteration: Gen-1 uses text prompts to modify videos.

On the heels of systems that generate video directly from text, new work uses text to adjust the imagery in existing videos. Researchers unveiled Gen-1...

Ensemble Models Simplified: New Machine Learning Research Simplifies Ensembles

A CLIP model whose weights were the mean of an ensemble of fine-tuned models performed as well as the ensemble and better than its best-performing constituent.
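Averaging the weights of several fine-tuned copies of the same architecture is straightforward when the checkpoints share identical parameter shapes. A minimal sketch, using plain dictionaries of numpy arrays as stand-ins for real checkpoints:

```python
import numpy as np

def average_weights(state_dicts):
    """Average corresponding parameters across fine-tuned models."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Three hypothetical fine-tuned checkpoints of the same architecture.
models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([2.0])},
]
soup = average_weights(models)
print(soup["w"], soup["b"])  # → [3. 4.] [1.]
```

Unlike a conventional ensemble, the averaged model runs a single forward pass at inference, so it costs no more than any one of its constituents.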

Text-to-Image Goes Viral: Inside Craiyon, Formerly Known as DALL·E Mini

A homebrew re-creation of OpenAI’s DALL·E model is the latest internet sensation. Craiyon has been generating around 50,000 user-prompted images daily, thanks to its ability to produce visual mashups like Darth Vader ice fishing and photorealistic Pokémon characters.

Yale Song: Foundation models for vision

Large models pretrained on immense quantities of text have proven to provide strong foundations for solving specialized language tasks. My biggest hope for AI in 2022 is...

Multimodal AI Takes Off: Multimodal models such as CLIP and DALL·E are taking over AI.

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive strides...

Artistry Is Obsolete: Is AI Making Human Artists Obsolete?

Is human creativity being replaced by the synthetic equivalent? The fear: AI is cranking out increasingly sophisticated visual, musical, and literary works. AI-generated media will flood the market, squeezing out human artists and depriving the world of their creativity.

Crawl the Web, Absorb the Bias: NLP Models Absorb Biases from Web Training Data

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale — the web — is polluted with bias and antisocial expressions. A new study examines the issue.

Weak Foundations Make Weak Models: Foundation AI Models Pass Flaws to Fine-Tuned Variants

A new study examines a major strain of recent research: huge models pretrained on immense quantities of uncurated, unlabeled data and then fine-tuned on a smaller, curated corpus.

CLIP Art: Creating AI art by pairing CLIP with GAN models

Creative engineers are combining deep learning systems to produce a groundswell of generated imagery. Researchers, hackers, and artists are producing new works by pairing CLIP, a pretrained image classifier, with a generative adversarial network (GAN).
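The pairing works by treating CLIP as a critic: optimize the GAN's latent vector so the generated image's embedding moves toward a text prompt's embedding. The sketch below stands in a linear map for the generator and random vectors for CLIP embeddings, so only the optimization loop itself is faithful to the technique:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-ins (assumptions): a linear "generator" mapping a 4-dim latent to
# an 8-dim image embedding, and a fixed target text embedding.
G = rng.normal(size=(8, 4))
text_emb = rng.normal(size=8)

z = rng.normal(size=4)  # GAN latent vector to optimize
lr = 0.05
for _ in range(200):
    # Gradient ascent on dot(G @ z, text_emb) minus a small norm penalty,
    # nudging the generated embedding toward the text embedding.
    grad = G.T @ text_emb - 0.1 * z
    z += lr * grad

image_emb = G @ z
sim = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
print(round(sim, 2))
```

In the real systems, the generator is a pretrained GAN, the embeddings come from CLIP's image and text encoders, and the gradient flows through both networks via backpropagation rather than a closed form.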

I Know It When I See It: Zero-shot detection for objects not in training data.

Object detectors typically detect only items that were labeled in their training data. A new method liberates them to locate and recognize a much wider variety of objects.

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: CLIP and DALL·E.
