CaiT


Shifted Patch Tokenization (SPT) | Locality Self-Attention (LSA)

Less Data for Vision Transformers: Boosting Vision Transformer Performance with Less Data

The Vision Transformer (ViT) outperformed convolutional neural networks in image classification, but it required more training data. New work enables ViT and its variants to outperform other architectures even when trained on less data.
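One of the techniques named above, Shifted Patch Tokenization (SPT), gives each patch token more local context by concatenating the image with diagonally shifted copies of itself before splitting it into patches. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function name, zero-padded shifts, and the half-patch shift size are all assumptions for illustration.

```python
import numpy as np

def shifted_patch_tokenize(img, patch=4):
    """Illustrative sketch of Shifted Patch Tokenization (SPT).

    Stacks the image with four diagonally shifted copies along the
    channel axis, then splits the stack into flattened patch tokens.
    `img` has shape (H, W, C); the shift size (half a patch) and the
    zero padding are assumptions, not taken from the article.
    """
    H, W, C = img.shape
    s = patch // 2  # assumed shift: half the patch side

    def shifted(dy, dx):
        # zero-padded copy of the image, shifted s pixels along
        # the diagonal indicated by the signs of (dy, dx)
        out = np.zeros_like(img)
        ys, yd = (slice(s, None), slice(None, -s)) if dy > 0 else (slice(None, -s), slice(s, None))
        xs, xd = (slice(s, None), slice(None, -s)) if dx > 0 else (slice(None, -s), slice(s, None))
        out[yd, xd] = img[ys, xs]
        return out

    # original image plus four diagonal shifts -> (H, W, 5C)
    stack = np.concatenate(
        [img, shifted(1, 1), shifted(1, -1), shifted(-1, 1), shifted(-1, -1)],
        axis=-1,
    )
    # split into non-overlapping patches and flatten each into a token
    tokens = stack.reshape(H // patch, patch, W // patch, patch, 5 * C)
    tokens = tokens.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 5 * C)
    return tokens

img = np.random.rand(32, 32, 3)
toks = shifted_patch_tokenize(img)  # 64 tokens, each of length 4*4*15 = 240
```

Compared with plain ViT tokenization, each token here sees a 5x-wider channel stack covering its neighborhood, which is the kind of added locality the article credits for better performance on small datasets.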
