Generalized Visual Language Models

Processing images to generate text, such as image captioning and visual question-answering, has been studied for years. Traditionally, such systems rely on an object detection network as a vision encoder to capture visual features and then produce text via a text decoder. Given the large amount of existing literature, in this post I would like to focus on only one approach to solving vision-language tasks: extending pre-trained generalized language models so that they can consume visual signals....
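
To make the idea concrete, here is a minimal PyTorch sketch of one such extension: pooled image features are projected into a few "visual prefix" embeddings that a frozen text decoder consumes alongside the usual token embeddings. The `VisualPrefixLM` class, its dimensions, and the dummy encoder/decoder in the smoke test are illustrative assumptions, not the exact models discussed in the post.

```python
import torch
import torch.nn as nn

class VisualPrefixLM(nn.Module):
    """Toy prefix-style VLM: a trained projection maps image features into the
    embedding space of a (frozen) text decoder as a short visual prefix."""

    def __init__(self, vision_encoder, text_decoder, vis_dim, lm_dim, prefix_len=4):
        super().__init__()
        self.vision_encoder = vision_encoder  # hypothetical pre-trained image encoder
        self.text_decoder = text_decoder      # hypothetical pre-trained autoregressive LM
        self.project = nn.Linear(vis_dim, prefix_len * lm_dim)
        self.prefix_len, self.lm_dim = prefix_len, lm_dim

    def forward(self, images, token_embeddings):
        vis_feat = self.vision_encoder(images)                         # (B, vis_dim)
        prefix = self.project(vis_feat).view(-1, self.prefix_len, self.lm_dim)
        # Prepend the visual tokens to the text embeddings and decode as usual.
        return self.text_decoder(torch.cat([prefix, token_embeddings], dim=1))

# Tiny smoke test with dummy stand-ins for the pre-trained components.
enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # fake image encoder
dec = nn.Identity()                                              # fake LM: echoes its input
model = VisualPrefixLM(enc, dec, vis_dim=256, lm_dim=64)
out = model(torch.randn(2, 3, 32, 32), torch.randn(2, 10, 64))   # -> shape (2, 14, 64)
```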

June 9, 2022 · 24 min · Lilian Weng

Learning with not Enough Data Part 3: Data Generation

Here comes Part 3 on learning with not enough data (Previous: Part 1 and Part 2). Let's consider two approaches for generating synthetic data for training. Augmented data: given a set of existing training samples, we can apply a variety of augmentations, distortions and transformations to derive new data points without losing the key attributes. We have covered a number of augmentation methods for text and images in a previous post on contrastive learning....
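
As a hedged illustration of what such an image augmentation pipeline can look like, here is a plausible SimCLR-style recipe in torchvision; the specific transforms and their parameters are my choices for illustration, not the exact pipeline from the post.

```python
from torchvision import transforms

# Each call on the same image yields a new, randomly distorted view.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                 # random crop + rescale
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),        # brightness/contrast/saturation/hue
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])
# Usage: view_1, view_2 = augment(pil_image), augment(pil_image)
```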

April 15, 2022 · 28 min · Lilian Weng

Learning with not Enough Data Part 2: Active Learning

This is Part 2 on what to do when facing a limited amount of labeled data for supervised learning tasks. This time we involve some human labeling work, but within a budget limit, so we need to be smart when selecting which samples to label. Notation: $K$ denotes the number of unique class labels, and $(\mathbf{x}^l, y) \sim \mathcal{X}, y \in \{0, 1\}^K$ denotes the labeled dataset....
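
For a flavor of what "smart selection" can mean, below is a minimal sketch of one classic acquisition strategy, uncertainty sampling by predictive entropy; the function name and the entropy criterion are illustrative assumptions, and the post surveys many other strategies.

```python
import numpy as np

def select_most_uncertain(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` unlabeled samples whose predicted class distribution
    has the highest entropy, i.e. where the model is least sure."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # shape (N,)
    return np.argsort(-entropy)[:budget]                    # indices to send for labeling
```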

February 20, 2022 · 22 min · Lilian Weng

Learning with not Enough Data Part 1: Semi-Supervised Learning

When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed. Pre-training + fine-tuning: pre-train a powerful task-agnostic model on a large unsupervised data corpus, e.g. pre-training LMs on free text or pre-training vision models on unlabeled images via self-supervised learning, and then fine-tune it on the downstream task with a small set of labeled samples. Semi-supervised learning: learn from the labeled and unlabeled samples together....
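
As one concrete instance of learning from labeled and unlabeled samples together, here is a rough PyTorch sketch of pseudo-labeling (self-training); the confidence threshold and the function signature are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled_x, threshold=0.95):
    """Keep the model's confident predictions on unlabeled data and reuse them
    as training targets (simple pseudo-labeling / self-training)."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf >= threshold              # only trust confident predictions
    if not mask.any():
        return unlabeled_x.new_zeros(())      # nothing confident enough this batch
    logits = model(unlabeled_x[mask])         # second pass with gradients enabled
    return F.cross_entropy(logits, pseudo_y[mask])
```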

December 5, 2021 · 26 min · Lilian Weng

How to Train Really Large Models on Many GPUs?

[Updated on 2022-03-13: add expert choice routing.] In recent years, we have been seeing better results on many NLP benchmark tasks with larger pre-trained language models. Training large and deep neural networks is challenging, as it demands a large amount of GPU memory and a long training time. However, an individual GPU worker has limited memory, and the sizes of many large models have grown beyond a single GPU....
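
To see why a single GPU runs out of headroom, here is a rough back-of-the-envelope using the standard mixed-precision Adam accounting (about 16 bytes of training state per parameter, before counting activations); the 175B parameter count is just an illustrative GPT-3-sized example.

```python
# fp16 weights (2B) + fp16 grads (2B) + fp32 master weights (4B) + Adam m (4B) + Adam v (4B)
n_params = 175e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(f"~{n_params * bytes_per_param / 1e12:.1f} TB of training state")  # ~2.8 TB, far beyond one GPU
```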

September 24, 2021 · 21 min · Lilian Weng