Few-shot learning (FSL) is transforming machine learning (ML) by enabling models to learn and generate accurate outputs from just a handful of examples, unlike traditional methods that require vast datasets. This guide explores how FSL works, its applications, comparisons with zero-shot learning (ZSL), and its challenges and potential.
What is few-shot learning (FSL)?
Few-shot learning (FSL) refers to a family of ML techniques designed to create adaptable models capable of generating accurate outputs after being trained on just a few labeled examples per category. When only one labeled example per category is available, it’s called one-shot learning. For instance, modern smartphones leverage FSL to recognize a user’s face with just a few photos—or even a single photo.
FSL is particularly valuable because it allows ML models to tackle problems where data is scarce, as it often is in the real world. FSL models can also handle a broader range of tasks than traditional supervised learning models because they learn to generalize. This saves resources because it’s often cheaper and faster to adapt an FSL model to a new task than to train an entirely new model from scratch. FSL is often described as teaching ML models to “think” more like humans by learning to abstract from just a handful of examples.
FSL is often used for computer vision applications but is also deployed in robotics and natural language processing (NLP). For example, FSL has been used to translate ancient Sumerian texts—a helpful task given that Sumerian language experts are in short supply. The Sumerian translator FSL models learned how to translate from just a small set of high-quality samples of cuneiform tablets. They then accurately translated large amounts of unfamiliar text for scholars to analyze.
Few-shot learning vs. few-shot prompting: What’s the difference?
FSL and few-shot prompting are related concepts in ML and NLP, but they serve different purposes.
Few-shot learning
FSL is a model-training technique that teaches models to classify unseen data. It works by adjusting model parameters to adapt to new kinds of classification tasks, drawing on prior knowledge. FSL is related to supervised learning, but the difference is that FSL models are trained on a much more limited dataset.
Few-shot prompting
Few-shot prompting is a way of working with large language models (LLMs). It uses in-context learning—a type of learning in which the model uses information from the prompt, such as format and sentiment, to predict an output. Unlike FSL and traditional supervised learning, few-shot prompting does not involve changing the parameters of the LLM. When you use few-shot prompting, you provide the LLM with several examples of the type of response you’re looking for. Like FSL, few-shot prompting is about helping a model generalize by exposing it to a few examples of a similar task.
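For a concrete picture, here is a minimal sketch of few-shot prompting; the sentiment-labeling task, example reviews, and labels are hypothetical, and the assembled prompt could be sent to any LLM.

```python
# Build a few-shot prompt: the labeled examples show the model the format
# and labels we expect. No model parameters change -- the "learning" is in-context.
examples = [
    ("The pasta was cold and the service was slow.", "Negative"),
    ("Absolutely loved the rooftop view and the dessert!", "Positive"),
    ("It was fine, nothing special.", "Neutral"),
]

new_review = "The staff went out of their way to help us."

prompt = "Classify the sentiment of each review as Positive, Negative, or Neutral.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {new_review}\nSentiment:"

print(prompt)  # Send this string to the LLM of your choice.
```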
How few-shot learning works
Few-shot learning involves two stages: First, models are pre-trained on a general dataset to learn about the world. Then they undergo task adaptation, where models learn how to generalize from small data samples.
Pre-training
The first stage for most FSL models begins with pre-training on a large labeled dataset, just like supervised learning. The model performs feature extraction on this dataset and learns to classify examples by developing a knowledge base about patterns and relationships in the data.
Task adaptation
After pre-training, the next stage of FSL is training the model to generalize to new classification tasks. This is called task adaptation and happens over multiple training episodes.
In each episode, the model is given a support set of labeled examples to study (typically one to five per category) and a query set of unseen targets to classify. This framework is called N-way K-shot classification, in which N refers to the number of categories (called classes) and K refers to the number of labeled examples (shots) per class, so each support set contains N × K examples.
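As a rough sketch of how one such episode might be assembled (the toy dataset and class names below are made up for illustration):

```python
import random

# Toy labeled dataset: class name -> list of example IDs (stand-ins for real images).
dataset = {
    "husky":   [f"husky_{i}" for i in range(20)],
    "beagle":  [f"beagle_{i}" for i in range(20)],
    "siamese": [f"siamese_{i}" for i in range(20)],
    "tabby":   [f"tabby_{i}" for i in range(20)],
    "parrot":  [f"parrot_{i}" for i in range(20)],
}

def sample_episode(data, n_way=3, k_shot=2, n_query=4):
    """Build one N-way K-shot episode: a support set and a query set."""
    classes = random.sample(list(data), n_way)             # choose N classes
    support, query = [], []
    for cls in classes:
        picks = random.sample(data[cls], k_shot + n_query)
        support += [(x, cls) for x in picks[:k_shot]]       # K labeled shots per class
        query   += [(x, cls) for x in picks[k_shot:]]       # unseen targets to classify
    return support, query

support_set, query_set = sample_episode(dataset)
print(len(support_set), "support examples,", len(query_set), "query examples")
```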
All FSL models are designed to achieve task adaptation. Within the FSL technique set, one of the most important and exciting research areas is meta-learning.
Meta-learning approaches
Meta-learning involves exposing the model to tasks similar to or related to the classification task the model was initially trained to solve. It gets just a few examples of each new task, but from these, it learns to generalize by developing a meta-framework for what to do when given any unfamiliar task.
Broadly speaking, there are three kinds of approaches to meta-learning:
- Optimization-based learning: This includes approaches that train models to improve their parameters quickly. Some of them use a two-stage process where a learner is trained on a specific task and then a meta-learner uses the loss function from the learner stage to improve the model’s parameters for the next task.
- Metric-level learning: Used mostly for computer vision tasks, metric learning works by mapping extracted features into an embedding space and using the distance between features in that space to output a probability that two images are similar (see the sketch after this list).
- Model-agnostic meta-learning (MAML): In MAML, the goal of training is to find a parameter initialization that can be adapted to any new task in as few gradient steps as possible. By training across many tasks, the model learns a starting point that acts as a shortcut, speeding up the learning process with each new task it sees.
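To make the metric-level idea concrete, here is a minimal NumPy sketch in the spirit of prototypical networks: each class prototype is the mean of that class's support embeddings, and a query is classified by its distance to each prototype. The embeddings below are random stand-ins for the output of a real feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend embeddings from a feature extractor: 3 classes x 5 support shots x 64 dims.
support = rng.normal(size=(3, 5, 64))
query = rng.normal(size=(64,))            # one unlabeled query embedding

# Each class prototype is the mean of its support embeddings.
prototypes = support.mean(axis=1)          # shape (3, 64)

# Score the query by negative Euclidean distance to each prototype,
# then turn the scores into probabilities with a softmax.
dists = np.linalg.norm(prototypes - query, axis=1)
logits = -dists
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print("Predicted class:", int(probs.argmax()), "with probabilities", probs.round(3))
```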
The list of model architectures that use meta-learning techniques is growing all the time as researchers devise new ways to help models become adaptable.
Non-meta-learning approaches
There are also FSL and FSL-adjacent methods that do not use meta-learning. FSL is sometimes deployed alongside these techniques to create a hybrid approach:
- Transfer learning: This method involves taking a pre-trained model and fine-tuning the outer layers of the neural network. Transfer learning is most useful in scenarios where the task you want the model to perform is close to the task it has already trained on (a minimal fine-tuning sketch follows this list).
- Data augmentation: FSL can be strengthened with data augmentation, which involves using your limited data as a base to create synthetic data using generative adversarial networks (GANs) or variational autoencoders to increase the number of samples for your training set.
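As a minimal sketch of the transfer-learning approach above (assuming PyTorch and torchvision; the number of new classes is arbitrary), the pre-trained backbone is frozen and only a small new classification head is trained on the few available examples:

```python
import torch
from torch import nn
from torchvision import models

# Load a model pre-trained on ImageNet and freeze its existing layers.
model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new head for our few new classes.
num_new_classes = 5  # hypothetical: five categories with only a few examples each
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Training then proceeds as in ordinary supervised learning, but because only the small new head is optimized, a handful of labeled examples per class can be enough.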
Few-shot learning vs. zero-shot learning
Few-shot learning (or one-shot learning) is often used in scenarios where there is limited but high-quality data to train a model. But what about if you have no high-quality data at all? In zero-shot learning (ZSL), you give your model no examples and instead ask it to rely solely on prior knowledge and semantic embeddings it can draw on to handle unfamiliar tasks.
ZSL offers a fast, flexible solution for handling situations with very little data. However, ZSL models can struggle with domain shifting—meaning they may struggle if the type of data they are seeing is too different from their knowledge base—and it can be difficult to evaluate how well a model is performing.
Applications for few-shot learning
The applications for FSL are wide-ranging and constantly evolving, and it has enormous potential in areas where relatively few labeled examples are available. Some recent research areas and use cases include:
- Medical diagnostics: FSL can aid in image-based tumor classification where there isn’t enough labeled data for traditional supervised learning models to be helpful.
- Remote sensing: FSL can speed up remote sensing tasks like using UAV footage to assess the impacts of environmental disasters.
- F1 racecar prototyping: FSL models are pre-trained on fluid dynamics, aerodynamics, and other data from hundreds of cars across thousands of races. They then use FSL to predict aerodynamics and part degradation for new car prototypes based on a small number of expensive test runs.
- Machine translation: FSL has helped build more-efficient machine translators that use very little input and can capture nuances in dialect and regional variation with unprecedented accuracy.
- Robotics: FSL is being used to teach robots to learn to grasp objects by watching human demonstrations.
- Sentiment analysis: An FSL model originally trained on hotel reviews can be used to classify restaurant reviews.
FSL is also part of the quest to build artificial general intelligence because it more closely mimics how humans approach problem-solving.
Benefits of few-shot learning
The main benefits of FSL models are that they can handle problems where limited data is available, and they can help reduce the computational and financial resources required to train new models.
Generalizing with limited data
FSL models can generalize with limited data because they do not memorize images, sounds, or language through many training iterations. Instead, they learn to analyze similarities and differences quickly. Whereas traditional models excel at highly specific tasks like identifying a particular species of bird or matching fingerprints, they typically fail as soon as you ask them to complete any other task.
Using fewer resources
Techniques like MAML make much more efficient use of model-training resources, allowing large-scale models to be quickly adapted to specific use cases without costly retraining. One of the big challenges in machine learning is how much data is required to train a model to produce useful outputs, both in terms of compiling large, high-quality datasets and in the time and computation required. FSL promises to solve many real-world problems where data is scarce or crosses domains.
Challenges of few-shot learning
Despite its promise, FSL has challenges that can hinder model effectiveness.
Overfitting
Using limited datasets can cause overfitting, where the model aligns too closely with the data in its training sets and struggles to generalize. This is a familiar problem in ML that occurs more frequently with FSL than with other ML approaches. An FSL model that overfits will perform well on the handful of examples it was trained on but won't correctly identify new categories when presented with real-world examples. To prevent this, it is important to have diversity in the limited samples used for few-shot training. Data augmentation, discussed above, tries to alleviate overfitting by synthesizing more examples for training.
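For instance, a small support set of images can be expanded with classical augmentations, a lighter-weight alternative to the GAN- or autoencoder-based synthesis described earlier. This sketch assumes torchvision is available and uses a blank placeholder image in place of a real example:

```python
from PIL import Image
from torchvision import transforms

# Placeholder for a real support-set image.
image = Image.new("RGB", (224, 224), color="gray")

# Random flips, rotations, and color shifts produce varied copies of each example.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

# Generate several augmented variants per original image to diversify training data.
augmented_images = [augment(image) for _ in range(8)]
print(f"Created {len(augmented_images)} synthetic variants from one example")
```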
Data quality
High-quality data in both pre-training and the few-shot learning stage is important. FSL models are more easily hampered by noisy, poorly labeled data. They also don't do well when data is imbalanced (with too many examples of one class and too few of another) or has too many features for the model to analyze; in these cases, they tend to become overly complex. Researchers can sometimes cope with these problems by using regularization techniques, which constrain the model's complexity and help it figure out what to pay attention to and what to ignore.
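As a rough illustration of what regularization can look like in code (assuming PyTorch; the layer sizes are arbitrary), dropout randomly ignores some features during training and weight decay penalizes overly large weights:

```python
import torch
from torch import nn

# A small classifier head with dropout to discourage over-reliance on any one feature.
classifier = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero half the activations during training
    nn.Linear(32, 5),
)

# Weight decay adds an L2 penalty on the weights during optimization.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3, weight_decay=1e-4)
```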