Section 06

The Code: Few-Shot Sentiment Classification

Language Models are Few-Shot Learners (2020)


We’ll demonstrate few-shot learning using a much smaller language model, GPT-2 (124M parameters), which runs comfortably on Google Colab. The principle is the same as in GPT-3, just at a smaller scale.
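The prompts below all follow one layout: an instruction line, a few labeled demonstrations, and the new review ending in "Sentiment:" so that the model's next tokens are the answer. As a minimal sketch (build_few_shot_prompt is a hypothetical helper, not part of transformers), that layout can be generated from a list of (review, label) pairs; passing an empty list gives a zero-shot prompt in the same layout.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Hypothetical helper: examples is a list of (review, label) pairs; [] gives zero-shot."""
    parts = [instruction, ""]
    for review, label in examples:
        parts.append(f'Review: "{review}"')
        parts.append(f"Sentiment: {label}")
        parts.append("")  # blank line between demonstrations
    # End with "Sentiment:" so the model's continuation is the label
    parts.append(f'Review: "{query}"')
    parts.append("Sentiment:")
    return "\n".join(parts)


examples = [
    ("The movie was excellent!", "positive"),
    ("Terrible customer service.", "negative"),
    ("The product works, nothing special.", "neutral"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive, negative, or neutral.",
    examples,
    "I absolutely loved it!",
)
print(prompt)
```

With these arguments the helper reproduces the few_shot_prompt string from Example 1, which makes it easy to vary the number or order of demonstrations.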

Code Example 1: Few-Shot Sentiment with GPT-2

# Install transformers library (run once)
# !pip install transformers torch

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()  # Set to evaluation mode (no training)

# Define the few-shot prompt for sentiment classification
few_shot_prompt = """Classify the sentiment of each review as positive, negative, or neutral.

Review: "The movie was excellent!"
Sentiment: positive

Review: "Terrible customer service."
Sentiment: negative

Review: "The product works, nothing special."
Sentiment: neutral

Review: "I absolutely loved it!"
Sentiment:"""

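# Tokenize the prompt into a PyTorch tensor of token IDs
input_ids = tokenizer.encode(few_shot_prompt, return_tensors='pt')

# Generate a few tokens; the first one should be the predicted label
with torch.no_grad():
    output = model.generate(
        input_ids,
        attention_mask=torch.ones_like(input_ids),  # Single unpadded prompt: attend to every token
        max_length=input_ids.shape[1] + 5,  # Prompt length + 5 new tokens
        do_sample=True,  # Sample from the distribution instead of greedy decoding
        temperature=0.7,  # Below 1 sharpens the distribution (less random); above 1 flattens it
        top_p=0.9,  # Nucleus sampling: keep the smallest token set covering 90% of the probability
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to avoid a warning
    )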

# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated output:")
print(generated_text)

Output (example, trimmed after the label; because do_sample=True, the continuation varies from run to run):

Generated output:
Classify the sentiment of each review as positive, negative, or neutral.

Review: "The movie was excellent!"
Sentiment: positive

Review: "Terrible customer service."
Sentiment: negative

Review: "The product works, nothing special."
Sentiment: neutral

Review: "I absolutely loved it!"
Sentiment: positive

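In this run the model continued with "positive", which is the correct label. A model as small as GPT-2 will not always get it right, so some runs may produce a different or malformed answer.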
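Because generation continues past the label, it helps to decode only the new tokens and map the first line of that continuation to a label. A minimal sketch, assuming the continuation was obtained with tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True); extract_label is a hypothetical helper, and the sample continuation is made up for illustration:

```python
import re

LABELS = ("positive", "negative", "neutral")


def extract_label(continuation, labels=LABELS):
    """Hypothetical helper: map the model's raw continuation to a label, or None."""
    lines = continuation.strip().splitlines()
    if not lines:
        return None  # the model produced only whitespace
    # Only the first line is the answer; later lines are the model
    # inventing further "Review:/Sentiment:" pairs
    for word in re.findall(r"[a-z]+", lines[0].lower()):
        if word in labels:
            return word
    return None


print(extract_label(' positive\n\nReview: "Meh"'))  # prints: positive
```

Returning None for off-format answers, rather than guessing, keeps malformed outputs visible when you count how often the model gets the label right.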

Code Example 2: Zero-Shot vs Few-Shot Comparison

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Zero-shot prompt: no examples, just the task description
zero_shot_prompt = """Classify the sentiment: "Great product, highly recommend!"
Sentiment:"""

# Few-shot prompt: 3 examples + task
few_shot_prompt = """Classify sentiment as positive, negative, or neutral.

Review: "Excellent quality."
Sentiment: positive

Review: "Poor quality."
Sentiment: negative

Review: "It's okay."
Sentiment: neutral

Review: "Great product, highly recommend!"
Sentiment:"""

# Generate a continuation with greedy decoding; return prompt + continuation as text
def predict_sentiment(prompt, model, tokenizer, max_tokens=10):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    with torch.no_grad():
        output = model.generate(
            input_ids,
            attention_mask=torch.ones_like(input_ids),
            max_length=input_ids.shape[1] + max_tokens,
            do_sample=False,  # Greedy decoding: deterministic, so the comparison is repeatable
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Compare outputs
print("ZERO-SHOT OUTPUT:")
print(predict_sentiment(zero_shot_prompt, model, tokenizer))
print("\n" + "="*50 + "\n")
print("FEW-SHOT OUTPUT:")
print(predict_sentiment(few_shot_prompt, model, tokenizer))

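Illustrative output (only the start of each continuation is shown; generation actually runs for up to 10 new tokens, and GPT-2's zero-shot answer in particular may not be a clean label):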

ZERO-SHOT OUTPUT:
Classify the sentiment: "Great product, highly recommend!"
Sentiment: positive

==================================================

FEW-SHOT OUTPUT:
Classify sentiment as positive, negative, or neutral.

Review: "Excellent quality."
Sentiment: positive

Review: "Poor quality."
Sentiment: negative

Review: "It's okay."
Sentiment: neutral

Review: "Great product, highly recommend!"
Sentiment: positive

Few-shot prompting often produces more consistent and accurate results because the examples show the model both the allowed labels and the expected answer format.
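Another way to sidestep formatting problems is to skip free-form generation and compare how strongly the model scores each candidate label after "Sentiment:". The sketch below assumes you already have one log-probability per label; with transformers one could, for instance, compute torch.log_softmax(model(input_ids).logits[0, -1], dim=-1) and index it at the token IDs of " positive", " negative", and " neutral", assuming each label with its leading space is a single GPT-2 token (worth checking with tokenizer.encode). The scores below are hypothetical, not real GPT-2 outputs.

```python
import math


def classify_by_label_scores(label_logprobs):
    """Pick the highest-scoring label and renormalize over the label set.

    label_logprobs maps each candidate label to the model's log-probability
    of that label following "Sentiment:" (obtaining these is assumed here).
    """
    # Softmax restricted to the candidate labels (subtract max for stability)
    max_lp = max(label_logprobs.values())
    exps = {lab: math.exp(lp - max_lp) for lab, lp in label_logprobs.items()}
    total = sum(exps.values())
    probs = {lab: v / total for lab, v in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical log-probabilities, not real GPT-2 outputs
scores = {"positive": -1.2, "negative": -3.5, "neutral": -2.8}
label, probs = classify_by_label_scores(scores)
print(label, {k: round(v, 3) for k, v in probs.items()})
```

Because the answer is restricted to the three labels, this approach can never return an off-format string, at the cost of one forward pass per prompt instead of a generate call.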

Code Explanation

Line-by-line breakdown of Example 1:

from transformers import GPT2LMHeadModel, GPT2Tokenizer  
# Import the model and tokenizer classes from Hugging Face

import torch
# PyTorch library for tensor operations

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# Load the pre-trained GPT-2 tokenizer (converts text ↔ tokens)

model = GPT2LMHeadModel.from_pretrained(model_name)
# Load the pre-trained GPT-2 model (124M parameters, much smaller than GPT-3)

model.eval()
# Set evaluation mode: disables dropout (GPT-2 uses layer norm, not batch norm)

few_shot_prompt = """..."""
# Define the few-shot prompt with examples and a test task

input_ids = tokenizer.encode(few_shot_prompt, return_tensors='pt')
# Convert text → token IDs; return_tensors='pt' gives PyTorch tensor

with torch.no_grad():
    # Disable gradient computation (we're not training, just inferring)
    
    output = model.generate(...)
    # Generate tokens auto-regressively:
    # 1. Compute the probability distribution over the next token
    # 2. Sample from it (do_sample=True) or pick the highest-probability token (do_sample=False)
    # 3. Append it to the sequence
    # 4. Repeat until max_length is reached or an end-of-sequence token is generated

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
# Convert token IDs back to text
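The four-step loop described for model.generate(...) can be sketched in plain Python. TOY_MODEL is a made-up lookup table standing in for GPT-2 (a real model conditions on the entire sequence, not just the last token), and "<eos>" plays the role of the end-of-sequence token:

```python
import random

# Made-up stand-in for a language model: last token -> next-token distribution
TOY_MODEL = {
    "Sentiment:": {"positive": 0.7, "negative": 0.2, "neutral": 0.1},
    "positive": {"\n": 0.9, "!": 0.1},
    "negative": {"\n": 1.0},
    "neutral": {"\n": 1.0},
    "!": {"\n": 1.0},
    "\n": {"<eos>": 1.0},
}


def generate(tokens, max_length, do_sample=False, rng=None, eos="<eos>"):
    """Toy auto-regressive decoding; max_length counts the prompt, like transformers."""
    tokens = list(tokens)
    if rng is None:
        rng = random.Random(0)
    while len(tokens) < max_length:
        probs = TOY_MODEL[tokens[-1]]             # 1. next-token distribution
        if do_sample:                             # 2. sample in proportion to probability...
            choices, weights = zip(*probs.items())
            nxt = rng.choices(choices, weights=weights)[0]
        else:                                     #    ...or take the most likely token
            nxt = max(probs, key=probs.get)
        tokens.append(nxt)                        # 3. append to the sequence
        if nxt == eos:                            # 4. stop at end-of-sequence
            break
    return tokens


print(generate(["Sentiment:"], max_length=10))
print(generate(["Sentiment:"], max_length=10, do_sample=True))
```

Greedy decoding always follows the most likely path, while sampling can wander to "negative" or "neutral" in proportion to their probabilities.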

Key Parameters

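Temperature: Controls randomness by dividing the logits by the temperature before the softmax. It only takes effect when do_sample=True.

  • temperature=0.1 → Nearly greedy: the highest-probability token is picked almost every time
  • temperature=0.7 → Balanced (some randomness, mostly high-probability tokens)
  • temperature=2.0 → Very random (the distribution is flattened toward uniform)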
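The effect is easy to see numerically. A minimal sketch of temperature scaling (divide the logits by T, then apply softmax) on hypothetical logits for three candidate tokens:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.1, 0.7, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At T=0.1 nearly all of the probability lands on the first token, which is why low temperatures behave almost like greedy decoding; at T=2.0 the three tokens become much closer in probability.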

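top_p (nucleus sampling): Sample only from the smallest set of tokens whose cumulative probability reaches top_p. Like temperature, it applies only when do_sample=True.

  • top_p=0.9 → Keep the highest-probability tokens until they cover 90% of the probability mass; the long tail is discarded
  • top_p=1.0 → Consider all tokens (no filtering)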
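Nucleus filtering can be sketched as: sort tokens by probability, keep them until the running total reaches top_p, then renormalize. This illustrates the idea rather than reproducing the transformers implementation, and the distribution is hypothetical:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability
    reaches top_p, then renormalize. probs maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:  # this token pushed us over the threshold
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token distribution
probs = {"positive": 0.5, "great": 0.3, "good": 0.15, "banana": 0.05}
print(top_p_filter(probs, 0.9))
```

Here the first two tokens cover only 80% of the mass, so "good" is also kept, and the unlikely "banana" is removed before sampling.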

do_sample: If True, sample from the distribution; if False, use greedy decoding (pick the highest-probability token).

Running on Google Colab

  1. Open Google Colab
  2. Create a new notebook
  3. Paste Example 1 or 2 code into a cell
  4. Run the cell. transformers and torch are usually preinstalled on Colab; if the import fails, uncomment the !pip install line and run it first
  5. Observe the output
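Since Colab images usually ship with both libraries, a quick check before running pip avoids a needless reinstall. A minimal sketch using only the standard library (missing_packages is a hypothetical helper):

```python
import importlib.util


def missing_packages(names=("transformers", "torch")):
    """Hypothetical helper: return the names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Run first: !pip install " + " ".join(missing))
else:
    print("transformers and torch are already installed")
```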

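The code runs on Colab’s free tier. As written, the examples run on the CPU, which is fast enough for a model of GPT-2’s size; to use a free GPU runtime (typically a Tesla T4), move the model and input tensors to "cuda" first.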

Why This Demonstrates GPT-3’s Principle

  • We’re using GPT-2 (124M parameters), not GPT-3 (175B), because GPT-3’s weights were never publicly released; it was available only through OpenAI’s paid API.
  • The principle is the same: Examples in the prompt guide the model without fine-tuning.
  • Few-shot typically works better than zero-shot because the model picks up the label set and answer format from the examples in context.
  • No fine-tuning: We don’t update any weights. All learning is in-context.
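To check the few-shot vs zero-shot claim on your own data, score each prompting strategy on a small labeled set. A minimal sketch: accuracy works with any function that maps a review to a label. keyword_baseline is a placeholder, not a language model; in practice you would pass a function that builds a zero-shot or few-shot prompt, runs GPT-2, and extracts the label. The three reviews are made up.

```python
def accuracy(predict, labeled_reviews):
    """Fraction of (review, gold_label) pairs where predict(review) == gold_label."""
    if not labeled_reviews:
        raise ValueError("need at least one labeled review")
    correct = sum(predict(review) == gold for review, gold in labeled_reviews)
    return correct / len(labeled_reviews)


def keyword_baseline(review):
    # Placeholder predictor, not a language model
    return "negative" if "poor" in review.lower() else "positive"


test_set = [
    ("Great value!", "positive"),
    ("Poor build quality.", "negative"),
    ("Loved it.", "positive"),
]
print("keyword baseline:", accuracy(keyword_baseline, test_set))
```

Running both prompting strategies through the same function on the same labeled reviews gives a like-for-like comparison; with greedy decoding the numbers are also repeatable.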

With GPT-3 (175B parameters), few-shot accuracy would typically be higher, and the model can take on more complex tasks such as arithmetic, code, and reasoning.


Key Takeaways from This Section

  • The Hugging Face transformers library provides pre-trained models like GPT-2.
  • Few-shot prompting is as simple as writing examples in text.
  • Temperature controls output randomness when sampling is enabled (do_sample=True).
  • No fine-tuning required: Just prompt and generate.
  • Runs on free Google Colab; GPT-2 is small enough that even the CPU runtime works.
