Context: The Era of Proprietary Giants and Scaling Uncertainties — LLaMA: Open and Efficient Foundation Language Models

By late 2022, the landscape of large language models was dominated by a few proprietary models:

GPT-3 (175 billion parameters, OpenAI, 2020)
GPT-3.5 (parameter count not disclosed, OpenAI, 2022)
PaLM (540 billion parameters, Google, 2022)
Chinchilla (70 billion parameters, DeepMind, 2022)

These models set the state of the art on benchmarks. But they had a critical limitation: they were closed. You could not access the model weights. You could only call an API (where one existed) or wait for papers to be published. This meant:

Research bottleneck: Researchers outside OpenAI, Google, and DeepMind could not easily experiment with frontier models.
Expensive: Training your own model at this scale required massive compute resources (thousands of GPUs, millions of dollars).
Accessibility: Only well-funded institutions could push forward research on frontier LLMs. Students, startups, and researchers in small-town India had no way to work with models at the frontier.

The Scaling Laws Insight (Paper 13 Context)

In early 2022 (March, on arXiv), DeepMind published the Chinchilla paper ("Training Compute-Optimal Large Language Models"), which found something surprising:

The prevailing wisdom had been: “Bigger models are better. Train the largest model you can afford.”

Chinchilla showed: “No. Train a smaller model on MORE data.”

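Specifically, Chinchilla found that for a fixed compute budget C (with training compute approximated as C ≈ 6ND FLOPs), model size and data should be scaled up in roughly equal proportion:

Model size N: proportional to roughly C^0.5
Data tokens D: also proportional to roughly C^0.5, which works out to about 20 tokens per parameter (D ≈ 20N)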
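A quick way to see where the exponent comes from: with the usual approximation C ≈ 6ND and the 20-tokens-per-parameter rule, C ≈ 120N², so N ≈ √(C/120). The sketch below is a back-of-the-envelope helper under those two assumptions, not the paper's fitted scaling law; the function name `compute_optimal_allocation` is hypothetical.

```python
import math

# Back-of-the-envelope assumptions (not the paper's fitted law):
#   training compute C ≈ 6 * N * D FLOPs, and compute-optimal data D ≈ 20 * N
FLOPS_PER_PARAM_PER_TOKEN = 6
TOKENS_PER_PARAM = 20


def compute_optimal_allocation(compute_flops):
    """Split a FLOP budget into (parameters N, tokens D) with D = 20 * N."""
    # C = 6 * N * (20 * N) = 120 * N**2, so N = sqrt(C / 120), i.e. N grows as C**0.5
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 6 * 70e9 * 1.4e12  # Chinchilla's own budget under the 6ND estimate
    n, d = compute_optimal_allocation(budget)
    print(f"N ≈ {n / 1e9:.0f}B parameters, D ≈ {d / 1e12:.2f}T tokens")
    n4, _ = compute_optimal_allocation(4 * budget)
    print(f"4x the compute -> {n4 / n:.1f}x the parameters")
```

Because N grows as the square root of C, quadrupling compute only doubles the optimal model size; the other factor of two goes into more training tokens.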

This meant that GPT-3’s allocation was far from compute-optimal. GPT-3 trained a 175B model on ~300B tokens, fewer than 2 tokens per parameter. Chinchilla trained a 70B model on 1.4 trillion tokens, using about the same compute as DeepMind’s 280B Gopher (and roughly twice GPT-3’s), and it outperformed both.
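The compute comparison can be checked with the same C ≈ 6ND approximation. The sketch below adds DeepMind’s Gopher (280B parameters, 300B tokens), the model Chinchilla was compute-matched against. All figures are nominal, and 6ND is an estimate rather than a measured FLOP count; `train_flops` is a hypothetical helper.

```python
# Training compute via the common approximation C ≈ 6 * N * D FLOPs,
# where N = parameters and D = training tokens (nominal published figures).
RUNS = {
    "GPT-3": (175e9, 300e9),
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}


def train_flops(n_params, n_tokens):
    """Estimated training FLOPs: about 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    for name, (n, d) in RUNS.items():
        print(f"{name:>10}: {train_flops(n, d):.2e} FLOPs, {d / n:5.1f} tokens/param")

    ratio = train_flops(*RUNS["Chinchilla"]) / train_flops(*RUNS["GPT-3"])
    print(f"Chinchilla used about {ratio:.1f}x GPT-3's training compute")
```

Under this estimate GPT-3 lands near 3.2e23 FLOPs and Chinchilla near 5.9e23, so Chinchilla’s budget matches Gopher’s far more closely than GPT-3’s; the striking difference is in tokens per parameter.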

The Open-Source Gap

Despite these insights, there was a gap:

Chinchilla weights were not released. DeepMind published the paper and findings, but the model itself remained proprietary.
GPT-3 weights were not released. OpenAI operated GPT-3 as a closed API service.
Open models (like BLOOM and Meta’s own OPT) existed, but even at GPT-3 scale they generally lagged the best proprietary models in capability.

This created a two-tier research ecosystem:

Tier 1 (Proprietary): Companies with massive compute (OpenAI, Google, DeepMind, Meta) trained huge models. These were the frontier.
Tier 2 (Academic/Open): Researchers used open models like BLOOM (176B), which were good but not as good as GPT-3 or PaLM.

Meta’s Bet

Meta AI (whose research lab, FAIR, began as Facebook AI Research) had been training large models and releasing some of them, such as OPT-175B in 2022, only under restricted research access. By early 2023, the company saw an opportunity:

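Push past Chinchilla scaling: Train models at 7B, 13B, 33B, and 65B parameters on publicly available data, using 1.0 trillion tokens for the 7B and 13B models and 1.4 trillion for the 33B and 65B models. The smaller models saw far more tokens per parameter than compute-optimal, because a smaller model trained longer is cheaper to run at inference time.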
Release the weights: Unlike DeepMind and OpenAI, publish the model weights (with research-only licensing) so the community could study, fine-tune, and build on them.
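Beat GPT-3: Show that a much smaller, openly released model could match or exceed GPT-3 in capability. (In the paper, LLaMA-13B outperforms GPT-3 on most benchmarks, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B.)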
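To see how far LLaMA went beyond the ~20 tokens-per-parameter heuristic, here is a small sketch using nominal model sizes and the token counts reported in the LLaMA paper (1.0T tokens for 7B and 13B, 1.4T for 33B and 65B). The helper names are hypothetical, and the 20x figure is the Chinchilla rule of thumb, not an exact optimum.

```python
# Chinchilla rule of thumb: about 20 training tokens per parameter is compute-optimal.
CHINCHILLA_TOKENS_PER_PARAM = 20

# (nominal parameter count, training tokens) as reported for LLaMA
LLAMA_RUNS = {
    "7B": (7e9, 1.0e12),
    "13B": (13e9, 1.0e12),
    "33B": (33e9, 1.4e12),
    "65B": (65e9, 1.4e12),
}


def tokens_per_param(n_params, n_tokens):
    return n_tokens / n_params


def beyond_chinchilla(n_params, n_tokens):
    """How many times more tokens than the ~20 tokens/param heuristic suggests."""
    return tokens_per_param(n_params, n_tokens) / CHINCHILLA_TOKENS_PER_PARAM


if __name__ == "__main__":
    for name, (n, d) in LLAMA_RUNS.items():
        print(f"{name:>4}: {tokens_per_param(n, d):6.1f} tokens/param, "
              f"{beyond_chinchilla(n, d):.1f}x the heuristic")
```

The 65B model sits close to the heuristic, while the 7B model was trained on roughly seven times as many tokens as the rule suggests: extra training compute spent once, in exchange for a model that is cheap to serve.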

This would democratize frontier-level LLMs, shifting power from closed labs to the broader research community.

The Architectural Context

At the time, the standard transformer architecture (from Paper 08) was well-established. But there were known inefficiencies:

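LayerNorm was the standard way to normalize activations. It re-centers (subtracts the mean) and re-scales each vector; RMSNorm, which only re-scales by the root mean square, is simpler and cheaper and works about as well.
ReLU activation in feedforward layers was used in the original Transformer (GPT-2 and GPT-3 used GELU), but gated variants built on Swish, notably SwiGLU as used in PaLM, were showing better results.
Absolute positional embeddings (learned position embeddings) worked but had issues with generalization to longer sequences; rotary positional embeddings (RoPE) instead encode relative position by rotating the query and key vectors.
Post-normalization: The original Transformer applied LayerNorm after each attention/FFN sublayer (post-norm), which is harder to train stably at depth. GPT-2 and GPT-3 had already moved to pre-normalization, which normalizes each sublayer’s input instead.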
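The normalization points above can be sketched in plain Python on a single vector (no batching). LLaMA uses RMSNorm (Zhang and Sennrich, 2019) in a pre-norm arrangement; the sketch contrasts it with LayerNorm and shows where the norm sits in post-norm versus pre-norm residual blocks. The `sublayer` argument is a hypothetical stand-in for attention or the feedforward network, and the eps defaults are illustrative.

```python
import math


def layer_norm(x, gain, bias, eps=1e-5):
    """LayerNorm over one vector: subtract the mean, divide by the std, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    denom = math.sqrt(var + eps)
    return [g * (v - mean) / denom + b for v, g, b in zip(x, gain, bias)]


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm over one vector: divide by the root mean square only (no mean, no bias)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def post_norm_block(x, sublayer, norm):
    # Original Transformer ordering: add the residual first, then normalize the sum.
    return norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x, sublayer, norm):
    # Pre-norm ordering: normalize the sublayer's input; the residual path is left untouched.
    return [a + b for a, b in zip(x, sublayer(norm(x)))]


if __name__ == "__main__":
    x = [1.0, -1.0, 2.0, -2.0]
    ones = [1.0] * len(x)
    norm = lambda v: rms_norm(v, ones)
    double = lambda v: [2 * a for a in v]  # toy stand-in for attention or the FFN
    print("pre-norm: ", pre_norm_block(x, double, norm))
    print("post-norm:", post_norm_block(x, double, norm))
```

For a zero-mean input with unit gain and zero bias, the two norms coincide; RMSNorm simply skips the mean subtraction and the bias. In the pre-norm block the input passes through the residual connection unnormalized, which is the property usually credited with more stable training of deep stacks.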

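Meta combined these insights into a single set of architectural changes: pre-normalization with RMSNorm, SwiGLU activations, and rotary positional embeddings. None was new individually (GPT-3, PaLM, and GPTNeo had used them respectively), but together they made training more stable and efficient.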
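The activation and positional components can be sketched in plain Python on single vectors. The SwiGLU feedforward follows the form W2(SiLU(W1 x) ⊙ W3 x), as in Meta’s reference code; the RoPE function rotates adjacent pairs of dimensions by position-dependent angles with base 10000. Implementations differ in how they pair dimensions and in the hidden size, and the matrices here are toy inputs, so treat this as an illustration rather than LLaMA’s code; `matvec`, `swiglu_ffn`, and `rope` are hypothetical names.

```python
import math


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(z):
    # SiLU (Swish with beta = 1): z * sigmoid(z), written to avoid exp overflow
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu_ffn(x, w1, w3, w2):
    """SwiGLU feedforward: W2 @ (silu(W1 @ x) * (W3 @ x)), elementwise gate."""
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])


def rope(x, position, base=10000.0):
    """Rotary embedding: rotate each pair (x[2k], x[2k+1]) by position * base**(-2k/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


if __name__ == "__main__":
    q, k = [0.3, -1.2, 0.7, 0.5], [1.1, 0.4, -0.6, 0.9]
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    # Same offset (m - n = 2) at different absolute positions gives the same score
    print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

The usage lines show why RoPE counts as a relative scheme: the dot product between a query rotated to position m and a key rotated to position n depends only on m - n, and the rotation leaves vector lengths unchanged.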

The Timing

The timing was crucial:

February 2023: LLaMA paper released on arXiv
March 2023: LLaMA weights leaked online (despite the research-only license)
March 2023: Alpaca released (Stanford’s fine-tune of LLaMA 7B on instruction-following data)
March to May 2023: Vicuna, WizardLM, and dozens of other fine-tunes appear
July 2023: LLaMA 2 released under a license permitting commercial use
September 2023: Mistral 7B released
April 2024: LLaMA 3 released, followed by many other open models

The release (and leak) of LLaMA weights triggered an explosion of open-source LLM research that continues today.

Why This Mattered

Before LLaMA: If you wanted to work with a frontier LLM, you could:

Pay for access to GPT-3 through OpenAI’s API
Wait for papers and try to reproduce (often impossible without massive compute)
Train your own from scratch (too expensive for most)

After LLaMA: You could:

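Download the weights (through Meta’s research-access request for LLaMA, or from Hugging Face for LLaMA 2 onward)
Run inference on a single GPU (a consumer card for the 7B and 13B models, especially when quantized)
Fine-tune for your own tasks on a single GPU or modest server, using parameter-efficient methods such as LoRA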
Understand the code, make modifications, and experiment
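The single-GPU point comes down to memory arithmetic. A rough sketch, counting only the weights (activations and the attention KV cache add more), using nominal parameter counts and the standard byte width of each number format; `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each number format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for size, n in [("7B", 7e9), ("13B", 13e9), ("65B", 65e9)]:
        row = ", ".join(f"{dt}: {weight_memory_gb(n, dt):.1f} GB" for dt in BYTES_PER_PARAM)
        print(f"{size:>4} -> {row}")
```

At 16-bit precision the 7B model needs about 14 GB for its weights, which fits on a 24 GB consumer card such as an RTX 3090 or 4090; 4-bit quantization brings it to about 3.5 GB, while the 65B model stays out of single-consumer-GPU range even at 16 bits.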

This shift from proprietary to open was revolutionary for the field.