Section 08

Impact: What Changed After Gemini

Gemini: A Family of Highly Capable Multimodal Models 2023

Impact: What Changed After Gemini

Immediate Industry Response

When Gemini was announced in December 2023, the AI landscape shifted:

OpenAI’s Counter: GPT-4 Turbo

Days after Gemini’s announcement, OpenAI released GPT-4 Turbo with:

  • 128K context (vs Gemini’s 32K) — 4x longer
  • Better performance on some benchmarks
  • Lower price point

Message: “We’re not being overtaken.”

Microsoft’s AI Strategy

Microsoft, which invested $10B+ in OpenAI, accelerated:

  • Copilot integration across Office, Windows
  • Azure OpenAI availability
  • Competition with Google’s position in enterprise

Meta’s Response

Meta, having released open-source LLaMA, doubled down on:

  • Code Llama for software engineering
  • Open-source multimodal models
  • Positioning against closed, proprietary models

Google’s Internal Acceleration

Gemini announcement triggered massive organizational changes at Google:

DeepMind-Brain Merger Activation

The July 2023 announcement of Google merging DeepMind and Google Brain became real. Suddenly:

  • The same organization built Transformers, BERT, LaMDA, and PaLM
  • All efforts unified toward Gemini variants
  • Resources flowed to AI (not just DeepSearch and other projects)

Product Integration

Gemini became the backbone for:

  • Google Bard → renamed to Gemini (March 2024)
  • Gmail Smart Compose — powered by Gemini
  • Google Workspace — Docs, Sheets, Slides now have Gemini drafting
  • Google Search — “AI Overviews” powered by Gemini

Organizational Pivot

  • Sundar Pichai (Google CEO) personally overseeing AI strategy
  • CEO bonuses tied to AI competitiveness
  • Hiring surge in AI teams across all major cities

Gemini 1.5: The Real Breakthrough

The initial Gemini (1.0) had limitations (32K context, data contamination questions). But Gemini 1.5 (May 2024) changed everything:

1 Million Token Context

Gemini 1.0:  32K tokens (32,000 words)
Gemini 1.5:  1M tokens (1,000,000 words)

Practical examples:
- 1M tokens ≈ 750 pages of a novel
- 1M tokens ≈ 12 hours of meeting transcripts (converted to text)
- 1M tokens ≈ An entire codebase

This was a 10x jump, the largest context of any frontier model at the time.

How It Works

Gemini 1.5 uses techniques like:

  • RoPE (Rotary Position Embeddings) — better positional encoding for long contexts
  • Efficient attention patterns — not fully dense, uses sparse/sliding-window attention
  • Hardware optimization — custom kernels to fit 1M tokens in GPU memory

This leap didn’t come from novel research — it came from engineering and scale.

Open-Source Spin-Off: Gemma

Google released Gemma — a family of open-source models based on Gemini technology:

Gemma 2B   — Mobile, on-device (like Nano)
Gemma 7B   — Laptop, edge devices
Gemma 13B  — Server-side open-source (like Pro)

Why release open-source?

  1. Build ecosystem: Developers prefer open models (LLaMA lesson)
  2. Research: Academic teams can build on it
  3. PR: “We’re not locking up AI” (unlike OpenAI)
  4. Competitive pressure: Match Meta’s LLaMA success

Gemma became Google’s answer to LLaMA — practical, usable, open.

Benchmark Improvements

Gemini 1.0 claims were questioned, but subsequent releases solidified:

MMLU (Knowledge Benchmark)

GPT-3.5:       70%
LLaMA 70B:     82%
GPT-4:         86.4%
Gemini 1.0:    90.04% (claimed, disputed)
Gemini 1.5:    95.9% (more trusted)

Long-Context Tasks

New benchmarks emerged for long-context understanding:

Gemini 1.5 vs GPT-4 Turbo on 1M-token tasks:
- Needle-in-haystack: Gemini 1.5 ✓, GPT-4 struggles
- Long document QA: Gemini 1.5 significantly better
- Code repository understanding: Gemini 1.5 advantage

Industry Acceptance of “Multimodality as Default”

Post-Gemini, the industry consensus shifted:

Before Gemini: “Multimodal is a research problem, text-only is production.”

After Gemini: “Multimodal is expected, text-only is legacy.”

Consequences

  1. OpenAI accelerated vision capabilities for GPT-4V and GPT-4 Turbo
  2. Anthropic added vision to Claude
  3. Startups began building multimodal-first (not text-first with vision bolted on)
  4. Academic research pivoted toward multimodal reasoning

Market Impact

Google’s Market Position

  • Gemini restored confidence that Google wasn’t “behind” in AI
  • Helped retain talent (people want to work on state-of-the-art)
  • Gave Google negotiating power with enterprise customers (“We have our own frontier model”)

Pricing and Accessibility

GPT-4 (via API):  $0.03 per 1K input tokens (expensive)
Gemini Pro (API): $0.0005 per 1K tokens (100x cheaper!)
Gemini Nano:      Free on Pixel phones

Lower pricing democratized access — students and researchers could afford to use Gemini at scale.

What’s Coming (Implied by Gemini’s Success)

  1. Gemini 2.0 — Rumored for late 2024, potentially with audio/video APIs
  2. On-Device Models — Nano extended to more phones and devices
  3. Specialized Variants — Medical Gemini, Legal Gemini, Scientific Gemini (fine-tuned)
  4. Multimodal Reasoning — Better cross-modal alignment, not just token-level multimodality

The Unspoken Impact: Credibility

Perhaps the biggest impact: Google proved it could compete with OpenAI.

Before Gemini:

  • “Is Google falling behind in AI?” (real concern in 2023)
  • Bard perceived as weaker than ChatGPT
  • OpenAI had narrative momentum

After Gemini:

  • “Google has multiple world-class models” (Gemini, Gemma, PaLM family)
  • Bard rebranded → Gemini
  • Narrative reset to “competition is real”

This mattered for:

  • Investor confidence: Google’s stock price wasn’t hurt by AI fears
  • Talent retention: Top researchers stayed at Google
  • Enterprise sales: Companies could choose Google for AI, not just OpenAI

Next: Summary: Everything You Need to Know