Dense computation

Appears in 1 paper

The standard approach in neural networks: every parameter fires for every input.

As used in Paper 09 — Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer →

The standard approach in neural networks: every parameter fires for every input. A dense FFN with 1 billion parameters uses all 1 billion for every token. Contrasted with sparse computation, where only a subset of parameters activates per input.

Paper 09 — Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer →

Appears in papers