BooksCorpus
The training dataset for GPT-1: approximately 7,000 unpublished books scraped from the web, totalling roughly 800 million words. It was chosen because its long stretches of contiguous text let the model learn to condition on long-range context.