From f39d609b741128f5a55ba68eeddffe8c9f7e83d2 Mon Sep 17 00:00:00 2001
From: Simeon Emanuilov
Date: Tue, 9 Sep 2025 23:55:59 +0300
Subject: [PATCH] fix: move misplaced paragraph to correct position

The "Broadly, they can be grouped into three categories" paragraph was
incorrectly placed in the middle of the chronological model timeline and
has been moved to its proper location at the end of the list, just before
the three architecture categories are defined.
---
 chapters/en/chapter1/4.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chapters/en/chapter1/4.mdx b/chapters/en/chapter1/4.mdx
index 3870b541f..22a06a5c1 100644
--- a/chapters/en/chapter1/4.mdx
+++ b/chapters/en/chapter1/4.mdx
@@ -28,14 +28,13 @@ The [Transformer architecture](https://arxiv.org/abs/1706.03762) was introduced
 - **October 2018**: [BERT](https://arxiv.org/abs/1810.04805), another large pretrained model, this one designed to produce better summaries of sentences (more on this in the next chapter!)
 
-- **February 2019**: [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), an improved (and bigger) version of GPT that was not immediately publicly released due to ethical concerns
+- **February 2019**: [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), an improved (and bigger) version of GPT that was not immediately publicly released due to ethical concerns.
 
 - **October 2019**: [T5](https://huggingface.co/papers/1910.10683), A multi-task focused implementation of the sequence-to-sequence Transformer architecture.
 
 - **May 2020**, [GPT-3](https://huggingface.co/papers/2005.14165), an even bigger version of GPT-2 that is able to perform well on a variety of tasks without the need for fine-tuning (called _zero-shot learning_)
 
-- **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better
-This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
+- **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better.
 
 - **January 2023**: [Llama](https://huggingface.co/papers/2302.13971), a large language model that is able to generate text in a variety of languages.
 
@@ -45,6 +44,7 @@ This list is far from comprehensive, and is just meant to highlight a few of the
 - **November 2024**: [SmolLM2](https://huggingface.co/papers/2502.02737), a state-of-the-art small language model (135 million to 1.7 billion parameters) that achieves impressive performance despite its compact size, and unlocking new possibilities for mobile and edge devices.
 
+This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
 - GPT-like (also called _auto-regressive_ Transformer models)
 - BERT-like (also called _auto-encoding_ Transformer models)
 - T5-like (also called _sequence-to-sequence_ Transformer models)