
Conversation

omar-El-Baz

Hi there!
I noticed a small but important error in the "Models" section of Chapter 2 of the course. This PR corrects it.
The Problem:
In the subsection "Why is all of this necessary?", the text demonstrates tokenizing two sentences that result in lists of different lengths. However, the following paragraph incorrectly states:
This "array" is already of rectangular shape, so converting it to a tensor is easy:

import torch
model_inputs = torch.tensor(encoded_sequences)

This is a contradiction, and the code snippet would fail, which could be confusing for learners.
The Solution:
I've updated the text to correctly identify that the lists are of different lengths and therefore require padding to be converted into a rectangular tensor. I also removed the erroneous code snippet.
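For illustration, here is a minimal pure-Python sketch of the padding step the corrected text describes. The token IDs and the pad ID below are made up for the example, not produced by a real tokenizer (actual tokenizers expose the correct value as tokenizer.pad_token_id):

```python
# Two "tokenized sentences" of different lengths (illustrative IDs only).
encoded_sequences = [
    [101, 7592, 999, 102],
    [101, 4658, 1012, 102, 2023],
]

# torch.tensor(encoded_sequences) would raise a ValueError here,
# because the inner lists do not all have the same length.

# Pad every sequence to the length of the longest one.
pad_token_id = 0  # assumed padding ID for this sketch
max_len = max(len(seq) for seq in encoded_sequences)
padded = [seq + [pad_token_id] * (max_len - len(seq)) for seq in encoded_sequences]

# Every row now has length max_len, so the nested list is rectangular
# and torch.tensor(padded) would succeed.
```

In practice the course's tokenizers handle this automatically (e.g. via a padding option), but the sketch shows why rectangularity is a precondition for tensor conversion.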
This change makes the explanation clearer and more accurate, and reinforces the importance of the padding concept discussed just before this section.
Thanks for maintaining this wonderful course!

The section "Why is all of this necessary?" contained a logical
contradiction. It showed two tokenized sequences of different lengths
but then claimed the resulting array was "already of rectangular shape."

This change corrects the text to accurately state that the sequences
are of different lengths and cannot be directly converted to a tensor
without padding. It also removes a misleading code snippet that would
raise an error if executed.

This improves the clarity and correctness of the course material.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
