tk_v0.1 #2462
base: develop
Conversation
Thanks for your contribution!
@@ -530,6 +529,7 @@ def get_static_model_on_pdc(remote_path, local_path, timeout, enable_flash_devic
    Returns:
        str: path to load static model
    """
Noting a TODO: the `get_static_model_on_pdc` function can be removed later.
@@ -568,6 +566,8 @@ def __init__(self, **kwargs):
        self.use_filtered_label_loss = kwargs.pop("use_filtered_label_loss", False)
        self.loss_subbatch_seqlen = kwargs.pop("loss_subbatch_seqlen", -1)

from ..quantization.quantization_config import QuantizationConfig
https://github.com/PaddlePaddle/PaddleFormers/blob/develop/paddleformers/quantization/quantization_config.py#L116 In QuantizationConfig, this paddle dependency is only used for a check. It would be better to `try: from paddle.nn.quant.quantized_linear import _get_arch_info` and, if the import fails, simply skip the GPU-version check.
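A minimal sketch of the guarded-import pattern suggested here, assuming `_get_arch_info` lives at the path named in the comment; the helper name `_check_gpu_arch` and its fallback behavior are illustrative assumptions, not the actual QuantizationConfig code:

```python
# Illustrative guard, not the actual QuantizationConfig source.
try:
    from paddle.nn.quant.quantized_linear import _get_arch_info
except ImportError:
    _get_arch_info = None  # Paddle absent: skip the GPU-arch check below


def _check_gpu_arch(supported_archs):
    """Validate the GPU architecture only when Paddle is importable."""
    if _get_arch_info is None:
        return  # no Paddle, so no GPU-version check is performed
    if _get_arch_info() not in supported_archs:
        raise RuntimeError("Quantization is not supported on this GPU architecture.")
```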
Adapted from transformers.AutoTokenizer.from_pretrained with modifications:
1. Added get_paddleformers_tokenizer_config() to extend tokenizer_config.json download source
2. Explicitly binds PaddleTokenizerMixin to the tokenizer class before final instantiation
绑定 PaddleTokenizerMixin，如果 Paddle 可用则绑定，否则返回原类
Don't write comments in Chinese.
@@ -205,7 +245,9 @@ def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs):

    if tokenizer_class is None:
        raise ValueError(f"Tokenizer class {tokenizer_class_name} is not currently imported.")
    tokenizer_class = type(tokenizer_class.__name__, (PaddleTokenizerMixin, tokenizer_class), {})

    # 绑定 PaddleTokenizerMixin
Same as above.
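For reference, a self-contained sketch of the `type()`-based mixin binding used in the diff above; the toy classes here stand in for the real PaddleTokenizerMixin and the tokenizer class resolved by `from_pretrained`:

```python
class PaddleTokenizerMixin:
    # Stand-in for the real mixin: overrides behavior of the base class.
    def save_pretrained(self, path):
        return f"mixin saved to {path}"


class BaseTokenizer:
    # Stand-in for the tokenizer class resolved at runtime.
    def save_pretrained(self, path):
        return f"base saved to {path}"


# Same pattern as the diff:
#   type(tokenizer_class.__name__, (PaddleTokenizerMixin, tokenizer_class), {})
# The mixin comes first in the MRO, so its methods take precedence.
Wrapped = type(BaseTokenizer.__name__, (PaddleTokenizerMixin, BaseTokenizer), {})

assert Wrapped.__name__ == "BaseTokenizer"  # original class name is preserved
assert Wrapped().save_pretrained("out/") == "mixin saved to out/"
```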
@@ -14,6 +14,9 @@
# limitations under the License.
import transformers as hf

from ..tokenizer_utils import warp_tokenizer
try:
Our goal is to remove the redundant Paddle dependencies inside PaddleTokenizerMixin itself, not to stop the tokenizers in paddleformers from using PaddleTokenizerMixin.
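A hedged sketch of that direction: PaddleTokenizerMixin stays in the class hierarchy, and only its Paddle-specific code paths are guarded internally. The method name and fallback below are assumptions for illustration, not the mixin's actual API:

```python
try:
    import paddle  # optional at import time
except ImportError:
    paddle = None


class PaddleTokenizerMixin:
    # Hypothetical method: convert token ids to tensors only if Paddle exists.
    def ids_to_tensors(self, ids):
        if paddle is None:
            return ids  # degrade gracefully to plain Python lists
        return paddle.to_tensor(ids)
```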
U: Tokenizer v0.1
D: 45T, 45Lite, Qwen/Qwen2.5-7B-Instruct-1M, Qwen/Qwen3-32B