Incorrect LoRA inference after fine-tuning a vision-language model #3000
zhuchen1109 started this conversation in General
Replies: 3 comments 1 reply
-
mlp.0 and mlp.1 — these two shouldn't need TP (tensor parallelism).
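The replicated-vs-sharded distinction matters because of how tensor parallelism combines partial results. A minimal single-process sketch (illustrative only, not lmdeploy's actual implementation) showing that all-reducing a layer whose weight was never sharded multiplies the output by the world size:

```python
import torch

torch.manual_seed(0)
world_size = 2
x = torch.randn(3, 8)   # batch of activations
w = torch.randn(4, 8)   # full (unsharded) weight of a linear layer

# Row-parallel linear: each rank holds a slice of the input dimension,
# and the partial outputs are summed (the all_reduce step).
x_shards = x.chunk(world_size, dim=1)
w_shards = w.chunk(world_size, dim=1)
partials = [xs @ ws.t() for xs, ws in zip(x_shards, w_shards)]
sharded_out = sum(partials)            # equals x @ w.t()

# If the weight is actually replicated (a layer TP was never applied to)
# but the all_reduce still runs, every rank contributes the full result:
replicated_out = sum(x @ w.t() for _ in range(world_size))
# replicated_out == world_size * (x @ w.t())
```

So for layers that are not sharded, `is_tp`/`all_reduce` must stay off; leaving the reduction on inflates the activations by the world size, which is one way out-of-range values can start to cascade downstream.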
-
While debugging, I found that in the vision tower's mlp.fc1 layer, the inference output tensor contains a large number of NaN values. What could cause this? Inference with transformers does not show this problem. The corresponding location in the inference code:
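To pin down exactly which sub-module first emits NaNs, a forward hook over `named_modules` is a cheap diagnostic. A generic PyTorch sketch (the reported names are whatever `named_modules` yields for your model, e.g. something like `visual.blocks.0.mlp.fc1`):

```python
import torch
import torch.nn as nn

def install_nan_hooks(model: nn.Module):
    """Record the names of modules whose forward output contains NaN."""
    offenders = []

    def make_hook(name):
        def hook(mod, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                offenders.append(name)
        return hook

    for name, mod in model.named_modules():
        if name:  # skip the root module itself
            mod.register_forward_hook(make_hook(name))
    return offenders
```

Run one forward pass after installing the hooks; the first name appended is the earliest layer producing NaNs, which tells you whether the problem starts in mlp.fc1 itself or is inherited from an upstream layer.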
-
I went through every layer inheriting from BaseLinear and set both is_tp and all_reduce to False, but the NaN values are still there. What could be causing this?
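Before attributing this to TP, it is worth confirming that the loaded weights themselves are finite — a NaN already present in an adapter or merged tensor would survive any `is_tp`/`all_reduce` setting. A small checker (plain PyTorch, illustrative):

```python
import torch
import torch.nn as nn

def non_finite_params(model: nn.Module):
    """Return names of parameters containing NaN or Inf."""
    return [name for name, p in model.named_parameters()
            if not torch.isfinite(p).all()]
```

If this list is empty, the NaNs are produced during the forward pass (e.g. dtype overflow or a mis-applied reduction); if not, the problem is in the weight-loading path.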
-
I used swift to fine-tune the Qwen2-VL-7B-Instruct model. The fine-tuning "target_modules" were [ "up_proj", "attn.proj", "qkv", "down_proj", "mlp.0", "gate_proj", "k_proj", "o_proj", "fc2", "q_proj", "mlp.2", "v_proj", "fc1" ], which include the vision-part modules attn.proj, mlp.0, and mlp.2.


The first issue I hit: in patch.py's add_adapters method, at mod.lora_adapters[target_name] = lora, target_name must not contain ".". I worked around this by modifying the code logic; with this change the weights are loaded correctly later in load_lora_weights. The modification is shown in the screenshot below:
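The "." restriction comes from PyTorch itself: keys of registered sub-modules (including `nn.ModuleDict` entries) may not contain dots. One common workaround, sketched here with hypothetical names rather than the actual patch.py change, is to walk down to the parent module and keep only the leaf as the key:

```python
import torch.nn as nn

def resolve_target(model: nn.Module, dotted_name: str):
    """Split a dotted path like 'visual.merger.mlp.0' into
    (parent module, leaf key '0')."""
    *parents, leaf = dotted_name.split(".")
    parent = model.get_submodule(".".join(parents)) if parents else model
    return parent, leaf
```

The adapter can then be registered on the parent under the leaf key (or under `dotted_name.replace(".", "_")`); either form avoids the illegal-key error while keeping the mapping back to the original dotted name unambiguous.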
The second issue: because the visual.merger.mlp layer does not implement BaseLinear, the mlp.0 and mlp.2 layers cannot load LoRA weights. I changed the original nn.Linear to a BaseLinear implementation, as shown in the screenshot below:
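For reference, what attaching LoRA to a plain nn.Linear has to compute is just the base matmul plus a scaled low-rank update. A minimal, framework-agnostic sketch (not lmdeploy's BaseLinear, and ignoring TP entirely):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha / r) * x @ A^T @ B^T, with A of shape (r, in)
    and B of shape (out, r). Both factors start at zero, so the wrapper
    is initially an exact no-op around the base layer."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.lora_A = nn.Parameter(torch.zeros(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling
```

Because the zero-initialized wrapper reproduces the base layer exactly, comparing a wrapped mlp.0/mlp.2 against the unwrapped original after loading adapter weights isolates whether a divergence comes from the loading path or from the math itself.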
After these modifications, the model initializes and runs normally, but when I ran the validation set, the results were all wrong.
May I ask what is wrong with my modifications, and what else do I need to do to make this work correctly?