Query Regarding ZeRO-1 in ColossalAI Not Sharding Optimizer State #4328
              
Unanswered · yhna940 asked this question in Community | Q&A
Replies: 0 comments
  
I have recently been studying the ZeRO-1 strategy implemented in ColossalAI and noticed something that seems unusual. As I understand it, ColossalAI uses the LowLevelZeroOptimizer for its ZeRO-1 strategy.
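For context, this is roughly the setup I have in mind. This is only a sketch; the plugin and booster argument names below are my assumptions from the docs and may differ between versions:

```python
# Rough sketch of the ZeRO-1 setup I am looking at. The plugin/booster
# argument names here are my assumptions and may differ between versions.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # launched via torchrun

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# stage=1 should correspond to ZeRO-1, i.e. only the optimizer state
# is supposed to be partitioned across data-parallel ranks.
plugin = LowLevelZeroPlugin(stage=1)
booster = Booster(plugin=plugin)

# After boosting, `optimizer` should be wrapped by LowLevelZeroOptimizer.
model, optimizer, *_ = booster.boost(model, optimizer)
```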
According to the literature, ZeRO-1 should shard the optimizer state, similar to what fairscale's OSS or torch's ZeroRedundancyOptimizer does. However, while going through the internals of the LowLevelZeroOptimizer, I could not find any place where the optimizer state is sharded. I was able to confirm that it shards the gradients and parameters, but not the optimizer state.
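For reference, this is what I mean by sharding the optimizer state, sketched with torch's ZeroRedundancyOptimizer. The per-rank state check at the end relies on an internal attribute and is only meant as an illustration:

```python
# Sketch of optimizer-state sharding with torch's ZeroRedundancyOptimizer.
# Run under torchrun; model/sizes are arbitrary.
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.Linear(1024, 1024),
).cuda()
model = torch.nn.parallel.DistributedDataParallel(model)

optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    lr=1e-3,
)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()

# Each rank's local Adam state (exp_avg / exp_avg_sq) only covers its own
# partition of the parameters. (`.optim` is an internal attribute of the
# wrapper and may change between torch versions.)
print(f"rank {dist.get_rank()}: "
      f"params with local optimizer state = {len(optimizer.optim.state)}")
```

With this wrapper, each rank's Adam moments only cover its own partition of the parameters, which is the behavior I expected from LowLevelZeroOptimizer but could not find.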
I am seeking verification of my understanding here. Is it indeed the case that ColossalAI's ZeRO-1 does not shard the optimizer state, or am I missing something? I would appreciate any insights or clarifications you can provide.
Thank you!