Strategies for Server/RPC and mixed performance machines #7468
SoftwareRenderer started this conversation in Ideas
Replies: 1 comment
-
I have a similar use-case as the one described in #6829 and wanted to share what I've done to approach it: https://github.com/SoftwareRenderer/llmwrangler. The goal is to avoid bottlenecks when including CPU instances alongside GPU ones.
I'm hoping these ideas are useful (if not already implemented) and that they can be integrated into the RPC backend. I need to brush up on my C, so it'll be a while before I can do it myself (and not be ashamed to submit a PR for the code).
In llmwrangler there are a couple of features:
@rgerganov Tagging you since it looks like you're doing most of the heavy lifting on RPC.
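The "avoid bottlenecks" goal above amounts to capacity-aware scheduling. Below is a minimal sketch (in C++, llama.cpp's language) of one way to do it: route each request to the instance with the lowest in-flight load relative to its measured throughput, so a slow CPU box never queues work while a GPU box sits idle. To be clear, this is not code from llmwrangler or the RPC backend; `Instance`, `pick_instance`, and the throughput numbers are illustrative assumptions.

```cpp
// Capacity-aware routing across mixed-speed llama.cpp server instances.
// Illustrative sketch only -- none of these names come from llmwrangler.
#include <cstdio>
#include <string>
#include <vector>

struct Instance {
    std::string addr;         // host:port of one server instance
    double      tokens_per_s; // measured throughput, e.g. from a warm-up probe
    int         in_flight;    // requests currently routed to this instance
};

// Pick the instance whose queue clears fastest if we add one more request.
// A box with 5x the throughput ends up taking roughly 5x the requests.
static Instance * pick_instance(std::vector<Instance> & pool) {
    Instance * best = nullptr;
    double best_cost = 1e300;
    for (auto & inst : pool) {
        const double cost = (inst.in_flight + 1) / inst.tokens_per_s;
        if (cost < best_cost) {
            best_cost = cost;
            best      = &inst;
        }
    }
    return best;
}

int main() {
    std::vector<Instance> pool = {
        { "10.0.0.1:8080", 50.0, 0 }, // GPU machine
        { "10.0.0.2:8080", 10.0, 0 }, // CPU machine
    };
    for (int i = 0; i < 8; i++) {
        Instance * inst = pick_instance(pool);
        inst->in_flight++; // a real router would decrement this on completion
        printf("request %d -> %s\n", i, inst->addr.c_str());
    }
    return 0;
}
```

Plain round-robin is the degenerate case of this (all weights equal), and it is exactly what bottlenecks on the slowest machine.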
-
Thanks for sharing this. Another work in this direction is the paddler project mentioned in #7369. The RPC backend is a simple proxy for existing backends. I think that combining the backend scheduler with high-level orchestration like yours can unlock the full potential of distributed LLM inference.
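"A simple proxy for existing backends" means the RPC layer exposes the same backend interface as the local backends (CPU, CUDA, Metal, ...) and only forwards calls over the network, which is why it composes cleanly with an external orchestrator. The toy sketch below shows that shape under stated assumptions: the `Backend` interface, the stubbed-out transport, and every name in it are invented for illustration and are not ggml's actual API.

```cpp
// Toy illustration of a proxy backend: same interface as a local backend,
// but it would forward the call to a remote rpc-server instead of
// computing anything itself. Not ggml's real API -- names are invented.
#include <cstdio>
#include <string>

struct Backend {
    virtual ~Backend() = default;
    virtual std::string eval(const std::string & request) = 0;
};

// A real backend (CPU, CUDA, Metal, ...) that does the actual work.
struct LocalBackend : Backend {
    std::string eval(const std::string & request) override {
        return "result(" + request + ")";
    }
};

// The proxy: serialize the request, ship it to remote_addr, return the
// reply. The network round-trip is stubbed out to keep this self-contained.
struct RpcProxyBackend : Backend {
    std::string  remote_addr;
    LocalBackend remote_stub; // stands in for the backend on the remote host

    explicit RpcProxyBackend(std::string addr) : remote_addr(addr) {}

    std::string eval(const std::string & request) override {
        return remote_stub.eval(request); // real code would go over the wire
    }
};

int main() {
    LocalBackend    local;
    RpcProxyBackend remote("10.0.0.2:50052");
    // A scheduler can treat both uniformly -- that uniformity is what lets
    // high-level orchestration sit on top of the RPC backend.
    printf("%s\n", local.eval("matmul").c_str());
    printf("%s\n", remote.eval("matmul").c_str());
    return 0;
}
```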