slimtune2023 commented on Mar 24, 2025

Completed Features

This PR adds a few changes, mostly updating the current energy estimates for remote model inference and integrating that energy tracking with the Minions app UI. The following list describes the features and tests I added:

Local energy consumption tracking for Linux

  • Power monitoring was previously implemented only for Mac and NVIDIA devices, so I added power tracking for Linux devices via Intel RAPL (which my current device supports) to the PowerMonitor class in energy_tracking.py (a sketch of the RAPL readout follows this list)
  • The readings match the measurements from other Linux power-measurement tools (e.g. powerstat)
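
As a rough illustration, here is a minimal sketch of how the RAPL energy counters can be read from sysfs on Linux; the actual PowerMonitor implementation in energy_tracking.py may structure this differently, and the helper names below are illustrative:

```python
import time
from pathlib import Path

# Top-level RAPL package domains (intel-rapl:0, intel-rapl:1, ...).
RAPL_ROOT = Path("/sys/class/powercap/intel-rapl")

def read_total_energy_uj() -> int:
    """Sum the cumulative energy counters (microjoules) of all package domains."""
    total = 0
    for domain in RAPL_ROOT.glob("intel-rapl:*"):
        energy_file = domain / "energy_uj"
        if energy_file.is_file():
            total += int(energy_file.read_text())
    return total

def average_power_w(interval_s: float = 1.0) -> float:
    """Estimate average package power (watts) over a short interval."""
    start = read_total_energy_uj()
    time.sleep(interval_s)
    end = read_total_energy_uj()
    # The counters wrap around at max_energy_range_uj; a real monitor
    # should detect end < start and correct for the wraparound.
    return max(end - start, 0) / 1e6 / interval_s

if __name__ == "__main__":
    print(f"Average package power: {average_power_w():.2f} W")
```

Note that on recent kernels energy_uj is often readable only by root, which is one more reason cross-checking against a tool like powerstat is useful.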

Integration with Minion(s) protocols

  • Added power monitoring to both minion.py and minions.py
  • Used the power measurements to estimate local energy consumption, combined those values with the estimates for remote energy consumption, and returned the totals from the respective call functions (see the sketch after this list)
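
The pattern is roughly the following; the PowerMonitor interface and the dictionary keys here are assumptions for illustration, not the exact API in energy_tracking.py:

```python
# Hypothetical wrapper showing the shape of the integration; the real
# call functions in minion.py / minions.py inline this logic.

def call_with_energy_tracking(monitor, run_protocol, estimate_remote_energy_j):
    """Run a protocol while measuring local energy, then attach estimates."""
    monitor.start()                      # assumed PowerMonitor method
    result = run_protocol()              # local + remote inference happens here
    local_energy_j = monitor.stop()      # assumed to return joules consumed

    remote_energy_j = estimate_remote_energy_j(result)  # model-based estimate
    return result, {
        "local_energy_j": local_energy_j,
        "remote_energy_j": remote_energy_j,
        "total_energy_j": local_energy_j + remote_energy_j,
    }
```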

Integration with app UI

  • Added visualization of local/remote energy consumption both when using the Minion(s) protocol and when estimating remote-only inference (a UI sketch follows this list)
  • Also displays the total amount of energy used in each case, as well as the total energy savings
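
Assuming a Streamlit front end for the app (an assumption; adjust to whatever the UI actually uses), the display logic looks roughly like this, with the metric labels and savings formula as illustrative choices:

```python
import streamlit as st

def render_energy_panel(local_j: float, remote_j: float, remote_only_j: float):
    """Show local/remote energy for a Minion(s) run next to a remote-only estimate."""
    total_j = local_j + remote_j
    savings_j = remote_only_j - total_j

    col_local, col_remote, col_total = st.columns(3)
    col_local.metric("Local energy", f"{local_j:.1f} J")
    col_remote.metric("Remote energy", f"{remote_j:.1f} J")
    col_total.metric("Total energy", f"{total_j:.1f} J")

    st.metric(
        "Estimated savings vs. remote-only inference",
        f"{savings_j:.1f} J",
        delta=f"{100 * savings_j / remote_only_j:.0f}%",
    )
```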

Added more detailed estimates for cloud inference energy consumption

  • Created a function (better_cloud_inference_energy_estimate in energy_tracking.py) that differentiates between the prefill and decode stages of inference (a sketch of the estimator's shape follows this list)
  • Also updated some of the predefined constants (such as power draw and GPU utilization) to match recent papers and tests of LLM inference on GPUs
  • Added the quadratic dependence of inference FLOPs on the number of input/output tokens that arises from the transformer attention mechanism (this has the most impact at the larger input context lengths that Minion(s) primarily targets)
  • These estimates are computed per prompt and then combined into the final estimate; I integrated this functionality into minion.py (TODO: implement for minions.py)
  • Added a basic test script comparing the old and new estimates across varied input/output token lengths
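
The shape of the estimator is roughly as follows; every constant below (parameter count, depth, hidden size, power draw, throughput, utilization) is a placeholder for illustration, not the value used in better_cloud_inference_energy_estimate:

```python
def cloud_energy_estimate_sketch(
    input_tokens: int,
    output_tokens: int,
    n_params: float = 70e9,      # placeholder model size
    n_layers: int = 80,          # placeholder depth
    d_model: int = 8192,         # placeholder hidden size
    gpu_power_w: float = 700.0,  # placeholder GPU board power
    peak_flops: float = 1e15,    # placeholder peak throughput (FLOPs/s)
    prefill_util: float = 0.5,   # prefill is compute-bound: higher utilization
    decode_util: float = 0.1,    # decode is memory-bound: lower utilization
) -> float:
    """Rough per-prompt energy (joules) split into prefill and decode stages."""

    def flops_per_token(context_len: int) -> float:
        # ~2 FLOPs per parameter for the dense layers, plus an attention
        # term linear in context length (hence quadratic in tokens overall).
        return 2 * n_params + 4 * n_layers * d_model * context_len

    # Prefill: input tokens are processed against the growing prefix.
    prefill_flops = sum(flops_per_token(t) for t in range(input_tokens))
    # Decode: each output token attends to the full context so far.
    decode_flops = sum(flops_per_token(input_tokens + t) for t in range(output_tokens))

    # energy = power * time, with time = FLOPs / (peak throughput * utilization)
    prefill_j = gpu_power_w * prefill_flops / (peak_flops * prefill_util)
    decode_j = gpu_power_w * decode_flops / (peak_flops * decode_util)
    return prefill_j + decode_j

# Quick comparison across context lengths, in the spirit of the test script:
for n_in in (1_000, 10_000, 100_000):
    print(f"in={n_in:>7}, out=500: ~{cloud_energy_estimate_sketch(n_in, 500):,.0f} J")
```

The quadratic term is what separates this from the old estimate: at small contexts the 2 * n_params term dominates, while at the long contexts Minion(s) targets the attention term takes over.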

Next Steps

There are also some features I wanted to explore further but didn't get a chance to implement:

  • Make local power monitoring specific to the processes explicitly involved in local LLM inference (the user may have many other processes running on the local device that draw significant power and skew the final energy calculations)
  • Visualize power usage per prompt and over the time it takes for the protocol to complete
  • Benchmark energy savings across the specific choice of local model and across different context lengths
  • Integrate prompt/completion token tracking directly into the Usage class (this would likely entail changes in other files that use the Usage class, so I'm holding off on a full implementation)
