Skip to content

Conversation

@mcfedr
Copy link

@mcfedr mcfedr commented Sep 1, 2023

Enable different modes for feature flags

fixes #65

@mplachter
Copy link
Contributor

mplachter commented Sep 1, 2023

@mcfedr Thank you for the MR.

We have had a ton of feedback on the gauges.
We need to do something different... 🤔

Currently (to your point), we add the gauges up (which does not make sense to do)

This approach of just having the new gauge as the value, I think, can be misleading, as well as having a randomly high or low value as it's not an aggregate from all the containers\functions\pods sending metrics into the gateway. I think this is better than currently summing the gauge, as it makes no sense to do so...

My ideal situation would be to make a running average or median of the metric gauge over a given interval. I do think this will take some time to implement I do have a draft MR #57 to add this but to keep it performant I had to duplicate the memory footprint which isn't ideal (we may have to rewrite the structs and move away from slices of metrics)

We could change this behavior based on a CLI flag so we can still implement your desired change but make it so it doesn't change the current behavior unless a CLI flag is passed.

@djeebus thoughts?

@djeebus
Copy link
Collaborator

djeebus commented Sep 1, 2023

Yeah, I agree w/ all your points. A flag, something like --gauge-behavior=sum|replace, would probably be a good start, which leaves us an opening to do something like median, max, etc.

@mcfedr
Copy link
Author

mcfedr commented Sep 1, 2023

Yes, in my use case the value is an calculated value from some data, so no matter which process (in my case lambda functions) sends it, its the latest value - so i certainly wouldnt want any manipulation of the value

but i also thought about a cli flag, might be helpful to have different options

@SpangleLabs
Copy link

People keep suggesting this: #65 #71 and it seems like if you want the gauge to replace, then you should push it to a prometheus push gateway, not an aggregation gateway.
It feels like an aggregation gateway should aggregate.

I think there are cases where aggregating the gauges is exactly what's wanted, especially in summaries for example.

It seems weird to add a CLI flag that turns the aggregation gateway into a push gateway though. I can understand wanting some metrics to be aggregated and some to be latest, in which case, push some to an aggregation gateway, and some to a push gateway?

The idea of the aggregation gateway showing the average gauge value of the last N pushes or M minutes also seems odd? It would need to be synced to the fetching or the pushing, and liable to go out of sync with either. If you want a running total, that seems like something an aggregated summary could do, or a prometheus query over a latest value gauge?

@mcfedr
Copy link
Author

mcfedr commented Sep 1, 2023

@SpangleLabs Gauge are different from Counters though, thats clearly why this is happening

I could use both push gateway and aggregation gateway - but it does mean my app has to make two http requests to send all the metrics, and thats a big disadvantage

@SpangleLabs
Copy link

Yeah, I'm just quite hesitant on the idea that an aggregation gateway should also do the job of a push gateway?

I guess if you need to save on web requests then it would be helpful to have a gateway that can serve the purposes of both.. But even then that would seem best configured at a metric or job level, rather than a gateway level? Though I've no idea what syntax could enable that. Tying the operation of your application and monitoring, to the configuration of your aggregation gateway seems a weird choice, and doesn't seem to properly separate concerns.

I feel it boils down to an idea of "do one thing well"

(And certainly, making a breaking change to the behaviour seems unwise)

In fairness though, I've been digging through to find my concrete use-cases for aggregating gauges, and most of them are in places where we were rolling our own summaries, as the aggregation gateway did not support Summary metrics at the time. I might be suffering some anchoring based on that

@mcfedr mcfedr force-pushed the gauge-replace branch 2 times, most recently from a759536 to 1141a76 Compare September 1, 2023 16:17
@mcfedr mcfedr changed the title fix: replace gauge values with the latest feat: flag to enable gauge replacement Sep 1, 2023
@mcfedr
Copy link
Author

mcfedr commented Sep 1, 2023

So what about something like this? I'm not 100% sure its the best way to pass the flag though, but its functional like this.

@mcfedr mcfedr force-pushed the gauge-replace branch 2 times, most recently from e9abc2a to 3fcc9c1 Compare September 1, 2023 16:46
@pablote
Copy link

pablote commented Sep 6, 2023

It'd be great if this or something very similar got merged. Having two push gateways, one that aggregates and one that doesn't does not make sense.

@SpangleLabs
Copy link

Honestly, at this point, I'm growing kind of unsure when one would want a push gateway at all, alongside or instead of an aggregation gateway.

After much thought (and overthought), I think this solution looks great. Having a flag to allow backwards compatibility, but moving towards the better usage of the thing.

(Even the stuff I said before about summaries being simulated with a counter and a gauge is entirely wrong. A summary is 2 counters)

@mcfedr
Copy link
Author

mcfedr commented Sep 12, 2023

@mplachter maybe I can bump this for a review?

@mplachter
Copy link
Contributor

mplachter commented Sep 12, 2023

@mcfedr This looks good if we're able to add a unit_test for this in aggregate_test.go we can get this merged in, If we add this functionality, I want to make sure we do not undo it without the proper knowledge + documentation during a future release :)

@hxnir
Copy link

hxnir commented Jan 1, 2024

Hey, it looks really good and would really help one of my usecases.
Any updates on the progress? If the missing unit test is the problem I am more than happy to add it myself.

@mcfedr
Copy link
Author

mcfedr commented Jan 4, 2024

@hxnir thanks for bumping this, it kept slipping down my list of things to do. I've added a couple of tests to check the new behavoir, @mplachter hopefully good for a merge now.

also with a rebase on main

@JDLK7
Copy link

JDLK7 commented Mar 21, 2024

Hey, do you plan on merging this anytime soon?
Thanks!

@xstephen95x
Copy link

Also curious if this will be picked up.

@KepptnKool
Copy link

@djeebus @mplachter What are the chances of this PR getting merged? Do you need any support? I guess there are quite some users who would like this feature.

@Sfonxs
Copy link

Sfonxs commented Feb 17, 2025

Is there anything preventing this from getting merged? What needs to be done? This looks exactly like the thing we need! 👍

@shahovkit
Copy link

Do I understand correctly that in order to merge this fix, there must be 2 approvals?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support replacing rather than summing gauges