-
Notifications
You must be signed in to change notification settings - Fork 490
[Kafka] Add new dashboards and link the newly added dashboards #15328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Kafka] Add new dashboards and link the newly added dashboards #15328
Conversation
…troller network datasets
🚀 Benchmarks reportTo see the full report comment with |
Overall the PR looks good to me. Few comments:
|
This gives (ONLY) the status of the node that is connected to. This does not give the status of all the nodes and its type (leader, follower, etc)
Addressed for the newly added dashboards. For the dashboards that were existing already, this will be taken up in the upcoming PR
This is an existing dashboard. Improvements will be taken up in the upcoming enhancement. |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "stracktrace" - the first widget ("number of stracktraces by class"). Also, we should spell this at "stack trace" (two words) in all the widgets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs dashboard was an existing dashboard. I have made the corrections as per your inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The "JVM name" widget is too small to hold the entire name. We should use a widget that is big enough to store the longest JVM name + a buffer
- "JVM version" widget seems like it's showing the version in some scientific notation. Are we supposed to see "+9" in the number?
- "JVM Vendor" widget - is this going to be big enough to hold any vendor name length?
- "Uptime" says "a month" - is there a number missing?
- "Thread count" widget - we shouldn't need the two zeros after the decimal point. Presumably thread count is integers, there are no thread fractions AFAIK
- Heap usage, especially % should be one of the top widgets, given how important memory is. We should of course keep the memory section but do show the most important memory widget also at the top of this dashboard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "JVM name" widget is too small to hold the entire name. We should use a widget that is big enough to store the longest JVM name + a buffer
Updated.
"JVM version" widget seems like it's showing the version in some scientific notation. Are we supposed to see "+9" in the number?
The version number do appear in this format.
"JVM Vendor" widget - is this going to be big enough to hold any vendor name length?
Updated to give a larger width
"Uptime" says "a month" - is there a number missing?
Updated from friendly -> accurate option.
"Thread count" widget - we shouldn't need the two zeros after the decimal point. Presumably thread count is integers, there are no thread fractions AFAIK
This is because of the user of compact options. For large value of Thread count, having a decimal value become relevant. But, i agree that it is irrelevant if the value is small (lesser than 100). Presently, we don't have a way to limit this.
Heap usage, especially % should be one of the top widgets, given how important memory is. We should of course keep the memory section but do show the most important memory widget also at the top of this dashboard
Updated as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Shouldn't "Logs" and "Log manager" dashboard tabs be right next to each other, rather than have "Raft" tab between them?
- What does the "Log recovery status" widget show? Count of what? Is it possible to name the y-axis with what is actually counted?
- "Log directory status" = 0. What does it mean when the status is 0? Is this is a good or a bad thing? Does everybody know what 0 means when it comes to log directory status?
- What is a "Dead cleaner threads"? Is that "dead letter queue"?
- We should use US spelling - eg "utilization" instead of "utilisation", "behavior" instead of "behaviour", etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the "Log recovery status" widget show? Count of what? Is it possible to name the y-axis with what is actually counted?
There are two count values here - Remaining logs to recover , Remaining segments to recover. As you can see in the image below, it is represented as two series colours at the bottom of the panel.

"Log directory status" = 0. What does it mean when the status is 0? Is this is a good or a bad thing? Does everybody know what 0 means when it comes to log directory status?
Corrected by changing from a table panel to a search view panel
What is a "Dead cleaner threads"? Is that "dead letter queue"?
Updated as Dead log cleaner threads. These metrics are part of log cleaner metrics
We should use US spelling - eg "utilization" instead of "utilisation", "behavior" instead of "behaviour", etc
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fix spelling to use US spelling (eg "utilization" instead of "utilisation", etc
- What re the units of "Temporary memory" and "Request size distribution" widgets for the y-axis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix spelling to use US spelling (eg "utilization" instead of "utilisation", etc
Updated
What re the units of "Temporary memory" and "Request size distribution" widgets for the y-axis?
It is bytes. It is correctly configured to appear in the Y-axis, next to the Y-axis value (B
in the screenshot)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "Poll idle ratio" - is this Poll-to-Idle ratio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ratio of time the Raft IO thread is idle as opposed to doing work.
I have update the panel title as - I/O Thread Idle Ratio , to avoid confusions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- In the description of the dashboard as well as the first section named "ISR Changes" it might be worth spelling out once (in both places) what ISR means, eg "In-Sync Replicas (ISRs)". In the description of the dashboard - on first mention of the acronym. (optionally) In the naming of the section - "In-Sync Replicas (ISRs) Changes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as suggested - at the first mention of the acronym and the section title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the units for the "At minimum ISR status per topic" and "Under minimum ISR status per topic" widgets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the partition count. Updated the y-axis as Partition count.
The title of the panel is updated as - Partitions below minimum ISR per topic for better clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Are we missing a description widget for the "Overview" dashboard?
- We normally have more colors in the Overview dashboard especially for the most important metrics at the top. Shall we bring back some colors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments and questions. Please let me know if you have any questions
💚 Build Succeeded
History
cc @agithomas |
|
All the comments have been addressed. The placement of the "Logs" menu option is kept in a way so that it is not missed between the Metrics based dashboards. We have two more dashboard panels, for producer and consumers, to be added to the list of available dashboards. I suggest, the menu re-arrangement is taken up as part of this upcoming PR, when we have all the dashboards available. |
Enhancement
Proposed commit message
Add Kafka dashboards for the newly added datasets - jvm, log_manager, raft, replicamanager, topic, controller, network datasets. Link the existing dashboards to the newly added dashboards
Checklist
changelog.yml
file.Author's Checklist
How to test this PR locally
elastic-package build && elastic-package stack up -v -d --services package-registry
Related issues
#15243
Screenshots
Existing dashboards updated (with link panel)
Overview
Logs
Newly added dashboards
Controller
JVM
Log manager
Network
Raft
Replica manager
Topic