agithomas
Contributor

@agithomas agithomas commented Sep 16, 2025

Enhancement

Proposed commit message

Add Kafka dashboards for the newly added datasets: jvm, log_manager, raft, replicamanager, topic, controller, and network. Link the existing dashboards to the newly added dashboards.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • Package upgrade testing
  • Dashboard loading check

How to test this PR locally

  • elastic-package build && elastic-package stack up -v -d --services package-registry

Related issues

#15243

Screenshots

Existing dashboards updated (with link panel)

Overview

metricbeat_kafka_dashboard

Logs

filebeat-kafka-logs-overview

Newly added dashboards

Controller

metricbeat-kafka-controller

JVM

metricbeat-kafka-jvm

Log manager

metricbeat-kafka-log_manger

Network

metricbeat-kafka-network

Raft

metricbeat-kafka-raft

Replica manager

metricbeat-kafka-replica_manager

Topic

metricbeat-kafka-topic

@agithomas agithomas added the Integration:kafka, dashboard, and Team:Obs-InfraObs labels Sep 16, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@agithomas agithomas marked this pull request as ready for review September 16, 2025 04:54
@agithomas agithomas requested a review from a team as a code owner September 16, 2025 04:54
@shmsr
Member

shmsr commented Sep 17, 2025

Overall the PR looks good to me.

A few comments:

  • In the "Logs" overview, we could filter out "INFO" and "DEBUG" by default, as the user is more interested in "WARN", "ERROR", etc. I think it is a good default to have.
  • nitpick: Some chart titles use "Sentence case" and some use "Title Case". They should be consistent.
  • On the metricbeat-kafka-raft page, clarify "Current state": the state is just listed as "follower", with no indication of which broker it refers to. It would be helpful to have a panel that lists all brokers and their current Raft state (e.g., leader, follower, candidate).

@agithomas
Contributor Author

agithomas commented Sep 17, 2025

> On the metricbeat-kafka-raft page, clarify "Current state": the state is just listed as "follower", with no indication of which broker it refers to. It would be helpful to have a panel that lists all brokers and their current Raft state (e.g., leader, follower, candidate).

This gives ONLY the status of the node that we are connected to. It does not give the status of all the nodes or their roles (leader, follower, etc.).

> Some chart titles use "Sentence case" and some use "Title Case". They should be consistent.

Addressed for the newly added dashboards. For the dashboards that already existed, this will be taken up in an upcoming PR.

> In the "Logs" overview, we could filter out "INFO" and "DEBUG" by default, as the user is more interested in "WARN", "ERROR", etc. I think it is a good default to have.

This is an existing dashboard. Improvements will be taken up in an upcoming enhancement.

@agithomas
Contributor Author

/test

What is a "stracktrace" in the first widget ("number of stracktraces by class")? Also, we should spell this as "stack trace" (two words) in all the widgets.

Contributor Author

The Logs dashboard is an existing dashboard. I have made the corrections you suggested.

@daniela-elastic daniela-elastic Sep 17, 2025

  1. The "JVM name" widget is too small to hold the entire name. We should use a widget that is big enough to hold the longest JVM name plus a buffer.
  2. The "JVM version" widget seems to show the version in some scientific notation. Are we supposed to see "+9" in the number?
  3. "JVM Vendor" widget: is this going to be big enough to hold a vendor name of any length?
  4. "Uptime" says "a month". Is there a number missing?
  5. "Thread count" widget: we shouldn't need the two zeros after the decimal point. Presumably thread count is an integer; there are no fractional threads AFAIK.
  6. Heap usage, especially %, should be one of the top widgets, given how important memory is. We should of course keep the memory section, but also show the most important memory widget at the top of this dashboard.

Contributor Author

> The "JVM name" widget is too small to hold the entire name. We should use a widget that is big enough to hold the longest JVM name plus a buffer.

Updated.

> The "JVM version" widget seems to show the version in some scientific notation. Are we supposed to see "+9" in the number?

The version number does appear in this format.

> "JVM Vendor" widget: is this going to be big enough to hold a vendor name of any length?

Updated to give it a larger width.

> "Uptime" says "a month". Is there a number missing?

Changed from the "friendly" option to the "accurate" option.

> "Thread count" widget: we shouldn't need the two zeros after the decimal point. Presumably thread count is an integer; there are no fractional threads AFAIK.

This is because of the use of the compact option. For large thread counts, the decimal places become relevant. But I agree that they are irrelevant when the value is small (less than 100). At present, we don't have a way to limit this.
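To illustrate the behavior being asked for, here is a minimal sketch of a compact formatter that shows decimals only once a value is large enough to be abbreviated. This is not Kibana's formatter; the function name `format_count`, the suffixes, and the thresholds are hypothetical and exist only for this example.

```python
def format_count(value: int, decimals: int = 2) -> str:
    """Format a count compactly, adding decimals only when a suffix is used.

    Hypothetical helper for illustration; not how Kibana formats numbers.
    """
    # Largest threshold first so the biggest matching suffix wins.
    suffixes = [(1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")]
    for threshold, suffix in suffixes:
        if value >= threshold:
            return f"{value / threshold:.{decimals}f}{suffix}"
    # Small values are whole numbers, so no decimal places are shown.
    return str(value)
```

With this rule a thread count of 42 would render as `42`, while a value of 1500 would render as `1.50K`.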

> Heap usage, especially %, should be one of the top widgets, given how important memory is. We should of course keep the memory section, but also show the most important memory widget at the top of this dashboard.

Updated as suggested.

  1. Shouldn't the "Logs" and "Log manager" dashboard tabs be right next to each other, rather than having the "Raft" tab between them?
  2. What does the "Log recovery status" widget show? Count of what? Is it possible to label the y-axis with what is actually counted?
  3. "Log directory status" = 0. What does it mean when the status is 0? Is this a good or a bad thing? Does everybody know what 0 means when it comes to log directory status?
  4. What are "Dead cleaner threads"? Is that "dead letter queue"?
  5. We should use US spelling, e.g. "utilization" instead of "utilisation", "behavior" instead of "behaviour", etc.

Contributor Author

> What does the "Log recovery status" widget show? Count of what? Is it possible to label the y-axis with what is actually counted?

There are two count values here: "Remaining logs to recover" and "Remaining segments to recover". As you can see in the image below, they are represented as two series colors at the bottom of the panel.

image

> "Log directory status" = 0. What does it mean when the status is 0? Is this a good or a bad thing? Does everybody know what 0 means when it comes to log directory status?

Corrected by changing from a table panel to a search view panel.

> What are "Dead cleaner threads"? Is that "dead letter queue"?

Updated to "Dead log cleaner threads". These metrics are part of the log cleaner metrics.

> We should use US spelling, e.g. "utilization" instead of "utilisation", "behavior" instead of "behaviour", etc.

Updated.

  1. Fix spelling to use US spelling (e.g. "utilization" instead of "utilisation", etc.).
  2. What are the units on the y-axis of the "Temporary memory" and "Request size distribution" widgets?

Contributor Author

> Fix spelling to use US spelling (e.g. "utilization" instead of "utilisation", etc.).

Updated.

> What are the units on the y-axis of the "Temporary memory" and "Request size distribution" widgets?

The unit is bytes. It is configured to appear on the y-axis, next to the y-axis value (B in the screenshot).

  1. "Poll idle ratio" - is this Poll-to-Idle ratio?

Contributor Author

It is the ratio of time the Raft I/O thread is idle as opposed to doing work.

I have updated the panel title to "I/O Thread Idle Ratio" to avoid confusion.
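As a rough illustration of that definition (not of how Kafka computes the metric internally, which this thread does not show), the ratio can be read as idle time divided by total time. The function and parameter names below are hypothetical.

```python
def idle_ratio(idle_ns: int, busy_ns: int) -> float:
    """Fraction of the observed time that a thread spent idle.

    Hypothetical helper: idle_ns and busy_ns are assumed to be durations
    measured over the same window, in the same unit.
    """
    total = idle_ns + busy_ns
    if total == 0:
        # No time observed yet; report 0.0 rather than dividing by zero.
        return 0.0
    return idle_ns / total
```

Under this reading, a value close to 1 means the thread is mostly waiting, and a value close to 0 means it is almost always doing work.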

  1. In the dashboard description and in the first section, named "ISR Changes", it might be worth spelling out once (in both places) what ISR means, e.g. "In-Sync Replicas (ISRs)": in the description, on the first mention of the acronym, and (optionally) in the section name, as "In-Sync Replicas (ISRs) Changes".

Contributor Author

Updated as suggested - at the first mention of the acronym and the section title.

What are the units for the "At minimum ISR status per topic" and "Under minimum ISR status per topic" widgets?

Contributor Author

It's the partition count. Updated the y-axis label to "Partition count".

The panel title is updated to "Partitions below minimum ISR per topic" for better clarity.
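To make the unit concrete, here is a small sketch of what a per-topic count of partitions below the minimum ISR means. The dashboard reads this value from collected metrics rather than computing it this way; the input shape (a list of `(topic, isr_size)` pairs) and the function name are assumptions made for this example.

```python
from collections import Counter


def partitions_below_min_isr(partitions, min_isr):
    """Count, per topic, the partitions whose ISR size is below min_isr.

    Hypothetical helper: `partitions` is an iterable of (topic, isr_size)
    pairs, one entry per partition.
    """
    counts = Counter()
    for topic, isr_size in partitions:
        if isr_size < min_isr:
            counts[topic] += 1
    # Topics with no affected partitions are omitted from the result.
    return dict(counts)
```

For example, with partitions `[("orders", 1), ("orders", 2), ("payments", 3), ("orders", 0)]` and a minimum ISR of 2, only the two `orders` partitions with ISR sizes 1 and 0 are counted.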

  1. Are we missing a description widget for the "Overview" dashboard?
  2. We normally have more colors in the Overview dashboard especially for the most important metrics at the top. Shall we bring back some colors?

Contributor Author

Updated as suggested

@daniela-elastic daniela-elastic left a comment

Left a few comments and questions. Please let me know if you have any questions

@elasticmachine

💚 Build Succeeded

History

cc @agithomas

@agithomas
Contributor Author

> Left a few comments and questions. Please let me know if you have any questions

All the comments have been addressed.

The "Logs" menu option is placed so that it is not lost among the metrics-based dashboards. Two more dashboards, for producers and consumers, are still to be added to the list of available dashboards. I suggest that the menu rearrangement be taken up as part of that upcoming PR, when all the dashboards are available.

Labels
dashboard Relates to a Kibana dashboard bug, enhancement, or modification. Integration:kafka Kafka Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]
4 participants