Commit d868bdb

committed
Further improvements to PStats documentation
1 parent a0c379c commit d868bdb

File tree

5 files changed

+127
-92
lines changed

optimization/basic-performance-diagnostics.rst

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,10 @@ Basic Performance Diagnostics
66
Introductory Performance Diagnostics
77
------------------------------------
88

9-
In Panda3D, the "big gun" of performance analysis is called pstats. This program
10-
gives you real-time diagnostic analysis of a running Panda3D virtual world
11-
broken down into hundreds of different categories.
9+
In Panda3D, the "big gun" of performance analysis is called
10+
:doc:`PStats <pstats/index>`. This program gives you real-time diagnostic
11+
analysis of a running Panda3D virtual world broken down into hundreds of
12+
different categories.
1213

1314
But sometimes, when you've just encountered a problem, you don't want that much
1415
information. Sometimes, you just want a simple question answered, like "how many

optimization/pstats/basic-profiling.rst

Lines changed: 12 additions & 5 deletions
@@ -125,6 +125,10 @@ the profiling machine:
125125
.. code-block:: text
126126
127127
pstats-host profiling-machine-ip-or-hostname
128+
pstats-port 5185
129+
130+
By default, the port number used by PStats is 5185. It can be changed by using
131+
the ``pstats-port`` variable, as shown above.
128132

129133
Session Files
130134
-------------
@@ -137,17 +141,20 @@ Use the "Save Session" menu item to store the recorded data in a session file.
137141
At any point, you can launch the PStats server (without a connected client) and
138142
use "Open Session" to review the recorded data.
139143

140-
If you close the PStats Server by accident without saving the session file to
141-
disk, you can start PStats and use the "Open Last Session" menu option to
142-
restore this data.
144+
The PStats Server also automatically stores the last recorded session to a file
145+
in the Panda3D cache directory called ``last-session.pstats``. If you close the
146+
PStats Server by accident without saving the session file to disk, you can start
147+
PStats and use the "Open Last Session" menu option to restore this data.
143148

144149
Exporting to JSON
145150
-----------------
146151

147152
To export the timing information to a format that can be read by other
148153
applications, the "Export to JSON" menu option can be used. The format of this
149-
file is the Trace Event Format. This can be read by a variety of tools,
150-
including the Chrome Tracing tool and the online Perfetto application.
154+
file is the `Trace Event Format <https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview>`__.
155+
This can be read by a variety of tools, including the
156+
`Chrome Tracing tool <about:tracing>`__ and the online
157+
`Perfetto <https://ui.perfetto.dev/>`__ application.
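
As a rough illustration of the Trace Event Format mentioned above, the following minimal sketch builds a tiny trace by hand using only the Python standard library. The collector timings and the ``make_event`` helper are made up for this example, and the exact set of fields that PStats writes may differ; the sketch only shows the general shape of "complete" (``X``) events that tools such as Perfetto can load.

```python
import json


def make_event(name, start_us, duration_us, pid=1, tid=1):
    """Build one "complete" event (phase "X"); times are in microseconds."""
    return {
        "name": name,
        "ph": "X",
        "ts": start_us,
        "dur": duration_us,
        "pid": pid,
        "tid": tid,
    }


# Hypothetical timings for a single frame, for illustration only.
trace = {
    "traceEvents": [
        make_event("Frame", 0, 16000),
        make_event("App", 0, 5000),
        make_event("Cull", 5000, 4000),
        make_event("Draw", 9000, 6000),
    ]
}

# Serialize to the JSON text that a trace viewer would read from a file.
trace_json = json.dumps(trace, indent=2)
```

Writing ``trace_json`` to a file gives something that can be opened in a trace viewer to see how nested events are laid out along the time axis.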
151158

152159
It is also possible to use this feature if no graphical PStats server is
153160
available. To do this, use the ``text-stats`` utility like so::

optimization/pstats/customization.rst

Lines changed: 18 additions & 0 deletions
@@ -17,6 +17,24 @@ This will cause the server to remember the current layout of the graph windows.
1717
The next time you start a new session and connect a client, all of the
1818
previously saved graph windows will be reopened.
1919

20+
Guide Bars
21+
----------
22+
23+
The running Panda client suggests its target frame rate, as well as the initial
24+
vertical scale of the strip chart (that is, the height of the colored bars).
25+
You can change the scale freely by clicking within the graph itself and dragging
26+
the mouse up or down as necessary. One of the horizontal guide bars is drawn in
27+
a lighter shade of gray; this one represents the actual target frame rate
28+
suggested by the client. The other, darker, guide bars are drawn automatically
29+
at harmonic subdivisions of the target frame rate. You can change the target
30+
frame rate with the Config.prc variable ``pstats-target-frame-rate`` on the client.
31+
32+
You can also create any number of user-defined guide bars by dragging them into
33+
the graph from the gray space immediately above or below the graph. These are
34+
drawn in a dashed blue line. It is sometimes useful to place one of these to
35+
mark a performance level so it may be compared to future values (or to alternate
36+
configurations).
37+
2038
Collector Colors
2139
----------------
2240

optimization/pstats/graph-types.rst

Lines changed: 45 additions & 52 deletions
@@ -5,7 +5,8 @@ Graph Types
55

66
The PStats server offers a range of different graphs, giving different views of
77
the data being sent from the client. The graph windows can be opened from the
8-
Graphs pull-down menu.
8+
Graphs pull-down menu, but they can also be opened by right-clicking a
9+
particular collector in a chart.
910

1011
.. contents::
1112
:local:
@@ -25,58 +26,43 @@ represents the total amount of time spent on each frame; within the frame, the
2526
time is further divided into the primary subdivisions represented by different
2627
color bands (and labeled on the left). These subdivisions are called
2728
"collectors" in the PStats terminology, since they represent time collected by
28-
different tasks.
29-
30-
Normally, the three primary collectors are App, Cull, and Draw, the three stages
31-
of the graphics pipeline. Atop these three colored collectors is the label
32-
"Frame", which represents any remaining time spent in the frame that was not
33-
specifically allocated to one of the three child collectors (normally, there
34-
should not be significant time reported here).
35-
36-
The frame time in milliseconds, averaged over the past three seconds, is drawn
37-
above the upper right corner of the graph. The labels on the guide bars on the
38-
right are also shown in milliseconds; if you prefer to think about a target
39-
frame rate rather than an elapsed time in milliseconds, you may find it useful
40-
to select "Hz" from the Units pulldown menu, which changes the time units
41-
accordingly.
42-
43-
The running Panda client suggests its target frame rate, as well as the initial
44-
vertical scale of the graph (that is, the height of the colored bars). You can
45-
change the scale freely by clicking within the graph itself and dragging the
46-
mouse up or down as necessary. One of the horizontal guide bars is drawn in a
47-
lighter shade of gray; this one represents the actual target frame rate
48-
suggested by the client. The other, darker, guide bars are drawn automatically
49-
at harmonic subdivisions of the target frame rate. You can change the target
50-
frame rate with the Config.prc variable pstats-target-frame-rate on the client.
51-
52-
You can also create any number of user-defined guide bars by dragging them into
53-
the graph from the gray space immediately above or below the graph. These are
54-
drawn in a dashed blue line. It is sometimes useful to place one of these to
55-
mark a performance level so it may be compared to future values (or to alternate
56-
configurations).
57-
58-
The primary collectors labeled on the left might themselves be further
59-
subdivided, if the data is provided by the client. For instance, App is often
60-
divided into Show Code, Animation, and Collisions, where Show Code is the time
29+
different tasks. The top-most label indicates the collector that is currently
30+
being viewed, and the labels below it indicate its subdivisions.
31+
32+
Normally, the primary collector is called "Frame", representing the total amount
33+
of time spent rendering a particular frame. This is subdivided into App, Cull,
34+
and Draw, the three stages of the graphics pipeline, a Wait collector for time
35+
spent waiting on other threads or VSync, and a further \* collector for
36+
operations that may occur across multiple stages. Any remaining time not
37+
specifically allocated to one of those child collectors is assigned to the
38+
parent "Frame" collector (normally, there should not be significant time
39+
reported here).
40+
41+
All of these categories contain further subdivisions, which themselves may be
42+
subdivided further, if this data is provided by the client. For instance, App is
43+
often divided into Tasks, Animation, and Collisions, where Tasks is the time
6144
spent executing any Python code, Animation is the time used to compute any
6245
animated characters, and Collisions is the time spent in the collision
63-
traverser(s).
46+
traverser(s).
6447

6548
To see any of these further breakdowns, double-click on the corresponding
6649
colored label (or on the colored band within the graph itself). This narrows the
67-
focus of the strip chart from the overall frame to just the selected collector,
68-
which has two advantages. Firstly, it may be easier to observe the behavior of
69-
one particular collector when it is drawn alone (as opposed to being stacked on
70-
top of some other color bars), and the time in the upper-right corner will now
71-
reflect just the total time spent within just this collector. Secondly, if there
72-
are further breakdowns to this collector, they will now be shown as further
73-
colored bars. As in the Frame chart, the topmost label is the name of the parent
74-
collector, and any time shown in this color represents time allocated to the
75-
parent collector that is not accounted for by any of the child collectors.
76-
77-
You can further drill down by double-clicking on any of the new labels; or
78-
double-click on the top label, or the white part of the graph, to return back up
79-
to the previous level. Right-clicking a label will provide further options.
50+
focus of the strip chart from the overall frame to just the selected collector.
51+
Not only does it make it easier to observe the behavior of that particular
52+
collector since it is drawn alone (as opposed to being stacked on top of some
53+
other color bars), but if there are further breakdowns to this collector, they
54+
will now be shown as further colored bars. As in the Frame chart, the topmost
55+
label is the name of the currently focused collector, and any time shown in this
56+
color represents time allocated to the current collector that is not accounted
57+
for by any of the child collectors. To return to the parent level, simply
58+
double-click this top-most collector.
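
The relationship between a collector's total time and the time drawn in its own color can be sketched with a small calculation. This is only an illustration of the arithmetic described above, not the code PStats uses; the ``self_time`` helper and the timings are hypothetical.

```python
def self_time(total_ms, child_times_ms):
    """Time attributed to the collector itself, excluding its children."""
    return total_ms - sum(child_times_ms.values())


# Hypothetical per-frame timings in milliseconds.
frame_total = 16.0
children = {"App": 5.0, "Cull": 4.0, "Draw": 6.0, "Wait": 0.5, "*": 0.25}

# The part of the frame not accounted for by any child collector.
remaining = self_time(frame_total, children)
```

With these numbers, 0.25 ms of the frame would be shown in the color of the "Frame" collector itself.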
59+
60+
The time spent in the currently focused collector, averaged over the past three
61+
seconds, is drawn above the upper right corner of the graph. By default, this is
62+
shown in milliseconds, which is a better metric than a target frame rate, but
63+
the unit can be changed from the Units pulldown menu if desirable. Some
64+
collectors will additionally show a number indicating how often they were
65+
started in the latest frame.
8066

8167
Value-based Strip Charts
8268
------------------------
@@ -106,7 +92,7 @@ The way the bars are stacked indicates how the collectors are nested. Let's say
10692
that Panda3D performs a Cull pass for display region A and B separately. The
10793
Strip Chart view would just tell you the total Cull time in the frame, which
10894
doesn't tell you which scene you need to optimize. The Flame Graph view on the
109-
other hand, will show two separate Cull bars, one stacked above the bar for
95+
other hand will show two separate Cull bars, one stacked above the bar for
11096
display region A, and the other stacked above the bar for display region B.
11197

11298
You can double-click on any bar to focus in to that particular collector and
@@ -139,11 +125,16 @@ clock, the GPU and CPU threads may not be perfectly aligned.
139125
There are several ways to navigate through the timeline. By double-clicking a
140126
particular bar, the view will zoom to fit that bar. You can also use the WASD
141127
keys to navigate, or the scroll wheel of the mouse while holding the control key
142-
on the keyboard.
128+
on the keyboard. If the timeline takes up so much vertical space that it runs
129+
off the edge of the chart, you can use the scroll wheel of the mouse *without*
130+
holding the control key to bring everything into view.
143131

144132
Please note that PStats discards data older than 60 seconds by default. To be
145133
able to see the entire timeline, you need to change the ``pstats-history``
146-
configuration variable.
134+
configuration variable (e.g., you could set it to ``inf`` to never discard data).
135+
Furthermore, frames may be dropped from the timeline if the frame rate is too high
136+
or if the send queue is full. If you wish to see all frames, increase the
137+
``pstats-max-rate`` and ``pstats-max-queue-size`` variables.
147138

148139
The Piano Roll
149140
--------------
@@ -164,7 +155,9 @@ reads from left to right.)
164155
Unlike a strip chart, a piano roll chart does not show trends; the chart shows
165156
only the current frame's data. The horizontal axis shows time within the frame,
166157
and the individual collectors are stacked up in an arbitrary ordering along the
167-
vertical axis.
158+
vertical axis. It is possible that there are so many collectors that they run
159+
off the edge of the window; in this case, use the scroll wheel on a mouse to
160+
scroll through the label stack on the left side.
168161

169162
The time spent within the frame is drawn from left to right; at any given time,
170163
the collector(s) that are active will be drawn with a horizontal bar. You can

optimization/pstats/internals.rst

Lines changed: 48 additions & 32 deletions
@@ -10,8 +10,9 @@ The PStats Client
1010
-----------------
1111

1212
The client code is in panda/src/pstatclient, and is available to run in every
13-
Panda client unless it is compiled out. (It will be compiled out if OPTIMIZE is
14-
set to level 4, unless DO_PSTATS is also explicitly set to non-empty.)
13+
Panda client unless it is compiled out. (It will be compiled out when building
14+
for Release in CMake or when passing ``--optimize 4`` to makepanda, unless
15+
DO_PSTATS is also explicitly set to non-empty.)
1516

1617
The client code is designed for minimal runtime overhead when it is compiled in
1718
but not enabled (that is, when the client is not in contact with a PStats
@@ -20,39 +21,51 @@ PStats server). It is also designed for zero runtime overhead when it is
2021
compiled out.
2122

2223
There is one global :class:`.PStatClient` class object, which manages all of the
23-
communications on the client side. Each PStatCollector is simply an index into
24-
an array stored within the PStatClient object, although the interface is
25-
intended to hide this detail from the programmer.
24+
communications on the client side. Each :class:`.PStatCollector` is simply an
25+
index into an array stored within the :class:`.PStatClient` object, although the
26+
interface is intended to hide this detail from the programmer.
2627

27-
Initially, before the PStatClient has established a connection, calls to start()
28-
and stop() simply return immediately.
28+
Initially, before the :class:`.PStatClient` has established a connection, calls
29+
to :meth:`~.PStatCollector.start()` and :meth:`~.PStatCollector.stop()` simply
30+
return immediately.
2931

3032
When you call :meth:`.PStatClient.connect()`, the client attempts to contact the
31-
PStatServer via a TCP connection to the hostname and port named in the pstats-
32-
host and pstats-port Config.prc variables, respectively. (The default hostname
33-
and port are localhost and 5185.) You can also pass in a specific hostname
34-
and/or port to the connect() call. Upon successful connection and handshake with
35-
the server, the PStatClient sends a list of the available collectors, along with
36-
their names, colors, and hierarchical relationships, on the TCP channel.
37-
38-
Once connected, each call to start() and stop() adds a collector number and
39-
timestamp to an array maintained by the PStatClient. At the end of each frame,
40-
the PStatClient boils this array into a datagram for shipping to the server.
41-
Each start() and stop() event requires 6 bytes; if the resulting datagram will
42-
fit within a UDP packet (1K bytes, or about 84 start/stop pairs), it is sent
43-
via UDP; otherwise, it is sent on the TCP channel. (Some fraction of the
44-
packets that are eligible for UDP, from 0% to 100%, may be sent via TCP
45-
instead; you can specify this with the pstats-tcp-ratio Config.prc variable.)
33+
PStatServer via a TCP connection to the hostname and port named in the
34+
``pstats-host`` and ``pstats-port`` Config.prc variables, respectively. (The default
35+
hostname and port are localhost and 5185.) You can also pass in a specific
36+
hostname and/or port to the :meth:`~.PStatClient.connect()` call. Upon
37+
successful connection and handshake with the server, the :class:`.PStatClient`
38+
sends a list of the available collectors, along with their names, colors, and
39+
hierarchical relationships, on the TCP channel.
40+
41+
Once connected, each call to :meth:`~.PStatCollector.start()` and
42+
:meth:`~.PStatCollector.stop()` adds a collector number and timestamp to an
43+
array maintained by the PStatClient. At the end of each frame, the PStatClient
44+
boils this array into a datagram for shipping to the server.
45+
Each :meth:`~.PStatCollector.start()` and :meth:`~.PStatCollector.stop()` event
46+
requires 6 bytes; if the resulting datagram will fit within a UDP packet (1K
47+
bytes, or about 84 start/stop pairs), it is sent via UDP; otherwise, it is sent
48+
on the TCP channel. (Some fraction of the packets that are eligible for UDP,
49+
from 0% to 100%, may be sent via TCP instead; you can specify this with the
50+
``pstats-tcp-ratio`` Config.prc variable.)
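
The packet-size reasoning above can be checked with a few lines of arithmetic. This sketch uses only the two figures given in the text (6 bytes per event, 1K bytes per packet); the function names are hypothetical, and it ignores both any per-datagram overhead and the ``pstats-tcp-ratio`` setting, which is presumably why it arrives at a slightly higher bound than the "about 84" quoted above.

```python
EVENT_SIZE_BYTES = 6      # per start() or stop() event, as stated above
UDP_LIMIT_BYTES = 1024    # the "1K" packet size, as stated above


def max_pairs(limit=UDP_LIMIT_BYTES, event_size=EVENT_SIZE_BYTES):
    """Upper bound on start/stop pairs per packet, ignoring any header."""
    return limit // (2 * event_size)


def choose_channel(num_events, limit=UDP_LIMIT_BYTES,
                   event_size=EVENT_SIZE_BYTES):
    """Return "udp" if the event payload fits within the limit, else "tcp"."""
    return "udp" if num_events * event_size <= limit else "tcp"
```

For example, a frame with 100 events (600 bytes) would be eligible for UDP under these assumptions, while one with 400 events (2400 bytes) would go over TCP.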
4651

4752
Also, to prevent flooding the network and/or overwhelming the PStats server,
4853
only so many frames of data will be sent per second. This parameter is
49-
controlled by the pstats-max-rate Config.prc variable and is set to 30 by
54+
controlled by the ``pstats-max-rate`` Config.prc variable and is set to 30 by
5055
default. (If the packets are larger than 1K, the max transmission rate is also
5156
automatically reduced further in proportion.) If the frame rate is higher than
5257
this limit, some frames will simply not be transmitted. The server is designed
5358
to cope with missing frames and will assume missing frames are similar to their
5459
neighbors.
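
To make the effect of the rate limit concrete, here is a minimal sketch of one simple way such a limit could behave. It is not Panda3D's actual implementation; the ``frames_to_send`` function is hypothetical, and exact fractions are used only to avoid floating-point rounding in the example.

```python
from fractions import Fraction


def frames_to_send(frame_times, max_rate):
    """Return the frame times that pass a simple rate limit."""
    interval = Fraction(1, max_rate)
    sent = []
    next_allowed = None
    for t in frame_times:
        # Send a frame only if enough time has passed since the last one.
        if next_allowed is None or t >= next_allowed:
            sent.append(t)
            next_allowed = t + interval
    return sent


# One second of frames at 60 fps, limited to 30 frames per second.
times = [Fraction(i, 60) for i in range(60)]
sent = frames_to_send(times, max_rate=30)
```

In this example every second frame is skipped, so 30 of the 60 frames are transmitted.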
5560

61+
Finally, to prevent an excessive backlog building up if there is too much data
62+
for the transmission to handle, Panda3D will only queue up a certain number of
63+
frames of data at a time. This is determined by the value of the
64+
``pstats-max-queue-size`` variable. If the backlog of frames to send is greater
65+
than this value, subsequent frames are dropped. Note that each thread sends its
66+
own frame, so you need to make sure this value is large enough to
67+
accommodate all the threads sending data at once.
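
The queue behavior described above can be sketched as a bounded queue that refuses new frames once it is full. This is an illustrative model only, not the Panda3D implementation; the ``FrameQueue`` class and the thread names are hypothetical.

```python
from collections import deque


class FrameQueue:
    """A bounded queue that drops new frames once it is full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.pending = deque()
        self.dropped = 0

    def push(self, frame):
        """Queue a frame; return False (and count a drop) if the queue is full."""
        if len(self.pending) >= self.max_size:
            self.dropped += 1
            return False
        self.pending.append(frame)
        return True

    def pop(self):
        """Remove and return the oldest queued frame, or None if empty."""
        return self.pending.popleft() if self.pending else None


# Three threads each submit a frame at once, but only two slots exist.
queue = FrameQueue(max_size=2)
results = [queue.push((thread, 0)) for thread in ("Main", "Cull", "Draw")]
```

Here the third thread's frame is dropped, which is why the queue size should be at least the number of threads that send data.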
68+
5669
The server does all the work of analyzing the data after that. The client's next
5770
job is simply to clear its array and prepare itself for the next frame.
5871

@@ -64,7 +77,9 @@ server code is in pandatool/src/gtk-stats and pandatool/src/win-stats, for Unix
6477
and Windows, respectively. (There is also an OS-independent text-stats
6578
subdirectory, which builds a trivial PStats server that presents a scrolling-
6679
text interface. This is mainly useful as a proof of technology rather than as a
67-
usable tool.)
80+
usable tool, but it does have an option to output the data in JSON format so
81+
that it can be analyzed by a third-party application, if the PStats server is
82+
not available for this platform.)
6883

6984
The GUI-specific code is the part that manages the interaction with the user via
7085
the creation of windows and the handling of mouse input, etc.; most of the real
@@ -74,24 +89,25 @@ directory.
7489
The PStatServer owns all of the connections, and uses network sockets to
7590
communicate with the clients. It listens on the specified port for new
7691
connections, using the pstats-port Config.prc variable to determine the port
77-
number (this is the same variable that specifies the port to the client).
92+
number (this is the same variable that specifies the port to the client),
93+
although this can be overridden by using the ``-p`` option on the command-line.
7894
Usually you can leave this at its default value of 5185, but there may be some
7995
cases in which that port is already in use on a particular machine (for
8096
instance, maybe someone else is running another PStats server on another display
8197
of the same machine).
8298

8399
Once a connection is received, it creates a PStatMonitor class (this class is
84100
specialized for each of the different GUI variants) that handles all the data
85-
for this particular connection. In the case of the windows pstats.exe program,
86-
each new monitor instance is represented by a new toplevel window. Multiple
87-
monitors can be active at once.
101+
for this particular connection. A PStatMonitor is also created when a session
102+
is loaded from a file.
88103

89104
The work of digesting the data from the client is performed by the PStatView
90105
class, which analyzes the pattern of start and stop timestamps, along with the
91106
relationship data of the various collectors, and boils it down into a list of
92107
the amount of time spent in each collector per frame.
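
As a simplified picture of this kind of analysis, the sketch below sums the elapsed time per collector from a list of start and stop events. It is not the PStatView algorithm; it ignores the collector hierarchy entirely, and the event tuple layout and the ``time_per_collector`` function are assumptions made for the example.

```python
from collections import defaultdict


def time_per_collector(events):
    """Sum elapsed time per collector from (name, is_start, timestamp) events."""
    started = defaultdict(list)   # per-collector stack of open start times
    totals = defaultdict(float)
    for name, is_start, timestamp in events:
        if is_start:
            started[name].append(timestamp)
        elif started[name]:
            # Pair this stop with the most recent unmatched start.
            totals[name] += timestamp - started[name].pop()
    return dict(totals)


# Hypothetical events for one frame; timestamps are in milliseconds.
events = [
    ("App", True, 0.0),
    ("Collisions", True, 1.0),
    ("Collisions", False, 3.0),
    ("App", False, 5.0),
    ("Cull", True, 5.0),
    ("Cull", False, 9.0),
]
totals = time_per_collector(events)
```

Note that the time for "App" in this sketch includes the time spent in "Collisions"; separating the two requires the parent/child relationship data mentioned above.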
93108

94-
Finally, a PStatStripChart or PStatPianoRoll class object defines the actual
95-
graph output of colored lines and bars; the generic versions of these include
96-
virtual functions to do the actual drawing (the GUI specializations of these
97-
redefine these methods to make the appropriate calls).
109+
Finally, a PStatStripChart, PStatFlameGraph, PStatTimeline or PStatPianoRoll
110+
class object defines the actual graph output of colored lines and bars; the
111+
generic versions of these include virtual functions to do the actual drawing
112+
(the GUI specializations of these redefine these methods to make the appropriate
113+
calls).
