Diffstat (limited to 'cachegrind/docs/cg-manual.xml')
-rw-r--r-- cachegrind/docs/cg-manual.xml 130
1 files changed, 58 insertions, 72 deletions
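This patch documents Cachegrind's simulated hierarchy, which has separate I1 and D1 caches backed by a unified L2. It also documents the `--I1`, `--D1` and `--L2` options, each of which takes `size,associativity,line_size` (size and line size in bytes). The C sketch below shows how such a triple fixes a cache's geometry (number of sets = size / (associativity * line size)) and how a miss in a first-level cache falls through to L2.

It is a hypothetical illustration, not Cachegrind's source. The names `cache_t`, `cache_init`, `cache_access`, `access_l1_l2` and `cache_free` are invented here, and LRU replacement is assumed. Requiring a power-of-two set count is a constraint of this sketch only, and accesses that straddle two cache lines are ignored.

```c
/* Hypothetical sketch of one set-associative cache level, configured
   like the --I1/--D1/--L2=size,assoc,line_size options.
   Assumes LRU replacement; not Cachegrind's actual implementation. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EMPTY (~0UL) /* invalid way; never a real block number (line_size >= 2) */

typedef struct {
    int size, assoc, line_size;     /* total bytes, ways per set, bytes per line */
    int sets;                       /* size / (assoc * line_size) */
    int line_shift;                 /* log2(line_size) */
    unsigned long *tags;            /* sets*assoc block numbers, MRU first per set */
    unsigned long accesses, misses;
} cache_t;

static int is_pow2(int x) { return x > 0 && (x & (x - 1)) == 0; }

/* Parse a spec such as "32768,8,64"; returns 0 on success, -1 if rejected. */
static int cache_init(cache_t *c, const char *spec) {
    memset(c, 0, sizeof *c);
    if (sscanf(spec, "%d,%d,%d", &c->size, &c->assoc, &c->line_size) != 3)
        return -1;
    /* This sketch's own constraints: power-of-two line size >= 2 and a
       whole, power-of-two number of sets. */
    if (c->assoc <= 0 || c->line_size < 2 || !is_pow2(c->line_size))
        return -1;
    if (c->size <= 0 || c->size % (c->assoc * c->line_size) != 0)
        return -1;
    c->sets = c->size / (c->assoc * c->line_size);
    if (!is_pow2(c->sets))
        return -1;
    while ((1 << c->line_shift) < c->line_size)
        c->line_shift++;
    size_t n = (size_t)c->sets * (size_t)c->assoc;
    c->tags = malloc(n * sizeof *c->tags);
    if (c->tags == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        c->tags[i] = EMPTY;
    return 0;
}

/* Simulate one access to the line containing addr; returns 1 on a miss. */
static int cache_access(cache_t *c, unsigned long addr) {
    assert(c->tags != NULL);
    unsigned long block = addr >> c->line_shift;
    size_t set_index = (size_t)(block & (unsigned long)(c->sets - 1));
    unsigned long *set = &c->tags[set_index * (size_t)c->assoc];
    c->accesses++;
    int way = 0;
    while (way < c->assoc && set[way] != block)
        way++;
    int miss = (way == c->assoc);
    if (miss) {
        c->misses++;
        way = c->assoc - 1;         /* evict the least recently used way */
    }
    /* Shift more recently used ways down one slot; put this block at MRU. */
    memmove(&set[1], &set[0], (size_t)way * sizeof *set);
    set[0] = block;
    return miss;
}

/* An I1 or D1 miss is looked up in the unified L2. */
static void access_l1_l2(cache_t *l1, cache_t *l2, unsigned long addr) {
    if (cache_access(l1, addr))
        cache_access(l2, addr);
}

static void cache_free(cache_t *c) {
    free(c->tags);
    c->tags = NULL;
}
```

For example, initialising a D1 with `"256,2,64"` gives two sets of two 64-byte lines. Feeding the addresses 0, 128, 0, 256, 128, 64 through `access_l1_l2` then produces five D1 misses out of six accesses, because 0, 128 and 256 all map to set 0 and evict one another.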
diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
index 80f2a8c28..9b4b0b9a2 100644
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -59,9 +59,6 @@ and test coverage.</para>
additionally specify <computeroutput>--branch-sim=yes</computeroutput>
on the command line.</para>
-<para>Any feedback, bug-fixes, suggestions, etc, welcome.</para>
-
-
<sect2 id="cg-manual.overview" xreflabel="Overview">
<title>Overview</title>
@@ -119,7 +116,7 @@ outputs of multiple Cachegrind runs, into a single file which you then
use as the input for
<computeroutput>cg_annotate</computeroutput>.</para>
-<para>The steps are described in detail in the following
+<para>These steps are described in detail in the following
sections.</para>
</sect2>
@@ -128,14 +125,14 @@ sections.</para>
<sect2 id="cache-sim" xreflabel="Cache simulation specifics">
<title>Cache simulation specifics</title>
-<para>Cachegrind uses a simulation for a machine with a split L1
-cache and a unified L2 cache. This configuration is used for all
-(modern) x86-based machines we are aware of. Old Cyrix CPUs had
-a unified I and D L1 cache, but they are ancient history
-now.</para>
+<para>Cachegrind simulates a machine with independent
+first level instruction and data caches (I1 and D1), backed by a
+unified second level cache (L2). This configuration is used by almost
+all modern machines. Some old Cyrix CPUs had a unified I and D L1
+cache, but they are ancient history now.</para>
-<para>The more specific characteristics of the simulation are as
-follows.</para>
+<para>Specific characteristics of the simulation are as
+follows:</para>
<itemizedlist>
@@ -162,9 +159,9 @@ follows.</para>
<listitem>
<para>Inclusive L2 cache: the L2 cache replicates all the
entries of the L1 cache. This is standard on Pentium chips,
- but AMD Athlons use an exclusive L2 cache that only holds
- blocks evicted from L1. Ditto AMD Durons and most modern
- VIAs.</para>
+ but AMD Opterons, Athlons and Durons
+ use an exclusive L2 cache that only holds
+ blocks evicted from L1. Ditto most modern VIA CPUs.</para>
</listitem>
</itemizedlist>
@@ -182,6 +179,14 @@ happens. You can manually specify one, two or all three levels
<computeroutput>--D1</computeroutput> and
<computeroutput>--L2</computeroutput> options.</para>
+<para>On PowerPC platforms
+Cachegrind cannot automatically
+determine the cache configuration, so you will
+need to specify it with the
+<computeroutput>--I1</computeroutput>,
+<computeroutput>--D1</computeroutput> and
+<computeroutput>--L2</computeroutput> options.</para>
+
<para>Other noteworthy behaviour:</para>
@@ -385,9 +390,11 @@ programs that spawn child processes.</para>
<title>Cachegrind options</title>
<!-- start of xi:include in the manpage -->
-<para id="cg.opts.para">Manually specifies the I1/D1/L2 cache
-configuration, where <varname>size</varname> and
-<varname>line_size</varname> are measured in bytes. The three items
+<para id="cg.opts.para">Using command line options, you can
+manually specify the I1/D1/L2 cache
+configuration to simulate. For each cache, you can specify the
+size, associativity and line size. The size and line size
+are measured in bytes. The three items
must be comma-separated, but with no spaces, eg:
<literallayout> valgrind --tool=cachegrind --I1=65535,2,64</literallayout>
@@ -551,7 +558,7 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
<para>Events recorded: event abbreviations are:</para>
<itemizedlist>
<listitem>
- <para><computeroutput>Ir </computeroutput>: I cache reads
+ <para><computeroutput>Ir</computeroutput>: I cache reads
(ie. instructions executed)</para>
</listitem>
<listitem>
@@ -563,7 +570,7 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
instruction read misses</para>
</listitem>
<listitem>
- <para><computeroutput>Dr </computeroutput>: D cache reads
+ <para><computeroutput>Dr</computeroutput>: D cache reads
(ie. memory reads)</para>
</listitem>
<listitem>
@@ -575,7 +582,7 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
read misses</para>
</listitem>
<listitem>
- <para><computeroutput>Dw </computeroutput>: D cache writes
+ <para><computeroutput>Dw</computeroutput>: D cache writes
(ie. memory writes)</para>
</listitem>
<listitem>
@@ -613,8 +620,8 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
</listitem>
<listitem>
- <para>Events shown: the events shown (a subset of events
- gathered). This can be adjusted with the
+ <para>Events shown: the events shown, which is a subset of the events
+ gathered. This can be adjusted with the
<computeroutput>--show</computeroutput> option.</para>
</listitem>
@@ -637,8 +644,8 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
<listitem>
<para>Threshold: <computeroutput>cg_annotate</computeroutput>
- by default omits functions that cause very low numbers of
- misses to avoid drowning you in information. In this case,
+ by default omits functions that cause very low counts
+ to avoid drowning you in information. In this case,
cg_annotate shows summaries the functions that account for
99% of the <computeroutput>Ir</computeroutput> counts;
<computeroutput>Ir</computeroutput> is chosen as the
@@ -682,28 +689,9 @@ unloading of shared objects) its counts are aggregated into a
single cost centre written as
<computeroutput>(discarded):(discarded)</computeroutput>.</para>
-<para>It is worth noting that functions will come from three
-types of source files:</para>
-
-<orderedlist>
- <listitem>
- <para>From the profiled program
- (<filename>concord.c</filename> in this example).</para>
- </listitem>
- <listitem>
- <para>From libraries (eg. <filename>getc.c</filename>)</para>
- </listitem>
- <listitem>
- <para>From Valgrind's implementation of some libc functions
- (eg. <computeroutput>vg_clientmalloc.c:malloc</computeroutput>).
- These are recognisable because the filename begins with
- <computeroutput>vg_</computeroutput>, and is probably one of
- <filename>vg_main.c</filename>,
- <filename>vg_clientmalloc.c</filename> or
- <filename>vg_mylibc.c</filename>.</para>
- </listitem>
-
-</orderedlist>
+<para>It is worth noting that functions will come both from
+the profiled program (eg. <filename>concord.c</filename>)
+and from libraries (eg. <filename>getc.c</filename>).</para>
<para>There are two ways to annotate source files -- by choosing
them manually, or with the
@@ -759,7 +747,7 @@ found in one of the directories specified with the
and file are both given.</para>
<para>Each line is annotated with its event counts. Events not
-applicable for a line are represented by a `.'; this is useful
+applicable for a line are represented by a dot. This is useful
for distinguishing between an event which cannot happen, and one
which can but did not.</para>
@@ -1063,7 +1051,7 @@ warnings.</para>
<listitem>
<para>Files with more than 65,535 lines cause difficulties
- for the stabs debug info reader. This is because the line
+ for the Stabs-format debug info reader. This is because the line
number in the <computeroutput>struct nlist</computeroutput>
defined in <filename>a.out.h</filename> under Linux is only a
16-bit value. Valgrind can handle some files with more than
@@ -1071,6 +1059,11 @@ warnings.</para>
line number overflows. But some cases are beyond it, in
which case you'll get a warning message explaining that
annotations for the file might be incorrect.</para>
+
+ <para>If you are using gcc 3.1 or later, this is most likely
+ irrelevant, since gcc switched to using the more modern DWARF2
+ format by default at version 3.1. DWARF2 does not have any such
+ limitations on line numbers.</para>
</listitem>
<listitem>
@@ -1087,14 +1080,6 @@ warnings.</para>
<para>This list looks long, but these cases should be fairly
rare.</para>
-<formalpara>
- <title>Note:</title>
- <para><computeroutput>stabs</computeroutput> is not an easy
- format to read. If you come across bizarre annotations that
- look like might be caused by a bug in the stabs reader, please
- let us know.</para>
-</formalpara>
-
</sect2>
@@ -1112,16 +1097,17 @@ shortcomings:</para>
</listitem>
<listitem>
- <para>It doesn't account for other process activity (although
- this is probably desirable when considering a single
- program).</para>
+ <para>It doesn't account for other process activity.
+ This is probably desirable when considering a single
+ program.</para>
</listitem>
<listitem>
<para>It doesn't account for virtual-to-physical address
- mappings; hence the entire simulation is not a true
+ mappings. Hence the simulation is not a true
representation of what's happening in the
- cache.</para>
+ cache. Most caches are physically indexed, but Cachegrind
+ simulates caches using virtual addresses.</para>
</listitem>
<listitem>
@@ -1157,17 +1143,17 @@ shortcomings:</para>
</itemizedlist>
-<para>Another thing worth nothing is that results are very
-sensitive. Changing the size of the
-the executable being profiled, or the size of the the shared objects
-it uses, or even the length of its name can perturb the
-results. Variations will be small, but don't expect perfectly
-repeatable results if your program changes at all.</para>
-
-<para>Beware also of address space randomisation, which many Linux
-distros now do by default. This loads the program and its libraries
-at different randomly chosen address each run, and may also disturb
-the results.</para>
+<para>Another thing worth noting is that results are very sensitive.
Changing the size of the executable being profiled, or the sizes
+of any of the shared libraries it uses, or even the length of their
+file names, can perturb the results. Variations will be small, but
+don't expect perfectly repeatable results if your program changes at
+all.</para>
+
+<para>More recent GNU/Linux distributions do address space
+randomisation, in which identical runs of the same program have their
+shared libraries loaded at different locations, as a security measure.
+This also perturbs the results.</para>
<para>While these factors mean you shouldn't trust the results to
be super-accurate, hopefully they should be close enough to be