<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>

<chapter id="cl-manual" xreflabel="Callgrind Manual">
<title>Callgrind: a heavyweight profiler</title>


<sect1 id="cl-manual.use" xreflabel="Overview">
<title>Overview</title>

<para>Callgrind is a Valgrind tool for profiling programs,
with the ability to construct a call graph from the execution.
By default, the collected data consists of
the number of instructions executed, their attribution
to source lines, the caller/callee relationships between functions,
and the number of calls actually executed.
Optionally, a cache simulator (similar to Cachegrind) can produce
further information about the memory access behavior of the application.
</para>

<para>The profile data is written out to a file at program
termination. For presentation of the data, and interactive control
of the profiling, two command line tools are provided:</para>
<variablelist>
  <varlistentry>
  <term><command>callgrind_annotate</command></term>
  <listitem>
    <para>This command reads in the profile data and prints a
    sorted list of functions, optionally with source annotation.</para>
<!--
    <para>You can read the manpage here: <xref
	      linkend="callgrind-annotate"/>.</para>
-->
    <para>For graphical visualization of the data, check out
    <ulink url="&cl-gui;">KCachegrind</ulink>.</para>

  </listitem>
  </varlistentry>

  <varlistentry>
  <term><command>callgrind_control</command></term>
  <listitem>
    <para>This command enables you to interactively observe and control 
    the status of currently running applications, without stopping
    the application.  You can 
    get statistics information as well as the current stack trace, and
    you can request zeroing of counters or dumping of profile data.</para>
<!--
    <para>You can read the manpage here: <xref linkend="callgrind-control"/>.</para>
-->
  </listitem>
  </varlistentry>
</variablelist>

<para>To use Callgrind, you must specify 
<computeroutput>--tool=callgrind</computeroutput> on the Valgrind 
command line.</para>

  <sect2 id="cl-manual.functionality" xreflabel="Functionality">
  <title>Functionality</title>

<para>Cachegrind provides a flat profile: event counts (reads, misses, etc.)
attributed to a function exactly represent events which happened while the
function itself was running; this is also called the <emphasis>self</emphasis>
or <emphasis>exclusive</emphasis> cost. In addition, Callgrind attributes
to each call site inside a function the event counts for events which
happened while the call was active, i.e. while code was executed which was
called from the given call site. Adding these call costs to the self cost of
a function gives the so-called <emphasis>inclusive</emphasis> cost.
As an example, the inclusive cost of <computeroutput>main()</computeroutput> should
be almost 100 percent (apart from any cost spent in startup before main, such as
initialization of the runtime linker or construction of global C++ objects).
</para>

<para>Together with the call graph, this allows you to see the call chains starting
from <computeroutput>main()</computeroutput> in which most of the
events happened. This is especially useful for functions called from
multiple call sites, where optimization only makes sense by changing
code in the caller (e.g. by reducing the call count).</para>

<para>Callgrind's cache simulation is based on the 
<ulink url="&cg-tool-url;">Cachegrind tool</ulink>. Read 
<ulink url="&cg-doc-url;">Cachegrind's documentation</ulink> first; 
this page describes the features supported in addition to 
Cachegrind's features.</para>

<para>Callgrind's ability to trace function calls varies with the ISA of the
platform it is run on. It was specially tailored for x86 and amd64, and
unfortunately it currently shows quite poor call/return detection
for PPC32/64 code (this is because the PPC ISA only has jump/branch
instructions, so Callgrind has to rely on heuristics).</para>

  </sect2>

  <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
  <title>Basic Usage</title>

  <para>As with Cachegrind, you probably want to compile with debugging info
  (the -g flag), but with optimization turned on.</para>

  <para>To start a profile run for a program, execute:
  <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
  </para>

  <para>While the simulation is running, you can observe execution with
  <screen>callgrind_control -b</screen>
  This will print out the current backtrace. To annotate the backtrace with
  event counts, run
  <screen>callgrind_control -e -b</screen>
  </para>

  <para>After program termination, a profile data file named 
  <computeroutput>callgrind.out.pid</computeroutput>
  is generated, where <emphasis>pid</emphasis> is the process ID 
  of the profiled run.
  The data file contains information about the calls made
  between the executed functions, together with events of type
  <command>Instruction Read Accesses</command> (Ir).</para>

  <para>To generate a function-by-function summary from the profile
  data file, use
  <screen>callgrind_annotate [options] callgrind.out.pid</screen>
  This summary is similar to the output you get from a Cachegrind
  run with <computeroutput>cg_annotate</computeroutput>: the list
  of functions is ordered by, and annotated with, their exclusive cost.
  The following two options are important for the additional features
  of Callgrind (an example command combining them is shown after the list):</para>

  <itemizedlist>
    <listitem>
      <para><option>--inclusive=yes</option>: Instead of using
      the exclusive cost of functions as the sorting order, use and show
      the inclusive cost.</para>
    </listitem>

    <listitem>
      <para><option>--tree=both</option>: Interleaved into the
      ordered list of functions, show the callers and the callees
      of each function. These lines represent executed
      calls, and the cost given is the number of events spent in the call.
      Indented above each function is the list of its callers,
      and below it the list of its callees. The sum of events in calls to
      a given function (caller lines), as well as the sum of events in
      calls from the function (callee lines) together with the self
      cost, gives the total inclusive cost of the function.</para>
     </listitem>
  </itemizedlist>
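  <para>For example, assuming a dump file produced by a run with PID 12345
  (a made-up PID for illustration), the following command shows an
  inclusive-cost ordering together with caller/callee lines:</para>
<screen>callgrind_annotate --inclusive=yes --tree=both callgrind.out.12345</screen>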

  <para>Use <option>--auto=yes</option> to get annotated source code
  for all relevant functions for which the source can be found. In
  addition to source annotation as produced by
  <computeroutput>cg_annotate</computeroutput>, you will see the
  annotated call sites with call counts. For all other options, look
  up the manual for <computeroutput>cg_annotate</computeroutput>.
  </para>

  <para>For a better call graph browsing experience, it is highly recommended
  to use <ulink url="&cl-gui;">KCachegrind</ulink>. If your code
  spends significant fractions of its cost in <emphasis>cycles</emphasis> (sets
  of functions calling each other in a recursive manner), you have to
  use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
  currently does not do any cycle detection, which is important for correct
  results in this case.</para>

  <para>If you are additionally interested in measuring the 
  cache behavior of your program, use Callgrind with the option
  <option><xref linkend="opt.simulate-cache"/>=yes</option>.
  However, expect a further slowdown of approximately a factor of 2.</para>
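  <para>For example, to profile a hypothetical program
  <computeroutput>my_program</computeroutput> with cache simulation
  switched on, a run could look like this:</para>
<screen>valgrind --tool=callgrind --simulate-cache=yes my_program</screen>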

  <para>If the program section you want to profile is somewhere in the
  middle of the run, it is beneficial to 
  <emphasis>fast forward</emphasis> to this section without any 
  profiling, and then switch profiling on.  This is achieved by using
  <option><xref linkend="opt.instr-atstart"/>=no</option> 
  and interactively running 
  <computeroutput>callgrind_control -i on</computeroutput> just before the 
  interesting code section is executed. To specify
  the exact code position where profiling should start, use the client request
  <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>.</para>
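  <para>As a minimal sketch, the client requests could be used like this in
  your source code; here <computeroutput>heavy_computation()</computeroutput>
  is only a placeholder for your own code, and the program would be started
  with <option>--instr-atstart=no</option>:</para>
<screen><![CDATA[
#include <valgrind/callgrind.h>

/* Placeholder for the code section you actually want to profile. */
static void heavy_computation(void)
{
    volatile long sum = 0;
    for (long i = 0; i < 1000000; i++)
        sum += i;
}

int main(void)
{
    /* ... uninteresting setup runs here without instrumentation ... */

    CALLGRIND_START_INSTRUMENTATION;
    heavy_computation();
    CALLGRIND_STOP_INSTRUMENTATION;

    return 0;
}
]]></screen>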

  <para>If you want to be able to see assembler annotation, specify
  <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
  profile data at instruction granularity. Note that the resulting profile
  data can only be viewed with KCachegrind. For assembler annotation, it is also
  interesting to see more details of the control flow inside of functions,
  i.e. (conditional) jumps. This information is collected by additionally specifying
  <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
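  <para>Combining the two, a run preparing for assembler annotation in
  KCachegrind could look like this (again with
  <computeroutput>my_program</computeroutput> standing in for your own
  program):</para>
<screen>valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes my_program</screen>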

  </sect2>

</sect1>

<sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
<title>Advanced Usage</title>

  <sect2 id="cl-manual.dumps" 
         xreflabel="Multiple dumps from one program run">
  <title>Multiple profiling dumps from one program run</title>

  <para>Often, you are not interested in the characteristics of a full 
  program run, but only of a small part of it (e.g. the execution of one
  algorithm).  If there are multiple algorithms, or one algorithm 
  running with different input data, it may even be useful to collect
  separate profile information for multiple parts of one program run.</para>

  <para>Profile data files have names of the form
<screen>
callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
</screen>
  </para>
  <para>where <emphasis>pid</emphasis> is the PID of the running 
  program, <emphasis>part</emphasis> is a number incremented on each
  dump (".part" is skipped for the dump at program termination), and 
  <emphasis>threadID</emphasis> is a thread identifier 
  ("-threadID" is only used if you request dumps of individual 
  threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
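  <para>For example, for a single-threaded run with PID 12345 (a made-up
  PID for illustration), two explicitly requested dumps followed by the
  dump at program termination would result in files such as:</para>
<screen>
callgrind.out.12345.1
callgrind.out.12345.2
callgrind.out.12345
</screen>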

  <para>There are different ways to generate multiple profile dumps 
  while a program is running under Callgrind's supervision.  Nevertheless,
  all methods trigger the same action, which is "dump all profile 
  information since the last dump or program start, and zero cost 
  counters afterwards".  To allow for zeroing cost counters without
  dumping, there is a second action "zero all cost counters now". 
  The different methods are:</para>
  <itemizedlist>

    <listitem>
      <para><command>Dump on program termination.</command>
      This method is the standard way and doesn't need any special
      action from your side.</para>
    </listitem>

    <listitem>
      <para><command>Spontaneous, interactive dumping.</command> Use
      <screen>callgrind_control -d [hint [PID/Name]]</screen> to 
      request the dumping of profile information of the supervised
      application with PID or Name.  <emphasis>hint</emphasis> is an
      arbitrary string you can optionally specify to later be able to
      distinguish profile dumps.  The control program will not terminate
      before the dump is completely written.  Note that the application
      must be actively running for the dump command to be detected. So,
      for a GUI application, resize the window; for a server, send it a
      request.</para>
      <para>If you are using <ulink url="&cl-gui;">KCachegrind</ulink>
      for browsing of profile information, you can use the toolbar
      button <command>Force dump</command>. This will request a dump
      and trigger a reload after the dump is written.</para>
    </listitem>

    <listitem>
      <para><command>Periodic dumping after execution of a specified
      number of basic blocks</command>. For this, use the command line
      option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
      </para>
    </listitem>

    <listitem>
      <para><command>Dumping at enter/leave of all functions whose names
      start with</command> <emphasis>funcprefix</emphasis>.  Use the
      options <option><xref linkend="opt.dump-before"/>=funcprefix</option>
      and <option><xref linkend="opt.dump-after"/>=funcprefix</option>.
      To zero cost counters before entering a function, use
      <option><xref linkend="opt.zero-before"/>=funcprefix</option>.
      The prefix method for specifying function names was chosen to
      ease the use with C++: you don't have to specify full
      signatures.</para> <para>You can specify these options multiple
      times for different function prefixes.</para>
    </listitem>

    <listitem>
      <para><command>Program controlled dumping.</command>
      Put <screen><![CDATA[#include <valgrind/callgrind.h>]]></screen>
      into your source and add 
      <computeroutput>CALLGRIND_DUMP_STATS;</computeroutput> where you
      want a dump to happen. Use 
      <computeroutput>CALLGRIND_ZERO_STATS;</computeroutput> to only 
      zero cost counters (a short example follows this list).</para>
      <para>In Valgrind terminology, this method is called "Client
      requests".  The given macros generate a special instruction
      pattern with no effect at all (i.e. a NOP). When run under
      Valgrind, the CPU simulation engine detects the special
      instruction pattern and triggers special actions like the ones
      described above.</para>
    </listitem>
  </itemizedlist>
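  <para>As a minimal sketch of program controlled dumping, the client
  requests could be placed around the program part of interest like this
  (<computeroutput>run_algorithm()</computeroutput> is only a placeholder
  for your own code):</para>
<screen><![CDATA[
#include <valgrind/callgrind.h>

/* Placeholder for the algorithm you want a separate dump for. */
static void run_algorithm(void)
{
    volatile long sum = 0;
    for (long i = 0; i < 1000000; i++)
        sum += i;
}

int main(void)
{
    CALLGRIND_ZERO_STATS;   /* forget all costs collected so far        */
    run_algorithm();
    CALLGRIND_DUMP_STATS;   /* write a dump covering only run_algorithm */
    return 0;
}
]]></screen>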

  <para>If you are running a multi-threaded application and specify the
  command line option <option><xref linkend="opt.separate-threads"/>=yes</option>, 
  every thread will be profiled on its own and will create its own
  profile dump. Thus, the last two methods will only generate one dump
  of the currently running thread. With the other methods, you will get
  multiple dumps (one for each thread) on a dump request.</para>

  </sect2>



  <sect2 id="cl-manual.limits" 
         xreflabel="Limiting range of event collection">
  <title>Limiting the range of collected events</title>

  <para>For events (function enter/leave,
  instruction execution, memory access) to be aggregated into event counts,
  two conditions must be met: first, the events must be recognizable by
  Callgrind, and second, the collection state must be switched on.</para>

  <para>Event collection is only possible if <emphasis>instrumentation</emphasis>
  for program code is switched on. This is the default, but for faster
  execution (identical to <computeroutput>valgrind --tool=none</computeroutput>),
  it can be switched off until the program reaches a state in which
  you want to start collecting profiling data.  
  Callgrind can start without instrumentation
  by specifying the option <option><xref linkend="opt.instr-atstart"/>=no</option>.
  Instrumentation can be switched on interactively
  with <screen>callgrind_control -i on</screen>
  and off by specifying "off" instead of "on".
  Furthermore, the instrumentation state can be changed programmatically with
  the macros <computeroutput>CALLGRIND_START_INSTRUMENTATION;</computeroutput>
  and <computeroutput>CALLGRIND_STOP_INSTRUMENTATION;</computeroutput>.
  </para>
  
  <para>In addition to enabling instrumentation, you must also enable
  event collection for the parts of your program you are interested in.
  By default, event collection is enabled everywhere.
  You can limit collection to specific function(s)
  by using 
  <option><xref linkend="opt.toggle-collect"/>=funcprefix</option>. 
  This will toggle the collection state on entering and leaving
  the specified functions.
  When this option is in effect, the default collection state
  at program start is "off".  Only events happening while running
  inside of functions starting with <emphasis>funcprefix</emphasis> will
  be collected. Recursive
  calls of functions with <emphasis>funcprefix</emphasis> do not trigger
  any action.</para>
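  <para>For example, to collect events only while running inside functions
  whose names start with the (hypothetical) prefix
  <computeroutput>compute</computeroutput>, you could run:</para>
<screen>valgrind --tool=callgrind --toggle-collect=compute my_program</screen>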

  <para>It is important to note that with instrumentation switched off, the
  cache simulator cannot see any memory access events, and thus any
  simulated cache state will be frozen and wrong while instrumentation is off.
  Therefore, to get useful cache events (hits/misses) after switching on
  instrumentation, the simulated cache first has to warm up,
  probably leading to many <emphasis>cold misses</emphasis>
  which would not have happened in reality. If you do not want to see these,
  start event collection a few million instructions after you have switched
  on instrumentation.</para>


  </sect2>



  <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
  <title>Avoiding cycles</title>

  <para>A group of functions is called a cycle if, for any two of them,
  there is a call chain from one to the other.  For example,
  with A calling B, B calling C, and C calling A, the three functions
  A, B and C form one cycle.</para>

  <para>If a call chain goes multiple times around inside of a cycle,
  the profile data cannot distinguish event counts coming from the
  first round from those of the second. Thus, it makes no sense to attach any inclusive
  cost to a call among functions inside of one cycle.
  If "A &gt; B" appears multiple times in a call chain, you
  have no way to partition the one big sum of all appearances of "A &gt;
  B".  Thus, for profile data presentation, all functions of a cycle are
  seen as one big virtual function.</para>

  <para>Unfortunately, if your application uses some callback
  mechanism (as any GUI program does), or even just normal polymorphism (as
  in OO languages like C++), it's quite possible to get large cycles.
  As it is often impossible to say anything about performance behaviour
  inside of cycles, it is useful to introduce mechanisms to avoid
  cycles in call graphs.  This is done by treating the same
  function in different ways, depending on the current execution
  context: either by giving it different names, or by ignoring calls to
  certain functions.</para>

  <para>There is an option to ignore calls to a function with
  <option><xref linkend="opt.fn-skip"/>=funcprefix</option>.  E.g., you
  usually do not want to see the trampoline functions in the PLT sections
  for calls to functions in shared libraries. You can see the difference
  if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
  If a call is ignored, its cost events will be attributed to the
  enclosing function.</para>

  <para>If you have a recursive function, you can distinguish the first
  10 recursion levels by specifying
  <option><xref linkend="opt.fn-recursion-num"/>=funcprefix</option>,
  or do this for all functions with 
  <option><xref linkend="opt.fn-recursion"/>=10</option>, but the latter will 
  give you much bigger profile data files.  In the profile data, you will see
  the recursion levels of "func" as different functions with the names
  "func", "func'2", "func'3" and so on.</para>

  <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
  in your program, you usually get a "false" cycle "B &lt;&gt; C". Use 
  <option><xref linkend="opt.fn-caller-num"/>=B</option> 
  <option><xref linkend="opt.fn-caller-num"/>=C</option>,
  and functions "B" and "C" will be treated as different functions 
  depending on the direct caller. Using the apostrophe for appending 
  this "context" to the function name, you get "A &gt; B'A &gt; C'B" 
  and "A &gt; C'A &gt; B'C", and there will be no cycle. Use 
  <option><xref linkend="opt.fn-caller"/>=3</option> to get a 2-caller 
  dependency for all functions.  Note that doing this will increase
  the size of profile data files.</para>
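  <para>Combining the two separation mechanisms described above, a profile
  run could look like this, with <computeroutput>fib</computeroutput> being
  a hypothetical recursive function and
  <computeroutput>my_program</computeroutput> standing in for your own
  program:</para>
<screen>valgrind --tool=callgrind --fn-recursion10=fib --fn-caller=3 my_program</screen>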

  </sect2>

</sect1>


<sect1 id="cl-manual.options" xreflabel="Command line option reference">
<title>Command line option reference</title>

<para>
In the following, options are grouped into classes, in the same order as
the output of <computeroutput>valgrind --tool=callgrind --help</computeroutput>.
</para>

<sect2 id="cl-manual.options.misc" 
       xreflabel="Miscellaneous options">
<title>Miscellaneous options</title>

<variablelist id="cmd-options.misc">

  <varlistentry>
    <term><option>--help</option></term>
    <listitem>
      <para>Show summary of options. This is a short version of this
      manual section.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>--version</option></term>
    <listitem>
      <para>Show the version of Callgrind.</para>
    </listitem>
  </varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.creation" 
       xreflabel="Dump creation options">
<title>Dump creation options</title>

<para>
These options influence the name and format of the profile data files.
</para>

<variablelist id="cmd-options.creation">

  <varlistentry id="opt.base">
    <term>
      <option><![CDATA[--base=<prefix> [default: callgrind.out] ]]></option>
    </term>
    <listitem>
      <para>Specify the base name for the dump file names. To
      distinguish different profile runs of the same application,
      <computeroutput>.&lt;pid&gt;</computeroutput> is appended to the
      base dump file name, where
      <computeroutput>&lt;pid&gt;</computeroutput> is the process ID
      of the profile run (with multiple dumps happening, the file name
      is modified further; see below).</para> <para>This option is
      especially useful if your application changes its working
      directory.  Usually, the dump file is generated in the current
      working directory of the application at program termination.  By
      giving an absolute path with the base specification, you can force
      a fixed directory for the dump files.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
    <term>
      <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
    </term>
    <listitem>
      <para>This specifies that event counting should be performed at
      per-instruction granularity.
      This allows for assembler code
      annotation, but currently the results can only be shown with KCachegrind.</para>
  </listitem>
  </varlistentry>

  <varlistentry id="opt.dump-line" xreflabel="--dump-line">
    <term>
      <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>This specifies that event counting should be performed at
      source line granularity. This allows source
      annotation for sources which are compiled with debug information ("-g").</para>
  </listitem>
  </varlistentry>

  <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
    <term>
      <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>This option influences the output format of the profile data.
      It specifies whether strings (file and function names) should be
      identified by numbers. This shrinks the file size, but makes it more difficult
      for humans to read (which is not recommended anyway).</para>
      <para>However, this currently has to be switched off if
      the files are to be read by
      <computeroutput>callgrind_annotate</computeroutput>!</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
    <term>
      <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>This option influences the output format of the profile data.
      It specifies whether numerical positions are always specified as absolute
      values or are allowed to be relative to previous numbers.
      This shrinks the file size.</para>
      <para>However, this currently has to be switched off if
      the files are to be read by
      <computeroutput>callgrind_annotate</computeroutput>!</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
    <term>
      <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
    </term>
    <listitem>
      <para>When multiple profile data parts are to be generated, these
      parts are appended to the same output file if this option is set to
      "yes". Not recommended.</para>
  </listitem>
  </varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.activity" 
       xreflabel="Activity options">
<title>Activity options</title>

<para>
These options specify when actions relating to event counts are to
be executed. For interactive control use
<computeroutput>callgrind_control</computeroutput>.
</para>

<variablelist id="cmd-options.activity">

  <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
    <term>
      <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
    </term>
    <listitem>
      <para>Dump profile data every &lt;count&gt; basic blocks.
      Whether a dump is needed is only checked when Valgrind's internal
      scheduler is run. Therefore, the minimum useful setting is about 100000.
      The count is a 64-bit value to make long dump periods possible.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.dump-before" xreflabel="--dump-before">
    <term>
      <option><![CDATA[--dump-before=<prefix> ]]></option>
    </term>
    <listitem>
      <para>Dump when entering a function starting with &lt;prefix&gt;.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.zero-before" xreflabel="--zero-before">
    <term>
      <option><![CDATA[--zero-before=<prefix> ]]></option>
    </term>
    <listitem>
      <para>Zero all costs when entering a function starting with &lt;prefix&gt;.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.dump-after" xreflabel="--dump-after">
    <term>
      <option><![CDATA[--dump-after=<prefix> ]]></option>
    </term>
    <listitem>
      <para>Dump when leaving a function starting with &lt;prefix&gt;.</para>
    </listitem>
  </varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.collection"
       xreflabel="Data collection options">
<title>Data collection options</title>

<para>
These options specify when events are to be aggregated into event counts.
Also see <xref linkend="cl-manual.limits"/>.</para>

<variablelist id="cmd-options.collection">

  <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
    <term>
      <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>Specify if you want Callgrind to start simulation and
      profiling from the beginning of the program.  
      When set to <computeroutput>no</computeroutput>, 
      Callgrind will not be able
      to collect any information, including calls, but it will have at
      most a slowdown of around 4, which is the minimum Valgrind
      overhead.  Instrumentation can be interactively switched on via
      <computeroutput>callgrind_control -i on</computeroutput>.</para>
      <para>Note that the resulting call graph will most probably not
      contain <computeroutput>main</computeroutput>, but will contain all the
      functions executed after instrumentation was switched on.
      Instrumentation can also be switched on/off programmatically. See the
      Callgrind include file
      <computeroutput>&lt;callgrind.h&gt;</computeroutput> for the macros
      you have to use in your source code.</para> <para>For cache
      simulation, results will be less accurate when switching on
      instrumentation later in the program run, as the simulator starts
      with an empty cache at that moment.  Switch on event collection
      later to cope with this error.</para>
    </listitem>
  </varlistentry>
  
  <varlistentry id="opt.collect-atstart">
    <term>
      <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>Specify whether event collection is switched on at the beginning
      of the profile run.</para>
      <para>To only look at parts of your program, you have two
      possibilities:</para>
      <orderedlist>
      <listitem>
        <para>Zero event counters before entering the program part you
        want to profile, and dump the event counters to a file after
        leaving that program part.</para>
        </listitem>
        <listitem>
          <para>Switch on/off collection state as needed to only see
          event counters happening while inside of the program part you
          want to profile.</para>
        </listitem>
      </orderedlist>
      <para>The second option can be used if the program part you want to
      profile is called many times. Option 1, i.e. creating a lot of
      dumps, is not practical here.</para> 
      <para>Collection state can be
      toggled at entry and exit of a given function with the
      option <xref linkend="opt.toggle-collect"/>.  If you use this flag, 
      collection
      state should be switched off at the beginning.  Note that specifying
      <computeroutput>--toggle-collect</computeroutput>
      implicitly sets
      <computeroutput>--collect-atstart=no</computeroutput>.</para>
      <para>Collection state can also be toggled by using a Valgrind
      Client Request in your application.  For this, include
      <computeroutput>valgrind/callgrind.h</computeroutput> and specify
      the macro
      <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput> at the
      needed positions. This will only have an effect when run under
      supervision of the Callgrind tool.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
    <term>
      <option><![CDATA[--toggle-collect=<prefix> ]]></option>
    </term>
    <listitem>
      <para>Toggle collection on entry/exit of a function whose name
      starts with
      &lt;prefix&gt;.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
    <term>
      <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
    </term>
    <listitem>
      <para>This specifies whether information for (conditional) jumps
      should be collected.  As above, callgrind_annotate currently is not
      able to show you the data.  You have to use KCachegrind to get jump
      arrows in the annotated code.</para>
    </listitem>
  </varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.separation"
       xreflabel="Cost entity separation options">
<title>Cost entity separation options</title>

<para>
These options specify how event counts should be attributed to execution
contexts.
More specifically, they specify e.g. if the recursion level or the
call chain leading to a function should be accounted for, and whether the
thread ID should be remembered.
Also see <xref linkend="cl-manual.cycles"/>.</para>

<variablelist id="cmd-options.separation">

  <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
    <term>
      <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
    </term>
    <listitem>
      <para>This option specifies whether profile data should be generated
      separately for every thread. If yes, the file names get "-threadID"
      appended.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.fn-recursion" xreflabel="--fn-recursion">
    <term>
      <option><![CDATA[--fn-recursion=<level> [default: 2] ]]></option>
    </term>
    <listitem>
      <para>Separate function recursions, with a maximum of &lt;level&gt; levels.
      See <xref linkend="cl-manual.cycles"/>.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.fn-caller" xreflabel="--fn-caller">
    <term>
      <option><![CDATA[--fn-caller=<callers> [default: 0] ]]></option>
    </term>
    <listitem>
      <para>Separate contexts by a maximum of &lt;callers&gt; functions in the
      call chain. See <xref linkend="cl-manual.cycles"/>.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
    <term>
      <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
    </term>
    <listitem>
      <para>Ignore calls to/from PLT sections.</para>
    </listitem>
  </varlistentry>
  
  <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
    <term>
      <option><![CDATA[--fn-skip=<function> ]]></option>
    </term>
    <listitem>
      <para>Ignore calls to/from a given function.  E.g. if you have a
      call chain A &gt; B &gt; C, and you specify function B to be
      ignored, you will only see A &gt; C.</para>
      <para>This is very convenient for skipping functions that handle
      callback behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want
      to see the function emitting a signal calling the slots connected
      to that signal. First, determine the real call chain to see which
      functions need to be skipped, then use this option.</para>
    </listitem>
  </varlistentry>
  
  <varlistentry id="opt.fn-group">
    <term>
      <option><![CDATA[--fn-group<number>=<function> ]]></option>
    </term>
    <listitem>
      <para>Put a function into a separate group. This influences the
      context name for cycle avoidance. All functions inside of such a
      group are treated as being the same for context name building, which
      resembles the call chain leading to a context. By specifying function
      groups with this option, you can shorten the context name, as functions
      in the same group will not appear in sequence in the name. </para>
    </listitem>
  </varlistentry>
  
  <varlistentry id="opt.fn-recursion-num" xreflabel="--fn-recursion10">
    <term>
      <option><![CDATA[--fn-recursion<number>=<function> ]]></option>
    </term>
    <listitem>
      <para>Separate &lt;number&gt; recursions for &lt;function&gt;.
      See <xref linkend="cl-manual.cycles"/>.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.fn-caller-num" xreflabel="--fn-caller2">
    <term>
      <option><![CDATA[--fn-caller<number>=<function> ]]></option>
    </term>
    <listitem>
      <para>Separate &lt;number&gt; callers for &lt;function&gt;.
      See <xref linkend="cl-manual.cycles"/>.</para>
    </listitem>
  </varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.simulation"
       xreflabel="Cache simulation options">
<title>Cache simulation options</title>

<variablelist id="cmd-options.simulation">
  
  <varlistentry id="opt.simulate-cache" xreflabel="--simulate-cache">
    <term>
      <option><![CDATA[--simulate-cache=<yes|no> [default: no] ]]></option>
    </term>
    <listitem>
      <para>Specify if you want to do full cache simulation.  By default,
      only instruction read accesses will be profiled.</para>
    </listitem>
  </varlistentry>
  
</variablelist>

</sect2>

</sect1>

</chapter>