TODO


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268

This is an (incomplete) list of some of the stuff we want to look at doing.

If you're interested in hacking on any of these, please contact the list first
for some pointers and/or read HACKING and doc/CodingStyle.

1.0 release
-----------

(this is a minimal selection of stuff I think we need)

 o default to a vmlinux location: need agreement from kernel developers
 o default to --separate=library (with anon, =none, makes not much sense)
 o prettify image name for .jo files and allow lib-image: to specify it
 o gisle's fixes
 o opreport tgid:<tgid> doesn't work even if .jo files with that pid
 o Fix:

warning: [vdso] (tgid:9236 range:0x7fff98ffd000-0x7fff98fff000) could not be found.
warning: /no-vmlinux could not be found.
warning: /usr/lib64/libpanel-applet-2.so.0.2.27.#prelink#.sXCUK1 (deleted) could not be found.

 o amd64 32 bit build needs a sys32_lookup_dcookie() translator in the
   kernel
 o decide on -m tgid semantics for anon regions
 o if ev67 is not fixed, back it out
 o lapic : module should says "didn't find apic" if needed, FAQ and doc should
  speak a bit about lapic kernel option on x86 and recent kernel
 o see the big comment in db_insert.c, it's possible to allow unlimited
   amount of samples with a very minor change in libdb.
 o if oprofile doesn't recognize the processor selected by the kernel
   opcontrol could setup the module in timer mode (remove/reload prolly), and
   warn the user it must upgrade oprofile to get all the feature from its
   hardware.

Later
-----

 o remove 2.95/2.2 support so we can use boost multi index container in
   symbol/sample container
 o consider if we can improve anon mapping growing support

<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /bin/bash | grep vfprintf
<movement> 14        0.1301  6         0.0102  /lib/tls/libc-2.3.2.so   vfprintf
<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /usr/bin/vim | grep vfprintf
<movement> 176       2.0927  349       1.2552  /lib/tls/libc-2.3.2.so   vfprintf
<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so { image:/bin/bash } { image:/usr/bin/vim } | grep vfprintf
<movement> 176      10.9657  +++       349       7.8888  +++       vfprintf
<movement> 14       ---      ---       6        ---      ---       vfprintf
<movement> it seems them as two separate symbols
<movement> but can we remove the app_name from rough_less and still be able to walk the two lists?
<movement> even if we could, it would still go wrong when we're profiling multiple apps

 o Java stuff??
 o with opreport -c I can get "warning: /no-vmlinux could not be found.".
   Should be smarter ?
 o opreport -c gives weird output for an image with no symbols:

    samples  %        symbol name
  15965    100.000  (no symbols)
253      100.000  (no symbols)
  15965    98.4400  (no symbols)
  253       1.5600  (no symbols) [self]

 o consider tagging opreport -c entries with a number like gprof
 o --details for opreport -c, or diff??
 o should [self] entries be ommitted if 0 ??
 o stress test opreport -c: compile a Big Application w/o frame pointer and look
   how driver and opreport -c react.
 o oparchive could fix up {kern} paths with -p (what about diff between
   archive and current though?)
 o can say more in opcontrol --status
 o consider a sort option for diff %
 o opannotate is silent about symbols missing debug info
 o oprofiled.log now contains various statistics about lost sample etc. from
  the driver. Post profile tools must parse that and warn eventually, warning
  must include a proposed work around. User need this: if nothing seems wrong
  people are unlikely to get a look in oprofiled.log (I ran oprofile on 2.6.1
  2 weeks before noticing at 30000 I lost a lot of samples, the profile seemed
  ok du to the randomization of lost samples). As developper we need that too,
  actually we have no clear idea of the behavior on different arch, NUMA etc.
  Not perfect because if the profiler is running the oprofiled.log will show
  those warning only after the first  alarm signal, I think we must dump the
  statistics information after each opcontrol --dump to avoid that.
 o odb_insert() can fail on ftruncate or mremap() in db_manage.c but we don't
  try to recover gracefully.
 o output column shortname headers for opreport -l
 o is relative_to_absolute_path guaranteeing a trailing '/' documented ?
 o move oprofiled.log to OP_SAMPLE_DIR/current ?
 o pp tools must handle samples count overflow (marked as (unsigned)-1)
 o the way we show kernel modules in 2.5 is not very obvious - "/oprofile"
 o oparchive will be more usefull with a --root= options to allow profiling
  on a small box, nfs mount / to another box and transfer sample file and
  binary on a bigger box for analysis. There is also a problem in oparchive
  you can use session: to get the right path to samples files but oprofiled.log
  and abi files path are hardcoded to /var/lib/oprofile.
 o callgraph patch: better way to skip ignored backtrace ?
 o lib-image: and image: behavior depend on --separate=, if --separate=library
  opreport "lib-image:*libc*" --merge=lib works but not
  opreport "image:*libc*" --merge=lib whilst the behavior is reversed if
  --separate==none. Must we take care ?
 o dependencies between profile_container.h symbol_container.h and
  sample_container.h become more and more ugly, I needed to include them
  in a specific order in some source (still true??)
 o add event aliases for common things like icache misses, we must start to 
  think about metrics including simple like event alias mapped to two or more
  events and intepreted specially by user space tools like using the ratio
  of samples; more tricky will be to select an event used as call count (no
  cg on it) and  used to emulate the call count field in gprof. I think this is
  a after 1.0 thing but event aliases must be specified in a way allowing such
  extension
 o do we need an opreport like opreport -c (showing caller/callee at binary
  boundary not symbols) ?
 o we should notice an opcontrol config change (--separate etc.) and
   auto-restart the daemon if necessary (Run)
 o we can add lots more unit tests yet
 o Itanium event constraints are not implemented
 o GUI still has a physical-counter interface, should have a general one
   like opcontrol --event
 o I think we should have the ability to have *fixed* width headers, e.g. :

vma      samples  cum. samples  %           cum. %     symbol name             image name              app name
0804c350 64582    64582         35.0757     35.0757    odb_insert              /usr/loc...in/oprofiled /usr/local/oprofile-pp/bin/oprofiled

  Note the ellipsis
 o should we make the sighup handler re-read counter config and re-start profiling too ?
 o improve --smart-demangle
	o allow user to add it's own pattern in user.pat, document it.
	o hard code ${typename} regular definition to remove all current limitations (difficult, perhaps after 1.0 ?).
 o oprof_start dialog size is too small initially
 o i18n. We need a good formatter, and also remember format_percent()
 o opannotate --source --output-dir=~moz/op/ /usr/bin/oprofiled
   will fail because the ~ is not expanded (no space around it) (popt bug I say)
 o cpu names instead of numbers in 2.4 module/ ?
 o remove 1 and 2 magic numbers for oprof_ready
 o adapt Anton's patch for handling non-symbolled libraries ? (nowaday C++
  anon namespace symbol are static, 3.4 iirc, so with recent distro we are
  more likely to get problems with a "fallback to dynamic symbols" approch)
 o use standard C integer type <stdint.h> int32_t int16_t etc.
 o event multiplexing for real
 o randomizing of reset value
 o XML output
 o profile the NMI handler code
 o opannotate : I added this to the doc about difference between nr samples
  credited to a source function and total number of samples for this function:
   "The missing samples are not lost, they will be credited to another source
    location where the inlined function is defined. The inlined function will
    be credited from multiple call site and merged in one place in the
   annotated source file so there is no way to see from what call site are
   coming the samples for an inlined function."
  I think we can work around this: output multiple instances of inlined
  function like :
  inline foo() { foo: total 1500 30.00 ...
  ... annotated source from all call site 
  inline foo() { foo (call site bar()): total 500 10.00
  .. annotated source from call site bar() etc.
  what about template..., can we do/must we do something like that
  template <class T> eat_cpu() and do a similar things, merging and annotating
  all instantation then annotating for each distinct instantation, this will
  break our "keep the source line number in annotated source file identical to
  the original source"
 o events/mips/34k/events, some events does not make sense, they get identical
  event number, um and counter nr so they overlap, currently commented
 o can we find a more efficient implementation for sparse_array ?
 o libpp/profile.cpp:is_spu_sample_file() can be simplified by using
  read_header()
 o while fixing #1819350 I needed to make extra_images per profile session
   rather than a global var so I think we need to revisit find_image_path(),
   extra_found_images, --image-path (-p).
   Actually we can't do something ala:
   opreport { archive:tmp1 search_path=/lib/modules/2.6.20 } { archive:tmp2 search_path=/.../2.6.20.9 }
   because search_path is specified through -p which is not a part of the
   profile spec. Fixing #1819350 covered all case except this one but w/o any
   user visible change. Another way will be to save the -p option used with
   oparchive in a file at the toplevel of the archive, use it with all tools
   when an archive: is specified on the command line and deprecate the use of
   -p in such case.
 o consider to make extra_images a ref counted object, it's copied by value
  a few time but can contain a lot of string. There is also some ugly public
  member extra_images to fix.
 o daemon bss size can be improved, grep for MAX_PATH to see where dynamic
   allocation can be used, try $ nm oprofiled --size-sort too.

Documentation
-------------

 o the docs should mention the default event for each arch somewhere
 o more discussion of problematic code needs to go in the "interpreting" section. 
 o document gcc 2.95 and linenr info problems especially for inline functions
 o finish the internals manual

JIT support
-----------

 o We need a more dynamic structure to handle entries_address_ascending and
   entries_symbols_ascending, actually many scaling problem occur because they
   are array, this was perfect to get a first implementation focusing on
   handling overlap and all but the need to qsort/copy arrays at each iteration
   is a performance killer. Some sort of AVL tree will do the job.
 o Related to the previous, it's possible to do all processing in opjitconv.c
   in a single left to right walk of the jitentry list.
 o see the FIXME at parse_dump.c:parse_code_unload()
 o Increment JITHEADER_VERSION in jitdump.h to be sure that the new code only
   accepts dump file created by the new code.
 o opjitconv.c:replacement_name() should be enough clever to avoid name
   collision so we can remove the recursive call to disambiguate_symbol_names(),
   need a hash table or some sort of associative array to check quickly if a
   name exists, we will need some sort of avl tree so it's probably better
   to do not implement a hash table only for this purpose.
 o op_write_native_code() must accept one more parameter, the real code size
   which can be zero or equal to code_size, this will allow to create elf
   file w/o any code contents, only a symbol table and .text sections w/o
   contents (yes ELF format allow that). For dynamic binary translation it'll
   avoid to dump tons of code for little use, opannotate --assembly will not
   work on such elf file but it can be a real win. It'll need to add to
   jitrecord0 a real_size field, and some trickery when building the elf file,
   taking care about the case we mix zero code size with non zero code size.
   Perhaps we can use it too for java, filtering native method etc. Actually
   we allow a simplified form of this feature by allowing to disable/enable
   code dumping but at the whole dump level not on a symbol basis, quite
   possible sufficient. [mpj: We're backing away from the idea of dumping
   JIT records without code.  Since BFD asymbol type does not include symbol size,
   the op_bfd technique for determining symbol size relies on knowing the true
   file size; and if code is not included in the .jo file, we don't have true size.]
 o The pipe used for triggering JIT dump conversion should be used for normal
   dumping too.
 o See FIXME in agents/jvmti/libjvmti_oprofile.c:
   If enablement to get line number info would be configurable through command line,
   what should be the default on/off?
 o See FIXME in opjitconv/debug_line.c
 o The way to use the pipe should be made more secure to avoid denial of service
   attacks. We have to think about it.
 o Callgraph does not work properly for the .jo files the JIT support creates.
   See section Chapter 4, sect 2.3.2 "Callgraph and JIT support". Try to figure
   out a way to correlate an anonymous sample callgraph entry with
   the .jo file that may exist for the anonymous code.
 o see mail from Gisle Dankel:
   "JIT_SUPPORT: Adding support for file-backed non-ELF JIT code"
   -> should be changed (if useful) before next release
 o See FIXME in op_header.cpp:
   The check for header.mtime of JIT sample files is not correct because currently
   this mtime value is set to zero due to missing cookie setting for JIT sample files.
   Some additional check/setting to header.mtime should be made for JIT sample files.
 o Mono JIT support:

   2007-11-08: with callgraph massi got
   <massi> oparchive error: parse_filename() invalid filename: /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{cg}/{root}/usr/oprofile/bin/oprofiled/CPU_CLK_

   Massi added Mono JIT support, code on the stack is never unloaded and there is
   no byte code, code is always compiled to native machine code, this mean than
   for mono at least we can do callgraph if we can fix this samples filename
   problem.

General checks to make
----------------------
 
 o rgrep FIXME
 o valgrind (--show-reachable=yes --leak-check=yes)
 o audit to track unnecessary include <>
 o gcc 3.0/3.x compile
 o Qt2/3 check, no Qt check
 o verify builds (modversions, kernel versions, athlon etc.). I have the
  necessary stuff to check kernel versions/configurations on PIII core (Phil)
 o use nm and a little script to track unused function
 o test it to hell and back
 o compile all C++ programs with STL_port and test them (gcc 3.4 contain a
   debug mode too but std::string iterator are not checked)
 o There is probably place of post profile tools where looking at errno will give better error messages.