<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="sg-manual" 
         xreflabel="SGCheck: an experimental stack and global array overrun detector">
  <title>SGCheck: an experimental stack and global array overrun detector</title>

<para>To use this tool, you must specify
<option>--tool=exp-sgcheck</option> on the Valgrind
command line.</para>




<sect1 id="sg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>SGCheck is a tool for finding overruns of stack and global
arrays.  It works by using a heuristic approach derived from an
observation about the likely forms of stack and global array accesses.
</para>

</sect1>




<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
<title>SGCheck Command-line Options</title>

<para id="sg.opts.list">There are no SGCheck-specific command-line options at present.</para>
<!--
<para>SGCheck-specific command-line options are:</para>


<variablelist id="sg.opts.list">
</variablelist>
-->

</sect1>



<sect1 id="sg-manual.how-works.sg-checks"
       xreflabel="How SGCheck Works">
<title>How SGCheck Works</title>

<para>When a source file is compiled
with <option>-g</option>, the compiler attaches DWARF3
debugging information which describes the location of all stack and
global arrays in the file.</para>

<para>Checking of accesses to such arrays would then be relatively
simple, if the compiler could also tell us which array (if any) each
memory referencing instruction was supposed to access.  Unfortunately
the DWARF3 debugging format does not provide a way to represent such
information, so we have to resort to a heuristic technique to
approximate it.  The key observation is that
   <emphasis>
   if a memory referencing instruction accesses inside a stack or
   global array once, then it is highly likely to always access that
   same array</emphasis>.</para>

<para>To see how this might be useful, consider the following buggy
fragment:</para>
<programlisting><![CDATA[
   { int i, a[10];  // both are auto vars
     for (i = 0; i <= 10; i++)
        a[i] = 42;
   }
]]></programlisting>

<para>At run time we will know the precise address
of <computeroutput>a[]</computeroutput> on the stack, and so we can
observe that the first store resulting from <computeroutput>a[i] =
42</computeroutput> writes <computeroutput>a[]</computeroutput>, and
we will (correctly) assume that the instruction is intended always to
access <computeroutput>a[]</computeroutput>.  Then, on the 11th
iteration, it accesses somewhere else, possibly a different local,
possibly an unaccounted-for area of the stack (e.g. a spill slot), so
SGCheck reports an error.</para>
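
<para>The reasoning above can be replayed in a small, self-contained
model.  This is only a sketch: the types
<computeroutput>ArrayBlock</computeroutput> and
<computeroutput>LikelyTarget</computeroutput>, the function names and
the address <computeroutput>0x1000</computeroutput> are all invented
for illustration and are not taken from the SGCheck sources.  The
model never touches real memory; it only compares address
ranges.</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; not SGCheck's real data structures. */
typedef struct {
    uintptr_t base;   /* address of the first byte of the array */
    size_t    size;   /* size of the array in bytes */
} ArrayBlock;

typedef struct {
    const ArrayBlock *target;  /* array seen on the first access, or NULL */
    bool              seen;    /* has this instruction executed yet? */
} LikelyTarget;

/* Return the known array that fully contains [addr, addr+len), or NULL. */
const ArrayBlock *find_block(const ArrayBlock *blocks, size_t n,
                             uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= blocks[i].base &&
            len <= blocks[i].size &&
            addr - blocks[i].base <= blocks[i].size - len)
            return &blocks[i];
    }
    return NULL;
}

/* Model one execution of a memory referencing instruction.  Returns
   true if the access matches the instruction's first access, false if
   this model would flag it. */
bool check_access(LikelyTarget *lt, const ArrayBlock *blocks, size_t n,
                  uintptr_t addr, size_t len)
{
    const ArrayBlock *hit = find_block(blocks, n, addr, len);
    if (!lt->seen) {
        lt->seen = true;
        lt->target = hit;   /* the first access is taken as the example */
        return true;
    }
    return hit == lt->target;
}

/* Replay the store from the buggy loop: a[10] at a made-up address,
   with i running from 0 to 10.  Returns the first i that is flagged,
   or -1 if none is. */
int first_flagged_iteration(void)
{
    const ArrayBlock blocks[] = { { 0x1000, 10 * sizeof(int) } };
    LikelyTarget store = { NULL, false };
    for (int i = 0; i <= 10; i++) {
        uintptr_t addr = blocks[0].base + (uintptr_t)i * sizeof(int);
        if (!check_access(&store, blocks, 1, addr, sizeof(int)))
            return i;
    }
    return -1;
}
```
]]></programlisting>
<para>In this model <computeroutput>first_flagged_iteration()</computeroutput>
returns 10, which corresponds to the 11th iteration of the loop.</para>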

<para>There is an important caveat.</para>

<para>Imagine a function such as <function>memcpy</function>, which is used
to read and write many different areas of memory over the lifetime of the
program.  If we insist that the read and write instructions in its memory
copying loop only ever access one particular stack or global variable, we
will be flooded with errors resulting from calls to
<function>memcpy</function>.</para>

<para>To avoid this problem, SGCheck instantiates fresh likely-target
records for each entry to a function, and discards them on exit.  This
allows detection of cases where (e.g.) <function>memcpy</function>
overflows its source or destination buffers for any specific call, but
does not carry any restriction from one call to the next.  Indeed,
multiple threads may make multiple simultaneous calls to
(e.g.) <function>memcpy</function> without mutual interference.</para>
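
<para>A minimal sketch of the per-call idea follows.  It assumes a
made-up copy routine with a single store instruction and two made-up
16-byte arrays at addresses <computeroutput>0x1000</computeroutput>
and <computeroutput>0x2000</computeroutput>; none of these names or
values come from SGCheck itself.  The likely-target record is a local
of the modelled call, so it is fresh on entry and gone on
return.</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only: names and layout are hypothetical. */
enum { NO_ARRAY = -1, UNSET = -2 };

/* Two arrays of 16 bytes each, described by made-up address ranges. */
static const uintptr_t array_base[] = { 0x1000, 0x2000 };
static const size_t    array_size[] = { 16, 16 };

/* Which known array (by index) contains addr?  NO_ARRAY if none. */
int classify(uintptr_t addr)
{
    for (int i = 0; i < 2; i++)
        if (addr >= array_base[i] && addr - array_base[i] < array_size[i])
            return i;
    return NO_ARRAY;
}

/* Model a byte-copy routine that has a single store instruction and
   writes n bytes starting at dst.  Returns true if this model would
   flag at least one of the stores made during this call. */
bool model_copy_reports_error(uintptr_t dst, size_t n)
{
    int  likely_target = UNSET;   /* fresh record for this call */
    bool flagged = false;

    for (size_t k = 0; k < n; k++) {
        int hit = classify(dst + k);
        if (likely_target == UNSET)
            likely_target = hit;  /* first store sets the example */
        else if (hit != likely_target)
            flagged = true;       /* store left the expected array */
    }
    return flagged;               /* the record is discarded here */
}
```
]]></programlisting>
<para>Copying 16 bytes to <computeroutput>0x1000</computeroutput> and
then 16 bytes to <computeroutput>0x2000</computeroutput> flags nothing
in this model, because each call starts with an unset record.  Copying
18 bytes to <computeroutput>0x1000</computeroutput> is flagged, since
the last two stores fall outside the array seen first.</para>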

<para>It is important to note that the association is made between
  a <emphasis>binary instruction</emphasis> and an array, the
  <emphasis>first time</emphasis> this binary instruction accesses an
  array during a function call.  If the same instruction is executed
  again during the same function call and does not access the same
  array, SGCheck may report a problem.  This technique imposes
  several limitations on SGCheck; see
  <xref linkend="sg-manual.limitations"/>.
</para>
</sect1>



<sect1 id="sg-manual.cmp-w-memcheck"
       xreflabel="Comparison with Memcheck">
<title>Comparison with Memcheck</title>

<para>SGCheck and Memcheck are complementary: their capabilities do
not overlap.  Memcheck performs bounds checks and use-after-free
checks for heap arrays.  It also finds uses of uninitialised values
created by heap or stack allocations.  But it does not perform bounds
checking for stack or global arrays.</para>

<para>SGCheck, on the other hand, does do bounds checking for stack or
global arrays, but it doesn't do anything else.</para>

</sect1>





<sect1 id="sg-manual.limitations"
       xreflabel="Limitations">
<title>Limitations</title>

<para>This is an experimental tool, which relies rather too heavily on some
not-as-robust-as-I-would-like assumptions about the behaviour of correct
programs.  There are a number of limitations which you should be aware
of.</para>

<itemizedlist>

  <listitem>
   <para>False negatives (missed errors): it follows from the
   description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
   that the first access by a memory referencing instruction to a
   stack or global array creates an association between that
   instruction and the array, which is checked on subsequent accesses
   by that instruction, until the containing function exits.  Hence,
   the first access by an instruction to an array (in any given
   function instantiation) is not checked for overrun, since SGCheck
   uses that as the "example" of how subsequent accesses should
   behave.</para>
   <para>It also means that errors will not be found in an instruction
     executed only once (e.g. because this instruction is not in a loop,
     or the loop is executed only once).</para>
  </listitem>

  <listitem>
   <para>False positives (false errors): similarly, and more seriously,
   it is clearly possible to write legitimate pieces of code which
   break the basic assumption upon which the checking algorithm
   depends.  For example:</para>

<programlisting><![CDATA[
  { int a[10], b[10], *p, i;
    for (i = 0; i < 10; i++) {
       p = /* arbitrary condition */  ? &a[i]  : &b[i];
       *p = 42;
    }
  }
]]></programlisting>

   <para>In this case the store sometimes
   accesses <computeroutput>a[]</computeroutput> and
   sometimes <computeroutput>b[]</computeroutput>, but in no cases is
   the addressed array overrun.  Nevertheless the change in target
   will cause an error to be reported.</para>

   <para>It is hard to see how to get around this problem.  The only
   mitigating factor is that such constructions appear to be very
   rare, at least judging from results obtained with the tool so far.
   Such a construction appears only once in the Valgrind sources
   (running Valgrind on Valgrind) and perhaps two or three times
   during a startup and exit of Firefox.  The best that can be done is
   to suppress the errors.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck has to read all of
   the DWARF3 type and variable information for the executable and its
   shared objects.  This is computationally expensive and makes
   startup quite slow.  You can expect debuginfo reading time to be in
   the region of a minute for an OpenOffice-sized application, on a
   2.4 GHz Core 2 machine.  Reading this information also requires a
   lot of memory.  To make it viable, SGCheck goes to considerable
   trouble to compress the in-memory representation of the DWARF3
   data, which is why the process of reading it appears slow.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck runs slower than Memcheck.  This is
   partly due to a lack of tuning, but partly due to algorithmic
   difficulties.  The
   stack and global checks can sometimes require a number of range
   checks per memory access, and these are difficult to short-circuit,
   despite considerable efforts having been made.  A
   redesign and reimplementation could potentially make it much faster.
   </para>
  </listitem>

  <listitem>
   <para>Coverage: Stack and global checking is fragile.  If a shared
   object does not have debug information attached, then SGCheck will
   not be able to determine the bounds of any stack or global arrays
   defined within that shared object, and so will not be able to check
   accesses to them.  This is true even when those arrays are accessed
   from some other shared object which was compiled with debug
   info.</para>

   <para>At the moment SGCheck accepts objects lacking debuginfo
   without comment.  This is dangerous as it causes SGCheck to
   silently skip stack and global checking for such objects.  It would
   be better to print a warning in such circumstances.</para>
  </listitem>

  <listitem>
   <para>Coverage: SGCheck does not check whether the areas read
   or written by system calls overrun stack or global arrays.  This
   would be easy to add.</para>
  </listitem>

  <listitem>
   <para>Platforms: the stack/global checks won't work properly on
   PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets.
   That's because the stack and global checking requires tracking
   function calls and exits reliably, and there's no obvious way to do
   it on ABIs that use a link register for function returns.
   </para>
  </listitem>

  <listitem>
   <para>Robustness: related to the previous point.  Function
   call/exit tracking for X86 and AMD64 is believed to work properly
   even in the presence of longjmps within the same stack (although
   this has not been tested).  However, code which switches stacks is
   likely to cause breakage/chaos.</para>
  </listitem>
</itemizedlist>
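
<para>For the false-positive fragment in the list above, one possible
source-level workaround is sketched here.  It is not a feature of
SGCheck, and <computeroutput>choose_a</computeroutput> is a
hypothetical stand-in for the arbitrary condition.  The idea is to
write the store once per branch, so that each store is only ever aimed
at one array.  This relies on the assumption that the compiler emits a
separate store instruction for each branch, which is plausible without
optimisation but not guaranteed; if the compiler merges the two
stores, the report will persist and suppression remains the only
option.</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>

/* Hypothetical stand-in for the "arbitrary condition" in the fragment. */
bool choose_a(int i)
{
    return (i % 2) == 0;
}

/* Same effect as the fragment, but with one store per branch.
   Returns the sum of both arrays so that the result can be checked. */
int fill_split(void)
{
    int a[10] = { 0 }, b[10] = { 0 };
    int i, sum = 0;

    for (i = 0; i < 10; i++) {
        if (choose_a(i))
            a[i] = 42;   /* this store is only ever aimed at a[] */
        else
            b[i] = 42;   /* this store is only ever aimed at b[] */
    }
    for (i = 0; i < 10; i++)
        sum += a[i] + b[i];
    return sum;
}
```
]]></programlisting>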

</sect1>





<sect1 id="sg-manual.todo-user-visible"
       xreflabel="Still To Do: User-visible Functionality">
<title>Still To Do: User-visible Functionality</title>

<itemizedlist>

  <listitem>
   <para>Extend system call checking to work on stack and global arrays.</para>
  </listitem>

  <listitem>
   <para>Print a warning if a shared object does not have debug info
   attached, or if, for whatever reason, debug info could not be
   found, or read.</para>
  </listitem>

  <listitem>
   <para>Add some heuristic filtering that removes obvious false
     positives.  This would be easy to do.  For example, an access
     transition from a heap to a stack object almost certainly isn't a
     bug and so should not be reported to the user.</para>
  </listitem>

</itemizedlist>

</sect1>




<sect1 id="sg-manual.todo-implementation"
       xreflabel="Still To Do: Implementation Tidying">
<title>Still To Do: Implementation Tidying</title>

<para>Items marked CRITICAL are considered important for correctness:
failure to fix them is liable to lead to crashes or assertion failures
in real use.</para>

<itemizedlist>

  <listitem>
   <para> sg_main.c: Redesign and reimplement the basic checking
   algorithm.  It could be done much faster than it is -- the current
   implementation isn't very good.
   </para>
  </listitem>

  <listitem>
   <para> sg_main.c: Improve the performance of the stack / global
   checks by doing some up-front filtering to ignore references in
   areas which "obviously" can't be stack or globals.  This will
   require using information that m_aspacemgr knows about the address
   space layout.</para>
  </listitem>
 
  <listitem>
   <para>sg_main.c: fix compute_II_hash to make it a bit more sensible
   for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
   targets, so this is a bit academic at the moment).</para>
  </listitem>
  
</itemizedlist>

</sect1>



</chapter>