@c Copyright (C) 2019-2024 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by David Malcolm <dmalcolm@redhat.com>.

@node Static Analyzer
@chapter Static Analyzer
@cindex analyzer
@cindex static analysis
@cindex static analyzer

@menu
* Analyzer Internals::       Analyzer Internals
* Debugging the Analyzer::   Useful debugging tips
@end menu

@node Analyzer Internals
@section Analyzer Internals
@cindex analyzer, internals
@cindex static analyzer, internals

@subsection Overview

At a high level, we're doing coverage-guided symbolic execution of the
user's code.

The analyzer implementation works on the gimple-SSA representation.
(I chose this in the hopes of making it easy to work with LTO to
do whole-program analysis).

The implementation is read-only: it doesn't attempt to change anything,
just emit warnings.

The gimple representation can be seen using @option{-fdump-ipa-analyzer}.
@quotation Tip
If the analyzer ICEs before this is written out, one workaround is to use
@option{--param=analyzer-bb-explosion-factor=0} to force the analyzer
to bail out after analyzing the first basic block.
@end quotation

First, we build a @code{supergraph} which combines the callgraph and all
of the CFGs into a single directed graph, with both interprocedural and
intraprocedural edges.  The nodes and edges in the supergraph are called
``supernodes'' and ``superedges'', and often referred to in code as
@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
interprocedural calls, so there can be more than one supernode per
basic block.  Most statements will be in just one supernode, but a call
statement can appear in two supernodes: at the end of one for the call,
and again at the start of another for the return.

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

We then build an @code{analysis_plan} which walks the callgraph to
determine which calls might be suitable for being summarized (rather
than fully explored) and thus in what order to explore the functions.

Next is the heart of the analyzer: we use a worklist to explore state
within the supergraph, building an ``exploded graph''.
Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
``Precise Interprocedural Dataflow Analysis via Graph Reachability''
(Thomas Reps, Susan Horwitz and Mooly Sagiv).  Note that
we're not using the algorithm described in that paper, just the
``exploded graph'' terminology.

We reuse nodes for <point, state> pairs we've already seen, and avoid
tracking state too closely, so that (hopefully) we rapidly converge
on a final exploded graph, and terminate the analysis.  We also bail
out if the number of exploded <end-of-basic-block, state> nodes gets
larger than a particular multiple of the total number of basic blocks
(to ensure termination in the face of pathological state-explosion
cases, or bugs).  We also stop exploring a point once we hit a limit
of states for that point.
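
As a rough sketch of the first of these limits, the check amounts to
comparing a node count against a multiple of the basic-block count.
The function and parameter names below are invented for illustration,
and treating the multiple as the value of
@option{--param=analyzer-bb-explosion-factor} is an assumption based on
the tip above, not a copy of the analyzer's code:

@smallexample
#include <stddef.h>

/* Hypothetical helper: return nonzero if exploration should give up.
   NUM_EOB_ENODES counts exploded <end-of-basic-block, state> nodes,
   NUM_BBS is the total number of basic blocks, and FACTOR is the
   assumed explosion factor.  */
int
should_bail_out (size_t num_eob_enodes, size_t num_bbs, size_t factor)
@{
  return num_eob_enodes > factor * num_bbs;
@}
@end smallexample

With a factor of 0 the condition holds as soon as one such node exists,
which is consistent with the workaround described in the tip.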

We can identify problems directly when processing a <point,@w{ }state>
instance.  For example, if we're finding the successors of

@smallexample
   <point: before-stmt: "free (ptr);",
    state: @{"ptr": freed@}>
@end smallexample

then we can detect a double-free of "ptr".  We can then emit a path
to reach the problem by finding the simplest route through the graph.
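
As a hedged illustration, here is a small hypothetical input (the
function name @code{process} is invented for this example) in which one
path calls @code{free} twice on the same pointer.  On the path where
@code{fail} is nonzero, the second @code{free} is reached with
@code{ptr} already freed, which is the situation described by the
<point,@w{ }state> pair above:

@smallexample
#include <stdlib.h>

/* Hypothetical example: returns 0 on success, -1 if allocation fails.  */
int
process (int fail)
@{
  void *ptr = malloc (16);
  if (!ptr)
    return -1;
  if (fail)
    free (ptr);   /* first free, only when FAIL is nonzero */
  free (ptr);     /* second free on that path: a double-free */
  return 0;
@}
@end smallexample

The path with @code{fail} equal to zero frees the pointer exactly once,
so only one of the two paths through the function is problematic.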

Program points in the analysis are much more fine-grained than in the
CFG and supergraph, with points (and thus potentially exploded nodes)
for various events, including before individual statements.
By default the exploded graph merges multiple consecutive statements
in a supernode into one exploded edge to minimize the size of the
exploded graph.  This can be suppressed via
@option{-fanalyzer-fine-grained}.
The fine-grained approach seems to make things simpler and more debuggable
than other approaches I tried, in that each point is responsible for one
thing.

Program points in the analysis also have a ``call string'' identifying the
stack of callsites below them, so that paths in the exploded graph
correspond to interprocedurally valid paths: we always return to the
correct call site, propagating state information accordingly.
We avoid infinite recursion by stopping the analysis if a callsite
appears more than @code{analyzer-max-recursion-depth} times in a call
string (defaulting to 2).
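
The recursion limit can be sketched with a toy model in which a call
string is simply an array of callsite IDs.  This representation, the
function names, and the choice to apply the check when a callsite is
about to be pushed are all assumptions made for illustration; they are
not the analyzer's own data structures:

@smallexample
#include <stddef.h>

/* Toy model: count how often CALLSITE occurs in CALL_STRING.  */
static size_t
count_callsite (const int *call_string, size_t len, int callsite)
@{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (call_string[i] == callsite)
      count++;
  return count;
@}

/* Return nonzero if pushing CALLSITE would make it appear more than
   MAX_DEPTH times, i.e. if exploration should stop here.  */
int
exceeds_recursion_limit (const int *call_string, size_t len,
                         int callsite, size_t max_depth)
@{
  return count_callsite (call_string, len, callsite) + 1 > max_depth;
@}
@end smallexample

With a limit of 2, a callsite that already appears twice in the call
string cannot be entered a third time in this model.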

@subsection Graphs

Nodes and edges in the exploded graph are called ``exploded nodes'' and
``exploded edges'' and often referred to in the code as
@code{enodes} and @code{eedges} (especially when distinguishing them
from the @code{snodes} and @code{sedges} in the supergraph).

Each graph numbers its nodes, giving unique identifiers - supernodes
are referred to throughout dumps in the form @samp{SN: @var{index}} and
exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
@samp{EN: 29}).

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
and other dump options.  Exploded nodes are color-coded in the .dot output
based on state-machine states to make it easier to see state changes at
a glance.

@subsection State Tracking

There's a tension between:
@itemize @bullet
@item
precision of analysis in the straight-line case, vs
@item
exponential blow-up in the face of control flow.
@end itemize

For example, in general, given this CFG:

@smallexample
      A
     / \
    B   C
     \ /
      D
     / \
    E   F
     \ /
      G
@end smallexample

we want to avoid differences in state-tracking in B and C from
leading to blow-up.  If we don't prevent state blowup, we end up
with exponential growth of the exploded graph like this:

@smallexample

           1:A
          /   \
         /     \
        /       \
      2:B       3:C
       |         |
      4:D       5:D        (2 exploded nodes for D)
     /   \     /   \
   6:E   7:F 8:E   9:F
    |     |   |     |
   10:G 11:G 12:G  13:G    (4 exploded nodes for G)

@end smallexample

Similar issues arise with loops.

To prevent this, we follow various approaches:

@enumerate a
@item
state pruning, which tries to discard state that won't be relevant
later on within the function.
This can be disabled via @option{-fno-analyzer-state-purge}.

@item
state merging.  We can try to find the commonality between two
program_state instances to make a third, simpler program_state.
We have two strategies here:

  @enumerate
  @item
     the worklist keeps new nodes for the same program_point together,
     and tries to merge them before processing, and thus before they have
     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
     the front of the worklist together, and we create a node for D with
     the merger of the incoming states.

  @item
     try merging with the state of existing enodes for the program_point
     (which may have already been explored).  There will be duplication,
     but only one set of duplication; subsequent duplicates are more likely
     to hit the cache.  In particular, (hopefully) all merger chains are
     finite, and so we guarantee termination.
     This is intended to help with loops: we ought to explore the first
     iteration, and then have a "subsequent iterations" exploration,
     which uses a state merged from that of the first, to be more abstract.
  @end enumerate

We avoid merging pairs of states that have state-machine differences,
as these are the kinds of differences that are likely to be most
interesting.  So, for example, given:

@smallexample
      if (condition)
        ptr = malloc (size);
      else
        ptr = local_buf;

      .... do things with 'ptr'

      if (condition)
        free (ptr);

      ...etc
@end smallexample

then we end up with an exploded graph that looks like this:

@smallexample

                   if (condition)
                     / T      \ F
            ---------          ----------
           /                             \
      ptr = malloc (size)             ptr = local_buf
          |                               |
      copy of                         copy of
        "do things with 'ptr'"          "do things with 'ptr'"
      with ptr: heap-allocated        with ptr: stack-allocated
          |                               |
      if (condition)                  if (condition)
          | known to be T                 | known to be F
      free (ptr);                         |
           \                             /
            -----------------------------
                         | ('ptr' is pruned, so states can be merged)
                        etc

@end smallexample

where some duplication has occurred, but only for the places where
the different paths are worth exploring separately.

Merging can be disabled via @option{-fno-analyzer-state-merge}.
@end enumerate
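
The effect of merging on the diamond-shaped CFG shown earlier can be
reproduced with a toy worklist exploration.  This is only a sketch: the
integer ``state'', the @code{transfer} function and the rule of
discarding all differences at the join points D and G are invented for
illustration, whereas the analyzer avoids merging states that have
state-machine differences.

@smallexample
#include <stddef.h>

/* Toy model of the diamond CFG above; not the analyzer's own code.  */
enum @{ A, B, C, D, E, F, G, NUM_POINTS @};
#define MAX_NODES 64

struct enode @{ int point; int state; @};

/* Up to two successors per CFG node; -1 means "none".  */
static const int succ0[NUM_POINTS] = @{ B, D, D, E, G, G, -1 @};
static const int succ1[NUM_POINTS] = @{ C, -1, -1, F, -1, -1, -1 @};

/* State on entering POINT: B/E and C/F record which branch was taken;
   if MERGE is nonzero, the join points D and G discard that history.  */
static int
transfer (int point, int state, int merge)
@{
  if (merge && (point == D || point == G))
    return 0;
  if (point == B || point == E)
    return state * 2 + 1;
  if (point == C || point == F)
    return state * 2 + 2;
  return state;
@}

/* Add <POINT, STATE> unless an identical node already exists.  */
static void
add_node (struct enode *nodes, size_t *num, int point, int state)
@{
  for (size_t i = 0; i < *num; i++)
    if (nodes[i].point == point && nodes[i].state == state)
      return;
  if (*num < MAX_NODES)
    @{
      nodes[*num].point = point;
      nodes[*num].state = state;
      (*num)++;
    @}
@}

/* Explore from A with a worklist; return the number of exploded nodes.  */
size_t
explore (int merge)
@{
  struct enode nodes[MAX_NODES];
  size_t num = 0;
  add_node (nodes, &num, A, 0);
  for (size_t next = 0; next < num; next++)
    @{
      int point = nodes[next].point;
      int state = nodes[next].state;
      if (succ0[point] >= 0)
        add_node (nodes, &num, succ0[point],
                  transfer (succ0[point], state, merge));
      if (succ1[point] >= 0)
        add_node (nodes, &num, succ1[point],
                  transfer (succ1[point], state, merge));
    @}
  return num;
@}
@end smallexample

Under these assumptions @code{explore (0)} yields 13 exploded nodes,
matching the diagram of exponential growth, while @code{explore (1)}
yields 7, one per CFG node.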

@subsection Region Model

Part of the state stored at an @code{exploded_node} is a @code{region_model}.
This is an implementation of the region-based ternary model described in
@url{https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs,
"A Memory Model for Static Analysis of C Programs"}
(Zhongxing Xu, Ted Kremenek, and Jian Zhang).

A @code{region_model} encapsulates a representation of the state of
memory, with a @code{store} recording a binding between @code{region}
instances, to @code{svalue} instances.  The bindings are organized into
clusters, where regions accessible via well-defined pointer arithmetic
are in the same cluster.  The representation is graph-like because values
can be pointers to regions.  It also stores a @code{constraint_manager},
capturing relationships between the values.

Because each node in the @code{exploded_graph} has a @code{region_model},
and each of the latter is graph-like, the @code{exploded_graph} is in some
ways a graph of graphs.

There are several ``dump'' functions for use when debugging the analyzer.

Consider this example C code:

@smallexample
void *
calls_malloc (size_t n)
@{
  void *result = malloc (n);
  return result; /* HERE */
@}

void test (size_t n)
@{
  void *ptr = calls_malloc (n * 4);
  /* etc.  */
@}
@end smallexample

and the state at the point @code{/* HERE */} for the interprocedural
analysis case where @code{calls_malloc} returns back to @code{test}.

Here's an example of printing a @code{program_state} at @code{/* HERE */},
showing the @code{region_model} within it, along with state for the
@code{malloc} state machine.

@smallexample
(gdb) break region_model::on_return
[..snip...]
(gdb) run
[..snip...]
(gdb) up
[..snip...]
(gdb) call state->dump()
State
├─ Region Model
│  ├─ Current Frame: frame: ‘calls_malloc’@@2
│  ├─ Store
│  │  ├─ m_called_unknown_fn: false
│  │  ├─ frame: ‘test’@@1
│  │  │  ╰─ _1: (INIT_VAL(n_2(D))*(size_t)4)
│  │  ╰─ frame: ‘calls_malloc’@@2
│  │     ├─ result_4: &HEAP_ALLOCATED_REGION(27)
│  │     ╰─ _5: &HEAP_ALLOCATED_REGION(27)
│  ╰─ Dynamic Extents
│     ╰─ HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
╰─ ‘malloc’ state machine
   ╰─ 0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked (@{free@}) (‘result_4’)
@end smallexample

Within the store, there are binding clusters for the SSA names for the
various local variables within frames for @code{test} and
@code{calls_malloc}.  For example,

@itemize @bullet
@item
within @code{test} the whole cluster for @code{_1} is bound
to a @code{binop_svalue} representing @code{n * 4}, and
@item
within @code{calls_malloc} the whole cluster for @code{result_4} is bound to a
@code{region_svalue} pointing at @code{HEAP_ALLOCATED_REGION(27)}.
@end itemize

Additionally, this latter pointer has the @code{unchecked} state for the
@code{malloc} state machine indicating it hasn't yet been checked against
@code{NULL} since the allocation call.

We also see that the state has captured the size of the heap-allocated
region (``Dynamic Extents'').

This visualization can also be seen within the output of
@option{-fdump-analyzer-exploded-nodes-2} and
@option{-fdump-analyzer-exploded-nodes-3}.

As well as the above visualizations of states, there are tree-like
visualizations for instances of @code{svalue} and @code{region}, showing
their IDs and how they are constructed from simpler symbols:

@smallexample
(gdb) break region_model::set_dynamic_extents
[..snip...]
(gdb) run
[..snip...]
(gdb) up
[..snip...]
(gdb) call size_in_bytes->dump()
(17): ‘long unsigned int’: binop_svalue(mult_expr: ‘*’)
├─ (15): ‘size_t’: initial_svalue
│  ╰─ m_reg: (12): ‘size_t’: decl_region(‘n_2(D)’)
│     ╰─ parent: (9): frame_region(‘test’, index: 0, depth: 1)
│        ╰─ parent: (1): stack region
│           ╰─ parent: (0): root region
╰─ (16): ‘size_t’: constant_svalue (‘4’)
@end smallexample

i.e. that @code{size_in_bytes} is a @code{binop_svalue} expressing
the result of multiplying

@itemize @bullet
@item
the initial value of the @code{PARM_DECL} @code{n_2(D)} for the
parameter @code{n} within the frame for @code{test} by
@item
the constant value @code{4}.
@end itemize

The above visualizations rely on the @code{text_art::widget} framework,
which performs significant work to lay out the output, so there is also
an earlier, simpler, form of dumping available.  For states there is:

@smallexample
(gdb) call state->dump(eg.m_ext_state, true)
rmodel:
stack depth: 2
  frame (index 1): frame: ‘calls_malloc’@@2
  frame (index 0): frame: ‘test’@@1
clusters within frame: ‘test’@@1
  cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@@2
  cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
  cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
  constraints:
dynamic_extents:
  HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
malloc:
  0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked (@{free@}) (‘result_4’)
@end smallexample

or for @code{region_model} just:

@smallexample
(gdb) call state->m_region_model->debug()
stack depth: 2
  frame (index 1): frame: ‘calls_malloc’@@2
  frame (index 0): frame: ‘test’@@1
clusters within frame: ‘test’@@1
  cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@@2
  cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
  cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
  constraints:
dynamic_extents:
  HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
@end smallexample

and for instances of @code{svalue} and @code{region} there is this
older dump implementation, which takes a @code{bool simple} flag
controlling the verbosity of the dump:

@smallexample
(gdb) call size_in_bytes->dump(true)
(INIT_VAL(n_2(D))*(size_t)4)

(gdb) call size_in_bytes->dump(false)
binop_svalue (mult_expr, initial_svalue(‘size_t’, decl_region(frame_region(‘test’, index: 0, depth: 1), ‘size_t’, ‘n_2(D)’)), constant_svalue(‘size_t’, 4))
@end smallexample

@subsection Analyzer Paths

We need to explain to the user what the problem is, and to persuade them
that there really is a problem.  Hence having a @code{diagnostic_path}
isn't just an incidental detail of the analyzer; it's required.

Paths ought to be:
@itemize @bullet
@item
interprocedurally-valid
@item
feasible
@end itemize

Without state-merging, all paths in the exploded graph are feasible
(in terms of constraints being satisfied).
With state-merging, paths in the exploded graph can be infeasible.

We collate warnings and only emit them for the simplest path.
For example, for a bug in a utility function with lots of routes to
calling it, we only emit the simplest path (which could be
intraprocedural, if it can be reproduced without a caller).

We thus want to find the shortest feasible path through the exploded
graph from the origin to the exploded node at which the diagnostic was
saved.  Unfortunately, if we simply find the shortest such path and
check if it's feasible we might falsely reject the diagnostic, as there
might be a longer path that is feasible.  Examples include the cases
where the diagnostic requires us to go at least once around a loop for a
later condition to be satisfied, or where for a later condition to be
satisfied we need to enter a suite of code that the simpler path skips.

We attempt to find the shortest feasible path to each diagnostic by
first constructing a ``trimmed graph'' from the exploded graph,
containing only those nodes and edges from which there are paths to
the target node, and using Dijkstra's algorithm to order the trimmed
nodes by minimal distance to the target.

We then use a worklist to iteratively build a ``feasible graph''
(actually a tree), capturing the pertinent state along each path, in
which every path to a ``feasible node'' is feasible by construction,
restricting ourselves to the trimmed graph to ensure we stay on target,
and ordering the worklist so that the first feasible path we find to the
target node is the shortest possible path.  Hence we start by trying the
shortest possible path, but if that fails, we explore progressively
longer paths, eventually trying iterations through loops.  The
exploration is captured in the @code{feasible_graph}, which can be dumped as a
.dot file via @option{-fdump-analyzer-feasibility} to visualize the
exploration.  The indices of the feasible nodes show the order in which
they were created.  We effectively explore the tree of feasible paths in
order of shortest path until we either find a feasible path to the
target node, or hit a limit and give up.

This is something of a brute-force approach, but the trimmed graph
hopefully keeps the complexity manageable.

This algorithm can be disabled (for debugging purposes) via
@option{-fno-analyzer-feasibility}, which simply uses the shortest path,
and notes if it is infeasible.

The above gives us a shortest feasible @code{exploded_path} through the
@code{exploded_graph} (a list of @code{exploded_edge *}).  We use this
@code{exploded_path} to build a @code{diagnostic_path} (a list of
@strong{events} for the diagnostic subsystem) - specifically a
@code{checker_path}.

Having built the @code{checker_path}, we prune it to try to eliminate
events that aren't relevant, to minimize how much the user has to read.

After pruning, we notify each event in the path of its ID and record the
IDs of interesting events, allowing for events to refer to other events
in their descriptions.  The @code{pending_diagnostic} class has various
vfuncs to support emitting more precise descriptions, so that e.g.

@itemize @bullet
@item
a deref-of-unchecked-malloc diagnostic might use:
@smallexample
  returning possibly-NULL pointer to 'make_obj' from 'allocator'
@end smallexample
for a @code{return_event} to make it clearer how the unchecked value moves
from callee back to caller
@item
a double-free diagnostic might use:
@smallexample
  second 'free' here; first 'free' was at (3)
@end smallexample
and a use-after-free might use
@smallexample
  use after 'free' here; memory was freed at (2)
@end smallexample
@end itemize

At this point we can emit the diagnostic.

@subsection Limitations

@itemize @bullet
@item
Only for C so far
@item
The implementation of call summaries is currently very simplistic.
@item
Lack of function pointer analysis
@item
The constraint-handling code assumes reflexivity in some places
(that values are equal to themselves), which is not the case for NaN.
As a simple workaround, constraints on floating-point values are
currently ignored.
@item
There are various other limitations in the region model (grep for TODO/xfail
in the testsuite).
@item
The @code{constraint_manager}'s implementation of transitivity is currently
too expensive to enable by default and so must be manually enabled via
@option{-fanalyzer-transitivity}.
@item
The checkers are currently hardcoded and don't allow for user extensibility
(e.g. adding allocate/release pairs).
@item
Although the analyzer's test suite has a proof-of-concept test case for
LTO, LTO support hasn't had extensive testing.  There are various
lang-specific things in the analyzer that assume C rather than LTO.
For example, SSA names are printed to the user in ``raw'' form, rather
than printing the underlying variable name.
@end itemize

@node Debugging the Analyzer
@section Debugging the Analyzer
@cindex analyzer, debugging
@cindex static analyzer, debugging

When debugging the analyzer I normally use all of these options
together:

@smallexample
./xgcc -B. \
  -S \
  -fanalyzer \
  OTHER_GCC_ARGS \
  -wrapper gdb,--args \
  -fdump-analyzer-stderr \
  -fanalyzer-fine-grained \
  -fdump-ipa-analyzer=stderr
@end smallexample

where:

@itemize @bullet
@item @code{./xgcc -B.}
is the usual way to invoke a self-built GCC from within the @file{BUILDDIR/gcc}
subdirectory.

@item @code{-S}
so that the driver (@code{./xgcc}) invokes @code{cc1}, but doesn't bother
running the assembler or linker (since the analyzer runs inside @code{cc1}).

@item @code{-fanalyzer}
enables the analyzer, obviously.

@item @code{-wrapper gdb,--args}
invokes @code{cc1} under the debugger so that I can debug @code{cc1} and
set breakpoints and step through things.

@item @code{-fdump-analyzer-stderr}
so that the logging interface is enabled and goes to stderr, which often
gives valuable context into what's happening when stepping through the
analyzer.

@item @code{-fanalyzer-fine-grained}
which splits the effect of every statement into its own
exploded_node, rather than the default (which tries to combine
successive stmts to reduce the size of the exploded_graph).  This makes
it easier to see exactly where a particular change happens.

@item @code{-fdump-ipa-analyzer=stderr}
which dumps the GIMPLE IR seen by the analyzer pass to stderr

@end itemize

Other useful options:

@itemize @bullet
@item @code{-fdump-analyzer-exploded-graph}
which dumps a @file{SRC.eg.dot} GraphViz file that I can look at (with
python-xdot)

@item @code{-fdump-analyzer-exploded-nodes-2}
which dumps a @file{SRC.eg.txt} file containing the full @code{exploded_graph}.

@end itemize

Assuming that you have the
@uref{https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html,,python support scripts for gdb}
installed (which you should do, it makes debugging GCC much easier),
you can use:

@smallexample
(gdb) break-on-saved-diagnostic
@end smallexample

to put a breakpoint at the place where a diagnostic is saved during
@code{exploded_graph} exploration, to see where a particular diagnostic
is being saved, and:

@smallexample
(gdb) break-on-diagnostic
@end smallexample

to put a breakpoint at the place where diagnostics are actually emitted.

@subsection Special Functions for Debugging the Analyzer

The analyzer recognizes various special functions by name, for use
in debugging the analyzer, and for use in DejaGnu tests.

The declarations of these functions can be seen in the testsuite
in @file{analyzer-decls.h}.  None of these functions are actually
implemented in terms of code, merely as @code{known_function} subclasses
(in @file{gcc/analyzer/kf-analyzer.cc}).

@table @code

@item __analyzer_break
Add:
@smallexample
  __analyzer_break ();
@end smallexample
to the source being analyzed to trigger a breakpoint in the analyzer when
that source is reached.  By putting a series of these in the source, it's
much easier to effectively step through the program state as it's analyzed.

@item __analyzer_describe
The analyzer handles:

@smallexample
__analyzer_describe (0, expr);
@end smallexample

by emitting a warning describing the 2nd argument (which can be of any
type), at a verbosity level given by the 1st argument.  This is for use when
debugging, and may be of use in DejaGnu tests.

@item __analyzer_dump
@smallexample
__analyzer_dump ();
@end smallexample

will dump copious information about the analyzer's state each time it
reaches the call in its traversal of the source.

@item __analyzer_dump_capacity
@smallexample
extern void __analyzer_dump_capacity (const void *ptr);
@end smallexample

will emit a warning describing the capacity of the base region of
the region pointed to by the 1st argument.

@item __analyzer_dump_escaped
@smallexample
extern void __analyzer_dump_escaped (void);
@end smallexample

will emit a warning giving the number of decls that have escaped on this
analysis path, followed by a comma-separated list of their names,
in alphabetical order.

@item __analyzer_dump_path
@smallexample
__analyzer_dump_path ();
@end smallexample

will emit a placeholder ``note'' diagnostic with a path to that call site,
if the analyzer finds a feasible path to it.  This can be useful for
writing DejaGnu tests for constraint-tracking and feasibility checking.

@item __analyzer_dump_exploded_nodes
For every callsite to @code{__analyzer_dump_exploded_nodes} the analyzer
will emit a warning after it has finished the analysis, containing
information on all of the exploded nodes at that program point.

@smallexample
  __analyzer_dump_exploded_nodes (0);
@end smallexample

will output the number of ``processed'' nodes, and the IDs of
both ``processed'' and ``merger'' nodes, such as:

@smallexample
warning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
@end smallexample

With a non-zero argument

@smallexample
  __analyzer_dump_exploded_nodes (1);
@end smallexample

it will also dump all of the states within the ``processed'' nodes.

@item __analyzer_dump_named_constant
When the analyzer sees a call to @code{__analyzer_dump_named_constant} it
will emit a warning describing what is known about the value of a given
named constant, for parts of the analyzer that interact with target
headers.

For example:

@smallexample
__analyzer_dump_named_constant ("O_RDONLY");
@end smallexample

might lead to the analyzer emitting the warning:

@smallexample
warning: named constant 'O_RDONLY' has value '1'
@end smallexample

@item __analyzer_dump_region_model
@smallexample
   __analyzer_dump_region_model ();
@end smallexample
will dump the region_model's state to stderr.

@item __analyzer_dump_state
@smallexample
__analyzer_dump_state ("malloc", ptr);
@end smallexample

will emit a warning describing the state of the 2nd argument
(which can be of any type) with respect to the state machine with
a name matching the 1st argument (which must be a string literal).
This is for use when debugging, and may be of use in DejaGnu tests.

@item __analyzer_eval
@smallexample
__analyzer_eval (expr);
@end smallexample
will emit a warning with text ``TRUE'', ``FALSE'' or ``UNKNOWN'' based on the
truthfulness of the argument.  This is useful for writing DejaGnu tests.

@item __analyzer_get_unknown_ptr
@smallexample
__analyzer_get_unknown_ptr ();
@end smallexample
will obtain an unknown @code{void *}.

@item __analyzer_get_strlen
@smallexample
__analyzer_get_strlen (buf);
@end smallexample
will emit a warning if @code{buf} doesn't point to a null-terminated string.
TODO: eventually get the strlen of the buffer (without the
optimizer touching it).

@end table

@subsection Other Debugging Techniques

To compare two different exploded graphs, try
@code{-fdump-analyzer-exploded-nodes-2 -fdump-noaddr -fanalyzer-fine-grained}.
This will dump a @file{SRC.eg.txt} file containing the full
@code{exploded_graph}. I use @code{diff -u50 -p} to compare two different
such files (e.g. before and after a patch) to find the first place where the
two graphs diverge.  The option @option{-fdump-noaddr} will suppress
printing pointers within the dumps (which would otherwise hide the real
differences with irrelevant churn).

The option @option{-fdump-analyzer-json} will dump both the supergraph
and the exploded graph in compressed JSON form.

One approach when tracking down where a particular bogus state is
introduced into the @code{exploded_graph} is to add custom code to
@code{program_state::validate}.

The debug function @code{region::is_named_decl_p} can be used when debugging,
such as for assertions and conditional breakpoints.  For example, when
tracking down a bug in handling a decl called @code{yy_buffer_stack}, I
temporarily added a:
@smallexample
  gcc_assert (!m_base_region->is_named_decl_p ("yy_buffer_stack"));
@end smallexample
to @code{binding_cluster::mark_as_escaped} to trap a point where
@code{yy_buffer_stack} was mistakenly being treated as having escaped.