1 files changed, 645 insertions, 0 deletions
diff --git a/src/site/xdoc/manual/bcel-api.xml b/src/site/xdoc/manual/bcel-api.xml
new file mode 100644
index 00000000..8417f1d1
--- /dev/null
+++ b/src/site/xdoc/manual/bcel-api.xml
@@ -0,0 +1,645 @@
+<?xml version="1.0"?>
+<!--
+    * Licensed to the Apache Software Foundation (ASF) under one
+    * or more contributor license agreements.  See the NOTICE file
+    * distributed with this work for additional information
+    * regarding copyright ownership.  The ASF licenses this file
+    * to you under the Apache License, Version 2.0 (the
+    * "License"); you may not use this file except in compliance
+    * with the License.  You may obtain a copy of the License at
+    * 
+    *   http://www.apache.org/licenses/LICENSE-2.0
+    * 
+    * Unless required by applicable law or agreed to in writing,
+    * software distributed under the License is distributed on an
+    * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    * KIND, either express or implied.  See the License for the
+    * specific language governing permissions and limitations
+    * under the License.    
+-->
+<document>
+  <properties>
+    <title>The BCEL API</title>
+  </properties>
+
+  <body>
+    <section name="The BCEL API">
+      <p>
+        The <font face="helvetica,arial">BCEL</font> API abstracts from
+        the concrete circumstances of the Java Virtual Machine and how to
+        read and write binary Java class files. The API mainly consists
+        of three parts:
+      </p>
+
+      <p>
+
+        <ol type="1">
+          <li> A package that contains classes that describe "static"
+            constraints of class files, i.e., reflects the class file format and
+            is not intended for byte code modifications. The classes may be
+            used to read and write class files from or to a file.  This is
+            useful especially for analyzing Java classes without having the
+            source files at hand.  The main data structure is called
+            <tt>JavaClass</tt> which contains methods, fields, etc..</li>
+
+          <li> A package to dynamically generate or modify
+            <tt>JavaClass</tt> or <tt>Method</tt> objects.  It may be used to
+            insert analysis code, to strip unnecessary information from class
+            files, or to implement the code generator back-end of a Java
+            compiler.</li>
+
+          <li> Various code examples and utilities like a class file viewer,
+            a tool to convert class files into HTML, and a converter from
+            class files to the <a
+                    href="http://jasmin.sourceforge.net">Jasmin</a> assembly
+            language.</li>
+        </ol>
+      </p>
+
+    <subsection name="JavaClass">
+      <p>
+        The "static" component of the <font
+              face="helvetica,arial">BCEL</font> API resides in the package
+        <tt>org.apache.bcel.classfile</tt> and closely represents class
+        files. All of the binary components and data structures declared
+        in the <a
+              href="http://docs.oracle.com/javase/specs/">JVM
+        specification</a> and described in section <a
+              href="#2 The Java Virtual Machine">2</a> are mapped to classes.
+
+        <a href="#Figure 3">Figure 3</a> shows an UML diagram of the
+        hierarchy of classes of the <font face="helvetica,arial">BCEL
+      </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also
+        shows a detailed diagram of the <tt>ConstantPool</tt> components.
+      </p>
+
+      <p align="center">
+        <a name="Figure 3">
+          <img src="../images/javaclass.gif"/> <br/>
+          Figure 3: UML diagram for the JavaClass API</a>
+      </p>
+
+      <p>
+        The top-level data structure is <tt>JavaClass</tt>, which in most
+        cases is created by a <tt>ClassParser</tt> object that is capable
+        of parsing binary class files. A <tt>JavaClass</tt> object
+        basically consists of fields, methods, symbolic references to the
+        super class and to the implemented interfaces.
+      </p>
+
+      <p>
+        The constant pool serves as some kind of central repository and is
+        thus of outstanding importance for all components.
+        <tt>ConstantPool</tt> objects contain an array of fixed size of
+        <tt>Constant</tt> entries, which may be retrieved via the
+        <tt>getConstant()</tt> method taking an integer index as argument.
+        Indexes to the constant pool may be contained in instructions as
+        well as in other components of a class file and in constant pool
+        entries themselves.
+      </p>
+
+      <p>
+        Methods and fields contain a signature, symbolically defining
+        their types.  Access flags like <tt>public static final</tt> occur
+        in several places and are encoded by an integer bit mask, e.g.,
+        <tt>public static final</tt> matches to the Java expression
+      </p>
+
+
+      <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>
+
+      <p>
+        As mentioned in <a href="jvm.html#Java_class_file_format">section
+        2.1</a> already, several components may contain <em>attribute</em>
+        objects: classes, fields, methods, and <tt>Code</tt> objects
+        (introduced in <a href="jvm.html#Method_code">section 2.3</a>).  The
+        latter is an attribute itself that contains the actual byte code
+        array, the maximum stack size, the number of local variables, a
+        table of handled exceptions, and some optional debugging
+        information coded as <tt>LineNumberTable</tt> and
+        <tt>LocalVariableTable</tt> attributes. Attributes are in general
+        specific to some data structure, i.e., no two components share the
+        same kind of attribute, though this is not explicitly
+        forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped
+        with the component they belong to.
+      </p>
+
+    </subsection>
+
+    <subsection name="Class repository">
+      <p>
+        Using the provided <tt>Repository</tt> class, reading class files into
+        a <tt>JavaClass</tt> object is quite simple:
+      </p>
+
+      <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>
+
+      <p>
+        The repository also contains methods providing the dynamic equivalent
+        of the <tt>instanceof</tt> operator, and other useful routines:
+      </p>
+
+      <source>
+if (Repository.instanceOf(clazz, super_class)) {
+    ...
+}
+      </source>
+
+    </subsection>
+
+    <h4>Accessing class file data</h4>
+
+      <p>
+        Information within the class file components may be accessed like
+        Java Beans via intuitive set/get methods. All of them also define
+        a <tt>toString()</tt> method so that implementing a simple class
+        viewer is very easy. In fact all of the examples used here have
+        been produced this way:
+      </p>
+
+      <source>
+System.out.println(clazz);
+printCode(clazz.getMethods());
+...
+public static void printCode(Method[] methods) {
+    for (int i = 0; i &lt; methods.length; i++) {
+        System.out.println(methods[i]);
+        
+        Code code = methods[i].getCode();
+        if (code != null) // Non-abstract method
+        System.out.println(code);
+    }
+}
+      </source>
+
+    <h4>Analyzing class data</h4>
+      <p>
+        Last but not least, <font face="helvetica,arial">BCEL</font>
+        supports the <em>Visitor</em> design pattern, so one can write
+        visitor objects to traverse and analyze the contents of a class
+        file. Included in the distribution is a class
+        <tt>JasminVisitor</tt> that converts class files into the <a
+              href="http://jasmin.sourceforge.net">Jasmin</a>
+        assembler language.
+      </p>
+
+    <subsection name="ClassGen">
+      <p>
+        This part of the API (package <tt>org.apache.bcel.generic</tt>)
+        supplies an abstraction level for creating or transforming class
+        files dynamically. It makes the static constraints of Java class
+        files like the hard-coded byte code addresses "generic". The
+        generic constant pool, for example, is implemented by the class
+        <tt>ConstantPoolGen</tt> which offers methods for adding different
+        types of constants. Accordingly, <tt>ClassGen</tt> offers an
+        interface to add methods, fields, and attributes.
+        <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
+      </p>
+
+      <p align="center">
+        <a name="Figure 4">
+          <img src="../images/classgen.gif"/>
+          <br/>
+          Figure 4: UML diagram of the ClassGen API</a>
+      </p>
+
+    <h4>Types</h4>
+      <p>
+        We abstract from the concrete details of the type signature syntax
+        (see <a href="jvm.html#Type_information">2.5</a>) by introducing the
+        <tt>Type</tt> class, which is used, for example, by methods to
+        define their return and argument types. Concrete sub-classes are
+        <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>
+        which consists of the element type and the number of
+        dimensions. For commonly used types the class offers some
+        predefined constants. For example, the method signature of the
+        <tt>main</tt> method as shown in
+        <a href="jvm.html#Type_information">section 2.5</a> is represented by:
+      </p>
+
+      <source>
+Type return_type = Type.VOID;
+Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) };
+      </source>
+
+      <p>
+        <tt>Type</tt> also contains methods to convert types into textual
+        signatures and vice versa. The sub-classes contain implementations
+        of the routines and constraints specified by the Java Language
+        Specification.
+      </p>
+
+    <h4>Generic fields and methods</h4>
+      <p>
+        Fields are represented by <tt>FieldGen</tt> objects, which may be
+        freely modified by the user. If they have the access rights
+        <tt>static final</tt>, i.e., are constants and of basic type, they
+        may optionally have an initializing value.
+      </p>
+
+      <p>
+        Generic methods contain methods to add exceptions the method may
+        throw, local variables, and exception handlers. The latter two are
+        represented by user-configurable objects as well. Because
+        exception handlers and local variables contain references to byte
+        code addresses, they also take the role of an <em>instruction
+        targeter</em> in our terminology. Instruction targeters contain a
+        method <tt>updateTarget()</tt> to redirect a reference. This is
+        somewhat related to the Observer design pattern. Generic
+        (non-abstract) methods refer to <em>instruction lists</em> that
+        consist of instruction objects. References to byte code addresses
+        are implemented by handles to instruction objects. If the list is
+        updated the instruction targeters will be informed about it. This
+        is explained in more detail in the following sections.
+      </p>
+
+      <p>
+        The maximum stack size needed by the method and the maximum number
+        of local variables used may be set manually or computed via the
+        <tt>setMaxStack()</tt> and <tt>setMaxLocals()</tt> methods
+        automatically.
+      </p>
+
+    <h4>Instructions</h4>
+      <p>
+        Modeling instructions as objects may look somewhat odd at first
+        sight, but in fact enables programmers to obtain a high-level view
+        upon control flow without handling details like concrete byte code
+        offsets.  Instructions consist of an opcode (sometimes called
+        tag), their length in bytes and an offset (or index) within the
+        byte code. Since many instructions are immutable (stack operators,
+        e.g.), the <tt>InstructionConstants</tt> interface offers
+        shareable predefined "fly-weight" constants to use.
+      </p>
+
+      <p>
+        Instructions are grouped via sub-classing, the type hierarchy of
+        instruction classes is illustrated by (incomplete) figure in the
+        appendix. The most important family of instructions are the
+        <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to
+        targets somewhere within the byte code. Obviously, this makes them
+        candidates for playing an <tt>InstructionTargeter</tt> role,
+        too. Instructions are further grouped by the interfaces they
+        implement, there are, e.g., <tt>TypedInstruction</tt>s that are
+        associated with a specific type like <tt>ldc</tt>, or
+        <tt>ExceptionThrower</tt> instructions that may raise exceptions
+        when executed.
+      </p>
+
+      <p>
+        All instructions can be traversed via <tt>accept(Visitor v)</tt>
+        methods, i.e., the Visitor design pattern. There is however some
+        special trick in these methods that allows to merge the handling
+        of certain instruction groups. The <tt>accept()</tt> do not only
+        call the corresponding <tt>visit()</tt> method, but call
+        <tt>visit()</tt> methods of their respective super classes and
+        implemented interfaces first, i.e., the most specific
+        <tt>visit()</tt> call is last. Thus one can group the handling of,
+        say, all <tt>BranchInstruction</tt>s into one single method.
+      </p>
+
+      <p>
+        For debugging purposes it may even make sense to "invent" your own
+        instructions. In a sophisticated code generator like the one used
+        as a backend of the <a href="http://barat.sourceforge.net">Barat
+        framework</a> for static analysis one often has to insert
+        temporary <tt>nop</tt> (No operation) instructions. When examining
+        the produced code it may be very difficult to track back where the
+        <tt>nop</tt> was actually inserted. One could think of a derived
+        <tt>nop2</tt> instruction that contains additional debugging
+        information. When the instruction list is dumped to byte code, the
+        extra data is simply dropped.
+      </p>
+
+      <p>
+        One could also think of new byte code instructions operating on
+        complex numbers that are replaced by normal byte code upon
+        load-time or are recognized by a new JVM.
+      </p>
+
+    <h4>Instruction lists</h4>
+      <p>
+        An <em>instruction list</em> is implemented by a list of
+        <em>instruction handles</em> encapsulating instruction objects.
+        References to instructions in the list are thus not implemented by
+        direct pointers to instructions but by pointers to instruction
+        <em>handles</em>. This makes appending, inserting and deleting
+        areas of code very simple and also allows us to reuse immutable
+        instruction objects (fly-weight objects). Since we use symbolic
+        references, computation of concrete byte code offsets does not
+        need to occur until finalization, i.e., until the user has
+        finished the process of generating or transforming code. We will
+        use the term instruction handle and instruction synonymously
+        throughout the rest of the paper. Instruction handles may contain
+        additional user-defined data using the <tt>addAttribute()</tt>
+        method.
+      </p>
+
+      <p>
+        <b>Appending:</b> One can append instructions or other instruction
+        lists anywhere to an existing list. The instructions are appended
+        after the given instruction handle. All append methods return a
+        new instruction handle which may then be used as the target of a
+        branch instruction, e.g.:
+      </p>
+
+      <source>
+InstructionList il = new InstructionList();
+...
+GOTO g = new GOTO(null);
+il.append(g);
+...
+// Use immutable fly-weight object
+InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
+g.setTarget(ih);
+      </source>
+
+      <p>
+        <b>Inserting:</b> Instructions may be inserted anywhere into an
+        existing list. They are inserted before the given instruction
+        handle. All insert methods return a new instruction handle which
+        may then be used as the start address of an exception handler, for
+        example.
+      </p>
+
+      <source>
+InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP);
+...
+mg.addExceptionHandler(start, end, handler, "java.io.IOException");
+      </source>
+
+      <p>
+        <b>Deleting:</b> Deletion of instructions is also very
+        straightforward; all instruction handles and the contained
+        instructions within a given range are removed from the instruction
+        list and disposed. The <tt>delete()</tt> method may however throw
+        a <tt>TargetLostException</tt> when there are instruction
+        targeters still referencing one of the deleted instructions. The
+        user is forced to handle such exceptions in a <tt>try-catch</tt>
+        clause and redirect these references elsewhere. The <em>peep
+        hole</em> optimizer described in the appendix gives a detailed
+        example for this.
+      </p>
+
+      <source>
+try {
+    il.delete(first, last);
+} catch (TargetLostException e) {
+    for (InstructionHandle target : e.getTargets()) {
+        for (InstructionTargeter targeter : target.getTargeters()) {
+            targeter.updateTarget(target, new_target);
+        }
+    }
+}
+      </source>
+
+      <p>
+        <b>Finalizing:</b> When the instruction list is ready to be dumped
+        to pure byte code, all symbolic references must be mapped to real
+        byte code offsets. This is done by the <tt>getByteCode()</tt>
+        method which is called by default by
+        <tt>MethodGen.getMethod()</tt>. Afterwards you should call
+        <tt>dispose()</tt> so that the instruction handles can be reused
+        internally. This helps to improve memory usage.
+      </p>
+
+      <source>
+InstructionList il = new InstructionList();
+
+ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
+        "&lt;generated&#62;", ACC_PUBLIC | ACC_SUPER, null);
+MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
+        Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) }, 
+        new String[] { "argv" }, "main", "HelloWorld", il, cp);
+...
+cg.addMethod(mg.getMethod());
+il.dispose(); // Reuse instruction handles of list
+      </source>
+
+    <h4>Code example revisited</h4>
+      <p>
+        Using instruction lists gives us a generic view upon the code: In
+        <a href="#Figure 5">Figure 5</a> we again present the code chunk
+        of the <tt>readInt()</tt> method of the factorial example in section
+        <a href="jvm.html#Code_example">2.6</a>: The local variables
+        <tt>n</tt> and <tt>e1</tt> both hold two references to
+        instructions, defining their scope.  There are two <tt>goto</tt>s
+        branching to the <tt>iload</tt> at the end of the method. One of
+        the exception handlers is displayed, too: it references the start
+        and the end of the <tt>try</tt> block and also the exception
+        handler code.
+      </p>
+
+      <p align="center">
+        <a name="Figure 5">
+          <img src="../images/il.gif"/>
+          <br/>
+          Figure 5: Instruction list for <tt>readInt()</tt> method</a>
+      </p>
+
+    <h4>Instruction factories</h4>
+      <p>
+        To simplify the creation of certain instructions the user can use
+        the supplied <tt>InstructionFactory</tt> class which offers a lot
+        of useful methods to create instructions from
+        scratch. Alternatively, he can also use <em>compound
+        instructions</em>: When producing byte code, some patterns
+        typically occur very frequently, for instance the compilation of
+        arithmetic or comparison expressions. You certainly do not want
+        to rewrite the code that translates such expressions into byte
+        code in every place they may appear. In order to support this, the
+        <font face="helvetica,arial">BCEL</font> API includes a <em>compound
+        instruction</em> (an interface with a single
+        <tt>getInstructionList()</tt> method). Instances of this class
+        may be used in any place where normal instructions would occur,
+        particularly in append operations.
+      </p>
+
+      <p>
+        <b>Example: Pushing constants</b> Pushing constants onto the
+        operand stack may be coded in different ways. As explained in <a
+              href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are
+        some "short-cut" instructions that can be used to make the
+        produced byte code more compact. The smallest instruction to push
+        a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>, other
+        possibilities are <tt>bipush</tt> (can be used to push values
+        between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
+        or <tt>ldc</tt> (load constant from constant pool).
+      </p>
+
+      <p>
+        Instead of repeatedly selecting the most compact instruction in,
+        say, a switch, one can use the compound <tt>PUSH</tt> instruction
+        whenever pushing a constant number or string. It will produce the
+        appropriate byte code instruction and insert entries into to
+        constant pool if necessary.
+      </p>
+
+      <source>
+InstructionFactory f  = new InstructionFactory(class_gen);
+InstructionList    il = new InstructionList();
+...
+il.append(new PUSH(cp, "Hello, world"));
+il.append(new PUSH(cp, 4711));
+...
+il.append(f.createPrintln("Hello World"));
+...
+il.append(f.createReturn(type));
+      </source>
+
+    <h4>Code patterns using regular expressions</h4>
+      <p>
+        When transforming code, for instance during optimization or when
+        inserting analysis method calls, one typically searches for
+        certain patterns of code to perform the transformation at. To
+        simplify handling such situations <font
+              face="helvetica,arial">BCEL </font>introduces a special feature:
+        One can search for given code patterns within an instruction list
+        using <em>regular expressions</em>. In such expressions,
+        instructions are represented by their opcode names, e.g.,
+        <tt>LDC</tt>, one may also use their respective super classes, e.g.,
+        "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>,
+        <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
+        the expression
+      </p>
+
+      <source>"NOP+(ILOAD|ALOAD)*"</source>
+
+      <p>
+        represents a piece of code consisting of at least one <tt>NOP</tt>
+        followed by a possibly empty sequence of <tt>ILOAD</tt> and
+        <tt>ALOAD</tt> instructions.
+      </p>
+
+      <p>
+        The <tt>search()</tt> method of class
+        <tt>org.apache.bcel.util.InstructionFinder</tt> gets a regular
+        expression and a starting point as arguments and returns an
+        iterator describing the area of matched instructions. Additional
+        constraints to the matching area of instructions, which can not be
+        implemented via regular expressions, may be expressed via <em>code
+        constraint</em> objects.
+      </p>
+
+    <h4>Example: Optimizing boolean expressions</h4>
+      <p>
+        In Java, boolean values are mapped to 1 and to 0,
+        respectively. Thus, the simplest way to evaluate boolean
+        expressions is to push a 1 or a 0 onto the operand stack depending
+        on the truth value of the expression. But this way, the
+        subsequent combination of boolean expressions (with
+        <tt>&amp;&amp;</tt>, e.g) yields long chunks of code that push
+        lots of 1s and 0s onto the stack.
+      </p>
+
+      <p>
+        When the code has been finalized these chunks can be optimized
+        with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt>
+        (e.g.  the comparison of two integers: <tt>if_icmpeq</tt>) that
+        either produces a 1 or a 0 on the stack and is followed by an
+        <tt>ifne</tt> instruction (branch if stack value 0) may be
+        replaced by the <tt>IfInstruction</tt> with its branch target
+        replaced by the target of the <tt>ifne</tt> instruction:
+      </p>
+
+      <source>
+CodeConstraint constraint = new CodeConstraint() {
+    public boolean checkCode(InstructionHandle[] match) {
+        IfInstruction if1 = (IfInstruction) match[0].getInstruction();
+        GOTO g = (GOTO) match[2].getInstruction();
+        return (if1.getTarget() == match[3]) &amp;&amp;
+            (g.getTarget() == match[4]);
+    }
+};
+
+InstructionFinder f = new InstructionFinder(il);
+String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";
+
+for (Iterator e = f.search(pat, constraint); e.hasNext(); ) {
+    InstructionHandle[] match = (InstructionHandle[]) e.next();;
+    ...
+    match[0].setTarget(match[5].getTarget()); // Update target
+    ...
+    try {
+        il.delete(match[1], match[5]);
+    } catch (TargetLostException ex) {
+        ...
+    }
+}
+      </source>
+
+      <p>
+        The applied code constraint object ensures that the matched code
+        really corresponds to the targeted expression pattern. Subsequent
+        application of this algorithm removes all unnecessary stack
+        operations and branch instructions from the byte code. If any of
+        the deleted instructions is still referenced by an
+        <tt>InstructionTargeter</tt> object, the reference has to be
+        updated in the <tt>catch</tt>-clause.
+      </p>
+
+      <p>
+        <b>Example application:</b>
+        The expression:
+      </p>
+
+      <source>
+        if ((a == null) || (i &lt; 2))
+        System.out.println("Ooops");
+      </source>
+
+      <p>
+        can be mapped to both of the chunks of byte code shown in <a
+              href="#Figure 6">figure 6</a>. The left column represents the
+        unoptimized code while the right column displays the same code
+        after the peep hole algorithm has been applied:
+      </p>
+
+      <p align="center"><a name="Figure 6">
+        <table>
+          <tr>
+            <td valign="top"><pre>
+              5:  aload_0
+              6:  ifnull        #13
+              9:  iconst_0
+              10: goto          #14
+              13: iconst_1
+              14: nop
+              15: ifne          #36
+              18: iload_1
+              19: iconst_2
+              20: if_icmplt     #27
+              23: iconst_0
+              24: goto          #28
+              27: iconst_1
+              28: nop
+              29: ifne          #36
+              32: iconst_0
+              33: goto          #37
+              36: iconst_1
+              37: nop
+              38: ifeq          #52
+              41: getstatic     System.out
+              44: ldc           "Ooops"
+              46: invokevirtual println
+              52: return
+            </pre></td>
+            <td valign="top"><pre>
+              10: aload_0
+              11: ifnull        #19
+              14: iload_1
+              15: iconst_2
+              16: if_icmpge     #27
+              19: getstatic     System.out
+              22: ldc           "Ooops"
+              24: invokevirtual println
+              27: return
+            </pre></td>
+          </tr>
+        </table>
+      </a>
+      </p>
+    </subsection>
+    </section>
+  </body>
+</document>
+\ No newline at end of file