author | Benedikt Ritter <britter@apache.org> | 2016-06-18 12:25:33 +0000 |
---|---|---|
committer | Benedikt Ritter <britter@apache.org> | 2016-06-18 12:25:33 +0000 |
commit | 52d7fb64a4535f7797eecdce2d27e20fad39ccea (patch) | |
tree | 7b7390cb3a2b4c5726991c7b3f0a699513f13236 /src | |
parent | 6d723105a0f1b64917a7ca2fa93ae5c25d74fcde (diff) | |
download | apache-commons-bcel-52d7fb64a4535f7797eecdce2d27e20fad39ccea.tar.gz |
Split up the manual into separate pages per topic
git-svn-id: https://svn.apache.org/repos/asf/commons/proper/bcel/trunk@1748979 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'src')
-rw-r--r-- | src/site/site.xml | 8 |
-rw-r--r-- | src/site/xdoc/manual.xml | 1680 |
-rw-r--r-- | src/site/xdoc/manual/appendix.xml | 357 |
-rw-r--r-- | src/site/xdoc/manual/application-areas.xml | 146 |
-rw-r--r-- | src/site/xdoc/manual/bcel-api.xml | 645 |
-rw-r--r-- | src/site/xdoc/manual/introduction.xml | 80 |
-rw-r--r-- | src/site/xdoc/manual/jvm.xml | 502 |
-rw-r--r-- | src/site/xdoc/manual/manual.xml | 70 |
8 files changed, 1807 insertions, 1681 deletions
diff --git a/src/site/site.xml b/src/site/site.xml
index f343307f..47a25e1f 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -33,7 +33,13 @@
       <item name="About" href="index.html"/>
       <item name="News" href="news.html"/>
       <item name="Download" href="http://commons.apache.org/bcel/download_bcel.cgi"/>
-      <item name="Manual" href="manual.html"/>
+      <item name="Manual" href="manual/manual.html">
+        <item name="Introduction" href="manual/introduction.html"/>
+        <item name="The JVM" href="manual/jvm.html"/>
+        <item name="The BCEL API" href="manual/bcel-api.html"/>
+        <item name="Application areas" href="manual/application-areas.html"/>
+        <item name="Appendix" href="manual/appendix.html"/>
+      </item>
       <item name="FAQ" href="faq.html"/>
       <item name="Used by" href="projects.html"/>
       <item name="Javadoc (Latest release)" href="javadocs/api-release/index.html"/>
diff --git a/src/site/xdoc/manual.xml b/src/site/xdoc/manual.xml
deleted file mode 100644
index d9ab982d..00000000
--- a/src/site/xdoc/manual.xml
+++ /dev/null
@@ -1,1680 +0,0 @@
-<?xml version="1.0"?>
-<!--
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
---> -<document> - - <properties> - <title>Byte Code Engineering Library (BCEL)</title> - </properties> - - <body> - - <section name="Abstract"> - <p> - Extensions and improvements of the programming language Java and - its related execution environment (Java Virtual Machine, JVM) are - the subject of a large number of research projects and - proposals. There are projects, for instance, to add parameterized - types to Java, to implement <a - href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>, to - perform sophisticated static analysis, and to improve the run-time - performance. - </p> - - <p> - Since Java classes are compiled into portable binary class files - (called <em>byte code</em>), it is the most convenient and - platform-independent way to implement these improvements not by - writing a new compiler or changing the JVM, but by transforming - the byte code. These transformations can either be performed - after compile-time, or at load-time. Many programmers are doing - this by implementing their own specialized byte code manipulation - tools, which are, however, restricted in the range of their - re-usability. - </p> - - <p> - To deal with the necessary class file transformations, we - introduce an API that helps developers to conveniently implement - their transformations. - </p> - </section> - - <section name="1 Introduction"> - <p> - The <a href="http://java.sun.com/">Java</a> language has become - very popular and many research projects deal with further - improvements of the language or its run-time behavior. The - possibility to extend a language with new concepts is surely a - desirable feature, but the implementation issues should be hidden - from the user. Fortunately, the concepts of the Java Virtual - Machine permit the user-transparent implementation of such - extensions with relatively little effort. 
- </p> - - <p> - Because the target language of Java is an interpreted language - with a small and easy-to-understand set of instructions (the - <em>byte code</em>), developers can implement and test their - concepts in a very elegant way. One can write a plug-in - replacement for the system's <em>class loader</em> which is - responsible for dynamically loading class files at run-time and - passing the byte code to the Virtual Machine (see section ). - Class loaders may thus be used to intercept the loading process - and transform classes before they get actually executed by the - JVM. While the original class files always remain unaltered, the - behavior of the class loader may be reconfigured for every - execution or instrumented dynamically. - </p> - - <p> - The <font face="helvetica,arial">BCEL</font> API (Byte Code - Engineering Library), formerly known as JavaClass, is a toolkit - for the static analysis and dynamic creation or transformation of - Java class files. It enables developers to implement the desired - features on a high level of abstraction without handling all the - internal details of the Java class file format and thus - re-inventing the wheel every time. <font face="helvetica,arial">BCEL - </font> is written entirely in Java and freely available under the - terms of the <a href="license.html">Apache Software License</a>. - </p> - - <p> - This manual is structured as follows: We give a brief description - of the Java Virtual Machine and the class file format in <a - href="#2 The Java Virtual Machine">section 2</a>. <a href="#3 The - BCEL API">Section 3</a> introduces the <font - face="helvetica,arial">BCEL</font> API. <a href="#4 Application - areas">Section 4</a> describes some typical application areas and - example projects. The appendix contains code examples that are to - long to be presented in the main part of this paper. All examples - are included in the down-loadable distribution. 
- </p> - - </section> - - <section name="2 The Java Virtual Machine"> - <p> - Readers already familiar with the Java Virtual Machine and the - Java class file format may want to skip this section and proceed - with <a href="#3 The BCEL API">section 3</a>. - </p> - - <p> - Programs written in the Java language are compiled into a portable - binary format called <em>byte code</em>. Every class is - represented by a single class file containing class related data - and byte code instructions. These files are loaded dynamically - into an interpreter (<a - href="http://docs.oracle.com/javase/specs/">Java - Virtual Machine</a>, aka. JVM) and executed. - </p> - - <p> - <a href="#Figure 1">Figure 1</a> illustrates the procedure of - compiling and executing a Java class: The source file - (<tt>HelloWorld.java</tt>) is compiled into a Java class file - (<tt>HelloWorld.class</tt>), loaded by the byte code interpreter - and executed. In order to implement additional features, - researchers may want to transform class files (drawn with bold - lines) before they get actually executed. This application area - is one of the main issues of this article. - </p> - - <p align="center"> - <a name="Figure 1"> - <img src="images/jvm.gif"/> - <br/> - Figure 1: Compilation and execution of Java classes</a> - </p> - - <p> - Note that the use of the general term "Java" implies in fact two - meanings: on the one hand, Java as a programming language, on the - other hand, the Java Virtual Machine, which is not necessarily - targeted by the Java language exclusively, but may be used by <a - href="http://www.robert-tolksdorf.de/vmlanguages.html">other - languages</a> as well. We assume the reader to be familiar with - the Java language and to have a general understanding of the - Virtual Machine. 
- </p> - - </section> - - <section name="2.1 Java class file format"> - <p> - Giving a full overview of the design issues of the Java class file - format and the associated byte code instructions is beyond the - scope of this paper. We will just give a brief introduction - covering the details that are necessary for understanding the rest - of this paper. The format of class files and the byte code - instruction set are described in more detail in the <a - href="http://docs.oracle.com/javase/specs/">Java - Virtual Machine Specification</a>. Especially, we will not deal - with the security constraints that the Java Virtual Machine has to - check at run-time, i.e. the byte code verifier. - </p> - - <p> - <a href="#Figure 2">Figure 2</a> shows a simplified example of the - contents of a Java class file: It starts with a header containing - a "magic number" (<tt>0xCAFEBABE</tt>) and the version number, - followed by the <em>constant pool</em>, which can be roughly - thought of as the text segment of an executable, the <em>access - rights</em> of the class encoded by a bit mask, a list of - interfaces implemented by the class, lists containing the fields - and methods of the class, and finally the <em>class - attributes</em>, e.g., the <tt>SourceFile</tt> attribute telling - the name of the source file. Attributes are a way of putting - additional, user-defined information into class file data - structures. For example, a custom class loader may evaluate such - attribute data in order to perform its transformations. The JVM - specification declares that unknown, i.e., user-defined attributes - must be ignored by any Virtual Machine implementation. 
- </p> - - <p align="center"> - <a name="Figure 2"> - <img src="images/classfile.gif"/> - <br/> - Figure 2: Java class file format</a> - </p> - - <p> - Because all of the information needed to dynamically resolve the - symbolic references to classes, fields and methods at run-time is - coded with string constants, the constant pool contains in fact - the largest portion of an average class file, approximately - 60%. In fact, this makes the constant pool an easy target for code - manipulation issues. The byte code instructions themselves just - make up 12%. - </p> - - <p> - The right upper box shows a "zoomed" excerpt of the constant pool, - while the rounded box below depicts some instructions that are - contained within a method of the example class. These - instructions represent the straightforward translation of the - well-known statement: - </p> - - <p align="center"> - <source>System.out.println("Hello, world");</source> - </p> - - <p> - The first instruction loads the contents of the field <tt>out</tt> - of class <tt>java.lang.System</tt> onto the operand stack. This is - an instance of the class <tt>java.io.PrintStream</tt>. The - <tt>ldc</tt> ("Load constant") pushes a reference to the string - "Hello world" on the stack. The next instruction invokes the - instance method <tt>println</tt> which takes both values as - parameters (Instance methods always implicitly take an instance - reference as their first argument). - </p> - - <p> - Instructions, other data structures within the class file and - constants themselves may refer to constants in the constant pool. - Such references are implemented via fixed indexes encoded directly - into the instructions. This is illustrated for some items of the - figure emphasized with a surrounding box. 
- </p> - - <p> - For example, the <tt>invokevirtual</tt> instruction refers to a - <tt>MethodRef</tt> constant that contains information about the - name of the called method, the signature (i.e., the encoded - argument and return types), and to which class the method belongs. - In fact, as emphasized by the boxed value, the <tt>MethodRef</tt> - constant itself just refers to other entries holding the real - data, e.g., it refers to a <tt>ConstantClass</tt> entry containing - a symbolic reference to the class <tt>java.io.PrintStream</tt>. - To keep the class file compact, such constants are typically - shared by different instructions and other constant pool - entries. Similarly, a field is represented by a <tt>Fieldref</tt> - constant that includes information about the name, the type and - the containing class of the field. - </p> - - <p> - The constant pool basically holds the following types of - constants: References to methods, fields and classes, strings, - integers, floats, longs, and doubles. - </p> - - </section> - - <section name="2.2 Byte code instruction set"> - <p> - The JVM is a stack-oriented interpreter that creates a local stack - frame of fixed size for every method invocation. The size of the - local stack has to be computed by the compiler. Values may also be - stored intermediately in a frame area containing <em>local - variables</em> which can be used like a set of registers. These - local variables are numbered from 0 to 65535, i.e., you have a - maximum of 65536 of local variables per method. The stack frames - of caller and callee method are overlapping, i.e., the caller - pushes arguments onto the operand stack and the called method - receives them in local variables. - </p> - - <p> - The byte code instruction set currently consists of 212 - instructions, 44 opcodes are marked as reserved and may be used - for future extensions or intermediate optimizations within the - Virtual Machine. 
The instruction set can be roughly grouped as - follows: - </p> - - <p> - <b>Stack operations:</b> Constants can be pushed onto the stack - either by loading them from the constant pool with the - <tt>ldc</tt> instruction or with special "short-cut" - instructions where the operand is encoded into the instructions, - e.g., <tt>iconst_0</tt> or <tt>bipush</tt> (push byte value). - </p> - - <p> - <b>Arithmetic operations:</b> The instruction set of the Java - Virtual Machine distinguishes its operand types using different - instructions to operate on values of specific type. Arithmetic - operations starting with <tt>i</tt>, for example, denote an - integer operation. E.g., <tt>iadd</tt> that adds two integers - and pushes the result back on the stack. The Java types - <tt>boolean</tt>, <tt>byte</tt>, <tt>short</tt>, and - <tt>char</tt> are handled as integers by the JVM. - </p> - - <p> - <b>Control flow:</b> There are branch instructions like - <tt>goto</tt>, and <tt>if_icmpeq</tt>, which compares two integers - for equality. There is also a <tt>jsr</tt> (jump to sub-routine) - and <tt>ret</tt> pair of instructions that is used to implement - the <tt>finally</tt> clause of <tt>try-catch</tt> blocks. - Exceptions may be thrown with the <tt>athrow</tt> instruction. - Branch targets are coded as offsets from the current byte code - position, i.e., with an integer number. - </p> - - <p> - <b>Load and store operations</b> for local variables like - <tt>iload</tt> and <tt>istore</tt>. There are also array - operations like <tt>iastore</tt> which stores an integer value - into an array. - </p> - - <p> - <b>Field access:</b> The value of an instance field may be - retrieved with <tt>getfield</tt> and written with - <tt>putfield</tt>. For static fields, there are - <tt>getstatic</tt> and <tt>putstatic</tt> counterparts. 
- </p> - - <p> - <b>Method invocation:</b> Static Methods may either be called via - <tt>invokestatic</tt> or be bound virtually with the - <tt>invokevirtual</tt> instruction. Super class methods and - private methods are invoked with <tt>invokespecial</tt>. A - special case are interface methods which are invoked with - <tt>invokeinterface</tt>. - </p> - - <p> - <b>Object allocation:</b> Class instances are allocated with the - <tt>new</tt> instruction, arrays of basic type like - <tt>int[]</tt> with <tt>newarray</tt>, arrays of references like - <tt>String[][]</tt> with <tt>anewarray</tt> or - <tt>multianewarray</tt>. - </p> - - <p> - <b>Conversion and type checking:</b> For stack operands of basic - type there exist casting operations like <tt>f2i</tt> which - converts a float value into an integer. The validity of a type - cast may be checked with <tt>checkcast</tt> and the - <tt>instanceof</tt> operator can be directly mapped to the - equally named instruction. - </p> - - <p> - Most instructions have a fixed length, but there are also some - variable-length instructions: In particular, the - <tt>lookupswitch</tt> and <tt>tableswitch</tt> instructions, which - are used to implement <tt>switch()</tt> statements. Since the - number of <tt>case</tt> clauses may vary, these instructions - contain a variable number of statements. - </p> - - <p> - We will not list all byte code instructions here, since these are - explained in detail in the <a - href="http://docs.oracle.com/javase/specs/">JVM - specification</a>. The opcode names are mostly self-explaining, - so understanding the following code examples should be fairly - intuitive. - </p> - - </section> - - <section name="2.3 Method code"> - <p> - Non-abstract (and non-native) methods contain an attribute - "<tt>Code</tt>" that holds the following data: The maximum size of - the method's stack frame, the number of local variables and an - array of byte code instructions. 
Optionally, it may also contain - information about the names of local variables and source file - line numbers that can be used by a debugger. - </p> - - <p> - Whenever an exception is raised during execution, the JVM performs - exception handling by looking into a table of exception - handlers. The table marks handlers, i.e., code chunks, to be - responsible for exceptions of certain types that are raised within - a given area of the byte code. When there is no appropriate - handler the exception is propagated back to the caller of the - method. The handler information is itself stored in an attribute - contained within the <tt>Code</tt> attribute. - </p> - - </section> - - <section name="2.4 Byte code offsets"> - <p> - Targets of branch instructions like <tt>goto</tt> are encoded as - relative offsets in the array of byte codes. Exception handlers - and local variables refer to absolute addresses within the byte - code. The former contains references to the start and the end of - the <tt>try</tt> block, and to the instruction handler code. The - latter marks the range in which a local variable is valid, i.e., - its scope. This makes it difficult to insert or delete code areas - on this level of abstraction, since one has to recompute the - offsets every time and update the referring objects. We will see - in <a href="#3.3 ClassGen">section 3.3</a> how <font - face="helvetica,arial">BCEL</font> remedies this restriction. - </p> - - </section> - - <section name="2.5 Type information"> - <p> - Java is a type-safe language and the information about the types - of fields, local variables, and methods is stored in so called - <em>signatures</em>. These are strings stored in the constant pool - and encoded in a special format. 
For example the argument and - return types of the <tt>main</tt> method - </p> - - <p align="center"> - <source>public static void main(String[] argv)</source> - </p> - - <p> - are represented by the signature - </p> - - <p align="center"> - <source>([java/lang/String;)V</source> - </p> - - <p> - Classes are internally represented by strings like - <tt>"java/lang/String"</tt>, basic types like <tt>float</tt> by an - integer number. Within signatures they are represented by single - characters, e.g., <tt>I</tt>, for integer. Arrays are denoted with - a <tt>[</tt> at the start of the signature. - </p> - - </section> - - <section name="2.6 Code example"> - <p> - The following example program prompts for a number and prints the - factorial of it. The <tt>readLine()</tt> method reading from the - standard input may raise an <tt>IOException</tt> and if a - misspelled number is passed to <tt>parseInt()</tt> it throws a - <tt>NumberFormatException</tt>. Thus, the critical area of code - must be encapsulated in a <tt>try-catch</tt> block. - </p> - - <source> - import java.io.*; - - public class Factorial { - private static BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); - - public static int fac(int n) { - return (n == 0) ? 
1 : n * fac(n - 1); - } - - public static int readInt() { - int n = 4711; - try { - System.out.print("Please enter a number> "); - n = Integer.parseInt(in.readLine()); - } catch (IOException e1) { - System.err.println(e1); - } catch (NumberFormatException e2) { - System.err.println(e2); - } - return n; - } - - public static void main(String[] argv) { - int n = readInt(); - System.out.println("Factorial of " + n + " is " + fac(n)); - } - } - </source> - - <p> - This code example typically compiles to the following chunks of - byte code: - </p> - - <source> - 0: iload_0 - 1: ifne #8 - 4: iconst_1 - 5: goto #16 - 8: iload_0 - 9: iload_0 - 10: iconst_1 - 11: isub - 12: invokestatic Factorial.fac (I)I (12) - 15: imul - 16: ireturn - - LocalVariable(start_pc = 0, length = 16, index = 0:int n) - </source> - - <p><b>fac():</b> - The method <tt>fac</tt> has only one local variable, the argument - <tt>n</tt>, stored at index 0. This variable's scope ranges from - the start of the byte code sequence to the very end. If the value - of <tt>n</tt> (the value fetched with <tt>iload_0</tt>) is not - equal to 0, the <tt>ifne</tt> instruction branches to the byte - code at offset 8, otherwise a 1 is pushed onto the operand stack - and the control flow branches to the final return. For ease of - reading, the offsets of the branch instructions, which are - actually relative, are displayed as absolute addresses in these - examples. - </p> - - <p> - If recursion has to continue, the arguments for the multiplication - (<tt>n</tt> and <tt>fac(n - 1)</tt>) are evaluated and the results - pushed onto the operand stack. After the multiplication operation - has been performed the function returns the computed value from - the top of the stack. 
- </p> - - <source> - 0: sipush 4711 - 3: istore_0 - 4: getstatic java.lang.System.out Ljava/io/PrintStream; - 7: ldc "Please enter a number> " - 9: invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V - 12: getstatic Factorial.in Ljava/io/BufferedReader; - 15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String; - 18: invokestatic java.lang.Integer.parseInt (Ljava/lang/String;)I - 21: istore_0 - 22: goto #44 - 25: astore_1 - 26: getstatic java.lang.System.err Ljava/io/PrintStream; - 29: aload_1 - 30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V - 33: goto #44 - 36: astore_1 - 37: getstatic java.lang.System.err Ljava/io/PrintStream; - 40: aload_1 - 41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V - 44: iload_0 - 45: ireturn - - Exception handler(s) = - From To Handler Type - 4 22 25 java.io.IOException(6) - 4 22 36 NumberFormatException(10) - </source> - - <p><b>readInt():</b> First the local variable <tt>n</tt> (at index 0) - is initialized to the value 4711. The next instruction, - <tt>getstatic</tt>, loads the referencs held by the static - <tt>System.out</tt> field onto the stack. Then a string is loaded - and printed, a number read from the standard input and assigned to - <tt>n</tt>. - </p> - - <p> - If one of the called methods (<tt>readLine()</tt> and - <tt>parseInt()</tt>) throws an exception, the Java Virtual Machine - calls one of the declared exception handlers, depending on the - type of the exception. The <tt>try</tt>-clause itself does not - produce any code, it merely defines the range in which the - subsequent handlers are active. In the example, the specified - source code area maps to a byte code area ranging from offset 4 - (inclusive) to 22 (exclusive). If no exception has occurred - ("normal" execution flow) the <tt>goto</tt> instructions branch - behind the handler code. There the value of <tt>n</tt> is loaded - and returned. 
- </p> - - <p> - The handler for <tt>java.io.IOException</tt> starts at - offset 25. It simply prints the error and branches back to the - normal execution flow, i.e., as if no exception had occurred. - </p> - - </section> - - <section name="3 The BCEL API"> - <p> - The <font face="helvetica,arial">BCEL</font> API abstracts from - the concrete circumstances of the Java Virtual Machine and how to - read and write binary Java class files. The API mainly consists - of three parts: - </p> - - <p> - - <ol type="1"> - <li> A package that contains classes that describe "static" - constraints of class files, i.e., reflects the class file format and - is not intended for byte code modifications. The classes may be - used to read and write class files from or to a file. This is - useful especially for analyzing Java classes without having the - source files at hand. The main data structure is called - <tt>JavaClass</tt> which contains methods, fields, etc..</li> - - <li> A package to dynamically generate or modify - <tt>JavaClass</tt> or <tt>Method</tt> objects. It may be used to - insert analysis code, to strip unnecessary information from class - files, or to implement the code generator back-end of a Java - compiler.</li> - - <li> Various code examples and utilities like a class file viewer, - a tool to convert class files into HTML, and a converter from - class files to the <a - href="http://jasmin.sourceforge.net">Jasmin</a> assembly - language.</li> - </ol> - </p> - </section> - - <section name="3.1 JavaClass"> - <p> - The "static" component of the <font - face="helvetica,arial">BCEL</font> API resides in the package - <tt>org.apache.bcel.classfile</tt> and closely represents class - files. All of the binary components and data structures declared - in the <a - href="http://docs.oracle.com/javase/specs/">JVM - specification</a> and described in section <a - href="#2 The Java Virtual Machine">2</a> are mapped to classes. 
- - <a href="#Figure 3">Figure 3</a> shows an UML diagram of the - hierarchy of classes of the <font face="helvetica,arial">BCEL - </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also - shows a detailed diagram of the <tt>ConstantPool</tt> components. - </p> - - <p align="center"> - <a name="Figure 3"> - <img src="images/javaclass.gif"/> <br/> - Figure 3: UML diagram for the JavaClass API</a> - </p> - - <p> - The top-level data structure is <tt>JavaClass</tt>, which in most - cases is created by a <tt>ClassParser</tt> object that is capable - of parsing binary class files. A <tt>JavaClass</tt> object - basically consists of fields, methods, symbolic references to the - super class and to the implemented interfaces. - </p> - - <p> - The constant pool serves as some kind of central repository and is - thus of outstanding importance for all components. - <tt>ConstantPool</tt> objects contain an array of fixed size of - <tt>Constant</tt> entries, which may be retrieved via the - <tt>getConstant()</tt> method taking an integer index as argument. - Indexes to the constant pool may be contained in instructions as - well as in other components of a class file and in constant pool - entries themselves. - </p> - - <p> - Methods and fields contain a signature, symbolically defining - their types. Access flags like <tt>public static final</tt> occur - in several places and are encoded by an integer bit mask, e.g., - <tt>public static final</tt> matches to the Java expression - </p> - - - <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source> - - <p> - As mentioned in <a href="#2.1 Java class file format">section - 2.1</a> already, several components may contain <em>attribute</em> - objects: classes, fields, methods, and <tt>Code</tt> objects - (introduced in <a href="#2.3 Method code">section 2.3</a>). 
The - latter is an attribute itself that contains the actual byte code - array, the maximum stack size, the number of local variables, a - table of handled exceptions, and some optional debugging - information coded as <tt>LineNumberTable</tt> and - <tt>LocalVariableTable</tt> attributes. Attributes are in general - specific to some data structure, i.e., no two components share the - same kind of attribute, though this is not explicitly - forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped - with the component they belong to. - </p> - - </section> - - <section name="3.2 Class repository"> - <p> - Using the provided <tt>Repository</tt> class, reading class files into - a <tt>JavaClass</tt> object is quite simple: - </p> - - <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source> - - <p> - The repository also contains methods providing the dynamic equivalent - of the <tt>instanceof</tt> operator, and other useful routines: - </p> - - <source> - if (Repository.instanceOf(clazz, super_class)) { - ... - }</source> - - </section> - - <section name="3.2.1 Accessing class file data"> - - <p> - Information within the class file components may be accessed like - Java Beans via intuitive set/get methods. All of them also define - a <tt>toString()</tt> method so that implementing a simple class - viewer is very easy. In fact all of the examples used here have - been produced this way: - </p> - - <source> - System.out.println(clazz); - printCode(clazz.getMethods()); - ... 
- public static void printCode(Method[] methods) { - for (int i = 0; i < methods.length; i++) { - System.out.println(methods[i]); - - Code code = methods[i].getCode(); - if (code != null) // Non-abstract method - System.out.println(code); - } - } - </source> - - </section> - - <section name="3.2.2 Analyzing class data"> - <p> - Last but not least, <font face="helvetica,arial">BCEL</font> - supports the <em>Visitor</em> design pattern, so one can write - visitor objects to traverse and analyze the contents of a class - file. Included in the distribution is a class - <tt>JasminVisitor</tt> that converts class files into the <a - href="http://jasmin.sourceforge.net">Jasmin</a> - assembler language. - </p> - - </section> - - <section name="3.3 ClassGen"> - <p> - This part of the API (package <tt>org.apache.bcel.generic</tt>) - supplies an abstraction level for creating or transforming class - files dynamically. It makes the static constraints of Java class - files like the hard-coded byte code addresses "generic". The - generic constant pool, for example, is implemented by the class - <tt>ConstantPoolGen</tt> which offers methods for adding different - types of constants. Accordingly, <tt>ClassGen</tt> offers an - interface to add methods, fields, and attributes. - <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API. - </p> - - <p align="center"> - <a name="Figure 4"> - <img src="images/classgen.gif"/> - <br/> - Figure 4: UML diagram of the ClassGen API</a> - </p> - - </section> - - <section name="3.3.1 Types"> - <p> - We abstract from the concrete details of the type signature syntax - (see <a href="#2.5 Type information">2.5</a>) by introducing the - <tt>Type</tt> class, which is used, for example, by methods to - define their return and argument types. Concrete sub-classes are - <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt> - which consists of the element type and the number of - dimensions. 
For commonly used types the class offers some - predefined constants. For example, the method signature of the - <tt>main</tt> method as shown in - <a href="#2.5 Type information">section 2.5</a> is represented by: - </p> - - <source> - Type return_type = Type.VOID; - Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) }; - </source> - - <p> - <tt>Type</tt> also contains methods to convert types into textual - signatures and vice versa. The sub-classes contain implementations - of the routines and constraints specified by the Java Language - Specification. - </p> - </section> - - <section name="3.3.2 Generic fields and methods"> - <p> - Fields are represented by <tt>FieldGen</tt> objects, which may be - freely modified by the user. If they have the access rights - <tt>static final</tt>, i.e., are constants and of basic type, they - may optionally have an initializing value. - </p> - - <p> - Generic methods contain methods to add exceptions the method may - throw, local variables, and exception handlers. The latter two are - represented by user-configurable objects as well. Because - exception handlers and local variables contain references to byte - code addresses, they also take the role of an <em>instruction - targeter</em> in our terminology. Instruction targeters contain a - method <tt>updateTarget()</tt> to redirect a reference. This is - somewhat related to the Observer design pattern. Generic - (non-abstract) methods refer to <em>instruction lists</em> that - consist of instruction objects. References to byte code addresses - are implemented by handles to instruction objects. If the list is - updated the instruction targeters will be informed about it. This - is explained in more detail in the following sections. - </p> - - <p> - The maximum stack size needed by the method and the maximum number - of local variables used may be set manually or computed via the - <tt>setMaxStack()</tt> and <tt>setMaxLocals()</tt> methods - automatically. 
- </p> - - </section> - - <section name="3.3.3 Instructions"> - <p> - Modeling instructions as objects may look somewhat odd at first - sight, but in fact enables programmers to obtain a high-level view - upon control flow without handling details like concrete byte code - offsets. Instructions consist of an opcode (sometimes called - tag), their length in bytes and an offset (or index) within the - byte code. Since many instructions are immutable (stack operators, - e.g.), the <tt>InstructionConstants</tt> interface offers - shareable predefined "fly-weight" constants to use. - </p> - - <p> - Instructions are grouped via sub-classing; the type hierarchy of - instruction classes is illustrated by an (incomplete) figure in the - appendix. The most important family of instructions are the - <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to - targets somewhere within the byte code. Obviously, this makes them - candidates for playing an <tt>InstructionTargeter</tt> role, - too. Instructions are further grouped by the interfaces they - implement; there are, e.g., <tt>TypedInstruction</tt>s that are - associated with a specific type like <tt>ldc</tt>, or - <tt>ExceptionThrower</tt> instructions that may raise exceptions - when executed. - </p> - - <p> - All instructions can be traversed via <tt>accept(Visitor v)</tt> - methods, i.e., the Visitor design pattern. There is however a - special trick in these methods that allows one to merge the handling - of certain instruction groups. The <tt>accept()</tt> methods do not only - call the corresponding <tt>visit()</tt> method, but also call the - <tt>visit()</tt> methods of their respective super classes and - implemented interfaces first, i.e., the most specific - <tt>visit()</tt> call is last. Thus one can group the handling of, - say, all <tt>BranchInstruction</tt>s into one single method. - </p> - - <p> - For debugging purposes it may even make sense to "invent" your own - instructions. 
In a sophisticated code generator like the one used - as a backend of the <a href="http://barat.sourceforge.net">Barat - framework</a> for static analysis one often has to insert - temporary <tt>nop</tt> (No operation) instructions. When examining - the produced code it may be very difficult to track back where the - <tt>nop</tt> was actually inserted. One could think of a derived - <tt>nop2</tt> instruction that contains additional debugging - information. When the instruction list is dumped to byte code, the - extra data is simply dropped. - </p> - - <p> - One could also think of new byte code instructions operating on - complex numbers that are replaced by normal byte code upon - load-time or are recognized by a new JVM. - </p> - - </section> - - <section name="3.3.4 Instruction lists"> - <p> - An <em>instruction list</em> is implemented by a list of - <em>instruction handles</em> encapsulating instruction objects. - References to instructions in the list are thus not implemented by - direct pointers to instructions but by pointers to instruction - <em>handles</em>. This makes appending, inserting and deleting - areas of code very simple and also allows us to reuse immutable - instruction objects (fly-weight objects). Since we use symbolic - references, computation of concrete byte code offsets does not - need to occur until finalization, i.e., until the user has - finished the process of generating or transforming code. We will - use the term instruction handle and instruction synonymously - throughout the rest of the paper. Instruction handles may contain - additional user-defined data using the <tt>addAttribute()</tt> - method. - </p> - - <p> - <b>Appending:</b> One can append instructions or other instruction - lists anywhere to an existing list. The instructions are appended - after the given instruction handle. 
All append methods return a - new instruction handle which may then be used as the target of a - branch instruction, e.g.: - </p> - - <source> - InstructionList il = new InstructionList(); - ... - GOTO g = new GOTO(null); - il.append(g); - ... - // Use immutable fly-weight object - InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL); - g.setTarget(ih); - </source> - - <p> - <b>Inserting:</b> Instructions may be inserted anywhere into an - existing list. They are inserted before the given instruction - handle. All insert methods return a new instruction handle which - may then be used as the start address of an exception handler, for - example. - </p> - - <source> - InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP); - ... - mg.addExceptionHandler(start, end, handler, "java.io.IOException"); - </source> - - <p> - <b>Deleting:</b> Deletion of instructions is also very - straightforward; all instruction handles and the contained - instructions within a given range are removed from the instruction - list and disposed. The <tt>delete()</tt> method may however throw - a <tt>TargetLostException</tt> when there are instruction - targeters still referencing one of the deleted instructions. The - user is forced to handle such exceptions in a <tt>try-catch</tt> - clause and redirect these references elsewhere. The <em>peep - hole</em> optimizer described in the appendix gives a detailed - example for this. - </p> - - <source> - try { - il.delete(first, last); - } catch (TargetLostException e) { - for (InstructionHandle target : e.getTargets()) { - for (InstructionTargeter targeter : target.getTargeters()) { - targeter.updateTarget(target, new_target); - } - } - } - </source> - - <p> - <b>Finalizing:</b> When the instruction list is ready to be dumped - to pure byte code, all symbolic references must be mapped to real - byte code offsets. 
This is done by the <tt>getByteCode()</tt> - method which is called by default by - <tt>MethodGen.getMethod()</tt>. Afterwards you should call - <tt>dispose()</tt> so that the instruction handles can be reused - internally. This helps to improve memory usage. - </p> - - <source> - InstructionList il = new InstructionList(); - - ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", - "<generated>", ACC_PUBLIC | ACC_SUPER, - null); - MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, - Type.VOID, new Type[] { - new ArrayType(Type.STRING, 1) - }, new String[] { "argv" }, - "main", "HelloWorld", il, cp); - ... - cg.addMethod(mg.getMethod()); - il.dispose(); // Reuse instruction handles of list - </source> - - </section> - - <section name="3.3.5 Code example revisited"> - <p> - Using instruction lists gives us a generic view upon the code: In - <a href="#Figure 5">Figure 5</a> we again present the code chunk - of the <tt>readInt()</tt> method of the factorial example in section - <a href="#2.6 Code example">2.6</a>: The local variables - <tt>n</tt> and <tt>e1</tt> both hold two references to - instructions, defining their scope. There are two <tt>goto</tt>s - branching to the <tt>iload</tt> at the end of the method. One of - the exception handlers is displayed, too: it references the start - and the end of the <tt>try</tt> block and also the exception - handler code. - </p> - - <p align="center"> - <a name="Figure 5"> - <img src="images/il.gif"/> - <br/> - Figure 5: Instruction list for <tt>readInt()</tt> method</a> - </p> - - </section> - - <section name="3.3.6 Instruction factories"> - <p> - To simplify the creation of certain instructions the user can use - the supplied <tt>InstructionFactory</tt> class which offers a lot - of useful methods to create instructions from - scratch. 
Alternatively, one can also use <em>compound - instructions</em>: When producing byte code, some patterns - typically occur very frequently, for instance the compilation of - arithmetic or comparison expressions. You certainly do not want - to rewrite the code that translates such expressions into byte - code in every place they may appear. In order to support this, the - <font face="helvetica,arial">BCEL</font> API includes a <em>compound - instruction</em> (an interface with a single - <tt>getInstructionList()</tt> method). Implementations of this interface - may be used in any place where normal instructions would occur, - particularly in append operations. - </p> - - <p> - <b>Example: Pushing constants</b> Pushing constants onto the - operand stack may be coded in different ways. As explained in <a - href="#2.2 Byte code instruction set">section 2.2</a> there are - some "short-cut" instructions that can be used to make the - produced byte code more compact. The smallest instruction to push - a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other - possibilities are <tt>bipush</tt> (can be used to push values - between -128 and 127), <tt>sipush</tt> (between -32768 and 32767), - or <tt>ldc</tt> (load constant from constant pool). - </p> - - <p> - Instead of repeatedly selecting the most compact instruction in, - say, a switch, one can use the compound <tt>PUSH</tt> instruction - whenever pushing a constant number or string. It will produce the - appropriate byte code instruction and insert entries into the - constant pool if necessary. - </p> - - <source> - InstructionFactory f = new InstructionFactory(class_gen); - InstructionList il = new InstructionList(); - ... - il.append(new PUSH(cp, "Hello, world")); - il.append(new PUSH(cp, 4711)); - ... - il.append(f.createPrintln("Hello World")); - ... 
- il.append(f.createReturn(type)); - </source> - - </section> - - <section name="3.3.7 Code patterns using regular expressions"> - <p> - When transforming code, for instance during optimization or when - inserting analysis method calls, one typically searches for - certain patterns of code to perform the transformation at. To - simplify handling such situations <font - face="helvetica,arial">BCEL </font>introduces a special feature: - One can search for given code patterns within an instruction list - using <em>regular expressions</em>. In such expressions, - instructions are represented by their opcode names, e.g., - <tt>LDC</tt>, one may also use their respective super classes, e.g., - "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>, - <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus, - the expression - </p> - - <source>"NOP+(ILOAD|ALOAD)*"</source> - - <p> - represents a piece of code consisting of at least one <tt>NOP</tt> - followed by a possibly empty sequence of <tt>ILOAD</tt> and - <tt>ALOAD</tt> instructions. - </p> - - <p> - The <tt>search()</tt> method of class - <tt>org.apache.bcel.util.InstructionFinder</tt> gets a regular - expression and a starting point as arguments and returns an - iterator describing the area of matched instructions. Additional - constraints to the matching area of instructions, which can not be - implemented via regular expressions, may be expressed via <em>code - constraint</em> objects. - </p> - - </section> - - <section name="3.3.8 Example: Optimizing boolean expressions"> - <p> - In Java, boolean values are mapped to 1 and to 0, - respectively. Thus, the simplest way to evaluate boolean - expressions is to push a 1 or a 0 onto the operand stack depending - on the truth value of the expression. But this way, the - subsequent combination of boolean expressions (with - <tt>&&</tt>, e.g) yields long chunks of code that push - lots of 1s and 0s onto the stack. 
- </p> - - <p> - When the code has been finalized these chunks can be optimized - with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt> - (e.g. the comparison of two integers: <tt>if_icmpeq</tt>) that - either produces a 1 or a 0 on the stack and is followed by an - <tt>ifne</tt> instruction (branch if stack value is not 0) may be - replaced by the <tt>IfInstruction</tt> with its branch target - replaced by the target of the <tt>ifne</tt> instruction: - </p> - - <source> - CodeConstraint constraint = new CodeConstraint() { - public boolean checkCode(InstructionHandle[] match) { - IfInstruction if1 = (IfInstruction) match[0].getInstruction(); - GOTO g = (GOTO) match[2].getInstruction(); - return (if1.getTarget() == match[3]) && - (g.getTarget() == match[4]); - } - }; - - InstructionFinder f = new InstructionFinder(il); - String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)"; - - for (Iterator e = f.search(pat, constraint); e.hasNext(); ) { - InstructionHandle[] match = (InstructionHandle[]) e.next(); - ... - match[0].setTarget(match[5].getTarget()); // Update target - ... - try { - il.delete(match[1], match[5]); - } catch (TargetLostException ex) { - ... - } - } - </source> - - <p> - The applied code constraint object ensures that the matched code - really corresponds to the targeted expression pattern. Subsequent - application of this algorithm removes all unnecessary stack - operations and branch instructions from the byte code. If any of - the deleted instructions is still referenced by an - <tt>InstructionTargeter</tt> object, the reference has to be - updated in the <tt>catch</tt>-clause. - </p> - - <p> - <b>Example application:</b> - The expression: - </p> - - <source> - if ((a == null) || (i < 2)) - System.out.println("Ooops"); - </source> - - <p> - can be mapped to both of the chunks of byte code shown in <a - href="#Figure 6">figure 6</a>. 
The left column represents the - unoptimized code while the right column displays the same code - after the peep hole algorithm has been applied: - </p> - - <p align="center"><a name="Figure 6"> - <table> - <tr> - <td valign="top"><pre> -5: aload_0 -6: ifnull #13 -9: iconst_0 -10: goto #14 -13: iconst_1 -14: nop -15: ifne #36 -18: iload_1 -19: iconst_2 -20: if_icmplt #27 -23: iconst_0 -24: goto #28 -27: iconst_1 -28: nop -29: ifne #36 -32: iconst_0 -33: goto #37 -36: iconst_1 -37: nop -38: ifeq #52 -41: getstatic System.out -44: ldc "Ooops" -46: invokevirtual println -52: return - </pre></td> - <td valign="top"><pre> -10: aload_0 -11: ifnull #19 -14: iload_1 -15: iconst_2 -16: if_icmpge #27 -19: getstatic System.out -22: ldc "Ooops" -24: invokevirtual println -27: return - </pre></td> - </tr> - </table> - </a> - </p> - - </section> - - <section name="4 Application areas"> - <p> - There are many possible application areas for <font - face="helvetica,arial">BCEL</font> ranging from class - browsers, profilers, byte code optimizers, and compilers to - sophisticated run-time analysis tools and extensions to the Java - language. - </p> - - <p> - Compilers like the <a - href="http://barat.sourceforge.net">Barat</a> compiler use <font - face="helvetica,arial">BCEL</font> to implement a byte code - generating back end. Other possible application areas are the - static analysis of byte code or examining the run-time behavior of - classes by inserting calls to profiling methods into the - code. Further examples are extending Java with Eiffel-like - assertions, automated delegation, or with the concepts of <a - href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>.<br/> A - list of projects using <font face="helvetica,arial">BCEL</font> can - be found <a href="projects.html">here</a>. 
- </p> - - </section> - - <section name="4.1 Class loaders"> - <p> - Class loaders are responsible for loading class files from the - file system or other resources and passing the byte code to the - Virtual Machine. A custom <tt>ClassLoader</tt> object may be used - to intercept the standard procedure of loading a class, i.e., the - system class loader, and perform some transformations before - actually passing the byte code to the JVM. - </p> - - <p> - A possible scenario is described in <a href="#Figure 7">figure - 7</a>: - During run-time the Virtual Machine requests a custom class loader - to load a given class. But before the JVM actually sees the byte - code, the class loader makes a "side-step" and performs some - transformation to the class. To make sure that the modified byte - code is still valid and does not violate any of the JVM's rules it - is checked by the verifier before the JVM finally executes it. - </p> - - <p align="center"> - <a name="Figure 7"> - <img src="images/classloader.gif"/> - <br/> - Figure 7: Class loaders - </a> - </p> - - <p> - Using class loaders is an elegant way of extending the Java - Virtual Machine with new features without actually modifying it. - This concept enables developers to use <em>load-time - reflection</em> to implement their ideas as opposed to the static - reflection supported by the <a - href="http://java.sun.com/j2se/1.3/docs/guide/reflection/index.html">Java - Reflection API</a>. Load-time transformations supply the user with - a new level of abstraction. He is not strictly tied to the static - constraints of the original authors of the classes but may - customize the applications with third-party code in order to - benefit from new features. Such transformations may be executed on - demand and neither interfere with other users, nor alter the - original byte code. 
In fact, class loaders may even create classes - <em>ad hoc</em> without loading a file at all.<br/> <font - face="helvetica,arial">BCEL</font> has already builtin support for - dynamically creating classes, an example is the ProxyCreator class. - </p> - - </section> - - <section name="4.1.1 Example: Poor Man's Genericity"> - <p> - The former "Poor Man's Genericity" project that extended Java with - parameterized classes, for example, used <font - face="helvetica,arial">BCEL</font> in two places to generate - instances of parameterized classes: During compile-time (with the - standard <tt>javac</tt> with some slightly changed classes) and at - run-time using a custom class loader. The compiler puts some - additional type information into class files (attributes) which is - evaluated at load-time by the class loader. The class loader - performs some transformations on the loaded class and passes them - to the VM. The following algorithm illustrates how the load method - of the class loader fulfills the request for a parameterized - class, e.g., <tt>Stack<String></tt> - </p> - - <p> - <ol type="1"> - <li> Search for class <tt>Stack</tt>, load it, and check for a - certain class attribute containing additional type - information. I.e. the attribute defines the "real" name of the - class, i.e., <tt>Stack<A></tt>.</li> - - <li>Replace all occurrences and references to the formal type - <tt>A</tt> with references to the actual type <tt>String</tt>. For - example the method - </li> - - <source> - void push(A obj) { ... } - </source> - - <p> - becomes - </p> - - <source> - void push(String obj) { ... } - </source> - - <li> Return the resulting class to the Virtual Machine.</li> - </ol> - </p> - - </section> - - <section name="A Appendix"/> - - <section name="HelloWorldBuilder"> - <p> - The following program reads a name from the standard input and - prints a friendly "Hello". 
Since the <tt>readLine()</tt> method may - throw an <tt>IOException</tt> it is enclosed by a <tt>try-catch</tt> - clause. - </p> - - <source> - import java.io.*; - - public class HelloWorld { - public static void main(String[] argv) { - BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); - String name = null; - - try { - System.out.print("Please enter your name> "); - name = in.readLine(); - } catch (IOException e) { - return; - } - - System.out.println("Hello, " + name); - } - } - </source> - - <p> - We will sketch here how the above Java class can be created from - scratch using the <font face="helvetica,arial">BCEL</font> API. For - ease of reading we will use textual signatures and not create them - dynamically. For example, the signature - </p> - - <source>"(Ljava/lang/String;)Ljava/lang/StringBuffer;"</source> - - <p> - can actually be created with - </p> - - <source>Type.getMethodSignature(Type.STRINGBUFFER, new Type[] { Type.STRING });</source> - - <p><b>Initialization:</b> - First we create an empty class and an instruction list: - </p> - - <source> - ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", - "<generated>", ACC_PUBLIC | ACC_SUPER, null); - ConstantPoolGen cp = cg.getConstantPool(); // cg creates constant pool - InstructionList il = new InstructionList(); - </source> - - <p> -We then create the main method, supplying the method's name and the -symbolic type signature encoded with <tt>Type</tt> objects. 
- </p> - - <source> - MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, // access flags - Type.VOID, // return type - new Type[] { // argument types - new ArrayType(Type.STRING, 1) }, - new String[] { "argv" }, // arg names - "main", "HelloWorld", // method, class - il, cp); - InstructionFactory factory = new InstructionFactory(cg); - </source> - - <p> - We now define some often used types: - </p> - - <source> - ObjectType i_stream = new ObjectType("java.io.InputStream"); - ObjectType p_stream = new ObjectType("java.io.PrintStream"); - </source> - - <p><b>Create variables <tt>in</tt> and <tt>name</tt>:</b> We call - the constructors, i.e., execute - <tt>BufferedReader(InputStreamReader(System.in))</tt>. The reference - to the <tt>BufferedReader</tt> object stays on top of the stack and - is stored in the newly allocated <tt>in</tt> variable. - </p> - - <source> - il.append(factory.createNew("java.io.BufferedReader")); - il.append(InstructionConstants.DUP); // Use predefined constant - il.append(factory.createNew("java.io.InputStreamReader")); - il.append(InstructionConstants.DUP); - il.append(factory.createFieldAccess("java.lang.System", "in", i_stream, Constants.GETSTATIC)); - il.append(factory.createInvoke("java.io.InputStreamReader", "<init>", - Type.VOID, new Type[] { i_stream }, - Constants.INVOKESPECIAL)); - il.append(factory.createInvoke("java.io.BufferedReader", "<init>", Type.VOID, - new Type[] {new ObjectType("java.io.Reader")}, - Constants.INVOKESPECIAL)); - - LocalVariableGen lg = mg.addLocalVariable("in", - new ObjectType("java.io.BufferedReader"), null, null); - int in = lg.getIndex(); - lg.setStart(il.append(new ASTORE(in))); // "i" valid from here - </source> - - <p> - Create local variable <tt>name</tt> and initialize it to <tt>null</tt>. 
- </p> - - <source> - lg = mg.addLocalVariable("name", Type.STRING, null, null); - int name = lg.getIndex(); - il.append(InstructionConstants.ACONST_NULL); - lg.setStart(il.append(new ASTORE(name))); // "name" valid from here - </source> - - <p><b>Create try-catch block:</b> We remember the start of the - block, read a line from the standard input and store it into the - variable <tt>name</tt>. - </p> - - <source> - InstructionHandle try_start = - il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, Constants.GETSTATIC)); - - il.append(new PUSH(cp, "Please enter your name> ")); - il.append(factory.createInvoke("java.io.PrintStream", "print", Type.VOID, - new Type[] { Type.STRING }, - Constants.INVOKEVIRTUAL)); - il.append(new ALOAD(in)); - il.append(factory.createInvoke("java.io.BufferedReader", "readLine", - Type.STRING, Type.NO_ARGS, - Constants.INVOKEVIRTUAL)); - il.append(new ASTORE(name)); - </source> - - <p> - Upon normal execution we jump behind exception handler, the target - address is not known yet. - </p> - - <source> - GOTO g = new GOTO(null); - InstructionHandle try_end = il.append(g); - </source> - - <p> - We add the exception handler which simply returns from the method. - </p> - - <source> - InstructionHandle handler = il.append(InstructionConstants.RETURN); - mg.addExceptionHandler(try_start, try_end, handler, "java.io.IOException"); - </source> - - <p> - "Normal" code continues, now we can set the branch target of the <tt>GOTO</tt>. - </p> - - <source> - InstructionHandle ih = - il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, Constants.GETSTATIC)); - g.setTarget(ih); - </source> - - <p><b>Printing "Hello":</b> - String concatenation compiles to <tt>StringBuffer</tt> operations. 
- </p> - - <source> - il.append(factory.createNew(Type.STRINGBUFFER)); - il.append(InstructionConstants.DUP); - il.append(new PUSH(cp, "Hello, ")); - il.append(factory.createInvoke("java.lang.StringBuffer", "<init>", - Type.VOID, new Type[] { Type.STRING }, - Constants.INVOKESPECIAL)); - il.append(new ALOAD(name)); - il.append(factory.createInvoke("java.lang.StringBuffer", "append", - Type.STRINGBUFFER, new Type[] { Type.STRING }, - Constants.INVOKEVIRTUAL)); - il.append(factory.createInvoke("java.lang.StringBuffer", "toString", - Type.STRING, Type.NO_ARGS, - Constants.INVOKEVIRTUAL)); - - il.append(factory.createInvoke("java.io.PrintStream", "println", - Type.VOID, new Type[] { Type.STRING }, - Constants.INVOKEVIRTUAL)); - il.append(InstructionConstants.RETURN); - </source> - - - <p><b>Finalization:</b> Finally, we have to set the stack size, - which normally would have to be computed on the fly and add a - default constructor method to the class, which is empty in this - case. - </p> - - <source> - mg.setMaxStack(); - cg.addMethod(mg.getMethod()); - il.dispose(); // Allow instruction handles to be reused - cg.addEmptyConstructor(ACC_PUBLIC); - </source> - - <p> - Last but not least we dump the <tt>JavaClass</tt> object to a file. - </p> - - <source> - try { - cg.getJavaClass().dump("HelloWorld.class"); - } catch (IOException e) { - System.err.println(e); - } - </source> - - </section> - - <section name="Peephole optimizer"> - <p> - This class implements a simple peephole optimizer that removes any NOP - instructions from the given class. - </p> - - <source> -import java.io.*; - -import java.util.Iterator; -import org.apache.bcel.classfile.*; -import org.apache.bcel.generic.*; -import org.apache.bcel.Repository; -import org.apache.bcel.util.InstructionFinder; - -public class Peephole { - - public static void main(String[] argv) { - try { - // Load the class from CLASSPATH. 
- JavaClass clazz = Repository.lookupClass(argv[0]); - Method[] methods = clazz.getMethods(); - ConstantPoolGen cp = new ConstantPoolGen(clazz.getConstantPool()); - - for (int i = 0; i < methods.length; i++) { - if (!(methods[i].isAbstract() || methods[i].isNative())) { - MethodGen mg = new MethodGen(methods[i], clazz.getClassName(), cp); - Method stripped = removeNOPs(mg); - - if (stripped != null) // Any NOPs stripped? - methods[i] = stripped; // Overwrite with stripped method - } - } - - // Dump the class to "class name"_.class - clazz.setConstantPool(cp.getFinalConstantPool()); - clazz.dump(clazz.getClassName() + "_.class"); - } catch (Exception e) { - e.printStackTrace(); - } - } - - private static Method removeNOPs(MethodGen mg) { - InstructionList il = mg.getInstructionList(); - InstructionFinder f = new InstructionFinder(il); - String pat = "NOP+"; // Find at least one NOP - InstructionHandle next = null; - int count = 0; - - for (Iterator iter = f.search(pat); iter.hasNext();) { - InstructionHandle[] match = (InstructionHandle[]) iter.next(); - InstructionHandle first = match[0]; - InstructionHandle last = match[match.length - 1]; - - // Some nasty Java compilers may add NOP at end of method. - if ((next = last.getNext()) == null) { - break; - } - - count += match.length; - - /** - * Delete NOPs and redirect any references to them to the following (non-nop) instruction. 
- */ - try { - il.delete(first, last); - } catch (TargetLostException e) { - for (InstructionHandle target : e.getTargets()) { - for (InstructionTargeter targeter : target.getTargeters()) { - targeter.updateTarget(target, next); - } - } - } - } - - Method m = null; - - if (count > 0) { - System.out.println("Removed " + count + " NOP instructions from method " + mg.getName()); - m = mg.getMethod(); - } - - il.dispose(); // Reuse instruction handles - return m; - } -} - </source> - </section> - - <section name="BCELifier"> - <p> - If you want to learn how certain things are generated using BCEL you - can do the following: Write your program with the needed features in - Java and compile it as usual. Then use <tt>BCELifier</tt> to create - a class that creates that very input class using BCEL.<br/> - (Think about this sentence for a while, or just try it ...) - </p> - </section> - - <section name="Constant pool UML diagram"> - - <p align="center"> - <a name="Figure 8"> - <img src="images/constantpool.gif"/> - <br/> - Figure 8: UML diagram for constant pool classes - </a> - </p> - </section> -</body> -</document> diff --git a/src/site/xdoc/manual/appendix.xml b/src/site/xdoc/manual/appendix.xml new file mode 100644 index 00000000..f242005e --- /dev/null +++ b/src/site/xdoc/manual/appendix.xml @@ -0,0 +1,357 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + <properties> + <title>Appendix</title> + </properties> + + <body> + <section name="Appendix"> + + <subsection name="HelloWorldBuilder"> + <p> + The following program reads a name from the standard input and + prints a friendly "Hello". Since the <tt>readLine()</tt> method may + throw an <tt>IOException</tt> it is enclosed by a <tt>try-catch</tt> + clause. + </p> + + <source> +import java.io.*; + +public class HelloWorld { + public static void main(String[] argv) { + BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); + String name = null; + + try { + System.out.print("Please enter your name> "); + name = in.readLine(); + } catch (IOException e) { + return; + } + + System.out.println("Hello, " + name); + } +} + </source> + + <p> + We will sketch here how the above Java class can be created from the + scratch using the <font face="helvetica,arial">BCEL</font> API. For + ease of reading we will use textual signatures and not create them + dynamically. 
For example, the signature + </p> + + <source>"(Ljava/lang/String;)Ljava/lang/StringBuffer;"</source> + + <p> + can actually be created with + </p> + + <source>Type.getMethodSignature(Type.STRINGBUFFER, new Type[] { Type.STRING });</source> + + <p><b>Initialization:</b> + First we create an empty class and an instruction list: + </p> + + <source> +ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", + "<generated>", ACC_PUBLIC | ACC_SUPER, null); +ConstantPoolGen cp = cg.getConstantPool(); // cg creates constant pool +InstructionList il = new InstructionList(); + </source> + + <p> + We then create the main method, supplying the method's name and the + symbolic type signature encoded with <tt>Type</tt> objects. + </p> + + <source> +MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, // access flags + Type.VOID, // return type + new Type[] { // argument types + new ArrayType(Type.STRING, 1) }, + new String[] { "argv" }, // arg names + "main", "HelloWorld", // method, class + il, cp); +InstructionFactory factory = new InstructionFactory(cg); + </source> + + <p> + We now define some often used types: + </p> + + <source> +ObjectType i_stream = new ObjectType("java.io.InputStream"); +ObjectType p_stream = new ObjectType("java.io.PrintStream"); + </source> + + <p><b>Create variables <tt>in</tt> and <tt>name</tt>:</b> We call + the constructors, i.e., execute + <tt>BufferedReader(InputStreamReader(System.in))</tt>. The reference + to the <tt>BufferedReader</tt> object stays on top of the stack and + is stored in the newly allocated <tt>in</tt> variable. 
+ </p> + + <source> +il.append(factory.createNew("java.io.BufferedReader")); +il.append(InstructionConstants.DUP); // Use predefined constant +il.append(factory.createNew("java.io.InputStreamReader")); +il.append(InstructionConstants.DUP); +il.append(factory.createFieldAccess("java.lang.System", "in", i_stream, Constants.GETSTATIC)); +il.append(factory.createInvoke("java.io.InputStreamReader", "<init>", + Type.VOID, new Type[] { i_stream }, + Constants.INVOKESPECIAL)); +il.append(factory.createInvoke("java.io.BufferedReader", "<init>", Type.VOID, + new Type[] {new ObjectType("java.io.Reader")}, + Constants.INVOKESPECIAL)); + +LocalVariableGen lg = mg.addLocalVariable("in", + new ObjectType("java.io.BufferedReader"), null, null); +int in = lg.getIndex(); +lg.setStart(il.append(new ASTORE(in))); // "in" valid from here + </source> + + <p> + Create local variable <tt>name</tt> and initialize it to <tt>null</tt>. + </p> + + <source> +lg = mg.addLocalVariable("name", Type.STRING, null, null); +int name = lg.getIndex(); +il.append(InstructionConstants.ACONST_NULL); +lg.setStart(il.append(new ASTORE(name))); // "name" valid from here + </source> + + <p><b>Create try-catch block:</b> We remember the start of the + block, read a line from the standard input and store it into the + variable <tt>name</tt>. + </p> + + <source> +InstructionHandle try_start = + il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, Constants.GETSTATIC)); + +il.append(new PUSH(cp, "Please enter your name> ")); +il.append(factory.createInvoke("java.io.PrintStream", "print", Type.VOID, + new Type[] { Type.STRING }, + Constants.INVOKEVIRTUAL)); +il.append(new ALOAD(in)); +il.append(factory.createInvoke("java.io.BufferedReader", "readLine", + Type.STRING, Type.NO_ARGS, + Constants.INVOKEVIRTUAL)); +il.append(new ASTORE(name)); + </source> + + <p> + Upon normal execution we jump behind the exception handler; the target + address is not known yet. 
+ </p> + + <source> +GOTO g = new GOTO(null); +InstructionHandle try_end = il.append(g); + </source> + + <p> + We add the exception handler, which simply returns from the method. + </p> + + <source> +InstructionHandle handler = il.append(InstructionConstants.RETURN); +mg.addExceptionHandler(try_start, try_end, handler, new ObjectType("java.io.IOException")); + </source> + + <p> + "Normal" code continues; now we can set the branch target of the <tt>GOTO</tt>. + </p> + + <source> +InstructionHandle ih = + il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, Constants.GETSTATIC)); +g.setTarget(ih); + </source> + + <p><b>Printing "Hello":</b> +String concatenation compiles to <tt>StringBuffer</tt> operations. + </p> + + <source> +il.append(factory.createNew(Type.STRINGBUFFER)); +il.append(InstructionConstants.DUP); +il.append(new PUSH(cp, "Hello, ")); +il.append(factory.createInvoke("java.lang.StringBuffer", "<init>", + Type.VOID, new Type[] { Type.STRING }, + Constants.INVOKESPECIAL)); +il.append(new ALOAD(name)); +il.append(factory.createInvoke("java.lang.StringBuffer", "append", + Type.STRINGBUFFER, new Type[] { Type.STRING }, + Constants.INVOKEVIRTUAL)); +il.append(factory.createInvoke("java.lang.StringBuffer", "toString", + Type.STRING, Type.NO_ARGS, + Constants.INVOKEVIRTUAL)); + +il.append(factory.createInvoke("java.io.PrintStream", "println", + Type.VOID, new Type[] { Type.STRING }, + Constants.INVOKEVIRTUAL)); +il.append(InstructionConstants.RETURN); + </source> + + + <p><b>Finalization:</b> Finally, we have to set the stack size, + which normally would have to be computed on the fly, and add a + default constructor method to the class, which is empty in this + case. + </p> + + <source> +mg.setMaxStack(); +cg.addMethod(mg.getMethod()); +il.dispose(); // Allow instruction handles to be reused +cg.addEmptyConstructor(ACC_PUBLIC); + </source> + + <p> + Last but not least we dump the <tt>JavaClass</tt> object to a file. 
+ </p> + + <source> +try { + cg.getJavaClass().dump("HelloWorld.class"); +} catch (IOException e) { + System.err.println(e); +} + </source> + + </subsection> + + <subsection name="Peephole optimizer"> + <p> + This class implements a simple peephole optimizer that removes any NOP + instructions from the given class. + </p> + + <source> +import java.io.*; + +import java.util.Iterator; +import org.apache.bcel.classfile.*; +import org.apache.bcel.generic.*; +import org.apache.bcel.Repository; +import org.apache.bcel.util.InstructionFinder; + +public class Peephole { + + public static void main(String[] argv) { + try { + // Load the class from CLASSPATH. + JavaClass clazz = Repository.lookupClass(argv[0]); + Method[] methods = clazz.getMethods(); + ConstantPoolGen cp = new ConstantPoolGen(clazz.getConstantPool()); + + for (int i = 0; i < methods.length; i++) { + if (!(methods[i].isAbstract() || methods[i].isNative())) { + MethodGen mg = new MethodGen(methods[i], clazz.getClassName(), cp); + Method stripped = removeNOPs(mg); + + if (stripped != null) // Any NOPs stripped? + methods[i] = stripped; // Overwrite with stripped method + } + } + + // Dump the class to "class name"_.class + clazz.setConstantPool(cp.getFinalConstantPool()); + clazz.dump(clazz.getClassName() + "_.class"); + } catch (Exception e) { + e.printStackTrace(); + } + } + + private static Method removeNOPs(MethodGen mg) { + InstructionList il = mg.getInstructionList(); + InstructionFinder f = new InstructionFinder(il); + String pat = "NOP+"; // Find at least one NOP + InstructionHandle next = null; + int count = 0; + + for (Iterator iter = f.search(pat); iter.hasNext();) { + InstructionHandle[] match = (InstructionHandle[]) iter.next(); + InstructionHandle first = match[0]; + InstructionHandle last = match[match.length - 1]; + + // Some nasty Java compilers may add NOP at end of method. 
+ if ((next = last.getNext()) == null) { + break; + } + + count += match.length; + + /** + * Delete NOPs and redirect any references to them to the following (non-nop) instruction. + */ + try { + il.delete(first, last); + } catch (TargetLostException e) { + for (InstructionHandle target : e.getTargets()) { + for (InstructionTargeter targeter : target.getTargeters()) { + targeter.updateTarget(target, next); + } + } + } + } + + Method m = null; + + if (count > 0) { + System.out.println("Removed " + count + " NOP instructions from method " + mg.getName()); + m = mg.getMethod(); + } + + il.dispose(); // Reuse instruction handles + return m; + } +} + </source> + </subsection> + + <subsection name="BCELifier"> + <p> + If you want to learn how certain things are generated using BCEL you + can do the following: Write your program with the needed features in + Java and compile it as usual. Then use <tt>BCELifier</tt> to create + a class that creates that very input class using BCEL.<br/> + (Think about this sentence for a while, or just try it ...) + </p> + </subsection> + + <subsection name="Constant pool UML diagram"> + + <p align="center"> + <a name="Figure 8"> + <img src="../images/constantpool.gif"/> + <br/> + Figure 8: UML diagram for constant pool classes + </a> + </p> + </subsection> + </section> + </body> +</document>
\ No newline at end of file diff --git a/src/site/xdoc/manual/application-areas.xml b/src/site/xdoc/manual/application-areas.xml new file mode 100644 index 00000000..2f96bca6 --- /dev/null +++ b/src/site/xdoc/manual/application-areas.xml @@ -0,0 +1,146 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + <properties> + <title>Application areas</title> + </properties> + + <body> + <section name="Application areas"> + <p> + There are many possible application areas for <font + face="helvetica,arial">BCEL</font> ranging from class + browsers, profilers, byte code optimizers, and compilers to + sophisticated run-time analysis tools and extensions to the Java + language. + </p> + + <p> + Compilers like the <a + href="http://barat.sourceforge.net">Barat</a> compiler use <font + face="helvetica,arial">BCEL</font> to implement a byte code + generating back end. Other possible application areas are the + static analysis of byte code or examining the run-time behavior of + classes by inserting calls to profiling methods into the + code. 
Further examples are extending Java with Eiffel-like + assertions, automated delegation, or with the concepts of <a + href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>.<br/> A + list of projects using <font face="helvetica,arial">BCEL</font> can + be found <a href="../projects.html">here</a>. + </p> + + <subsection name="Class loaders"> + <p> + Class loaders are responsible for loading class files from the + file system or other resources and passing the byte code to the + Virtual Machine. A custom <tt>ClassLoader</tt> object may be used + to intercept the standard procedure of loading a class, i.e., the + system class loader, and perform some transformations before + actually passing the byte code to the JVM. + </p> + + <p> + A possible scenario is described in <a href="#Figure 7">figure + 7</a>: + During run-time the Virtual Machine requests a custom class loader + to load a given class. But before the JVM actually sees the byte + code, the class loader makes a "side-step" and performs some + transformation on the class. To make sure that the modified byte + code is still valid and does not violate any of the JVM's rules, it + is checked by the verifier before the JVM finally executes it. + </p> + + <p align="center"> + <a name="Figure 7"> + <img src="../images/classloader.gif"/> + <br/> + Figure 7: Class loaders + </a> + </p> + + <p> + Using class loaders is an elegant way of extending the Java + Virtual Machine with new features without actually modifying it. + This concept enables developers to use <em>load-time + reflection</em> to implement their ideas as opposed to the static + reflection supported by the <a + href="http://java.sun.com/j2se/1.3/docs/guide/reflection/index.html">Java + Reflection API</a>. Load-time transformations supply the user with + a new level of abstraction. 
The user is not strictly tied to the static + constraints of the original authors of the classes but may + customize the applications with third-party code in order to + benefit from new features. Such transformations may be executed on + demand and neither interfere with other users, nor alter the + original byte code. In fact, class loaders may even create classes + <em>ad hoc</em> without loading a file at all.<br/> <font + face="helvetica,arial">BCEL</font> already has built-in support for + dynamically creating classes; an example is the ProxyCreator class. + </p> + + </subsection> + + <subsection name="Example: Poor Man's Genericity"> + <p> + The former "Poor Man's Genericity" project that extended Java with + parameterized classes, for example, used <font + face="helvetica,arial">BCEL</font> in two places to generate + instances of parameterized classes: During compile-time (with the + standard <tt>javac</tt> with some slightly changed classes) and at + run-time using a custom class loader. The compiler puts some + additional type information into class files (attributes) which is + evaluated at load-time by the class loader. The class loader + performs some transformations on the loaded class and passes it + to the VM. The following algorithm illustrates how the load method + of the class loader fulfills the request for a parameterized + class, e.g., <tt>Stack<String></tt>: + </p> + + <p> + <ol type="1"> + <li> Search for class <tt>Stack</tt>, load it, and check for a + certain class attribute containing additional type + information. That is, the attribute defines the "real" name of the + class, e.g., <tt>Stack<A></tt>.</li> + + <li>Replace all occurrences of and references to the formal type + <tt>A</tt> with references to the actual type <tt>String</tt>. For + example the method + </li> + + <source> + void push(A obj) { ... } + </source> + + <p> + becomes + </p> + + <source> + void push(String obj) { ... 
} + </source> + + <li> Return the resulting class to the Virtual Machine.</li> + </ol> + </p> + + </subsection> + </section> + </body> +</document>
\ No newline at end of file diff --git a/src/site/xdoc/manual/bcel-api.xml b/src/site/xdoc/manual/bcel-api.xml new file mode 100644 index 00000000..8417f1d1 --- /dev/null +++ b/src/site/xdoc/manual/bcel-api.xml @@ -0,0 +1,645 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + <properties> + <title>The BCEL API</title> + </properties> + + <body> + <section name="The BCEL API"> + <p> + The <font face="helvetica,arial">BCEL</font> API abstracts from + the concrete circumstances of the Java Virtual Machine and how to + read and write binary Java class files. The API mainly consists + of three parts: + </p> + + <p> + + <ol type="1"> + <li> A package that contains classes that describe "static" + constraints of class files, i.e., it reflects the class file format and + is not intended for byte code modifications. The classes may be + used to read and write class files from or to a file. This is + useful especially for analyzing Java classes without having the + source files at hand. The main data structure is called + <tt>JavaClass</tt>, which contains methods, fields, etc.</li> + + <li> A package to dynamically generate or modify + <tt>JavaClass</tt> or <tt>Method</tt> objects. 
It may be used to + insert analysis code, to strip unnecessary information from class + files, or to implement the code generator back-end of a Java + compiler.</li> + + <li> Various code examples and utilities like a class file viewer, + a tool to convert class files into HTML, and a converter from + class files to the <a + href="http://jasmin.sourceforge.net">Jasmin</a> assembly + language.</li> + </ol> + </p> + + <subsection name="JavaClass"> + <p> + The "static" component of the <font + face="helvetica,arial">BCEL</font> API resides in the package + <tt>org.apache.bcel.classfile</tt> and closely represents class + files. All of the binary components and data structures declared + in the <a + href="http://docs.oracle.com/javase/specs/">JVM + specification</a> and described in section <a + href="#2 The Java Virtual Machine">2</a> are mapped to classes. + + <a href="#Figure 3">Figure 3</a> shows a UML diagram of the + hierarchy of classes of the <font face="helvetica,arial">BCEL + </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also + shows a detailed diagram of the <tt>ConstantPool</tt> components. + </p> + + <p align="center"> + <a name="Figure 3"> + <img src="../images/javaclass.gif"/> <br/> + Figure 3: UML diagram for the JavaClass API</a> + </p> + + <p> + The top-level data structure is <tt>JavaClass</tt>, which in most + cases is created by a <tt>ClassParser</tt> object that is capable + of parsing binary class files. A <tt>JavaClass</tt> object + basically consists of fields, methods, and symbolic references to the + super class and to the implemented interfaces. + </p> + + <p> + The constant pool serves as some kind of central repository and is + thus of outstanding importance for all components. + <tt>ConstantPool</tt> objects contain a fixed-size array of + <tt>Constant</tt> entries, which may be retrieved via the + <tt>getConstant()</tt> method taking an integer index as argument. 
+ Indexes to the constant pool may be contained in instructions as + well as in other components of a class file and in constant pool + entries themselves. + </p> + + <p> + Methods and fields contain a signature, symbolically defining + their types. Access flags like <tt>public static final</tt> occur + in several places and are encoded by an integer bit mask, e.g., + <tt>public static final</tt> matches to the Java expression + </p> + + + <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source> + + <p> + As mentioned in <a href="jvm.html#Java_class_file_format">section + 2.1</a> already, several components may contain <em>attribute</em> + objects: classes, fields, methods, and <tt>Code</tt> objects + (introduced in <a href="jvm.html#Method_code">section 2.3</a>). The + latter is an attribute itself that contains the actual byte code + array, the maximum stack size, the number of local variables, a + table of handled exceptions, and some optional debugging + information coded as <tt>LineNumberTable</tt> and + <tt>LocalVariableTable</tt> attributes. Attributes are in general + specific to some data structure, i.e., no two components share the + same kind of attribute, though this is not explicitly + forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped + with the component they belong to. + </p> + + </subsection> + + <subsection name="Class repository"> + <p> + Using the provided <tt>Repository</tt> class, reading class files into + a <tt>JavaClass</tt> object is quite simple: + </p> + + <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source> + + <p> + The repository also contains methods providing the dynamic equivalent + of the <tt>instanceof</tt> operator, and other useful routines: + </p> + + <source> +if (Repository.instanceOf(clazz, super_class)) { + ... 
+} + </source> + + </subsection> + + <h4>Accessing class file data</h4> + + <p> + Information within the class file components may be accessed like + Java Beans via intuitive set/get methods. All of them also define + a <tt>toString()</tt> method so that implementing a simple class + viewer is very easy. In fact all of the examples used here have + been produced this way: + </p> + + <source> +System.out.println(clazz); +printCode(clazz.getMethods()); +... +public static void printCode(Method[] methods) { + for (int i = 0; i < methods.length; i++) { + System.out.println(methods[i]); + + Code code = methods[i].getCode(); + if (code != null) // Non-abstract method + System.out.println(code); + } +} + </source> + + <h4>Analyzing class data</h4> + <p> + Last but not least, <font face="helvetica,arial">BCEL</font> + supports the <em>Visitor</em> design pattern, so one can write + visitor objects to traverse and analyze the contents of a class + file. Included in the distribution is a class + <tt>JasminVisitor</tt> that converts class files into the <a + href="http://jasmin.sourceforge.net">Jasmin</a> + assembler language. + </p> + + <subsection name="ClassGen"> + <p> + This part of the API (package <tt>org.apache.bcel.generic</tt>) + supplies an abstraction level for creating or transforming class + files dynamically. It makes the static constraints of Java class + files like the hard-coded byte code addresses "generic". The + generic constant pool, for example, is implemented by the class + <tt>ConstantPoolGen</tt> which offers methods for adding different + types of constants. Accordingly, <tt>ClassGen</tt> offers an + interface to add methods, fields, and attributes. + <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API. 
+ </p> + + <p align="center"> + <a name="Figure 4"> + <img src="../images/classgen.gif"/> + <br/> + Figure 4: UML diagram of the ClassGen API</a> + </p> + + <h4>Types</h4> + <p> + We abstract from the concrete details of the type signature syntax + (see <a href="jvm.html#Type_information">2.5</a>) by introducing the + <tt>Type</tt> class, which is used, for example, by methods to + define their return and argument types. Concrete sub-classes are + <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt> + which consists of the element type and the number of + dimensions. For commonly used types the class offers some + predefined constants. For example, the method signature of the + <tt>main</tt> method as shown in + <a href="jvm.html#Type_information">section 2.5</a> is represented by: + </p> + + <source> +Type return_type = Type.VOID; +Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) }; + </source> + + <p> + <tt>Type</tt> also contains methods to convert types into textual + signatures and vice versa. The sub-classes contain implementations + of the routines and constraints specified by the Java Language + Specification. + </p> + + <h4>Generic fields and methods</h4> + <p> + Fields are represented by <tt>FieldGen</tt> objects, which may be + freely modified by the user. If they have the access rights + <tt>static final</tt>, i.e., are constants and of basic type, they + may optionally have an initializing value. + </p> + + <p> + Generic methods contain methods to add exceptions the method may + throw, local variables, and exception handlers. The latter two are + represented by user-configurable objects as well. Because + exception handlers and local variables contain references to byte + code addresses, they also take the role of an <em>instruction + targeter</em> in our terminology. Instruction targeters contain a + method <tt>updateTarget()</tt> to redirect a reference. This is + somewhat related to the Observer design pattern. 
Generic + (non-abstract) methods refer to <em>instruction lists</em> that + consist of instruction objects. References to byte code addresses + are implemented by handles to instruction objects. If the list is + updated, the instruction targeters will be informed about it. This + is explained in more detail in the following sections. + </p> + + <p> + The maximum stack size needed by the method and the maximum number + of local variables used may be set manually or computed + automatically via the <tt>setMaxStack()</tt> and + <tt>setMaxLocals()</tt> methods. + </p> + + <h4>Instructions</h4> + <p> + Modeling instructions as objects may look somewhat odd at first + sight, but in fact enables programmers to obtain a high-level view + of control flow without handling details like concrete byte code + offsets. Instructions consist of an opcode (sometimes called + tag), their length in bytes and an offset (or index) within the + byte code. Since many instructions are immutable (e.g., stack + operators), the <tt>InstructionConstants</tt> interface offers + shareable predefined "fly-weight" constants to use. + </p> + + <p> + Instructions are grouped via sub-classing; the type hierarchy of + instruction classes is illustrated by an (incomplete) figure in the + appendix. The most important family of instructions are the + <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to + targets somewhere within the byte code. Obviously, this makes them + candidates for playing an <tt>InstructionTargeter</tt> role, + too. Instructions are further grouped by the interfaces they + implement; there are, e.g., <tt>TypedInstruction</tt>s that are + associated with a specific type like <tt>ldc</tt>, or + <tt>ExceptionThrower</tt> instructions that may raise exceptions + when executed. + </p> + + <p> + All instructions can be traversed via <tt>accept(Visitor v)</tt> + methods, i.e., the Visitor design pattern. 
There is, however, a + special trick in these methods that allows the handling + of certain instruction groups to be merged. The <tt>accept()</tt> methods do not only + call the corresponding <tt>visit()</tt> method, but also call the + <tt>visit()</tt> methods of their respective super classes and + implemented interfaces first, i.e., the most specific + <tt>visit()</tt> call is last. Thus one can group the handling of, + say, all <tt>BranchInstruction</tt>s into one single method. + </p> + + <p> + For debugging purposes it may even make sense to "invent" your own + instructions. In a sophisticated code generator like the one used + as a backend of the <a href="http://barat.sourceforge.net">Barat + framework</a> for static analysis one often has to insert + temporary <tt>nop</tt> (No operation) instructions. When examining + the produced code it may be very difficult to trace back where the + <tt>nop</tt> was actually inserted. One could think of a derived + <tt>nop2</tt> instruction that contains additional debugging + information. When the instruction list is dumped to byte code, the + extra data is simply dropped. + </p> + + <p> + One could also think of new byte code instructions operating on + complex numbers that are replaced by normal byte code at + load-time or are recognized by a new JVM. + </p> + + <h4>Instruction lists</h4> + <p> + An <em>instruction list</em> is implemented by a list of + <em>instruction handles</em> encapsulating instruction objects. + References to instructions in the list are thus not implemented by + direct pointers to instructions but by pointers to instruction + <em>handles</em>. This makes appending, inserting and deleting + areas of code very simple and also allows us to reuse immutable + instruction objects (fly-weight objects). Since we use symbolic + references, computation of concrete byte code offsets does not + need to occur until finalization, i.e., until the user has + finished the process of generating or transforming code. 
We will + use the terms instruction handle and instruction synonymously + throughout the rest of the paper. Instruction handles may carry + additional user-defined data, attached via the <tt>addAttribute()</tt> + method. + </p> + + <p> + <b>Appending:</b> One can append instructions or other instruction + lists anywhere in an existing list. The instructions are appended + after the given instruction handle. All append methods return a + new instruction handle which may then be used as the target of a + branch instruction, e.g.: + </p> + + <source> +InstructionList il = new InstructionList(); +... +GOTO g = new GOTO(null); +il.append(g); +... +// Use immutable fly-weight object +InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL); +g.setTarget(ih); + </source> + + <p> + <b>Inserting:</b> Instructions may be inserted anywhere into an + existing list. They are inserted before the given instruction + handle. All insert methods return a new instruction handle which + may then be used as the start address of an exception handler, for + example. + </p> + + <source> +InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP); +... +mg.addExceptionHandler(start, end, handler, new ObjectType("java.io.IOException")); + </source> + + <p> + <b>Deleting:</b> Deletion of instructions is also very + straightforward; all instruction handles and the contained + instructions within a given range are removed from the instruction + list and disposed. The <tt>delete()</tt> method may however throw + a <tt>TargetLostException</tt> when there are instruction + targeters still referencing one of the deleted instructions. The + user is forced to handle such exceptions in a <tt>try-catch</tt> + clause and redirect these references elsewhere. The <em>peep + hole</em> optimizer described in the appendix gives a detailed + example of this. 
+ </p> + + <source> +try { + il.delete(first, last); +} catch (TargetLostException e) { + for (InstructionHandle target : e.getTargets()) { + for (InstructionTargeter targeter : target.getTargeters()) { + targeter.updateTarget(target, new_target); + } + } +} + </source> + + <p> + <b>Finalizing:</b> When the instruction list is ready to be dumped + to pure byte code, all symbolic references must be mapped to real + byte code offsets. This is done by the <tt>getByteCode()</tt> + method which is called by default by + <tt>MethodGen.getMethod()</tt>. Afterwards you should call + <tt>dispose()</tt> so that the instruction handles can be reused + internally. This helps to improve memory usage. + </p> + + <source> +InstructionList il = new InstructionList(); + +ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", + "<generated>", ACC_PUBLIC | ACC_SUPER, null); +ConstantPoolGen cp = cg.getConstantPool(); // cg creates constant pool +MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, + Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) }, + new String[] { "argv" }, "main", "HelloWorld", il, cp); +... +cg.addMethod(mg.getMethod()); +il.dispose(); // Reuse instruction handles of list + </source> + + <h4>Code example revisited</h4> + <p> + Using instruction lists gives us a generic view of the code: In + <a href="#Figure 5">Figure 5</a> we again present the code chunk + of the <tt>readInt()</tt> method of the factorial example in section + <a href="jvm.html#Code_example">2.6</a>: The local variables + <tt>n</tt> and <tt>e1</tt> both hold two references to + instructions, defining their scope. There are two <tt>goto</tt>s + branching to the <tt>iload</tt> at the end of the method. One of + the exception handlers is displayed, too: it references the start + and the end of the <tt>try</tt> block and also the exception + handler code. 
+ </p> + + <p align="center"> + <a name="Figure 5"> + <img src="../images/il.gif"/> + <br/> + Figure 5: Instruction list for <tt>readInt()</tt> method</a> + </p> + + <h4>Instruction factories</h4> + <p> + To simplify the creation of certain instructions the user can use + the supplied <tt>InstructionFactory</tt> class which offers a lot + of useful methods to create instructions from + scratch. Alternatively, the user can also use <em>compound + instructions</em>: When producing byte code, some patterns + typically occur very frequently, for instance the compilation of + arithmetic or comparison expressions. You certainly do not want + to rewrite the code that translates such expressions into byte + code in every place they may appear. In order to support this, the + <font face="helvetica,arial">BCEL</font> API includes a <em>compound + instruction</em> (an interface with a single + <tt>getInstructionList()</tt> method). Instances of classes implementing this interface + may be used in any place where normal instructions would occur, + particularly in append operations. + </p> + + <p> + <b>Example: Pushing constants</b> Pushing constants onto the + operand stack may be coded in different ways. As explained in <a + href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are + some "short-cut" instructions that can be used to make the + produced byte code more compact. The smallest instruction to push + a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other + possibilities are <tt>bipush</tt> (can be used to push values + between -128 and 127), <tt>sipush</tt> (between -32768 and 32767), + or <tt>ldc</tt> (load constant from constant pool). + </p> + + <p> + Instead of repeatedly selecting the most compact instruction in, + say, a switch, one can use the compound <tt>PUSH</tt> instruction + whenever pushing a constant number or string. It will produce the + appropriate byte code instruction and insert entries into the + constant pool if necessary. 
+ </p> + + <source> +InstructionFactory f = new InstructionFactory(class_gen); +InstructionList il = new InstructionList(); +... +il.append(new PUSH(cp, "Hello, world")); +il.append(new PUSH(cp, 4711)); +... +il.append(f.createPrintln("Hello World")); +... +il.append(f.createReturn(type)); + </source> + + <h4>Code patterns using regular expressions</h4> + <p> + When transforming code, for instance during optimization or when + inserting analysis method calls, one typically searches for + certain patterns of code to perform the transformation at. To + simplify handling such situations <font + face="helvetica,arial">BCEL </font>introduces a special feature: + One can search for given code patterns within an instruction list + using <em>regular expressions</em>. In such expressions, + instructions are represented by their opcode names, e.g., + <tt>LDC</tt>; one may also use their respective super classes, e.g., + "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>, + <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus, + the expression + </p> + + <source>"NOP+(ILOAD|ALOAD)*"</source> + + <p> + represents a piece of code consisting of at least one <tt>NOP</tt> + followed by a possibly empty sequence of <tt>ILOAD</tt> and + <tt>ALOAD</tt> instructions. + </p> + + <p> + The <tt>search()</tt> method of class + <tt>org.apache.bcel.util.InstructionFinder</tt> gets a regular + expression and a starting point as arguments and returns an + iterator describing the area of matched instructions. Additional + constraints on the matching area of instructions, which cannot be + implemented via regular expressions, may be expressed via <em>code + constraint</em> objects. + </p> + + <h4>Example: Optimizing boolean expressions</h4> + <p> + In Java byte code, the boolean values true and false are mapped to 1 and 0, + respectively. Thus, the simplest way to evaluate boolean + expressions is to push a 1 or a 0 onto the operand stack depending + on the truth value of the expression. 
But this way, the + subsequent combination of boolean expressions (with + <tt>&&</tt>, e.g.) yields long chunks of code that push + lots of 1s and 0s onto the stack. + </p> + + <p> + When the code has been finalized these chunks can be optimized + with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt> + (e.g. the comparison of two integers: <tt>if_icmpeq</tt>) that + either produces a 1 or a 0 on the stack and is followed by an + <tt>ifne</tt> instruction (branch if the stack value is not 0) may be + replaced by the <tt>IfInstruction</tt> with its branch target + replaced by the target of the <tt>ifne</tt> instruction: + </p> + + <source> +CodeConstraint constraint = new CodeConstraint() { + public boolean checkCode(InstructionHandle[] match) { + IfInstruction if1 = (IfInstruction) match[0].getInstruction(); + GOTO g = (GOTO) match[2].getInstruction(); + return (if1.getTarget() == match[3]) && + (g.getTarget() == match[4]); + } +}; + +InstructionFinder f = new InstructionFinder(il); +String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)"; + +for (Iterator e = f.search(pat, constraint); e.hasNext(); ) { + InstructionHandle[] match = (InstructionHandle[]) e.next(); + ... + match[0].setTarget(match[5].getTarget()); // Update target + ... + try { + il.delete(match[1], match[5]); + } catch (TargetLostException ex) { + ... + } +} + </source> + + <p> + The applied code constraint object ensures that the matched code + really corresponds to the targeted expression pattern. Subsequent + application of this algorithm removes all unnecessary stack + operations and branch instructions from the byte code. If any of + the deleted instructions is still referenced by an + <tt>InstructionTargeter</tt> object, the reference has to be + updated in the <tt>catch</tt>-clause. 
+ </p> + + <p> + <b>Example application:</b> + The expression: + </p> + + <source> + if ((a == null) || (i < 2)) + System.out.println("Ooops"); + </source> + + <p> + can be mapped to both of the chunks of byte code shown in <a + href="#Figure 6">figure 6</a>. The left column represents the + unoptimized code while the right column displays the same code + after the peep hole algorithm has been applied: + </p> + + <p align="center"><a name="Figure 6"> + <table> + <tr> + <td valign="top"><pre> + 5: aload_0 + 6: ifnull #13 + 9: iconst_0 + 10: goto #14 + 13: iconst_1 + 14: nop + 15: ifne #36 + 18: iload_1 + 19: iconst_2 + 20: if_icmplt #27 + 23: iconst_0 + 24: goto #28 + 27: iconst_1 + 28: nop + 29: ifne #36 + 32: iconst_0 + 33: goto #37 + 36: iconst_1 + 37: nop + 38: ifeq #52 + 41: getstatic System.out + 44: ldc "Ooops" + 46: invokevirtual println + 52: return + </pre></td> + <td valign="top"><pre> + 10: aload_0 + 11: ifnull #19 + 14: iload_1 + 15: iconst_2 + 16: if_icmpge #27 + 19: getstatic System.out + 22: ldc "Ooops" + 24: invokevirtual println + 27: return + </pre></td> + </tr> + </table> + </a> + </p> + </subsection> + </section> + </body> +</document>
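The choice among <tt>iconst</tt>, <tt>bipush</tt>, <tt>sipush</tt> and <tt>ldc</tt> that the manual attributes to the compound <tt>PUSH</tt> instruction can be sketched without BCEL. This is an illustrative sketch only and not BCEL's implementation: the class name <tt>PushSelector</tt> and the method <tt>select</tt> are hypothetical, only <tt>int</tt> constants are handled, and the method returns mnemonics as strings instead of emitting byte code. The <tt>iconst_m1</tt> to <tt>iconst_5</tt> range comes from the JVM instruction set rather than from the manual text.

```java
/**
 * Hypothetical helper, not part of BCEL: picks the most compact JVM
 * instruction for pushing an int constant and returns its mnemonic.
 */
final class PushSelector {

    /** Returns a textual mnemonic for the smallest instruction that pushes value. */
    static String select(int value) {
        if (value >= -1 && value <= 5) {
            // iconst_m1, iconst_0 ... iconst_5 carry the constant in the opcode itself
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // opcode plus a one-byte operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // opcode plus a two-byte operand
        }
        // In a real class file the operand of ldc is a constant pool index;
        // the literal value is shown here only for readability.
        return "ldc " + value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 100, 4711, 100000}) {
            System.out.println(v + " -> " + select(v));
        }
    }
}
```

Ordering the checks from the narrowest range to the widest is what makes the first match the most compact encoding.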
\ No newline at end of file diff --git a/src/site/xdoc/manual/introduction.xml b/src/site/xdoc/manual/introduction.xml new file mode 100644 index 00000000..53766bd8 --- /dev/null +++ b/src/site/xdoc/manual/introduction.xml @@ -0,0 +1,80 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + <properties> + <title>Introduction</title> + </properties> + + <body> + + <section name="Introduction"> + <p> + The <a href="http://java.sun.com/">Java</a> language has become + very popular and many research projects deal with further + improvements of the language or its run-time behavior. The + possibility to extend a language with new concepts is surely a + desirable feature, but the implementation issues should be hidden + from the user. Fortunately, the concepts of the Java Virtual + Machine permit the user-transparent implementation of such + extensions with relatively little effort. + </p> + + <p> + Because the target language of Java is an interpreted language + with a small and easy-to-understand set of instructions (the + <em>byte code</em>), developers can implement and test their + concepts in a very elegant way. 
One can write a plug-in + replacement for the system's <em>class loader</em> which is + responsible for dynamically loading class files at run-time and + passing the byte code to the Virtual Machine (see section ). + Class loaders may thus be used to intercept the loading process + and transform classes before they are actually executed by the + JVM. While the original class files always remain unaltered, the + behavior of the class loader may be reconfigured for every + execution or instrumented dynamically. + </p> + + <p> + The <font face="helvetica,arial">BCEL</font> API (Byte Code + Engineering Library), formerly known as JavaClass, is a toolkit + for the static analysis and dynamic creation or transformation of + Java class files. It enables developers to implement the desired + features on a high level of abstraction without handling all the + internal details of the Java class file format and thus + re-inventing the wheel every time. <font face="helvetica,arial">BCEL + </font> is written entirely in Java and freely available under the + terms of the <a href="license.html">Apache Software License</a>. + </p> + + <p> + This manual is structured as follows: We give a brief description + of the Java Virtual Machine and the class file format in <a + href="jvm.html">section 2</a>. <a href="bcel-api.html">Section 3</a> + introduces the <font face="helvetica,arial">BCEL</font> API. + <a href="application-areas.html">Section 4</a> describes some typical + application areas and example projects. The appendix contains code examples + that are too long to be presented in the main part of this paper. All examples + are included in the downloadable distribution. + </p> + </section> + + </body> + +</document>
\ No newline at end of file diff --git a/src/site/xdoc/manual/jvm.xml b/src/site/xdoc/manual/jvm.xml new file mode 100644 index 00000000..92197518 --- /dev/null +++ b/src/site/xdoc/manual/jvm.xml @@ -0,0 +1,502 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + <properties> + <title>The Java Virtual Machine</title> + </properties> + + <body> + <section name="The Java Virtual Machine"> + <p> + Readers already familiar with the Java Virtual Machine and the + Java class file format may want to skip this section and proceed + with <a href="bcel-api.html">section 3</a>. + </p> + + <p> + Programs written in the Java language are compiled into a portable + binary format called <em>byte code</em>. Every class is + represented by a single class file containing class related data + and byte code instructions. These files are loaded dynamically + into an interpreter (<a + href="http://docs.oracle.com/javase/specs/">Java + Virtual Machine</a>, aka. JVM) and executed. 
+ </p> + + <p> + <a href="#Figure 1">Figure 1</a> illustrates the procedure of + compiling and executing a Java class: The source file + (<tt>HelloWorld.java</tt>) is compiled into a Java class file + (<tt>HelloWorld.class</tt>), loaded by the byte code interpreter + and executed. In order to implement additional features, + researchers may want to transform class files (drawn with bold + lines) before they get actually executed. This application area + is one of the main issues of this article. + </p> + + <p align="center"> + <a name="Figure 1"> + <img src="../images/jvm.gif"/> + <br/> + Figure 1: Compilation and execution of Java classes</a> + </p> + + <p> + Note that the use of the general term "Java" implies in fact two + meanings: on the one hand, Java as a programming language, on the + other hand, the Java Virtual Machine, which is not necessarily + targeted by the Java language exclusively, but may be used by <a + href="http://www.robert-tolksdorf.de/vmlanguages.html">other + languages</a> as well. We assume the reader to be familiar with + the Java language and to have a general understanding of the + Virtual Machine. + </p> + + <subsection name="Java class file format"> + <p> + Giving a full overview of the design issues of the Java class file + format and the associated byte code instructions is beyond the + scope of this paper. We will just give a brief introduction + covering the details that are necessary for understanding the rest + of this paper. The format of class files and the byte code + instruction set are described in more detail in the <a + href="http://docs.oracle.com/javase/specs/">Java + Virtual Machine Specification</a>. Especially, we will not deal + with the security constraints that the Java Virtual Machine has to + check at run-time, i.e. the byte code verifier. 
+ </p> + + <p> + <a href="#Figure 2">Figure 2</a> shows a simplified example of the + contents of a Java class file: It starts with a header containing + a "magic number" (<tt>0xCAFEBABE</tt>) and the version number, + followed by the <em>constant pool</em>, which can be roughly + thought of as the text segment of an executable, the <em>access + rights</em> of the class encoded by a bit mask, a list of + interfaces implemented by the class, lists containing the fields + and methods of the class, and finally the <em>class + attributes</em>, e.g., the <tt>SourceFile</tt> attribute telling + the name of the source file. Attributes are a way of putting + additional, user-defined information into class file data + structures. For example, a custom class loader may evaluate such + attribute data in order to perform its transformations. The JVM + specification declares that unknown, i.e., user-defined attributes + must be ignored by any Virtual Machine implementation. + </p> + + <p align="center"> + <a name="Figure 2"> + <img src="../images/classfile.gif"/> + <br/> + Figure 2: Java class file format</a> + </p> + + <p> + Because all of the information needed to dynamically resolve the + symbolic references to classes, fields and methods at run-time is + coded with string constants, the constant pool contains in fact + the largest portion of an average class file, approximately + 60%. In fact, this makes the constant pool an easy target for code + manipulation issues. The byte code instructions themselves just + make up 12%. + </p> + + <p> + The right upper box shows a "zoomed" excerpt of the constant pool, + while the rounded box below depicts some instructions that are + contained within a method of the example class. 
These + instructions represent the straightforward translation of the + well-known statement: + </p> + + <p align="center"> + <source>System.out.println("Hello, world");</source> + </p> + + <p> + The first instruction loads the contents of the field <tt>out</tt> + of class <tt>java.lang.System</tt> onto the operand stack. This is + an instance of the class <tt>java.io.PrintStream</tt>. The + <tt>ldc</tt> ("Load constant") pushes a reference to the string + "Hello world" on the stack. The next instruction invokes the + instance method <tt>println</tt> which takes both values as + parameters (Instance methods always implicitly take an instance + reference as their first argument). + </p> + + <p> + Instructions, other data structures within the class file and + constants themselves may refer to constants in the constant pool. + Such references are implemented via fixed indexes encoded directly + into the instructions. This is illustrated for some items of the + figure emphasized with a surrounding box. + </p> + + <p> + For example, the <tt>invokevirtual</tt> instruction refers to a + <tt>MethodRef</tt> constant that contains information about the + name of the called method, the signature (i.e., the encoded + argument and return types), and to which class the method belongs. + In fact, as emphasized by the boxed value, the <tt>MethodRef</tt> + constant itself just refers to other entries holding the real + data, e.g., it refers to a <tt>ConstantClass</tt> entry containing + a symbolic reference to the class <tt>java.io.PrintStream</tt>. + To keep the class file compact, such constants are typically + shared by different instructions and other constant pool + entries. Similarly, a field is represented by a <tt>Fieldref</tt> + constant that includes information about the name, the type and + the containing class of the field. 
+ </p> + + <p> + The constant pool basically holds the following types of + constants: References to methods, fields and classes, strings, + integers, floats, longs, and doubles. + </p> + + </subsection> + + <subsection name="Byte code instruction set"> + <p> + The JVM is a stack-oriented interpreter that creates a local stack + frame of fixed size for every method invocation. The size of the + local stack has to be computed by the compiler. Values may also be + stored intermediately in a frame area containing <em>local + variables</em> which can be used like a set of registers. These + local variables are numbered from 0 to 65535, i.e., you have a + maximum of 65536 local variables per method. The stack frames + of caller and callee method are overlapping, i.e., the caller + pushes arguments onto the operand stack and the called method + receives them in local variables. + </p> + + <p> + The byte code instruction set currently consists of 212 + instructions; 44 opcodes are marked as reserved and may be used + for future extensions or intermediate optimizations within the + Virtual Machine. The instruction set can be roughly grouped as + follows: + </p> + + <p> + <b>Stack operations:</b> Constants can be pushed onto the stack + either by loading them from the constant pool with the + <tt>ldc</tt> instruction or with special "short-cut" + instructions where the operand is encoded into the instructions, + e.g., <tt>iconst_0</tt> or <tt>bipush</tt> (push byte value). + </p> + + <p> + <b>Arithmetic operations:</b> The instruction set of the Java + Virtual Machine distinguishes its operand types using different + instructions to operate on values of specific type. Arithmetic + operations starting with <tt>i</tt>, for example, denote an + integer operation. For instance, <tt>iadd</tt> adds two integers + and pushes the result back on the stack. The Java types + <tt>boolean</tt>, <tt>byte</tt>, <tt>short</tt>, and + <tt>char</tt> are handled as integers by the JVM. 
+ </p> + + <p> + <b>Control flow:</b> There are branch instructions like + <tt>goto</tt>, and <tt>if_icmpeq</tt>, which compares two integers + for equality. There is also a <tt>jsr</tt> (jump to sub-routine) + and <tt>ret</tt> pair of instructions that is used to implement + the <tt>finally</tt> clause of <tt>try-catch</tt> blocks. + Exceptions may be thrown with the <tt>athrow</tt> instruction. + Branch targets are coded as offsets from the current byte code + position, i.e., with an integer number. + </p> + + <p> + <b>Load and store operations</b> for local variables like + <tt>iload</tt> and <tt>istore</tt>. There are also array + operations like <tt>iastore</tt> which stores an integer value + into an array. + </p> + + <p> + <b>Field access:</b> The value of an instance field may be + retrieved with <tt>getfield</tt> and written with + <tt>putfield</tt>. For static fields, there are + <tt>getstatic</tt> and <tt>putstatic</tt> counterparts. + </p> + + <p> + <b>Method invocation:</b> Methods may either be called statically via + <tt>invokestatic</tt> or be bound virtually with the + <tt>invokevirtual</tt> instruction. Super class methods and + private methods are invoked with <tt>invokespecial</tt>. Interface + methods are a special case and are invoked with + <tt>invokeinterface</tt>. + </p> + + <p> + <b>Object allocation:</b> Class instances are allocated with the + <tt>new</tt> instruction, arrays of basic type like + <tt>int[]</tt> with <tt>newarray</tt>, arrays of references like + <tt>String[][]</tt> with <tt>anewarray</tt> or + <tt>multianewarray</tt>. + </p> + + <p> + <b>Conversion and type checking:</b> For stack operands of basic + type there exist casting operations like <tt>f2i</tt> which + converts a float value into an integer. The validity of a type + cast may be checked with <tt>checkcast</tt> and the + <tt>instanceof</tt> operator can be directly mapped to the + equally named instruction. 
+ </p> + + <p> + Most instructions have a fixed length, but there are also some + variable-length instructions: In particular, the + <tt>lookupswitch</tt> and <tt>tableswitch</tt> instructions, which + are used to implement <tt>switch()</tt> statements. Since the + number of <tt>case</tt> clauses may vary, these instructions + contain a variable number of statements. + </p> + + <p> + We will not list all byte code instructions here, since these are + explained in detail in the <a + href="http://docs.oracle.com/javase/specs/">JVM + specification</a>. The opcode names are mostly self-explaining, + so understanding the following code examples should be fairly + intuitive. + </p> + + </subsection> + + <subsection name="Method code"> + <p> + Non-abstract (and non-native) methods contain an attribute + "<tt>Code</tt>" that holds the following data: The maximum size of + the method's stack frame, the number of local variables and an + array of byte code instructions. Optionally, it may also contain + information about the names of local variables and source file + line numbers that can be used by a debugger. + </p> + + <p> + Whenever an exception is raised during execution, the JVM performs + exception handling by looking into a table of exception + handlers. The table marks handlers, i.e., code chunks, to be + responsible for exceptions of certain types that are raised within + a given area of the byte code. When there is no appropriate + handler the exception is propagated back to the caller of the + method. The handler information is itself stored in an attribute + contained within the <tt>Code</tt> attribute. + </p> + + </subsection> + + <subsection name="Byte code offsets"> + <p> + Targets of branch instructions like <tt>goto</tt> are encoded as + relative offsets in the array of byte codes. Exception handlers + and local variables refer to absolute addresses within the byte + code. 
The former contain references to the start and the end of + the <tt>try</tt> block, and to the exception handler code. The + latter mark the range in which a local variable is valid, i.e., + its scope. This makes it difficult to insert or delete code areas + on this level of abstraction, since one has to recompute the + offsets every time and update the referring objects. We will see + in <a href="bcel-api.html#ClassGen">section 3.3</a> how <font + face="helvetica,arial">BCEL</font> remedies this restriction. + </p> + + </subsection> + + <subsection name="Type information"> + <p> + Java is a type-safe language and the information about the types + of fields, local variables, and methods is stored in so-called + <em>signatures</em>. These are strings stored in the constant pool + and encoded in a special format. For example the argument and + return types of the <tt>main</tt> method + </p> + + <p align="center"> + <source>public static void main(String[] argv)</source> + </p> + + <p> + are represented by the signature + </p> + + <p align="center"> + <source>([Ljava/lang/String;)V</source> + </p> + + <p> + Classes are internally represented by strings like + <tt>"java/lang/String"</tt>, basic types like <tt>float</tt> by an + integer number. Within signatures they are represented by single + characters, e.g., <tt>I</tt>, for integer. Arrays are denoted with + a <tt>[</tt> at the start of the signature. 
+ </p> + + <source> +import java.io.*; + +public class Factorial { + private static BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); + + public static int fac(int n) { + return (n == 0) ? 1 : n * fac(n - 1); + } + + public static int readInt() { + int n = 4711; + try { + System.out.print("Please enter a number> "); + n = Integer.parseInt(in.readLine()); + } catch (IOException e1) { + System.err.println(e1); + } catch (NumberFormatException e2) { + System.err.println(e2); + } + return n; + } + + public static void main(String[] argv) { + int n = readInt(); + System.out.println("Factorial of " + n + " is " + fac(n)); + } +} + </source> + + <p> + This code example typically compiles to the following chunks of + byte code: + </p> + + <source> + 0: iload_0 + 1: ifne #8 + 4: iconst_1 + 5: goto #16 + 8: iload_0 + 9: iload_0 + 10: iconst_1 + 11: isub + 12: invokestatic Factorial.fac (I)I (12) + 15: imul + 16: ireturn + + LocalVariable(start_pc = 0, length = 16, index = 0:int n) + </source> + + <p><b>fac():</b> + The method <tt>fac</tt> has only one local variable, the argument + <tt>n</tt>, stored at index 0. This variable's scope ranges from + the start of the byte code sequence to the very end. If the value + of <tt>n</tt> (the value fetched with <tt>iload_0</tt>) is not + equal to 0, the <tt>ifne</tt> instruction branches to the byte + code at offset 8, otherwise a 1 is pushed onto the operand stack + and the control flow branches to the final return. For ease of + reading, the offsets of the branch instructions, which are + actually relative, are displayed as absolute addresses in these + examples. + </p> + + <p> + If recursion has to continue, the arguments for the multiplication + (<tt>n</tt> and <tt>fac(n - 1)</tt>) are evaluated and the results + pushed onto the operand stack. After the multiplication operation + has been performed the function returns the computed value from + the top of the stack. 
+ </p> + + <source> + 0: sipush 4711 + 3: istore_0 + 4: getstatic java.lang.System.out Ljava/io/PrintStream; + 7: ldc "Please enter a number> " + 9: invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V + 12: getstatic Factorial.in Ljava/io/BufferedReader; + 15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String; + 18: invokestatic java.lang.Integer.parseInt (Ljava/lang/String;)I + 21: istore_0 + 22: goto #44 + 25: astore_1 + 26: getstatic java.lang.System.err Ljava/io/PrintStream; + 29: aload_1 + 30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V + 33: goto #44 + 36: astore_1 + 37: getstatic java.lang.System.err Ljava/io/PrintStream; + 40: aload_1 + 41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V + 44: iload_0 + 45: ireturn + + Exception handler(s) = + From To Handler Type + 4 22 25 java.io.IOException(6) + 4 22 36 NumberFormatException(10) + </source> + + <p><b>readInt():</b> First the local variable <tt>n</tt> (at index 0) + is initialized to the value 4711. The next instruction, + <tt>getstatic</tt>, loads the reference held by the static + <tt>System.out</tt> field onto the stack. Then a string is loaded + and printed, a number read from the standard input and assigned to + <tt>n</tt>. + </p> + + <p> + If one of the called methods (<tt>readLine()</tt> and + <tt>parseInt()</tt>) throws an exception, the Java Virtual Machine + calls one of the declared exception handlers, depending on the + type of the exception. The <tt>try</tt>-clause itself does not + produce any code; it merely defines the range in which the + subsequent handlers are active. In the example, the specified + source code area maps to a byte code area ranging from offset 4 + (inclusive) to 22 (exclusive). If no exception has occurred + ("normal" execution flow) the <tt>goto</tt> instructions branch + behind the handler code. There the value of <tt>n</tt> is loaded + and returned. 
+ </p> + + <p> + The handler for <tt>java.io.IOException</tt> starts at + offset 25. It simply prints the error and branches back to the + normal execution flow, i.e., as if no exception had occurred. + </p> + + </subsection> + </section> + </body> + +</document>
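The signature encoding described under "Type information" can be explored with the JDK alone. The sketch below assumes only the standard <tt>java.lang.invoke.MethodType</tt> class; the class name <tt>DescriptorDemo</tt> and the helper <tt>parse</tt> are hypothetical names introduced for illustration. It relies on the JVM descriptor syntax, in which a class type is written as <tt>L</tt>, the class name with slashes, and a closing <tt>;</tt>, so that <tt>String[]</tt> becomes <tt>[Ljava/lang/String;</tt>.

```java
import java.lang.invoke.MethodType;

/** Hypothetical demo class: decodes JVM method descriptors with the JDK. */
final class DescriptorDemo {

    /** Parses a method descriptor such as "(I)I" into parameter and return types. */
    static MethodType parse(String descriptor) {
        // A null loader means class names are resolved by the system class loader.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        // Descriptor of: public static void main(String[] argv)
        MethodType mainType = parse("([Ljava/lang/String;)V");
        System.out.println("parameter: " + mainType.parameterType(0).getName());
        System.out.println("returns:   " + mainType.returnType());

        // Descriptor of: public static int fac(int n)
        MethodType facType = parse("(I)I");
        System.out.println("fac takes " + facType.parameterCount() + " argument(s)");
    }
}
```

The same helper can be applied to the descriptors that appear in the byte code listings, for example <tt>(Ljava/lang/String;)I</tt> for <tt>Integer.parseInt</tt>.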
\ No newline at end of file diff --git a/src/site/xdoc/manual/manual.xml b/src/site/xdoc/manual/manual.xml new file mode 100644 index 00000000..e481f5d4 --- /dev/null +++ b/src/site/xdoc/manual/manual.xml @@ -0,0 +1,70 @@ +<?xml version="1.0"?> +<!-- + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. +--> +<document> + + <properties> + <title>Byte Code Engineering Library (BCEL)</title> + </properties> + + <body> + + <section name="Abstract"> + <p> + Extensions and improvements of the programming language Java and + its related execution environment (Java Virtual Machine, JVM) are + the subject of a large number of research projects and + proposals. There are projects, for instance, to add parameterized + types to Java, to implement <a + href="http://www.eclipse.org/aspectj/">Aspect-Oriented Programming</a>, to + perform sophisticated static analysis, and to improve the run-time + performance. + </p> + + <p> + Since Java classes are compiled into portable binary class files + (called <em>byte code</em>), it is the most convenient and + platform-independent way to implement these improvements not by + writing a new compiler or changing the JVM, but by transforming + the byte code. 
These transformations can either be performed + after compile-time, or at load-time. Many programmers are doing + this by implementing their own specialized byte code manipulation + tools, which are, however, restricted in the range of their + re-usability. + </p> + + <p> + To deal with the necessary class file transformations, we + introduce an API that helps developers to conveniently implement + their transformations. + </p> + </section> + + <section name="Table of Contents"> + <ul> + <li><a href="introduction.html">Introduction</a></li> + <li><a href="jvm.html">The Java Virtual Machine</a></li> + <li><a href="bcel-api.html">The BCEL API</a></li> + <li><a href="application-areas.html">Application Areas</a></li> + <li><a href="appendix.html">Appendix</a></li> + </ul> + </section> + +</body> +</document> |