Introduction
When we write programs (source code) in Java (or in any other compiled language), the execution process is as follows:
The source code written by the programmer is converted to machine code or byte code. The machine code or byte code is then executed either directly by the processor of the host machine or by a virtual machine on the host machine that emulates a processor. Languages like C, C++, Go and Rust compile source code into machine code that is executed by the processor of the host machine. Languages like Java, Kotlin, Groovy and Scala compile source code into an intermediate language known as byte code, which is executed by a virtual machine called the JVM (Java Virtual Machine).
Portability is a benefit of the JVM. The same byte code can run without modification on any machine that has a JVM installed. This gave rise to Java's slogan, "write once, run anywhere" (WORA). The JVM is installed as part of one of the following:
- the Java Development Kit (JDK)
- the Java Runtime Environment (JRE)
The JVM provides an environment on the host machine in which Java programs execute, and it presents that same environment to programs regardless of the operating system or platform of the host machine. A new instance of the JVM is started whenever a Java program is run.
This article gives an insight into how the JVM works. An understanding of how the JVM works is important for Java programmers at all levels. At the end of this article, the reader should be able to:
- describe the JVM and explain how it facilitates WORA
- describe the components of the JVM
- describe the function of each component of the JVM
- explain the causes of common JVM errors
Components of the JVM
The JVM consists of the following five components:
- Class Loader
- Runtime Data Area
- Execution Engine
- Java Native Interface (JNI)
- Native Method Libraries
The following image shows the components of the JVM:
Let us describe each component in detail.
Class Loader
The javac compiler converts source code written by the programmer into byte code, which is stored in a .class file. The class loader loads .class files into main memory. In a Java application, the class that contains the main method is the first application class to be loaded.
The phases involved in loading a .class file into memory are as follows:
- Loading Phase
- Linking Phase
- Initialization Phase
The following image depicts the phases of the JVM class loader:
Let us talk about these three phases in more detail.
1. Loading Phase: A class loader reads the byte code in a .class file. It parses the data in the byte code and then stores the following information in an area of the JVM memory called the Method Area:
- Whether the parsed byte code represents a Class, an Interface or an Enum.
- The fully qualified name of the Class, Interface or Enum that the byte code represents, and the fully qualified name of its immediate parent.
- Information about the fields and methods of the Class, Interface or Enum.
Finally, the JVM creates an object of type Class (java.lang.Class) in an area of the JVM memory called the Heap Area to represent the loaded type. Code asks a class loader to load a class through methods such as ClassLoader.loadClass.
The object of type Class can be used by a programmer to retrieve class-related information. Let us do a small demonstration as follows:
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ClassInformation {
    public static void main(String[] args) {
        _Native newNative = new _Native("Joe Doe", "SCV070001");
        // Every object can return the Class object that describes its type
        Class<?> aClass = newNative.getClass();
        String className = aClass.getSimpleName();
        System.out.printf("The class of aClass: %s%n", className);

        // List the methods declared directly in the class
        System.out.printf("The following methods are defined in %s:%n", className);
        Method[] methods = aClass.getDeclaredMethods();
        for (Method method : methods) {
            System.out.printf("%7s%n", method.getName());
        }

        // List the fields declared directly in the class
        System.out.printf("The following fields are defined in %s:%n", className);
        Field[] fields = aClass.getDeclaredFields();
        for (Field field : fields) {
            System.out.printf("%7s%n", field.getName());
        }
    }
}

class _Native {
    private String name;
    private String scv;

    public _Native(String name, String scv) {
        this.name = name;
        this.scv = scv;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getScv() {
        return scv;
    }

    public void setScv(String scv) {
        this.scv = scv;
    }
}
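We obtain output similar to the following. The order of the method and field names may differ, because getDeclaredMethods and getDeclaredFields do not return members in any particular order:
The class of aClass: _Native
The following methods are defined in _Native:
getName
setName
 getScv
 setScv
The following fields are defined in _Native:
   name
    scv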
Note that a class loader produces only one object of type Class for each class it loads, no matter how many instances of that class are created.
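One way to check this is to compare the Class objects obtained in different ways for the same class. In the minimal sketch below, ClassIdentityDemo and its nested Point class are hypothetical names used only for illustration; everything else is standard library API.

```java
import java.util.ArrayList;

public class ClassIdentityDemo {
    // A hypothetical class used only for this demonstration
    static class Point {
        int x;
        int y;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Point p1 = new Point();
        Point p2 = new Point();

        // Two instances of the same class share one Class object
        System.out.println(p1.getClass() == p2.getClass()); // true

        // The class literal Point.class refers to that same object
        System.out.println(p1.getClass() == Point.class); // true

        // Asking a class loader for a class that is already loaded
        // returns the existing Class object instead of creating a new one
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Class<?> viaLoader = loader.loadClass("java.util.ArrayList");
        System.out.println(viaLoader == ArrayList.class); // true
    }
}
```

The uniqueness holds per class loader: if two different class loaders each loaded the same .class file, the JVM would treat the results as two distinct classes with two distinct Class objects.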
Three types of class loaders exist. They are described as follows:
1. The Bootstrap Class Loader: The Bootstrap Class Loader is the root class loader. It loads classes in the standard Java packages like java.lang, java.io, java.net and java.util into memory. In Java 8, these packages are found in the rt.jar file and other core libraries in the $JAVA_HOME/jre/lib directory; since Java 9, they are packaged as modules and rt.jar no longer exists.
2. The Extension Class Loader: The Extension Class Loader is a child of the Bootstrap Class Loader. It loads extensions of the standard Java libraries (present in the $JAVA_HOME/jre/lib/ext directory) into memory. Since Java 9, the extension mechanism has been removed and this position in the hierarchy is taken by the Platform Class Loader.
3. The Application Class Loader: The Application Class Loader is a child of the Extension Class Loader. It loads classes found on the application's classpath into memory. By default, the application's classpath is the current working directory.
The JVM follows the Delegation-Hierarchy principle whenever it tries to find a class. A class loader first delegates a request to its parent, so the search for a referenced class effectively begins with the Bootstrap Class Loader. If the Bootstrap Class Loader cannot find the class, the Extension Class Loader tries, and if the Extension Class Loader cannot find it either, the Application Class Loader tries. If the Application Class Loader cannot find the referenced class, a NoClassDefFoundError or a ClassNotFoundException is thrown.
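The hierarchy can be inspected from Java code, because every class loader except the Bootstrap Class Loader exposes its parent through getParent, and the Bootstrap Class Loader itself is represented by null. The sketch below is an illustration under those facts; LoaderHierarchyDemo, describeChain, canLoad and the class name com.example.DoesNotExist are hypothetical.

```java
public class LoaderHierarchyDemo {
    // Walks from the loader that defined 'type' up through its parents.
    // getClassLoader() and getParent() return null for the Bootstrap Class Loader.
    static String describeChain(Class<?> type) {
        StringBuilder chain = new StringBuilder();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.append(loader.getClass().getName()).append(" -> ");
            loader = loader.getParent();
        }
        return chain.append("bootstrap").toString();
    }

    // Returns true if some loader in the hierarchy can find the named class
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An application class: defined by the Application Class Loader
        System.out.println(describeChain(LoaderHierarchyDemo.class));
        // A core class: defined by the Bootstrap Class Loader
        System.out.println(describeChain(String.class)); // bootstrap

        System.out.println(canLoad("java.util.HashMap"));        // true
        System.out.println(canLoad("com.example.DoesNotExist")); // false (hypothetical name)
    }
}
```

When run with the java launcher, the first line typically shows sun.misc.Launcher$AppClassLoader -> sun.misc.Launcher$ExtClassLoader -> bootstrap on Java 8, and jdk.internal.loader.ClassLoaders$AppClassLoader -> jdk.internal.loader.ClassLoaders$PlatformClassLoader -> bootstrap on Java 9 and later.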
2. Linking: The Linking phase involves the following processes:
- Verification
- Preparation
- Resolution
Let us describe these processes in more detail:
- Verification: A byte code verifier checks that the byte code in a .class file is structurally valid and type-safe. If verification fails, a java.lang.VerifyError is thrown. A related but separate check happens earlier: if a class has been compiled for a newer Java version (for instance, Java 11) but is run on an older JVM (for instance, Java 8), the JVM rejects the class file with a java.lang.UnsupportedClassVersionError.
- Preparation: The JVM allocates memory for the class variables (static fields) and sets them to the default values for their types, such as 0 for numeric types, false for boolean and null for references.
- Resolution: The JVM replaces symbolic references with their actual references which are present in the Method Area of the JVM memory. For instance, if you have references to other classes or to variables or constants defined in other classes they are replaced with their actual references in this phase.
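The default values set during preparation are normally invisible, because initialization overwrites them before a program reads them. A static initializer that reads a field declared further down the class is one way to observe them. In this hypothetical sketch, PreparationDemo and readSecond are illustrative names; the read goes through a method because javac rejects a direct forward reference such as static int first = second;.

```java
public class PreparationDemo {
    // After preparation, both fields exist and hold the default value 0.
    // During initialization the initializers run top to bottom, so
    // 'first' reads 'second' before 42 has been assigned to it.
    static int first = readSecond();
    static int second = 42;

    static int readSecond() {
        return second;
    }

    public static void main(String[] args) {
        System.out.println(first);  // 0
        System.out.println(second); // 42
    }
}
```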
3. Initialization: After a class or an interface has been loaded and linked, the JVM executes its initialization method (also known as <clinit>). During the execution of the initialization method, the following takes place:
- the class's superclasses are initialized first (if they are not already initialized)
- the class's static variables are assigned the values specified by the programmer and its static blocks are executed, in the order in which they appear in the source code
Constructors are not part of class initialization; a constructor runs later, each time an instance of the class is created.
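The order of these steps can be made visible by logging from static initializers. In the sketch below, InitializationOrderDemo, Parent, Child, compute and LOG are all hypothetical names; the first access to Child.value is what triggers initialization of Parent and then Child.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializationOrderDemo {
    // Records the order in which initialization steps run
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static {
            LOG.add("Parent initialized");
        }
    }

    static class Child extends Parent {
        static int value = compute();

        static {
            LOG.add("Child static block, value = " + value);
        }

        Child() {
            LOG.add("Child constructor");
        }

        static int compute() {
            LOG.add("Child.value assigned");
            return 7;
        }
    }

    public static void main(String[] args) {
        LOG.add("main started");
        // First use of Child's static field: Parent is initialized, then Child
        System.out.println(Child.value); // 7
        // Creating an instance runs the constructor; no class initialization
        // happens here because Child is already initialized
        new Child();
        LOG.forEach(System.out::println);
    }
}
```

Running main prints 7 followed by the log entries in the order main started, Parent initialized, Child.value assigned, Child static block, value = 7, Child constructor.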
The Runtime Data Area
The Runtime Data Area consists of five components, which are shown in the image below:
Let us look at each one individually:
1. Method Area: The Method Area stores class-related information such as the class name, the parent class name, method information and variable information.
If the Method Area does not have enough memory available when classes are loaded, an OutOfMemoryError is thrown. The Method Area is created when the JVM starts up. There is only one Method Area per JVM instance. In a multi-threaded environment, all threads share the same Method Area.
2. Heap Area: All objects and their corresponding instance variables, as well as arrays, are stored in the Heap Area. The Heap Area is also created when the JVM starts up. There is only one Heap Area per JVM instance. In a multi-threaded environment, all threads share the same Heap Area.
Note that since the Heap Area and the Method Area are shared resources, the data stored in them is not thread-safe by default: concurrent access to it must be coordinated by the programmer.
3. Stack Area: Whenever the JVM creates a new thread, a new Stack Area is created for it. A Stack Area is divided into blocks called Activation Records or Stack Frames. Whenever a method is called, a new Stack Frame is pushed onto the Stack Area. The frame holds:
- the local variables of the called method
- the return address the called method needs in order to return to the calling method
When the method call completes, its Stack Frame is popped off the Stack Area.
When a thread requires more stack memory than is available in its Stack Area, a StackOverflowError is thrown.
Note that the Stack Area is not a shared resource: each thread has its own, so the data in it is thread-safe.
4. Program Counter (PC) Registers: A PC register holds the address of the JVM instruction currently being executed. After an instruction is executed, the PC register is updated with the address of the next instruction. In a multi-threaded environment, each thread has its own PC register.
5. Native Method Stacks: In a multi-threaded environment, each thread has its own Native Method Stack. Information about native methods (methods written in C and C++) is stored in the Native Method Stack.
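The difference between the shared Heap Area and the per-thread Stack Area can be observed with two threads updating the same heap objects. In this sketch, SharedHeapDemo and runTwoThreads are hypothetical names. The plain counter lives in an array on the heap and is updated with an unsynchronized read-modify-write, so updates can be lost; AtomicInteger from java.util.concurrent.atomic is one standard way to make such updates safe (synchronized blocks are another).

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHeapDemo {
    // Two threads each add 'iterations' to two counters that live on the Heap Area.
    // Returns {plainTotal, atomicTotal}.
    static int[] runTwoThreads(int iterations) {
        int[] plain = {0};                          // heap array shared by both threads
        AtomicInteger atomic = new AtomicInteger(); // heap object with atomic updates

        Runnable task = () -> {
            // 'i' is a local variable: each thread has its own copy on its own stack
            for (int i = 0; i < iterations; i++) {
                plain[0]++;               // read-modify-write: updates can be lost
                atomic.incrementAndGet(); // atomic: no updates are lost
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for threads", e);
        }
        return new int[] {plain[0], atomic.get()};
    }

    public static void main(String[] args) {
        int[] totals = runTwoThreads(1_000_000);
        // The plain total is often below 2000000; the atomic total is always 2000000
        System.out.println("plain:  " + totals[0]);
        System.out.println("atomic: " + totals[1]);
    }
}
```

The exact plain total varies from run to run and machine to machine, which is precisely the symptom of unsynchronized access to shared heap data.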
Execution Engine
After the Class Loader loads the byte code into memory, the Execution Engine uses the details in the Runtime Data Area to execute it. It executes byte code instructions using the following components:
- Interpreter
- Just-In-Time (JIT) compiler
- Garbage Collector
- Interpreter: The Interpreter reads byte code instructions one at a time and executes each one. This instruction-by-instruction execution makes the Interpreter slow. Another drawback is that when the same method is called multiple times, it has to be interpreted again on every call.
- Just-In-Time (JIT) compiler: The Execution Engine uses the JIT compiler whenever it finds frequently executed code in the byte code. Rather than converting the entire byte code, the JIT compiler converts these frequently executed parts into native machine code, and the Execution Engine then runs that native code directly so that re-interpretation is not required.
The JIT compiler has the following components:
1. Intermediate Code Generator: The Intermediate Code Generator generates intermediate code.
2. Code Optimizer: The Code Optimizer optimizes intermediate code generated by the Intermediate Code Generator for better performance.
3. Target Code Generator: The Target Code Generator converts the optimized intermediate code into native machine code.
4. Profiler: The Profiler finds hotspots, that is, code that is executed repeatedly. When the Profiler identifies a hotspot, the JIT compiler compiles that code and replaces it with the corresponding native code.
- Garbage Collector: The Garbage Collector performs Garbage Collection on the Heap Area. Garbage Collection is the process of removing objects that are no longer reachable from the Heap Area. Garbage Collection involves two phases:
- Mark: In this phase, the Garbage Collector starts from a set of roots (such as local variables on thread stacks and static fields) and marks every object that is still reachable.
- Sweep: In this phase, the Garbage Collector reclaims the memory of the objects that were not marked during the Mark phase.
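Garbage Collection is done automatically by the JVM whenever it determines that memory needs to be reclaimed, for example when the Heap Area is running low. A program can request a collection by calling the System.gc method, but the request is only a suggestion and the JVM is not guaranteed to act on it.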
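The two phases can be illustrated with a toy model of a heap. This is only a sketch of the mark-and-sweep idea over a hypothetical object graph; the JVM's real collectors are considerably more sophisticated. MarkSweepSketch, Obj, mark, sweep and runExample are all hypothetical names, and the object a plays the role of a GC root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

public class MarkSweepSketch {
    // A toy heap object: a name, outgoing references and a mark bit
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    // Mark: starting from the roots, flag every object that is still reachable
    static void mark(Collection<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (current.marked) {
                continue; // already visited; this also handles reference cycles
            }
            current.marked = true;
            pending.addAll(current.refs);
        }
    }

    // Sweep: drop every unmarked object, then clear marks for the next cycle
    static void sweep(List<Obj> heap) {
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);
    }

    // Builds a small object graph, collects it and returns the survivors' names
    static List<String> runExample() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        a.refs.add(b); // a -> b: reachable from the root
        c.refs.add(d); // c <-> d: a cycle that no root refers to
        d.refs.add(c);

        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a)); // 'a' plays the role of a GC root
        sweep(heap);

        List<String> survivors = new ArrayList<>();
        heap.forEach(o -> survivors.add(o.name));
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // [a, b]
    }
}
```

Because marking starts from the roots rather than counting references, the c and d cycle is collected even though each object still refers to the other.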
Java Native Interface (JNI)
The Java Native Interface loads Native Method Libraries into the system memory and provides an interface through which Java code can execute native methods (methods written in C and C++).
Native Method Libraries
The Native Method Libraries contain code written in native languages: C, C++ and Assembly Language. Native methods are needed when we write code that Java does not fully support, for instance when we need to interact with the system hardware. We add the native keyword to a method header to indicate that the implementation of the method is available in a native library. After declaring a native method, we use the System.loadLibrary method to load the shared native library into the system memory and make its methods available to Java. Native libraries usually exist in the form of .so, .dll or .dylib files.
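The Java side of a native method can be sketched as follows. The library name nativedemo and the method addNumbers are hypothetical, and no C implementation is supplied here, so the sketch shows what happens when the library is missing: System.loadLibrary and the native call both report an UnsatisfiedLinkError. System.mapLibraryName shows which file name the JVM would look for on the current platform.

```java
public class NativeDemo {
    // 'native' means the body is provided by a shared library, not by Java code
    private static native int addNumbers(int a, int b);

    static {
        try {
            // Maps to libnativedemo.so, nativedemo.dll or libnativedemo.dylib
            // depending on the platform; the library name is hypothetical
            System.loadLibrary("nativedemo");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("Native library not loaded: " + e.getMessage());
        }
    }

    // Returns true if a native implementation of addNumbers could be found
    static boolean isImplementationAvailable() {
        try {
            addNumbers(2, 3);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Library file name: " + System.mapLibraryName("nativedemo"));
        if (isImplementationAvailable()) {
            System.out.println("2 + 3 = " + addNumbers(2, 3));
        } else {
            System.out.println("No native implementation of addNumbers was found");
        }
    }
}
```

To actually provide the implementation, you would write a C or C++ function that follows the JNI naming convention, typically starting from a header generated with javac -h, and compile it into the shared library.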
Common JVM Errors
- ClassNotFoundException: This exception is thrown when code tries to load a class by name, using the Class.forName method, the ClassLoader.loadClass method or the ClassLoader.findSystemClass method, and no definition for a class with that name can be found.
- NoClassDefFoundError: The JVM throws this error when a class was available when the code was compiled, but its .class file cannot be found at runtime.
- OutOfMemoryError: The JVM throws this error when it runs out of memory and the Garbage Collector cannot make any more memory available.
- StackOverflowError: The JVM throws this error when a thread requires more stack memory than is available in its Stack Area.
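The StackOverflowError is easy to reproduce with unbounded recursion, since every call pushes another Stack Frame onto the thread's Stack Area. In this sketch, StackDepthDemo, recurse and measureDepth are hypothetical names; the error is caught only so the demonstration can report how deep the recursion went, which is not something production code should normally do.

```java
public class StackDepthDemo {
    // Number of nested calls made before the stack ran out
    static int depthReached;

    // Unbounded recursion: every call pushes one more Stack Frame
    static void recurse() {
        depthReached++;
        recurse();
    }

    // Recurses until a StackOverflowError occurs and returns the depth reached
    static int measureDepth() {
        depthReached = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Caught here only for demonstration purposes
        }
        return depthReached;
    }

    public static void main(String[] args) {
        // The depth varies with the thread stack size (set with -Xss in HotSpot)
        System.out.println("StackOverflowError after " + measureDepth() + " calls");
    }
}
```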
Conclusion
In this article, we discussed the JVM and its internal components. A good understanding of how the JVM works helps us deal with common JVM-related errors like the StackOverflowError when coding. JVM-related questions are popular in junior- and senior-level interviews for backend engineers.
Thank you for staying till the end. I hope you liked the article. Feel free to leave comments in the comment section. You can connect with me on Twitter (@ehizman_tutored) and also take a look at my other articles.
Always remember to code with ❤️