JVM Injected Fields

Overview

Objects in Java usually have fields to store data. For example, the Rectangle class below has two long fields: length and width.

public class Rectangle {
    private long length;
    private long width;
}

Under the hood, the Java Virtual Machine (JVM) stores instances of a class in an oopDesc. Each oopDesc contains a markWord and a Klass pointer followed by the fields of the instance. The Klass is the JVM's representation of the Java Class, and pointers to an oopDesc (i.e. pointers to a Java object) are called Ordinary Object Pointers (oops).

Diagram illustrating layout of oop

It is useful to think of the fields of an instance in an oopDesc as a region of memory. The location of a given field (i.e. its field offset) is determined during class loading, and this information is stored on the Klass. When a field is accessed through a get operation, the JVM reads the field's size in bytes starting at the corresponding field offset. The Single Field section below walks through an example.

Within HotSpot's JDK 17 implementation, some fields exist on the oopDesc without a corresponding field in the Java class. These fields are injected into the Java class by the JVM for its own use. For example, the Rectangle class could have an integer field called count that is not present in the above declaration but is used for JVM logic.

By leveraging these injected fields, we are able to use a Java object to store a CHERI capability. This object serves three purposes:

  1. It encapsulates operations on and accesses to pointers.
  2. It preserves the capability while it is passed around within Java code.
  3. It restricts accesses to the capability to native code, e.g. through the Java Native Interface (JNI).

Field Layout

The field layout of a class is determined during the class loading and resolution phase. This layout can be interrogated using the JVM -XX:+PrintFieldLayout flag.

For a generic overview of memory layout in Java, check out this Baeldung article. For a more in-depth explanation about field layout, check out Aleksey Shipilëv's article.

Without Fields

When a class does not have any fields (e.g. the java.lang.Object class), its layout contains only the mark word and the Klass pointer.

package java.lang;

public class Object {
}

The output below for the java.lang.Object class shows that the first 32 bytes of the instance are RESERVED. The first 16 bytes correspond to the mark word and the next 16 bytes to the Klass pointer. As pointers are 16 bytes wide on CHERI platforms, the mark word (which occasionally stores a pointer) and the Klass pointer must both be 16 bytes.

Layout of class java/lang/Object
Instance fields:
 @0 32/- RESERVED
Static fields:
Instance size = 32 bytes

Single Field

When a single field is present, such as in the Circle class below, the field will appear after the mark word and Klass pointer.

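public class Circle {
    private final long radius;

    // A final field must be assigned, so the class needs a constructor to compile.
    public Circle(long radius) { this.radius = radius; }
}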

Conceptually, when the radius field is obtained through a get operation, the oopDesc for the instance is accessed at an offset of 32 bytes and the next 8 bytes are read. Those 8 bytes are "returned" as a Java long.

Instance fields:
 @0 32/- RESERVED
 @32 "radius" J 8/8 REGULAR
Static fields:
Instance size = 40 bytes
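
The read described above can be modelled in plain Java. The following is only a sketch: it uses a ByteBuffer as a stand-in for the oopDesc, and the class and method names (FieldReadSketch, newCircle, getRadius) are hypothetical and not part of HotSpot. The offset and sizes are taken from the layout output above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FieldReadSketch {
    // Values taken from the Circle layout output above.
    static final int RADIUS_OFFSET = 32;   // "@32": directly after mark word (16) + Klass pointer (16)
    static final int INSTANCE_BYTES = 40;  // "Instance size = 40 bytes"

    // Hypothetical stand-in for an oopDesc: a flat region of memory.
    static ByteBuffer newCircle(long radius) {
        ByteBuffer instance = ByteBuffer.allocate(INSTANCE_BYTES).order(ByteOrder.nativeOrder());
        instance.putLong(RADIUS_OFFSET, radius); // write 8 bytes at the field offset
        return instance;
    }

    // A "get" of radius: read 8 bytes at offset 32 and interpret them as a long.
    static long getRadius(ByteBuffer instance) {
        return instance.getLong(RADIUS_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer circle = newCircle(7L);
        System.out.println(getRadius(circle)); // prints 7
    }
}
```

The first 32 bytes of the buffer are left untouched here, mirroring the RESERVED header region in the layout.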

Injected Field Use in MemoryAddress

As mentioned in Eloise's blog post, we've used an injected field to store a pointer within the MemoryAddress class. This was the perfect use case, as we wanted to hide this information from Java user code and the injected field is only used by the JVM.

On CHERI platforms, an injected field was necessary for three reasons:

  1. In Java, the largest primitive type is a 64-bit long. This is not large enough to store a 128-bit capability. An object reference would be large enough (as its JVM representation is an oop, which is a pointer). However, the garbage collector would mistake it for a reference to an instance that it needs to manage. Therefore, it would be extremely unwise to hijack the object type and have "real" and "fake" objects.
  2. Using a Java primitive type to hold a capability would cast it to a pure integer type, thereby clearing the tag bit.
  3. Performance considerations. Our initial prototype of the MemoryAddress class stored the capability within a long[2] (i.e., a primitive Java long array of length 2). This meant it was not strictly necessary to store the capability within an injected field, as it is representable within Java code. However, the need to preserve the capability by wrapping it in a Java object (instead of a primitive long) incurs a significant performance and memory penalty. The size of the penalty depends on the class’s implementation. This made the long[2] implementation unsuitable, as it added: 1) the extra memory footprint of an array object; and 2) another layer of indirection when accessing the capability.
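
To illustrate the third point, the sketch below shows the general shape of a long[2]-backed wrapper. It is not the prototype's actual code: the class name and accessors are hypothetical, and plain Java can only model the 128 bits of data here, not the capability's tag.

```java
public class ArrayBackedAddressSketch {
    // Hypothetical model of the long[2] approach: each address costs two heap
    // objects (this wrapper plus the array), and every access follows
    // wrapper -> array -> element.
    private final long[] capabilityBits = new long[2];

    public ArrayBackedAddressSketch(long firstHalf, long secondHalf) {
        capabilityBits[0] = firstHalf;
        capabilityBits[1] = secondHalf;
    }

    public long firstHalf()  { return capabilityBits[0]; }
    public long secondHalf() { return capabilityBits[1]; }

    public static void main(String[] args) {
        ArrayBackedAddressSketch a = new ArrayBackedAddressSketch(0x1000L, 0xFFFFL);
        System.out.println(Long.toHexString(a.firstHalf()) + " " + Long.toHexString(a.secondHalf()));
    }
}
```

With an injected field, the 16 bytes live directly in the wrapper instance, so both the second object and the extra dereference go away.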

Moreover, because the injected field is set in native code, this approach has the incidental benefit of allowing us to minimise the memory footprint on non-CHERI platforms through compile-time flags (e.g. __CHERI_PURE_CAPABILITY__).

N.B.: It is possible to store capabilities within a primitive Java array because the body of the array is just a region of memory. Accessing the body of the array (i.e., indexing using stride and types) is abstracted away by the arrayOop class. Therefore, capabilities can be stored and retrieved provided there is an appropriate getter and setter.

Inspecting injected MemoryAddress.address field

In the Java definition of java.lang.MemoryAddress, there is only one field: the long rawAddress.

package java.lang;

public class MemoryAddress {
    /**
     * A field storing the raw address.
     *
     * On any platform with a word size <= 64 bits, this raw address stores the full native pointer.
     *
     * On CHERI platforms with a 128-bit native pointer, the 64-bit address is stored within this field
     * (and therefore the remaining capability metadata is not stored in this field).
     */
    @Native private final long rawAddress;
}

The layout of the MemoryAddress class below shows two fields: address and rawAddress. The additional address field has been injected by the JVM.

Layout of class java/lang/MemoryAddress
Instance fields:
 @0 32/- RESERVED
 @32 "address" T 16/16 REGULAR
 @48 "rawAddress" J 8/8 REGULAR
Static fields:
 @0 352/- RESERVED
Instance size = 64 bytes
---
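
The offsets in this output are consistent with a simple rule: place larger fields first and align each field to its own size, starting after the 32-byte header. The sketch below models only that rule. It is a simplified assumption, not HotSpot's actual field layout algorithm, and the names (LayoutSketch, Field, assignOffsets) are hypothetical. It does not attempt to reproduce the reported instance size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LayoutSketch {
    // Hypothetical description of a field: its name and size in bytes.
    static final class Field {
        final String name;
        final int size;
        Field(String name, int size) { this.name = name; this.size = size; }
    }

    static final int HEADER_BYTES = 32; // mark word (16) + Klass pointer (16) on CHERI

    // Assumed rule: largest fields first, each aligned to its own size.
    static Map<String, Integer> assignOffsets(List<Field> fields) {
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort((a, b) -> Integer.compare(b.size, a.size)); // descending by size
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int next = HEADER_BYTES;
        for (Field f : sorted) {
            int aligned = (next + f.size - 1) / f.size * f.size; // round up to a multiple of size
            offsets.put(f.name, aligned);
            next = aligned + f.size;
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(new Field("rawAddress", 8), new Field("address", 16));
        System.out.println(assignOffsets(fields)); // prints {address=32, rawAddress=48}
    }
}
```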

The address field is 16 bytes and has the signature T. This signature extends Java's grammar and leaks information about the JVM's implementation: it corresponds to the T_ADDRESS value in the BasicType enum (see src/hotspot/share/utilities/globalDefinitions.hpp). Unfortunately, this was a necessary workaround, as it allowed a 16-byte primitive to exist within the JVM.

// NOTE: replicated in SA in vm/agent/sun/jvm/hotspot/runtime/BasicType.java
enum BasicType {
// The values T_BOOLEAN..T_LONG (4..11) are derived from the JVMS.
  T_BOOLEAN     = JVM_T_BOOLEAN,
  T_CHAR        = JVM_T_CHAR,
  T_FLOAT       = JVM_T_FLOAT,
  T_DOUBLE      = JVM_T_DOUBLE,
  T_BYTE        = JVM_T_BYTE,
  T_SHORT       = JVM_T_SHORT,
  T_INT         = JVM_T_INT,
  T_LONG        = JVM_T_LONG,
  // The remaining values are not part of any standard.
  // T_OBJECT and T_VOID denote two more semantic choices
  // for method return values.
  // T_OBJECT and T_ARRAY describe signature syntax.
  // T_ADDRESS, T_METADATA, T_NARROWOOP, T_NARROWKLASS describe
  // internal references within the JVM as if they were Java
  // types in their own right.
  T_OBJECT      = 12,
  T_ARRAY       = 13,
  T_VOID        = 14,
  T_ADDRESS     = 15,
  T_NARROWOOP   = 16,
  T_METADATA    = 17,
  T_NARROWKLASS = 18,
  T_CONFLICT    = 19, // for stack value type with conflicting contents
  T_ILLEGAL     = 99
};

Although the usage of T_ADDRESS in the MemoryAddress class breaks the Java specification, the semantics of this use of T_ADDRESS correspond to HotSpot's description of T_ADDRESS (and other closely associated types) as "internal references within the JVM as if they were Java types in their own right".

To limit the impact of this workaround, the use of T_ADDRESS is restricted to JVM internals, so typical Java application code would not come across it. Consequently, it is mainly the tooling interface and agents (which inspect the JVM state and the field layouts of classes) that would be affected.
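
As an illustration of how such tooling might be affected, the sketch below parses field lines in the format shown in the layout output above and checks whether the signature starts with one of the field descriptor characters defined by the JVM specification (B, C, D, F, I, J, S, Z, L, [). The class and method names are hypothetical, and the regular expression is an assumption based only on the sample lines in this document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LayoutLineSketch {
    // Matches named field lines such as: @32 "address" T 16/16 REGULAR
    private static final Pattern FIELD_LINE =
        Pattern.compile("^\\s*@(\\d+) \"([^\"]+)\" (\\S+) (\\d+)/(\\d+) (\\w+)$");

    // First characters of field descriptors defined by the JVM specification.
    private static final String STANDARD_DESCRIPTOR_STARTS = "BCDFIJSZL[";

    // Returns the signature column, or null if the line is not a named field line.
    static String signatureOf(String line) {
        Matcher m = FIELD_LINE.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    static boolean isStandardSignature(String signature) {
        return !signature.isEmpty()
            && STANDARD_DESCRIPTOR_STARTS.indexOf(signature.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        String[] lines = {
            " @32 \"address\" T 16/16 REGULAR",
            " @48 \"rawAddress\" J 8/8 REGULAR"
        };
        for (String line : lines) {
            String sig = signatureOf(line);
            System.out.println(sig + " standard=" + isStandardSignature(sig));
        }
    }
}
```

A tool that validates signatures this strictly would reject the T entry, so it would need an explicit allowance for the JVM-internal type.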