Authored by Hannah Cusworth (Software Engineer at THG)
Introduction
The Soteria project is a collaboration between THG and the University of Manchester, which aims to port the Java runtime to the Morello platform. Whilst the core of this body of work has been writing a platform-specific implementation of OpenJDK's interpreter and JIT compiler, porting to Morello has also involved extensive code changes to parts of the code base previously considered platform independent. This is largely because, in implementing CHERI (Capability Hardware Enhanced RISC Instructions), Morello's ISA has broken the fundamental assumption of the HotSpot JVM that it will always be possible to represent a native address within a Java primitive.
The Java spec makes no provision for properly encapsulated machine addresses. Instead, implementations of the Java runtime use 64-bit longs to represent native pointers. On Morello, this is no longer possible because pointers are 128-bit fat pointers (capabilities). In addition to a 64-bit address, a capability uses a further 64 bits to encode bounds and permissions metadata. As this blog post documents, porting language runtimes to CHERI can give rise to engineering challenges that previous platform implementations have not needed to consider.
eetop: some background
This section briefly outlines some prerequisite knowledge about the Java thread model. Concurrency is an inherent feature of the Java language. Native threads are modelled by the C++ class JavaThread. The JVM creates an instance of this class for each underlying operating system thread of the running Java application. The JVM also exposes a limited view of these threads to the Java user via java.lang.Thread, which provides methods implemented natively via the Java Native Interface (JNI) for interrogating its corresponding JavaThread.
However, the Java class also includes the private field long eetop. The origin of the name eetop seems to be lost to the sands of time, but in recent JVM incarnations it holds a native pointer to the underlying native thread for the specific Java thread instance. Although the field is private, Java applications can access it via the Unsafe API, and from there interact directly with the native JavaThread object. The Unsafe API allows user code to circumvent the Java memory model and explicitly access memory using raw addresses. Whilst user code relying on Unsafe is technically unsupported, the API is widely used in performance-critical code. It is unclear how frequently eetop is accessed in this way by third-party code, but it seems likely that it continues to exist in order to maintain compatibility with libraries depending upon it.
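To make the access path concrete, here is a minimal sketch of how user code on a conventional 64-bit JDK might read eetop through sun.misc.Unsafe. The class name EetopPeek and the method readEetop are hypothetical names chosen for illustration. The sketch assumes the running JDK still exposes the eetop field and still permits Unsafe memory access, and it returns an empty result rather than failing when either assumption does not hold.

```java
import java.lang.reflect.Field;
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of any JDK or of the Morello port.
final class EetopPeek {

    /**
     * Tries to read the private long field Thread.eetop via sun.misc.Unsafe.
     * Returns empty if the field is missing or Unsafe access is refused.
     */
    static OptionalLong readEetop(Thread thread) {
        try {
            // Obtain the Unsafe singleton reflectively (unsupported, but widely done).
            Field theUnsafe = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            sun.misc.Unsafe unsafe = (sun.misc.Unsafe) theUnsafe.get(null);

            // Look up the field's offset inside a Thread object, then read 64 bits there.
            Field eetop = Thread.class.getDeclaredField("eetop");
            long offset = unsafe.objectFieldOffset(eetop);
            return OptionalLong.of(unsafe.getLong(thread, offset));
        } catch (ReflectiveOperationException | RuntimeException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong eetop = readEetop(Thread.currentThread());
        System.out.println(eetop.isPresent()
                ? "eetop = 0x" + Long.toHexString(eetop.getAsLong())
                : "eetop is not accessible on this JDK");
    }
}
```

Note that the read is a plain 64-bit load at a raw offset, which is exactly the kind of access that assumes a native pointer fits in a long.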
The Problem and an Initial Solution
As outlined in the introduction, long fields which hold native pointers present a problem for our Morello port of the JVM. At first glance, eetop appeared to be another typical example of how the assumption that a machine address will fit inside a primitive long has been baked into the Java runtime since its inception. As discussed in [another blog post] (), our solution to this difficulty has been to replace such fields with instances of a new Java class called MemoryAddress, which properly encapsulates the manipulation of native addresses. However, a significant limitation of this approach is that using an address now requires instantiating a Java object, a possibility which cannot be taken for granted in the code of the virtual machine itself. One place where this presents difficulties is the handling of OutOfMemoryErrors. The code which unwinds the stack in these cases needs to use addresses to write a backtrace, but instantiating new MemoryAddress objects is evidently not possible. Similarly, creating new instances of MemoryAddress as fields of java.lang.Thread is not always possible. As java.lang.Thread is so fundamental to the JVM, it is one of the classes initialized in a non-standard way very early in the VM's lifecycle. This means that when the first instance of java.lang.Thread is created, it is too early to allocate instances of classes with standard initialisation (such as MemoryAddress) on the Java heap.
Because of this, our initial solution for eetop could not utilise MemoryAddress. At first, we tried to allocate two long fields next to one another: private long eetop and private long eetop_dummy. However, as the virtual machine was not reliably allocating the two fields contiguously (which must be the case to correctly store a valid capability), we settled upon a long array: private long[] eetop.
The Bug
At first, our solution seemed to work just fine when tested against the majority of the benchmarks we were using. However, as we expanded our testing, we found that the program would hang when running the DaCapo Tomcat benchmark on our JDK. We narrowed down the cause of the hang: it was a deadlock due to threads being stuck in the thread_blocked_state. We also observed that rolling back to before the change to eetop fixed the issue, but it was not immediately clear how our change was causing it. We could rule out eetop being accessed, explicitly or opaquely, via the Unsafe API: such an access would have raised a SIGPROT on Morello, because it would not load the whole valid capability stored across the two longs.
A Race Condition
After a considerable collaborative dev effort to fix the bug (which became known in the team as Lethocerus maximus), we concluded that our change to eetop had introduced a race condition. The bug's behaviour was affected by adding printing/logging statements, suggesting that relative timing was the determining factor. In addition, stack traces showed that threads were getting stuck waiting, and we found cases where eetop != NULL checks were being used to establish the liveness of threads. Given this, our working hypothesis is that the array's additional layer of indirection made updates to eetop much more expensive, and that this increased the probability of a race condition manifesting by several orders of magnitude. We remain unsure as to exactly what sequence of concurrent actions led to the race condition occurring. One potential scenario is that a Thread was initialized with eetop == NULL and then, before initialisation could finish and eetop could be set to point to the underlying native thread, another thread checked its eetop, decided that the thread was dead, and skipped some action on it. Whilst this condition was technically already present in the VM, as synchronised access to the eetop field is not enforced, it seems that the chance of it occurring when the field is a long is negligible. Our change simply made it significant.
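The hypothesised window can be illustrated with a small deterministic model. This is only a sketch of the scenario described above, not the JVM's actual code and not a confirmed diagnosis: ToyThread, looksAlive and observeStartup are hypothetical names, the placeholder values are arbitrary, and a single thread replays one fixed interleaving so that the outcome is reproducible. With an array-backed eetop, publishing takes two steps, and an observer that samples between them still sees a thread that looks dead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only.
final class EetopRaceSketch {

    /** Toy stand-in for a thread object whose eetop is a long[2]. */
    static final class ToyThread {
        long[] eetop; // null until the backing array has been allocated

        void allocateSlot() {
            eetop = new long[2];
        }

        void publish(long addressHalf, long metadataHalf) {
            eetop[0] = addressHalf;
            eetop[1] = metadataHalf;
        }
    }

    /** Liveness test in the style of an `eetop != NULL` check. */
    static boolean looksAlive(ToyThread t) {
        return t.eetop != null && t.eetop[0] != 0L;
    }

    /**
     * Replays one fixed interleaving: an observer samples looksAlive()
     * before, between and after the two start-up steps.
     */
    static List<Boolean> observeStartup() {
        ToyThread t = new ToyThread();
        List<Boolean> seen = new ArrayList<>();
        seen.add(looksAlive(t));   // nothing stored yet
        t.allocateSlot();
        seen.add(looksAlive(t));   // array exists, capability not yet stored
        t.publish(0x1000L, 0x1L);  // arbitrary non-zero placeholder values
        seen.add(looksAlive(t));   // fully published
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observeStartup()); // prints [false, false, true]
    }
}
```

In the model, the middle observation is the extra opportunity for a concurrent checker to misjudge a thread that is still starting up. A single long store has no such intermediate state.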
The Final Fix
We discarded the array fix and returned to the idea of storing the capability for the native thread inside two consecutive long fields on the Java class. We had initially rejected this idea because it is a smelly solution: it requires modifying generic code to handle an exceptional case. Enforcing the contiguous allocation of these fields required directly manipulating the field layout of the thread class on Morello, which meant introducing checks into the code that generates the layout of all Java classes to handle the special case of Thread.eetop. With this change, eetop continues to be a long field in the Java representation of a thread, but in the VM's underlying C++ representation of the class we force 16 bytes to be allocated for the field on Morello. Since implementing the fix, the race condition we encountered has never again caused JVM execution to hang.
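The shape of that special case can be sketched with a toy field-layout routine. This is an illustrative model written in Java, not HotSpot's layout code (which is C++ and orders fields differently): LayoutSketch, layout and alignUp are hypothetical names, the 16-byte header and the neighbouring fields priority and tid are assumptions made for the example, and fields are simply placed in declaration order at their natural alignment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy layout routine for illustration only.
final class LayoutSketch {
    static final int CAPABILITY_BYTES = 16; // a 128-bit capability

    /** Rounds offset up to the next multiple of alignment (a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Assigns byte offsets to fields in declaration order.
     * fieldSizes maps field name to declared size in bytes (powers of two).
     */
    static Map<String, Integer> layout(Map<String, Integer> fieldSizes,
                                       int headerBytes,
                                       boolean capabilityPlatform) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int next = headerBytes;
        for (Map.Entry<String, Integer> field : fieldSizes.entrySet()) {
            int size = field.getValue();
            // The exceptional case: widen eetop to a full, aligned capability slot.
            if (capabilityPlatform && field.getKey().equals("eetop")) {
                size = CAPABILITY_BYTES;
            }
            next = alignUp(next, size);
            offsets.put(field.getKey(), next);
            next += size;
        }
        return offsets;
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("priority", 4);
        fields.put("eetop", 8);
        fields.put("tid", 8);
        System.out.println(layout(fields, 16, false)); // prints {priority=16, eetop=24, tid=32}
        System.out.println(layout(fields, 16, true));  // prints {priority=16, eetop=32, tid=48}
    }
}
```

The single name check inside otherwise generic code is what makes the approach smelly, but it keeps the field a plain long from Java's point of view while reserving an aligned 16-byte slot for it.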