Forking the JVM

java.io.IOException: error=12, Cannot allocate memory

While working on an embedded system with tight memory constraints, I discovered an inefficiency in the way Java executes processes. The system sometimes needed to run bash scripts, which can be done with Java’s ProcessBuilder. On UNIX-like systems, running another program generally takes two steps: first fork(), then exec(). fork() creates a child process by duplicating the current process. The child then calls exec() to replace its “process image” with a new one, so that it runs different code. This is how new processes are normally created to execute other programs and scripts. When I tried to launch these small processes from Java, I began receiving “java.io.IOException: error=12, Cannot allocate memory”.
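To make the two steps concrete, here is a minimal sketch of the fork-then-exec pattern in C. It is not what ProcessBuilder does internally; the helper name run_with_fork_exec is made up for illustration, and error handling is kept to the bare minimum.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program named by argv[0] with the given
 * arguments and returns its exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_with_fork_exec(char *const argv[])
{
    pid_t pid = fork();            /* step 1: duplicate the calling process */
    if (pid < 0) {
        perror("fork");            /* ENOMEM here is what Java reports as error=12 */
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);     /* step 2: replace the child's process image */
        _exit(127);                /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing an argument vector such as {"ls", "-l", NULL} would run “ls -l” in the child and hand its exit status back to the parent.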

The problem lies within fork().

An embedded system has a limited amount of memory; take 512MB as an example. When we fork a new process, we have to copy the ENTIRE JVM… well… almost. The kernel copies pages lazily (copy-on-write), so what we are really doing is asking it to reserve as much memory as the JVM has already been allocated. If our JVM is taking up 350MB and we want to execute a new process, the fork will request another 350MB. But wait! We only have 512MB! There may not be enough memory left to fork our main program, even though all we want to do is execute a command, say… “ls”. The “ls” command doesn’t require 350MB, but if 350MB cannot be reserved, we may not even be able to fork the child process. The important thing to remember is that when we fork, we are only requesting 350MB; we are not necessarily going to use it. If we fork() and then immediately call exec(), almost none of that 350MB is ever touched.
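The arithmetic can be written down as a tiny check. This is a deliberately simplified model, not how the kernel actually accounts for memory: it assumes no swap, no other processes, and that the whole parent must be reserved a second time. The function name and its parameters are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model (an assumption, not the kernel's real accounting):
 * the parent already occupies parent_mb out of total_mb, and fork() must
 * be able to reserve another parent_mb from whatever is left. */
bool fork_request_fits(size_t parent_mb, size_t total_mb)
{
    size_t free_mb = total_mb > parent_mb ? total_mb - parent_mb : 0;
    return parent_mb <= free_mb;
}
```

With the numbers from the example, a 350MB JVM on a 512MB system leaves 162MB, so the 350MB request does not fit, while a 64MB process on the same system would fork without trouble.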

1. A Second JVM

Our first attempt at a solution may involve using a second JVM. This whole mess is a result of our main program being too large in memory to exist twice. We could create a second Java process whose sole purpose is to execute our tasks. Such a JVM would be much smaller than our main program, so fork() would have a better chance of getting enough memory to spawn a child process. This solution is really only a workaround, because it does not eliminate the problem. What if memory eventually gets so full that even our smaller JVM is unable to fork()? What if our second JVM suddenly crashes? A second JVM is a potential solution, but not the only one. If you’re interested, you can find an example of this idea at: http://www.assembla.com/spaces/forkbuddy/wiki

2. Custom JNI bindings for posix_spawn()

This whole mess began because of fork() and the way it copies the parent process in order to create the child. Under the covers, Java’s ProcessBuilder uses fork/exec, but those are not the only UNIX calls that can be used to spawn processes. posix_spawn() is a different function that creates a new child process directly from a specified process image; its interface does not require copying the parent process. I guess Sun or Oracle never got around to using this implementation, which would certainly solve our dilemma. At the time of writing, the only way to do this from Java is to create a custom JNI binding. The Java Native Interface (JNI) is a way of calling native code (C/C++) from the Java Virtual Machine. You can find a similar implementation at https://github.com/bxm156/java_posix_spawn, except that instead of using posix_spawn(), we implement it ourselves. It turns out that not all Linux distributions implement posix_spawn() using vfork(), so it’s best just to write our own vfork() and exec() implementation. Instead of allocating memory for the new child process, vfork() lets the child share the parent’s memory, and the parent is suspended until the child calls exec() or exits. There are both benefits and risks to using vfork(), but for our case, where we only want to launch some other program, vfork() can be used safely. Since we call exec() right after vfork(), we don’t need to worry about the parent process’s memory getting modified. If we were to vfork() and then do anything other than exec() (or _exit() when exec() fails), we would risk corrupting our parent process.
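The two alternatives discussed above can be sketched in plain C, leaving out the JNI glue. This is a minimal illustration under my own assumptions, not the code from the linked repository; the helper names run_with_posix_spawn, run_with_vfork_exec and wait_for_exit are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes vfork() on glibc in strict-standard builds */
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;    /* the caller's environment, passed on to the child */

/* Waits for the child and returns its exit status, or -1 on failure. */
static int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Variant 1: let the C library create the child via posix_spawnp(). */
int run_with_posix_spawn(char *const argv[])
{
    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_for_exit(pid);
}

/* Variant 2: vfork() followed immediately by exec(). The child borrows the
 * parent's memory, so it must do nothing except exec or _exit. */
int run_with_vfork_exec(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);       /* exec failed: leave without touching parent state */
    }
    return wait_for_exit(pid);
}
```

A JNI binding would wrap one of these functions in a native method; how the underlying C library implements posix_spawnp() varies by platform and version, which is the reason given above for spelling out the vfork() variant by hand.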

3. Over-commit

Our last option is the easiest fix, but it comes at a cost. Every time we fork, the OS checks whether enough memory can be allocated to copy our process to a child process. We can turn that safety check off and let the OS fork a new child process even if there is not enough memory available to make a copy of the parent. For our purposes, the scripts we want to execute are always going to be much smaller than a 350MB JVM, so we will probably be fine. Enabling over-commit solves the fork problem, but it changes how memory allocation fails for every program on the system.

When you allocate memory in C, malloc() returns either a pointer to the allocated memory, or NULL if no memory could be allocated. If the programs on your system were written correctly (and by correctly, I mean with some sanity checking), they should attempt to die gracefully when they receive NULL from malloc(), because it means the system’s memory is completely used up. When over-commit is enabled, however, malloc() will practically never return NULL. It hands back a pointer to address space that the kernel has promised but not yet backed with physical memory, whether or not that memory actually exists. The shortage only surfaces later, when a program touches those pages and there is nothing left to back them. At that point, the OS releases the dreaded OOM killer, which begins terminating programs as it sees fit in order to free up memory. It picks its victims using a heuristic score based largely on how much memory each process uses, so from an application’s point of view the choice can seem arbitrary. Andries Brouwer came up with this analogy:

An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.
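The malloc() contract described before the analogy looks like this in practice. It is a minimal sketch with a made-up helper name, alloc_zeroed, showing the NULL check a well-behaved program performs and the point at which memory is actually touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates n bytes and zero-fills them.
 * Returns NULL (after logging) if malloc() reports failure. */
void *alloc_zeroed(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) {
        /* Without over-commit, this is where exhaustion is reported. */
        fprintf(stderr, "malloc(%zu) failed\n", n);
        return NULL;
    }
    /* Writing to the block makes the kernel back it with real pages.
     * With over-commit enabled, a shortage would show up here instead,
     * as a visit from the OOM killer rather than a NULL pointer. */
    memset(p, 0, n);
    return p;
}
```

The NULL check is still worth keeping with over-commit enabled; it just stops being the place where running out of memory is normally noticed.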

To enable over-commit temporarily:

echo 1 > /proc/sys/vm/overcommit_memory

For a more permanent solution, you will need to edit /etc/sysctl.conf and add the following:

vm.overcommit_memory = 1

Now just restart your system for the change to take effect, or run “sysctl -p” as root to load the file immediately.
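To confirm which mode is active, a program can read the same file the echo command writes to. The sketch below uses a hypothetical helper, read_overcommit_mode, that takes the path as a parameter so it can be pointed at /proc/sys/vm/overcommit_memory on Linux.

```c
#include <stdio.h>

/* Hypothetical helper: returns the integer stored in the file at path
 * (0, 1 or 2 for /proc/sys/vm/overcommit_memory on Linux), or -1 if the
 * file cannot be opened or does not start with a number. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

A result of 1 from read_overcommit_mode("/proc/sys/vm/overcommit_memory") means the setting shown above is in effect; 0 is the kernel’s default heuristic mode and 2 is strict accounting.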
