Setting the nofile limit in upstart

While working on a Java project that executes multiple subprocesses, I found it necessary to increase the file descriptor limit. By default, processes spawned in Ubuntu had a limit of 1024 open files, but changing that limit via ulimit or /etc/security/limits.conf had no effect. I discovered that changes in /etc/security/limits.conf (and, by extension, /etc/pam.d/common-session) are only applied to processes spawned from a login shell. Programs that launch on startup via “upstart” do not get these limits applied to them. Thankfully, upstart provides the limit stanza, which lets you modify some of these parameters, including the maximum number of open files. To see the limits on a process, grab its PID and cat /proc/<PID>/limits

The following is an example of increasing the maximum number of open files to 4096 for a given upstart job (the two values are the soft and hard limits):

limit nofile 4096 4096
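To confirm from inside the Java process that the new limit took effect, you can read the same /proc file mentioned above. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes a Linux system where /proc/self/limits uses the usual "name, soft limit, hard limit, units" column layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

class OpenFileLimit {
    private static final String ROW_NAME = "Max open files";

    /** Returns the soft limit from the "Max open files" row of a /proc/<PID>/limits listing. */
    static Optional<String> softLimit(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(ROW_NAME)) {
                // Remaining columns: soft limit, hard limit, units
                String[] cols = line.substring(ROW_NAME.length()).trim().split("\\s+");
                return Optional.of(cols[0]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // /proc/self always refers to the calling process (Linux only)
        List<String> lines = Files.readAllLines(Paths.get("/proc/self/limits"));
        System.out.println("Soft nofile limit: " + softLimit(lines).orElse("unknown"));
    }
}
```

Run from the upstart job itself, this should report 4096 once the limit stanza is in place.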

You can check out the full set of limit options at:

Note: Some (if not all) of these limits are inherited by any children that the parent forks.

Check out upstart at:

A warning about Runtime.getRuntime().exec() : You may be leaking File Descriptors

Normally Java’s online documentation is very informative, but this is not one of those times. Recently, I ran into an issue where my application was reporting: error=24, Too many open files.

If you make heavy use of Java’s Runtime.getRuntime().exec() or ProcessBuilder, then it’s very possible you are leaking file descriptors. Unix systems limit how many files a given process can have open at any one time. On Ubuntu 10.04, that limit is 1024 files. You can find out your system’s per-process limit by issuing: ulimit -n

Every time you execute a process using Java’s Runtime, the child’s STDIN, STDOUT, and STDERR are piped to Java as an OutputStream or InputStream. For each stream there is a pair of file descriptors, one at each end of the pipe. When the process terminates, these streams are (in general) closed on the process side, closing one of the two file descriptors for each stream. If your Java program carries on and you never use those streams in Java, then you are leaking the file descriptors on the Java side of the pipe (assuming you’re not explicitly closing them or destroying the process). Normally, these will be closed when garbage collection occurs, but you shouldn’t leave it up to the garbage collector to decide.

After you use ProcessBuilder or Runtime.getRuntime().exec(), you can do one of two things:

  1. Use a handle to the Process that was executed (say: Process p), and destroy it (Note: this also terminates the process if it’s still running):  p.destroy();
  2. Close all of the returned Process’s streams if you haven’t already (p.getInputStream(), p.getOutputStream(), and p.getErrorStream()), where p comes from:
Process p = Runtime.getRuntime().exec( <command> );
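The second option can be sketched roughly as follows. This is an illustration rather than a definitive recipe: ProcessStreams and runAndClose are hypothetical names, and STDERR is merged into STDOUT only to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ProcessStreams {
    /** Runs a command, returns its combined output, and closes every pipe to the child. */
    static String runAndClose(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge STDERR into STDOUT so one loop drains both
                .start();
        // try-with-resources closes all three streams, even if reading fails
        try (OutputStream stdin = p.getOutputStream();
             InputStream stdout = p.getInputStream();
             InputStream stderr = p.getErrorStream()) {
            stdin.close(); // nothing to send; the child sees end-of-input
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stdout.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            p.waitFor();
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            p.destroy(); // harmless if the child has already exited
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runAndClose("echo", "hello"));
    }
}
```

Because the streams are closed as soon as the call returns, the file descriptors are released without waiting for the garbage collector.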

Of course, if you wrapped one of the process’s streams in, say, a BufferedReader, you just need to close the BufferedReader and the underlying stream will be closed automatically. Once these streams are closed, their file descriptors are released. If you want to take a look at the list of open files: lsof

If you can get the PID of your Java process, you can filter out most of the results from lsof by doing: lsof -p 1234

Getting a count is also easy:

lsof | wc -l

lsof -p 1234 | wc -l
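If you would rather watch the count from inside the JVM (for example, to log it before and after a batch of exec() calls), one option on Linux is to count the entries in /proc/self/fd. The sketch below assumes that directory exists; FdCounter and countEntries are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class FdCounter {
    /** Counts the entries in a directory such as /proc/self/fd (one entry per open descriptor). */
    static long countEntries(Path dir) throws IOException {
        // Files.list keeps a directory handle open, so close it promptly
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Linux only; the count includes the descriptor used to read the directory itself
        System.out.println("Open descriptors: " + countEntries(Paths.get("/proc/self/fd")));
    }
}
```

A count that keeps climbing after each spawned process is a good hint that the pipes are not being closed.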

Forking the JVM: error=12, Cannot allocate memory

While working on an embedded system with tight memory constraints, I discovered an inefficiency in the way Java executes processes. For this embedded system, it was sometimes necessary to execute bash scripts, which can be done using Java’s ProcessBuilder. When you execute a process, you must first fork() and then exec(). Forking creates a child process by duplicating the current process. Then, you call exec() to replace the “process image” with a new “process image”, essentially executing different code within the child process. In general, this is how you create new processes to execute other programs / scripts. When attempting to execute these smaller processes from Java, I began receiving “error=12, Cannot allocate memory”.
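On the Java side, the failure shows up as an IOException thrown when the process is started. Below is a small sketch of telling it apart from other start-up failures. It relies on the message text quoted above, which is not a stable API, so treat the parsing as an assumption; SpawnErrors, errnoOf, and the script path are all hypothetical.

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpawnErrors {
    private static final Pattern ERRNO = Pattern.compile("error=(\\d+)");

    /** Extracts the errno from a message like "error=12, Cannot allocate memory", or -1 if absent. */
    static int errnoOf(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return -1;
        }
        Matcher m = ERRNO.matcher(msg);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // "/opt/app/task.sh" is a hypothetical script path
            Process p = new ProcessBuilder("bash", "/opt/app/task.sh").inheritIO().start();
            System.out.println("exit code: " + p.waitFor());
        } catch (IOException e) {
            switch (errnoOf(e)) {
                case 12: System.err.println("fork failed: cannot allocate memory"); break;
                case 24: System.err.println("too many open files"); break;
                default: System.err.println("could not start process: " + e.getMessage());
            }
        }
    }
}
```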


The problem lies within fork().


An embedded system has a limited amount of memory; take 512MB for example. When we want to fork a new process, we have to copy the ENTIRE JVM… well… almost. What we are really doing is requesting the same amount of memory that the JVM has been allocated. So if we want to execute a new process and our JVM is taking up 350MB of memory, then our fork will request 350MB of memory. But wait! We only have 512MB! We may not have enough memory to fork our current UI, even though all we want to do is execute a command, say… “ls”. The “ls” command doesn’t require 350MB, but if there isn’t at least 350MB available, we may not even be able to fork a new child process. Now, the important thing to remember is that when we fork, we are just requesting 350MB of memory; we are not necessarily going to use it. If we fork() and then immediately call exec(), we won’t use all of the 350MB.

1. A Second JVM

Our first attempt at a solution may involve using a second JVM. This whole mess is a result of our main program being too large in memory to exist twice. We could create a Java process whose sole purpose is to execute our tasks. Such a JVM would be much smaller than our main program, and therefore fork() would have a better chance of getting enough memory allocated to spawn a child process. This solution is really only a workaround, because we have not eliminated the problem. What if our memory eventually gets so full that our smaller JVM is unable to fork() because there just isn’t enough to allocate? What if our second JVM suddenly crashes? A second JVM is a potential solution, but not the only solution. If you’re interested, you can find an example of this idea at:
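As a rough illustration of the second-JVM idea, the helper could be little more than a loop that reads commands and reports exit codes. The sketch below invents its own tiny protocol (one whitespace-separated command per line on standard input, one exit code per line on standard output, -1 if the command could not be started); SpawnHelper and that protocol are assumptions made for the example, not an existing tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;

class SpawnHelper {
    /** Reads one command per line, runs it, and writes its exit code (or -1) as the reply. */
    static void serve(BufferedReader requests, PrintWriter replies)
            throws IOException, InterruptedException {
        String line;
        while ((line = requests.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            int exitCode;
            try {
                exitCode = run(line.split("\\s+"));
            } catch (IOException e) {
                exitCode = -1; // the command could not be started
            }
            replies.println(exitCode);
            replies.flush();
        }
    }

    static int run(String[] command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        p.getOutputStream().close(); // the child gets no input
        try (InputStream out = p.getInputStream()) {
            byte[] sink = new byte[4096];
            while (out.read(sink) != -1) {
                // drain and discard so the child never blocks on a full pipe
            }
            return p.waitFor();
        } finally {
            p.getErrorStream().close();
        }
    }

    public static void main(String[] args) throws Exception {
        serve(new BufferedReader(new InputStreamReader(System.in)),
              new PrintWriter(System.out));
    }
}
```

The main program would start this helper once, early on, while memory is still plentiful, and then write commands to its standard input instead of forking directly. The helper should run with a small heap so that its own forks stay cheap.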

2. Custom JNI bindings for posix_spawn()

This whole mess began because of fork(), and how it copies one process to another in order to create the child process. Under the covers, Java’s ProcessBuilder uses fork/exec, but those are not the only UNIX calls that can be used to spawn processes. posix_spawn() is a different function that creates a new child process from a specified process image, without copying the parent process. I guess Sun or Oracle never got around to using this implementation, which would certainly solve our dilemma. The only way to implement this in Java is to create a custom JNI binding. The Java Native Interface (JNI) is a way of calling native code (C/C++) from the Java Virtual Machine. You can find a similar implementation at:

In our case, though, instead of using posix_spawn(), we implement it ourselves. It turns out that not all Linux distributions implement posix_spawn() using vfork(), so it’s best just to write our own vfork() and exec() implementation. Instead of allocating memory for the new child process, vfork() lets the child share the parent’s memory, and the parent is suspended until the child calls exec() or exits. There are both benefits and risks to using vfork(), but for our case, where we want to launch some other type of process, vfork() can be used safely. Since we call exec() right after vfork(), we don’t need to worry about the parent process’s memory getting modified. If we were to vfork() and then do something other than exec(), we would risk modifying our parent process.

3. Over-commit

Our last option is the easiest fix, but it comes at a cost. Every time we fork, the OS determines whether enough memory can be allocated to copy our process to a child process. We can turn that safety check off and let the OS fork a new child process even if there is not enough memory available to make a copy of the parent. For our purposes, the scripts we want to execute are always going to be much smaller than a 350MB JVM, so we will probably be fine. Enabling over-commit solves the problem, but there is a cost.

When you allocate memory in C, malloc() returns either a pointer to the memory address, or null if no memory could be allocated. If the programs on your system were written correctly (and by correctly, I mean with some sanity checking), they should attempt to die gracefully if they receive null from malloc(), since that means the system’s memory is completely used up. When over-commit is enabled, however, malloc() will practically never return null. It hands back a pointer even when there is no physical memory left to back that address, and the shortage only surfaces later, when the program actually touches the memory. At that point, the OS will release the dreaded OOM killer, which will begin terminating programs as it sees fit in order to free up memory. It picks its victims using a per-process heuristic score rather than anything your application decides, so from your program’s point of view the choice can seem arbitrary. Andries Brouwer came up with this analogy:

An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

To enable over-commit temporarily:

echo 1 > /proc/sys/vm/overcommit_memory

For a more permanent solution, you will need to edit /etc/sysctl.conf and add the following:

vm.overcommit_memory = 1

Now just restart your system (or run sysctl -p as root to reload /etc/sysctl.conf right away) for the change to take effect.
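To double-check which mode is active, you can read the same /proc entry back. Here is a minimal sketch, assuming a Linux system; OvercommitMode and describe are names made up for this example, and the descriptions are my own wording for the three modes the kernel defines (0, 1, and 2).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class OvercommitMode {
    /** Maps the numeric vm.overcommit_memory value to a short description. */
    static String describe(int mode) {
        switch (mode) {
            case 0: return "heuristic overcommit";
            case 1: return "always overcommit";
            case 2: return "strict accounting, no overcommit";
            default: return "unknown mode " + mode;
        }
    }

    public static void main(String[] args) throws IOException {
        // Linux only: the file holds a single integer followed by a newline
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/vm/overcommit_memory"));
        int mode = Integer.parseInt(new String(raw, StandardCharsets.US_ASCII).trim());
        System.out.println("vm.overcommit_memory = " + mode + ": " + describe(mode));
    }
}
```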