Setting the nofile limit in upstart

While working on a Java project that executes multiple sub-processes, I found it necessary to increase the file descriptor limit. By default, processes spawned on Ubuntu had a limit of 1024 open files, but changing that limit via ulimit or /etc/security/limits.conf had no effect. I discovered that changes in /etc/security/limits.conf (and subsequently /etc/pam.d/common-session) are only applied to processes spawned by a login shell. Programs launched at startup via “upstart” do not get these limits applied to them. Thankfully, upstart provides the limit stanza, which allows you to modify some of these parameters, including the maximum number of open files. To see the limits on a process, grab its PID and run: cat /proc/<PID>/limits

The following is an example of increasing the maximum number of open files to 4096 for a given upstart job:

limit nofile 4096 4096
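
Once the job is running, you can also confirm from inside the JVM that the new limit took effect. Here is a minimal sketch (plain Java, Linux only) that just dumps /proc/self/limits for the current process:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ShowLimits {
    public static void main(String[] args) throws IOException {
        // /proc/self/limits lists the resource limits of the current process (Linux only)
        BufferedReader reader = new BufferedReader(new FileReader("/proc/self/limits"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close();
        }
    }
}

The “Max open files” row should now show 4096 for both the soft and hard limit.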

You can check out the full set of limit options at: http://upstart.ubuntu.com/wiki/Stanzas#limit

Note: Some of these limits (if not all) are inherited by any children forked by the parent process.

Check out upstart at: http://upstart.ubuntu.com/

A warning about Runtime.getRuntime().exec(): You may be leaking file descriptors

Normally Java’s online documentation is very informative, but this is not one of those times. Recently, I ran into an issue where my application was throwing: java.io.IOException: error=24, Too many open files.

If you make heavy use of Java’s Runtime.getRuntime().exec() or ProcessBuilder, it’s very possible you are leaking file descriptors. Unix systems limit how many files a given process can have open at any one time. On Ubuntu 10.04, that limit is 1024 files. You can find your system’s per-process limit by issuing: ulimit -n

Every time you execute a process using Java’s Runtime, the child’s STDIN, STDOUT, and STDERR are piped to Java as output or input streams. For each stream, there exists a pair of file descriptors. When the process terminates, these streams are generally closed on the process side, closing one of the two file descriptors for that stream. If your Java program continues on and you never use those streams in Java, you are leaking the file descriptors on the Java side of the pipe (assuming you’re not explicitly closing them or destroying the process). Normally, they will be closed when garbage collection occurs, but you shouldn’t leave it up to the garbage collector to decide.

After you use ProcessBuilder or Runtime.getRuntime().exec(), you can do one of two things:

  1. Keep a handle to the Process that was executed (say, Process p) and destroy it: p.destroy(); (Note: this also terminates the process if it is still running.)
  2. Close all of the process’s streams if you haven’t already:
Process p = Runtime.getRuntime().exec(command); // command is whatever you executed
p.getInputStream().close();  // the child's STDOUT
p.getOutputStream().close(); // the child's STDIN
p.getErrorStream().close();  // the child's STDERR
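
If you actually want the child’s output, a common pattern is to read STDOUT through a BufferedReader and close everything when you are done. Here is a rough, self-contained sketch; the “ls” command is just a stand-in for whatever you are running:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        // "ls" is a placeholder for whatever command you actually run
        Process p = Runtime.getRuntime().exec("ls");

        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close();              // also closes the underlying STDOUT pipe
            p.getOutputStream().close(); // the child's STDIN
            p.getErrorStream().close();  // the child's STDERR
        }
        p.waitFor(); // wait for the child to exit so it is reaped promptly
    }
}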

Of course, if you wrapped one of the above streams in, say, a BufferedReader, you just need to close the BufferedReader and the underlying stream will be closed automatically. Once these streams are closed, their file descriptors are released. If you want to take a look at the list of open files, use: lsof

If you know the PID of your Java process, you can filter out most of the other results from lsof by doing: lsof -p 1234

Getting a count is also easy:

lsof | wc -l

lsof -p 1234 | wc -l

Forking the JVM

java.io.IOException: error=12, Cannot allocate memory

While working on an embedded system with tight memory constraints, I discovered an inefficiency in the way Java executes processes. For this embedded system, it was sometimes necessary to execute bash scripts, which can be done using Java’s ProcessBuilder. In general, when you execute a process, you must first fork() and then exec(). Forking creates a child process by duplicating the current process. Then, you call exec() to replace the child’s process image with a new one, essentially executing different code within the child process. In general, this is how you create new processes to run other programs and scripts. When attempting to execute these smaller processes from Java, I began receiving “java.io.IOException: error=12, Cannot allocate memory”.
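
For context, here is a rough sketch of the kind of call involved; the script path is hypothetical, and the failure happens inside start(), when the JVM forks itself before exec’ing bash:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RunScript {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical script path, standing in for the scripts on the device
        ProcessBuilder pb = new ProcessBuilder("/bin/bash", "/opt/scripts/example.sh");
        pb.redirectErrorStream(true); // merge the child's STDERR into STDOUT

        // Under the covers, start() fork()s the JVM and then exec()s bash.
        // On a memory-constrained system it is the fork() that fails with
        // "java.io.IOException: error=12, Cannot allocate memory".
        Process p = pb.start();

        // Consume the script's output so the child never blocks on a full pipe
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            out.close();
        }
        p.waitFor();
    }
}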

 

The problem lies within fork().

 

An embedded system has a limited amount of memory; take 512MB, for example. When we want to fork a new process, we have to copy the ENTIRE JVM… well… almost. What we are really doing is requesting the same amount of memory the JVM has been allocated. So if we want to execute a new process and our JVM is taking up 350MB of memory, our fork will request another 350MB. But wait! We only have 512MB! We may not have enough memory to fork our application, even though all we want to do is execute a command, say… “ls”. The “ls” command doesn’t require 350MB, but if there isn’t at least 350MB available, we may not even be able to fork a new child process. The important thing to remember is that when we fork, we are just requesting 350MB of memory; we are not necessarily going to use it. If we were to fork() and then immediately call exec(), we wouldn’t use anywhere near all of the 350MB.

1. A Second JVM

Our first attempt at a solution might involve using a second JVM. This whole mess is a result of our main program being too large in memory to exist twice. We could create a Java process whose sole purpose is to execute our tasks. Such a JVM would be much smaller than our main program, and therefore fork() would have a better chance of getting enough memory allocated to spawn a child process. This solution is really only a workaround, because we have not eliminated the problem. What if memory eventually gets so full that our smaller JVM is unable to fork() because there just isn’t enough to allocate? What if our second JVM suddenly crashes? A second JVM is a potential solution, but not the only one. If you’re interested, you can find an example of this idea at: http://www.assembla.com/spaces/forkbuddy/wiki
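
As a rough illustration of the idea (this is not the forkbuddy code, just a sketch), a bare-bones helper JVM could simply read one command per line from its STDIN and execute it, keeping its own heap tiny so that its fork() stays cheap:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/*
 * Hypothetical "exec helper": a tiny second JVM whose only job is to run
 * commands on behalf of the large main JVM. Its heap stays small, so its
 * fork() requests far less memory. The main application would start this
 * helper once and write one command per line to its STDIN.
 */
public class ExecHelper {
    public static void main(String[] args) throws IOException, InterruptedException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String command;
        while ((command = in.readLine()) != null) {
            ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", command);
            pb.redirectErrorStream(true); // merge STDERR into STDOUT
            Process p = pb.start();
            p.getOutputStream().close();  // nothing to feed the child
            drain(p.getInputStream());    // read until EOF so the child never blocks
            p.waitFor();
        }
    }

    private static void drain(InputStream stream) throws IOException {
        byte[] buffer = new byte[4096];
        while (stream.read(buffer) != -1) {
            // discard the output; a real helper might forward it back to the caller
        }
        stream.close();
    }
}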

2. Custom JNI bindings for posix_spawn()

This whole mess began because of fork(), and how it copies one process to another in order to create the child. Under the covers, Java’s ProcessBuilder uses fork/exec, but those are not the only UNIX calls that can spawn processes. posix_spawn() is a different call that creates a new child process using a specified process image, with no copying of the parent process. I guess Sun or Oracle never got around to using this approach, which would certainly solve our dilemma. The only way to use it from Java is to create a custom JNI binding. The Java Native Interface (JNI) is a way of calling native code (C/C++) from the Java Virtual Machine. You can find a similar implementation at https://github.com/bxm156/java_posix_spawn, except that instead of calling posix_spawn() itself, it implements the equivalent behavior directly. It turns out that not all Linux distributions implement posix_spawn() using vfork(), so it’s best to just write our own vfork() and exec() implementation. Instead of allocating memory for the new child process, vfork() lets the child share the same memory as the parent. There are both benefits and risks to using vfork(), but for our case, where we just want to launch some other process, it can be used safely. Since we call exec() right after vfork(), we don’t need to worry about the parent process’s memory being modified. If we were to vfork() and then do something else besides exec(), we would risk corrupting our parent process.
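
The Java side of such a binding boils down to a class with a native method. The sketch below is illustrative only; the names are placeholders and do not match the actual API of the java_posix_spawn project:

/*
 * Sketch of the Java side of a JNI binding that spawns a child with
 * vfork()/exec() so that no copy of the JVM's memory is requested.
 * The native half (written in C) would be compiled into libjspawn.so.
 * All names here are hypothetical.
 */
public class PosixSpawn {
    static {
        System.loadLibrary("jspawn"); // expects libjspawn.so on java.library.path
    }

    /** Returns the child's PID on success, or -1 on failure. */
    public static native int spawn(String path, String[] argv);
}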

3. Over-commit

Our last option is the easiest fix, but it comes at a cost. Every time we fork, the OS determines whether enough memory can be allocated to copy our process into a child process. We can turn that safety check off and let the OS allow the fork even when there is not enough memory available to make a full copy of the parent. For our purposes, the scripts we want to execute are always going to be much smaller than a 350MB JVM, so we will probably be fine. Enabling over-commit solves the problem, but the cost is real.

When you allocate memory in C, malloc() returns either a pointer to the allocated memory or null if no memory could be allocated. If the programs on your system were written correctly (and by correctly, I mean with some sanity checking), they should attempt to die gracefully if they receive null from malloc(); when that happens, it means the system’s memory is completely used up. When over-commit is enabled, however, malloc() will essentially never return null. It will hand back a pointer to address space that may not actually be backed by physical memory. If the system later runs out of real memory, the OS unleashes the dreaded OOM killer, which begins terminating programs as it sees fit in order to free up memory. From the victim’s point of view, the selection can seem almost random. Andries Brouwer came up with this analogy:

An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

To enable over-commit temporarily:

echo 1 > /proc/sys/vm/overcommit_memory

For a more permanent solution, you will need to edit /etc/sysctl.conf and add the following:

vm.overcommit_memory = 1

Then restart your system (or run sysctl -p) for the change to take effect.

Codonics Final Co-op Report for Fall 2011

Codonics

During my co-op at Codonics, I have been heavily involved in implementing wireless and networking support for the Safe Label System, a medical device used in Operating Rooms to help reduce the number of medical mistakes in syringe labeling.

 

Responsibilities

At the beginning of my co-op at Codonics, my responsibilities were focused on testing software. I used a system called TestTrack to record bugs and related information. During development, developers updated these TestTrack tickets, and it was my job to verify that the bugs were fixed. In addition, I wrote up test reports, which detailed tests that I performed to verify functionality defined in the functional specifications. One of my tests involved the creation of a web server that would display barcodes on a web page. This web page was displayed on an iPod Touch placed directly underneath a scanner. My test helped to verify the long-term functionality of the Safe Label System by printing over 1,000 labels automatically.

My largest responsibility at Codonics by far has been to oversee the development of the Safe Label System’s networking capabilities. I spent a few weeks testing multiple wireless adapters and determining what software was needed to utilize them. Then I began to write the software that allows users to connect the device to wireless networks. As the requirements changed, I have been updating the networking code. Recently, I have been implementing support for WPA/WPA2 Enterprise networks that authenticate with methods such as EAP-TLS and PEAPv0/MSCHAPv2. These networks use certificates to encrypt and authenticate connections, which adds complexity to the software. I had to consider the best way to store this information without letting it get into the hands of unauthorized users. Since these types of networks are used in hospitals, it is important that the software correctly handles such network authentication protocols.

 

Skills

Codonics has helped me develop skills that I would not have gained through classes at Case Western Reserve University. I gained experience writing test procedures using LaTeX, a typesetting language used to create professional-looking documents. I also learned how to write scripts using AutoIt in order to automate many of the test procedures. Codonics also introduced me to JUnit tests, which are small pieces of code that test production code in order to look for regressions in functionality. These tests are useful when multiple programmers are working on a project, or when a large refactoring of logic or code occurs.

I have used many skills that I learned in class while at my co-op with Codonics. My Advanced Game Design class helped me further familiarize myself with Java, a programming language used extensively during my co-op. I used knowledge from my Operating Systems class to help develop the wireless networking code, as well as the restricted shell environment used in the Safe Label System. Operating Systems furthered my knowledge of concurrent processes and thread safety, which I incorporated into my networking code. If multiple processes or threads attempted to manage a single wireless device, the adapter could end up configured improperly. To prevent such an occurrence, I implemented a thread pool with a single thread that executes tasks sequentially, based on a given priority. Operating Systems, as well as Compilers, increased my knowledge of programming languages such as C and C++. These two languages are used in creating the restricted shell environment, which only allows authorized users to execute pre-specified commands while blocking the use of others. There are cases in which we want to allow the user to run a pre-defined script that executes commands not normally allowed. In that case, the restricted environment is able to step outside its restrictions to use those specific commands, as well as access specific files and directories that are not normally permitted.

 

Reflections

My co-op at Codonics has helped me develop real-world experience in developing software as part of a team. I spent the majority of my time at the beginning of my co-op doing testing, and it allowed me to understand how the products functioned with respect to the end user. Before my co-op at Codonics, I never realized how much effort goes into testing, especially for medical products. For example, there are many test procedures that need to be executed on every release candidate to ensure there have been no regressions. New test procedures are created from test reports, which are sometimes planned out before the functionality has even been implemented. As I became familiar with the products, I began writing my own tests. My first test report covered printing 1,000 labels on an SLS, to ensure there were no memory leaks. In order to run the test, I created a web server that generated barcode images and loaded it onto an iPod Touch. The plan worked successfully, and soon my SLS unit was printing barcodes non-stop, automatically. My test interested many of my coworkers and supervisors, who stopped by to see it in action. Before my barcode server, tests were done manually by testers, which consumed valuable time. In addition, testers were only able to test the limited number of drug vials that were around during testing. My barcode server was able to test every drug known to the SLS, because it generated the barcodes automatically from a list of drugs in the SLS system. I hope that my code helps to shorten the time it takes to test the Safe Label System, and gives testers more time to investigate other functional aspects of the device.

As I moved to more of a developer role, I began to learn the importance of writing unit tests. I was given a book on JUnit, and from there began writing and updating unit tests as I worked on the SLS project. Unit tests are small pieces of code that test the functionality of production code. These tests are very valuable when multiple developers are working on a project, and help to detect regressions quickly. There are some downsides however, because time must be taken to update the unit tests when the logic has changed. During my co-op I wrote the networking system for both wired and wireless network interfaces, and was forced to refactor my code as the input specifications changed. Unit testing helped to ensure the overall functionality of my design did not regress after refactoring the logic, and served to increase my confidence in my code changes.

I learned the importance of writing software to technical specifications, which allows multiple programmers to work on various parts of the code at separate times. When code is written as agreed upon, integration is easier when developers begin to merge their code together. At first, such specifications can seem tedious, especially when they change as the software matures, but they help to ensure the software behaves as expected, and can also serve as an outline used by the testing department to ensure a thorough examination of the software.

While working at Codonics, I was given the opportunity to do some business travel, an opportunity rarely given to co-op students. My first time traveling as a Codonics employee was to Anesthesiology 2011 in Chicago. While working at the trade show, I learned about various vendors in the medical industry, and how the Safe Label System positioned itself among them. During off-the-clock hours, I relaxed with fellow employees and learned more about the company, its employees, and its culture.

My second trip representing Codonics was to Boston, and it was my first time flying by myself. It was a very new experience for me, because I was responsible for bringing along a Safe Label System to demonstrate the new networking capabilities at Massachusetts General Hospital. Once in Boston, I worked with MGH staff to verify that our software was able to connect to their EAP-TLS network. I caught a glimpse of how a hospital’s IT staff is structured, and how different people in the chain were able to help push agendas along. We discussed networking requirements, user interface preferences, and likely use cases for how the Safe Label System would be networked in the operating rooms. I was awe-struck at how large the hospital complex was, which shaped my view of how important networking support would be for our next release of software. If a hospital buys hundreds of units and places them in various buildings, it would be a challenge to update them and verify they are all working properly. Networking support will give administrators control of all the units from one computer, with the ability to push updates out to all the systems at the push of a button. In addition, network administrators will be able to monitor the status of the units and receive email notifications if something unexpected occurs, such as an unknown drug being scanned. The software I developed will help to ease the introduction of the Safe Label System into large hospitals, and reduce medical mistakes caused by improperly labeled drugs. Being given the chance to travel with Codonics has been a great privilege that has let me see parts of the company I would not normally have been exposed to.

I have been very happy with my co-op at Codonics, and the experience has helped to boost my confidence in my programming skills. Codonics has helped me grow as a computer engineer, and I hope to continue learning more during my second co-op with them.

Codonics Mid Co-op Report

Check out my Final Co-op Report Fall 2011

Codonics

During my Co-op at Codonics, I have gained experience in testing software, and developing new features for the Codonics Safe Label System, a medical device used in Operating Rooms to help reduce the number of medical mistakes in syringe labeling.

Responsibilities

At the beginning of my co-op with Codonics, many of my responsibilities revolved around testing software that was near its release. I verified that known bugs were successfully fixed, and performed tests that revealed the existence of new bugs. As the software was updated, I updated test procedures to reflect new changes.  After I had become more familiar with the software, I formulated and wrote my own test reports, and developed custom scripts that were needed for the tests.

Once I had familiarized myself with the software, I was given the task of fixing a few bugs. After the software was released, I began adding new features to allow the hardware to connect to wireless networks. Currently, I am continuing to improve and stabilize the wireless code, as well as developing a solution to restrict user access in the Linux terminal.


Taming Web Development

Let’s face it… we all need source control in our projects, even for small-scale projects with just one developer. There are too many times when we need to revert to a past revision of our code. In a typical computer application or game, source control makes sense; but what about web development? When you factor in an FTP server and a MySQL database, how do you accomplish your version control? How do you keep everything in sync when you are deploying to an external server? Many web projects are tracked with SVN, but SVN is very limiting compared to other options such as Git. Git provides powerful team collaboration and revision control, but how can it be made to work with a web server? The answer is surprisingly easy.

I had been searching for an answer to this question when I stumbled upon an article by Joe Maller (http://joemaller.com/990/a-web-focused-git-workflow/). He offered a great explanation of how I could use Git not only to control my project, but to eliminate FTP altogether. I have my own server that I use for hosting a website, and having full control over that server opens up the possibility of a very handy workflow. Using Eclipse as my primary IDE, I can develop websites, commit the code, and push it out to be deployed on my web server. I don’t need to leave Eclipse or use FTP. What’s even better? I don’t have to worry about losing my code if my web server goes down, because I’m not pushing my code directly to my web server. If my web server were to suddenly kick the bucket, my code would be safe and sound on GitHub.

One problem I had with this method, though, was getting my server to pull the latest code from GitHub. After doing some investigative detective work, I found that the account Apache and PHP were running under didn’t have enough permissions to execute a git pull. To be honest, that is probably a good thing for security reasons, but it wasn’t exactly what I wanted at this point. I needed to execute git pull somehow, but outside the limited permission scope of my Apache web service. Instead of having a PHP script call git pull directly, I needed something else to execute the script with the necessary permissions. Since I knew exactly what needed to be executed and what would happen, I eventually settled on writing a simple setuid program to do the git pull.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    /* The binary is owned by root with the setuid bit set (see below),
       so setuid(0) promotes this process to root before running the hook. */
    if (setuid(0) != 0) {
        perror("setuid");
        return 1;
    }
    system("/home/dev/post-update-hook.sh");
    return 0;
}

Great! Now just compile that program and then run:

sudo chown root /path/to/program
sudo chmod u+s /path/to/program

 

And for the PHP file:

<?php
// Give the pull up to five minutes to complete
ini_set('max_execution_time', 300);
// gitPullExec is the compiled setuid wrapper from above
system("/home/dev/gitPullExec");
echo "Complete";
?>

Great! In retrospect, it would be better to set up the permissions based on a group rather than a single user. Did I just open a security hole? Probably; after all, if someone were to gain access to the machine, they could change the post-update-hook.sh file into something else and then execute it. But I don’t really need to worry about that. This server is only used to host one site. If a user were to gain access to the one account I set up, which is where the site is located, there would really be no need for them to use root; they would already have access to the entire site. The site doesn’t host any mission-critical data. Not to mention, the only way the script could be executed without authenticating would be to hit the PHP script that calls this program. If I wanted to make it more secure, I could have the program check the hash of the script and verify it is correct before running it. Of course, if someone had access to my GitHub account, they could inject code into the site by committing it there and waiting for this hook to run. However, I’ll assume that if that were to happen, I would have bigger problems than server security, wouldn’t you agree?

MySQL – MEMORY

One of the things that I enjoy setting up is MySQL databases; it’s like my anti-drug. I enjoy the feeling of setting up tables related via primary keys. There is just a thrill to setting it all up, although I admit the data entry part isn’t always fun if you have a large set of data. I think I started using MySQL databases when I was a kid, and as I grew, I learned more and more about them. I will not pretend to be an expert by any means, but I feel as though I have grown over the years to become really familiar with PHP + MySQL (soon to include Python + MySQL, I hope).

During the Fall 2010 Career Fair at CWRU, I heard that one of the visiting companies, Yelp.com, was giving a talk on MySQL database optimization and use in a large-scale environment. I was fascinated by the very thought of it. I have always used MySQL for my own personal projects, but had never gotten a chance to see how it would hold up under a base of millions of users. All the queries, page views, tracking, logs… the database must be huge, and the larger the database grows, the slower it becomes. How do they manage such a load, not to mention the backup servers they must use to prevent the loss of such critical data?

As a kid, I stuck with MyISAM as my default storage engine for MySQL; it seemed like the de facto standard, and I didn’t need to worry about transaction locks (unlike on Android, where problems always occurred without them, but that’s a rant for another time). So you can imagine my glee when I learned that non-crucial information can be stored in a very fast table that uses the MEMORY storage engine. This data is extremely volatile: if the server crashes or goes down, it is lost. For some statistical data, however, that may be all you need, and the performance benefit is valuable. If you want, you can have a cron job or backup scheduled to collect the data at defined intervals. As long as the data isn’t required or crucial to your website, you may want to consider this type of table storage for your tracking or statistics.
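
As a rough sketch of what that can look like (shown here with JDBC, though the same SQL works from PHP; the table, columns, and connection details are all made up), a hit counter that lives entirely in RAM could be set up like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MemoryTableExample {
    public static void main(String[] args) throws Exception {
        // Older MySQL drivers need explicit registration; newer ones auto-register
        Class.forName("com.mysql.jdbc.Driver");
        // Hypothetical connection details; requires the MySQL JDBC driver on the classpath
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/stats", "user", "password");
        Statement stmt = conn.createStatement();
        try {
            // ENGINE=MEMORY keeps the whole table in RAM; contents vanish on restart
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS page_views ("
                + " path VARCHAR(255) NOT NULL,"
                + " hits INT UNSIGNED NOT NULL DEFAULT 0,"
                + " PRIMARY KEY (path)"
                + ") ENGINE=MEMORY");
            stmt.executeUpdate(
                "INSERT INTO page_views (path, hits) VALUES ('/index.php', 1)"
                + " ON DUPLICATE KEY UPDATE hits = hits + 1");
        } finally {
            stmt.close();
            conn.close();
        }
    }
}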

For a list of storage engines in MySQL 5.5, check out Chapter 13 in the MySQL Reference Manual

Moodle (Modular Object-Oriented Dynamic Learning Environment)

If you have ever taken a course at Case Western Reserve University, your professor has probably used Blackboard at some point in time. Professors use Blackboard to provide course supplements and materials, supposedly to help the student learn better. Blackboard, however, is nothing more than a fancy upload and storage facility. True, Blackboard contains social features such as discussion boards and wikis, and can even provide digital quizzes, but that’s as far as it goes. This piece of software costs thousands of dollars in licensing fees, and for what? It’s mainly used to provide storage for course documents. It is a waste of money when compared to things like Dropbox. If professors want to supplement student learning, they need to involve the student in activities that stimulate the learning experience.

That’s where Moodle comes in…

The focus of the Moodle project is always on giving educators the best tools to manage and promote learning, but there are many ways to use Moodle:

  • Moodle has features that allow it to scale to very large deployments and hundreds of thousands of students, yet it can also be used for a primary school or an education hobbyist.
  • Many institutions use it as their platform to conduct fully online courses, while some use it simply to augment face-to-face courses (known as blended learning).
  • Many of our users love to use the activity modules (such as forums, databases and wikis) to build richly collaborative communities of learning around their subject matter (in the social constructionist tradition), while others prefer to use Moodle as a way to deliver content to students (such as standard SCORM packages) and assess learning using assignments or quizzes.

Working as the Lead Developer & Architect on the Moodle Pilot at Case Western Reserve University, I have been impressed with Moodle’s ability to be customized with our own modules and plugins. I have been able to extend Moodle’s capabilities far beyond Blackboard’s to meet the demands of professors who wish to teach their students in a highly social, collaborative way. Moodle can organize a course in a variety of ways:

  1. Topics – Course sections are given a topic number, and can also be given a name
  2. Weeks – Each week of the course corresponds to a course section
  3. Discussion – The course revolves around a discussion board

Topics and Weeks are probably the most widely used course formats in our pilot today. Professors are making some very beautiful course sites, filled with informative and fun activities for students to do every week. Using plugins such as VoiceThread, Moodle offers our students multimedia presentations that allow for questions, comments, and feedback. As a developer, I find that Moodle offers a wide range of ways to expand the social structure of learning for students. I have learned so much from working with Moodle, in both coding practices and social communication.
