[UPDATE: If you're searching for ways to better use your multiple cores, the R2007a release seems to support multithreaded computations. More on this in my recent post.]
I haven’t posted much recently. Well, I can say in my defense that I was sick (actually we were all three sick, each in turn) and shortly after that we went on vacation. Also, next week I have reserve duty, so I’m not going to post much in the near future either. I still have a pending SQL post, one I talked about a long time ago, that is waiting for me to finish its last quarter. Shame on me…
Anyway, I wanted to share some performance problems I had to solve recently. They are all related to Matlab, so if you’re not into Matlab I guess it will be of little interest to you :)
I recently purchased a very powerful machine with a dual-core CPU, 2GB DDR2 memory, etc. I’m using it to run some very extensive calculations, mostly in Matlab. However, I got very frustrated because:
1. After a few hours of intensive execution, Matlab throws me an “out of memory” exception. And, sure enough, it’s taking up more than 2GB of memory (note that I have 2GB RAM plus as much cache as I like).
2. I just couldn’t make the PC use all of the CPU. I didn’t expect it to use both CPUs for a single Matlab process; I expected around 100% usage on one CPU (core, actually) and close to 0% on the second. Instead, sometimes I got around 50% on both, and most of the time it was around 20-30% on both CPUs. So I was barely using ¼ of my processing capability (and only ½ of what I expected).
To tackle the first problem I tried everything and read a whole bunch of web pages, but in the end I didn’t get anywhere. My first intuition was that for some reason Matlab was not releasing memory it had allocated, probably due to some error of mine, and I just couldn’t find the source of the problem. After some time I started to suspect that the problem was with the SVM library I use repeatedly in my process. As I mentioned in a previous post, I have started using a Matlab wrapper over SVMLight. After digging into both libraries and adding some code to track memory allocations, I managed to prove that neither library frees all the memory it allocates.

While working on all this, I learned a bit about MEX-Files (C functions that can be called from Matlab), since the wrapper library is, of course, a MEX. There are several ways to allocate and free memory when you’re working with MEX-Files. One, of course, is the standard C malloc/free pair. Every memory block allocated with “malloc” must be freed (otherwise you’re leaking), and if you’re using Windows, it must be freed exactly once (freeing it more than once causes exceptions). Another problem with malloc/free in MEX-Files is that memory allocated by “malloc” should not be returned to Matlab. The alternative to malloc/free is Matlab’s mxMalloc/mxFree pair. Memory allocated by mxMalloc can be returned to Matlab. Additionally, when the MEX-File is released (i.e. the function call is over), any memory allocated with mxMalloc that was not part of the returned variables is automatically released.
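To make the difference concrete, here is a minimal MEX gateway I sketched purely for illustration (it is not code from the SVMLight wrapper). The scratch buffer is allocated with mxMalloc and never explicitly freed, yet Matlab reclaims it as soon as the call returns; a forgotten malloc’d block would be leaked for good:

#include "mex.h"
#include <string.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n = 100;
    mwSize i;
    double *scratch;

    /* Scratch space via Matlab's memory manager. */
    scratch = (double *) mxMalloc(n * sizeof(double));
    for (i = 0; i < n; i++)
        scratch[i] = (double) i;

    /* Anything handed back to Matlab must live in an mxArray. */
    plhs[0] = mxCreateDoubleMatrix(n, 1, mxREAL);
    memcpy(mxGetPr(plhs[0]), scratch, n * sizeof(double));

    /* Optional: mxFree(scratch); if we forget it, Matlab cleans up
       automatically when this call is over. */
}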
So, all I needed to do was make sure that BOTH libraries perform ALL their memory allocations with mxMalloc, and then I wouldn’t need to take care of the freeing at all; Matlab would do it for me (as a side-note, in my scripts I call these libraries over and over again in a loop, so there are many short calls to the MEX-File). So I simply aliased malloc to mxMalloc and free to mxFree (remember the good old “#define malloc mxMalloc”?) and voila: problem solved (“worked around” would be a better description). I left my process running for 8 hours, and the memory used by Matlab increased by only a few MB! Yippee!!!
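If you want to reuse this workaround, the whole trick fits in a tiny header included by every source file of the wrapped libraries. The header name and the exact set of aliases below are my own sketch, not something taken from either library:

/* mem_alias.h - include it in every .c file AFTER the system headers,
   so the real prototypes are seen before the names are redefined. */
#ifndef MEM_ALIAS_H
#define MEM_ALIAS_H

#include <stdlib.h>   /* real malloc/free prototypes first */
#include "mex.h"      /* mxMalloc, mxCalloc, mxRealloc, mxFree */

/* Redirect the standard allocator to Matlab's memory manager, so that
   anything the libraries forget to free is reclaimed when the MEX call
   returns. */
#define malloc  mxMalloc
#define calloc  mxCalloc
#define realloc mxRealloc
#define free    mxFree

#endif /* MEM_ALIAS_H */

With this in place, whatever the libraries allocate and forget to free is reclaimed by Matlab as soon as the MEX call returns, which is exactly what my short, repeated calls needed.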
Once I had my memory problem resolved, I could start thinking about the CPU. So I changed my script files to separate the workload into 2 processes and instantiated 2 instances of Matlab, each running part of the job. At that point the most incredible thing happened: once I started running the second process, the CPU turned flat (< 5% on both cores) and both processes got stuck with almost no progress at all!!! This drove me totally crazy! I tried changing the process affinity (i.e. pinning each process to a separate CPU), changed some system performance settings and even installed a hotfix that deals with performance problems on dual-core machines (you can download it from links in this thread), but nada! Then I thought of trying something totally different. The code I was running was located on a remote computer. Actually, it was on a Disk-on-Key connected to my laptop (and the new PC has a network mapping to the DoK). I’m using this configuration because I often take the DoK with me and need to work on it from my laptop. So I connected the DoK directly to my workstation (the new PC with the dual-core CPU) and voila: both processes worked marvelously, taking up 100% of both CPUs (cores)!!!
What I have learned from all this:
- As always, the source of the problems is usually where it makes the most sense. With the memory, the problem was in 3rd party (non-commercial) code. This code was written for research purposes, and I guess that as such little effort was made to ensure there were no memory leaks. With the CPU, the problem was working off a network-mapped disk. I’m sure that once you’ve read everything above you’ve told yourself “hey, but it’s so obvious!”. So as always, I should have followed my instincts in the first place…
- My little malloc-to-mxMalloc aliasing trick is something to keep in mind. It might not always be a good idea (especially for long-running calls), because leaked memory is only reclaimed once the call to the MEX-File completes, but I guess that in many cases it’s a really good and simple workaround with minimal risk of harming code you’re not familiar with.
- Working from a network drive involves some hidden constraints that need further investigation. It may be something specific to Matlab; I don’t know yet.