Why not? (Ilan Assayag's blog): April 2007

Friday, April 27, 2007

Thoughts about evolutionary algorithms

WARNING: This post is likely to be rather long, full of musings with very little concrete data to back them up. I hope it will interest you, but I don't promise anything...

Today I was playing with my little girl, Tamar (6 months), when some questions started to form in my head. When you think of the ultimate goal of each living species, it's rather simple: ensure the existence of the species for as long as possible. This is obtained by "mechanisms" such as reproduction, survival of the fittest, randomization mechanisms (e.g. mutations, weather changes, etc.) and numerous other techniques Mother Nature has to offer.

Scientists have been trying to simulate some of these mechanisms for various purposes for half a century, with variant degrees of success. Genetic Algorithms are the most widely known of them, and have been extensively researched in the past 20 years or so. The basic idea is that you have some objective function you want to optimize, and a large set of parameters to tune. The assumption is that the time it takes to compute every possible set of parameters is computationally prohibitive (unless you're willing to wait 10,000 years until you get the answer). The tuning of the parameters is done by applying operations known in genetics, such as reproduction or crossover (taking some parameters from one solution and some from another - creating a new solution in its own), mutation (changing some parameter(s) through some randomized mechanism in order to create a new solution), selection (each iteration keeps the best solution sets from previous iterations), and variants thereof. The main issue I would like focus on, is that there is always one objective function, and every solution (i.e. set of parameters) is judged by the score it acquires at that objective function.

Now let's get back to my little girl. As I said, as far as Nature is concerned, my daughter's ultimate purpose, like every human being, is to ensure the human race will persist for eternity. However, even Mother Nature knew that this is too big a goal to be of any concrete value. Therefore, it has set an infinite amount of sub-goals in our life, each of which bringing us closer to the ultimate global goal of Human Kind (be it in a minuscule amount). These sub-goals are, for example, to reproduce, which requires (as sub-sub-goal) to meet and get acquainted with males, which requires among other thing, to know how to walk (that's already a sub-sub-sub-goal), which has a precondition of being able to crawl (sub-sub-sub-sub-goal I think).

So Tamar's objective function, for our discussion, is to learn how to crawl. Not thinking about any of these things, I knew she needs to learn how to crawl (and frankly I wanted her to get tired, but that's even more besides the point), so I set her on a matress, with a blinking toy in front of her to attract her attention.

(I told you it would be long... don't lose patience, we're getting there...)

If we get back to Genetic Algorithms, the objective function of our problem, is to learn how to crawl (say we can give a score for the crawling quality). However, to reach this goal, I created a new objective function, apparently completely unrelated to the real goal - reaching this annoying blinking toy. She could end up reaching it in a million of ways, or never being able to reach it on her own (which is actually what happened, since she kept crawling backwards), but at the end of the day, her mind and body would have learned some new things that will, eventually, serve her to reach that real goal (crawling). In other words - I created a synthetic objective function, with only some vague relationship with the real objective function, with the express intent of improving the chances of, on the long run, optimizing the real objective function (crawling).

Now my question is - how could this approach be applied to Genetic Algorithms in general? How can we synthesize, throughout a GA's run, new objective functions, which will help the GA to reach the ultimate objective function?

Another rather different question that comes to mind - Tamar didn't reach the blinking toy, because she started crawling backward. I'm no specialist in these matter, but I allow myself to assume that at some level, she has learned something from this experience. Somewhere in her brain, something has marked that the movements she did were wrong (to reach her goal, i.e. get the toy), and that she should find another way. She will do the same movements once, twice, maybe thirty times - in the end she will grasp that the solution must be found elsewhere. As far as I know, Genetic Algorithms (and all their variants) are always concerned with "good" solutions (i.e. solutions that get good scores on the objective function). I don't think there is any algorithm that knows how to keep track of "bad" solutions, in order to avoid getting back to solutions close to that. A possible approach could be to run a "positive" GA, which strives to improve the objective function, but always keep track of the "bad" solutions. Whenever one the new generation's items is too similar to one of the bad solutions (or some representative of the bad guys), it is disposed of without even checking the objective function, and a new one is created instead. Of course, there is the risk of missing some good solutions that are accidentally similar to the bad ones, but in case that computing the objective function is an expensive task, it may be worth the tradeoff.

Thursday, April 26, 2007

Trying Photobucket

I'm sick of not being able to upload pictures to my blog. So I've decided to try out Photobucket. We'll see how it goes...

Tamar doing a face Photo Sharing and Video Hosting at Photobucket

Tamar Photo Sharing and Video Hosting at Photobucket

Gal dancing at her 3rd birthday Photo Sharing and Video Hosting at Photobucket

Tuesday, April 17, 2007

Leasing

If you live in Israel, and especially if you're a high-tech professional, you are probably aware of the big noise around the expected change in taxes for leasing cars.

Globes just published a great read - an article that compares this issue to some other, more understandable yet completely virtual scenario. Even if you don't subscribe to the point of view that the taxes should indeed be increased, it's still fun to read.

Another interesting read on this subject is a site by the same author (and others), which explains why the taxes should indeed be increased.

If, like many others, you're trying to figure out the actual monetary value of using a leased car, check this calculator out.

Lastly, although I understand all the reasons why the taxes should indeed be increased, I'm still a big fan of leasing cars. I'm a complete car-dummy, and the feeling that whatever the problem, there is someone who will come within a couple of hours to fix it, is worth a lot more than money. So IMHO, if indeed there is a glitsh in the current tax calculation, then justice must be done and the taxes should be updated. I still, however, want to keep the option of using a leased car (as an employee), without being pointed at as if I were a danger to society.

Monday, April 16, 2007

SQL Server Execution Plan - Rantings...

Let me say it outloud: SQL Server Execution Plan SUCKS!

In essence, SSEP (I'm too lazy to keep writing the whole name) is supposed to help you analyze how the SQL Engine performed a query (more specifically what optimization decisions were taken) in order to either alter the query and/or the tables to improve performance. As such, SSEP provides you with a nice graphical interface, that shows every element of the execution along with it's relative cost. You can then see more properties for each element, which supposedly will help you better understand it.

SSEP in SQL2005 has gone through very few changes since SQL2000, and it's a pitty. With a small query, SSEP can help you out - you can easily find the most costly element, analyze it and kill the culprit. With long and complicated queries, however, it's a nightmare. The most prominent issue here is USER EXPERIENCE!!! Let me explain:

In most cases, the first thing you do when you run SSEP is to search for the most costly sub-query and in that you look for the most costly query element. When you have a long and complicated query (which sometimes is inevitable), you can find yourself "surfing" the execution plan for many long minutes, just searching for the elements of interests, never being actually certain you got them all. You just keep scrolling up/down left/right and back again, trying to get a grasp of what's going on. There is a zooming feature, but it just isn't enough! There is no easy way to navigate through the data and search for specific elements. I mean - come on, would it have been so difficult to add, for example, a list of elements with the main properties (say object name, cost and physical and logical operation), allowing you to sort it and jump from that list to the relevant element in the execution plan?

Another USER EXPERIENCE issue is that of the data that comes along with each element. Granted, SSEP is aimed at advanced users (usually DBA level), but would it hurt someone to provide some more explanative descriptions?

Lastly, if you really want to improve the USER EXPERIENCE, you should guide the user (without forcing her hand) to find the real problem as fast as possible. For instance, you could use colors for the elements, such that costly operations would stand out (color table scan in red, index seek in green). Note that these colors should also stand out in the list mentioned above. You could even propose some solutions ad-hoc. For instance, if you see a very costly Bookmark (used to link between indexes and actual data in the tables), you can propose to add an index that will include all the fields required, effectively removing the need for a bookmark (because the engine will read the data directly from the index). There are tons of things you could do to make the user's life easier, and help her finish her job faster, so why not?!?

Tuesday, April 10, 2007

Spam the spammer!!!

I just got a spam mail that made me REALLY angry.

It's from a company that calls itself Mailmedia. The message itself, however, was sent by a user at the lombardisoftware.com domain (I'm pretty sure that the genuine Lombardi company has nothing to do with this...).

What upset me the most with this mail, was the nerve - it was no less than a promotion for businesses to use their services to send spam mails. Yes, that's right - I got a spam mail that is a promotion for a spam company!

But that means that there should be some way to communicate with them, right? Unfortunately, there is no website, or email address - I would have been more than happy to spend a few hours setting up my own spamming machine to kill their website or their email address. I would, really! But they weren't that stupid - they only provided an Israeli mobile phone number:

050-5281978

(Note, I've searched this phone number up in Google and found only one entry in a Gays forum. This guy seems to be searching for a fatherly type... I have no idea whether this is genuine, or just another frustrated spamee who wants payback.)

Now, my request to every Israeli reading this post - please please please - spam this phone number with as many calls as you can. Let them FEEL how annoying it is to be spammed all day long. Let them LOOSE potential clients because the line will always be busy. Make them SUFFER as much as we do from their never ending spam mails.

Be smart: don't forget to HIDE the calling number when you call them (for Orange you must preceed the number with #31#, for HOT you must preceed with *43).

DISCLAIMER: Although I really want as many people to join the spammers spamming effort as possible, I take no responsibility whatsoever to any miscomfort, legal or other issues you may encounter as a result of doing as I just asked.

Sunday, April 08, 2007

On the superfluousness of the Singleton pattern in .NET

We all know the Singleton pattern. It's intent is to "Ensure a class only has one instance, and provide a global point of access to it." (from GOF). There are many possible implementations of it, some are good, others are to be ashamed of. You can find a good review by Jon Skeet, or on Patterns and Practices where you can also find a successful implementation using double-locking mechanism. By the way, the implementation in "Design Patterns in C#" by Steven John Metsker was utterly dissapointing, to say the least.

But this is all assuming you indeed need a Singleton pattern. If you read the Intent, as defined by the GOF carefully, there is nothing in simply using a static class that does not meet this intent. At the time of writing the Design Patterns book, the GOF didn't have C# or .NET. Back then, they were mostly relying on C++ and Smalltalk. I'm not proficient in Smalltalk, but I know that with C++ there were problems with static and global variables. For instance, it wasn't clear what their order of instantiation would be, nor when the destructors would be called. So when you're working with good old C++, you just have no choice but to use the Singleton pattern. With .NET, however, things are different. Instantiation of static classes have a clear and deterministic behavior. With destruction, of course, it's different, but most cases where you need a Singleton you'll keep it alive as long as the program is running anyway.

In the Design Patterns book, there are some other "consequences" which could end up as reasons for using the Singleton pattern instead of simply a static class:

Permits refinement of operations and representation - or in other words - a Singleton class can be extended by means of inheritance. Most cases I have encountered, there is no inheritance going on with the Singleton, and if there is, then the actual Singleton is the last in the hierarchy. Even more so - many implementations use sealed classes either for performance optimizations or simply because the pattern wouldn't work with inherited classes.
Permits a variable number of instances - or in other words - one day, some 5 years ago, some freaky newby will invent a reason to use multiple instances of the Singleton and you should make it easier for the idiot to do the change. One word: YAGNI !
More flexible than class operations - same as the 2 previous items

So, why would you use the Singleton pattern in .NET instead of just a static class?

1. Inheritance - I said in most cases there is no need for it. There is always the exception... You can see one on Jon Skeet's blog.

2. Destruction - when you want to keep a reference counter of the number of consumers, and somehow destroy it when it's no more in use. Complicated, especially given the non-deterministic destruction of objects in .NET.

3. Other static functionality - if you need some static functionality that should not cause the actualy 'Singleton' to be created (any call to a static method will cause the ctor to be called). Note that this is also a problem with most of the pattern's implementations, because they rely on a static field/ctor.

4. Parameterized contruction - this seems to me like the most likely reason to use the Singleton pattern instead of a static class, but here again there is a problem because most of the cleanest implementations rely on the static constructor. The idea here is that sometimes you want the Singleton to be initialized with some parameters, so you can't rely on the paramterless, static ctor.

Now that that's settled, and given the name of my blog - why not using the Singleton pattern?

1. As a general principle, I like using patterns when they are needed, no more. So the main reason why not, IMHO, is that in most cases it's simply superfluous - you end up spending design/coding/unit testing/qa/debugging/etc time on something you don't need. Smart.

2. Some of the implementations are really clean and neat (see, for example, the 4th in Jon Skeet's list, which is the one I used to use before I understood I don't need it).

3. Inertia - Most of us are still in the C++ state of mind that there is no other way to really "Ensure a class only has one instance, and provide a global point of access to it." .

Why not? (Ilan Assayag's blog)