Why not? (Ilan Assayag's blog): September 2006

Saturday, September 30, 2006

Moving to feedburner!

I'm moving my RSS feed to feedburner. Please make sure you subscribe to: http://feeds.feedburner.com/WhyNotilanAssayagsBlog . I think I succeeded in redirecting feeds from the old RSS/Atom URLs to feedburner automatically. If you experience any problems - please let me know.

The Multi-Tasking Myth - Revisited

Jeff Atwood strongly recommends avoiding multitasking. He even backs it up with studies and citations from people who have already proved they know what they're talking about.

In principle, I agree with him, but I would like to set one exception (in French they say "it's the exception that proves the rule"). The exception refers to research projects. Very often, I find myself immersed in some aspect of a research project. I encounter some problem, but no matter how hard I try, I just can't find a way out. I'm stuck! When this happens, the best way out (at least for me) is to step back. I give my brain a rest, letting it work on other things. This could be anything - some simple programming task, reading a book, whatever. The important thing is to do something that's as far as possible from my initial task. Then, after a couple of hours, it hits me (usually when I'm in the shower...) - I suddenly understand what I've been missing, and what I must do to get out of my current situation and move on.

So as far as I am concerned - when I have a complicated research project to work on, I always have a backup, low-priority task, preferably a very simple one, to which I can switch in case of stall. The switch costs, of course, but at least I don't find myself investing a lot of energy getting nowhere.

In all other cases, context switching is a huge time-waster and usually causes more harm than good, no doubt about that!

P.S: I'm not talking about literally doing 2 things together (talking on the phone and surfing for example). This is something that always degrades the quality of both actions. I admit - sometimes I do that, but then I'm willing to pay the price...

Friday, September 29, 2006

Wireless Security - the difference between Alfred the Optimist, Luba the Ostrich, Uninformed Max and Sacha Paranoia

Many people use wireless routers at home. It lets you use all your computers without the hassle of having cables all around the house, it's cheap, easy to install, and usually works pretty well. Very few people, however, think seriously about security issues regarding their home wireless networks. Even those who do think about it tend to believe their network is made secure by some relatively-easy-to-hack tricks. Others, not concerned by security issues, might have half their bandwidth used up by their neighbors without knowing it. The most shocking part of all this is that many enterprise IT managers make the same mistakes with their enterprise wireless networks. My purpose today is to shed some light on the vulnerabilities and on how they can be, and are being, handled.

First, let's remember that a wireless network is, well - it's a network and it's wireless (duh!). Being wireless means that as long as you are in range, you can read packets sent from the access point and send packets to it. Whether you'll be able to understand the packets or whether the access point will do something with packets sent to it, is another problem. Yet remember - it's all up there in the air!

Second, let's look at the possible vulnerabilities:

Bandwidth "theft"
Network intrusion
Denial Of Service (DOS)

Bandwidth "theft"

Refers to other people (usually neighbors) using your wireless network to access the internet. Since most ISPs charge by bandwidth, if somebody is using your internet connection at the same time as you are - then you get less bandwidth for yourself. I "stole" bandwidth several times during the war when I was staying at other people's houses (at some point I started wandering around the country with my own router to stop doing that). As far as I know, it's not really called "stealing", since the air is public domain, so you can't claim anything about packets floating around. But still, if you have someone constantly using your bandwidth, it can become pretty annoying.

Network intrusion

All browsers, firewalls, etc. use the notion of "trusted" and "untrusted" zones. The idea is that in your trusted zone you have computers you trust will not try to do you any harm, whereas the untrusted zone (everything else) can consist of any computer in the world, including some that might be very mean to you, if you just give them a chance. Before the wireless era, things were pretty simple. Anything inside the company/house is trusted, anything else is not. Yet now, when you can connect to the network without even being physically inside the building, it's more complicated. You could, obviously, configure your firewall to treat your wireless network as an untrusted zone. This is good, and is actually the way to go if you're using a publicly available wireless network service (say at the coffee shop or in the mall). However, if you are at home, and have your printer connected to one computer and want to print from another computer, it's much simpler to have all your home network as a trusted zone (and that's just one example, of course). But this means that any computer connected to your home wireless network will be trusted, even your neighbor's downstairs (unless you protect yourself, see below).

Denial Of Service (DOS)

Refers to a general way of attacking a target, such that some important resource becomes unavailable. The idea is basically to perform a huge amount of communication with the target, taking up all its resources for yourself, thus denying them to other, legitimate users of the resource. There are various means of performing DOS attacks on a wireless network, resulting in disruption of legitimate usage of the network. Actually, wireless networks are very poorly protected against such attacks. Anyway, I'll keep this issue out of the discussion for today.

OK, so we're focussing on bandwidth theft and network intrusion. How do we avoid them? Well, let me introduce you to a few friends of mine: Alfred the Optimist, Luba the Ostrich, Uninformed Max and Sacha Paranoia...

Alfred the Optimist

Alfred believes people are fundamentally good. He might not even realize there are any vulnerabilities in using a wireless network - why would anyone want to do something bad to his network? Why would someone use his bandwidth, if that person could pay for it himself? Alfred installed the router on his own, keeping the factory settings without changing anything except what was absolutely necessary to actually connect to the internet. He has been experiencing some slowness in his internet connection, especially at night, when he knows his neighbor likes to download illegal music files. But, heck, you know, that's how it goes with the internet - sometimes it's faster, sometimes it's slower. Yeah, yeah, he's got occasional annoying pop-ups and he has to reboot 3 times a day otherwise it takes 7 minutes to load notepad, but who doesn't have a few problems once in a while? (Note: Most Alfreds don't use any firewall for the exact same reasons, and don't understand how the salesperson managed to convince them to buy that anti-virus license...)

Luba the Ostrich

Luba is a computer programmer. She's not an expert in networking, nor in security, but she's got her BSc and understands both the problems of bandwidth theft and network intrusion. She knows there are various protection methods, but she's smart enough to know everything comes with a price - if you're going to secure everything on your network, you will have to pay in performance! And Luba doesn't like to pay in performance!!! So she pokes around in her router's configuration options and sees it's possible to disable broadcasting of the SSID. She searches a bit about that, maybe tries it out, and then understands that once the SSID is not broadcast, her network cannot be found. Hey, that's cool! If nobody knows my network exists, they won't be able to connect to it - so I'm safe. Luba knows, of course, that if this were enough of a solution, then there wouldn't be so many other options in the security tab of her router. But as we said, she's an ostrich, so she keeps living in denial. One day, Luba's conscience starts to bother her - maybe my network is not secure enough? So she goes back to poking around in her router's configuration settings, and then she discovers you can configure an access list - a fixed list of MAC addresses that may connect to the network. Coooool - she quickly runs "ipconfig /all" on all her computers, writes down all her wireless adapters' MAC addresses and fills in the list. That's it - she can now sleep peacefully, certain she's protected against both threats.

The truth is that in most cases, Luba will be fine with either of these options (especially with both). Yes, you read correctly, in most of the cases, not all. And here is why:

Not broadcasting the SSID doesn't do anything to prevent someone from connecting to the network. All routers have a default SSID name. If Luba didn't change her router's SSID, then there is a good chance that many people have networks with the same SSID as she has (all the other Lubas out there). So if I used to be connected to a network with that SSID, and I find myself in the area of Luba's house, my laptop will automatically connect to her network, although she doesn't broadcast her SSID.
Even if she did change her SSID, the fact that it is not broadcast doesn't mean that it is not visible. Whenever Luba's own PC connects to the network, the frames it sends carry her SSID (unencrypted). So if I use some kind of wireless sniffer (links at the bottom), I can easily discover any wireless network currently in use, including Luba's.
Similar to the SSID, in an unsecured wireless network, the MAC addresses are also transferred unencrypted, so once I've caught a valid packet with my sniffer, I can very easily spoof the MAC address and use hers instead of mine in my packets.
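To make the last point concrete, here is a toy model of a MAC access list in Python. It doesn't touch any real wireless hardware - the frame dictionaries, the addresses and the function names are all made up for illustration. The only thing it tries to show is that the router checks an address the sender writes himself.

```python
# Toy model of a router's MAC access list. Everything here is hypothetical:
# frames are plain dictionaries and the addresses are invented.
ALLOWED_MACS = {"00:11:22:33:44:55"}  # Luba's wireless adapter


def router_accepts(frame):
    """The router only sees the source address the sender chose to write."""
    return frame["src_mac"].lower() in ALLOWED_MACS


def sniff_source_macs(frames):
    """Source addresses travel unencrypted, so a passive listener can collect them."""
    return {frame["src_mac"].lower() for frame in frames}


# Traffic overheard from Luba's PC.
overheard = [{"src_mac": "00:11:22:33:44:55", "payload": "..."}]

# The attacker's frame with his own address is rejected...
attacker_frame = {"src_mac": "66:77:88:99:AA:BB", "payload": "hello"}
print(router_accepts(attacker_frame))

# ...but the same frame with a sniffed address written into it is accepted.
stolen_mac = next(iter(sniff_source_macs(overheard)))
spoofed_frame = dict(attacker_frame, src_mac=stolen_mac)
print(router_accepts(spoofed_frame))
```

The access list filters on a value that anyone in range can read and copy, which is why it only stops people who aren't willing to make the effort.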

We must admit - there is no real chance anyone will make much effort to run a sniffer and then spoof her MAC address just to steal some bandwidth. So as far as bandwidth theft is concerned, with both SSID broadcasting disabled and the use of an access list, Luba is practically immune. However, is she protected against intrusions? Well, partly - if the intruders just try to enter any wireless network to create havoc - they won't be willing to make much effort (there are too many Alfreds out there to waste their time on Luba). However, if she's concerned about people trying to get specifically into her network, then they are most likely to be ready to make the effort, and then Luba is in trouble...

Uninformed Max

Max used to be an Ostrich, like Luba. One day, when he had his head deep under the ground, a big fat bull came around. The bull was horny, and, well you know... Anyway, since then Max has become a little bit more cautious (and has some difficulties sitting down...). He uses a specific SSID, changed his default administrator password on his router, has disabled SSID broadcasting and uses a MAC access list. In addition, he wants his network to be secure. Looking at all the possibilities, he chooses the one that seems the simplest, yet secure - WEP. He feels good and cosy with his own secured wireless network.

The problem with Max, is that nobody told him that WEP is sooo not-secure that it's just a waste of time and energy. Look at this and this and please don't miss this...

Sacha Paranoia

Sacha is one of a kind. He changes his SSID once a week and uses a 127 character password for his administrator account that changes each time he logs in. Asking him whether he disabled SSID broadcasting or uses MAC access lists could cause him to rip your head off just due to the insinuation that he might have missed that. Sacha knows everything there is to know about WEP's shortcomings. Until last year he was using WPA. Now he uses WPA2. Although his network is as secure as currently possible, he's had problems sleeping at night, imagining minuscule ET's wandering around his precious network. Last week he almost had a heart-attack, when only one of his 3 firewalls succeeded in blocking an attack. Actually, it wasn't a real attack - for some reason his firewall thinks his fax is a malicious software enemy. So Sacha disconnected his router completely. Actually, he's disconnected his computer from the internet altogether. From the power source as well - to be on the safe side. He is now working in his garden, watering his flowers. His hard-disk in his back-pocket, just in case...

Interesting software

Netstumbler - For mapping active wireless networks

Airsnort - Can be used to extract the WEP encryption key

Ethereal - Network protocol analyzer

Thursday, September 28, 2006

Working at Google

Joel Spolsky points to Steve Yegge's view on Agile. I've personally never had the chance to work using Agile methodologies - it was inconceivable in the companies I used to work for, and now I'm mostly involved in long-term research projects, where development is minimal.

Anyway, I think that the subject of Steve's post is a little misleading. In addition to depicting why he thinks Agile is mostly bad (comparing it to good and bad cholesterol), it also provides a very interesting insight into what it is like to work at Google. Like many of the commenters, I think in most companies it's not a realistic approach. Nevertheless, I'm pretty sure every company could learn a few things from them, and at least adopt some of it.

Anyway, I really enjoyed reading that post - I usually avoid reading long posts, but I'm happy I didn't skip this one :-)

Wednesday, September 27, 2006

Hypocrites

I'd like to add my 2 cents to Roy's post about U2U's decision not to work with Israelis.

The way I see it, politics and business are two very different things, and mixing them together is wrong. The basis for this statement is that:

That's not the way to settle things. If anything, I think that good business, which leads to good human interaction, could be much more helpful in promoting political ideas than not doing business at all.
It's a collective punishment, much like the very things Mr. Uyttersprot is trying to oppose.

China is one of the most oppressive countries against its own people - would you stop doing business with all Chinese companies, stop buying anything Made in China, because of that?

The American use of Guantánamo Bay detainment camp has been controversial to say the least - would you stop doing business with all American companies because of that?

Saudi Arabia's government states that all citizens must be Muslim. The religious minorities are not allowed to have their churches or temples or pray in public. There are also very harsh laws oppressing women. I suggest you stop buying all petroleum-based products...

Nike has admitted abusing its workers in the Asian continent. Not buying their products would make MUCH more sense, since it would directly hurt the abuser in question. Yet, I'd bet you'll find at least one pair of Nike's in Mr. Uyttersprot's closet.

I'm not saying Mr. Uyttersprot is right with his political claims. All I'm saying is that there are many aspects to this war; every Israeli agrees with some of the decisions taken during the war and disagrees with others. Like Roy, I believe it's almost impossible to pass judgement when you haven't been there, let alone when you don't know all the facts. In any case, and even if you do subscribe to the political views of Mr. Uyttersprot, I don't think it's wise to mix business with politics. And if you do - please, have the decency to remain consistent with your claims: stop doing business with China, USA, Saudi Arabia and almost any other country on the globe (including Belgium, which has a history it would rather forget, oppressing the people in Congo). And if you don't, then you're just a hypocrite.

[Note: I was born in Belgium, live in Israel and hold both nationalities.]

Tuesday, September 26, 2006

Search enhancement - take 2

Yesterday I suggested a way to leverage the properties of blogs to enhance searches. This made me think of some additional enhancements possible, if you try to use the type of content at hand.

Let me explain:

Current search engines employ 3 basic information sources to retrieve the most relevant results:

The actual text in the web page.
The structure of the text (i.e. headlines vs. simple content, various HTML tags, etc).
The structure of the web - we all know about PageRank.

What is common to all 3 sources is that they don't seriously differentiate between various types of web pages (enterprise vs. private homepages, blogs, newsgroups, news channels, e-commerce, etc.). This isn't completely accurate, since it is possible to perform searches that only search in specific sources of information (newsgroups, blogs, etc.), but that's not the point.

What I feel is missing is an intelligent usage of the structure of each type of web page.

Some examples:

Blogs, news channels - why not implement a voting mechanism (similar to PageRank or other) that takes into consideration the number of talkbacks, the number of registered RSS clients, etc.? Even if a webpage has a low PageRank, if it has a large number of commenters or many RSS subscribers, from many different places, it may indicate that the site is much more important than it may seem.
Newsgroups - number of threads, number of users, etc. I don't see many links to newsgroups on the web in general. Yet, some are very active. Then why not use additional, newsgroups-specific parameters to measure a newsgroup's relevance?
e-commerce - Many price-comparison web sites allow their users to rate products and write comments about them. I'm sure that the more products are being rated and the more raters there are, the more chances there are that the price-comparison site is a good one. Even more so - products/vendors with many/high ratings across price-comparison sites should be promoted.
Professional Magazines - Many give users the possibility to rank and write comments about products (like CNET), others give users the possibility to give feedback about the quality of the articles (like MSDN). Why not use that as part of the retrieval process?
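To illustrate the blog example, here is a minimal sketch in Python of how such type-specific signals might be mixed into a ranking score. It is only a thought experiment: the function name, the weights and the numbers are all made up, and I have no idea how any real search engine would actually combine them.

```python
import math


def blog_score(link_score, comments, subscribers, w_comments=0.3, w_subs=0.5):
    """Combine a link-based score with blog-specific engagement signals.

    The weights are arbitrary placeholders. log1p dampens very large counts,
    so 10,000 subscribers don't count 100x more than 100 subscribers.
    """
    return (link_score
            + w_comments * math.log1p(comments)
            + w_subs * math.log1p(subscribers))


# A blog with few incoming links but a large, active readership...
popular_but_unlinked = blog_score(link_score=1.0, comments=400, subscribers=5000)
# ...versus a well-linked blog that hardly anyone reads or comments on.
linked_but_quiet = blog_score(link_score=3.0, comments=2, subscribers=10)

print(popular_but_unlinked > linked_but_quiet)
```

With these made-up weights the actively read blog ends up ahead of the better-linked one, which is exactly the kind of correction I have in mind.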

What do you think?

Monday, September 25, 2006

Leveraging blogs for search enhancement

Here is an idea - any remarks/suggestions would be welcome.

One of the most important differences between a blog and a simple web site, is that a blog changes all the time. The same person could post 10 different posts on 10 different subjects. Each would be related to that person, yet the only place where they get all connected together is in the blog.

Let's imagine I met someone some time ago, we had a nice little chat, and I gave him my business card. Unfortunately, when he sent his pants to dry-cleaning, he forgot to take out the card (sounds familiar?). He remembers my first name is Ilan, remembers I live in the northern part of Israel, that I work on my thesis and that I like making soups.

Had he remembered my family name is Assayag, things would have been fine - a search for "ilan assayag" on Google brings my blog as one of the first entries. Yet he doesn't remember my family name...

What would be really cool would be to search for everything you know about the person, and get some kind of aggregated results. So in this case, that person could search for "ilan soup thesis north israel". The search engine would know that search results from blogs should be aggregated in an intelligent manner. A simple solution could be to look at ALL the posts from the same blog as if they were ONE single document (although I suppose you could come up with better solutions). If this were possible, then I'm pretty sure that a search like "ilan soup thesis north israel" would return the correct answer, even though each element of the query is found in a different post within the same blog.
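Here is a minimal sketch of the "one single document per blog" idea in Python. The mini-corpus, the blog names and the two matching functions are invented for the example, and the matching is a naive all-terms-must-appear test rather than anything a real search engine does.

```python
from collections import defaultdict

# Hypothetical mini-corpus: (blog name, post text).
posts = [
    ("ilan-blog", "Ilan here moving my feed"),
    ("ilan-blog", "my favourite soup recipe for winter"),
    ("ilan-blog", "progress on my thesis"),
    ("ilan-blog", "life in the north of israel"),
    ("other-blog", "ilan ramon and israel"),
    ("other-blog", "a soup kitchen story"),
]


def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def match_per_post(query, posts):
    """Blogs that have at least one single post containing every query term."""
    q = tokens(query)
    return sorted({blog for blog, text in posts if q <= tokens(text)})


def match_per_blog(query, posts):
    """Blogs whose posts, taken together as one document, contain every term."""
    merged = defaultdict(set)
    for blog, text in posts:
        merged[blog] |= tokens(text)
    q = tokens(query)
    return sorted(blog for blog, terms in merged.items() if q <= terms)


query = "ilan soup thesis north israel"
print(match_per_post(query, posts))  # []
print(match_per_blog(query, posts))  # ['ilan-blog']
```

No single post contains all five terms, so matching post by post finds nothing, while matching on the merged blog finds the right one.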

Note: I know that it would be simpler to search for "ilan blog", which, in this case, would be sufficient, although I'm pretty sure that if my name was something else, such a simple query would just not be enough.

Sunday, September 24, 2006

Three Ways to Inject Your Code into Another Process

Look at this amazing article!!! (by Robert Kuster)

Matlab R2006a impressions

Until recently I was using Matlab 6.5 (R13) for both my academic and professional research projects. I tried using Matlab 7 (R14) several times (university, friends, etc.) but the UI was so slow I never seriously considered upgrading.

Along came Matlab R2006a, which I am now using a lot. The UI is still much slower than 6.5, but not slow enough to become a showstopper. Here is a short summary of my impressions of R2006a, with emphasis on the new features interesting to my work:

Overall performance:

Calculations - I really don't know. Supposedly, some of the functions were optimized (I ran into at least one or two) but I really couldn't say I felt a difference (though that doesn't mean there isn't one).
UI - Version 6.5 (R13) was very stable, fast and with a relatively low memory footprint. Version 7.0 (R14) was such a nightmare in terms of UI performance that it was mostly unusable. Version R2006a is much better than version 7. It's not as responsive as 6.5 (especially startup), but it's absolutely usable. Also, it has some cool added features (enhanced debugging, tabbed windows, etc.) that make it worthwhile.

Support Vector Machines - It's the first version with an SVM library. It's extendible and rather simple to use, but is slower than SVMLight. Also, it seems to me that in terms of accuracy it is also inferior to SVMLight (although I'm sure it depends a lot on the problem at hand). I have the feeling that their SVM support is still in its infancy, and I dare assume it will evolve with their next versions. For now, it's just a set of functions in the Bioinformatics toolbox... Anyway, if you want a simple way to use SVMLight from Matlab, try this out.
Genetic Algorithms - The GA library is a real kick! It has all the features you can find in other libraries available online, and more. Also, the UI is very nice and easy to configure. It also generates graphs out-of-the-box that are much more informative than those I've been using so far. The downside is that the documentation sucks (especially when using bit strings) and that I had to debug and solve 2 bugs in the original Matlab code (!!!) to make it work properly with bit strings (I may post on this some time in the future). This being behind me, I think it's a very good library in all important ways: seems to do a good exploration of the search space (not too susceptible to local minima), fast execution, easy configuration, cool graphs, etc.
Distributed Computing - I only had a short time to test it at a remote location. Yet I couldn't make it work. The architecture is very simple, but the various components kept getting stuck and losing connection for no apparent reason.

P.S: I'm posting this using Windows Live Writer - interesting to see how this turns out...

Thursday, September 21, 2006

Google Image Search API - now available in Chinese

I'm happy to say that my little article on CodeProject has been translated into Chinese. I know several languages, but Chinese is not one of them, so I have no way of judging the translation :-)
Anyway, I'd like to thank hidecloud for the effort and time put into this, and I hope it will now be helpful for some more people.

Wednesday, September 20, 2006

Windows Updates Has Got Some Nerve !!!

Windows Update is a great, indispensable, tool. With all the threats around the Internet, using a non-updated machine is simply stupid.
Yet, it has one huge problem - lack of configurability !!!
The configuration options are so limited I don't know whether to cry or to laugh.

I am using a server to run various CPU/memory/time-intensive tasks. Last week I was running a task that was taking 100% CPU for about a day and a half (and it had several more days to run), when suddenly "poof", the server rebooted by itself. I looked at the monitor in the morning, unable to understand why I was getting the logon screen, when I knew I had this quite heavy task running. A quick look at the Event Viewer showed me that the computer rebooted at 3:30 AM. Right before that, I saw the Windows Update Agent, priding himself on having finished downloading updates and being ready to install them. Since this had already happened to me a long time ago, I knew exactly what happened - my server is (sorry - it WAS) configured to run the updates automatically, and some of the updates required rebooting the computer. So Mister Windows Update decided, without asking my opinion, to simply reboot and kill everything that's in its way. Mister Windows Update is so vain that He doesn't even need to let you know He did it - if you really must know, just figure it out implicitly from the Event Viewer.
BTW, when I said it happened to me a long time ago - it wasn't really to me. It happened at a customer (a very large investment bank): suddenly their server rebooted in the middle of the night, without prior notice. Since the server was dedicated to the application I was responsible for, I had to figure out what was going on remotely (I was in Tel Aviv, the server in London, and many security constraints in the middle). It took me much longer back then, since I had no clue what was going on, and like a polite developer my first assumption was that something was really wrong with my software. It took several iterations, me feeling guilty and stupid not to know how badly my own software can behave, until I found the real culprit.

Why can't we configure things like this (each, of course, should have many possibilities):

Reboot only for high-risk security updates
Announce the reboot X time in advance, both on the machine and by mail/SMS/whatever
Download automatically only updates related to X,Y,Z applications - the rest doesn't interest me
Install automatically only security updates - the rest leave to me
Download at time X, send email/SMS/whatever in order to give the administrator the time to run the install in an orderly fashion, if he doesn't then install at time Y
Before installing, send a message to the currently logged-in user, to let him overrule (i.e. postpone the install), send an email/SMS/whatever
Install/reboot only if the computer has been idle for some time
If you are configured to reboot, and really have to kill the rest of the process population, have the decency to leave a note behind, in the form of mail/SMS/Event Viewer/whatever
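Just to show how little it would take, here is a rough sketch in Python of a few of these options expressed as a policy. None of this corresponds to any real Windows Update setting or API - the class names, fields and action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpdatePolicy:
    # Hypothetical options, loosely following the wish list above.
    reboot_only_for_high_risk: bool = True
    min_idle_minutes: int = 60   # reboot only if the machine has been idle this long
    notice_minutes: int = 30     # announce the reboot this long in advance


@dataclass
class PendingUpdate:
    name: str
    high_risk_security: bool
    needs_reboot: bool


def plan(policy, update, idle_minutes):
    """Return the list of actions an updater would take under this policy."""
    actions = ["install"]
    if not update.needs_reboot:
        return actions
    if policy.reboot_only_for_high_risk and not update.high_risk_security:
        return actions + ["defer reboot", "notify admin"]
    if idle_minutes < policy.min_idle_minutes:
        return actions + ["defer reboot", "notify admin"]
    return actions + [
        f"announce reboot in {policy.notice_minutes} min",
        "reboot",
        "log reason",
    ]


policy = UpdatePolicy()
patch = PendingUpdate("security-fix", high_risk_security=True, needs_reboot=True)

# A server busy with a long-running task (idle for 0 minutes) is not rebooted.
print(plan(policy, patch, idle_minutes=0))
# The same update on a machine that has been idle for two hours.
print(plan(policy, patch, idle_minutes=120))
```

With a policy like this, my server would have installed the update, postponed the reboot and sent me a message, instead of killing a task that had been running for a day and a half.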

I can go on forever, but you get the point.

Thursday, September 14, 2006

Some interesting search engines

A couple of weeks ago I posted about a way to get more results from your search engines (i.e. beyond the maximal number of results for a query). I then referred to Ask Jeeves, which has been around for quite some time, and provides a nice query refinement interface.
I was asked whether I know of any more search engines which provide more features than those offered by the 3 giants (Google/Yahoo!/Live). Well, here's my personal list (in no particular order):

A9 - Simultaneously runs your search on various data sources. For example, you can enter a query and it will search for images, books, web and wikipedia all at once. The results are then presented, all on one page, but in a separate column for each data source.
Vivisimo - Clustering of query results into groups of related results.
iBoogie - Clustering of query results into groups of related results.
Mooter - Clustering of query results into groups of related results. The clusters are presented in a graphical form, and when you drill-down you get a similar tree-like interface like the others.
DogPile - Simultaneously searches all major search engines (Google, Yahoo!, MSN, Ask Jeeves and more). The results are presented in aggregated form, with an indication of which engines returned it.
KartOO - Clustering of query results into groups of related results. The clusters are presented in a graphical form. Their UI may seem somewhat complex, but it's pretty cool once you get used to it.
WiseNut - Groups related results into categories. It generates far fewer categories than other similar engines.
Infonetware - Clustering of query results into groups of related results.
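For the metasearch engines in this list (like DogPile), the aggregation step can be pictured with a small Python sketch. I don't know how any of them really merges results - the engine names, URLs and the ordering rule below are invented for illustration.

```python
def merge_results(results_by_engine):
    """Merge per-engine ranked URL lists, recording which engines returned each URL.

    Ordering rule (an assumption): URLs returned by more engines come first,
    ties are broken by the best rank seen, then alphabetically.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            entry = merged.setdefault(url, {"engines": [], "best_rank": rank})
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    return sorted(
        merged.items(),
        key=lambda item: (-len(item[1]["engines"]), item[1]["best_rank"], item[0]),
    )


# Hypothetical ranked results from two engines for the same query.
results = {
    "engine_a": ["x.example", "y.example"],
    "engine_b": ["y.example", "z.example"],
}

for url, info in merge_results(results):
    print(url, info["engines"])
```

Here y.example is listed first because both engines returned it, followed by the results that only one engine found.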

If you have anything to add, please let me know :-)

Wednesday, September 13, 2006

What is my mother tongue?

I was born in Antwerp, Belgium. Although the local formal language is Flemish, at home and with my friends I spoke exclusively French. Most lessons at school were in Flemish, and so was the language we had to use on the streets.
At the age of 13 we moved to Israel. With my family I kept speaking French, but everything else was Hebrew. Well, except for my baby sister, who was 3 when we moved to Israel, so with her I speak a mix of French and Hebrew. All my studies from high-school to Masters Degree were in Hebrew.
For 10 years now I have been working with computers, all the scientific and other professional material I read is in English, 99% of my mail communication is in English, I like to read books in English, etc.
I type at least 10x faster in English than any other language.
I read the fastest in English, then French and rather slowly in Hebrew.
There is no language I can (hand) write such that people other than me would be able to decipher what I wrote. And that's when I write slowly.
When I write (or type) in French, the mistakes I make suit a 13-year old.
I dream usually in either French or Hebrew, but once in a while I even dream in English.
I speak Hebrew with my wife and French with my child.
People say I have a Hebrew accent in French, a French-Belgian accent in Hebrew and a mixed French/Hebrew accent in English.

Now you tell me - what is my mother tongue?

Wednesday, September 06, 2006

Finding Great Developers

Joel has once again published a very interesting article, this time about Finding Great Developers.
The most interesting part, to me, was his description of their internship and how they guide it to result in perfect recruitments.
For my part, I have been responsible for recruiting several employees in the past. In the company I worked for, this kind of long-term pipeline was not an option - I usually had only a few weeks or a few months to find the person I needed, and I always needed somebody with experience.
In my experience, and in contrast with Joel's, employee referrals have always proved to be the best source. The most important part, though, is to interview the referring employee thoroughly, before even starting the process with the candidate. I have always seen that when you talk to someone who knows the candidate, if you ask the right questions, you can get a very accurate idea of what to expect. The key (and it's IMHO the basis for everything related to recruiting) is to ask open questions that will force the referrer to tell you the things you want to hear.
I agree that there is a conflict of interest, when you give a bonus to your employee for a recruit, but if you know your employee, trust her, and interview her thoroughly about the referred friend, you should be OK. Also, although non-compete agreements are sometimes used here in Israel as well, as far as I know, the legal situation here makes it very difficult to enforce them (they are usually overruled by a law for free employment).
