Wednesday, March 28, 2007

I don't know what this guy is on - but I want some...

This is really mind-boggling. If this thing is for real, I want to know what this guy takes - and I want some for myself...

Oren Zarif is one of these people who claim to have exceptional, supernatural and X-Files-like powers. Apparently, he likes to get it really hard. According to YNET, he has contacted Mike Tyson, proposing a fight. Zarif, who weighs 68 kg and is as far from being a boxing talent as one can be, claims to be able to box with his eyes closed. He proposes a one-on-one, with his eyes completely covered. If Zarif loses, he will pay Tyson $5M. If he wins, Tyson will have to donate the same amount to the US government for the victims of global terror.

For my English readers, the letter sent to Tyson, in passable English, can be found here. Source - YNET.

WTF ?!?!?

A Million Dollar Contest - For Super-Geeks

Netflix is a very large (largest?) DVD rental company. One of their most important assets is Cinematch, a movie recommendation system developed in-house. Cinematch's purpose is to predict, based on a user's previous movie ratings, the rating that user would give to other movies. The result, of course, is a recommendation of movies that are most likely to suit the user's taste.

Apparently, Cinematch works pretty well, but the guys at Netflix would like it to work even better. So they came up with the Netflix Prize: encourage developers and researchers to come up with an algorithm that improves the quality of Cinematch's rating prediction significantly enough (by at least 10%), by promising a prize of $1,000,000. To make it even more interesting, and since the whole contest spans at least 5 years, there is a yearly prize of $50,000 which will be given to the best solution each year.

Being a Machine Learning freak, I find this contest GREAT! I also like the rules of the contest a lot. Basically, the winning algorithm will have to be made publicly available at the end of the contest.

Initiatives of this kind are so great because they encourage the development of new concepts and algorithms which, even if they don't win the first prize, might very well be helpful for various other purposes. I think the best comparison is Fermat's Last Theorem. In his will, Paul Wolfskehl established a prize of 100,000 marks (at the time) for whoever would be able to prove or disprove Fermat's Last Theorem. This generated a huge interest in the subject, which resulted in an incredible wealth of new ideas and whole new areas of mathematics being discovered and researched to this day. I doubt that the Netflix Prize will have the same effect, but I do believe it will give Machine Learning a well-deserved boost.

As far as Netflix is concerned - they can only win from it. The news about this contest will inevitably increase their exposure. If the contest succeeds, and someone manages to provide significantly better results - it will be worth much more than $1M to them. If nobody manages to win the contest, then they can heartily claim to be using the best movie-matching algorithm the human brain could come up with to date. Either way, it's a win-win situation for them.

It's a very difficult task, but I think I'm going to give it a try, as far as time permits...

If you're interested, following are the Terms and Conditions in a Nutshell:

Contest begins October 2, 2006 and continues through at least October 2, 2011.

Contest is open to anyone, anywhere (except certain countries listed below).

You have to register to enter.

Once you register and agree to these Rules, you’ll have access to the Contest training data and qualifying test sets.

To qualify for the $1,000,000 Grand Prize, the accuracy of your submitted predictions on the qualifying set must be at least 10% better than the accuracy Cinematch can achieve on the same training data set at the start of the Contest.

To qualify for a year’s $50,000 Progress Prize the accuracy of any of your submitted predictions that year must be less than or equal to the accuracy value established by the judges the preceding year.

To win and take home either prize, your qualifying submissions must have the largest accuracy improvement verified by the Contest judges, you must share your method with (and non-exclusively license it to) Netflix, and you must describe to the world how you did it and why it works.

For more detailed information, check out the Netflix Prize page.
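A note on what "accuracy" means in the rules above: the Netflix Prize scores submissions by root mean squared error (RMSE) between predicted and actual ratings, so lower is better (which is why the Progress Prize rule says "less than or equal to"). Here is a minimal Python sketch of how the improvement over a baseline could be computed. The function names and the tiny rating lists are made up for illustration; they are not Netflix data or Netflix code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rating lists must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement_percent(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction of the candidate over the baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    actual = [4, 3, 5, 1, 2]
    baseline = [3.5, 3.5, 4.0, 2.5, 3.0]   # stand-in for the existing system
    candidate = [3.8, 3.2, 4.6, 1.6, 2.4]  # stand-in for a new algorithm

    gain = improvement_percent(rmse(baseline, actual), rmse(candidate, actual))
    print(f"Improvement over baseline: {gain:.1f}%")
    print("Would qualify for the Grand Prize" if gain >= 10.0 else "Not there yet")
```

The real contest computes this on a hidden part of the qualifying set, so a local number like this is only a rough guide.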

Yahoo! Mail to become unlimited in size?

According to YNET, Yahoo! announced that as of May 2007 they will gradually update all Yahoo! Mail accounts to unlimited mailbox sizes. I couldn't find any additional source mentioning this, but I trust YNET didn't make this up.

This isn't surprising, of course. Ever since Google introduced GMail, it was obvious that at some point someone would announce unlimited mailbox sizes. Maybe the surprising thing is that the first company to come out with such an announcement is Yahoo!, which has been lagging behind Google on most of the previous advances.

This is, of course, a very welcome announcement, hopefully to be followed by all the other main webmail providers.

Cool!

P.S.: I think that with this new development, it's time for the ISPs to decide whether they want to continue supplying email services to their customers. If they do - then they should dramatically improve the level of their service (speed and mailbox size). Otherwise, they're really making fools of themselves.

Sunday, March 25, 2007

Joke By Code

I haven't blogged much lately, I know, sorry. Between work, kids, buying a house (!!!!!) and a million other things, I just couldn't find a minute.

Anyway, today I was searching for some good books about SQL Server 2005 programming. I have a lot of experience with SQL 2000, but it's about time I got to know the little baby (who's way past the 'baby' stage and would be better called a teenager) a little more.

So I came across Murach's SQL Server 2005 for Developers and, like everyone, I started skimming through the 6 customer reviews. All reviews were really good, except the last one, which seems to have been written by a particularly disappointed customer. So I decided to have a look at the 3 comments on his review. Here is what I found (quote):

SELECT OneHalfBrain
FROM Name
WHERE LName = 'Husain'

results: NULL

It made me laugh (the reviewer is called Munawer Husain, so I assume it was meant personally against him, and not in some stupid racist direction).

Tuesday, March 13, 2007

Multithreaded Computation Support in Matlab - FINALLY!!!

A couple of weeks ago, Mathworks released a new version of Matlab (R2007a). They have, at last, added support for multithreaded computations.
A LOT of visits to my blog come from searches like "matlab dual core", which surprisingly puts a post of mine very high in the results list. The thing is that, especially now with the increased usage of dual- and quad-core machines, running heavy Matlab computations with a single thread is just a waste of resources. Apparently, the guys at Mathworks were listening to the users, and added support for multithreaded calculations. It seems to be managed under the hood somehow, and requires changes to the preferences. From the release notes:

"If you run MATLAB on a multiple-CPU system (multiprocessor or multicore), use a new preference to enable multithreaded computation. This can increase MATLAB performance for element-wise and BLAS library computations.
By default the preference is not set, so you must set it to enable multithreaded computation. With the preference enabled, MATLAB automatically specifies the recommended number of computational threads, although you can change that value. On AMD-based Linux platforms, MATLAB supports multithreaded computation, but requires an extra step to change the default BLAS."

Doesn't sound like a very nice way to do it, and it certainly lacks user control, but it's a start. I haven't upgraded to the new version yet, so I can't speak from experience. Also, the new release seems to include improvements to the Distributed Computing Toolbox, which also sound very interesting.

Sunday, March 11, 2007

Google Image Search API updated (again...)

What can I say - the HTML format was changed again.
You can download the complete code from my article on CodeProject.
If you just want the update - I just had to change the Regex file, so you can simply replace its content with the following:


imagesRegex: (dyn\x2EImg\x28\x22[^\x22]*\x22,\x22[^\x22]*\x22,\x22(?<code>[^\x22]*)\x22,\x22(?<imgurl>[^\x22]*)\x22,\x22(?<width>[^\x22]*)\x22,\x22(?<height>[^\x22]*)\x22,[^\x29]*)
dataRegex: (?<width>[0-9,]*)\s+x\s+(?<height>[0-9,]*)\s+(pixels\s+){0,1}-\s+(?<size>[0-9,]*)(k)
totalResultsRegex: (?<upperLimit>upperLimit>(\s)*)(?<lastResult>[0-9,]*)([^=])*=(?<maxLimit>maxLimit>(\s)*)(?<totalResultsAvailable>[0-9,]*)
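
For anyone who wants to see what one of these patterns actually captures, here is a minimal sketch in Python (not the C# from the article). It uses the dataRegex above with one change: .NET writes named groups as (?<name>...), while Python's re module needs (?P<name>...). The function name and the sample caption strings are my own assumptions about what the text next to a thumbnail looks like, not something taken from Google's pages.

```python
import re

# dataRegex from above, with .NET-style (?<name>...) groups rewritten
# as Python-style (?P<name>...) groups. Otherwise unchanged.
DATA_REGEX = re.compile(
    r"(?P<width>[0-9,]*)\s+x\s+(?P<height>[0-9,]*)\s+(pixels\s+){0,1}-\s+(?P<size>[0-9,]*)(k)"
)


def parse_image_data(caption):
    """Return (width, height, size_in_k) as ints, or None if nothing usable matches."""
    match = DATA_REGEX.search(caption)
    if match is None:
        return None
    # The pattern allows thousands separators, so strip commas before converting.
    fields = [match.group(name).replace(",", "") for name in ("width", "height", "size")]
    if not all(field.isdigit() for field in fields):
        return None  # the pattern permits empty groups; treat those as no match
    width, height, size_k = (int(field) for field in fields)
    return width, height, size_k


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    print(parse_image_data("800 x 600 pixels - 45k"))
    print(parse_image_data("1,024 x 768 - 120k"))
```

The other two patterns can be translated the same way, as long as every named group is rewritten.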

Thursday, March 01, 2007

Is it the end of Internet Freedom? A possible workaround...

Politicians are trying to add more and more limits to the freedom and anonymity we enjoy over the Internet. A few months ago, they wanted to force talkbackers to identify themselves - a death sentence for talkbacks. Now, they are seriously talking about forcing users to identify themselves when using adult websites.
It's a pity really. Instead of searching for ways to improve the quality of the service they provide us, they try to castrate the Internet into something more manageable (for them). Also, the philosophy behind this law they are trying to pass is so anti-democratic and anti-privacy that it makes me want to cry.

Anyway, there might be a way around this. Check out Tor. It's a technology that basically routes web requests (or any other TCP-based communication) through various nodes in an encrypted form, making it almost impossible to track down the original user.
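
To give a feel for the "routes through various nodes in an encrypted form" part, here is a toy Python sketch of the layering idea: the sender wraps the message in one encryption layer per relay, and each relay can peel off only its own layer. This is NOT Tor's actual protocol or code, and the XOR "cipher" below is deliberately simplistic and not secure. The relay keys and function names are made up for illustration.

```python
import hashlib


def keystream(key, length):
    """Derive `length` bytes from `key` by hashing a counter (toy, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(data, key):
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message, relay_keys):
    """Sender: apply the exit relay's layer first and the entry relay's layer last."""
    for key in reversed(relay_keys):
        message = xor_layer(message, key)
    return message


def route(onion, relay_keys):
    """Relays in path order: each one peels off only the layer it holds the key for."""
    for key in relay_keys:
        onion = xor_layer(onion, key)
    return onion


if __name__ == "__main__":
    # Hypothetical per-relay keys; a real system negotiates these per connection.
    keys = [b"entry-relay-key", b"middle-relay-key", b"exit-relay-key"]
    request = b"GET /index.html"

    onion = wrap(request, keys)
    print(onion != request)                # the wrapped request is unreadable
    print(route(onion, keys) == request)   # only the full path recovers it
```

The point of the layering is that no single relay sees both who sent the request and what it contains.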