Thursday, December 13, 2007

Chain Letters Are Worse Than Viruses

Yesterday I got yet another Chain Letter. In case you don't know - I HATE CHAIN LETTERS!!!

This time, it was a pseudo virus alert. When I complained to the sender, urging her to stop sending me chain letters, she said: "But it's a VIRUS alert! I can't take the chance you'll miss it!"

What people don't understand is that if 50% of users thought like this sender, there would be no Internet. Nada, zip, nil, rien du tout, nothing, niets, kadachat...

Just do the math:

Let's assume 50% of the people believe in this nonsense and forward such a virus alert to 20 other people.

Now assume it takes on average 5 minutes from the moment you get the email until you forward it (some a little more, some a little less).

Since each forwarding round multiplies the number of emails by 10 (50% of recipients times 20 new recipients each), and there are 12 five-minute rounds in an hour, this worst-case scenario floods the Net with 10^12 emails after 1 hour. Keeping the 5-minute window I assumed above, the last round alone amounts to more than 3 BILLION emails per second. No need to calculate how many emails would be sent after 2 hours - there would be no Internet by then.
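Here is the same arithmetic as a tiny C# sketch (the 50%, 20 recipients and 5-minute figures are just the assumptions above):

using System;

class ChainLetterMath
{
    static void Main()
    {
        // Assumptions from the post: every 5 minutes, half of the recipients forward
        // the mail to 20 people each, so each round multiplies the mail count by 10.
        double emails = 1;
        for (int round = 1; round <= 12; round++)       // 12 five-minute rounds = 1 hour
        {
            emails = emails * 0.5 * 20;                 // 50% forward it to 20 people each
            Console.WriteLine("After {0,3} minutes: {1:N0} emails in this round ({2:N0} per second)",
                              round * 5, emails, emails / 300);
        }
        // The last round alone is 10^12 emails in 5 minutes - over 3 billion per second.
    }
}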

Fortunately for the Internet, most users know better than to forward Chain Letters...

Wednesday, December 12, 2007

AI AI AI AI AI ...

My curiosity has been aroused big time. Apparently, there is a new kind of malware that uses natural language dialogue to extract information from users, under the guise of a flirtatious conversation. It's called CyberLover and was apparently developed in Russia. According to PC Tools, this program can converse with a human for 30 minutes without the victim realizing he's talking to a robot.

The Turing Test has officially been passed...

Read the original warning issued by PC Tools, or an article at ComputerWorld.

AMAZING!!!

Tuesday, December 04, 2007

A Killer IDE Feature I Would Like To See

A long time ago I used to use BugTrapper - an application that sits on the production server, records every instruction, and makes it possible to "play" everything back, step by step. It's a great tool for analyzing bugs and especially crashes "post mortem", as long as they're not related to some obscure race condition (the overhead of using BugTrapper often masks the race in the first place).

I think there definitely is a case for applications like this, and the fact that Mutek hasn't been able to push itself farther into developers' awareness is quite surprising to me.

The feature I would like to see in an IDE is a mini-BugTrapper. I would like the IDE to be able to record up to a certain number of instructions (say up to 100,000) during debugging. How many times did you stop at some breakpoint and suddenly realize you should have put this breakpoint a little earlier in the flow? You really need to see the value of some parameter, or the actually executed flow, a few steps back - but you can't. The only thing the IDE gives you is the static current call stack - which just isn't enough. You want to know what variables caused you to get into that current call stack, but that's beyond the scope of the IDE's features.

That's, IMHO, a killer feature that could significantly cut debugging time.

Sunday, December 02, 2007

Is GOTO always evil?

The other day I decided to use a "goto" statement in my C# code. It was a difficult decision to make, and was primarily motivated by the need for readability.
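For illustration, a minimal, made-up sketch (not the actual code) of a typical case where a goto can read better than a flag - breaking out of nested loops:

// Made-up sketch: searching a 2D grid, where a single goto to an exit label
// reads better than a "found" flag tested in both loop conditions.
int[,] grid = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
int target = 5;
int foundRow = -1, foundCol = -1;

for (int row = 0; row < grid.GetLength(0); row++)
{
    for (int col = 0; col < grid.GetLength(1); col++)
    {
        if (grid[row, col] == target)
        {
            foundRow = row;
            foundCol = col;
            goto Found;   // one clear exit point out of both loops
        }
    }
}

Found:
// If the value isn't found, we simply fall through to here with (-1,-1).
Console.WriteLine("Found at ({0},{1})", foundRow, foundCol);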

Apparently, Linus Torvalds also thinks there are cases where "goto" is appropriate, so I'm in good company...

Thanks to Scott Hanselman for pointing to this thread.

How do you Exactly Approximate??

Let me quote from MSDN about System.Double:

"A mathematical or comparison operation that uses a floating-point number might not yield the same result if a decimal number is used because the floating-point number might not exactly approximate the decimal number."

I found this funny, go figure...

Thursday, November 15, 2007

How consistent should a blogger be?

I recently got back to reading Jeff Atwood's blog, after a long pause on my part. I was very surprised to see advertisements there, especially since I remembered him discussing this issue in the past - and deciding against it. Of course, one can always change one's mind. But still, I find it quite funny. Reading the last few comments on that post, you can see him agreeing that advertising on a blog is "like advertising on your business card".

Apparently he now doesn't mind advertising on his business card ...

By the way - I found Jon Galloway's comment hilarious...

Tuesday, November 13, 2007

VB Grrrrrr...

Yet another MSDN and .NET WTF:

I'm currently porting some temporary code that was written in VB into our C# infrastructure (let's skip the details). Anyway, there are parts of the VB code that I would like to group, either because they are currently not being used, or because for some reason I want to hide them from view and get back to them at some later phase.

Obviously, a #Region directive seems like the best solution.

My VB is quite rusty, but using common sense I tried the same syntax I'm used to in C#. But alas - it didn't work. So I searched MSDN - hey, it should work! Well, VB has this little difference that the identifier_string MUST exist and it must be surrounded by quotation marks. OK, no biggy, I usually put it there anyway.

But why doesn't it work?

Well, there is this tiny little limitation, hidden from you if you rely solely on MSDN, that "'#Region' and '#End Region' statements are not valid within method bodies."

Which raises two questions:

1. Why, in Heaven's name, should there be such a difference between C# and VB? It's just a freaking compilation directive!

2. Assuming there is some justified reason for it (which I doubt - I guess it's just a non-implemented feature) - would it hurt someone to put this info in MSDN so I won't have to go crazy trying to figure out why it doesn't work?!?!
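For the record, this is the C# behavior I expected VB to mimic - a trivial made-up snippet with a #region inside a method body, which compiles just fine in C#:

public int ComputeSomething()
{
    #region Temporary code I want to hide for now
    int temp = 42;   // imagine a page of code you don't want to look at right now
    #endregion

    return temp;
}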

And now for a personal note to self:

I relied solely on Intellisense and Resharper to tell me the code is wrong. I'm using the C#-only version of Resharper, so I have no idea whether the full version would have been more helpful. Anyway - had I compiled from the beginning (or at least looked at the message Intellisense gave me) - I would have seen much sooner why it doesn't work...

Monday, November 12, 2007

Could the World become a Better Place?

Shai Agassi is a person who doesn't have to prove himself - he's done it 400,000,000 ($) times and much more. I had the chance of working for him at TopTier, though at the time he was mostly in the US and I don't think he'd remember me.

He's now investing all his power in Project Better Place, with the goal of transforming our fuel-based cars into electric cars, by providing the necessary infrastructure and business plan.

Will he succeed? I sincerely hope so. I am also willing to be one of his first customers for a pilot plan in Israel. If anyone is to succeed in such a project - it's him.

Some say his real goal is to own the software that will handle the whole system. Well, I think that if he succeeds in this project, it makes complete sense and there's no harm in it. If making this world a Better Place means Agassi will own the most important software in history - so be it!

In any case, I admire him for being ready to risk his most valuable asset - his reputation - for this huge and very risky project.

If you want to keep track of what's going on, I suggest you read his blog.

Good luck Shai!

Citations of the day

By Henry Kissinger:

  • Military men are just dumb stupid animals to be used as pawns in foreign policy.
  • If everybody is your enemy, then you are not paranoid.
  • Power is the ultimate aphrodisiac.
  • Corrupt politicians make the other ten percent look bad.

Source: Wikipedia

Sunday, November 11, 2007

XP Printing issues - Grrrr...!!!

D., my colleague, came to me today asking for help with printing problems. At some point I decided to log off, but then I couldn't log on again. The machine kept saying that the current time is different from the network time. I couldn't log on with any of our domain users (including Administrator), so I had to log on with the local Admin user. Looking at the time, it looked fine. After digging and searching for 15 minutes, we discovered that the date differed - while running some simulation, D. had had to move his clock one day ahead and had forgotten to move it back.

Once we moved the clock back to the right day, he could print flawlessly!

This gives rise to many questions:

1. Why couldn't he print when his date was wrong? I understand there are synchronization issues involved - but in 2007 (almost 2008) these things shouldn't happen! Sometimes you really cannot have all your computers synchronized. That's life!

2. Why did the error talk about time and not date?

3. Why couldn't we even log on with a domain user?

Grrrrrrrrrrrr........!@#!@#!@#@!@#!@#!@#

Wednesday, November 07, 2007

Supermarket 2.0

A paraphrase of Web 2.0 to real life - GREAT!

Monday, November 05, 2007

Parameterized SELECT TOP

Just a small tip:

Say you want to run a query that returns the first @X results of some query, such that @X is a parameter.

The simple "SELECT TOP @X ..." statement doesn't work. I found this weird, because I know for a fact that SQL 2005 supports using a parameter in the TOP clause (as opposed to SQL 2000 where you had to use either dynamic queries or the SET ROWCOUNT statement).

Solution - surround the parameter with parentheses, like so:

"SELECT TOP (@X) ..."

Something to remember...
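And for next time as well - from C#/ADO.NET the parameter is then passed like any other. A minimal sketch (the table, the Id column and the connection string are made up; it needs using System.Data.SqlClient):

// Minimal sketch - requires: using System; using System.Data.SqlClient;
string connectionString = "...your connection string...";

using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(
           "SELECT TOP (@X) * FROM MyTable ORDER BY Id", connection))
{
    command.Parameters.AddWithValue("@X", 10);   // return the first 10 rows
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine(reader[0]);
        }
    }
}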

Monday, October 29, 2007

We're Hiring - Searching for a Quantitative Analyst

For the past few years I've been working for a global investment firm. At the beginning I worked part-time as a freelancer, but recently the company opened an R&D facility in Herzliya and now I'm a full time employee again. We're now looking for exceptionally talented quantitative analysts, to participate in the development, validation & documentation of risk & analytical applications and processes.

Our firm has been engaged for over 10 years in the research and development of systematic trading models for the global financial markets. The developed models are used in the actual money management of clients' portfolios.

The role will involve working on new risk management models, data extrapolation, and trading model analysis and development.

Requirements:

  • MS or PhD in Math or Physics
  • Exposure to stochastic calculus
  • Excellent general modeling skills
  • Grasp of PDEs and Monte Carlo
  • Experience in statistical data analysis or signal processing is a plus
  • Experience in Matlab / C# is a plus
  • Financial knowledge is a plus

To apply, fax your resume and a cover letter to 09-970-7329 or email careers@eaglets.com

Grrrr... I hate bad documentation! or: "An INSERT EXEC statement cannot be nested"

I recently had to make extensive use of a feature in SQL I had seldom used before - inserting the results of an EXEC statement directly into a table. Sounds like a reasonable thing to do, right? As it happens, I had a bunch of stored procedures calling each other, using temporary tables, and finally filling a "final" table. I finished writing the whole thing, and then - BABOOM! - it doesn't work, because "An INSERT EXEC statement cannot be nested". Huh? What?
Why? Why didn't you say so before?

Following is a quote from SQL Server Books Online for "INSERT (Transact-SQL)" :

execute_statement

Is any valid EXECUTE statement that returns data with SELECT or READTEXT statements. The SELECT statement cannot contain a CTE.

If execute_statement is used with INSERT, each result set must be compatible with the columns in the table or in column_list.

execute_statement can be used to execute stored procedures on the same server or a remote server. The procedure in the remote server is executed, and the result sets are returned to the local server and loaded into the table in the local server.

If execute_statement returns data with the READTEXT statement, each READTEXT statement can return a maximum of 1 MB (1024 KB) of data. execute_statement can also be used with extended procedures. execute_statement inserts the data returned by the main thread of the extended procedure; however, output from threads other than the main thread are not inserted.

Did you see any clear mention of this limitation???

Thursday, October 25, 2007

SQL Table Variables - Limitations

I've come to really like table variables in SQL. They are a nice and natural evolution of database programming, and, when used smartly, they can greatly improve performance compared to using temporary tables.

Today I've discovered that table variables cannot be truncated. I had no idea...

Anyway, I've found a cool source of more information about table variables, including some stuff that is not documented, or not well documented, in Books Online. Check here and here.

When you think of it - it makes sense. As far as I know, table variables are memory constructs whereas temporary tables are kept in tempdb. Truncating a table simply removes the references to the table's pages from the database. However, if the table is only kept in memory and not in any actual database - it has no meaning...

Wednesday, October 24, 2007

I type at 48.1 wpm and 97.2% accuracy

I took the typing test everyone talks about, here are the results:

Number of words typed: 144
Test duration: 3 min
Speed: 48.1 words/min. (240 keystrokes/min.)
Error penalty: 4
Accuracy: 97.2%

It's good, but I've got a lot to improve. I must say it made me quite nervous - causing me to make several time-consuming errors (the high accuracy didn't come for free). Also, for me, writing code goes way faster (even before counting in Intellisense, Resharper and the like) than text I've never seen before. There are some words, or parts of words, that I can just type at lightning speed (new, class, for, if, the, select, from, ...)

Confession

Scott Hanselman, THE blogger, has a very interesting post about The Five Second Rule. Actually, the comments are the most interesting...

I was born in Belgium, moved to Israel at 13 and have been living here ever since (almost 20 years). I've never heard of this rule.

I must, however, confess that when I was a little child I would scratch chewing gum off the ground and eat it with no second thought. I did this until one day my sister Tali saw me, utterly shocked, and gave me an important lesson in hygiene...

Tuesday, October 16, 2007

I miss my const methods

Yet another thing to add to the list of things I miss in C# - the ability to define a method in a class that is guaranteed (by the compiler!) to not change the state of the object (like const methods in C++).
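The closest thing I know of in C# is a workaround rather than an equivalent - handing callers a read-only interface. A minimal sketch with made-up types; note that nothing stops the class's own methods from mutating state, which is exactly the guarantee a C++ const method would give:

// Sketch of a common C# workaround (NOT an equivalent of C++ const methods):
// expose a read-only view of the object through an interface.
public interface IReadOnlyAccount
{
    decimal Balance { get; }
}

public class Account : IReadOnlyAccount
{
    private decimal balance;

    public decimal Balance
    {
        get { return balance; }
    }

    public void Deposit(decimal amount)
    {
        balance += amount;
    }

    // A caller holding an IReadOnlyAccount can't call Deposit, but the compiler
    // still won't stop a "query" method in this class from modifying 'balance'.
}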

I don't understand the implementation of System.Double

Take a close look at the implementation of System.Double (use Reflector or any other disassembler).

Look, for example, at the following excerpt:

public const double PositiveInfinity = (double)1.0 / (double)0.0;
public const double NaN = (double)1.0 / (double)0.0;

Huh?!?! What ?!!? Why ?!?!


Of course, I had to run and check:

double myNaN = double.NaN;
bool isInf = (double.IsPositiveInfinity(myNaN));
// isInf is set to FALSE

double myInf = double.PositiveInfinity;
bool isNaN = (double.IsNaN(myInf));
// isNaN is set to FALSE

It doesn't end there. Look at the implementation of IsNaN:

public static bool IsNaN(double d)
{
    return (d != d);
}

All in all, I must confess I can't make any sense of all this!


I promise to publish if/when I do.

Google - what are you cooking

[UPDATE: Apparently I was a little bit late at noticing. See here for a news article on the subject dated October 15th.]

Only a week ago, my GMail account had an allocation of less than 3GB. Two days ago it crossed the 3GB line, and now it's almost 3.5GB.

Call me crazy, but I think there is something behind this. Is Google about to (at last!) remove the "Beta" label from this great product? Are they about to take over some big data storage firm?

What's up big G ? You're up to something, I know it...

Why Double.IsNaN

If you have a double value set to NaN you can check that it's NaN only through the Double.IsNaN method. MSDN quote: "Use IsNaN to determine whether a value is not a number. It is not possible to determine whether a value is not a number by comparing it to another value equal to NaN."

In other words, there is no guarantee that (Double.NaN == Double.NaN).
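A quick check (sketch) shows what that means in practice:

double nan = Double.NaN;
Console.WriteLine(nan == nan);           // False - NaN never compares equal, not even to itself
Console.WriteLine(Double.IsNaN(nan));    // True  - the only reliable test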

A simple question - WHY ?

Sunday, October 14, 2007

Wine is better than Water

I just got this in my mailbox. I rarely read, let alone forward, this kind of mail. Yet, it came from a person I trust who avoids sending stupidities (thanks Yuval!), and I really feel I ought to pay it forward...

"

As Ben Franklin said:
In wine there is wisdom, in beer there is freedom,
in water there is bacteria.


In a number of carefully controlled trials,
scientists have demonstrated that if we drink 1 liter
of water each day, at the end of the year we would
have absorbed more than 1 kilo of Escherichia coli,
(E. coli) - bacteria found in feces. In other words,
we are consuming 1 kilo of poop.


However, we do NOT run that risk when drinking wine
& beer (or tequila, rum, whiskey or other liquor)
because alcohol has to go through a purification
process of boiling, filtering and/or fermenting.

Remember: Water = Poop, Wine = Health

Therefore, it's better to drink wine and talk
stupid, than to drink water and be full of shit.

There is no need to thank me for this valuable
information: I'm doing it as a public service.

"

Defining a Startup company

First, I would like to draw your attention to this post (and blog in general): 

Defining a Startup company « First steps in the Hi-Tech sales world .

It's maintained by a guy (I think) who writes anonymously about his experience in the Hi-Tech world. I usually don't fancy anonymous writers, but this is the exception that proves the rule - by being anonymous he allows himself to be free of any politically-correct barriers, without being afraid of being sacked for writing what he thinks about his work and employer. The result is well-written, interesting, thoughtful, funny, and still written with style and manners. Give it a try, you'll like it!

Anyway, I'd like to address the issue of defining a Startup company. Inevitably, I had to check what Wikipedia had to say. The emphasis there is more on the company's age ("limited operating history"), size and potential growth. In general, they are more concerned with the financial aspect of the matter - a Startup company is a company that is highly scalable and can produce huge ROI very fast. They also mention that it usually focuses on creating a product, rather than providing services.

In The Beginner's other post and comments, people related more to the atmosphere at work:

Not Startup (anymore):

"When you have a purchase order system in place that is automated and requires the signatures of half the company (company size is irrelevant here) for a purchase of a paper clip."

"Feeling like a small bolt in a large machine where you can no longer influence anything"

"people covering their asses by adding the whole company on every email they send"

"working your ass for some idiot that has a reserved parking space that is never occupied before 9:30 am or after 5:30 pm, who is earning 5 times your pay and who’s only donation to the company is when he keeps his mouth shut and doesn’t generate any trouble by releasing another stupid decision"

Still Startup:

"A small place, good atmosphere, people working all to one goal."

"the added bonuses - a pool table, cool offices, or some other crap"

IMHO, there is no single definition of a Startup company. I've seen companies of 15 people that could never be considered a Startup, and I've worked together with almost 150 other employees at a superb Startup. In every company you have smart people, and those who are smarter; you have "the company ass-hole"; and after you've worked at a place long enough - some things start bugging you badly, Startup or not.

So where should the line be drawn?

Well, I think that there are many parameters involved. No Startup fits all of these parameters, and some of the parameters are purposely vague, but in general, a company that complies with most of these parameters is likely to be, to some degree, a Startup company. Almost sounds like a Fuzzy Logic rule...

So here is my (incomplete) list, in no particular order:

  1. Size - a Startup company must be small. At least it should feel small to the employees as well as to the outside world. I've worked for a company with almost 150 employees who still felt small, but that's rather uncommon, and very difficult to achieve.
  2. Age - the company and employees must feel young. It doesn't mean you can't have workers of 50+. Even one young, important and charismatic individual could do the job. I've also seen companies where people of 30+ were called "the eldest", and still the company felt old. So don't go firing all your gurus with 20 years of experience now!
  3. Speed - in a Startup company, things move fast. Decisions are taken on the spot. The word "process" is only used as a running instance of a computer program! The overall mood can switch from one extremity to the other in a few minutes. People walk fast in the corridors. There is action in the air!
  4. Belonging -  people feel they belong to something great. There is a purpose to your work. When someone asks you something, you help him because you want him to succeed, knowing it will help us. You'll delve into his problem until completion, even if you'll have to solve it yourself.
  5. Believing - employees must know and believe in the company's goals and capability to reach these goals. They must also feel that they are making a difference - they must believe in their ability to impact the company's success.
  6. Potential - it should be evident that, in all matters (company size, sales, products, ...) - there is much more to be done than that which has been done. There should be a feeling of huge potential. Also, employees should have the feeling that once part of this potential will be achieved, they will be significantly compensated financially (stock options, bonuses, etc.).

(Note: I feel I haven't finished this post - I'll complete this list later on...)

Monday, September 24, 2007

I'm not much smarter than a baboon - ouch!

Yesterday I went to Microsoft's software architects users group where Yaniv Hakim, CTO of eWave, discussed the architecture of their eGen application generator.

The lecture was very interesting and very well presented, but I would like to point to his very last slide.

The question is - do you believe everything you read on the Internet?

Thursday, September 20, 2007

Some things speak for themselves

Today I had to meet someone in the village where I live. While waiting for him, my eyes wandered over a public message board. One of the messages looked like that (translated from Hebrew):

"Pretty girl will cook and clean"

Hmm, is that the only thing she is proposing?

Tuesday, September 18, 2007

Impersonating data types

I've already talked about things I'm missing in C#/CLR (here and here). I'd like to add a new concept I would call "impersonation".

Let's start with a simple example:

Say I have a method that calculates the sum of some array of doubles (double[]), like so:

public static double Sum(double[] list)
{
    double sum = 0.0;
    for (int i = 0; i < list.Length; i++)
    {
        sum += list[i];
    }
    return sum;
}

Now I would like to use this method with an input that is not an array of doubles, but some other list of double values (say List<double>, a RowCollection, whatever).


The straightforward solution is to change the Sum method to receive a collection instead of an array, or to make it a generic method. But that's when I can change it! What if the method belongs to some class that is not under my jurisdiction?


In this case, the only solution I can think of is to create a whole new array of doubles based on the collection you want to work with. But of course, that's not what I want! First, it costs time and memory. Second, if the Sum method also changed the actual values in the array - you'd be lost. In my case, it's the first restriction that bugs me.


If the input variable were something other than an array (some class), in most cases you would be able to solve it by inheriting from it and feeding the child class to the method. But then again, it wouldn't work for all cases. First, because the lack of multiple inheritance (sigh) could prevent such a solution. Second, because the input variable might be of some sealed type.


My solution (proposal) - impersonation.


A possible implementation would be an attribute on the class that indicates to the compiler that an object of this type could be used to substitute some other type. For example:

[Impersonate(typeof(double[]))]
public class MyCollection
{
    // Implement the parts of double[] you are going to need
}

Note that in theory, one could implement in MyCollection only those parts of Array that might indeed be called. Then the call to the method could be done as so:

MyCollection myCollection = new MyCollection();
double sum = Sum(myCollection);

And, of course, the compiler should be able to understand the attribute and not generate any errors. Unless, of course, the Sum method uses some features of Array that are not implemented in MyCollection.


Lastly, I would allow multiple impersonations on the same class.

Sunday, September 16, 2007

Asymmetric Accessor Accessibility in C#

Today I wanted to define a property whose two accessors have different accessibility levels. That is, I wanted a property to have public 'get' access and private 'set' access. I remembered that in .NET 2.0 this became possible, but didn't remember the exact syntax. So I first tried my intuition, which was:

public DataTable MyTable
{
    get { return myTable; }
}

private DataTable MyTable
{
    set { myTable = value; }
}

Unfortunately, I was wrong. A quick search revealed the secret syntax:

public DataTable MyTable
{
    get { return myTable; }
    private set { myTable = value; }
}

More details in the MSDN entry.

Wednesday, September 12, 2007

Why I urge you to NOT buy an LG laptop in Israel !!!

About a year ago I posted a very enthusiastic post about my laptop, concluding that "I'm pretty sure that next time I buy a laptop, LG will be on the top of my potential brands!".

Today, my friends, I admit I had no idea what I was talking about, because until that moment I hadn't run into any problem with the machine. Now that I have, and had to get (no) help from their laptop lab, I must say the exact contrary - DO NOT buy LG laptops in Israel, if you want your warranty to have any meaning!

Huh? What? What happened???

I'll try to make it short:

1. For some time now I'd had 2 main issues with the laptop:

   a. It would suddenly freeze, leaving me no alternative but to hard-reboot it.

   b. The mouse buttons didn't work well.

2. At some point I couldn't work with it anymore, so I had to bring it to the lab. The machine is 2 years old, the warranty is for 3 years - should be a no-brainer.

3. Since the lab is at a remote location, I brought it to a computer shop/lab that works with LG and provides a service of sending/receiving stuff to/from the lab. So far so good.

4. After almost 3 weeks, the laptop was back.

    a. They fixed the mouse-buttons.

    b. The freezing was answered by the so-annoying "you must reinstall Windows".

    c. When I got home I realized that the battery didn't work anymore!!! When I sent it to the lab, the battery was still able to provide me with ~2 hours of work. Now I got it back completely broken: 0 juice, 0 recharge!!!

5. I sent it back to the lab, got it back after again almost 3 weeks, with the even more frustrating answer that the warranty is not valid for batteries. I talked to the lab director, but bumped into a solid wall.

6. Last night I reinstalled Windows. Guess what - when I worked with it now, it froze once again...

I rest my case...

Thursday, August 30, 2007

Running the same application as Windows Application and Console Application

Say you have a Windows Application (i.e. with GUI and all), and you want to add to it the option to be executed as a Console Application as well. Here are the two steps necessary:

1. Adding Console Application support

You must create your application as a Windows Application. Then open the Project Properties, and under the Application tab set the "Output type" to "Console Application". Once this is set, you must update your Main function to support dual application types. By default, when you create your application as a Windows application, your Main looks like this:

static void Main(string[] args)
{
    Application.EnableVisualStyles();
    Application.SetCompatibleTextRenderingDefault(false);
    Application.Run(new Form1());
}



To support both Console Application and Windows Application, you must change it. For example, you can decide that if it receives as sole argument the string "OpenForm" it will open as a Windows Application, otherwise as a simple Console Application (in which case you'll probably want to take care of the arguments). So you should change your Main as so:


static void Main(string[] args)
{
    if (args.Length == 1 && args[0] == "OpenForm")
    {
        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        Application.Run(new Form1());
    }
    else
    {
        // TODO: Take care of arguments
        Console.WriteLine("This is a console application");
    }
}



2. Remove the annoying background console


The above code is nice, but has one annoying side-effect - when you open the application as a Windows Application, you constantly have a console open in the background (closing it will close your form). To work around this you must relaunch the application (i.e. create a new process) with the console hidden. This is done like so:


// Requires: using System.Diagnostics; and using System.Windows.Forms;
static void Main(string[] args)
{
    if (args.Length == 0)
    {
        Process current = Process.GetCurrentProcess();
        string fileName = current.MainModule.FileName;
        ProcessStartInfo si = new ProcessStartInfo(fileName, "OpenForm");

        si.CreateNoWindow = true;
        si.RedirectStandardError = true;
        si.RedirectStandardOutput = true;
        si.UseShellExecute = false;
        Process.Start(si);
    }
    else if (args.Length == 1 && args[0] == "OpenForm")
    {
        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        Application.Run(new Form1());
    }
    else
    {
        // TODO: Take care of arguments
        Console.WriteLine("This is a console application");
    }
}



Explanation:

The assumption is that if you want the application to run as a Windows Application, it doesn't need any argument (though this could also be done easily if required). So if the application starts with no arguments, it will create a new process of itself (through the MainModule we extract the running process' file name), but this time with no console in the background (all the settings on the ProcessStartInfo object). This time, it is called with an argument that knows to load the form (the "OpenForm" argument).



The result is an application that can be run both as Windows Application and Console Application. When you run it as a Windows Application there is a console that opens and closes immediately in the background, but that's all.


Thanks to Ami Bar for helping me with the second step.

Maintainability

A short while ago Oren Eini stated that in his opinion, the only metric that counts is Maintainability. He even gave an excellent way to measure it.

;-)

Ever since I read these two posts, I've been trying to find ways to concretely prove him wrong. The farthest I got was with Performance, where you may have good reasons to improve performance at the expense of harming maintainability. But in good code, this is done only if the improved performance is part of the requirement. In that case, you've got to measure the code against some other alternative that still meets the requirement - leaving you once again with measuring maintainability only.

My tiny addition to Oren's statement would be - as long as the requirements are fulfilled.

Sunday, August 26, 2007

The C# ?? operator

I needed the C# ?? operator today. This operator returns the left-hand operand if it's not null, or the right-hand operand otherwise. I remembered it had the ? sign in it, but couldn't remember the exact syntax. Anyway - I'm keeping it here in my blog for next time...
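So, for next time, a minimal example (made-up values):

string configuredName = null;               // e.g. nothing found in the configuration
string name = configuredName ?? "default";  // left-hand operand if not null, otherwise the right-hand one
Console.WriteLine(name);                    // prints "default"

int? maybeCount = null;
int count = maybeCount ?? 0;                // also handy for nullable value types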

Thursday, August 23, 2007

A possible improvement to my Google Image Search API

I discovered today the following article, which mentions my API. An interesting approach they propose is, if I understand correctly, to use some common .NET class to load the HTML, and somehow extract the images from the HTML. When I wrote the API, some 2-3 years ago, I searched for such a thing but didn't find any - maybe I missed it?

If this works, it could completely remove the API's major liability, which is the dependence on regular expressions. Right now, the API parses the HTML response returned by Google, and when the format of this response changes - the whole thing breaks. On average, since I initially published the API, the response format has changed 1-2 times a year.

The downside, of course, is performance - loading the whole HTML will always be much more CPU and memory intensive than using a regex. Yet, for most applications I guess it's a price that can be paid.

Once I have a few spare hours I'll check it out. Or maybe next time Google changes the response format and I need to dig into it again. We'll see.

Each line of code should do one thing

I was writing some code today that looks like this:

int counter = 0;
while (/* some condition */)
{
    // Do some formatting
    counter++;
    FormattingProcessStatus(counter);
}

Now I could have done it differently like so:
int counter = 0;
while (/* some condition */)
{
    // Do some formatting
    FormattingProcessStatus(++counter);
}

which would have "saved" me one line of code. I hate this - I always have, ever since I started learning C/C++ (over a decade ago), when, while trying to solve an exercise, I discovered a situation where two different compilers generated different results. But today, for the first time, I understood why I hate it so much - it goes against a very basic rule: EACH LINE OF CODE SHOULD DO ONE THING.



This is a very simple rule I don't remember having read anywhere, but it's the basis for readable code. Writing code, and even more so - reading code, requires a lot of brain effort. You need to be able to see the whole architecture, and how that particular object and method fits in. Sometimes you need to keep a whole stack of variables (state) in mind to really understand what's going on, etc. The difference between having to read a line that does one single thing and reading a line that does more - is very big, and complicates the reading of the code exponentially.


So if you want your code to be readable - start by making sure each line does exactly one thing!

Fighting car accidents - my five cents

Recently a family was torn apart when a truck driver smashed into a car, killing the father and daughter and injuring the wife and son. The truck driver had a history of over 190 (!!!) traffic convictions !!!!!!!

Of course, this made a lot of noise, and many people keep asking how someone like that still drives, where have the judges left their sharp brains while judging his cases, etc.

The thing is that there is no concrete incentive to restrain these madmen. One thing that could help would be to change the system completely - instead of issuing the insurance policy on a per-car basis, make it per-driver. That is, if I have a driving insurance policy, it would be valid no matter whose car I drive (much like what already exists for mechanics). In addition, all traffic convictions should be made publicly available. The result would be that companies would avoid hiring people with many convictions - because their insurance policies are more expensive and they are dangerous.

The rules of the market will be such that dangerous drivers will have a really hard time finding jobs (especially when the job involves driving a company vehicle), and that, ladies and gentlemen, is one hell of an incentive!

Of course, it's not without flaws, but I think it's worth investigating further.

Marketing: How to give your clients something valuable without any costs

Yesterday I got a letter from Orange, saying something that reads more or less like this:

"Dear Ilan Assayag,

We are approaching the birthday of your client-ship. On August 21, you will be our client for X years. As such, we would like to give you a present you will appreciate. Therefore, during the whole day of August 21, you can talk to anyone on our network for FREE. That's right, on August 21 you won't pay for any conversation to Orange users!!!

bla, bla, bla..."

The thing is - I got this letter on August 22...

WITH keyword in SQL

A feature I didn't know in SQL 2005: WITH can be used to create ad-hoc table-like entities within a query (they call it CTE for Common Table Expression). Check it out here and here.

Wednesday, August 22, 2007

The most hilarious academic paper ever...

It's pretty old, but I discovered it only a few days ago. Some of Israel's brightest minds (such as Shimon Schocken - former dean of the Efi Arazi School of Computer Science at the Interdisciplinary Center Hertzlia, and Yossi Vardi - one of the most prominent hi-tech entrepreneurs and founder of dozens of companies) joined forces to write a technical paper claiming that Snails Are Faster Than ADSL. The title is funny, the content is hilarious - take the time to read it and enjoy yourself!

Sunday, August 05, 2007

Getting the system uptime in Windows

Here's something I often need, especially when I need to find out when/why some server rebooted...

To get the system uptime, type this:

systeminfo | find "System Up Time:"

And in general, looking at the output of systeminfo is pretty interesting as well, showing stuff like product ID, uptime, type of processor(s), system directory, language and regional settings, physical memory and page file settings, installed hotfixes, and basic network parameters.

Thursday, July 26, 2007

Replacing dates to sortable strings in SQL

Try this to turn a datetime into a sortable, filename-friendly string (e.g. 2007-07-26_153045):

select replace(replace(convert(nvarchar(19),@Date,120), ':',''), ' ', '_')

Monday, July 23, 2007

Yet another Windows WTF

I am now working on a brand-new machine, running Windows XP 64 bit, with 4GB of physical memory. Being a big fan of hibernation, I wanted to set my system to support it. Guess what - it's not supported!

A quick check on Google showed me this:

"

This issue occurs because hibernation is disabled on computers that have more than 4 GB of RAM.
Hibernation requires sufficient disk space to contain the contents of the computer's memory. Performance is poor on a computer that has more than 4 GB of memory and that has support for hibernation. Therefore, Microsoft has disabled support for hibernation on such computers.

"

The source, BTW, is from Microsoft's knowledge base.

Now you tell me - why do they decide for me what performance is unacceptably poor and what is not? If I have 10GB of RAM, I know that hibernation will be slow, and if I choose to use it anyway - it's my decision to make, not MS's!

I haven't tried the workaround proposed yet, we'll see if it helps...

Sunday, July 22, 2007

Enabling xp_cmdshell in SQL Server 2005

Every SQL newbie already knows that the xp_cmdshell system stored procedure is a huge hole in SQL security. Basically, it allows anyone with permission to run it to execute shell commands on the SQL machine. To provide a more secure system, in SQL 2005 this stored procedure is disabled by default (unlike in SQL 2000).

To enable this stored procedure, you should run the following script (for more details about the permissions required see here and here):

exec sp_configure 'show advanced options', 1
go
reconfigure
go
exec sp_configure 'xp_cmdshell', 1
go
reconfigure
go



REMEMBER - this is extremely dangerous and exposes your server to a wide variety of attacks, so be careful!

Thursday, July 19, 2007

New Israeli Blogger - Ami Bar

I strongly recommend you check out the new blog of my friend and colleague Ami Bar. He's just started, but now that we are once again working together I intend to push him into blogging as much as possible. Believe me - this guy knows stuff you'll want to know!

Just to show you why, check out his excellent SmartThreadPool on Codeproject.

Good luck, Ami, and may the Schwartz be with you!

Wednesday, July 18, 2007

Changing the number of maximum Internet connections

I know it's an old story, but every time I need it I have to go and search for this information, so I'm posting it here for later reference.

For a reason I don't know (and don't really care about), Windows limits the number of concurrent Internet connections available (with XP I think you get 2 concurrent connections per server in HTTP 1.1 and 4 in HTTP 1.0).

With contemporary computers and the bandwidths of these days, this limitation is archaic to say the least.

So, if you need to be able to handle more simultaneous connections, you just need to add two entries to the registry. To simplify it, just copy the following lines to a file with a .reg extension and then double-click it. It will change the configuration to allow 50 simultaneous connections ;-)

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings]
"MaxConnectionsPerServer"=dword:00000032
"MaxConnectionsPer1_0Server"=dword:00000032

Tuesday, June 26, 2007

Where does CHKDSK hide its log?

I haven't been blogging for over two months - it's been busy. More on this, perhaps, in another post.

Today I'd like to complain about something that's bothered me for years - where does CHKDSK hide its log file??? I purposely added a link to the utility, because even on the official help page there is no information about that. The thing is that more often than not, you will run chkdsk at a time when you don't need the computer, because it could take a while (one hour in my case). So even if you remain next to the computer - you might miss the few seconds it gives you to view the results, and then you're on your own. Go figure where they're hiding this damn log file! Do I have bad sectors on my hard drive? Yes! No! Perhaps...

Every time I run chkdsk I end up finding the actual log file, but it always takes me a while to remember where to find it.

Anyway, the simplest way to see the results of chkdsk is to check it out in the Event Viewer (Control Panel ==> Administrative Tools ==> Event Viewer). It's very "smartly" located under the "Application" group (putting it under "System" would make it too easy to locate I guess). There you have to search for an "Information" record with source "Winlogon".

Who said Windows XP isn't user-friendly?

Friday, April 27, 2007

Thoughts about evolutionary algorithms

WARNING: This post is likely to be rather long, full of musings with very little concrete data to back them up. I hope it will interest you, but I don't promise anything...

Today I was playing with my little girl, Tamar (6 months), when some questions started to form in my head. When you think of the ultimate goal of each living species, it's rather simple: ensure the existence of the species for as long as possible. This is obtained by "mechanisms" such as reproduction, survival of the fittest, randomization mechanisms (e.g. mutations, weather changes, etc.) and numerous other techniques Mother Nature has to offer.

Scientists have been trying to simulate some of these mechanisms for various purposes for half a century, with varying degrees of success. Genetic Algorithms are the most widely known of them, and have been extensively researched in the past 20 years or so. The basic idea is that you have some objective function you want to optimize, and a large set of parameters to tune. The assumption is that the time it takes to compute every possible set of parameters is computationally prohibitive (unless you're willing to wait 10,000 years until you get the answer). The tuning of the parameters is done by applying operations known from genetics, such as reproduction or crossover (taking some parameters from one solution and some from another - creating a new solution in its own right), mutation (changing some parameter(s) through some randomized mechanism in order to create a new solution), selection (each iteration keeps the best solution sets from previous iterations), and variants thereof. The main issue I would like to focus on is that there is always one objective function, and every solution (i.e. set of parameters) is judged by the score it achieves on that objective function.

Now let's get back to my little girl. As I said, as far as Nature is concerned, my daughter's ultimate purpose, like every human being's, is to ensure the human race will persist for eternity. However, even Mother Nature knew that this is too big a goal to be of any concrete value. Therefore, it has set an infinite amount of sub-goals in our life, each of which brings us closer to the ultimate global goal of Human Kind (be it by a minuscule amount). These sub-goals are, for example, to reproduce, which requires (as a sub-sub-goal) to meet and get acquainted with males, which requires, among other things, knowing how to walk (that's already a sub-sub-sub-goal), which has a precondition of being able to crawl (sub-sub-sub-sub-goal I think).

So Tamar's objective function, for our discussion, is to learn how to crawl. Not thinking about any of these things, I knew she needs to learn how to crawl (and frankly I wanted her to get tired, but that's even more beside the point), so I set her on a mattress, with a blinking toy in front of her to attract her attention.

(I told you it would be long... don't lose patience, we're getting there...)

If we get back to Genetic Algorithms, the objective function of our problem is to learn how to crawl (say we can give a score for crawling quality). However, to reach this goal, I created a new objective function, apparently completely unrelated to the real goal - reaching this annoying blinking toy. She could end up reaching it in a million ways, or never be able to reach it on her own (which is actually what happened, since she kept crawling backwards), but at the end of the day, her mind and body would have learned some new things that will, eventually, serve her to reach the real goal (crawling). In other words - I created a synthetic objective function, with only some vague relationship to the real objective function, with the express intent of improving the chances of, in the long run, optimizing the real objective function (crawling).

Now my question is - how could this approach be applied to Genetic Algorithms in general? How can we synthesize, throughout a GA's run, new objective functions, which will help the GA to reach the ultimate objective function?

Another rather different question that comes to mind - Tamar didn't reach the blinking toy, because she started crawling backwards. I'm no specialist in these matters, but I allow myself to assume that at some level, she has learned something from this experience. Somewhere in her brain, something has registered that the movements she made were wrong (for reaching her goal, i.e. getting the toy), and that she should find another way. She will make the same movements once, twice, maybe thirty times - in the end she will grasp that the solution must be found elsewhere. As far as I know, Genetic Algorithms (and all their variants) are always concerned with "good" solutions (i.e. solutions that get good scores on the objective function). I don't think there is any algorithm that knows how to keep track of "bad" solutions, in order to avoid getting back to solutions close to them. A possible approach could be to run a "positive" GA, which strives to improve the objective function, but always keeps track of the "bad" solutions. Whenever one of the new generation's items is too similar to one of the bad solutions (or some representative of the bad guys), it is disposed of without even checking the objective function, and a new one is created instead. Of course, there is the risk of missing some good solutions that are accidentally similar to the bad ones, but in cases where computing the objective function is an expensive task, it may be worth the tradeoff.
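To make that last idea a bit more concrete, here is a minimal toy sketch (my own illustration, not an existing algorithm): a mutate-and-select loop that keeps an archive of rejected ("bad") candidates and throws away any new candidate that is too similar to one of them, before paying for the objective function. The distance measure, threshold and toy objective are arbitrary assumptions.

using System;
using System.Collections.Generic;

class BadArchiveSketch
{
    const int Dimensions = 8;
    const double SimilarityThreshold = 0.5;   // assumption: "too similar" = distance below this
    static readonly Random rng = new Random(42);

    // Toy objective: higher is better (maximum at the origin). A real problem
    // would plug its own (expensive) function in here.
    static double Objective(double[] x)
    {
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++) sum -= x[i] * x[i];
        return sum;
    }

    static double Distance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.Sqrt(sum);
    }

    static double[] Mutate(double[] parent)
    {
        double[] child = new double[parent.Length];
        for (int i = 0; i < parent.Length; i++) child[i] = parent[i] + (rng.NextDouble() - 0.5);
        return child;
    }

    static void Main()
    {
        double[] current = new double[Dimensions];
        for (int i = 0; i < Dimensions; i++) current[i] = rng.NextDouble() * 10.0 - 5.0;
        double currentScore = Objective(current);
        List<double[]> badArchive = new List<double[]>();

        for (int generation = 0; generation < 1000; generation++)
        {
            double[] candidate = Mutate(current);

            // Reject without evaluating if the candidate is close to a known bad solution.
            bool nearBad = false;
            foreach (double[] bad in badArchive)
            {
                if (Distance(candidate, bad) < SimilarityThreshold) { nearBad = true; break; }
            }
            if (nearBad) continue;

            double score = Objective(candidate);
            if (score > currentScore)
            {
                current = candidate;
                currentScore = score;
            }
            else
            {
                badArchive.Add(candidate);   // remember it, so we avoid its neighborhood later
            }
        }

        Console.WriteLine("Best score found: " + currentScore);
    }
}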

Thursday, April 26, 2007

Trying Photobucket

I'm sick of not being able to upload pictures to my blog. So I've decided to try out Photobucket. We'll see how it goes...

[Photo: Tamar doing a face]

[Photo: Tamar]

[Photo: Gal dancing at her 3rd birthday]

Tuesday, April 17, 2007

Leasing

If you live in Israel, and especially if you're a high-tech professional, you are probably aware of the big noise around the expected change in taxes for leasing cars.

Globes just published a great read - an article that compares this issue to some other, more understandable yet completely virtual scenario. Even if you don't subscribe to the point of view that the taxes should indeed be increased, it's still fun to read.

Another interesting read on this subject is a site by the same author (and others), which explains why the taxes should indeed be increased.

If, like many others, you're trying to figure out the actual monetary value of using a leased car, check this calculator out.

Lastly, although I understand all the reasons why the taxes should indeed be increased, I'm still a big fan of leasing cars. I'm a complete car-dummy, and the feeling that whatever the problem, there is someone who will come within a couple of hours to fix it, is worth a lot more than money. So IMHO, if indeed there is a glitch in the current tax calculation, then justice must be done and the taxes should be updated. I still, however, want to keep the option of using a leased car (as an employee), without being pointed at as if I were a danger to society.

Monday, April 16, 2007

SQL Server Execution Plan - Rantings...

Let me say it out loud: SQL Server Execution Plan SUCKS!

In essence, SSEP (I'm too lazy to keep writing the whole name) is supposed to help you analyze how the SQL Engine performed a query (more specifically, what optimization decisions were taken) in order to alter the query and/or the tables to improve performance. As such, SSEP provides you with a nice graphical interface that shows every element of the execution along with its relative cost. You can then see more properties for each element, which supposedly will help you better understand it.

SSEP in SQL2005 has gone through very few changes since SQL2000, and it's a pity. With a small query, SSEP can help you out - you can easily find the most costly element, analyze it and kill the culprit. With long and complicated queries, however, it's a nightmare. The most prominent issue here is USER EXPERIENCE!!! Let me explain:

In most cases, the first thing you do when you run SSEP is to search for the most costly sub-query, and within that you look for the most costly query element. When you have a long and complicated query (which sometimes is inevitable), you can find yourself "surfing" the execution plan for many long minutes, just searching for the elements of interest, never being actually certain you got them all. You just keep scrolling up/down, left/right and back again, trying to get a grasp of what's going on. There is a zooming feature, but it just isn't enough! There is no easy way to navigate through the data and search for specific elements. I mean - come on, would it have been so difficult to add, for example, a list of elements with their main properties (say object name, cost, and physical and logical operation), allowing you to sort it and jump from that list to the relevant element in the execution plan?

Another USER EXPERIENCE issue is that of the data that comes along with each element. Granted, SSEP is aimed at advanced users (usually DBA level), but would it hurt someone to provide some more explanatory descriptions?

Lastly, if you really want to improve the USER EXPERIENCE, you should guide the user (without forcing her hand) to find the real problem as fast as possible. For instance, you could use colors for the elements, such that costly operations would stand out (color a table scan in red, an index seek in green). Note that these colors should also stand out in the list mentioned above. You could even propose some solutions ad hoc. For instance, if you see a very costly Bookmark operation (used to link between indexes and the actual data in the tables), you could propose adding an index that includes all the required fields, effectively removing the need for the bookmark (because the engine will read the data directly from the index). There are tons of things you could do to make the user's life easier, and help her finish her job faster, so why not?!?

Tuesday, April 10, 2007

Spam the spammer!!!

I just got a spam mail that made me REALLY angry.

It's from a company that calls itself Mailmedia. The message itself, however, was sent by a user at the lombardisoftware.com domain (I'm pretty sure that the genuine Lombardi company has nothing to do with this...).

What upset me the most about this mail was the nerve - it was nothing less than a promotion for businesses to use their services to send spam mails. Yes, that's right - I got a spam mail that is a promotion for a spam company!

But that means there should be some way to communicate with them, right? Unfortunately, there is no website or email address - I would have been more than happy to spend a few hours setting up my own spamming machine to kill their website or their email address. I would, really! But they weren't that stupid - they only provided an Israeli mobile phone number:

050-5281978

(Note, I've searched this phone number up in Google and found only one entry in a Gays forum. This guy seems to be searching for a fatherly type... I have no idea whether this is genuine, or just another frustrated spamee who wants payback.)

Now, my request to every Israeli reading this post - please please please - spam this phone number with as many calls as you can. Let them FEEL how annoying it is to be spammed all day long. Let them LOSE potential clients because the line will always be busy. Make them SUFFER as much as we do from their never-ending spam mails.

Be smart: don't forget to HIDE the calling number when you call them (for Orange you must precede the number with #31#, for HOT you must precede it with *43).

DISCLAIMER: Although I really want as many people as possible to join the spammer-spamming effort, I take no responsibility whatsoever for any discomfort, legal or other issues you may encounter as a result of doing as I just asked.

Sunday, April 08, 2007

On the superfluousness of the Singleton pattern in .NET

We all know the Singleton pattern. Its intent is to "Ensure a class only has one instance, and provide a global point of access to it." (from GOF). There are many possible implementations of it; some are good, others are to be ashamed of. You can find a good review by Jon Skeet, or on Patterns and Practices, where you can also find a successful implementation using a double-checked locking mechanism. By the way, the implementation in "Design Patterns in C#" by Steven John Metsker was utterly disappointing, to say the least.

But this is all assuming you indeed need the Singleton pattern. If you read the Intent, as defined by the GOF, carefully, there is nothing in simply using a static class that does not meet this intent. At the time of writing the Design Patterns book, the GOF didn't have C# or .NET. Back then, they were mostly relying on C++ and Smalltalk. I'm not proficient in Smalltalk, but I know that with C++ there were problems with static and global variables. For instance, it wasn't clear what their order of instantiation would be, nor when the destructors would be called. So when you're working with good old C++, you just have no choice but to use the Singleton pattern. With .NET, however, things are different. Instantiation of static classes has a clear and deterministic behavior. With destruction, of course, it's different, but in most cases where you need a Singleton you'll keep it alive as long as the program is running anyway.

In the Design Patterns book, there are some other "consequences" which could end up as reasons for using the Singleton pattern instead of simply a static class:

  1. Permits refinement of operations and representation - or in other words - a Singleton class can be extended by means of inheritance. In most cases I have encountered, there is no inheritance going on with the Singleton, and if there is, then the actual Singleton is the last in the hierarchy. Even more so - many implementations use sealed classes, either for performance optimizations or simply because the pattern wouldn't work with inherited classes.
  2. Permits a variable number of instances - or in other words - one day, maybe 5 years from now, some freaky newbie will invent a reason to use multiple instances of the Singleton, and you should make it easier for the idiot to make the change. One word: YAGNI!
  3. More flexible than class operations - same as the 2 previous items

    So, why would you use the Singleton pattern in .NET instead of just a static class?

    1. Inheritance - I said in most cases there is no need for it. There is always the exception... You can see one on Jon Skeet's blog.

    2. Destruction - when you want to keep a reference count of the number of consumers and somehow destroy the instance when it's no longer in use. Complicated, especially given the non-deterministic destruction of objects in .NET.

    3. Other static functionality - if you need some static functionality that should not cause the actual 'Singleton' instance to be created (any access to a static member will cause the static ctor to be called). Note that this is also a problem with most of the pattern's implementations, because they rely on a static field/ctor.

    4. Parameterized construction - this seems to me like the most likely reason to use the Singleton pattern instead of a static class, but here again there is a problem, because most of the cleanest implementations rely on the static constructor. The idea is that sometimes you want the Singleton to be initialized with some parameters, so you can't rely on the parameterless, static ctor (see the sketch after the next list for how a static class can handle this too).

    Now that that's settled, and given the name of my blog - why not use the Singleton pattern?

    1. As a general principle, I like using patterns when they are needed, no more. So the main reason why not, IMHO, is that in most cases it's simply superfluous - you end up spending design/coding/unit testing/qa/debugging/etc time on something you don't need. Smart.

    2. Some of the implementations are really clean and neat (see, for example, the 4th in Jon Skeet's list, which is the one I used to use before I understood I don't need it).

    3. Inertia - Most of us are still in the C++ state of mind that there is no other way to really "Ensure a class only has one instance, and provide a global point of access to it." .
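
    Just to make the comparison concrete, here is a rough sketch of what I mean (the names are made up for illustration - this is not code from any of the articles linked above): a plain static class that satisfies the GOF intent, including the parameterized initialization from item 4, next to the classic Singleton shape it replaces.

    // Made-up example: a static class covering "one instance + global access point",
    // with explicit parameterized initialization instead of a static ctor.
    public static class AppSettings
    {
        private static string _connectionString;

        // Call once at startup - this handles the "parameterized construction" case.
        public static void Initialize(string connectionString)
        {
            _connectionString = connectionString;
        }

        public static string ConnectionString
        {
            get { return _connectionString; }
        }
    }

    // For comparison, the classic static-field Singleton shape:
    public sealed class ClassicSingleton
    {
        private static readonly ClassicSingleton instance = new ClassicSingleton();
        private ClassicSingleton() { }

        public static ClassicSingleton Instance
        {
            get { return instance; }
        }
    }

    Unless you really hit one of the four cases above, the first version does the same job with a lot less ceremony.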

    Wednesday, March 28, 2007

    I don't know what this guy is on - but I want some...

    This is truly hallucinatory. If this thing is for real, I want to know what this guy is taking - and I want some for myself...

    Oren Zarif is one of those people who claim to have exceptional, supernatural, X-Files-like powers. Apparently, he likes to get it really hard. According to YNET, he has contacted Mike Tyson, proposing a fight. Zarif, weighing 68 kg and about as far from being a boxing talent as one can be, claims to be able to box with his eyes closed. He proposes a one-on-one, with his eyes completely covered. If Zarif loses, he will pay Tyson $5M. If he wins, Tyson will have to donate the same amount to the US government for the victims of global terror.

    For my English readers, the letter sent to Tyson, in passable English, can be found here. Source - YNET.

    WTF ?!?!?

    A Million Dollar Contest - For Super-Geeks

    Netflix is a very large (the largest?) DVD rental company. One of their most important assets is Cinematch, an in-house developed movie recommendation system. Cinematch's purpose is to predict, based on a user's previous ratings of movies, the rating the user would give to other movies. The result, of course, is a recommendation of the movies most likely to suit the user's taste.

    Apparently, Cinematch works pretty well, but the guys at Netflix would like it to work even better. So they came up with the Netflix Prize: encourage developers and researchers to come up with an algorithm that improves the quality of Cinematch's rating predictions significantly enough (by at least 10%), by promising a prize of $1,000,000. To make it even more interesting, and since the whole contest spans at least 5 years, there is a yearly prize of $50,000 which will be given to the best solution each year.

    Being a Machine Learning freak, I find this contest GREAT! I also like the rules of the contest a lot. Basically, the winning algorithm will have to be made publicly available at the end of the contest.

    This kind of initiative is so great because it encourages the development of new concepts and algorithms which, even if they don't win the first prize, might very well be helpful for various other purposes. I think the best comparison is Fermat's Last Theorem. In his will, Paul Wolfskehl established a prize of (what was then) 100,000 marks for whoever could prove or disprove Fermat's Last Theorem. This generated huge interest in the subject, which resulted in an incredibly rich amount of new ideas and whole new areas of mathematics being discovered and researched to this day. I doubt that the Netflix Prize will have the same effect, but I do believe it will give Machine Learning a well-deserved boost.

    As far as Netflix is concerned - they can only win from this. The news about the contest should inevitably increase their exposure. If the contest succeeds, and someone manages to provide significantly better results - it will be worth much more than $1M to them. If nobody manages to win, then they can heartily claim to be using the best movie-matching algorithm the human brain has come up with to date. Either way, it's a win-win situation for them.

    It's a very difficult task, but I think I'm going to give it a try, as time permits...

    If you're interested, following are the Terms and Conditions in a Nutshell:

    Contest begins October 2, 2006 and continues through at least October 2, 2011.

    Contest is open to anyone, anywhere (except certain countries listed below).

    You have to register to enter.

    Once you register and agree to these Rules, you’ll have access to the Contest training data and qualifying test sets.

    To qualify for the $1,000,000 Grand Prize, the accuracy of your submitted predictions on the qualifying set must be at least 10% better than the accuracy Cinematch can achieve on the same training data set at the start of the Contest.

    To qualify for a year’s $50,000 Progress Prize the accuracy of any of your submitted predictions that year must be less than or equal to the accuracy value established by the judges the preceding year.

    To win and take home either prize, your qualifying submissions must have the largest accuracy improvement verified by the Contest judges, you must share your method with (and non-exclusively license it to) Netflix, and you must describe to the world how you did it and why it works.

    For more elaborated information, check out the Netflix Prize page.

    Yahoo! Mail to become unlimited in size?

    According to YNET, Yahoo! announced that as of May 2007 they will gradually update all Yahoo! Mail accounts to unlimited mailbox sizes. I couldn't find any additional source mentioning this, but I trust YNET didn't make this up.

    This isn't surprising, of course. Ever since Google introduced GMail, it was obvious that at some point someone would announce that mailbox sizes had become unlimited. Maybe the surprising thing is that the first company to make such an announcement is Yahoo!, which has been lagging behind Google on most of the previous advances.

    This is, of course, a very welcome announcement, hopefully to be followed by all the other main webmail providers.

    Cool!

    P.S.: I think that with this new development, it's time the ISPs decide whether they want to continue supplying email services to their customers. If they do - then they should dramatically improve the level of their service (speed and mailbox size). Otherwise, they're really making fools of themselves.

    Sunday, March 25, 2007

    Joke By Code

    I haven't blogged much lately, I know, sorry. Between work, kids, buying a house (!!!!!) and a million other things, I just couldn't find a minute.

    Anyway, today I was searching for some good books about SQL Server 2005 programming. I have a lot of experience with SQL 2000, but it's about time I get to know the little baby (who's way past the 'baby' stage and would better be called a teenager) a little more.

    So I came across Murach's SQL Server 2005 for Developers and, like everyone, I started skimming through the 6 customer reviews. All the reviews were really good, except the last one, which seems to have been written by a particularly disappointed customer. So I decided to have a look at the 3 comments on his review. Here is what I found (quote):

    SELECT OneHalfBrain
    FROM Name
    WHERE LName = 'Husain'

    results: NULL

    It made me laugh (the reviewer is called Munawer Husain, so I assume it was aimed at him personally, and not in some stupid racist direction).

    Tuesday, March 13, 2007

    Multithreaded Computation Support in Matlab - FINALLY!!!

    A couple of weeks ago, Mathworks released a new version of Matlab (R2007a). They have, at last, added support for multithreaded computations.
    A LOT of the visits to my blog come from searches like "matlab dual core", which surprisingly puts a post of mine very high in the results list. The thing is that, especially now with the increased usage of dual- and quad-core machines, running heavy Matlab computations on a single thread is just a waste of resources. Apparently, the guys at Mathworks were listening to their users and added support for multithreaded calculations. It seems to be managed under the hood somehow, and requires a change to the preferences. From the release notes:

    "If you run MATLAB on a multiple-CPU system (multiprocessor or multicore), use a new preference to enable multithreaded computation. This can increase MATLAB performance for element-wise and BLAS library computations.
    By default the preference is not set, so you must set it to enable multithreaded computation. With the preference enabled, MATLAB automatically specifies the recommended number of computational threads, although you can change that value. On AMD-based Linux platforms, MATLAB supports multithreaded computation, but requires an extra step to change the default BLAS."

    Doesn't sound like a very nice way to do it, and it certainly lacks user control, but it's a start. I haven't upgraded to the new version yet, so I can't speak from experience. The new release also seems to include improvements to the Distributed Computing Toolbox, which sound very interesting as well.

    Sunday, March 11, 2007

    Google Image Search API updated (again...)

    What can I say - the html format was changed again.
    You can download the complete code from my article on CodeProject.
    If you just want the update - I just had to change the Regex file, so you can simply replace its content with the following:


    imagesRegex: (dyn\x2EImg\x28\x22[^\x22]*\x22,\x22[^\x22]*\x22,\x22(?<code>[^\x22]*)\x22,\x22(?<imgurl>[^\x22]*)\x22,\x22(?<width>[^\x22]*)\x22,\x22(?<height>[^\x22]*)\x22,[^\x29]*)
    dataRegex: (?<width>[0-9,]*)\s+x\s+(?<height>[0-9,]*)\s+(pixels\s+){0,1}-\s+(?<size>[0-9,]*)(k)
    totalResultsRegex: (?<upperLimit>upperLimit>(\s)*)(?<lastResult>[0-9,]*)([^=])*=(?<maxLimit>maxLimit>(\s)*)(?<totalResultsAvailable>[0-9,]*)
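
    In case you wonder what these patterns are used for: the parser simply walks the regex matches and reads the named groups. Here is a rough, hypothetical C# sketch (the file name, query URL and plumbing are made up - the real code is in the CodeProject article):

    using System;
    using System.IO;
    using System.Net;
    using System.Text.RegularExpressions;

    class ImageSearchSketch
    {
        static void Main()
        {
            // Hypothetical: the imagesRegex pattern above, stored in a plain text file.
            string imagesRegex = File.ReadAllText("imagesRegex.txt");

            // Hypothetical query URL - the article builds the real request properly.
            string html = new WebClient().DownloadString("http://images.google.com/images?q=kittens");

            // Each match exposes the named groups defined in the pattern.
            foreach (Match m in Regex.Matches(html, imagesRegex))
            {
                Console.WriteLine("{0} ({1}x{2})",
                    m.Groups["imgurl"].Value,
                    m.Groups["width"].Value,
                    m.Groups["height"].Value);
            }
        }
    }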

    Thursday, March 01, 2007

    Is it the end of Internet Freedom? A possible workaround...

    Politicians are trying to add more and more limits to the freedom and anonymity we enjoy over the Internet. A few months ago, they wanted to force talkbackers to identify themselves - a death sentence for talkbacks. Now, they are seriously talking about forcing users to identify themselves when using adult websites.
    It's a pity, really. Instead of searching for ways to improve the quality of the service they provide us, they try to castrate the Internet into something more manageable (for them). Also, the philosophy behind the law they are trying to pass is so anti-democratic and anti-privacy that it makes me want to cry.

    Anyway, there might be a way around this. Check out Tor. It's a technology that basically routes web requests (or any other TCP-based communication) through various nodes in encrypted form, making it almost impossible to track down the original user.

    Wednesday, February 28, 2007

    A Personal Letter to Everyone I Know, Knew, Will Ever Know, Think I Know, Think S/He Knows Me - To EVERYONE!!!

     

    Dear family, friends, ex-friends, future friends, acquaintances, whatever,

    That's it, I've had it!

    STOP SENDING ME CHAIN LETTERS !!!

    When you think AOL/Microsoft/IBM/whatever is monitoring emails, and you forward them in the hope of earning a few bucks - I understand you have as much common sense as my 3-year-old (or less).

    When you think you can save the life of some poor person with cancer who needs AB blood - it's obvious you haven't been around the Internet for the past 10 years.

    When you think luck will come to you by spamming other people's mailboxes - I truly hope you never meet any Hare Krishna, because you're soo easy to fool.

    When you think you make me feel loved and happy by forwarding some 10-page-long, 15-year-old load of BS - you're oh! so wrong!

    So please please please - I don't want to offend you. If you have something to write to me - be my guest, I'd be glad to read it. If it's something you got from someone who got it from someone who... don't think - leave me out of it! I'd rather lose one good joke than have yet another 5 spam (sorry - chain) mails in my inbox!!!

    Yours sincerely,

    Ilan

    P.S: If you really can't help it, and still want to forward this junk (not to me!), I suggest you check out the following first:

    http://breakthechain.org/
    http://www.snopes.com/
    http://info.org.il/irrelevant/  (Israeli)

    Wrapper for svmpredict.exe

    % predictions = libsvmpredict(test_set, model, varargin)
    % options:
    % -b probability_estimates: whether to predict probability estimates, 0 or
    % 1 (default 0); for one-class SVM only 0 is supported
    %
    % NOTE1: This function actually executes LibSVM's svmpredict tool. For more
    % info check out: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
    % NOTE2: Although LibSVM supports more types of labels, this wrapper is
    % limited to integer labels only.
    % NOTE3: This function assumes that svmpredict.exe is located in a directory
    % that lies in your system's PATH.
    function predictions = libsvmpredict(test_set, model, varargin)
    if isempty(varargin)
    options = struct([]);
    else
    options = varargin{1};
    end;

    % Dump the test set to a temporary file
    testfile = tempname;
    fid = fopen(testfile, 'wt');
    dumpData = zeros(size(test_set,1), 2*size(test_set,2) + 1);
    format = '%d %d:%f';
    for i=1:size(test_set, 2)
    dumpData(:, 2*i) = repmat(i, size(test_set,1), 1);
    dumpData(:, 2*i+1) = test_set(:, i);
    if i < size(test_set,2)
    format = sprintf('%s %%d:%%f', format);
    end;
    end;
    format = sprintf('%s\n', format);
    fprintf(fid, format, dumpData');
    fclose(fid);

    % Dump the model to a temporary file
    modelfile = tempname;
    fid = fopen(modelfile, 'wt');
    fprintf(fid, '%s', model);   % write the model verbatim rather than using it as a format string
    fclose(fid);

    % Run svmpredict with the given model and write the predictions to a temporary
    % file
    predictionsfile = tempname;
    command = 'svmpredict';

    if isfield(options, 'b')
    command = sprintf('%s -b %d', command, options.b);
    end;

    dos(sprintf('%s %s %s %s', command, testfile, modelfile, predictionsfile), '-echo');

    % Load the predictions temporary file and return its content
    predictions = dlmread(predictionsfile);

    return;

    Wrapper for svmtrain.exe

    % model = libsvmtrain(training_set, labels, options)
    %
    % options:
    % -s svm_type : set type of SVM (default 0)
    % 0 -- C-SVC
    % 1 -- nu-SVC
    % 2 -- one-class SVM
    % 3 -- epsilon-SVR
    % 4 -- nu-SVR
    % -t kernel_type : set type of kernel function (default 2)
    % 0 -- linear: u'*v
    % 1 -- polynomial: (gamma*u'*v + coef0)^degree
    % 2 -- radial basis function: exp(-gamma*|u-v|^2)
    % 3 -- sigmoid: tanh(gamma*u'*v + coef0)
    % 4 -- precomputed kernel (kernel values in training_set_file)
    % -d degree : set degree in kernel function (default 3)
    % -g gamma : set gamma in kernel function (default 1/k)
    % -r coef0 : set coef0 in kernel function (default 0)
    % -c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
    % -n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
    % -p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
    % -m cachesize : set cache memory size in MB (default 100)
    % -e epsilon : set tolerance of termination criterion (default 0.001)
    % -h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
    % -b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
    % -wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)
    % -v n: n-fold cross validation mode
    %
    % NOTE1: This function actually executes LibSVM's svmtrain tool. For more
    % info check out: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
    % NOTE2: Although LibSVM supports more types of labels, this wrapper is
    % limited to integer labels only.
    % NOTE3: This function assumes that svmtrain.exe is located in a directory
    % that lies in your system's PATH.
    function model = libsvmtrain(training_set, labels, varargin)
    if size(training_set,1) ~= size(labels,1)
    error('The training_set and labels should have the same number of rows.');
    end;
    if isempty(varargin)
    options = struct([]);
    else
    options = varargin{1};
    end;

    % Dump the training set to a temporary file
    datafile = tempname;
    fid = fopen(datafile, 'wt');
    dumpData = zeros(size(training_set,1), 2*size(training_set,2) + 1);
    dumpData(:, 1) = labels(:, 1);
    format = '%d';
    for i=1:size(training_set, 2)
    dumpData(:, 2*i) = repmat(i, size(training_set,1), 1);
    dumpData(:, 2*i + 1) = training_set(:, i);
    format = sprintf('%s %%d:%%f', format);
    end;
    format = sprintf('%s\n', format);
    fprintf(fid, format, dumpData');
    fclose(fid);

    % Run svmtrain with all given options and write the model to a temporary
    % file
    modelfile = tempname;
    command = 'svmtrain';

    if isfield(options, 's')
    command = sprintf('%s -s %d', command, options.s);
    end;

    if isfield(options, 't')
    command = sprintf('%s -t %d', command, options.t);
    end;

    if isfield(options, 'd')
    command = sprintf('%s -d %f', command, options.d);
    end;

    if isfield(options, 'g')
    command = sprintf('%s -g %f', command, options.g);
    end;

    if isfield(options, 'r')
    command = sprintf('%s -r %f', command, options.r);
    end;

    if isfield(options, 'c')
    command = sprintf('%s -c %f', command, options.c);
    end;

    if isfield(options, 'n')
    command = sprintf('%s -n %f', command, options.n);
    end;

    if isfield(options, 'p')
    command = sprintf('%s -p %f', command, options.p);
    end;

    if isfield(options, 'm')
    command = sprintf('%s -m %f', command, options.m);
    end;

    if isfield(options, 'e')
    command = sprintf('%s -e %f', command, options.e);
    end;

    if isfield(options, 'h')
    command = sprintf('%s -h %d', command, options.h);
    end;

    if isfield(options, 'b')
    command = sprintf('%s -b %d', command, options.b);
    end;

    if isfield(options, 'wi')
    command = sprintf('%s -wi %f', command, options.wi); % NOTE: LibSVM expects the class index in the flag itself (e.g. -w1), so this literal '-wi' needs adapting if you use per-class weights
    end;

    if isfield(options, 'v')
    command = sprintf('%s -v %d', command, options.v);
    end;

    dos(sprintf('%s %s %s', command, datafile, modelfile), '-echo');

    % Load the model temporary file and return its content
    fid = fopen(modelfile, 'r');
    model = fread(fid, '*char')';
    fclose(fid);

    % For some reason, the data read has superfluous newlines (it seems to have
    % doubled the newline characters or something like that).
    model = regexprep(model, '\r', '');

    return;

    LibSVM Matlab Wrappers

    As I mentioned in previous posts, I've been using SVMLight for quite some time now. It's been good while it lasted, but I'm now looking for a stronger SVM library, especially one that handles multiple labels well. I've been introduced to LibSVM, which seems very promising. In theory, they have libraries for various systems and languages, including Matlab. However, I've had quite some trouble making it work with Matlab, so instead of breaking my head over the problem, I decided to work around it in a quick-and-dirty manner (my favorite...).

    What I did is simply wrap LibSVM's executables, so they can be executed through Matlab functions (provided the executables are located somewhere in your PATH).

    The following two posts will include a wrapper for svmtrain.exe and svmpredict.exe

    Please let me know if you encounter any problems using them.

    Ah, of course, they are provided AS IS, with no warranty or anything of the kind. I also don't take any credit for them, nor any responsibility as to how/who/when/where/etc. they can be used. For that - please check with the authors of LibSVM.

    Monday, February 26, 2007

    NTFS File encryption does not like virtual drives

    I like the NTFS file encryption feature - you simply define a folder as encrypted, and any subsequent file you add to it is automatically encrypted. This means that, depending on permissions, other users may see that the file exists, but they won't be able to read its content. Even the system Administrator won't be able to read it!
    (There are various other goodies related to encrypted files - like restrictions on copying them - but that's beside the point.)

    But I also like virtual drives. For example - my 3 PCs have a virtual drive called Z: that maps to a shared folder on one of the machines (called C:\Shared, to be precise).

    So I had a couple of files in an encrypted folder (which happened to be on the machine that hosts the folder mapped to by Z:), and I wanted to transfer them to another machine. I copied them to Z:, and then I wanted to decrypt them so I would be able to retrieve them from my second machine.
    Just to make sure the scenario is clear: I'm logged in as the correct user (the one who encrypted the files), and simply want to decrypt the files. The only thing is that I want to do it from a virtual drive (Z:).

     

    BABOOM!!!

    "An error occurred applying attributes to the file:
    ....
    The system cannot find the path specified."

    Of course, once I tried to decrypt from C:\Shared instead of Z:\ - everything went fine.

    Not nice, really not nice.
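
    By the way, if you'd rather script the decryption than click through Explorer, a minimal C# sketch like this (the path is made up) goes through the real local folder and sidesteps the mapped drive altogether:

    using System.IO;

    class DecryptLocally
    {
        static void Main()
        {
            // Made-up path: point at the real local folder (C:\Shared in my case),
            // not at the mapped Z: drive that triggered the error above.
            foreach (string file in Directory.GetFiles(@"C:\Shared"))
            {
                File.Decrypt(file);   // EFS decryption - run as the user who encrypted the files
            }
        }
    }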

    Sunday, February 25, 2007

    My daughter can do magic !!!

    Purim is getting near, and this year, our eldest daughter Gal (almost 3) is going to dress up as a fairy. So yesterday we decided to try out the costume - pink dress, purple skirt on top of it, purple wings, silver crown and, inevitably - silver star-tipped wand.
    Once she was completely dressed, she started playing with her wand, and it went more or less like this:

    Gal: Hocus, pocus, bili-bili-bocus: Grandma and Grandpa.
    (nothing happens)
    Gal: Where are Grandma and Grandpa ?
    Ma&Pa: Huh?
    Gal: Where are Grandma and Grandpa ???
    (our little brains try to understand why she suddenly starts asking about her grandparents. After a minute or so, the smarter of us gets the point)
    Ma: Were you expecting Grandma and Grandpa ?
    Gal: Yes - I said "Hocus, pocus, bili-bili-bocus" and Grandma and Grandpa didn't come!!!

    (At this point I understand what's going on. So I step aside, secretly call Grandma and Grandpa, ask them to come over, and to give us a call when they're outside so she can conjure them again - and this time succeed.)

    Ma: You know, you need to practice a lot for this kind of spell to succeed. Come on, let's practice...

    (After half an hour, and a phone call from Grandma and Grandpa telling us they are outside)

    Ma: Come on, let's try conjuring Grandma and Grandpa again
    Gal: Hocus, pocus, bili-bili-bocus: Grandma and Grandpa.

    The face she made when she saw them entering the house was worth all the gold in the world...

    Tuesday, February 20, 2007

    Reflector 5.0 is out - awesome!!!

    Usually, you don't feel much of a difference when Lutz Roeder releases a new version of his can't-live-without-it tool. You open it, it says there is a new version, you click OK, and you don't feel any difference. You're using a new version of Reflector, but you don't see what the big deal was.

    Today, however, things went completely differently. I downloaded the new version (using the Automatic Update feature, which works like a charm), and was already surprised that the downloaded file was so big (1 MB).
    Then I opened it... This guy is a genius:

    1. Integrated help from MSDN - when you look at a method's decompilation, you automatically see, at the bottom of the screen, the MSDN information about that method. Even cooler - the links inside that pseudo-MSDN pane work inside Reflector, and navigate to the correct place, showing you the decompilation automatically!
    2. Expand Methods - When you look at a class, there is a link at the bottom called "Expand Methods" which will expand all the methods in one screen, effectively providing you with a complete decompiled class.
    3. Enhanced Analyzer feature - I LOVE the "Assigned By" feature, which shows you which methods change a field!
    4. Integrated search in MSDN and Google (I wonder how long it will take for Microsoft and Yahoo! to go complaining about the lack of configurability of the search engine...)
    5. Shell integration - run reflector.exe /register in a command prompt. Do it! NOW!!! You did it? Sure?!?! Well, assuming you're smart enough to have taken this advice - from now on, when you right-click on a managed dll, you'll have a "Browse with .NET Reflector" option, which will automagically open the dll with, huh, well Reflector of course. Actually, you don't even have to go through right-click - it's set as default when you double-click the dll...
    6. More features here. I wouldn't be able to put it better - The Best Tool Ever Got Even Better!!!

    Cool!

    Thanks Yossi for pointing it out.