Tuesday, February 26, 2008

SQL Server 2005 rantings - User Defined Aggregate Functions are nice, but not there yet...

1. Why can't there be UDAs in T-SQL? Granted, it's easy to write one in a CLR language, but sometimes it would be simpler (and more appropriate) to write it in SQL. It also took me a while to figure out that this limitation really exists...

2. UDAs must be serializable. Why? I don't know yet (I still need to figure that one out). I have some ideas, but that's beside the point: it's a must, and I assume there are good reasons for it. The problem is that whenever you do something slightly more complicated than an average or a product (e.g. a variation on STDEV), you need to accumulate all the values until you get to Terminate(). This means the list you've accumulated can grow significantly. Now to the pitfall: when you use user-defined serialization (which you have to in this case), you must specify the maximum size the UDA structure can grow to, and this maximum is limited to 8000 bytes (sounds familiar...). In my case I'm aggregating double values, which take 8 bytes each, so after accounting for whatever else gets serialized I'm limited to a little below 1000 records. IMHO this reduces the practical usefulness of UDAs to about 50%...

3. I tried to write a UDA for decimal data. No matter what I did, it kept producing a function defined to return decimal(18,0). In other words: no digits to the right of the decimal point. In the end I didn't have the time to find the KB article about it, though I suppose there is one; I tried pretty much everything. In my particular case using double values was an acceptable compromise, but it won't always be that way...
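To make the arithmetic in point 2 concrete, here is a minimal Python sketch of an aggregate that keeps every value until Terminate(). It is not the SQL Server API: the real thing is a .NET class with Init/Accumulate/Merge/Terminate methods, marked with SqlUserDefinedAggregate (Format.UserDefined, MaxByteSize) and implementing IBinarySerialize. The names ValueListStdev, MAX_BYTE_SIZE and MAX_VALUES are hypothetical, and the byte layout (a 4-byte count followed by one 8-byte double per value) is an assumption chosen for illustration. Plain STDEV could be computed from running sums; the list is kept only to mimic the kind of variant that really needs every value.

```python
import math
import struct

# Assumed cap, mirroring the 8000-byte MaxByteSize limit described above.
MAX_BYTE_SIZE = 8000
# Assumed layout (an illustrative choice, not mandated by SQL Server):
# a 4-byte little-endian count followed by one 8-byte double per value.
HEADER_FORMAT = "<i"
VALUE_FORMAT = "<d"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4
VALUE_SIZE = struct.calcsize(VALUE_FORMAT)    # 8

# (8000 - 4) // 8 = 999 values: "a little below 1000"
MAX_VALUES = (MAX_BYTE_SIZE - HEADER_SIZE) // VALUE_SIZE


class ValueListStdev:
    """Hypothetical aggregate that keeps every value until terminate()."""

    def __init__(self):
        self.init()

    def init(self):
        # Called once per group: start with an empty list.
        self.values = []

    def accumulate(self, value):
        # Skip NULLs (None), as built-in aggregates do.
        if value is not None:
            self.values.append(float(value))

    def merge(self, other):
        # Combine a partial aggregate computed elsewhere.
        self.values.extend(other.values)

    def terminate(self):
        # Sample standard deviation over all stored values.
        n = len(self.values)
        if n < 2:
            return None
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))

    def write(self):
        # Serialize the state; refuse to exceed the byte cap.
        payload = struct.pack(HEADER_FORMAT, len(self.values)) + b"".join(
            struct.pack(VALUE_FORMAT, v) for v in self.values
        )
        if len(payload) > MAX_BYTE_SIZE:
            raise OverflowError(
                f"state needs {len(payload)} bytes; the cap is {MAX_BYTE_SIZE}"
            )
        return payload

    def read(self, payload):
        # Rebuild the value list from bytes produced by write().
        (count,) = struct.unpack_from(HEADER_FORMAT, payload, 0)
        self.values = [
            struct.unpack_from(VALUE_FORMAT, payload, HEADER_SIZE + i * VALUE_SIZE)[0]
            for i in range(count)
        ]
```

With 999 values, write() produces 4 + 999 * 8 = 7996 bytes; the 1000th value pushes the state to 8004 bytes and serialization fails, which is exactly the ceiling described in point 2.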

Thursday, February 14, 2008

Learning Machine Learning - The WEKA Way

If you're interested in working with or learning about Machine Learning, you really MUST check out WEKA. When I first saw WEKA, a few years ago, it looked like a cute tool for getting started with ML, with a very small set of implemented algorithms, and it was available only to Java developers. It has since become a very rich research platform, in which one can easily test a very wide variety of ML algorithms with endless tuning parameters and analysis tools. You can read data directly from a database, and you can now even run WEKA directly from within your .NET code (also check this) !!!!!

I'm a complete newbie with WEKA, but it looks like working with it is going to be a lot of fun, and much faster than anything I did before. I just hope it lives up to the expectations that are building up in me now...

One more thing: note that there is a "book version" and a "developer version". The former is the version their book is based on, and it is not extended (it gets only bug fixes). The latter is under constant development and has more features and significantly more implemented algorithms.

Tuesday, February 12, 2008

WLW - Haven't they heard about 64-bit???

When I open WLW (Windows Live Writer), it says that the beta has expired and sends me to download the new version. When I do that, I get a message that it is not supported on 64-bit Windows (I'm using XP 64-bit).

Hum, what?

1. 64-bit is alive and kicking, and gaining more and more users. It's time software companies (MS among them, IMHO) got used to supporting 64-bit platforms by default.

2. If the new version does not support my platform, why send me to download it and waste my time and nerves?

Grrrrr...