Sunday, August 27, 2006

Getting more results from your search engine

Most search engines limit the total number of results they return per search string. With Google it's 1000. With Yahoo!, if it hasn't changed lately, it's 5000.
A few days ago I was asked whether I knew of any workaround that makes it possible to get more results per search string. My first answer was: "sorry, no can do". I did find a way to circumvent the Google API's limit on the number of queries that can be executed per day (which, coincidentally, is also 1000). That workaround is a side effect of my Google Image Search API. However, it does not provide a way to get more than 1000 results per search string.
After giving it some thought, I came up with at least one way to increase the number of results per search string. It's not a very accurate solution, but it's better than nothing. The idea is to use the various search engines that perform query refinement (some call it clustering of results). A good example is Ask Jeeves. When you search on these engines, they also give you a list of suggestions to narrow or expand your search. For example, if you search for "apple", the narrowing suggestions are things like "Apple the fruit", "Facts about Apples", "Apple Tree", "Macintosh", etc.
When you work with this kind of engine (or combine a "simple" search engine with one that has narrowing capabilities), you can start by running the original search (apple) and retrieving all the results available for it. Then you can iteratively retrieve the results for all the narrowing queries as well (up to 1000 for each), and keep drilling down as far as you like. Of course, there will most probably be a substantial number of duplicates in the results, which you will have to handle. Also, the further you drill down, the further you'll drift from the original query (i.e. query drift). Another problem is ranking: if your original query was "apple", how do you rank the results for "apple tree" against those for "Macintosh"? So this still raises quite a few questions. Still, in the end you can end up with a much larger number of results that are, to some extent, related to the original query.
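To make the drill-down idea more concrete, here is a minimal Python sketch. The functions `search()` and `refinements()` are hypothetical stand-ins backed by a tiny in-memory dictionary, not a real engine API; in practice you would replace them with calls to a search engine and to the narrowing suggestions of an engine like Ask Jeeves. The sketch walks the refinements breadth-first, drops duplicate URLs, and tags each URL with the depth at which it first appeared, a crude way to keep query drift in view.

```python
from collections import deque

# Hypothetical in-memory stand-ins for a real engine (assumption, not an API):
# result URLs per query, and the "narrow your search" suggestions per query.
FAKE_RESULTS = {
    "apple": ["apple.com", "wiki/Apple", "fruit.org/apple"],
    "apple tree": ["wiki/Apple", "garden.net/apple-tree"],
    "macintosh": ["apple.com", "wiki/Macintosh"],
}
FAKE_REFINEMENTS = {
    "apple": ["apple tree", "macintosh"],
}


def search(query, limit=1000):
    """Hypothetical engine call: returns at most `limit` result URLs."""
    return FAKE_RESULTS.get(query, [])[:limit]


def refinements(query):
    """Hypothetical call returning the engine's narrowing suggestions."""
    return FAKE_REFINEMENTS.get(query, [])


def expand_search(seed, max_depth=2, limit=1000):
    """Breadth-first drill-down over refinements, de-duplicating URLs.

    Each URL remembers the depth of the query that first returned it,
    a rough proxy for how far it has drifted from the original query.
    """
    first_depth = {}          # url -> depth at which it was first seen
    queue = deque([(seed, 0)])
    visited = {seed}          # queries already scheduled, to avoid loops
    while queue:
        query, depth = queue.popleft()
        for url in search(query, limit):
            first_depth.setdefault(url, depth)
        if depth < max_depth:
            for sub in refinements(query):
                if sub not in visited:
                    visited.add(sub)
                    queue.append((sub, depth + 1))
    # Shallower queries first; sorted() is stable, so the engine's own
    # order is kept among URLs found at the same depth.
    return sorted(first_depth, key=lambda url: first_depth[url])
```

For example, `expand_search("apple")` returns the three "apple" results followed by the two new URLs found through "apple tree" and "macintosh". Ordering by depth is only one possible answer to the ranking question above; it says nothing about whether "apple tree" results should outrank "Macintosh" results.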

You may ask: why would anyone need more than 1000 results per search string? Besides, the further down the ranking you go, the less relevant the results are to the original search string. In most cases, you're right. Yet for some research purposes, not only would you need more than 1000 results, you might even prefer these to the "good" results returned in the first few pages.

Can anyone come up with some other (better?) idea to work around this limitation??? If you have an idea - please drop me a line!

(Note: I'm using the term "search string" to refer to a complete search, regardless of the number of results pages you get. The term "query" refers to the request that retrieves a single results page, since the query also includes the index of the result at which that page should start. In other words, all the "search results" for a single "search string" are obtained by sending multiple "queries": if each results page holds 100 results, you need to execute up to 10 "queries" to retrieve all the results Google provides for that "search string".)
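The distinction between a "search string" and its "queries" boils down to arithmetic on page offsets. A minimal sketch, assuming 0-based result indices (engines differ on this, so check the API you use):

```python
import math


def query_offsets(total_results, page_size):
    """Start indices of the page queries needed for one search string.

    Assumes 0-based result indices; each returned offset is one "query".
    """
    pages = math.ceil(total_results / page_size)
    return [page * page_size for page in range(pages)]
```

With 100 results per page, `query_offsets(1000, 100)` gives the 10 offsets 0, 100, ..., 900; with 10 results per page, the same 1000 results would take 100 queries.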

Monday, August 21, 2006

Setting default share permissions

If you use shared folders often, you probably know that Windows XP defaults share permissions to "Everyone" (with full access). If you don't know that - shame on you!
I don't know exactly where this default is stored, but if you want to change it, you can do it with the Tweak UI PowerToy.
In the tree on the left, select "Access Control". Then choose "Default share permissions" in the combo box and click "Change".
Enjoy :-)

Working with Source Safe over the web

I am working (from Israel) for a US company (I haven't finished my thesis yet, but my grant has run dry and I still need to feed my family...). Since I'm working on trading-related applications, there is a strong emphasis on security, so every file transfer is done over a VPN.
Lately I needed to work directly with their VSS database. I agree with most people that Source Safe is something best left in the past, but it will take some time before I can convince them to switch, and in the meantime work must go on. Anyone who has tried to use VSS over an Internet connection knows it's practically impossible to work this way. Add to that the transatlantic delay and a VPN link, and you get to wait 10-20 minutes just to open a tree in the viewer (that is, if you're lucky enough not to get link errors, which I'm currently investigating with my ISP). So I've been looking for applications that provide access to an existing VSS database over the Internet.
The major tools I found were as follows:

Prices range from about $150 to $250 per user, except for VssConnect, which costs only $30.
They all work with some kind of web server and have a way to encrypt the data, but in our case it's irrelevant since we are working over a VPN link anyway.
SAW and VSSRemoting have a special feature that supposedly speeds up file transfer significantly, by transferring only the parts of files that have changed.
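I don't know how SAW or VSSRemoting implement this internally, but the general idea of transferring only the changed parts of a file can be illustrated with a simple fixed-block comparison. In this hypothetical Python sketch, the file is split into equal-sized blocks, block hashes are compared, and only blocks whose hashes differ would need to be sent. The function names and the tiny 4-byte block size are illustrative choices, not anything taken from those products.

```python
import hashlib


def block_hashes(data, block_size):
    """SHA-1 digest of each fixed-size block of `data`."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=4):
    """Return (index, bytes) for the blocks of `new` that differ from `old`.

    In a real protocol only the hashes of `old` would cross the wire;
    here both files are local to keep the sketch self-contained.
    """
    old_hashes = block_hashes(old, block_size)
    delta = []
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * block_size
            delta.append((idx, new[start:start + block_size]))
    return delta


def apply_delta(old, delta, new_length, block_size=4):
    """Rebuild the new file from the receiver's old copy plus the delta."""
    # Truncate or zero-pad the old copy to the new length, then patch blocks.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for idx, chunk in delta:
        start = idx * block_size
        buf[start:start + len(chunk)] = chunk
    return bytes(buf)
```

For `b"hello world!"` changed to `b"hello World!"`, only the second 4-byte block is sent. A fixed-block scheme like this loses its advantage when bytes are inserted near the start of a file, since every later block shifts; rsync handles that case with a rolling checksum.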
We didn't want to waste too much time reviewing each, so we decided to start by checking out the one that looks the most mature (SAW).
In general, I am very happy using it. It works really fast, integrates with Visual Studio 2005 and does the work well. I did encounter a few issues, though:
1. Switching the same solution between SAW and VSS didn't work well for me. I contacted DynamSoft's support and they tried to help, but it didn't work. In my case I don't care much, because 99% of the time I'll be using SAW anyway. If you plan on switching between them often, I suggest you check this thoroughly.
2. The file comparison application is disappointing:
a. It compares lines, not words/characters (like VSS)
b. There is no option to ignore blanks, so even if the difference is just an irrelevant space, you will see the whole line marked as different (like VSS)
c. The GUI doesn't work properly: when there is only one difference, the arrows for jumping to it are disabled.
(NOTE: According to Support, this issue should be fixed soon - in the next release).
3. Sometimes, when I do something in the solution that requires no interaction with VSS, it starts performing all kinds of synchronization operations with the server. During that time I can't do anything with the solution. Since SAW works really fast with the server, it's not the end of the world, and the process is usually over in 10-20 seconds. It also doesn't happen often, but still, it's annoying and shouldn't happen at all.
(NOTE: According to Support this is initiated by the IDE and not by the SAW integration client directly)
4. When I try to perform an operation on too many files (~15+), I get socket errors. The link to the server isn't broken, but the specific operation is aborted. This is a very painful issue, since it requires a lot of manual workarounds. I'm still investigating it with both DynamSoft and my ISP, and I have reasons to believe it may be caused by problems with my Internet connection. I would still have expected SAW to cope with minor connection problems, though...
Despite these, we are probably going to purchase it - especially if I manage to fix the socket errors problem by fixing the Internet connection.
As a last note, Support told me that they are about to release their next version (5) within the next month. I asked whether, if we purchased SAW before the release, we would be entitled to a free upgrade when the new version actually came out (as JetBrains did when we purchased ReSharper 1.5, shortly before the release of 2.0). I was disappointed by their answer: they haven't decided on the upgrade policy yet. Therefore, even if I manage to fix the socket errors, we will still have to wait for the next version's release before purchasing (or for them to tell us we will get the next version for free anyway).

Friday, August 18, 2006

The War

On July 16th I had to flee my house – and came back after having wandered around the country with my family for 30 days. If you want to get a glimpse of what the war has done to my personal life, here you go.
I live in a small village in the Jezreel Valley. It’s in the northern part of Israel, but quite far from the Lebanese border (about the same distance as Haifa). I have a magnificent little girl, a beautiful wife who is 7 months pregnant, and a crazy dog. Near our village there is an important air-force base. In normal times the noise of the planes is not very pleasant, but you learn to live with it. Before moving to the village we knew that if a war broke out, our little piece of heaven would be troubled, especially because of the nearby base, which is an obvious target. We never imagined how much…
To keep things simple, I’ve decided to summarize the impact of this war on our day to day life as follows:

  • We don’t have a shelter in our house, so from the very beginning we had to flee in order to be safe. We spent 30 nights in the homes of people who were kind enough to open their doors to us (6 different places).

  • Rockets landed a couple of hundred meters from where I was and, more importantly, from my girl’s kindergarten (in the nearby town).

  • The first time rockets landed close by, all the phone lines in the kindergarten’s neighborhood went down. From the road to the kindergarten, it took us a while to realize that the smoke was coming from behind the kindergarten and not from the kindergarten itself…

  • At some point my wife decided to go back to work (at the time we were staying with my in-laws, who, unlike us, have a shelter). The sirens caught her in the parking lot, just as she was about to head home. It’s an open lot with no cover at all. At first she simply dropped to the ground to try to avoid the deadly bullets. When the rockets started to fall, she looked for something she could shelter under. She found a spot with a 3-millimeter roof where a few other people were taking cover. The funny thing was that the roof was the least of their concerns: there were many gas tanks piled up right next to them… When the attack was over, she literally flew home. Two hours later there was another attack in the same area. 3 of the rockets landed right on the route my wife takes home.

  • One attack was particularly scary. We were at my in-laws’ and when the siren started, my mother-in-law was in the shower. She didn’t make it to the shelter in time. Suddenly the rockets started falling, REALLY close. My wife, who’s 7 months pregnant, sat next to me in the shelter. At that moment, as we felt the whole house tremble and her mother wasn’t answering our calls, I thought we were about to lose our baby.

  • Just before that same attack, my 2-year-old girl was watching a Teletubbies DVD. For those who don’t know them, they are little creatures who are happy about everything and laugh at anything. You could rip their heads off and they would still find a reason to laugh and be happy and nice. Anyway, from that moment on, my little girl keeps telling me that “the Teletubbies scared her”. She has become moody and whiny, and can’t go more than a few minutes without seeing us both (don’t even think of leaving her with somebody else). She often wakes up screaming shortly after falling asleep, probably from nightmares.

  • To this day, when I hear an ambulance my heart skips several beats: my first reaction is that it’s a siren again. Any loud noise (even a door slamming) makes me fear a rocket has fallen. When I’m with others, we usually exchange looks, and it’s clear that we are all experiencing the same thing.

  • The thing is, we are among the lucky ones. None of our friends and relatives were killed or seriously injured. Just to show you how lucky we are: a friend of ours lives in another village next to us. Two of her nephews, who live in the same village, were hit by those horrible bullets that the rocket propels when it explodes. Each of them took 2 bullets in the arm. In the past two weeks they have undergone more than 10 surgeries between them, and they are not done yet. Imagine if the bullets had hit the abdomen or the head…

Thursday, August 03, 2006

SQL Server 2005 - Frustrated by a good feature

I guess that most people who have been using SQL 2005 for some time already know about this. I, for my part, have worked a lot with SQL 2000 in the past, but never had the chance to really work with SQL 2005 until recently. Now I needed to connect to a server on a remote machine and had this frustrating experience...

1. Scenario: trying to connect to a SQL 2005 server from a remote machine fails. It gives the following message: "Sqlcmd: Error: Microsoft SQL Native Client: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections."
2. After getting this message once or twice you take the time to actually *read* it, and start checking the settings for the SQL Server.
3. Like anyone experienced with SQL 2000 would, I opened SQL Server Management Studio. There I looked at the properties of the server instance and saw that the configuration seemed fine and the server was configured to accept remote connections.
4. Back to square one. From this point I started looking for the culprit...
a. Maybe it's the firewall? I configured the FW to trust my local network, but maybe it fucked up somehow? Disabling the FW quickly showed me that wasn't the problem.
b. Maybe there is some problem with the SQL version and I should upgrade? Seemed unlikely, yet I checked it out. Turned out to be irrelevant since I was already using the latest version (SP1).
c. Maybe I did something wrong with the connection string I used? Tried all possible variations - nothing worked...
5. In the back of my mind I started to reconsider the error message I had gotten. It says that by default, SQL 2005 is configured not to allow remote connections. I don't remember changing that, so how come it's configured to accept remote connections? Could there be some other configuration parameter that affects remote connections?
6. I went over all the configurable parameters for the server instance and for the database (from SQL Server Management Studio) - nada
7. Well then, I'm on the verge of throwing my computer out of the window. I'll give Google a last try. Then I found this: http://support.microsoft.com/?kbid=914277&SD=tech
8. It turns out that there is a much more elaborate way of configuring remote connections in SQL 2005, through the "SQL Server 2005 Surface Area Configuration" tool. It is, of course, a good thing they have added this wealth of configuration options, but why couldn't it be accessible from Management Studio? Couldn't they add an "Advanced" button next to the remote server connection options that opens the Surface Area Configuration??? And if they didn't want to put too many things in Management Studio, why did they provide an option to configure remote connections there at all, when it can't work on its own anyway???
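A quick check that something is actually listening on the server's TCP port can help separate "the server doesn't accept remote TCP connections" from "my connection string is wrong". Here is a minimal sketch using only Python's standard `socket` module. It assumes a default instance on TCP port 1433 (named instances typically listen on a dynamic port that clients resolve through the SQL Server Browser service), and the host name in the usage note is hypothetical.

```python
import socket


def tcp_port_open(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Port 1433 is SQL Server's default for a default instance; pass the
    actual port for a named instance. This does not test authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Running `tcp_port_open("dbserver")` (hypothetical host name) from the client machine before and after changing the Surface Area Configuration settings shows whether the change made a difference. A `False` result only means no TCP connection got through; it is consistent with remote TCP/IP connections being disabled, the service not running, or a firewall in the way.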

If you ever run into something similar, please remember this; it will save you some valuable time...