I check my mail and find a commit from Tobi in which he refactored yesterday’s code. We are a bit confused and decide to wait for him to clarify his changes. Meanwhile I set up RealVNC on my Windows machine, and Andrea and Christoph successfully log in to it. Desktop sharing now works pretty well for both sides! Tobi arrives and we learn that he made his changes to separate concerns between classes. OK, we don’t agree, so we try to refactor the code the way we see it. Damn, it’s not as simple as it first seemed! Each solution has its benefits and drawbacks. OK, we leave the code as it is and check it with JProfiler. The results are outstanding: with the classes map cached, performance doubles! Now we quickly write an ant target which automatically generates the persistent classes… done. It’s too late now, so we agree to continue on Monday.

We are trying to set up a VNC server on my side. For some reason Andrea cannot connect to my Linux VNC server… <gulp>

OK, let’s continue working in “secretary-boss” style once again. Now it’s a bit quicker; we begin to understand each other better. Christoph joins us, so now there are three of us in the developers’ chat! The initial task is quickly done, but now we wonder whether we can improve our implementation even more. I want to take an active part in coding, not just by typing ideas into Skype but also in the IDE. That leads us to setting up an alternative VNC server. Christoph uses Windows and RealVNC. I try to connect to his server – bingo! I can even type there. We exchange several good ideas about refactoring the code and find a new task to implement – automatic generation and validation of persistent classes with ant.

I’m very curious about our inter-office (Munich and St. Petersburg) pair programming. Will it work or not? How will it go? Where should we start?

The first idea is to set up text chat and voice chat as well – Skype seems to be the best candidate. We should also set up desktop sharing somehow. Andrea tells me that she has already set up a VNC server on her Mac. I have never worked with VNC software before, so a quick Google search gives me several good candidates to start with (RealVNC for Windows and TightVNC for Linux). I set up clients on both OSes. Trying to use RealVNC – oops, it cannot connect to the VNC server on the Mac. >> more…

After finishing the work on stopwords and stemming, the only thing left to do was the conversion from arbitrary input texts to the encoding Java uses. Strangely there seems to be no library that takes care of this – well, there is chardet, but it didn’t suit our demands. Fortunately I faced the same problem some time ago and implemented a charset detector. It simply counts the occurrences of non-ASCII characters and makes an educated guess whether the input is UTF-8, windows-1250, ISO-8859-15 or x-MacRoman. Hours were spent evaluating code tables and finding out what encoding each of the common special characters has in the different charsets (and how this whole encoding stuff works).
So the only thing that needed to be done was looking up the bytes used to represent certain Cyrillic (windows-1251) letters in a hex editor and adding them to the guessing algorithm. To spare you the trouble (if you ever need this too), here’s the code snippet: >> more…
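The original snippet is not reproduced in this excerpt. As a rough illustration of the general idea only (not the detector described above), here is a minimal sketch: it first checks whether the bytes are well-formed UTF-8 and otherwise compares the number of bytes in the range where windows-1251 places its Cyrillic letters (0xC0–0xFF) with the number of ASCII letters. The class name `CharsetGuess`, the comparison rule and the ISO-8859-15 fallback are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: a crude heuristic, not the detector from the post. */
final class CharsetGuess {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Guesses a charset by counting byte classes in the input. */
    static Charset guess(byte[] data) {
        int nonAscii = 0;     // bytes >= 0x80
        int highRange = 0;    // bytes in 0xC0..0xFF (A..ya in windows-1251)
        int asciiLetters = 0; // a-z and A-Z
        for (byte b : data) {
            int v = b & 0xFF;
            if (v >= 0x80) {
                nonAscii++;
                if (v >= 0xC0) {
                    highRange++;
                }
            } else if ((v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')) {
                asciiLetters++;
            }
        }
        if (nonAscii == 0) {
            return StandardCharsets.US_ASCII;
        }
        if (isValidUtf8(data)) {
            return StandardCharsets.UTF_8;
        }
        // Assumption: Russian text consists almost entirely of high bytes,
        // while Western European text is mostly ASCII letters plus a few accents.
        if (highRange > asciiLetters) {
            return Charset.forName("windows-1251");
        }
        return Charset.forName("ISO-8859-15");
    }
}
```

A real detector would need many more signals (KOI8-R, x-MacRoman and short inputs would all fool this one), so treat the rule above as a starting point rather than a solution.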

Ryan over at 37signals did a great post recently about why they included certain features in a sprint.

“The best part of building ‘as little as possible’ comes after launch. Every feature you skipped or held off on is free open space in the app for later development. Instead of a lot of baggage and maintenance, a bare-minimum release means new possibilities for feedback.”

>> more…

One of the ingredients for a good (and fast) search is the stopwords list. It contains all words that appear so often in a language that their relevance for search is almost zero. Take the word “and” – if a search listed all texts that contain “and”, the list of results would be enormous! Of course stopwords won’t harm the quality of a good search algorithm, but they do hurt its performance and inflate the size of the index. And that’s why we wanted a stopwords list for the new Russian search.
We invited Olha Biletska and Tetyana Dekola, two students I met at an alumni meeting recently, to help us with this task. After losing everything because of this great IDEA bug we finally came up with a list. It’s bigger than the German or English one because the Russian language has six cases instead of four.
Download (UTF-8)
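To make the idea concrete, here is a minimal sketch of how a stopwords list can be applied before indexing or querying. It is plain Java and not the Lucene integration we actually use; the class name `StopwordFilter` and the simple tokenizing rule (split on anything that is not a letter or digit) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that drops stopwords from a text. */
final class StopwordFilter {

    // The stopwords are expected to be lowercase already.
    private final Set<String> stopwords;

    StopwordFilter(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    /** Lowercases the text, splits it into tokens and removes all stopwords. */
    List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        // \p{L} matches letters of any script, so Cyrillic works as well.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With the stopwords “and” and “the”, the text “The cat and the dog” is reduced to the two tokens “cat” and “dog”, so only those end up in the index.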

As we want to introduce a cool Russian search in conjectPM, we tried out the Russian snowball stemmer that comes with Lucene. It’s a rule-based stemmer that cuts and changes a number of characters from the input to produce some kind of word stem. This is used to prepare the indexed files as well as the queries, so that it doesn’t matter for the search whether words are plural or singular (the same goes for other kinds of inflection).

After some time of wondering why the output was always the same as the input, we found out that our input was Unicode, but the stemmer expected ISO 8859-5 (Cyrillic). But how to translate one encoding to the other?  >> more…
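The answer from the full post is not part of this excerpt. In general, though, re-encoding text in Java can be sketched with the standard charset API: decode the bytes into a `String`, then encode that string with the target charset. The class and method names below are made up for illustration, and the sketch assumes the JDK at hand ships the ISO-8859-5 charset (it is not one of the six charsets every Java platform must support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for converting between UTF-8 and ISO 8859-5. */
final class Transcode {

    // Assumption: this charset is available in the running JDK.
    static final Charset ISO_8859_5 = Charset.forName("ISO-8859-5");

    /** Re-encodes UTF-8 bytes as ISO 8859-5 bytes. */
    static byte[] utf8ToIso88595(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        // Characters that ISO 8859-5 cannot represent are replaced by '?'.
        return text.getBytes(ISO_8859_5);
    }

    /** Decodes ISO 8859-5 bytes back into a Java string. */
    static String fromIso88595(byte[] bytes) {
        return new String(bytes, ISO_8859_5);
    }
}
```

Cyrillic letters take two bytes each in UTF-8 but only one in ISO 8859-5, so the converted array for a purely Cyrillic word is half as long as the input.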

Today we had to learn the hard way that things in your favorite IDE aren’t always what they seem to be.

We created a new file in IntelliJ IDEA, typed in some Cyrillic text which was correctly shown on the screen, and relied on the editor to save the file the same way as we saw it. But unfortunately it only saved a series of question marks.

As we found out, the file encoding (shown at the lower left corner of the window) was set to ASCII. And so, eventually, all non-ASCII characters were replaced by question marks. What remains is deep frustration, 500 lines of question marks and the big question: why the hell didn’t the editor tell us that it couldn’t save the typed-in characters in that encoding?
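We don’t know how IDEA implements saving internally, but plain Java shows the same effect and also how cheaply it can be detected. The sketch below (class and method names are made up) encodes a string as US-ASCII, which silently replaces every unmappable character with a question mark, and uses `CharsetEncoder.canEncode` to check beforehand whether anything would be lost.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of the silent data loss when saving as ASCII. */
final class AsciiLoss {

    /** Simulates saving text as US-ASCII and reading it back. */
    static String saveAndReloadAsAscii(String text) {
        // String.getBytes replaces unmappable characters with '?'.
        byte[] saved = text.getBytes(StandardCharsets.US_ASCII);
        return new String(saved, StandardCharsets.US_ASCII);
    }

    /** Returns true if the text can be stored as US-ASCII without loss. */
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }
}
```

Passing “Привет!” to `saveAndReloadAsAscii` gives back “??????!”, and `fitsInAscii` returns `false` for it, which is exactly the warning we would have liked to see before saving.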

I know, it looks a bit obsessive. But here is another great example of a not pointing friendly user interface. We found this on a coffee maker outside our meeting room in the most excellent and recommendable “Schlosshotel Mondsee” in… yes, Mondsee, Austria. >> more…

The thing I used to hate most in books about programming was typos in the code. Now that I have read “Hardcore Java” I have a new most hated thing: “best practice” tips that are in fact “bad practice” (or worst?). And the book has plenty of errors in the code and in the text as well.

The author’s suggestion to speed up internationalization is to use a Java file which contains all the properties again as String constants (so you have one more place to forget when you add new properties). Those constants are filled at compile time – no chance to switch languages after the build.

He also suggests making methods protected instead of private, because some day, maybe, someone might want to override that method in a subclass. >> more…