Wednesday, October 12, 2005

Java performance urban legends

From Java theory and practice: Urban performance legends, revisited (I guess this small article got slashdotted a few weeks ago, so everyone probably has already seen it):

Allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions, whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call.

Good to know, since I always assumed anything written in Java would be slower than the C++ equivalent. It makes me feel less guilty when writing Java.
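To get a feel for why the fast path can be that short, here is a minimal sketch of "bump pointer" allocation, the general technique that copying collectors rely on. This is not HotSpot's code: the BumpArena class, its 8-byte alignment and the choice to return nullptr when full are all assumptions made for illustration (a real JVM would trigger a garbage collection at that point).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical arena: allocation is an alignment round-up, a bounds check
// and a pointer increment. Nothing is ever freed individually.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), top_(0) {}

    // Returns nullptr when the arena is exhausted (assumption of this sketch;
    // a garbage-collected runtime would collect and retry instead).
    void* allocate(std::size_t size) {
        // Round the request up to a multiple of kAlign.
        const std::size_t aligned = (size + kAlign - 1) & ~(kAlign - 1);
        // The first test catches overflow in the round-up above.
        if (aligned < size || aligned > buffer_.size() - top_) return nullptr;
        void* result = buffer_.data() + top_;
        top_ += aligned;  // the "bump"
        return result;
    }

    std::size_t used() const { return top_; }

    // Releasing everything at once is a single assignment.
    void reset() { top_ = 0; }

private:
    static constexpr std::size_t kAlign = 8;
    std::vector<unsigned char> buffer_;
    std::size_t top_;
};
```

A general-purpose malloc has to search free lists and deal with fragmentation, which is where the extra instructions go; the arena above avoids that work by never reusing individual blocks.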

All this also reminds me of a sig I once saw on Usenet:

The march of progress:

C:

printf("%10.2f", x);

C++:

cout << setw(10) << setprecision(2) << showpoint << x;

Java:

java.text.NumberFormat formatter = java.text.NumberFormat.getNumberInstance();
formatter.setMinimumFractionDigits(2);
formatter.setMaximumFractionDigits(2);
String s = formatter.format(x);
for (int i = s.length(); i < 10; i++) System.out.print(' ');
System.out.print(s);

Friday, October 07, 2005

Spatial hierarchical set clustering, anyone ?

Yes, that was the best way to describe the problem I have at hand while designing/implementing the new thumbnail browser widget for imgSeek2. I have a collection of images, each belonging to one or more sets, and I want to implement an algorithm to find the best spatial representation of this on a grid.

I have conventional data structures telling me which images belong to each set and which sets an image belongs to. In other words (I mean, pictures), I have this:



and want to draw this:


So, which approach would you recommend? Keep in mind that drawing thumbnails on a grid as in the second picture is just as hard as drawing them as Venn diagrams as in the first picture.

It's really hard to google for such an algorithm, so I'll have to come up with something on my own. My first thought is to iterate over all sets and generate a tree (not a binary one, since a set may have more than two subsets) that is as balanced as possible. Each node would be weighted by a combination of the number of images in the set and the number of images it shares with other sets.
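Here is a rough sketch of that first idea, just to make it concrete. Everything in it is an assumption on my part: the names ImageSet, SetNode and build_set_forest are made up, the weight is simply "own size plus number of images shared with any other set", and the parent of a set is taken to be its smallest proper superset. Sets that only partially overlap end up as separate roots, and nothing here balances the tree or places anything on the grid.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

using ImageSet = std::set<int>;  // image ids belonging to one set

struct SetNode {
    int parent;          // index of the smallest proper superset, or -1 for a root
    std::size_t weight;  // set size + number of its images shared with other sets
};

// True if `outer` contains every image of `inner` and is strictly larger.
bool is_proper_subset(const ImageSet& inner, const ImageSet& outer) {
    return inner.size() < outer.size() &&
           std::includes(outer.begin(), outer.end(), inner.begin(), inner.end());
}

// Builds one node per input set (same index as in `sets`).
std::vector<SetNode> build_set_forest(const std::vector<ImageSet>& sets) {
    std::vector<SetNode> nodes(sets.size(), SetNode{-1, 0});
    for (std::size_t i = 0; i < sets.size(); ++i) {
        // Count the images of set i that also appear in at least one other set.
        std::size_t shared = 0;
        for (int image : sets[i]) {
            for (std::size_t j = 0; j < sets.size(); ++j) {
                if (j != i && sets[j].count(image) > 0) { ++shared; break; }
            }
        }
        nodes[i].weight = sets[i].size() + shared;

        // Choose the smallest proper superset as the parent, if there is one.
        for (std::size_t j = 0; j < sets.size(); ++j) {
            if (!is_proper_subset(sets[i], sets[j])) continue;
            if (nodes[i].parent < 0 ||
                sets[j].size() < sets[static_cast<std::size_t>(nodes[i].parent)].size()) {
                nodes[i].parent = static_cast<int>(j);
            }
        }
    }
    return nodes;
}
```

This is quadratic in the number of sets, which should be fine for a thumbnail browser with a modest number of sets, but it is only a starting point for the layout problem itself.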

I found an interesting Usenet post, and it seems that the R-tree is the spatial data structure I'm looking for.

Any other ideas?

Something I'd like to ask an economist


Just in case an economist is reading this post right now, or someone else who has thought about this: The Economist compiles an annual index that basically tells how much a Big Mac costs in dollars in each country. That is, you take its price in the local currency and convert that amount to USD.

My understanding is that the variation from one country to another is explained by how expensive it is to produce and serve a Big Mac in each country, which reflects the cost of labour, ingredients, etc. OK. So I live in Brazil, and according to that index the Big Mac in the USA is 28% more expensive, probably for the reasons I mentioned. But how can we explain the absurd discrepancy between what the engineering consultancy I work for in Rio de Janeiro can charge local customers and what a company doing the exact same thing would charge if it were based in Seattle? I'm sure the difference is well above 28%, and the estimate would be even higher if it took into account my salary here and what I'd make working abroad.

One conclusion I could arrive at is that the discrepancy gets bigger as we move up the cost chain of services, but why? Is my conclusion correct?

I thought about this after reading some chapters of Freakonomics. I got a bit interested in this economic phenomenon and wondered if anyone out there knows of any other good book discussing the topic in layman's language.

Sunday, October 02, 2005

Yet another multithreading pitfall: random numbers

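This one cost me a few hours to detect. Always remember to initialize the stdlib random number engine for each thread with something like
/* initialize random generator */
srand ( time(NULL) );
otherwise, on C runtimes that keep a separate rand() state per thread, they will all generate the same sequence of "random" numbers. Note that time(NULL) only has one-second resolution, so threads started within the same second still get the same seed unless you mix in something thread-specific, such as a thread index.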
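For what it's worth, here is a minimal sketch of the same idea using the standard C++11 <random> header instead of srand/rand, which is a swapped-in technique and not what my project uses. The helpers make_thread_engine and draw are hypothetical names; the assumption is that each thread builds its own engine once, from a shared base seed (for example the start time) plus its own thread index.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds an independent engine for one worker thread.
// Mixing the thread index into the seed keeps threads that start at the
// same moment from producing identical sequences.
std::mt19937 make_thread_engine(std::uint32_t base_seed, std::uint32_t thread_index) {
    std::seed_seq seq{base_seed, thread_index};
    return std::mt19937(seq);
}

// Draws `count` integers in [0, 99] from the given engine.
std::vector<int> draw(std::mt19937& engine, int count) {
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(dist(engine));
    }
    return out;
}
```

Each thread would call make_thread_engine once at startup and keep the engine to itself, so no locking is needed around the generator.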

I wish I could avoid multithreading on this project, but I can't, since it's a Qt GUI and responsiveness is crucial to the end user.

On a side note: does anyone out there know of a compilation of common multithreading pitfalls floating around the net?

Saturday, October 01, 2005

The Million Dollar Homepage. One dollar per pixel.

There is no limit to human creativity. On The Million Dollar Homepage, you pay a dollar per pixel of advertisement, and guess what? The owner is selling 1 million pixels. So far 240,100 pixels have been sold.

And to prove the first sentence of this post, someone is doing the same thing but charging 20 cents per pixel, with the slogan "Why pay five times more?". The owner actually bought some pixels on the million dollar page.

On web based project management software, collaboration team and workgroup

Are you also tired of reading about web-based project management software? Well, you are right to feel this way, because there are over 220 products like this. And just in case you feel like evaluating them before starting your own project, my advice is: start with JotSpot. It goes well beyond the "online collaboration tools" mindset and actually delivers an application framework built into a wiki.