Sunday, December 10, 2006

Network study of Brazilian soccer players

Two Brazilian physicists got a CD-ROM from a famous sports magazine (Revista Placar - 2003) packed with historical data (yearly stats covering 13,411 soccer players and 127 clubs) and modeled a network representing the connections between Brazilian soccer players (two players are connected when they played together at the same club).

As a summary of the scientific article, here are some interesting findings:
  • The average degree of separation is 3.29: a player can reach any other player in an average of 3.29 hops. In other words, the network can be classified as a small world (similar findings were made for movie actors who starred in the same movie, mathematicians who co-authored papers, etc).
  • The network was broken into many clusters in 1971 and 1972; after that there is only one component. This means that in the early seventies there wasn't much mobility among clubs: players and the players they had played with tended to stay at the same clubs.
  • The network is becoming more assortative with time: this seems to indicate a growing segregationist pattern, where player transfers occur preferentially between teams of the same size. It also means that players are increasingly likely to play with players who are equally well connected. The same behaviour is found in the network of mathematicians who co-authored papers.
  • The mean connectivity is increasing. The authors can think of two reasons for that: players' professional lives are getting longer and/or the player transfer rate between teams is growing.
  • The clustering coefficient is decreasing with time. Here too there may be two explanations: the player transfer rate between national teams is increasing (so players get a chance to play for several clubs, generating fewer clusters of clubs) and the exodus of the best Brazilian players to foreign teams (which has increased, particularly in the last decades).
  • A randomly chosen Brazilian soccer player is ten times less likely to have scored 36 goals than 13 goals. The top scorer is Roberto Dinamite, with 186.
If you're interested in the type of analysis done in the referenced article, a great introduction to the new field of science dedicated to studying networks can be found in Linked: How Everything Is Connected to Everything Else and What It Means by Albert-Laszlo Barabasi (who is cited in the article itself).
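To make the terms above a bit more concrete, here's a minimal Python sketch of how such a network could be built and measured. It is not the authors' code: the rosters are made up, keying them by (club, year) is my assumption about what "played together" means, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_network(rosters):
    """Connect every pair of players that appear in the same roster."""
    graph = {}
    for players in rosters.values():
        for player in players:
            graph.setdefault(player, set())
        for a, b in combinations(players, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hops_from(graph, start):
    """Breadth-first search: hops from start to every reachable player."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return hops


def average_separation(graph):
    """Mean hop count over all ordered pairs of distinct, connected players."""
    total = pairs = 0
    for player in graph:
        for other, distance in hops_from(graph, player).items():
            if other != player:
                total += distance
                pairs += 1
    return float(total) / pairs if pairs else 0.0


def clustering_coefficient(graph, player):
    """Fraction of pairs of a player's teammates who also played together."""
    neighbours = graph[player]
    if len(neighbours) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in graph[a])
    return 2.0 * links / (len(neighbours) * (len(neighbours) - 1))


# Made-up rosters keyed by (club, year); not data from the article.
rosters = {
    ("Club A", 1971): ["Ana", "Bia", "Caio"],
    ("Club B", 1972): ["Caio", "Duda"],
    ("Club C", 1973): ["Duda", "Edu"],
}
network = build_network(rosters)
print(average_separation(network))
print(clustering_coefficient(network, "Caio"))
```

With real data you would feed one roster per club and season; running a BFS from every player is fine for a toy example but gets slow on thousands of players.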

Tuesday, November 14, 2006

Search engine for Eclipse sites

I've just created a Google Co-Op for Eclipse so I can get into API, examples, plugins and articles faster.

This will search only Eclipse-related sites, including subdomains of

Sunday, November 05, 2006

Comparison matrix for startup ideas

I've made a comparison spreadsheet for startup ideas that may be useful for others.

The purpose is to lay out all the important considerations that go into a startup idea as columns, with the ideas themselves as rows.

You may use it as a quick way to brainstorm all your crazy ideas, compare them with each other and make sure you're covering all the important aspects of a successful startup.

Here's more detail on what each column means: (most of the referenced articles or quotes are from the Y-Combinator library)


A brief description of the idea itself. The initial idea is not a blueprint, but a question.


Who are the users/Wideness
Who's going to use whatever your idea is trying to do and how much they desire it:
In nearly every failed startup, the real problem was that customers didn't want the product. For most, the cause of death is listed as "ran out of funding," but that's only the immediate cause. Why couldn't they get more funding? Probably because the product was a dog, or never seemed likely to be done, or both.

Problems solved
What exactly are you trying to solve? How well do you understand this problem? Is it really a problem people have?


Question: How much value can you ultimately deliver?

The most successful products give benefits quickly (both in the life of a product and a user's relationship with it), but also lend themselves to continual development of and discovery of additional layers of benefit later on.

But most things are deeper than they seem at first glance. Practically any application, once people start using it, can be used as a lever to more activity and benefit delivery. Being smart about what you're leveraging is key.

Understanding of users
You must understand exactly what customers want or be one yourself.

From the What Customers Want section
of How to Start a Startup:
No matter what kind of startup you start, it will probably be a stretch for you, the founders, to understand what users want. The only kind of software you can build without studying users is the sort for which you are the typical user. But this is just the kind that tends to be open source: operating systems, programming languages, editors, and so on. So if you're developing technology for money, you're probably not going to be developing it for people like you. Indeed, you can use this as a way to generate ideas for startups: what do people who are not like you want from technology?

Once everything is running as intended, where are the revenues coming from?

What are the costs associated with your business model?

There are mainly five sources of startup funding:
  1. Friends and Family: Similar to angel investors.
    • Advantage: They're easy to find (you already know them)
    • Disadvantages: you mix together your business and personal life; they will probably not be as well connected as angels or venture firms; and they may not be accredited investors, which could complicate your life later.
  2. Consulting: Which means either saving money from your current salary doing whatever you currently do or working on consultancy itself. The best sort of job is a consulting project in which you can build whatever software you wanted to sell as a startup. Then you can gradually transform yourself from a consulting company into a product company, and have your clients pay your development expenses.
    • Advantage: This is a good plan for someone with kids, because it takes most of the risk out of starting a startup.
    • Disadvantage: It leaves you with less of the time, energy and commitment that a startup requires.
  3. Angel Investors: Angels are individual rich people. Angels who've made money in technology are preferable, for two reasons: they understand your situation, and they're a source of contacts and advice.
  4. Seed Funding Firms: Seed firms are like angels in that they invest relatively small amounts at early stages, but like VCs in that they're companies that do it as a business, rather than individuals making occasional investments on the side.
  5. Venture Capital Funds: VC firms are like seed firms in that they're actual companies, but they invest other people's money, and much larger amounts of it. VC investments average several million dollars. So they tend to come later in the life of a startup, are harder to get, and come with tougher terms.

What operating system are you planning on using? Is it based on any fancy middleware? REST? SOAP? Which web framework?

More on that:
How do you pick the right platforms? The usual way is to hire good programmers and let them choose. But there is a trick you could use if you're not a programmer: visit a top computer science department and see what they use in research projects.
The question is: How difficult will it be to launch a worthwhile version 1.0?

According to Evan Williams:

Tractability is partially about technical difficulty and much about timing and competition—i.e., How advanced are the other solutions? Building a new blogging tool today is less-tractable, because the bar is higher. Building the very first web search engine was probably pretty easy. Conversely, building the very first airplane was difficult, even though there wasn't any competition.
Development plan
Do you have a broad idea of what the development plan (think of a roadmap) is for the next months and years? You should release neither too early nor too late, so you must have a good idea of when your initial prototype will be ready for release and which features should come next.

What's the potential for integrating with other web sites? Can you leverage any external system or source of data? Does providing an open API make sense for your idea?

Domain name
Which domain name should you use? It should be catchy, short and somehow related to what your idea is about.


Who are your competitors? Having some competition is not really bad. In a sense, it simply means that your idea is so good that others are also trying to capitalize on it.

Advantage over competitors
Now that you're aware that you have some competition, what's your advantage over them?

Question: Is it clear why people should use it?

Everything is obvious once it's successful. Big wins come when you can spot something before it's obvious to everyone else. There are several vectors to this: 1) Is it obvious why people should use it? 2) Is it obvious how to use? 3) Is it an obviously good business?

The key question for evaluating an idea is number one: Is it obvious why people should use it? In most cases, obviousness in this regard is inversely proportional to tractability.

How are users going to know that you can solve one of their problems?

Focus (money X being cool)
Do you want to just make something cool out of your idea just for the sake of it or are you focusing on making money?


How motivated are you regarding this idea?
In some fields the way to succeed is to have a vision of what you want to achieve, and to hold true to it no matter what setbacks you encounter. Starting startups is not one of them. The stick-to-your-vision approach works for something like winning an Olympic gold medal, where the problem is well-defined. Startups are more like science, where you need to follow the trail wherever it leads.
Time devotion required
A general estimate of how much time you'd have to devote to bringing this idea to life.

Who will help you get the idea off the ground?

From the People section of How to Start a Startup:

Like most startups, ours began with a group of friends, and it was through personal contacts that we got most of the people we hired. This is a crucial difference between startups and big companies. Being friends with someone for even a couple days will tell you more than companies could ever learn in interviews.

Ideally you want between two and four founders. It would be hard to start with just one. One person would find the moral weight of starting a company hard to bear.

If you can't understand users, however, you should either learn how or find a co-founder who can. That is the single most important issue for technology startups, and the rock that sinks more of them than anything else.

So who should start a startup? Someone who is a good hacker, between about 23 and 38, and who wants to solve the money problem in one shot instead of getting paid gradually over a conventional working life.

And from The 18 Mistakes that Kill Startups:

Have you ever noticed how few successful startups were founded by just one person? Even companies you think of as having one founder, like Oracle, usually turn out to have more. It seems unlikely this is a coincidence.

What's wrong with having one founder? To start with, it's a vote of no confidence. It probably means the founder couldn't talk any of his friends into starting the company with him. That's pretty alarming, because his friends are the ones who know him best.

Potential legal issues
Are you breaking any vaguely described patent? Indexing someone else's content? Using the right disclaimers? Do you even need to worry about terms of service?

In which city do you plan on basing your startup? A bad location may be a mistake:
Startups prosper in some places and not others. Silicon Valley dominates, then Boston, then Seattle, Austin, Denver, and New York. After that there's not much. Even in New York the number of startups per capita is probably a 20th of what it is in Silicon Valley. In towns like Houston and Chicago and Detroit it's too small to measure.
But that doesn't really mean that starting from other places will doom your startup...

Where are you and your partners going to work from?

From What Business Can Learn from Open Source:
The average office is a miserable place to get work done. And a lot of what makes offices bad are the very qualities we associate with professionalism. The sterility of offices is supposed to suggest efficiency. But suggesting efficiency is a different thing from actually being efficient.

Things are different in a startup. Often as not a startup begins in an apartment. Instead of matching beige cubicles they have an assortment of furniture they bought used. They work odd hours, wearing the most casual of clothing. They look at whatever they want online without worrying whether it's "work safe." The cheery, bland language of the office is replaced by wicked humor. And you know what? The company at this stage is probably the most productive it's ever going to be.

Thursday, November 02, 2006

How many people worldwide make their living out of eBay?

How High-Volume eBay Manages Its Storage is an article worth reading and packed with interesting numbers, like this one which I simply can't believe:

1.3 million people make all or part of their living selling on eBay.

Wednesday, October 25, 2006

Google Co-op Search Engine for research on computer science

I've put together a Google Co-op Search Engine for computer science.

It includes the most respected repositories for computer science related research papers and articles, including:


Tuesday, October 17, 2006

Texture remover source example

Use it to remove a textured background from an image based on its color. It works by creating a color range from user-chosen points and then removing all image pixels whose color falls into that range.


Source code: Eclipse CDT project, requires Qt 4.0

Executable: Just unzip and run.
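For those who'd rather see the idea than download the project, here's a minimal Python sketch of the approach described above: build a color range from the chosen points, then drop every pixel inside it. This is not the tool's actual code (that one is C++ with Qt). The image representation (rows of RGB tuples), the per-channel min/max box and all the function names are my assumptions for illustration.

```python
def color_range(samples):
    """Per-channel (min, max) bounds covering every sampled RGB color."""
    return [(min(c[i] for c in samples), max(c[i] for c in samples))
            for i in range(3)]


def in_range(pixel, bounds):
    """True when every channel of the pixel lies inside its bounds."""
    return all(low <= value <= high
               for value, (low, high) in zip(pixel, bounds))


def remove_texture(image, samples, fill=None):
    """Return a copy of image with every in-range pixel replaced by fill."""
    bounds = color_range(samples)
    return [[fill if in_range(pixel, bounds) else pixel for pixel in row]
            for row in image]


# Two points the "user" clicked on the background (made-up values).
samples = [(200, 200, 200), (220, 210, 205)]
image = [
    [(210, 205, 202), (10, 20, 30)],
    [(255, 255, 255), (200, 200, 200)],
]
print(remove_texture(image, samples))
```

Here removed pixels become None; a real tool would more likely make them transparent or paint them with a chosen color.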

Sunday, September 03, 2006

Why is everything in Ruby such an uphill battle?

I've just spent more than 10 minutes figuring out how to determine whether a given string can be converted to an integer. It turns out that the to_i string method never raises an exception and returns 0 when the string can't be converted. But what if I want to know when it can't be converted? Grrrr ...

Wednesday, August 23, 2006

You know you've been tagging pages with "toread" on too much when ...

... you start to wonder if a human lifetime is gonna be enough to actually read them.

Does that also happen to you?

Saturday, August 19, 2006

Extract PDF title from all files on a directory

Got a directory full of PDF files whose file names have nothing to do with their titles, and want to generate a text listing?

Try this Python script. You need to have pyPdf installed.

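# pyPdf available at
# (written for Python 2, like pyPdf itself)
from pyPdf import PdfFileReader
import os

for fileName in os.listdir('.'):
    # skip anything that is not a PDF file
    if not fileName.lower().endswith(".pdf"): continue
    try:
        input1 = PdfFileReader(open(fileName, "rb"))
        # print the file name followed by the title stored in the PDF metadata
        print '##1', fileName, '##2', input1.getDocumentInfo().title
    except Exception:
        # unreadable file or missing metadata: print the file name only
        print '##1', fileName, '##2'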

Example output:

##1 00317565.pdf ##2 A framework for the specification of SCADA data links - Power Systems, IEEE Transactions on
##1 00363299.pdf ##2 Advanced SCADA concepts - IEEE Computer Applications in Power
##1 00392026.pdf ##2 Routing SCADA data through an enterprise WAN - IEEE Computer Applications in Power
##1 00500696.pdf ##2 INTEGRATION OF SCADA AND DA/DMS ACROSS A LARGE DISTRIBUTION SYSTEM - Energy Management and Power Delivery, 1995. Proceedings of EMPD '95., 1995 International Conferenc
##1 00515274.pdf ##2 THE DESIGN OF NEXT GENERATION SCADA SYSTEMS - Power Industry Computer Application Conference, 1995. Conference Proceedings., 1995 IEEE
##1 00517471.pdf ##2 THE ROLE OF MEDIUM ACCESS CONTROL PROTOCOLS IN SCADA SYSTEMS - Power Delivery, IEEE Transactions on

Saturday, August 12, 2006

Differences between sexes

Just in case you're as interested in this topic as I am, here are some updates on recent research:

Here's an interesting piece from the Economist, The mismeasure of woman (a pun on a book by Stephen Jay Gould), which sheds light on many myths. One that caught my attention is:

Female is the default brain setting. Until the eighth week of gestation every human fetal brain looks female. The brain, like the rest of the human body, becomes male as a result of surges of testosterone—one during gestation and one shortly after birth.

In Why Do Beautiful Women Sometimes Marry Unattractive Men?, Dubner and Levitt talk about a new study by Satoshi Kanazawa, an evolutionary psychologist at the London School of Economics, suggesting that it may be a simple supply-and-demand issue: there are more beautiful women in the world than there are handsome men.

This BBC report references UK research suggesting that hungry men were attracted to heavier women, and explains that how full a man's stomach is can dictate the type of woman he will fancy.

Sunday, August 06, 2006

Word of caution: Distutils, SWIG, STL

This one made me waste some precious time, so here it is in case Google picks it up.

Python's Distutils got a little smarter and can now run Swig for you, but if you have any STL datatypes (vectors, maps, etc) in your wrapped functions, remember to add the undocumented swig_opts=['-c++'] option to your Extension module definition on

Otherwise, you may end up with Swig spitting errors like

/usr/share/swig1.3/std/std_common.i:109: Error: Syntax error in input(1).

which, if you take a look at the mentioned line of std_common.i, turns out to be an unrecognized "%}" swig-pre-processor-thingy.

Here's what a module definition would look like:

#standard setup stuff left out
ext_modules = [Extension('pyCOMHook._mouse', ['mouse.i'],
                         libraries=libs, include_dirs=includes, define_macros=macros,
                         extra_compile_args=compilerArgs, language=lang,
                         swig_opts=['-c++'])],

Other than that, if you get the dreaded

ImportError: dynamic module does not define init function (initimymodule)

error when importing your module, try renaming to

Saturday, August 05, 2006

Gah, I hate blogs!

TechCrunch on More Stats on, This Time Positive:

This is an update on the post I wrote about earlier today that showed massively decreasing traffic on the site according to Comscore, and flat traffic from Alexa.


At the end of this process, after reviewing the public data (deeply flawed, but neutral) and Yahoo internal data (presumably accurate, but selectively disclosed), I’ve come to the conclusion that I have no idea what’s up at I’m going to go with my gut and trust Yahoo.
And this is what I have to say: Wow, really interesting... Another example of the classic blog post “there is a rumor and I was unable to confirm the rumor, so let’s just keep spreading”. That’s what I call relevant news.

Thursday, August 03, 2006

Installing mplayer on Ubuntu

Ok, it took me longer than it should have to find the right way to get mplayer working on Ubuntu Dapper Linux, so I thought I should post it here so others can find it quicker than I did.

You must perform the following steps as root (super user), so you will need to supply your user password.

Here are the two lines you must add to the file /etc/apt/sources.list:

deb dapper multiverse
deb-src dapper multiverse

(for other Ubuntu versions, replace "dapper" in the lines above with the release name, for example "gutsy")

And after saving the file, run these two commands on a console:

sudo apt-get update
sudo apt-get install mplayer

Sunday, May 14, 2006

Applied Hierarchical Temporal Memory

Founded by Jeff Hawkins (who also founded Palm Computing), Numenta promises to deliver, within a few years, technology that is really worth watching for:

Numenta, Inc.: "Numenta is developing a new type of computer memory system modeled after the human neocortex. The applications of this technology are broad and can be applied to solve problems in computer vision, artificial intelligence, robotics and machine learning. The Numenta technology, called Hierarchical Temporal Memory (HTM), is based on a theory of the neocortex described in Jeff Hawkins' book entitled On Intelligence (with co-author Sandra Blakeslee)."

From what's been hinted at in Hawkins' book, they are going to deliver HTM (which is the basic mechanism used by our brain to learn and recall patterns, giving us what we call "thinking") to application developers, allowing for more intelligent industrial machines and end-user software applications.

Also check out a PDF they've just released describing the basic concepts behind HTM.

Saturday, February 11, 2006

Sourceforge domain scam

It seems like someone registered the domain and is placing spam sites mimicking real projects.

Compare the real one:

With the fake one:

Wednesday, January 25, 2006

Funny quote on design patterns

From a blog post by Grady Booch:

The past few days I've been cataloging several hundred architectural and design patterns ...
I mean, these things were supposed to make our lives easier in the long term, but nothing easy starts with learning hundreds of patterns and concepts. Do we really need that many different patterns?

Monday, January 23, 2006

Long live Chuck Norris

Chuck Norris Facts: This is one of the funniest sites I've ever seen (yes, I'm weird just like you). Some quotes:

Chuck Norris' tears cure cancer. Too bad he has never cried. Ever.

Chuck Norris has counted to infinity. Twice.

Chuck Norris is 1/8th Cherokee. This has nothing to do with ancestry, the man ate a fucking Indian.

Chuck Norris can win a game of Connect Four in only three moves.

The quickest way to a man's heart is with Chuck Norris' fist.

Chuck Norris can set ants on fire with a magnifying glass. At night.

And today I came across a full length SNL music video giving some insight on how this legend came to be and a Web 2.0 version of Chuck Norris facts.

Sunday, January 22, 2006

Struggling with bookmarklet size

I've been playing with bookmarklets lately as I try to improve the one I created for our.imgSeek. During my struggles, I found a really useful online bookmarklet builder which is great for testing, formatting JS and compressing it (removing useless chars).

And why compress it? Firefox doesn't seem to care about your bookmarklet code size, but IE does and will silently ignore your code if it's longer than 500 chars. I feel like an artist trying to squeeze useful code into 500 chars, trying a lot of hacks to overcome this IE limit.
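If you want to check a bookmarklet against that limit without pasting it into an online builder, here's a rough Python sketch. It only strips indentation and line breaks, so it assumes every statement ends with a semicolon and that there are no // comments; it is nowhere near a real minifier. The function names are made up, and whether IE counts the "javascript:" prefix towards the 500 chars is my assumption.

```python
IE_LIMIT = 500  # the limit mentioned above


def squeeze(js):
    """Naive compression: drop indentation, blank lines and line breaks."""
    lines = [line.strip() for line in js.splitlines()]
    return "".join(line for line in lines if line)


def bookmarklet(js):
    """Turn a block of JS into a single-line javascript: URL."""
    return "javascript:" + squeeze(js)


def fits_ie(js, limit=IE_LIMIT):
    """True when the resulting bookmarklet stays within the limit."""
    return len(bookmarklet(js)) <= limit


source = """
(function () {
    var d = document;
    alert(d.title);
})();
"""
print(bookmarklet(source))
print(fits_ie(source))
```

For anything beyond a toy bookmarklet, a proper JS compressor will squeeze out a lot more than this does.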

Tuesday, January 10, 2006

Browser-based image annotation techniques

I was doing some research on current implementations of browser-based image annotation (basically letting users select an image region and input some data about that region) and the only working example I could find is this Javascript example. I know Flickr does it with Flash, but it's not open source anyway.

Is anyone else out there aware of another good example?