
October 11, 2007

robots & data centers

two very important things:

1. Google is now providing academics with access to MapReduce. That includes access to a $30M research cluster.

2. The DARPA Urban Challenge is on in less than a month. Here are the latest updates & vids, and here is a list of competitors with photos.

ap.

October 07, 2007

Memewars

Meme-trackers are curious beasts. Since I first discovered memeorandum.com a couple years back, I have been keenly studying them and their full-text clustering algorithms.

Yesterday’s News, Today.

The basic meme-tracker premise is simple – they scour the blogosphere & newsosphere for the latest action so you don’t have to.

While they are all reasonably successful at it, I don’t think they have quite nailed the vision just yet, in the same way that AltaVista & InfoSeek didn't quite solve the web information retrieval problem. I say that because I keep finding myself drifting back to sites like Slashdot & Gizmag because they know stuff that the meme-trackers will never tell me.

The root cause is that most meme-trackers utilize inbound link weight for their ranking system in some way or another. Whether it is based on naïve 1st order link count, the recursive PageRank random surfer model or extracting bipartite authority graphs, they all share the same problem - emerging topics are not heavily linked to begin with and not all interesting topics attract sufficient links over time.

In short, too much reliance on inbound link weight can result in a lot of missed information with the remainder being delivered quite slowly.
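To make the link-weight problem concrete, here is a minimal sketch of the two simplest schemes mentioned above: a naive first-order inbound link count and a textbook PageRank power iteration. This is not how any particular meme-tracker ranks stories; the tiny link graph, the page names and the damping value of 0.85 are assumptions made up for illustration.

```python
from collections import defaultdict

def inbound_link_counts(links):
    """Naive first-order score: how many distinct pages link to each page."""
    counts = defaultdict(int)
    for src, targets in links.items():
        counts[src] += 0  # make sure pages with no inbound links still appear
        for dst in set(targets):
            counts[dst] += 1
    return dict(counts)

def pagerank(links, damping=0.85, iterations=50):
    """Textbook random-surfer PageRank via power iteration."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling page: the surfer jumps to a random page.
                share = damping * rank[src] / n
                for page in pages:
                    new_rank[page] += share
        rank = new_rank
    return rank

# Hypothetical graph: three blogs, one well-linked story, one emerging story.
links = {
    "blog_a": ["established_story"],
    "blog_b": ["established_story"],
    "blog_c": ["established_story", "emerging_story"],
    "established_story": [],
    "emerging_story": [],
}

counts = inbound_link_counts(links)
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:20s} links={counts[page]} rank={ranks[page]:.3f}")
```

Under either score the emerging story sits well below the established one, simply because it has not had time to collect links.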

BuzzTracker sold for $5M

Yahoo's recent purchase of buzztracker.com for $5M placed a valuation on the meme-tracker landscape. Assuming that valuation is a function of eyeballs, then based on Quantcast’s data, Technorati could be worth $700M. Going by Alexa's data, Technorati could be worth as much as $1.3B.

So who is winning?

Here is a list of meme-trackers, along with their current Alexa Rank & Quantcast Reach data.

Of the top 3 (according to Alexa), Technorati & Feedster both started life as blog search engines and have only recently evolved into meme-trackers. Topix on the other hand will probably evolve itself off the list soon as it is looking more and more like a social media site.


April 04, 2007

Stock or Not

A good friend of mine, Josh Reich, has built a simple but compelling game where you are presented with a chart consisting of data from a real financial market alongside a chart of some random data. You are charged with the task of spotting the fakes.
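For the curious, a round of a game like this could be sketched roughly as follows. This is not Josh's code: the geometric random walk used for the fake, the volatility value and the function names are all assumptions, and the "real" prices below are placeholder numbers rather than market data.

```python
import math
import random

def fake_chart(n_points, start=100.0, volatility=0.01, rng=None):
    """Generate a fake price series as a geometric random walk."""
    rng = rng or random.Random()
    prices = [start]
    for _ in range(n_points - 1):
        # Multiply by exp(noise) so prices always stay positive.
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, volatility)))
    return prices

def make_round(real_prices, rng=None):
    """Pair a real series with a fake one in random order.

    Returns (charts, fake_index) where charts is a two-element list.
    """
    rng = rng or random.Random()
    fake = fake_chart(len(real_prices), start=real_prices[0], rng=rng)
    charts = [real_prices, fake]
    rng.shuffle(charts)
    fake_index = 0 if charts[0] is fake else 1
    return charts, fake_index

def accuracy(guesses, answers):
    """Fraction of rounds where the player picked the fake correctly."""
    return sum(g == a for g, a in zip(guesses, answers)) / len(answers)

# Placeholder "real" series, for illustration only.
real = [100.0, 101.0, 99.5, 102.0, 101.2, 103.4]
charts, fake_index = make_round(real, random.Random(0))
print("the fake is chart", fake_index)
```

A player guessing at random would land near 50% on `accuracy` over many rounds.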

Turns out it's harder than it seems, although the average score is 50%. On the surface that kinda says to me that technical analysis is bogus, but here is the thing... people are either horrendously bad or exceptionally good at it.

I have been hassling Josh to include a survey to determine if there is a correlation between adeptness at spotting fake stock charts and being a successful trader (or a wall street zip code).

Which leads me to the question...
Can you guess the difference between a real stock chart and a random pile of junk?

February 08, 2007

The Dreaded Heisenbug

Yay, I used a Bayes decision tree to isolate a bug today in a fraction of the time it would have otherwise taken.

About 48 hours ago I started work on repairing a Heisenbug. For the less geeky of you, Heisenbugs are a rather nasty class of software fault that “disappears or alters its characteristics when it is researched”.

Most bugs are generally the result of only a single input (or knob or button or whatever) being set to a single value (or range or whatever). Software testers live by this assumption, and 99% of the time it is true, so true that we tend to forget (or is that ignore?) the 1% of bugs that can’t be explained so easily.

After tearing my hair out all yesterday and going to bed feeling somewhat defeated, this morning I woke with a fresh mind, a new day and a small suspicion that perhaps this bug fell into that 1%.

After poking and prodding at my adversary for most of the morning it was pretty clear that this was occurring probabilistically and that some pairs of input combinations made the bug occur more frequently.

Turns out the randomness was due to a threaded race condition, and a combination of three inputs being in a certain range tended to make it occur more frequently. Knowing those settings (which fell out of the decision tree) was thankfully enough to explain why the bug was occurring.
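The general idea can be sketched like this: log each test run as a set of input settings plus whether the bug showed up, then let a small decision tree find the input ranges that separate failing runs from passing ones. This is a plain information-gain tree rather than the exact tree I used, and the input names (`threads`, `buffer_kb`, `timeout_ms`) and the run data are made up for illustration.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of True/False bug observations."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(runs):
    """Find the (gain, input_name, threshold) with the highest information gain."""
    base = entropy([bug for _, bug in runs])
    best = None
    for name in runs[0][0]:
        values = sorted({inputs[name] for inputs, _ in runs})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [bug for inputs, bug in runs if inputs[name] <= threshold]
            right = [bug for inputs, bug in runs if inputs[name] > threshold]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(runs)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, name, threshold)
    return best

def build_tree(runs, depth=3):
    """Recursively split runs; leaves report the observed bug rate."""
    labels = [bug for _, bug in runs]
    split = best_split(runs) if depth > 0 else None
    if split is None or split[0] <= 1e-12:
        return {"bug_rate": sum(labels) / len(labels), "runs": len(runs)}
    _, name, threshold = split
    low = [(i, b) for i, b in runs if i[name] <= threshold]
    high = [(i, b) for i, b in runs if i[name] > threshold]
    return {"input": name, "threshold": threshold,
            "low": build_tree(low, depth - 1), "high": build_tree(high, depth - 1)}

# Made-up run log: the bug appears only when all three inputs are in a bad range.
runs = [({"threads": t, "buffer_kb": b, "timeout_ms": m},
         t > 2 and b <= 4 and m <= 10)
        for t in (1, 2, 4, 8) for b in (1, 4, 16, 64) for m in (10, 100)]

tree = build_tree(runs)
print(tree["input"], tree["threshold"])
```

With a real Heisenbug the labels are noisy, so the leaves report a bug rate rather than a yes/no answer, and you want plenty of runs per setting before trusting a split.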

September 12, 2006

gnoos

I am now officially working at Ben Barren's web2 startup gnoos.com.au.

heh, I guess that means I'll be updating this a bit more.

February 11, 2006

First Post

Yup. Welcome to my blog. More to come.

ap.