Mike Macgirvin
Diary and Other Rantings
Beyond Silicon Valley
   
Friday, May 16 2008, 08:46 pm
Feb 03, 2007
Web 3.0

Ya know, the more I think about it, the more I believe that the next wave on the web is going to be heuristics. OK, that and pay-per-view. But think about it. This wave has been all about collecting and value-adding zettabytes of data. Folks are starting to harvest all these data collections and find other ways to add value.

The real value will come from using this info-wealth as training data that our systems can learn from. That's what heuristics is all about... learning. Then this internet could get really freaking smart. People who looked at this page or bought this thing almost always looked at that one or bought that item. You've already seen this on Amazon. We'll need this kind of machine learning to get through the pending info glut. I already mentioned infinite content streams. The web has grown bigger than any of us can ever hope to visit in our lifetimes, even just to find the places that interest us. We're going to need help. Things we are interested in will need to know our interests and either find us or migrate to the top of the pile.
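Here's roughly what the "people who looked at this also looked at that" trick could look like. This is a minimal sketch using plain co-occurrence counts, not a claim about how Amazon actually does it. The function names and the sample sessions are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_also_viewed(sessions):
    """For every item, count how often each other item appears in the same session."""
    also = defaultdict(Counter)
    for items in sessions:
        # set() so a repeated item in one session is only counted once
        for a, b in permutations(set(items), 2):
            also[a][b] += 1
    return also


def recommend(also, item, n=3):
    """Return up to n items most often seen alongside `item`."""
    return [other for other, _ in also.get(item, Counter()).most_common(n)]


# Hypothetical browsing sessions (one list per visitor).
sessions = [
    ["kettle", "teapot", "tea"],
    ["kettle", "tea"],
    ["kettle", "tea", "mug"],
    ["teapot", "mug"],
]
also = build_also_viewed(sessions)
print(recommend(also, "kettle", n=1))
```

With more sessions the counts get more trustworthy, which is the whole point: the data does the work.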

But not the junk. There isn't enough time for it. For information-based companies, survival will depend on being good at doing this. You can't rely on Google PageRank alone. There are way too many people fighting for that pole position. You won't be able to hold it forever.

Let's move this idea to something bigger than Amazon. Take it to the whole web. Let's go to CNN. Ah, we see that you don't care about the presidential primary. Libertarian Buddhist males over 45 traditionally tend to lose interest in the event. Would you like some tech news? And nine out of ten people who saw this particular article also liked to see this other one from our Offbeat section. Should we show that to you now?

Now for Google. Paris Hilton. Hmm. We noticed that every time a male in your age group asked for a (female) celebrity search they followed it shortly thereafter with 'naked'. Shall we just cut to the chase and take you there instead? Yesterday it seems that 40 million people downloaded a particular video of her. Is that what you're after? 

Then for Web 4.0 we'll be able to feed our own data into the mix. Web 3.0 is learning based on populist trends. Web 4.0 is personalized. Grab my entire weblog. All my surfing habits. Use that as training data. You'll know what I'm going to like or where I like to go even before I do. Then we might finally achieve the computing vision we all grew up hoping for. Hello computer. Give me a quick summary of everything in the world that I care about. Yes, I'd like to have breakfast at Frank's Place. Just like on every third Tuesday of the month. A new Indiana Jones movie came out? Great! It's already on my Netflix queue? Thanks.

The downside is that they'll have you pegged. The systems will know what you're going to buy, who from, how much you're willing to spend, and when you're going to buy it.

The bright side of this is filtering out negative content. Accuracy goes up as you acquire more training data. It would be hard to beat a heuristic spam filter that has been trained with 30 trillion spam messages. If you don't want to see porn, it could just as easily be removed by a neural net. We certainly have enough data on the net to train it to almost foolproof accuracy today.
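To make the trained spam filter idea concrete, here's a toy version. I'm assuming a naive Bayes classifier with add-one smoothing, which is one common way to do it; a neural net would be a different animal. The class name and the four training messages are invented, and a real filter would want a whole lot more than four.

```python
import math
from collections import Counter


class TinySpamFilter:
    """Toy naive Bayes filter: counts words per label, then compares likelihoods."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def spam_probability(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common is this label overall?
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n_words = sum(self.words[label].values())
            for word in text.lower().split():
                # Add-one smoothing so unseen words don't zero out the score.
                count = self.words[label][word]
                score += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1 / (1 + math.exp(diff))


spam_filter = TinySpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("lunch at noon tomorrow", "ham")
spam_filter.train("notes from the meeting tomorrow", "ham")
print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("meeting at noon"))
```

The first message should score well above one half and the second well below. Every extra training message sharpens the word counts, which is why more data means better accuracy.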

Before you scoff, consider that in most cases we already have the data. That's the hard part. The easy part is writing a little function/script to take what we know about you and then (based on that) spit out a probability that a particular piece of content will please you. For some web properties this could be as easy as twenty lines of code. It's the same thing they use to show reports to management. For the rest of us, it could be a page or two - still not a big deal in the larger scheme of things. 
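Just to show I'm not kidding about the twenty lines, here's a sketch of that little function. It assumes you already have a list of interests for the reader and tags on the content, and it squashes a weighted sum through a logistic curve to get a probability. The weights, the bias and the field names are all hypothetical; in real life you'd learn the weights from your training data instead of typing them in.

```python
import math


def probability_of_interest(profile, content, weights, bias=-1.0):
    """Sum the weights of tags the reader and the content share, squash into 0..1."""
    shared = set(profile["interests"]) & set(content["tags"])
    score = bias + sum(weights.get(tag, 0.0) for tag in shared)
    return 1 / (1 + math.exp(-score))


# Hypothetical hand-set weights; a real system would fit these from data.
weights = {"tech": 2.0, "politics": 1.5, "offbeat": 1.0}
reader = {"interests": ["tech", "offbeat"]}
tech_story = {"tags": ["tech"]}
primary_story = {"tags": ["politics"]}

print(probability_of_interest(reader, tech_story, weights))
print(probability_of_interest(reader, primary_story, weights))
```

The tech story comes out above one half and the primary coverage below it, so the tech story migrates to the top of the pile.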

Categories: computer
grep me no patterns and I'll tell you no lines.