Mike Macgirvin

traffic and more traffic

  from Diary and Other Rantings
Sorry, folks, for the problems connecting this morning. Had our very first million-hit day, and the servers in La Brea weren't ready for it - and neither was I...

It was a combination of several things converging at once. Shamita Shetty apparently was exposed on the Style Ikon channel, and even with an overlaid star to protect the family jewels it seems that a bunch of folks from the UK couldn't get enough of it. Wonder what this incident did to the transatlantic backbones...

But what really killed the servers was that, while this was going on, Google, Yahoo, Ask Jeeves, and a couple of other search engines all converged on 10-12 of my sites at the same time to do their periodic crawls. Normally I'll get one or two crawlers doing one or two sites a day.
Things seem to have now returned to a normal state of chaos.
Ah ha! I was wondering what happened to ya! So you have been elevated to the lofty million-air club? Congrats! :)
Tech Support

  last edited: Mon, 29 Jan 2007 03:22:25 +1100  from Diary and Other Rantings
This belongs to MichaelAnn. Reproduced here because I love it. I would normally just import the feed, except this one is set up with teaser (short) feeds where all the articles are chopped off in mid [...]

You've reached support, how may I help you?
Hey! Cool! I'm gonna be famous! Hi Mike! :)

One of my favorite support dialogs always begins with:
me: What operating system are you using? Windows or Mac?
caller: I don't know... how can I find out?
Dear Santa...

  last edited: Wed, 22 Nov 2006 03:02:45 +1100  from Diary and Other Rantings
Dear Santa -

You know that quad core 64-bit Pentium with 8 gigs of RAM and a 400G hard drive I asked for? Yeah, with the quad firewire interface and twin 27 inch monitors. That's the one.

Never mind... Could I have one of these instead?

Good to see you haven't lost your sense of humor!
Fuzzy Tools

  last edited: Tue, 17 Oct 2006 08:52:14 +1000  from Diary and Other Rantings
I've been watching the whole brouhaha at with some interest. This is a homegrown website that takes live 911 feeds and puts them on a Google mashup. Cute and clever use of technology. The Seattle Fire Department responded by changing the logs to image format rather than text.

That's the background. Reading some of the articles pointed me to 'gocr', which is a free OCR package. Now this is useful - I wasn't previously aware of its existence. It basically takes an image, tries to pick out the text in it, and gives you that text. If you saw my article on comment spam, you'll realize that 'captcha' images to prevent spam are doomed. This is where you type into a box the letters you see in the picture. Most of these are annoying anyway, and it's pretty hard to get them past a clever command-line driven OCR program. If you make the image so hard to read that gocr can't read it, chances are that none of your audience will be able to either.

But I have an even deeper interest in this stuff. Gocr is a framework for finding recognizable stuff in images. Something the world has needed for a while now is something that can filter porn. In theory there isn't much difference between distinguishing the letter 'b' in a picture (in any of 600 different fonts) and, say, a breast (in any of 600 different sizes/shapes). I'm being polite. Any anatomical feature.

Some folks worked on this problem back in the '80s, correlating the prevalence of what could be termed 'skin tones' in an image.
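To make the 'skin tone prevalence' idea a bit more concrete, here's a rough sketch in Python. To be clear, this is my own illustration and not what those folks actually did: the RGB thresholds in is_skin_tone(), the function names, and the 0.4 cutoff are all assumptions picked for the example, and a real filter would need a lot more than this.

```python
def is_skin_tone(r, g, b):
    """Crude RGB test for a 'skin-like' pixel.

    The thresholds are illustrative assumptions, not a tuned classifier.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels):
    """Return the fraction of (r, g, b) pixels that pass is_skin_tone()."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for (r, g, b) in pixels if is_skin_tone(r, g, b))
    return hits / len(pixels)


def looks_suspicious(pixels, threshold=0.4):
    """Flag an image when skin-like pixels reach the threshold.

    The 0.4 default is a hypothetical cutoff chosen only for this sketch.
    """
    return skin_fraction(pixels) >= threshold


if __name__ == "__main__":
    # Toy 'image': three skin-like pixels and one pure blue pixel.
    sample = [(220, 170, 140)] * 3 + [(0, 0, 255)]
    print(skin_fraction(sample), looks_suspicious(sample))
```

Counting pixels like this will happily flag beach sand and wooden furniture, which is why it would only ever be one input among several.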

The tools and concepts are out there. It shouldn't take much more than a man-month or three to put them together into a porn filter. There's probably a market for such a thing.

OK, gocr is probably encumbered with the GNU General Public License. So maybe there isn't much of a market unless one just uses the general pattern recognition concepts (but not the code) and starts from scratch. I don't have anything against the GPL. It serves its purpose, but it does make it hard to re-use code in the workplace. It's a bummer to always have to start from scratch, when the software already exists and has been pretty much debugged. If I'm releasing code into the public domain, I always use either the Berkeley/Stanford license or no license at all. Free, no warranty, blah blah. The GPL is basically a self-replicating virus - which was written by lawyers instead of geeks.
Ya' know I was going to agree with you that gocr is absolute junk, which I found out after fifteen minutes of evaluation back in 2006. It doesn't do any kind of fuzzy match and just looks at pixel comparisons in functions something like compare_a(), compare_b(), etc. Hardly general purpose pattern matching algorithms. I've since posted about doing this kind of pattern matching with heuristics, but it really doesn't matter because then you went into name calling and insult mode and I really don't owe you the time of day.  

Building software involves taking into account a great many factors - of which the license terms can be important if you're (as I was at the time) working for a corporation that has banned anything tainted with GPL. These employers leave you no choice but to build from scratch - which isn't a problem if you're any good at the job and want to continue receiving a paycheck. I do know a bit about the GPL - I've worked with it since the first draft of the license back in the late 80s. I choose not to use legal atrocities like this for my own projects. This makes them legally usable by more people than so-called 'open-source'.  

Bugger off.