I've just about got most of the kinks out of the new authentication system. Should be able to plug it in next week (I'm going away for a couple of days).
What I'm trying to do is something I haven't seen in all my web travels (though it probably exists in a corner of another galaxy in the web universe). A multi-user authenticated website without cookies and without sessions, and which also allows non-authenticated access. You'd think it was rocket science, but it turns out to be relatively easy. And no, I'm not using chained HTTP POST requests or encoded URLs to pass info through the site. I call that a session. And it works on all browsers! (This site is optimized for Firefox on Linux - if it doesn't display correctly in your browser, tough titties). So far I can't see any fundamental logic flaw, which makes me wonder why 99.999999999999999% of the other websites feel they need cookies and/or sessions. It seems to be a case of everybody else does it this way, so that's the way it has to be done...
[Update 27-FEB-2006]
I found the logic flaw... It isn't insurmountable, but the logic is ugly. Basically, it was a fresh new face on HTTP auth. By re-arranging the logic, it's possible to have a page authenticate only when desired, and not 100% of the time. That was my novel concept. The flaw is the same old flaw in HTTP auth - you can't easily log out. I've found several workarounds that can be made to provide this functionality, but they are basically crude hacks from a technical standpoint. I really don't like crude hacks. Looks like I'll be going back to cookies/sessions - or perhaps decide that logging out isn't important for what I'm trying to accomplish. I'm no longer trying to write a diary software package to give/sell/distribute to others, for instance. That decision frees me from a lot of installation and support issues, and allows me to write for a modern environment (for once) instead of dumbing everything down so it will work with Apache1/MySQL3/PHP3 on Win95.
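For the curious, here is roughly what "authenticate only when desired" could look like. This is a minimal sketch in Python rather than the site's actual code, and the names in it (USERS, check_basic_auth, handle, the '/login' path, the 'weblog' realm) are all made up for illustration. The idea is that every page accepts credentials if the browser offers them, but only the login path answers with a 401 challenge when they are missing.

```python
import base64

# Hypothetical user table; a real site would store password hashes, not plain text.
USERS = {"mike": "secret"}

def check_basic_auth(header):
    """Return the username if the Authorization header holds valid Basic credentials, else None."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    if sep and USERS.get(user) == password:
        return user
    return None

def handle(path, auth_header=None):
    """Return (status, extra_headers, user) for one request."""
    user = check_basic_auth(auth_header)
    # Only this (hypothetical) path insists on credentials; everything else is public.
    if path == "/login" and user is None:
        return 401, {"WWW-Authenticate": 'Basic realm="weblog"'}, None
    return 200, {}, user
```

Anonymous visitors never see a password prompt unless they ask for '/login'. The logout problem is visible here too: nothing in this exchange lets the server tell the browser to forget the credentials.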
New Ajax laundry detergent is stronger than dirt.
You may have seen a lot of AJAX floating around on the web recently. No, it isn't because everybody has a lot of dirty clothes to wash. This is the new AJAX. Asynchronous JavaScript And XML. Although it isn't really anything new, it's a new way of writing some web applications.
Basically, Ajax makes use of the ability of JavaScript to fetch a URL. Instead of loading a page or an image, however, the modern way of doing things is to talk XML with a server. If you recall, I've previously mentioned the new wave of XML-RPC applications taking over the world. Ajax is a big part of that.
The best example of why somebody might want Ajax is to envision a web page with 'auto-complete as you type' information. As you start typing something, it starts to show you results based on what you've typed so far. To do this in a traditional web page or even a JavaScript page, you need to have all of the data loaded before you can search through it for the relevant piece. This could be thousands of entries. With Ajax, as you type, the browser is actually doing web requests behind the scenes to match the data - and showing you the results without having to reload or go to another webpage.
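The server half of that auto-complete trick can be tiny. Here's a sketch in Python, assuming the data is a sorted list held on the server; ENTRIES and complete are hypothetical names, and a real site would more likely ask a database. The browser sends whatever has been typed so far and gets back only the handful of matches.

```python
import bisect

# Hypothetical data set that lives on the server; the browser never downloads all of it.
ENTRIES = sorted(["apache", "apr", "berkeley db", "cvs", "mysql", "php", "subversion", "zlib"])

def complete(prefix, limit=5):
    """Return up to `limit` entries that start with what the user has typed so far."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search finds the first entry that could match the prefix.
    start = bisect.bisect_left(ENTRIES, prefix)
    matches = []
    for entry in ENTRIES[start:start + limit]:
        if not entry.startswith(prefix):
            break
        matches.append(entry)
    return matches
```

Each keystroke becomes one small request and one small answer, instead of shipping thousands of entries up front.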
This is one step closer to having web pages be as interactive and full-featured as native applications (i.e. .exe files).
The downside of all this is that things are going on behind the scenes, and this could be disconcerting. Privacy issues, usability issues (the back button might not go back to where you thought). But perhaps the biggest issue of all is that of dealing with software bugs. When software is happening invisibly behind the scenes and doesn't perform the expected result, it's very difficult to troubleshoot. The person reporting the bug might not even know that something is happening behind the scenes. As far as they're concerned, they loaded a web page from foo.com and the web page didn't work. It might not have been foo.com's page at all. It might have been a server in Vermont that was providing an XML feed and experienced a brief outage. Somebody from foo.com however is going to have to spend hours figuring this out.
I was up most of the night getting the webserver working again. Apache failed to compile. So I upgraded. Failed to compile. Took out the mod_dav option I was trying to turn on. It built. Sigh. Since this was an upgrade, I installed it. But then PHP broke. All the websites, hosed. Nothing.
So I recompiled PHP. But mod_dav wasn't working, and that was the whole point. Not only that but mod_rewrite wasn't working either. Turning on either made the build fail. Couldn't find libgdbm. I'm not using libgdbm. How many times do I have to tell you!?! Apparently, it won't listen. It will configure without libgdbm, but then will complain that it can't find it. Fine, I'll use Berkeley DB instead. Can't find Berkeley DB. And by the way, it can't find libgdbm (which it isn't supposed to be using) either. Oh, and to add insult to injury, it can't find /usr/lib/libgdbm.la - which is there because I installed it anyway, even though I don't want to use it. Why can't you find something that's there and you're not supposed to look for? Why can't you do anything right? Why? Why?
I went back to last month's Apache package. It wouldn't compile at all. The same errors about missing databases that I have no need for. Now I'm really hosed. Can't get back to my working environment.
At about 3:30 AM I finally pulled out original sources and rebuilt everything in the exact configuration I set up a few weeks ago. (No small feat). Yay! Went to bed.
Later this morning I started the forensics to figure out what went bad and why. Turns out that it's APR (Apache Portable Runtime). APR is supplied with Apache, but Apache (and APR) are built with a prior version of APR if it exists. I've had so many APRs on this system I can't figure out which one it loaded. The message that it required a newer version of APR made it suspect. Why the heck do you need a newer version of APR if you're going to BUILD a brand spanking new version of FREAKING APR? What's wrong with this picture?
So anyway, I manually built and installed the APR piece of the webserver, and then used that version to build the rest of the webserver. Mod_dav built just fine this time around. But I'm not installing it today, since I already know I'll have to rebuild PHP again to make it work. I'll just sit on it for a few days. I've had enough upgrade joy for the week.
Need sleep.
Most of the places I've worked in the software world used CVS as the version control software. I can't tell you how many times I've cursed CVS because it will usually end up doing something stupid, and the only way out is to go in as a Unix administrator and mess with the repository to fix it. More than once I've done this and ended up messing up every developer in the place. Hacking the repository is a no-no. But the poorly written software often leaves you no other choice. With CVS there's no such thing as a mistake. If you put a file into the system, you've put it there forever. If you create a bug that wipes out your hard drive and make the mistake of checking it in, it's always there to wipe out the hard drives of any sucker that chances on that bad revision.
Anyway, this is a long way of saying that I'm now using subversion. Every single little thing that I hated about cvs has been fixed in svn. It's probably safe to assume that subversion was written by a group of ex-CVS admins who got tired of doing the same stupid things over and over.
Now if only I can get my webserver to load it up... Apache doesn't build mod_dav by default, which the svn webserver module seems to need. Apache likewise doesn't seem to want to rebuild since I upgraded my zlib for the MySQL upgrade -- it's getting a header conflict. I had to update zlib because MySQL wanted a more recent version of Berkeley DB, and Berkeley DB wanted a more recent version of about thirty different packages... Sigh... Back to upgrade hell. The problem with Linux is that it's always a moving target. Make that a few thousand moving targets that are all moving at different speeds, with each depending on hundreds of other projects which also move at varying speeds.
Try plugging that into a spreadsheet...
Until I get it worked into the menu, there's a tag cloud at
This is an aggregate. You can also view them by author at (for example):
Here's the way it's supposed to work...
The main site now lives at 'BADDCAFE.com'. The main site is an aggregator of unique contributors. At the moment I'm the only contributor, so bear with me...
baddcafe.com/mike is where my weblog lives. Baddcafe.com/julio is where you would find Julio's blog if there was such a person contributing here. On the main site you'll find all these contributions. On the individual site, you'll find very personalized environments (theoretically -- I'm still going through the issues list).
For instance, 'categories' are unique to an individual. You won't find them on the top level site (until I get a tag cloud going). You see, Julio (the hypothetical contributor) might be into martial arts and set up categories according to those interests. I might add things like brewing to mine. There's no reason we have to share the same namespace for categories - in fact it's absurd. So we won't.
Feeds:
RSS feeds are at baddcafe.com/feed
You can also tailor these results - feed/mike will get you my feeds. feed/mike/music will get you my 'music' category.
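To make the feed scheme concrete, here's a small sketch in Python of how a path like those could be taken apart. It is only an illustration of the URL layout described above, not the site's code, and parse_feed_path is a made-up name.

```python
def parse_feed_path(path):
    """Split 'feed', 'feed/mike' or 'feed/mike/music' into (author, category).

    Returns None when the path is not a feed URL at all.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "feed" or len(parts) > 3:
        return None
    author = parts[1] if len(parts) > 1 else None
    category = parts[2] if len(parts) > 2 else None
    return author, category
```

A result of (None, None) means the aggregate feed, ('mike', None) means everything of mine, and ('mike', 'music') narrows it to one category.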
Don't worry ... all of this will make more sense as it starts to stabilize.
What's left on the 3.0 to-do list:
- comment moderation (easy) (done)
- upgrade RSS feeds to SQL drivers (easy) (done)
- user manager (a bit more work) (done)
- configuration editor (remove executable code from the configuration; - challenging) (in progress)
- migrating my thousands of old archives (will require a few dry runs before I actually do it.)
Just worked out all the 'clean URL' logic in a much better way than version 2.x. In 2.x, everything is tied to the article filename, which usually looks like '15-FEB-2006'. To view the article, you access something like 'article.php?15-FEB-2006'.
But in version 3, there will be no visible '.php' files in the URL, and I've done away with question mark queries entirely. And there are no more dates in the article access code. It's just an article ID. And it's multi-user, so the weblog author has to show up in the site URL if you want to access a particular author's weblog.
So the URL for my weblog is going to look something like 'http://(somewhere)/mike/(something)' instead of 'http://(somewhere)/index.php/mike/(something)'.
And to look at a particular article it will be 'http://(somewhere)/article/9'.
All of this requires significant cooperation with the webserver. Mod_rewrite, ForceType directives, all kinds of stuff. It's easy stuff once the URL format is clarified. But it won't be plug-n-play. I'm not even going to try and support non-Apache environments or anything but 5.x PHP using mysqli (which requires at least 4.x MySQL). There comes a time when old software has to die, especially since nobody is actually paying me to do this. There isn't enough time in the day to make it work on IIS and PostgreSQL. Even supporting 4.x PHP requires lots of hacks to accommodate the changes in various function calls.
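Once the webserver hands the clean path to a single script, the dispatch logic itself is short. Here's a sketch in Python rather than PHP, with made-up names (route, RESERVED) and an assumed rule that author names are lowercase letters, digits, dashes and underscores. It only mirrors the two URL shapes described above.

```python
import re

ARTICLE_RE = re.compile(r"^/article/(\d+)/?$")
AUTHOR_RE = re.compile(r"^/([a-z0-9_-]+)(?:/(.*))?$")

# Hypothetical list of first path segments that can never be author names.
RESERVED = {"article", "feed"}

def route(path):
    """Map a clean URL path to a (handler, arguments...) tuple."""
    m = ARTICLE_RE.match(path)
    if m:
        return ("article", int(m.group(1)))
    m = AUTHOR_RE.match(path)
    if m:
        if m.group(1) in RESERVED:
            return ("not_found",)
        # The second group is whatever follows the author's name, possibly nothing.
        return ("weblog", m.group(1), m.group(2) or "")
    return ("index",)
```

So '/article/9' goes to article 9, '/mike/anything' goes to my weblog, and a bare '/' falls through to the aggregate front page. Reserving 'article' and 'feed' keeps an author from ever registering a name that collides with them.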
What's left on the 3.0 to-do list:
- comment moderation (easy)
- upgrade RSS feeds to SQL drivers (easy)
- user manager (a bit more work)
- configuration editor (remove executable code from the configuration; - challenging)
- migrating my thousands of old archives (will require a few dry runs before I actually do it.)
It's all about fighting 'comment spam'. Turns out that Google's page rank mechanism has completely altered the internet landscape. It's all about getting links. There are folks out there whose job description is to find ways to get their employer's website link on as many pages as they can. The first place they go? Weblogs, bulletin boards, public forums. They post a comment. It has nothing to do with the article they are commenting on. It's just some text with a link in it. A link that presumably Google will find and increase their page rank.
Most of the truly public forums have closed already. This has helped to reduce the problem because the first wave of attacks was automated. Now the spammers have to do a lot of stuff by hand. Find the security mechanism and get through it to post the link. That's why everything is being moderated these days.
Netscape anticipated this back in 1995. The SSL protocol was revised to include a personal identity. It was known at the time as an 'internet driver's license'. Without an established identity, you wouldn't be able to access the majority of web services being envisioned. This also would centralize authentication and create a new business model - some company or three would get very wealthy by being the gatekeepers of internet identity.
The internet driver's license died a horrible death due to privacy (and monopoly) concerns. Without privacy, the web would not flourish and reach its full potential. So went the argument. And without a credible universally acceptable identity mechanism, the only way to establish identity is the way we do it now. Every website that wishes to track identity has a login mechanism that is different from everybody else. That's why you now have hundreds (or thousands) of user accounts with passwords.
Now fast forward ten years or so. The web has indeed flourished. It still has yet to reach its full potential. In order to get to the next phase, it may be time to bring back the internet driver's license...
Been there, done that...
That's why I end up shaking my head in disgust at all these CMS packages I've looked at. Sure, some of them have the cutting edge features, but they all suffer from DB bloat. To generate one web page, the web server has made somewhere between 200 and 1000 individual transactions to the database server. You won't notice this until the site starts to bog down, say around 1,000 users. By then it's too late to do anything but buy lots of hardware. One machine for every thousand users. Quick - how many machines will you need to serve 6 million people? That's right. Six thousand computers.
So I got creative with the SQL. Instead of grabbing data an article at a time (data, categories, comments, attachments, etc...), I get it all at once. Four queries to stuff the entire weblog index page.
One, two, three, four. Grab a month's worth of articles. This returns the article numbers I need to find the categories and attachments. Then one more SELECT and I've got the category summary table. Then it's all loaded and I just have to spit out the html (or rss or whatever). To be fair, I'll probably end up with a couple more queries before it's all converted. "Recent comments" likely needs its own. Plus a handful of file accesses to load all the functions and headers.
Now we're talkin'. That rocks. I could probably get this down to a couple of queries using several levels of LEFT JOIN and sub-SELECTs, but that's getting fanatical. Only a madman could maintain that code. One slipup in the syntax and you'll be getting back 60 trillion records due to combinatorial factors. Five or six queries is good. I can live with that.
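Here's the shape of the four-query approach as a runnable sketch. It uses Python with an in-memory SQLite database instead of PHP and MySQL, and the table and column names (articles, article_categories, attachments) are invented for the example, so treat it as an illustration of the idea rather than the real schema. The point is that the article IDs from the first query feed an IN (...) list, so the query count stays at four no matter how many articles the month has.

```python
import sqlite3
from collections import defaultdict

def build_demo_db():
    """Create a tiny in-memory database with a made-up weblog schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE articles (id INTEGER PRIMARY KEY, posted TEXT, title TEXT);
        CREATE TABLE article_categories (article_id INTEGER, category TEXT);
        CREATE TABLE attachments (article_id INTEGER, filename TEXT);
        INSERT INTO articles VALUES (1, '2006-02-10', 'Upgrade hell'),
                                    (2, '2006-02-15', 'Clean URLs'),
                                    (3, '2006-01-30', 'Old news');
        INSERT INTO article_categories VALUES (1, 'linux'), (2, 'php'),
                                              (2, 'linux'), (3, 'music');
        INSERT INTO attachments VALUES (2, 'urls.png');
    """)
    return db

def load_index(db, month):
    """Load everything the index page needs for one month in four queries."""
    # Query 1: the month's articles, newest first.
    articles = db.execute(
        "SELECT id, title FROM articles WHERE posted LIKE ? ORDER BY posted DESC",
        (month + "-%",)).fetchall()
    ids = [row[0] for row in articles]
    # One placeholder per article ID; IN (NULL) matches nothing when the month is empty.
    marks = ",".join("?" * len(ids)) or "NULL"
    # Query 2: categories for all of those articles at once.
    cats = defaultdict(list)
    for aid, cat in db.execute(
            "SELECT article_id, category FROM article_categories "
            f"WHERE article_id IN ({marks}) ORDER BY category", ids):
        cats[aid].append(cat)
    # Query 3: attachments for all of those articles at once.
    files = defaultdict(list)
    for aid, name in db.execute(
            "SELECT article_id, filename FROM attachments "
            f"WHERE article_id IN ({marks}) ORDER BY filename", ids):
        files[aid].append(name)
    # Query 4: the category summary table for the sidebar.
    summary = db.execute(
        "SELECT category, COUNT(*) FROM article_categories "
        "GROUP BY category ORDER BY category").fetchall()
    return articles, dict(cats), dict(files), summary
```

Compare that with fetching categories and attachments inside a loop over the articles: thirty articles would mean sixty-odd queries instead of two.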
For instance, you can't just delete a record with a database manager. Well, you could, but it's asking for trouble. You sorta' need to know how that record ties in with all the other records on the system and make sure you don't leave some orphaned data hanging around somewhere. In a weblog, articles are linked to comments and attachments and categories for instance. Change one and you've generally gotta' change everything else to keep it all synchronized.
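As a sketch of what "keeping it all synchronized" means for a delete, here's one way to do it in Python with SQLite. The schema is hypothetical (articles, comments, attachments, article_categories) and so is the delete_article helper. Everything that hangs off the article goes first, the article goes last, and the whole thing happens in one transaction so a failure halfway through leaves nothing half-deleted.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with a made-up weblog schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (article_id INTEGER, body TEXT);
        CREATE TABLE attachments (article_id INTEGER, filename TEXT);
        CREATE TABLE article_categories (article_id INTEGER, category TEXT);
        INSERT INTO articles VALUES (1, 'Keep me'), (2, 'Delete me');
        INSERT INTO comments VALUES (1, 'nice'), (2, 'spam'), (2, 'more spam');
        INSERT INTO attachments VALUES (2, 'pic.png');
        INSERT INTO article_categories VALUES (1, 'linux'), (2, 'linux');
    """)
    return db

def delete_article(db, article_id):
    """Remove an article and every row that refers to it, all or nothing."""
    with db:  # commits on success, rolls back if any statement raises
        for table in ("comments", "attachments", "article_categories"):
            db.execute(f"DELETE FROM {table} WHERE article_id = ?", (article_id,))
        db.execute("DELETE FROM articles WHERE id = ?", (article_id,))

def count(db, table):
    """Count the rows in one of the demo tables."""
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Databases that enforce foreign keys can do the same job with ON DELETE CASCADE, but spelling it out in code makes the dependencies obvious.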
But none of this is rocket science. I've done it all before. So I've started the overhaul in a different space. Got the posting and editing page pretty much complete already. Category management, posts management, yada, yada. Should have most of it working in another day or three - assuming some quality time at the keyboard.
The good part is that most of my existing work translates directly. The post page for instance looks and works exactly the same. Exactly. That's the way it should be.
The music store is gone, so yesterday I took the day off. Today I'm taking another day off, and no, I'm not gonna' watch the game. Think I'll upgrade my databases. Actually, I upgraded them this morning. That's all done.
on the Falkland Islands have devised what they consider a marvelous new
game. Noting that the local penguins are fascinated by airplanes, the
pilots search out a beach where the birds are gathered and fly slowly
along it at the water's edge. Perhaps ten thousand penguins turn their
heads in unison watching the planes go by, and when the pilots turn
around and fly back, the birds turn their heads in the opposite
direction, like spectators at a slow-motion tennis match. Then, the
paper reports "The pilots fly out to sea and directly to the penguin
colony and overfly it. Heads go up, up, up, and ten thousand penguins
fall over gently onto their backs."
-- Audubon Society Magazine
