Mike Macgirvin

mike@macgirvin.com

fortune_to_html in PHP

Mike Macgirvin
  last edited: Wed, 26 Nov 2008 22:04:48 +1100  from Diary and Other Rantings
One of the problems with using the Unix/Linux fortune (aka fortune-mod) command in web pages is making it readable in HTML. One can provide something that mostly works by substituting any HTML entities (&,<,>, and double quote) and converting linefeed to <br />.
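
As a rough sketch of that "mostly works" approach (the function name naive_fortune_to_html is made up for illustration; htmlspecialchars() and str_replace() are standard PHP):

```php
<?php
// Hypothetical helper showing only the naive approach described above:
// escape the HTML special characters, then turn linefeeds into <br />.
function naive_fortune_to_html($s) {
    // ENT_COMPAT escapes &, <, > and double quotes; single quotes are left alone.
    $escaped = htmlspecialchars($s, ENT_COMPAT);
    // Replace each linefeed with a line break tag.
    return str_replace("\n", "<br />", $escaped);
}

echo naive_fortune_to_html("A \"quote\" & a <tag>\nsecond line"), "\n";
```

This handles the entities and line breaks, but it passes backspace characters straight through.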

However, you're still going to get a lot of fortunes with unprintable characters where the original intent was lost. Many of these used 'backspace hacks' to provide character underlines and accent marks and, in really old fortune databases, to strike out text and replace it with something more amusing.

Here is a function that should make 99.999% of the fortunes you may encounter that use weird ASCII tricks display in web pages mostly as originally intended.

<?php

function fortune_to_html($s) {

  // First pass - escape all the HTML entities, and while we're at it
  // get rid of any MS-DOS end-of-line characters and expand tabs to
  // 8 non-breaking spaces, and translate linefeeds to <br />.
  // We also get rid of ^G which used to sound the terminal beep or bell
  // on ASCII terminals and were humorous in some fortunes.
  // We could map these to autoplay a short sound file but browser support
  // is still sketchy and then there's the issue of where to locate the
  // URL, and a lot of people find autoplay sounds downright annoying.
  // So for now, just remove them.

  $s = str_replace(
    array("&",
          "<",
          ">",
          '"',
          "\007",
          "\t",
          "\r",
          "\n"),

    array("&amp;",
          "&lt;",
          "&gt;",
          "&quot;",
          "",
          "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",
          "",
          "<br />"),
    $s);

  // Replace pseudo diacritics
  // These were used to produce accented characters. For instance an accented
  // e would have been encoded by '^He - the backspace moving the cursor
  // backward so both the single quote and the e would appear in the same
  // character position. Umlauts were quite clever - they used a double quote
  // as the accent mark over a normal character.

  $s = preg_replace("/'\010([a-zA-Z])/","&\\1acute;",$s);
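  // Double quotes were already converted to &quot; by the first pass,
  // so the umlaut pattern has to look for the entity, not a bare quote.
  $s = preg_replace("/&quot;\010([a-zA-Z])/","&\\1uml;",$s);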
  $s = preg_replace("/\`\010([a-zA-Z])/","&\\1grave;",$s);
  $s = preg_replace("/\^\010([a-zA-Z])/","&\\1circ;",$s);
  $s = preg_replace("/\~\010([a-zA-Z])/","&\\1tilde;",$s);

  // Ignore multiple underlines for the same character. These were
  // most useful when sent to a line printer back in the day as it
  // would type over the same character a number of times making it
  // much darker (e.g. bold). I think there are only one or two
  // instances of this in the current (2008) fortune cookie database.

  $s = preg_replace("/(_\010)+/","_\010",$s);

  // Map the characters which sit underneath a backspace.
  // If you can come up with a regex to do all of the following
  // madness  - be my guest.
  // It's not as simple as you think. We need to take something
  // that has been backspaced over an arbitrary number of times
  // and wrap a forward looking matching number of characters in
  // HTML, whilst deciding if it's intended as an underline or
  // strikeout sequence.

  // Essentially we produce a string of '1' and '0' characters
  // the same length as the source text.
  // Any position which is marked '1' has been backspaced over.

  $cursor = 0;
  $dst = $s;
  $bs_found = false;
  for($x = 0; $x < strlen($s); $x ++) {
    if($s[$x] == "\010" && $cursor) {
      $bs_found = true;
      $cursor --;
      $dst[$cursor] = '1';
      $dst[$x] = '0';
      continue;
    }
    else {
      if($bs_found) {
        $bs_found = false;
        $cursor = $x;
      }
      $dst[$cursor] = '0';
      $cursor ++;
    }

  }

  $out = '';
  $strike = false;
  $bold = false;

  // Underline sequence, convert to bold to avoid confusion with links.
  // These were generally used for emphasis so it's a reasonable choice.
  // Please note that this logic will fail if there is an underline sequence
  // and also a strikeout sequence in the same fortune.

  if(strstr($s,"_\010")) {
    $len = 0;
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        $len ++;
        $bold = true;
      }
      else {
        if($bold) {
          $out .= '<strong>';
          while($s[$x] == "\010")
             $x ++;
          $out .= substr($s,$x,$len);
          $out .= '</strong>';
          $x = $x + $len - 1;
          $len = 0;
          $bold = false;
        }
        else
          $out .= $s[$x];
      }
    }
  }

  // These aren't seen very often these days - simulation of
  // backspace/replace. You could occasionally see the original text
  // on slower terminals before it got replaced. Once modems reached
  // 4800/9600 baud in the late 70's and early 80's the effect was
  // mostly lost - but if you find a really old fortune file you might
  // encounter a few of these.

  else {
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        if($strike)
          $out .= $s[$x];
        else
          $out .= '<strike>'.$s[$x];
        $strike = true;
      }
      else {
        if($strike)
          $out .= '</strike>';
        $strike = false;
        $out .= $s[$x];
      }
    }
  }

  // Many of the underline sequences are also wrapped in asterisks,
  // which was yet another way of marking ASCII as 'bold'.
  // So if it's an underline sequence, and there are asterisks
  // on both ends, strip the asterisks as we've already emboldened the text.

  $out = preg_replace('/\*(<strong>[^<]*<\/strong>)\*/',"\\1",$out);

  // Finally, remove the backspace characters which we don't need anymore.

  return str_replace("\010","",$out);
}
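
To see what the pseudo-diacritic step does in isolation, here is a small standalone sketch. It repeats only the acute-accent pattern from the function above; the sample string is invented for illustration, not taken from a real fortune file.

```php
<?php
// "\010" is the backspace character (octal 010) in a double-quoted PHP string.
// Invented sample: the word "cafe" with an overstruck acute accent on the e.
$sample = "caf'\010e";

// Same pattern as in fortune_to_html(): accent mark, backspace, letter
// becomes the matching HTML character entity.
$html = preg_replace("/'\010([a-zA-Z])/", "&\\1acute;", $sample);

echo $html, "\n";   // caf&eacute;
```

In a page, fortune_to_html() would typically be fed the output of the fortune command, for instance via shell_exec('fortune'). Whether that binary is installed and reachable from PHP is an assumption about the host.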
Reflection CMS update

Mike Macgirvin
  last edited: Wed, 23 Jul 2008 13:44:25 +1000  from Diary and Other Rantings
At this time, I've managed to pull together a working kernel and prototype of the Reflection CMS. It is not yet ready for public release, but I've been pleased with the progress. Here's a bit of a white paper I've been putting together to explain the rationale and provide a high level overview.

                 Reflection Content Management System

Purpose:

Web content management systems and frameworks that exist today are clunky, overly-complicated, and often insecure. While many of the open source projects are developer friendly and openly encourage derivation, there is often a group that jealously protects the 'core' from feature creep. This makes it difficult to realise many web designs; as it is often the core that is insufficient to the task at hand. Being developer friendly does not mean that an application provides a workable development environment. Add-on modules often cannot be trusted - as they often reflect the work of novice software designers who have had to overcome the limitations of the core product.

In an effort to appeal to the most people, data abstraction is taken to new levels of absurdity and inefficiency. This is not limited to content management systems, as it is a software problem in general.

What I have attempted in taking on this gargantuan task of creating yet another content management system is to solve many of these problems, and to create a system that is extensible and encourages development at all levels - including the so-called core. To that end - most every function can be over-ridden without introducing serious versioning and update issues/incompatibilities. Nothing is sacred.  

The more that I mulled this task, the more it became apparent that what I was looking for in a content management framework is no less than an operating system for web pages. This involves user management, security, and the ability to execute arbitrary 'applications'. It also involves a notion of a file system hierarchy which can be represented entirely by URLs.

Many other content systems abstract data types, and this is a good idea; though it often makes for messy designs. At the heart is a generic nucleus for each content item - who owns it, what the permissions are, various timestamps, etc. Data fields that are unique to a particular content type are stored elsewhere and joined on demand.

Implementation of this level of abstraction is a challenging problem. Due to design limitations of most database systems, it involves some tradeoffs - primarily in the ability to perform searches on extended data of multiple extensible data types. For a single type, it can be done with one query. However when multiple data types are involved, a second pass needs to be run to return the extended data for each item. For this reason, it is prudent to store as much 'searchable' information as practical within the nucleus.

There is also general agreement over using themes and templates at the presentation end, so that different renderings are possible without hacking code. Here I'd like to take it one step further and modularise the entire presentation layer. As well as a 'theme', one can choose a particular layout or representation of objects, such as a choice between list view and iconic view, and/or XML feed elements. By making this extensible and arbitrary, entirely new renderings can be accomplished without touching the object code or business logic.

Permissions System

Permissions are the core of any multi-user system. This needs to be well defined, and implemented close to the kernel or core and far away from the presentation layer. In a development environment, the developers should mostly be free of managing permissions. I've implemented a permissions concept similar to Unix/Linux - although modified for better adaptability to web applications. It uses the familiar rwx concept, but I've split the 'x' permission into 'x' and 'u'. 'x' is simply a list permission. 'u' is an ability to use or extend an item. For an article, the 'u' bit allows comment rights. For a vocabulary, it allows the ability to tag something using that vocabulary. I've also introduced higher level permissions. There are six levels:  
  • rwxu admin  
  • rwxu moderators  
  • rwxu owner  
  • rwxu group  
  • rwxu members  
  • rwxu other (aka visitors)
Members is for logged in members. Group is a group association to a unique group identifier, moderators are site moderator accounts. Admin privileges are included in the permissions flags for completeness; though it isn't obvious what value this serves and in most cases these will be masked to prevent locking out the system admin from managing the system.
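
A minimal sketch of how six levels of rwxu flags could be packed and tested, assuming four bits per level in a single integer. The constant names, bit values and function names are hypothetical; nothing here is taken from the Reflection code itself.

```php
<?php
// Hypothetical bit layout: four permission bits per level, six levels.
define('PERM_R', 8);  // read
define('PERM_W', 4);  // write
define('PERM_X', 2);  // list
define('PERM_U', 1);  // use/extend (e.g. comment on an article)

// Levels in the order given in the text; index 0 occupies the highest four bits.
$levels = array('admin', 'moderators', 'owner', 'group', 'members', 'other');

// Pack a per-level array such as array('owner' => 'rwxu') into one integer.
function perms_pack($spec, $levels) {
    $bits = array('r' => PERM_R, 'w' => PERM_W, 'x' => PERM_X, 'u' => PERM_U);
    $packed = 0;
    foreach ($levels as $level) {
        $nibble = 0;
        $flags = isset($spec[$level]) ? $spec[$level] : '';
        for ($i = 0; $i < strlen($flags); $i++) {
            if (isset($bits[$flags[$i]])) {
                $nibble |= $bits[$flags[$i]];
            }
        }
        $packed = ($packed << 4) | $nibble;
    }
    return $packed;
}

// Check whether a given level holds a given permission bit.
function perms_allow($packed, $level, $bit, $levels) {
    $index = array_search($level, $levels);
    $shift = 4 * (count($levels) - 1 - $index);
    return (($packed >> $shift) & $bit) !== 0;
}

$article = perms_pack(array(
    'admin' => 'rwxu', 'moderators' => 'rwxu', 'owner' => 'rwxu',
    'group' => 'rxu', 'members' => 'rxu', 'other' => 'rx',
), $levels);

var_dump(perms_allow($article, 'members', PERM_U, $levels)); // members may comment
var_dump(perms_allow($article, 'other', PERM_U, $levels));   // visitors may not
```

Keeping the check in one small function is in the spirit of handling permissions close to the core, so application code only ever asks a yes/no question.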

The Directory Object

The directory or folder object is the primary means of implementing complex data structures and representations. It is an object like any other object on the system, but when navigated to, presents a listing of those items which are attached to it as siblings. It implements a general purpose search and list/enumerate operation. It also contains a path/filename to distinguish it in the URL hierarchy and provide file system semantics to database objects. However, the important items that it contains are a umask (permissions mask) which is applied to any child items, and it can also be configured only to hold items of certain types. This is what distinguishes a photo album from a weblog or forum list. One holds photos and the others hold articles. By allowing a directory to hold any type of content, it can be made to resemble a traditional filesystem; and indeed a multi-user website can be implemented which provides member sub-sites that they manage completely.  
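
As an illustration of the two directory settings just described, here is a hedged sketch. It assumes Unix-style umask semantics (bits set in the mask are removed from the child's permissions) and a simple list of allowed types. The field names, function names and the 4-bit rwxu values (r=8, w=4, x=2, u=1) are invented for the example.

```php
<?php
// Hypothetical directory record; field names are invented for illustration.
$album = array(
    'path'          => 'photos/holiday',
    'umask'         => 0x5,              // strip 'w' (4) and 'u' (1) from children
    'allowed_types' => array('photo'),   // an empty array would mean "any type"
);

// Assumed Unix-style behaviour: bits set in the umask are cleared on the child.
function apply_umask($requested, $umask) {
    return $requested & ~$umask;
}

// Decide whether an item of $type may be attached to the directory.
function directory_accepts($dir, $type) {
    return count($dir['allowed_types']) === 0
        || in_array($type, $dir['allowed_types'], true);
}

// A child asking for rwxu (0xF) ends up with r and x only (0xA).
printf("%X\n", apply_umask(0xF, $album['umask']));
var_dump(directory_accepts($album, 'photo'));
var_dump(directory_accepts($album, 'article'));
```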

The directory also has complete control over the presentation layer, via themes, renderings, and menu selection. This implies that directory is not simply a 'list', but the complete embodiment of the controls, settings, and the look of that list. These can be inherited and passed on to sub-directories. A limitless range of site policy and structure can be implemented by controlling the settings of the appropriate directory entries.

Applications

Applications, or executable code, live outside the virtual directory tree. In order to address the need for an extensible application space and recognising the confines of URL management, applications are denoted by the first URL path parameter. For instance http://example.com/edit invokes the object edit/post application. Additional URL path components are passed to the application as arguments in a manner similar to Unix/Linux 'argv/argc' mechanisms. Application URLs take precedence over path URLs, such that a directory or document called 'edit' created at the root level will be unavailable at that URL if the 'edit' application exists. An external path alias mechanism exists to redirect to another URL in the case of conflict with the application space.
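
The dispatch rule can be sketched roughly as follows. The registry array and the route_request() name are hypothetical, and only 'edit' comes from the text. Unlike C's argv, this sketch leaves the application name out of the argument list, which is an assumption about how the real thing works.

```php
<?php
// Hypothetical application registry; 'edit' is the only name taken from the text.
$applications = array('edit' => true, 'search' => true);

// Split a request path into an application name plus argv/argc, or fall
// back to treating the whole path as a virtual directory/document path.
function route_request($path, $applications) {
    // Drop empty components caused by leading, trailing or doubled slashes.
    $parts = array_values(array_filter(explode('/', $path), 'strlen'));

    // Applications take precedence over anything stored at the same path.
    if (count($parts) > 0 && isset($applications[$parts[0]])) {
        $argv = array_slice($parts, 1);
        return array('app' => $parts[0], 'argv' => $argv, 'argc' => count($argv));
    }
    return array('app' => null, 'path' => implode('/', $parts));
}

print_r(route_request('/edit/article/42', $applications));
print_r(route_request('/photos/holiday', $applications));
```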

An application framework exists that supplies plugin methods for handling initialisation, form posts, main page content, and menu callbacks. Arguments are parsed and passed in as argv/argc elements, although meta-arguments dealing with pagination (such as 'page=4') are dealt with by the kernel or core to minimise extra argument parsing at the application level. To provide pagination, an application only needs to obtain a count of the total number of items and invoke a 'paginate' function.
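
What such a 'paginate' function might compute is sketched below. The signature, the default page size and the returned keys are assumptions made for illustration, not the actual Reflection interface.

```php
<?php
// Hypothetical paginate() helper: given the total item count, the page
// requested via the 'page=N' meta-argument and a page size, return the
// values an application would need for its query and its page links.
function paginate($total, $page, $per_page = 20) {
    $pages = max(1, (int) ceil($total / $per_page));
    $page  = min(max(1, (int) $page), $pages);   // clamp out-of-range requests
    return array(
        'page'   => $page,
        'pages'  => $pages,
        'offset' => ($page - 1) * $per_page,     // e.g. for an SQL LIMIT clause
        'limit'  => $per_page,
    );
}

print_r(paginate(95, 4));
```

With 95 items and 20 per page, page 4 starts at offset 60 and there are 5 pages in total.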

Licensing

Reflection will be available under the generic Berkeley license. Free for all uses but with no implied warranty.

Platform

Recent/modern flavours of LAMP. Apache/mod_rewrite is required. PHP5.2+ is required for timezone support. Language: English.
Q=`(parlez vous|hablas|sprechen ze) $geek?`

Mike Macgirvin
  from Diary and Other Rantings
It just occurred to me that in the last 4-5 days I've written 'code' in Visual Basic, SmallTalk, lisp, C, bash, awk, PHP, perl, and python. Thousands of lines of code total. And there are probably a few dialects I forgot here. Not to mention 30-40 different flavours of config files, sed, Oracle-SQL, mySQL, and LDAP and some other stuff that don't quite qualify as 'code' but still involve intimately knowing a strange computer dialect. Oh yeah, HTML and JavaScript (of course).
The Seven Year Itch

Mike Macgirvin
  last edited: Fri, 01 Feb 2008 16:07:39 +1100  from Diary and Other Rantings
Sometime later this month "Diary and Other Rantings" (i.e. my weblog) will turn 7 years old, and I'll start my eighth year doing this activity called 'blogging'. Perhaps I'll mark the day, perhaps not. We'll see. Maybe I'll just stop doing it altogether. Maybe not. We'll see.

This all started in early 2001. I was at AOL making lots and lots of money from my Netscape stock options. I had a Netscape employee home page that was visited hundreds of thousands of times a day, but this was slowing. AOL no longer linked to it. I had started running a new server in the spare bedroom in 1998-1999, and later moved it to the garage. It took almost two years to get a working DSL link so that I could actually run a public website off of it. High-speed internet to the home was still an experimental technology. DSL wasn't yet ready for prime time and ISDN had other issues which plagued it. Leased-line required somebody to sell you an end-point on the public net and nobody was doing this, besides being limited to 56k which was now the speed of most modems. Web 'hosting' in those days was mostly for big business and cost big money. I could certainly afford it, but decided to spend my cash on more important things (like buying a music store a year later).

Running a Linux box with an internet link isn't very expensive in the overall scheme of things. So once the DSL was finally working I made a new home page and started improving it.

I think it was Cindy at 'Off the Beaten Path' (now at 'dustingmybrain.com' ) who first introduced me to the concept of a rambling page. Instead of replacing your 'Current Interests' web page every week, you just keep adding to it. Drop in a date. Write what's happening. I started doing this. I was writing HTML in emacs. I called it an online diary. I didn't have titles, categories, RSS feeds, etc. These would come much later. I wasn't writing 'articles', I was just rambling. Why do you need a title for it? That makes it look so structured. The only important thing is the date, so somebody knows when it was that you thought this way. This was important. After several years of living on Netscape time, I firmly believed that one didn't think the same way for very long, and technology was always changing - so information had to have a date.

The other thing that I did was to take a cue from some of the large online news sites, which were the best model available for presenting information that had timestamps. I started writing in reverse chronological order (recent first). This was born of necessity, since nobody wanted to load a large page and scroll to the end to find recent stuff; which was how we did things previously (logfile format).  

In fact I maintained this format for a few years until it became unmanageable. Then I looked for ways of automating my monthly (or whenever) process of moving the current entries to an archive page and starting fresh. So after looking to see what programs were available and trying a few of them, I instead wrote a program to do it myself. Over time that evolved from a simple diary 'archiver' to the thing that you see today - a mega social portal that does everything but make coffee. (I miss this incidentally, I had my computer turn on the coffee pot from an online request in the early 1980s using my first homebrew social portal).  

I still wonder whether anybody reads these pages. Does anybody care? I don't subscribe to the current notions of SEO and affiliate marketing and trackware and all the other ways to improve one's blog ranking. Most notable these days are the pages and pages of 'widgets' attached to every blog, selling everything from online communities to soap. Why bother? Your only visitors will be other bloggers that are all trying to get you to visit their own blog. They aren't really reading what you have to say, they're too busy 'selling' their own wares. Still even after the RSS fiasco a few months back, I manage to pull in a few thousand humans a week. They come and read a page and leave again. This is the state of the modern internet.

It may be of some interest that I've managed to serve up a few hundred million pages since this all started - mostly to crawlers and robots; however last year activity peaked with about 100,000 daily hits (30,000 human visitors) and we've had six or seven days with over a million hits. I've written close to 1500 articles and there have been about 6600 total articles at one time or another from various feeds - before I was forced to nuke them for legal reasons. Only about 250 comments total, which I attribute to my decision a couple years back to do away with the daily spam cleanup and only allow website members to post comments. [I've since revised this policy.]

The 'community portal' (which I started writing a couple of years ago) doesn't have much community and I don't know if that will ever change. Community folks like big parties and unless you have one, you're late to the party. Bloggers only like communities where they can sell their blog.  I don't know how to convince them that a long-running website with several thousand non-blogging human visitors a week is actually a good place to drop a link. Yeah, I could put you on my blogroll, but I read thousands of blogs. It would quickly grow to be unmanageable and you'd be lost in the noise.

But you can add your own link and profile page and whatever - you don't need me to do it. Hint, hint.  

Anyway - we'll see if this lasts or whether I just decide that there are better things to do. Write into space everyday and maybe a couple of people will read it. Maybe not.

That's what it's all about.

Don't ask yourself if it is actually relevant or important or whether anybody cares. You might not like to hear the true answer. It's one blog amongst hundreds of millions, all trying to be visited. All thinking they should be relevant to somebody. It's like asking if one star in the entire universe is relevant. Maybe one is relevant to somebody. But the big question looms, is it yours? Unless it's the sun and brings life to this planet, it's likely just another star in the vastness of space.  

In fact, nobody really cares whether you blog or not when all is said and done. Well maybe one or two folks. In my case those are the same one or two folks that cared back in 2001. Everybody else is just passing through on their way to somewhere else.

Still every day (sometimes two) I go to my website and ramble about what's on my mind. I tweak the software to make it better. Even knowing that it is all an exercise in futility. Strange.
Stupid Tricks

Mike Macgirvin
  from Diary and Other Rantings
If you're a techie and are ever really bored, here's something you can do for amusement...

Assume you're running on some flavour of Windows. First create a virtual machine running Linux.

Now in your virtual machine, let's fire up a CP/M emulator. A machine running in a machine running in a machine.

Now wasn't that fun?

Now let's take it one step further...

In your virtual machine, run a windows emulator. Now run some cygwin tools.  Maybe ssh (from within a Bourne shell) out to a Mac and do something there.

So now you've got Mac stuff running under Linuxy stuff running under Windows running under Linux running under Windows.

Like I said, you need to be really bored to do stuff like this. And don't even think about trying to explain to anybody why it's even amusing.
uneventful

Mike Macgirvin
  last edited: Tue, 27 Nov 2007 22:58:05 +1100  from Diary and Other Rantings
Nothing spectacular today. Well, ok - there was that head-on collision at the top of Macquarie Pass this morning. Guess that counts for something. Glad it wasn't me.

Dug into the mysteries of 'udev', or why your Linux box can't seem to use 'eth0' ever again with another hardware address, insisting on eth1, eth2, etc... Well in fact it can re-use eth0, but you've got to find the file (/etc/udev/rules.d/z25_persistent-net.rules) where they store the MAC addresses of every adapter that ever gets plugged in and fix it. It's not exactly something that comes up on a web search for re-configuring net adapters under Linux. You kinda' haveta' figure out that 'udev' is the culprit and track it down from there.

Worked a bit more on this website. Lots of under the covers changes that will make a lot of people happy. But I'll let them figure it out.

Now I'm settling down to a homebrew. It's done. It's not the best I've ever cooked up, but considering what I've got to work with - it's absolutely the best beer (the first decent one) I've had in this country. It'll do (quite nicely). I've got 5 cases to go through before I have to make some more - and I can only drink a couple bottles at a time before the walls start spinning. So this should last a month or two.

It cost me about $15-20 a case all told. Over time I should be able to make it better and drop the cost to about $5/case. If you consider that anything better than toilet-water beer costs $50-$60 a case - and lasts about a week, it's a pretty fair savings overall. Should be able to trim my monthly beer budget at least in half, and not have to drink the disgusting swill that sells as 'premium' beer here.
OS madness, chapter #7936

Mike Macgirvin
  last edited: Thu, 08 Nov 2007 14:54:42 +1100  from Diary and Other Rantings
Still struggling with device drivers on Windows Vista. The sound card drivers have an update, but I'm skeptical. Several folks reported BSOD when they installed it.

And I've lost any good feelings I had for Debian. Recently I moved my old RedHat installation to a newer PC - one that was only 8 years old rather than 10. All went extremely smoothly. On bootup, it found the new motherboard, new network card, mouse, monitor, etc. - and configured all of them.  Everything worked fine.

Then I upgraded to Debian. The RedHat was a couple of years old, and I didn't want to mess with building PHP, MySQL, and Apache upgrades as well. Just boot up a newer Linux. Debian is currently one of the more popular Linux flavors - and I especially like the APT package management utility. Need PHP? 'apt-get install php'. You don't need to build and configure it and mess with library dependencies. These are all taken care of. If it needs new libraries, these are installed as well as any libraries that they depend on.  

Anyway, now (a couple of weeks later) I put in another newer PC - this time only 4 years old. I was expecting everything to go smoothly like it did last time. But it didn't. Debian doesn't have very good hardware (re-)detection, and they also don't load any other drivers than what is absolutely necessary. So I'm faced with an incomplete operating system that doesn't recognize the monitor or ethernet card. And I can't load in the modules for these devices over the net, because it doesn't recognize the network card. It's a Catch-22.

The only solution now is a re-install. Spend a few weeks getting everything configured and then start over. Right. I've been here before. Way too often...

But if you're one of those folks considering moving away from RedHat/Fedora, beware. It's nice to be able to plop your disks into another box if the one you've got goes bad - and keep running. Debian won't do this.
Daylight Savings

Mike Macgirvin
  last edited: Sat, 27 Oct 2007 14:20:00 +1000  from Diary and Other Rantings
Tomorrow is Daylight Savings. Remember 'Spring forward, fall back'? That's right - tomorrow we move it forward, no matter how odd that may seem. It's October, but it's spring.

At least the Australian government hasn't been mucking with and tweaking DST as it did before the 2000 Olympics. The software engineers need time to code in the changes - I think that a lot of the world now has Sydney time right.

Well that would be anybody using the Olson timezone databases. I know personally about thirty web services which just give you a choice of 'GMT+10' - and these are all going to be wrong tomorrow. On the bright side, I really don't care if they get it wrong. I'm not using any of them for anything globally time sensitive. It always makes my head hurt trying to figure out how many hours I'm going to be away from GMT with all the conversions and tweaks in effect. I suppose it'll probably be GMT+11. One hour forward. But wait, we'll then be one hour closer to Greenwich, England as the earth spins. Not further from it. So maybe it's GMT+9. Silicon Valley will be... Uh, I give up. It's in negative GMT and the time is going back. So is it forward or backward? I'll have to figure it out on paper to work out the difference between LA (where this server is) and Sydney (close enough to where I am).  

But this will also give a good test of my own daylight savings and timezone functions (which use Olson tables). The U.S. is going one hour back and we're going one hour forward. I might be poring over the code tomorrow if something gets askew.
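
For anyone who would rather not work it out on paper, here is a small sketch using PHP's own DateTime classes (available since PHP 5.2), which draw on the same Olson data. It is not the site's timezone code; the helper name utc_offset_hours is made up, and the three sample instants were picked to fall before the Sydney change, between the two changes, and after the U.S. change in 2007.

```php
<?php
// Offset from UTC, in hours, for a named zone at a given UTC instant.
// Relies on the timezone database bundled with PHP.
function utc_offset_hours($zone, $utc_time) {
    $tz   = new DateTimeZone($zone);
    $when = new DateTime($utc_time, new DateTimeZone('UTC'));
    return $tz->getOffset($when) / 3600;   // getOffset() returns seconds
}

$samples = array('2007-10-27 00:00:00', '2007-10-29 00:00:00', '2007-11-05 12:00:00');
foreach ($samples as $t) {
    $syd = utc_offset_hours('Australia/Sydney', $t);
    $la  = utc_offset_hours('America/Los_Angeles', $t);
    printf("%s UTC  Sydney %+d  LA %+d  delta %d hours\n", $t, $syd, $la, $syd - $la);
}
```

Moving the clocks forward takes Sydney from GMT+10 to GMT+11, and the gap to Los Angeles widens from 17 to 18 hours, then to 19 once the U.S. goes back.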
Mike Macgirvin
  
In fact the time changed here - I was just a bit premature on when it changes there. They used to try and change the whole world on the same day, but you're right. Last year's energy act messed up that part of it.

No worries. Everything seems to be working. It just means I'll have to go through all of this again when you folks change over. I won't bother calculating the delta right now, since it's in a temporary state. It's nice to know the delta before I make a phone call overseas. Nothing worse than 'Hello? Who is this? It's 3 in the morning!'
peonyden
 
Hi Mike. Thanks for posting about Daylight Saving Changes. We have now changed, but on the Sunday morning, so you might have been early to all events on Saturday, if you changed on 27 October. But that's better than being completely out. It is one thing to be unaware. You end up 1 hour late on the first day of the change. I went the wrong way, once, (the "autumn" change) and put my clocks forward, at the end of the season. I was two hours early for everything, and thought I had missed a series of important events, but they had not yet happened. I was in a panic - unnecessarily. The little adage "Spring Forward, Fall Back" which you quoted is now permanently engraved in my brain as a result of the scare I had on that occasion. Denis
peonyden
 
Mike

Now that I am properly awake, I see that your comment: "that the time had changed here" was written on Sunday morning - sorry if I implied you were a day ahead.

That's the other problem with Daylight Saving changeover. Body awake, but brain not yet awake.

Denis
Disable Linux screen blanking

Mike Macgirvin
  from Diary and Other Rantings
I'm a bit tired of searching the web for this every time I install a new Linux box; which is every few weeks these days - but apparently not often enough to get it stuck in my brain.  

To disable the annoying 10 minute Linux automatic screen blanking, use:

/usr/sbin/setterm -blank 0

Stick it in /etc/rc.local on Debian, or in /etc/init.d/rc.local if that doesn't exist.
Mike Macgirvin
  
Actually in Debian 4.x, all of this stuff has been moved and generalized. Instead of rc.local, you should be able to edit the settings directly in /etc/console-tools/config - which should be read/processed by /etc/init.d/console-screen.sh during startup.
goodbye /dev/fd0

Mike Macgirvin
  last edited: Sat, 13 Oct 2007 08:50:39 +1000  from Diary and Other Rantings
I've got both a 5.25 and 3.5 floppy drive on my Linux box at home. I bought the 5.25 drive in the mid-80s as part of my first 'clone kit' (cheap IBM-style PC built completely from components made in Asia). Today I finally unplugged it, even though it still works just fine. I can't even recall the last time I actually used it. I believe I made a Linux emergency boot disk back in 2002, but that was on the 3.5 drive.

I tossed all my old 5.25 disks prior to moving overseas. After much more than ten years in non-controlled temp/humidity environments, there wasn't any 'critical data' left to speak of. Ditto for the 3.5's.

So I've got no disks left to read, and 720kbytes is pathetic and slow storage in 2007.  If you need a copy of something, just flash it to a USB stick.

It certainly wasn't my first floppy drive (that distinction would go to an old 8 inch CP/M drive which I never used because it was obsolete before I ever finished writing the assembly language driver code); - but it's the longest surviving piece of computer hardware I've got at the moment.  Seems a shame to let it go, however what's the point of keeping it running?

This data too shall pass.
Another new chunk of code

Mike Macgirvin
  from Diary and Other Rantings
I've been thinking about how to do this for ages. Finally did it. You'll find some new stuff in the menus called 'Sharing'. No, it isn't about that half of a ham sandwich you're having for lunch.

Over the past few months you've gained the ability to subscribe to various information sources and recently to configure what features you desire from the website. I mentioned this ability months ago in my grand vision - kinda' like myspace on acid. Well I finally finished the thing which takes that vision and turns it into reality. When you create these private views of the website, it's like having your own private Content Management System (CMS). In fact that's the way I like to describe it to people. There are lots of folks peddling multi-user community CMS's. Yada, yada. But nobody that I'm aware of is selling or even working on a 'personal CMS as part of a multi-user community'.

So what's this 'Sharing' all about? It's simple. You spend the effort to configure your personal content system whatever way you want. Then you can turn around and let other people use it.  

See, with MySpace, you get a page that's all yours to mess with. Some of the other community sites give you a few more things. But this lets you build your own complete community site from within a broader community platform. You can make this look like an auto racing site and when people visit your shared page, and then any other page on the site, that's what they'll see. You're in total control until they go elsewhere or turn it off. You control the menus, you control the skin, and you control all of the information sources for your space. They can be your info sources, they can be my info sources, or pull in whatever feeds you want from anywhere. Want your audience to have mail? Chat? And discuss the Confederate War? Two minutes. Photo dating website? Two minutes.
Nasty little bug in mod_rewrite

Mike Macgirvin
  last edited: Sun, 08 Oct 2006 06:03:47 +1000  from Diary and Other Rantings
Actually I think the bug is in Apache, but whatever.

Let's say you've got a site like this that uses clean URLs.

http://someplace.somewhere/something/more/stuff

Now let's say you want to insert a category name in the middle of this URL, and the category name contains a slash. Let's say 'more/less' instead of 'more'. But you can't change the number of slashes, because in the example 'stuff' isn't a category, it's something else.

So what you would normally do in PHP (and many other languages) is urlencode() the name. This gives you a category that looks like 'more%2Fless'. Now you can just urldecode() it and turn it back into a legal category name without messing up the URL.
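
A minimal sketch of that round trip, using the example URL from above. The variable names are made up for illustration, and this assumes nothing between the browser and the script touches the URL:

```php
<?php
// Hypothetical category name containing a slash.
$category = 'more/less';

// Encode it so the slash cannot be mistaken for a path separator.
$encoded = urlencode($category);

// Build the clean URL using the layout from the example above.
$url = 'http://someplace.somewhere/something/' . $encoded . '/stuff';

// Receiving side: split on the real slashes first, decode second.
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', trim($path, '/'));
$decoded = urldecode($segments[1]);

echo $encoded, "\n";
echo count($segments), "\n";
echo $decoded, "\n";
```

The order matters: split on the slashes while the category is still encoded, and only then decode the individual piece.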

But the problem is that if you use mod_rewrite to support clean URLs, it currently decodes the URL in the process of doing its work - before you ever see it. So there's no way of knowing if a category has a slash or not. Ditto for hash and several other characters. It turns out the bug is actually in Apache, which is decoding the URL before it hands it off to mod_rewrite, but that part doesn't matter. If you don't use mod_rewrite you won't see the bug. Some of us have to use it though.
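
Here is a rough simulation of the effect. It is not Apache code; rawurldecode() simply stands in for whatever decoding happens before the script sees the path, and the variable names are hypothetical:

```php
<?php
// The path as the browser sent it, with the category still encoded.
$path = '/something/more%2Fless/stuff';

// Intended handling: split on slashes first, then decode each piece.
$intended = array_map('rawurldecode', explode('/', trim($path, '/')));

// What the script effectively gets when a middle layer decodes first:
// the encoded slash becomes a real one before the split happens.
$seen = explode('/', trim(rawurldecode($path), '/'));

echo count($intended), "\n";
echo count($seen), "\n";
```

Once the decode has happened up front, 'more/less' as one category and 'more' followed by 'less' are indistinguishable.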

This violates the primary law of encoding information - you must have one and only one decoder for every encoder. Apache/mod_rewrite is decoding something it has no right to touch.  

Fortunately, there's a way out of this mess, but it's very non-standard. You have to further encode the URL so that it can't get automatically decoded by the middle software layers. You could just double encode it, which works today, but then if they ever fix the bug, you'll end up with a bad decoding. I'm currently turning %2F into ^2F, since ^ is one of the few characters which isn't normally used in a URL. This then gets encoded by the browser to be %5E2F. It doesn't matter if the %5E part is turned back into a ^, or even decoded more than once (the second and subsequent decodes will essentially be a no-op). All that matters is that the slash remains encoded all the way through this hostile communications channel.
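
A sketch of the idea follows. The function names category_encode() and category_decode() are hypothetical, and the sketch swaps every % for ^ rather than only the one in %2F. That generalization is an assumption made here so a category containing a literal ^ still round-trips; it is not necessarily what the site does. rawurlencode() and rawurldecode() stand in for the browser and the middle layers:

```php
<?php
// Hypothetical helper: percent-encode, then hide every '%' behind '^'
// so that the middle layers find nothing left to decode.
function category_encode($name) {
  return str_replace('%', '^', urlencode($name));
}

// Hypothetical helper: restore the '%' signs, then decode normally.
function category_decode($segment) {
  return urldecode(str_replace('^', '%', $segment));
}

$segment = category_encode('more/less');

// Simulated browser: the caret itself gets percent-encoded in transit.
$in_transit = rawurlencode($segment);

// Simulated middle layers decoding twice; the second pass changes nothing.
$after_server = rawurldecode(rawurldecode($in_transit));

echo $segment, "\n";
echo $in_transit, "\n";
echo category_decode($after_server), "\n";
```

Because urlencode() never emits a bare ^, every caret in the encoded segment is known to stand for a percent sign, so the decode side is unambiguous.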

What an ugly hack.

I'd go into Apache and fix it, but that won't help me in the short term. I'd have to wait for the patch to get rolled into a future release, and then wait for my service provider to pick up the later release. That could be a year or more unless some urgent security bug pops up. So I guess I'd better get used to living with this hack.
The Crossroads

Mike Macgirvin
  last edited: Mon, 08 May 2006 14:31:47 +1000  from Diary and Other Rantings
Standing at the crossroads. The website software has evolved a bit more. Photo albums are necessary so that I can pull in what remains of macgirvin.com - which does little more these days than house my photo collection and collect my email archives.

Soon I'll be shutting down the aging Linux box in my garage and moving everything that's left onto hosted websites such as the one you are getting this post from.  

The photo archives use group permissions. This way if a new family member or friend registers I only have to add them to one list to let them see the entire private photo collection. It is also a departure from the simple permissions I originally baked into the site - registered or not, admin or not.

It provides full control of who can access what. The basic permissions structure was fine for getting something working quickly. Now it's time to move on. You want something only accessible to family? To males over 35 that are into scuba diving? You need a little better permission control.  
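
As a rough illustration of the group idea only: the data layout, the names, and the can_view() function below are all hypothetical, not the site's actual schema or code.

```php
<?php
// Hypothetical data: group name => list of member user names.
$groups = array(
  'family' => array('alice', 'bob'),
);

// Hypothetical data: album => group allowed to view it (null = public).
$albums = array(
  'holiday-photos' => 'family',
  'public-gallery' => null,
);

// Returns true if $user may view $album under the data above.
function can_view($user, $album, $albums, $groups) {
  if (!array_key_exists($album, $albums)) {
    return false;               // unknown album: deny
  }
  $group = $albums[$album];
  if ($group === null) {
    return true;                // public album: anyone may view
  }
  return isset($groups[$group]) && in_array($user, $groups[$group], true);
}

// A new family member registers: one addition to one list
// opens every album tied to that group.
$groups['family'][] = 'carol';

var_dump(can_view('carol', 'holiday-photos', $albums, $groups));
var_dump(can_view('dave', 'holiday-photos', $albums, $groups));
```

The same lookup would work for forums or weblogs by pointing them at a group instead of an album.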

This in turn opens the possibility of group permissions for forums, weblogs, or just about anything else. That's what I mean about standing at the crossroads.

This is where everything changes.
Why?

Mike Macgirvin
  last edited: Tue, 02 May 2006 01:37:11 +1000  from Diary and Other Rantings
I've been asked why I do this. Why do I spend so much time on my computer? What do you have to show for it? What purpose does it serve? If you aren't making money, why are you doing it?

I suppose to a casual observer it looks as though my efforts have been about as productive as if I had spent my days playing Duke Nukum or hack. Why bother, indeed? What have I done?

I built a community web portal. If you think it's so easy, try it yourself. There are a lot of smart people on this planet, and a lot of community web portals - and most of them are free. So what is it that drives me? First of all, I don't know of anybody that has tried to single-handedly write a community portal from scratch. This is almost always done by organizations (both formal and informal). It is a huge undertaking full of thousands of challenges. I'm not a worker bee solving one little piece of the puzzle. Instead it encompasses a much bigger picture.  

That's why.  

This is my education. You can't buy this education. You can only live it. Why would somebody choose mysqli over the mysql interface? Why not use HTTP auth for a website? Isn't it easier? Can you make drop shadow images in a cross-platform manner? How about if you change the page layout? How do you upload entire directories to a photo album? Is it even possible? How do you make clean URLs? Plugins? Virtual hosts? Themes and avatar pages? How about AJAX chat? How hard is it really? What else is AJAX useful for? Should you use a procedural or object oriented approach for a large website project? Why? How do you manage sessions? What problems are you likely to have if you build a PHP5 application and try to run it on PHP4? Why on earth would somebody use PHP if Ruby is so much faster to develop in? Why on earth would somebody use PHP when 'C' and 'C++' execute so much faster? Which costs more to maintain, MVC programming or mixed-script? Should forums be flat or threaded? Why? What is the most efficient schema in all of these cases? Do you use existing code or write it yourself? Which is really the quickest to market?

Anybody can download a web portal and create a website. I can answer all of these questions and hundreds more from first hand experience. I've done it. I know why.  

And that's why...