Mike Macgirvin
Diary and Other Rantings
   
Saturday, Jan 10 2009, 12:14 am
Nov 26, 2008
fortune_to_html in PHP

One of the problems with using the Unix/Linux fortune (aka fortune-mod) command in web pages is making its output readable in HTML. You can get something that mostly works by escaping the HTML special characters (&, <, >, and the double quote) and converting each linefeed to <br />.
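As a rough sketch, that naive conversion could be done with PHP's built-in htmlspecialchars() and nl2br(). The function name naive_fortune_to_html is made up for this illustration, and the sample string is not from a real fortune file.

```php
<?php
// Hypothetical helper for illustration: escape the HTML special characters
// and turn linefeeds into <br />. Control characters are left untouched.
function naive_fortune_to_html($s) {
    // ENT_COMPAT escapes &, <, > and the double quote, but not the single quote.
    $escaped = htmlspecialchars($s, ENT_COMPAT);
    // nl2br() inserts <br /> before each newline and keeps the newline itself.
    return nl2br($escaped);
}

echo naive_fortune_to_html("Q: Is 1 < 2?\nA: \"Yes\" & always."), "\n";
```

This is fine for plain fortunes, but any backspace characters in the input pass straight through to the output.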

However, you're still going to get a lot of fortunes with unprintable characters where the original intent is lost. Many of these used 'backspace hacks' to produce underlines and accent marks and, in really old fortune databases, to strike out text and replace it with something more amusing.
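To make the backspace hacks concrete, here is a small sketch of what such sequences look like as PHP strings. The sample strings are invented for illustration and are not taken from a real fortune database.

```php
<?php
// Illustrative strings only; "\x08" is the ASCII backspace (^H).
$samples = array(
    'underline' => "_\x08H_\x08i",         // each letter overstruck on an underscore
    'accent'    => "caf'\x08e",            // apostrophe overstruck by the e
    'strikeout' => "bad\x08\x08\x08good",  // "bad" backspaced over, then replaced
);

foreach ($samples as $name => $raw) {
    // addcslashes() makes the control characters visible as backslash escapes.
    echo $name, ': ', addcslashes($raw, "\0..\37"), "\n";
}
```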

Here is a function that should make 99.999% of the fortunes you may encounter that use weird ASCII tricks display in web pages mostly as originally intended.

 

<?php

function fortune_to_html($s) {

  // First pass - escape all the HTML entities, and while we're at it
  // get rid of any MS-DOS end-of-line characters and expand tabs to
  // 8 non-breaking spaces, and translate linefeeds to <br />.
  // We also get rid of ^G which used to sound the terminal beep or bell
  // on ASCII terminals and were humorous in some fortunes.
  // We could map these to autoplay a short sound file but browser support
  // is still sketchy and then there's the issue of where to locate the
  // URL, and a lot of people find autoplay sounds downright annoying.
  // So for now, just remove them.

  $s = str_replace(
    array("&",
          "<",
          ">",
          '"',
          "\007",
          "\t",
          "\r",
          "\n"),

    array("&amp;",
          "&lt;",
          "&gt;",
          "&quot;",
          "",
          "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",
          "",
          "<br />"),
    $s);

  // Replace pseudo diacritics
  // These were used to produce accented characters. For instance an accented
  // e would have been encoded by '^He - the backspace moving the cursor
  // backward so both the single quote and the e would appear in the same
  // character position. Umlauts were quite clever - they used a double quote
  // as the accent mark over a normal character.


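  // Only letters that actually have a named HTML entity are converted
  // (e.g. &eacute; exists but &xacute; does not); anything else is left
  // for the backspace handling below.

  $s = preg_replace("/'\010([aeiouyAEIOUY])/","&\\1acute;",$s);
  $s = preg_replace("/\&quot;\010([aeiouyAEIOUY])/","&\\1uml;",$s);
  $s = preg_replace("/\`\010([aeiouAEIOU])/","&\\1grave;",$s);
  $s = preg_replace("/\^\010([aeiouAEIOU])/","&\\1circ;",$s);
  $s = preg_replace("/\~\010([anoANO])/","&\\1tilde;",$s);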

  // Ignore multiple underlines for the same character. These were
  // most useful when sent to a line printer back in the day as it
  // would type over the same character a number of times making it
  // much darker (e.g. bold). I think there are only one or two
  // instances of this in the current (2008) fortune cookie database.

  $s = preg_replace("/(_\010)+/","_\010",$s);

  // Map the characters which sit underneath a backspace.
  // If you can come up with a regex to do all of the following
  // madness  - be my guest.
  // It's not as simple as you think. We need to take something
  // that has been backspaced over an arbitrary number of times
  // and wrap a forward looking matching number of characters in
  // HTML, whilst deciding if it's intended as an underline or
  // strikeout sequence.


  // Essentially we produce a string of '1' and '0' characters
  // the same length as the source text.
  // Any position which is marked '1' has been backspaced over.

  $cursor = 0;
  $dst = $s;
  $bs_found = false;
  for($x = 0; $x < strlen($s); $x ++) {
    if($s[$x] == "\010" && $cursor) {
      $bs_found = true;
      $cursor --;
      $dst[$cursor] = '1';
      $dst[$x] = '0';
      continue;
    }
    else {
      if($bs_found) {
        $bs_found = false;
        $cursor = $x;
      }
      $dst[$cursor] = '0';
      $cursor ++;
    }

  }

  $out = '';
  $strike = false;
  $bold = false;

  // Underline sequence, convert to bold to avoid confusion with links.
  // These were generally used for emphasis so it's a reasonable choice.
  // Please note that this logic will fail if there is an underline sequence
  // and also a strikeout sequence in the same fortune.

  if(strstr($s,"_\010")) {
    $len = 0;
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        $len ++;
        $bold = true;
      }
      else {
        if($bold) {
          $out .= '<strong>';
          while($x < strlen($s) && $s[$x] == "\010")
             $x ++;
          $out .= substr($s,$x,$len);
          $out .= '</strong>';
          $x = $x + $len - 1;
          $len = 0;
          $bold = false;
        }
        else
          $out .= $s[$x];
      }
    }
  }

  // These aren't seen very often these days - simulation of
  // backspace/replace. You could occasionally see the original text
  // on slower terminals before it got replaced. Once modems reached
  // 4800/9600 baud in the late 70's and early 80's the effect was
  // mostly lost - but if you find a really old fortune file you might
  // encounter a few of these.

  else {
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        if($strike)
          $out .= $s[$x];
        else
          $out .= '<strike>'.$s[$x];
        $strike = true;
      }
      else {
        if($strike)
          $out .= '</strike>';
        $strike = false;
        $out .= $s[$x];
      }
    }
  }

  // Many of the underline sequences are also wrapped in asterisks,
  // which was yet another way of marking ASCII as 'bold'.
  // So if it's an underline sequence, and there are asterisks
  // on both ends, strip the asterisks as we've already emboldened the text.

  $out = preg_replace('/\*(<strong>[^<]*<\/strong>)\*/',"\\1",$out);

  // Finally, remove the backspace characters which we don't need anymore.

  return str_replace("\010","",$out);
}
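
Taking up the "be my guest" invitation in the comments above, here is a minimal regex sketch for the simplest case only: runs where every underlined character is written as underscore, backspace, character. The name underline_runs_to_strong is hypothetical and not part of the function above. It assumes single-byte (ASCII) text, and it does not handle strikeouts, the "several underscores, then several backspaces" form, or characters already escaped to entities.

```php
<?php
// Hypothetical helper, not part of fortune_to_html(): handles only the
// per-character "_<BS>X" underline form.
function underline_runs_to_strong($s) {
    // One or more consecutive "_\x08<char>" triples form a single run.
    return preg_replace_callback(
        '/(?:_\x08[^\x08])+/',
        function ($m) {
            // Drop every "_\x08" pair, keeping only the visible characters.
            return '<strong>' . str_replace("_\x08", '', $m[0]) . '</strong>';
        },
        $s
    );
}

echo underline_runs_to_strong("_\x08H_\x08i there"), "\n";
// prints: <strong>Hi</strong> there
```

Unlike the loop above, this wraps a whole run in one <strong> element, so the asterisk-stripping regex would match multi-character runs as well.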
 
Nov 18, 2008
Available domains

Some interesting available domains for today, courtesy of NameThingy


UseArea.com
AbstractDocument.com
UseAnt.com
NiceEffect.com
RealCriminal.com
OnePiano.com
ReservedMan.com
UseLamp.com
ExoticOrange.com
WideModel.com
LessVirus.com
RapLady.com
LonelyWeek.com
WeakPresident.com
TopShadow.com
BestRockers.com
KriZit.com
BodyClaim.com
OldCircle.com
StuckCan.com
RegularBurger.com
YoungHam.com
RadioactiveHeat.com
DoctorIssue.com
PredatorAnimal.com
WarSunday.com
FriendlyTuna.com
OneMaiden.com
FunnyDrug.com
RoundChin.com
BetaApple.com
BaySummer.com
LowSquare.com

Comments:

December 3, 2008 21:11
[*TOP MEMBER*] Les
Nice list - thanks for your Drupal advice on the forums. Are you using 6.x, and if so, what do you think of it - is it worth upgrading to?

 

cheers les

Nov 13, 2008
What *not* to name your kids

While doing some data analysis on NameThingy, I came across some interesting findings.

The boys' and girls' names therein were taken mostly from recent US census data (and adapted, modified, and otherwise mangled for my own use).

What I found interesting was that once a particular name has gotten some bad press, it can be poisoned for centuries and fall out of use. Just think, when was the last time you met somebody named:

Cain

Goliath

Judas

Hansel

Gretel

Benedict

Napoleon

Adolf

 

 

Nov 02, 2008
Next Tuesday

The first Tuesday in November. Everybody remembers what's important about that, right?

Right. Melbourne Cup Day. The entire nation comes to a screeching halt for a five-minute horse race.

Oh yeah, there's that little presidential election in America, which is also held on the first Tuesday in November, except that it actually falls on Wednesday (Sydney time).

"More software projects have gone awry for lack of calendar time than for all
other causes combined."
-- Fred Brooks, Jr., _The Mythical Man Month_