
Mike Macgirvin

mike@macgirvin.com

Towards a distributed online social network

Mike Macgirvin
  from Diary and Other Rantings
Towards a distributed online social network.

Recent events have caused a great deal of consternation about the future of Facebook as a social portal. We've seen this before in other social networks and experiments over the years. Many folks have suggested (and there are some current development efforts) that the future of online social networking is distributed – where nobody has control over your private data but you. This is a top-level examination of how such a network might be implemented, along with some of the practical implications. This document will evolve. Contact me directly for updates and/or practical discussion.

On the issue of friendship

In the online social landscape of recent years, “friendship” has become a mutual and binary relationship between two people. You are either friends or you're not. There are no shades of grey for acquaintances, co-workers, etc. Likewise a “relationship” can only exist if both people mutually agree to the existence of the friendship.

This is completely alien to friendship as it exists in the real world. Often, feelings of kinship we might feel for somebody are not reciprocated, or are reciprocated at a different level or degree than our own interest.

So let us start with a clarification of relationships in this new world of online social networking. They can be one-sided. They can also have many levels of closeness and/or separation. Interaction may be complete and involve coordination and synchronicity or it may be intermittent and casual.

In software terms, we must dispose of the outdated notion of “request friendship, and if it is accepted, friendship commences”. That is simply not how the world works. The most convenient way to express this in terms of the social web is an “expression of interest”.

On the issue of privacy

Facebook in particular has been the focus of a good deal of scrutiny because of differences in the perception of privacy. Humans express themselves differently depending on who they are interacting with. Some of these communications may be open and public (as in the instance of a professional musician communicating with his/her fan base). Some of them can be quite private (as in the instance of two potential or de facto lovers). In between are a range of needs, such as the need to keep our private life away from close scrutiny by co-workers and potential employers.

If we are to communicate with this range of needs online, we need fine granularity in who can see different pieces of information or communications. Facebook has tried and in fact has advanced the state of the art in this concept of privileged communication, but this is at odds with their business model - which requires most communications to be open so that the service has more value to advertisers.

If we are to propose a distributed social network model, it must provide at least this level of granularity.

The distributed social model

Many aspects of the online social network are made more difficult in a distributed model than they are in a monolithic environment. Conversations with several participants are not easy to present in real time. Instead of the conversations existing in a single database, fragments may need to be pulled in from disparate locations.

Also, without the supervision of a central authority with the power to revoke membership, abuse can occur in the system by either commercial forces or vandals. Let's take the issue of friend requests. Commercial forces will seek to make lots of friends, because it is the nature of advertising to spread your brand name.

We discovered in the “blogging era” of 2002-2008 that any comment box which was open to the public quickly became a spam target. A distributed social service will not succeed if it requires manual maintenance of thousands of undesirable requests for communication each day.

In order to prevent or reduce abuse, we need some form of verification of identity. A username/password scheme will not succeed in this environment as people will tend to use the same password in all cases and this can lead to forgeries and break-ins.

Asymmetric or public key cryptography has to be central to a distributed social network. Here there is a key pair – one public and one private. Either key can decrypt communications encrypted with the other. One key is kept under tight control (the “private key”). The “public key” can be shared with anybody.
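To make that concrete, here is a minimal sketch using PHP's OpenSSL extension. It is purely illustrative of the primitive; nothing in it is mandated by the model described here.

<?php
// Illustrative only: generate a key pair, keep the private half, publish the
// public half, and sign a message so others can verify it came from us.
$res = openssl_pkey_new(array('private_key_bits' => 2048,
                              'private_key_type' => OPENSSL_KEYTYPE_RSA));
openssl_pkey_export($res, $private_key);           // kept under tight control
$details = openssl_pkey_get_details($res);
$public_key = $details['key'];                     // shared with anybody

$msg = 'status update from my profile';
openssl_sign($msg, $signature, $private_key, OPENSSL_ALGO_SHA1);

// Anyone holding only the public key can confirm the signature.
$verified = (openssl_verify($msg, $signature, $public_key, OPENSSL_ALGO_SHA1) === 1);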

We can also assume that given an open and distributed implementation, software will be created with a range of features to fulfil individual needs. We do not need to define exactly how these work or what they do. All we need to do is specify how they will interact. We also cannot force anything on the individual “cell” of this social network. For instance, we cannot specify that if we comment on an author's communication, that our comment will be kept private from certain other people. We may not even be able to delete it once it has left our hands. All we can do is “request deletion” and hope for the best. It's a good thing for implementers and friends to honour these requests because otherwise they will fall into disfavour and not be trusted with future communications.

Also, how do we know that “Barbara Jensen” is actually the Barbara Jensen we have communicated with before? We need a shared secret and/or strong authentication. Who do we trust? In the early days of such a service, we might be willing to trust that somebody is who they say they are – especially if this person becomes a member of our social network through email or personal communication. As time goes on and our network grows, the abuse potential will also grow. At that point we will need stronger verification. This might involve “friend of a friend” trust – or third party trust through a “respected” source, such as a major network provider who has already established a relationship with this person. It would be easy to say that we should use known certificate authorities to provide this information - but this will involve monetary requirements and complicated procedures for proving one's identity. These complications could prevent a network from ever growing to a level of “critical mass”. So we have to be lenient about which credentials we accept, while requiring more proof the further one is from our core group of social contacts.

XSS prevention is also a built-in feature of monolithic social network providers. This radically affects the ability to use HTML in our communications. Plain text may not be seen as a rich form of communication. Perhaps a rich text language should be provided from the outset. There is also the potential for third-party providers to provide filtering services.

Ditto for porn filtering.  

It is also probable that third-party aggregators will provide global searching and lookup to locate profiles of people that interest us.

Mandatory items:


  • Public profile page. The public profile page should include the URLs of any other communication endpoints we require, embedded into the page, be it HTML or XML. This is the global access page to this particular profile. It may contain recent updates of items with global visibility. It also contains, either inline or as a pointer, the public encryption key for this person.

  • Expression of interest URL, discoverable on the public profile page. This is the friendship request application. This is activated by electronically “dropping a card” containing the requestor's public profile page. There need not be any communication that the expression of interest was acted on. A “ticket” is granted regardless. This ticket is to be used in further communications with this person.


  • Private profile page. This page is where to find the “feed” for this person. The feed itself is obtained through an XML exchange of credentials (an encrypted copy of your ticket, along with your public key), which results in a personalised feed of information being returned (see the sketch after this list). There is no requirement that the person recognise you as a friend. You might be returned a generic public page.


  • A “post page” where you may comment on published items.

  • A notification URL which is where notifications of recent activity will be sent.

  • A “friend” page. This is quite different from the friend pages you may be used to, as it includes profile pictures/names/URLs for people who you follow – and are willing to share with the public. There is no requirement that they follow you. If you provide an XML credential exchange, you may be shown a different list depending on what the person wishes to share. The only way to determine if a friendship is mutual is if both people actively follow each other's activities.
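Here is the sketch referred to above: a rough illustration of the private-profile credential exchange. It is entirely hypothetical – the endpoint URL, the XML element names, and the use of curl/openssl are my own illustrative assumptions, not a defined wire format.

<?php
// Hypothetical feed request. Assumes $my_ticket, $my_public_key and
// $their_public_key were obtained earlier (via the expression of interest
// and the public profile page). Names and URLs are invented for illustration.

// Seal the ticket with the feed owner's public key so only they can read it.
openssl_public_encrypt($my_ticket, $sealed, $their_public_key);

$credentials = '<credentials>'
             . '<ticket>' . base64_encode($sealed) . '</ticket>'
             . '<pubkey>' . htmlspecialchars($my_public_key) . '</pubkey>'
             . '</credentials>';

$ch = curl_init('https://example.com/profile/private');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $credentials);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$feed = curl_exec($ch);   // a personalised feed - or just the generic public page
curl_close($ch);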
fortune_to_html in PHP

Mike Macgirvin
  last edited: Wed, 26 Nov 2008 22:04:48 +1100  from Diary and Other Rantings
One of the problems with using the Unix/Linux fortune (aka fortune-mod) command in web pages is making it readable in HTML. One can provide something that mostly works by substituting HTML entities for the special characters (&, <, >, and double quote) and converting linefeeds to <br />.

However you're still going to get a lot of fortunes with unprintable characters where the original intent was lost - as many of these used 'backspace hacks' to provide character underlines, accent marks, and on really old fortune databases, using backspace to strike out text and replace it with something more amusing.

Here is a function that should make 99.999% of the fortunes you may encounter that use weird ASCII tricks display in web pages mostly as originally intended.

<?php

function fortune_to_html($s) {

  // First pass - escape all the HTML entities, and while we're at it
  // get rid of any MS-DOS end-of-line characters and expand tabs to
  // 8 non-breaking spaces, and translate linefeeds to <br />.
  // We also get rid of ^G which used to sound the terminal beep or bell
  // on ASCII terminals and were humorous in some fortunes.
  // We could map these to autoplay a short sound file but browser support
  // is still sketchy and then there's the issue of where to locate the
  // URL, and a lot of people find autoplay sounds downright annoying.
  // So for now, just remove them.

  $s = str_replace(
    array("&",
          "<",
          ">",
          '"',
          "\007",
          "\t",
          "\r",
          "\n"),

    array("&amp;",
          "&lt;",
          "&gt;",
          "&quot;",
          "",
          "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;",
          "",
          "<br />"),
    $s);

  // Replace pseudo diacritics
  // These were used to produce accented characters. For instance an accented
  // e would have been encoded by '^He - the backspace moving the cursor
  // backward so both the single quote and the e would appear in the same
  // character position. Umlauts were quite clever - they used a double quote
  // as the accent mark over a normal character.

  $s = preg_replace("/'\010([a-zA-Z])/","&\\1acute;",$s);
  $s = preg_replace("/\"\010([a-zA-Z])/","&\\1uml;",$s);
  $s = preg_replace("/\`\010([a-zA-Z])/","&\\1grave;",$s);
  $s = preg_replace("/\^\010([a-zA-Z])/","&\\1circ;",$s);
  $s = preg_replace("/\~\010([a-zA-Z])/","&\\1tilde;",$s);

  // Ignore multiple underlines for the same character. These were
  // most useful when sent to a line printer back in the day as it
  // would type over the same character a number of times making it
  // much darker (e.g. bold). I think there are only one or two
  // instances of this in the current (2008) fortune cookie database.

  $s = preg_replace("/(_\010)+/","_\010",$s);

  // Map the characters which sit underneath a backspace.
  // If you can come up with a regex to do all of the following
  // madness  - be my guest.
  // It's not as simple as you think. We need to take something
  // that has been backspaced over an arbitrary number of times
  // and wrap a forward looking matching number of characters in
  // HTML, whilst deciding if it's intended as an underline or
  // strikeout sequence.

  // Essentially we produce a string of '1' and '0' characters
  // the same length as the source text.
  // Any position which is marked '1' has been backspaced over.

  $cursor = 0;
  $dst = $s;
  $bs_found = false;
  for($x = 0; $x < strlen($s); $x ++) {
    if($s[$x] == "\010" && $cursor) {
      $bs_found = true;
      $cursor --;
      $dst[$cursor] = '1';
      $dst[$x] = '0';
      continue;
    }
    else {
      if($bs_found) {
        $bs_found = false;
        $cursor = $x;
      }
      $dst[$cursor] = '0';
      $cursor ++;
    }

  }

  $out = '';
  $strike = false;
  $bold = false;

  // Underline sequence, convert to bold to avoid confusion with links.
  // These were generally used for emphasis so it's a reasonable choice.
  // Please note that this logic will fail if there is an underline sequence
  // and also a strikeout sequence in the same fortune.

  if(strstr($s,"_\010")) {
    $len = 0;
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        $len ++;
        $bold = true;
      }
      else {
        if($bold) {
          $out .= '<strong>';
          while($s[$x] == "\010")
             $x ++;
          $out .= substr($s,$x,$len);
          $out .= '</strong>';
          $x = $x + $len - 1;
          $len = 0;
          $bold = false;
        }
        else
          $out .= $s[$x];
      }
    }
  }

  // These aren't seen very often these days - simulation of
  // backspace/replace. You could occasionally see the original text
  // on slower terminals before it got replaced. Once modems reached
  // 4800/9600 baud in the late 70's and early 80's the effect was
  // mostly lost - but if you find a really old fortune file you might
  // encounter a few of these.

  else {
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        if($strike)
          $out .= $s[$x];
        else
          $out .= '<strike>'.$s[$x];
        $strike = true;
      }
      else {
        if($strike)
          $out .= '</strike>';
        $strike = false;
        $out .= $s[$x];
      }
    }
  }

  // Many of the underline sequences are also wrapped in asterisks,
  // which was yet another way of marking ASCII as 'bold'.
  // So if it's an underline sequence, and there are asterisks
  // on both ends, strip the asterisks as we've already emboldened the text.

  $out = preg_replace('/\*(<strong>[^<]*<\/strong>)\*/',"\\1",$out);

  // Finally, remove the backspace characters which we don't need anymore.

  return str_replace("\010","",$out);
}
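For completeness, a typical way to use it (assuming the fortune binary lives at /usr/games/fortune and shell_exec is allowed on your host):

<?php
// Example usage - the path to the fortune binary is an assumption.
echo fortune_to_html(shell_exec('/usr/games/fortune -a'));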
Available domains

Mike Macgirvin
  from Diary and Other Rantings
Some interesting available domains for today, courtesy of NameThingy

UseArea.com
AbstractDocument.com
UseAnt.com
NiceEffect.com
RealCriminal.com
OnePiano.com
ReservedMan.com
UseLamp.com
ExoticOrange.com
WideModel.com
LessVirus.com
RapLady.com
LonelyWeek.com
WeakPresident.com
TopShadow.com
BestRockers.com
KriZit.com
BodyClaim.com
OldCircle.com
StuckCan.com
RegularBurger.com
YoungHam.com
RadioactiveHeat.com
DoctorIssue.com
PredatorAnimal.com
WarSunday.com
FriendlyTuna.com
OneMaiden.com
FunnyDrug.com
RoundChin.com
BetaApple.com
BaySummer.com
LowSquare.com
NameThingy

Mike Macgirvin
  from Diary and Other Rantings
Since I stopped actively updating this site several months back, it appears the bulk of the incoming traffic has been visiting my various random name generators.

I decided to clean these up a bit and spin them off onto a dedicated site. You can visit it at NameThingy.com. It was quite a fun exercise, as I've managed to reduce the random name generation and all the potential options to a single HTML page that is dynamically refreshed using Ajax. Check it out.
Stopping XSS forever - and better web authentication

Mike Macgirvin
  last edited: Thu, 21 Aug 2008 11:46:11 +1000  from Diary and Other Rantings
I've been working on all kinds of different ways to completely stop XSS (and potentially the related CSRF) and provide a much better authentication framework for web applications.

The problem:

The HTTP protocol is completely stateless. On the server side, each and every page access starts with zero knowledge of who is at the other end of the connection. In order to provide what were once considered 'sessions' in the pre-web computing days, the client stores a 'cookie' sent from the server, which is then presented with every page request within that domain. The server can look at this cookie and use it to identify a particular person who has presumably passed authentication, so they don't have to re-authenticate.

But cookie storage has some serious flaws. If somebody who isn't the specified logged-in person can read the cookie, they can become that person. IP address checks can help to provide extra verification but in a world containing proxies this information can be spoofed.

Cross Site Scripting is a method whereby a malicious person who is allowed to post HTML on a page can inject javascript code which is then executed in a registered user's session, and the cookie is leaked or sent elsewhere - allowing the malicious person to impersonate the registered person.

A possible solution:

I'm still working out the details so please let me know if this is flawed, but I think I've got a way to prevent XSS and still allow registered members to post full HTML, CSS, whatever - including javascript.  It relies on the fact that cookies are stored and used per-domain. Different domains are unable to see cookies from another domain.

We'll also assume SSL connections since anything else can leak everything (cookies, passwords, everything) to a port sniffer.

We'll start with a normal website at https://example.com - which we'll assume is a multi-user website where XSS could be a problem. If somebody on this site can inject javascript onto a page, they can steal the cookies of a logged-in user. There are hundreds of ways to do this that are beyond the scope of this discussion.

But we'll also create another domain - say https://private.example.com - which processes logins and does not serve content. This will have a different cookie than example.com. Perhaps we'll let it serve the website banner image just so that it is accessed on every page of the site. Since there is no active content allowed, it is immune to XSS exploits.

It is allowed to process login requests and send cookies, and one image. That's it.  

What this means from an attacker's viewpoint is that he/she now needs to steal two cookies to impersonate somebody else.  It may be easy to steal the cookie on the main site, but there's no way to get at the cookies for the private.example.com site since it isn't allowed to host active content.

The main site uses out-of-band methods (not involving HTTP) to communicate between the two domains and establish that the session is valid and authenticated. They're both hosted in the same place after all. It can check a file or database to see that the logged in session was authenticated by the other site. Both keys (cookies) have to match or the authentication is denied.
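To illustrate (this is a sketch of my own, with made-up table and variable names - the exact linkage between the two cookies is the part I'm still working out), the content server's check might look something like:

<?php
// Sketch only. Assumes $db is an already-connected PDO handle, and that the
// login handler on private.example.com wrote a row to a shared
// 'authenticated_sessions' table when the user logged in, keyed by the
// content-site session cookie. All names here are invented.
function session_is_authenticated(PDO $db, $content_sid) {
  $stmt = $db->prepare(
    "SELECT COUNT(*) FROM authenticated_sessions
      WHERE content_sid = ? AND expires > NOW()");
  $stmt->execute(array($content_sid));
  return (bool) $stmt->fetchColumn();
}

$ok = isset($_COOKIE['content_sid'])
   && session_is_authenticated($db, $_COOKIE['content_sid']);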

Anybody see a flaw in this? Granted I still haven't thought it through completely and haven't yet tested it, but I don't see any glaring problems on the surface. Some variation of this concept will probably work and both prevent XSS as well as provide a better way of doing web authentication that is much more resistant to intrusion.    

Again assuming https to prevent snooping, the only way I can see to steal both cookies and impersonate a logged-in user is to have access to the target person's desktop and browser.

It also allows a site to completely separate the authentication mechanism from the content server allowing the authentication code to be small, simple, self-contained, and verifiable.
Reflection CMS update

Mike Macgirvin
  last edited: Wed, 23 Jul 2008 13:44:25 +1000  from Diary and Other Rantings
At this time, I've managed to pull together a working kernel and prototype of the Reflection CMS. It is not yet ready for public release, but I've been pleased with the progress. Here's a bit of a white paper I've been putting together to explain the rationale and provide a high level overview.

                 Reflection Content Management System

Purpose:

Web content management systems and frameworks that exist today are clunky, overly complicated, and often insecure. While many of the open source projects are developer friendly and openly encourage derivation, there is often a group that jealously protects the 'core' from feature creep. This makes it difficult to realise many web designs, as it is often the core that is insufficient for the task at hand. Being developer friendly does not mean that an application provides a workable development environment. Add-on modules often cannot be trusted - as they often reflect the work of novice software designers who have had to overcome the limitations of the core product.

In an effort to appeal to the most people, data abstraction is taken to new levels of absurdity and inefficiency. This is not limited to content management systems, as it is a software problem in general.

What I have attempted in taking on this gargantuan task of creating yet another content management system is to solve many of these problems, and to create a system that is extensible and encourages development at all levels - including the so-called core. To that end - almost every function can be overridden without introducing serious versioning and update issues/incompatibilities. Nothing is sacred.

The more that I mulled this task, the more it became apparent that what I was looking for in a content management framework is no less than an operating system for web pages. This involves user management, security, and the ability to execute arbitrary 'applications'. It also involves a notion of a file system hierarchy which can be represented entirely by URLs.

Many other content systems abstract data types, and this is a good idea; though it often makes for messy designs. At the heart is a generic nucleus of a content item - who owns it, what the permissions are, various timestamps, etc. Data fields that are unique to a particular content item are stored elsewhere and joined on demand.

Implementation of this level of abstraction is a challenging problem. Due to design limitations of most database systems, it involves some tradeoffs - primarily in the ability to perform searches on extended data of multiple extensible data types. For a single type, it can be done with one query. However when multiple data types are involved, a second pass needs to be run to return the extended data for each item. For this reason, it is prudent to store as much 'searchable' information as practical within the nucleus.

There is also general agreement over using themes and templates at the presentation end, so that different renderings are possible without hacking code. Here I'd like to take it one step further and modularise the entire presentation layer. As well as a 'theme', one can choose a particular layout or representation of objects, such as a choice between list view and iconic view, and/or XML feed elements. By making this extensible and arbitrary, entirely new renderings can be accomplished without touching the object code or business logic.

Permissions System

Permissions are the core of any multi-user system. This needs to be well defined, and implemented close to the kernel or core and far away from the presentation layer. In a development environment, the developers should mostly be free of managing permissions. I've implemented a permissions concept similar to Unix/Linux - although modified for better adaptability to web applications. It uses the familiar rwx concept, but I've split the 'x' permission into 'x' and 'u'. 'x' is simply a list permission. 'u' is an ability to use or extend an item. For an article, the 'u' bit allows comment rights. For a vocabulary, it allows the ability to tag something using that vocabulary. I've also introduced higher level permissions. There are six levels:  
  • rwxu admin  
  • rwxu moderators  
  • rwxu owner  
  • rwxu group  
  • rwxu members  
  • rwxu other (aka visitors)
Members is for logged in members. Group is a group association to a unique group identifier, moderators are site moderator accounts. Admin privileges are included in the permissions flags for completeness; though it isn't obvious what value this serves and in most cases these will be masked to prevent locking out the system admin from managing the system.
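To give a feel for how compact this can be, here is a hypothetical encoding of the rwxu flags as a bitmask. It is illustrative only - not the actual Reflection data layout.

<?php
// Hypothetical rwxu bitmask - one nibble per access level, illustrative only.
define('PERM_READ',  0x1);
define('PERM_WRITE', 0x2);
define('PERM_LIST',  0x4);   // 'x' - list permission
define('PERM_USE',   0x8);   // 'u' - use/extend (comment, tag, etc.)

// Levels: 0=other, 1=members, 2=group, 3=owner, 4=moderators, 5=admin.
function perm_check($flags, $level, $want) {
  return ((($flags >> ($level * 4)) & $want) === $want);
}

// e.g. may a logged-in member (level 1) comment on this article?
// $article_perms would come from the item's nucleus record.
// $can_comment = perm_check($article_perms, 1, PERM_USE);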

The Directory Object

The directory or folder object is the primary means of implementing complex data structures and representations. It is an object like any other object on the system, but when navigated to, presents a listing of those items which are attached to it as siblings. It implements a general purpose search and list/enumerate operation. It also contains a path/filename to distinguish it in the URL hierarchy and provide file system semantics to database objects. However, the important items that it contains are a umask (permissions mask) which is applied to any child items, and it can also be configured only to hold items of certain types. This is what distinguishes a photo album from a weblog or forum list. One holds photos and the others hold articles. By allowing a directory to hold any type of content, it can be made to resemble a traditional filesystem; and indeed a multi-user website can be implemented which provides member sub-sites that they manage completely.  

The directory also has complete control over the presentation layer, via themes, renderings, and menu selection. This implies that directory is not simply a 'list', but the complete embodiment of the controls, settings, and the look of that list. These can be inherited and passed on to sub-directories. A limitless range of site policy and structure can be implemented by controlling the settings of the appropriate directory entries.

Applications

Applications or executable code lives outside the virtual directory tree. In order to address the need for an extensible application space and recognising the confines of URL management, applications are denoted by the first URL path parameter. For instance http://example.com/edit invokes the object edit/post application. Additional URL path components are passed to the application as arguments in a manner similar to Unix/Linux 'argv/argc' mechanisms. Application URLs take precedence over path URLs, such that a directory or document called 'edit' created at the root level will be unavailable at that URL if the 'edit' application exists. An external path alias mechanism exists to redirect to another URL in the case of conflict with the application space.

An application framework exists that supplies plugin methods for handling initialisation, form posts, main page content, and menu callbacks. Arguments are parsed and passed in as argv/argc elements, although meta-arguments dealing with pagination (such as 'page=4') are dealt with by the kernel or core to minimise extra argument parsing at the application level. To provide pagination, an application only needs to obtain a count of the total number of items and invoke a 'paginate' function.
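A toy version of that dispatch might look like the following (function and directory names are invented for illustration; this is not the actual Reflection kernel code):

<?php
// Hypothetical URL-to-application dispatch, e.g. /edit/weblog/42?page=4
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$argv = explode('/', trim($path, '/'));
$argc = count($argv);

$app  = ($argv[0] !== '') ? $argv[0] : 'home';            // first path component
$page = isset($_GET['page']) ? intval($_GET['page']) : 1; // meta-argument handled by the kernel

// Application URLs take precedence over path URLs.
if(is_readable("apps/$app.php")) {
  require_once("apps/$app.php");
  call_user_func($app . '_content', $argv, $argc);        // plugin entry point
}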

Licensing

Reflection will be available under the generic Berkeley license. Free for all uses but with no implied warranty.

Platform

Recent/modern flavours of LAMP. Apache/mod_rewrite is required. PHP5.2+ is required for timezone support. Language: English.
The Darkness

Mike Macgirvin
  last edited: Thu, 21 Feb 2008 07:11:29 +1100  from Diary and Other Rantings
Yesterday around lunchtime an entire subnet at the school went 'dark'. This is not good. The session ('semester' for my  friends in the northern hemisphere) starts on Monday. This is the absolute busiest time of year for faculty and staff because we've got a lot of stuff to prepare for next week when the hammer falls.

The curious thing is that the subnet that went 'dark' was only one subnet, and another subnet which traverses the same wire continues to work fine. So only a random smattering of machines was affected. This made it very difficult to even track down what happened because it appeared as a random cluster of machines that suddenly could not route packets. As it turns out, they're all related by having the same third byte in their IP address. What made it even more difficult to troubleshoot is that two of these machines which went dark are the primary DNS servers, so when they vanished, nobody could see anything 'by name' until we patched a couple of machines over to an alternate.

Trying to get anything done by IP number is a minefield, because even if you don't use any hostnames directly, you might accidentally touch a server or service which does - and you're screwed; waiting for it to time out (if you're lucky).  Some services just hang until you get tired of looking at the hourglass icon and then you have to go find another already logged-in session somewhere else to work. Can't start any new sessions because they mount home directories, which touch name servers and will hang.  

So I'm chugging the morning coffee and heading off to work an hour early this morning to try and recover from this disaster. Spent half the night awake formulating a plan after spending the entire evening determining for certain where the problem was. The problem is a router that's locked in a closet, and only the main campus IT folks have keys. Coincidentally, they made some configuration changes in that closet yesterday. Around lunchtime.

Gentlemen, get over here right now and unlock that door.

Unfortunately it isn't that easy, as there are layers of bureaucracy to contend with. My backup plan is to move two of our absolutely critical machines out of the darkness and into the light. One of them I can unplug and carry away. The other is a virtual machine living on two networks (which can run elsewhere) but I've got to find a wire in another room/building that can talk to both subnets. Oh and a cooperative host with enough disk and memory, that won't mind being loaded up with an alien machine.
Q=`(parlez vous|hablas|sprechen ze) $geek?`

Mike Macgirvin
  from Diary and Other Rantings
It just occurred to me that in the last 4-5 days I've written 'code' in Visual Basic, SmallTalk, lisp, C, bash, awk, PHP, perl, and python. Thousands of lines of code total. And there are probably a few dialects I forgot here. Not to mention 30-40 different flavours of config files, sed, Oracle-SQL, mySQL, and LDAP and some other stuff that don't quite qualify as 'code' but still involve intimately knowing a strange computer dialect. Oh yeah, HTML and JavaScript (of course).
Stunned would be the word I'm looking for

Mike Macgirvin
  last edited: Sat, 02 Feb 2008 18:55:23 +1100  from Diary and Other Rantings
Microsoft is making a bid to buy Yahoo!. Surprise, shock... What words could describe my emotion on hearing this?

I can see the motivation and the reasoning behind it. They want to put a stop to Google ("I'd like to buy a noun, please."). Still I believe this is the wrong way to do it. The only way for them to stop Google is to buy Google. Don't laugh. They are ideologically more closely aligned than you might realize. I don't believe that they've thought through the consequences of this decision - or maybe they have but just don't care. It is a culture clash of epic proportions that will result pretty much in the destruction of Yahoo! and all they've ever done - and do nothing to harm Google. I suspect many of the employees will quit outright, and there's not much place for them to go in Silicon Valley except to side with the enemy (Google), the largest employer in the valley that's still adding significant headcount.

But I also believe that this move can't be stopped, so it doesn't really matter what I think about it. I would however like to share with you the exact image that popped into my brain on hearing this.

Image/photo
The Seven Year Itch

Mike Macgirvin
  last edited: Fri, 01 Feb 2008 16:07:39 +1100  from Diary and Other Rantings
Sometime later this month "Diary and Other Rantings" (i.e. my weblog) will turn 7 years old, and I'll start my eighth year doing this activity called 'blogging'. Perhaps I'll mark the day, perhaps not. We'll see. Maybe I'll just stop doing it altogether. Maybe not. We'll see.

This all started in early 2001. I was at AOL making lots and lots of money from my Netscape stock options. I had a Netscape employee home page that was visited hundreds of thousands of times a day, but this was slowing. AOL no longer linked to it. I had started running a new server in the spare bedroom in 1998-1999, and later moved it to the garage. It took almost two years to get a working DSL link so that I could actually run a public website off of it. High-speed internet to the home was still an experimental technology. DSL wasn't yet ready for prime time and ISDN had other issues which plagued it. Leased-line required somebody to sell you an end-point on the public net and nobody was doing this, besides being limited to 56k which was now the speed of most modems. Web 'hosting' in those days was mostly for big business and cost big money. I could certainly afford it, but decided to spend my cash on more important things (like buying a music store a year later).

Running a Linux box with an internet link isn't very expensive in the overall scheme of things. So once the DSL was finally working I made a new home page and started improving it.

I think it was Cindy at 'Off the Beaten Path' (now at 'dustingmybrain.com' ) who first introduced me to the concept of a rambling page. Instead of replacing your 'Current Interests' web page every week, you just keep adding to it. Drop in a date. Write what's happening. I started doing this. I was writing HTML in emacs. I called it an online diary. I didn't have titles, categories, RSS feeds, etc. These would come much later. I wasn't writing 'articles', I was just rambling. Why do you need a title for it? That makes it look so structured. The only important thing is the date, so somebody knows when it was that you thought this way. This was important. After several years of living on Netscape time, I firmly believed that one didn't think the same way for very long, and technology was always changing - so information had to have a date.

The other thing that I did was to take a cue from some of the large online news sites, which were the best model available for presenting information that had timestamps. I started writing in reverse chronological order (recent first). This was born of necessity, since nobody wanted to load a large page and scroll to the end to find recent stuff; which was how we did things previously (logfile format).  

In fact I maintained this format for a few years until it became unmanageable. Then I looked for ways of automating my monthly (or whenever) process of moving the current entries to an archive page and starting fresh. So after looking to see what programs were available and trying a few of them, I instead wrote a program to do it myself. Over time that evolved from a simple diary 'archiver' to the thing that you see today - a mega social portal that does everything but make coffee. (I miss this incidentally, I had my computer turn on the coffee pot from an online request in the early 1980s using my first homebrew social portal).  

I still wonder whether anybody reads these pages. Does anybody care? I don't subscribe to the current notions of SEO and affiliate marketing and trackware and all the other ways to improve one's blog ranking. Most notable these days are the pages and pages of 'widgets' attached to every blog, selling everything from online communities to soap. Why bother? Your only visitors will be other bloggers that are all trying to get you to visit their own blog. They aren't really reading what you have to say, they're too busy 'selling' their own wares. Still even after the RSS fiasco a few months back, I manage to pull in a few thousand humans a week. They come and read a page and leave again. This is the state of the modern internet.

It may be of some interest that I've managed to serve up a few hundred million pages since this all started - mostly to crawlers and robots; however last year activity peaked with about 100,000 daily hits (30,000 human visitors) and we've had six or seven days with over a million hits. I've written close to 1500 articles and there have been about 6600 total articles at one time or another from various feeds - before I was forced to nuke them for legal reasons. Only about 250 comments total, which I attribute to my decision a couple years back to do away with the daily spam cleanup and only allow website members to post comments. [I've since revised this policy.]

The 'community portal' (which I started writing a couple of years ago) doesn't have much community and I don't know if that will ever change. Community folks like big parties and unless you have one, you're late to the party. Bloggers only like communities where they can sell their blog.  I don't know how to convince them that a long-running website with several thousand non-blogging human visitors a week is actually a good place to drop a link. Yeah, I could put you on my blogroll, but I read thousands of blogs. It would quickly grow to be unmanageable and you'd be lost in the noise.

But you can add your own link and profile page and whatever - you don't need me to do it. Hint, hint.  

Anyway - we'll see if this lasts or whether I just decide that there are better things to do. Write into space everyday and maybe a couple of people will read it. Maybe not.

That's what it's all about.

Don't ask yourself if it is actually relevant or important or whether anybody cares. You might not like to hear the true answer. It's one blog amongst hundreds of millions, all trying to be visited. All thinking they should be relevant to somebody. It's like asking if one star in the entire universe is relevant. Maybe one is relevant to somebody. But the big question looms, is it yours? Unless it's the sun and brings life to this planet, it's likely just another star in the vastness of space.  

In fact, nobody really cares whether you blog or not when all is said and done. Well maybe one or two folks. In my case those are the same one or two folks that cared back in 2001. Everybody else is just passing through on their way to somewhere else.

Still every day (sometimes two) I go to my website and ramble about what's on my mind. I tweak the software to make it better. Even knowing that it is all an exercise in futility. Strange.
Documentation? What documentation?

Mike Macgirvin
  last edited: Thu, 31 Jan 2008 16:09:19 +1100  from Diary and Other Rantings
The school's weather station webpage seems to have stuffed it sometime around Thanksgiving. Today somebody finally noticed and alerted the support staff.

My boss asks "where's the documentation?".

Right.  There is none. This system has been in place for ten years or more and fails occasionally.  When that happens we go in and fix it.

Start with the webpage that actually displays the data. It's pulling the data from a file that is supposed to be automagically updated. Except we don't believe in magic. The file didn't get updated. Now to find out why.

Since this is a scheduled event, cron has to be involved. Let's have a look at the crontab file. Hmmm. It's pulling the changes from another file that is supposed to be automagically updated. That one hasn't been changing either. What changes that file? It isn't cron. Or is it? That file is symlinked to a file on another computer. Let's go have a look at the other computer. Ah, I see. There's a crontab running there which generates the contents of the update file from a data file via a collection of python scripts. Let's have a look at those.

As I suspected, they are pulling data from yet another file that is automagically updated. Right. It hasn't changed since November either. What changes this file? Time to scan the logs. Nothing.

OK, it's time to start from the other direction. The weather station is connected to a PC in the corner of a lab. Let's have a look there. It's hung and totally unresponsive. OK, maybe that's the problem. I reboot it. Then go back to the webpage. Nope. Nothing has changed.

OK, somehow the data has to get from the weather station computer to the other computer where the python scripts can munge it. Let's have a look at the logs.

The logs say everything is fine, but it isn't fine. Nothing. It's not happening. Well this is interesting. I check connectivity and network connections. They're OK. We've got an IP address and pings work just fine. A closer look reveals that there's a Windows task scheduler which occasionally FTP's the weather files across the net to the second Unix box. The logs don't show any errors. Hmmm. The files aren't being FTP'd though. They aren't making it. Then I see a notice at the bottom of the screen. Updates were applied some time since the computer was last powered on - six months ago. OK, what updates? Windows firewall. Right. So I have a look, and sure enough the computer's FTP connection has been firewalled because of an automatic update. The FTP's are silently failing - and indicating success. This is pure evil. After several minutes I'm able to get in with an administrator account that can fix the firewall and do so.

Then have another look. Still nothing happening. What could be the problem now? Ah, on reboot FTP is automatically disabled on the weather station software - again without any warnings. The logs again say everything is working and files are being transferred. More evil. What's the use of having log files if they lie to you? I turn on the FTP. Bingo - now the files get through. Now back to the second computer to manually process the files and dump them into the directory where the third computer can pick them up. Then back to the third computer to manually update the processed files.

Yay! It works.

Back to the documentation. How would somebody document stuff like this? There's just too much that can go wrong. I could use up a tree or two writing it all down. This is why we've got systems folks.
Stupid Tricks

Mike Macgirvin
  from Diary and Other Rantings
If you're a techie and are ever really bored, here's something you can do for amusement...

Assume you're running on some flavour of Windows. First create a virtual machine running Linux.

Now in your virtual machine, let's fire up a CP/M emulator. A machine running in a machine running in a machine.

Now wasn't that fun?

Now let's take it one step further...

In your virtual machine, run a windows emulator. Now run some cygwin tools.  Maybe ssh (from within a Bourne shell) out to a Mac and do something there.

So now you've got Mac stuff running under Linuxy stuff running under Windows running under Linux running under Windows.

Like I said, you need to be really bored to do stuff like this. And don't even think about trying to explain to anybody why it's even amusing.
Sun acquires MySQL

Mike Macgirvin
  last edited: Fri, 18 Jan 2008 09:30:46 +1100  from Diary and Other Rantings
Unless you've been watching closely, this announcement was easy to miss. Sun Microsystems is acquiring MySQL. This has ramifications both good and bad.

This will likely affect a huge number of people who are currently using open source web applications; a majority of which are being stored on MySQL databases. Their future viability is now questionable. It all depends on the license and revenue models Sun chooses to adopt.

I would also try to steer clear of the pending 6.0 release as it is likely to involve significant re-structuring of the code to suit Sun's business requirements. It may be a year or three before it stabilises again. Sun is legendary for introducing layers of bureaucracy into development projects.  

While Sun may make public announcements of their intent to continue to provide the product for free [and it should be noted that there was no such announcement in the press release], it is difficult to imagine the corporate bean counters not making a recommendation to derive as much revenue stream as possible from the acquisition.

You can read the announcement here.

Also of potential interest is this (dated) history of MySQL
No Intel Inside

Mike Macgirvin
  last edited: Wed, 09 Jan 2008 13:02:39 +1100  from Diary and Other Rantings
After a continuing feud between Intel Corp and the One Laptop Per Child organization, Intel finally walked out in a huff.

OLPC has been trying hard to get cheap computers into the world's poorest countries. But in order to keep the price down, these have AMD chips instead of the higher-priced Intel processors, and use open source operating systems rather than the also pricier Windows.

Intel joined the organization a while back, seeing that it was a huge market opportunity. But they used their insider knowledge to get their own Wintel machines into the running for the purchase orders, at almost twice the cost. Then they put their biggest sales reps into the field to make sure that they got all the orders.  

OLPC cried foul. Intel responded by leaving the organization (presumably taking their support money with them).
Netscape won - well sort of

Mike Macgirvin
  last edited: Mon, 10 Dec 2007 18:12:41 +1100  from Diary and Other Rantings
I was recently reflecting on my startup days at Netscape more than a decade ago. What was compelling about what we were trying to accomplish at the time was to make operating systems irrelevant.

Netscape tried to accomplish this by embedding what once were typically operating system functions into a multi-protocol window onto the world (the web browser). It had your text editor, the aforementioned web browser, a window system (aka frames) and a programming language or two (Java and JavaScript) to tie it all together and turn it into a general purpose information and communication appliance. The consensus at the time was that this would make the underlying operating system (*nix, Windows, Apple, OS/2, whatever) irrelevant. If you had this tool on your computer, you could work on any computer, any platform, and be able to do all of your information related tasks.

Needless to say, this grand idea failed. The browser is still around, but the original idea was lost along the way. New browsers don't come with the same tools we tried to provide back then. Java and calendaring in particular are now separate add-ons, as is the email package - available separately.

But does that mean Netscape lost? Well yeah, Netscape did. They're gone. But the concept of operating system irrelevance didn't. As I write this (on a Windows box), I've got windows open to Unix servers, I've got Unix command shells and utilities, I've got typical Unix programming languages and databases and web servers all running in this alien environment. It's actually impressive how far we've come. I'm currently dumping a remote Linux file system to a local (Windows) disk drive using nothing but Unix commands. ssh, tar, bash. These are all running on the Windows box using Cygwin, which comes with a couple hundred native Unix commands. I've got my familiar LAMP (Linux, Apache, MySQL, PHP) web programming environment via XAMPP; which I'm developing with emacs. Remote system monitors running via X (the Unix windowing system) and being displayed on my desktop via Cygwin/X. I've even got my mouse setup with hover-to-focus mode.  The only thing that provides solid evidence that I'm not running a Unix operating system is the IE icon on the desktop (which I never use).

And this is all running on Windows.

So the operating system is completely irrelevant. It's just something that sits in the background and allows you to launch programs. Just like we envisioned back in the '90s. OK, not exactly like we envisioned, but I'm quite comfortable with the end result.
OS madness, chapter #7936

Mike Macgirvin
  last edited: Thu, 08 Nov 2007 14:54:42 +1100  from Diary and Other Rantings
Still struggling with device drivers on Windows Vista. The sound card drivers have an update, but I'm skeptical. Several folks reported BSOD when they installed it.

And I've lost any good feelings I had for Debian. Recently I moved my old RedHat installation to a newer PC - one that was only 8 years old rather than 10. All went extremely smoothly. On bootup, it found the new motherboard, new network card, mouse, monitor, etc. - and configured all of them. Everything worked fine.

Then I upgraded to Debian. The RedHat was a couple of years old, and I didn't want to mess with building PHP, MySQL, and Apache upgrades as well. Just boot up a newer Linux. Debian is currently one of the more popular Linux flavors - and I especially like the APT package management utility. Need PHP? 'apt-get install php'. You don't need to build and configure it and mess with library dependencies. These are all taken care of. If it needs new libraries, these are installed as well as any libraries that they depend on.  

Anyway, now (a couple of weeks later) I put in another newer PC - this time only 4 years old. I was expecting everything to go smoothly like it did last time. But it didn't. Debian doesn't have very good hardware (re-)detection, and they also don't load any other drivers than what is absolutely necessary. So I'm faced with an incomplete operating system that doesn't recognize the monitor or ethernet card. And I can't load in the modules for these devices over the net, because it doesn't recognize the network card. It's a Catch-22.

The only solution now is a re-install. Spend a few weeks getting everything configured and then start over. Right. I've been here before. Way too often...

But if you're one of those folks considering moving away from RedHat/Fedora, beware. It's nice to be able to plop your disks into another box if the one you've got goes bad - and keep running. Debian won't do this.
goodbye /dev/fd0

Mike Macgirvin
  last edited: Sat, 13 Oct 2007 08:50:39 +1000  from Diary and Other Rantings
I've got both a 5.25 and 3.5 floppy drive on my Linux box at home. I bought the 5.25 drive in the mid-80s as part of my first 'clone kit' (cheap IBM-style PC built completely from components made in Asia). Today I finally unplugged it, even though it still works just fine. I can't even recall the last time I actually used it. I believe I made a Linux emergency boot disk back in 2002, but that was on the 3.5 drive.

I tossed all my old 5.25 disks prior to moving overseas. After much more than ten years in non-controlled temp/humidity environments, there wasn't any 'critical data' left to speak of. Ditto for the 3.5's.

So I've got no disks left to read, and 720kbytes is pathetic and slow storage in 2007.  If you need a copy of something, just flash it to a USB stick.

It certainly wasn't my first floppy drive (that distinction would go to an old 8 inch CP/M drive which I never used because it was obsolete before I ever finished writing the assembly language driver code); - but it's the longest surviving piece of computer hardware I've got at the moment.  Seems a shame to let it go, however what's the point of keeping it running?

This data too shall pass.
Juli@ sound card and Vista

Mike Macgirvin
  from Diary and Other Rantings
If you're looking for a decent mid-range sound card and don't want to spend a fortune, the ESI Juli@ is pretty respectable. I really like the fact that it's about the cheapest card that'll provide balanced line. You do this by flipping the card around. Unbalanced connectors on one side, balanced on the other. It's a pretty neat concept.

Anyway, if you're trying to install one of these suckers on Vista, forget the installation CD. You can just throw it in the trash if you want. Even though the latest driver is for XP/2005, just go to the website and grab the latest. The driver on the install disk is a piece of crap and you'll be wondering why you bought such a sucky card. Can't even get the basic speaker test sounds to come out without about 300% signal distortion, dropouts, odd harmonics, etc. In short, the sound you get is almost totally unrecognizable.

The website driver makes it actually work.

Oh, and to use with Sonar, don't use the WDM channel. Just go with ASIO.
Multiviews is good, unless it's bad

Mike Macgirvin
  last edited: Tue, 11 Sep 2007 09:01:51 +1000  from Diary and Other Rantings
This had me pulling my hair out yesterday, so I thought I'd share the experience with enough key terms that the next person pulling their hair out will find it.

I was installing my CMS software on a work machine. I'll likely be doing additional development on it, and the university is the best place to do this. But that's neither here nor there. My software is designed around 'clean URLs'; which means what you see in the URL bar isn't (usually) littered with code and operating system artifacts. So for instance to post to my weblog, I go to the URL /post/weblog, not something like post.php?op=weblog.

To accomplish this, I use an Apache webserver module called 'mod_rewrite', which takes care of the nitty-gritty details of this process. Mod_rewrite is not without its faults, but that's the subject of another article. It does the job. The biggest thing it does is let you leave out the '.php', except I'm letting it do a whole lot more.

Anyway I'll cut to the chase. My software was horribly broken after installing it yesterday. It took hours to figure out why. Something else was trying to provide clean URLs and strip '.php' from places where I actually needed to have it in order for things to work. Well that's not technically correct either. It was actually executing PHP files by URL without the extension. Except that these were 'include files'. They weren't meant to be executed directly. They were meant to be included in something else, and the something else was managed by mod_rewrite. The something else was never getting called.  

This was inconceivable. Nothing in any of the system release notes said anything about some magic new clean URL ability. This was on Debian (etch). Apache, PHP, MySQL. I tried all the sites. I googled for everything I could think of. Clean URL debian. Clean url apache. '.php not needed'. mod_rewrite. strip file extension. ForceType. (ForceType also lets you execute files without providing a filename extension). I scoured the last several months of Apache release notes, to no avail.

Finally after several hours I happened upon a little gem of a snippet on an obscure website. 'Turn on Multiviews instead of ForceType'. Debian has multiviews turned on by default, but this is the first I'd heard of it. I had assumed (never do this of course) that it was yet another fancy mod_dir option or something I didn't care about.

No. Multiviews is a slick trick for Apache that takes any pathname, and if it thinks it can find a page to return, it returns it. It uses the basename of the file in the URL and if there's no file, it looks for the filename with an extension. Any extension. Then it sends the file back. So it gives you a clean URL. You type in 'index' and it will send back 'index.htm' or 'index.html' or 'index.php' or 'index.pl' or 'index.shtml'. You get the idea. You can test this on any site that has multiviews turned on by asking for 'index' and see if you actually get a page. Normally you wouldn't.  

If the URL is 'post' and there's a file in the directory called 'post.php' it will send that file back even if you don't want it to. So I'll let you research it further if multiviews is what you want. It's actually pretty cool. In my case I had to disable it.

Options -Multiviews in the .htaccess did the trick and made everything work again.
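For anyone else who lands here with the same symptoms, the relevant .htaccess lines end up looking something like this (the rewrite rule shown is a generic example, not my actual rules):

# Disable content negotiation so it can't short-circuit mod_rewrite
Options -MultiViews
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]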
Keeping me busy

Mike Macgirvin
  last edited: Thu, 30 Aug 2007 16:21:42 +1000  from Diary and Other Rantings
Work has been going well, at least as far as the employment part of it goes. The task list is a freaking nightmare.

I reported a week or so ago about the number of people (279) that had 'root' (administrator) access across the domain due to a programming error by my predecessor. But that's not half of it. Another 240 or so had no access whatsoever (and should have). Yet another 40 or so had non-zero duplicate login ID's ('uids' in Unix-ese). This means that any of them could write over or steal files from the other folks with whom they shared ID's.

This is all due to programming errors by my predecessor. There are just under 2000 accounts, so a little more than one in four were hopelessly screwed up.  

I've also had to break in to about 40 machines because my predecessor didn't leave any password information for them and doesn't respond to (phone,email) queries. He's still at the 'Uni' (University) and somehow managed to get promoted to central IT services. Gawdd, I fear for the damage that he can do with even more access to the central infrastructure systems like payroll, purchasing, enrollment databases, etc. Most of the departmental machines (the ones I'm now caring for) have custom built scripts for performing user and system management. Those dealing with system management are as buggy as the ones managing users. It's some of the most horrid looking buggy code I've ever encountered - and I've been encountering buggy code for over half my life.

At least I'm not in danger of running out of things to fix any time soon. I'm amazed some of this stuff worked at all. In fact, most of it didn't - or just worked marginally enough that nobody ever noticed how flucked up it really was.

Oh well. Slowly but surely I'm getting all of this stuff whipped into shape.