
Mike Macgirvin

mike@macgirvin.com

Towards a distributed online social network

Mike Macgirvin
  from Diary and Other Rantings

Recent events have caused a great deal of consternation about the future of Facebook as a social portal. We've seen this before in other social networks and experiments over the years. Many folks have suggested (and there are some current development efforts) that the future of online social networking is distributed – where nobody has control over your private data but you. This is a top level examination of how such a network might be implemented, along with some practical implications. This document will evolve. Contact me directly for updates and/or practical discussion.

On the issue of friendship

In the online social landscape of recent years, “friendship” has become a mutual and binary relationship between two people. You are either friends or you're not. There are no shades of grey for acquaintances, co-workers, etc. Likewise a “relationship” can only exist if both people mutually agree to the existence of the friendship.

This is completely alien to friendship as it exists in the real world. Often, feelings of kinship we might feel for somebody are not reciprocated, or are reciprocated at a different level or degree than our own interest.

So let us start with a clarification of relationships in this new world of online social networking. They can be one-sided. They can also have many levels of closeness and/or separation. Interaction may be complete and involve coordination and synchronicity or it may be intermittent and casual.

In software terms, we must dispose of the outdated notion of “request friendship, and if it is accepted, friendship commences”. That is simply not how the world works. The most convenient way to express this in terms of the social web is an “expression of interest”.

On the issue of privacy

Facebook in particular has been the focus of a good deal of scrutiny because of differences in the perception of privacy. Humans express themselves differently depending on who they are interacting with. Some of these communications may be open and public (as in the instance of a professional musician communicating with his/her fan base). Some of them can be quite private (as in the instance of two potential or de facto lovers). In between are a range of needs, such as the need to keep our private life away from close scrutiny by co-workers and potential employers.

If we are to communicate with this range of needs online, we need fine granularity in who can see different pieces of information or communications. Facebook has tried and in fact has advanced the state of the art in this concept of privileged communication, but it comes at odds with their business model - which requires most communications to be open so that the service has more value to advertisers.

If we are to propose a distributed social network model, it must provide at least this level of granularity.

The distributed social model

Many aspects of the online social network are made more difficult in a distributed model than they are in a monolithic environment. Conversations with several participants are not easy to present in real time. Instead of the conversations existing in a single database, fragments may need to be pulled in from disparate locations.

Also, without the supervision of a central authority with the power to revoke membership, the system is open to abuse by commercial forces or vandals. Let's take the issue of friend requests. Commercial forces will seek to make lots of friends, because it is the nature of advertising to spread your brand name.

We discovered in the “blogging era” of 2002-2008 that any comment box which was open to the public quickly became a spam target. A distributed social service will not succeed if it requires manual maintenance of thousands of undesirable requests for communication each day.

In order to prevent or reduce abuse, we need some form of verification of identity. A username/password scheme will not succeed in this environment as people will tend to use the same password in all cases and this can lead to forgeries and break-ins.

Asymmetric, or public key, cryptography has to be central to a distributed social network. Here there is a key pair – one public and one private. Either key can decrypt communications encrypted with the other. One key is kept under tight control (the “private key”). The “public key” can be shared with anybody.
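As an illustration, here is a minimal sketch using PHP's OpenSSL extension (any comparable library would do): generate a key pair, sign a message with the private key, and verify it with the public one.

<?php

// A sketch only - the key size is an arbitrary choice.
$keys = openssl_pkey_new(array(
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
));
openssl_pkey_export($keys, $private_pem);     // keep under tight control

$details    = openssl_pkey_get_details($keys);
$public_pem = $details['key'];                // publish on the profile page

$message = 'expression of interest';
openssl_sign($message, $signature, $private_pem);   // only we can produce this

// Anybody holding the public key can check the message really came from us.
echo (openssl_verify($message, $signature, $public_pem) === 1)
    ? "verified\n" : "not verified\n";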

We can also assume that given an open and distributed implementation, software will be created with a range of features to fulfil individual needs. We do not need to define exactly how these work or what they do. All we need to do is specify how they will interact. We also cannot force anything on the individual “cell” of this social network. For instance, we cannot specify that if we comment on an author's communication, that our comment will be kept private from certain other people. We may not even be able to delete it once it has left our hands. All we can do is “request deletion” and hope for the best. It's a good thing for implementers and friends to honour these requests because otherwise they will fall into disfavour and not be trusted with future communications.

Also, how do we know that “Barbara Jensen” is actually the Barbara Jensen we have communicated with before? We need a shared secret and/or strong authentication. Who do we trust? In the early days of such a service, we might be willing to trust somebody is who they say they are – especially if this person becomes a member of our social network through email or personal communication. As time goes on and our network grows, the abuse potential will also grow. At that point we will need stronger verification. This might involve “friend of a friend” trust – or third party trust through a “respected” source, such as a major network provider who has already established a relationship with this person. It would be easy to say that we should use known certificate authorities to provide this information - but this will involve monetary requirements and complicated procedures for proving one's identity. These complications could prevent a network from ever growing to a level of “critical mass”. So we have to be lenient with what credentials we accept, while preferring more proof the further from our core group of social contacts one is.

XSS prevention is also a built-in feature of monolithic social network providers. This radically affects the ability to use HTML in our communications. Plain text may not be seen as a rich form of communication. Perhaps a rich text language should be provided from the outset. There is also the potential for third-party providers to provide filtering services.

Ditto for porn filtering.  

It is also probable that third-party aggregators will provide global searching and lookup to locate profiles of people that interest us.

Mandatory items:


  • Public profile page. The public profile page should include the URLs of any other communication endpoints we require, embedded into the page, be it either HTML or XML. This is the global access page to this particular profile. It may contain recent updates of items with global visibility. It also contains, either inline or as a pointer, the public encryption key for this person.

  • Expression of interest URL, discoverable on the public profile page. This is the friendship request application. It is activated by electronically “dropping a card” containing the requestor's public profile page. There need not be any communication that the expression of interest was acted on. A “ticket” is granted regardless. This ticket is to be used in further communications with this person. (A sketch of such an endpoint follows this list.)


  • Private profile page. This page is where to find the “feed” for this person. The feed is obtained through an XML exchange of credentials (an encrypted copy of your ticket, along with your public key), which results in a personalised feed of information being returned. There is no requirement that the person recognise you as a friend. You might be returned a generic public page.


  • A “post page” where you may comment on published items.

  • A notification URL which is where notifications of recent activity will be sent.

  • A “friend” page. This is quite different from the friend pages you may be used to as it includes profile pictures/names/URLs for people who you follow – and are willing to share with the public. There is no requirement that they follow you. If you provide an XML credential exchange, you may be shown a different list depending on what the person wishes to share.
The only way to determine if a friendship is mutual is if both people actively follow each other's activities.
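To make the “expression of interest” endpoint above concrete, here is a hypothetical sketch in PHP. Nothing here is a prescribed wire format - the field name, the ticket format, and the key-extraction step are all illustrative assumptions:

<?php

// Hypothetical "expression of interest" endpoint. The requestor drops
// a card: the URL of his/her public profile page.
$profile_url = $_POST['profile'];          // field name is illustrative

// Fetch the page and pull out the embedded public key (assumed to be
// published inline, as the public profile page item suggests).
$page = file_get_contents($profile_url);
if(preg_match('#-----BEGIN PUBLIC KEY-----.*?-----END PUBLIC KEY-----#s', $page, $matches)) {
    $public_key = $matches[0];

    // Issue a ticket unconditionally, so the requestor learns nothing
    // about whether we ever act on the request.
    $ticket = sha1(uniqid($profile_url, true));

    // ...store the ticket and public key for later credential
    // exchanges (storage layer not shown)...

    header('Content-Type: text/xml');
    echo '<ticket>' . $ticket . '</ticket>';
}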
fortune_to_html in PHP

Mike Macgirvin
  last edited: Wed, 26 Nov 2008 22:04:48 +1100  from Diary and Other Rantings
One of the problems with using the Unix/Linux fortune (aka fortune-mod) command in web pages is making it readable in HTML. One can provide something that mostly works by escaping the HTML special characters (&, <, >, and double quote) as entities and converting linefeeds to <br />.

However you're still going to get a lot of fortunes with unprintable characters where the original intent was lost - as many of these used 'backspace hacks' to provide character underlines, accent marks, and on really old fortune databases, using backspace to strike out text and replace it with something more amusing.

Here is a function that should make 99.999% of the fortunes you may encounter that use weird ASCII tricks display in web pages mostly as originally intended.

<?php

function fortune_to_html($s) {

  // First pass - escape all the HTML entities, and while we're at it
  // get rid of any MS-DOS end-of-line characters and expand tabs to
  // 8 non-breaking spaces, and translate linefeeds to <br />.
  // We also get rid of ^G which used to sound the terminal beep or bell
  // on ASCII terminals and were humorous in some fortunes.
  // We could map these to autoplay a short sound file but browser support
  // is still sketchy and then there's the issue of where to locate the
  // URL, and a lot of people find autoplay sounds downright annoying.
  // So for now, just remove them.

  $s = str_replace(
    array("&",
          "<",
          ">",
          '"',
          "\007",
          "\t",
          "\r",
          "\n"),

    array("&",
          "<",
          ">",
          """,
          "",
          "        ",
          "",
          "<br />"),
    $s);

  // Replace pseudo diacritics
  // These were used to produce accented characters. For instance an accented
  // e would have been encoded by '^He - the backspace moving the cursor
  // backward so both the single quote and the e would appear in the same
  // character position. Umlauts were quite clever - they used a double quote
  // as the accent mark over a normal character.

  $s = preg_replace("/'\010([a-zA-Z])/","&\\1acute;",$s);
  $s = preg_replace("/\"\010([a-zA-Z])/","&\\1uml;",$s);
  $s = preg_replace("/\`\010([a-zA-Z])/","&\\1grave;",$s);
  $s = preg_replace("/\^\010([a-zA-Z])/","&\\1circ;",$s);
  $s = preg_replace("/\~\010([a-zA-Z])/","&\\1tilde;",$s);

  // Ignore multiple underlines for the same character. These were
  // most useful when sent to a line printer back in the day as it
  // would type over the same character a number of times making it
  // much darker (e.g. bold). I think there are only one or two
  // instances of this in the current (2008) fortune cookie database.

  $s = preg_replace("/(_\010)+/","_\010",$s);

  // Map the characters which sit underneath a backspace.
  // If you can come up with a regex to do all of the following
  // madness  - be my guest.
  // It's not as simple as you think. We need to take something
  // that has been backspaced over an arbitrary number of times
  // and wrap a forward looking matching number of characters in
  // HTML, whilst deciding if it's intended as an underline or
  // strikeout sequence.

  // Essentially we produce a string of '1' and '0' characters
  // the same length as the source text.
  // Any position which is marked '1' has been backspaced over.

  $cursor = 0;
  $dst = $s;
  $bs_found = false;
  for($x = 0; $x < strlen($s); $x ++) {
    if($s[$x] == "\010" && $cursor) {
      $bs_found = true;
      $cursor --;
      $dst[$cursor] = '1';
      $dst[$x] = '0';
      continue;
    }
    else {
      if($bs_found) {
        $bs_found = false;
        $cursor = $x;
      }
      $dst[$cursor] = '0';
      $cursor ++;
    }

  }

  $out = '';
  $strike = false;
  $bold = false;

  // Underline sequence, convert to bold to avoid confusion with links.
  // These were generally used for emphasis so it's a reasonable choice.
  // Please note that this logic will fail if there is an underline sequence
  // and also a strikeout sequence in the same fortune.

  if(strstr($s,"_\010")) {
    $len = 0;
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        $len ++;
        $bold = true;
      }
      else {
        if($bold) {
          $out .= '<strong>';
          while($s[$x] == "\010")
             $x ++;
          $out .= substr($s,$x,$len);
          $out .= '</strong>';
          $x = $x + $len - 1;
          $len = 0;
          $bold = false;
        }
        else
          $out .= $s[$x];
      }
    }
  }

  // These aren't seen very often these days - simulation of
  // backspace/replace. You could occasionally see the original text
  // on slower terminals before it got replaced. Once modems reached
  // 4800/9600 baud in the late 70's and early 80's the effect was
  // mostly lost - but if you find a really old fortune file you might
  // encounter a few of these.

  else {
    for($x = 0; $x < strlen($s); $x ++) {
      if($dst[$x] == '1') {
        if($strike)
          $out .= $s[$x];
        else
          $out .= '<strike>'.$s[$x];
        $strike = true;
      }
      else {
        if($strike)
          $out .= '</strike>';
        $strike = false;
        $out .= $s[$x];
      }
    }
  }

  // Many of the underline sequences are also wrapped in asterisks,
  // which was yet another way of marking ASCII as 'bold'.
  // So if it's an underline sequence, and there are asterisks
  // on both ends, strip the asterisks as we've already emboldened the text.

  $out = preg_replace('/\*(<strong>[^<]*<\/strong>)\*/',"\\1",$out);

  // Finally, remove the backspace characters which we don't need anymore.

  return str_replace("\010","",$out);
}
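
A quick usage sketch - this assumes fortune-mod is installed at its usual Debian path:

<?php

// Pull a random fortune and render it for the web.
$fortune = shell_exec('/usr/games/fortune -a');
echo '<div class="fortune">' . fortune_to_html($fortune) . '</div>';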
NameThingy

Mike Macgirvin
  from Diary and Other Rantings
Since I stopped actively updating this site several months back, it appears the bulk of the incoming traffic has been visiting my various random name generators.

I decided to clean these up a bit and spin them off onto a dedicated site. You can visit it at NameThingy.com. It was quite a fun exercise, as I've managed to reduce the random name generation and all the potential options to a single HTML page that is dynamically refreshed using Ajax. Check it out.
Stopping XSS forever - and better web authentication

Mike Macgirvin
  last edited: Thu, 21 Aug 2008 11:46:11 +1000  from Diary and Other Rantings
I've been working on all kinds of different ways to completely stop XSS (and potentially the related CSRF) and provide a much better authentication framework for web applications.

The problem:

The HTTP protocol is completely stateless. On the server side, each and every page access starts with zero knowledge of who is at the other end of the connection. In order to provide what were once considered 'sessions' in the pre-web computing days, the client stores a 'cookie' sent from the server, which is then sent back with every page request within that domain. The server can look at this cookie and use it to identify a particular person who has presumably passed authentication, so they don't have to re-authenticate.

But cookie storage has some serious flaws. If somebody who isn't the specified logged-in person can read the cookie, they can become that person. IP address checks can help to provide extra verification but in a world containing proxies this information can be spoofed.

Cross Site Scripting is a method whereby a malicious person who is allowed to post HTML on a page can inject javascript code which is then executed on a registered user's session and the cookie is leaked or sent elsewhere - allowing the malicious person to impersonate the registered person.

A possible solution:

I'm still working out the details so please let me know if this is flawed, but I think I've got a way to prevent XSS and still allow registered members to post full HTML, CSS, whatever - including javascript.  It relies on the fact that cookies are stored and used per-domain. Different domains are unable to see cookies from another domain.

We'll also assume SSL connections since anything else can leak everything (cookies, passwords, everything) to a port sniffer.

We'll start with a normal website at https://example.com - which we'll assume is a multi-user website where XSS could be a problem. If somebody on this site can inject javascript onto a page, they can steal the cookies of a logged-in user. There are hundreds of ways to do this that are beyond the scope of this discussion.

But we'll also create another domain - say https://private.example.com - which processes logins and does not serve content. This will have a different cookie than example.com. Perhaps we'll let it serve the website banner image just so that it is accessed on every page of the site. Since there is no active content allowed, it is immune to XSS exploits.

It is allowed to process login requests and send cookies, and one image. That's it.  

What this means from an attacker's viewpoint is that he/she now needs to steal two cookies to impersonate somebody else.  It may be easy to steal the cookie on the main site, but there's no way to get at the cookies for the private.example.com site since it isn't allowed to host active content.

The main site uses out-of-band methods (not involving HTTP) to communicate between the two domains and establish that the session is valid and authenticated. They're both hosted in the same place after all. It can check a file or database to see that the logged in session was authenticated by the other site. Both keys (cookies) have to match or the authentication is denied.
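
As a rough sketch of the content server's half of this check - the table, column, and cookie names are invented for illustration, and PDO stands in for whatever database layer is handy:

<?php

// Runs on example.com. The browser only ever sends the
// private.example.com cookie to the auth domain, so we never see it
// here; the auth server marks the session record as confirmed once
// that second cookie checks out.
function session_is_valid($db) {
    if(! isset($_COOKIE['site_session']))
        return false;

    $stmt = $db->prepare('SELECT auth_confirmed FROM sessions WHERE content_key = ?');
    $stmt->execute(array($_COOKIE['site_session']));
    $row = $stmt->fetch();

    // A stolen example.com cookie alone fails this test - the record is
    // only flagged once the auth domain has seen its own cookie.
    return ($row && $row['auth_confirmed']);
}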

Anybody see a flaw in this? Granted I still haven't thought it through completely and haven't yet tested it, but I don't see any glaring problems on the surface. Some variation of this concept will probably work and both prevent XSS as well as provide a better way of doing web authentication that is much more resistant to intrusion.    

Again assuming https to prevent snooping, the only way I can see to steal both cookies and impersonate a logged-in user is to have access to the target person's desktop and browser.

It also allows a site to completely separate the authentication mechanism from the content server allowing the authentication code to be small, simple, self-contained, and verifiable.
Mike Macgirvin
  
An obvious flaw which quickly became apparent was using an image/entity on the main page to link to the auth server - as the page would then need to be rendered before authentication can succeed. This is backward because you usually want to know the authentication state before you provide content.

So the best way to work this is to use a redirect out front to ensure both domains are accessed before the page is rendered. This in fact matches what many larger sites do for authentication, a separate auth server which passes through to the request server. Using a second session key in another domain to neutralize any effect of stealing the primary session key I believe is relatively rare in practice, although it may be implemented on these larger sites. The basic concept can be applied to small hosted sites very easily without requiring multiple machines and a data cloud architecture. This is what makes it attractive - it can be easily added into any existing hosted community software.  

Also, there are many other reasons why you would want to limit the ability to use javascript on community pages - but these should be to reduce potential annoyance and disruptive behaviour rather than to protect the integrity of your authentication. There are just way too many ways to get javascript into a page to try and protect them all from sessionid theft. But if sessionid theft has no gain, such script restrictions are a matter of choice rather than an absolute necessity.
Reflection CMS update

Mike Macgirvin
  last edited: Wed, 23 Jul 2008 13:44:25 +1000  from Diary and Other Rantings
At this time, I've managed to pull together a working kernel and prototype of the Reflection CMS. It is not yet ready for public release, but I've been pleased with the progress. Here's a bit of a white paper I've been putting together to explain the rationale and provide a high level overview.

                 Reflection Content Management System

Purpose:

Web content management systems and frameworks that exist today are clunky, overly-complicated, and often insecure. While many of the open source projects are developer friendly and openly encourage derivation, there is often a group that jealously protects the 'core' from feature creep. This makes it difficult to realise many web designs, as it is often the core that is insufficient for the task at hand. Being developer friendly does not mean that an application provides a workable development environment. Add-on modules often cannot be trusted, as they frequently reflect the work of novice software designers who have had to overcome the limitations of the core product.

In an effort to appeal to the most people, data abstraction is taken to new levels of absurdity and inefficiency. This is not limited to content management systems, as it is a software problem in general.

What I have attempted in taking on this gargantuan task of creating yet another content management system is to solve many of these problems, and to create a system that is extensible and encourages development at all levels - including the so-called core. To that end, most every function can be overridden without introducing serious versioning and update issues/incompatibilities. Nothing is sacred.

The more that I mulled this task, the more it became apparent that what I was looking for in a content management framework is no less than an operating system for web pages. This involves user management, security, and the ability to execute arbitrary 'applications'. It also involves a notion of a file system hierarchy which can be represented entirely by URLs.

Many other content systems abstract data types, and this is a good idea; though it often makes for messy designs. At the heart is a generic nucleus of content - who owns it, what the permissions are, various timestamps, etc. Data fields that are unique to a particular content item are stored elsewhere and joined on demand.

Implementation of this level of abstraction is a challenging problem. Due to design limitations of most database systems, it involves some tradeoffs - primarily in the ability to perform searches on extended data of multiple extensible data types. For a single type, it can be done with one query. However when multiple data types are involved, a second pass needs to be run to return the extended data for each item. For this reason, it is prudent to store as much 'searchable' information as practical within the nucleus.
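
To illustrate the tradeoff (the table layout here is hypothetical, not the actual Reflection schema), a single extensible type can be fetched in one query by joining its extension table onto the nucleus, while a mixed listing needs a second pass per type:

<?php

// One query suffices when a listing holds a single type:
$sql = "SELECT item.*, photo.width, photo.height
          FROM item LEFT JOIN photo ON photo.item_id = item.id
         WHERE item.type = 'photo' AND item.parent = ?";

// For a mixed listing, fetch the nucleus rows first...
$sql = "SELECT * FROM item WHERE item.parent = ? ORDER BY created DESC";
// ...then run one additional query per distinct type to pull in the
// extended data for those items.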

There is also general agreement over using themes and templates at the presentation end, so that different renderings are possible without hacking code. Here I'd like to take it one step further and modularise the entire presentation layer. As well as a 'theme', one can choose a particular layout or representation of objects, such as a choice between list view and iconic view, and/or XML feed elements. By making this extensible and arbitrary, entirely new renderings can be accomplished without touching the object code or business logic.

Permissions System

Permissions are the core of any multi-user system. This needs to be well defined, and implemented close to the kernel or core and far away from the presentation layer. In a development environment, the developers should mostly be free of managing permissions. I've implemented a permissions concept similar to Unix/Linux - although modified for better adaptability to web applications. It uses the familiar rwx concept, but I've split the 'x' permission into 'x' and 'u'. 'x' is simply a list permission. 'u' is an ability to use or extend an item. For an article, the 'u' bit allows comment rights. For a vocabulary, it allows the ability to tag something using that vocabulary. I've also introduced higher level permissions. There are six levels:  
  • rwxu admin  
  • rwxu moderators  
  • rwxu owner  
  • rwxu group  
  • rwxu members  
  • rwxu other (aka visitors)
Members is for logged in members. Group is a group association to a unique group identifier, moderators are site moderator accounts. Admin privileges are included in the permissions flags for completeness; though it isn't obvious what value this serves and in most cases these will be masked to prevent locking out the system admin from managing the system.
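
One way to picture this - an illustrative encoding, not necessarily how Reflection stores it - is to pack four rwxu bits per level into a single integer, most privileged level in the high bits:

<?php

define('PERM_R', 8);   // read
define('PERM_W', 4);   // write
define('PERM_X', 2);   // list
define('PERM_U', 1);   // use/extend (comment on an article, tag with a vocabulary)

// Levels from most to least privileged, four bits each.
$levels = array('admin','moderators','owner','group','members','other');

function perm_allowed($perms, $level, $flag) {
    global $levels;
    $shift = (count($levels) - 1 - array_search($level, $levels)) * 4;
    return ((($perms >> $shift) & $flag) != 0);
}

// Example: staff and owner get rwxu, group/members may read, list and
// comment (r-xu), visitors may read and list only (r-x-).
$grants = array('admin' => 15, 'moderators' => 15, 'owner' => 15,
                'group' => 11, 'members' => 11, 'other' => 10);
$perms = 0;
foreach($levels as $i => $level)
    $perms |= $grants[$level] << ((count($levels) - 1 - $i) * 4);

var_dump(perm_allowed($perms, 'other', PERM_R));   // true - visitors may read
var_dump(perm_allowed($perms, 'other', PERM_U));   // false - visitors may not comment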

The Directory Object

The directory or folder object is the primary means of implementing complex data structures and representations. It is an object like any other object on the system, but when navigated to, presents a listing of those items which are attached to it as siblings. It implements a general purpose search and list/enumerate operation. It also contains a path/filename to distinguish it in the URL hierarchy and provide file system semantics to database objects. However, the important items that it contains are a umask (permissions mask) which is applied to any child items, and it can also be configured only to hold items of certain types. This is what distinguishes a photo album from a weblog or forum list. One holds photos and the others hold articles. By allowing a directory to hold any type of content, it can be made to resemble a traditional filesystem; and indeed a multi-user website can be implemented which provides member sub-sites that they manage completely.  

The directory also has complete control over the presentation layer, via themes, renderings, and menu selection. This implies that directory is not simply a 'list', but the complete embodiment of the controls, settings, and the look of that list. These can be inherited and passed on to sub-directories. A limitless range of site policy and structure can be implemented by controlling the settings of the appropriate directory entries.

Applications

Applications, or executable code, live outside the virtual directory tree. In order to address the need for an extensible application space and recognising the confines of URL management, applications are denoted by the first URL path parameter. For instance http://example.com/edit invokes the object edit/post application. Additional URL path components are passed to the application as arguments in a manner similar to the Unix/Linux 'argv/argc' mechanism. Application URLs take precedence over path URLs, such that a directory or document called 'edit' at the root level will be unavailable at that URL if the 'edit' application exists. An external path alias mechanism exists to redirect to another URL in the case of conflict with the application space.
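
A bare-bones sketch of that dispatch rule - the rewrite variable, the apps/ directory layout, and the fall-through function are assumptions for illustration:

<?php

// mod_rewrite is assumed to hand us the request path in $_GET['q'].
$path = isset($_GET['q']) ? trim($_GET['q'], '/') : '';
$argv = strlen($path) ? explode('/', $path) : array();
$argc = count($argv);

// The first path component names the application: /edit/123 -> 'edit'.
$app = $argc ? $argv[0] : 'home';

if(preg_match('/^\w+$/', $app) && file_exists("apps/$app.php")) {
    // Application URLs take precedence over the virtual directory tree.
    include("apps/$app.php");
    $func = $app . '_content';
    if(function_exists($func))
        echo $func($argv, $argc);
}
else {
    display_path($path);    // hypothetical: resolve via the directory objects
}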

An application framework exists that supplies plugin methods for handling initialisation, form posts, main page content, and menu callbacks. Arguments are parsed and passed in as argv/argc elements, although meta-arguments dealing with pagination (such as 'page=4') are dealt with by the kernel or core to minimise extra argument parsing at the application level. To provide pagination, an application only needs to obtain a count of the total number of items and invoke a 'paginate' function.

Licensing

Reflection will be available under the generic Berkeley license. Free for all uses but with no implied warranty.

Platform

Recent/modern flavours of LAMP. Apache/mod_rewrite is required. PHP5.2+ is required for timezone support. Language: English.
The Reflection CMS Project

Mike Macgirvin
  from Diary and Other Rantings
Just wanted to update y'all on current happenings since I terminated my daily rants a while back...

I've been working under the covers on a new web project; which takes all that I've learned building this here website and social spaces in general and pushes it into a new realm.

The thing about CMS software is that they all suck. Some suck worse than others, but they're all really, really bad. Most of them try to be all things to all people - and as a consequence fail miserably at being anything to anybody. I guess I've been guilty of that myself.

I'll be putting up a serious contender over the next several months to show that the situation doesn't need to be so abysmally abysmal. Oh yeah, and it will be open source, extensible, yada, yada. While basically working securely and outperforming any of the competition - without resorting to caching to make up for the sucky performance; like everybody else does.

In order to accomplish this, I'm not even going to try to create something that is all things to all people. Apache2.x+, php5.x+, mysql5.x+ and English only. I've re-written my existing website engine to be leaner and meaner and am currently adding some core functionality back in, whilst tossing 90% of the code that nobody (but me) ever used.

I've boosted performance by a factor of 4 at least, and will be reducing the number of database queries per page to under 10 on average (from a current average of 20-35); still way under the market leaders which hammer the database several hundred times for each and every page - and hit the file system an equal number of times. That's piss poor engineering and an embarrassment to any serious software developer.  

Security on each object has been radically simplified - however is extremely robust and verifiable.

Stay tuned...
Reference: Updating timezone files LAMP

Mike Macgirvin
  last edited: Fri, 28 Mar 2008 22:02:19 +1100  from Diary and Other Rantings
Updating all the timezone stuff one needs on a LAMP environment: (necessary in Australia because they changed the daylight savings start date once again). I haven't yet been able to convince my hosting provider to go through all this hassle; and the tables are outdated - so Aussie visitors may see an incorrect time on some of my websites for the next week.  

Test:

# zdump -c 2009 -v Australia/Sydney | grep 2008
Australia/Sydney  Sat Apr  5 15:59:59 2008 UTC = Sun Apr  6 02:59:59 2008 EST isdst=1 gmtoff=39600
Australia/Sydney  Sat Apr  5 16:00:00 2008 UTC = Sun Apr  6 02:00:00 2008 EST isdst=0 gmtoff=36000
Australia/Sydney  Sat Oct  4 15:59:59 2008 UTC = Sun Oct  5 01:59:59 2008 EST isdst=0 gmtoff=36000
Australia/Sydney  Sat Oct  4 16:00:00 2008 UTC = Sun Oct  5 03:00:00 2008 EST isdst=1 gmtoff=39600

(If the first two lines contain 'Mar' instead of 'Apr' you've got old tables). e.g. this is what an unpatched system would report:

# zdump -c 2009 -v Australia/Sydney | grep 2008
Australia/Sydney  Sat Mar 29 15:59:59 2008 UTC = Sun Mar 30 02:59:59 2008 EST isdst=1 gmtoff=39600
Australia/Sydney  Sat Mar 29 16:00:00 2008 UTC = Sun Mar 30 02:00:00 2008 EST isdst=0 gmtoff=36000
Australia/Sydney  Sat Oct 25 15:59:59 2008 UTC = Sun Oct 26 01:59:59 2008 EST isdst=0 gmtoff=36000
Australia/Sydney  Sat Oct 25 16:00:00 2008 UTC = Sun Oct 26 03:00:00 2008 EST isdst=1 gmtoff=39600

Debian:

# apt-get update

# apt-get install tzdata

PHP5.x

# apt-get install php5-dev

[fetch and save] http://pecl.php.net/get/timezonedb

# tar zxvf timezonedb-xxxxxxx.tgz

# cd timezonedb-xxxxxxx

# phpize

# ./configure

# make

# make install

# echo "extension=timezonedb.so"  > /etc/php5/conf.d/timezonedb.ini

# /etc/init.d/apache2 restart
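
A quick sanity check that PHP picked up the new rules - under the updated tables, midday on 1 April 2008 in Sydney should still be daylight time (+1100); old tables will report +1000:

# php -r 'date_default_timezone_set("Australia/Sydney"); echo date("r", mktime(12,0,0,4,1,2008));'
Tue, 01 Apr 2008 12:00:00 +1100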

MySQL:

# mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p

(ignore all the errors from Riyadh{NN}, iso3166.tab, and zone.tab)
Q=`(parlez vous|hablas|sprechen ze) $geek?`

Mike Macgirvin
  from Diary and Other Rantings
It just occurred to me that in the last 4-5 days I've written 'code' in Visual Basic, SmallTalk, lisp, C, bash, awk, PHP, perl, and python. Thousands of lines of code total. And there are probably a few dialects I forgot here. Not to mention 30-40 different flavours of config files, sed, Oracle-SQL, mySQL, and LDAP and some other stuff that doesn't quite qualify as 'code' but still involves intimately knowing a strange computer dialect. Oh yeah, HTML and JavaScript (of course).
Installing Oracle (oci8) into pre-built Debian php5

Mike Macgirvin
  last edited: Wed, 21 May 2008 11:02:14 +1000  from Diary and Other Rantings
Some notes to save somebody some grief:

Installing the Oracle libraries and access module into an existing PHP5 installation on Debian...

First grab the Linux instantclient from oracle.com - you'll also need the client SDK kit. Here I'm using instantclient 11.1

Create a directory for these, such as /home/oracle, and unpack both of them into that directory.

Go into the oracle directory (and into the instantclient_11_1 directory) and create a symlink:

$ ln -s libclntsh.so.11.1 libclntsh.so

Grab the oci8 PECL package and unpack it somewhere (~/oci).

Make sure you have the following packages (in addition to php5, php5-cli, apache2, etc).

  php5-dev

  libaio1

  php-pear
  

Go to the oci8 directory (~/oci/).

You need to run 'pecl build' once to create the configure script.

$ pecl build

But the problem is that pecl build will claim the files are installed and they are not. I wasted half a day on this one. Now go into the oci8-1.3.0 directory and build again by hand:

$ cd oci8-1.3.0

$ ./configure --with-oci8=instantclient,/home/oracle/instantclient_11_1

$ make

Fix any errors/warnings before continuing

Don't run 'make install' - it won't work.

$ cp ./modules/oci8.so /usr/lib/php5/20060613+lfs

Replace 20060613+lfs with whatever module directory has been set up for you in /usr/lib/php5

Create /etc/php5/conf.d/oci8.ini:

----

extension=oci8.so

----

Now run php from the command line (php -v will do) and see if everything loaded without complaint. Fix it if it didn't.

You may need some env variables setup in your /etc/init.d/apache2 file to make everything work and actually execute queries, but a phpinfo() at this point should show your oci8 extension. See the php.net Oracle pages if you need help with the env variables.

Restart the web server

$ /etc/init.d/apache2 restart
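
Once Apache is back up, a minimal smoke test - the username, password, and connect string below are placeholders for your own database:

<?php

// Connect with the instantclient Easy Connect syntax and run a trivial query.
$conn = oci_connect('scott', 'tiger', '//dbhost.example.com:1521/ORCL');
if(! $conn) {
    $e = oci_error();
    die('connect failed: ' . $e['message']);
}
$stid = oci_parse($conn, 'SELECT sysdate FROM dual');
oci_execute($stid);
$row = oci_fetch_array($stid, OCI_ASSOC);
echo $row['SYSDATE'] . "\n";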
Gail
 
Many thanks! You helped me install oci8 in a few minutes... :-)

Greetings from Tirana, Albania.
SPHINX
 
Thanks man. It still took a while (had some include issues), but it would have been a marathon without this post. Lifesaver.

SPHINX www.battleforces.com
More McSoft Musings

Mike Macgirvin
  from Diary and Other Rantings
As I ponder the Microsoft/Yahoo! issue a bit more, what strikes me is Microsoft's arrogance. No, that's nothing new. But it's the corporate mindset that says 'We have to dominate online search'. Why? Why does Microsoft feel they have to dominate every software category that has ever existed, online and off? Why is it that nobody else can make a profit in the tech industry, without Microsoft threatening to take the ball away from them?

Google may make bold statements, the same way Netscape once made bold statements, but the fact of the matter is that Google is doing nothing to threaten Microsoft's business. McSoft's worst enemy is themselves and their arrogance. Get rid of that, and a tech industry could flourish - one where everybody can make new products and create markets and grow the industry as a whole. As it is today, you live in constant fear of being successful, because if you actually make money, you'll be in McSoft's laser sights and they'll stop at nothing to wipe you off the face of the earth. It's difficult for anybody to thrive under that kind of pressure.
Stunned would be the word I'm looking for

Mike Macgirvin
  last edited: Sat, 02 Feb 2008 18:55:23 +1100  from Diary and Other Rantings
Microsoft is making a bid to buy Yahoo!. Surprise, shock... What words could describe my emotion on hearing this?

I can see the motivation and the reasoning behind it. They want to put a stop to Google ("I'd like to buy a noun, please."). Still I believe this is the wrong way to do it. The only way for them to stop Google is to buy Google. Don't laugh. They are ideologically more closely aligned than you might realize. I don't believe that they've thought through the consequences of this decision - or maybe they have but just don't care. It is a culture clash of epic proportions that will result pretty much in the destruction of Yahoo! and all they've ever done - and do nothing to harm Google. I suspect many of the employees will quit outright, and there's not much place for them to go in Silicon Valley except to side with the enemy (Google), the largest employer in the valley that's still adding significant headcount.

But I also believe that this move can't be stopped, so it doesn't really matter what I think about it. I would however like to share with you the exact image that popped into my brain on hearing this.

Image/photo
The Seven Year Itch

Mike Macgirvin
  last edited: Fri, 01 Feb 2008 16:07:39 +1100  from Diary and Other Rantings
Sometime later this month "Diary and Other Rantings" (i.e. my weblog) will turn 7 years old, and I'll start my eighth year doing this activity called 'blogging'. Perhaps I'll mark the day, perhaps not. We'll see. Maybe I'll just stop doing it altogether. Maybe not. We'll see.

This all started in early 2001. I was at AOL making lots and lots of money from my Netscape stock options. I had a Netscape employee home page that was visited hundreds of thousands of times a day, but this was slowing. AOL no longer linked to it. I had started running a new server in the spare bedroom in 1998-1999, and later moved it to the garage. It took almost two years to get a working DSL link so that I could actually run a public website off of it. High-speed internet to the home was still an experimental technology. DSL wasn't yet ready for prime time and ISDN had other issues which plagued it. A leased line required somebody to sell you an end-point on the public net and nobody was doing this, besides being limited to 56k which was by then the speed of most modems. Web 'hosting' in those days was mostly for big business and cost big money. I could certainly afford it, but decided to spend my cash on more important things (like buying a music store a year later).

Running a Linux box with an internet link isn't very expensive in the overall scheme of things. So once the DSL was finally working I made a new home page and started improving it.

I think it was Cindy at 'Off the Beaten Path' (now at 'dustingmybrain.com') who first introduced me to the concept of a rambling page. Instead of replacing your 'Current Interests' web page every week, you just keep adding to it. Drop in a date. Write what's happening. I started doing this. I was writing HTML in emacs. I called it an online diary. I didn't have titles, categories, RSS feeds, etc. These would come much later. I wasn't writing 'articles', I was just rambling. Why do you need a title for it? That makes it look so structured. The only important thing is the date, so somebody knows when it was that you thought this way. This was important. After several years of living on Netscape time, I firmly believed that one didn't think the same way for very long, and technology was always changing - so information had to have a date.

The other thing that I did was to take a cue from some of the large online news sites, which were the best model available for presenting information that had timestamps. I started writing in reverse chronological order (recent first). This was born of necessity, since nobody wanted to load a large page and scroll to the end to find recent stuff; which was how we did things previously (logfile format).  

In fact I maintained this format for a few years until it became unmanageable. Then I looked for ways of automating my monthly (or whenever) process of moving the current entries to an archive page and starting fresh. So after looking to see what programs were available and trying a few of them, I instead wrote a program to do it myself. Over time that evolved from a simple diary 'archiver' to the thing that you see today - a mega social portal that does everything but make coffee. (I miss this incidentally, I had my computer turn on the coffee pot from an online request in the early 1980s using my first homebrew social portal).  

I still wonder whether anybody reads these pages. Does anybody care? I don't subscribe to the current notions of SEO and affiliate marketing and trackware and all the other ways to improve one's blog ranking. Most notable these days are the pages and pages of 'widgets' attached to every blog, selling everything from online communities to soap. Why bother? Your only visitors will be other bloggers that are all trying to get you to visit their own blog. They aren't really reading what you have to say, they're too busy 'selling' their own wares. Still even after the RSS fiasco a few months back, I manage to pull in a few thousand humans a week. They come and read a page and leave again. This is the state of the modern internet.

It may be of some interest that I've managed to serve up a few hundred million pages since this all started - mostly to crawlers and robots; however last year activity peaked with about 100,000 daily hits (30,000 human visitors) and we've had six or seven days with over a million hits. I've written close to 1500 articles and there have been about 6600 total articles at one time or another from various feeds - before I was forced to nuke them for legal reasons. Only about 250 comments total, which I attribute to my decision a couple years back to do away with the daily spam cleanup and only allow website members to post comments. [I've since revised this policy.]

The 'community portal' (which I started writing a couple of years ago) doesn't have much community and I don't know if that will ever change. Community folks like big parties and unless you have one, you're late to the party. Bloggers only like communities where they can sell their blog.  I don't know how to convince them that a long-running website with several thousand non-blogging human visitors a week is actually a good place to drop a link. Yeah, I could put you on my blogroll, but I read thousands of blogs. It would quickly grow to be unmanageable and you'd be lost in the noise.

But you can add your own link and profile page and whatever - you don't need me to do it. Hint, hint.  

Anyway - we'll see if this lasts or whether I just decide that there are better things to do. Write into space everyday and maybe a couple of people will read it. Maybe not.

That's what it's all about.

Don't ask yourself if it is actually relevant or important or whether anybody cares. You might not like to hear the true answer. It's one blog amongst hundreds of millions, all trying to be visited. All thinking they should be relevant to somebody. It's like asking if one star in the entire universe is relevant. Maybe one is relevant to somebody. But the big question looms, is it yours? Unless it's the sun and brings life to this planet, it's likely just another star in the vastness of space.  

In fact, nobody really cares whether you blog or not when all is said and done. Well maybe one or two folks. In my case those are the same one or two folks that cared back in 2001. Everybody else is just passing through on their way to somewhere else.

Still every day (sometimes two) I go to my website and ramble about what's on my mind. I tweak the software to make it better. Even knowing that it is all an exercise in futility. Strange.
Documentation? What documentation?

Mike Macgirvin
  last edited: Thu, 31 Jan 2008 16:09:19 +1100  from Diary and Other Rantings
The school's weather station webpage seems to have stuffed it sometime around Thanksgiving. Today somebody finally noticed and alerted the support staff.

My boss asks "where's the documentation?".

Right.  There is none. This system has been in place for ten years or more and fails occasionally.  When that happens we go in and fix it.

Start with the webpage that actually displays the data. It's pulling the data from a file that is supposed to be automagically updated. Except we don't believe in magic. The file didn't get updated. Now to find out why.

Since this is a scheduled event, cron has to be involved. Let's have a look at the crontab file. Hmmm. It's pulling the changes from another file that is supposed to be automagically updated. That one hasn't been changing either. What changes that file? It isn't cron. Or is it? That file is symlinked to a file on another computer. Let's go have a look at the other computer. Ah, I see. There's a crontab running there which generates the contents of the update file from a data file via a collection of python scripts. Let's have a look at those.

As I suspected, they are pulling data from yet another file that is automagically updated. Right. It hasn't changed since November either. What changes this file? Time to scan the logs. Nothing.

OK, it's time to start from the other direction. The weather station is connected to a PC in the corner of a lab. Let's have a look there. It's hung and totally unresponsive. OK, maybe that's the problem. I reboot it. Then go back to the webpage. Nope. Nothing has changed.

OK, somehow the data has to get from the weather station computer to the other computer where the python scripts can munge it. Let's have a look at the logs.

The logs say everything is fine, but it isn't fine. Nothing. It's not happening. Well this is interesting. I check connectivity and network connections. They're OK. We've got an IP address and pings work just fine. A closer look reveals that there's a Windows task scheduler which occasionally FTP's the weather files across the net to the second Unix box. The logs don't show any errors. Hmmm. The files aren't being FTP'd though. They aren't making it. Then I see a notice at the bottom of the screen. Updates were applied some time since the computer was last powered on - six months ago. OK, what updates? Windows firewall. Right. So I have a look, and sure enough the computer's FTP connection has been firewalled because of an automatic update. The FTP's are silently failing - and indicating success. This is pure evil. After several minutes I'm able to get in with an administrator account that can fix the firewall and do so.

Then have another look. Still nothing happening. What could be the problem now? Ah, on reboot FTP is automatically disabled on the weather station software - again without any warnings. The logs again say everything is working and files are being transferred. More evil. What's the use of having log files if they lie to you? I turn on the FTP. Bingo - now the files get through. Now back to the second computer to manually process the files and dump them into the directory where the third computer can pick them up. Then back to the third computer to manually update the processed files.

Yay! It works.

Back to the documentation. How would somebody document stuff like this? There's just too much that can go wrong. I could use up a tree or two writing it all down. This is why we've got systems folks.
Sun acquires MySQL

Mike Macgirvin
  last edited: Fri, 18 Jan 2008 09:30:46 +1100  from Diary and Other Rantings
Unless you've been watching closely, this announcement was easy to miss. Sun Microsystems is acquiring MySQL. This has ramifications both good and bad.

This will likely affect a huge number of people who are currently using open source web applications, a majority of which store their data in MySQL databases. Their future viability is now questionable. It all depends on the license and revenue models Sun chooses to adopt.

I would also try to steer clear of the pending 6.0 release as it is likely to involve significant re-structuring of the code to suit Sun's business requirements. It may be a year or three before it stabilises again. Sun is legendary for introducing layers of bureaucracy into development projects.  

While Sun may make public announcements of their intent to continue to provide the product for free [and it should be noted that there was no such announcement in the press release], it is difficult to imagine the corporate bean counters not making a recommendation to derive as much revenue stream as possible from the acquisition.

You can read the announcement here.

Also of potential interest is this (dated) history of MySQL
Yet another somewhat useful thing

Mike Macgirvin
  from Diary and Other Rantings
Just turned on my 'Related Articles' plugin. I've been having some fun with it. Click on 'Random Article' or view any individual article (not a page full) to seed the process. Then down on the left hand side of the page (way down) is the list of related articles. I noticed lots of interesting brain warps and rants over the years that actually do follow some strange twist or turn of logic.
xml playground revisited

Mike Macgirvin
  last edited: Thu, 10 Jan 2008 16:01:49 +1100  from Diary and Other Rantings
Looks like I got sidetracked from my original mission to use this website as an xml playground to explore and develop new communications technologies, and instead wrote a social portal that hardly anybody cares about. That was a few years ago now. Well, I haven't given up. It just took a while to reach the state where I can get beyond the user-interface plumbing and get back to the machine interfaces which is where the fun is.  

Feeds have improved a lot. I'm using Atom paging now. Still holding off on atom-thread for comments since I can do it so much easier embedding into the articles - though I note that the latest Firefox parses atom-thread just fine. No use forcing it on the public until a few more feedreaders have jumped on board. The code has been working for a year or two, but I'm just waiting for the rest of the world to catch up before I turn it back on.

I've been playing with a weblog export tool that's basically an Atom feed, but replaces images and attachments with inline data: URL's. Have had a few glitches - including a PHP bug in the regular expression library that I need to report. But this in theory can let you take an entire weblog and move it elsewhere as one gigantic XML file. Everything. Images, attachments, comments, categories, the whole nine yards.
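
The inlining step itself is tiny. A sketch - the function name is mine, not part of any spec:

<?php

// Turn a stored attachment into a data: URL so the exported feed is
// fully self-contained.
function inline_attachment($path, $mime) {
    return 'data:' . $mime . ';base64,'
        . base64_encode(file_get_contents($path));
}

// e.g. rewriting an image reference inside an entry's content:
$html = '<img src="' . inline_attachment('photos/sunset.jpg','image/jpeg') . '" />';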

I've also got Atom Publishing Protocol support in a very primitive state (but not yet ready for prime time). This is a big effort and I don't expect to be finished for a few months. I've got a suitable framework, but this site works a bit differently than the model used by the atom publishing spec. It will take a while to resolve all the differences so it plays nicely. This would for instance allow you to import your entire weblog from elsewhere in the world - especially one that used data: URLs to bring in images and attachments. Otherwise if I use the default model, I've got to package everything into a workspace for export, and this takes more than one file to represent all the structures completely. But that's the big picture - on a smaller scale, you should soon be able to publish weblog posts from your cell phone, or sync new articles with another weblog you may have. I'm also not bothering with the xml-rpc remote mechanisms for publishing. They're primitive, the APIs too fragmented, and pretty much dead.

Oh yeah, and we've got trackbacks now - for any weblog that allows non-member comments. This is a flavor of xml-rpc. It isn't a big deal, but a few folks have requested it. You can find the trackback URL in the 'more actions' menu of articles - that is for any member weblogs that allow them. Mine does.  

Oh, and photo albums can now be exported as zip files. That has nothing to do with XML...
John
 
Cool. Don't data: URLs involve a 33% expansion in size and come with some potential size limits (in libraries if nothing else)?

A general format for Atom that allowed cross-references in URLs (like cid:) would be useful both for images and other attachments and for related feeds like comments, trackback snippets, etc.
Mike Macgirvin
  
Yeah, there are issues. I'm just trying to figure out how to get there from here. Right now data: URLs are the only way I can come up with to encapsulate everything. If a few people adopted it, it might be viable. At least everything to export an entire blog in a single file would be standards compliant. I'd be glad to see something better...

Hey congrats - I hear you're at Google now....
Dynamic Font Resizing

Mike Macgirvin
  from Diary and Other Rantings
I've been working with some dynamic text/font resizing tools recently. Some visitors over the last few days may have seen some of these efforts in progress, but I've just turned them off again until I get it all sorted.

In order to change the text size on a page after it's loaded, and without re-loading, one has to walk the DOM tree and re-calculate every font size on the page. There are actually a few open-source packages which will do the dirty work, but there are still a lot of issues. Almost all of these center around MSIE (why am I not surprised?). Additionally, it takes some work to get them to play nicely with sIFR - the modern day equivalent of webfonts, which were abandoned around the turn of the millennium as being too infected with DRM controls to ever mass deploy.

Anyway, if you're interested in doing dynamic font scaling, I'd like to point you to JS magnifier, which is a pretty cool little app for walking the DOM and changing all the sizes. I had a bit of trouble with it on web forms, because it listens for key events, and if you type -, +, <, or > anywhere it changes the page size - even if you were typing these into a text field. Rather than modify the event listener to determine if it was a form and check the current focus, I just commented out the event listener and used JS links to activate the functions.

But then on IE, everything was screwed up. First of all, even with a setting of '0' (no change), the page always goes through a resize cycle after you load it, and a lot of inherited sizes got messed up and set to something obscenely small. You can fix these by declaring them '!important', but you'll likely end up with over half your size tags set to !important in order to render your original page anywhere close to what it was designed to look like.  

The real trouble began however doing the DOM walk. IE does some real funny stuff to their DOM (this shouldn't be a surprise to anybody either). If you've got an embedded video, and there are any troubles loading the video, the entire DOM tree from that point forward seems to get rebuilt (resulting in the font size reverting for part of the page). The same thing happens if you've got AJAX-updated content on the page - every time there's an update, everything from that div to the end of the page reverts.

Looks like I'm back to square one. Think I'll go back to my original plan which I never quite finished implementing, but basically use proportional (em) fonts everywhere [this I do already], and then dynamically change the main body font declaration on the fly as the CSS page goes down the wire. This is hardly dynamic - it results in the need for a page reload to render everything correctly. At least it's portable and doesn't require javascript or depend on anybody's screwy DOM implementation. Oh well, live and learn I guess.
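
That plan amounts to serving the stylesheet through PHP. A sketch - the cookie name and the size bounds are arbitrary choices:

<?php

// style.css.php - rewrite the body font declaration as the CSS goes
// down the wire. Everything else on the page is sized in em units,
// so the whole layout scales from this one rule.
header('Content-Type: text/css');

$size = isset($_COOKIE['textsize']) ? intval($_COOKIE['textsize']) : 100;
if($size < 50 || $size > 200)
    $size = 100;              // clamp nonsense preferences

echo "body { font-size: {$size}%; }\n";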
Mike Macgirvin
  
I've now updated the site with text size preferences. Go to the themes
page if you want to change the default text size - which is now a smidgin smaller than it used to be. If you're logged in when you do this, your preferred size will be restored every time you login.
AOL finally kills the 'Netscape' browser

Mike Macgirvin
  last edited: Sat, 29 Dec 2007 22:42:27 +1100  from Diary and Other Rantings
AOL finally pulled the plug on its branded version of Firefox which it amusingly calls 'Netscape'. Those of us who were a part of the real Netscape can only laugh. I haven't used the AOL browser in years and never really cared for it. But it's sort of the end of an era and I feel a bit saddened. The web browser with the 'N' is no more.

Long live Firefox.

Link

Oh, and I'm sorry but iceweasel? What's the point?
Netscape won - well sort of

Mike Macgirvin
  last edited: Mon, 10 Dec 2007 18:12:41 +1100  from Diary and Other Rantings
I was recently reflecting on my startup days at Netscape more than a decade ago. What was compelling about what we were trying to accomplish at the time was to make operating systems irrelevant.

Netscape tried to accomplish this by embedding what once were typically operating system functions into a multi-protocol window onto the world (the web browser). It had your text editor, the aforementioned web browser, a window system (aka frames) and a programming language or two (Java and JavaScript) to tie it all together and turn it into a general purpose information and communication appliance. The consensus at the time was that this would make the underlying operating system (*nix, Windows, Apple, OS/2, whatever) irrelevant. If you had this tool on your computer, you could work on any computer, any platform, and be able to do all of your information related tasks.

Needless to say, this grand idea failed. The browser is still around, but the original idea was lost along the way. New browsers don't come with the same tools we tried to provide back then. Java and calendaring in particular are now separate add-ons, as is the email package.

But does that mean Netscape lost? Well yeah, Netscape did. They're gone. But the concept of operating system irrelevance didn't. As I write this (on a Windows box), I've got windows open to Unix servers, I've got Unix command shells and utilities, I've got typical Unix programming languages and databases and web servers all running in this alien environment. It's actually impressive how far we've come. I'm currently dumping a remote Linux file system to a local (Windows) disk drive using nothing but Unix commands. ssh, tar, bash. These are all running on the Windows box using Cygwin, which comes with a couple hundred native Unix commands. I've got my familiar LAMP (Linux, Apache, MySQL, PHP) web programming environment via XAMPP, which I'm developing with emacs. Remote system monitors running via X (the Unix windowing system) and being displayed on my desktop via Cygwin/X. I've even got my mouse set up with hover-to-focus mode. The only thing that provides solid evidence that I'm not running a Unix operating system is the IE icon on the desktop (which I never use).

And this is all running on Windows.

So the operating system is completely irrelevant. It's just something that sits in the background and allows you to launch programs. Just like we envisioned back in the '90s. OK, not exactly like we envisioned, but I'm quite comfortable with the end result.
Not NORK2 valid

Mike Macgirvin
  last edited: Tue, 04 Dec 2007 21:44:23 +1100  from Diary and Other Rantings
Apparently not HTML valid either.

I'm referring of course to Microsoft's IE homepage...

Check it out for yourself...