Mike Macgirvin

mike@macgirvin.com

Towards a distributed online social network

Mike Macgirvin
  from Diary and Other Rantings

Recent events have caused a great deal of consternation about the future of Facebook as a social portal. We've seen this before in other social networks and experiments over the years. Many folks have suggested (and there are some current development efforts) that the future of online social networking is distributed – where nobody has control over your private data but you. This is a top-level examination of how such a network might be implemented, with some practical implications. This document will evolve. Contact me directly for updates and/or practical discussion.

On the issue of friendship

In the online social landscape of recent years, “friendship” has become a mutual and binary relationship between two people. You are either friends or you're not. There are no shades of grey for acquaintances, co-workers, etc. Likewise a “relationship” can only exist if both people mutually agree to the existence of the friendship.

This is completely alien to friendship as it exists in the real world. Often, feelings of kinship we might feel for somebody are not reciprocated, or are reciprocated at a different level or degree than our own interest.

So let us start with a clarification of relationships in this new world of online social networking. They can be one-sided. They can also have many levels of closeness and/or separation. Interaction may be complete and involve coordination and synchronicity or it may be intermittent and casual.

In software terms, we must dispose of the outdated notion of “request friendship, and if it is accepted, friendship commences”. That is simply not how the world works. The most convenient way to express this in terms of the social web is an “expression of interest”.

On the issue of privacy

Facebook in particular has been the focus of a good deal of scrutiny because of differences in the perception of privacy. Humans express themselves differently depending on who they are interacting with. Some of these communications may be open and public (as in the instance of a professional musician communicating with his/her fan base). Some of them can be quite private (as in the instance of two potential or de facto lovers). In between are a range of needs, such as the need to keep our private life away from close scrutiny by co-workers and potential employers.

If we are to communicate with this range of needs online, we need fine granularity in who can see different pieces of information or communications. Facebook has tried and in fact has advanced the state of the art in this concept of privileged communication, but this is at odds with their business model - which requires most communications to be open so that the service has more value to advertisers.

If we are to propose a distributed social network model, it must provide at least this level of granularity.

The distributed social model

Many aspects of the online social network are made more difficult in a distributed model than they are in a monolithic environment. Conversations with several participants are not easy to present in real time. Instead of the conversations existing in a single database, fragments may need to be pulled in from disparate locations.

Also, without the supervision of a central authority with the power to revoke membership, abuse can occur in the system by either commercial forces or vandals. Let's take the issue of friend requests. Commercial forces will seek to make lots of friends, because it is the nature of advertising to spread your brand name.

We discovered in the “blogging era” of 2002-2008 that any comment box which was open to the public quickly became a spam target. A distributed social service will not succeed if it requires manual maintenance of thousands of undesirable requests for communication each day.

In order to prevent or reduce abuse, we need some form of verification of identity. A username/password scheme will not succeed in this environment as people will tend to use the same password in all cases and this can lead to forgeries and break-ins.

Asymmetric, or public key, cryptography has to be central to a distributed social network. Here there is a key pair – one public and one private. Either key can decrypt communications encrypted with the other. One key is kept under tight control (the “private key”). The “public key” can be shared with anybody.
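A minimal sketch of the key pair idea, assuming textbook RSA with toy numbers. The primes, the exponent and the `apply_key` helper are illustrative choices, not part of any proposed protocol, and numbers this small offer no security at all. It only shows that whatever one key transforms, the other key reverses.

```python
# Toy textbook-RSA numbers, far too small to be secure; illustration only.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(key, value):
    """Transform an integer smaller than n with one half of the key pair."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65

# Encrypt with the public key; only the private key recovers the message.
sealed = apply_key(public_key, message)
assert apply_key(private_key, sealed) == message

# Transform with the private key; anyone holding the public key can undo it,
# which is the basis of a signature.
signed = apply_key(private_key, message)
assert apply_key(public_key, signed) == message
```

A real implementation would rely on a vetted cryptography library with proper key sizes and padding rather than raw modular arithmetic.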

We can also assume that given an open and distributed implementation, software will be created with a range of features to fulfil individual needs. We do not need to define exactly how these work or what they do. All we need to do is specify how they will interact. We also cannot force anything on the individual “cell” of this social network. For instance, we cannot specify that if we comment on an author's communication, that our comment will be kept private from certain other people. We may not even be able to delete it once it has left our hands. All we can do is “request deletion” and hope for the best. It's a good thing for implementers and friends to honour these requests because otherwise they will fall into disfavour and not be trusted with future communications.

Also, how do we know that “Barbara Jensen” is actually the Barbara Jensen we have communicated with before? We need a shared secret and/or strong authentication. Who do we trust? In the early days of such a service, we might be willing to trust somebody is who they say they are – especially if this person becomes a member of our social network through email or personal communication. As time goes on and our network grows, the abuse potential will also grow. At that point we will need stronger verification. This might involve “friend of a friend” trust – or third party trust through a “respected” source, such as a major network provider who has already established a relationship with this person. It would be easy to say that we should use known certificate authorities to provide this information - but this will involve monetary requirements and complicated procedures for proving one's identity. These complications could prevent a network from ever growing to a level of “critical mass”. So we have to be lenient with what credentials we accept, while preferring more proof the further from our core group of social contacts one is.
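The idea of asking for more proof the further somebody is from our core group can be sketched as follows. This is only one possible reading: the `follows` graph, the names in it, the three proof levels and the distance thresholds are all assumptions made for illustration.

```python
from collections import deque


def social_distance(follows, start, target):
    """Breadth-first search over 'who follows whom'; returns hops or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for contact in follows.get(person, ()):
            if contact == target:
                return hops + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None  # no chain of contacts leads to the target


def required_proof(distance):
    """Hypothetical policy: demand more proof the further away someone is."""
    if distance is not None and distance <= 1:
        return "self-asserted"        # direct contact, e.g. met through email
    if distance == 2:
        return "friend-of-a-friend"   # vouched for by somebody we follow
    return "third-party"              # stranger: want a respected third party


# Made-up example network.
follows = {
    "me": ["barbara"],
    "barbara": ["carlos"],
}
```

With this graph, `required_proof(social_distance(follows, "me", "carlos"))` asks for a friend-of-a-friend voucher, while somebody with no connection at all falls through to third-party verification.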

XSS prevention is also a built-in feature of monolithic social network providers. This radically affects the ability to use HTML in our communications. Plain text may not be seen as a rich form of communication. Perhaps a rich text language should be provided from the outset. There is also the potential for third-party providers to provide filtering services.

Ditto for porn filtering.  

It is also probable that third-party aggregators will provide global searching and lookup to locate profiles of people that interest us.
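On the XSS point above, one common approach is to escape every piece of HTML in a communication and then expand a very small rich text language. The sketch below assumes a made-up markup of `[b]` and `[i]` tags; nothing here is a defined part of the proposal.

```python
import html
import re

# Hypothetical minimal markup: [b]bold[/b] and [i]italic[/i] only.
_ALLOWED = {"b": "strong", "i": "em"}
_TAG_RE = re.compile(r"\[(/?)(b|i)\]")


def render_comment(raw):
    """Escape all HTML first, then expand the small whitelisted markup."""
    safe = html.escape(raw)  # <, >, & and quotes become entities
    return _TAG_RE.sub(
        lambda m: f"<{m.group(1)}{_ALLOWED[m.group(2)]}>", safe
    )
```

Because escaping happens before the markup is expanded, a `<script>` tag in the input comes out as inert text. The sketch does not check that tags are balanced, which a real renderer would want to do.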

Mandatory items:


  • Public profile page. The public profile page should include the URLs of any other communication endpoints we require, embedded into the page, be it either HTML or XML. This is the global access page to this particular profile. It may contain recent updates of items with global visibility. It also contains the public encryption key for this person, either inline or as a pointer.

  • Expression of interest URL, discoverable on the public profile page. This is the friendship request application. This is activated by electronically “dropping a card” containing the requestor's public profile page. There need not be any communication that the expression of interest was acted on. A “ticket” is granted regardless. This ticket is to be used in further communications with this person.


  • Private profile page. This page is where to find the “feed” for this person. The feed is obtained through an XML exchange of credentials (an encrypted copy of your ticket, along with your public key), which results in a personalised feed of information being returned. There is no requirement that the person recognise you as a friend. You might be returned a generic public page.


  • A “post page” where you may comment on published items.

  • A notification URL which is where notifications of recent activity will be sent.

  • A “friend” page. This is quite different from the friend pages you may be used to as it includes profile pictures/names/URLs for people who you follow – and are willing to share with the public. There is no requirement that they follow you. If you provide an XML credential exchange, you may be shown a different list depending on what the person wishes to share.
The only way to determine if a friendship is mutual is if both people actively follow each other's activities.
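The mandatory items above can be sketched as a small data model. Everything concrete here is an assumption for illustration: the `Profile` class, its field names, the rule that a personalised feed requires the owner to follow the requestor, and the use of a random hex string as the ticket. The encryption of the ticket and the XML wire format are left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Profile:
    """One 'cell' of the network; all URLs used with it are placeholders."""
    name: str
    profile_url: str
    public_key: str                                 # inline key or a pointer
    endpoints: dict = field(default_factory=dict)   # post, notify, friends...
    following: set = field(default_factory=set)     # profile URLs we follow
    tickets: dict = field(default_factory=dict)     # requestor URL -> ticket

    def express_interest(self, requestor_url):
        """'Drop a card': a ticket is granted regardless of any follow-up."""
        ticket = secrets.token_hex(16)
        self.tickets[requestor_url] = ticket
        return ticket

    def feed_for(self, requestor_url, ticket):
        """Personalised feed only for a valid ticket from somebody we follow."""
        known = self.tickets.get(requestor_url)
        if (
            known is not None
            and secrets.compare_digest(known, ticket)
            and requestor_url in self.following
        ):
            return "personalised feed"
        return "generic public page"


def is_mutual(a, b):
    """A friendship is mutual only if each profile follows the other."""
    return b.profile_url in a.following and a.profile_url in b.following
```

Note that holding a ticket says nothing about how the other side regards you: the ticket is always issued, and the owner's own follow list decides what is returned.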
IMSLP closes down

Mike Macgirvin
  from Diary and Other Rantings
The lawyers are winning. Found this over on Night Passage (one of my favorite jazz websites):

IMSLP.ORG, the Internet Music Score Library Project has closed down, the founder of the site sadly announces:

On Saturday October 13, 2007, I received a second Cease and Desist letter from Universal Edition. At first I thought this letter would be similar in content to the first Cease and Desist letter I received in August. However, after lengthy discussions with very knowledgeable lawyers and supporters, I became painfully aware of the fact that I, a normal college student, has neither the energy nor the money necessary to deal with this issue in any other way than to agree with the cease and desist, and take down the entire site. I cannot apologize enough to all IMSLP contributors, who have done so much for IMSLP in the last two years.

There were more than 10,000 music scores in the public domain on the site.
I must add I am not surprised at all. As I wrote in my previous post, I re-started playing piano, and I was so happy to find on IMSLP the whole Mikrokosmos by Bartok, a series of studies divided into 6 volumes, very popular among piano students. It was an old Soviet edition with titles in Cyrillic, but all the studies were there. The problem is: Bartok died in 1945, 62 years ago, therefore Mikrokosmos does not comply with the life+70 years rule and it's not in the public domain. But this is not the only case. I really hope they open again, but being more careful about published scores.
 music  feeds
Coming soon...

Mike Macgirvin
  last edited: Mon, 05 Nov 2007 11:54:38 +1100  from Diary and Other Rantings
I've managed to code up some radical changes to the newsfeed system - which I believe will allow us to provide some (very limited) functionality w/r/t reading news without causing any legal issues and without subjecting any site members to liability just for the crime of looking at published newsfeeds. I'll be enabling some of this in the next several days as I get any remaining kinks worked out.

The ability to import articles into the CMS from newsfeeds will not be included. I could of course take the common approach and let y'all take legal responsibility for your own imported content, but that's hardly the kind of thing that a reputable web service would do to its members. Granted that's what some supposedly reputable web services do, but I don't necessarily agree that it's the right thing to do.

You probably should be cautious in any articles you publish to this site that any citations you might provide from elsewhere are free and clear - as we're under a bit of extra legal scrutiny at the moment. It's generally OK to include a snippet of an external article and add comment to it. Don't include the entire source article; and ensure that you have provided additional content/commentary written in your own words.
 feeds
Joe
 
I miss the old cr.unchy...........but I understand the current state of affairs. It won't be long before they'll be taxing the internet also........
Bye, Bye Miss America Pie

Mike Macgirvin
  from Diary and Other Rantings
A look at the website statistics bears out what I already knew from instinct. Traffic is dropping like a rock. What do people want? They want aggregated content. They want the answers to their questions. If you can't provide these, they'll move on to somebody who does; even if this exposes them to a lawsuit.  

In fact, it is only the 1% of people that are writers and contributors who might ever have legal troubles. The vast majority of people (99% or more) are lurkers who never perform a single action to create content.

Last week, over 100,000 page hits a day. Now, 29,000 and falling. There's still a lot of stuff here, but it's hard to compete with celebrity gossip and tech news that are now off limits as content sources.

I'm not complaining about this trend. Those of you who know me also should know better than to think so. I'm just an observer, reporting the facts.
 feeds
A friend (now at Google) responds

Mike Macgirvin
  last edited: Fri, 02 Nov 2007 15:28:17 +1100  from Diary and Other Rantings
A friend (now at Google) responds:

"Mike, obviously I cannot talk about Google internals, but what did we do about content violations at AOL?"

Uhm, nothing.

"Right."

OK, this isn't entirely true. There was that big flap with the EU over their different definition of the age at which a child is subject to parental controls. And then we spent weeks coming up with a solution to the problem of what should happen to group content if a group founder/moderator leaves the online service.

But mostly, content issues weren't our problem as software designers/engineers. The recently passed (at the time) DMCA had a "Safe Harbour" provision - which basically states that a service cannot be held liable for infringement by its members if, once notified of content offenses, it takes the offending content down/offline in a reasonable amount of time. Ultimately, this was our solution to any legal issues that might arise. We left it to the legal team to sort out what was or wasn't illegal content, and provided tools to operations staff to actually remove the infringing material.
 feeds
Mike Macgirvin
  
The important thing to take away from this is that the safe harbour provisions only protect a service provider from liability for infringement. They shift this burden completely to the members themselves. If one takes Google Reader as an example, if the republication of a particular newsfeed is found to be unlawful, every member who viewed that newsfeed on the system is potentially liable, because their actions (collectively) allowed an unauthorized copy of copyrighted material to be made. In copyright law, computers do not make copies. People make copies. Google does not make copies of newsfeeds. Members make copies, through their actions of subscribing to and reading the feed - even if they aren't aware that by performing such a seemingly innocuous action as reading a published newsfeed, an unlicensed copy is in fact being made on Google's server. This is important to know - for anybody who uses any online service anywhere. You will not likely see any warning that simply clicking this button or filling in this form field with a newsfeed URL could subject you to a nasty lawsuit.

The take-down provisions of the DMCA usually end the matter. But they don't remove the liability of the member or members who violated the copyright. It could still end up in the courthouse. All it means is that the service provider won't be there.
more on copyright and newsfeeds

Mike Macgirvin
  last edited: Thu, 01 Nov 2007 23:08:49 +1100  from Diary and Other Rantings
I've been plunged head-on into an absolute chaotic nightmare trying to fully understand the issues of copyright as they apply to newsfeeds.

The short answer is that in terms of the law, copyright wins. You legally cannot show an RSS feed on another website without the express permission of the content owner. There are rare exceptions. The existence of a syndication feed on a website does not grant any rights to the content it contains.

As for my previous post on why Google can get away with it: it is simply because 'they are Google'. According to every document that I've managed to cram in the last few days, they are clearly in violation of copyright law both in letter and spirit. But if you feel infringed, your only choice is a lawsuit - and you will be going up against one of the shrewdest collections of intellectual property lawyers ever assembled. I don't believe anybody has attempted it for Google Reader, although others who have sued Google over copyright infringement have come home with their tails between their legs.

In any other case, posting a newsfeed on another website (in whole or in part) puts you in questionable legal status. Fair use is nebulous. You cannot code fair use into software, and it doesn't offer any protection against getting sued for infringement. It merely gives you some guidelines for a handful of possible defenses.

You can of course view a newsfeed legally on another website if you have the express permission of the copyright owner. But again, this cannot be coded into software, and even then you can't make it available for another person to read - who has not obtained similar permission. This makes Digg, Technorati, Google Reader, and del.icio.us violators in principle, if not in fact. Truncating articles (as some of these sites do) is a defense; it is not a legal standard - and they still could face legal challenges. This is not just limited to RSS/Atom and other syndication formats. It applies to any website content.

If you're a small website operator like me, and you provide any publicly accessible newsfeeds without express permission of the feed owner (or copyright holder), you're technically in violation of U.S. law (this assumes that the U.S. is somehow involved in the content and/or reproduction at issue). This holds even if members create their own feed sources - although in that case, the members are in violation - not you. You are guilty of facilitating their infringement.

There's only one way out of this mess. Somebody has to sue Google over this issue and lose. Then we'll have some established legal principle. Until then, as one writer wrote - 'view a newsfeed, go to jail'. (Though technically copyright infringement is a civil violation and not a criminal violation).
 feeds
peonyden
 
Hi Mike

Your articles on Newsfeeds and Copyright are very interesting. I was taken aback some time ago to find you had been carrying my blog on your site, as a Newsfeed. Then I thought about it, and decided that as I was writing my stuff to get it read, then it didn't matter where it was read - the more coverage the better.

I have not used a Newsfeed on my blog, but I know others who do. I shall suggest that they read your posts.

It's interesting that Google gets away with what they do - based upon their market domination, no doubt.

Cheers

Denis
But wait, there's more...

Mike Macgirvin
  last edited: Wed, 31 Oct 2007 12:32:46 +1100  from Diary and Other Rantings
Seems I got locked out of the forum that was accusing me of all manner of illegal behavior for the way my websites (used to) use newsfeeds. I merely asked what I was doing that they felt was wrong, and how I might rectify it.  

If you've arrived here trying to find out what all the fuss is about, sorry, but I'm no longer able to respond directly to the accusations made.

But there's more. I'm going to use your esteemed Google Reader as an example, since this was used as a shining example of how to publish/republish newsfeeds in a non-infringing manner.

It was claimed that many sites can get away with publishing newsfeeds because they only contain snippets, and not a complete copy of the original copyrighted work. Further that they do not republish this information to third parties.

It just ain't so. I went to Google Reader and plugged in a feed URL for one of my web properties. Up came my newsfeed. Now let's take a look... hmmm. I don't see snippets of my articles. I see the complete articles. The whole tomato.

But wait, there's more. The argument is that Google just creates this list on the fly, so isn't storing and republishing protected work.  

Then how come I see a full copy of every article I've ever written since Google Reader came into existence? Everything. Tell me, I'd like to know. I'll tell you. It's because somebody else subscribed to my feed, and Google made a copy of every article that has ever been read on my site, and is republishing it to anybody who accesses that feed URL. My feed only contains 20 recent items. But everything is there, even articles which have been deleted from my website.

The only way this could happen is if they make a (complete) copy of every article, and republish it on their website. No different than anything which I did, and in fact they publish a whole lot more than I ever did for a given feed. I only provided a snapshot of the current feed, and the ability to import one or two articles a day from a few select feeds.

So Google Reader has the ability, and is actively creating copies of every weblog for which it is provided a feed - providing an alternate to ever visiting the source website, and without regard to copyright issues; and republishing this to the world. Exactly what I was accused of doing.  

Google as you may or may not know is pretty much exempt from copyright restrictions under the fair use clause. They have argued successfully that they can copy pretty much anything that has ever been written. But I can't - because I'm not Google, and fair use apparently only applies to large U.S. corporations with lots of lawyers.

But since I've been locked out of the dialog, the idiots who have accused me of wrongdoing will probably never see this.

As I told folks on that forum before I was locked out, if you really want to protect your writings, don't publish them, and certainly don't syndicate them. And if you don't want your entire site to be cloned, certainly don't syndicate full articles.

It's too late to bring back news on this site. I'm done with it. Heck, I'll just use Google Reader and save some disk space, not have to worry about foul language and XSS injections and all the other mess that comes with importing content from the wild.

It's a bit of an inconvenience to those who had a desire to use this software as it was envisioned, to create personalized websites of personalized content compiled from their favorite sources around the globe. But nobody was really using any of that functionality anyway.  They were just letting me subscribe to interesting feeds and using it to see glimpses of the blogosphere that they wouldn't have seen otherwise.

Oh well. I've got better things to do.
 feeds
Mike Macgirvin
  
For point of reference, here is the text of the so-called 'fair-use' exclusion, which is commonly used as a defense against infringement of copyrighted material. Note that this applies only to U.S. copyright law. Different countries may or may not have similar exclusions.

107. Limitations on exclusive rights: Fair use

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.  
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
Rough day

Mike Macgirvin
  last edited: Mon, 29 Oct 2007 22:50:01 +1100  from Diary and Other Rantings
I really don't even want to talk about it, but you can see the remnants all over my website. Had some heated debate about intellectual property and specifically how it relates to RSS/Atom newsfeeds. Seems a large number of people think that it's really bad to show a newsfeed to a third party (goes by the term 'republish') - although I fail to see where one could possibly draw a line. A feed is a feed. So for the present time until I can figure out how to make everybody happy, I've disallowed the viewing of news items by non-members; and culled from the ranks of our esteemed collection of eclectic news sources any information provider with less than 1000 viewers/day.

This means to see news articles which didn't originate on this site you will have to login. It also means you can't share these news articles with non-members. And it also means you won't be seeing Junk Drawer, Frog and Goat, Fragments from Floyd, or any of your other favorite small blogs here anymore, unless you import them yourself for your own personal viewing. Please contact me if you have any troubles working out the feed settings. If I import them and allow you to read them it is considered republishing, and that's really bad.

I'm keeping the larger newsfeeds at the present time because these are primarily professionals who have no issue publishing or re-publishing their content assuming full attribution is provided.

The place to apply controls is in the feed itself and anybody who's been around the syndication space for a while knows that. Many novices don't understand that they control their own destiny and can include anything, everything, or nothing in their published syndication feed. If you include everything but object to how it is used by third parties, it's kinda' like taking your clothes off in the middle of the street and then complaining because somebody saw you naked.    
Be warned: things may change further as I figure out how to make everybody happy.
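The point about applying controls in the feed itself can be illustrated with a teaser feed, where the publisher syndicates only the opening of each article. This is a minimal sketch; the `make_teaser` helper and its 40-word default are assumptions, not how any particular blogging package does it.

```python
def make_teaser(article_text, max_words=40):
    """Return only the first max_words words, marking the cut with [...]."""
    words = article_text.split()
    if len(words) <= max_words:
        return article_text  # short articles are syndicated whole
    return " ".join(words[:max_words]) + " [...]"
```

A publisher who feeds `make_teaser(body)` into their syndication template never hands the full article to a third party in the first place.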
 feeds
Joe
 
I am very disappointed in this turn of events. After hearing this morning, on local radio, of yet another attempt to tax access to the internet and to add local sales tax to internet purchases, I then log onto my favorite site for quirky news only to find it's now been curtailed because some writers who have a very small audience have an inflated opinion of themselves and think they can profit from their rants and have limited access to their scribbling in order to what....collect ad revenue? Sell their opinions? It was fun (the free distribution of odd ideas) while it lasted, but hey, folks, yeah, you out there who suddenly think your crayon drawings are valuable in real cash terms, aren't really that much different and don't require that I pay to view them. And yes, this applies to my own crude attempts to ply this medium.
traffic and more traffic

Mike Macgirvin
  from Diary and Other Rantings
Sorry folks for the problems connecting this morning. Had our very first million hit day, and the servers in La Brea weren't ready for it - and neither was I....

It was a combination of several things converging at once. Shamita Shetty  apparently was exposed on the Style Ikon channel, and even with an overlaid star to protect the family jewels it seems that a bunch of folks from the UK couldn't get enough of it. Wonder what this incident did to the transatlantic backbones...

But what really killed the servers was that as this was going on - Google, Yahoo, and Ask Jeeves and a couple of other search engines all converged on 10-12 of my sites at the same time to do their normally periodic crawls. Normally I'll get one or two crawlers doing one or two sites a day.
Mike Macgirvin
  
Things seem to have now returned to a normal state of chaos.
MichaelAnn
 
Ah ha! I was wondering what happened to ya! So you have been elevated to the lofty million-air club? Congrats! :)
Tech Support

Mike Macgirvin
  last edited: Mon, 29 Jan 2007 03:22:25 +1100  from Diary and Other Rantings
This belongs to MichaelAnn. Reproduced here because I love it. I would normally just import the feed except this one is set up with teaser (short) feeds where all the articles are chopped off in mid [...]

You've reached support, how may I help you?
youtube:zwVHbINaUgI
MichaelAnn
 
Hey! Cool! I'm gonna be famous! Hi Mike! :)

One of my favorite support dialogs always begins with:
me: What operating system are you using? Windows or Mac?
caller: I don't know... how can I find out?
...
Britney Shaved

Mike Macgirvin
  last edited: Tue, 05 Dec 2006 17:33:38 +1100  from Diary and Other Rantings
I was trying to figure out why web traffic doubled today. Seems one of my news feeds had this picture on it (see attachment). Ohmigosh. Turns out Britney is not only a female, but has labias. Who would've thought? The same picture brought down several other sites across the globe because of the traffic it generated. You'll have to view the picture separately. It isn't exactly kid safe, so I'm not gonna inline it...
 sex  feeds
Another new chunk of code

Mike Macgirvin
  from Diary and Other Rantings
I've been thinking about how to do this for ages. Finally did it. You'll find some new stuff in the menus called 'Sharing'. No, it isn't about that half of a ham sandwich you're having for lunch.

Over the past few months you've gained the ability to subscribe to various information sources and recently to configure what features you desire from the website. I mentioned this ability months ago in my grand vision - kinda' like myspace on acid. Well I finally finished the thing which takes that vision and turns it into reality. When you create these private views of the website, it's like having your own private Content Management System (CMS). In fact that's the way I like to describe it to people. There are lots of folks peddling multi-user community CMS's. Yada, yada. But nobody that I'm aware of is selling or even working on a 'personal CMS as part of a multi-user community'.

So what's this 'Sharing' all about? It's simple. You spend the effort to configure your personal content system whatever way you want. Then you can turn around and let other people use it.  

See, with MySpace, you get a page that's all yours to mess with. Some of the other community sites give you a few more things. But this lets you build your own complete community site from within a broader community platform. You can make this look like an auto racing site and when people visit your shared page, and then any other page on the site, that's what they'll see. You're in total control until they go elsewhere or turn it off. You control the menus, you control the skin, and you control all of the information sources for your space. They can be your info sources, they can be my info sources, or pull in whatever feeds you want from anywhere. Want your audience to have mail? Chat? And discuss the Confederate War? Two minutes. Photo dating website? Two minutes.
Content Filtering

Mike Macgirvin
  from Diary and Other Rantings
Content filtering is doomed.

I started off a while back with a bad word filter to try and prevent some real foul-mouthed pages from appearing on some of my websites. This was based pretty much on George Carlin's 'seven words', with the addition of 'multipart/', because it tends to show up in comment spam disproportionally.

Then a week or two ago, I noticed one of my 'urban' feeds was getting a bit raunchy, so I extended the word list with one I found on the net - which is a pretty good collection of raunchy innuendo, racist slurs, and hate speech terms.

I then found that one of my favorite 'tech' feeds was getting blocked for foul language. I had a look at the feed. It contained such bad terms as 'button', 'association', 'marketwatch', 'cockpit', 'documentation', etc. I'll let you figure out what parts of those words ended up getting red flags.

So I've gone back to my seven words list. Rather than block these entries completely, I also think a better idea would be to go ahead and import the articles and just mark them censored. That way the articles which get flagged because of sub-words are still available to folks with a higher tolerance level - but will still perform the intended task of keeping really foul language away from innocent visitors to the front page.
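
Here's a minimal sketch of the sub-word problem and the 'mark it censored' idea. It is not the site's actual code. The two-word list is a mild stand-in for the real one, and the function names are made up for illustration. The whole-word matcher is an alternative I'm showing for comparison; the filter described above matches sub-words.

```python
import re

# Mild stand-in list (assumption); the real list is the 'seven words'.
BLOCKED = ["butt", "ass"]

def substring_hits(text, words=BLOCKED):
    """Naive check: flags a listed word even inside a longer word."""
    lowered = text.lower()
    return [w for w in words if w in lowered]

def whole_word_hits(text, words=BLOCKED):
    """Flags a listed word only when it stands alone as a word."""
    lowered = text.lower()
    return [w for w in words
            if re.search(r"\b" + re.escape(w) + r"\b", lowered)]

def import_article(text, matcher=substring_hits):
    """Import every article; flagged ones are marked censored, not dropped."""
    return {"text": text, "censored": bool(matcher(text))}
```

With the naive matcher, 'Press the button' is imported but marked censored, so it stays off the front page yet remains readable for anyone who opts in. With the whole-word matcher the same article isn't flagged at all.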

It's an interesting challenge.
Congress targets social networking sites

Mike Macgirvin
  last edited: Sat, 01 Jul 2006 05:16:15 +1000  from Diary and Other Rantings
C|net reports:

[As you read this, be reminded that this website is a social networking site]

The concept of forcing companies to record information about their users' Internet activities to aid in future criminal prosecutions took another twist this week.

Rep. Diana DeGette, a Colorado Democrat, originally proposed legislation in April that would require Internet service providers to retain activity logs to aid in criminal investigations, including ones involving child abuse.

Now DeGette and some of her colleagues in the House of Representatives are suggesting that social-networking sites should be required to do the same thing.

"How much would it cost your company to preserve those IP addresses?" DeGette asked at a hearing on Wednesday that included representatives from Facebook, Xanga and Fox Interactive Media, the parent company of MySpace. "You're going to store the data indefinitely?"

An IP address is a unique four-byte address used to communicate with a device on a computer network that relies on the Internet Protocol. An IP address associated with CNET.com, for instance, is 216.239.113.101.

Michael Angus, executive vice president of Fox Interactive Media, said he agrees with the idea of data retention for MySpace. "As a media company, Fox is very committed to data retention," Angus said. "It helps us police piracy."

Rep. John Dingell, a Michigan Democrat, added: "Why can't data that links IP addresses to physical addresses be stored longer?"

The concept of mandatory data retention was pioneered by the European Union, which approved such a requirement for Internet, telephone and voice over Internet Protocol (VoIP) providers last December. A few months later, the Bush administration endorsed the idea, with Attorney General Alberto Gonzales calling it "an issue that must be addressed" and--as first reported by CNET News.com--following up in private meetings with Internet providers.

In those meetings, Justice Department representatives went beyond the argument that data retention was necessary to protect children--and claimed it would aid in terrorism investigations as well.

During Wednesday's hearing, politicians also claimed that social-networking sites were not doing enough to verify that their users who claimed to be a certain age were telling the truth. (Recent news reports have said that sex predators are using MySpace and similar sites to meet up with teens.)

"There is more you can do," DeGette said. "You can do algorithms that will go beyond just the date of birth that they register, to start to weed out some of the underage users." She also called for the companies to participate in a "national public service program" to distribute an educational video.

Two paths for data retention
Data retention legislation could follow one of two approaches, and it's not entirely clear which one U.S. politicians will choose.

One form could require Internet providers and social-networking sites to record for a fixed time, perhaps one or two years, which IP address is assigned to which user. The other would be far broader, requiring companies to record data such as the identities of e-mail correspondents, logs of who sent and received instant messages (but not the content of those communications), and the addresses of Web pages visited.

Earlier in the week, Internet companies tried to forestall potentially intrusive new federal laws by launching a campaign against child pornography designed to tip off police to illegal images. Participants include AOL, EarthLink, Microsoft, United Online and Yahoo.

In addition, Comcast announced that it will begin to retain logs that map IP addresses to user identities for 180 days, up from its current policy of 31 days. (The company stressed that it does not record information such as "Internet use or Web surfing habits.")

But Rep. Joe Barton, the Texas Republican who heads the Energy and Commerce Committee, said even after hearing the news, that he still wanted to enact "a comprehensive anti-child-pornography" law. "I think the Congress is tired of talking about it," Barton said, adding that it was time to "protect our children against these despicable child predators that are on the loose right now in our land."

Barton has not released details about his legislation.

This isn't the first time that MySpace and social-networking sites have faced criticism from politicians--and the threat of new federal laws.

A bill introduced last month by Rep. Michael Fitzpatrick, a Pennsylvania Republican, would cordon off access to commercial Web sites that let users create public "Web pages or profiles" and also offer a discussion board, chat room or e-mail service. It would affect most schools and libraries, which would be required to render those Web sites inaccessible to minors, an age group that includes some of the category's most ardent users.

In addition, politicians proposed a slew of related measures this week, including blocking access to off-color Web sites for all Americans, dispatching "search and destroy" bots that would seek out illegal content, regulating search engines and targeting peer-to-peer networks.
NORK2 valid

Mike Macgirvin
  last edited: Fri, 26 May 2006 04:56:02 +1000  from Diary and Other Rantings
Don't you get tired of all those little 'valid' buttons claiming that the website you're visiting passes validation for protocol x,y, or z? Why should you care? It should be QA's responsibility to worry about whether or not the software validates to the applicable specifications, not yours. Unless of course it is the website owner's way of telling you that it's up to you to verify their code because they can't be bothered with it. Or perhaps for bragging rights. Either way I think it's a bit arrogant. "Hey, our pages aren't broken....". Who cares anyway? If they're broken, you won't be able to read 'em to find out that they aren't supposed to be broken.  So it kind of defeats the purpose.  

In fact, as I've found on this site, just because you're compliant on one day, you might not be tomorrow. I find it a little irresponsible that web specs are allowed to change without notice. That certainly wasn't the case when I was trying to write standards compliant software.

If, like me, you mostly find this amusing, I've got some buttons for you. Put 'em on your website.

[Images: three 'valid' badge buttons]

Now you too can be compliant with all the latest acronyms. And you can ask your local admin - 'Are you GQX2.3 compliant? What about NORK2?' Huh? You will no doubt cause them a huge amount of emotional distress as they search the web to find out what is required to be GQX2.3 compliant. The nice thing is that your website is in fact compliant, so you can wear these badges with honor.

Give me a day or two and I might be able to come up with a NORK2 validator if you really want to mess with their heads. It will validate any page which has a nork valid image link on it and will fail with unspecified errors any page which doesn't.  
Syntax error on line 1. Illegal command 'DOCTYPE'.
Syntax error on line 2. Invalid or obsolete tag 'html'.
Script type 'javascript' is not legal in NORK2. Please use 'carrotscript' or a transcendent document format instead.
'<title>' is deprecated in NORK2 without 'region=' specifier.
Action cancelled. Too many errors.
...Isn't that wicked?
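
For the curious, the whole validator could be about this big. This is only a sketch of the rule described above: a page passes if it carries a NORK badge image and fails otherwise. I'm assuming the badge is any img tag with 'nork' in its src, and the function name and messages are made up.

```python
import re

# Assumption: the badge is any <img> whose src contains "nork".
BADGE = re.compile(
    r"<img\b[^>]*\bsrc\s*=\s*[\"'][^\"']*nork[^\"']*[\"']",
    re.IGNORECASE,
)

def validate_nork2(html):
    """Return (is_valid, message) for a page's HTML source."""
    if BADGE.search(html):
        return True, "This page is valid NORK2."
    # No badge: fail without saying anything useful.
    return False, "Action cancelled. Too many errors."
```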

PS> In researching this article, I discovered what your local admin is going to discover, that even if they aren't NORK2 compliant, you can date a NORK girl; and they aren't anywhere near as ugly as Stalingrad babes.  They're also mostly Catholic, so you don't have to struggle with those pesky condoms. Isn't that the cat's meow? Details Here.
Nuclear feeds

Mike Macgirvin
  last edited: Sat, 13 May 2006 12:14:28 +1000  from Diary and Other Rantings
Family and friends are advised that I've now migrated all my existing photo albums from the old website. You will need an account here to access the photos. (Contact me after you've got the account so I can add you to the access list). The old password won't work anymore - and the old website is going away.

I've also finally implemented Atom feeds. I know it's been an RFC for several months, but I already had one feed format so it wasn't a priority while I built the rest of the community site software.

But my RSS feeds won't validate if any articles have more than one attachment. There was a huge debate about this a couple of years ago. Dave Winer and Rogers Cadenhead seem to hold the view that the RSS 2.0 spec clearly states that an item can have at most one enclosure.

I've read the spec over and over - in fact every version of the spec. Nowhere is this spelled out. I can't even find the passages they claim 'imply' this limit. But face it, it's a very poorly written spec which Dave Winer grabbed from Netscape and made his own. He changes it whenever it suits him, and interprets it any way which suits his personal ambitions. Along the way teaming up with Adam Curry and 'inventing podcasting' (which also led to the protocol abortion we call iTunes). Dave also got very wealthy off of RSS during the dot-com crash, probably the only person besides the Google founders to get wealthy off the Internet during this period.

But RSS has run its course. It is now time to rid the Internet of everything associated with Dave Winer. He is a disgrace. RSS is a disgrace.  Upgrade all your feeds to Atom. It's a much better syndication format, clearly thought out - well defined and specified.

Just do it.
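
For anyone wondering what multiple attachments look like in Atom, here's a rough sketch using Python's standard library. Each attachment becomes its own link element with rel="enclosure", and an entry can carry as many of those as it likes. The function name and the example.com URLs are placeholders, not what this site actually runs.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def atom_entry(title, entry_id, updated, attachments):
    """Build an Atom <entry> with one rel="enclosure" link per attachment.

    attachments is a list of (url, mime_type, length_in_bytes) tuples.
    """
    ET.register_namespace("", ATOM_NS)  # serialize with a default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    for url, mime_type, length in attachments:
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", {
            "rel": "enclosure",
            "href": url,
            "type": mime_type,
            "length": str(length),
        })
    return entry

if __name__ == "__main__":
    demo = atom_entry(
        "Photo album", "tag:example.com,2006:album-1",
        "2006-05-13T12:14:28+10:00",
        [("http://example.com/a.jpg", "image/jpeg", 1000),
         ("http://example.com/b.jpg", "image/jpeg", 2000)],
    )
    print(ET.tostring(demo, encoding="unicode"))
```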
Gail
 
The single-or-multiple enclosures issue is one of the biggest reasons why all of my new projects use Atom in preference to RSS.

The RSS 2.0 specification implies the singular nature of enclosure by spelling out that category is allowed to be present in an item more than once. Since enclosure lacks the same statement, the inference is that it is not permitted.

That's not much to go on, of course, but Dave Winer has written on Scripting News and elsewhere that he intended enclosure to be present zero or one times in each item, not multiple times. That ought to be enough to justify revising the spec to make this clarification, but as you've seen, nothing is that easy in RSS.
We're so far beyond 1984

Mike Macgirvin
  last edited: Fri, 12 May 2006 06:00:15 +1000  from Diary and Other Rantings
The Bush is defending some of the new NSA wiretap programs, stating that this isn't about monitoring what you or I are doing. But in fact, that's what is happening. Everything which goes through the telecommunications structure is being intercepted and filtered. Email, web traffic, phone calls. There are back doors and monitors in every conceivable place.

It is particularly telling in that the Justice Department is not able to review the program. The Justice Department! Seems nobody there has an adequate security clearance. Don't you find that a bit alarming?

It's only a matter of time. The remote control channel changer on your TV already has a little window with an IR detector on it. You would never know if the TV manufacturers replaced this with a CCD through a government mandated covert program. Since it also covers the IR spectrum, it would work just like a channel changer. Just like you expect it to work. Except it's also a webcam, feeding what's happening in your living room back to a monitoring post through your cable system (which coincidentally is also a high-speed internet line these days). The technology to do all of this exists today. All the microchips required to make this a reality are available off the shelf. The monitoring posts are already in place.  

And you would never know...
AJAX chat

Mike Macgirvin
  last edited: Thu, 16 Mar 2006 14:19:13 +1100  from Diary and Other Rantings
So it turned out to be easy enough to build an AJAX chat module that I've gone ahead and built one.  I'll plug it in once I've finished a couple more features and tested it all. The basic chat works fine right now. But chat isn't much fun without rooms - so I made sure it would support multiple rooms. Right now it'll chat in multiple rooms, but the missing ingredient is room presence - to answer the question "how many people are here right now?". That brings up the issue of private rooms, because some conversations are best done behind closed doors. The reason that's an issue with room presence is because if you've got room presence, you've got to know which rooms not to show. Yada, yada, yada. Feature creep. Maybe I should just open one public room and be done with it, but that's hardly intellectually inspiring.

After some long thought about the systems issues, I think I've got a novel concept. Since there are to be multiple rooms, who gets to create a room? Just the admin? Logged in users? On most sites you have to suggest a new room name and wait a few days/weeks or else they are created for you based on marketing research.  

As it turns out, in my implementation as it exists right now a room is nothing more than a tag on a message. Maybe that's all it should be. You want a room called 'Lesbian Buddhist secret agents from Norway'? Fine by me. All you have to do is go to a room by that name - and that act alone makes it exist. Rooms cease to exist when all their messages expire. This is actually pretty cool from an administration viewpoint. Rooms appear magically as they're needed, and they vanish when they stop being used.

Zero maintenance.
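
To make the idea concrete, here's a small sketch of rooms-as-tags. It's an illustration, not the module itself: the class name, the in-memory list, and the one-hour message lifetime are all assumptions. There's no room table anywhere. A room shows up in the list while it has at least one live message and disappears when the last one expires.

```python
import time

class Chat:
    """Rooms are not stored; a room is just a tag on a message."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds   # assumed message lifetime in seconds
        self.messages = []       # (timestamp, room, author, text) tuples

    def post(self, room, author, text, now=None):
        """Posting to any room name is all it takes to bring it into being."""
        now = time.time() if now is None else now
        self.messages.append((now, room, author, text))

    def _expire(self, now):
        """Drop messages older than the lifetime."""
        self.messages = [m for m in self.messages if now - m[0] < self.ttl]

    def read(self, room, now=None):
        """Return (author, text) pairs for the live messages in one room."""
        now = time.time() if now is None else now
        self._expire(now)
        return [(a, t) for ts, r, a, t in self.messages if r == room]

    def rooms(self, now=None):
        """List the rooms that currently have live messages."""
        now = time.time() if now is None else now
        self._expire(now)
        return sorted({m[1] for m in self.messages})
```

Visiting a room nobody has posted in just returns an empty list, so nothing needs creating or cleaning up.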

It's probably more of an academic exercise than anything else. There aren't enough people that actually hang out at the BADDCAFE to be much use here. Then again, I wasn't expecting 60,000 visitors the other day, and social communications ware is the kind of software I've always enjoyed doing. AJAX chat fits nicely into the portfolio and looks good on the resume if nothing else.
Gail
 
Very interesting and beautiful site. It is a lot of helpful information. Thanks!
In the nick of time

Mike Macgirvin
  last edited: Wed, 15 Mar 2006 15:09:28 +1100  from Diary and Other Rantings
Good thing I got those sessions under control yesterday morning. Yesterday afternoon this site got hammered. It wasn't search engines, but the similarity of requests is highly suspect. Over 60,000 unique sites requested my home page yesterday between noon and 2PM. None of them went any further. They all were redirected from my old site. Then just as suddenly, it all stopped. Hmmm. That's certainly strange.

It wouldn't surprise me at all if it turned out to be a coordinated DOS attack on my site after my mentioning a vulnerability to overloading. Somebody trained a firehose of hacked drone sites at me. That's my suspicion anyway. I can't quite picture 60,000 visitors doing exactly the same thing under other circumstances unless my site was mentioned on a major news channel as a place to get a free Lexus or download videos of Catherine Zeta Jones having sex with Brad Pitt or something like that.
alpha preview

Mike Macgirvin
  last edited: Sun, 19 Feb 2006 04:31:58 +1100  from Diary and Other Rantings
Here goes... lots of little things I'm trying to fix. But the sooner I get it up and going the sooner they'll get fixed.

Here's the way it's supposed to work...

The main site now lives at 'BADDCAFE.com'.  The main site is an aggregator of unique contributors. At the moment I'm the only contributor, so bear with me...

baddcafe.com/mike is where my weblog lives. Baddcafe.com/julio is where you would find Julio's blog if there was such a person contributing here. On the main site you'll find all these contributions. On the individual site, you'll find very personalized environments (theoretically --  I'm still going through the issues list).

For instance, 'categories' are unique to an individual. You won't find them on the top level site (until I get a tag cloud going). You see, Julio (the hypothetical contributor) might be into martial arts and set up categories according to those interests. I might add things like brewing to mine. There's no reason we have to share the same namespace for categories - in fact it's absurd. So we won't.

Feeds:

RSS feeds are at baddcafe.com/feed

You can also tailor these results - feed/mike will get you my feeds. feed/mike/music will get you my 'music' category.  
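
The feed paths above boil down to something like the following sketch. The function names and the article dictionaries are invented for illustration, and the real site may do it differently. Since categories belong to each contributor, the category part only ever appears after a contributor name.

```python
def parse_feed_path(path):
    """Split 'feed', 'feed/mike' or 'feed/mike/music' into (contributor, category)."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "feed" or len(parts) > 3:
        raise ValueError(f"not a feed path: {path!r}")
    contributor = parts[1] if len(parts) > 1 else None
    category = parts[2] if len(parts) > 2 else None
    return contributor, category

def select_articles(articles, path):
    """Filter articles (dicts with 'author' and 'categories') for a feed path."""
    contributor, category = parse_feed_path(path)
    return [a for a in articles
            if (contributor is None or a["author"] == contributor)
            and (category is None or category in a["categories"])]
```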

Don't worry ... all of this will make more sense as it starts to stabilize.