Some notes to save somebody some grief:
Installing the Oracle libraries and access module into an existing PHP5 installation on Debian...
First grab the Linux instantclient from oracle.com - you'll also need the client SDK kit. Here I'm using instantclient 11.1.
Create a directory for these, such as /home/oracle, and unpack both of them to that directory.
Go into the oracle directory (and into the instantclient_11_1 directory) and create a symlink:
$ ln -s libclntsh.so.11.1 libclntsh.so
Grab the oci8 PECL package and unpack it somewhere (~/oci).
Make sure you have the following packages (in addition to php5, php5-cli, apache2, etc).
php5-dev
libaio1
Go to the oci8 directory (~/oci/oci8-1.3.0). Forget about 'pecl build' - which won't work. Well it will, but it will quietly and quickly remove all the built packages before you can save them or install them. Yargghh. I wasted half a day trying to fix this one.
Better to just build by hand:
$ ./configure --with-oci8=instantclient,/home/oracle/instantclient_11_1
$ make
Fix any errors/warnings before continuing
Don't make install, which won't work.
$ cp ./modules/oci8.so /usr/lib/php5/20060613+lfs
Replace 20060613+lfs with whatever module directory has been setup for you in /usr/lib/php5
Create /etc/php5/conf.d/oci8.ini:
----
extension=oci8.so
----
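Now run the php cmdline and see if everything loaded: 'php -v' prints the version along with any startup warnings about modules that failed to load, and 'php -m' lists the modules that did load. Fix it if it didn't.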
You may need some env variables setup in your /etc/init.d/apache2 file to make everything work and actually execute queries, but a phpinfo() at this point should show your oci8 extension. See the php.net Oracle pages if you need help with the env variables.
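A minimal sketch of the phpinfo()-style check described above, runnable from the command line or through Apache. The function name oci8_status() is made up for illustration, and LD_LIBRARY_PATH is only my assumption about one of the environment variables that may matter here - the php.net Oracle pages are the authority on which ones you actually need.

```php
<?php
// Report whether the oci8 extension is loaded and where PHP looks for modules.
// oci8_status() is a hypothetical helper name, not part of PHP or oci8.
function oci8_status()
{
    return array(
        'loaded'          => extension_loaded('oci8'),    // true once oci8.so is picked up
        'extension_dir'   => ini_get('extension_dir'),    // where oci8.so needs to live
        'ld_library_path' => getenv('LD_LIBRARY_PATH'),   // assumption: may need to include the instantclient directory
    );
}

$status = oci8_status();
echo 'oci8 loaded: ', ($status['loaded'] ? 'yes' : 'no'), "\n";
echo 'extension_dir: ', $status['extension_dir'], "\n";
echo 'LD_LIBRARY_PATH: ',
    ($status['ld_library_path'] === false ? '(not set)' : $status['ld_library_path']), "\n";
```

Running it both from the shell and from a page served by Apache is worthwhile, since the two can see different environments.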
Unless you've been watching closely, this announcement was easy to miss. Sun Microsystems is acquiring MySQL. This has ramifications both good and bad.
This will likely affect a huge number of people who are currently using open source web applications; a majority of which are being stored on MySQL databases. Their future viability is now questionable. It all depends on the license and revenue models Sun chooses to adopt.
I would also try to steer clear of the pending 6.0 release as it is likely to involve significant re-structuring of the code to suit Sun's business requirements. It may be a year or three before it stabilises again. Sun is legendary for introducing layers of bureaucracy into development projects.
While Sun may make public announcements of their intent to continue to provide the product for free [and it should be noted that there was no such announcement in the press release], it is difficult to imagine the corporate bean counters not making a recommendation to derive as much revenue stream as possible from the acquisition.
You can read the announcement here.
Also of potential interest is this (dated) history of MySQL
Looks like I got sidetracked from my original mission to use this website as an xml playground to explore and develop new communications technologies, and instead wrote a social portal that hardly anybody cares about. That was a few years ago now. Well, I haven't given up. It just took a while to reach the state where I can get beyond the user-interface plumbing and get back to the machine interfaces which is where the fun is.
Feeds have improved a lot. I'm using Atom paging now. Still holding off on atom-thread for comments since I can do it so much easier embedding into the articles - though I note that the latest Firefox parses atom-thread just fine. No use forcing it on the public until a few more feedreaders have jumped on board. The code has been working for a year or two, but I'm just waiting for the rest of the world to catch up before I turn it back on.
I've been playing with a weblog export tool that's basically an Atom feed, but replaces images and attachments with inline data: URL's. Have had a few glitches - including a PHP bug in the regular expression library that I need to report. But this in theory can let you take an entire weblog and move it elsewhere as one gigantic XML file. Everything. Images, attachments, comments, categories, the whole nine yards.
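To make the data: URL idea concrete, here is a rough sketch of inlining an image into exported markup. It is not the export tool itself: make_data_url() and inline_images() are hypothetical names, the sample path and bytes are invented, and it uses a plain str_replace() on src attributes rather than the regular expressions mentioned above.

```php
<?php
// Build a data: URL from raw bytes and a MIME type.
function make_data_url($bytes, $mimeType)
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Replace src="..." attributes whose URL appears in $files.
// $files maps a URL to array('mime' => ..., 'bytes' => ...).
function inline_images($html, array $files)
{
    foreach ($files as $url => $file) {
        $dataUrl = make_data_url($file['bytes'], $file['mime']);
        $html = str_replace('src="' . $url . '"', 'src="' . $dataUrl . '"', $html);
    }
    return $html;
}

// Illustrative input only: 'GIF89a' is just the GIF header, not a complete image.
$html  = '<img src="/img/dot.gif" alt="dot" />';
$files = array('/img/dot.gif' => array('mime' => 'image/gif', 'bytes' => 'GIF89a'));
echo inline_images($html, $files), "\n";
```

In a real export the bytes would come from the stored attachment, and the result would be placed inside the Atom entry content.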
I've also got Atom Publishing Protocol support in a very primitive state (but not yet ready for prime time). This is a big effort and I don't expect to be finished for a few months. I've got a suitable framework, but this site works a bit differently than the model used by the atom publishing spec. It will take a while to resolve all the differences so it plays nicely. This would for instance allow you to import your entire weblog from elsewhere in the world - especially one that used data: URLs to bring in images and attachments. Otherwise if I use the default model, I've got to package everything into a workspace for export, and this takes more than one file to represent all the structures completely. But that's the big picture - on a smaller scale, you should soon be able to publish weblog posts from your cell phone, or sync new articles with another weblog you may have. I'm also not bothering with the xml-rpc remote mechanisms for publishing. They're primitive now, the api's too fragmented, and pretty much dead.
Oh yeah, and we've got trackbacks now - for any weblog that allows non-member comments. This is a flavor of xml-rpc. It isn't a big deal, but a few folks have requested it. You can find the trackback URL in the 'more actions' menu of articles - that is for any member weblogs that allow them. Mine does.
Oh, and photo albums can now be exported as zip files. That has nothing to do with XML...
Still struggling with device drivers on Windows Vista. The sound card drivers have an update, but I'm skeptical. Several folks reported BSOD when they installed it.
And I've lost any good feelings I had for Debian. Recently I moved my old RedHat installation to a newer PC - one that was only 8 years old rather than 10. All went extremely smooth. On bootup, it found the new motherboard, new network card, mouse, monitor, etc. - and configured all of them. Everything worked fine.
Then I upgraded to Debian. The RedHat was a couple of years old, and I didn't want to mess with building PHP, MySQL, and Apache upgrades as well. Just boot up a newer Linux. Debian is currently one of the more popular Linux flavors - and I especially like the APT package management utility. Need PHP? 'apt-get install php'. You don't need to build and configure it and mess with library dependencies. These are all taken care of. If it needs new libraries, these are installed as well as any libraries that they depend on.
Anyway, now (a couple of weeks later) I put in another newer PC - this time only 4 years old. I was expecting everything to go smoothly like it did last time. But it didn't. Debian doesn't have very good hardware (re-)detection, and they also don't load any other drivers than what is absolutely necessary. So I'm faced with an incomplete operating system that doesn't recognize the monitor or ethernet card. And I can't load in the modules for these devices over the net, because it doesn't recognize the network card. It's a Catch-22.
The only solution now is a re-install. Spend a few weeks getting everything configured and then start over. Right. I've been here before. Way too often...
But if you're one of those folks considering moving away from RedHat/Fedora, beware. It's nice to be able to plop your disks into another box if the one you've got goes bad - and keep running. Debian won't do this.
Tomorrow is Daylight Savings. Remember 'Spring forward, fall back'? That's right - tomorrow we move it forward, no matter how odd that may seem. It's October, but it's spring.
At least the Australian government hasn't been mucking with and tweaking DST as it did before the 2000 Olympics. The software engineers need time to code in the changes - I think that a lot of the world now has Sydney time right.
Well that would be anybody using the Olson timezone databases. I know personally about thirty web services which just give you a choice of 'GMT+10' - and these are all going to be wrong tomorrow. On the bright side, I really don't care if they get it wrong. I'm not using any of them for anything globally time sensitive. It always makes my head hurt trying to figure out how many hours I'm going to be away from GMT with all the conversions and tweaks in effect. I suppose it'll probably be GMT+11. One hour forward. But wait, we'll then be one hour closer to Greenwich, England as the earth spins. Not further from it. So maybe it's GMT+9. Silicon Valley will be... Uh, I give up. It's in negative GMT and the time is going back. So is it forward or backward? I'll have to figure it out on paper to work out the difference between LA (where this server is) and Sydney (close enough to where I am).
But this will also give a good test of my own daylight savings and timezone functions (which use Olson tables). The U.S. is going one hour back and we're going one hour forward. I might be poring over the code tomorrow if something gets askew.
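For anyone who would rather not work the delta out on paper, here is a small sketch using PHP's built-in DateTimeZone class (PHP 5.2 and later), which reads the same timezone database. It is not the site's own timezone code: hours_ahead() is a made-up name, and the two dates are arbitrary examples picked to fall clearly inside each season.

```php
<?php
// How many hours zone A is ahead of zone B at a given UTC instant.
// hours_ahead() is a hypothetical helper for illustration.
function hours_ahead($zoneA, $zoneB, $utcTime)
{
    $when = new DateTime($utcTime, new DateTimeZone('UTC'));
    $a = new DateTimeZone($zoneA);
    $b = new DateTimeZone($zoneB);
    // getOffset() returns the zone's offset from UTC in seconds at that instant.
    return ($a->getOffset($when) - $b->getOffset($when)) / 3600;
}

$sydney = new DateTimeZone('Australia/Sydney');
$summer = new DateTime('2020-01-15 00:00:00', new DateTimeZone('UTC'));

// Sydney's own offset during its daylight saving period, in hours.
echo 'Sydney offset in January: GMT+', $sydney->getOffset($summer) / 3600, "\n";

// Sydney vs. Los Angeles in the southern summer and the northern summer.
echo 'January gap: ', hours_ahead('Australia/Sydney', 'America/Los_Angeles', '2020-01-15 00:00:00'), " hours\n";
echo 'July gap: ', hours_ahead('Australia/Sydney', 'America/Los_Angeles', '2020-07-15 00:00:00'), " hours\n";
```

So moving the clock forward puts Sydney at GMT+11, and the gap to Los Angeles widens because both places shift in opposite directions.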
In fact the time changed here - I was just a bit premature on when it changes there. They used to try and change the whole world on the same day, but you're right. Last year's energy act messed up that part of it.
No worries. Everything seems to be working. It just means I'll have to go through all of this again when you folks change over. I won't bother calculating the delta right now, since it's in a temporary state. It's nice to know the delta before I make a phone call overseas. Nothing worse than 'Hello? Who is this? It's 3 in the morning!'
Mike
Now that I am properly awake, I see that your comment: "that the time had changed here" was written on Sunday morning - sorry if I implied you were a day ahead.
That's the other problem with Daylight Saving changeover. Body awake, but brain not yet awake.
Denis
This had me pulling my hair out yesterday, so I thought I'd share the experience with enough key terms that the next person pulling their hair out will find it.
I was installing my CMS software on a work machine. I'll likely be doing additional development on it, and the university is the best place to do this. But that's neither here nor there. My software is designed around 'clean URLs', which means that what you see in the URL bar isn't (usually) littered with code and operating system artifacts. So for instance to post to my weblog, I go to the URL /post/weblog, not something like post.php?op=weblog.
To accomplish this, I use an Apache webserver module called 'mod_rewrite', which takes care of the nitty details of this process. Mod_rewrite is not without its faults, but that's the subject of another article. It does the job. The biggest thing it does is let you leave out the '.php', except I'm letting it do a whole lot more.
Anyway I'll cut to the chase. My software was horribly broken after installing it yesterday. It took hours to figure out why. Something else was trying to provide clean URLs and strip '.php' from places where I actually needed to have it in order for things to work. Well that's not technically correct either. It was actually executing PHP files by URL without the extension. Except that these were 'include files'. They weren't meant to be executed directly. They were meant to be included in something else, and the something else was managed by mod_rewrite. The something else was never getting called.
This was inconceivable. Nothing in any of the system release notes said anything about some magic new clean URL ability. This was on Debian (edge). Apache, PHP, MySQL. I tried all the sites. I googled for everything I could think of. Clean URL debian. Clean url apache. '.php not needed'. mod_rewrite. strip file extension. ForceType. (ForceType also lets you execute files without providing a filename extension). I scoured the last several months of Apache release notes, to no avail.
Finally after several hours I happened upon a little gem of a snippet on an obscure website. 'Turn on Multiviews instead of ForceType'. Debian has multiviews turned on by default, but this is the first I'd heard of it. I had assumed (never do this of course) that it was yet another fancy mod_dir option or something I didn't care about.
No. Multiviews is a slick trick for Apache that takes any pathname, and if it thinks it can find a page to return, it returns it. It uses the basename of the file in the URL and if there's no file, it looks for the filename with an extension. Any extension. Then it sends the file back. So it gives you a clean URL. You type in 'index' and it will send back 'index.htm' or 'index.html' or 'index.php' or 'index.pl' or 'index.shtml'. You get the idea. You can test this on any site that has multiviews turned on by asking for 'index' and see if you actually get a page. Normally you wouldn't.
If the URL is 'post' and there's a file in the directory called 'post.php' it will send that file back even if you don't want it to. So I'll let you research it further if multiviews is what you want. It's actually pretty cool. In my case I had to disable it.
Options -Multiviews in the .htaccess did the trick and made everything work again.
I've been thinking about how to do this for ages. Finally did it. You'll find some new stuff in the menus called 'Sharing'. No, it isn't about that half of a ham sandwich you're having for lunch.
Over the past few months you've gained the ability to subscribe to various information sources and recently to configure what features you desire from the website. I mentioned this ability months ago in my grand vision - kinda' like myspace on acid. Well I finally finished the thing which takes that vision and turns it into reality. When you create these private views of the website, it's like having your own private Content Management System (CMS). In fact that's the way I like to describe it to people. There are lots of folks peddling multi-user community CMS's. Yada, yada. But nobody that I'm aware of is selling or even working on a 'personal CMS as part of a multi-user community'.
So what's this 'Sharing' all about? It's simple. You spend the effort to configure your personal content system whatever way you want. Then you can turn around and let other people use it.
See with MySpace, you get a page that's all yours to mess with. Some of the other community sites give you a few more things. But this lets you build your own complete community site from within a broader community platform. You can make this look like an auto racing site and when people visit your shared page, and then any other page on the site, that's what they'll see. You're in total control until they go elsewhere or turn it off. You control the menus, you control the skin, and you control all of the information sources for your space. They can be your info sources, they can be my info sources, or pull in whatever feeds you want from anywhere. Want your audience to have mail? Chat? And discuss the Confederate War? Two minutes. Photo dating website? Two minutes.
The latest feature drop is a parental controls infrastructure. You can find out more on (towards the bottom of) the help page. One of the key components is a tool to allow you to turn any of the site features on or off - either for you or for your children. This combined with subscriptions lets you fine tune exactly what you wish to see when you visit. I've been using this immediately to provide different feature sets to the music portal vs. the community portal vs. software engineering portals.
As always, this is just a hint at the types of things you can do. Any member for instance can turn off chat, the photo ratings, guitar chord charts, or anything else they don't care about.
I think I'm finally running out of website features to implement...
Actually I think the bug is in Apache, but whatever.
Let's say you've got a site like this that uses clean URL's.
http://someplace.somewhere/something/more/stuff
Now let's say you wanted to insert a category name in the middle of this URL, and the category name contains a slash. Let's say 'more/less' instead of 'more'. But you can't change the number of slashes, because in the example 'stuff' isn't a category, it's something else.
So what you would normally do in PHP (and many other languages) is urlencode() the name. This gives you a category that looks like 'more%2Fless'. Now you can just urldecode() it and turn it back into a legal category name without messing up the URL.
But the problem is that if you use mod_rewrite to support clean URL's, it currently decodes the URL in the process of doing its work - before you ever see it. So there's no way of knowing if a category has a slash or not. Ditto for hash and several other characters. It turns out the bug is actually in Apache, which is decoding the URL before it hands it off to mod_rewrite, but that part doesn't matter. If you don't use mod_rewrite you won't see the bug. Some of us have to use it though.
This violates the primary law of encoding information - you must have one and only one decoder for every encoder. Apache/mod_rewrite is decoding something it has no right to touch.
Fortunately, there's a way out of this mess, but it's very non-standard. You have to further encode the URL so that it can't get automatically decoded by the middle software layers. You could just double encode it which works today, but then if they ever fix the bug, you'll end up with a bad decoding. I'm currently turning %2F into ^2F, since ^ is one of the few characters which isn't normally used in a URL. This then gets encoded by the browser to be %5E2F. It doesn't matter if the %5E part is turned back into a ^, or even decoded more than once (the second and subsequent decodes will essentially be a no-op). All that matters is that the slash remains encoded all the way through this hostile communications channel.
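A rough sketch of the caret trick, assuming the encode and decode steps look roughly like this. The function names are hypothetical. One limitation worth stating plainly: a category name that itself contains '^2F', or a literal '%' or '+', could be mangled by this scheme, so treat it as an illustration of the idea rather than a complete encoder.

```php
<?php
// Encode one category name for use as a single URL path segment.
// The encoded slash (%2F) is rewritten as ^2F so an early decode cannot turn it back into '/'.
function encode_segment($name)
{
    return str_replace('%2F', '^2F', urlencode($name));
}

// Decode a segment as received by the script. It may arrive as 'more^2Fless'
// (already decoded by the server) or 'more%5E2Fless' (not yet decoded).
function decode_segment($segment)
{
    return str_replace('^2F', '/', urldecode($segment));
}

$encoded = encode_segment('more/less');
echo $encoded, "\n";

// Simulate the server decoding the URL before the script sees it.
$afterServer = urldecode($encoded);
echo decode_segment($afterServer), "\n";

// The same name after a browser has escaped the caret.
echo decode_segment('more%5E2Fless'), "\n";
```

The segment never contains a real '/' or a '%2F' while in transit, which is the whole point: the number of slashes in the URL stays fixed no matter how many times the middle layers decode it.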
What an ugly hack.
I'd go into apache and fix it, but that won't help me in the short term. I'd have to wait for the patch to get rolled into a future release, and then wait for my service provider to pick up the later release. ...That could be a year or more unless some urgent security bug pops up. So I guess I'd better get used to living with this hack.
I can write most any kind of software and usually do it pretty well. But there are times when it's better to let somebody else do the dirty work. In this case it's so-called Cross Site Scripting or XSS. For a community site such as this it's a nightmare - but one which refuses to go away.
In simple terms, it's Javascript injection. If you can get code onto a page, somebody will execute it by visiting that page, and one can exploit the fact that somebody is running their code. These exploits can range from minor infractions to serious felonies, and you can stick the code most anywhere that you can type something and have it show on a web page.
I had several regex's setup to stop XSS and still allow HTML authoring, but it turns out that the browsers have too many holes to plug with a few regex's. The XSS hack which took down myspace.com was instigated by putting javascript code into a stylesheet and breaking up the word jav a sc r ipt. Internet Explorer gladly packed it back together and ran the code. IE will also execute the same code written in hexadecimal. You can't keep writing regex's to stop all this stuff. Regex's aren't the correct tool for the job (they are part of the solution, but not the total solution). At some point it requires an HTML parser to take the 'bad' HTML one character at a time, and rebuild it into good HTML.
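Here is a small demonstration of why a handful of regexes falls short. The filter is deliberately naive and is my own invention, not one of the regexes this site used, and the sample inputs are illustrative. Whether a given browser would actually execute the obfuscated forms depends on the browser; the point is only that the filter never sees them.

```php
<?php
// A deliberately naive filter: strip the literal text "javascript:" wherever it appears.
function naive_filter($html)
{
    return preg_replace('/javascript\s*:/i', '', $html);
}

// The obvious form, which the filter catches.
$plain  = '<div style="background:url(javascript:alert(1))">';
// The keyword split by a newline, in the spirit of the stylesheet trick described above.
$split  = "<div style=\"background:url(java\nscript:alert(1))\">";
// The first letter written as a hexadecimal character reference.
$entity = '<a href="&#x6A;avascript:alert(1)">x</a>';

foreach (array('plain' => $plain, 'split' => $split, 'entity' => $entity) as $label => $input) {
    $result = (naive_filter($input) === $input) ? 'passed through untouched' : 'modified';
    echo $label, ': ', $result, "\n";
}
```

Each new pattern added to the filter invites another spelling that slips past it, which is why a parser that rebuilds the markup is the sturdier approach.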
There are four possible solutions:
- Ignore the problem and hope it goes away. It won't.
- Do away with HTML authoring completely and either force everybody to learn another tag system or just force everybody to use plain text.
- Write an HTML language parser to rebuild the code based on every historical variation of HTML which might be encountered.
- Let somebody else write this parser.
I hate writing parsers, and in this case the task is to write a parser which duplicates the code flow of the most horrifically buggy web browsers.
So I went with number 4...
PS> I found one developer website which seriously recommends using 'strip_tags' in PHP to make your site safe from XSS attacks. It won't, because strip_tags doesn't recurse. One can embed tags within tags and blow right through it. They should be shot. If you'd like to have a look at the number of ways that hackers can blow through your security, visit http://ha.ckers.org/xss.html
So would anybody be interested in obtaining a mySQL import file containing the complete contents of fortune-mod? If you have need of such a thing, you'll know what to do with it. I whipped this up after finding that the latest fortune-mod package now takes about 7 seconds on a fast machine to produce a cookie. This is totally unacceptable. The reason for this is that some recently added command line switches let you specify the probability that any particular category will be chosen. In order to calculate this number, the program by default loads in all the fortunes from all the category files so it can count them all.
This collection belongs in a database. Randomly selecting one element from within a collection of elements and then from within a collection of flat files (after counting each entry and then assigning probability weighting to each file) is ludicrous.
So I looked all over the web for a mySQL table containing the fortune database but couldn't find a single one. Weird. In my travels I found a tool to add fortunes to mySQL but it was pretty badly written. So I started over and did all the dirty work for you. Use these tables as you wish.
First file is fortune-mod.1.99 in a table named 'fortune' with two elements, 'id' and 'fortune'. 'id' auto-increments.
Second file is the 'offensive' tree in a table named 'fortune2' which is otherwise identical to the first. You can keep them separate, merge them, whatever.
You can pull out a fortune cookie with the following SQL - just point $table at either fortune or fortune2.
SELECT `fortune` FROM `$table` ORDER BY RAND() LIMIT 1
To display on the web, you'll need to escape the HTML entities and translate linefeeds (or it will look like mush). I use something like this... where $cookie is the raw fortune from the SQL query:
str_replace(array("&","<",">",'"',"\n"),
array("&amp;","&lt;","&gt;","&quot;","<br />"),
$cookie);
The third file (fortune-mod.sql.gz) is the complete fortune-mod.1.99 in a single table. This table contains a couple of extra fields. The 'category' field contains the original fortune-mod category name, such as 'quotations' or 'zippy'. There is also an 'offensive' field which is an integer 0 for normal fortunes, 1 for offensive fortunes. Making use of these extra fields is left as an exercise for the reader. One more thing... the third file also has been stripped of all the fortunes containing backspace characters. These might be OK for a text terminal, but they look pretty ugly on the web.
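One possible take on the exercise, as a sketch only. The 'category' and 'offensive' field names come from the description above, but the table name passed in is an assumption (the combined table isn't named here), and random_fortune_query() is a hypothetical helper. It only builds the SQL and a parameter list; the '?' placeholder is meant for a prepared statement in whatever database layer you use.

```php
<?php
// Build a random-fortune query against the combined table.
// Returns array($sql, $params); $params holds values for the '?' placeholders.
function random_fortune_query($table, $category = null, $allowOffensive = false)
{
    $where  = array();
    $params = array();
    if (!$allowOffensive) {
        $where[] = '`offensive` = 0';      // 0 = normal, 1 = offensive
    }
    if ($category !== null) {
        $where[]  = '`category` = ?';      // e.g. 'quotations' or 'zippy'
        $params[] = $category;
    }
    // Strip backticks so the table name cannot break out of its quoting.
    $sql = 'SELECT `fortune` FROM `' . str_replace('`', '', $table) . '`';
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    return array($sql . ' ORDER BY RAND() LIMIT 1', $params);
}

// 'fortune' as the table name is a guess for illustration.
list($sql, $params) = random_fortune_query('fortune', 'zippy');
echo $sql, "\n";
```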
Enjoy.
fortune.sql.gz
A general format for Atom that allowed cross-references in URLs (like cid:) would be useful both for images and other attachments and for related feeds like comments, trackback snippets, etc.
Yeah, there are issues. I'm just trying to figure out how to get there from here. Right now data: URLs are the only way I can come up with to encapsulate everything. If a few people adopted it, it might be viable. At least everything to export an entire blog in a single file would be standards compliant. I'd be glad to see something better...
Hey congrats - I hear you're at Google now....