I tried to save myself the normal hassle of creating a regular expression to convert any relative URLs that appear in my RSS feeds to absolute form. Writing regexes is often a painful and time-consuming process. Besides, this is a problem that must've been solved before. Right?
Turns out that it has, but not as often as I would have thought. Buried on page 29 of the search results I finally found a cute little regex that did what I wanted.
Well, not quite. A beautiful function, but hopelessly broken. You see, the author (I won't embarrass him by naming him in public) wrote something like '[^http|ftp|https]' to dig out an existing protocol string. Brackets in Perl-compatible regexes (just like eregs) denote a character class, not a list of alternatives, so that expression matches any single character other than 'h', 't', 'p', 'f', 's', or '|'. The function therefore didn't work if the relative URL began with any of those characters.
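To make the difference concrete, here is a small sketch comparing the bracketed pattern with a negative lookahead. The variable names and sample URLs are made up for illustration; they are not taken from the function I found.

```php
<?php
// Character class: matches ONE character that is not h, t, p, |, f, or s.
$broken = '/^[^http|ftp|https]/';
// Negative lookahead: succeeds only when the string does not start with a listed scheme.
$fixed = '/^(?!https?:|ftp:)/';

// Hypothetical sample values: two relative URLs and one absolute URL.
$urls = ['images/logo.png', 'faq.html', 'http://example.com/a.png'];
foreach ($urls as $url) {
    // A result of 1 means "treated as relative", 0 means "left alone".
    printf("%-26s broken=%d fixed=%d\n", $url, preg_match($broken, $url), preg_match($fixed, $url));
}
```

The relative URL 'faq.html' starts with 'f', so the bracketed pattern rejects it even though it has no protocol at all, while the lookahead version accepts it.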
So much for trying to save effort. Whenever I do this I end up with a function that is just plain broken. Sigh... You get what you pay for. Anyway, I fixed the function. If this applies to what you're doing, at least it works better than the one I originally found.
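<?php
// Convert relative href/src values in <a> and <img> tags to absolute URLs
// by prefixing them with $base.
function reltoabs($text, $base)
{
    if (empty($base))
        return $text;
    // base url needs trailing /
    if (substr($base, -1, 1) != "/")
        $base .= "/";
    // Replace links, skipping URLs that already start with a protocol.
    // The lookahead includes "://" so that relative paths which merely begin
    // with the letters "http" or "ftp" (e.g. "httpdocs/x.html") are still converted.
    $pattern = "/<a([^>]*) href=\"(?!(?:https?|ftp):\/\/)([^\"]*)\"/";
    $replace = "<a\${1} href=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Replace images
    $pattern = "/<img([^>]*) src=\"(?!(?:https?|ftp):\/\/)([^\"]*)\"/";
    $replace = "<img\${1} src=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Done
    return $text;
}
?>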
applications where reliability is critical, i.e., nuclear power stations,
cruise missiles, early warning systems, anti-ballistic missile defense
systems. The next rocket to go astray as a result of a programming language
error may not be an exploratory space rocket on a harmless trip to Venus:
It may be a nuclear warhead exploding over one of our cities. An unreliable
programming language generating unreliable programs constitutes a far
greater risk to our environment and to our society than unsafe cars, toxic
pesticides, or accidents at nuclear power stations.
- C. A. R. Hoare
reltoabs.php.txt
preg_replace("/(href|src)=\"(?!http|ftp|https)([^\"]*)\"/", "$1=\"$base\$2\"", $text);
Sup -
I thought about doing that but only wanted to apply the absolute reference to named anchors and images. The attributes 'href' and 'src' are used in other tags (href is used in link tags and src in script tags for example). But now that you've brought it up, a relative URL is a relative URL. It's probably a good thing to convert all of them, no matter what kind of element they might be attached to.
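For anyone who wants to go that route, a rough sketch of the "convert them all" idea might look like the following. The name reltoabs_all is hypothetical, and the rules for what counts as "already absolute" (any scheme such as http: or mailto:, protocol-relative // URLs, and in-page # anchors) are my own assumptions rather than anything from the comment above. It still just prepends the base, so it is not full RFC 3986 resolution: '../' segments and root-relative paths are not resolved properly.

```php
<?php
// Hypothetical helper (not the function from the post): prefixes every
// relative href/src value with $base, whatever element the attribute is on.
function reltoabs_all($text, $base)
{
    if (empty($base)) {
        return $text;
    }
    // Normalise the base so it ends in exactly one slash.
    $base = rtrim($base, '/') . '/';

    // The lookbehind keeps attributes like data-src from matching.
    // The lookahead skips values that already carry a scheme (http:, mailto:, ...),
    // protocol-relative URLs (//host/...), and in-page anchors (#...).
    $pattern = '/(?<![\w-])(href|src)="(?![a-z][a-z0-9+.\-]*:|\/\/|#)([^"]*)"/i';

    // A callback sidesteps any "$" or backslash characters in $base being
    // interpreted as backreferences in a replacement string.
    return preg_replace_callback($pattern, function ($m) use ($base) {
        // ltrim avoids a doubled slash for values such as "/a.png".
        return $m[1] . '="' . $base . ltrim($m[2], '/') . '"';
    }, $text);
}

echo reltoabs_all('<a href="faq.html">FAQ</a> <img src="http://example.com/x.png">', 'http://example.org/blog');
```

In this example only the href is rewritten (to http://example.org/blog/faq.html); the image already has an absolute URL and is left alone.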