Mike Macgirvin
Diary and Other Rantings
Beyond Silicon Valley
   
Monday, May 12 2008, 04:48 pm
Apr 02, 2006
Relative to absolute URLs in PHP

I tried to save myself the usual hassle of writing a regular expression to convert any relative URLs that appear in my RSS feeds to absolute form. Writing regexes is often a painful and time-consuming process. Besides, this is a problem that must've been solved before. Right?

Turns out that it has, but not as often as I would have thought. Buried on page 29 of the search results I finally found a cute little regex that did what I wanted.

Well, not quite. A beautiful function, but hopelessly broken. You see, the author (I won't embarrass him by naming him in public) wrote something like '[^http|ftp|https]' to dig out an existing protocol string. Brackets in Perl-compatible regexes (just like in eregs) denote a character class, not a group of alternatives, and the leading '^' negates that class. So this function didn't work if the relative URL began with any of the characters 'h', 't', 'p', 'f', 's', or '|'.
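To make the difference concrete, here is a minimal sketch that compares the bracketed pattern with a negative lookahead. The two helper names and the sample URLs are made up for illustration; they are not from the function I found.

```php
<?php
// Both helper names are hypothetical, used only for this comparison.

// Broken: the brackets form a negated character class, so this only checks
// that the first character is not one of h, t, p, f, s or |.
function looks_relative_broken($url)
{
    return preg_match('/^[^http|ftp|https]/', $url) === 1;
}

// Intended: a negative lookahead rejects values that start with the whole
// words http, ftp or https.
function looks_relative_fixed($url)
{
    return preg_match('/^(?!http|ftp|https)/', $url) === 1;
}

// Print how each pattern classifies a few sample values.
foreach (array("images/pic.png", "page.html", "http://example.com/") as $url) {
    printf("%-22s broken=%s fixed=%s\n", $url,
        var_export(looks_relative_broken($url), true),
        var_export(looks_relative_fixed($url), true));
}
```

The bracketed version misclassifies "page.html" as already absolute, simply because it starts with a 'p'.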

So much for trying to save effort. Whenever I do this I end up with a function that is just plain broken. Sigh... You get what you pay for. Anyway, I fixed the function. If this applies to what you're doing, at least it works better than the one I originally found. 

<?php
function reltoabs($text, $base)
{
    if (empty($base))
        return $text;
    // base url needs trailing /
    if (substr($base, -1, 1) != "/")
        $base .= "/";
    // Replace links
    $pattern = "/<a([^>]*) href=\"(?!http|ftp|https)([^\"]*)\"/";
    $replace = "<a\${1} href=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Replace images
    $pattern = "/<img([^>]*) src=\"(?!http|ftp|https)([^\"]*)\"/";
    $replace = "<img\${1} src=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Done
    return $text;
}
?>
 

 

Categories: software
Comments:

May 25, 2006 14:02
vision
Hi, here is a modified version that also handles <link> tags:

function reltoabs($text, $base)
{
    if (empty($base))
        return $text;
    // base url needs trailing /
    if (substr($base, -1, 1) != "/")
        $base = $base."/";
    //echo $base;
    // Replace links
    $pattern = "/<a([^>]*) href=\"(?!http|ftp|https)([^\"]*)\"/";
    $replace = "<a\${1} href=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Replace images
    $pattern = "/<img([^>]*) src=\"(?!http|ftp|https)([^\"]*)\"/";
    $replace = "<img\${1} src=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    // Replace link tags
    $pattern = "/<link([^>]*) href=\"(?!http|ftp|https)([^\"]*)\"/";
    $replace = "<link\${1} href=\"" . $base . "\${2}\"";
    $text = preg_replace($pattern, $replace, $text);
    $text = str_replace("../", "", $text);
    // Done
    return $text;
}

It can be expanded to accept even JS ... :-)

sup
August 31, 2006 05:50
Thanks, dude. This is just what I was looking for. I took what you had and condensed it into a single replace which handles both href and src:
preg_replace("/(href|src)=\"(?!http|ftp|https)([^\"]*)\"/", "$1=\"$base\$2\"", $text);


mike (Mike Macgirvin)
August 31, 2006 07:00

Sup -

I thought about doing that but only wanted to apply the absolute reference to named anchors and images. The attributes 'href' and 'src' are used in other tags (href is used in link tags and src in script tags for example). But now that you've brought it up, a relative URL is a relative URL. It's probably a good thing to convert all of them, no matter what kind of element they might be attached to.
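Converting every href and src might look something like the sketch below. The function name reltoabs_all is hypothetical, and so is the wider lookahead, which skips any value that already has a scheme (http:, mailto: and so on), starts with // or is only a #fragment. Like the original, it does not resolve root-relative paths or "../" segments; it just prepends the base.

```php
<?php
// Hypothetical helper, not part of the original post: rewrites every relative
// href or src value, whatever element it is attached to.
function reltoabs_all($text, $base)
{
    if (empty($base))
        return $text;
    // base url needs trailing /
    if (substr($base, -1, 1) != "/")
        $base .= "/";

    // Skip values that already carry a scheme (http:, mailto:, ...), that are
    // protocol-relative (//host/...), or that are fragment-only (#top).
    $pattern = '/\b(href|src)="(?![a-z][a-z0-9+.\-]*:|\/\/|#)([^"]*)"/i';

    // A callback means "$" or "\" characters in $base need no escaping.
    return preg_replace_callback($pattern, function ($m) use ($base) {
        return $m[1] . '="' . $base . $m[2] . '"';
    }, $text);
}

// Usage with made-up markup and a made-up base URL.
$html = '<a href="about.html">About</a> <script src="js/app.js"></script> '
      . '<a href="mailto:me@example.com">Mail</a>';
echo reltoabs_all($html, "http://example.com/blog"), "\n";
```

The callback form is a design choice rather than a necessity; a plain replacement string works too, as long as the base URL contains nothing that preg_replace would read as a backreference.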

