Tag Archives: HTML

jQuery: How to extract a tag from an HTML response

I’m making a website and using AJAX for some things. Sometimes requests fail and return custom error pages. I made those pages to be helpful, but since you can only see them in the browser’s developer console, they were a bit of a hassle to look at.

To make the errors easier to see, I figured I could just parse the returned HTML, extract the message I knew was there, and insert it into the page.

And you’d think the following would work fine:

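$(document).ajaxError(function(event, jqxhr, settings, error) {
    // Find the message in the response HTML
    var m = $(jqxhr.responseText).find('#message');

    // Except .find() doesn't find anything, so m is an empty set

    // And we replace our DOM with nothing
    // (assuming the page has its own #message element)
    $('#message').replaceWith(m);
});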

But… No… Apparently, since the response was a complete HTML page, i.e. including html, head and body tags, jQuery was getting a bit tripped up when trying to parse it. I’m actually not sure whether it’s jQuery or the browser’s native parsing behind it that’s causing this, but where there’s a will, there’s a way:

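$(document).ajaxError(function(event, jqxhr, settings, error) {
    // Find the inner HTML of the body tag
    var body = /<body[^>]*>([\s\S]+)<\/body>/i.exec(jqxhr.responseText);

    // Bail out if the response has no body tag
    if ( ! body) return;

    // Parse the HTML
    body = $.parseHTML(body[1]);

    // Append the HTML to a non-special root tag
    body = $('<output>').append(body);

    // And *now* we can finally find our message
    var message = body.find('#message');

    // And add it to our DOM
    // (assuming the page has its own #message element)
    $('#message').replaceWith(message);
});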

PHP: How to get all images from an HTML page

I was curious as to how I could make something similar to what Facebook does when you add a link. Somehow it loads the images found on the page your link leads to, and then presents them so you can select the one you want to use as a thumbnail.

Well, step one in solving this is of course to find all the images on a page, and that is what I will present in this post. It will be sort of like a backend service we can use later from an AJAX call: you post it a URL, and you get back all the image URLs it found. Let’s put the pedal to the metal!

PHP: Dealing with absolute and relative URLs

I’m currently writing a post on how to get image tags from a remote HTML page using PHP. One sticky issue with that is that the image URLs you find are a joyful mix of absolute and relative URLs.

Luckily, I found a function on nashruddin.com which seems to handle them all right. After a bit of cleanup and fixing an error, we have this function:

function make_absolute($url, $base)
{
    // Return base if no url
    if( ! $url) return $base;

    // Return if already absolute URL
    if(parse_url($url, PHP_URL_SCHEME) != '') return $url;
    // Urls only containing query or anchor
    if($url[0] == '#' || $url[0] == '?') return $base.$url;
    // Parse base URL and convert to local variables: $scheme, $host, $path
    extract(parse_url($base));

    // If no path, use /
    if( ! isset($path)) $path = '/';
    // Remove non-directory element from path
    $path = preg_replace('#/[^/]*$#', '', $path);
    // Destroy path if relative url points to root
    if($url[0] == '/') $path = '';
    // Dirty absolute URL
    $abs = "$host$path/$url";
    // Replace '//' or '/./' or '/foo/../' with '/'
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}
    // Absolute URL is ready!
    return $scheme.'://'.$abs;
}

I can sort of read through it and see what it does, but I can’t explain it very well, so I’ll just leave it at that. So far it has worked fine for me. There may be some corner cases it misses, and if there are, please let me know!

💡 What I added to the original function were the two guard checks. The first (“Return base if no url”) prevents it from crashing if the URL is null or empty, and the second (“If no path, use /”) prevents it from crashing if parse_url finds no path, for example if the base URL is http://www.example.com (no /whatever at the end).
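
One way to hunt for those corner cases is to compare the function’s output against a resolver that follows the URL standard. Below is a minimal sketch in JavaScript (not PHP), using the built-in URL constructor found in browsers and Node.js. The resolve wrapper and the sample URLs are made up for illustration.

```javascript
// Hypothetical wrapper around the standard URL constructor:
// resolves a possibly relative URL against a base URL.
function resolve(url, base) {
    return new URL(url, base).href;
}

// Made-up base URL and sample inputs
var base = 'http://www.example.com/directory/file.html';

console.log(resolve('images/a.png', base));  // http://www.example.com/directory/images/a.png
console.log(resolve('/images/a.png', base)); // http://www.example.com/images/a.png
console.log(resolve('../a.png', base));      // http://www.example.com/a.png
console.log(resolve('?page=2', base));       // http://www.example.com/directory/file.html?page=2
console.log(resolve('#top', base));          // http://www.example.com/directory/file.html#top
console.log(resolve('//cdn.example.com/a.png', base)); // http://cdn.example.com/a.png
```

Feeding the same pairs to make_absolute should give the same results wherever the function handles the case. Protocol-relative URLs such as //cdn.example.com/a.png look like one spot where it might not, since the function treats a leading slash as pointing to the root of the base host.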

The base tag

A tag that is easy to forget about is the base tag. The above function gets the base path from the URL given as base. For example if you gave it http://www.example.com/directory/file.html as base, it would use http://www.example.com/directory/. However, if file.html included the following base tag:

<base href="http://www.example.com/">

Then the base path would be http://www.example.com/ instead. Fun, eh?

As long as you know about it, it’s not too hard to deal with, though. You just need to get hold of it and provide that as the base instead when using the function above.
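
The logic for picking the right base can be sketched like this. It is written in JavaScript rather than PHP, purely to illustrate the idea; the helper name effectiveBase is made up, and the regex is a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: picks the base URL that relative links in a page
// should be resolved against.
function effectiveBase(html, pageUrl) {
    // Look for the first <base ...> tag that carries a quoted href attribute
    var match = /<base\b[^>]*\bhref\s*=\s*["']([^"']*)["']/i.exec(html);

    // No base tag: relative URLs resolve against the page's own URL
    if (!match) return pageUrl;

    // The href may itself be relative, so resolve it against the page URL
    return new URL(match[1], pageUrl).href;
}

var pageUrl = 'http://www.example.com/directory/file.html';
var withBase = '<html><head><base href="http://www.example.com/"></head><body></body></html>';
var withoutBase = '<html><head></head><body></body></html>';

console.log(effectiveBase(withBase, pageUrl));    // http://www.example.com/
console.log(effectiveBase(withoutBase, pageUrl)); // http://www.example.com/directory/file.html
```

In the PHP version, whatever this step returns is what you would hand to make_absolute as $base.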

Works On My Machine™! And if it doesn’t on yours, let me know. If it’s a mistake in the function, I’d like to fix it!