Iterating over HTML node siblings

Perry Mitchell / 2016-10-03 23:16:07

The DOM is great because navigating around it is a piece of cake. Browsers give you a huge array of tools for finding your way around and locating the nodes (usually elements) that you’re after.

Libraries like jQuery have done their part to flatten the learning curve for front-end developers, but the average use-case (well hell, most use-cases) for a new site or application doesn’t need functionality beyond what the built-ins provide.

Getting to the point - once you have an element in your hand (in a variable), you can quite easily navigate around it, with no need to perform query after query. Say you have an <li>...</li> - the li’s element object has a number of properties (inherited from Node) to assist your perusal of the nearby DOM.

Two such properties are previousSibling and nextSibling. Once again, these come from the Node level and point to nodes, not necessarily elements, so be careful handling the objects you find (no two DOMs are the same). For instance, myLI.nextSibling might return something like:

#text

This #text object is obviously not your average element (here nextNode holds the value of myLI.nextSibling):

nextNode instanceof HTMLElement;    // false
nextNode instanceof Text;           // true

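Checking for these guys can be tricky, and instanceof isn’t the best tool overall (it can fail for nodes that come from another frame, for example), but there is a much neater and more robust way: nextNode.nodeType holds a number identifying the kind of DOM node. For the text node above it is 3 (Node.TEXT_NODE); for an element it would be 1 (Node.ELEMENT_NODE).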
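As a rough illustration, the numeric values can be mapped back to readable names. This is only a sketch: describeNodeType is a made-up helper, not a DOM API, and it uses numeric literals and plain stand-in objects so that it also runs outside a browser. In a page you would compare against Node.ELEMENT_NODE and friends instead.

```javascript
// Hypothetical helper: describeNodeType is not part of the DOM.
// The numbers are the standard nodeType values.
var NODE_TYPE_NAMES = {
    1: "element",            // Node.ELEMENT_NODE
    3: "text",               // Node.TEXT_NODE
    8: "comment",            // Node.COMMENT_NODE
    9: "document",           // Node.DOCUMENT_NODE
    11: "document fragment"  // Node.DOCUMENT_FRAGMENT_NODE
};

function describeNodeType(node) {
    return NODE_TYPE_NAMES[node.nodeType] || "other (" + node.nodeType + ")";
}

// Stand-in objects (assumed data) so the sketch runs without a DOM
console.log(describeNodeType({ nodeType: 3 })); // "text"
console.log(describeNodeType({ nodeType: 1 })); // "element"
```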

There are also other types of nodes out there that don’t behave like elements, so make sure to account for them correctly:

function isComment(node) {
    return node.nodeType === Node.COMMENT_NODE;
}

function isText(node) {
    return node.nodeType === Node.TEXT_NODE;
}

If you’re crawling part of the DOM collecting text, for instance, then detecting text nodes may be the right way to go. In modern browsers, however, it is possible to skip non-element nodes completely by using element.nextElementSibling and element.previousElementSibling (these return only the next and previous elements, never other node types, and null when there is no such element).
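To make the sibling walk concrete, here is a minimal sketch of iterating over the elements that follow a node using only nextSibling and nodeType. The names nextElement, followingElements and linkSiblings are made up for this example, and the "nodes" are plain objects linked by hand (an assumption, so the code runs without a browser). With real DOM nodes the same two functions should behave like repeated nextElementSibling lookups.

```javascript
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

// Fallback for nextElementSibling: follow nextSibling until an element turns up.
function nextElement(node) {
    var sibling = node.nextSibling;
    while (sibling && sibling.nodeType !== ELEMENT_NODE) {
        sibling = sibling.nextSibling;
    }
    return sibling || null;
}

// Collect every element that follows `node` under the same parent.
function followingElements(node) {
    var found = [],
        current = nextElement(node);
    while (current) {
        found.push(current);
        current = nextElement(current);
    }
    return found;
}

// Hypothetical stand-in nodes: element, text, element, comment, element
function linkSiblings(nodes) {
    nodes.forEach(function(node, index) {
        node.previousSibling = nodes[index - 1] || null;
        node.nextSibling = nodes[index + 1] || null;
    });
    return nodes;
}

var siblings = linkSiblings([
    { nodeType: 1, id: "first" },
    { nodeType: 3, nodeValue: "\n    " },
    { nodeType: 1, id: "second" },
    { nodeType: 8, nodeValue: " a comment " },
    { nodeType: 1, id: "third" }
]);

var ids = followingElements(siblings[0]).map(function(el) {
    return el.id;
});
console.log(ids.join(", ")); // "second, third"
```

The whitespace text node and the comment are skipped because their nodeType is not 1, which is the same filtering nextElementSibling does for you.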

Explanation by example: here’s a getPlainText implementation that scrapes text from a DOM element using node.childNodes and node.nodeValue. Given the following DOM:

[Image: example DOM HTML]

You could write the getPlainText scraper like so:

function isElementNode(node) {
    return node.nodeType === Node.ELEMENT_NODE;
}

function isTextNode(node) {
    return node.nodeType === Node.TEXT_NODE;
}

function stripWhitespace(text) {
    return text
        .replace(/\t/g, " ")
        .replace(/\n/g, " ")
        .replace(/[ ]{2,}/g, " ");
}

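function getPlainText(node) {
    // The node is only read, never modified, so no clone is needed
    var text = "";
    if (isTextNode(node)) {
        // Text nodes carry their content in nodeValue; a literal "<br>"
        // in the text is treated as a line break
        text += node.nodeValue.replace(/<br>/gi, "\n");
    } else if (isElementNode(node)) {
        // childNodes is a NodeList, so convert it to an array before mapping,
        // then recurse into every child and join the pieces with spaces
        text += Array.prototype.slice.call(node.childNodes || [])
            .map(getPlainText)
            .join(" ");
    }
    // Other node types (comments etc.) contribute nothing
    return stripWhitespace(text.trim());
}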

var target = document.getElementById("target"),
    text = getPlainText(target);

console.log(text);

Which would output:

My heading Some starting text. Some text node content. Some sub content. End text.

Notice the Array.prototype.slice call? Some DOM properties, such as childNodes, return NodeLists, which are array-like but lack array methods such as map. Using slice we can copy them into a real array for easy manipulation.
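The slice trick works on anything array-like, which the sketch below shows with a hand-made stand-in for a NodeList (fakeNodeList is an assumption for illustration, not a real NodeList). It also shows Array.from, which does the same job where ES2015 is available.

```javascript
// Stand-in for a NodeList: indexed entries and a length, but no map()
var fakeNodeList = {
    0: { nodeType: 3, nodeValue: "Some text" },
    1: { nodeType: 1, nodeValue: null },
    length: 2
};

console.log(typeof fakeNodeList.map); // "undefined"

// slice() copies any array-like object into a real array
var asArray = Array.prototype.slice.call(fakeNodeList);
console.log(Array.isArray(asArray)); // true

var types = asArray.map(function(node) {
    return node.nodeType;
});
console.log(types.join(", ")); // "3, 1"

// Where ES2015 is available, Array.from is an alternative
var viaFrom = Array.from(fakeNodeList);
console.log(viaFrom.length); // 2
```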

Don’t be afraid of working with raw elements and nodes - their interfaces are quite intuitive!