What is Simple HTML DOM?

PHP Simple HTML DOM Parser

A fast, simple and reliable HTML document parser for PHP.

Created by S.C. Chen, based on HTML Parser for PHP 4 by Jose Solorzano.

Parse any HTML document

PHP Simple HTML DOM Parser handles any HTML document, even ones that are considered invalid by the HTML specification.

Select elements using CSS selectors

PHP Simple HTML DOM Parser supports CSS style selectors to navigate the DOM, similar to jQuery.

Download

Download the latest version from SourceForge

Contributing

Request features on the Feature Request Tracker
Report bugs on the Bug Tracker
Get involved with the community on the Discussions Board

License

PHP Simple HTML DOM Parser is Free Software licensed under the MIT License.

Index

Quick Start
How to create HTML DOM object?
How to find HTML elements?
How to access the HTML element's attributes?
How to traverse the DOM tree?
How to dump contents of DOM object?
How to customize the parsing behavior?
API Reference
FAQ

Quick Start

Get HTML elements
Modify HTML elements
Extract contents from HTML
Scraping Slashdot!

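Get HTML elements

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';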

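Modify HTML elements

// Create a DOM object from a string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Add a class to the second div
$html->find('div', 1)->class = 'bar';

// Replace the contents of the div with id "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';

// Dump the modified document
echo $html;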

Extract contents from HTML

// Dump the page contents without HTML tags
echo file_get_html('http://www.google.com/')->plaintext;

Scraping Slashdot!

// Create a DOM object from a URL
$html = file_get_html('http://slashdot.org/');

// Collect the title, intro and details of every article block
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);

How to create HTML DOM object?

Quick way
Object-oriented way

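Quick way

// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from an HTML file
$html = file_get_html('test.htm');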

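Object-oriented way

// Create a DOM object
$html = new simple_html_dom();

// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');

// Load HTML from a URL
$html->load_file('http://www.google.com/');

// Load HTML from an HTML file
$html->load_file('test.htm');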

How to find HTML elements?

Basics
Advanced
Descendant selectors
Nested selectors
Attribute Filters
Text & Comments

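Basics

// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// Find the first anchor (index 0); returns an element object, or null if not found
$ret = $html->find('a', 0);

// Find the last anchor (negative indexes count from the end)
$ret = $html->find('a', -1);

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all <div> elements whose id attribute is "foo"
$ret = $html->find('div[id=foo]');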

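Advanced

// Find all elements with id "foo"
$ret = $html->find('#foo');

// Find all elements with class "foo"
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images that have a title attribute
$ret = $html->find('a[title], img[title]');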

Attribute Filters

The following operators are supported in attribute selectors:

Filter	Description
[attribute]	Matches elements that have the specified attribute.
[!attribute]	Matches elements that don't have the specified attribute.
[attribute=value]	Matches elements that have the specified attribute with a certain value.
[attribute!=value]	Matches elements that don't have the specified attribute with a certain value.
[attribute^=value]	Matches elements that have the specified attribute and it starts with a certain value.
[attribute$=value]	Matches elements that have the specified attribute and it ends with a certain value.
[attribute*=value]	Matches elements that have the specified attribute and it contains a certain value.
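As a rough illustration of the operators listed above, the sketch below runs a handful of attribute filters against a small inline document and counts the matches. The sample markup, the variable names, and the location of `simple_html_dom.php` are assumptions made for this example; adjust the `require_once` path to wherever the library actually lives.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, made up for this illustration.
$html = str_get_html(
    '<a href="http://example.com/a.pdf" title="Report">A</a>' .
    '<a href="http://example.com/b.html">B</a>' .
    '<a href="/local/c.pdf">C</a>'
);

$filters = array(
    'a[title]',         // has a title attribute
    'a[!title]',        // has no title attribute
    'a[title=Report]',  // title is exactly "Report"
    'a[href^=http]',    // href starts with "http"
    'a[href$=pdf]',     // href ends with "pdf"
    'a[href*=local]',   // href contains "local"
);

// Count how many anchors each filter selects.
$counts = array();
foreach ($filters as $filter) {
    $counts[$filter] = count($html->find($filter));
    echo $filter, ' matches ', $counts[$filter], " element(s)\n";
}
```

Keeping the filters in an array makes it easy to try further selectors against the same markup.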

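Descendant selectors

// Find all <li> elements inside <ul>
$es = $html->find('ul li');

// Find <div> elements nested three levels deep
$es = $html->find('div div div');

// Find all <td> elements inside <table class="hello">
$es = $html->find('table.hello td');

// Find all <td> elements with align="center" inside <table> elements
$es = $html->find('table td[align=center]');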

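Nested selectors

// Find all <li> elements in each <ul>
foreach ($html->find('ul') as $ul)
{
    foreach ($ul->find('li') as $li)
    {
        // do something with $li...
    }
}

// Find the first <li> in the first <ul>
$e = $html->find('ul', 0)->find('li', 0);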

How to access the HTML element's attributes?

Get, Set and Remove attributes
Magic attributes
Tips

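Get, Set and Remove attributes

// Get an attribute
$value = $e->href;

// Set an attribute
$e->href = 'my link';

// Remove an attribute by setting its value to null
$e->href = null;

// Determine whether an attribute exists
if (isset($e->href))
    echo 'href exist!';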

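Magic attributes

// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag;       // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"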

Attribute Name	Usage

$e->tag	Read or write the tag name of element.
$e->outertext	Read or write the outer HTML text of element.
$e->innertext	Read or write the inner HTML text of element.
$e->plaintext	Read or write the plain text of element.

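Tips

// Extract contents from HTML
echo $html->plaintext;

// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';

// Remove an element by setting its outertext to an empty string
$e->outertext = '';

// Append an element
$e->outertext = $e->outertext . '<div>foo</div>';

// Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;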

How to traverse the DOM tree?

Background Knowledge
Traverse the DOM tree

// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;

// or, using the camelCase method names
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

You can also call these methods using camelCase naming conventions.

Method	Description

mixed $e->children ( [int $index] )	Returns the Nth child object if index is set, otherwise returns an array of children.
element $e->parent ()	Returns the parent of the element.
element $e->first_child ()	Returns the first child of the element, or null if not found.
element $e->last_child ()	Returns the last child of the element, or null if not found.
element $e->next_sibling ()	Returns the next sibling of the element, or null if not found.
element $e->prev_sibling ()	Returns the previous sibling of the element, or null if not found.
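The traversal methods above can be tried on a tiny list. This is a minimal sketch: the markup and ids are invented for the example, and the `require_once` path is an assumption about where the library file is stored.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical markup for illustration only.
$html = str_get_html(
    '<ul id="menu"><li id="a">A</li><li id="b">B</li><li id="c">C</li></ul>'
);

$ul = $html->find('ul', 0);

$first  = $ul->first_child();     // <li id="a">
$last   = $ul->last_child();      // <li id="c">
$second = $ul->children(1);       // children are zero-indexed: <li id="b">
$next   = $first->next_sibling(); // the element after <li id="a">
$prev   = $last->prev_sibling();  // the element before <li id="c">
$parent = $second->parent();      // back up to the <ul>

echo $first->id, ' ', $second->id, ' ', $last->id, "\n";
echo $next->id, ' ', $prev->id, ' ', $parent->id, "\n";
```

Calling `$ul->children()` without an index would return all three `<li>` elements as an array instead.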

How to dump contents of DOM object?

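A minimal sketch of dumping a DOM object, assuming the `save()` method and the string conversion of the DOM object behave as they do in the 1.x releases of the library. The sample markup, the `result.htm` file name, and the `require_once` path are placeholders chosen for this example.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical markup for illustration only.
$html = str_get_html('<div id="hello">Hello</div>');

// Dump the internal DOM tree back into a string.
$str = $html->save();

// Converting the object to a string is expected to give the same markup.
$same = (string) $html;

// Passing a file path to save() also writes the markup to that file.
$path = sys_get_temp_dir() . '/result.htm';
$html->save($path);

echo $str, "\n";
```

Writing to the system temp directory is only a convenience here; any writable path works.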

How to customize the parsing behavior?

Callback function

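// Write a function with the parameter "$element"
function my_callback($element) {
    // Hide all <b> tags
    if ($element->tag == 'b')
        $element->outertext = '';
}

// Register the callback function by its name
$html->set_callback('my_callback');

// The callback is invoked while dumping
echo $html;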

What is simple DOM?

Simple HTML DOM is a fast, simple and reliable HTML document parser for PHP, created by S.C. Chen.

What is simple HTML DOM parser?

Web scraping works by targeting selected DOM elements of a web page and then processing or storing the text they contain. In PHP, Simple HTML DOM Parser provides an API that parses the whole page and looks up the required elements within the DOM.

What is simple HTML DOM parser PHP?

It is an HTML DOM parser written in PHP 5+ that supports invalid HTML and provides a very easy way to find, extract and modify the HTML elements of the DOM. Its jQuery-like selector syntax allows sophisticated queries for locating the elements you care about.

What is the use of HTML DOM?

The DOM defines a standard for accessing documents: "The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document."

What are DOM elements in HTML?

In the HTML DOM, the Element object represents an HTML element, like P, DIV, A, TABLE, or any other HTML element.
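To make the idea of an Element object concrete, here is a small sketch that uses PHP's bundled DOM extension (`DOMDocument`) instead of Simple HTML DOM. It is swapped in only because it follows the W3C-style interface this answer refers to; the markup is invented for the example.

```php
<?php
// Uses PHP's built-in DOM extension, not Simple HTML DOM.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><div id="box"><p>One</p><a href="/next">Two</a></div></body></html>'
);

// Every tag in the markup becomes a DOMElement object in the tree.
$div = $doc->getElementsByTagName('div')->item(0);

echo $div->tagName, "\n";            // name of the element
echo $div->getAttribute('id'), "\n"; // value of its id attribute

// Collect the child elements of the <div>, skipping any non-element nodes.
$children = array();
foreach ($div->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $children[$child->tagName] = $child->textContent;
        echo $child->tagName, ': ', $child->textContent, "\n";
    }
}
```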

What is the difference between HTML and DOM?

DOM is a model of a document with an associated API for manipulating it. HTML is a markup language that lets you represent a certain kind of DOM in text.
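The distinction can be sketched as a round trip: HTML text is parsed into a DOM, the DOM is changed through its API, and the result is serialized back to HTML text. The example again uses PHP's bundled `DOMDocument` as a stand-in, with made-up markup.

```php
<?php
// HTML is text; parsing it produces a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p class="old">Hello</p></body></html>');

// The DOM is manipulated through an API rather than by editing text.
$p = $doc->getElementsByTagName('p')->item(0);
$p->setAttribute('class', 'new');
$p->appendChild($doc->createTextNode(' world'));

// Serializing the node turns the model back into HTML text.
$markup = $doc->saveHTML($p);
echo $markup, "\n"; // prints: <p class="new">Hello world</p>
```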