What is simple html dom?

PHP Simple HTML DOM Parser

A fast, simple and reliable HTML document parser for PHP.

Created by S.C. Chen, based on HTML Parser for PHP 4 by Jose Solorzano.

Parse any HTML document

PHP Simple HTML DOM Parser handles any HTML document, even ones that are considered invalid by the HTML specification.

Select elements using CSS selectors

PHP Simple HTML DOM Parser supports CSS style selectors to navigate the DOM, similar to jQuery.

Download

  • Download the latest version from SourceForge

Contributing

  • Request features on the Feature Request Tracker
  • Report bugs on the Bug Tracker
  • Get involved with the community on the Discussions Board

License

PHP Simple HTML DOM Parser is Free Software licensed under the MIT License.


Index

  • Quick Start
  • How to create HTML DOM object?
  • How to find HTML elements?
  • How to access the HTML element's attributes?
  • How to traverse the DOM tree?
  • How to dump contents of DOM object?
  • How to customize the parsing behavior?
  • API Reference
  • FAQ

Quick Start


  • Get HTML elements
  • Modify HTML elements
  • Extract contents from HTML
  • Scraping Slashdot!


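// Create a DOM object from a URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print their sources
foreach($html->find('img') as $element)
       echo $element->src . '<br>';

// Find all links and print their targets
foreach($html->find('a') as $element)
       echo $element->href . '<br>';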


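// Create a DOM object from a string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Add a class to the second <div> (indexes are zero based)
$html->find('div', 1)->class = 'bar';

// Replace the contents of the <div> whose id is "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>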


// Dump the contents of a page without HTML tags
echo file_get_html('http://www.google.com/')->plaintext;


// Create a DOM object from a URL
$html = file_get_html('http://slashdot.org/');

// Collect the title, intro and details of every article block
$articles = array();
foreach($html->find('div.article') as $article) {
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);

How to create HTML DOM object?


  • Quick way
  • Object-oriented way


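// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from an HTML file
$html = file_get_html('test.htm');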


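// Create a DOM object
$html = new simple_html_dom();

// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');

// Load HTML from a URL
$html->load_file('http://www.google.com/');

// Load HTML from an HTML file
$html->load_file('test.htm');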

How to find HTML elements?


  • Basics
  • Advanced
  • Descendant selectors
  • Nested selectors
  • Attribute Filters
  • Text & Comments


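// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// Find the first anchor (zero based); returns an element object, or null if not found
$ret = $html->find('a', 0);

// Find the last anchor; returns an element object, or null if not found
$ret = $html->find('a', -1);

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all <div> elements whose id attribute is "foo"
$ret = $html->find('div[id=foo]');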


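// Find all elements whose id is "foo"
$ret = $html->find('#foo');

// Find all elements whose class is "foo"
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images that have a title attribute
$ret = $html->find('a[title], img[title]');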

Supports these operators in attribute selectors:

Filter                Description
[attribute]           Matches elements that have the specified attribute.
[!attribute]          Matches elements that don't have the specified attribute.
[attribute=value]     Matches elements that have the specified attribute with a certain value.
[attribute!=value]    Matches elements that don't have the specified attribute with a certain value.
[attribute^=value]    Matches elements that have the specified attribute and it starts with a certain value.
[attribute$=value]    Matches elements that have the specified attribute and it ends with a certain value.
[attribute*=value]    Matches elements that have the specified attribute and it contains a certain value.
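
As a rough illustration of the operators above, the following sketch runs a few attribute filters against a small inline document. It assumes the library file is available as simple_html_dom.php next to the script; the sample markup and the variable names are made up for this example.

```php
<?php
// Assumption: the library file sits next to this script.
require_once 'simple_html_dom.php';

// Made-up sample markup, kept on a single line.
$html = str_get_html('<a href="http://example.com/a.pdf" title="Doc">A</a><a href="/local/b.html">B</a><a name="top">C</a>');

// [attribute]: anchors that have an href attribute
$with_href = $html->find('a[href]');

// [!attribute]: anchors that have no href attribute
$without_href = $html->find('a[!href]');

// [attribute^=value]: href starts with "http"
$external = $html->find('a[href^=http]');

// [attribute$=value]: href ends with ".pdf"
$pdfs = $html->find('a[href$=.pdf]');

// [attribute*=value]: href contains "local"
$local = $html->find('a[href*=local]');

// Print how many anchors each filter matched.
echo count($with_href), ' ', count($without_href), ' ',
     count($external), ' ', count($pdfs), ' ', count($local), "\n";
```

Each call returns an array of element objects, so the same filters can be combined with an index argument (for example find('a[href]', 0)) when only one match is needed.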


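// Find all <li> elements inside <ul> elements
$es = $html->find('ul li');

// Find <div> elements nested three levels deep
$es = $html->find('div div div');

// Find all <td> elements inside tables whose class is "hello"
$es = $html->find('table.hello td');

// Find all <td> elements with align=center inside <table> elements
$es = $html->find('table td[align=center]');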


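// Find all <li> elements inside each <ul>
foreach($html->find('ul') as $ul)
{
       foreach($ul->find('li') as $li)
       {
             // do something with $li...
       }
}

// Find the first <li> in the first <ul>
$e = $html->find('ul', 0)->find('li', 0);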

How to access the HTML element's attributes?


  • Get, Set and Remove attributes
  • Magic attributes
  • Tips


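// Get an attribute (for non-value attributes such as checked or selected, this returns true or false)
$value = $e->href;

// Set an attribute (for non-value attributes such as checked or selected, set the value to true or false)
$e->href = 'my link';

// Remove an attribute by setting its value to null
$e->href = null;

// Determine whether an attribute exists
if(isset($e->href))
        echo 'href exists!';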


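// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag;       // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"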

Attribute Name    Usage
$e->tag           Read or write the tag name of element.
$e->outertext     Read or write the outer HTML text of element.
$e->innertext     Read or write the inner HTML text of element.
$e->plaintext     Read or write the plain text of element.
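
The magic attributes can be written as well as read. A minimal sketch of that, assuming the library is installed as simple_html_dom.php next to the script (the markup and variable names are made up for illustration):

```php
<?php
// Assumption: the library file sits next to this script.
require_once 'simple_html_dom.php';

// Made-up sample markup.
$html = str_get_html('<div id="box">foo <b>bar</b></div>');
$e = $html->find('div', 0);

// Reading: the tag name and the text without markup.
echo $e->tag, "\n";
echo $e->plaintext, "\n";

// Writing: replace everything inside the <div>.
$e->innertext = 'baz';

// Dump the modified document back into a string.
$result = $html->save();
echo $result, "\n";
```

Writing to innertext keeps the element and its attributes and replaces only its contents, whereas writing to outertext replaces the element itself.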


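// Extract contents from HTML
echo $html->plaintext;

// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';

// Remove an element by setting its outertext to an empty string
$e->outertext = '';

// Append an element
$e->outertext = $e->outertext . '<div>foo</div>';

// Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;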

How to traverse the DOM tree?


  • Background Knowledge
  • Traverse the DOM tree


// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

You can also call these methods using camelCase naming conventions.

Method                                 Description
mixed $e->children ( [int $index] )    Returns the Nth child object if index is set, otherwise returns an array of children.
element $e->parent ()                  Returns the parent of element.
element $e->first_child ()             Returns the first child of element, or null if not found.
element $e->last_child ()              Returns the last child of element, or null if not found.
element $e->next_sibling ()            Returns the next sibling of element, or null if not found.
element $e->prev_sibling ()            Returns the previous sibling of element, or null if not found.
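
The traversal methods listed above can be tried on a small list. This is only a sketch: it assumes the library file is available as simple_html_dom.php next to the script, and the markup, ids and variable names are invented for the example.

```php
<?php
// Assumption: the library file sits next to this script.
require_once 'simple_html_dom.php';

// Made-up sample markup: a list with three items.
$html = str_get_html('<ul id="menu"><li id="a">A</li><li id="b">B</li><li id="c">C</li></ul>');
$ul = $html->find('#menu', 0);

// Without an index, children() returns an array of child elements.
$count = count($ul->children());

// First and last child of the list.
$first = $ul->first_child()->id;
$last  = $ul->last_child()->id;

// With an index, children() returns a single (zero based) child.
$second = $ul->children(1);
$next   = $second->next_sibling()->id;
$prev   = $second->prev_sibling()->id;
$parent = $second->parent()->id;

// The camelCase alias behaves like first_child().
$camel = $ul->firstChild()->id;

// There is no sibling before the first child, so this is null.
$none = $ul->first_child()->prev_sibling();

echo "$count $first $last $next $prev $parent $camel\n";
```

Because the sibling and child methods return null when nothing is found, chained calls on real pages are safer with a null check between steps.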

How to dump contents of DOM object?

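A DOM object can be turned back into markup with save(), or simply by using the object as a string. A minimal sketch, assuming the library file is available as simple_html_dom.php next to the script; the sample markup and the output path are hypothetical.

```php
<?php
// Assumption: the library file sits next to this script.
require_once 'simple_html_dom.php';

// Made-up sample markup.
$html = str_get_html('<p>Hello</p>');

// Dump the internal DOM tree back into a string.
$str = $html->save();

// Casting the object to a string (or echoing it) gives the same markup.
$same = (string) $html;

// Passing a path to save() also writes the markup to that file.
$path = sys_get_temp_dir() . '/result.htm'; // hypothetical output path
$html->save($path);

echo $str, "\n";
```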

How to customize the parsing behavior?


  • Callback function


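// Write a function that takes the parameter "$element"
function my_callback($element) {
        // Hide all <b> tags
        if ($element->tag=='b')
                $element->outertext = '';
}

// Register the callback function by its name
$html->set_callback('my_callback');

// The callback is invoked for each element while dumping
echo $html;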

What is simple DOM?

A fast, simple and reliable HTML document parser for PHP, created by S.C. Chen and based on HTML Parser for PHP 4 by Jose Solorzano.

What is simple HTML DOM parser?

Web scraping works by targeting selected DOM elements of a web page and then processing or storing the text they contain. In PHP, Simple HTML DOM Parser provides an API for this: it parses the whole page and locates the required elements within the DOM.
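
The scraping workflow described here can be sketched offline against an inline string instead of a live page. The snippet assumes the library file is available as simple_html_dom.php next to the script; the markup, class names and array keys are invented for the example.

```php
<?php
// Assumption: the library file sits next to this script.
require_once 'simple_html_dom.php';

// Made-up page content standing in for a downloaded document.
$page = '<div class="article"><div class="title">First post</div><div class="intro">Intro one</div></div>'
      . '<div class="article"><div class="title">Second post</div><div class="intro">Intro two</div></div>';

$html = str_get_html($page);

// Target each article block, then store the text of the parts we care about.
$articles = array();
foreach ($html->find('div.article') as $article) {
    $articles[] = array(
        'title' => $article->find('div.title', 0)->plaintext,
        'intro' => $article->find('div.intro', 0)->plaintext,
    );
}

print_r($articles);
```

For a live page, str_get_html($page) would be replaced by file_get_html() with the page URL, and the selectors would need to match that page's actual markup.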

What is simple HTML DOM parser PHP?

Simple HTML DOM Parser is an HTML DOM parser written in PHP 5+. It supports invalid HTML and provides an easy way to find, extract and modify the HTML elements of the DOM. Its jQuery-like selector syntax allows sophisticated queries for locating the elements you care about.

What is the use of HTML DOM?

The DOM defines a standard for accessing documents: "The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document."

What are DOM elements in HTML?

In the HTML DOM, the Element object represents an HTML element, like P, DIV, A, TABLE, or any other HTML element.

What is the difference between HTML and DOM?

DOM is a model of a document with an associated API for manipulating it. HTML is a markup language that lets you represent a certain kind of DOM in text.