Html-dom Laravel

Method | Description
$e->children([int $index]) | Returns the Nth child object if the index is set; otherwise returns an array of children.
$e->parent() | Returns the parent of the element.
$e->first_child() | Returns the first child of the element, or null if not found.
$e->last_child() | Returns the last child of the element, or null if not found.
$e->next_sibling() | Returns the next sibling of the element, or null if not found.
$e->prev_sibling() | Returns the previous sibling of the element, or null if not found.

Have you heard of this library before? To integrate it into Laravel, we can use the following approaches.


Installing the PHP Simple HTML Dom Parser library

This is a Composer package that can be used in Laravel.

GitHub: https://github.com/sunra/php-simple-html-dom-parser

Installation is simple via Composer:

composer require sunra/php-simple-html-dom-parser

Or add it to your composer.json file:

"require": {
    "sunra/php-simple-html-dom-parser": "1.5.2"
}

Then run composer update.

Example: using PHP Simple HTML Dom Parser with Laravel

use Sunra\PhpSimple\HtmlDomParser;

// ...
// Parse HTML from a string
$dom = HtmlDomParser::str_get_html($str);
// or load it from a file or URL
$dom = HtmlDomParser::file_get_html($file_name);

$elems = $dom->find($elem_name);

Installing the latest Simple HTML Dom library, compatible with PHP 7 and 8

The PHP Simple HTML Dom Parser package above is not compatible with PHP 7 and 8. If you are on PHP 7 or 8, you can follow this approach instead.

First, download the simple_html_dom.php file and put it in a Helpers directory inside your Laravel project (I created this directory myself; you can put the file in any directory you like). Then open composer.json and add the path of the file you just added to the autoload section.
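As a rough sketch of that autoload entry, the fragment below uses Composer's "files" autoload key. The path app/Helpers/simple_html_dom.php is only an assumption about where you saved the file, and the "psr-4" entry stands in for whatever your project already declares.

```json
{
    "autoload": {
        "psr-4": {
            "App\\": "app/"
        },
        "files": [
            "app/Helpers/simple_html_dom.php"
        ]
    }
}
```

After editing composer.json, running composer dump-autoload regenerates the autoloader so the file is loaded on every request.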

In this tutorial, we will look at how PHP Simple HTML DOM Parser compares with the more powerful FriendsOfPHP Goutte. In the early days, PHP Simple HTML DOM Parser was all we had to work with when it came to extracting data from HTML. Now we have FriendsOfPHP Goutte, which offers a richer feature set for this kind of work. Before we begin, you will need to set up a few things to get PHP Simple HTML DOM Parser, FriendsOfPHP Goutte, and the Guzzle PHP HTTP client up and running. This is very easy to do thanks to Composer. Create a directory on your computer named guzzle, cd into that directory, and put this composer.json file there:

{
   "require": {
      "guzzlehttp/guzzle": "~6.0",
      "emanueleminotto/simple-html-dom": "^1.5",
      "fabpot/goutte": "^3.1"
   }
}

Run composer install and/or composer update from the command line. Everything will be set up for you. Now you can simply put an index.php file in that directory and test any of the code we discuss in this tutorial. The boilerplate in your index.php should look like this; do not forget to require the autoload file that Composer generates for you:

<?php
require 'vendor/autoload.php';

use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;

$client = new Client();
$guzzleclient = new GuzzleClient([
    'timeout' => 60,
    // Disables TLS certificate verification; only do this for local testing
    'verify' => false,
]);

//  Hackery to allow HTTPS
$client->setClient($guzzleclient);

We set things up this way so that you do not run into errors such as "cURL error 60: SSL certificate problem: unable to get local issuer certificate". If you do not configure your client as shown above, you may encounter these errors.
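Turning verification off is fine for a local experiment, but Guzzle's 'verify' option also accepts the path to a CA bundle file, which avoids the same error without disabling the check. The sketch below only builds the option array; buildGuzzleOptions is a hypothetical helper name, not part of Goutte or Guzzle, and the fallback to false simply mirrors the boilerplate in this tutorial.

```php
<?php
// Hypothetical helper (not part of Goutte or Guzzle): builds the option
// array that would be passed to the GuzzleHttp\Client constructor.
function buildGuzzleOptions(?string $caBundlePath = null): array
{
    $options = ['timeout' => 60];

    if ($caBundlePath !== null && is_file($caBundlePath)) {
        // Guzzle's 'verify' option accepts a path to a CA bundle file.
        $options['verify'] = $caBundlePath;
    } else {
        // Same setting as the tutorial boilerplate: certificate checks off.
        $options['verify'] = false;
    }

    return $options;
}

// No bundle supplied, so this yields the tutorial's original settings.
$options = buildGuzzleOptions(null);
```

The resulting array would then be handed to new GuzzleClient($options) in place of the literal array used in index.php.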

Getting HTML elements

Getting HTML elements (PHP Simple HTML DOM Parser)

When you get started with PHP Simple HTML DOM Parser, its documentation has you do something like this. Here, you load some HTML into a variable, then read the value of the src attribute of every image tag, along with the value of the href attribute of every link on the page.

// Create DOM from URL or file
$html = file_get_html('https://www.facebook.com');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . PHP_EOL;
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . PHP_EOL;
}

Getting HTML elements (FriendsOfPHP Goutte)

//  Make a GET request (Create DOM from URL or file)
$crawler = $client->request('GET', 'https://www.facebook.com');

//  Filter the DOM by calling an anonymous function on each node (Find all images)
$crawler->filter('img')->each(function ($node) {
    echo $node->attr('src') . PHP_EOL;
});

//  (Find all links)
$crawler->filter('a')->each(function ($node) {
    echo $node->attr('href') . PHP_EOL;
});

As we can see above, the PHP Simple HTML DOM Parser library uses the find() method, while in FriendsOfPHP Goutte you will usually use the filter() method to find elements in the DOM. Here are the function signatures for both methods.

find(string $selector [, int $index])
The return type varies. Finds elements by CSS selector. Returns the Nth element object if the index is set; otherwise returns an array of objects.

filter(string $selector)
Always returns a Crawler instance. Filters the list of nodes with a CSS selector.

Note: what find() returns depends on the parameters you pass to it, which can sometimes lead to confusion. The filter() method, on the other hand, always returns a Symfony Crawler instance.
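One way to live with find()'s changing return type is to normalize it before looping. The helper below is a minimal sketch: normalizeFindResult is a hypothetical name, and it assumes find() hands back an array when no index is given, a single element object when an index is given, and null when nothing matches at that index. Plain stdClass objects stand in for parser elements so the sketch runs without the library.

```php
<?php
// Hypothetical helper: always returns an array, whatever shape find() gave us.
function normalizeFindResult($result): array
{
    if ($result === null) {
        return [];          // find($selector, $index) matched nothing
    }
    if (is_array($result)) {
        return $result;     // find($selector) without an index
    }
    return [$result];       // find($selector, $index) matched one element
}

// Stand-ins for parser results, so the sketch runs without the library.
$single = new stdClass();

echo count(normalizeFindResult(null)) . PHP_EOL;
echo count(normalizeFindResult([$single, $single])) . PHP_EOL;
echo count(normalizeFindResult($single)) . PHP_EOL;
```

With a wrapper like this, a foreach loop works the same way whether or not an index was passed to find().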

Modifying HTML elements

Modifying HTML elements (PHP Simple HTML DOM Parser)

$html = file_get_html('https://httpbin.org');

foreach ($html->find('title') as $element) {
    echo $element->plaintext;
    //  httpbin(1): HTTP Client Testing Service
}

$html->find('title', 0)->innertext = 'Made with PHP Simple HTML DOM Parser!';

foreach ($html->find('title') as $element) {
    echo $element->plaintext;
    //  Made with PHP Simple HTML DOM Parser!
}

Modifying HTML elements (FriendsOfPHP Goutte)

FriendsOfPHP Goutte actually recommends against modifying the DOM with its software:

"While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML."

For that reason we will not try to modify the DOM, but here is how you would fetch the title, as above, with Goutte.

$crawler = $client->request('GET', 'https://httpbin.org');

$crawler->filter('title')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    //  httpbin(1): HTTP Client Testing Service
});
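If you do need to change markup and would rather not rely on Simple HTML DOM for it, one option outside both libraries is PHP's built-in DOM extension. The following is a minimal sketch using DOMDocument on an inline HTML string; the markup and the replacement title are made up for illustration, and this is not something Goutte itself provides.

```php
<?php
// Illustrative markup; in practice this would be the fetched page body.
$markup = '<html><head><title>Old title</title></head>'
        . '<body><p>Hello</p></body></html>';

$doc = new DOMDocument();

// Collect libxml warnings internally instead of emitting them, since
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

// Replace the text of the first <title> element, if there is one.
$title = $doc->getElementsByTagName('title')->item(0);
if ($title !== null) {
    $title->nodeValue = 'Made with DOMDocument!';
}

echo $doc->getElementsByTagName('title')->item(0)->nodeValue . PHP_EOL;
```

Calling $doc->saveHTML() afterwards returns the modified document as a string.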

Extracting content from HTML

Extracting content from HTML (PHP Simple HTML DOM Parser)

$html = file_get_html('https://httpbin.org');

foreach ($html->find('li') as $li) {
    echo $li->plaintext . PHP_EOL;
}

Extracting content from HTML (FriendsOfPHP Goutte)

$crawler = $client->request('GET', 'https://httpbin.org');

$crawler->filter('li')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

Results of each test

Pretty simple stuff here. As we can see, the FriendsOfPHP Goutte versions generally have a slightly more modern and elegant syntax thanks to the each() method, which makes it easy to iterate over every element with an anonymous function.

How to find HTML elements

The real core of how these libraries work is their ability to fetch elements from the DOM using standard CSS selectors. Here we test almost all of the available CSS selectors, except those that only make sense in the context of an actual web browser. If you do not see a particular selector in this table, it means it did not work in either library. While testing all of these selectors, we found that Goutte supports a larger and richer set of CSS selectors. You can use this list as a reference for the CSS selectors that work with Goutte and Simple HTML DOM.

CSS selectors tested with (1) Goutte's filter() method and (2) Simple HTML DOM Parser's find() method

Selector format | Example description | 1 (Goutte) | 2 (Simple HTML DOM)
.class | Selects all elements with class="bash" | Yes | Yes
#id | Selects the element with id="manpage" | Yes | Yes
* | Selects all elements | Yes | Yes
element | Selects all <li> elements | Yes | Yes
element, element | Selects all <a> elements and all <h1> elements | Yes | Yes
element element | Selects all <a> elements inside <li> elements | Yes | Yes
element > element | Selects all <a> elements where the parent is a <p> element | Yes | Yes
element + element | Selects all <h1> elements that are placed immediately after <div> elements | Yes | Yes
element1 ~ element2 | Selects every <h2> element that is preceded by a <p> element | Yes | No
[attribute] | Selects all elements with a href attribute | Yes | Yes
[attribute=value] | Selects all elements with data-bare-link="true" | Yes | Yes
[attribute~=value] | Selects all elements with a href attribute containing the word "Fork" | Yes | No
[attribute|=value] | Selects all elements with an id attribute value starting with "-curl" | Yes | No
[attribute^=value] | Selects every element whose href attribute value begins with "https" | Yes | Yes
[attribute$=value] | Selects every element whose href attribute value ends with ".org" | Yes | Yes
[attribute*=value] | Selects every element whose href attribute value contains the substring "bin" | Yes | Yes
:checked | Selects every checked element | Yes | No
:disabled | Selects every disabled element | Yes | No
:empty | Selects every <div> element that has no children (including text nodes) | Yes | No
:enabled | Selects every enabled element (simply means one that does not have the disabled attribute) | Yes | No
:first-child | Selects every <li> element that is the first child of its parent | Yes | No
:first-of-type | Selects every <p> element that is the first <p> element of its parent | Yes | No
:lang(language) | Selects every <p> element with a lang attribute equal to "en" | Yes | No
:last-child | Selects every <li> element that is the last child of its parent | Yes | No
:last-of-type | Selects every <li> element that is the last <li> element of its parent | Yes | No
:not(selector) | Selects every element that is not a <div> element | Yes | No
:nth-child(n) | Selects every element that is the second child of its parent | Yes | No
:nth-last-child(n) | Selects every element that is the second child of its parent, counting from the last child | Yes | No
:nth-last-of-type(n) | Selects every element that is the second element of its parent, counting from the last child | Yes | No
:nth-of-type(n) | Selects every element that is the first element of its parent | Yes | No
:only-child | Selects every element that is the only child of its parent | Yes | No
:root | Selects the document's root element | Yes | No
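Part of the reason Goutte covers more selectors is that Symfony's DomCrawler relies on the CssSelector component, which translates CSS selectors into XPath expressions. To give a feel for what such a translation looks like, here is a minimal sketch that uses PHP's built-in DOMXPath with hand-written XPath counterparts for two selectors from the table. The markup, the example URLs, and the XPath strings are illustrative assumptions, not the exact output of either library.

```php
<?php
// Illustrative markup with one .bash element and two links.
$markup = '<div><p class="bash shell">one</p>'
        . '<a href="https://example.org">two</a>'
        . '<a href="http://example.com">three</a></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep libxml warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hand-written XPath counterparts of two CSS selectors from the table.
$queries = [
    '.bash'            => "//*[contains(concat(' ', normalize-space(@class), ' '), ' bash ')]",
    'a[href^="https"]' => "//a[starts-with(@href, 'https')]",
];

$matches = [];
foreach ($queries as $css => $query) {
    $matches[$css] = [];
    foreach ($xpath->query($query) as $node) {
        $matches[$css][] = $node->textContent;
    }
    echo $css . ': ' . implode(', ', $matches[$css]) . PHP_EOL;
}
```

The class test wraps the attribute in spaces so that "bash" matches as a whole word and not as part of a longer class name.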

//---------------------------------------------------------
// .bash selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('.bash')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('.bash') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// #manpage selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('#manpage')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('#manpage') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// * all elements selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('*')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('*') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// li selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('li')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('li') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// a,h1 selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('a,h1')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('a,h1') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// li a selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('li a')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('li a') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// p > a selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('p > a')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('p > a') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// div + h1 selector test
// Goutte
// note: this test uses chained method calls (->filter('div')->filter('h1')).
//       Be aware that the chain matches h1 elements inside a div, which is
//       not the same thing as the adjacent-sibling selector 'div + h1'.
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('div')->filter('h1')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('div + h1') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// p ~ h2 selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('p ~ h2')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('p ~ h2') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// [data-bare-link=true] selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('[data-bare-link=true]')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('[data-bare-link=true]') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// [alt~=Fork] selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('[alt~=Fork]')->each(function ($node) {
    echo $node->attr('alt') . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('[alt~=Fork]') as $node) {
    echo $node->alt . PHP_EOL;
}
//---------------------------------------------------------
// [id|=\-curl] selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('[id|=\-curl]')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('[id|=\-curl]') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// a[href^="https"] selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('a[href^="https"]')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('a[href^="https"]') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// a[href$=".org"] selector test
// Goutte
$crawler = $client->request('GET', 'https://httpbin.org');
$crawler->filter('a[href$=".org"]')->each(function ($node) {
    echo $node->text() . PHP_EOL;
});

// Simple HTML Dom
$html = file_get_html('https://httpbin.org');
foreach ($html->find('a[href$=".org"]') as $node) {
    echo $node->plaintext . PHP_EOL;
}
//---------------------------------------------------------
// input:checked selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('input:checked')->each(function ($node) {
    echo $node->attr('value') . PHP_EOL;
    // Condo
});
//---------------------------------------------------------
// input:disabled selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('input:disabled')->each(function ($node) {
    echo $node->attr('name') . PHP_EOL;
    // job
});
//---------------------------------------------------------
// div:empty selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('div:empty')->each(function ($node) {
    echo $node->attr('id') . PHP_EOL;
    // notextbud
});
//---------------------------------------------------------
// input:enabled selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('input:enabled')->each(function ($node) {
    echo $node->attr('name') . PHP_EOL;
    // shelter
    // shelter
    // name
    //
    // state
    // username
});
//---------------------------------------------------------
// li:first-child selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('li:first-child')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Apples
    // 1
    // one
});
//---------------------------------------------------------
// p:first-of-type selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('p:first-of-type')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // This first paragraph has text.
    // Do you speak English?
    // Yum!
});
//---------------------------------------------------------
// p:lang(en) selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('p:lang(en)')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Do you speak English?
});
//---------------------------------------------------------
// li:last-child selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('li:last-child')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Blueberries
    // 4
    // four
});
//---------------------------------------------------------
// li:last-of-type selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('li:last-of-type')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Blueberries
    // 4
    // four
});
//---------------------------------------------------------
// :not(div) selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter(':not(div)')->each(function ($node) {
    echo $node->attr('type') . PHP_EOL;
    // This works, but returns too much data to put in a comment!
});
//---------------------------------------------------------
// span:nth-child(2) selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('span:nth-child(2)')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Lego Dimensions
});
//---------------------------------------------------------
// span:nth-last-child(2) selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('span:nth-last-child(2)')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Minecraft
});
//---------------------------------------------------------
// span:nth-of-type(1) selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('span:nth-of-type(1)')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Star Wars
    // Ha Ha!
    // Contrived Markup!
});
//---------------------------------------------------------
// span:only-child selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter('span:only-child')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Contrived Markup!
});
//---------------------------------------------------------
// :root selector test
// Goutte
$crawler = $client->request('GET', 'http://localhost/guzzle/domtesting.php');
$crawler->filter(':root')->each(function ($node) {
    echo $node->text() . PHP_EOL;
    // Works but too much output to comment!
});
//---------------------------------------------------------

You will notice that the tests above reference two target URLs. One is https://httpbin.org, a website dedicated to providing this kind of testing playground. The other is a file on the local server containing custom HTML markup for testing purposes. If you would like to run the tests in your own environment, here is the markup for http://localhost/guzzle/domtesting.php
