Using PHP DOMDocument to select an HTML element by its class and get its text

2022-08-30 21:57:44

I am trying to get the text from the div with class = 'review-text', using PHP's DOM classes with the following HTML (same structure) and the following code.


  1. HTML

    $html = '
        <div class="page-wrapper">
            <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
                <article class="review clearfix">
                    <div class="review-content">
                        <div class="review-text" itemprop="reviewBody">
                        Outstanding ... 
  2. PHP code

        $classname = 'review-text';
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xpath     = new DOMXPath($dom);
        $results = $xpath->query("//*[@class and contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
        if ($results->length > 0) {
            echo $review = $results->item(0)->nodeValue;
        }

The XPath syntax for selecting elements by class comes from a blog post.


Answer 1

The following XPath query does what you need. Just replace the argument passed to $xpath->query with the query used in the code below.


Edit: To make development easier, you can test your own XPath queries online at http://www.xpathtester.com/test.



$html = '
    <div class="page-wrapper">
        <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
            <article class="review clearfix">
                <div class="review-content">
                    <div class="review-text" itemprop="reviewBody">
                    Outstanding ... 

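$classname = 'review-text';
$dom = new DOMDocument;
$dom->loadHTML($html); // parse the markup before querying it
$xpath = new DOMXPath($dom);
// exact match on the whole class attribute value
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}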
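
The query above compares the whole class attribute to a single string, whereas the query in the question matches one class name among several. The difference only shows up when an element carries more than one class, so here is a minimal sketch comparing the two. The markup is a hypothetical sample (the second div and its "featured" class are assumptions made for illustration, not part of the original page).

```php
<?php
// Hypothetical sample markup; the second div carries two classes.
$html = '
    <div class="review-content">
        <div class="review-text" itemprop="reviewBody">Outstanding</div>
        <div class="review-text featured">Also good</div>
    </div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$classname = 'review-text';

// Exact match: the class attribute must equal the string exactly.
$exact = $xpath->query("//*[@class='" . $classname . "']");

// Token match: the class name may be one of several space-separated classes.
$token = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $classname . " ')]"
);

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2

foreach ($token as $node) {
    // nodeValue returns the text content of the element
    echo trim($node->nodeValue), "\n";
}
```

If the target elements always have exactly one class, the shorter exact-match query is enough; the token-match form is the safer choice when extra classes may be present.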


Answer 2

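Expanding on Frak Houweling's answer, it is also possible to use DOMXPath to search within a specific DOMNode. This can be done by passing the contextNode as the second argument to the DOMXPath->query method: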

$dom = new DOMDocument;
$dom->loadHTML ($html);
$xpath = new DOMXPath ($dom);

foreach ($xpath->query ("//section[@class='page single-review']") as $section)
    // search for sub nodes inside each element
    foreach ($xpath->query (".//div[@class='review-text']", $section) as $review)
        echo $review->nodeValue;
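
To see what the context node changes, here is a small self-contained sketch that runs the relative and the absolute form of the same query against one section. The markup is a hypothetical sample (the review placed outside the section is an assumption added for illustration). Because section is an HTML5 tag, the libxml HTML parser may emit warnings for it, so the sketch collects them silently with libxml_use_internal_errors.

```php
<?php
// Hypothetical markup: one review inside the section, one outside it.
$html = '
    <div class="page-wrapper">
        <section class="page single-review">
            <div class="review-text">Inside the section</div>
        </section>
        <div class="review-text">Outside the section</div>
    </div>';

$dom = new DOMDocument;
// Keep parser warnings about HTML5 tags out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Pick the section that will serve as the context node.
$section = $xpath->query("//section[@class='page single-review']")->item(0);

// Relative path: only descendants of $section are searched.
$relative = $xpath->query(".//div[@class='review-text']", $section);

// Absolute path: the context node is ignored, the whole document is searched.
$absolute = $xpath->query("//div[@class='review-text']", $section);

echo $relative->length, "\n"; // 1
echo $absolute->length, "\n"; // 2
echo trim($relative->item(0)->nodeValue), "\n"; // Inside the section
```

Forgetting the leading dot is an easy mistake to make, since the query still returns results; they just come from the whole document rather than from the context node.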


"//div[@class='review-text']" // absolute path, search starts from the root element
".//div[@class='review-text']" // relative path, search starts from the provided contextNode
