PHP XPath: get all href values that contain a needle

php href xpath

2022-08-30 20:39:42

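I am using PHP XPath to quickly pull certain links out of an HTML page.

The following finds all href links on mypage.html: $nodes = $x->query("//a[@href]");

Whereas the following finds all href links whose description matches my needle: $nodes = $x->query("//a[contains(@href,'click me')]");

What I am trying to achieve is matching on the href itself, more specifically finding URLs that contain certain parameters. Is that possible within an XPath query, or should I just start manipulating the output of the first XPath query?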

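Answer 1

Not sure I understand the question correctly, but the second XPath expression already does what you are describing. It does not match against the text node of the A element, but against the href attribute: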

$html = <<< HTML
<ul>
    <li>
        <a href="http://example.com/page?foo=bar">Description</a>
    </li>
    <li>
        <a href="http://example.com/page?lang=de">Description</a>
    </li>
</ul>
HTML;

$xml  = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");
var_dump($list);

Output:

array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (2) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
    [0]=>
    string(11) "Description"
  }
}

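As you can see, the returned array contains only the A element whose href contains foo (which I understand is what you are looking for). It contains the entire element, because the XPath translates to "fetch all A elements with an href attribute containing foo". You would then access the attribute with

echo $list[0]['href']; // gives "http://example.com/page?foo=bar"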
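
When more than one link matches, the same attribute access can be applied in a loop. The following is only a minimal sketch under a few assumptions: the third link and the $hrefs variable are invented for illustration and are not part of the original example.

```php
<?php
// Sample markup based on the snippet above; the third link is an
// assumption added so that more than one element matches.
$html = <<<HTML
<ul>
    <li><a href="http://example.com/page?foo=bar">Description</a></li>
    <li><a href="http://example.com/page?lang=de">Description</a></li>
    <li><a href="http://example.com/other?foo=baz">Description</a></li>
</ul>
HTML;

$xml = simplexml_load_string($html);

// Collect the href of every A element whose href contains "foo".
$hrefs = [];
foreach ($xml->xpath("//a[contains(@href,'foo')]") as $a) {
    // $a['href'] is a SimpleXMLElement, so cast it to get a plain string.
    $hrefs[] = (string) $a['href'];
}

print_r($hrefs);
```

The explicit (string) cast matters once the values are stored or compared rather than echoed, because the attribute is otherwise still an object.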

If you only want to return the attribute itself, you would have to use

//a[contains(@href,'foo')]/@href

Note that in SimpleXML, this still returns a SimpleXMLElement:

array(1) {
  [0]=>
  object(SimpleXMLElement)#3 (1) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
  }
}

However, you can now output the URL directly with

echo $list[0]; // gives "http://example.com/page?foo=bar"
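
Since the question calls $x->query(), which looks like DOMXPath rather than SimpleXML, the attribute query can probably be used there in much the same way. This is a hedged sketch rather than the asker's actual code: the $dom and $urls names are assumptions, and the markup is the sample from this answer, not the real mypage.html.

```php
<?php
// Same sample markup as in the SimpleXML example of this answer.
$html = <<<HTML
<ul>
    <li><a href="http://example.com/page?foo=bar">Description</a></li>
    <li><a href="http://example.com/page?lang=de">Description</a></li>
</ul>
HTML;

// Assumption: $x in the question is a DOMXPath built roughly like this.
$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);

// Select the href attributes directly; each result is a DOMAttr node.
$urls = [];
foreach ($x->query("//a[contains(@href,'foo')]/@href") as $attr) {
    $urls[] = $attr->nodeValue; // the attribute value as a plain string
}

print_r($urls);
```

Keep in mind that contains() is a plain substring test, so 'foo' would also match a URL such as http://example.com/football. Searching for 'foo=' instead narrows the match toward an actual query parameter.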

Answer 2