PHP “pretty print” HTML (not Tidy)

2022-08-30 14:56:47


  1. "formatOutput = true" doesn't work at all with saveHTML(), only with saveXML()
  2. Even when I use saveXML(), it still only works on elements created via the DOM, not on elements loaded with loadHTML(), even with "preserveWhiteSpace = false"




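N.B. As you may have guessed, I don't want to use the Tidy extension, because a) it does a lot more than I need (the markup is already valid) and b) it actually changes the HTML content (for example the HTML 5 doctype and some elements).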


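OK, with the help of the answer below, I've worked out why the DOM extension wasn't working. Although the given example works, it still didn't work with my code. With the help of this comment, I found that if you have any text nodes for which isWhitespaceInElementContent() is true, no formatting is applied beyond that point. This happens regardless of whether preserveWhiteSpace is false. The solution is to remove all of these nodes (although I'm not sure whether this could have adverse effects on the actual content).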
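The fix described above (removing every whitespace-only text node before output) could look roughly like the sketch below. The helper name removeWhitespaceTextNodes and the sample markup are assumptions made for illustration, and the XPath test normalize-space(.) = "" is used here as a stand-in for calling isWhitespaceInElementContent() on each node. Be aware that it also strips whitespace between inline elements and inside <pre>, so it can change how the content renders.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the DOM extension):
// remove every text node that contains only whitespace.
function removeWhitespaceTextNodes(DOMDocument $dom): int
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array before removing anything.
    $blank = iterator_to_array($xpath->query('//text()[normalize-space(.) = ""]'));
    foreach ($blank as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($blank); // number of nodes removed
}

// Sample markup (an assumption) with whitespace between the elements.
$dom = new DOMDocument();
$dom->loadHTML("<div>\n  <p>one</p>\n  <p>two</p>\n</div>");
removeWhitespaceTextNodes($dom);
$dom->formatOutput = true;
echo $dom->saveXML();
```

Because the nodes are removed from the tree itself, the order of the preserveWhiteSpace and formatOutput assignments no longer matters for this step.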

Answer 1

You're right, there seems to be no indentation for HTML (others are confused too). XML works fine, even with loaded code.

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return $dom->saveHTML();
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");
?>
<html>
<head>
<title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

Result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>

The same with saveXML()...

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>


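Disclaimer: I stole most of the demo code from tyson clugg's comment in the PHP manual. Lazy me.

UPDATE: I now remember that I tried the same thing a few years ago and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I just converted back and forth between SimpleXML and DOM until the problem went away. I suppose the conversion got rid of those nodes. Maybe load with DOM, import with simplexml_import_dom, then output the string, parse it with DOM again, and then pretty-print it. As far as I remember, this worked (but it was really slow).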
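A minimal sketch of that round trip, assuming the steps were roughly DOM, then simplexml_import_dom, then a string, then DOM again. The function name prettyPrintViaSimpleXml and the sample markup are hypothetical. In this sketch it is the second parse with preserveWhiteSpace = false that drops the blank nodes, and the output is XML-style markup, not HTML.

```php
<?php
// Hypothetical helper name; the steps follow the workaround described above.
function prettyPrintViaSimpleXml(string $html): string
{
    // 1. Load the HTML with DOM.
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // 2. Import into SimpleXML and serialize to an XML string.
    $xml = simplexml_import_dom($dom)->asXML();

    // 3. Parse that string again, this time discarding blank text nodes.
    $clean = new DOMDocument();
    $clean->preserveWhiteSpace = false;
    $clean->formatOutput = true;
    $clean->loadXML($xml);

    // 4. Serialize the root element with indentation.
    return $clean->saveXML($clean->documentElement);
}

// Sample markup (an assumption) with whitespace between the elements.
echo prettyPrintViaSimpleXml("<div>\n<p>one</p>\n<p>two</p>\n</div>");
```

Every call parses the document twice and serializes it twice, which fits the remark that the approach is slow.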

Answer 2


<!DOCTYPE html>
        <title>My website</title>


function indentContent($content, $tab = "\t") {
    // put a linefeed between adjacent tags ("><") so that each tag becomes its own token
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    $token = strtok($content, "\n"); // first line
    $result = ''; // holds the formatted version as it is built
    $pad = 0; // current indent level
    // void elements never have a closing tag, so they must not open a new level
    $voidElements = '/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b/i';
    // scan each line and adjust the indent based on opening/closing tags
    while ($token !== false) {
        $token = trim($token);
        $indent = 0; // change in indent level for the lines after this one
        if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
            // 1. opening and closing tags on the same line - no change
        } elseif (preg_match('/^<\/\w/', $token)) {
            // 2. closing tag - outdent now
            $pad--;
        } elseif (preg_match('/^<\w[^>]*(?<!\/)>/', $token) && !preg_match($voidElements, $token)) {
            // 3. opening tag (not void, not self-closed) - don't pad this one, only subsequent lines
            $indent = 1;
        }
        // 4. anything else - no indentation change needed

        if ($token == "<textarea>") {
            // no linefeed, so the textarea content starts right after the tag
            $result .= str_repeat($tab, max($pad, 0)) . $token;
        } elseif ($token == "</textarea>") {
            // no padding, so no whitespace is added to the textarea content
            $result .= $token . "\n";
        } else {
            // pad the line with the current indent and end it with a linefeed
            $result .= str_repeat($tab, max($pad, 0)) . $token . "\n";
        }
        $token = strtok("\n"); // get the next token
        $pad += $indent; // update the indent level for subsequent lines
    }
    return $result;
}

// $htmldoc is a DOMDocument object

$niceHTMLwithTABS = indentContent($htmldoc->saveHTML(), "\t");

echo $niceHTMLwithTABS;

This produces HTML with:

  • indentation based on "levels"
  • line breaks after block-level elements
  • inline and self-closing elements left unaffected

