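I implemented @ridgerunner's answer in two projects and ended up hitting severe slowdowns (10-30 second request times) in staging on one of them. I found that I had to set both pcre.recursion_limit and pcre.backtrack_limit quite low for it to work at all, and even then it would give up after about 2 seconds of processing and return false.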
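To make that failure mode visible instead of silent, here is a minimal sketch of a wrapper around preg_replace(). In plain PHP, preg_replace() returns null when PCRE gives up, and preg_last_error() reports why. The function name safePregReplace and the limit of 100 are assumptions chosen for the demo only, not something from @ridgerunner's answer.

```php
<?php
// Hypothetical wrapper: returns the replaced string, or throws if PCRE gave up.
function safePregReplace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() tells us which limit or error stopped the engine.
        $code = preg_last_error();
        if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
            $reason = 'backtrack limit exhausted';
        } elseif ($code === PREG_RECURSION_LIMIT_ERROR) {
            $reason = 'recursion limit exhausted';
        } else {
            $reason = "PCRE error code $code";
        }
        throw new RuntimeException("preg_replace failed: $reason");
    }
    return $result;
}

// Demo only: an artificially tiny limit (assumed value) so the failure shows up at once.
ini_set('pcre.backtrack_limit', '100');

try {
    // A pattern with nested alternation that backtracks heavily on a non-matching subject.
    safePregReplace('/(?:\D+|<\d+>)*[!?]/', '', 'foobar foobar foobar');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Throwing here is a design choice for the sketch; logging and falling back to the unminified HTML would work just as well.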
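Since then, I have replaced it with this solution (with easier-to-grasp regexes), which is inspired by the outputfilter.trimwhitespace function from Smarty 2. It does not rely on deep backtracking or recursion, and it works every time (instead of failing catastrophically once in a blue moon):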
function filterHtml($input) {
    // Remove HTML comments, but not SSI directives (<!--#include ... -->)
    $input = preg_replace('/<!--(?!#).*?-->/s', '', $input);

    // The content inside these tags will be spared:
    $doNotCompressTags = ['script', 'pre', 'textarea'];
    $matches = [];

    foreach ($doNotCompressTags as $tag) {
        $regex = "!<{$tag}[^>]*?>.*?</{$tag}>!is";

        // It is assumed that this placeholder could not appear organically in your
        // output. If it can, you may have an XSS problem.
        $placeholder = "@@<'-placeholder-$tag'>@@";

        // Replace all the tags (including their content) with a placeholder, and
        // keep their contents for later.
        $input = preg_replace_callback(
            $regex,
            function ($match) use ($tag, &$matches, $placeholder) {
                $matches[$tag][] = $match[0];
                return $placeholder;
            },
            $input
        );
    }

    // Collapse runs of whitespace (spaces, tabs, CR and LF) into a single space
    $input = trim(preg_replace('/[ \r\n\t]+/', ' ', $input));

    // Iterate the blocks we replaced with placeholders beforehand, and replace the
    // placeholders with the original content. Tags are restored in reverse order of
    // extraction, because a block stored later (e.g. <pre>) may itself contain the
    // placeholder of a tag extracted earlier (e.g. <script>).
    foreach (array_reverse($matches, true) as $tag => $blocks) {
        $placeholder = "@@<'-placeholder-$tag'>@@";
        $placeholderLength = strlen($placeholder);
        $position = 0;

        foreach ($blocks as $block) {
            $position = strpos($input, $placeholder, $position);
            if ($position === false) {
                throw new \RuntimeException("Found fewer placeholders of type $tag than stored blocks");
            }
            $input = substr_replace($input, $block, $position, $placeholderLength);
            // Continue searching after the block that was just restored
            $position += strlen($block);
        }
    }

    return $input;
}
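For readers who want the core idea in isolation, here is a compact variation on the same protect, collapse, restore approach. It is only a sketch under assumptions, not the function above: the name collapseWhitespaceSketch and the numbered placeholder format are made up for illustration. It uses a single regex for all protected tags and gives each block its own placeholder, so restoring is one strtr() call and the order of blocks does not matter.

```php
<?php
// Hypothetical helper: protect some tags, collapse whitespace, restore the tags.
function collapseWhitespaceSketch(string $html): string
{
    $protected = [];

    // One pass over all protected tags; \1 makes the closing tag match the opening one.
    $html = preg_replace_callback(
        '~<(script|pre|textarea)\b[^>]*>.*?</\1>~is',
        function (array $m) use (&$protected): string {
            // Numbered placeholder with no whitespace, so the collapse step leaves it alone.
            $key = '@@protected-' . count($protected) . '@@';
            $protected[$key] = $m[0];
            return $key;
        },
        $html
    );

    // Collapse runs of whitespace outside the protected blocks.
    $html = trim(preg_replace('/\s+/', ' ', $html));

    // Put the original blocks back in place of their placeholders.
    return strtr($html, $protected);
}

echo collapseWhitespaceSketch("<div>\n   <p>a   b</p>\n<pre>  x\n  y </pre>\n</div>");
```

In this example the markup around the pre element collapses to single spaces, while the two lines inside it keep their original indentation and line break. As with the function above, this assumes the placeholder text never occurs in the input on its own.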