// Replaces every pair of spaces with a tab, not only the leading ones.
$xml_string = str_replace('  ', "\t", $xml_string);
$re = '%# Match leading spaces following leading tabs.
    ^                     # Anchor to start of line.
    (\t*)                 # $1: Preserve any/all leading tabs.
    [ ]{2}                # Match "n" spaces (here n = 2).
    %umx';
while (preg_match($re, $xml_string))
    $xml_string = preg_replace($re, "$1\t", $xml_string);
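To see why the loop is needed, here is a small sketch on a made-up two-line sample. The sample string and the $passes counter are illustrative assumptions, not part of the original code. Each preg_replace() call can convert only one pair of spaces per line, because every match has to begin at the start of a line, so the loop repeats until no line has a pair of spaces left after its leading tabs.

```php
<?php
// Same pattern as above: /x allows the embedded comments,
// /m makes ^ match at the start of every line.
$re = '%# Match leading spaces following leading tabs.
    ^        # Anchor to start of line.
    (\t*)    # $1: Preserve any/all leading tabs.
    [ ]{2}   # Match two spaces.
    %umx';

// Hypothetical sample: four leading spaces on line 1, two on line 2,
// plus two interior spaces that must be left alone.
$sample = "    <a>\n  <b>x  y</b>\n";
$passes = 0;
while (preg_match($re, $sample)) {
    // One pass turns at most one space pair per line into a tab.
    $sample = preg_replace($re, "$1\t", $sample);
    ++$passes;
}

// Show the result with tabs made visible as "\t".
echo str_replace("\t", '\t', $sample);
echo "passes: $passes\n";
```

The number of passes equals the deepest indentation level in the input, which is why this method does more work on deeply nested XML.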
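Note that Qtax has an elegant solution that works quite well (I gave it my +1). However, my benchmark tests show it to be slower than the original callback method. I think this is because its expression, /(?:^|\G)  /um (note the two spaces before the closing delimiter), does not allow the regex engine to take advantage of the "anchored at the start of the pattern" internal optimization. The RE engine is forced to test the pattern at every position in the target string. With a pattern that begins with the ^ anchor, the RE engine only needs to check at the beginning of each line, which allows it to match faster.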
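One way to probe this explanation yourself is to run a pattern that begins with ^ and a pattern that begins with the (?:^|\G) alternation over the same input. The sketch below does that under stated assumptions: the synthetic input, the closures, and the time_it() helper are all hypothetical, and the two spaces are written as [ ]{2} only to keep them visible. Timings vary with the PHP and PCRE versions and with the input, so no expected numbers are given.

```php
<?php
// Hypothetical synthetic input: 2000 lines with 0 to 4 indent levels
// (two spaces each) and two interior spaces that must not change.
$lines = [];
for ($i = 0; $i < 2000; ++$i) {
    $lines[] = str_repeat('  ', $i % 5) . '<node attr="a  b">text</node>';
}
$input = implode("\n", $lines);

// Pattern that begins with the ^ anchor (callback approach).
$anchored = function ($s) {
    return preg_replace_callback('/^(?:[ ]{2})+/um', function ($m) {
        return str_repeat("\t", intdiv(strlen($m[0]), 2));
    }, $s);
};

// Pattern that begins with an alternation (\G approach).
$alternation = function ($s) {
    return preg_replace('/(?:^|\G)[ ]{2}/um', "\t", $s);
};

// Hypothetical helper: average seconds per call over $reps calls.
function time_it(callable $f, $arg, $reps = 50) {
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $f($arg);
    }
    return (microtime(true) - $start) / $reps;
}

$same = ($anchored($input) === $alternation($input));
printf("same output: %s\n", $same ? 'yes' : 'no');
printf("anchored:    %.6f s\n", time_it($anchored, $input));
printf("alternation: %.6f s\n", time_it($alternation, $input));
```

Checking that both closures return identical strings before timing them guards against comparing two functions that do different work.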
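<?php // test.php 20120308_1200
require_once 'benchmark.inc.php'; // Provides benchmark_12(); assumes the file is in the same directory.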
// -------------------------------------------------------
// Test 1: Recursive method. (ridgerunner)
function tabify_leading_spaces_1($xml_string) {
    $re = '%# Match leading spaces following leading tabs.
        ^                     # Anchor to start of line.
        (\t*)                 # $1: Any/all leading tabs.
        [ ]{2}                # Match "n" spaces (here n = 2).
        %umx';
    while (preg_match($re, $xml_string))
        $xml_string = preg_replace($re, "$1\t", $xml_string);
    return $xml_string;
}
// -------------------------------------------------------
// Test 2: Original callback method. (hakre)
function tabify_leading_spaces_2($xml_string) {
    return preg_replace_callback('/^(?:[ ]{2})+/um', '_callback', $xml_string);
}
function _callback($m) {
    $spaces = strlen($m[0]);
    $tabs = $spaces / 2;
    return str_repeat("\t", $tabs);
}
// -------------------------------------------------------
// Test 3: Qtax's elegantly simple \G method. (Qtax)
function tabify_leading_spaces_3($xml_string) {
    // The pattern ends with two literal spaces, one pair per tab.
    return preg_replace('/(?:^|\G)  /um', "\t", $xml_string);
}
// -------------------------------------------------------
// Verify we get the same results from all methods.
$data = file_get_contents('testdata.txt');
$data1 = tabify_leading_spaces_1($data);
$data2 = tabify_leading_spaces_2($data);
$data3 = tabify_leading_spaces_3($data);
if ($data1 == $data2 && $data2 == $data3) {
    echo ("GOOD: Same results.\n");
} else {
    exit("BAD: Different results.\n");
}
// Measure and print the function execution times.
$time1 = benchmark_12('tabify_leading_spaces_1', $data, 2, true);
$time2 = benchmark_12('tabify_leading_spaces_2', $data, 2, true);
$time3 = benchmark_12('tabify_leading_spaces_3', $data, 2, true);
<?php // benchmark.inc.php
/*----------------------------------------------------------------------------
 function benchmark_12($funcname, $p1, $reptime = 1.0, $verbose = false, $p2 = NULL) {}
    By: Jeff Roberson
    Created:        2010-03-17
    Last edited:    2012-03-08

 Discussion:
    This function measures the time required to execute a given function by
    calling it as many times as possible within an allowed period == $reptime.
    A first pass determines a rough measurement of function execution time
    by increasing the $nreps count by a factor of 10 - (i.e. 1, 10, 100, ...),
    until an $nreps value is found which takes more than 0.01 secs to finish.
    A second pass uses the value determined in the first pass to compute the
    number of reps that can be performed within the allotted $reptime seconds.
    The second pass then measures the time required to call the function the
    computed number of times (which should take about $reptime seconds). The
    average function execution time is then computed by dividing the total
    measured elapsed time by the number of reps performed in that time, and
    then all the pertinent values are returned to the caller in an array.

    Note that this function is limited to measuring only those functions
    having either one or two arguments that are passed by value and
    not by reference. This is why the name of this function ends with "12".
    Variations of this function can be easily cloned which can have more
    than two parameters.

 Parameters:
    $funcname:  String containing name of function to be measured. The
                function to be measured must take one or two parameters.
    $p1:        First argument to be passed to $funcname function.
    $reptime:   Target number of seconds allowed for benchmark test.
                (float) (Default=1.0)
    $verbose:   Boolean value determines if results are printed.
                (bool) (Default=false)
    $p2:        Second (optional) argument to be passed to $funcname function.

 Return value:
    $result[]   Array containing measured and computed values:
    $result['funcname']     : $funcname - Name of function measured.
    $result['msg']          : $msg      - String with formatted results.
    $result['nreps']        : $nreps    - Number of function calls made.
    $result['time_total']   : $time     - Seconds to call function $nreps times.
    $result['time_func']    : $t_func   - Seconds to call function once.
    $result['result']       : $result   - Last value returned by function.

 Variables:
    $time:      Float epoch time (secs since 1/1/1970) or benchmark elapsed secs.
    $i:         Integer loop counter.
    $nreps:     Number of times function called in benchmark measurement loops.
----------------------------------------------------------------------------*/
function benchmark_12($funcname, $p1, $reptime = 1.0, $verbose = false, $p2 = NULL) {
    if (!function_exists($funcname)) {
        exit("\n[benchmark_12] Error: function \"{$funcname}()\" does not exist.\n");
    }
    if (!isset($p2)) { // Case 1: function takes one parameter ($p1).
        // Pass 1: Measure order of magnitude number of calls needed to exceed 10 milliseconds.
        for ($time = 0.0, $n = 1; $time < 0.01; $n *= 10) { // Exponentially increase $nreps.
            $time = microtime(true); // Mark start time. (sec since 1970).
            for ($i = 0; $i < $n; ++$i) { // Loop $n times. ($n = 1, 10, 100...)
                $result = ($funcname($p1)); // Call the function over and over...
            }
            $time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
            $nreps = $n; // Number of reps just measured.
        }
        $t_func = $time / $nreps; // Function execution time in sec (rough).
        // Pass 2: Measure time required to perform $nreps function calls (in about $reptime sec).
        if ($t_func < $reptime) { // If pass 1 time was not pathetically slow...
            $nreps = (int)($reptime / $t_func); // Figure $nreps calls to add up to $reptime.
            $time = microtime(true); // Mark start time. (sec since 1970).
            for ($i = 0; $i < $nreps; ++$i) { // Loop $nreps times (should take $reptime).
                $result = ($funcname($p1)); // Call the function over and over...
            }
            $time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
            $t_func = $time / $nreps; // Average function execution time in sec.
        }
    } else { // Case 2: function takes two parameters ($p1 and $p2).
        // Pass 1: Measure order of magnitude number of calls needed to exceed 10 milliseconds.
        for ($time = 0.0, $n = 1; $time < 0.01; $n *= 10) { // Exponentially increase $nreps.
            $time = microtime(true); // Mark start time. (sec since 1970).
            for ($i = 0; $i < $n; ++$i) { // Loop $n times. ($n = 1, 10, 100...)
                $result = ($funcname($p1, $p2)); // Call the function over and over...
            }
            $time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
            $nreps = $n; // Number of reps just measured.
        }
        $t_func = $time / $nreps; // Function execution time in sec (rough).
        // Pass 2: Measure time required to perform $nreps function calls (in about $reptime sec).
        if ($t_func < $reptime) { // If pass 1 time was not pathetically slow...
            $nreps = (int)($reptime / $t_func); // Figure $nreps calls to add up to $reptime.
            $time = microtime(true); // Mark start time. (sec since 1970).
            for ($i = 0; $i < $nreps; ++$i) { // Loop $nreps times (should take $reptime).
                $result = ($funcname($p1, $p2)); // Call the function over and over...
            }
            $time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
            $t_func = $time / $nreps; // Average function execution time in sec.
        }
    }
    $msg = sprintf("%s() Nreps:%7d Time:%7.3f s Function time: %.6f sec\n",
        $funcname, $nreps, $time, $t_func);
    if ($verbose) echo($msg);
    return array('funcname' => $funcname, 'msg' => $msg, 'nreps' => $nreps,
        'time_total' => $time, 'time_func' => $t_func, 'result' => $result);
}
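The header comment notes that variants taking more parameters can be cloned from this function. On PHP 5.6 or later, one alternative is to collect the arguments with the ... operator so that a single code path handles any argument count. The following is a minimal sketch under that assumption; benchmark_n() and time_calls() are hypothetical names and are not part of the original include file.

```php
<?php
// Hypothetical helper: seconds taken to call $func(...$args) $n times.
// The last return value is handed back through $result.
function time_calls(callable $func, array $args, $n, &$result) {
    $start = microtime(true);
    for ($i = 0; $i < $n; ++$i) {
        $result = $func(...$args);
    }
    return microtime(true) - $start;
}

// Hypothetical variadic variant of benchmark_12().
function benchmark_n(callable $func, $reptime = 1.0, ...$args) {
    $result = null;
    // Pass 1: grow $n by factors of 10 until one run exceeds 10 ms.
    $n = 1;
    $time = time_calls($func, $args, $n, $result);
    while ($time < 0.01) {
        $n *= 10;
        $time = time_calls($func, $args, $n, $result);
    }
    $nreps = $n;
    $t_func = $time / $nreps; // Rough time per call.
    // Pass 2: run for about $reptime seconds, unless one call is already slower.
    if ($t_func < $reptime) {
        $nreps = max(1, (int)($reptime / $t_func));
        $time = time_calls($func, $args, $nreps, $result);
        $t_func = $time / $nreps; // Average time per call.
    }
    return array('nreps' => $nreps, 'time_total' => $time,
                 'time_func' => $t_func, 'result' => $result);
}

// Usage: a three-argument call, which benchmark_12() cannot measure.
$r = benchmark_n('str_replace', 0.1, '  ', "\t", "    <a>\n  <b/>\n");
printf("Nreps:%7d  Function time: %.8f sec\n", $r['nreps'], $r['time_func']);
```

Like the original, this sketch only suits functions whose arguments are passed by value.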
When I run test.php against my test data (read from testdata.txt), these are the results I get:
GOOD: Same results.
tabify_leading_spaces_1() Nreps:   1756 Time:  2.041 s Function time: 0.001162 sec
tabify_leading_spaces_2() Nreps:   1738 Time:  1.886 s Function time: 0.001085 sec
tabify_leading_spaces_3() Nreps:   2161 Time:  2.044 s Function time: 0.000946 sec