如何确定 csv 文件字段是制表符分隔还是逗号分隔

php
2022-08-30 23:55:10

如何确定csv文件字段是制表符分隔还是逗号分隔。我需要 php 验证。任何人都可以帮忙。提前致谢。


答案 1

现在回答这个问题已经太晚了,但希望它能帮助别人。

下面是一个简单的函数,它将返回文件的分隔符。

function getFileDelimiter($file, $checkLines = 2){
        $file = new SplFileObject($file);
        $delimiters = array(
          ',',
          '\t',
          ';',
          '|',
          ':'
        );
        $results = array();
        $i = 0;
         while($file->valid() && $i <= $checkLines){
            $line = $file->fgets();
            foreach ($delimiters as $delimiter){
                $regExp = '/['.$delimiter.']/';
                $fields = preg_split($regExp, $line);
                if(count($fields) > 1){
                    if(!empty($results[$delimiter])){
                        $results[$delimiter]++;
                    } else {
                        $results[$delimiter] = 1;
                    }   
                }
            }
           $i++;
        }
        $results = array_keys($results, max($results));
        return $results[0];
    }

使用此功能,如下所示:

$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter

附言:我用preg_split()代替explode(),因为explode('\t',$value)不会给出正确的结果。

更新:感谢您@RichardEB指出代码中的错误。我现在已经更新了这个。


答案 2

这就是我的工作。

  1. 解析 CSV 文件的前 5 行
  2. 计算每行中分隔符 [逗号、制表符、分号和冒号] 的数量
  3. 比较每行中的分隔符数。如果您有格式正确的 CSV,则每行中的一个分隔符计数将匹配。

这不会100%工作,但这是一个不错的起点。它至少会减少可能的分隔符的数量(使用户更容易选择正确的分隔符)。

/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab'       => "\t",
                'comma'     => ",",
                'semicolon' => ";"
                );

$handle = file( $file );    # Grabs the CSV file, loads into array

$line = array();            # Stores the count of delimiters in each row

$valid_delimiter = array(); # Stores Valid Delimiters

# Count the number of Delimiters in Each Row
for ( $i = 1; $i < 6; $i++ ){
foreach ( $delimiters as $key => $value ){
    $line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}


# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){

# Check that the first two values are not 0
if ( $count[1] > 0 and $count[2] > 0 ){
    $match = true;

    $prev_value = '';
    foreach ( $count as $value ){

        if ( $prev_value != '' )
            $match = ( $prev_value == $value and $match == true ) ? true : false;

        $prev_value = $value;
    }

} else { 
    $match = false;
}

if ( $match == true )    $valid_delimiter[] = $delimiter;

}//foreach

# Set Default delimiter to comma
$delimiter = ( $valid_delimiter[0] != '' ) ? $valid_delimiter[0] : "comma";


/*  !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have to user select from the delimiters in $valid_delimiter
!!!! if multiple dilimiter counts match
*/

# The Delimiter for the CSV
echo $delimiters[$delimiter]; 

推荐