PHP - 使用“加载数据”将 CSV 文件导入 mysql 数据库在数据库中批量插入1分钟内超过700万条记录(超高速查询与计算)

2022-08-31 00:11:29

我有一个.csv这样的文件数据

Date,Name,Call Type,Number,Duration,Address,PostalCode,City,State,Country,Latitude,Longitude
"Sep-18-2013 01:53:45 PM","Unknown","outgoing call",'123456',"0 Secs","null","null","null","null","null",0.0,0.0,,,
"Sep-18-2013 01:54:14 PM","Unknown","outgoing call",'1234567890',"0 Secs","null","null","null","null","null",0.0,0.0,,,
"Sep-18-2013 01:54:37 PM","Unknown","outgoing call",'14772580369',"1 Secs","null","null","null","null","null",0.0,0.0,,,

并且我正在使用以下代码将数据插入到数据库中

$sql = "LOAD DATA INFILE `detection.csv`
              INTO TABLE `calldetections`
              FIELDS TERMINATED BY '".@mysql_escape_string(",").
             "` OPTIONALLY ENCLOSED BY `".@mysql_escape_string("\"").
             "` OPTIONALLY ENCLOSED BY `".@mysql_escape_string("\'").
             "` ESCAPED BY `".@mysql_escape_string("\\").
              "` LINES TERMINATED BY `".",,,\\r\\n".
             "`IGNORE 1 LINES `"

             ."(`date`,`name`,`type`,`number`,`duration`,`addr`,`pin`,`city`,`state`,`country`,`lat`,`log`)";
      $res = @mysql_query($con,$sql); 

但没有插入任何内容;错误在哪里?


答案 1

如果您在执行之前这样做,您会看到查询的语法不正确,原因如下:echo($sql);

  1. 文件名应括在引号中,而不是反引号,因为它是字符串文本而不是标识符。

  2. 绝对没有必要调用来指定 and 和 子句中的分隔符。mysql_escape_string()FIELDS TERMINATED BYENCLOSED BYESCAPED BY

  3. 你过度使用反引号。事实上,在你的情况下,由于没有使用保留字,你把它们都抛弃了。它们只会增加混乱。

  4. 在 CSV 文件第一行的末尾,您必须拥有,因为您使用它们作为行分隔符的一部分。如果您不这样做,您不仅可以跳过第一行,还可以跳过包含数据的第二行。,,,

  5. 不能多次使用子句。你必须以不同的方式处理领域。ENCLOSED BYNumber

  6. 看看你的样本行,恕我直言,你不需要。但是,如果您觉得需要它,请使用它。ESCAPED BYESCAPED BY '\\'

话虽如此,语法正确的陈述可能看起来像这样

LOAD DATA INFILE 'detection.csv'
INTO TABLE calldetections
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY ',,,\r\n'
IGNORE 1 LINES 
(date, name, type, number, duration, addr, pin, city, state, country, lat, log)

现在恕我直言,您需要在加载时转换相当多的字段:

  1. 如果表中的数据类型为数据类型,则需要对其进行转换,否则您将收到错误datedatetime

    日期时间值不正确:行“日期”列的“Sep-18-2013 01:53:45 PM”

  2. 您必须处理围绕字段中的值的单个 qoutesNumber

  3. 您很可能希望将列的字符串文本更改为实际值"null"NULLaddr, pin, city, state, country

  4. 如果持续时间始终以秒为单位,则可以提取秒的整数值,并以这种方式将其存储在表中,以便以后能够轻松聚合持续时间值。

话虽如此,该语句的有用版本应如下所示

LOAD DATA INFILE 'detection.csv'
INTO TABLE calldetections
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY ',,,\r\n'
IGNORE 1 LINES 
(@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
    number = TRIM(BOTH '\'' FROM @number),
    duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
    addr = NULLIF(@addr, 'null'),
    pin  = NULLIF(@pin, 'null'),
    city = NULLIF(@city, 'null'),
    state = NULLIF(@state, 'null'),
    country = NULLIF(@country, 'null') 

以下是在我的计算机上执行查询的结果

mysql> LOAD DATA INFILE '/tmp/detection.csv'
    -> INTO TABLE calldetections
    -> FIELDS TERMINATED BY ','
    -> OPTIONALLY ENCLOSED BY '"' 
    -> LINES TERMINATED BY ',,,\n'
    -> IGNORE 1 LINES 
    -> (@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
    -> SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
    ->     number = TRIM(BOTH '\'' FROM @number),
    ->     duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
    ->     addr = NULLIF(@addr, 'null'),
    ->     pin  = NULLIF(@pin, 'null'),
    ->     city = NULLIF(@city, 'null'),
    ->     state = NULLIF(@state, 'null'),
    ->     country = NULLIF(@country, 'null');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from calldetections;
+---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+
| date                | name    | type          | number      | duration | addr | pin  | city | state | country | lat  | log  |
+---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+
| 2013-09-18 13:53:45 | Unknown | outgoing call | 123456      |        0 | NULL | NULL | NULL | NULL  | NULL    | 0.0  | 0.0  |
| 2013-09-18 13:54:14 | Unknown | outgoing call | 1234567890  |        0 | NULL | NULL | NULL | NULL  | NULL    | 0.0  | 0.0  |
| 2013-09-18 13:54:37 | Unknown | outgoing call | 14772580369 |        1 | NULL | NULL | NULL | NULL  | NULL    | 0.0  | 0.0  |
+---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+
3 rows in set (0.00 sec)

最后,在php中,将查询字符串分配给变量应该如下所示$sql

$sql = "LOAD DATA INFILE 'detection.csv'
        INTO TABLE calldetections
        FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '\"' 
        LINES TERMINATED BY ',,,\\r\\n'
        IGNORE 1 LINES 
        (@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
        SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
            number = TRIM(BOTH '\'' FROM @number),
            duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
            addr = NULLIF(@addr, 'null'),
            pin  = NULLIF(@pin, 'null'),
            city = NULLIF(@city, 'null'),
            state = NULLIF(@state, 'null'),
            country = NULLIF(@country, 'null') ";

答案 2

在数据库中批量插入1分钟内超过700万条记录(超高速查询与计算)

    mysqli_query($cons, '
    LOAD DATA LOCAL INFILE "'.$file.'"
    INTO TABLE tablename
    FIELDS TERMINATED by \',\'
    LINES TERMINATED BY \'\n\'
    IGNORE 1 LINES
    (isbn10,isbn13,price,discount,free_stock,report,report_date)
     SET RRP = IF(discount = 0.00,price-price * 45/100,IF(discount = 0.01,price,IF(discount != 0.00,price-price * discount/100,@RRP))),
         RRP_nl = RRP * 1.44 + 8,
         RRP_bl = RRP * 1.44 + 8,
         ID = NULL
    ')or die(mysqli_error());
    $affected = (int) (mysqli_affected_rows($cons))-1; 
    $log->lwrite('Inventory.CSV to database:'. $affected.' record inserted successfully.');

RRP和RRP_nl和RRP_bl不在csv中,但我们计算出来并在插入后。


推荐