如何使用休眠和弹簧启动启用批处理插入
我想将大量实体插入到交叉表中。为此,我想启用休眠的批处理插入选项,以便每个插入一次添加不是1行,而是20或50行。
我正在尝试重现休眠教程中的结果。我创建了一个测试函数,该函数将尝试插入30个客户,就像示例中所示:
//Batch inserting test entities
//This code is inside transaction already, so I'm not creating a new one
entityManager.flush();
for (int i = 0; i < 30; ++i) {
Customer customer = new Customer("Customer" + i, (i + 5) * 1000, "Position " + i);
entityManager.persist(customer);
if (i % 20 == 0) {
entityManager.flush();
entityManager.clear();
}
}
entityManager.flush();
entityManager.clear();
客户类没有生成的 ID。
@Entity
public class Customer {
@Id
private String id;
@Column
private String name;
@Column
private long salary;
@Column
private String position;
public Customer() {
}
public Customer(String name, long salary, String position) {
this.id = UUID.randomUUID().toString();
this.name = name;
this.salary = salary;
this.position = position;
}
}
我期望看到的是 2 个插入语句,一个用于 20 条记录,一个用于 10 条记录。但是,当我打开postgres日志时,我看到30个插入语句,每个语句仅插入1行。我已经仔细检查了是否可以使用postgres插入多行。
我相信问题是由参数引起的,我应该将其传递到休眠状态。但是,由于我使用的是spring boot,因此我唯一的配置文件是appplication.properties。所以我尝试将其插入那里:hibernate.jdbc.batch_size 20
hibernate.jdbc.batch_size=20
hibernate.order_inserts=true
hibernate.order_updates=true
hibernate.jdbc.batch_versioned_data=true
spring.jpa.hibernate.jdbc.batch_size=20
spring.jpa.hibernate.order_inserts=true
spring.jpa.hibernate.order_updates=true
spring.jpa.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.jdbc.batch_size=20
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
根据这个答案,这应该足够了,但我也尝试将这些参数复制到我的jar中的文件中,并通过命令行提供参数:如本文档所述。hibernate.properties
-Dhibernate.jdbc.batch_size=20
这些都无济于事。
我也找不到在我的代码中读取批大小属性的方法。似乎hibernate文档中提到的大多数对象在spring boot应用程序中都不存在。
如何在弹簧启动应用程序中启用休眠的批量插入?
我创建了一个最小的工作示例来重现问题:https://github.com/Alexey-/spring-boot-batch
在收到Vlad Mihalcea的答案后,我意识到是的,实际上批处理语句正在起作用,至少在休眠和jdbc级别上是这样。
但是,postgres日志本身表现出非常有趣的行为:对于批处理插入和普通插入的情况,它们几乎相同。
我最初期望看到的是,休眠将使用这样的语句:
test=# INSERT INTO customer (name, position, salary, id) values ('CustomerX', 'PositionX', '1000', 'idX'), ('CUSTOMERY', 'POSITIONY', '1000', 'idY');
这将产生类似于以下内容的日志记录:
2015-12-15 11:43:33.238 MSK LOG: statement: INSERT INTO customer (name, position, salary, id) values ('CustomerX', 'PositionX', '1000', 'idX'), ('CUSTOMERY', 'POSITIONY', '1000', 'idY');
但是,事实并非如此。
当启用批量插入时(p6spy显示语句实际上是批处理的),postgres将生成类似于以下内容的日志:
2015-12-15 12:07:00.638 MSK LOG: execute <unnamed>: BEGIN
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9c6a86fb-c991-4e98-aa65-fa736ef67dd7'
2015-12-15 12:07:00.638 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.638 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9c6a86fb-c991-4e98-aa65-fa736ef67dd7'
2015-12-15 12:07:00.639 MSK LOG: duration: 1.000 ms
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms parse S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'c8b2669c-044a-4a4d-acbd-31c3bcd9a783'
2015-12-15 12:07:00.648 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'c8b2669c-044a-4a4d-acbd-31c3bcd9a783'
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.648 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.648 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '1c694f41-2ce7-4ee2-a0c0-f359690506f0'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '1c694f41-2ce7-4ee2-a0c0-f359690506f0'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '1815947d-2604-48d4-a6be-43f6905130cf'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '1815947d-2604-48d4-a6be-43f6905130cf'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = 'cc521007-820f-4d58-8e1a-16a166aa91cf'
2015-12-15 12:07:00.649 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:07:00.649 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = 'cc521007-820f-4d58-8e1a-16a166aa91cf'
2015-12-15 12:07:00.649 MSK LOG: duration: 0.000 ms
... the rest of the logs is identical and do not provide any valuable information...
当批处理语句被禁用时(p6spy显示不执行批处理),日志将如下所示:
2015-12-15 12:09:00.246 MSK LOG: execute <unnamed>: BEGIN
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9e085ad0-437f-4d7d-afaa-e342e031cbee'
2015-12-15 12:09:00.246 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.246 MSK DETAIL: parameters: $1 = 'Customer0', $2 = 'Position 0', $3 = '0', $4 = '9e085ad0-437f-4d7d-afaa-e342e031cbee'
2015-12-15 12:09:00.246 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.248 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'f29cfa40-7d24-49a6-ae5d-2a2021932d80'
2015-12-15 12:09:00.248 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.248 MSK DETAIL: parameters: $1 = 'Customer1', $2 = 'Position 1', $3 = '10', $4 = 'f29cfa40-7d24-49a6-ae5d-2a2021932d80'
2015-12-15 12:09:00.249 MSK LOG: duration: 1.000 ms
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '067dd6d4-5060-467f-b533-75994ecbaedc'
2015-12-15 12:09:00.250 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK DETAIL: parameters: $1 = 'Customer2', $2 = 'Position 2', $3 = '20', $4 = '067dd6d4-5060-467f-b533-75994ecbaedc'
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms parse <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.250 MSK LOG: duration: 0.000 ms bind <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '7df32327-f2f5-4011-848d-55aafb3f09fa'
2015-12-15 12:09:00.251 MSK LOG: execute <unnamed>: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer3', $2 = 'Position 3', $3 = '30', $4 = '7df32327-f2f5-4011-848d-55aafb3f09fa'
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms parse S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms bind S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = '1e55ab6a-8780-4c8f-8af2-2886d954f819'
2015-12-15 12:09:00.251 MSK LOG: execute S_1: insert into customer (name, position, salary, id) values ($1, $2, $3, $4)
2015-12-15 12:09:00.251 MSK DETAIL: parameters: $1 = 'Customer4', $2 = 'Position 4', $3 = '40', $4 = '1e55ab6a-8780-4c8f-8af2-2886d954f819'
2015-12-15 12:09:00.251 MSK LOG: duration: 0.000 ms
... the rest of the logs is identical and do not provide any valuable information...
因此,两者之间的唯一区别是当禁用批处理时,postgres将花费更多的时间来意识到它应该重用预准备的语句。
因此,我决定运行性能测试来确认这一点。
我尝试将30.000条记录插入空数据库。
禁用批处理后,需要 334 毫秒。
启用批处理后,需要 4650 毫秒!
因此,我删除了对entityManager.flush的所有调用(只留下entityManager.clear),时间下降到320ms。我不知道为什么hibernate的教程建议使用齐平。我想这只是一个错误。
所以我想底线是这样的:批处理正在工作,但它并没有提供任何真正的好处(至少对于postgres)。此外,不正确地使用它(如教程中所示)可能会导致可怕的,可怕的结果。使用风险自负,并始终衡量实际性能提升。