Using the flush() method on every 100 of 10 000 rows slows down the transaction

java flush entitymanager spring-data-jpa spring-boot

2022-09-04 20:47:44

I have a sample project using spring-boot with spring-data-jpa and a postgres db with one table.

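I'm trying to INSERT 10 000 records into the table in a loop and measure the execution time, with the flush() method of the EntityManager class either enabled or disabled for every 100 records.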

I expected the execution time with flush() enabled to be much shorter than with it disabled, but I actually get the opposite result.

UserService.java

package sample.data;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {
    @Autowired
    UserRepository userRepository;

    public User save(User user) {
        return userRepository.save(user);
    }
}

UserRepository.java

package sample.data;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UserRepository extends JpaRepository<User, Long> { }

Application.java

package sample;

import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.annotation.Transactional;

import sample.data.User;
import sample.data.UserService;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@SpringBootApplication
@EnableJpaRepositories(considerNestedRepositories = true)
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Autowired
    private UserService userService;

    @PersistenceContext
    EntityManager entityManager;

    @Bean
    public CommandLineRunner addUsers() {
        return new CommandLineRunner() {
            @Transactional
            public void run(String... args) throws Exception {
                long incoming = System.currentTimeMillis();
                for (int i = 1; i <= 10000; i++) {
                    userService.save(new User("name_" + i));

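                    // "Enabled" variant: every 100 records, send the pending INSERTs
                    // to the database and detach all managed entities.
                    // The "disabled" variant simply omits this block.
                    if (i % 100 == 0) {
                        entityManager.flush();
                        entityManager.clear();
                    }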
                }
                entityManager.close();
                System.out.println("Time: " + (System.currentTimeMillis() - incoming));
            }
        };
    }
}

Answer 1

Make sure JDBC batching is enabled in your persistence provider configuration. If you are using Hibernate, add the following to your Spring properties:

# 20, or some other reasonable value (properties files do not support trailing // comments)
spring.jpa.properties.hibernate.jdbc.batch_size=20

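Without batching enabled, my guess is that the performance regression comes from the overhead of clearing the persistence context every 100 entities, but I'm not sure about that (you would have to measure it).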

Update:

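Actually, enabling or disabling JDBC batching does not change the fact that calling flush() every so often is not faster than not calling it. A manual flush() does not control how the flush is done (as batched statements or as single-row inserts); it only controls when the flush to the database happens.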

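So what you are comparing is the following:

- With flush() every 100 objects: each flush inserts 100 instances into the database, and this happens 10000 / 100 = 100 times.
- Without flush(): you simply collect all 10000 objects in memory in the persistence context and perform 10000 inserts when the transaction is committed.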

JDBC batching, on the other hand, affects how the flush happens, but the number of statements issued with flush() is still the same as without it.
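The counting argument above can be checked with a small model. This is plain Java, not Hibernate: it assumes that every save() queues exactly one INSERT until the next flush, that each flush sends its queued INSERTs in JDBC batches of at most `batchSize` statements, and that the commit performs one final flush. It ignores cases where Hibernate has to insert immediately, such as IDENTITY-generated keys. `FlushModel` and `simulate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushModel {

    /** What reaches the database in one simulated transaction. */
    public static final class Result {
        public final int flushes;    // flushes that actually sent statements
        public final int inserts;    // INSERT statements sent in total
        public final int roundTrips; // JDBC executions: one per batch

        Result(int flushes, int inserts, int roundTrips) {
            this.flushes = flushes;
            this.inserts = inserts;
            this.roundTrips = roundTrips;
        }

        @Override
        public String toString() {
            return "flushes=" + flushes + ", inserts=" + inserts + ", roundTrips=" + roundTrips;
        }
    }

    /**
     * Models saving {@code total} entities in one transaction (hypothetical helper).
     *
     * @param flushInterval call flush() every N saves; 0 means no manual flush
     * @param batchSize     JDBC batch size; 1 means batching is disabled
     */
    public static Result simulate(int total, int flushInterval, int batchSize) {
        List<Integer> flushSizes = new ArrayList<>(); // INSERTs sent by each flush
        int pending = 0;
        for (int i = 1; i <= total; i++) {
            pending++; // assumption: each save() queues one INSERT
            if (flushInterval > 0 && i % flushInterval == 0) {
                flushSizes.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            flushSizes.add(pending); // the commit flushes whatever is still queued
        }
        int inserts = 0;
        int roundTrips = 0;
        for (int size : flushSizes) {
            inserts += size;
            roundTrips += (size + batchSize - 1) / batchSize; // ceil(size / batchSize)
        }
        return new Result(flushSizes.size(), inserts, roundTrips);
    }

    public static void main(String[] args) {
        System.out.println("flush every 100, no batching: " + simulate(10_000, 100, 1));
        System.out.println("no manual flush, no batching: " + simulate(10_000, 0, 1));
        System.out.println("flush every 100, batch of 20: " + simulate(10_000, 100, 20));
        System.out.println("no manual flush, batch of 20: " + simulate(10_000, 0, 20));
    }
}
```

Under these assumptions both strategies send 10 000 INSERTs, and with a batch size of 20 both need 500 round trips; the manual flush only splits the same work into 100 flushes instead of one. In this model, an interval that is not a multiple of the batch size (for example 30) even adds round trips, because every flush ends with a partial batch.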

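The benefit of flushing and clearing every so often inside the loop is that it avoids a possible OutOfMemoryError caused by the cache (the persistence context) holding too many objects.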
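The memory argument can be illustrated with a deliberately simplified model in which a `HashMap` stands in for the persistence context (Hibernate's first-level cache). This is a hypothetical sketch, not Hibernate code: a real persistence context keeps more bookkeeping per entity, but it likewise keeps every managed entity referenced until it is cleared or closed.

```java
import java.util.HashMap;
import java.util.Map;

public class PersistenceContextModel {

    /**
     * Returns the largest number of entities held at once while saving {@code total} entities.
     *
     * @param clearInterval clear the context every N saves; 0 means never clear
     */
    public static int peakManagedEntities(int total, int clearInterval) {
        Map<Long, String> managed = new HashMap<>(); // stands in for the first-level cache
        int peak = 0;
        for (long id = 1; id <= total; id++) {
            managed.put(id, "name_" + id); // a saved entity stays managed (strongly referenced)
            peak = Math.max(peak, managed.size());
            if (clearInterval > 0 && id % clearInterval == 0) {
                // flush() would write pending changes here; clear() then detaches everything
                managed.clear();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("clear every 100: peak = " + peakManagedEntities(10_000, 100));
        System.out.println("never clear:     peak = " + peakManagedEntities(10_000, 0));
    }
}
```

With clearing every 100 saves the model never holds more than 100 entities, while without clearing it ends up holding all 10 000, which is the growth that can eventually lead to an OutOfMemoryError on larger loads.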

Answer 2

Writing microbenchmarks is hard, as Aleksey Shipilev explains at length in his "JMH vs Caliper: reference thread" post. Your case is not exactly a microbenchmark, but:

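- Fewer than 10,000 repetitions won't let the JVM warm up and JIT-compile the code with default settings. Warm up the JVM before measuring the code's performance.
- Use System.nanoTime() rather than System.currentTimeMillis() to measure elapsed time. If you measure in ms with System.currentTimeMillis(), your results will be skewed by clock drift.
- You most likely want to measure this on the database side to pinpoint the bottleneck. Without knowing the bottleneck it is hard to understand the root cause; for example, your database could be on the other side of the Atlantic, and the cost of the network connection would then outweigh the cost of the INSERT statements.
- Is your benchmark isolated enough? If the database is shared by multiple users and connections, its performance will vary for reasons unrelated to your benchmark.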
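The warm-up and timing advice can be sketched in plain Java. `WarmedUpTimer`, `timeMillis` and the workload are hypothetical; the workload is a CPU-only stand-in for the save loop, and a dedicated harness such as JMH remains the more reliable option than this hand-rolled loop.

```java
import java.util.concurrent.TimeUnit;

public class WarmedUpTimer {

    /**
     * Runs {@code task} {@code warmupRuns} times without timing it, so the JIT has a chance
     * to compile it, then returns the elapsed time of {@code measuredRuns} further runs in ms.
     */
    public static long timeMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // warm-up iterations are executed but not measured
        }
        // nanoTime() is meant for elapsed-time measurement; unlike currentTimeMillis()
        // it is not tied to the wall clock, so clock adjustments do not skew it.
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    static long checksum; // keeps the workload's result observable so the JIT cannot drop it

    public static void main(String[] args) {
        // Hypothetical stand-in workload; replace it with the code under test.
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) {
                sum += ("name_" + i).hashCode();
            }
            checksum += sum;
        };
        long elapsed = timeMillis(workload, 200, 200);
        System.out.println("Time: " + elapsed + " ms (checksum " + checksum + ")");
    }
}
```

For the question's scenario, the warm-up runs would write to the database too, so they should go to a scratch table or be rolled back to keep the measured runs comparable.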

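Find the bottleneck in your current setup, form a hypothesis about how to verify it, change the benchmark to match the hypothesis, and then measure again to confirm it. That is the only way to get to the bottom of this.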