石英在发生故障时重试

2022-09-01 17:27:12

假设我有一个触发器,它以这种方式配置:

<bean id="updateInsBBTrigger"         
    class="org.springframework.scheduling.quartz.CronTriggerBean">
    <property name="jobDetail" ref="updateInsBBJobDetail"/>
    <!--  run every morning at 5 AM  -->
    <property name="cronExpression" value="0 0 5 * * ?"/>
</bean>

触发器必须与另一个应用程序连接,如果存在任何问题(如连接失败),则应每 10 分钟重试任务五次,或者直到成功。有没有办法将触发器配置为像这样工作?


答案 1

来源在 Quartz 中自动重试失败的作业

如果你想让一个不断尝试一遍又一遍直到成功的工作,你所要做的就是抛出一个带有标志的JobExecutionException,告诉调度程序在失败时再次触发它。下面的代码演示如何:

class MyJob implements Job {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {

        try{
            //connect to other application etc
        }
        catch(Exception e){

            Thread.sleep(600000); //sleep for 10 mins

            JobExecutionException e2 = new JobExecutionException(e);
            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}

如果您想重试一定次数,它会变得更加复杂。您必须使用 StatefulJob 并在其 JobDataMap 中保留 retryCounter,如果作业失败,则会递增该计数器。如果计数器超过最大重试次数,则可以根据需要禁用该作业。

class MyJob implements StatefulJob {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap dataMap = context.getJobDetail().getJobDataMap();
        int count = dataMap.getIntValue("count");

        // allow 5 retries
        if(count >= 5){
            JobExecutionException e = new JobExecutionException("Retries exceeded");
            //make sure it doesn't run again
            e.setUnscheduleAllTriggers(true);
            throw e;
        }


        try{
            //connect to other application etc

            //reset counter back to 0
            dataMap.putAsString("count", 0);
        }
        catch(Exception e){
            count++;
            dataMap.putAsString("count", count);
            JobExecutionException e2 = new JobExecutionException(e);

            Thread.sleep(600000); //sleep for 10 mins

            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}

答案 2

我建议使用这样的实现在失败后恢复作业:

final JobDataMap jobDataMap = jobCtx.getJobDetail().getJobDataMap();
// the keys doesn't exist on first retry
final int retries = jobDataMap.containsKey(COUNT_MAP_KEY) ? jobDataMap.getIntValue(COUNT_MAP_KEY) : 0;

// to stop after awhile
if (retries < MAX_RETRIES) {
  log.warn("Retry job " + jobCtx.getJobDetail());

  // increment the number of retries
  jobDataMap.put(COUNT_MAP_KEY, retries + 1);

  final JobDetail job = jobCtx
      .getJobDetail()
      .getJobBuilder()
       // to track the number of retries
      .withIdentity(jobCtx.getJobDetail().getKey().getName() + " - " + retries, "FailingJobsGroup")
      .usingJobData(jobDataMap)
      .build();

  final OperableTrigger trigger = (OperableTrigger) TriggerBuilder
      .newTrigger()
      .forJob(job)
       // trying to reduce back pressure, you can use another algorithm
      .startAt(new Date(jobCtx.getFireTime().getTime() + (retries*100))) 
      .build();

  try {
    // schedule another job to avoid blocking threads
    jobCtx.getScheduler().scheduleJob(job, trigger);
  } catch (SchedulerException e) {
    log.error("Error creating job");
    throw new JobExecutionException(e);
  }
}

为什么?

  1. 它不会阻止石英工人
  2. 它将避免背压。使用setRefire立即启动工作,这可能会导致背压问题

推荐