MJ09

Batch Apex problems

In several projects, in several orgs, I've scheduled some Batch Apex jobs to run nightly to process large numbers of records. I've run into a couple of problems that are leaving me very uncertain about whether Batch Apex really can handle large jobs.

Every now and then, a job will fail with this error: "Unable to write to any of the ACS stores in the alloted time." I first encountered this in September 2010. I filed a Case and posted to the discussion boards (http://boards.developerforce.com/t5/Apex-Code-Development/Unable-to-write-to-any-of-the-ACS-stores-in-the-alloted-time/m-p/205908#M36022). After a few weeks, I was finally told that it was an internal issue that had been resolved. Now, after months of running nightly Batch Apex jobs without this problem, it has just recurred.

A second issue is that, every now and then, a Batch Apex job gets stuck in the queue in the "Queued" state. When you launch a Batch Apex job, it gets added to the queue in the "Queued" state, and when the system gets around to executing it, the job gets moved to a "Processing" state. Well, I have batch jobs that have been stuck in the "Queued" state since early January. I've had cases open on this problem for over a month, and while the Case finally found its way to Tier 3 Support, there's still no sign of a resolution.

In both cases, the issue is NOT an Apex coding problem. It's an issue with how the platform is queueing and processing Batch Apex jobs.

I'm wondering whether anybody else has run into these problems, or other problems executing Batch Apex jobs. What problems have you run into? How have you resolved or worked around them?

Thanks for your insights.

 
Aiden Byrne

I ran into the same queue issues. They can really hurt your solution, because you are limited to a maximum of 5 jobs at any one time. In my case, I kick off jobs on a schedule. The first thing I do is "clean the slate" by killing any existing jobs. I use a simple abort-job loop like the following:

 

 

 // Look up the batch class, then find its jobs that are still queued or running.
 ApexClass batchClass = [select id, name from apexclass where name = 'WVBatchGeocodeBatchable'];
 AsyncApexJob[] activeJobs = [select id from AsyncApexJob where ApexClassId = :batchClass.Id and (Status = 'Queued' or Status = 'Processing')];

 for (Integer count = 0; count < activeJobs.size(); count++)
 {
  try
  {
   System.abortJob(activeJobs[count].id);
  }
  catch (Exception myException)
  {
   // Log and continue so one failed abort does not stop the loop.
   System.debug(myException);
  }
 }

 

dmcheng

I just received the ACS Stores error today. I've opened a case; let's see what they say.

 

BTW I searched the force.com site for "ACS stores" and there were no results.  Very annoying that the search engine here is still so low-quality.

jwhartfield

We have been having this problem as well recently - and it seems to have started up around 2/4.  We are an ISV and have hundreds of instances out there and now we are seeing ACS errors come in at all times of the day, not just in the wee hours of the morning as we had previously seen.

 

Last night we got something new: "Unable to retrieve file from ACS, transient error?" from 4 of our customer orgs right at 11pm. (Not sure why it is posed as a question, 'cause I don't really know. ;-) )

MJ09

I finally got through to somebody in Tier 3 Support about the issue where some Batch Apex jobs get stuck in the "Queued" state. Turns out to be 2 related issues.

 

First, the query in our start() method is taking so long to run that it eventually times out. Apparently, the system retries the query five times, then gives up.

 

Second, when the system finally gives up, it fails to change the job's status in the queue that's visible from the Setup | Monitoring | Apex Jobs page, so the job appears to be stuck in the Queued state, when in fact, the system has given up on it.
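Since the Apex Jobs page can't be trusted here, one way to spot these jobs is to query AsyncApexJob directly for batch jobs that have sat in the Queued state for a long time. This is only a rough sketch: the 24-hour cutoff is an arbitrary assumption, and a job that shows up in the results is a candidate for investigation, not proof that the system has given up on it.

```apex
// Assumption: anything still "Queued" after 24 hours is worth a look.
DateTime cutoff = DateTime.now().addHours(-24);

List<AsyncApexJob> staleJobs = [
    SELECT Id, ApexClass.Name, Status, CreatedDate
    FROM AsyncApexJob
    WHERE JobType = 'BatchApex'
      AND Status = 'Queued'
      AND CreatedDate < :cutoff
];

for (AsyncApexJob job : staleJobs) {
    // Log each suspect job; aborting it with System.abortJob(job.Id)
    // is a separate decision.
    System.debug('Possibly stuck: ' + job.ApexClass.Name
        + ' (' + job.Id + '), queued since ' + job.CreatedDate);
}
```

You could run this from the Developer Console or from a small scheduled class.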

 

According to support, our query is taking so long to execute not because of the large number of records it fetches, but rather because the query uses an ORDER BY clause, which, for large numbers of records, can take a long time to process. I removed the ORDER BY clause, and will monitor whether we stop seeing the job getting stuck. (It would have been nice to keep the ORDER BY in place so I could count on records being processed in a specific order, but in this particular job, it's not absolutely necessary.)
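For anyone who wants to see what that change looks like, here is a minimal sketch of a batch class whose start() query has no ORDER BY. The class name, object, fields, and filter are all placeholders, not our real job.

```apex
// Hypothetical batch class; the object, fields, and filter are placeholders.
global class NightlyRecordBatch implements Database.Batchable<SObject> {

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // No ORDER BY here: records arrive in whatever order the
        // platform returns them.
        return Database.getQueryLocator(
            'SELECT Id, Name FROM Account WHERE LastModifiedDate = LAST_N_DAYS:1'
        );
    }

    global void execute(Database.BatchableContext bc, List<SObject> scope) {
        // Process one chunk. Nothing here may depend on the order
        // of records across chunks.
    }

    global void finish(Database.BatchableContext bc) {
        // Optional post-processing or notification.
    }
}
```

The trade-off is the one described above: execute() can no longer assume anything about the order in which records show up.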

 

They suggested another way that we might improve performance of that query. Apparently, the results of a query remain in the system's cache until there are 10 other queries issued for that org. So if we have a job that runs in the wee hours of the morning, when the org is otherwise relatively quiet, we could schedule another job to run shortly before the real job. That "pre" job would simply issue the query and then (probably in its first execute() invocation) abort itself. Issuing the query would cause the query results to be placed in the system's cache, where (depending on other activity in that org) they might still remain when the "real" job executes, causing the "real" job's query to execute more quickly. I understand the theory behind their suggestion, but it's not clear to me that it's a viable approach, since it's so dependent on what other processes might be issuing queries in that org at the same time.
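As I understand the suggestion, the "pre" job would look roughly like the sketch below. The class name and query are placeholders; the query would need to match the real job's start() query. Whether this actually warms anything is exactly the part I can't vouch for.

```apex
// Hypothetical warm-up job; class name and query are placeholders.
global class WarmUpBatch implements Database.Batchable<SObject> {

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Use the same query as the real job so the same results
        // are fetched.
        return Database.getQueryLocator(
            'SELECT Id, Name FROM Account WHERE LastModifiedDate = LAST_N_DAYS:1'
        );
    }

    global void execute(Database.BatchableContext bc, List<SObject> scope) {
        // The query has already run by now, so stop the job
        // instead of processing any records.
        System.abortJob(bc.getJobId());
    }

    global void finish(Database.BatchableContext bc) {
        // Nothing to do.
    }
}
```

It would be scheduled shortly before the real job and launched with Database.executeBatch(new WarmUpBatch()).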

 

Support says they're working on the second problem, updating the Apex Jobs list when the system gives up on a job.

 

I mentioned the "Unable to write to any of the ACS stores" issue, and Tier 3 Support said that it's a known issue that happens when there is a "huge load" on their servers. They say it's currently only an issue on na7. (We got the same error several times a few months ago. I think it was on several instances, not just na7, but I'm not sure. At the time, they said that it was a fragmentation issue. I don't know whether it's the same issue or not.) They suggested we create a Case for it, so we can be notified when the issue is resolved.

 

dmcheng

Thanks for posting that info. Yup, I'm getting that ACS Stores error on the na7 instance.

DennisYZF

Thank you, this was an extremely helpful explanation - much appreciated.

Denis

MikeCamp

@Aiden Byrne
If it is a scheduled batch job, you can use the following:
CronJobDetail cronJob = [Select Id From CronJobDetail where Name = 'Cron Job Name Here'];
CronTrigger cronTrigger = [Select Id From CronTrigger where CronJobDetailId = :cronJob.Id];
try
{
    // Aborting the CronTrigger removes the scheduled job.
    System.abortJob(cronTrigger.Id);
}
catch (Exception myException)
{
    System.debug(myException);
}
kumar_arun

Thanks! This solution works correctly.