Yad Jayanth

Unable to retrieve file from ACS, transient error?

We created a managed package that is deployed in a customer's org, and we are running into an error that causes processing to stall and never finish.

Details:
- 34 batches are queued. Processing of the first appears to complete (the DML associated with it seems to happen).
- Debug logs seem to indicate that the second batch is processing, but the DML does not occur (though the logs say it does).
- In Apex Jobs, we perpetually see the job as processing the first batch. After many hours it seems to resolve, usually as an Internal Salesforce.com Error.
- We received the "Unable to retrieve file from ACS, transient error?" error by email for this once.
- There appear to be other processes in this org (triggers in managed packages) that may be put to work as a result of inserts/updates from our batch processes.
- The code itself makes some callouts, processes the response, and inserts/updates records in Salesforce. This code/package has not experienced this issue in any other org.
- The particular response being processed for this customer does not appear anomalous in any way.
- It ALWAYS stalls in the same way: it processes the first batch, then neither completes the second nor continues after that.

I'm seeing some old results on this error indicating that it was an issue particular to na7 a number of years ago. This instance is na9, and they tell me that it is now 2016.

Opened up a case with Salesforce, to no avail; a lack of premier support resulted in their suggestion to make this post. 

I appreciate your help. Thanks.

There is another post about the same error, which has some explanations for it: https://developer.salesforce.com/forums/?id=906F00000008ybwIAA
Yad Jayanth
Thank you, Prolay. Unfortunately that post did not help with debugging this issue; I had already come across it when looking into this earlier. Our batch job does not rely on a query of data, so our start() method is not bogged down by one. Further, since the first batch does process, we are past the start() method. Instead, our approach performs a callout (each batch execution handles one page of an endpoint) and processes the returned records. That discussion also mentions a high load on a particular instance (na7) at the time (2011); it would be good to know whether something like that is happening now too.
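For readers following along, the pattern described here (no query in start(), one endpoint page per batch execution, then DML) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the package's actual code: the class name `PagedCalloutBatch`, the named credential `Example_Endpoint`, the JSON shape (an array of objects with `id` and `name`), and the `External_Id__c` field on Account are all hypothetical.

```apex
// Illustrative sketch only. Every class, endpoint, and field name here is hypothetical.
public class PagedCalloutBatch implements Database.Batchable<Integer>, Database.AllowsCallouts {
    private final Integer pageCount;

    public PagedCalloutBatch(Integer pageCount) {
        this.pageCount = pageCount;
    }

    // start() runs no query; it only returns the page numbers to fetch.
    public Iterable<Integer> start(Database.BatchableContext bc) {
        List<Integer> pages = new List<Integer>();
        for (Integer i = 1; i <= pageCount; i++) {
            pages.add(i);
        }
        return pages;
    }

    // With a scope size of 1, each execute() call handles exactly one page.
    public void execute(Database.BatchableContext bc, List<Integer> scope) {
        List<Account> records = new List<Account>();
        for (Integer pageNumber : scope) {
            HttpRequest req = new HttpRequest();
            // Assumed named credential and URL layout.
            req.setEndpoint('callout:Example_Endpoint/records?page=' + pageNumber);
            req.setMethod('GET');
            HttpResponse res = new Http().send(req);
            if (res.getStatusCode() != 200) {
                System.debug(LoggingLevel.ERROR, 'Page ' + pageNumber + ' returned ' + res.getStatusCode());
                continue;
            }
            // Assumed response shape: a JSON array of objects with "id" and "name".
            for (Object item : (List<Object>) JSON.deserializeUntyped(res.getBody())) {
                Map<String, Object> row = (Map<String, Object>) item;
                records.add(new Account(
                    Name = (String) row.get('name'),
                    External_Id__c = (String) row.get('id')
                ));
            }
        }
        // DML happens only after every callout in this execution has finished,
        // because a callout is not allowed once there is uncommitted work pending.
        for (Database.UpsertResult result : Database.upsert(records, Account.External_Id__c, false)) {
            if (!result.isSuccess()) {
                System.debug(LoggingLevel.ERROR, result.getErrors());
            }
        }
    }

    public void finish(Database.BatchableContext bc) {}
}
```

Under these assumptions the job would be started with `Database.executeBatch(new PagedCalloutBatch(34), 1);`, which queues one execution per page. Logging the individual upsert results, rather than relying on the DML statement appearing in the debug log, is one way to check whether the rows in the second batch are actually being saved.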
Since you mentioned that the issue is specific to one org, you need to check whether other batch or scheduled Apex jobs, as well as triggers, are running on the same objects. If you manage your package's triggers through custom settings, then try switching off your trigger execution and running the other batch/scheduled Apex developed by the customer. As I understand it from the explanations of the error, Salesforce attributed this issue to load on the server.
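If the package does not already have such a switch, a custom-setting guard could look roughly like the sketch below. It assumes a hierarchy custom setting named `Package_Settings__c` with a checkbox field `Disable_Triggers__c`, plus a handler class `AccountSyncHandler`; all three names are hypothetical and only illustrate the idea.

```apex
// Illustrative sketch only. Package_Settings__c, Disable_Triggers__c,
// and AccountSyncHandler are hypothetical names.
trigger AccountSync on Account (before insert, before update) {
    // Hierarchy custom setting: resolves the user, profile, or org-level value.
    Package_Settings__c settings = Package_Settings__c.getInstance();

    // Skip all trigger logic while the switch is on.
    if (settings != null && settings.Disable_Triggers__c == true) {
        return;
    }

    AccountSyncHandler.handle(Trigger.new);
}
```

With a guard like this in place, the checkbox can be ticked at the org level while the customer's own jobs are run in isolation, then cleared again afterwards.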
To investigate which batch/scheduled job is causing the issue