Ker Vin.ax1237

Triggers and batch processing

Hi there, just wanted to get some information from the users around here. I'm fairly new to the APEX scene and can't quite find the information I want to know.

As I understand, SFDC processes records in batches and therefore APEX coding/triggers have to be written in a manner of "batch processing". I understand that there is a limit of concurrent batches. Also, there is a limit to the number of records returned from any SOQL query. My questions are as follows

1) When writing a trigger and using the Set<Id> idsToUpdate = trigger.newmap.keySet(); command, won't we hit the limit fairly quickly if there is a large update running? As I understand, the command captures all records that execute the trigger. If I were to update 5000 records, then there will be that many entries in that set. Won't this be an issue?

2) Let's say that there is a limit of concurrent batch jobs. Using the above as an example, I want to match each newly created record to every other record in the database using Database.batchable. 
So, I can issue a (for Sobject objectname : objectList) loop and then pass each [objectname] into a Database.batchable job. 
As I understand, there is a limit to the number of concurrent batch jobs. For a sufficiently large number of records, this limit will be hit quite quickly right?

Can anyone suggest a better way for situation (2)?


1) Salesforce.com passes records to your trigger in chunks of up to 200, so you won't reach that limit. Also, keySet() isn't a query by itself, and so has no significant performance penalty. When you query using it, you'll only pull 200 records (or more, if you're querying a child of the object in the trigger; these should use a LIMIT clause if you feel there will be more than 2k results or so).
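As a rough sketch of point 1 (the trigger name, and the use of Lead with Task as the child object, are just placeholders picked for illustration, not anything from your org):

```apex
// Sketch only: LeadChunkExample, Lead and Task are illustrative choices.
trigger LeadChunkExample on Lead (after update) {
    // Holds at most 200 Ids per trigger invocation, however large the overall update is.
    Set<Id> idsToUpdate = Trigger.newMap.keySet();

    // One query for the whole chunk. Child rows can outnumber the parents, hence the LIMIT.
    List<Task> relatedTasks = [
        SELECT Id, WhoId
        FROM Task
        WHERE WhoId IN :idsToUpdate
        LIMIT 2000
    ];

    System.debug(idsToUpdate.size() + ' leads, ' + relatedTasks.size() + ' related tasks');
}
```

A 5000-record update would simply run this trigger body once per chunk, each time with a fresh set of up to 200 Ids.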


2) You probably do not want to "batch" this sort of transaction, because, as you've noted, if you updated 5000 records at once, you'd probably hit your queued job limit (which peaks at 2000 records at 200 per batch). Depending on your query logic, you should be able to match against even a large record set using just a selective query to bring back only the results that interest you. You might have to be creative with your queries to minimize the number of records returned.
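To make point 2 a little more concrete, here is a minimal sketch of matching inside the trigger with one filtered query instead of one batch job per record. Lead and Email are stand-ins I've assumed for whatever object and field you actually match on, and the trigger name is made up:

```apex
// Sketch only: Lead and Email are assumed placeholders for your own object and field.
trigger LeadMatchExample on Lead (before insert) {
    // Collect the values to match on from the (up to 200) incoming records.
    Set<String> emails = new Set<String>();
    for (Lead l : Trigger.new) {
        if (l.Email != null) {
            emails.add(l.Email);
        }
    }

    // One filtered query per invocation brings back only the candidate matches.
    if (!emails.isEmpty()) {
        for (Lead existing : [SELECT Id, Email FROM Lead WHERE Email IN :emails LIMIT 2000]) {
            System.debug('Possible match on ' + existing.Email + ': ' + existing.Id);
        }
    }
}
```

The query sits outside the loop over Trigger.new, so the number of queries stays at one per invocation regardless of how many records are in the chunk.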


I can't really guide you much beyond this point, because I simply don't have enough details about what you're attempting, but I believe your goals are within reach.

Ker Vin.ax1237

Hi, thanks for the reply. I understand what you mean here.


Just to clarify my situation, the company uses a LOT of custom fields which aren't detected by the "Find duplicate Leads" process (i.e. mobile no 1, mobile no 2, etc). My job is to make it so that every time a lead is created, it is matched against all other unclosed leads in the system to check whether it is a possible duplicate (and I give praises that it is only limited to Lead CREATION).


If you only need UI-level duplicate detection (i.e. it's okay to have duplicates if they're coming from a website/import file), you can create a Visualforce page to help reduce duplicates. If you need to enforce the business logic at the database level, a trigger would certainly help you. You might need to use Dynamic Apex (building the query as a string instead of writing a native SOQL statement) in order to "bulkify" your trigger; this will help you get around the awkwardness of querying many more records than you actually need. You might also consider making some sort of composite key field (a concatenation of your various fields, or perhaps a hash signature) to help reduce complexity. You might make several hashes of different combinations of fields so that you can easily identify which records are duplicates. For example, you might do this:


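// creating a new lead, assign hash
// (generatedigest returns a Blob, so convert it to hex; assumes hash1__c is a text field)
for(lead l:trigger.new) {
  l.hash1__c = encodingutil.converttohex(crypto.generatedigest('MD5',blob.valueof(l.field1__c+l.field2__c+l.field3__c)));
}

// later, look for duplicates
map<string,lead> hashes = new map<string,lead>();
for(lead l:trigger.new) {
  hashes.put(l.hash1__c,l);
}
// one query for the whole chunk; dup is an existing lead with a matching hash
for(lead dup:[select id,hash1__c from lead where hash1__c in :hashes.keyset()]) {
  if(hashes.containskey(dup.hash1__c) && hashes.get(dup.hash1__c).id!=dup.id) {
    // The origin lead in trigger.new is a duplicate...
  }
}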

Of course, this might not work for you, but it's just one of several ways you can go about this. If you use this method, I'd suggest you use "tolowercase" on the fields you are hashing to prevent mis-matching hashes (the algorithm is case sensitive).
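For the lower-casing, something along these lines might do. It's only a sketch: LeadHashUtil and hashOf are names I've made up, and the '|' separator is my own addition so that 'ab' + 'c' doesn't hash the same as 'a' + 'bc':

```apex
// Sketch only: LeadHashUtil and hashOf are hypothetical names.
public class LeadHashUtil {
    // Trims and lower-cases each part, joins the parts with '|',
    // and returns the MD5 digest of the result as a hex string.
    public static String hashOf(List<String> parts) {
        List<String> cleaned = new List<String>();
        for (String p : parts) {
            // Treat a missing value as an empty string.
            cleaned.add(p == null ? '' : p.trim().toLowerCase());
        }
        Blob digest = Crypto.generateDigest('MD5', Blob.valueOf(String.join(cleaned, '|')));
        return EncodingUtil.convertToHex(digest);
    }
}
```

In the trigger you would then write something like l.hash1__c = LeadHashUtil.hashOf(new List<String>{ l.field1__c, l.field2__c, l.field3__c }); so that every hash is built the same way.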