garyb

Dataloader batch size and triggers

All,

I've noticed that when setting Dataloader's batch size option to 200, triggers are operating in batches of 100. This is not a problem in itself, but I've also noticed that static variables are not cleared between the 100 batch runs. I've included a short example below, but basically all I'd like to know is:

Why are the Dataloader batches broken up, and when does this happen?
Should the static variables be cleared automatically?

As I say, here's a quick example; it's a trigger on Contact and a class with a static variable:

Code:
public class quickClass {
    public static boolean lock = false;
}
 

trigger quick on Contact (before insert) { 
    System.debug('lock: ' + quickClass.lock);
    quickClass.lock = true;
    System.debug('Trigger.new.size(): ' + Trigger.new.size());
 
}

When we insert, say, 150 Contacts, using Dataloader with a batch size of 200 in the log file we will see:

Code:
lock: false
Trigger.new.size(): 100
lock: true
Trigger.new.size(): 50

It doesn't look like it's Dataloader's fault - it does look like it's sending 150 Contacts in one go, rather than 100 and then 50.

Like I say, I don't mind the process being broken up, but obviously I'd also like the static variables to be cleared between the processing of the first 100 records finishing, and the processing of the next 50 records starting, as we are using static variables to control execution flow. If these variables are keeping their value, it causes a problem for us!

Hope that's enough detail, thanks in advance for any help...
Gary


philbo
Hey,

Interesting you should bring this up - I just encountered the same issue last Friday. It's not just the static variables that aren't getting reinitialized - the governor limits aren't reset either!

How it "bit" us was - we have a trigger on a custom object that kicks off some pretty heavy-duty processing - spidering out to various other objects, etc., and winds up issuing 11 SOQL queries all told.  It's bulk-safe so it's 11 queries pretty much regardless of batch size.  <Extra points if you can see what's coming!>  So when we load records into that object via Dataloader with a batch size anything over 100, the trigger gets called twice, once with 100 records and once with the remainder - in the SAME CONTEXT - so those 11 queries turn into 22 queries, which blows SFDC's 20-query-per-Trigger limit.
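A minimal sketch of how the shared limits described above could be observed, assuming a throwaway trigger on Contact (the trigger name limitProbe is made up for illustration). Limits.getQueries() and Limits.getLimitQueries() report the SOQL queries used so far and the cap for the current context, so if the second chunk logs a non-zero starting count, both chunks are drawing on the same allowance.

```apex
// Hypothetical probe trigger; the name and the choice of Contact are assumptions.
trigger limitProbe on Contact (before insert) {
    // Size of the chunk handed to this invocation
    System.debug('Trigger.new.size(): ' + Trigger.new.size());

    // SOQL queries already consumed in this context, against the cap
    System.debug('SOQL used so far: ' + Limits.getQueries()
        + ' of ' + Limits.getLimitQueries());
}
```

Comparing the two debug lines across the 100-record and remainder invocations should show whether the query count carries over.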

We got around the problem simply by setting the Dataloader batch size to 100.  However, this seems like peculiar, probably unintentional, behaviour.

Thoughts anyone?
helpForce

Hi,

Same situation as garyb above. Just to add to Gary's comments, I first encountered this issue in September 08, before the Winter release came out. I have also had to set the batch size in Data Loader to 100 to work around the issue. It happens in Developer Edition and in my production sandboxes. I never tested in a production environment, as the data would get dirty, but I assume the same thing happens. philbo, was your code running in a production environment?

Cheers,

philbo
My env't was a Sandbox.  Dollars to doughnuts it's not related to environment though.
RickyG
Ummm, doughnuts.  Yum!
helpForce

I just got around to logging this as a case (ref: 02292799). I referenced this thread too, so please add more details to it as and when you find any. I will post any updates that come from the logged case.

I asked garyb to take out the reference to the static variables, just to break the issue down further. He did, and reported that the issue is the same even with no static variable references.

Cheers

garyb
We received a response about this, confirming that this was in fact intended behaviour. The recommendation was to reduce the batch size or code with this functionality in mind. It was noted that this was not clearly documented and that this would be rectified.
GuyClairbois

Hi all,

did anybody find any other solutions or workarounds for this issue? The same thing happens when e.g. loading data from Excel. But when loading from Excel, I can't properly control the batch size because any end user will be able to change it back to 200. Also for other API integrations, it is difficult to coordinate (I can't force smaller batches from within salesforce).

 

So what I'm looking for is a way to code around this, i.e. a way to reset the static variables in between the 2 batches of 100 records. Or any other alternative. Can my apex code e.g. recognize whether it is the 1st or 2nd time it runs within the batch?
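One way the "1st or 2nd time" question might be approached is a static invocation counter. This is only a sketch: the class TriggerRunCounter and the trigger countRuns are made-up names, and Contact stands in for the real object. The counter shows how many times the trigger has fired in the current context, but on its own it cannot tell a second chunk of 100 apart from any other re-firing of the same trigger.

```apex
// Hypothetical helper class (would live in its own file).
public class TriggerRunCounter {
    // Number of times the trigger has fired so far in this context
    public static Integer invocations = 0;
}

// Hypothetical trigger using the counter; Contact is a stand-in object.
trigger countRuns on Contact (before insert) {
    TriggerRunCounter.invocations++;
    System.debug('invocation #' + TriggerRunCounter.invocations
        + ', records in this chunk: ' + Trigger.new.size());
}
```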

 

I also still wonder what the intention of this behavior is, if this is intended behaviour...

 

Thanks,

Guy

garyb

Wow, a blast from the past! :)

 

TBH, I'd forgotten about this and I can't remember what we did - I think it was to reduce the batch size.

 

I guess you could have something like:

 

 

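trigger quick on Contact (before insert) {
    // Statics have to live in a class; someStaticVariable and clearStaticVariables()
    // are placeholders you would add to quickClass from my first post.
    if (quickClass.someStaticVariable) {
        quickClass.clearStaticVariables();
    }
    quickClass.someStaticVariable = true;
}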

 

That way, you can manually clear the static variables... Any help?

 

GuyClairbois

Thanks for the reply, Gary. Unfortunately this will not help. Let me explain why (and hopefully trigger your creativity :-))

 

My case is the following: I have a trigger on a custom object. There are also numerous Workflow rules which also do field updates on the same record, based on some other fields on the record.

 

Since we need to do large uploads of this object, I want to prevent the triggers from running twice for the same record, in order to improve upload performance. Therefore we introduced the static variables, which keep track of whether a certain trigger has run or not in the current context.

 

Now if I load a batch of 200 records, the following happens:

1. Trigger is started for the first set of 100 records

2. Trigger checks the status of static variable hasrun (which is false by default). If false, run the trigger content

3. Trigger content is run

4. Static variable is set to hasrun=true;

5. Trigger is started for the second set of 100 records

6. Trigger checks the status of static variable hasrun (which is false by default). It is now true (it has not been reset), so the trigger content is not run

7 and further - for any reruns of triggers because of field updates, the trigger content is not run, since the static variable remains hasrun=true

 

This is obviously not what we intended. The trigger content should run after step 6 and should not run in steps 7 and further.
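The steps above can be sketched roughly as follows. This is an illustration only: the names TriggerGuard and guardedTrigger are made up, and Contact stands in for the custom object. It reproduces the unwanted behaviour rather than fixing it, because the flag set while handling the first 100 records is still true when the second 100 arrive.

```apex
// Hypothetical guard class (would live in its own file).
public class TriggerGuard {
    // False by default at the start of the context (step 2)
    public static Boolean hasRun = false;
}

// Hypothetical trigger; Contact is a stand-in for the custom object.
trigger guardedTrigger on Contact (after insert, after update) {
    if (TriggerGuard.hasRun) {
        // Steps 6 and 7: skipped for workflow reruns, but also,
        // unintentionally, for the second chunk of 100 records
        return;
    }
    // Step 4: remember that the content has run
    TriggerGuard.hasRun = true;

    // Step 3: the actual trigger content would go here
    System.debug('processing ' + Trigger.new.size() + ' records');
}
```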

 

Any ideas, anyone? Maybe salesforce can comment on the intention of this behavior?

Many thanks,

Guy

garyb

Wow, this is a familiar problem, triggers and workflow!

 

I'd love to hear an elegant solution to this... Do you always have workflow that will cause the trigger to run again? If so, at least that's predictable. However, workflow rules fire only when certain conditions are met, so what if those conditions aren't met? You then have to deal both with the cases where workflow causes triggers to fire again and with the cases where it doesn't.

 

I'd say it's worth raising this with support - I've been given "That's standard behaviour" as a response before but then gone on to prove that's NOT how it should work, and support have accepted it. I can't see why this behaviour would be desirable.

 

In the meantime... How about storing the IDs of processed records in a static Set? That way you could have something like:

 

if (hasRun && !myStaticSet.contains(Trigger.new[0].Id))
    clearVars();

 

That's my initial thought; I've not thought it through enough to see why it wouldn't work. Also, if you're using before insert triggers, there won't be any IDs to store in the set, so you may have to use something else instead (something unique and accessible in a before trigger).
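A slightly fuller sketch of the static Set idea, under the assumption that the trigger runs in an after context so that record IDs exist. ProcessedRecords, filterUnprocessed and quickAfter are made-up names, and Contact is a stand-in object. Instead of one flag for the whole context, each record is tracked individually, so a second chunk with new IDs would still be processed while a rerun over the same IDs would be skipped.

```apex
// Hypothetical helper class (would live in its own file).
public class ProcessedRecords {
    // IDs already handled in this context
    private static Set<Id> processedIds = new Set<Id>();

    // Returns only the records not seen before and marks them as seen
    public static List<SObject> filterUnprocessed(List<SObject> records) {
        List<SObject> fresh = new List<SObject>();
        for (SObject record : records) {
            if (record.Id != null && !processedIds.contains(record.Id)) {
                processedIds.add(record.Id);
                fresh.add(record);
            }
        }
        return fresh;
    }
}

// Hypothetical trigger; after contexts only, since IDs are needed.
trigger quickAfter on Contact (after insert, after update) {
    for (SObject record : ProcessedRecords.filterUnprocessed(Trigger.new)) {
        Contact c = (Contact) record;
        // Real per-record work would go here
        System.debug('first pass for ' + c.Id);
    }
}
```

The set grows with every record handled, so heap usage is worth keeping an eye on for large loads.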

 

Marco__o

My understanding is that the maximum trigger size is 200.

If you use DataLoader and push 300 records, that will generate 2 trigger contexts, and the 1st trigger context will process 200 records in the same context.

It will indeed split that process into 2 batches of 100 records (but I guess this is the Salesforce way), and you shouldn't consider this unexpected behavior.

I would recommend writing code (or test classes) that can handle 200 records at a time.

garyb

I've not tested this myself recently so don't know if this is still happening, but I'm yet to hear a good reason for the variables not being cleared between the two sub-batches of 100. Or, for that matter, why batches of 200 are allowed - why not reduce the limit to 100?

 

OK, fine, my batch of 200 will be split into two groups of 100. However, not clearing the variables between those groups just introduces additional, unnecessary complexity into the trigger code that is written. Like I say, if I could have a good reason, I'd let it go :) As it is, I can't even find it documented anywhere - anyone have a link to this being explained in the documentation?

Chrisparx


It is still not "fixed" right now, and a batch is still divided into 2 chunks of 100 rows. It's really annoying, because I implemented a lot in different triggers and classes and unfortunately realized a few weeks later that some data had not been updated. I have to use these static variables to prevent useless updates, and I guess the only way is to use your trick at the top of this page, even if it is not really nice... A Salesforce fix would be greatly appreciated, or at least an easy way to reset these static variables...

Arnt mongoDB
Resurrecting an old thread, but this is still a problem, at least for me. The behavior is documented here and I don't like it; I think it is bad design:
https://help.salesforce.com/apex/HTViewSolution?id=000003793&language=en_US
Could someone start an Idea about it?
Static variables are reset between API batches (which are typically size of 200), but they are not reset between the smaller Apex chunks (size is 100), which means the second invocation of the trigger doesn't get the "fresh" static variables. That makes any design where you want a trigger to run only once per transaction (instead of 2 or 3 times) impossible.

I thought about trying garyb's suggestion to store IDs in a static set, but that won't work for before insert triggers, since the records have no IDs yet at that point.