Brian.ax17
getUpdated and very large numbers of changes
Is there a limit to the number of records that the getUpdated method can return?
If this limit is, say, 2000, and a change occurs, such as changing a picklist value of a field, that affects more than this number of records in a very short time span, how should this be handled?
There doesn't appear to be a similar method to queryMore, so looping to retrieve all changes or setting a smaller batch size is not an option here.
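One common workaround, if the call is indeed capped, is to poll over smaller time windows so that each getUpdated call stays under the limit. A minimal sketch of the window arithmetic in Python (split_window is a hypothetical helper, and getUpdated itself is not called here; this only shows how the span would be sliced):

```python
from datetime import datetime, timedelta

def split_window(start, end, step_minutes=15):
    """Split the span [start, end] into consecutive sub-windows of at
    most step_minutes each; getUpdated would be called once per window."""
    windows = []
    step = timedelta(minutes=step_minutes)
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        windows.append((cur, nxt))
        cur = nxt
    return windows
```

Each (start, end) pair would be passed as the startDate/endDate of a getUpdated call; if a single window still returns too many IDs, the step can be halved and that window retried.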
My thanks for any advice.
Brian
I'm interested in the answer to this as well.
It's well documented that query will return a maximum of 2000 objects and queryMore is used to get batches of additional objects.
But the documentation is silent on whether getUpdated returns 2000 IDs maximum, or if it is possible to get an array of > 2000 objects (which you would then pass to retrieve in batches of 2000). Has anyone done an experiment?
If getUpdated is limited to returning 2000 IDs, what's the recommended method for getting all changes? Brian's picklist-change example is a good one...I might also expect a lot of changes when importing a batch of new leads.
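On the retrieve side at least, batching an oversized ID array is straightforward. A sketch, assuming the IDs come back as a plain Python list (chunk_ids is an illustrative helper, not part of the API):

```python
def chunk_ids(ids, batch_size=2000):
    """Split a list of record IDs into batches of at most batch_size,
    matching the 2000-record maximum for a single retrieve call."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Each batch would then be passed to retrieve; the last batch may be smaller than batch_size.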
Does getDeleted behave in the same way as getUpdated?
I have received the following response:
The getUpdated call contains certain limitations. I'm in the middle of running several tests on this but here are my thoughts on the matter so far...
Brian's response is accurate, but I would make one modification to the recommended date field: rather than LastModifiedDate, use SystemModstamp. LastModifiedDate is updated only when a user changes a record directly, but records can also be changed indirectly by workflow or other processes in salesforce.com, and SystemModstamp reflects both direct and indirect changes.
Cheers and thanks Brian.
SELECT <field list> FROM <object> WHERE SystemModstamp > <start time> AND SystemModstamp <= <end time>
If you are iterating through a set of sForce objects and running this query on each object, the query will cause an exception for the objects without SystemModstamp. Therefore, you would have to determine which objects don't have the SystemModstamp field and execute a different query for them. It is an unfortunate complexity.
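One way to manage that complexity is to build the query per object, falling back to LastModifiedDate for objects that lack SystemModstamp. A sketch (build_changed_query and the has_system_modstamp flag are illustrative, not API calls; in practice you would inspect each object's describe results to decide which field exists):

```python
def build_changed_query(object_name, fields, start, end, has_system_modstamp):
    """Build a SOQL query for records changed in (start, end], using
    SystemModstamp where the object supports it and falling back to
    LastModifiedDate otherwise (which misses indirect changes, as
    noted above)."""
    ts = "SystemModstamp" if has_system_modstamp else "LastModifiedDate"
    return (
        f"SELECT {', '.join(fields)} FROM {object_name} "
        f"WHERE {ts} > {start} AND {ts} <= {end}"
    )
```

Centralizing the choice in one place keeps the per-object special-casing out of the main iteration loop.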
- Eli
Hi SeattleEli,
Scanning back in my memory, I believe the issues referenced in the previous posts really reflect the inability to apply an appropriate model to handling very large change sets. A set as large as 20000 changes will take a fair amount of time to retrieve, making the actual start and end timestamps insignificant when processing the results: by the time you got to the last one, an hour may have elapsed. The change that caused the record to be included in getUpdated occurred between the start and end times, but the record you are processing may have been changed again by the time you finally retrieved it, making the retrieval time the actual "time of current state," to coin a term.
So, where do you stand with respect to the next pass of getUpdated? As an example, suppose we take all the records that changed in the last hour (start - 1 hour to start). Suppose this set is large enough that by the time we are done processing it, 30 minutes have elapsed (start + 30 min). Also assume we are polling once an hour, which makes our next time span start to start + 1 hour. If the last record we processed was changed again at start + 20 min, we will have already handled both the original change and the start + 20 min change, since we ended up retrieving the record 30 minutes after its initial change was reported. The goal is not to process that record twice (remember, the second time it shows up in getUpdated, all changes from start - 1 hour to start + 30 min have already been accounted for).
So, the problems are in how to efficiently manage these large change sets, not that the system has any problems reporting them.
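One model for that management is a high-water mark: track the latest change timestamp you have actually processed and start the next pass from there, rather than from fixed wall-clock polling boundaries. A sketch against an in-memory change log (change_log is simulated data standing in for getUpdated results; real timestamps would be datetimes, not integers):

```python
def poll_changes(change_log, watermark):
    """Return the records changed after watermark, plus the new
    watermark (the latest timestamp among the records returned).
    Re-running with the returned watermark yields only changes that
    arrived after the previous pass, so no record is processed twice
    for the same change."""
    changed = [rec for rec in change_log if rec["stamp"] > watermark]
    new_watermark = max((rec["stamp"] for rec in changed), default=watermark)
    return changed, new_watermark
```

In the scenario above, the record changed at start + 20 min and handled in the first pass would not reappear in the second pass, because the watermark has already advanced past that timestamp.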
Cheers
I am *assuming* that the result set created by query() (which I would iterate through with repeated queryMore calls when the result set is large) is a snapshot in time of the records that met the where clause criteria when query() was executed. Is that assumption correct?
To elaborate. I am presuming that the query approach works as follows:
Assume I call query() at time T and specify a where clause that retrieves records changed between T - 1 hour and T. If I call queryMore at T + 10 minutes, and a record in that QueryResult changed at both T - 30 minutes and T + 5 minutes, the record I read from the QueryResult will be the version from T - 30 minutes, not the version from T + 5 minutes, because the QueryResult draws its contents from the result set selected by the original query() call.
I haven't seen that documented, but I'm guessing that's the way it works. Is that correct?
Thanks for your time.
- Eli
It states "Upon invocation, the Sforce Web service executes the query against the specified object, caches the results of the query on the Sforce Web service, and returns a query response object to the client application." I read "caches the results" to mean that the entire result set is cached - implying a snapshot in time. Knowing the truth, I can see that it doesn't *exactly* say that, but it sure is easy to read it that way.
Thanks again,
Eli