+ Start a Discussion

SOQL Query Optimization Question

I'm trying download, locally, all of our Tasks.  We have over 3.5 million tasks.  I'm using the SOAP API to download these (ALL ROWS).  I'm having trouble with query timeouts so I'm going to download in batches.  I've looked over the documentation and I'm trying my best to rely on the standard indexes.

I'm only relying on the Id field to help order and define my batches.  Right now I'm tyring to find the largest batch size that will use the Index and not produce timeouts.  So, imagine I've downloaded the first 99,999 Tasks. That limit was NOT chose arbitrarilly.  There is SF documentation (http://www.salesforce.com/docs/en/cce/ldv_deployments/salesforce_large_data_volumes_bp.pdf) stating an index will typically be used if you limit your records to under 100,000 rows.

Select Id from Task Order By Id ASC Limit 99999

Then, I look at the last Task.Id downloaded and get all Tasks greater than that one (sort order is important here).

Select Id from Task Where Id > '00T7000000cr8SzEAI' Order By Id ASC Limit 24999

I'd then repeat this until I've downloaded all my tasks.  I cant' understand why these subsequent queries cannot use the 99,999 limit.  24,999 is as large as I can go.  If I enter 25,000, it will NOT use the Index and instead timeout after attempting a full table scan.  I realize the differnece between the first and second query is the 'where' clause, but I still thought the index would be used if we were returning < 30% of the first million records.  Is this 25,000 limit some undocumented characteristic of the query optimizer?  The Query Optimizer (https://help.salesforce.com/apex/HTViewSolution?id=000199003&language=en_US) tool is really helpful here.  I just can't understand where this 25,000 limit is coming from.

Any insight is appreciated.

kevin lamkevin lam
Can I ask why you need to download them in APEX? You can get it from the Data Export.
SVS SolutionsSVS Solutions
Interesting tactic, although I'm not sure it's actually optimal.

For bulk api actions like this, you really should be using the asynchronous bulk api: https://www.salesforce.com/us/developer/docs/api_asynch/

Thanks for the suggestion. My understanding was that the bulk api does nothing to help with querying large datasets. Its primarily for large uploads into SF, not out of it. If I were to use the bulk api I would still have to break the job into batches using this same technique. I suspect one bulk job with a single batch would most likely timeout. the bulk api has the one possible benefit of being able to run the batches in parallel without hitting the query locator limit 10), though I'm skeptical it can really get around that limit. Otherwise, I could do them in parallel myself. I ultimately want these in a SQL database. I suspect there is an undocumented soql optimization limit behind the scenes affecting my query. Andrew