SOQL Query Optimization Question
I'm trying to download all of our Tasks locally. We have over 3.5 million tasks. I'm using the SOAP API to download them (ALL ROWS). I've been running into query timeouts, so I'm downloading in batches. I've looked over the documentation and I'm trying my best to rely on the standard indexes.
I'm only relying on the Id field to order and define my batches. Right now I'm trying to find the largest batch size that will use the index and not produce timeouts. So, imagine I've downloaded the first 99,999 Tasks. That limit was NOT chosen arbitrarily: there is Salesforce documentation (http://www.salesforce.com/docs/en/cce/ldv_deployments/salesforce_large_data_volumes_bp.pdf) stating an index will typically be used if you limit your records to under 100,000 rows.
SELECT Id FROM Task ORDER BY Id ASC LIMIT 99999
Then, I look at the last Task.Id downloaded and get all Tasks greater than that one (sort order is important here).
SELECT Id FROM Task WHERE Id > '00T7000000cr8SzEAI' ORDER BY Id ASC LIMIT 24999
I'd then repeat this until I've downloaded all my Tasks. I can't understand why these subsequent queries cannot use the 99,999 limit. 24,999 is as large as I can go; if I enter 25,000, the query will NOT use the index and instead times out after attempting a full table scan. I realize the difference between the first and second query is the WHERE clause, but I still thought the index would be used if we were returning < 30% of the first million records. Is this 25,000 limit some undocumented characteristic of the query optimizer? The Query Optimizer tool (https://help.salesforce.com/apex/HTViewSolution?id=000199003&language=en_US) is really helpful here, but I just can't see where this 25,000 limit is coming from.
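For reference, the batching loop I'm describing looks roughly like this. `run_soql` here is a placeholder for whatever SOAP-API client call executes a SOQL query and returns records as dicts; it's an assumption for the sketch, not a real library function.

```python
# Sketch of the Id-keyset batching loop described above.
# Salesforce Ids sort lexicographically, so "Id > last_id" plus
# "ORDER BY Id ASC" pages through the table without skips or repeats.

def download_all_tasks(run_soql, batch_size=24999):
    """Page through all Task Ids in ascending order, batch_size at a time."""
    all_ids = []
    last_id = None
    while True:
        if last_id is None:
            soql = "SELECT Id FROM Task ORDER BY Id ASC LIMIT %d" % batch_size
        else:
            soql = ("SELECT Id FROM Task WHERE Id > '%s' "
                    "ORDER BY Id ASC LIMIT %d" % (last_id, batch_size))
        records = run_soql(soql)
        if not records:
            break
        all_ids.extend(r["Id"] for r in records)
        last_id = records[-1]["Id"]
        if len(records) < batch_size:
            break  # short batch means we've reached the end of the table
    return all_ids
```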
Any insight is appreciated.
Andrew
For bulk extractions like this, you really should be using the asynchronous Bulk API: https://www.salesforce.com/us/developer/docs/api_asynch/. It is designed for large data volumes, so you submit a query job once and the server handles batching the result set, rather than you tuning LIMIT sizes around the query optimizer.
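As a rough illustration, opening a Bulk API query job is a single authenticated POST of a small jobInfo XML document. The sketch below only builds the request pieces (URL, headers, body) per the Bulk API docs linked above; `instance_url` and `session_id` are assumed to come from a prior SOAP login, and actually sending the request, adding a batch with the SOQL text, polling, and downloading the CSV results are left out.

```python
# Hedged sketch: construct the HTTP request for creating a Bulk API
# (XML-based) query job on Task. Nothing is sent over the network here.

def bulk_query_job_request(instance_url, session_id, api_version="29.0"):
    """Return (url, headers, body) for opening a Bulk API query job."""
    url = "%s/services/async/%s/job" % (instance_url, api_version)
    headers = {
        "X-SFDC-Session": session_id,  # session Id from a prior login
        "Content-Type": "application/xml; charset=UTF-8",
    }
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">'
        "<operation>query</operation>"
        "<object>Task</object>"
        "<contentType>CSV</contentType>"
        "</jobInfo>"
    )
    return url, headers, body
```

After POSTing this, you'd add a batch whose body is just the SOQL text (e.g. `SELECT Id FROM Task`), then poll the batch until it completes and fetch the result CSVs.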