mharikian1.3892229047844834E12

Bulk API Retrieve Results Very Slow

I'm using the Bulk API to retrieve records from Salesforce and load them into Microsoft SQL Server for data reporting as well as linking data sets to other databases.

Some of our custom Salesforce objects contain over 1 million records, and after reading through the Bulk API documentation, it seemed to fit the bill. However, the actual file retrieval is taking far too long: even for 80,000 records, it takes 4+ hours to retrieve the file results from Salesforce after the batch requests have been processed. This doesn't make sense to me, since the Bulk API is described as being geared for high record volumes compared to the SOAP API, yet both take about the same amount of time; there is no time savings whether I use the Bulk API or the SOAP API.
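For context, this is roughly the Bulk API 1.0 query flow I'm describing, per the docs (a simplified C# sketch, not my actual tool; the instance URL, session ID, API version, and object/field names are placeholders):

```csharp
// Simplified sketch of the Bulk API 1.0 query flow (create job, add batch,
// list results, download result file). Placeholders throughout.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class BulkQuerySketch
{
    const string InstanceUrl = "https://yourInstance.salesforce.com";
    const string ApiVersion = "47.0";
    const string SessionId = "<session id from login>";

    static readonly HttpClient Http = new HttpClient();

    static async Task Main()
    {
        Http.DefaultRequestHeaders.Add("X-SFDC-Session", SessionId);

        // 1. Create a query job (XML body per the Bulk API docs).
        var jobXml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">" +
            "<operation>query</operation><object>My_Object__c</object>" +
            "<contentType>CSV</contentType></jobInfo>";
        var jobResp = await Http.PostAsync(
            $"{InstanceUrl}/services/async/{ApiVersion}/job",
            new StringContent(jobXml, Encoding.UTF8, "application/xml"));
        string jobId = ExtractTag(await jobResp.Content.ReadAsStringAsync(), "id");

        // 2. Add a batch whose body is the SOQL query text.
        var batchResp = await Http.PostAsync(
            $"{InstanceUrl}/services/async/{ApiVersion}/job/{jobId}/batch",
            new StringContent("SELECT Id, Name FROM My_Object__c", Encoding.UTF8, "text/csv"));
        string batchId = ExtractTag(await batchResp.Content.ReadAsStringAsync(), "id");

        // 3. Once the batch is Completed (polling loop omitted), list and
        //    download the result files. This download is the slow step for me.
        string resultList = await Http.GetStringAsync(
            $"{InstanceUrl}/services/async/{ApiVersion}/job/{jobId}/batch/{batchId}/result");
        string resultId = ExtractTag(resultList, "result");
        string csv = await Http.GetStringAsync(
            $"{InstanceUrl}/services/async/{ApiVersion}/job/{jobId}/batch/{batchId}/result/{resultId}");
        Console.WriteLine($"Downloaded {csv.Length} characters of CSV.");
    }

    // Crude helper to pull the first <tag>value</tag> out of the XML responses.
    static string ExtractTag(string xml, string tag)
    {
        int start = xml.IndexOf("<" + tag + ">") + tag.Length + 2;
        int end = xml.IndexOf("</" + tag + ">", start);
        return xml.Substring(start, end - start);
    }
}
```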

I tried adding logic to my code to send multiple file-retrieval requests to Salesforce asynchronously. With the SOAP API I could usually send 20 queries at a time in a Parallel.For loop and get the responses back, but when I tried that with the Bulk API, it timed out.

I then reduced it to 5 requests at a time, and it still timed out.

I'm now at 2 requests at a time and it appears to be working, but I have over 1,460 file results to retrieve.
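For what it's worth, the shape of the throttled retrieval I'm running now is roughly the following (a sketch; the session ID and the result URLs are placeholders built from the job/batch/result IDs):

```csharp
// Throttled retrieval sketch: cap concurrent result downloads at 2, which is
// the only level that has been reliable for me so far.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class ThrottledResultDownload
{
    static void Main()
    {
        string sessionId = "<session id from login>";
        var resultUrls = new List<string>
        {
            // ".../services/async/47.0/job/{jobId}/batch/{batchId}/result/{resultId}", ...
        };

        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.ForEach(resultUrls, options, url =>
        {
            // WebClient is not thread-safe, so create one per iteration.
            using (var client = new WebClient())
            {
                client.Headers.Add("X-SFDC-Session", sessionId);
                string csv = client.DownloadString(url);
                string resultId = Path.GetFileName(new Uri(url).LocalPath);
                File.WriteAllText(resultId + ".csv", csv);
                Console.WriteLine($"Finished {url}");
            }
        });
    }
}
```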

What other options do I have to make the file retrieval process quicker and more efficient? This isn't making any sense to me, especially since Salesforce claims the Bulk API is the recommended way to retrieve and upload large datasets.
 
mharikian1.3892229047844834E12
Some additional information on this: When setting up the batch sizes, I'm breaking them down into subsets of 1000, utilizing PK Chunking for larger sets of data.

The downloaded file size is around 7 to 8 MB. Would increasing the PK chunk size help by reducing the number of batches, even though it increases each file's size slightly? If the files get too large, the whole tool times out and I'm stuck either re-running the tool or tossing the original batches and re-creating them with smaller chunks.
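For reference, the chunk size is set with the Sforce-Enable-PKChunking header on the create-job request (the docs list a default of 100,000 and a max of 250,000); a sketch of raising it, with the session ID, instance URL, and object name as placeholders:

```csharp
// Sketch: enable PK chunking with a larger chunkSize on the create-job request,
// which should mean fewer (but larger) result files per batch.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class PkChunkingJob
{
    static async Task Main()
    {
        var http = new HttpClient();
        http.DefaultRequestHeaders.Add("X-SFDC-Session", "<session id>");
        http.DefaultRequestHeaders.Add("Sforce-Enable-PKChunking", "chunkSize=250000");

        var jobXml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">" +
            "<operation>query</operation><object>My_Object__c</object>" +
            "<contentType>CSV</contentType></jobInfo>";

        var resp = await http.PostAsync(
            "https://yourInstance.salesforce.com/services/async/47.0/job",
            new StringContent(jobXml, Encoding.UTF8, "application/xml"));
        Console.WriteLine(await resp.Content.ReadAsStringAsync());
    }
}
```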
aBOBination
Were you able to find a solution for this? I am running into something similar with a Python script. I used PK chunking on an object with 7 million records. The job/batches finish in 7 minutes, but retrieving the records takes forever.
mharikian1.3892229047844834E12
Unfortunately, for record retrieval the Bulk API did not work for me. Loading records into Salesforce works great, since you can throw them up on the server and let Salesforce determine how best to handle them, but retrieval, no matter how small or large the file size, timed out and did not complete successfully.

For retrieval I had to revert to the SOAP API, sending 20 asynchronous query/queryMore requests at a time. I chose 20 based on tests I ran; it was the optimal number. You can try increasing the asynchronous requests to 25, but anything over 30 would time out or Salesforce would send an error back to me. Since I was using the SOAP API, I also had to limit the number of fields per object, especially for objects with lots of fields (like Account, Contact, and Opportunity).

For the most part it works and retrieves the data I need to run more complex reports from Microsoft SQL Server, but it does take a lot longer to complete.
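The SOAP fallback looks roughly like this (a sketch; SforceService, SessionHeader, and QueryResult are the proxy classes generated from the Partner WSDL import, so the exact generated names may differ in your project):

```csharp
// Rough shape of the SOAP fallback: a query/queryMore loop with a trimmed
// field list. Credentials and the object/field names are placeholders.
using System;

class SoapQueryMoreSketch
{
    static void Main()
    {
        var binding = new SforceService();
        var login = binding.login("user@example.com", "password+securitytoken");
        binding.Url = login.serverUrl;
        binding.SessionHeaderValue = new SessionHeader { sessionId = login.sessionId };

        // Keep the SELECT list short, especially on wide objects like Account.
        string soql = "SELECT Id, Name, CreatedDate FROM Account";

        var result = binding.query(soql);
        while (true)
        {
            if (result.records != null)
            {
                foreach (var record in result.records)
                {
                    // Map fields into the SQL Server staging table here.
                }
            }
            if (result.done) break;
            result = binding.queryMore(result.queryLocator);
        }
    }
}
```

In my case these loops run in parallel, 20 at a time, using the same Parallel.For approach I described above.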