COZYROC

New Bulk API from Winter '10 is not streaming

Hello,

 

I'm reviewing the new Bulk API being prepared for the Winter '10 release, and I see one major limitation: it expects the number of batches to be declared in advance. This implies you must already have all the data you want to load, so that you can determine how many batches the upload will need. That requirement makes the Bulk API unusable when you want to stream data without knowing in advance how much data there is in total.

 

My question is for the developers of the Bulk API. Is there time left to extend the API to allow streaming of data? Is it possible to remove the requirement to submit the number of batches in advance?

werewolf
How much data could you possibly be streaming that you'd need the Bulk API for it? Would the standard API not be sufficient for this streamed data?
COZYROC
The streamed data could be thousands of records, even tens or hundreds of thousands. The way the new Bulk API is proposed makes it less versatile than it could be.
werewolf
But in what time frame? The standard API can handle thousands of records a minute, I think.
COZYROC
Can it handle hundreds of thousands or millions of records? I don't think so. I don't think even thousands will work very well because of the overhead and granularity of the existing API, which works one record at a time.
werewolf
That is not correct. In fact, you can insert and update records in bulk using the standard API, inserting or updating as many as 200 rows in a single call. Have a look at the docs for the create() and update() calls here.
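
For illustration, here is a minimal Python sketch of how a client might feed a stream of unknown length into calls of at most 200 rows each. The `create_call` argument is a hypothetical stand-in for whatever function actually issues the create() request; it is not part of any Salesforce library, and only the 200-row figure comes from the post above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

MAX_ROWS_PER_CALL = 200  # per-call row limit mentioned in the post above


def chunked(records: Iterable[dict], size: int = MAX_ROWS_PER_CALL) -> Iterator[List[dict]]:
    """Yield lists of at most `size` records, pulling from the source lazily."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_stream(records: Iterable[dict], create_call: Callable[[List[dict]], None]) -> int:
    """Send every record through `create_call` (hypothetical) and return the number of calls made."""
    calls = 0
    for chunk in chunked(records):
        create_call(chunk)  # assumption: one API round trip per chunk
        calls += 1
    return calls
```

Because the source is consumed lazily, the total number of records never has to be known before the first call goes out.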
COZYROC

You are correct. I guess this may actually work.

Still, the question about the versatility of the new Bulk API remains.

jesperfj

Hi,

What made you think it needs to know the number of batches in advance? It doesn't. You should definitely be able to implement a fully streaming client that doesn't know the number of records in advance.
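
As a rough sketch of what such a streaming client could look like on the client side, the Python class below buffers records as they arrive and hands off a batch whenever the buffer fills. Everything here is an assumption for illustration: `send_batch` is a hypothetical callable standing in for the HTTP POST that adds a batch to an open job, and the default batch size of 1000 is arbitrary, not a documented limit.

```python
from typing import Callable, List


class StreamingBatcher:
    """Buffers incoming records and sends a batch each time the buffer fills."""

    def __init__(self, send_batch: Callable[[List[dict]], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send_batch = send_batch  # hypothetical: posts one batch to the open job
        self._batch_size = batch_size
        self._buffer: List[dict] = []
        self.batches_sent = 0

    def add(self, record: dict) -> None:
        """Accept one record; flush automatically when the buffer is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self._flush()

    def close(self) -> int:
        """Flush any partial batch and return the total number of batches sent."""
        if self._buffer:
            self._flush()
        return self.batches_sent

    def _flush(self) -> None:
        self._send_batch(self._buffer)
        self._buffer = []
        self.batches_sent += 1
```

The batch count is only known once `close()` returns, which is the point: nothing in this flow needs it beforehand.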

COZYROC

This is great news! I guess I misunderstood the documentation. I will give it a try.

Thanks,

Ivan

jesperfj
Please let us know what you misunderstood, so we can fix the docs and make them clearer. Thanks for taking a look at this.
COZYROC

Jesper,

This is the paragraph that confused me (api_bulk.pdf):

What You Can Do with the Bulk API
The REST Bulk API lets you insert, update, or upsert a large number of records asynchronously. That is, you first send a number
of batches to the server using an HTTP POST call and then the server processes the batches in the background. While batches
are being processed, you can track progress by checking the status of the job using an HTTP GET call. All operations use
HTTP GET or POST methods to send and receive XML or CSV data.

 

----

 

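The sentence "That is, you first send a number of batches to the server" is not very clear. I read "a number of batches" as a batch count that has to be sent first, rather than as "several batches".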