ksherry49

Best Practice for doing nightly back-ups from Salesforce.com?

I'm planning on taking nightly back-ups from Salesforce.com into a database (SQL Server). I have a couple questions regarding the export process:

1. If I only want the deltas, should I use the LastModifiedDate as the selection criteria to determine recently created or updated records in the last 24 hours?

2. Should I use Salesforce.com's server date/time value as a starting point for determining the above? If yes, is it possible to do a select statement to get this value (outside of the SOAP getServerTimestamp())?

3.  Do folks generally convert the UTC date/time stamp to their local time zone before inserting the records?

4. If new fields are added (or maybe updated), should I consider dropping my back-up table and re-creating it to address this case?

I'm sure there are other points I need to consider, so I'm hoping folks can provide some additional input.
BTW, I'm planning on using Pervasive's Business Integrator to do the nightly exports.

Thanks in advance
Keith


qman

Hi Keith,

If you don't want to write this yourself, you could use DBAmp (www.forceamp.com). A lot of people are using it to make nightly backups, and the latest version has the delta capability you are looking for. All for under $700.

If you still want to write it, let me know and I'll try to help. Here's what I did for DBAmp:

1. I used SystemModstamp instead of LastModifiedDate. That gives you delta capability for everything except the ...Share and ...History tables (e.g. CaseShare).

2. What I did was get the Max(SystemModstamp) time from the local table and subtract 10 minutes. The 10-minute cushion is to allow for long units of work on the Salesforce side. Then you can do the query like Select ... where SystemModstamp > your_calculated_value.
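For example, the watermark calculation could look like this in T-SQL (the table and column names are just illustrative):

    -- Compute the delta watermark: the newest SystemModstamp already stored
    -- locally, minus a 10-minute safety margin for long-running units of
    -- work on the Salesforce side.
    DECLARE @watermark DATETIME
    SELECT @watermark = DATEADD(minute, -10, MAX(SystemModstamp))
    FROM dbo.Account

    -- The Salesforce-side query then becomes:
    --   SELECT Id, Name, SystemModstamp FROM Account
    --   WHERE SystemModstamp > @watermark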

3. Since SQL Server doesn't really understand timezones, I just store all dates as local time based on the location of the SQL Server box.
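If you do want to convert, one common T-SQL idiom on versions without time-zone support is to apply the server's current UTC offset; a sketch (note it ignores any DST change between the stored time and now):

    -- Shift a UTC value into the SQL Server box's local time using the
    -- server's current offset from UTC.
    DECLARE @utc DATETIME, @local DATETIME
    SET @utc = '2006-04-25 16:45:00'
    SET @local = DATEADD(minute, DATEDIFF(minute, GETUTCDATE(), GETDATE()), @utc)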

4. If the schema changes, I just drop and recreate.  Yeah, it's expensive but I didn't want to do the ALTERs on the SQL side.

A couple of other points:

5. I used the bulk load interface to SQL Server. It really makes a difference when initially moving 50,000 records!
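As a rough illustration, a plain BULK INSERT from a staged export file (the path, delimiters, and table name are assumptions):

    -- Bulk-load a staged CSV export into the local mirror table. TABLOCK
    -- can enable minimally logged loads (depending on recovery model),
    -- which is where much of the speedup comes from.
    BULK INSERT dbo.Account
    FROM 'C:\exports\account.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)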

6. Fudge the column schema a little when creating the local table. Even if the metadata says that it is a Numeric(5), it could have been a Numeric(10) 2 weeks ago and there may still be values for the column larger than Numeric(5).
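In practice that just means padding the declared sizes when generating the DDL; a sketch with illustrative fields:

    -- The current metadata may say Numeric(5) or Text(40), but older rows
    -- can still hold wider values, so create the local columns with headroom.
    CREATE TABLE dbo.Account (
        Id                CHAR(18)      NOT NULL PRIMARY KEY,
        Name              NVARCHAR(255) NOT NULL,  -- pad even if metadata says less
        NumberOfEmployees NUMERIC(18,0) NULL       -- pad even if metadata says Numeric(8)
    )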

7. You still have to use GetDeleted to pick up the deletes. For tables that don't support GetDeleted, I used a query that checks to see if the IDs of the local table still exist up on Salesforce (kind of a Select ... where Not Exists).
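A sketch of that fallback check, assuming the current set of live IDs has been staged into a local temp table #live_ids:

    -- Remove (or flag) local rows whose Id no longer exists up on Salesforce.
    DELETE FROM dbo.Account
    WHERE NOT EXISTS (SELECT 1 FROM #live_ids l WHERE l.Id = dbo.Account.Id)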

Bill

 


daroz

qman wrote:

6. Fudge the column schema a little when creating the local table. Even if the metadata says that it is a Numeric(5), it could have been a Numeric(10) 2 weeks ago and there may still be values for the column larger than Numeric(5).


This alone is worth repeating... An excellent tip and easily overlooked.
ksherry49
Bill,

Thanks for the input; this is really helpful and will certainly save me from over- or under-designing the process.

So if I understand correctly, I should be doing the following:

1. Establish a baseline snapshot of all objects that I plan to back up nightly (Accounts, Cases, Opportunities, etc.)
2. Make sure I set the SystemModStamp for each record that's been archived into the local data repository (MS SQL Server)
3. In future replication runs, use the SystemModStamp value that was set from the previous baseline process
  • Be sure to take the Max(SystemModStamp) and subtract 10 minutes to account for record processing overhead.
  • This new calculated value, LastRunModStamp, will need to be set per object being replicated.
4. When querying SF.com, use the LastRunModStamp to establish the date/time range
  • Select <some fields> from Account where SystemModStamp >= LastRunModStamp
5. If the object structure changes, you have two options:
  • Alter the schema and rerun the replication process to re-sync the deltas
  • Drop the schema and rerun the replication process (least expensive time-wise)
6. Make sure to check for deleted records by periodically using the ID to see if it still exists within Salesforce.com. If it doesn't, purge it from the local data repository, or better yet keep an audit trail.

  • Select Id from Account where Id = '0013000000AjLp6AAF'

Thanks
Keith
qman

Hi Keith,

A couple of corrections:

1. Be sure to use SQL Server Bulk loading. For tables with > 100K rows, it makes a dramatic difference in load time (hours vs minutes).

2. Don't set the SystemModStamp, just make sure it's one of the columns you replicate down.  It will already have values in it.

3. Since you're backing the timestamp up 10 minutes, be sure your delta logic allows for seeing the same record twice.
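Concretely, that means the load step has to be an upsert rather than a plain insert. A T-SQL sketch, assuming the deltas land first in a staging table dbo.Account_stage (both table names are illustrative):

    -- The 10-minute overlap means a record can arrive in consecutive runs:
    -- update the rows that already exist locally, then insert the rest.
    UPDATE a
    SET a.Name = s.Name, a.SystemModstamp = s.SystemModstamp
    FROM dbo.Account a
    JOIN dbo.Account_stage s ON a.Id = s.Id

    INSERT INTO dbo.Account (Id, Name, SystemModstamp)
    SELECT s.Id, s.Name, s.SystemModstamp
    FROM dbo.Account_stage s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Account a WHERE a.Id = s.Id)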

6. If DBAmp can use GetDeleted on the table, we do, because it is much faster. If we cannot use GetDeleted (either because the object is not marked 'replicateable' by the Salesforce API or the last date was older than 30 days), then DBAmp does an SQL statement:

select ID from Account where ID not in (Select Id from SALESFORCE...Account)

where SALESFORCE...Account is the Account table on Salesforce (DBAmp allows you to access real time data by using the prefix SALESFORCE...).

Bill

 

ksherry49
Hey Bill,

Thanks for the corrections. I'm probably going to poll every 24 hours but increase the frequency as Salesforce usage grows - we have a fairly small user group at this point.


I'm really impressed with the salesforce.com user community. I hope as my experience grows I'll be able to provide similar support to others.

Thanks
Keith
rajarak

I am just wondering if anyone has any experience writing the nightly export for the deltas using process-conf.xml via the dataloader CLI?

It's possible to export data that was modified/created after a specific date.

But I was trying to make the same work for data modified/created every day, rather than for one fixed date.
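For reference, one way to get a moving window is a relative SOQL date literal in the extraction query that process-conf.xml carries in its sfdc.extractionSOQL property; a sketch (the object and field list are assumptions, and I haven't verified this end to end):

    SELECT Id, Name, SystemModstamp
    FROM Account
    WHERE SystemModstamp > LAST_N_DAYS:1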

 

Please post a snapshot of your process-conf.xml if anyone has accomplished the same.