String.substring() creates copies = heap impact

Writing a class right now that handles inbound mail with possibly-somewhat-large text attachments. The intent is to parse the attachment and do stuff with the contents.

Because Apex strings are similar to Java strings - in particular, they are immutable - I assumed/hoped that substring() would behave like Java substring, returning a lightweight object that just has offset pointers into the original string data. No copying necessary.

However, that doesn't seem to be the case. Tests with a 5,000-character string from which five 4,000-character substrings are taken show that the heap grows by about 4K with each substring operation:

string s = '...'; // build your own 5K string here

System.debug('pre: ' + Limits.getHeapSize());

string s1 = s.substring(0,4000);
string s2 = s.substring(0,4000);
string s3 = s.substring(0,4000);
string s4 = s.substring(0,4000);
string s5 = s.substring(0,4000);

System.debug('post: ' + Limits.getHeapSize());

Output:

13:46:51.317|USER_DEBUG|[58]|DEBUG| pre: 5100
13:46:51.318|USER_DEBUG|[66]|DEBUG| post: 25100

This isn't a bug, per se, and I understand that Apex is an interpreted language, but this seems like it could be a straightforward performance win to leverage the underlying Java string mechanics.

November 10, 2010

Best Answer chosen by Admin (Salesforce Developers)

Greg Fee
Hi jhart,

Your description of the behavior is correct, and to the degree that it is the behavior we intended, it is not a 'bug'. You do have a point about using shared buffers.

On the other hand, there are some awkward patterns when using a shared buffer for substring. For instance, the buffer is kept alive as long as any substring that references it is kept alive. This means that if you take a substring against a very large string, you will have that large buffer in memory even after you drop the reference to the original string. The solution to that in Java is to identify those situations manually and explicitly use a constructor that copies bits.
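To make the Java workaround mentioned above concrete, here is a minimal sketch of the defensive copy. Whether `substring` shares the parent's buffer at all depends on the JDK: older JDKs did, while (to the best of my knowledge) JDK 7u6 and later copy the characters instead, so the explicit copy only matters on runtimes that share. The class and method names are hypothetical, chosen for illustration.

```java
class SubstringCopy {
    // Hypothetical helper: returns a substring that does not keep the
    // (possibly very large) parent string's character data alive.
    static String detachedSubstring(String source, int begin, int end) {
        // On JDKs where substring shares the parent's buffer, the
        // String(String) constructor copies only the characters in use.
        // On newer JDKs substring has already copied, so this is harmless.
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append('x');
        }
        String big = sb.toString();

        String small = detachedSubstring(big, 0, 10);
        big = null; // 'small' no longer depends on the 5,000-character data

        System.out.println(small.length());
    }
}
```

The trade-off is the one described above: the copy costs memory up front, but the lifetime of the large string no longer depends on how long the small one is kept.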

So we are left with the question of which way better serves Apex customers. My default claim would be that the current semantic in Apex is easier to understand and therefore is the 'right' choice for Apex. Clearly that is a choice for the community to make though. :)

Greg Fee

November 10, 2010

This was selected as the best answer

jhart
Hi Greg,

Thanks for the prompt & considered reply!

The reason this came up is that we're coding up an integration piece that exports data from a separate system into Salesforce. The cleanest way to do this is to push the raw data to Salesforce and run Apex code in your environment to tease apart the raw data & do the right thing with it. (this is much better than keeping the logic in client code with a whole bunch of queries & updates etc. over the SOAP API).

There are basically two ways of invoking Apex on the raw data: Apex WSDL methods or Email Services.

Apex WSDL methods don't support multidimensional array parameters, so there's no good way to push a table of data at them. So, in either case, we're looking at pushing data as a big string and then parsing it in Apex.

The difficulty of parsing non-XML data (eg, JSON or column-formatted SQL output or whatever) is one of Apex Code's weak spots. This would be one step in the right direction, at least =)

thanks,

john

November 10, 2010

Greg Fee
Hi John,

I pinged Taggart and he says he talked to you. We have a variety of work on our roadmap that we believe will substantially improve this area. Your feedback and others on priorities will help us in getting the most needed features out as quickly as possible.

Greg

November 10, 2010

jhart
Hi Greg,

Yep, I'm all good w/r/t what I need to get the current job done.

In a related note, I noticed during some string wrangling that Apex "substring" throws a "Starting position out of bounds" error if given an argument equal to the string length.

string s = 'abc';
System.debug('>' + s.substring(2) + '<');
System.debug('>' + s.substring(3) + '<');

Outputs:

>c<
System.StringException: Starting position out of bounds: 3

Logically speaking, I believe this case should return an empty string rather than throw an exception. I can call "substring(0,0)" to get an empty string, so I think I should be able to do the same at the end of a string.

Just to check my sanity I verified that Java is OK with this:

public class tmp {
    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s);
        System.out.println(">" + s.substring(1) + "<");
        System.out.println(">" + s.substring(2) + "<");
        System.out.println(">" + s.substring(3) + "<");
    }
}

Output:

abc
>bc<
>c<
><

Only when I pass in (4) do I get a StringIndexOutOfBoundsException.

Again, this is a minor nit, just something that took me by surprise.

November 11, 2010

Greg Fee
Good find. I filed a bug.

November 11, 2010

NK123
Hi Everyone,

I am facing an issue with substring that I don't see covered in the discussion above.
I have a phone string that I want to parse using substring. Here is the sample:

        String strphone = '+19876543210';
        String newPhone = '';

        newPhone += '(';
        System.debug('Phone: ' + newPhone);

        newPhone += strphone.substring(0,5) ;
        System.debug('Phone: ' + newPhone);
        System.debug('oPhone: ' + strPhone);

        newPhone += ') ' ;
        System.debug('Phone: ' + newPhone);
        System.debug('oPhone: ' + strPhone);

        newPhone += strphone.substring(5,3) ;
        System.debug('Phone: ' + newPhone);
        System.debug('oPhone: ' + strPhone);

        newPhone += '-' ;
        System.debug('Phone: ' + newPhone);
        System.debug('oPhone: ' + strPhone);

        newPhone += strphone.substring(8,3);
        System.debug('Phone: ' + newPhone);
        System.debug('oPhone: ' + strPhone);

I am getting the error (System.StringException: Ending position out of bounds: 3) on the line "newPhone += strphone.substring(5,3) ;"
I am thinking it's just the morning to blame and I am missing something here that I'm unable to notice.
Anyway, does anyone have any idea?

~NK

March 31, 2011

jhart
Hi NK,

Your substring parameters are reversed. The 2nd argument is an end index, not a length, so it has to be >= the first. Your first call has it right, but your later two calls do not: substring(5,3) is wrong, but substring(3,5) is OK.

From the docs:

Returns a new String that begins with the character at the specified startIndex and extends to the character at endIndex - 1. For example:
'hamburger'.substring(4, 8); // Returns "urge"
'smiles'.substring(1, 5); // Returns "mile"
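As a sketch of what the (start, end) convention means for NK's phone string: Java's `substring(begin, end)` uses the same start-inclusive, end-exclusive indices as the Apex docs quoted above, so the slices can be written as below. The class name and the "(AAA) PPP-LLLL" target layout are assumptions for illustration, not something stated in the thread.

```java
class PhoneSlices {
    // Hypothetical helper. Assumes input shaped like "+19876543210":
    // '+', a one-digit country code, then ten digits.
    static String format(String phone) {
        // Arguments are (start, end-exclusive), not (start, length).
        String area = phone.substring(2, 5);    // characters at 2, 3, 4
        String prefix = phone.substring(5, 8);  // characters at 5, 6, 7
        String line = phone.substring(8, 12);   // characters at 8 through 11
        return "(" + area + ") " + prefix + "-" + line;
    }

    public static void main(String[] args) {
        System.out.println(format("+19876543210")); // prints (987) 654-3210
    }
}
```

In other words, a call meant as "3 characters starting at 5" becomes substring(5, 8), and "the rest from 8" becomes substring(8, 12) for this 12-character input.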

You might want to look at the regular expression classes to help you with your task; any time you're calling "substring" more than once it's a sign that maybe a regular expression should be considered instead.
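A minimal sketch of the regular-expression approach, written in Java. Apex has Pattern and Matcher classes that are modeled on Java's, so it should translate closely, but treat that as an assumption to check against the Apex docs. The class name, the assumed input shape, and the output layout are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneRegex {
    // Assumed input shape: '+', a one-digit country code, then ten digits.
    private static final Pattern PHONE =
            Pattern.compile("^\\+(\\d)(\\d{3})(\\d{3})(\\d{4})$");

    static String format(String phone) {
        Matcher m = PHONE.matcher(phone);
        if (!m.matches()) {
            return phone; // leave unrecognized input untouched
        }
        // Each capture group replaces one hand-computed substring call.
        return "+" + m.group(1) + " (" + m.group(2) + ") "
                + m.group(3) + "-" + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(format("+19876543210")); // prints +1 (987) 654-3210
    }
}
```

The pattern also validates the input as a side effect, which the chain of substring calls does not: a string that is too short simply fails to match instead of throwing.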

March 31, 2011

NK123
That's what happens when you switch between languages sometimes. Thanks for the morning wake-up call.
~NK

March 31, 2011
