With the arrival of Java 7 update 6 a while ago, there was one minor change that was meant to be the salvation from a few
memory leaks. This change has to do with the String class. The
String class has suffered from an undocumented bug-feature that has given a lot of people headaches for a long
time. It turns out that this class can be the cause of a kind of memory leak that is especially transparent to the programmer.
In their attempt to fix it, however, they have forced different, equally severe bug-features onto the programmer, equally transparent or maybe more so, as the new behaviour really makes less sense.
Let me explain myself…
The String class in Java is implemented by wrapping a char[] array. For performance reasons, it includes two fields, offset and count (the length), that identify the first character of the array that is part of the string and the string's size.
Those performance reasons have to do with the String.substring method. This method allows you to take an already existing String object and create a new one using a smaller character segment of the original. Because strings in Java are immutable, this can be done in constant time just by adjusting the offset and length fields while sharing the same character buffer.
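To make the layout concrete, here is a minimal sketch of how such a string could be modelled. SharedString and its method names are my own invention for illustration; this is not the JDK source, just the same idea of one buffer plus two integers.

```java
/** Hypothetical model of the pre-7u6 String layout; not the real JDK class. */
final class SharedString {
    private final char[] value; // backing buffer, possibly shared with other instances
    private final int offset;   // index in value of the first character of this string
    private final int count;    // number of characters that belong to this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Constant time: only the two integers change, the buffer is shared. */
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    int length() {
        return count;
    }

    /** True when both instances point at the very same array. */
    boolean sharesBufferWith(SharedString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

Taking new SharedString("hello, world").substring(7, 12) yields "world" without copying a single character, and sharesBufferWith reports that both objects still hold the same array.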
But speed is not the only thing that gets better. By sharing the same buffer we also reduce the potential memory usage of the system. That’s fine, except it introduces a new problem: a silent memory leak that is quite difficult to spot unless you pay attention to the whole process.
Suppose you have a big string, say 2 MiB. You perform a regular expression search over it, find your piece, extract it
with a substring call, store it, and then let the original 2 MiB string be GC’d. You would expect your final memory
usage to be the few bytes your final string uses (plus object overhead), but reality kicks in and you still have 2 MiB
of used heap space.
The reason is simple. Remember when I told you that strings can share the character array? That’s what’s happening. As long as the original string is still in use, you’re saving space. Once it is not, you’re wasting it: you’re keeping a 2 MiB character array reachable just to represent a string of 10 to 100 bytes. And this is one of the most overlooked memory leaks in Java.
It is, however, very easy to avoid. The problem is that this is not well documented in the JavaDoc, so not everyone knows about it, but it can be fixed just like this:
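// substring takes (beginIndex, endIndex), so a 100-character piece ends at someIndex + 100
String str = twoMegsString.substring(someIndex, someIndex + 100);
// copy the piece into its own, exactly sized character array
str = new String(str);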
That’s it. Really. Creating a new string like this in Java will allocate a new character array of the exact size you need and expect.
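If you want to see the numbers, the following sketch models both the leak and the fix. Slice, retainedChars and compact are hypothetical names of mine, and the class only imitates what the old substring and new String(str) did; it says nothing about what your JVM does today.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a pre-7u6 substring result; not the real JDK class. */
final class Slice {
    private final char[] value; // the buffer this slice keeps reachable
    private final int offset;
    private final int count;

    Slice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Characters kept alive on the heap by this slice alone. */
    int retainedChars() {
        return value.length;
    }

    /** Models the new String(str) trick: copy only the characters in use. */
    Slice compact() {
        char[] exact = Arrays.copyOfRange(value, offset, offset + count);
        return new Slice(exact, 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

final class LeakDemo {
    public static void main(String[] args) {
        char[] big = new char[1024 * 1024]; // 1 Mi chars, 2 MiB of UTF-16 data
        Arrays.fill(big, 'x');

        Slice leaky = new Slice(big, 10, 100); // 100 characters sharing the big buffer
        Slice fixed = leaky.compact();         // 100 characters in their own buffer

        System.out.println(leaky.retainedChars()); // prints 1048576
        System.out.println(fixed.retainedChars()); // prints 100
    }
}
```

Both slices read as the same 100 characters; the only difference is how much memory each one pins.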
This is something that has been known for a while. The memory leak has been explained a thousand times, and people who
really know how to code in Java know about it. Most people actually use the sharing to their advantage. We all know we can call
trim on everything we get from the user and only pay for an object allocation and maybe a few lost bytes. If we know we
have a large string, we can use the trick above to trim down the internal array, while still profiting from the extra
performance of not rebuilding a large string just to remove 10 bytes of trailing spaces.
But then, reality kicks in again…
The String class has been rewritten in Java 7 update 6. Two fields have been lost in the process: offset and count (the length). This means that a string now always has its first character at position 0 and the same size as its character array.
This means no more memory leaks. Nobody can ever again substring a 2 MiB string down to a 10 byte micro-string and
still get 2 MiB of unused memory filling up the heap. We are effectively protecting people from ever needing to learn
how things work, because things work exactly as expected.
Except this introduces new problems. New problems that, sadly, and unlike the previous one, have no solution.
The first of our problems is memory. Every time we use substring (or trim) we copy the buffer. For small strings,
this really doesn’t matter. For the average case, even for long strings, this also doesn’t matter, as the original
string is rarely kept. It is in the few key circumstances where a lot of strings are created from a large one that this
may matter. An application that yesterday used a decent amount of memory now uses a lot more, with no direct solution.
The second of our problems is computing time. Every time we use substring or trim, we copy the buffer. Copying a
buffer means an O(n) run on the target size. Again, for short strings, this is not a problem. For a long one, it is.
We can no longer rely on substring or trim to be fast. Every time we call them, we risk copying a large string just to save a few
bytes we didn’t even want to save. Applications that called substring in loops may now behave closer to
quadratic time, as opposed to their previous linear times. Existing applications may slow down just by running them with the new updates. And
again, this doesn’t have a direct solution.
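A back-of-the-envelope sketch shows where the extra work comes from. SubstringCost is a hypothetical helper of mine that consumes a string one character at a time with substring(1) and adds up how many characters a copying substring would have to move. It counts characters rather than timing the JVM, so take it as a cost model under that assumption, not as a benchmark.

```java
/** Hypothetical cost model: counts characters, does not measure time. */
final class SubstringCost {

    /**
     * Consumes s one character at a time with substring(1) and returns the
     * total number of characters a copying substring moves along the way.
     * With a shared buffer the same loop would copy nothing at all.
     */
    static long charsCopiedByCopyingSubstring(String s) {
        long copied = 0;
        String rest = s;
        while (!rest.isEmpty()) {
            rest = rest.substring(1); // the pattern under discussion
            copied += rest.length();  // a copying implementation moves this many chars
        }
        return copied;
    }

    public static void main(String[] args) {
        String input = new String(new char[1000]); // any 1000-character string
        // (n - 1) + (n - 2) + ... + 0 = n * (n - 1) / 2
        System.out.println(charsCopiedByCopyingSubstring(input)); // prints 499500
    }
}
```

A loop that used to do 1000 constant-time steps now moves close to half a million characters, and the total grows with the square of the input length.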
These are not trivial problems. This change is introducing a lot of accidental bad behaviour to well-behaved applications only to eliminate an avoidable memory leak. Again: to remove an error that was avoidable.
I can’t stress this enough. The bad behaviour of the old
String was avoidable; everything bad introduced with the new
String is forced on the programmer. We have introduced bad behaviour just so bad programmers don’t have
to learn to code.
Instead of telling people that they should care about memory in Java (even though it is sold as the language in which you can forget about memory management), they break the old users who were using the platform the right way.
This is especially odd coming from the very people that introduced that thing they call type erasure just so new code could run on old platforms, when it should be the other way around. And this time, it is neither way.
Edit (Nov 23): Looks like the author of these changes has made a few comments on Reddit.
To summarize, he says the change does more good than harm because of how strings are used in the average application. Although I mostly agree, I think it’s a change that potentially breaks a lot of things in applications that are not the usual ones.
There is one thing I can say about all this: it is not documented anywhere. Neither the old nor the new behaviour. So, in the end, anyone relying on it was relying on undocumented features, something that should not be done.