Since the String.length() method returns an int, we could guess that the maximum length would be Integer.MAX_VALUE characters. That’s not correct. Let’s forget about Unicode for now and try to create the longest possible string by repeating the lowercase letter a.
String text = "a".repeat(Integer.MAX_VALUE);
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.base/java.lang.String.repeat(String.java:4428)
...
Let’s try to understand what happened and why the code compiled but threw an error at runtime.
How is the String implemented
Since Java 9 and JEP 254: Compact Strings, the String class internally stores its characters in a byte[] array. The stack trace from the OutOfMemoryError points to line String.java:4428, which in the Java 17 source code is an array creation expression:
final byte[] single = new byte[count];
According to the Java Language Specification, Java SE 17 Edition, Chapter 10:
"The variables contained in an array have no names; instead they are referenced by array access expressions that use non-negative integer index values."
The language specification doesn’t prohibit an array length of Integer.MAX_VALUE, so the compiler doesn’t complain if we try to allocate new byte[Integer.MAX_VALUE]. However, we will get the familiar OutOfMemoryError: Requested array size exceeds VM limit error at runtime.
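To see this for ourselves, here is a minimal sketch that attempts the allocation and reports what happened. The class and method names (ArrayLimitProbe, tryAllocate) are made up for this illustration, and the exact error message is left to the JVM: on HotSpot it may name either the VM limit or an exhausted heap, depending on the requested length and the heap size.

```java
class ArrayLimitProbe {

    /** Tries to allocate a byte[] of the given length and reports the outcome. */
    static String tryAllocate(int length) {
        try {
            byte[] bytes = new byte[length];
            return "ok: " + bytes.length;
        } catch (OutOfMemoryError e) {
            // The failed allocation did not consume heap, so building a short message is safe.
            return "OutOfMemoryError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(16));
        System.out.println(tryAllocate(Integer.MAX_VALUE));
    }
}
```

Catching OutOfMemoryError is rarely a good idea in production code; it is used here only to make the outcome of a single oversized allocation visible.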
This is not a new behaviour. The ancient java.util.Hashtable class, present since Java 1.0, mentions it in a source comment:
/**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
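A constant like this is typically used to cap the capacity when a backing array grows. The sketch below illustrates the idea with a hypothetical helper, newCapacity; it is not the actual Hashtable code, and the doubling strategy is an assumption chosen for simplicity.

```java
class CapacityGrowth {

    // Same conservative bound as in the JDK comment above.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Hypothetical helper: doubles oldCapacity, but never returns less than
     * minCapacity or more than MAX_ARRAY_SIZE.
     */
    static int newCapacity(int oldCapacity, int minCapacity) {
        if (minCapacity < 0 || minCapacity > MAX_ARRAY_SIZE) {
            throw new OutOfMemoryError("Required array size too large: " + minCapacity);
        }
        // Use long arithmetic so that doubling cannot overflow an int.
        long doubled = 2L * oldCapacity;
        long target = Math.max(doubled, minCapacity);
        return (int) Math.min(target, MAX_ARRAY_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 17));                       // 32
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001)); // 2147483639
    }
}
```

Doing the doubling in long arithmetic avoids the classic trap where oldCapacity * 2 silently wraps to a negative int before the bound is checked.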
How is the array creation implemented
The limit on array length comes from the JVM implementation. In the OpenJDK 17 source code it is the max_array_length(BasicType type) method in arrayOop.hpp, which performs the following calculation:
const size_t max_element_words_per_size_t =
  align_down((SIZE_MAX/HeapWordSize - header_size(type)), MinObjAlignment);
const size_t max_elements_per_size_t =
  HeapWordSize * max_element_words_per_size_t / type2aelembytes(type);
if ((size_t)max_jint < max_elements_per_size_t) {
  // It should be ok to return max_jint here, but parts of the code
  // (CollectedHeap, Klass::oop_oop_iterate(), and more) uses an int for
  // passing around the size (in words) of an object. So, we need to avoid
  // overflowing an int when we add the header. See CRs 4718400 and 7110613.
  return align_down(max_jint - header_size(type), MinObjAlignment);
}
return (int32_t)max_elements_per_size_t;
After going through the OpenJDK code, it looks like MinObjAlignment and the other values here depend on the CPU architecture. If so, there won’t be just a single answer.
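To get a feel for how these inputs shape the result, the calculation above can be mirrored in Java. This is only a sketch under stated assumptions: the names are invented for the illustration, a heap word is taken to be 8 bytes and size_t to be 64 bits wide, and the example call assumes a 2-word array header and a MinObjAlignment of 1 word. None of these values is read from a real JVM.

```java
class MaxArrayLength {

    static final long HEAP_WORD_SIZE = 8; // assumed: bytes per heap word on a 64-bit JVM
    static final long SIZE_MAX = -1L;     // 2^64 - 1 when read as an unsigned 64-bit value

    /** Rounds value down to a multiple of alignment (alignment must be a power of two). */
    static long alignDown(long value, long alignment) {
        return value & -alignment;
    }

    /** Mirrors max_array_length; headerWords and minObjAlignment are given in heap words. */
    static int maxArrayLength(int headerWords, int minObjAlignment, int elementBytes) {
        long maxElementWords = alignDown(
                Long.divideUnsigned(SIZE_MAX, HEAP_WORD_SIZE) - headerWords, minObjAlignment);
        // The product may exceed Long.MAX_VALUE, so treat it as unsigned from here on.
        long maxElements = Long.divideUnsigned(HEAP_WORD_SIZE * maxElementWords, elementBytes);
        if (Long.compareUnsigned(Integer.MAX_VALUE, maxElements) < 0) {
            // Leave room for the header so the object size in words still fits in an int.
            return (int) alignDown(Integer.MAX_VALUE - headerWords, minObjAlignment);
        }
        return (int) maxElements;
    }

    public static void main(String[] args) {
        // Assumed inputs: 2-word header, 1-word alignment, 1-byte elements (byte[]).
        System.out.println(maxArrayLength(2, 1, 1)); // 2147483645
    }
}
```

Changing the assumed inputs changes the answer: a 3-word header or a 2-word alignment both yield 2147483644 in this sketch, which illustrates why a single universal limit is unlikely.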
On Linux x86_64, debugging with gdb shows that the expression align_down(max_jint - header_size(type), MinObjAlignment) is executed and the method returns 2147483645. This value is equal to Integer.MAX_VALUE - 2. Let’s try to create the longest possible string again:
String text = "a".repeat(Integer.MAX_VALUE - 2);
This time the code doesn’t throw any errors, provided the heap is large enough to hold the array, confirming that for Java 17 running on Linux x86_64 a string can have up to Integer.MAX_VALUE - 2 characters.