In one of my favourite presentations Why you should avoid Linked Lists we see a comparison of C++ vector
and list
performance in the following problem:
Generate \(N\) random integers and insert them into a sequence so that each is inserted in it’s proper position in the numerical order. 5 1 4 2 gives:
- 5
- 1 5
- 1 4 5
- 1 2 4 5
Remove elements one at a time by picking a random position in the sequence and removing the element there. Positions 1 2 0 0 gives:
- 1 2 4 5
- 1 4 5
- 1 4
- 4
For which \(N\) is better to use a linked list than a vector (or an array) to represent the sequence?
Let’s compare Java java.util.ArrayList
and java.util.LinkedList
in the same way and see which one will be faster.
Possible benchmark
To start we can consider following scenarios:
- Insert \(N\)
Integer
into a sizedArrayList
. - Insert \(N\)
Integer
into a sizedArrayList
while maintaining list order. - Insert \(N\)
Integer
into aLinkedList
. - Insert \(N\)
Integer
into aLinkedList
while maintaining list order.
Scenario 1 gives us the baseline for ArrayList
by measuring how expensive is the element insertion. By comparing results from scenario 1 and 2 we can then measure how expensive is maintaining the ArrayList
order. Similar idea applies to scenario 3 and 4, this time for LinkedList
.
Running the benchmark
We can implement the benchmark using JMH:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 10, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class ArrayListVsLinkedListBenchmark {
@State(Scope.Benchmark)
public static class BenchmarkState {
final int N = 10000; // our N
final Integer[] numbers = new Integer[N];
public BenchmarkState() {
var rand = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = rand.nextInt();
}
}
}
@Benchmark // Scenario 1
public List<Integer> arrayListSizedInsert(BenchmarkState state) {
var list = new ArrayList<Integer>(state.numbers.length);
for (Integer number : state.numbers) {
list.add(number);
}
return list;
}
@Benchmark // Scenario 2
public List<Integer> arrayListSizedInOrderInsert(BenchmarkState state) {
var list = new ArrayList<Integer>(state.numbers.length);
outer:
for (Integer number : state.numbers) {
for (int i = 0; i < list.size(); i++) {
if (number <= list.get(i)) {
list.add(i, number);
continue outer;
}
}
list.add(number);
}
return list;
}
@Benchmark // Scenario 3
public List<Integer> linkedListInsert(BenchmarkState state) {
var list = new LinkedList<Integer>();
for (Integer number : state.numbers) {
list.add(number);
}
return list;
}
@Benchmark // Scenario 4
public List<Integer> linkedListInOrderInsert(BenchmarkState state) {
var list = new LinkedList<Integer>();
outer:
for (Integer num : state.numbers) {
for (int i = 0; i < list.size(); i++) {
if (num <= list.get(i)) {
list.add(i, num);
continue outer;
}
}
list.add(num);
}
return list;
}
}
On Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz the results are:
Scenario | N=10 | N=25 | N=50 | N=100 | N=1000 |
---|---|---|---|---|---|
arrayListSizedInsert | 47.423 | 115.480 | 199.008 | 377.808 | 3,661.444 |
arrayListSizedInOrderInsert | 190.376 | 758.708 | 1,771.247 | 5,441.725 | 314,949.200 |
linkedListInsert | 71.766 | 169.755 | 337.902 | 668.914 | 6,410.276 |
linkedListInOrderInsert | 180.522 | 990.222 | 4,782.757 | 38,536.118 | 66,048,424.268 |
Right away we see that For both ArrayList
and LinkedList
the cost of finding insertion point significantly overtakes the cost of inserting an element. Only for \(N=10\) LinkedList
is faster than ArrayList
. With \(N\) increasing LinkedList
performance quickly deteriorates.
Improving the benchmark
Before we move further it has to be noted that existing LinkedList
scenarios perform the list traversal inefficiently. Instead of traversing the list once for each inserted element this happens every time we call get()
or add()
method, Let’s improve this by switching to ListIterator
.
To make the benchmark more interesting we will also avoid allocating the ArrayList
upfront and instead allow periodic re-sizing to take place. This will make the ArrayList
slower, but the behaviour will be closer to LinkedList
which doesn’t allocate the memory upfront.
Once again using JMH:
@Benchmark // Scenario 5
public List<Integer> arrayListUnsizedInOrderInsert(BenchmarkState state) {
var list = new ArrayList<Integer>();
outer:
for (Integer number : state.numbers) {
for (int i = 0; i < list.size(); i++) {
if (number <= list.get(i)) {
list.add(i, number);
continue outer;
}
}
list.add(number);
}
return list;
}
@Benchmark // Scenario 6
public List<Integer> linkedListIteratorInOrderInsert(BenchmarkState state) {
var list = new LinkedList<Integer>();
outer:
for (Integer num : state.numbers) {
var it = list.listIterator();
while (it.hasNext()) {
var el = it.next();
if (num <= el) {
it.set(num);
it.add(el);
continue outer;
}
}
it.add(num);
}
return list;
}
On the same Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz the results are:
Scenario | N=10 | N=25 | N=50 | N=100 | N=1000 |
---|---|---|---|---|---|
arrayListUnsizedInOrderInsert | 218.910 | 931.218 | 2,034.632 | 5,746.831 | 380,032.696 |
linkedListIteratorInOrderInsert | 175.161 | 698.068 | 2,242.376 | 9,627.562 | 1,249,834.336 |
Although new LinkedList
scenario is considerably more efficient around \(N=50\) ArrayList
becomes faster. This time LinkedList
performance doesn’t degrade as rapidly as before, however \(N=50\) is still a small \(N\).
Conclusion
LinkedList
insertion complexity is \(O(1)\) while ArrayList
is \(O(N)\). Because of that LinkedList
is often the first choice when dealing with rapid element insertions. However, this complexity doesn’t measure the cost of finding a random insertion point within list.
On a modern hardware the cost of list traversal will be driven primarily by cost of accessing RAM. Because LinkedList
is a linked data structure:
- It allocates more memory as it must store references between nodes.
- It can’t guarantee memory locality with nodes possibly allocated in non-consecutive memory which will interfere with cache prefetching.
So don’t discard ArrayList
right away when dealing with rapid insertions, it might actually be faster!