NSArray enumeration performance examined
One day, I was thinking about NSArray enumeration (also called iteration): since Mac OS X 10.6 and iOS 4, there's the wonderful new world of blocks, and with it came enumerateObjectsUsingBlock:. I assumed it must be slower than fast enumeration (for (object in array) { ... }) due to overhead, but I didn't know for sure and thus decided to actually measure the performance.
Which kinds of enumeration do exist?
Basically, we have four kinds of enumeration available (see also Mike Ash's Friday Q&A 2010-04-09: Comparison of Objective-C Enumeration Techniques).
-
objectAtIndex: enumeration
Using a for loop which increases an integer and querying the object using [myArray objectAtIndex:index] is the most basic form of enumeration.
NSUInteger count = [myArray count];
for (NSUInteger index = 0; index < count ; index++) {
[self doSomethingWith:[myArray objectAtIndex:index]];
} -
NSEnumerator
This is a form of external iteration: [myArray objectEnumerator] returns an object. This object has a method nextObject that we can call in a loop until it returns nil
NSEnumerator *enumerator = [myArray objectEnumerator];
id object;
while (object = [enumerator nextObject]) {
[self doSomethingWith:object];
} -
NSFastEnumerator
The idea behind fast enumeration is to use fast C array access to optimize iteration. Not only is it supposed to be faster than traditional NSEnumerator, but Objective-C 2.0 also provides a very concise syntax.
id object;
for (object in myArray) {
[self doSomethingWith:object];
} -
Block enumeration
Available since the introduction of blocks, this allows to iterate an array with blocks. Its syntax isn't as nice as fast enumeration, but there is one very interesting feature: concurrent enumeration. If enumeration order is not important and the jobs can be done in parallel without locking, this can provide a considerable speedup on a multi-core system. More about that in the concurrent enumeration section.
[myArray enumerateObjectsUsingBlock:^(id object, NSUInteger index, BOOL *stop) {
[self doSomethingWith:object];
}];
[myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
[self doSomethingWith:object];
}];
Linear enumeration
First, let's discuss linear enumeration: one item after the other.
Graphs
Conclusions
Somewhat surprisingly, NSEnumerator is even slower than using objectAtIndex:. This is true for both Mac OS X and iOS. I suspect this is due to the enumerator checking whether the array was modified with each iteration. Somewhat unsurprisingly, fast enumeration holds up to its name and is the fastest solution.
For small arrays, block enumeration is a bit slower than objectAtIndex:, but with more elements in the array the performance becomes almost as fast as fast enumeration.
The difference between fast enumeration and NSEnumeration is already quite big with a million entries: on the iPhone 4S, the former took about 0.037 seconds while the later took about 0.140 seconds. That's already a factor of 3.7.
An oddity
The very first time an NSArray is allocated in a program and the very first time an enumerator is requested with objectEnumerator it takes an unusual long time to complete. For example, to allocate an array with one element the median was 415 nanoseconds on my 2007 MacBook Pro 17". But the very first time an array was allocated it take more than 500,000 nanoseconds, sometimes even up to 1,000,000 nanoseconds! The same with querying the enumerator: instead of the median of 673 nanoseconds it took also took more than 500,000 nanoseconds.
I can only guess about the reason, but I suspect lazy loading is to blame. You probably won't notice this in a real application because by the time your code is running Cocoa or Cocoa Touch probably already has created an array.
Concurrent enumeration
Block enumeration offers the option to enumerate the objects concurrently, if possible. This means the work load is potentially spread over several CPU cores. Not every operation to be done while enumerating can be parallelized, so concurrent enumeration only makes sense if no locking is required within the block: either because each operation really is absolutely independent or because atomic operations can be used (like OSAtomicAdd32 and friends).
So, how well does it perform compared to the other enumeration methods?
The graphs
Conclusions
For small number of elements, the concurrent enumeration is by far the slowest method. This probably has to do with the additional work that needs to be done preparing the array for concurrent access and starting the threads (I don't know whether GCD or "traditional" threading is used and it doesn't matter; it's an implementation detail we shouldn't need to care about).
However, if the array is large enough all of a sudden concurrent enumeration becomes the fastest method, just as expected. On the iPhone 4S and a million entries, it took about 0.024 seconds to enumerate with concurrent enumeration, but 0.036 seconds with fast enumeration. By contrast, NSEnumeration took 0.139 seconds for the same array! That's already a pretty big difference, a factor of 5.7.
On my office 2011 iMac 24" with a Core i7 quad-core CPU, the million entries where enumerated concurrently in 0.0016 seconds. Fast enumeration of the same array took 0.0044 seconds and NSEnumeration 0.0093 seconds. That's a factor of 5.8 which is pretty close to the iPhone 4S results. I was expecting a bigger difference here, though. On my 2007 MacBook Pro with Core2 Duo dual-core CPU, the factor was "just" 3.7 here.
The threshold when concurrent enumeration became useful was somewhere between 10,000 and 50,000 elements in my tests. With less elements, just go with normal block iteration.
Allocation
I also wanted to know whether the performance is any different depending on how the array was created. I tested two different methods:
- Create a C array which references the object instances and create the array using initWithObjects:count:.
- Create a NSMutableArray and subsequently add objects using addObject:.
There's no difference when iterating, but there is a difference when allocating: the initWithObjects:count: method is faster. With a very large number of objects, this difference can become significant. Here's an example on how to create an array with NSNumbers:
id *entries;
NSArray *result;
entries = malloc(sizeof(id) * numEntries);
for (NSUInteger i = 0; i < numEntries; i++) {
entries[i] = [NSNumber numberWithUnsignedInt:i];
}
result = [NSArray arrayWithObjects:entries count:numEntries];
free(entries);
return result;
}
How did I measure?
You can download the test application to see how I've measured. Basically I'm measuring how long it takes to iterate an array without actually doing anything else, and repeat that 1000 times. For the graph, the median for each array size was taken. Compilation was done with optimization turned off (-O0). For iOS, testing was done with an iPhone 4S. For Mac OS X, I used my 2007 MacBook Pro 17" and my office 2011 iMac 24". The graphs for Mac OS X show the results of the iMac, but on the MacBook Pro the graphs looked similar, it just was slower.
What's an effective and great way to compare all the values of Any ideas how to do this nice and quick in |
|||||||||||||
|
If execution speed (not programming speed) is important, then an explicit loop is the fastest. I made the following tests with an array of 1000000 random numbers: Version 1: sort the array:
Version 2: Key-value coding, using "doubleValue":
Version 3: Key-value coding, using "self":
Version 4: Explicit loop:
Version 5: Block enumeration:
The test program creates an array of 1000000 random numbers and then applies all sorting techniques to the same array. The timings above are the output of one run, but I make about 20 runs with very similar results in each run. I also changed the order in which the 5 sorting methods are applied to exclude caching effects. Update: I have now created a (hopefully) better test program. The full source code is here: https://gist.github.com/anonymous/5356982. The average times for sorting an array of 1000000 random numbers are (in seconds, on an 3.1 GHz Core i5 iMac, release compile): Sorting 1.404 KVO1 1.087 KVO2 0.367 Fast enum 0.017 Block enum 0.021 Update 2: As one can see, fast enumeration is faster than block enumeration (which is also stated here: http://blog.bignerdranch.com/2337-incremental-arrayification/). EDIT: The following is completely wrong, because I forgot to initialize the object used as lock, as Hot Licks correctly noticed, so that no synchronization is done at all. And with This changes dramatically if you add the
The timing here is Concurrent enum 0.009 so it is about twice as fast as fast enumeration. The result is probably not representative because it depends on the number of threads available. But interesting anyway! Note that I have used the "easiest-to-use" synchronization method, which might not be the fastest. |
|||||||||||||||||||||||||||||||
|