From: http://lwn.net/Articles/283793/
Getting high-performance, three-dimensional graphics working under Linux is quite a challenge even when the fundamental hardware programming information is available. One component of this problem is memory management: a graphics processor (GPU) is, essentially, a computer of its own with a distinct view of memory. Managing the GPU's memory - and its view of system RAM - must be done carefully if the resulting system is intended to work at all, much less with acceptable performance.
Not that long ago, it appeared that this problem had been solved with thetranslation table maps (TTM) subsystem. TTM remains outside of the mainline kernel, though, as do all drivers which use it. A recent queryabout what would be required to get TTM merged led to an interesting discussion where it turned out that, in fact, TTM may not be the future of graphics memory management after all.
A number of complaints about TTM have been raised. Its API is far larger than is needed for any free Linux driver; it has, in other words, a certain amount of code dedicated to the needs of binary-only drivers. The fencing mechanism (which manages concurrency between the host CPUs and the GPU) is seen as being complex, difficult to work with, and not always yielding the best performance. Heavy use of memory-mapped buffers can create performance problems of its own. The TTM API is an exercise in trying to provide for everything in all situations; as a result it is, according to some driver developers, hard to match to any specific hardware, hard to get started with, and still insufficiently flexible. And, importantly, there is a distinct shortage of working free drivers which use TTM. So Dave Airlie worries:
All of these worries would seem to be moot, since TTM is available and there is nothing else out there. Except, as it turns out, there issomething out there: it's called the Graphics Execution Manager, or GEM. The Intel-sponsored GEM project is all of one month old, as of this writing. The GEM developers had not really intended to announce their work quite yet, but the TTM discussion brought the issue to the fore.
Keith Packard's introduction to GEMincludes a document describing the API as it exists so far. There are a number of significant differences in how GEM does things. To begin with, GEM allocates graphical buffer objects using normal, anonymous, user-space memory. That means that these buffers can be forced out to swap when memory gets tight. There are clear advantages to this approach, and not just in memory flexibility: it also makes the implementation of suspend and resume easier by automatically providing backing store for all buffer objects.
The GEM API tries to do away with the mapping of buffers into user space. That mapping is expensive to do and brings all sorts of interesting issues with cache coherency between the CPU and GPU. So, instead, buffer objects are accessed with simple read() and write() calls. Or, at least, that's the way it would be if the GEM developers could attach a file descriptor to each buffer object. The kernel, however, does not make the management of that many file descriptors easy (yet), so the real API uses separate handles for buffer objects and a series of ioctl()calls.
That said, it is possible to map a buffer object into user space. But then the user-space driver must take explicit responsibility for the management of cache coherency. To that end there is a set of ioctl()calls for managing the "domain" of a buffer; the domain, essentially, describes which component of the system owns the buffer and is entitled to operate on it. Changing the domains (there are two, one for read access and one for writes) of a buffer will perform the necessary cache flushes. In a sense, this mechanism resembles the streaming DMA API, where the ownership of DMA buffers can be switched between the CPU and the peripheral controller. That is not entirely surprising, as a very similar problem is being solved.
This API also does away with the need for explicit fence operations. Instead, a CPU operation which requires access to a buffer will simply wait, if necessary, for the GPU to finish any outstanding operations involving that buffer.
Finally, the GEM API does not try to solve the entire problem; a number of important operations (such as the execution of a set of GPU commands) are left for the hardware-specific driver to implement. GEM is, thus, quite specific to the needs of Intel's driver at this time; it does not try for the same sort of generality that was a goal of TTM. As describedby Eric Anholt:
The advantage to this approach is that it makes it relatively easy to create something which works well with Intel drivers. And that may well be a good start; one working set of drivers is better than none. On the other hand, that means that a significant amount of work may be required to get GEM to the point where it can support drivers for other hardware. There seem to be two points of view on how that might be done: (1) add capabilities to GEM when needed by other drivers, or (2) have each driver use its own memory manager.
The first approach is, in many ways, more pleasing. But it implies that the GEM API could change significantly over time. And that, in turn, could delay the merging of the whole thing; the GEM API is exported to user space, and, as a result, must remain compatible as things change. So there may be resistance to a quick merge of an API which looks like it may yet have to evolve for some time.
The second approach, instead, is best describedby Dave Airlie:
One other remaining issue is performance. Keith Whitwell posted some benchmark results showing that the i915 driver performs significantly worse with either TTM or GEM than without. Keith Packard gets different results, though; his tests show that the GEM-based driver is significantly faster. Clearly there is a need for a set of consistent benchmarks; performance of graphics drivers is important, but performance cannot be optimized if it cannot be reliably measured.
The use of anonymous memory also raises some performance concerns: a first-person shooter game will not provide the same experience if its blood-and-gore textures must be continually paged in. Anonymous memory can also be high memory, and, thus, not necessarily accessible via a 32-bit pointer. Some GPU hardware cannot address high memory; that will likely force the use of bounce buffers within the kernel. In the end, GEM will have to prove that it can deliver good performance; GEM's developers are highly motivated to make their hardware look good, so there is a reasonable chance that things will work out on this front.
The conclusion to draw from all of this is that the GPU memory management problem cannot yet be considered solved. GEM might eventually become that solution, but it is a very new API which still needs a fair amount of work. There is likely to be a lot of work yet to be done in this area.
(Thanks to Timo Jyrinki for suggesting this topic.)