TomCooksey
Painting
Graphics Dojo
OpenGL
Performance
Posted by TomCooksey
 in Painting, Graphics Dojo, OpenGL, Performance
 on Wednesday, January 06, 2010 @ 12:01

Introduction

Here’s the next instalment of the graphics performance blog series. We’ll begin by looking at some background about how OpenGL and QPainter work. We’ll then dive into how the two are married together in OpenGL 2 Paint Engine and finish off with some advice about how to get the best out of the engine. Enjoy!

Why OpenGL?

Before I dive into the OpenGL paint engine, I want to make sure we all understand the motivation for the OpenGL 2.0 paint engine. I’ve talked about this before in my article about hardware acceleration, but we still frequently get questions like “Why not implement a Direct2D paint engine?”.

Everyone knows OpenGL means fast graphics right? Well, this is actually a bit of a misconception. What makes graphics fast is a bit of hardware dedicated to computer graphics called a GPU (Graphics Processing Unit). OpenGL 2.x is a software library which often (but not always) uses a particular class of GPU to help satisfy drawing operations (Note: OpenGL 1.x used a different class of GPU). A modern programmable GPU (e.g. nVidia GTX 295) can usually be programmed via both OpenGL, Direct3D and OpenCL. The only difference then is that Direct3D is only available on the Windows platform and OpenCL is not universally supported.

So the reason we are investing our time and effort into OpenGL, rather than Direct3D or OpenCL, is that OpenGL 2.0 is sufficient to give us access to all the GPU features we currently want to use. It is also available on more platforms, especially if you limit yourself to the ES sub-set. We are also looking into restricting ourselves further to only use APIs in OpenGL 3.2 Core Profile.

This might change in the future if we see a new class of GPU, like ones designed for 2D vector graphics which can’t be abstracted by OpenGL 2.0 very well (enter OpenVG), or, if we want to start using GPU features which OpenGL (ES) 2.0 doesn’t give us access to. Having said that, OpenGL is very good at exposing new GPU features through extensions.

History

Qt has had an OpenGL paint engine since early Qt 4.0 days. This engine was designed for the fixed-function hardware available at the time. As time went on and manufacturers added newer bits of hardware to their GPUs, the OpenGL paint engine was adapted to use those features through OpenGL extensions. Over the last 4 years, lots of people have hacked on the engine and added support for things like ARB fragment programs and even adapted the engine to work on OpenGL ES 1.1. The engine is pretty stable and has lots of fall-backs (or original code-paths, depending on how you look at them) for old hardware missing GL extensions the engine can utilise. But, fundamentally, it is an OpenGL 1.x engine.

In early 2008, around the time of the Falcon project (the Falcon Project was an internal project started for Qt 4.5 which focused on painting performance and architecture), it became increasingly clear that Qt needed to support hardware acceleration using the OpenGL ES 2.0 API which was starting to appear on embedded System-On-Chips like the OMAP3. There were two options available: Extend the existing OpenGL paint engine further still, or develop a new paint engine from scratch. When looking at the existing engine, there was a major problem – although it supported fragment programs, it was heavily reliant on fixed-function vertex processing. A further consideration was that the Falcon project had just kicked off and the future of the QPaintEngine API was uncertain. Both of these factors resulted in a new paint engine being written from scratch for OpenGL ES 2.0. This new engine had a distinct advantage over the existing engine: everything I wanted to use from OpenGL was in the core OpenGL ES 2.0 API. This meant I didn’t need to add fallbacks in case of missing functionality, leading to much cleaner and leaner code.

Another point about OpenGL ES 2.0 is that it doesn’t have much in the way of fixed function features – forcing you to write shader programs. While annoying at the time, this is apparently the best way to do things even on desktop GPUs. This point is important because it quickly became apparent that although the engine was designed for GLES2, not only would it also work on desktop OpenGL 2.0, but it would use that API in a way better suited for modern programmable GPUs. So, in Qt 4.6, the new engine is used by default on both GLES2 and on desktop systems which support OpenGL 2.0.

What does OpenGL (ES) 2 provide?

As I’ve already mentioned, OpenGL ES 2.0 is a pretty lean and mean API which models programmable GPUs. The “programmable” bit is fundamental to the API. It means that you write small programs known as shaders, ask OpenGL to compile and then run them on the GPU to process the data you give it. There are two types of shaders: one type processes positions (vertices) and another type processes pixels (fragments), called the vertex shader and fragment shader, respectively. The idea is that you tell OpenGL you want to draw some triangles and the vertex shader is run to determine the position of each of those triangles. Then, the GPU turns each triangle into a bunch of pixels and the fragment shader is run to determine the colour of each of those pixels. The API provides various ways of passing data from the CPU to the GPU (from textures and lists of triangle positions to individual floats) and ways of passing data from the vertex shader to the fragment shader. That’s basically it. All the complexity lives in the shaders you give to the GPU to run.

What does QPainter require?

The rest of this blog assumes you are familiar with the QPainter API (if not, go check the QPainter docs) ). It might also be a good idea to read through Gunnar’s post about how the Raster engine works.

So, the QPainter API provides more than just triangles. It is therefore the GL paint engine’s job to turn the whole of the QPainter API into “just a bunch of triangles”. To understand its task a little better, you have to split QPainter up into chunks which map better to OpenGL. A great example of this is drawRect(). In QPainter terms, this is a single primitive, but in GL engine terms, it is actually two: A rectangle (the fill) and a (possibly quite complex) line round the outside (the stroke). The OpenGL paint engine tries to keep a fairly clean separation between the shape of something which is drawn and its fill. So, here’s the list of primitives (shapes) QPainter requires the engine to draw:

  • Simple primitives (Rectangles, convex polygons, ellipses, etc.)
  • Text
  • Pixmaps
  • Strokes
  • Complex vector paths (QPainterPath)

In addition to this, we have various fills which we can use on our primitives provided by QBrush:

  • Solid colour
  • Linear gradients
  • Radial gradients
  • Conical gradients
  • Bitmap patterns
  • Textures

Not only do we have different types of fill, but we also support a full 3×3 transformation matrix on the brushes. This allows you to draw a rectangle but use it as a kind-of stencil over (for example) a perspective transformed texture.

Finally, QPainter also requires the engine to implement clipping, different composition modes and support it’s state stack (QPainter::save() & QPainter::restore()).

Engine Operation

Primitive Rendering

  • Simple Primitives: To render convex primitives such as rounded rectangles, we just generate a GL triangle fan and render it using glDrawArrays
  • Text: For large text, we convert it to a complex path and render is as such. However, for smaller font sizes, we rasterize the individual font glyphs and upload them as a texture (8-bit texture for bitmap & anti-aliased glyphs and 24-bit RGB for sub-pixel anti-aliased glyphs). This glyph texture is used as a mask in the engine’s pixel pipeline (see below). So, in terms of primitives, text is actually rendered as a set of rectangles - one rectangle for each glyph. When rendering with sub-pixel anti-aliased glyphs, it is possible that the engine will need to do two passes (if the brush is not a solid colour). This is because the engine uses a clever trick and sets the brush’s colour as the glBlendColor and outputs the RGB mask in the fragment shader. It is then able to set a glBlendFunc which combines the two and gives per-sub-pixel blending. If you set a more complex brush, the engine has to do two passes - first apply the mask to the destination, then a second pass to apply the brush, with glBlendFunc set to give the correct result.
  • Pixmaps: A pixmap is actually just a rectangle.
  • Strokes: Strokes can be very complex - just take a look at the pathstoke demo! However, even the most complex dashed pattern with rounded joins and end caps can be turned into a GL triangle strip relatively easily. This is done by the QTriangulatingStroker.
  • Complex vector paths: This is where things get tricky. QPainterPaths can have lots of things which break the “turn lineTo, moveTo and curveTo into verticies and render as triangle fan” algorithm…

Rendering Using Stencil Technique

Take the following path as an example:

Convex Path (1)

Here we have a seemingly trivial path with only 4 points. To draw this with GL, you could just convert the path’s points to verticies and draw it as a triangle fan, which results in two triangles: Triangle 1: ABC and Triangle 2: ACD. The problem is that just looks like a solid triangle, not the path we wanted:

Convex Path (2)

So, to overcome this difficulty, we drop to a 2-pass rendering method which uses the stencil buffer as a temporary scratchpad. So first off, we clear the stencil buffer to all zeros (represented as white):

Stencil Buffer (Clear)

Next, we set the stencil operation to invert, which means instead of setting the stencil value to ‘1′ when a triangle touches a pixel, invert the existing value instead. So 0->1 & 1->0. First we render the first triangle (ABC). As all the pixels are currently 0, every pixel touched by the triangle turns to 1 (represented as black):

Stencil Buffer (Triangle 1)

Next, we draw the second triangle (ACD). Note: We are inverting the stencil’s value, so black pixels touched by the second triangle turn to white and white pixels turns to black:

Stencil Buffer (Triangle 2)

So now the stencil buffer contains the silhouette of our path. All we do now is draw a rectangle into the destination window, but with the stencil test enabled.

In addition to the stencil technique, we are also adding experimental support for triangulating QPainterPaths and caching the triangulation. While this is slower for paths which change often or are zoomed in & out, paths which are relatively static can be triangulated once and rendered multiple times without having to re-triangulate.

Filling Primitives

Now we know how all the different QPainter operations get turned into GL primitives, but we’re still missing how they get filled. As already mentioned, the colour of a pixel is determined by the fragment shader. We therefore have lots of different fragment shaders for different types of fill. However, we also need to support text rendering with arbitrary fills (QPainter lets you fill text with a perspective transformed radial gradient). In the future, we also want to support composition modes which OpenGL doesn’t provide. We’ve also found there are ways we can simplify the shaders for certain situations (and thus improve performance). The result is that Qt needs lots of different shaders. At last count, we’d need over 1000 different shaders to cover all situations. That’s a lot of GLSL to maintain and test, far more than the resources we have available. So instead we split the shaders into different interchangeable “stages”. This is achieved by having each stage in it’s own GLSL function. As an example, lets take regular, non sub-pixel anti-aliased text rendering with a transformed radial gradient. Note, this is just an example to demonstrate how the engine operates and you probably shouldn’t do it in performance critical situations.

We render gradients by pre-calculating a 1px high texture (like a 1D texture) on the CPU which we sample from in the fragment shader. However, we calculate the texture coordinates in the vertex shader and pass it to the fragment shader as a varying. This is because it’s a good idea to do as much work as possible in the vertex shader rather than the fragment shader as it is called so much less frequently.

As already mentioned, we render (non sub-pixel) anti-aliased text by using an 8-bit mask texture. We then multiply the fragment colour by a sample taken from this mask. So, if we’re on the edge of a glyph where the alpha value is <1, we adjust the alpha of the srcPixel by that amount (actually, we also adjust the RGB values too as we use pre-multiplied alpha pixel format internally).

If there was a non-standard composition mode, we’d then pass the masked pixel to another stage which would blend it with the background (although this isn’t implemented yet!).

So you can see in the fragment shader, there’s 3 different stages. The first stage (srcPixel) determines the brush colour of the fragment. The next stage (applyMask) modulates the pixel by a mask to achieve anti-aliased text rendering. The final stage (compose) then blends the pixel with the background. We also have a similar staging technique for the vertex shader. All this complexity is nicely abstracted by the QGLEngineShaderManager. The paint engine tells the shader manager what it wants to draw and the shader manager selects an appropriate selection of shaders. One final note on this: While desktop OpenGL 2 supports linking multiple fragment shaders in a single program, OpenGL ES 2.0 does not. This means that we actually use the different stages by appending them to a single string of GLSL we pass to GL. This also gives the GL implementation the best chance to inline the different stages (without which, performance would suck).

Texture Management

The OpenGL paint engine makes heavy use of gradients. For example, even though it’s perfectly possible to calculate colours for gradients in the fragment shader, we still use a texture as a look-up-table as it is so much faster. Repeatedly uploading textures every time we need them would ruin performance. So instead, we keep a per-context cache of what QPixmap/QImage is already present in texture memory. If two contexts are sharing then we also detect this and don’t duplicate the textures. This functionality is available publicly in QGLContext::bindTexture() too.

On Linux/X11 platforms which support it, Qt will use glX/EGL texture-from-pixmap extension. This means that if your QPixmap has a real X11 pixmap backend, we simply bind that X11 pixmap as a texture and avoid copying it. You will be using the X11 pixmap backend if the pixmap was created with QPixmap::fromX11Pixmap() or you’re using the “native” graphics system. Not only does this avoid overhead but it also allows you to write a composition manager or even a widget which shows previews of all your windows.

Antialiasing

The OpenGL paint engine uses OpenGL multisampling to provide anti-aliasing. Typically, this will be 4x/8x FSAA, meaning 4/8 levels of coverage, which is worse quality than the raster engine, which always uses 256 levels of coverage. However, as the DPI of modern displays increases, you can get away with lower-quality anti-aliasing.

Using multisampling also doesn’t affect text rendering as text is anti-aliased using masks rather than multisampling (for smaller font sizes). So text rendered with the OpenGL engine should look almost as good as text rendered with the raster engine (which also does gamma-correction). The only drawback of using multisampling is that some OpenGL implementations don’t support switching multisampling off. Indeed, the OpenGL ES 2.0 specification doesn’t even provide the API to switch it off. The consequence is that non-anti-aliased (a.k.a. aliased) rendering can be broken (Everything gets anti-aliased even when the QPainter::Antialiasing hint isn’t set). There’s little we can do about this. :-(

Clipping

QPainter supports setting an arbitrary clip, including complex QPainterPaths. Qt uses the GL stencil buffer (or more specifically the lower 7 bits of the stencil buffer) to store the clip. The clip is written to in the same way as we render any other primitive, even using the stencil technique for complex paths. However, instead of filling pixel colours into a colour buffer, we fill stencil values into the stencil buffer. The actual value we use depends on the current QPainter stack depth (how many times save() was called minus the number of time restore() was called). This means that if you restrict yourself to intersect clips (Qt::ClipOperation == Qt::IntersectClip), the engine only needs to write to the part of the stencil buffer which is being clipped to. What’s more, the engine doesn’t need to write to the stencil buffer at all when you call restore() - it just changes the value at which the stencil test passes.

In addition to using the stencil buffer for clipping, the OpenGL paint engine can also just use glScissor. This only allows a single, untransformed rectangle to be used as the clip, which can be quite restrictive. However, it is by far the fastest way to do clipping. So if performance is more important to you than utility, only ever use untransformed rectangular clips.

Recommendations

Interleaved Rendering

Unlike OpenGL, QPainter allows an arbitrary number of rendering contexts (QPainters) to be active in the same thread at the same time. For example, in your widget’s paint event, you can begin a painter on your widget and begin another painter on a QPixmap and interleave rendering to them:

void Widget::paintEvent(QPaintEvent*)
{
QPainter widgetPainter(this);
widgetPainter.fillRect(rect(), Qt::blue);
QPixmap pixmap(256, 256);
QPainter pixmapPainter(&amp;pixmap);
pixmapPainter.drawPath(myPath);
widgetPainter.drawPixmap(0, 0, &amp;pixmap);
}

While this works ok with the OpenGL graphics system, having to switch from doing something with one painter to doing something with a different painter can be very costly and should be avoided whenever possible.

Mixing QPainter and Native OpenGL

As shown in several examples, it is possible to mix your own OpenGL rendering code with QPainter rendering code. However, as OpenGL is a giant state machine, it is very easy for you to accidently clobber Qt’s GL state and vice-versa. To overcome this, we’ve added some new API to QPainter in Qt 4.6 - QPainter::beginNativePainting() and QPainter::endNativePainting(). To prevent artifacts, you must enclose your custom painting in beginNativePainting() and endNativePainting(). This is very important - even if you’re not seeing any problems now, you might find your code starts failing in a future Qt release in which the GL paint engine works slightly differently. Also, as beginNativePainting and endNativePainting sets lots of OpenGL state, it can be quite expensive and thus you should try to use it sparingly. Try to batch up all your custom OpenGL code in a single block.

QGLWidget vs OpenGL Graphics-System

Unlike the raster & OpenVG paint engine, you don’t have to use a specific graphics system to render widgets using the OpenGL paint engine. The QtOpenGL module provides several classes, including QGLWidget, which all use the OpenGL paint engine regardless of what graphics system is being used. QGLWidget is basically a regular widget which always has a native window ID and is always rendered to using OpenGL. You are free to choose whichever method you want to get OpenGL rendering (graphics system or QGLWidget). However, using the opengl graphics system can often be slower than using a QGLWidget, as Qt needs the contents of the “back buffer” (or QWindowSurface) to be preserved when flushing the render to the window system. OpenGL does not guarantee this and it is often not the case so Qt has to use either an FBO or a PBuffer as the back buffer. When the render needs to be flushed, the FBO or PBuffer is bound to a texture, rendered into the window and then the GL buffers are swapped. This extra overhead is avoided by using a QGLWidget, however as a consequence, it is not possible to redraw a sub-region of a QGLWidget: Whenever a QGLWidget is updated, the entire widget must be re-drawn.

It should also be noted that using the OpenGL paint engine isn’t a silver bullet which makes everything faster. For example, the GL engine really sucks at drawing lots of small geometry with state changes between each drawing operation. While we’re working on improving that use case at the moment, the raster paint engine will probably always be faster just because it has so much less overhead. So QGLWidget might be a great way to get the best of both worlds when combined with the raster graphicssystem - Use QGLWidget for operations which GL excels at and the raster engine for everything else.

Tips for Performance (fps)

As a general rule of thumb, OpenGL state changes are expensive. So, use the knowledge you now have of what’s going on under QPainter and try to minimise the number of OpenGL state changes the paint engine needs to do. For example, if you implement a virtual keyboard, you now know that the engine uses a shader for text rendering and a different shader for pixmaps, so draw all the key pixmaps first, then draw all the text on top. That way, the engine only needs to change shaders twice per frame.

  • Never, ever use anything other than intersecting clips
  • Don’t switch render target in the middle of a render
  • Try to use use untransformed rectangular clips whenever possible
  • Minimise changing the brush wherever possible
  • Render batches of primitives of the same types together.
  • Avoid drawing translucent pixels & blending (particularly important on mobile GPUs)
  • Try to cache QPainterPaths and re-use them rather than creating & discarding them in your paintEvent
  • Use QPainterPaths even when there’s a QPainter convenience function. E.g. Rounded rects and elipses.
  • If you’re drawing lots of small pixmaps, try bunching them up into a single, larger pixmap
  • Prefer to use power-of-two (2^n) widths & heights for QImages and QPixmaps (128×256, 256×256, 512×512, etc)
  • If using QGLWidget and don’t need anti-aliasing, don’t enable sample buffers in the QGLFormat
  • If rendering complex QPainterPaths, try to only use odd-even fill rule
TomCooksey
Painting
Graphics
Posted by TomCooksey
 in Painting, Graphics
 on Thursday, August 06, 2009 @ 12:53

Working with clever people is great, but sometimes it comes back to bite you… This post is going to describe a great new feature in Qt and why we’re not going to implement it. :-)

Over the last few months I’ve been toying with the idea of starting a new research project to develop a binary vector graphics format, qvg. Why? Well, svg’s great and all, but it does too much: You can embed animations, text, has a DOM tree, all sorts of things. It’s also a plain text format (xml) so slow to initially read & parse. QVG would be our own binary file format to describe vector graphics with paint engine specific sections. You could take an SVG (which only uses a subset of the standard) and “compile” it to a qvg. During the complication, off-line processing can be done. E.g. Obscured regions of geometry could be clipped, vector paths could be triangulated, gradient textures could be generated, etc. etc. There’s loads of optimisations we could do to “compile” svg into a binary format.

So, why would we want a compiled svg? Because it’s faster to load and render than svg! So the question actually becomes, why do we want svg? Theming, that’s why! Qt provides applications with re-usable UI controls, things like buttons, scrollbars, tick boxes, etc. C++ programmers can write applications using those controls and graphic designers can create themes for them. The idea would be for those graphic designers to draw their super sexy looking scroll bar, export it as SVG and then compile it to QVG, where it would be rendered very quickly by one of Qt’s paint engines.

Having patted myself on the back for coming up with a great idea, I started talking to people about it. Most people in the graphics team seemed to like it and are already developing something a little similar. I then figured it would be a good idea to talk it over with our theming expert, Jens. “What’s wrong with pixmaps?” was essentially his first reaction. Well, good question. The problem with pixmaps is that they’re not resolution independent. If I want a theme to work equally well on both my 96 dpi TV and on my 320 dpi smart phone, pixmaps aren’t going to work. The scrollbar will be too small on the smart phone but take up half of the TV (unless you’ve got a fancy HD one). The only option would be to scale the pixmap, but that generally looks horrible.

We talked about it and compared pixmaps to vector graphics and the advantages of each. I honestly thought that qvg would solve all the problems with svg (I.e. it’s speed). But then Jens delivered the mortal blow to my argument: If UI elements are themed as vector graphics, the graphic designer can influence rendering speed. Think about it: If the scroll bar button is a single square and triangle, it will render very very quickly. However, if it’s a vector path with a pen and conical gradient, it’s going to be much slower to render. We’d get into the silly situation where people change the theme to get better rendering performance.

So, the answer is to use the best of both worlds: Supply themes as resolution independent SVGs, but cache them as a pixmap (in the correct resolution) and just render that. You get the flexibility of vector graphics but the speed of pixmaps. Of course this isn’t a new idea either – Both GNOME and KDE already do this.

TomCooksey
Painting
Graphics
Posted by TomCooksey
 in Painting, Graphics
 on Friday, March 13, 2009 @ 17:32

I am one of our QWS developers. QWS, the Qt Window System, is the heart of Qt for Embedded Linux, formally Qtopia Core, formally Qt/Embedded. :-) What’s great about working on embedded is that you have a view of the system as a whole - the complete stack. So it’s my job to have a pretty good idea how QPainter commands you write in a widget’s paint event end up as voltage levels rapidly alternating between +3.3v and 0v on wires going to your LCD. While QWS handles all the usual window system tasks such as keyboard focus and mouse events, the biggest component is probably graphics. The window system is inherently part of the graphics stack and I actually spend a lot of my time working with the graphics team.

Over the last year our clear focus has been on performance. This performance push has been in all areas on all platforms and architectures. When it comes to graphics, there’s a very broad range of hardware Qt can expect to see - from a simple MMU-less ARM with only a frame-buffer all the way to scary gamer PCs with thousands of graphics cards & neon lights installed. Our lives are made complicated because if we’re not careful, one can end up writing code which runs faster on that little arm than it does on the gamer PC. This post hopes to explain why this is so.

Let’s begin by categorising the range of hardware available:

First, we need to differentiate Unified Memory Architecture (UMA) devices from those with dedicated graphics memory. Generally, high-end hardware will have dedicated graphics memory whereas low-end devices will just use system memory (sometimes reserving a memory region, sometimes not). This is pretty strait forward on PCs - You can almost tell from a PC’s price tag if it has dedicated graphics memory. Sadly, in the world of embedded devices, this is not the case. High-end devices often have UMA and low-end devices (especially set-top-boxes) have dedicated graphics memory.

The next differentiation is the graphics operations supported by the hardware. Generally they are wide ranging but can be loosely categorised as:

1) No acceleration (framebuffer only)
2) Blitter & alpha-blending hardware
3) Path based 2D vector graphics
4) Fixed-function 3D
5) Programmable 3D

Hardware with no acceleration whatsoever or a simple video overlay is the most common we see in embedded devices. This will always be the case until someone figures out how to design and manufacture silicon for free. Blitter and alpha-blending hardware is almost non-existent on desktops these days, but it does seem to still be around in the current generation of embedded hardware. Path based 2D vector graphics is pretty new and looks ready to replace blitter-only style hardware. NOTE: This does not refer to hardware which can draw a 1-pixel wide, non-anti-aliased, non-dashed, solid-colour line without clipping. Fixed-function 3D tends to be the older generation of desktop graphics processors. Generally, fixed function has pretty much been replaced with programmable 3D. This is even the case on mobile hardware.

So, there’s five categories of operations and two types of memory architecture leading to ten different overall types of graphics hardware. I’ve collected an example of each, just so you know we don’t make this stuff up. :-)

Type UMA Non-UMA
None Marvel PXA270 Various*
Blitter NXP PNX8935** Fujitsu Lime MB86276***
2D vector Freescale i.MX35
Fixed-3D Freescale i.MX31 nVidia GeForce 2
Programmable-3D TI OMAP3530 AMD Radeon HD 4600

* Various: Some devices use dedicated framebuffer memory to reduce load on the system memory bus
** NXP PNX8935: http://www.nxp.com/applications/set_top_box/ip_stb/stb225/
*** Fujitsu Lime MB86276: http://www.fujitsu.com/downloads/MICRO/fma/pdf/MB86276.pdf

The next question then becomes: How can Qt off-load graphics operations to these different types of hardware? Well this is done through Qt’s QPaintEngine API. The idea is that Qt applications (& Qt itself of course) always uses QPainter, which in turn uses one of the paint engines. To take advantage of graphics acceleration, we write a new paint engine (like the OpenGL ES 2.0 engine we’ve added in 4.5.0). The advantage is that existing applications can benefit from new rendering back ends and new applications can still work on older or less advanced hardware (albeit with lower performance). There seems to be a misconception in the community that Qt is out-of-date because it has no OpenGL scene graph API. While that statement is technically correct, Qt does have QGraphicsView scene graph API which uses QPainter. Because it uses QPainter, if OpenGL (for example) is available, it can be used as the rendering back end.

So, now that’s cleared up, what QPaintEngines are there and do we have all the hardware acceleration types covered by them?

Well, for devices with no acceleration, Qt will use it’s raster paint engine. The raster engine has seen some very impressive optimizations in Qt 4.5, as Gunnar has previously blogged about. For higher end graphics hardware, there’s usually a nice high-level API which is powerful enough to express all of QPainter. I.e. OpenGL & OpenVG. The trouble we’ve recently hit is the hardware in-between, I.e. those with blitters but not much else. Such hardware is not powerful enough to express the whole of QPainter, so we must fall back to the raster paint engine for unsupported operations. The raster paint engine needs a pointer to the memory it renders to (and reads from). On UMA systems, this is not a problem as the buffer is obviously in system memory (that’s all there is!). It’s on systems with dedicated graphics memory where the fun begins…

First, on many systems, you simply can not map graphics memory into your process’ address space - The architecture simply has no way to do it. On such systems, the buffer must be copied to system memory, rendered to with the raster engine, then copied back. If this happens _every_ time you switch between a fall-back and the hardware, it’s going to be _slow_!!

On some systems (particularly PowerPC for some reason?), the graphics controller sits on the SoC’s external bus and can be addressed directly by an application. All that needs to happen is for the kernel to configure the process’s page table to point to the graphics controller’s memory range. It’s then up to the graphics controller to access data in it’s dedicated memory on behalf of the host CPU. Although this kind of set-up does allow the raster paint engine to get a pointer to graphics memory, all accesses go over this external bus - which is usually slow. On PC/x86 architecture, things get more even more complicated, the kernel has to fiddle with lots more hardware, cache controllers, PCI bus controllers, IOMMUs, etc. However, in all cases, if you’re lucky enough to get a pointer to graphics memory, all access must go over a slower external bus.

So now we know what’s going on, what conclusion can we draw? Well, reading & writing to external graphics memory is slow. If your on non-UMA, don’t have OpenGL or OpenVG available, but do want to use your blitter then you’d better make sure your mostly using QPainter::drawPixmap(). NOTE: Graphics view’s cache modes can help you out a lot there - see Andreas’ previous posts! ;-) Otherwise falling back to the raster engine is going to be slow. Fortunatly, this type of hardware is (finally) on it’s way out.

NOTE: I should also mention that there’s a similar issue with X11. There’s no API to get a pointer to an X pixmap and X11 does not provide enough API to implement the whole of QPainter. While the X11 paint engine does not inherit from the raster paint engine, it does make use of software fall-backs which involve copying the pixmap, executing the fall-back and then uploading the result. It’s for that reason that we’ve added the raster graphics system which uses system memory (via the MITSHM extension) in Qt 4.5. On desktop, this is a fairly temporary measure until our OpenGL 2.0 engine & graphics system is in a fit state to take over all of Qt’s rendering. No promises, but we hope that can happen for Qt 4.6. For X11 on low-end embedded devices (like the n810), MITSHM provides a pretty decent long term solution.

So, when we look to the future of Qt’s graphics architecture and the required paint engines, I think we’re well on the way to having all the bases covered:

Type UMA Non-UMA
None Raster Raster*
Blitter DirectFB DirectFB**
2D vector OpenVG*** OpenVG***
Fixed-3D OpenGL (ES) 1.x OpenGL (ES) 1.x
Programmable-3D OpenGL (ES) 2.x OpenGL (ES) 2.x****

* When using raster on NUMA, rendering is actually done in system memory first, then flushed to VRAM
** This is the one which is going to be slow when doing anything other than QPainter::drawPixmap()
*** It shouldn’t be a big surprise we’re researching an OpenVG paint engine!
**** Qt 4.5 contains a new paint engine for OpenGL ES 2.x which we’re now making work on desktop OpenGL 2.0

I just want to finish by asking you to take another look at the above table. Do you notice anything interesting? All of the graphics systems (apart from DirectFB) are cross platform which means, when we make something faster in one engine, all platforms will benefit.



© 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
All other trademarks are property of their respective owners.