Posted by gunnar
 in Threads, Painting, Graphics Dojo, Performance
 on Thursday, January 21, 2010 @ 08:18

In this series that we’ve been doing, I wanted to cover threading, a topic that has been actively discussed amongst some of the trolls over the last few months. We’ve had support for rendering into QImages from non-GUI threads since the early Qt 4.0 days, but it’s only in recent versions of Qt (4.4, I think) that we got support for rendering text into images. Now that the support is there, it raises the question of how to make proper use of it. Generating the actual content in a thread is one use case, and here is an example of it.

What it means is that instead of rendering all the content of a certain view in QWidget::paintEvent() or QGraphicsItem::paint(), we use a background thread which produces the cache. The benefit is that even though drawing the actual content can be quite costly, drawing a pre-rendered image is fast, making it possible for the UI to stay 100% responsive while the heavy lifting is happening in the background. It does imply that not all content is available at all times, but for many scenarios this is perfectly fine. There is nothing novel about this approach; I just think it’s a nice way to solve a problem that often comes up when dealing with user experience.

This approach is used by Google Maps (actually, I don’t know what the server does, but it sends individual tiles to the browser at least) and by the iPhone and N900 web browsers. I’ve also talked to customers in the past who use this approach for use cases where generating the content is costly, but the user interface needs to stay responsive. In fact, this approach applies to pretty much anything where it is ok that the content is not immediately there, such as data tables like an mp3 index or a contact list, images in a data folder, etc.

The Task

Let’s first look at the task. I’ve done a trivial implementation which looks in a directory and displays all the images in there. Each image is a separate content piece, and I’ve put a background, a small frame around it and a drop shadow under it, just so that there is a bit of active work going on. If you are into it, here is the Source Code.

The content pieces could have been tiles in a map of Norway or tiles composing a webpage, but I chose images because I already had some around and I figured they made for an ok example. The demo is run on an N900 with the compositor disabled, using the following command lines:

  • Non-Threaded: ./threaded_tile_generation -no-thread -graphicssystem opengl MyImageFolder
  • Threaded: ./threaded_tile_generation -graphicssystem opengl MyImageFolder

Here’s how it looks when the content is generated in the GUI thread:


The UI is running super-smooth as long as I show only the content that is already loaded. Once work is needed, the entire UI stops and the user experience is really bad. Here is how it looks if we move the work into a background thread.

The algorithm

Don’t use this particular algorithm. It is very crude and written to show an idea. First of all, because I was lazy, I used queued connections rather than a synchronized queue to schedule the pieces to be rendered. This means that the queue is managed by Qt’s event loop, out of my control. So if I pan far out, I will schedule a lot of images to be rendered, then pan beyond them before they are done. In a decent implementation, I would dequeue these and make sure that only the pieces that are directly visible are being processed.
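As a rough illustration of the "decent implementation" described above, here is a minimal sketch of a synchronized request queue that lets the GUI thread throw away requests that have scrolled out of view. It is not the code from the demo: the class name TileQueue, the integer tile ids and the use of std::mutex (rather than Qt's own primitives) are all assumptions made to keep the sketch self-contained.

```cpp
#include <deque>
#include <mutex>

// Hypothetical request queue shared between the GUI thread and one worker.
// Tiles are identified by a plain index for the sake of the sketch.
class TileQueue {
public:
    // GUI thread: ask for a tile to be rendered (duplicates are ignored).
    void request(int tile) {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (int pending : m_pending)
            if (pending == tile)
                return;
        m_pending.push_back(tile);
    }

    // GUI thread: after panning, drop every pending tile that is no longer
    // inside the visible range [first, last].
    void retainVisible(int first, int last) {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::deque<int> kept;
        for (int tile : m_pending)
            if (tile >= first && tile <= last)
                kept.push_back(tile);
        m_pending.swap(kept);
    }

    // Worker thread: fetch the next tile to render; returns false when idle.
    bool take(int &tile) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending.empty())
            return false;
        tile = m_pending.front();
        m_pending.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<int> m_pending;
};
```

A real worker would probably block on a condition variable instead of polling take(), but the point here is only that the queue is under the application's control, so stale work can be removed before it is started.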

The other thing is that there is no logic to “peek ahead”. I schedule images to be generated only when I need them. If I instead scheduled them based on the current panning direction, in addition to not discarding so aggressively, it would probably result in a situation where most images are rendered ahead of time.
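One way the “peek ahead” idea could look is sketched below: the range of tiles to schedule is the visible range, extended by a few tiles in the direction of the pan. The names TileRange and tilesToSchedule, the one-dimensional tile index and the lookahead parameter are hypothetical and only serve the illustration.

```cpp
#include <algorithm>

// Inclusive range of tile indices.
struct TileRange { int first; int last; };

// direction: -1 when panning towards lower indices, +1 towards higher, 0 idle.
// lookahead: how many extra tiles to schedule in the panning direction.
// tileCount: total number of tiles, used to clamp the result.
TileRange tilesToSchedule(TileRange visible, int direction, int lookahead, int tileCount) {
    TileRange r = visible;
    if (direction < 0)
        r.first -= lookahead;
    else if (direction > 0)
        r.last += lookahead;
    r.first = std::max(r.first, 0);
    r.last = std::min(r.last, tileCount - 1);
    return r;
}
```

With tiles 4 to 6 visible and a lookahead of 3, panning forward would schedule tiles 4 to 9, while standing still schedules only what is visible.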

QGraphicsView

It would be kinda cool if this could be applied directly to QGraphicsView. You would set a flag on the item and instead of generating its cache pixmap in the GUI thread, it would be offloaded to the worker thread. This is not straightforward however, because the GUI thread can, as of today at least, continue to modify the state of the item while it’s being rendered in the worker thread. Syncing these two becomes a bit of a mess, and how to solve it, if at all, is not something we have a plan for. That doesn’t prevent people from doing this kind of work in their own custom paint() functions of course.

Posted by gunnar
 in Painting, Graphics Dojo, OpenGL
 on Monday, January 18, 2010 @ 10:00

In my previous post, The Cost of Convenience, we saw quite clearly that text drawing was a major bottleneck. Text drawing is quite common in GUI applications though, so we need a solution for that. If we break down what happens behind QPainter::drawText(), it is split into two distinct parts. The first part is converting the string into a set of positioned glyphs, often referred to as “text layout” because it positions the glyphs and does text wrapping and adjustments for alignment. The second part is passing the glyphs to the paint engines to be rendered. When the text is the same all the time, the first part could be done once and the glyphs/positions just reused.

We have a class in Qt which allows you to cache the first part and only do the drawing for each frame. The class is QTextLayout. This is a low-level class, throwing asserts at you for the most trivial of mistakes. It also comes with a really inconvenient API, but it does reduce the most costly step of text drawing, which is the layout part. It is also only fair to mention that QTextLayout uses a lot more memory than just the glyph-array and positions array, as one could expect, so in a memory constrained setup, it should be used with caution. In 4.7, we plan to introduce an API for static text, which takes care of all the layout and stores only the required parts, reducing the overall memory footprint, but for now, QTextLayout is how you do it.

Going back to my virtual keyboard, updated Source Code, I’ve changed the “-buttonview” example to make use of QTextLayout. In the constructor, I build the layout:

    ButtonView() {
        // One character per key, each followed by a line separator so that
        // every character ends up in its own QTextLine.
        QString content;
        for (int i = '0'; i <= 'Z'; ++i) {
            content += QLatin1Char(char(i));
            content += QChar(QChar::LineSeparator);
        }
        m_layout = new QTextLayout(content, font());
        QFontMetricsF fm(font());
        m_layout->beginLayout();
        for (int i = 0; i < content.size() / 2; ++i) {
            QTextLine line = m_layout->createLine();
            line.setNumColumns(1);
            // Keys sit in a grid of 32x32 cells, ten per row.
            int x = i % 10;
            int y = i / 10;
            QSizeF s = fm.boundingRect(content.at(i * 2)).size();
            line.setPosition(QPointF(x * 32, y * 32) + QPointF(16 + s.width() / 2, 16 + s.height() / 2));
        }
        m_layout->endLayout();
        // Keep the laid-out glyphs around between draw() calls.
        m_layout->setCacheEnabled(true);
    }

If you look at the source code, there is more stuff going on in the constructor than I show above. This is because I extracted only the parts relevant to the text layout. So what we do is build a string of the characters. Between each character I insert a LineSeparator. Without this, I wouldn’t be able to split the text into multiple QTextLine objects. From the content string, I construct the layout. For each character, I find its position in the grid, construct a QTextLine and move the line to that position. Each line is one column/character wide. Finally I enable caching on the layout. This is the step where we start caching the laid out text.

When it comes to the paint method, the code is rather straightforward. All the text is contained inside a single layout object so I can just call its draw function.

    void paint(QPainter *p, const QStyleOptionGraphicsItem *, QWidget *) {

        // Draw background pixmaps...

        m_layout->draw(p, QPointF(0, 0));
    }

Now, let’s have a look at what this gains us:

Text Layouts

The graph shows the number of milliseconds per frame including the blit. Measured on an N900 with composition disabled. Smaller is better!

If we compare the “-no-indexing -optimize-flags” run to the one with “-no-indexing -optimize-flags -text-layout”, we see that there is a significant reduction per frame. Using a text layout brings raster from 9.3 ms per frame down to 5.5 ms, and OpenGL drops from 16 ms per frame to 9.1 ms. A drop of about 4 ms is also visible in the X11 paint engine.

Needless to say, using the QTextLayout class introduces a huge benefit, but it requires a bit more setup to get there. In this implementation I merged all the text into a single object which also makes it impossible for me to move one item relative to the others, such as adding an offset when a button is pressed. I could have one QTextLayout for each item, which would have been roughly the same performance, but at a higher memory cost.

Until next time, take care!

PS: A small comment on the item cache / X11 numbers. The connection is asynchronous and Qt completes its job at about 2.7 ms per frame. Running with “-sync” on the command line, which makes all X calls synchronous, raises the time to about 10 ms per frame. If I had put a QApplication::syncX() into each frame, synchronizing once per frame which is essentially what GL and VG are doing, I would probably get a number that is in between these two. What this means is that the numbers for X11 in this test are actually quite a bit worse than the graphs show.

Posted by gunnar
 in Graphics View, Painting, Graphics Dojo, OpenGL
 on Monday, January 11, 2010 @ 09:25

So, it’s time for my next post. Today’s topic is how convenience relates to performance, specifically in the context of QGraphicsView. My goal is to illustrate that the way to achieve fast graphics is to pack your QPainter draw calls as tightly together as possible. The more stuff that happens in the middle, the slower it gets.

To illustrate this, I’ve implemented a virtual keyboard. Granted, it’s not a very common layout nor is it usable, but the rendering is the point here, not the functionality. The full source code is here and it looks like this:

Virtual Keyboard Image

I’ve implemented the keyboard using three different approaches: one using proxy widgets, one using graphics items and one where the entire view is one graphics item. In addition to that, I added a number of options to tweak various properties, such as whether or not the text is drawn. I measured this on an N900 rather than a desktop because the difference becomes more pronounced on a small device. On the desktop it is easy to be fooled because most things complete in a matter of microseconds anyway. It is only when the entire application comes together that one notices things are not as smooth as in the prototype, but by then so much work has been invested in the current design that one loses out on the super-slick feel of the application.

QGraphicsProxyWidget

Since we’re implementing a series of clickable buttons, a natural and convenient starting point is to use an existing button class, such as QPushButton. It already implements the logic for mouse/keyboard interaction and has signals for clicking and all sorts of other useful functionality. To get widgets into QGraphicsView, we use a QGraphicsProxyWidget. To make the test “fair”, I actually use a plain QWidget which just paints a pixmap and draws some text. Had I gone through the styling API, these numbers would have been even worse.

ProxyWidget Results
Milliseconds spent per frame including blit to screen when using QGraphicsProxyWidgets. Low is better!

If we look at the plain “-proxywidgets” run, the fastest engine was the raster engine, running at 26ms per frame. If I wanted to slide this keyboard onto the screen, I would have 16ms available per frame at 60 FPS and 33ms at 30 FPS. At 26ms per frame I can just about do 30 FPS, but with so little slack that if another process is soaking up CPU time, even that is difficult to reach. So, not very good. (BTW, the exact numbers in the graphs are listed in a comment at the top of the .cpp file I linked above.)
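The frame budget arithmetic above is simple enough to write down. The helper names below are made up for this sketch; the 26ms figure is the raster measurement quoted in the text.

```cpp
// Time available per frame, in milliseconds, for a given target frame rate.
double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Slack left after rendering one frame; a negative value means the target
// frame rate cannot be sustained.
double slackMs(double frameTimeMs, double fps) {
    return frameBudgetMs(fps) - frameTimeMs;
}
```

For the 26ms proxy widget frame, slackMs(26.0, 60.0) is negative (the 60 FPS budget is roughly 16.7ms), while slackMs(26.0, 30.0) leaves only about 7ms for everything else the application and the rest of the system want to do.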

The first thing I noticed with this approach was that each button now had a gray background. This is of course the widget background. A QWidget embedded in QGraphicsView will be treated as a top-level and will therefore draw its background. I added an option “-no-widget-background” which sets Qt::WA_NoBackground on the widget. This brings the rendering time with raster down to 22ms. 4ms saved per frame, just by setting a flag, is not too bad, but still pretty far from being awesome.

I’ve mentioned before that text drawing is not as fast as we would like it, so just to compare how it looks without text, I added a “-no-text” option to the test. This brings the raster results down to 13ms. That is pretty nice and below the 16ms threshold required to achieve 60 FPS, but only with a small margin. And I’m not drawing any text! Before I give up on this approach, I’ll enable item caching. By setting ItemCoordinateCache on each button, I cache both the background pixmap and the text in one single pixmap. This brings the raster results down to 8.5ms, and it’s starting to look acceptable. But at a very high memory cost… In my original use case I had one shared pixmap for all the button backgrounds, but now I have one per button.

You may notice that there was a vast difference between item caching and the proxy widget drawing the pixmap. One thing that adds to the proxy widget cost is that the QPainter is recreated and initialized for each button in the button’s paint event. Also, as I mentioned in my previous post, An Overview, each widget has a system clip and there is an overhead involved in calling the paintEvent. For items in QGraphicsView, there is already a painter, and I don’t need a clip, nor do I need any of the other stuff that goes on behind the scenes there. When we enable item coordinate caching, we don’t leave the graphics view world and we don’t enter the widget world. This crossing is expensive, so by not going into the widget world, we save a lot.

So, if there is a lesson to be learned it is that QGraphicsProxyWidget should be used with extreme caution. If you really need it, use very few of them.

QGraphicsWidget

If proxy widgets are too slow to be usable in this scenario, then the next best thing is to use a QGraphicsWidget. This is a subclass of both QObject and QGraphicsItem, which gives me signals, slots and properties, but it’s not a QWidget and therefore still fairly lightweight. The numbers are as follows:

GraphicsWidgets Results
Milliseconds spent per frame including blit to screen when using QGraphicsWidgets. Lower is better!

Compared to the proxy widgets approach we’re starting out quite a bit better, with raster at 13 ms per frame, OpenGL at 20ms and X11 at 22ms. Below this line is a new line: “-no-indexing -optimize-flags”. QGraphicsView will by default put all the items in a view into a BSP tree for fast lookup. This is beneficial when the scene contains many items and you often need to find items that intersect with a small portion of the scene. In the test case we’re always doing a full update, so there is no benefit from the index and it can be disabled by calling scene->setItemIndexMethod(QGraphicsScene::NoIndex). Having a BSP is the default behaviour because graphics view was initially intended to be a static scene with many items. The most common use case today is a few items (a few hundred at most) which tend to move a lot. For this reason, it is always a good idea to try to disable the BSP and see if it makes a difference in performance. If it helps, then leave it off.

I also know that the items play nice, meaning that they don’t change the clip, translate the painter, change the composition mode or modify any other state that would propagate to other items. This means I can safely set the DontSavePainterState optimization flag. Actually, based on an old habit, I set all possible optimization flags. I only consider unsetting them if my drawing code starts to look weird, at which point I would rather fix the drawing code and keep the flags set. Disabling indexing and enabling the optimization flags shaves off 2ms per frame for all rendering backends, so that is definitely worth it.

If I don’t do text, the performance is about twice as fast. Again we see that text drawing is a huge cost. We’re working on an API to fix this and we’ll have more information for you when we do. You may notice that enabling item caching drops the performance a bit compared to the “-no-text” case. There isn’t much overhead inside QGraphicsView for this path. A likely reason for the decrease is that reading from multiple memory sources (multiple pixmaps) results in a lot of cache misses, compared to the straight approach which draws the same pixmap over and over.

ButtonView Item

In my previous post I briefly mentioned that there is a slight overhead involved with the use of a QGraphicsItem too. Prior to calling the paint function, the painter is transformed to the coordinate system of the item and the painter state is saved. If the item draws a big polygon, this setup cost can be ignored, but when drawing just a pixmap and a few pixels of text, then it may be worth considering. In the spirit of “The more direct the painting code is, the faster it gets”, I implemented the keyboard as a single item. The numbers are as follows:

ButtonView Results
Milliseconds per frame including blit to screen when using a single item. Lower is better!

Raster is now down to 10ms, which is 1ms better than the QGraphicsWidget approach with all optimizations enabled, so even though graphics items are cheaper than widgets, they still cost a bit. The keyboard is now rendered in a tight loop, and the major difference in performance here is caused by the fact that items in the scene have a transform associated with them. Prior to calling paint(), a transform is set to match the painter to the item’s local coordinate system. This causes a state change in the paint engine. For each button we’re drawing a 32×32 pixmap, which means alpha blending 1024 pixels, followed by doing text layout and drawing a single character. Even then we save about 10% of the time by not having a QPainter::translate() in the middle, so bear that in mind. By enabling the optimization flags and disabling the index, raster drops a bit more, so having those is still a good idea.

You may have noticed that there is one dataset that is named “cheat” for OpenGL. I was reluctant to include this, because it’s using a private API that is not, and I really mean NOT, subject to binary compatibility rules. You cannot call this from your application. We’re going to add a public API for this in the future, hopefully in 4.7, so until it’s there, wait. In the interest of showing what we are thinking internally, I thought I would show it.

OpenGL is really great for accelerating graphics, but its way of working does not map optimally to how Qt works. GL is really good at taking a few large datasets of triangles and rendering them, but it’s not so good at drawing loads of small things. Small things like button backgrounds, icons, single text items, etc. However, all the button backgrounds are the same pixmap, so what if I could tell QPainter to draw the same pixmap in multiple places at once? In GL this would correspond to setting up a texture and one vertex and texture coordinate array and drawing some 40 pixmaps in one go. This fits much better with how GL is made to work. The result is that drawing the buttons drops from 5.2ms to 3.9ms, so another piece of juice squeezed out. Naturally, the more times the pixmap is drawn and the smaller the pixmap gets, the more benefit you get from batching commands like this.
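To make the batching idea a bit more concrete, here is a sketch of how the geometry for many copies of one pixmap could be packed into a single array that a GL backend could then submit with one draw call. This is not the private Qt API mentioned above. The Rect and Vertex structs, the function name, the interleaved layout and the texture coordinate orientation are all assumptions for the sake of illustration.

```cpp
#include <vector>

struct Rect { float x, y, w, h; };

// Interleaved position (x, y) and texture coordinate (u, v).
struct Vertex { float x, y; float u, v; };

// Builds two triangles (six vertices) per target rectangle, each mapping the
// full texture onto the rectangle. All copies share one array, so they can be
// drawn together while a single texture is bound.
std::vector<Vertex> buildPixmapBatch(const std::vector<Rect> &targets) {
    std::vector<Vertex> vertices;
    vertices.reserve(targets.size() * 6);
    for (const Rect &r : targets) {
        const Vertex tl = { r.x,       r.y,       0.0f, 0.0f };
        const Vertex tr = { r.x + r.w, r.y,       1.0f, 0.0f };
        const Vertex bl = { r.x,       r.y + r.h, 0.0f, 1.0f };
        const Vertex br = { r.x + r.w, r.y + r.h, 1.0f, 1.0f };
        // First triangle: top-left, top-right, bottom-left.
        vertices.push_back(tl);
        vertices.push_back(tr);
        vertices.push_back(bl);
        // Second triangle: top-right, bottom-right, bottom-left.
        vertices.push_back(tr);
        vertices.push_back(br);
        vertices.push_back(bl);
    }
    return vertices;
}
```

For a keyboard of 40 buttons this yields 240 vertices, which is a tiny dataset by GL standards but replaces 40 separate draw calls with one.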

There is a second option to OpenGL for the button view case, which is the “-ordered”. This was done after Tom brought to my attention that the testcase would do a shader program update for each painter call. In the default buttonview implementation we do:

    for (int i = 0; i < m_rects.size(); ++i) {
        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
    }

Because pixmaps use one shader pipeline and text drawing uses another, the pipeline needs to be switched and reset all the time, which renders at 16ms per frame. To see if it makes a difference, I added a second alternative rendering, “-ordered”, where I do all the pixmaps first, then all the text:

    for (int i = 0; i < m_rects.size(); ++i)
        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
    for (int i = 0; i < m_rects.size(); ++i)
        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));

This prevents the shader pipeline updates and brings the rendering time per frame down to 13ms, so definitely worth it.
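A toy model shows why the ordering matters. The sketch below only counts how often the active pipeline changes for a given sequence of draw calls. It does not model what the GL engine actually does on a switch, and the enum and function names are invented for the example.

```cpp
#include <cstddef>
#include <vector>

enum Pipeline { PixmapPipeline, TextPipeline };

// Counts how often a pipeline has to be bound for a sequence of draw calls,
// including the initial bind.
int countPipelineBinds(const std::vector<Pipeline> &calls) {
    int binds = 0;
    for (std::size_t i = 0; i < calls.size(); ++i)
        if (i == 0 || calls[i] != calls[i - 1])
            ++binds;
    return binds;
}

// Default buttonview order: pixmap, text, pixmap, text, ...
std::vector<Pipeline> interleavedCalls(int buttons) {
    std::vector<Pipeline> calls;
    for (int i = 0; i < buttons; ++i) {
        calls.push_back(PixmapPipeline);
        calls.push_back(TextPipeline);
    }
    return calls;
}

// "-ordered": all pixmaps first, then all text.
std::vector<Pipeline> orderedCalls(int buttons) {
    std::vector<Pipeline> calls(buttons, PixmapPipeline);
    calls.insert(calls.end(), buttons, TextPipeline);
    return calls;
}
```

With 40 buttons, the interleaved order needs 80 binds per frame in this model, while the ordered variant needs 2.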

Summing Up

Virtual Keyboard Combined Results
Milliseconds per frame including blit to screen for proxy widgets, graphics widgets and a single item. Lower is better!

OpenGL comes out rather badly in this test case, which I was a bit disappointed to see, but it did send Tom into an optimization frenzy, so we’re hoping to remove some of the constant overhead. It should also be said that when using the OpenGL graphics system, we enable multisampling by default, which increases rendering time on the N900 by around 30%. A plain QGLWidget would thus perform slightly better. Another aspect of OpenGL is that it uses a dedicated low-power chip, so even though it runs at half the speed for this particular use case, it also uses a lot less battery, so it may still be the right choice. OpenGL will also scale significantly better than raster and X11 as the pixmaps get bigger or if the content of the button is slightly more advanced, say like a horizontal gradient.

The best numbers are definitely in the button view case, where all the content is rendered as one item, which is what I wanted to highlight with this blog. The button view item also opens up for other optimizations such as batching. We don’t have that many batching functions in QPainter today, it’s only drawRects(), drawLines() and drawPoints(), but we’re considering adding more; we are just not sure how the APIs would look yet.

The bottom line is still that how Qt is used defines how well it performs. On one hand there may be an easy and convenient way to get the job done which performs quite sub-optimally. On the other hand there may be a more involved implementation which performs very well. I’m not trying to suggest that you do one or the other; there are a lot of good reasons for picking either one. But I hope that I’ve illustrated that some features come at a cost and that this is kept in mind, along with what the target is, when designs are evaluated and chosen.

I’ll round off with a question. If you were to implement a particle effect when you press a button, which approach would you choose, having seen the numbers above?

Posted by TomCooksey
 in Painting, Graphics Dojo, OpenGL, Performance
 on Wednesday, January 06, 2010 @ 12:01

Introduction

Here’s the next instalment of the graphics performance blog series. We’ll begin by looking at some background about how OpenGL and QPainter work. We’ll then dive into how the two are married together in the OpenGL 2 paint engine and finish off with some advice about how to get the best out of the engine. Enjoy!

Why OpenGL?

Before I dive into the OpenGL paint engine, I want to make sure we all understand the motivation for the OpenGL 2.0 paint engine. I’ve talked about this before in my article about hardware acceleration, but we still frequently get questions like “Why not implement a Direct2D paint engine?”.

Everyone knows OpenGL means fast graphics, right? Well, this is actually a bit of a misconception. What makes graphics fast is a bit of hardware dedicated to computer graphics called a GPU (Graphics Processing Unit). OpenGL 2.x is a software library which often (but not always) uses a particular class of GPU to help satisfy drawing operations (Note: OpenGL 1.x used a different class of GPU). A modern programmable GPU (e.g. nVidia GTX 295) can usually be programmed via OpenGL, Direct3D and OpenCL. The only difference then is that Direct3D is only available on the Windows platform and OpenCL is not universally supported.

So the reason we are investing our time and effort into OpenGL, rather than Direct3D or OpenCL, is that OpenGL 2.0 is sufficient to give us access to all the GPU features we currently want to use. It is also available on more platforms, especially if you limit yourself to the ES sub-set. We are also looking into restricting ourselves further to only use APIs in OpenGL 3.2 Core Profile.

This might change in the future if we see a new class of GPU, like ones designed for 2D vector graphics which can’t be abstracted by OpenGL 2.0 very well (enter OpenVG), or, if we want to start using GPU features which OpenGL (ES) 2.0 doesn’t give us access to. Having said that, OpenGL is very good at exposing new GPU features through extensions.

History

Qt has had an OpenGL paint engine since early Qt 4.0 days. This engine was designed for the fixed-function hardware available at the time. As time went on and manufacturers added newer bits of hardware to their GPUs, the OpenGL paint engine was adapted to use those features through OpenGL extensions. Over the last 4 years, lots of people have hacked on the engine and added support for things like ARB fragment programs and even adapted the engine to work on OpenGL ES 1.1. The engine is pretty stable and has lots of fall-backs (or original code-paths, depending on how you look at them) for old hardware missing GL extensions the engine can utilise. But, fundamentally, it is an OpenGL 1.x engine.

In early 2008, around the time of the Falcon project (the Falcon Project was an internal project started for Qt 4.5 which focused on painting performance and architecture), it became increasingly clear that Qt needed to support hardware acceleration using the OpenGL ES 2.0 API which was starting to appear on embedded System-On-Chips like the OMAP3. There were two options available: Extend the existing OpenGL paint engine further still, or develop a new paint engine from scratch. When looking at the existing engine, there was a major problem – although it supported fragment programs, it was heavily reliant on fixed-function vertex processing. A further consideration was that the Falcon project had just kicked off and the future of the QPaintEngine API was uncertain. Both of these factors resulted in a new paint engine being written from scratch for OpenGL ES 2.0. This new engine had a distinct advantage over the existing engine: everything I wanted to use from OpenGL was in the core OpenGL ES 2.0 API. This meant I didn’t need to add fallbacks in case of missing functionality, leading to much cleaner and leaner code.

Another point about OpenGL ES 2.0 is that it doesn’t have much in the way of fixed function features – forcing you to write shader programs. While annoying at the time, this is apparently the best way to do things even on desktop GPUs. This point is important because it quickly became apparent that although the engine was designed for GLES2, not only would it also work on desktop OpenGL 2.0, but it would use that API in a way better suited for modern programmable GPUs. So, in Qt 4.6, the new engine is used by default on both GLES2 and on desktop systems which support OpenGL 2.0.

What does OpenGL (ES) 2 provide?

As I’ve already mentioned, OpenGL ES 2.0 is a pretty lean and mean API which models programmable GPUs. The “programmable” bit is fundamental to the API. It means that you write small programs known as shaders, ask OpenGL to compile and then run them on the GPU to process the data you give it. There are two types of shaders: one type processes positions (vertices) and another type processes pixels (fragments), called the vertex shader and fragment shader, respectively. The idea is that you tell OpenGL you want to draw some triangles and the vertex shader is run to determine the position of each of those triangles. Then, the GPU turns each triangle into a bunch of pixels and the fragment shader is run to determine the colour of each of those pixels. The API provides various ways of passing data from the CPU to the GPU (from textures and lists of triangle positions to individual floats) and ways of passing data from the vertex shader to the fragment shader. That’s basically it. All the complexity lives in the shaders you give to the GPU to run.

What does QPainter require?

The rest of this blog assumes you are familiar with the QPainter API (if not, go check the QPainter docs). It might also be a good idea to read through Gunnar’s post about how the Raster engine works.

So, the QPainter API provides more than just triangles. It is therefore the GL paint engine’s job to turn the whole of the QPainter API into “just a bunch of triangles”. To understand its task a little better, you have to split QPainter up into chunks which map better to OpenGL. A great example of this is drawRect(). In QPainter terms, this is a single primitive, but in GL engine terms, it is actually two: A rectangle (the fill) and a (possibly quite complex) line round the outside (the stroke). The OpenGL paint engine tries to keep a fairly clean separation between the shape of something which is drawn and its fill. So, here’s the list of primitives (shapes) QPainter requires the engine to draw:

  • Simple primitives (Rectangles, convex polygons, ellipses, etc.)
  • Text
  • Pixmaps
  • Strokes
  • Complex vector paths (QPainterPath)

In addition to this, we have various fills which we can use on our primitives provided by QBrush:

  • Solid colour
  • Linear gradients
  • Radial gradients
  • Conical gradients
  • Bitmap patterns
  • Textures

Not only do we have different types of fill, but we also support a full 3×3 transformation matrix on the brushes. This allows you to draw a rectangle but use it as a kind-of stencil over (for example) a perspective transformed texture.

Finally, QPainter also requires the engine to implement clipping, different composition modes and support its state stack (QPainter::save() & QPainter::restore()).

Engine Operation

Primitive Rendering

  • Simple Primitives: To render convex primitives such as rounded rectangles, we just generate a GL triangle fan and render it using glDrawArrays
  • Text: For large text, we convert it to a complex path and render is as such. However, for smaller font sizes, we rasterize the individual font glyphs and upload them as a texture (8-bit texture for bitmap & anti-aliased glyphs and 24-bit RGB for sub-pixel anti-aliased glyphs). This glyph texture is used as a mask in the engine’s pixel pipeline (see below). So, in terms of primitives, text is actually rendered as a set of rectangles - one rectangle for each glyph. When rendering with sub-pixel anti-aliased glyphs, it is possible that the engine will need to do two passes (if the brush is not a solid colour). This is because the engine uses a clever trick and sets the brush’s colour as the glBlendColor and outputs the RGB mask in the fragment shader. It is then able to set a glBlendFunc which combines the two and gives per-sub-pixel blending. If you set a more complex brush, the engine has to do two passes - first apply the mask to the destination, then a second pass to apply the brush, with glBlendFunc set to give the correct result.
  • Pixmaps: A pixmap is actually just a rectangle.
  • Strokes: Strokes can be very complex - just take a look at the pathstroke demo! However, even the most complex dashed pattern with rounded joins and end caps can be turned into a GL triangle strip relatively easily. This is done by the QTriangulatingStroker.
  • Complex vector paths: This is where things get tricky. QPainterPaths can have lots of things which break the “turn lineTo, moveTo and curveTo into vertices and render as triangle fan” algorithm…

Rendering Using Stencil Technique

Take the following path as an example:

Convex Path (1)

Here we have a seemingly trivial path with only 4 points. To draw this with GL, you could just convert the path’s points to vertices and draw it as a triangle fan, which results in two triangles: Triangle 1: ABC and Triangle 2: ACD. The problem is that this just looks like a solid triangle, not the path we wanted:

Convex Path (2)

So, to overcome this difficulty, we drop to a 2-pass rendering method which uses the stencil buffer as a temporary scratchpad. So first off, we clear the stencil buffer to all zeros (represented as white):

Stencil Buffer (Clear)

Next, we set the stencil operation to invert, which means instead of setting the stencil value to ‘1′ when a triangle touches a pixel, invert the existing value instead. So 0->1 & 1->0. First we render the first triangle (ABC). As all the pixels are currently 0, every pixel touched by the triangle turns to 1 (represented as black):

Stencil Buffer (Triangle 1)

Next, we draw the second triangle (ACD). Note: We are inverting the stencil’s value, so black pixels touched by the second triangle turn to white and white pixels turn to black:

Stencil Buffer (Triangle 2)

So now the stencil buffer contains the silhouette of our path. All we do now is draw a rectangle into the destination window, but with the stencil test enabled.
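To make the two passes concrete, here is a minimal CPU-side sketch of the same idea. It is not the engine’s code: the stencil buffer is simulated with a plain array, the triangle test is a simple edge-function check at pixel centres, and the names stencilFan and coverWithStencil are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Signed area test: tells which side of the edge a -> b the point p lies on.
static double edge(const Point &a, const Point &b, const Point &p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if p lies inside triangle abc, whatever its winding order.
static bool insideTriangle(const Point &a, const Point &b, const Point &c, const Point &p)
{
    const double e0 = edge(a, b, p), e1 = edge(b, c, p), e2 = edge(c, a, p);
    const bool hasNegative = e0 < 0 || e1 < 0 || e2 < 0;
    const bool hasPositive = e0 > 0 || e1 > 0 || e2 > 0;
    return !(hasNegative && hasPositive);
}

// Pass 1: draw the path as a triangle fan around path[0], inverting the
// stencil value of every pixel each triangle touches (0 -> 1, 1 -> 0).
std::vector<unsigned char> stencilFan(const std::vector<Point> &path, int width, int height)
{
    assert(width > 0 && height > 0);
    std::vector<unsigned char> stencil(static_cast<std::size_t>(width) * height, 0);
    for (std::size_t i = 1; i + 1 < path.size(); ++i) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                const Point centre{x + 0.5, y + 0.5};
                if (insideTriangle(path[0], path[i], path[i + 1], centre))
                    stencil[static_cast<std::size_t>(y) * width + x] ^= 1;
            }
        }
    }
    return stencil;
}

// Pass 2: cover the whole area, but only write colour where the stencil is set.
void coverWithStencil(const std::vector<unsigned char> &stencil,
                      std::vector<unsigned> &dest, unsigned colour)
{
    assert(stencil.size() == dest.size());
    for (std::size_t i = 0; i < stencil.size(); ++i) {
        if (stencil[i])
            dest[i] = colour;
    }
}
```

With a four-point path such as (0,0), (4,8), (8,0), (4,3), the second fan triangle lies inside the first, so the pixels it covers are inverted twice and end up unfilled, which leaves exactly the silhouette of the path.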

In addition to the stencil technique, we are also adding experimental support for triangulating QPainterPaths and caching the triangulation. While this is slower for paths which change often or are zoomed in & out, paths which are relatively static can be triangulated once and rendered multiple times without having to re-triangulate.

Filling Primitives

Now we know how all the different QPainter operations get turned into GL primitives, but we’re still missing how they get filled. As already mentioned, the colour of a pixel is determined by the fragment shader. We therefore have lots of different fragment shaders for different types of fill. However, we also need to support text rendering with arbitrary fills (QPainter lets you fill text with a perspective transformed radial gradient). In the future, we also want to support composition modes which OpenGL doesn’t provide. We’ve also found there are ways we can simplify the shaders for certain situations (and thus improve performance). The result is that Qt needs lots of different shaders. At last count, we’d need over 1000 different shaders to cover all situations. That’s a lot of GLSL to maintain and test, far more than the resources we have available. So instead we split the shaders into different interchangeable “stages”. This is achieved by having each stage in its own GLSL function. As an example, let’s take regular, non sub-pixel anti-aliased text rendering with a transformed radial gradient. Note, this is just an example to demonstrate how the engine operates and you probably shouldn’t do it in performance critical situations.

We render gradients by pre-calculating a 1px high texture (like a 1D texture) on the CPU which we sample from in the fragment shader. However, we calculate the texture coordinates in the vertex shader and pass it to the fragment shader as a varying. This is because it’s a good idea to do as much work as possible in the vertex shader rather than the fragment shader as it is called so much less frequently.
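As a rough CPU-side illustration of the look-up-table idea, the sketch below builds a one-row table for a two-stop gradient and then samples it with a single clamped index, which is the kind of work left to the fragment shader. The table size, the two-stop restriction, the nearest-texel sampling and the function names are all assumptions made for this example, not what the engine actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgba { std::uint8_t r, g, b, a; };

// Linear interpolation between two 8-bit channel values, rounded to nearest.
static std::uint8_t lerp8(std::uint8_t from, std::uint8_t to, double t)
{
    return static_cast<std::uint8_t>(from + (to - from) * t + 0.5);
}

// Build the "1px high texture": 'size' texels fading from 'start' to 'end'.
std::vector<Rgba> buildGradientTable(Rgba start, Rgba end, int size)
{
    assert(size >= 2);
    std::vector<Rgba> table(static_cast<std::size_t>(size));
    for (int i = 0; i < size; ++i) {
        const double t = static_cast<double>(i) / (size - 1);
        table[static_cast<std::size_t>(i)] = { lerp8(start.r, end.r, t), lerp8(start.g, end.g, t),
                                               lerp8(start.b, end.b, t), lerp8(start.a, end.a, t) };
    }
    return table;
}

// Per-pixel work: clamp the interpolated coordinate and do one table look-up.
Rgba sampleGradient(const std::vector<Rgba> &table, double coord)
{
    assert(!table.empty());
    const double clamped = std::min(1.0, std::max(0.0, coord));
    const std::size_t index = static_cast<std::size_t>(clamped * (table.size() - 1) + 0.5);
    return table[index];
}
```

The expensive part (interpolating between stops) happens once when the table is built; each pixel then costs only a clamp and an array read.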

As already mentioned, we render (non sub-pixel) anti-aliased text by using an 8-bit mask texture. We then multiply the fragment colour by a sample taken from this mask. So, if we’re on the edge of a glyph where the alpha value is <1, we adjust the alpha of the srcPixel by that amount (actually, we also adjust the RGB values too as we use pre-multiplied alpha pixel format internally).
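The mask multiplication is easy to show in plain C++. This is only a sketch of the arithmetic, assuming 8-bit channels and a pre-multiplied source pixel; applyMask is the stage name used in this post, while the PremulPixel type and the mul8 helper are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// A pixel whose colour channels have already been multiplied by its alpha.
struct PremulPixel { std::uint8_t r, g, b, a; };

// Multiply two 8-bit values as if they were fractions in [0, 1], rounding to nearest.
static std::uint8_t mul8(std::uint8_t value, std::uint8_t factor)
{
    return static_cast<std::uint8_t>((value * factor + 127) / 255);
}

// Scale the whole pixel by the glyph coverage read from the 8-bit mask.
// Because the pixel is pre-multiplied, RGB must be scaled together with alpha.
PremulPixel applyMask(PremulPixel src, std::uint8_t coverage)
{
    assert(src.r <= src.a && src.g <= src.a && src.b <= src.a);
    return { mul8(src.r, coverage), mul8(src.g, coverage),
             mul8(src.b, coverage), mul8(src.a, coverage) };
}
```

Full coverage (255) leaves the pixel untouched, zero coverage makes it fully transparent, and a glyph edge with coverage 128 roughly halves every channel.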

If there was a non-standard composition mode, we’d then pass the masked pixel to another stage which would blend it with the background (although this isn’t implemented yet!).

So in the fragment shader, there are 3 different stages. The first stage (srcPixel) determines the brush colour of the fragment. The next stage (applyMask) modulates the pixel by a mask to achieve anti-aliased text rendering. The final stage (compose) then blends the pixel with the background. We also have a similar staging technique for the vertex shader. All this complexity is nicely abstracted by the QGLEngineShaderManager. The paint engine tells the shader manager what it wants to draw and the shader manager selects an appropriate set of shaders. One final note on this: While desktop OpenGL 2 supports linking multiple fragment shaders in a single program, OpenGL ES 2.0 does not. This means that we actually use the different stages by appending them to a single string of GLSL we pass to GL. This also gives the GL implementation the best chance to inline the different stages (without which, performance would suck).
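The “append the stages to a single string” approach can be sketched as follows. The stage names srcPixel, applyMask and compose come from the description above; the GLSL bodies, the uniform and varying names, and the buildFragmentShader helper are hypothetical and only show the shape of the technique, not the shaders Qt ships.

```cpp
#include <cassert>
#include <string>

// Hypothetical stage sources. A real engine would have many variants of each.
const std::string srcPixelGradient =
    "uniform sampler2D gradientTexture;\n"
    "varying vec2 gradientCoord;\n"
    "vec4 srcPixel() { return texture2D(gradientTexture, gradientCoord); }\n";

const std::string applyMaskGreyscale =
    "uniform sampler2D maskTexture;\n"
    "varying vec2 maskCoord;\n"
    "vec4 applyMask(vec4 src) { return src * texture2D(maskTexture, maskCoord).a; }\n";

const std::string composePassThrough =
    "vec4 compose(vec4 src) { return src; }\n";

// The main function is the same whichever stage variants were picked.
const std::string fragmentMain =
    "void main() { gl_FragColor = compose(applyMask(srcPixel())); }\n";

// Concatenate one variant of each stage into a single GLSL source string.
std::string buildFragmentShader(const std::string &srcPixelStage,
                                const std::string &maskStage,
                                const std::string &composeStage)
{
    assert(!srcPixelStage.empty() && !maskStage.empty() && !composeStage.empty());
    return srcPixelStage + maskStage + composeStage + fragmentMain;
}
```

Swapping the gradient stage for, say, a solid-colour srcPixel variant changes one argument and nothing else, which is how a small number of stage variants can stand in for a very large number of hand-written shaders.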

Texture Management

The OpenGL paint engine makes heavy use of textures. For example, even though it’s perfectly possible to calculate colours for gradients in the fragment shader, we still use a texture as a look-up-table as it is so much faster. Repeatedly uploading textures every time we need them would ruin performance. So instead, we keep a per-context cache of what QPixmap/QImage is already present in texture memory. If two contexts are sharing then we also detect this and don’t duplicate the textures. This functionality is available publicly in QGLContext::bindTexture() too.

On Linux/X11 platforms which support it, Qt will use the glX/EGL texture-from-pixmap extension. This means that if your QPixmap has a real X11 pixmap backend, we simply bind that X11 pixmap as a texture and avoid copying it. You will be using the X11 pixmap backend if the pixmap was created with QPixmap::fromX11Pixmap() or you’re using the “native” graphics system. Not only does this avoid overhead but it also allows you to write a composition manager or even a widget which shows previews of all your windows.

Antialiasing

The OpenGL paint engine uses OpenGL multisampling to provide anti-aliasing. Typically, this will be 4x/8x FSAA, meaning 4/8 levels of coverage, which is worse quality than the raster engine, which always uses 256 levels of coverage. However, as the DPI of modern displays increases, you can get away with lower-quality anti-aliasing.

Using multisampling also doesn’t affect text rendering as text is anti-aliased using masks rather than multisampling (for smaller font sizes). So text rendered with the OpenGL engine should look almost as good as text rendered with the raster engine (which also does gamma-correction). The only drawback of using multisampling is that some OpenGL implementations don’t support switching multisampling off. Indeed, the OpenGL ES 2.0 specification doesn’t even provide the API to switch it off. The consequence is that non-anti-aliased (a.k.a. aliased) rendering can be broken (Everything gets anti-aliased even when the QPainter::Antialiasing hint isn’t set). There’s little we can do about this. :-(

Clipping

QPainter supports setting an arbitrary clip, including complex QPainterPaths. Qt uses the GL stencil buffer (or more specifically the lower 7 bits of the stencil buffer) to store the clip. The clip is written to in the same way as we render any other primitive, even using the stencil technique for complex paths. However, instead of filling pixel colours into a colour buffer, we fill stencil values into the stencil buffer. The actual value we use depends on the current QPainter stack depth (how many times save() was called minus the number of times restore() was called). This means that if you restrict yourself to intersect clips (Qt::ClipOperation == Qt::IntersectClip), the engine only needs to write to the part of the stencil buffer which is being clipped to. What’s more, the engine doesn’t need to write to the stencil buffer at all when you call restore() - it just changes the value at which the stencil test passes.

In addition to using the stencil buffer for clipping, the OpenGL paint engine can also just use glScissor. This only allows a single, untransformed rectangle to be used as the clip, which can be quite restrictive. However, it is by far the fastest way to do clipping. So if performance is more important to you than utility, only ever use untransformed rectangular clips.

Recommendations

Interleaved Rendering

Unlike OpenGL, QPainter allows an arbitrary number of rendering contexts (QPainters) to be active in the same thread at the same time. For example, in your widget’s paint event, you can begin a painter on your widget and begin another painter on a QPixmap and interleave rendering to them:

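void Widget::paintEvent(QPaintEvent *)
{
    // First painter: targets the widget itself.
    QPainter widgetPainter(this);
    widgetPainter.fillRect(rect(), Qt::blue);

    // Second painter: targets an off-screen pixmap while the first is still active.
    QPixmap pixmap(256, 256);
    QPainter pixmapPainter(&pixmap);
    pixmapPainter.drawPath(myPath);

    // Back to the first painter; drawPixmap() takes the pixmap by reference.
    widgetPainter.drawPixmap(0, 0, pixmap);
}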

While this works ok with the OpenGL graphics system, having to switch from doing something with one painter to doing something with a different painter can be very costly and should be avoided whenever possible.

Mixing QPainter and Native OpenGL

As shown in several examples, it is possible to mix your own OpenGL rendering code with QPainter rendering code. However, as OpenGL is a giant state machine, it is very easy for you to accidentally clobber Qt’s GL state and vice-versa. To overcome this, we’ve added some new API to QPainter in Qt 4.6 - QPainter::beginNativePainting() and QPainter::endNativePainting(). To prevent artifacts, you must enclose your custom painting in beginNativePainting() and endNativePainting(). This is very important - even if you’re not seeing any problems now, you might find your code starts failing in a future Qt release in which the GL paint engine works slightly differently. Also, as beginNativePainting and endNativePainting set lots of OpenGL state, they can be quite expensive and thus you should try to use them sparingly. Try to batch up all your custom OpenGL code in a single block.

QGLWidget vs OpenGL Graphics-System

Unlike the raster & OpenVG paint engines, you don’t have to use a specific graphics system to render widgets using the OpenGL paint engine. The QtOpenGL module provides several classes, including QGLWidget, which all use the OpenGL paint engine regardless of what graphics system is being used. QGLWidget is basically a regular widget which always has a native window ID and is always rendered to using OpenGL. You are free to choose whichever method you want to get OpenGL rendering (graphics system or QGLWidget). However, using the opengl graphics system can often be slower than using a QGLWidget, as Qt needs the contents of the “back buffer” (or QWindowSurface) to be preserved when flushing the render to the window system. OpenGL does not guarantee this and it is often not the case so Qt has to use either an FBO or a PBuffer as the back buffer. When the render needs to be flushed, the FBO or PBuffer is bound to a texture, rendered into the window and then the GL buffers are swapped. This extra overhead is avoided by using a QGLWidget, however as a consequence, it is not possible to redraw a sub-region of a QGLWidget: Whenever a QGLWidget is updated, the entire widget must be re-drawn.

It should also be noted that using the OpenGL paint engine isn’t a silver bullet which makes everything faster. For example, the GL engine really sucks at drawing lots of small geometry with state changes between each drawing operation. While we’re working on improving that use case at the moment, the raster paint engine will probably always be faster just because it has so much less overhead. So QGLWidget might be a great way to get the best of both worlds when combined with the raster graphics system - Use QGLWidget for operations which GL excels at and the raster engine for everything else.

Tips for Performance (fps)

As a general rule of thumb, OpenGL state changes are expensive. So, use the knowledge you now have of what’s going on under QPainter and try to minimise the number of OpenGL state changes the paint engine needs to do. For example, if you implement a virtual keyboard, you now know that the engine uses a shader for text rendering and a different shader for pixmaps, so draw all the key pixmaps first, then draw all the text on top. That way, the engine only needs to change shaders twice per frame.
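To put a number on the virtual keyboard example, the sketch below counts how often the shader would have to change for two draw orders. It is a toy model, assuming one shader per primitive type and nothing else affecting state; the types and function names are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Shader { Pixmap, Text };

struct DrawCall { Shader shader; int keyIndex; };

// Count how many times a different shader has to be bound for this draw order.
int countShaderSwitches(const std::vector<DrawCall> &calls)
{
    int switches = 0;
    for (std::size_t i = 0; i < calls.size(); ++i) {
        if (i == 0 || calls[i].shader != calls[i - 1].shader)
            ++switches;
    }
    return switches;
}

// Naive order: for every key, draw its pixmap and then its label.
std::vector<DrawCall> perKeyOrder(int keyCount)
{
    assert(keyCount >= 0);
    std::vector<DrawCall> calls;
    for (int key = 0; key < keyCount; ++key) {
        calls.push_back({Shader::Pixmap, key});
        calls.push_back({Shader::Text, key});
    }
    return calls;
}

// Batched order: all key pixmaps first, then all labels on top.
std::vector<DrawCall> batchedOrder(int keyCount)
{
    assert(keyCount >= 0);
    std::vector<DrawCall> calls;
    for (int key = 0; key < keyCount; ++key)
        calls.push_back({Shader::Pixmap, key});
    for (int key = 0; key < keyCount; ++key)
        calls.push_back({Shader::Text, key});
    return calls;
}
```

For a 30-key keyboard the per-key order needs 60 shader binds per frame in this model, while the batched order needs 2, and the picture on screen is the same as long as labels never overlap a neighbouring key.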

  • Never, ever use anything other than intersecting clips
  • Don’t switch render target in the middle of a render
  • Try to use untransformed rectangular clips whenever possible
  • Minimise changing the brush wherever possible
  • Render batches of primitives of the same types together.
  • Avoid drawing translucent pixels & blending (particularly important on mobile GPUs)
  • Try to cache QPainterPaths and re-use them rather than creating & discarding them in your paintEvent
  • Avoid QPainterPaths when there’s a QPainter convenience function, e.g. rounded rects and ellipses, since those can be drawn as simple convex primitives.
  • If you’re drawing lots of small pixmaps, try bunching them up into a single, larger pixmap
  • Prefer to use power-of-two (2^n) widths & heights for QImages and QPixmaps (128×256, 256×256, 512×512, etc)
  • If using QGLWidget and don’t need anti-aliasing, don’t enable sample buffers in the QGLFormat
  • If rendering complex QPainterPaths, try to only use odd-even fill rule
Posted by gunnar
 in Painting, Graphics Dojo, Performance
 on Friday, December 18, 2009 @ 09:21

Today’s topic is the raster engine, Qt’s software rasterizer. It’s the reference implementation and the only paint engine that implements all possible feature combinations that QPainter offers.

History

The story of Qt’s software engine started around December 2004, if my memory serves me. My colleague Trond and I had been working for a while on the new painting architecture for Qt 4, codenamed “Arthur”. Trond had been working on the X11 and OpenGL 1.x engines and I was focusing on the combined Win32 GDI/GDI+ engine along with QPainter and surrounding APIs. We had introduced a few new features, such as antialiasing, alpha transparency for QColor, full world transformation support and linear gradients. As few of these new features were supported by GDI, it meant that using any of these features implied switching to GDI+, which at the time was insanely slow, at least on all the machines we had in the Oslo office back then. Actually, enabling the GDI advanced graphics mode to do transformations was also not very fast.

Then we came upon this toolkit called Anti-Grain Geometry (AGG) which did everything in software, in plain C++, and we were just amazed at what it could do. Our immediate reaction was to curl up on the floor in agony, thinking that we were going about this all wrong. Using these native APIs was not helping us at all. In fact it was preventing us from getting the feature set we wanted with a performance that was acceptable. Once we settled down again, our first idea was to try to implement a custom AGG paint engine which would just delegate all drawing into the AGG pipeline. But alas, the template nature of the AGG API combined with the extremely generic QPainter API bloated up into a pipeline that didn’t perform nearly as well as the demos we had seen.

So we took our Christmas vacation and started over in January of 2005. Still quite depressed over the new feature set that didn’t perform combined with being limited by a minimal subset of native API’s, I went to Matthias and Lars and asked if I could get three weeks of time to hack together a software only paint engine as a proof of concept. I got an “OK” and spent the following weeks implementing software pixmap transformation, bi-linear filtering, clipping support in the crudest possible way and three weeks later I had a running software paint engine and quite proudly announced that I was “just about done”. I’ve reconstructed an image of how I remember it:

groupboxes

The system clipping was all over the place, bitmap patterns were broken, but perhaps worst of all, all text was rendered using QPainterPaths, and all drawing was antialiased. Despite it not looking 100% good, the performance of the various features was pretty ok. It was agreed that this was a good start, but that we needed a bit more work. And so started the sprint for the Qt 4.0 beta a few months later.

The initial version that was released with Qt 4.0 worked quite well in terms of features, but in hindsight the performance was far from what our users demanded from Qt. As a result, we harvested a lot of criticism over the first year of Qt 4.0. Since then, we’ve done a lot, and I mean a LOT, and my gut feeling is that it is the engine that performs the best for average Qt usage, so I think we made a good choice back then in dropping GDI and GDI+. And, as I outlined in my previous post, we are toying with making raster the default across all desktop systems for the sake of speed and consistency.

Overall structure

The overall structure of the engine is that all drawing is decomposed into horizontal bands with a coverage value, called spans. Many spans will together form the “mask” for a shape and each pixel that is inside the mask is filled using a span function.

antialiasing

The image highlights one scanline of a polygon which is filled with a linear gradient. There are 4 spans, two which fade in the opacity of the polygon and two which fade out the opacity of the gradient. For each pixel in the polygon, the gradient function is called and we write the pixel to the destination, possibly alpha blending it, if the coverage value is other than full opacity or if the pixel we got from the gradient function contains alpha.
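A stripped-down version of this loop might look like the sketch below. It is an illustration only: it works on a single grey channel instead of full ARGB pixels, the Span layout and the function-pointer type are assumptions, and the real span functions are considerably more specialised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One horizontal run of pixels on a scanline, with a coverage of 0..255.
struct Span { int x; int length; std::uint8_t coverage; };

// Stand-in for the brush: returns the source value for a given x position.
using SourceFunction = std::uint8_t (*)(int x);

// Mix source and destination according to coverage, rounding to nearest.
static std::uint8_t blend(std::uint8_t src, std::uint8_t dst, std::uint8_t coverage)
{
    return static_cast<std::uint8_t>((src * coverage + dst * (255 - coverage) + 127) / 255);
}

// Fill every span: fully covered pixels are plain writes, partially covered
// pixels (the fade-in and fade-out at the edges) are blended with the destination.
void fillSpans(std::vector<std::uint8_t> &scanline, const std::vector<Span> &spans,
               SourceFunction source)
{
    for (const Span &span : spans) {
        assert(span.x >= 0 && span.length >= 0);
        assert(static_cast<std::size_t>(span.x + span.length) <= scanline.size());
        for (int x = span.x; x < span.x + span.length; ++x) {
            const std::uint8_t src = source(x);
            const std::size_t index = static_cast<std::size_t>(x);
            scanline[index] = span.coverage == 255 ? src : blend(src, scanline[index], span.coverage);
        }
    }
}

// Example brush: solid white.
std::uint8_t solidWhite(int)
{
    return 255;
}
```

Calling fillSpans with a short low-coverage span, a long full-coverage span and another low-coverage span reproduces the fade-in, solid and fade-out parts of an antialiased scanline.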

Clipping also uses the same mechanism. The span function for clipping takes the incoming spans, intersects them with the set of spans that defines the clip and calls the actual filling span function.
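The intersection step can be sketched like this for a single scanline. The Span layout, the idea of multiplying the two coverage values, and the simple nested loop are all assumptions made to keep the example short; a production version would walk both sorted lists in one pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One horizontal run on a scanline; coverage is 0..255.
struct Span { int x; int length; int coverage; };

// Intersect the incoming spans with the clip spans of the same scanline.
// The result is what gets handed on to the actual filling span function.
std::vector<Span> clipSpans(const std::vector<Span> &spans, const std::vector<Span> &clip)
{
    std::vector<Span> result;
    for (const Span &s : spans) {
        assert(s.coverage >= 0 && s.coverage <= 255);
        for (const Span &c : clip) {
            assert(c.coverage >= 0 && c.coverage <= 255);
            const int left = std::max(s.x, c.x);
            const int right = std::min(s.x + s.length, c.x + c.length);
            if (left < right)
                result.push_back({left, right - left, s.coverage * c.coverage / 255});
        }
    }
    return result;
}
```

A single span covering pixels 0 to 9, clipped against two clip spans at 2..4 and 8..12, comes out as two shorter spans at 2..4 and 8..9.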

clipspans

All operations follow this pattern. When a drawRect call comes in, we generate a list of spans for each scan line and set up a span function according to the current brush. A pixmap is similar: we create a list of spans and use a pixmap span function. A polygon is passed to a scan converter which produces a span list, etc. We have two scan converters, one for antialiased and one for aliased drawing. The antialiased one is pretty much a fork of FreeType’s grayraster.c, with some minor tweaks; I think we needed to add support for odd-even fills, for instance. Text is also converted into spans.

Lines, Polylines and Path Strokes

These primitives are passed to a separate processor called a stroker. The stroker creates a new path that visually matches the fillable shape that the outline represents. There is a public API for this too, in QPainterPathStroker. This fillable shape is then passed to one of the scan converters which in turn scan converts the shape into spans. For dashed outlines, the same process happens, and the resulting fillable shape is a path with a potentially very large number of subpaths. Naturally, such a path is costly to scan convert, which is part of the reason why we explicitly do not put dashed lines on the list of high-performance features. In fact, in many cases, line dashing is one of the slowest operations available in the raster engine, so use it with extreme caution.

A hacky alternative which performs much better is to set a 2×2 black/white or black/transparent pixmap brush and draw the stroke using a pen with brush. A bit more to set up, but if that’s what it takes to get it running fast, then that’s what it takes.

State changes

Any setBrush, setTransform or any other state change on QPainter will result in a different set of span functions being set up. Each brush, or fill-type if you like as pens on this level are essentially just fills too, has a special span function associated with it and we also pass a per brush span data. For solid color fills the span data contains the color, for transformed pixmap drawing it contains the inverse matrix, a source pixel pointer, bytes per line and other required information. For clips it contains the span function to call after you clipped the spans. The thing to notice about state changes is that each time you switch from one brush to another brush or from one transformation to another, these structures do need to be updated. Up to Qt 4.4, this was in many cases a noticeable performance problem, bubbling up to 10-15% in profilers when rendering graphics view scenes, but since 4.5 the impact of this is minimal.
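The pairing of a span function with per-brush span data can be pictured roughly as below. The struct layout, the field names and the two brush types are invented for this sketch and are much simpler than the real thing; the point is only that a brush change means rewriting this structure and swapping the function pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SpanData;
using SpanFunction = void (*)(std::uint32_t *dest, int x, int length, const SpanData &data);

// Everything a span function needs to know about the current brush.
struct SpanData {
    SpanFunction fill = nullptr;
    std::uint32_t solidColor = 0;            // used by solid fills
    const std::uint32_t *texture = nullptr;  // used by texture fills (one tiled row)
    int textureWidth = 0;
};

static void fillSolid(std::uint32_t *dest, int x, int length, const SpanData &data)
{
    for (int i = 0; i < length; ++i)
        dest[x + i] = data.solidColor;
}

static void fillTexture(std::uint32_t *dest, int x, int length, const SpanData &data)
{
    for (int i = 0; i < length; ++i)
        dest[x + i] = data.texture[(x + i) % data.textureWidth];
}

// A "state change": switching brush updates the data and swaps the function.
void setSolidBrush(SpanData &data, std::uint32_t color)
{
    data.fill = fillSolid;
    data.solidColor = color;
}

void setTextureBrush(SpanData &data, const std::vector<std::uint32_t> &row)
{
    assert(!row.empty());
    data.fill = fillTexture;
    data.texture = row.data();
    data.textureWidth = static_cast<int>(row.size());
}
```

The drawing code then only ever calls data.fill(...), so the cost of a brush change is paid once in the setter rather than in every span.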

Well, perhaps not minimal compared to drawing a 2 pixel long line, but minimal compared to filling a 64×64 rectangle. The point is that though the raster engine is the engine that probably handles state changes best of all our engines, there are some usecases where it still shows up, and it should still be minimized.

Span functions

The task of the span functions is to generate a pixel and combine it with the destination according to the current state of the painter. Though the raster engine supports rendering to any of our image formats except 8-bit indexed, it will internally do all rendering in ARGB32_Premultiplied. Premultiplied alpha has the benefit that we don’t have to multiply the alpha into the color channels and it saves us a division in the blending. The reason for doing all rendering in one format is that the alternative simply doesn’t scale. Just think of the combination of composition modes multiplied with the number of image formats a source image can have multiplied with what formats the destination can have. To support all combinations we have a generic approach where we for each span do:

  • Get the source pixels, e.g. from a gradient, pixmap, image or solid color, and convert them to ARGB32_Premultiplied.
  • Get the destination pixels and convert them to ARGB32_Premultiplied
  • Blend the source into the destination using current composition mode
  • Convert the result to destination format and write it back.
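The four steps above can be sketched for one concrete combination: an ARGB32 source blended with SourceOver onto an RGB16 destination. The helper names are made up for the example and the rounding choices are assumptions, so treat it as an outline of the generic path rather than the engine’s code.

```cpp
#include <cassert>
#include <cstdint>

using Argb = std::uint32_t;  // 0xAARRGGBB

// Multiply two 8-bit quantities as fractions of 255, rounding to nearest.
static std::uint32_t mul255(std::uint32_t value, std::uint32_t alpha)
{
    return (value * alpha + 127) / 255;
}

// Step 1, for an ARGB32 source: multiply alpha into the colour channels.
Argb premultiply(Argb p)
{
    const std::uint32_t a = p >> 24;
    const std::uint32_t r = mul255((p >> 16) & 0xff, a);
    const std::uint32_t g = mul255((p >> 8) & 0xff, a);
    const std::uint32_t b = mul255(p & 0xff, a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}

// Step 2, for an RGB16 destination: expand 5-6-5 to an opaque 8-8-8-8 pixel.
Argb fromRgb16(std::uint16_t p)
{
    const std::uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    return 0xff000000u | ((r << 3 | r >> 2) << 16) | ((g << 2 | g >> 4) << 8) | (b << 3 | b >> 2);
}

// Step 3: SourceOver on pre-multiplied pixels, result = src + dst * (1 - srcAlpha).
Argb sourceOver(Argb src, Argb dst)
{
    const std::uint32_t inverseAlpha = 255 - (src >> 24);
    Argb out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const std::uint32_t s = (src >> shift) & 0xff;
        const std::uint32_t d = (dst >> shift) & 0xff;
        assert(s <= (src >> 24));  // holds for any valid pre-multiplied pixel
        out |= (s + mul255(d, inverseAlpha)) << shift;
    }
    return out;
}

// Step 4: pack the result back into the destination's 5-6-5 format.
std::uint16_t toRgb16(Argb p)
{
    return static_cast<std::uint16_t>((((p >> 16) & 0xf8) << 8) | (((p >> 8) & 0xfc) << 3) | ((p & 0xff) >> 3));
}

// All four steps for a single pixel.
std::uint16_t blendOnto(std::uint16_t dst, Argb srcArgb32)
{
    return toRgb16(sourceOver(premultiply(srcArgb32), fromRgb16(dst)));
}
```

Note that the blend itself needs no division: with pre-multiplied pixels it is one multiply and one add per channel. The conversions on either side are the overhead that the special cases in the next section try to avoid.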

This may seem like a lot of work, so luckily the story doesn’t end there.

Special casing and Optimizations

As I outlined in the QPainter documentation patch that I added recently, which was the start of this blog series, it’s all about defining which scenarios we want to be fast and which scenarios we just need working. Over the years since the initial release of the raster engine in the summer of 2005, we’ve added tons of special cases to support what we experience as the functions that are called the most and which have the most impact.

  • First of all, if you look at the things we do for each span above, you see that we convert into ARGB32_Premultiplied. Solid colors are easy to represent, gradients are generated in this format directly, so conversion only happens for images and pixmaps. If the image is ARGB32_Premultiplied, then no conversion is needed, and we just use the scanline pointer directly, without any copying. Our RGB32 format is specified to be 0xffRRGGBB, with the alpha set to 0xff. This means it is pixel-wise compatible with ARGB32_Premultiplied, which again means that it can also be used directly. If the source is ARGB32, you’ll get a memcpy for each scanline where the ARGB32 data is copied into a temporary buffer and converted to ARGB32_Premultiplied. What can you read from that: Do not draw ARGB32 images into the raster engine. Secondly, don’t open a painter on an ARGB32 image, as that implies the exact same, but when reading and writing the destination pixels. Now you know why QPixmaps prefer to be in these formats too.
  • Source composition modes are special cased for most operations. For instance, we don’t read the destination for source operations because we know there is no blending involved, unless the spans have partial coverage that is. This means that Source is effectively just a memory write.
  • SourceOver is usually special cased to be inlined and merged with the coverage opacity, so it is also usually faster than the other composition modes. As for the other optimizations down below, these only hold for Source and SourceOver, so if you want best performance, make sure that this is what you are using. SourceOver is the default in QPainter, by the way.
  • For gradients and pixmaps, we need to create an array of source data. For solid colors, it’s just a single pixel, so this is faster. A solid source color also has the benefit that you only have to traverse memory for the destination, where you write to, so the cache misses are significantly reduced.
  • Rectangle fills are very common, both through QPainter::fillRect and through QPainter::drawRect. In 4.4 both of these implied a state change. Actually, fillRect implied two state changes because it set the brush to what was passed to fillRect and then set it back to what the painter state was. In 4.5, as part of this Falcon project, we introduced a new internal QPaintEngine subclass which supports a state-less fillRect with a color. This matches how applications normally use the painter anyway.
  • In addition to being stateless, the fillRect function is special cased for a number of use-cases. For instance, for RGB16, we write two pixels at a time, and for Intel machines there is an SSE/MMX optimized version. The special cased fillRect also has the benefit that it doesn’t require spans, it’s just a tight 2D for loop, which also saves us quite a bit of work, at least if the spans are short.
  • Duff’s Device. I cannot take credit for its addition, but it’s used in a lot of different places in the raster engine today. It’s all about loop-unrolling. If you’re not familiar with it yet, read up on it. It’s a beautiful abuse of the C++ language to make things potentially faster.
  • Rectangular clipping is also special cased, at least as long as there is no transformation set on the painter. Translate is of course special cased, but scaling and rotating disables this optimization. The benefit we get from doing rectangular clipping is that finding the spans to fill is done on the QRect level, rather than on the per-span level, which makes it significantly faster.
  • So if you have Source or SourceOver, a non-perspective, non-smooth transform and the clip is a rectangular clip, you also get the benefit of our pixmap blend functions. These were added in Qt 4.5 and are the reason why pixmap drawing is quite a bit faster now than in the earlier versions. In Qt 4.5, we had blend functions for scale and translate only, and in Qt 4.6 we added rotations to the list as well. Again, we focus on a selected subset of formats, matching what QPixmap will be using, we only have these for:
    • ARGB32_Premultiplied on ARGB32_Premultiplied
    • ARGB32_Premultiplied on RGB32
    • ARGB32_Premultiplied on RGB16
    • ARGB8565_Premultiplied on RGB16
    • RGB32 on RGB32
    • RGB16 on RGB16

    I think that was all of them.

  • The outlines are processed via the stroker in the general case. However, there are again a number of special cases where we drop to doing a midpoint-algorithm instead. Lines, polylines and paths that only contain line segments will be rendered using the fast midpoint approach as long as the pen width is equal to or less than 1. We also support dashing line segments for 1 pixel wide lines using this method. For any pen width greater than 1, curved paths or antialiasing, we drop to the stroker approach which works, but is far less optimal. Actually, I think there is a special-case for antialiased dashed lines too, as long as they are thin.
  • When antialiasing is enabled, we often need to fall back to the stroker for outlines which is quite a bit slower than the plain case. In addition to that there are a lot more spans generated for antialiased content, due to the fade-in, fade-out effect on the edge of the primitive, so expect antialiasing to be a significant cost.
  • Text drawing has been highly optimized for most engines since 4.5, to the point where the major bottleneck these days is in doing the actual text layout on the string. We’re working on an API to cache this, so text drawing can be made truly fast, but based on the current API, it’s as good as it gets. However, if the transformation is a rotate/scale, then we fall back to path drawing. Only the Windows version of the raster engine supports drawing glyphs at rotated angles using the fast paths, so beware of that.
  • A lot of details, but it gives an idea of what to consider when you write code for this engine. If all you are drawing is 1024×1024 pixmaps, then none of these things matter because all the time is anyway spent in the span function that does pixmap blending, but the second you have more content, several lines, several polygons, which are smaller in size, then these things are critical to achieve good performance.

    The overall performance of the engine, when used according to how it’s outlined above, can be thought of as:

    Overhead + O(pixelsTouched * memoryAndBusCapacity)

    There is nothing scientific about that formula, but when you’re hitting the optimal path, all time should be spent in one of the many for loops inside qdrawhelper_xxx.cpp or even better qblendfunctions.cpp. These loops will spend all their time on per pixel processing. If these functions could be made faster by doing the algorithms slightly differently, then great, but if you see in your profiling that all time is spent in for instance qt_blend_argb32_on_argb32, then that means you told us to blend alpha pixmaps together and we’re doing that as fast as we can and you have zero loss between your app and actual processing. If all time is spent processing pixels, then that is a good thing. The overhead here is the time spent in state changes, function call overhead, and similar.
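For readers who have not met Duff’s Device, mentioned in the list of special cases above, here is the classic shape of it applied to a solid fill. This is a generic textbook version written for illustration, with an assumed function name and an unroll factor of 8; it is not copied from qdrawhelper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill 'count' pixels with 'color', unrolled eight times. The switch jumps
// into the middle of the loop body to handle the remainder (count % 8) first,
// after which every further pass through the loop writes a full 8 pixels.
void fillDuff(std::uint32_t *dest, std::size_t count, std::uint32_t color)
{
    assert(dest != nullptr || count == 0);
    if (count == 0)
        return;
    std::size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *dest++ = color;
    case 7:      *dest++ = color;
    case 6:      *dest++ = color;
    case 5:      *dest++ = color;
    case 4:      *dest++ = color;
    case 3:      *dest++ = color;
    case 2:      *dest++ = color;
    case 1:      *dest++ = color;
            } while (--passes > 0);
    }
}
```

For 13 pixels the switch enters at case 5, writes 5 pixels, then loops once more for the remaining 8. The loop condition is checked twice instead of 13 times, which is the whole point. Modern compilers often unroll simple loops on their own, so whether this helps is something to measure rather than assume.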

    Some numbers

    I got some feedback on one of the previous blogs that a few bar charts would be nice, so I’ll post some numbers on what kind of throughput is possible with the raster paint engine. I’ve timed it on both my Windows desktop machine and on my N900 to get a comparison. The operations range from several million per second to only a few hundred so the scale is logarithmic, keep that in mind as you look at them.

    Raster Results

    As you can see, the fill-rate is more or less tied to the number of pixels involved. For some operations it takes a little bit longer to do something, like drawPixmap with scaling is somewhat slower than drawPixmap without, but you see that the rough formula I gave above holds quite often. Double the size of the primitive in each direction and you have one quarter the performance. It was also not my intention to trick you with using different numbers for drawPixmap, it’s just how the test was set up.

    If you compare the three 4×4 rectangle drawing versions, you see that they differ when the rectangles are small. drawRect without brush change is fastest at around 7.4Mops/sec, followed by fillRect at ~6.1Mops/sec and then drawRect with brush change at 1.8Mops/sec. At 128×128 there is just a little difference between them, which is what I was getting at with the state changes above. It is possible to do them and if you’re drawing semi-large areas, it doesn’t matter, but if you’re plotting pixels, doing loads of small lines here and there or particle effects with 8×8 pixmaps, then you want to do that in a tight loop with nothing else happening.

    You can also see that the speed of non-smooth scaling is holding its own vs non-scaled pixmap drawing.

    Finally, if you compare the N900 to the desktop Windows machine, you see that although the desktop’s processor is only about 4 times faster, the N900 is often around 10 times slower. Why? Because the CPU isn’t the only limitation; bus/memory capacity is also a limiting factor. To be honest, it’s not a fair comparison…

    I hope you enjoyed this post and more will come in 2010.

    Posted by gunnar
     in Painting, Graphics Dojo, OpenGL, Performance
     on Wednesday, December 16, 2009 @ 06:54

    For this blog series that I’m doing, I figure it’s nice to start with an overview of the whole painter, pixmaps, widgets, graphicsview, backingstore idea.

    At the centre of all Qt graphics is the QPainter class. It can render to surfaces through the QPaintDevice class. Examples of paint devices are QImages, QPixmaps and QWidgets. The way it works is that for a given QPaintDevice implementation we return a custom paint engine which supports rendering to that surface. This is all part of our documentation so perhaps not too interesting. Let’s look at this in more detail.

    QWidgets and QWindowSurface

    Even though QWidget is a QPaintDevice subclass, one will never render directly into a QWidget’s surface. Instead, during the paintEvent, the painting is redirected to an offscreen surface which is represented by the internal class QWindowSurface. This was traditionally implemented using the QPainter::setRedirected(), but has since been replaced by an internal mechanism between QPainter and QWidget which is slightly more optimal.

    Sometimes we refer to this surface as “the backingstore”, but it really is just a 2D surface. If you ever looked through the Qt source code and found a class QWidgetBackingStore, this class is responsible for figuring out which parts of the window surface need to be updated prior to showing it to screen, so it’s really a repaint manager. When the concept of backingstore was introduced in Qt 4.1, the two classes were the same, but the introduction of more varying ways to get content to screen made us split it in two.

    In the old days widgets were rendered “on screen”. Though the option to paint on screen is still available, it is not recommended to use it. I believe the only system that remotely supports it is X11, but it is more or less untested and thus often causes artifacts in the more complex styles. Setting the flag Qt::WA_PaintOnScreen means that the repaint manager inside Qt ignores that widget when repainting the window surface and instead sends a special paintEvent to that widget only. Prior to Qt 4.5 there was a significant speed gain to be had when 10-100 widgets updated at max fps, but in Qt 4.5 the repaint manager was optimized to handle this better, so on-screen painting is usually worse than buffered.

    Back to the window surface. All widgets are composited into the window surface top to bottom, and the top-level widget will fill the surface with its background, or with transparent pixels if the Qt::WA_TranslucentBackground attribute is set. All other widgets are considered transparent. A label only draws a bit of text, but doesn’t touch anything else. What that means for the repaint manager is that every widget that overlaps with the label, but stacks behind it, needs to be drawn before it. If the application knows that a certain widget is opaque and will draw every single pixel for every paint event, then one should set the Qt::WA_OpaquePaintEvent attribute, which causes the repaint manager to exclude the widget’s region when painting the widgets behind it.

    Since all widgets are repainted into the same surface, we need to make sure that widgets don’t accidentally paint outside their own boundaries and into other widgets. Since there is no guarantee that widgets will paint inside their bounds, this could potentially lead to painting artifacts, so we set up a clip behind QPainter’s back called the “system clip”. For most widgets the system clip is a rectangle and looking at the performance section of the QPainter docs, we see that that is not so bad. Rectangular clips, when pixel aligned, are fast. A masked widget, on the other hand, is a performance disaster. It is slower to set up and slower to render. The system clip is the same clip that is passed to the paint event, except that the clip in the paint event has been translated to be relative to the top-left of the widget, rather than to the top-left of the surface. Do NOT set the paint event’s region as a clip on the painter. It is already set up, and we don’t detect that it is the exact same region and just process it fully again. The purpose of the region/rect in the paint event is so that widgets can decide to not draw certain parts. This is primarily useful when you have big scenes in the widgets, such as a map application, graphics view or similar.
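
    To make the "decide to not draw certain parts" idea concrete, here is a small sketch in standard C++ of a tile-based view, such as a map, working out which tiles overlap the exposed rectangle. The Rect struct and tilesToDraw() are hypothetical stand-ins; in a real widget the rectangle would come from QPaintEvent::rect(), and no clip would be set on the painter.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Minimal stand-in for the exposed rectangle of a paint event,
// in widget coordinates.
struct Rect { int x, y, width, height; };

// Returns the (column, row) of every tile that overlaps the exposed
// rectangle, for a grid of cols x rows square tiles of tileSize pixels.
std::vector<std::pair<int, int>> tilesToDraw(const Rect &exposed, int tileSize,
                                             int cols, int rows)
{
    std::vector<std::pair<int, int>> result;
    if (exposed.width <= 0 || exposed.height <= 0 || tileSize <= 0)
        return result;

    const int right = exposed.x + exposed.width - 1;
    const int bottom = exposed.y + exposed.height - 1;
    if (right < 0 || bottom < 0)
        return result; // entirely outside the grid

    const int firstCol = std::max(0, exposed.x / tileSize);
    const int firstRow = std::max(0, exposed.y / tileSize);
    const int lastCol = std::min(cols - 1, right / tileSize);
    const int lastRow = std::min(rows - 1, bottom / tileSize);

    // Only the tiles inside this range need a drawPixmap-style call.
    for (int row = firstRow; row <= lastRow; ++row)
        for (int col = firstCol; col <= lastCol; ++col)
            result.emplace_back(col, row);
    return result;
}
```

    With 256-pixel tiles in a 10×10 grid, an exposed rectangle of {300, 0, 300, 100} yields just two tiles, (1, 0) and (2, 0), instead of all 100.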

    In addition to the system clip which is set up prior to calling paintEvent, the painter also needs to be in a clean state, which means setting up brushes, pens, fonts and others. It’s not a huge amount, but if you have many widgets it adds up. So, though widgets are no longer native window handles (aka Alien), there is still a price tag involved in repainting them. Be aware of that when you design your application. For instance, implementing a photo gallery using QLabels with pixmaps in a QScrollArea doesn’t scale. You would have to set up clipping and all the other states per label, even though the label only draws a pixmap. A single “view” widget would scale much better, because the widget can then implement a tight loop that draws pixmaps in the right places.

    On Mac OS X, this whole backingstore and window surface logic only holds when the raster or opengl graphics systems are used. Personally I would strongly recommend using raster: it implements the full feature set, it is often faster, it has the same performance profile as Qt on Windows, and painting bugs are prioritized higher for raster than for the CoreGraphics backend. In qt/main we plan to switch the default for Mac OS X to raster, we just have to iron out some window system integration issues.

    Graphics systems

    The concept of a graphics system was introduced in Qt 4.5. The idea is to be able to select at startup time, on an application level, what kind of graphics stack you should be using. The graphics system is responsible for creating the pixmap backends and the window surface. We currently have graphics systems for raster, OpenGL 1.x, OpenGL/ES 2.0, OpenVG and X11. You can select a graphics system by starting the application with the command line option -graphicssystem raster|opengl|opengl1|x11|native, where “native” means to use the system default. Another option is to pass the exact same option to configure, which sets it for all applications using Qt. Finally, there is the function QApplication::setGraphicsSystem, which hardcodes the graphics system for a given application.

    In later blogs, we plan to go into each of the paint engines in more detail, but for now, let’s just look at the highlights.

    Raster

    The raster graphics system is the reference implementation of QPainter. It implements all the features we specify and does it all in software. When a new port is started, such as with S60, we usually start with getting raster running. It is currently the default on Windows, Embedded, S60 and will also be on Mac OS X.

    Just a thought. What do you think of raster on X11? Ignore for a second that you currently get a process-local font cache. It performs quite nicely on X11 and I’ve seen many people switch to it at runtime. If we consider remote displays, this seems daunting, but it still may not be too bad. The way it works in the X11 paint engine today is that any gradient and pixmap transform is done in software anyway and uploaded as an image on a per painter-command level. Why not just do it all client side and upload only the parts that need updating? We can watch HD videos (for some definition of HD, anyway) on youtube, so certainly we can afford to upload a few pixels. This is bound to generate comments on XRender and server-side gradients and transforms, but these have been tried numerous times and the performance is simply not good enough.

    The window system integration is handcoded for each platform to make the most out of it. On Windows the window surface is a QImage which shares bits with a DIBSECTION, which results in pretty good blitting speed. On X11 we use MIT Shared Memory Images. We used to use Shared Memory Pixmaps, but support for these was removed from Xorg. We then got this awesome patch from the community, so we’re back up and running. On Mac OS X, we’re experimenting with using GL texture streaming for getting the backbuffer to screen and we’re seeing some promising numbers with that, so I hope that will make it into Qt 4.7 too.

    Because it is just an array of bytes, most native APIs have the ability to render into the same buffer we do. This makes integration with native theming quite straightforward, which is one of the reasons why this is attractive as a default desktop graphics system, despite not being hardware accelerated.

    OpenGL

    We have two OpenGL based graphics systems in Qt. One for OpenGL 1.x, which is primarily implemented using the fixed functionality pipeline in combination with a few ARB fragment programs. It was written for desktops back in the Qt 4.0 days (2004-2005) and has grown quite a bit since. You can enable it by writing -graphicssystem opengl1 on the command line. It is currently in life-support mode, which means that we will fix critical things like crashes, but otherwise leave it be. It is not a focus for performance from our side, though it does perform quite nicely for many scenarios.

    Our primary focus is the OpenGL/ES 2.0 graphics system, which is written to run on modern graphics hardware. It does not use a fixed functionality pipeline, only vertex shaders and fragment shaders. Since Qt 4.6, this is the default paint engine used for QGLWidget. Only when the required feature set is not available will we fall back to using the 1.x engine instead. When we refer to our OpenGL paint engine, it’s the 2.0 engine we’re talking about.

    We’ve wanted to have GL as a default graphics system on all our desktop systems for a while, but there are two major problems with it. First, aliased drawing is a pain: it is close to impossible to guarantee that a line goes where you want it for certain drivers. Second, integration with native theming is a pain: it is rarely possible to pass a GL context to a theming function and tell it to draw itself, hence we need to use temporary pixmaps for style elements. On Mac OS X, there is a function to get a CGContext from a GL context, but we’ve so far not managed to get any sensible results out of it. On the other hand, much of the UI content doesn’t depend on these features, which makes GL optimal for typical scene rendering, such as the viewport of a QGraphicsView or a photo gallery view. So as far as how the default setup in Qt will look in the future, we’re considering that the best default setup for desktop may be a combination of raster for the natively themed widgets and GL for one or two high-performance widgets. Nothing is decided on this topic though, we’re just looking at alternatives.

    Another problem with using GL by default is font sharing. With raster we could theoretically share pre-rendered glyphs between processes in a cross platform manner using shared memory, with GL this becomes a bit more difficult. On X11, there is an extension to bind textures as XPixmaps which can be shared across processes, but this will usually force the textures into a less optimal format which makes them somewhat slower to draw, so it is still not optimal. On Windows, Mac OS X, S60 or QWS, we would need driver-level support for sharing texture ids, which we currently don’t have.

    OpenVG

    I am actually quite blank in this area. I’ve not been involved with writing it nor getting it up and running. It sits on top of EGL, which makes it quite similar to the OpenGL graphics systems. We expect that OpenVG will be used in a number of mid-range embedded devices.

    The cool thing about OpenVG is that it matches the QPainter API quite nicely. It supports paths, pens, brushes, gradients and composition modes, so in theory, the vectorial APIs should run optimally.

    Rhys, who wrote the OpenVG paint engine, plans to do a post on the OpenVG paint engine’s internals in full in the near future.

    Images and Pixmaps

    The difference between these two is mostly covered in the documentation, but I would like to highlight a few things nonetheless.

    Our documentation says: “QImage is designed and optimized for I/O, and for direct pixel access and manipulation, while QPixmap is designed and optimized for showing images on screen.”

    Raster

    When using the raster graphics system, pixmaps are implemented as a QImage, with a potentially significant difference. When converting a QImage to a QPixmap, we do a few things.

    The image is converted to a pixel format that is fast to render to the backbuffer, meaning ARGB32_Premultiplied, RGB32, ARGB8565_Premultiplied or RGB16. When images are loaded from disk using the PNG plugin or when they are generated in software by the application, the format is often ARGB32 (non-premultiplied) as this is an easy format to work on, pixel-wise. I’ve measured ARGB32_Premultiplied onto RGB32 to be about 2-4x faster than drawing an ARGB32 non-premultiplied depending on the usecase.
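
    One way to see why a premultiplied source is cheaper to blend: the colour-times-alpha multiplications have already been done when the pixel reaches the inner loop. The sketch below is standard C++ and only models the arithmetic for a single pixel onto an opaque destination. The Pixel struct and function names are made up for illustration, and this is not Qt’s actual blend code.

```cpp
#include <cstdint>

// One 32-bit pixel split into channels (illustrative type, not a Qt class).
struct Pixel { std::uint8_t a, r, g, b; };

// Approximates (x * y) / 255 with rounding, using only shifts and adds.
inline std::uint8_t mul255(unsigned x, unsigned y)
{
    const unsigned t = x * y + 128;
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}

// Converts a non-premultiplied ("straight") pixel to premultiplied form.
Pixel premultiply(Pixel p)
{
    return { p.a, mul255(p.r, p.a), mul255(p.g, p.a), mul255(p.b, p.a) };
}

// Source-over of a premultiplied source onto an opaque destination:
// result = src + dst * (255 - src.a) / 255
Pixel blendPremultiplied(Pixel src, Pixel dst)
{
    const unsigned inv = 255u - src.a;
    return { 255,
             static_cast<std::uint8_t>(src.r + mul255(dst.r, inv)),
             static_cast<std::uint8_t>(src.g + mul255(dst.g, inv)),
             static_cast<std::uint8_t>(src.b + mul255(dst.b, inv)) };
}

// The same blend for a straight-alpha source: it has to premultiply
// first, which costs three extra multiplications per pixel.
Pixel blendStraight(Pixel src, Pixel dst)
{
    return blendPremultiplied(premultiply(src), dst);
}
```

    Both paths give the same pixel, but with a premultiplied source the premultiply() step happens once at conversion time instead of on every draw.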

    Secondly, we check the pixel data for transparent pixels and convert it to an opaque format if none are found. This means that if a “.png” file is loaded as ARGB32 from disk, but only contains opaque pixels, it will be rendered as an RGB32, which is also about 2-4x faster.
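
    The check itself is conceptually simple. As a hedged sketch in standard C++ (the function name is hypothetical and the pixel layout is assumed to be 32-bit ARGB with alpha in the top byte), it amounts to a scan like this:

```cpp
#include <cstdint>
#include <vector>

// Returns true if any 32-bit ARGB pixel (alpha in the top byte) is not
// fully opaque. If this returns false, the data can be treated as RGB32.
bool hasTransparentPixels(const std::vector<std::uint32_t> &argbPixels)
{
    for (std::uint32_t pixel : argbPixels) {
        if ((pixel >> 24) != 0xFF)
            return true;
    }
    return false;
}
```

    The scan touches every pixel once at conversion time, which is a small price compared with blending alpha on every subsequent draw.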

    OpenGL

    When using the OpenGL graphics system the actual implementation of the QPixmap varies a bit from setup to setup. The most ideal option gets enabled when your GL implementation supports Frame Buffer Objects (FBOs) in combination with the GL_EXT_framebuffer_blit extension. In this case, the pixmap is represented as an OpenGL texture id, and whenever a QPainter is opened on the pixmap we grab an FBO from an internal pool and use the FBO to render into the texture.

    Without these extensions available, which is typically the case for OpenGL/ES 2.0 devices, the implementation is a QImage (in optimal layout, same as raster) which is backed by a texture id. When you open a QPainter on the pixmap, you render into the QImage and when the pixmap is drawn to the screen, the texture id is used. Internally there is a syncing process between the two representations, so there will be a one-time hit of re-uploading the texture after drawing into it.
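
    The "one-time hit" can be pictured as a dirty flag between the two representations. The class below is a hypothetical model in standard C++, with a counter standing in for the texture upload. It is not how the Qt pixmap backend is actually written.

```cpp
// Hypothetical model of a pixmap backed by both a CPU-side image and a
// GPU texture. The upload is represented by a simple counter.
class DualBackedPixmap
{
public:
    // Called when a painter has finished drawing into the CPU-side image.
    void paintInto() { m_textureStale = true; }

    // Called whenever the pixmap is drawn to the screen.
    void drawToScreen()
    {
        if (m_textureStale) {
            ++m_uploadCount;        // stands in for re-uploading the texture
            m_textureStale = false;
        }
        // ... draw using the texture id ...
    }

    int uploadCount() const { return m_uploadCount; }

private:
    bool m_textureStale = true;     // nothing has been uploaded yet
    int m_uploadCount = 0;
};
```

    Drawing the pixmap many times after a single paintInto() costs one upload, whereas painting into it between every draw costs an upload each time.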

    In general

    If you intend to draw the same QImage twice, always convert it to a QPixmap.

    There are some usecases where QPixmap is potentially worse though. We have these functions, QPixmap::scaled(), QPixmap::transformed() and friends, which historically are there because we wanted QImage and QPixmap to have similar APIs. We have support for reimplementing this functionality on a per pixmap-backend basis, but currently no engine does this, so for the GL case, or X11 for that matter, calling QPixmap::transformed() implies a conversion from QPixmap into QImage, a transformation in software, and then a conversion back to the original format.

    By default a QPixmap is treated as opaque. When doing QPixmap::fill(Qt::transparent), it will be made into a pixmap with an alpha channel, which is slower to draw. If the pixmap is going to end up as opaque, initialize it with QPixmap::fill(Qt::white). You can even skip the initialization step altogether when you know that all pixels will be written as opaque when the pixmap is painted into.

    Before moving on to something else, I’ll just give a small warning on the functions setAlphaChannel and setMask and the innocent-looking alphaChannel() and mask(). These functions are part of the Qt 3 legacy that we didn’t quite manage to clean up when moving to Qt 4. In the past the alpha channel of a pixmap, or its mask, was stored separately from the pixmap data. Depending on which platform you were on, the actual implementation was a bit different. For instance on X11, you had one 1-bit pixmap mask + an 8-bit alpha channel + a 24-bit color buffer. On Windows you had a 1-bit mask + a packed 32-bit ARGB pixel buffer. In Qt 4 we merged all this into one API, so that QPixmap is to be considered a packed datastructure of ARGB pixels. What we did not do, however, was remove the functions implementing the old API. In fact, we even added the alpha channel accessors, so we made it worse. The API was to some extent convenient, but all four of those functions imply touching all the data and either merging the source with the pixmap or extracting a new pixmap from the current pixmap content. Bottom line: just don’t call them. With composition modes, you can manipulate the alpha channel of the pixmaps using QPainter. This also has the benefit that it will potentially be SSE optimized for raster or done in hardware on OpenGL, so it has potential for being quite a bit faster. There is also the QGraphicsOpacityEffect which allows you to set an opacity mask on widgets and graphics items, but as of today, it is not as fast as we would like it to be.
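
    As an illustration of what a composition mode does to the alpha channel, here is the per-pixel arithmetic of the Porter-Duff "destination in" operation, which QPainter exposes as QPainter::CompositionMode_DestinationIn. This is a standard C++ sketch with made-up type and function names, operating on one premultiplied pixel. The real work in Qt happens inside the paint engine.

```cpp
#include <cstdint>

// One premultiplied ARGB pixel (illustrative type, not a Qt class).
struct PremulPixel { std::uint8_t a, r, g, b; };

// Approximates (value * alpha) / 255 with rounding.
inline std::uint8_t scaleBy(unsigned value, unsigned alpha)
{
    const unsigned t = value * alpha + 128;
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}

// "Destination in": keep the destination, scaled by the source's alpha.
// Because the pixel is premultiplied, all four channels are scaled.
PremulPixel destinationIn(PremulPixel dst, std::uint8_t srcAlpha)
{
    return { scaleBy(dst.a, srcAlpha), scaleBy(dst.r, srcAlpha),
             scaleBy(dst.g, srcAlpha), scaleBy(dst.b, srcAlpha) };
}
```

    A source alpha of 255 leaves the destination untouched and a source alpha of 0 clears it, which is the behaviour one would otherwise reach for setAlphaChannel() or setMask() to get.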

    QGraphicsView

    I’ll do at least one separate post on graphicsview alone, so I’ll just comment quickly on the difference between using QGraphicsView with items vs QWidgets. QGraphicsView with its scene populated with items is in many ways very similar to the widgets and their repaint handling. With the addition of layouts and QGraphicsWidgets the line is even more blurry. So which solution should you pick? More and more often, we’re seeing that people choose to create their UIs in graphics view rather than creating them using traditional widgets.

    Compared to widgets, items in a graphics view are very cheap. If we consider the photo gallery again, then using a separate item for each of the items in the view may (I say may) be reasonable. A widget is repainted through its paintEvent. A QGraphicsItem is repainted through its paint function. The good thing with the item’s function is that there is no QPainter::begin as the painter is already properly set up for rendering. Another good thing is that the painter has less guaranteed state than in the widget case. There may be a transformation and some clip, but no guarantees about fonts, pens or brushes. This makes the setup a bit cheaper.

    Another huge improvement over widgets is that items are not clipped by default. They have a bounding rectangle and there is a contract between the subclass implementer and the scene that the item does not paint outside. If we compare this to the system clip we need to set for widgets, then again there is less work to be done for the items. If the item violates this there will be rendering artifacts, but for graphicsview this has proven an acceptable compromise.

    Most UI elements are rather simple. A button, for instance, can be composed of a background image and a short text. In QPainter terms that is one call to drawPixmap and one call to drawText. The less time spent between painter calls, the better the performance. The fewer state changes between painter calls, the better the performance. Looking back at how much happens between these calls for a button, you quickly realize that the traditional widgets are quite heavy. If widgets are going to survive the test of time, then they need to behave more like QGraphicsItems.

    Some final words

    I’ve been rambling on for a while, but hopefully there was some useful information in here. You may have noticed that I do not mention printing, PDF or SVG generation, nor do I focus on the X11 or CoreGraphics paint engines in great detail. This is because, as outlined in the painter performance docs, we focus our performance efforts on only a few backends which we consider critical for Qt.

    Posted by gunnar
     in Painting, Graphics Dojo, Performance
     on Monday, December 14, 2009 @ 12:19

    On Friday I added the following to the QPainter documentation:

    
        \section1 Performance

        QPainter is a rich framework that allows developers to do a great
        variety of graphical operations, such as gradients, composition
        modes and vector graphics. And QPainter can do this across a
        variety of different hardware and software stacks. Naturally the
        underlying combination of hardware and software has some
        implications for performance, and ensuring that every single
        operation is fast in combination with all the various combinations
        of composition modes, brushes, clipping, transformation, etc, is
        close to an impossible task because of the number of
        permutations. As a compromise we have selected a subset of the
        QPainter API and backends, where performance is guaranteed to be as
        good as we can sensibly get it for the given combination of
        hardware and software.

        The backends we focus on as high-performance engines are:

        \list

        \o Raster - This backend implements all rendering in pure software
        and is always used to render into QImages. For optimal performance
        only use the format types QImage::Format_ARGB32_Premultiplied,
        QImage::Format_RGB32 or QImage::Format_RGB16. Any other format,
        including QImage::Format_ARGB32, has significantly worse
        performance. This engine is also used by default on Windows and on
        QWS. It can be used as default graphics system on any
        OS/hardware/software combination by passing \c {-graphicssystem
        raster} on the command line

        \o OpenGL 2.0 (ES) - This backend is the primary backend for
        hardware accelerated graphics. It can be run on desktop machines
        and embedded devices supporting the OpenGL 2.0 or OpenGL/ES 2.0
        specification. This includes most graphics chips produced in the
        last couple of years. The engine can be enabled by using QPainter
        onto a QGLWidget or by passing \c {-graphicssystem opengl} on the
        command line when the underlying system supports it.

        \o OpenVG - This backend implements the Khronos standard for 2D
        and Vector Graphics. It is primarily for embedded devices with
        hardware support for OpenVG.  The engine can be enabled by
        passing \c {-graphicssystem openvg} on the command line when
        the underlying system supports it.

        \endlist

        These operations are:

        \list

        \o Simple transformations, meaning translation and scaling, plus
        0, 90, 180, 270 degree rotations.

        \o \c drawPixmap() in combination with simple transformations and
        opacity with non-smooth transformation mode
        (\c QPainter::SmoothPixmapTransform not enabled as a render hint).

        \o Text drawing with regular font sizes with simple
        transformations with solid colors using no or 8-bit antialiasing.

        \o Rectangle fills with solid color, two-color linear gradients
        and simple transforms.

        \o Rectangular clipping with simple transformations and intersect
        clip.

        \o Composition Modes \c QPainter::CompositionMode_Source and
        QPainter::CompositionMode_SourceOver

        \o Rounded rectangle filling using solid color and two-color
        linear gradients fills.

        \o 3x3 patched pixmaps, via qDrawBorderPixmap.

        \endlist

        This list gives an indication of which features to safely use in
        an application where performance is critical. For certain setups,
        other operations may be fast too, but before making extensive use
        of them, it is recommended to benchmark and verify them on the
        system where the software will run in the end. There are also
        cases where expensive operations are ok to use, for instance when
        the result is cached in a QPixmap.
    

    I suspect it’s a piece of documentation many of you have been lacking for a while, and it’s something we should have put in a long time ago, but I can only say “sorry for not doing it sooner”. At least it’s getting done now. Note: Patch is not visible in public repository at the time of publishing. Should be there shortly.

    The urge to get these things into the docs has sprung from a number of dialogues I’ve had recently, which all went pretty much like this:

    • TheOtherGirlOrGuy: My application is running slow… What do I do?
    • Me: What is it doing?
    • TheOtherGirlOrGuy: Well, it’s using QGraphicsView and QPainter and is doing this and that…
    • Me: That doesn’t sound too bad.
    • TheOtherGirlOrGuy: And then it’s really slow when doing this…
    • Me: Yeah… That doesn’t work very well. What you should be doing is this…
    • TheOtherGirlOrGuy: Is that written down someplace? How am I supposed to know that?
    • Me: Eh…

    To remedy this, I’m going to put into action something I’ve had at the back of my head for a while now, a blog series on Qt Graphics and Performance. Along the way, I’ll also try to get parts of this into the documentation or into examples/demos as best practice use-cases.

    I just have to point out, that this blog series is not a request for more features. It is about us sharing with you what we consider best practises and what our priorities are. Of course if you think our focus is way off, then let us know, but my primary intent with this blog series is to share some thoughts.

    With the help of some of my co-workers, we plan to go through some Qt Graphics fundamentals, the “high-performance” engines, and usecases for graphicsview and widgets. If you have special usecases that you find interesting, then by all means let me know and maybe I can cover those too.

    I need to add a small comment to the “drawText” case. It is currently not super optimal, because we have to do layout on the text each time you call it. Because there is no “handle” in the function we don’t have the ability to cache the layout either. If we started caching based on a qHash of all the strings that were passed to drawText(), then we would end up caching a lot of single-shot text drawing… The option that we provide today to work around this is to use a QTextLayout with caching enabled, which is memory-wise quite hungry… I think in the range of 100-300 bytes per character! So as an alternative, we are working on an API for static text which encapsulates the layout work with very little memory overhead. It’s currently called QStaticText and we’re aiming for it to go into 4.7. Once it is in place, we’ll update the drawText comment in the performance documentation to be for these static texts…
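
    The value of a "handle" is that it gives the layout somewhere to live between draw calls. The following standard C++ sketch models that idea only. TextHandle, its fake 8-pixels-per-character layout and the run counter are all invented for illustration and say nothing about how the static text API is or will be implemented.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical handle that owns a piece of text plus its cached layout.
class TextHandle
{
public:
    explicit TextHandle(std::string text) : m_text(std::move(text)) {}

    // Lays the text out on first use only; later calls reuse the result.
    std::size_t layoutWidth()
    {
        if (!m_laidOut) {
            m_width = m_text.size() * 8; // pretend each glyph is 8 pixels wide
            m_laidOut = true;
            ++m_layoutRuns;              // counts how often layout really ran
        }
        return m_width;
    }

    int layoutRuns() const { return m_layoutRuns; }

private:
    std::string m_text;
    std::size_t m_width = 0;
    bool m_laidOut = false;
    int m_layoutRuns = 0;
};
```

    However many times layoutWidth() is called on the same handle, layoutRuns() stays at 1. A plain function taking a string has no such place to keep the result, and the cache dies with the handle instead of growing with every single-shot string.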

    As time permits we plan to push out blogs on the following topics:

    • An overview of the various components involved
    • The raster paint engine in detail
    • The OpenGL paint engine in detail
    • The OpenVG paint engine in detail
    • QGraphicsView optimization flags and cache modes

    Posted by Ariya Hidayat
     in Graphics Dojo, S60
     on Thursday, October 22, 2009 @ 10:02

    I showed a parallax sliding example a long time ago. It was certainly inspired by the increasing use of such an effect in the so-called home screen, typically on mobile platforms. Since there has also been increased interest in Qt for Symbian example programs, I decided to recycle that old example and turn it into something that fits the form factor and user experience of the phone. Thus, the demo is (re)born.


    The code is in the Graphics Dojo repository, find it in the parallaxhome subdirectory. For a touch device, you can tap on the icons on the bottom bar to switch between different pages (enjoy the subtle bump effect of the icons). With a non-touch device, left and right arrow keys are your friends. The heart of this parallax effect is the difference in speed between the graphics items (those wonderful food photos) and the background. The above screenshot demonstrates exactly that: on the right side, although the page (with the weather icon) has now been shifted from the center one (with the home icon), you can see that these two pages mostly share the same background portions.
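
    The speed difference boils down to scrolling the background by a fraction of the foreground offset. Here is a minimal sketch of that arithmetic in standard C++. The function names and the choice to make the background exactly span all pages are assumptions for illustration, not code taken from the parallaxhome demo.

```cpp
// Ratio between background and foreground scroll distance, chosen so the
// background's last pixel lines up with the last page. Hypothetical helper.
double parallaxFactor(int backgroundWidth, int viewWidth, int pageCount)
{
    const int foregroundTravel = viewWidth * (pageCount - 1);
    const int backgroundTravel = backgroundWidth - viewWidth;
    if (foregroundTravel <= 0 || backgroundTravel <= 0)
        return 0.0; // nothing to scroll, keep the background fixed
    return static_cast<double>(backgroundTravel) / foregroundTravel;
}

// Background offset in pixels for a given foreground (page) offset.
int backgroundOffset(int foregroundOffset, double factor)
{
    return static_cast<int>(foregroundOffset * factor);
}
```

    With a 360-pixel-wide view, five pages and a 720-pixel background, the factor is 0.25, so moving one full page (360 pixels) shifts the background by only 90 pixels. That is why neighbouring pages mostly share the same background portion.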

    Exercise for the reader: use the panning trick to shift from one page to another. If you feel brave, add some kinetic effect, too.

    As a last note, this will be my last Graphics Dojo example. For future Qt-related code examples, please check my personal blog on a regular basis (e.g. just track the posts tagged with qt).

    May the training spirits be with you! Namárië.

    Posted by Ariya Hidayat
     in Graphics Dojo, S60
     on Wednesday, October 07, 2009 @ 21:32

    By popular demand, I have refactored the magnifying glass trick previously featured in the Google Maps and OpenStreetMap demos into its own, simpler example. This time we just zoom in on an image, so it’s pretty straightforward and easier to digest. The proof is the following screenshot.


    Of course, it also runs on Qt for Symbian.

    The code is available from the Graphics Dojo repository, find it in the imgzoom subdirectory. It weighs in at less than 250 lines of code.

    On a related note, if you compile the Qt 4.6 branch for Symbian, you will find that some of my demos previously shown here on Qt Labs, among others ray casting (which should now also work on touch devices), maps, flight tracking, weather info, kinetic scrolling and flipping clock, have "graduated" to become real examples, and can even be accessed from the infamous Fluid Launcher. What does it mean? If you haven’t tried them yet because you didn’t want to mess up your system with all the build procedures, just grab the (daily) Qt 4.6 for Symbian binary package and you can enjoy the examples. No more excuses :)

    Posted by Ariya Hidayat
     in WebKit, Graphics Dojo
     on Tuesday, August 25, 2009 @ 15:00

    Influenced by Holger and Simon, I decided to use S5 for my presentation at the Desktop Summit in Gran Canaria a few weeks ago. It’s purely based on web technologies and works in modern web browsers. More information is available at the official website, including demos, even with different styles and effects. Thanks to the use of web technologies, you can even create the slides online.


    For the fun of it, instead of just using a web browser (so boring!), I wrote a simple QtWebKit-based tool to run the slide shows, dubbed s5runner. The code is already checked in to the Graphics Dojo repository. Besides just launching the slide shows in a QWebView, the 200-line C++ code adds a few more goodies (although arguably, all this extra stuff can be implemented in pure HTML/CSS/JavaScript instead). Run the program and open the included slides.html (in the example sub-directory), or just enjoy the following 2-minute screencast:

    A countdown timer (currently hard-coded to 30 minutes) is installed at the bottom right corner. The screen can be blanked temporarily to black or white, useful when you want to steal the focus of the audience from the slides. Going full-screen (and back again) is also easy, this is important if you’d like to show some live demos during the talk. The slides look ugly due to the aging or faulty projector? Use the night-mode, something you have seen in the previous OpenStreetMap example.

    When doing a talk about programming, it is often unavoidable to show code snippets. Thanks to Chili, the jQuery code highlighter plugin (there are other alternatives to pick: prettify, syntaxhighlighter, and many others), you will get the highlighting feature with zero effort. It’s quite useful as the code fragment (which you likely show only for a few seconds) becomes more understandable. My favorite, however, is the live-editing feature: just press F3 to start editing the slide while you are showing it.

    If you like this presentation tool, feel free to extend it. For example, you can have more presentation effects, like pulsating or shaking, by using script.aculo.us-based Presentacular. This example tool only supports basic editing, but I have shown a WYSIWYG HTML Editor before, so you can augment its editing features to support e.g. inserting images from the disk, changing character and paragraph styles, and so on. And of course, support for PDF export (with one slide per page) would be very nice. A PowerPoint-killer, anyone?



    © 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
    All other trademarks are property of their respective owners.