Eskil Abrahamsen Blomfeldt
 in Qt, Painting, OpenGL, Performance, C++
 on Monday, March 01, 2010 @ 18:36

Albert Einstein has been quoted as saying that “insanity is doing the same thing over and over again and expecting a different result.” Apparently this is a misquote, and the original quote actually belongs to Rita Mae Brown, but that’s not important right now. What’s important is that most Qt applications are crazy.

Background
I’ll explain. Some readers may remember Gunnar’s excellent blog series on graphics performance and how to get the most out of it in Qt. He mentioned a few times that text rendering in Qt is slower than we’d like.

To see why text rendering is so slow, we need to look at what happens when you pass a QString into QPainter::drawText() and ask it to display it on screen. A QString is just an array of integer values which are defined to signify specific symbols in specific writing systems. How these symbols should actually look on the screen is defined by the font you have selected on your painter.

So the first step of drawText() is to take the code points and turn them into index values which reference an internal table in the font. The indices are specific to each font, and have no meaning outside the context of the current font.
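To make this first step concrete, here is a minimal C++ sketch that models a font's character map as a plain std::map. ToyFont, toGlyphIndices() and the table contents are hypothetical and have nothing to do with Qt's internal font engines. The sketch only shows that the resulting indices are meaningful solely for the font that produced them. Index 0 is used here for characters the font cannot display.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified font: maps Unicode code points to
// font-specific glyph indices. Index 0 stands for "missing glyph".
struct ToyFont {
    std::map<char32_t, std::uint16_t> charMap;

    std::uint16_t glyphIndex(char32_t codePoint) const {
        auto it = charMap.find(codePoint);
        return it != charMap.end() ? it->second : std::uint16_t{0};
    }
};

// Step 1: turn a string of code points into glyph indices for this font.
std::vector<std::uint16_t> toGlyphIndices(const ToyFont &font, const std::u32string &text) {
    std::vector<std::uint16_t> glyphs;
    glyphs.reserve(text.size());
    for (char32_t cp : text)
        glyphs.push_back(font.glyphIndex(cp));
    return glyphs;
}
```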

The second step of drawText() is to collect data from the font which describes how each glyph should be positioned in relation to the surrounding glyphs. This step, positioning each glyph, is potentially very complex. Several different tables in the font file need to be consulted, with programs and instructions that do things like kerning (allowing parts of certain glyphs to “hang over” or “stretch underneath” other glyphs) and placing one or more diacritical marks on the same character. Some writing systems also allow complex reordering of glyphs based on the surrounding characters, as explained by Simon in his blog from 2007. This complex shaping of the text is currently handled by the Harfbuzz library in Qt.
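The core of the positioning step can be sketched as follows. A pen position advances by each glyph's advance width, and a pair adjustment (kerning) is applied wherever the font defines one. This is a deliberately simplified model under stated assumptions:

- ToyMetrics and positionGlyphs() are hypothetical.
- Positions are horizontal only.
- Everything HarfBuzz does beyond simple pair kerning (mark placement, reordering, contextual substitution) is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using GlyphIndex = std::uint16_t;

// Hypothetical per-font metrics: horizontal advance of each glyph and a
// table of pair adjustments (kerning), both in pixels.
struct ToyMetrics {
    std::map<GlyphIndex, double> advance;
    std::map<std::pair<GlyphIndex, GlyphIndex>, double> kerning;
};

// Step 2, greatly simplified: compute the x position of each glyph on a line.
std::vector<double> positionGlyphs(const ToyMetrics &m, const std::vector<GlyphIndex> &glyphs) {
    std::vector<double> xs;
    xs.reserve(glyphs.size());
    double x = 0.0;
    for (std::size_t i = 0; i < glyphs.size(); ++i) {
        if (i > 0) {
            auto k = m.kerning.find({glyphs[i - 1], glyphs[i]});
            if (k != m.kerning.end())
                x += k->second; // negative values pull the pair closer together
        }
        xs.push_back(x);
        auto a = m.advance.find(glyphs[i]);
        x += (a != m.advance.end()) ? a->second : 0.0; // move the pen past this glyph
    }
    return xs;
}
```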

The third step applies only if the text has a layout applied to it. The layout would be the part which breaks text into nicely formatted lines. In Qt, this could be based on HTML code, using QTextDocument or WebKit, or it could be a simpler layout, just making the text wrap and align within a bounding rectangle. The former isn’t supported by QPainter::drawText(), so I’ll focus on the latter. Using information from the shaping step, the text layout calculates the width of unbreakable portions of the text and tries to format the text in a way which looks nice on screen but which does not expand beyond the bounds set by the user.
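The width-constrained layout described above can be approximated with a greedy first-fit line breaker. The sketch assumes the text has already been split into unbreakable words with measured widths. Word and breakLines() are hypothetical names. Qt's own layout code also handles alignment, bidirectional text and many other details that are omitted here.

```cpp
#include <string>
#include <vector>

// Hypothetical input: an unbreakable run of text and its measured width.
struct Word {
    std::string text;
    double width;
};

// Greedy line breaking: put as many words on a line as fit within maxWidth.
// A word wider than maxWidth still gets a line of its own.
std::vector<std::vector<std::string>> breakLines(const std::vector<Word> &words,
                                                 double spaceWidth, double maxWidth) {
    std::vector<std::vector<std::string>> lines;
    std::vector<std::string> current;
    double used = 0.0;
    for (const Word &w : words) {
        double needed = current.empty() ? w.width : used + spaceWidth + w.width;
        if (!current.empty() && needed > maxWidth) {
            lines.push_back(current); // line is full, start a new one
            current.clear();
            needed = w.width;
        }
        current.push_back(w.text);
        used = needed;
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```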

In the fourth and final step, the paint engine takes over. Its job is to draw the glyphs retrieved in the first step at the positions calculated in the second and third steps. In most of Qt’s performance-sensitive paint engines, this is done by caching a pixmap representation of each glyph the first time it is drawn, and then just redrawing this pixmap on every subsequent call. This is potentially very quick.
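A glyph cache of the kind described above can be sketched as a map from glyph index to a rasterized image, filled on first use. GlyphCache, GlyphImage and drawGlyphs() are hypothetical. GlyphImage is an empty stand-in for real pixel data. The counter exists only to show that each distinct glyph is rasterized once, no matter how often it is drawn.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using GlyphIndex = std::uint16_t;

// Stand-in for a rasterized glyph; a real engine would hold pixels or a texture region.
struct GlyphImage {
    GlyphIndex glyph;
};

// Rasterize a glyph the first time it is requested, then reuse the cached image.
class GlyphCache {
public:
    const GlyphImage &get(GlyphIndex g) {
        auto it = cache_.find(g);
        if (it == cache_.end()) {
            ++rasterizations_; // the expensive part, done once per distinct glyph
            it = cache_.emplace(g, GlyphImage{g}).first;
        }
        return it->second;
    }
    int rasterizations() const { return rasterizations_; }

private:
    std::unordered_map<GlyphIndex, GlyphImage> cache_;
    int rasterizations_ = 0;
};

// Step 4: "draw" each glyph from the cache; returns the number of blits.
int drawGlyphs(GlyphCache &cache, const std::vector<GlyphIndex> &glyphs) {
    int blits = 0;
    for (GlyphIndex g : glyphs) {
        const GlyphImage &img = cache.get(g);
        (void)img; // a real engine would blit img at the glyph's position here
        ++blits;
    }
    return blits;
}
```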

While these four steps may be slightly intertwined in Qt today, this is in principle what happens every single time you call drawText() and pass in a QString and a bounding QRect. Yet in very many cases, the text, the font and the rectangle all remain completely static for the lifetime of your application, or at least for the bulk of it. And this is the insane part: a lot of time is wasted redoing the same work. Qt already provides QTextLayout as a way to cache the results of the first three steps and push them directly into the paint engine. However, QTextLayout is somewhat complicated to use and has overheads related to its other use cases. It also stores a lot more information than is needed just to put the glyphs on the screen, which makes it unsatisfactory in very memory-sensitive settings.
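The difference between redoing steps 1 to 3 on every call and doing them once can be illustrated with a small counting sketch. layoutText() stands in for the expensive steps; here it just assigns a fixed 10-pixel advance per character. CachedText is a hypothetical class that shows the general idea of keeping the result around. It is not how QTextLayout or QStaticText are implemented.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts how often the expensive layout work runs.
struct LayoutCounter {
    int runs = 0;
};

// Stand-in for steps 1-3: one x position per character (fixed 10px advance, illustration only).
std::vector<double> layoutText(const std::string &text, LayoutCounter &counter) {
    ++counter.runs;
    std::vector<double> xs;
    for (std::size_t i = 0; i < text.size(); ++i)
        xs.push_back(10.0 * static_cast<double>(i));
    return xs;
}

// Per-call strategy: the whole pipeline runs on every paint.
std::size_t drawTextEveryTime(const std::string &text, LayoutCounter &counter) {
    return layoutText(text, counter).size();
}

// Hypothetical cached item: lays out once, then reuses the positions on every draw.
class CachedText {
public:
    CachedText(std::string text, LayoutCounter &counter)
        : text_(std::move(text)), positions_(layoutText(text_, counter)) {}
    // Would hand the stored glyphs and positions straight to the paint engine.
    std::size_t draw() const { return positions_.size(); }

private:
    std::string text_;               // declared first, so it is initialized first
    std::vector<double> positions_;
};
```

Over 100 frames, the per-call strategy runs the layout 100 times, while the cached object runs it once.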

QStaticText!
We decided there was a need for a specialized class to solve this problem. We named it QStaticText, and it will be available in Qt 4.7. QStaticText has been optimized specifically for redrawing text which does not change from one paint event to the next. We’ve tried to keep the memory footprint to a minimum. Currently it has an overhead of approximately 14 bytes per glyph, which includes the 2 bytes per Unicode character in the string that would presumably already be part of the application. On top of that, there are about 200 bytes of constant overhead.
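As a back-of-the-envelope check, the sketch below turns these figures into a formula. It assumes one glyph per character, which holds for simple Latin text but not for every writing system. It only reuses the approximate numbers quoted above. For the 50-character string used in the benchmarks below, this gives roughly 50 × 14 + 200 = 900 bytes, of which 100 bytes are the string itself.

```cpp
#include <cstddef>

// Rough estimate from the approximate figures quoted above:
// ~14 bytes per glyph plus ~200 bytes of constant overhead.
constexpr std::size_t approxStaticTextBytes(std::size_t glyphCount) {
    return glyphCount * 14 + 200;
}

static_assert(approxStaticTextBytes(50) == 900, "50 glyphs -> about 900 bytes");
```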

In the rest of this blog, I’ll show some graphs to illustrate the benefits of using QStaticText for drawing text. QStaticText is supported by the raster engine (the software renderer used by default on Windows), the OpenGL engine and the OpenVG engine. For now, I’ll focus on the raster engine and the OpenGL engine, on the following platforms: Windows/desktop, Linux/desktop and the N900 (also running Linux, of course). Note that the hardware on the Windows and Linux machines is different, so the results are not comparable from platform to platform.

Benchmarks for fifty-character, single-line text
The benchmark I’m running is this: drawing the same 50 character string over and over again in each paint event and measuring how many “glyphs per second” we can achieve using different techniques to draw the text. I am testing the following text drawing mechanisms:

  • A call to QPainter::drawText() with no bounding rectangle.
  • A call to QPainter::drawStaticText() with no bounding rectangle.
  • Caching the entire string in a pixmap before-hand and drawing this in each paint event using QPainter::drawPixmap().
  • When testing on the OpenGL paint engine, the graph will also contain results for QStaticText with the performance hint QStaticText::AggressiveCaching. This is a hint to the paint engine that it is allowed to cache its own data, trading some memory for speed. It is currently used by the OpenGL engine to cache the vertex and texture coordinate arrays that are passed to the GPU when drawing the glyphs.

    On Windows
    Let’s start off with the results for the raster engine on Windows. As I said, the measurement is in “glyphs per second”, i.e. the number of symbols we can put on the screen during one second of running the test. The measurement is based on the frame rate of the test, averaged over nine seconds of execution per test case. Note that ClearType rendering was turned off in the OS during the test. The difference between a drawPixmap() result and a drawStaticText() result would be larger with ClearType turned on. However, ClearType is not generally supported when caching the text in a pixmap: the pixmap will inevitably need a transparent background, and you can’t do subpixel antialiasing on top of a transparent background. Therefore all the benchmarks are run without subpixel antialiasing, to give a fairer comparison.
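For readers who want to reproduce the metric, the conversion from a measured frame count to glyphs per second can be written as below. The text does not state how many times the string is drawn per frame, so drawsPerFrame is an assumed parameter. glyphsPerSecond() is a hypothetical helper, not part of the benchmark code.

```cpp
// Glyphs per second from a frame count measured over a time window.
// drawsPerFrame is an assumption: the text only says the string is drawn
// "over and over again in each paint event".
double glyphsPerSecond(long frames, double seconds, int drawsPerFrame, int glyphsPerString) {
    double framesPerSecond = static_cast<double>(frames) / seconds;
    return framesPerSecond * drawsPerFrame * glyphsPerString;
}
```

As a purely hypothetical example, 900 frames over nine seconds, with 10 draws of a 50-glyph string per frame, would give 100 × 10 × 50 = 50,000 glyphs per second.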

    (Chart: glyphs per second, raster engine on Windows)

    As you can see, the fastest way to draw text is to cache it in a pixmap and draw this, as pixmap drawing is extremely fast on modern hardware. However, in many circumstances you don’t have the memory to spare for this kind of extravagance, and drawStaticText() pushes over half as many glyphs per second as the equivalent drawPixmap() call. It is also three times faster than a regular drawText() call.

    Using the OpenGL paint engine instead, performance of drawPixmap() shoots through the roof:

    (Chart: glyphs per second, OpenGL engine on Windows)

    The other bars look small in comparison, but drawStaticText() with the aggressive caching performance hint in fact pushes out 5.6 million glyphs per second in this benchmark, while a regular drawText() call manages a measly fifth of that.

    On Linux
    Similar numbers occur on Linux:

    (Chart: glyphs per second, raster engine on Linux)

    Using drawStaticText() gives you more than a 2x performance boost over drawText(), and drawPixmap() runs at a little less than 1.5 times the speed of drawStaticText(). When using the OpenGL engine, the difference is smaller:

    (Chart: glyphs per second, OpenGL engine on Linux)

    As you can see, drawing a cached pixmap on Linux desktop is only slightly faster than drawing the static text item when aggressive caching is used. The hardware and the driver both play a part here, but at the very least we can see that both outperform drawText() by a factor of seven or eight.

    On N900
    All the benchmarks so far have been on the desktop, where memory is cheap. Caching a few text items as pixmaps may be just the proverbial drop in the ocean on those platforms, and as we have seen, pixmap caching has the potential to be really fast. On an embedded device, however, we need to be a bit more careful when allocating big chunks of memory. Something like QStaticText, which is both lean and fast, can therefore be a great tool on these platforms. So let’s look at a few benchmarks for the N900 as well.

    For the raster engine on the N900, the drawText() baseline performance is currently nothing short of horrible, as you can see from the following chart:

    (Chart: glyphs per second, raster engine on the N900)

    This is of course a puzzle which will be investigated more closely, as there’s no reason why drawText() should be this much slower. For now, we recommend using the native engine or a QGLWidget viewport on this device. At least it makes the other bars look really large in comparison. A more interesting result is that drawStaticText() can push as much as two thirds of the glyphs per second achieved by drawing a single pixmap that covers the same area. That is a pretty good performance ratio on this device.

    As we see from the following chart, similar numbers can be achieved when using the OpenGL engine:

    (Chart: glyphs per second, OpenGL engine on the N900)

    Conclusion
    The benchmark results displayed here are for a single line of text. The third step from the overview at the beginning of this blog, where the text is formatted according to a layout, is therefore not needed, and drawText() can skip it. For text which does require layout, drawText() will perform even worse. drawStaticText() and drawPixmap() should perform approximately the same, since their layout has already been done in advance. Another thing to note is that the text in this benchmark is fairly long and fairly dense. Shorter texts, and text with more empty space (as a multi-line string might have), change the picture. In those cases drawStaticText() may very well outperform drawing a pixmap, since the number of pixels touched becomes a bigger factor in the equation.

    One interesting measurement which is not included here is the CPU load of the different functions. We don’t have any formal benchmarks for that at the moment. However, less time is spent on CPU-intensive work when using drawStaticText() instead of drawText(), so the CPU will have more free time for other work, which is a good thing. Another pleasant discovery we made while benchmarking QStaticText on the N900 concerns the number of draw calls. You have to increase it to a pretty high number per frame before the calls visibly factor into the time spent in the paint event. This means that even with, say, fifty strings, the drawStaticText() calls should not have any considerable impact on the performance of the application. Swapping the front and back buffers will still be the main bottleneck, which is the ideal situation.

    So the bottom line is: If you are using drawText() in your application to draw text that is never or very rarely updated, then you might consider using QStaticText instead when you start building against Qt 4.7, and we’d love to hear what you think about the API and the performance once you get a chance to try it out.

    Donald Carr
    Posted by Donald Carr
     in OpenGL, Embedded, Build system
     on Friday, February 05, 2010 @ 10:49

    Requirements

    1) Tegra 2 platform
    2) The latest Nvidia Tegra2 SDK (11.0074_devlite_eula_Beta-RC.zip at this time)

    Board bring up

    Nvidia have done a pretty good job of documenting board bring-up and I will not
    paraphrase them. I personally used their dev environment exactly as
    intended (answering every question posed during installation in the
    affirmative), so my dev machine became a DHCP/NFS server serving out on a
    secondary network interface. Be sure to select an X-enabled SDK during installation, as we
    currently don’t work out of the box with their OpenKode drivers.

    Their supplied documentation (in chm format) is thorough and documents
    flashing the latest bootloader to the device (amongst other things), which I would
    recommend doing on every update to ensure targetfs/bootloader compatibility.

    Initial rootfs adjustment/configuration

    Once you have installed both packages included in
    11.0074_devlite_eula_Beta-RC.zip you should have 2 dirs:

    emPower-devlite-p1138
    toolchains

    Go into:

    ./emPower-devlite-p1138

    cp -r include/* targetfs/usr/include
    cp -r lib-target/* targetfs/usr/lib

    Your targetfs is now primed for GL compilation. You will need to boot your
    target before proceeding, though, as on first boot a host of packages
    are installed into the targetfs, including all the X11 headers required to
    compile Qt/X11.

    Additional headers (dbus, glib, freetype, gstreamer) might be
    required for a full-featured Qt build, but the packages installed on first
    boot will suffice for an OpenGL ES 2-enabled Qt/X11 build with all of Qt’s
    core functionality.

    Go to town with apt-get according to your needs
    (unfortunately, apt-get build-dep libqt4-gui fails due to unmet
    dependencies), and be aware that the chm documentation
    covers forwarding network traffic between your Tegra2 target and the outside
    world via your host machine.

    Once this is done, we are ready to build Qt.

    Configuring build environment

    I use Scratchbox 2 for reasons qualified in the appendix.

    1) Change to your targetfs directory (cd $NV_ROOT/emPower-devlite-p1138/targetfs)

    2) Run:

    sb2-init -c /usr/bin/qemu-arm $SB2-TARGETNAME $NV_ROOT/toolchains/tegra2-4.3.2-nv/bin/arm-none-linux-gnueabi-gcc

    within this directory, where:

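    $SB2-TARGETNAME is a placeholder for a suitable name for your target’s scratchbox environment (type the name in directly; a hyphenated name like this is not a valid shell variable)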
    $NV_ROOT is where you unpacked the archive

    You now have a sane scratchbox 2 environment in which you can compile Qt/X11. If you
    were to enter the scratchbox environment and build Qt now, the build would get through
    every module until QtOpenGL was reached, at which point you would
    witness spaghetti breakage referencing the inclusion of the qdebug/text
    streaming classes.

    This is because

    targetfs/usr/include/EGL/eglplatform.h

    includes several X11 headers:

    Xlib.h
    Xutil.h

    as it explicitly references native X11 types. I initially tried to move
    these includes out of this header file, but the path of least resistance
    turned out to be undefining the conflicting macros at the end of
    eglplatform.h.

    I introduced the following 4 lines:

    #undef None
    #undef Status
    #undef Unsorted
    #undef GrayScale

    immediately prior to the final #endif in this file. I know this is dirty,
    but it circumvents the point of breakage and resolves all remaining build failures.
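To see why undefining these macros helps, the self-contained sketch below mimics the relevant X11 defines (None, Status, Unsorted and GrayScale, from X.h and Xlib.h). It then declares a hypothetical enum that uses the same names. Qt has identifiers with some of these names too, for example QTextStream::Status and QDir::Unsorted. Without the #undef lines, the preprocessor would rewrite the enumerators into tokens such as 0L and int, and the code would not compile. The sketch only illustrates the mechanism. The macro values are meant to mirror X11 but should not be relied upon.

```cpp
// Stand-ins for macros that X11's headers define. They are reproduced here
// only to demonstrate the clash; do not copy them into real code.
#define None 0L
#define Status int
#define Unsorted 0
#define GrayScale 1

// The workaround: drop the macros once the EGL header no longer needs them.
#undef None
#undef Status
#undef Unsorted
#undef GrayScale

// Hypothetical enum using the same names. With the macros still defined it
// would expand to "enum class Mode { 0L, int, 0, 1 };" and fail to compile.
enum class Mode { None, Status, Unsorted, GrayScale };

int modeValue(Mode m) { return static_cast<int>(m); }
```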

    Having done this, we are ready to build Qt/X11.

    Building Qt

    1) Enter scratchbox 2 with: sb2 -t $SB2-TARGETNAME
    2) Enter your Qt directory
    3) Configure Qt with (at a minimum):
    ./configure -xplatform linux-g++ -platform linux-host-g++ -opengl es2 -force-pkg-config ..
    (The utilized mkspecs are discussed in the Appendix)
    4) Check the output of configure to verify OpenGL ES2 (and any other
    functionality you wish to build) has been correctly detected and enabled by
    the configure tests.
    5) run “make” (Qt should build through to completion)
    6) “make install” Qt to its prefix path on the host (if necessary)
    7) Use Qt to compile any appropriate Qt applications within the
    scratchbox env
    8) Deploy Qt to its prefix path (on the target) with any desired
    applications

    When you boot your Tegra2 platform, start X and launch your
    OpenGL ES2-enabled Qt application. This can be GLES2 content drawn directly
    in a QGLWidget à la hellogl_es2, a QGraphicsView fronted by a QGLWidget,
    or explicit use of the OpenGL ES2 graphics system. Everything
    should simply work, more or less, and at a good pace
    too.

    We have not done any profiling of Qt on the Tegra2
    hardware to quantify where we are today, nor any
    dedicated integration work to maximize our use of the
    underlying hardware, but the baseline performance is very solid.

    Appendix

    mkspec information

    linux-g++

    The generic Linux X11 mkspec, which behind the scenes in scratchbox
    is mapped to the cross compiler specified when the environment was created.

    linux-host-g++

    A modification of the generic Linux X11 mkspec which maps to the host machine’s compiler.
    This mkspec is already present in the Qt maemo5 branch in our git repository. It is
    basically linux-g++ with “host-” prefixed to the compiler names in all the compiler
    variables (which are included from ../common/g++.conf). The only
    noteworthy thing about this mkspec is that instead of including
    “../common/g++.conf” and modifying select variables accordingly, the
    complete include is inlined. This is due to a known issue where overriding
    QMAKE_CXX variables is not respected during the qmake bootstrapping
    process.

    The linux-host-g++ mkspec looks roughly like this:

    =============================================

    #
    # qmake configuration for linux-g++
    #

    MAKEFILE_GENERATOR = UNIX
    TEMPLATE = app
    CONFIG += qt warn_on release incremental link_prl
    QT += core gui
    QMAKE_INCREMENTAL_STYLE = sublib

    #
    # qmake configuration for common gcc
    #

    #inlined ../common/g++.conf follows

    QMAKE_CC = host-gcc
    QMAKE_CFLAGS += -pipe
    QMAKE_CFLAGS_DEPS += -M
    QMAKE_CFLAGS_WARN_ON += -Wall -W
    QMAKE_CFLAGS_WARN_OFF += -w
    QMAKE_CFLAGS_RELEASE += -O2
    QMAKE_CFLAGS_DEBUG += -g
    QMAKE_CFLAGS_SHLIB += -fPIC
    QMAKE_CFLAGS_STATIC_LIB += -fPIC
    QMAKE_CFLAGS_YACC += -Wno-unused -Wno-parentheses
    QMAKE_CFLAGS_HIDESYMS += -fvisibility=hidden
    QMAKE_CFLAGS_PRECOMPILE += -x c-header -c ${QMAKE_PCH_INPUT} -o ${QMAKE_PCH_OUTPUT}
    QMAKE_CFLAGS_USE_PRECOMPILE += -include ${QMAKE_PCH_OUTPUT_BASE}

    QMAKE_CXX = host-g++
    QMAKE_CXXFLAGS += $$QMAKE_CFLAGS
    QMAKE_CXXFLAGS_DEPS += $$QMAKE_CFLAGS_DEPS
    QMAKE_CXXFLAGS_WARN_ON += $$QMAKE_CFLAGS_WARN_ON
    QMAKE_CXXFLAGS_WARN_OFF += $$QMAKE_CFLAGS_WARN_OFF
    QMAKE_CXXFLAGS_RELEASE += $$QMAKE_CFLAGS_RELEASE
    QMAKE_CXXFLAGS_DEBUG += $$QMAKE_CFLAGS_DEBUG
    QMAKE_CXXFLAGS_SHLIB += $$QMAKE_CFLAGS_SHLIB
    QMAKE_CXXFLAGS_STATIC_LIB += $$QMAKE_CFLAGS_STATIC_LIB
    QMAKE_CXXFLAGS_YACC += $$QMAKE_CFLAGS_YACC
    QMAKE_CXXFLAGS_HIDESYMS += $$QMAKE_CFLAGS_HIDESYMS -fvisibility-inlines-hidden
    QMAKE_CXXFLAGS_PRECOMPILE += -x c++-header -c ${QMAKE_PCH_INPUT} -o ${QMAKE_PCH_OUTPUT}
    QMAKE_CXXFLAGS_USE_PRECOMPILE = $$QMAKE_CFLAGS_USE_PRECOMPILE

    QMAKE_LINK = host-g++
    QMAKE_LINK_SHLIB = host-g++
    QMAKE_LINK_C = host-gcc
    QMAKE_LINK_C_SHLIB = host-gcc
    QMAKE_LFLAGS +=
    QMAKE_LFLAGS_RELEASE += -Wl,-O1
    QMAKE_LFLAGS_DEBUG +=
    QMAKE_LFLAGS_APP +=
    QMAKE_LFLAGS_SHLIB += -shared
    QMAKE_LFLAGS_PLUGIN += $$QMAKE_LFLAGS_SHLIB
    QMAKE_LFLAGS_SONAME += -Wl,-soname,
    QMAKE_LFLAGS_THREAD +=
    QMAKE_LFLAGS_NOUNDEF += -Wl,--no-undefined
    QMAKE_RPATH = -Wl,-rpath,

    QMAKE_PCH_OUTPUT_EXT = .gch

    # -Bsymbolic-functions (ld) support
    QMAKE_LFLAGS_BSYMBOLIC_FUNC = -Wl,-Bsymbolic-functions
    QMAKE_LFLAGS_DYNAMIC_LIST = -Wl,--dynamic-list,

    include(../common/linux.conf)

    =============================================

    Scratchbox

    I personally use Scratchbox 2 rather than Scratchbox 1 since I use a 64 bit
    distro, and Scratchbox 2 exists in the Ubuntu repositories.

    http://packages.ubuntu.com/search?keywords=scratchbox&searchon=names&suite=karmic&section=all

    You might get equally good mileage with Scratchbox 1, or entirely without
    Scratchbox. I personally opt for the path of least resistance, and this blog is a cart
    following that path.

    Arguments in favour of Scratchbox

    1) pkg-config in Ubuntu 9.10 does not support prefixes (sysroot) correctly,
    so you have to directly modify the .pc files or build pkg-config yourself
    2) The Nvidia targetfs/usr/lib entries often have fully qualified (absolute) symlinks which link
    into your host machine’s libs without a chroot safety net
    3) There were some toolchain/targetfs anomalies which simply vanished when adopting this build approach

    Known issues

    1) The aforementioned qmake bootstrapping issue preventing QMAKE_CXX from being overridden in linux-host-g++
    2) 16 bit X is the only environment tested and known to work. Qt GLES2 applications currently segfault
    on launch under a 24 bit X session. I am busy investigating this issue. Please ensure that your xorg.conf file is using
    16 bit as the default/selected color depth if you intend to run Qt apps.
    3) libEGL.so prints a spurious error message:

    “Couldn’t load implementation for OpenGL_ES”

    due to the absence of a libGLES(1).so. It appears this message can safely be ignored.

    gunnar
    Posted by gunnar
     in Painting, Graphics Dojo, OpenGL
     on Monday, January 18, 2010 @ 10:00


    In my previous post, The Cost of Convenience, we saw quite clearly that text drawing was a major bottleneck. Text drawing is quite common in GUI applications, though, so we need a solution for that. What happens behind QPainter::drawText() can be broken down into two distinct parts. The first part is converting the string into a set of positioned glyphs. This is often referred to as “text layout”, because it positions the glyphs, wraps the text and adjusts for alignment. The second part is passing the glyphs to the paint engine to be rendered. When the text is the same all the time, the first part could be done once and the glyphs and positions simply reused.

    We have a class in Qt which allows you to cache the first part and only do the drawing for each frame. The class is QTextLayout. This is a low-level class, throwing asserts at you for the most trivial of mistakes. It also comes with a really inconvenient API, but it does reduce the most costly step of text drawing, which is the layout part. It is also only fair to mention that QTextLayout uses a lot more memory than just the glyph-array and positions array, as one could expect, so in a memory constrained setup, it should be used with caution. In 4.7, we plan to introduce an API for static text, which takes care of all the layout and stores only the required parts, reducing the overall memory footprint, but for now, QTextLayout is how you do it.

    Going back to my virtual keyboard (the source code has been updated), I’ve changed the “-buttonview” example to make use of QTextLayout. In the constructor, I build the layout:

        ButtonView() {
            // Build one string holding every key label, each followed by a
            // line separator so that every character can get its own QTextLine.
            QString content;
            for (int i = '0'; i <= 'Z'; ++i) {
                content += QLatin1Char(i);
                content += QChar(QChar::LineSeparator);
            }
            m_layout = new QTextLayout(content, font());
            QFontMetricsF fm(font());
            m_layout->beginLayout();
            // One line per character, positioned in a 10-column grid of 32x32 cells.
            for (int i = 0; i < content.size() / 2; ++i) {
                QTextLine line = m_layout->createLine();
                line.setNumColumns(1);
                int x = i % 10;
                int y = i / 10;
                QSizeF s = fm.boundingRect(content.at(i * 2)).size();
                line.setPosition(QPointF(x * 32, y * 32) + QPointF(16 + s.width() / 2, 16 + s.height() / 2));
            }
            m_layout->endLayout();
            // Keep the laid-out glyphs around between paint calls.
            m_layout->setCacheEnabled(true);
        }
    

    If you look at the source code, there is more going on in the constructor than I show above, because I have extracted only the parts relevant to text layout. What we do is build a string of the characters, inserting a LineSeparator after each character. Without this, I wouldn’t be able to split the text into multiple QTextLine objects. From the content string, I construct the layout. For each character, I find its position in the grid, create a QTextLine and move the line to that position. Each line is one column/character wide. Finally, I enable caching on the layout. This is the step where we start caching the laid-out text.
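The grid arithmetic in the constructor can be checked without Qt. The sketch below is a hypothetical, Qt-free mirror of that loop:

- '\n' stands in for QChar::LineSeparator.
- Cell and gridCells() are invented names.
- Only the top-left corner of each 32x32 cell is computed; the font-metric offset from the original code is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Top-left corner of the grid cell assigned to one character.
struct Cell {
    char ch;
    int x;
    int y;
};

std::vector<Cell> gridCells(char first, char last) {
    std::string content;
    for (int c = first; c <= last; ++c) {
        content += static_cast<char>(c);
        content += '\n'; // separator, so every character can become its own line
    }
    std::vector<Cell> cells;
    for (std::size_t i = 0; i < content.size() / 2; ++i) {
        int col = static_cast<int>(i) % 10; // 10 columns per row
        int row = static_cast<int>(i) / 10;
        cells.push_back({content[i * 2], col * 32, row * 32});
    }
    return cells;
}
```

Note that the range '0' to 'Z' also includes the seven punctuation characters between '9' and 'A' (':' through '@'), so it produces 43 cells.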

    When it comes to the paint method, the code is rather straightforward. All the text is contained inside a single layout object so I can just call its draw function.

        void paint(QPainter *p, const QStyleOptionGraphicsItem *, QWidget *) {
    
            // Draw background pixmaps...
    
            m_layout->draw(p, QPointF(0, 0));
        }

    Now, let’s have a look at what this gains us:

    (Chart: Text Layouts)

    The graph shows the number of milliseconds per frame including the blit. Measured on an N900 with composition disabled. Smaller is better!

    If we compare the “-no-indexing -optimize-flags” run with the “-no-indexing -optimize-flags -text-layout” run, we see a significant reduction in time per frame. Raster goes from 9.3 ms per frame down to 5.5 ms, and OpenGL drops from 16 ms per frame to 9.1 ms when using a text layout. A drop of about 4 ms is also visible with the X11 paint engine.
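Expressed as relative savings, those figures work out as follows. This is simple arithmetic on the numbers above, not an additional measurement.

```latex
\text{raster: } \frac{9.3 - 5.5}{9.3} \approx 0.41, \qquad
\text{OpenGL: } \frac{16 - 9.1}{16} \approx 0.43
```

In other words, the text layout removes roughly 40 percent of the frame time on both engines.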

    Needless to say, using the QTextLayout class introduces a huge benefit, but it requires a bit more setup to get there. In this implementation I merged all the text into a single object which also makes it impossible for me to move one item relative to the others, such as adding an offset when a button is pressed. I could have one QTextLayout for each item, which would have been roughly the same performance, but at a higher memory cost.

    Until next time, take care!

    PS: A small comment on the item cache / X11 numbers. The X11 connection is asynchronous, and Qt completes its job in about 2.7 ms per frame. Running with “-sync” on the command line, which makes all X calls synchronous, raises the time to about 10 ms per frame. Suppose I had put a QApplication::syncX() call into each frame, synchronizing once per frame, which is essentially what GL and VG are doing. I would then probably get a number somewhere in between. What this means is that the X11 numbers in this test are actually quite a bit worse than the graphs show.

    gunnar
    Posted by gunnar
     in Graphics View, Painting, Graphics Dojo, OpenGL
     on Monday, January 11, 2010 @ 09:25


    So, it’s time for my next post. Today’s topic is how convenience relates to performance, specifically in the context of QGraphicsView. My goal is to illustrate that the way to achieve fast graphics is to pack your QPainter draw calls as tightly together as possible. The more that happens in between, the slower it gets.

    To illustrate this, I’ve implemented a virtual keyboard. Granted, it’s not a very common layout, nor is it usable, but the rendering is the point here, not the functionality. The full source code is here, and it looks like this:

    (Image: the virtual keyboard)

    I’ve implemented the keyboard using three different approaches: one using proxy widgets, one using graphics items and one where the entire view is one graphics item. In addition, I added a number of options to tweak various properties, such as whether or not the text is drawn. I measured this on an N900 rather than a desktop because the differences become more pronounced on a small device. On the desktop it is easy to be fooled, because most things complete in a matter of microseconds anyway. It is only when the entire application comes together that one notices things are not as smooth as in the prototype. By then, so much work has been invested in the current design that one ends up missing out on the super-slick feel.

    QGraphicsProxyWidget

    Since we’re implementing a series of clickable buttons, a natural and convenient starting point is to use an existing button class, such as QPushButton. It already implements the logic for mouse/keyboard interaction and has signals for clicking and all sorts of other useful functionality. To get widgets into QGraphicsView, we use a QGraphicsProxyWidget. To make the test “fair”, I actually use a plain QWidget which just paints a pixmap and draws some text. Had I gone through the styling API, these numbers would have been even worse.

    (Chart: ProxyWidget Results)
    Milliseconds spent per frame including blit to screen when using QGraphicsProxyWidgets. Low is better!

    If we look at the plain “-proxywidgets” run, the fastest engine was the raster engine, at 26ms per frame. If I wanted to slide this keyboard onto the screen, I would have 16ms available per frame to run at 60 FPS and 33ms to run at 30 FPS. At 26ms per frame I can barely do 30 FPS, with only a little slack. If another process is soaking up CPU time, even that number is difficult to reach. So, not very good. (BTW, the exact numbers in the graphs are listed in a comment at the top of the .cpp file I linked above.)
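The frame budgets quoted above come from dividing one second by the target frame rate. The small sketch below makes the arithmetic explicit; frameBudgetMs() and slackMs() are hypothetical helper names. The exact budgets are about 16.7ms at 60 FPS and 33.3ms at 30 FPS. A 26ms frame therefore leaves roughly 7ms of slack at 30 FPS and overshoots the 60 FPS budget.

```cpp
// Time available per frame, in milliseconds, for a target frame rate.
constexpr double frameBudgetMs(double fps) { return 1000.0 / fps; }

// Milliseconds left over (negative if over budget) when a frame takes frameMs.
constexpr double slackMs(double fps, double frameMs) { return frameBudgetMs(fps) - frameMs; }
```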

    The first thing I noticed with this approach was that each button now had a gray background. This is of course the widget background: a QWidget embedded in QGraphicsView is treated as a top-level widget and will therefore draw its background. I added an option, “-no-widget-background”, which sets Qt::WA_NoBackground on the widget. This brings the rendering time with raster down to 22ms. Saving 4ms per frame just by setting a flag is not too bad, but it is still pretty far from awesome.

    I’ve mentioned before that text drawing is not as fast as we would like it to be. To compare how it looks without text, I added a “-no-text” option to the test. This brings the raster results down to 13ms. That is pretty nice and below the 16ms threshold required for 60 FPS, but only by a small margin. And I’m not drawing any text! Before I give up on this approach, I’ll enable item caching. By setting ItemCoordinateCache on each button, I cache both the background pixmap and the text in one single pixmap. This brings the raster results down to 8.5ms, and it’s starting to look acceptable, but at a very high memory cost. In my original use case I had one shared pixmap for all the button backgrounds, but now I have one per button.

    You may notice that there was a vast difference between item caching and the proxy widget drawing the pixmap. One thing that adds to the proxy widget cost is that a QPainter is recreated and initialized for each button in the button’s paint event. Also, as I mentioned in my previous post, An Overview, each widget has a system clip, and there is an overhead involved in calling paintEvent(). For items in QGraphicsView there is already a painter. I don’t need a clip, nor any of the other things that go on behind the scenes there. When we enable item coordinate caching, we never leave the graphics view world or enter the widget world. This crossing is expensive, so by not entering the widget world, we save a lot.

    So, if there is a lesson to be learned it is that QGraphicsProxyWidget should be used with extreme caution. If you really need it, use very few of them.

    QGraphicsWidget

    If proxy widgets are too slow to be usable in this scenario, the next best thing is to use QGraphicsWidget. This is a subclass of both QObject and QGraphicsItem, which gives me signals, slots and properties, but it’s not a QWidget and is therefore still fairly lightweight. The numbers are as follows:

    GraphicsWidgets Results
    Milliseconds spent per frame including blit to screen when using QGraphicsWidgets. Lower is better!

    Compared to the proxy widget approach we’re starting out quite a bit better, with raster at 13ms per frame, OpenGL at 20ms and X11 at 22ms. Below this is a new line: “-no-indexing -optimize-flags”. By default QGraphicsView puts all the items in a scene into a BSP tree for fast lookup. This is beneficial when the scene contains many items and you often need to find items that intersect with a small portion of the scene. In the testcase we’re always doing a full update, so there is no benefit from the index and it can be disabled by calling scene->setItemIndexMethod(QGraphicsScene::NoIndex). The BSP tree is the default behaviour because graphics view was initially intended for static scenes with many items. The most common use case today is a few items (a few hundred at most) which tend to move a lot. For this reason, it is always a good idea to try disabling the BSP tree and see if it makes a difference in performance. If it helps, leave it off.

    I also know that the items play nice, meaning that they don’t change the clip, translate the painter, change the composition mode or modify any other state that would propagate to other items. This means I can safely set the DontSavePainterState optimization flag. Actually, out of old habit, I set all possible optimization flags. I only consider unsetting them if my drawing code starts to look weird, at which point I would rather fix the drawing code and keep the flags set. Disabling indexing and enabling the optimization flags shaves 2ms per frame off all rendering backends, so that is definitely worth it.

    Without text, rendering is about twice as fast. Again we see that text drawing is a huge cost. We’re working on an API to fix this and we’ll have more information for you when we do. You may notice that enabling item caching makes performance slightly worse than in the “-no-text” case. There isn’t much overhead inside QGraphicsView for this path. A likely reason for the decrease is that reading from multiple memory sources (one pixmap per item) results in a lot of cache misses, compared to the straight approach which draws the same pixmap over and over.

    ButtonView Item

    In my previous post I briefly mentioned that there is a slight overhead involved with the use of a QGraphicsItem too. Prior to calling the paint function, the painter is transformed to the coordinate system of the item and the painter state is saved. If the item draws a big polygon, this setup cost can be ignored, but when drawing just a pixmap and a few pixels of text, then it may be worth considering. In the spirit of “The more direct the painting code is, the faster it gets”, I implemented the keyboard as a single item. The numbers are as follows:

    ButtonView Results
    Milliseconds per frame including blit to screen when using a single item. Lower is better!

    Raster is now down to 10ms, which is 1ms better than the QGraphicsWidget approach with all optimizations enabled, so even though graphics items are cheaper than widgets, they still cost a bit. The keyboard is now rendered in a tight loop, and the major difference in performance here is caused by the fact that items in the scene have a transform associated with them. Before paint() is called, a transform is set to match the painter to the item’s local coordinate system. This causes a state change in the paint engine. For each button we’re drawing a 32×32 pixmap, which means alpha blending 1024 pixels, followed by doing text layout and drawing a single character. Even so, we save about 10% of the time by not having a QPainter::translate() in the midst, so bear that in mind. Enabling the optimization flags and disabling the index makes raster drop a bit more, so those are still a good idea.

    You may have noticed that there is one dataset named “cheat” for OpenGL. I was reluctant to include this, because it’s using a private API that is not, and I really mean NOT, subject to binary compatibility rules. You cannot call this from your application. We’re going to add a public API for this in the future, hopefully in 4.7, so until it’s there, wait. In the interest of showing what we are thinking internally, I thought I would show it anyway.

    OpenGL is really great for accelerating graphics, but its way of working does not map optimally to how Qt works. GL is really good at taking a few large datasets of triangles and rendering them, but it’s not so good at drawing loads of small things, like button backgrounds, icons, single text items, etc. However, all the button backgrounds are the same pixmap, so what if I could tell QPainter to draw the same pixmap in multiple places at once? In GL this would correspond to setting up one texture plus one vertex and texture coordinate array and drawing some 40 pixmaps in one go. This fits much better with how GL is made to work. As a result, drawing the buttons drops from 5.2ms to 3.9ms, so another bit of juice squeezed out. Naturally, the more times the pixmap is drawn and the smaller the pixmap gets, the more benefit you get from batching commands like this.
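    To make the batching idea concrete, here is a minimal CPU-side sketch of building that single vertex and texture coordinate array. RectF, Vertex and batchPixmapQuads() are hypothetical names made up for this illustration (this is not the private API behind the “cheat” numbers), and it assumes the whole pixmap is mapped to texture coordinates 0 to 1. Each rectangle becomes two triangles, so the result could be drawn with one glDrawArrays(GL_TRIANGLES, 0, count) call instead of one call per button.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper types for illustration; these are not Qt classes.
struct RectF { float x, y, w, h; };

struct Vertex {
    float x, y;  // position in device coordinates
    float u, v;  // texture coordinate into the shared pixmap texture
};

// Builds ONE vertex array that draws the same texture into every rect.
// Each rect becomes two triangles (6 vertices), so all rects can be drawn
// with a single glDrawArrays(GL_TRIANGLES, 0, result.size()) call.
std::vector<Vertex> batchPixmapQuads(const std::vector<RectF> &rects)
{
    std::vector<Vertex> out;
    out.reserve(rects.size() * 6);
    for (const RectF &r : rects) {
        const float l = r.x, t = r.y, rt = r.x + r.w, b = r.y + r.h;
        // Triangle 1: top-left, top-right, bottom-right
        out.push_back({l,  t, 0.f, 0.f});
        out.push_back({rt, t, 1.f, 0.f});
        out.push_back({rt, b, 1.f, 1.f});
        // Triangle 2: top-left, bottom-right, bottom-left
        out.push_back({l,  t, 0.f, 0.f});
        out.push_back({rt, b, 1.f, 1.f});
        out.push_back({l,  b, 0.f, 1.f});
    }
    return out;
}
```

    For the keyboard, 40 buttons would give a single array of 240 vertices, uploaded and drawn once per frame.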

    There is a second option for OpenGL in the button view case, “-ordered”. This was added after Tom brought to my attention that the testcase would do a shader program update for each painter call. In the default buttonview implementation we do:

                        for (int i=0; i < m_rects.size(); ++i) {
                            p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                            p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
                        }
    

    Because pixmaps use one shader pipeline and text drawing uses another, the pipeline needs to be switched and reset all the time, and this version renders at 16ms per frame. To see if it makes a difference, I added an alternative rendering, “-ordered”, where I draw all the pixmaps first, then all the text:

                        for (int i=0; i < m_rects.size(); ++i)
                            p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                        for (int i=0; i < m_rects.size(); ++i)
                            p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
    

    This avoids the shader pipeline updates and brings the rendering time per frame down to 13ms, so it is definitely worth it.
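    The effect of the reordering can be modelled by simply counting how often the shader program changes during one frame. This is a hypothetical sketch (Program, countProgramSwitches() and buttonDraws() are illustration-only names) that assumes exactly one program for pixmaps and one for text, which simplifies what the engine really selects:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model: every draw call needs one of two shader programs.
enum class Program { Pixmap, Text };

// Counts how often the active program changes across a sequence of draws,
// including the very first bind.
std::size_t countProgramSwitches(const std::vector<Program> &draws)
{
    std::size_t switches = 0;
    bool haveCurrent = false;
    Program current = Program::Pixmap;
    for (Program p : draws) {
        if (!haveCurrent || p != current) {
            ++switches;
            current = p;
            haveCurrent = true;
        }
    }
    return switches;
}

// Builds the draw sequence for n buttons, either interleaved
// (pixmap, text, pixmap, text, ...) or ordered (all pixmaps, then all text).
std::vector<Program> buttonDraws(std::size_t n, bool ordered)
{
    std::vector<Program> draws;
    if (ordered) {
        draws.insert(draws.end(), n, Program::Pixmap);
        draws.insert(draws.end(), n, Program::Text);
    } else {
        for (std::size_t i = 0; i < n; ++i) {
            draws.push_back(Program::Pixmap);
            draws.push_back(Program::Text);
        }
    }
    return draws;
}
```

    With 40 buttons, the interleaved loop changes program 80 times per frame, while the ordered loop changes it only twice.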

    Summing Up

    Virtual Keyboard Combined Results
    Milliseconds per frame including blit to screen for proxy widgets, graphics widgets and a single widget. Lower is better!

    OpenGL comes out rather badly in this testcase, which I was a bit disappointed to see, but it did send Tom into an optimization frenzy, so we’re hoping to remove some of the constant overhead. It should also be said that when using the OpenGL graphics system, we enable multisampling by default, which increases rendering time on the N900 by around 30%. A plain QGLWidget would thus perform slightly better. Another aspect of OpenGL is that it uses a dedicated low-power chip, so even though it runs at half the speed for this particular use case, it also uses a lot less battery, so it may still be the right choice. OpenGL will also scale significantly better than raster and X11 as the pixmaps get bigger or if the content of the button is slightly more advanced, such as a horizontal gradient.

    The best numbers are definitely in the button view case, where all the content is rendered as one item, which is what I wanted to highlight with this blog. The button view item also opens up other optimizations such as batching. We don’t have many batching functions in QPainter today, only drawRects(), drawLines() and drawPoints(), but we’re considering adding more; we are just not sure how the APIs would look yet.

    The bottom line is still that how Qt is used defines how well it performs. On one hand there may be an easy and convenient way to get the job done which performs quite sub-optimally. On the other hand there may be a more involved implementation which performs very well. I’m not trying to suggest that you do one or the other; there are a lot of good reasons for picking either one. But I hope that I’ve illustrated that some features come at a cost, and that this cost is kept in mind, along with the target, when designs are evaluated and chosen.

    I’ll round off with a question. If you were to implement a particle effect when you press a button, which approach would you choose, having seen the numbers above?

    TomCooksey
    Painting
    Graphics Dojo
    OpenGL
    Performance
    Posted by TomCooksey
     in Painting, Graphics Dojo, OpenGL, Performance
     on Wednesday, January 06, 2010 @ 12:01

    Introduction

    Here’s the next instalment of the graphics performance blog series. We’ll begin by looking at some background about how OpenGL and QPainter work. We’ll then dive into how the two are married together in OpenGL 2 Paint Engine and finish off with some advice about how to get the best out of the engine. Enjoy!

    Why OpenGL?

    Before I dive into the OpenGL paint engine, I want to make sure we all understand the motivation for the OpenGL 2.0 paint engine. I’ve talked about this before in my article about hardware acceleration, but we still frequently get questions like “Why not implement a Direct2D paint engine?”.

    Everyone knows OpenGL means fast graphics right? Well, this is actually a bit of a misconception. What makes graphics fast is a bit of hardware dedicated to computer graphics called a GPU (Graphics Processing Unit). OpenGL 2.x is a software library which often (but not always) uses a particular class of GPU to help satisfy drawing operations (Note: OpenGL 1.x used a different class of GPU). A modern programmable GPU (e.g. nVidia GTX 295) can usually be programmed via OpenGL, Direct3D or OpenCL. The only differences then are that Direct3D is only available on the Windows platform and OpenCL is not universally supported.

    So the reason we are investing our time and effort into OpenGL, rather than Direct3D or OpenCL, is that OpenGL 2.0 is sufficient to give us access to all the GPU features we currently want to use. It is also available on more platforms, especially if you limit yourself to the ES sub-set. We are also looking into restricting ourselves further to only use APIs in OpenGL 3.2 Core Profile.

    This might change in the future if we see a new class of GPU, like ones designed for 2D vector graphics which can’t be abstracted by OpenGL 2.0 very well (enter OpenVG), or, if we want to start using GPU features which OpenGL (ES) 2.0 doesn’t give us access to. Having said that, OpenGL is very good at exposing new GPU features through extensions.

    History

    Qt has had an OpenGL paint engine since early Qt 4.0 days. This engine was designed for the fixed-function hardware available at the time. As time went on and manufacturers added newer bits of hardware to their GPUs, the OpenGL paint engine was adapted to use those features through OpenGL extensions. Over the last 4 years, lots of people have hacked on the engine and added support for things like ARB fragment programs and even adapted the engine to work on OpenGL ES 1.1. The engine is pretty stable and has lots of fall-backs (or original code-paths, depending on how you look at them) for old hardware missing GL extensions the engine can utilise. But, fundamentally, it is an OpenGL 1.x engine.

    In early 2008, around the time of the Falcon project (the Falcon Project was an internal project started for Qt 4.5 which focused on painting performance and architecture), it became increasingly clear that Qt needed to support hardware acceleration using the OpenGL ES 2.0 API which was starting to appear on embedded System-On-Chips like the OMAP3. There were two options available: Extend the existing OpenGL paint engine further still, or develop a new paint engine from scratch. When looking at the existing engine, there was a major problem – although it supported fragment programs, it was heavily reliant on fixed-function vertex processing. A further consideration was that the Falcon project had just kicked off and the future of the QPaintEngine API was uncertain. Both of these factors resulted in a new paint engine being written from scratch for OpenGL ES 2.0. This new engine had a distinct advantage over the existing engine: everything I wanted to use from OpenGL was in the core OpenGL ES 2.0 API. This meant I didn’t need to add fallbacks in case of missing functionality, leading to much cleaner and leaner code.

    Another point about OpenGL ES 2.0 is that it doesn’t have much in the way of fixed function features – forcing you to write shader programs. While annoying at the time, this is apparently the best way to do things even on desktop GPUs. This point is important because it quickly became apparent that although the engine was designed for GLES2, not only would it also work on desktop OpenGL 2.0, but it would use that API in a way better suited for modern programmable GPUs. So, in Qt 4.6, the new engine is used by default on both GLES2 and on desktop systems which support OpenGL 2.0.

    What does OpenGL (ES) 2 provide?

    As I’ve already mentioned, OpenGL ES 2.0 is a pretty lean and mean API which models programmable GPUs. The “programmable” bit is fundamental to the API. It means that you write small programs known as shaders, ask OpenGL to compile them and then run them on the GPU to process the data you give it. There are two types of shaders: one type processes positions (vertices) and another type processes pixels (fragments), called the vertex shader and fragment shader, respectively. The idea is that you tell OpenGL you want to draw some triangles and the vertex shader is run for each of their vertices to determine where it ends up. Then, the GPU turns each triangle into a bunch of pixels and the fragment shader is run to determine the colour of each of those pixels. The API provides various ways of passing data from the CPU to the GPU (from textures and lists of triangle positions to individual floats) and ways of passing data from the vertex shader to the fragment shader. That’s basically it. All the complexity lives in the shaders you give to the GPU to run.

    What does QPainter require?

    The rest of this blog assumes you are familiar with the QPainter API (if not, go check the QPainter docs). It might also be a good idea to read through Gunnar’s post about how the Raster engine works.

    So, the QPainter API provides more than just triangles. It is therefore the GL paint engine’s job to turn the whole of the QPainter API into “just a bunch of triangles”. To understand its task a little better, you have to split QPainter up into chunks which map better to OpenGL. A great example of this is drawRect(). In QPainter terms, this is a single primitive, but in GL engine terms, it is actually two: A rectangle (the fill) and a (possibly quite complex) line round the outside (the stroke). The OpenGL paint engine tries to keep a fairly clean separation between the shape of something which is drawn and its fill. So, here’s the list of primitives (shapes) QPainter requires the engine to draw:

    • Simple primitives (Rectangles, convex polygons, ellipses, etc.)
    • Text
    • Pixmaps
    • Strokes
    • Complex vector paths (QPainterPath)

    In addition to this, we have various fills which we can use on our primitives provided by QBrush:

    • Solid colour
    • Linear gradients
    • Radial gradients
    • Conical gradients
    • Bitmap patterns
    • Textures

    Not only do we have different types of fill, but we also support a full 3×3 transformation matrix on the brushes. This allows you to draw a rectangle but use it as a kind-of stencil over (for example) a perspective transformed texture.
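    As a rough illustration of what a full 3×3 matrix means, here is how a point is mapped through such a matrix, including the perspective divide that a plain affine matrix does not need. The layout follows the m11 to m33 naming and row-vector convention used by QTransform, but Mat3, PointF and mapPoint() are standalone stand-ins, not Qt types:

```cpp
#include <cassert>

// Row-major 3x3 matrix laid out like QTransform's m11..m33; a point maps as
// (x', y', w') = (x, y, 1) * M, with m31/m32 acting as the translation.
struct Mat3 {
    double m11, m12, m13;
    double m21, m22, m23;
    double m31, m32, m33;
};

struct PointF { double x, y; };

// Maps a point through M, including the perspective divide by w'.
// For an affine matrix (m13 = m23 = 0, m33 = 1) the divide is by 1.
PointF mapPoint(const Mat3 &m, PointF p)
{
    const double x = m.m11 * p.x + m.m21 * p.y + m.m31;
    const double y = m.m12 * p.x + m.m22 * p.y + m.m32;
    const double w = m.m13 * p.x + m.m23 * p.y + m.m33;
    assert(w != 0.0);   // points on the "horizon" cannot be mapped
    return {x / w, y / w};
}
```

    Filling a rectangle with such a brush then means going the other way for every pixel (device space back to brush space), which is why the brush transform ends up in the shaders rather than in the geometry.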

    Finally, QPainter also requires the engine to implement clipping and different composition modes, and to support its state stack (QPainter::save() & QPainter::restore()).

    Engine Operation

    Primitive Rendering

    • Simple Primitives: To render convex primitives such as rounded rectangles, we just generate a GL triangle fan and render it using glDrawArrays
    • Text: For large text, we convert it to a complex path and render it as such. However, for smaller font sizes, we rasterize the individual font glyphs and upload them as a texture (8-bit texture for bitmap & anti-aliased glyphs and 24-bit RGB for sub-pixel anti-aliased glyphs). This glyph texture is used as a mask in the engine’s pixel pipeline (see below). So, in terms of primitives, text is actually rendered as a set of rectangles - one rectangle for each glyph. When rendering with sub-pixel anti-aliased glyphs, it is possible that the engine will need to do two passes (if the brush is not a solid colour). This is because the engine uses a clever trick and sets the brush’s colour as the glBlendColor and outputs the RGB mask in the fragment shader. It is then able to set a glBlendFunc which combines the two and gives per-sub-pixel blending. If you set a more complex brush, the engine has to do two passes - first apply the mask to the destination, then a second pass to apply the brush, with glBlendFunc set to give the correct result.
    • Pixmaps: A pixmap is actually just a rectangle.
    • Strokes: Strokes can be very complex - just take a look at the pathstroke demo! However, even the most complex dashed pattern with rounded joins and end caps can be turned into a GL triangle strip relatively easily. This is done by the QTriangulatingStroker.
    • Complex vector paths: This is where things get tricky. QPainterPaths can have lots of things which break the “turn lineTo, moveTo and curveTo into vertices and render as triangle fan” algorithm…
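    The two easy cases above can be sketched in a few lines. This is a minimal illustration under simple assumptions (float coordinates, no transform, butt caps and no joins for the stroke), not the engine's code; Vec2, ellipseFan() and segmentStrip() are hypothetical names. The fan is in the vertex order GL_TRIANGLE_FAN expects, and the strip is the four-vertex GL_TRIANGLE_STRIP for a single stroked segment:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Approximates an ellipse with `segments` outline points and returns them in
// GL_TRIANGLE_FAN order: the centre first, then the outline, with the first
// outline point repeated at the end to close the fan.
std::vector<Vec2> ellipseFan(Vec2 c, float rx, float ry, int segments)
{
    assert(segments >= 3);
    const float kPi = 3.14159265358979f;
    std::vector<Vec2> fan;
    fan.push_back(c);
    for (int i = 0; i <= segments; ++i) {
        const float a = 2.0f * kPi * float(i % segments) / float(segments);
        fan.push_back({c.x + rx * std::cos(a), c.y + ry * std::sin(a)});
    }
    return fan;   // segments + 2 vertices, i.e. `segments` triangles
}

// Turns one line segment of width w into a 4-vertex GL_TRIANGLE_STRIP
// (butt caps, no joins): offset both endpoints by +/- half the width along
// the segment's unit normal.
std::vector<Vec2> segmentStrip(Vec2 a, Vec2 b, float w)
{
    const float dx = b.x - a.x, dy = b.y - a.y;
    const float len = std::sqrt(dx * dx + dy * dy);
    assert(len > 0.0f);
    const float nx = -dy / len * (w / 2), ny = dx / len * (w / 2);
    return { {a.x + nx, a.y + ny}, {a.x - nx, a.y - ny},
             {b.x + nx, b.y + ny}, {b.x - nx, b.y - ny} };
}
```

    A real stroker such as QTriangulatingStroker also has to emit joins, caps and dash patterns, which is where most of the work lies.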

    Rendering Using Stencil Technique

    Take the following path as an example:

    Convex Path (1)

    Here we have a seemingly trivial path with only 4 points. To draw this with GL, you could just convert the path’s points to vertices and draw them as a triangle fan, which results in two triangles: Triangle 1: ABC and Triangle 2: ACD. The problem is that this just looks like a solid triangle, not the path we wanted:

    Convex Path (2)

    So, to overcome this difficulty, we drop to a 2-pass rendering method which uses the stencil buffer as a temporary scratchpad. So first off, we clear the stencil buffer to all zeros (represented as white):

    Stencil Buffer (Clear)

    Next, we set the stencil operation to invert, which means that instead of setting the stencil value to ‘1’ when a triangle touches a pixel, we invert the existing value, so 0->1 and 1->0. First we render the first triangle (ABC). As all the pixels are currently 0, every pixel touched by the triangle turns to 1 (represented as black):

    Stencil Buffer (Triangle 1)

    Next, we draw the second triangle (ACD). Note: we are inverting the stencil’s value, so black pixels touched by the second triangle turn white and white pixels turn black:

    Stencil Buffer (Triangle 2)

    So now the stencil buffer contains the silhouette of our path. All we do now is draw a rectangle covering the path into the destination window, with the stencil test enabled so that only the pixels marked in the stencil buffer get filled.
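    The first pass can be reproduced on the CPU with a tiny software model: fan-triangulate from the first point, invert a per-pixel bit for every triangle that covers the pixel centre, and see what is left. The path below is a hypothetical concave quadrilateral chosen so that, as in the pictures, triangle ABC lies inside triangle ACD; Pt, insideTriangle() and stencilFill() are illustration-only names. On the GPU, the invert step would correspond to something like glStencilOp(GL_KEEP, GL_KEEP, GL_INVERT) with colour writes disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Which side of the edge a->b the point p lies on (sign of the cross product).
static double edgeSide(Pt a, Pt b, Pt p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if p is inside (or on the border of) triangle abc, for either winding.
static bool insideTriangle(Pt a, Pt b, Pt c, Pt p)
{
    const double d1 = edgeSide(a, b, p), d2 = edgeSide(b, c, p), d3 = edgeSide(c, a, p);
    const bool hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
    const bool hasPos = d1 > 0 || d2 > 0 || d3 > 0;
    return !(hasNeg && hasPos);
}

// Pass one of the stencil technique: fan-triangulate the path from its first
// point and INVERT the stencil bit of every pixel (sampled at its centre) that
// each triangle covers. What remains set is the path's odd-even interior;
// pass two would fill a covering rectangle with the stencil test enabled.
std::vector<unsigned char> stencilFill(const std::vector<Pt> &path, int w, int h)
{
    std::vector<unsigned char> stencil(w * h, 0);   // cleared to all zeros
    for (std::size_t i = 1; i + 1 < path.size(); ++i)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                if (insideTriangle(path[0], path[i], path[i + 1], {x + 0.5, y + 0.5}))
                    stencil[y * w + x] ^= 1;        // invert: 0->1, 1->0
    return stencil;
}
```

    With A = (16, 0), B = (4, 4), C = (0, 16) and D = (0, 0) on a 16×16 grid, pixel (1, 1) ends up set, while pixel (6, 6) is covered by both triangles and ends up cleared: exactly the notch that a plain triangle fan would have filled.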

    In addition to the stencil technique, we are also adding experimental support for triangulating QPainterPaths and caching the triangulation. While this is slower for paths which change often or are zoomed in & out, paths which are relatively static can be triangulated once and rendered multiple times without having to re-triangulate.

    Filling Primitives

    Now we know how all the different QPainter operations get turned into GL primitives, but we’re still missing how they get filled. As already mentioned, the colour of a pixel is determined by the fragment shader. We therefore have lots of different fragment shaders for different types of fill. However, we also need to support text rendering with arbitrary fills (QPainter lets you fill text with a perspective transformed radial gradient). In the future, we also want to support composition modes which OpenGL doesn’t provide. We’ve also found there are ways we can simplify the shaders for certain situations (and thus improve performance). The result is that Qt needs lots of different shaders. At last count, we’d need over 1000 different shaders to cover all situations. That’s a lot of GLSL to maintain and test, far more than the resources we have available. So instead we split the shaders into different interchangeable “stages”. This is achieved by having each stage in its own GLSL function. As an example, let’s take regular, non sub-pixel anti-aliased text rendering with a transformed radial gradient. Note, this is just an example to demonstrate how the engine operates and you probably shouldn’t do it in performance critical situations.

    We render gradients by pre-calculating a 1px high texture (like a 1D texture) on the CPU which we sample from in the fragment shader. However, we calculate the texture coordinates in the vertex shader and pass them to the fragment shader as a varying. This is because it’s a good idea to do as much work as possible in the vertex shader rather than the fragment shader, as the vertex shader is called so much less frequently.
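    A minimal sketch of the two halves of this, assuming a linear gradient (radial and conical gradients need more per-pixel work) and 8-bit RGBA stops: the colour table is built once on the CPU, and the per-vertex part only projects each vertex onto the gradient axis to get its texture coordinate. Stop, Rgba, buildGradientLut() and linearGradientT() are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gradient stop: position in [0, 1] plus an RGBA colour.
struct Stop { double pos; std::uint8_t r, g, b, a; };
struct Rgba { std::uint8_t r, g, b, a; };

// CPU side: pre-computes a width x 1 colour table for the gradient, so the
// fragment shader only needs one texture lookup per pixel.
std::vector<Rgba> buildGradientLut(const std::vector<Stop> &stops, int width)
{
    assert(stops.size() >= 2 && width >= 2);
    std::vector<Rgba> lut(width);
    for (int i = 0; i < width; ++i) {
        const double t = double(i) / (width - 1);
        // Find the pair of stops that brackets t (stops sorted by pos).
        std::size_t k = 0;
        while (k + 2 < stops.size() && stops[k + 1].pos < t)
            ++k;
        const Stop &s0 = stops[k], &s1 = stops[k + 1];
        const double span = s1.pos - s0.pos;
        const double f = span > 0 ? std::clamp((t - s0.pos) / span, 0.0, 1.0) : 0.0;
        auto mix = [f](std::uint8_t c0, std::uint8_t c1) {
            return std::uint8_t(c0 + (c1 - c0) * f + 0.5);   // rounded lerp
        };
        lut[i] = {mix(s0.r, s1.r), mix(s0.g, s1.g), mix(s0.b, s1.b), mix(s0.a, s1.a)};
    }
    return lut;
}

// Per-vertex work: project a vertex onto the gradient axis p0 -> p1 to get
// the texture coordinate t, which the GPU then interpolates across the
// triangle for free.
double linearGradientT(double x, double y, double x0, double y0, double x1, double y1)
{
    const double dx = x1 - x0, dy = y1 - y0;
    return ((x - x0) * dx + (y - y0) * dy) / (dx * dx + dy * dy);
}
```

    Because t is linear in x and y for an untransformed linear gradient, interpolating it between vertices gives the same value a per-pixel calculation would.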

    As already mentioned, we render (non sub-pixel) anti-aliased text by using an 8-bit mask texture. We then multiply the fragment colour by a sample taken from this mask. So, if we’re on the edge of a glyph where the alpha value is <1, we adjust the alpha of the srcPixel by that amount (actually, we also adjust the RGB values too as we use pre-multiplied alpha pixel format internally).
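    Both kinds of text masking are easy to express per pixel. The sketch below assumes premultiplied colours with float channels in the range 0 to 1; Color, applyMask() and subpixelBlend() are hypothetical names. applyMask() mirrors the paragraph above, and subpixelBlend() shows what the glBlendColor/glBlendFunc trick for sub-pixel text described earlier computes, under the assumption (not taken from Qt's source) that the blend factors are GL_CONSTANT_COLOR and GL_ONE_MINUS_SRC_COLOR with an opaque solid brush:

```cpp
#include <cassert>
#include <cstdint>

// Premultiplied RGBA colour with channels in [0, 1].
struct Color { float r, g, b, a; };

// applyMask stage for grayscale anti-aliased text: because the colour is
// premultiplied, every channel (RGB as well as alpha) is scaled by coverage.
Color applyMask(Color src, std::uint8_t coverage)
{
    const float m = coverage / 255.0f;
    return {src.r * m, src.g * m, src.b * m, src.a * m};
}

// Sub-pixel text with a solid brush. Assumed blend setup: source = RGB glyph
// mask, constant colour = brush (glBlendColor), and
// glBlendFunc(GL_CONSTANT_COLOR, GL_ONE_MINUS_SRC_COLOR), which gives
//   dst' = brush * mask + dst * (1 - mask)
// separately for R, G and B. Alpha is left unchanged here for simplicity.
Color subpixelBlend(Color brush, Color dst, float maskR, float maskG, float maskB)
{
    return {brush.r * maskR + dst.r * (1 - maskR),
            brush.g * maskG + dst.g * (1 - maskG),
            brush.b * maskB + dst.b * (1 - maskB),
            dst.a};
}
```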

    If there was a non-standard composition mode, we’d then pass the masked pixel to another stage which would blend it with the background (although this isn’t implemented yet!).

    So you can see that in the fragment shader there are 3 different stages. The first stage (srcPixel) determines the brush colour of the fragment. The next stage (applyMask) modulates the pixel by a mask to achieve anti-aliased text rendering. The final stage (compose) then blends the pixel with the background. We also have a similar staging technique for the vertex shader. All this complexity is nicely abstracted by the QGLEngineShaderManager. The paint engine tells the shader manager what it wants to draw and the shader manager selects an appropriate set of shaders. One final note on this: while desktop OpenGL 2 supports linking multiple fragment shaders in a single program, OpenGL ES 2.0 does not. This means that we actually use the different stages by appending them to a single string of GLSL we pass to GL. This also gives the GL implementation the best chance to inline the different stages (without which, performance would suck).
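    Here is a rough sketch of the string-pasting approach. The GLSL snippets are hypothetical and much simpler than the engine's real stages; only the idea is the same: each stage is a function with a fixed name (srcPixel(), applyMask()), a main() calls them, and the chosen variants are concatenated into one source string that is handed to GL as a single fragment shader. buildFragmentShader() and the k* constants are illustration-only names:

```cpp
#include <cassert>
#include <string>

// Hypothetical stage snippets. Every variant of a stage defines the same
// function name, so main() does not care which variant was picked.
const char *const kSolidBrush =
    "uniform vec4 fragmentColor;\n"
    "vec4 srcPixel() { return fragmentColor; }\n";
const char *const kGradientBrush =
    "uniform sampler2D brushTexture;\n"
    "varying float gradientT;\n"
    "vec4 srcPixel() { return texture2D(brushTexture, vec2(gradientT, 0.5)); }\n";
const char *const kNoMask =
    "vec4 applyMask(vec4 src) { return src; }\n";
const char *const kAlphaMask =
    "uniform sampler2D maskTexture;\n"
    "varying vec2 maskCoord;\n"
    "vec4 applyMask(vec4 src) { return src * texture2D(maskTexture, maskCoord).a; }\n";
const char *const kMain =
    "void main() { gl_FragColor = applyMask(srcPixel()); }\n";

// OpenGL ES 2.0 cannot link several fragment shaders into one program, so
// the selected stages are pasted into ONE source string. Having everything
// in one compilation unit also lets the compiler inline the stage functions.
std::string buildFragmentShader(bool gradient, bool textMask)
{
    std::string src = "precision mediump float;\n";
    src += gradient ? kGradientBrush : kSolidBrush;
    src += textMask ? kAlphaMask : kNoMask;
    src += kMain;   // last, so every stage is declared before it is used
    return src;
}
```

    Two brush variants and two mask variants already give four programs from five snippets; every extra stage multiplies that count, which is why generating the combinations is so much cheaper than writing them by hand.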

    Texture Management

    The OpenGL paint engine makes heavy use of textures. For example, even though it’s perfectly possible to calculate colours for gradients in the fragment shader, we still use a texture as a look-up table as it is so much faster. Repeatedly uploading textures every time we need them would ruin performance. So instead, we keep a per-context cache of which QPixmaps/QImages are already present in texture memory. If two contexts are sharing, we also detect this and don’t duplicate the textures. This functionality is available publicly in QGLContext::bindTexture() too.
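    A minimal model of such a cache, assuming textures are keyed by something like QPixmap::cacheKey() and that sharing contexts simply hold the same cache object. TextureCache, Context and bind() are hypothetical names (this is not how QGLContext::bindTexture() is implemented), the ids stand in for real GL texture names, and eviction is left out:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Stand-in for a GL texture name.
using TextureId = unsigned int;

// One cache per group of sharing contexts.
class TextureCache {
public:
    // Returns the texture for the image identified by cacheKey, uploading it
    // only the first time it is requested.
    TextureId bind(std::int64_t cacheKey)
    {
        auto it = m_textures.find(cacheKey);
        if (it != m_textures.end())
            return it->second;             // already in texture memory
        const TextureId id = ++m_nextId;   // stands in for glGenTextures + upload
        ++m_uploads;
        m_textures.emplace(cacheKey, id);
        return id;
    }

    int uploads() const { return m_uploads; }

private:
    std::map<std::int64_t, TextureId> m_textures;
    TextureId m_nextId = 0;
    int m_uploads = 0;
};

// Contexts that share textures hold the same cache, so a pixmap uploaded
// through one of them is reused by all of them.
struct Context {
    std::shared_ptr<TextureCache> cache;
};
```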

    On Linux/X11 platforms which support it, Qt will use glX/EGL texture-from-pixmap extension. This means that if your QPixmap has a real X11 pixmap backend, we simply bind that X11 pixmap as a texture and avoid copying it. You will be using the X11 pixmap backend if the pixmap was created with QPixmap::fromX11Pixmap() or you’re using the “native” graphics system. Not only does this avoid overhead but it also allows you to write a composition manager or even a widget which shows previews of all your windows.

    Antialiasing

    The OpenGL paint engine uses OpenGL multisampling to provide anti-aliasing. Typically, this will be 4x/8x FSAA, meaning 4/8 levels of coverage, which is worse quality than the raster engine, which always uses 256 levels of coverage. However, as the DPI of modern displays increases, you can get away with lower-quality anti-aliasing.

    Using multisampling also doesn’t affect text rendering as text is anti-aliased using masks rather than multisampling (for smaller font sizes). So text rendered with the OpenGL engine should look almost as good as text rendered with the raster engine (which also does gamma-correction). The only drawback of using multisampling is that some OpenGL implementations don’t support switching multisampling off. Indeed, the OpenGL ES 2.0 specification doesn’t even provide the API to switch it off. The consequence is that non-anti-aliased (a.k.a. aliased) rendering can be broken (Everything gets anti-aliased even when the QPainter::Antialiasing hint isn’t set). There’s little we can do about this. :-(

    Clipping

    QPainter supports setting an arbitrary clip, including complex QPainterPaths. Qt uses the GL stencil buffer (or more specifically the lower 7 bits of the stencil buffer) to store the clip. The clip is written in the same way as we render any other primitive, even using the stencil technique for complex paths. However, instead of filling pixel colours into a colour buffer, we fill stencil values into the stencil buffer. The actual value we use depends on the current QPainter stack depth (how many times save() was called minus the number of times restore() was called). This means that if you restrict yourself to intersect clips (Qt::ClipOperation == Qt::IntersectClip), the engine only needs to write to the part of the stencil buffer which is being clipped to. What’s more, the engine doesn’t need to write to the stencil buffer at all when you call restore() - it just changes the value at which the stencil test passes.
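    A simplified CPU model of intersect-only stencil clipping follows. It deliberately deviates from the description above in one respect, which is an assumption of this sketch rather than a statement about Qt's code: instead of the raw save() depth, each new clip gets a fresh, increasing tag, and a pixel counts as visible when its stencil value is at least the current tag. That stops stale values from an earlier, already restored clip from leaking through, while keeping the two properties described above: an intersect only writes inside the clip shape, and restore() writes nothing. StencilClip and its members are hypothetical names:

```cpp
#include <cassert>
#include <vector>

class StencilClip {
public:
    StencilClip(int w, int h) : m_w(w), m_stencil(w * h, 0) {}

    // Intersect with an axis-aligned rect [x0, x1) x [y0, y1). Only pixels
    // inside the rect are touched; those that pass the current clip are
    // raised to the new tag.
    void intersectRect(int x0, int y0, int x1, int y1)
    {
        const int tag = m_maxTag + 1;
        assert(tag < 128);   // 7 bits; a real engine would clear and rebuild
        for (int y = y0; y < y1; ++y)
            for (int x = x0; x < x1; ++x)
                if (visible(x, y))
                    m_stencil[y * m_w + x] = tag;
        m_maxTag = tag;
        m_current = tag;
    }

    void save() { m_saved.push_back(m_current); }

    // restore() writes no pixels: it only moves the reference value back.
    void restore() { m_current = m_saved.back(); m_saved.pop_back(); }

    // The "stencil test": value >= current reference.
    bool visible(int x, int y) const { return m_stencil[y * m_w + x] >= m_current; }

private:
    int m_w;
    std::vector<int> m_stencil;
    int m_maxTag = 0, m_current = 0;
    std::vector<int> m_saved;
};
```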

    In addition to using the stencil buffer for clipping, the OpenGL paint engine can also just use glScissor. This only allows a single, untransformed rectangle to be used as the clip, which can be quite restrictive. However, it is by far the fastest way to do clipping. So if performance is more important to you than utility, only ever use untransformed rectangular clips.
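    For comparison, the scissor path needs no per-pixel work at all. Intersecting two untransformed rectangular clips is plain rectangle arithmetic, and the only wrinkle is that glScissor() measures y from the bottom of the render target while QRect measures it from the top. Rect, intersect() and toGlScissor() are hypothetical helpers, assuming a bottom-left GL origin:

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int x, y, w, h; };   // top-left origin, like QRect

// Intersecting two untransformed rectangular clips is just a rectangle
// intersection, so it never needs the stencil buffer.
Rect intersect(Rect a, Rect b)
{
    const int l = std::max(a.x, b.x), t = std::max(a.y, b.y);
    const int r = std::min(a.x + a.w, b.x + b.w);
    const int bt = std::min(a.y + a.h, b.y + b.h);
    return {l, t, std::max(0, r - l), std::max(0, bt - t)};
}

// glScissor(x, y, w, h) measures y from the BOTTOM of the render target,
// so a top-left-origin rect has to be flipped before it is passed in.
Rect toGlScissor(Rect clip, int surfaceHeight)
{
    return {clip.x, surfaceHeight - (clip.y + clip.h), clip.w, clip.h};
}
```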

    Recommendations

    Interleaved Rendering

    Unlike OpenGL, QPainter allows an arbitrary number of rendering contexts (QPainters) to be active in the same thread at the same time. For example, in your widget’s paint event, you can begin a painter on your widget and begin another painter on a QPixmap and interleave rendering to them:

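    void Widget::paintEvent(QPaintEvent *)
    {
        QPainter widgetPainter(this);
        widgetPainter.fillRect(rect(), Qt::blue);

        // A second, simultaneously active painter on an offscreen pixmap
        QPixmap pixmap(256, 256);
        pixmap.fill(Qt::transparent);
        QPainter pixmapPainter(&pixmap);
        pixmapPainter.drawPath(myPath);   // myPath: assumed QPainterPath member
        pixmapPainter.end();              // finish the pixmap before using it

        // Back to the widget painter: the render target switches again
        widgetPainter.drawPixmap(0, 0, pixmap);
    }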

    While this works ok with the OpenGL graphics system, having to switch from doing something with one painter to doing something with a different painter can be very costly and should be avoided whenever possible.

    Mixing QPainter and Native OpenGL

    As shown in several examples, it is possible to mix your own OpenGL rendering code with QPainter rendering code. However, as OpenGL is a giant state machine, it is very easy for you to accidentally clobber Qt’s GL state and vice-versa. To overcome this, we’ve added some new API to QPainter in Qt 4.6 - QPainter::beginNativePainting() and QPainter::endNativePainting(). To prevent artifacts, you must enclose your custom painting in beginNativePainting() and endNativePainting(). This is very important - even if you’re not seeing any problems now, you might find your code starts failing in a future Qt release in which the GL paint engine works slightly differently. Also, as beginNativePainting() and endNativePainting() set lots of OpenGL state, they can be quite expensive, so you should try to use them sparingly. Try to batch up all your custom OpenGL code in a single block.
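    One way to follow this advice in C++ is a small RAII guard, so that every beginNativePainting() is matched by an endNativePainting() even on early returns. This is a sketch, not a Qt class: NativePaintingScope is a hypothetical name, it only assumes the painter type has those two member functions (QPainter does from Qt 4.6), and FakePainter exists purely so the guard can be exercised without Qt:

```cpp
#include <cassert>

// Brackets a block of custom GL code with beginNativePainting() and
// endNativePainting(); the destructor runs even if the block returns early.
template <typename Painter>
class NativePaintingScope {
public:
    explicit NativePaintingScope(Painter &p) : m_painter(p) { m_painter.beginNativePainting(); }
    ~NativePaintingScope() { m_painter.endNativePainting(); }
    NativePaintingScope(const NativePaintingScope &) = delete;
    NativePaintingScope &operator=(const NativePaintingScope &) = delete;

private:
    Painter &m_painter;
};

// Minimal stand-in used only to exercise the guard without Qt.
struct FakePainter {
    int begins = 0, ends = 0;
    void beginNativePainting() { ++begins; }
    void endNativePainting() { ++ends; }
};
```

    In a paint event this would be `NativePaintingScope<QPainter> scope(painter);` followed by all the raw GL calls in that one block, so the relatively expensive state setup happens once per frame.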

    QGLWidget vs OpenGL Graphics-System

    Unlike the raster & OpenVG paint engines, you don’t have to use a specific graphics system to render widgets using the OpenGL paint engine. The QtOpenGL module provides several classes, including QGLWidget, which all use the OpenGL paint engine regardless of what graphics system is being used. QGLWidget is basically a regular widget which always has a native window ID and is always rendered to using OpenGL. You are free to choose whichever method you want to get OpenGL rendering (graphics system or QGLWidget). However, using the opengl graphics system can often be slower than using a QGLWidget, as Qt needs the contents of the “back buffer” (or QWindowSurface) to be preserved when flushing the render to the window system. OpenGL does not guarantee this and it is often not the case, so Qt has to use either an FBO or a PBuffer as the back buffer. When the render needs to be flushed, the FBO or PBuffer is bound to a texture, rendered into the window and then the GL buffers are swapped. This extra overhead is avoided by using a QGLWidget; however, as a consequence, it is not possible to redraw a sub-region of a QGLWidget: whenever a QGLWidget is updated, the entire widget must be re-drawn.

    It should also be noted that using the OpenGL paint engine isn’t a silver bullet which makes everything faster. For example, the GL engine really sucks at drawing lots of small geometry with state changes between each drawing operation. While we’re working on improving that use case at the moment, the raster paint engine will probably always be faster just because it has so much less overhead. So QGLWidget might be a great way to get the best of both worlds when combined with the raster graphics system: use QGLWidget for operations which GL excels at and the raster engine for everything else.

    Tips for Performance (fps)

    As a general rule of thumb, OpenGL state changes are expensive. So, use the knowledge you now have of what’s going on under QPainter and try to minimise the number of OpenGL state changes the paint engine needs to do. For example, if you implement a virtual keyboard, you now know that the engine uses a shader for text rendering and a different shader for pixmaps, so draw all the key pixmaps first, then draw all the text on top. That way, the engine only needs to change shaders twice per frame.

    • Never, ever use anything other than intersecting clips
    • Don’t switch render target in the middle of a render
    • Try to use untransformed rectangular clips whenever possible
    • Minimise changing the brush wherever possible
    • Render batches of primitives of the same type together.
    • Avoid drawing translucent pixels & blending (particularly important on mobile GPUs)
    • Try to cache QPainterPaths and re-use them rather than creating & discarding them in your paintEvent
    • Use QPainterPaths even when there’s a QPainter convenience function, e.g. for rounded rects and ellipses.
    • If you’re drawing lots of small pixmaps, try bunching them up into a single, larger pixmap
    • Prefer to use power-of-two (2^n) widths & heights for QImages and QPixmaps (128×256, 256×256, 512×512, etc)
    • If using QGLWidget and don’t need anti-aliasing, don’t enable sample buffers in the QGLFormat
    • If rendering complex QPainterPaths, try to only use odd-even fill rule
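    For the power-of-two tip, rounding a dimension up is a classic bit trick. This is a generic helper, not a Qt function (nextPowerOfTwo() and isPowerOfTwo() are hypothetical names). Padding a 100×100 image to 128×128 costs memory, so it is a trade-off rather than a free win; one likely reason for the tip is that core OpenGL ES 2.0 only allows clamp-to-edge wrapping and no mipmaps for non-power-of-two textures.

```cpp
#include <cassert>
#include <cstdint>

// Rounds a dimension up to the next power of two (e.g. 100 -> 128).
std::uint32_t nextPowerOfTwo(std::uint32_t v)
{
    assert(v > 0 && v <= (1u << 31));
    --v;                 // so exact powers of two map to themselves
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;        // now every bit below the highest set bit is 1
    return v + 1;
}

// A power of two has exactly one bit set.
bool isPowerOfTwo(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }
```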
    gunnar
    Painting
    Graphics Dojo
    OpenGL
    Performance
    Posted by gunnar
     in Painting, Graphics Dojo, OpenGL, Performance
     on Wednesday, December 16, 2009 @ 06:54

    For this blog series that I’m doing, I figure it’s nice to start with an overview of how the painter, pixmaps, widgets, graphics view and the backingstore fit together.

    At the centre of all Qt graphics is the QPainter class. It can render to surfaces through the QPaintDevice class. Examples of paint devices are QImages, QPixmaps and QWidgets. The way it works is that for a given QPaintDevice implementation we return a custom paint engine which supports rendering to that surface. This is all part of our documentation, so perhaps not too interesting. Let’s look at this in more detail.

    QWidgets and QWindowSurface

    Even though QWidget is a QPaintDevice subclass, one will never render directly into a QWidget’s surface. Instead, during the paintEvent, the painting is redirected to an offscreen surface which is represented by the internal class QWindowSurface. This was traditionally implemented using QPainter::setRedirected(), but has since been replaced by an internal mechanism between QPainter and QWidget which is slightly more efficient.

    Sometimes we refer to this surface as “the backingstore”, but it really is just a 2D surface. If you have ever looked through the Qt source code and found the class QWidgetBackingStore, this class is responsible for figuring out which parts of the window surface need to be updated prior to showing it on screen, so it’s really a repaint manager. When the concept of backingstore was introduced in Qt 4.1, the two classes were the same, but the introduction of more varied ways to get content to screen made us split them in two.

    In the old days widgets were rendered “on screen”. Though the option to paint on screen is still available, it is not recommended. I believe the only system that remotely supports it is X11, but it is more or less untested and thus often causes artifacts in the more complex styles. Setting the flag Qt::WA_PaintOnScreen means that the repaint manager inside Qt ignores that widget when repainting the window surface and instead sends a special paintEvent to that widget only. Prior to Qt 4.5 there was a significant speed gain to be had when 10-100 widgets updated at max fps, but in Qt 4.5 the repaint manager was optimized to handle this better, so on-screen painting is now usually worse than buffered painting.

    Back to the window surface. All widgets are composited into the window surface top to bottom, and the top-level widget will fill the surface with its background, or with transparent pixels if the Qt::WA_TranslucentBackground attribute is set. All other widgets are considered transparent. A label only draws a bit of text, but doesn’t touch anything else. What that means for the repaint manager is that every widget that overlaps with the label, but stacks behind it, needs to be drawn before it. If the application knows that a certain widget is opaque and will draw every single pixel in every paint event, then one should set the Qt::WA_OpaquePaintEvent attribute, which causes the repaint manager to exclude the widget’s region when painting the widgets behind it.
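
    The effect of Qt::WA_OpaquePaintEvent on the repaint manager can be illustrated with a deliberately simplified model in plain C++. This is not Qt’s actual algorithm: it assumes rectangular widgets listed back to front, and it skips a widget only when the part of it that needs repainting is fully covered by a single opaque widget stacked above it. Rect, Widget and widgetsToPaint() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rect {
    int x, y, w, h;
    bool isEmpty() const { return w <= 0 || h <= 0; }
    Rect intersected(const Rect& o) const {
        int left = std::max(x, o.x), top = std::max(y, o.y);
        int right = std::min(x + w, o.x + o.w), bottom = std::min(y + h, o.y + o.h);
        return {left, top, right - left, bottom - top};
    }
    bool contains(const Rect& o) const {
        return o.x >= x && o.y >= y && o.x + o.w <= x + w && o.y + o.h <= y + h;
    }
};

struct Widget {
    std::string name;
    Rect geometry;          // in window-surface coordinates
    bool opaquePaintEvent;  // models Qt::WA_OpaquePaintEvent
};

// Returns, in back-to-front paint order, the widgets that must repaint to
// refresh `dirty`. `stack` is ordered back (index 0) to front.
std::vector<std::string> widgetsToPaint(const std::vector<Widget>& stack, const Rect& dirty) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        Rect needed = stack[i].geometry.intersected(dirty);
        if (needed.isEmpty())
            continue;  // this widget is not touched by the update
        bool covered = false;
        for (std::size_t j = i + 1; j < stack.size() && !covered; ++j)
            covered = stack[j].opaquePaintEvent && stack[j].geometry.contains(needed);
        if (!covered)
            result.push_back(stack[i].name);
    }
    return result;
}
```

    With a transparent label on top, the window behind it still has to be painted; mark the top widget as opaque and, for an update confined to that widget, the window is skipped entirely.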

    Since all widgets are repainted into the same surface, we need to make sure that widgets don’t accidentally paint outside their own boundaries and into other widgets. There is no guarantee that a widget will stay inside its bounds, and painting outside them would lead to artifacts, so we set up a clip behind QPainter’s back called the “system clip”. For most widgets the system clip is a rectangle, and looking at the performance section of the QPainter docs, we see that that is not so bad: rectangular clips, when pixel aligned, are fast. A masked widget, on the other hand, is a performance disaster. It is slower to set up and slower to render. The system clip is the same clip that is passed to the paint event, except that the clip in the paint event has been translated to be relative to the top-left of the widget, rather than to the top-left of the surface. Do NOT set the paint event’s region as a clip on the painter. It is already set up, and we don’t detect that it is the exact same region, so it just gets processed fully again. The purpose of the region/rect in the paint event is to let widgets decide not to draw certain parts. This is primarily useful when you have big scenes in the widgets, such as a map application, graphics view or similar.
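
    As an illustration of using the paint event’s rect to skip work rather than to set a clip, consider a map widget built from fixed-size square tiles. The plain C++ sketch below computes, from an exposed rectangle in widget coordinates and the current scroll offset, the inclusive range of tile columns and rows that intersect it, so a paint event would only need to draw those tiles. ExposedRect stands in for QRect, and the tile model and function names are assumptions for illustration.

```cpp
#include <cassert>

// Stand-in for QRect: (x, y) is the top-left corner, w/h the size in pixels.
struct ExposedRect { int x, y, w, h; };

// Inclusive range of tiles touched by the exposed area.
struct TileRange { int firstCol, lastCol, firstRow, lastRow; };

// Floor division for b > 0 that also rounds down for negative a (e.g. when
// the map is scrolled so the exposed area starts left of the tile origin).
int floorDiv(int a, int b) {
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

// Tiles of tileSize x tileSize pixels that intersect the exposed rect, given
// that the map is scrolled by (offsetX, offsetY). Assumes w, h, tileSize > 0.
TileRange tilesForExposedRect(const ExposedRect& r, int tileSize, int offsetX, int offsetY) {
    int left = r.x + offsetX;
    int top = r.y + offsetY;
    int right = left + r.w - 1;   // last pixel column inside the rect
    int bottom = top + r.h - 1;   // last pixel row inside the rect
    return {floorDiv(left, tileSize), floorDiv(right, tileSize),
            floorDiv(top, tileSize), floorDiv(bottom, tileSize)};
}
```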

    In addition to the system clip, which is set up prior to calling paintEvent, the painter also needs to be in a clean state, which means setting up brushes, pens, fonts and so on. It’s not a huge amount, but if you have many widgets it adds up. So, though widgets are no longer native window handles (aka Alien), there is still a price tag involved in repainting them. Be aware of that when you design your application. For instance, implementing a photo gallery using QLabels with pixmaps in a QScrollArea doesn’t scale. You would have to set up clipping and all the other state per label, even though the label only draws a pixmap. A single “view” widget would scale much better, because the widget can then implement a tight loop that draws pixmaps in the right places.
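
    The “single view widget” alternative boils down to arithmetic: the view computes each thumbnail’s position itself, so its paint event can loop over just the photos in the exposed rows. A minimal sketch of that layout logic in plain C++, assuming a fixed grid of equally sized cells; GalleryLayout and IndexRange are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Half-open range [first, last) of photo indices.
struct IndexRange { int first; int last; };

struct GalleryLayout {
    int columns;  // thumbnails per row
    int cellW;    // cell width in pixels
    int cellH;    // cell height in pixels
    int count;    // number of photos

    // Top-left corner of photo `index`, in (scrolled) content coordinates.
    int cellX(int index) const { return (index % columns) * cellW; }
    int cellY(int index) const { return (index / columns) * cellH; }

    // Photos whose rows intersect the vertical span [top, top + height) of
    // the content. Assumes top >= 0 and height > 0.
    IndexRange visibleRange(int top, int height) const {
        int firstRow = top / cellH;
        int lastRow = (top + height - 1) / cellH;  // inclusive
        return {std::min(firstRow * columns, count),
                std::min((lastRow + 1) * columns, count)};
    }
};
```

    Inside a paintEvent(), one would then translate the painter by the scroll offset and loop from first to last, calling QPainter::drawPixmap(x, y, pixmap) with cellX(i) and cellY(i); how the thumbnails themselves are stored is left open here. All of this happens under one painter setup and one system clip, instead of one per QLabel.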

    On Mac OS X, this whole backingstore and window surface logic only applies when the raster or opengl graphics system is used. Personally I would strongly recommend using raster: it implements the full feature set, it is often faster, it has the same performance profile as Qt on Windows, and painting bugs are prioritized higher for raster than for the CoreGraphics backend. In qt/main we plan to switch the default for Mac OS X to raster; we just have to iron out some window system integration issues.

    Graphics systems

    The concept of a graphics system was introduced in Qt 4.5. The idea is to be able to select at startup time, on an application level, what kind of graphics stack you should be using. The graphics system is responsible for creating the pixmap backends and the window surface. We currently have graphics systems for raster, OpenGL 1.x, OpenGL/ES 2.0, OpenVG and X11. You can select a graphics system by starting the application with the command line option -graphicssystem raster|opengl|opengl1|x11|native, where “native” means to use the system default. Another option is to pass the same option to configure, which makes it the default for all applications using Qt. Finally, there is the function QApplication::setGraphicsSystem(), which hardcodes the graphics system for a given application.

    In later blogs, we plan to go into each of the paint engines in more detail, but for now, let’s just look at the highlights.

    Raster

    The raster graphics system is the reference implementation of QPainter. It implements all the features we specify and does it all in software. When a new port is started, such as with S60, we usually start with getting raster running. It is currently the default on Windows, Embedded and S60, and will also be on Mac OS X.

    Just a thought. What do you think of raster on X11? If you ignore for a second that you currently get a process-local font cache, it performs quite nicely on X11 and I’ve seen many people switch to it at runtime. If we consider remote displays, this seems daunting, but it still may not be too bad. The way the X11 paint engine works today is that any gradient and pixmap transform is done in software anyway and uploaded as an image on a per-painter-command level. Why not just do it all client side and upload only the parts that need updating? We can watch HD videos (for some definition of HD, anyway) on YouTube, so certainly we can afford to upload a few pixels. This is bound to generate comments on XRender and server-side gradients and transforms, but these have been tried numerous times and the performance is simply not good enough.

    The window system integration is handcoded for each platform to make the most out of it. On Windows the window surface is a QImage which shares bits with a DIBSECTION, which results in pretty good blitting speed. On X11 we use MIT Shared Memory Images. We used to use Shared Memory Pixmaps, but support for them was removed from Xorg; we got an awesome patch from the community, though, so we’re back up and running. On Mac OS X, we’re experimenting with using GL texture streaming to get the backbuffer to screen, and we’re seeing some promising numbers with that, so I hope it will make it into Qt 4.7 too.

    Because it is just an array of bytes, most native APIs have the ability to render into the same buffer we do. This makes integration with native theming quite straightforward, which is one of the reasons why this is attractive as a default desktop graphics system, despite not being hardware accelerated.

    OpenGL

    We have two OpenGL-based graphics systems in Qt. One is for OpenGL 1.x and is primarily implemented using the fixed functionality pipeline in combination with a few ARB fragment programs. It was written for desktops back in the Qt 4.0 days (2004-2005) and has grown quite a bit since. You can enable it by passing -graphicssystem opengl1 on the command line. It is currently in life-support mode, which means that we will fix critical things like crashes, but otherwise leave it be. It is not a focus for performance from our side, though it does perform quite nicely in many scenarios.

    Our primary focus is the OpenGL/ES 2.0 graphics system, which is written to run on modern graphics hardware. It does not use a fixed functionality pipeline, only vertex shaders and fragment shaders. Since Qt 4.6, this is the default paint engine used for QGLWidget. Only when the required feature set is not available will we fall back to using the 1.x engine instead. When we refer to our OpenGL paint engine, it’s the 2.0 engine we’re talking about.

    We’ve wanted to have GL as the default graphics system on all our desktop systems for a while, but there are two major problems with it. First, aliased drawing is a pain: with certain drivers it is close to impossible to guarantee that a line goes where you want it. Second, integration with native theming is a pain. It is rarely possible to pass a GL context to a theming function and tell it to draw itself, hence we need to use temporary pixmaps for style elements. On Mac OS X, there is a function to get a CGContext from a GL context, but we’ve so far not managed to get any sensible results out of it. On the other hand, much of the UI content doesn’t depend on these features, which makes GL optimal for typical scene rendering, such as the viewport of a QGraphicsView or a photo gallery view. So as for how the default setup in Qt will look in the future, we’re considering that the best default for desktop may be a combination of raster for the natively themed widgets and GL for one or two high-performance widgets. Nothing is decided on this topic though; we’re just looking at alternatives.

    Another problem with using GL by default is font sharing. With raster we could theoretically share pre-rendered glyphs between processes in a cross platform manner using shared memory, with GL this becomes a bit more difficult. On X11, there is an extension to bind textures as XPixmaps which can be shared across processes, but this will usually force the textures into a less optimal format which makes them somewhat slower to draw, so it is still not optimal. On Windows, Mac OS X, S60 or QWS, we would need driver-level support for sharing texture ids, which we currently don’t have.

    OpenVG

    I’m actually quite blank in this area, as I was involved neither in writing it nor in getting it up and running. It sits on top of EGL, which makes it quite similar to the OpenGL graphics systems. We expect that OpenVG will be used in a number of mid-range embedded devices.

    The cool thing about OpenVG is that it matches the QPainter API quite nicely. It supports paths, pens, brushes, gradients and composition modes, so in theory, the vectorial APIs should run optimally.

    Rhys, who wrote the OpenVG paint engine, plans to do a full post on the OpenVG paint engine’s internals in the near future.

    Images and Pixmaps

    The difference between these two is mostly covered in the documentation, but I would like to highlight a few things nonetheless.

    Our documentation says: “QImage is designed and optimized for I/O, and for direct pixel access and manipulation, while QPixmap is designed and optimized for showing images on screen.”

    Raster

    When using the raster graphics system, pixmaps are implemented as a QImage, with a potentially significant difference. When converting a QImage to a QPixmap, we do a few things.

    First, the image is converted to a pixel format that is fast to render to the backbuffer, meaning ARGB32_Premultiplied, RGB32, ARGB8565_Premultiplied or RGB16. When images are loaded from disk using the PNG plugin or when they are generated in software by the application, the format is often ARGB32 (non-premultiplied), as this is an easy format to work on, pixel-wise. I’ve measured drawing ARGB32_Premultiplied onto RGB32 to be about 2-4x faster than drawing ARGB32 non-premultiplied, depending on the use case.

    Secondly, we check the pixel data for transparent pixels and convert it to an opaque format if none are found. This means that if a “.png” file is loaded as ARGB32 from disk, but only contains opaque pixels, it will be rendered as an RGB32, which is also about 2-4x faster.
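
    The two conversion steps above can be sketched on raw 32-bit pixels. This is a simplified model rather than Qt’s implementation: it assumes pixels stored as 0xAARRGGBB values in a std::vector, premultiplies each color channel by alpha with plain round-to-nearest arithmetic (Qt has its own fast approximations), and reports whether the data can be treated as opaque RGB32. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Multiply an 8-bit channel by an 8-bit alpha, rounding to nearest.
inline std::uint32_t mul8(std::uint32_t c, std::uint32_t a) {
    return (c * a + 127) / 255;
}

// Step 1: convert one non-premultiplied ARGB32 pixel to premultiplied form.
std::uint32_t premultiply(std::uint32_t p) {
    std::uint32_t a = p >> 24;
    std::uint32_t r = mul8((p >> 16) & 0xff, a);
    std::uint32_t g = mul8((p >> 8) & 0xff, a);
    std::uint32_t b = mul8(p & 0xff, a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}

// Step 2: if every pixel has alpha 0xff, the data can be drawn as RGB32,
// which needs no blending.
bool isFullyOpaque(const std::vector<std::uint32_t>& pixels) {
    for (std::uint32_t p : pixels)
        if ((p >> 24) != 0xff)
            return false;
    return true;
}

// Prepare pixel data for fast drawing: opaque data is left as is (its alpha
// byte is already 0xff, matching RGB32), anything else is premultiplied.
// Returns true if the result can be treated as opaque.
bool prepareForDrawing(std::vector<std::uint32_t>& pixels) {
    if (isFullyOpaque(pixels))
        return true;
    for (std::uint32_t& p : pixels)
        p = premultiply(p);
    return false;
}
```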

    OpenGL

    When using the OpenGL graphics system, the actual implementation of QPixmap varies a bit from setup to setup. The ideal option is enabled when your GL implementation supports Frame Buffer Objects (FBOs) in combination with the GL_EXT_framebuffer_blit extension. In this case, the pixmap is represented as an OpenGL texture id, and whenever a QPainter is opened on the pixmap we grab an FBO from an internal pool and use it to render into the texture.

    Without these extensions available, which is typically the case for OpenGL/ES 2.0 devices, the implementation is a QImage (in optimal layout, same as raster) which is backed by a texture id. When you open a QPainter on the pixmap, you render into the QImage and when the pixmap is drawn to the screen, the texture id is used. Internally there is a syncing process between the two representations, so there will be a one-time hit of re-uploading the texture after drawing into it.
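
    The syncing between the QImage and the texture id behaves like a dirty flag. The toy C++ class below models that behavior: painting marks the CPU-side image dirty, and the next draw re-uploads it once. ImageBackedTexture and its methods are hypothetical, and upload() only counts uploads instead of calling OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ImageBackedTexture {
public:
    explicit ImageBackedTexture(std::size_t pixelCount) : pixels_(pixelCount, 0xff000000u) {}

    // Software painting goes into the CPU-side image and invalidates the texture.
    void paintPixel(std::size_t i, std::uint32_t argb) {
        pixels_[i] = argb;
        textureDirty_ = true;
    }

    // Drawing to screen uses the texture; re-upload only if the image changed.
    void drawToScreen() {
        if (textureDirty_) {
            upload();
            textureDirty_ = false;
        }
        // A real engine would now draw a textured quad.
    }

    int uploadCount() const { return uploads_; }

private:
    void upload() { ++uploads_; }  // stands in for the actual texture upload

    std::vector<std::uint32_t> pixels_;
    bool textureDirty_ = true;  // nothing has been uploaded yet
    int uploads_ = 0;
};
```

    The point of the model is the cost profile: drawing an unchanged pixmap repeatedly is cheap, while every draw that follows a round of painting into it pays for one upload.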

    In general

    If you intend to draw the same QImage twice, always convert it to a QPixmap.

    There are some use cases where QPixmap is potentially worse, though. We have functions such as QPixmap::scaled(), QPixmap::transformed() and friends, which are there historically because we wanted QImage and QPixmap to have similar APIs. We support reimplementing this functionality on a per pixmap-backend basis, but currently no engine does this, so for the GL case, or X11 for that matter, calling QPixmap::transformed() implies a conversion from QPixmap into QImage, the transformation in software, and then a conversion back to the original format.

    By default a QPixmap is treated as opaque. When doing QPixmap::fill(Qt::transparent), it will be made into a pixmap with an alpha channel, which is slower to draw. If the pixmap is going to end up opaque, initialize it with QPixmap::fill(Qt::white). You can even skip the initialization step altogether when you know that all pixels will be written as opaque when the pixmap is painted into.

    Before moving on to something else, I’ll just give a small warning on the functions setAlphaChannel() and setMask() and the innocent-looking alphaChannel() and mask(). These functions are part of the Qt 3 legacy that we didn’t quite manage to clean up when moving to Qt 4. In the past the alpha channel of a pixmap, or its mask, was stored separately from the pixmap data. Depending on which platform you were on, the actual implementation was a bit different. For instance, on X11 you had a 1-bit pixmap mask + an 8-bit alpha channel + a 24-bit color buffer. On Windows you had a 1-bit mask + a packed 32-bit ARGB pixel buffer. In Qt 4 we merged all this into one API, so that QPixmap is to be considered a packed data structure of ARGB pixels. However, we did not remove the functions implementing the old API. In fact, we even added the alpha channel accessors, so we made it worse. The API was to some extent convenient, but all four of those functions imply touching all the data and either merging the source with the pixmap or extracting a new pixmap from the current pixmap content. Bottom line: just don’t call them. With composition modes, you can manipulate the alpha channel of pixmaps using QPainter. This also has the benefit that it will potentially be SSE optimized for raster or done in hardware on OpenGL, so it has the potential to be quite a bit faster. There is also QGraphicsOpacityEffect, which allows you to set a mask on widgets and graphics items, but as of today, it is not as fast as we would like it to be.
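
    As an example of the composition-mode approach: QPainter::CompositionMode_DestinationIn keeps the destination only where the source has alpha, so filling a pixmap with, say, a gradient brush in that mode fades its alpha channel without calling setAlphaChannel(). The per-pixel rule for premultiplied ARGB (Porter-Duff “destination in”: every destination channel is scaled by the source alpha) is sketched below in plain C++. destinationIn() is a hypothetical helper, not Qt’s optimized code path.

```cpp
#include <cassert>
#include <cstdint>

// Multiply an 8-bit channel by an 8-bit alpha, rounding to nearest.
inline std::uint32_t mul8(std::uint32_t c, std::uint32_t a) {
    return (c * a + 127) / 255;
}

// Porter-Duff "destination in" on premultiplied 0xAARRGGBB pixels: every
// destination channel (alpha included) is scaled by the source alpha.
// The source color itself does not contribute to the result.
std::uint32_t destinationIn(std::uint32_t dst, std::uint32_t src) {
    std::uint32_t sa = src >> 24;
    std::uint32_t a = mul8(dst >> 24, sa);
    std::uint32_t r = mul8((dst >> 16) & 0xff, sa);
    std::uint32_t g = mul8((dst >> 8) & 0xff, sa);
    std::uint32_t b = mul8(dst & 0xff, sa);
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

    In QPainter terms, one would open a painter on a pixmap that has an alpha channel, call setCompositionMode(QPainter::CompositionMode_DestinationIn) and then fill with a brush whose alpha encodes the desired mask.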

    QGraphicsView

    I’ll do at least one separate post on graphics view alone, so I’ll just comment quickly on the difference between using QGraphicsView with items vs QWidgets. QGraphicsView with its scene populated with items is in many ways very similar to widgets and their repaint handling. With the addition of layouts and QGraphicsWidgets the line is even more blurry. So which solution should you pick? More and more often, we’re seeing people choose to create their UIs in graphics view rather than using traditional widgets.

    Compared to widgets, items in a graphics view are very cheap. If we consider the photo gallery again, then using a separate item for each photo in the view may (I say may) be reasonable. A widget is repainted through its paintEvent(). A QGraphicsItem is repainted through its paint() function. The good thing about the item’s paint() function is that there is no QPainter::begin(), as the painter is already properly set up for rendering. Another good thing is that the painter has less guaranteed state than in the widget case. There may be a transformation and some clip, but no guarantees about fonts, pens or brushes. This makes the setup a bit cheaper.

    Another huge improvement over widgets is that items are not clipped by default. They have a bounding rectangle and there is a contract between the subclass implementer and the scene that the item does not paint outside. If we compare this to the system clip we need to set for widgets, then again there is less work to be done for the items. If the item violates this there will be rendering artifacts, but for graphicsview this has proven an acceptable compromise.

    Most UI elements are rather simple. A button, for instance, can be composed of a background image and a short text. In QPainter terms that is one call to drawPixmap() and one call to drawText(). The less time spent between painter calls, the better the performance. The fewer state changes between painter calls, the better the performance. Looking back at how much happens between these calls for a button, you quickly realize that the traditional widgets are quite heavy. If widgets are going to survive the test of time, then they need to behave more like QGraphicsItems.

    Some final words

    I’ve been rambling on for a while, but hopefully there was some useful information in here. You may have noticed that I do not mention printing, PDF or SVG generation, nor do I focus on the X11 or CoreGraphics paint engines in great detail. This is because, as outlined in the painter performance docs, we focus our performance efforts on only a few backends which we consider critical for Qt.

    Posted by Donald Carr
     in Qt, Graphics View, Painting, OpenGL, Performance, Embedded, Build system
     on Friday, November 20, 2009 @ 00:53

    Introduction

    Texas Instruments has a wiki which documents what is required to bring Qt
    up on the Beagle board with full OpenGL ES (1/2) support:

    http://www.tiexpressdsp.com/index.php/Building_Qt

    and I would like to thank one of their engineers, Varun, for his quick
    turnaround times in addressing any questions I raised.

    This blog entry is intended to serve a similar purpose, but is more verbose regarding
    Qt considerations and the initial beagle board bring up. It attempts to serve
    as a comprehensive independent source of information on getting Qt built
    for the Beagle board with full OpenGL ES 2 support.

    These instructions are intended for use with Qt 4.6 (and beyond), so grab
    the release candidate or check Qt 4.6 out from the public git repository prior
    to proceeding.

    You can choose to use either Qt/Embedded or Qt/X11. Both can
    be successfully integrated with the Beagle board’s SGX GPU, and the only
    points of divergence in these instructions will be at (Qt) configure time
    and in the client side system (run time) configuration. Both implementations
    offer window management, via QWS and X11 respectively, and reach
    around 27fps and 22fps respectively when running our hellogl_es2 example
    (16-bit color depth at 1280×720).

    I personally deploy Ångström on my Beagle board; it handles a large amount
    of the logistics surrounding cross compilation and is generally very
    agreeable, so these instructions are going to be bolted to
    Ångström for completeness. Feel free to establish your own environment capable of
    showing the OpenGL ES examples TI provide, and then follow the Qt level
    considerations (Configuring Qt) accordingly.

    For those holding a dormant Beagle board who are open to the author’s
    distribution preferences:

    Building the Ångström rootfs

    Open Embedded is manifested in a git repository: in this posting we are
    working within origin/stable/2009. Please follow the instructions given
    here; they are comprehensive and got me completely off the ground.

    http://www.angstrom-distribution.org/building-angstrom

    These instructions end in you running:

    bitbake base-image ; bitbake console-image ; bitbake x11-image

    which actually builds an X11 angstrom image for your Beagle board. Please
    note, you will need to build the X11-image if you want to build and deploy
    the SGX packages (we will do this in the next section) via Ångström as opkg considers
    X11 to be a required dependency of libgles-omap3_3.00.00.09. This is due
    to one of the encapsulated windowing system libraries being X11 centric:

    libpvrPVR2D_X11WSEGL.so

    Regardless of the indicated X11 dependency, this package will bestow the required
    kernel module on you for general OpenGL ES usage (console or X11). We will be
    building our own QWS centric (libpvrPVR2D_X11WSEGL.so equivalent) library
    behind the scenes for QWS in the Qt/Embedded instructions given later.

    Ångström SGX integration

    You now need to integrate the SGX drivers on your Ångström system.

    You need to get your paws on:

    OMAP35x_Graphics_SDK_setuplinux_3_00_00_09.bin

    with the following MD5 checksum:

    e15147ad76ddbe7c5aec682f5455b774

    Getting this involves following the above link and going through the required registration/request process.
    Once you have this file, you drop it in:

    $OETREE/openembedded/recipes/powervr-drivers/libgles-omap3

    and then run:

    bitbake libgles-omap3-3.00.00.09

    which generates the following packages:

    libgles-omap3_3.00.00.09-r1.1_armv7a.ipk
    libgles-omap3-dbg_3.00.00.09-r1.1_armv7a.ipk
    libgles-omap3-demos_3.00.00.09-r1.1_armv7a.ipk
    libgles-omap3-dev_3.00.00.09-r1.1_armv7a.ipk
    libgles-omap3-tests_3.00.00.09-r1.1_armv7a.ipk

    Deploy the x11-image to an sd-card, and copy these packages to the sd-card
    for deployment on the target. If your beagle board does not have internet
    access you will probably also require:

    *  devmem2
    *  libx11-6 (Only if you insisted on using a console build!)

    as opkg will not be able to automatically install the required dependencies
    from its repositories and you would hit the following error at deployment:

    ———————————————-
    root@beagleboard:/opt/deploy# opkg install ./libgles-omap3_3.00.00.09-r1.1_armv7a.ipk
    Installing libgles-omap3 (3.00.00.09-r1.1) to root…
    libgles-omap3: unsatisfied recommendation for libgles-omap3-tests
    Collected errors:
    * ERROR: Cannot satisfy the following dependencies for libgles-omap3:
    *  devmem2 *  libx11-6 (>= 1.1.5) *
    ———————————————-

    Once you have installed all the above packages, please reboot the board.

    Your bootargs in U-Boot should look something like:

    console=ttyS0,115200n8 noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-16@60

    assuming you want to output via DVI and are running a similar kernel
    version (2.6.29-omap1 on my beagle) which accepts the same kernel
    arguments indicated in the bootargs variable above.

    Please note that we are specifying a 16 bit color depth, which is intentional
    and discussed in the “color depth considerations” section in the appendix.

    Please run the powervr demos (under X11) to establish that your drivers are
    successfully installed and usable.

    Configuring Qt

    In order to build Qt now, all that is required for each target is an
    appropriate mkspec:

    For Qt/X11

    You would fork your mkspec off the linux-g++ mkspec, the resulting mkspec’s
    qmake.conf would resemble:

    ==================================================================
    ………….
    include(../common/linux.conf)

    # modifications to g++.conf
    # These release optimization flags are TI supplied
    # and a little more aggressive than Qt standard (gentoo types rejoice!)
    QMAKE_CFLAGS_RELEASE     = -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
    QMAKE_CXXFLAGS_RELEASE     = $$QMAKE_CFLAGS_RELEASE

    QMAKE_CC         = $FULLY_QUALIFIED_COMPILER_PREFIX-gcc
    QMAKE_CXX         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
    QMAKE_LINK         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
    QMAKE_LINK_SHLIB     = $FULLY_QUALIFIED_COMPILER_PREFIX-g++

    # modifications to linux.conf
    QMAKE_LIBS_EGL         = -lEGL -lIMGegl -lsrv_um
    QMAKE_LIBS_OPENGL_QT     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um
    QMAKE_LIBS_OPENVG     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um -lOpenVG -lOpenVGU

    QMAKE_INCDIR         = $TARGET_STAGING_PATH/usr/include
    QMAKE_LIBDIR         = $TARGET_STAGING_PATH/usr/lib

    QMAKE_AR         = $FULLY_QUALIFIED_COMPILER_PREFIX-ar cqs
    QMAKE_OBJCOPY         = $FULLY_QUALIFIED_COMPILER_PREFIX-objcopy
    QMAKE_STRIP         = $FULLY_QUALIFIED_COMPILER_PREFIX-strip

    load(qt_config)
    ==================================================================

    and you would configure Qt with:

    configure -arch arm -xplatform linux-omap3-g++ -opengl es2 -openvg

    all that remains is to adjust /etc/powervr.ini on the target to be:

    [default]
    WindowSystem=libpvrPVR2D_FLIPWSEGL.so

    Now compile an example, e.g.:

    ./examples/opengl/hellogl_es2

    deploy it and Qt to the target and enjoy.

    For Qt/Embedded

    Since we don’t have the X11 abstraction, we have to interface with the
    underlying hardware/interfaces with Qt/Embedded’s gfx abstraction layer. We
    are going to be making some heavy use of the powervr driver resident under:

    $QTSRCTREE/src/plugins/gfxdrivers/powervr

    There is a README file in the powervr directory that is definitely
    recommended reading, and it lends some serious insight into our powervr driver
    and Qt/Embedded in general. The same driver is used for MBX/SGX targets and
    hence sees a fair amount of usage on a variety of target devices.

    You would fork your mkspec off the qws/linux-arm-g++ mkspec, the resulting mkspec’s
    qmake.conf would resemble:

    ==================================================================
    …………….
    include(../../common/qws.conf)

    # modifications to g++.conf
    QMAKE_CFLAGS_RELEASE     = -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
    QMAKE_CXXFLAGS_RELEASE     = $$QMAKE_CFLAGS_RELEASE

    QMAKE_CC         = $FULLY_QUALIFIED_COMPILER_PREFIX-gcc
    QMAKE_CXX         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
    QMAKE_LINK         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
    QMAKE_LINK_SHLIB     = $FULLY_QUALIFIED_COMPILER_PREFIX-g++

    # modifications to linux.conf
    QMAKE_INCDIR         = $TARGET_STAGING_PATH/usr/include
    QMAKE_LIBDIR         = $TARGET_STAGING_PATH/usr/lib

    QMAKE_LIBS_EGL         = -lEGL -lIMGegl -lsrv_um
    QMAKE_LIBS_OPENGL_QT     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um
    QMAKE_LIBS_OPENVG     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um -lOpenVG -lOpenVGU

    QMAKE_AR         = $FULLY_QUALIFIED_COMPILER_PREFIX-ar cqs
    QMAKE_OBJCOPY         = $FULLY_QUALIFIED_COMPILER_PREFIX-objcopy
    QMAKE_STRIP         = $FULLY_QUALIFIED_COMPILER_PREFIX-strip

    #These defines are documented in the powervr README, please read it
    DEFINES += QT_QWS_CLIENTBLIT QT_NO_QWS_CURSOR

    load(qt_config)
    ==================================================================

    and you would configure Qt with:

    /opt/dev/source/qt-beagle-4.6/configure -embedded arm -little-endian -xplatform qws/linux-omap3-g++ -opengl es2 -openvg -plugin-gfx-powervr

    all that remains is to adjust /etc/powervr.ini on the target to be:

    [default]
    WindowSystem=libpvrQWSWSEGL.so

    Now compile an example, e.g.:

    ./examples/opengl/hellogl_es2

    deploy it and Qt to your board, and after shutting down X, run the example with
    the following arguments:

    ./hellogl_es2 -qws -display powervr

    -qws - starts the application as the QWS server with exclusive access to the
    system hardware which manages all subsequent Qt “client” applications

    -display powervr - indicates that Qt should use the powervr driver we
    compiled earlier

    Summary

    I hope that this posting encourages people to go forward and experiment
    with a fully accelerated Qt 4.6 on the beagle board. Offloading the
    painting work onto the GPU drastically reduces the load on the CPU and
    broadens the range of applications which can feasibly be run on this
    broadly available (cheap!) embedded hardware. The Beagle board has
    really nice hardware, and it would be infinitely useful for us to have external
    people using our powervr driver and getting it as broadly used/refined as
    possible.

    Appendix

    Additional Benefits to OpenGL ES acceleration

    If you take any Qt Graphics View based example and set a QGLWidget as its
    viewport, a large amount of work will be offloaded onto the GPU, leaving your
    CPU free to frolic. To put this in perspective, a modified version of:

    ./examples/animation/animatedtiles

    which transitions continually, runs smoothly at 720p on the Beagle board
    when using software rendering, but consumes 100% CPU time according to top
    (99.3% to be fair). It is therefore CPU bound, and you are not going to be
    doing anything else in the background.

    When backed by a QGLWidget, the CPU usage drops to 20% on the exact same
    example in the exact same conditions (720p, at 16-bit color depth). The
    frame rate suffers slightly, but at least it is now dictated by the GPU.

    Minor clipping issue evident in hellogl_es2

    The bubbles are evidently clipped on the right hand side, I will hopefully
    beat you to reporting this at: http://bugreports.qt.nokia.com/secure/Dashboard.jspa

    I have not seen any other artifacts, please file any additional bugs you
    may encounter at the above URL.

    Are these instructions applicable to OMAP3 targets in general?

    Yes. There is no theoretical reason these instructions would not suffice
    for any OMAP3-based target, although I have not personally verified them
    outside of Beagle board usage. Caveat emptor.

    No Scratchbox2 usage when cross compiling

    The more astute of you will have noticed that I bypassed Scratchbox2 when
    configuring Qt/X11 this time around. I paid dearly for it: this X11
    build has no fontconfig, dbus or glib support, even though the Ångström
    subsystem I am building against supports all of them. If you want a
    full-fledged X11 build with decent font support and OpenGL ES support,
    please either:

    1) Invest your time in manually adjusting your mkspec (and/or wrestling
    pkg-config) to get all desired dependencies detected and built against

    -Or-

    2) Take the easy road: refer to my previous blog posting “Cross compiling
    Qt/X11” and merge the above mkspec changes into the:

    ./mkspecs/unsupported/linux-scratchbox2-g++

    mkspec in your Qt 4.6 source tree.

    The same goes for Qt/Embedded, which is more self-sufficient but will still
    be built without dbus, glib and other external dependencies unless you
    modify the mkspec/environment or use Scratchbox2 to abstract this away.

    Color depth considerations

    1) The powervr implementation we are relying on does not support
    PVRSRV_PIXEL_FORMAT_RGB888 (24-bit color depth); it does, however, support
    PVRSRV_PIXEL_FORMAT_RGB565 and PVRSRV_PIXEL_FORMAT_ARGB8888.

    2) Ångström is BusyBox-based, and the fbset command you will need to set
    32-bit color depth on the console will not work with the default fbset
    BusyBox symlink. You will therefore have to install and use fbset(.real)
    to get 32-bit color depth, which is a simple opkg install away for the
    connected Beagle board and a bitbake away for the stranded.

    Please note the color depth specified in the boot arguments:

    console=ttyS0,115200n8 noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-16@60

    If you want 32-bit color depth, use:

    console=ttyS0,115200n8 noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-24@60

    followed by:

    /usr/sbin/fbset.real -depth 32 -rgba 8/16,8/8,8/0,8/24

    after your Linux kernel drops you in userspace with a kiss on the cheek. A
    brave man once tried leaving the color depth at 16 in his boot args and
    jumping all the way to 32-bit with fbset, so that he could switch between
    the more performant 16-bit color space and the hardware-compositing ARGB
    offering. Running the dedicated fbset command halved his vertical
    resolution regardless of any other parameters he tried to pass to fbset,
    and he eventually ran off to fight another day.
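    To make the two supported pixel formats concrete, here is a small sketch of how one pixel is laid out in each. It assumes the ARGB8888 layout implied by the -rgba 8/16,8/8,8/0,8/24 argument above (fbset takes length/offset pairs for red, green, blue and alpha, so 8 bits each with red at bit 16, green at 8, blue at 0 and alpha at 24) and the conventional RGB565 layout. The helper names are hypothetical and not part of Qt or the PowerVR driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of Qt or the PowerVR SDK.

// Pack 8-bit channels into a 32-bit pixel using the bitfields passed to
// fbset -rgba 8/16,8/8,8/0,8/24 (length/offset for red, green, blue, alpha).
std::uint32_t packArgb8888(std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) |
           (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// Pack the same color into 16 bits: 5 bits red, 6 bits green, 5 bits blue.
// The low-order bits of each channel are dropped and there is no alpha at all.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return std::uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

    A 16-bit pixel is half the size of a 32-bit one, but RGB565 has no room for per-pixel alpha, so anything that needs alpha in the framebuffer has to use the 32-bit format.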

    There is a clear performance hit of 7 fps when running hellogl_es2 in
    32-bit rather than 16-bit, taking you down to 20 fps. This hit is even more
    pronounced when setting a QGLWidget on the viewport of a QGraphicsView. I
    am not sure what is responsible for this, and will be personally
    investigating it in the future. Any conjecture/feedback/research performed
    by the reader would be greatly appreciated.
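    As a back-of-the-envelope starting point (not a diagnosis, which the post deliberately leaves open), this sketch computes how much memory one full 1280x720 frame occupies at each depth. A full-screen redraw writes at least this much per frame, so 32-bit doubles that baseline. The helper is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one full frame at a given color depth (bits per pixel).
// Hypothetical helper for illustration only.
constexpr std::size_t frameBytes(std::size_t width, std::size_t height, std::size_t bitsPerPixel)
{
    return width * height * (bitsPerPixel / 8);
}

// 720p at 16 bpp (RGB565) versus 32 bpp (ARGB8888).
constexpr std::size_t bytes16 = frameBytes(1280, 720, 16);  // 1,843,200 bytes
constexpr std::size_t bytes32 = frameBytes(1280, 720, 32);  // 3,686,400 bytes
static_assert(bytes32 == 2 * bytes16, "32-bit frames are exactly twice as large");
```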

    *Edited: Introduced rudimentary formatting to make the blog look less Vim-forged

    Sarah Smith
    Graphics
    OpenGL
    Posted by Sarah Smith
     in Graphics, OpenGL
     on Tuesday, November 17, 2009 @ 23:14

    For all you 3D and graphics hackers out there, this will not be news: writing OpenGL code is a pain.

    Well, the Qt Graphics team is coming to save your sanity with a new project called Qt/3D.

    We gave you a taste of Qt/3D by putting a few of its foundations into Qt 4.6, which will be released very shortly. See the Qt/3D 4.6 features blog post for more details.

    At some point Qt/3D will be available as part of Qt itself - exactly what sort of module or library we are not sure just yet - but for now you can try it out via Qt Labs!

    With this post we’re pleased to announce that Qt/3D will be available for experimental use via the new Qt/3D labs repo.

    Old School OpenGL code gets the Qt Treatment

    The trusty old QGLWidget got you past first base: a nice window set up with an OpenGL context ready to go.

    But from there you’re on your own with the OpenGL reference book, tearing your hair out writing code like

    void My3DWidget::paintGL()
    {
       QColor clearColor(palette().color(backgroundRole()));
       glClearColor(clearColor.redF(), clearColor.greenF(), clearColor.blueF(), clearColor.alphaF());
       glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
       QColor color(170, 202, 0, 255);
       glColor4f(color.redF(), color.greenF(), color.blueF(), color.alphaF());
       static float vertices[] = {
          60.0f,  10.0f,  0.0f,
          110.0f, 110.0f, 0.0f,
          10.0f,  110.0f, 0.0f
       };
       glVertexPointer(3, GL_FLOAT, 0, vertices);
       glEnableClientState(GL_VERTEX_ARRAY);
       glDrawArrays(GL_TRIANGLES, 0, 3);
       glDisableClientState(GL_VERTEX_ARRAY);
    }

    just to paint a triangle on the screen.

    But if you want cross-platform code (something that you can try on your desktop and then run on your device with OpenGL ES), it starts to look really horrible!

    Macros everywhere to cope with the different function signatures and data types - not to mention shaders under ES 2.0 versus classic GL on the desktop, and a swath of other cross-platform difficulties.

    With Qt/3D your code looks like this:

    void My3DWidget::paintGL()
    {
        QGLPainter painter(this);
        painter.setClearColor(palette().color(backgroundRole()));
        painter.clear();
        painter.setStandardEffect(QGL::FlatColor);
        painter.setColor(QColor(170, 202, 0, 255));
        QGLVertexArray vertices(QGL::Position, 3);
        vertices.append(60.0f,  10.0f,  0.0f);
        vertices.append(110.0f, 110.0f, 0.0f);
        vertices.append(10.0f,  110.0f, 0.0f);
        painter.setVertexArray(vertices);
        painter.draw(QGL::Triangles, 3);
    }

    What’s more, it runs the same on your OpenGL/ES device and your desktop. (Note that I have elided the view and model transform setup code from both examples above for the sake of space.)

    As mentioned in the previous blog post, Qt/3D has been in the wings for some time now, and the eagle-eyed might have noticed math classes springing up in Qt’s GUI module.

    These classes provide the basis for Qt/3D’s cross-platform geometry abstraction: QGLVertexArray. This nifty class also dovetails into the QGLBuffer class to take care of uploading your geometry to VBOs on the graphics adaptor, as well as coping with platform differences in data member sizes.
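    The core idea behind a geometry abstraction like this can be sketched in plain C++: declare which attributes each vertex carries and how many components each has, then pack the values into one interleaved float buffer with a known stride, ready for a vertex buffer. The VertexArray class and Attribute enum below are hypothetical; they are not the Qt/3D API, which may store and convert data differently.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical sketch of an interleaved vertex array; not the Qt/3D API.
enum class Attribute { Position, Normal, TexCoord };

class VertexArray {
public:
    // Declare one attribute and the number of float components it uses.
    void declare(Attribute attr, int components)
    {
        assert(m_data.empty() && "declare all attributes before appending data");
        m_fields.push_back({attr, components, m_stride});
        m_stride += components;
    }

    // Append the components for exactly one vertex, in declaration order.
    void appendVertex(std::initializer_list<float> values)
    {
        assert(static_cast<int>(values.size()) == m_stride);
        m_data.insert(m_data.end(), values.begin(), values.end());
    }

    int strideInFloats() const { return m_stride; }
    std::size_t strideInBytes() const { return m_stride * sizeof(float); }
    std::size_t vertexCount() const { return m_stride ? m_data.size() / m_stride : 0; }

    // Offset (in floats) of an attribute within each vertex, or -1 if absent.
    int offsetOf(Attribute attr) const
    {
        for (const Field &f : m_fields)
            if (f.attr == attr)
                return f.offset;
        return -1;
    }

    // Contiguous data suitable for handing to a vertex buffer upload.
    const float *data() const { return m_data.data(); }

private:
    struct Field { Attribute attr; int components; int offset; };
    std::vector<Field> m_fields;
    std::vector<float> m_data;
    int m_stride = 0;
};
```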

    Download the code from the labs repo and try out the examples - the code above comes from the tutorials directory, where you can find out more about writing your traditional OpenGL apps in the Qt cross-platform way.

    What’s in Store with Qt/3D

    There’s more to come from Qt/3D over and above the Portability tools mentioned in the example above.

    The Enablers include classes like QGLMaterialParameters, which encapsulate OpenGL materials in a cross-platform and Qt’ish way.

    One of the nicest enablers is the QGLView class and its friends. Doing your GL painting into a view looks pretty much exactly the same as with an OpenGL widget, but a few more things are taken care of for you: there is no need to set up tricky viewing and model transforms (which is one reason why I elided them from the code above). Even better, as a bonus you get a pan-rotate-zoom view window for free. It’s customizable using the QGLCamera class, and with QGLLightParameters you can quickly set up your own lights too.
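    A pan-rotate-zoom view usually boils down to an orbit camera: the eye sits on a sphere around a center point, rotating changes the angles, zooming changes the radius, and panning moves the center. The sketch below shows that arithmetic under those assumptions; the OrbitCamera type is hypothetical and is not how QGLCamera is actually implemented.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical orbit camera; not the QGLCamera API.
struct Vec3 { float x, y, z; };

struct OrbitCamera {
    Vec3 center{0.0f, 0.0f, 0.0f};  // point the camera looks at
    float yawDegrees = 0.0f;        // rotation around the vertical (y) axis
    float pitchDegrees = 0.0f;      // elevation above the x/z plane
    float distance = 10.0f;         // radius of the orbit

    void rotate(float dYaw, float dPitch)
    {
        yawDegrees += dYaw;
        // Clamp pitch so the camera never flips over the poles.
        pitchDegrees = std::fmax(-89.0f, std::fmin(89.0f, pitchDegrees + dPitch));
    }

    // factor < 1 zooms in, factor > 1 zooms out; keep a small minimum radius.
    void zoom(float factor) { distance = std::fmax(0.01f, distance * factor); }

    void pan(float dx, float dy, float dz) { center = {center.x + dx, center.y + dy, center.z + dz}; }

    // Eye position on the sphere: yaw 0 / pitch 0 puts the eye on the +z axis.
    Vec3 eye() const
    {
        const float toRad = 3.14159265358979f / 180.0f;
        const float yaw = yawDegrees * toRad;
        const float pitch = pitchDegrees * toRad;
        return {center.x + distance * std::cos(pitch) * std::sin(yaw),
                center.y + distance * std::sin(pitch),
                center.z + distance * std::cos(pitch) * std::cos(yaw)};
    }
};
```

    A view widget would then feed eye() and center into a look-at matrix each frame and map mouse drags onto rotate(), zoom() and pan().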

    Then there’s Real 3D, bringing basic but powerful geometry management and model file import functionality. With this stuff we’re just dipping our toes into the world of 3D to allow coding up basic applications using Qt style containers, QObject based memory management, and the kinds of abstractions you’ve come to expect from Qt. If you’re an Ogre programmer, or used to using Coin3D or CrystalSpace or other powerful 3D and modelling libraries - well, you’ll still need them. We’re not planning to go into competition with those established 3D toolkits.

    Instead our aim is to deliver on the promise of Qt: do more with less. It should be just as easy to use a 3D model file as it is to use a PNG file, and it should be just as easy to set up a cube with a texture on it as it is to create a Qt label. We call this component of Qt/3D Real 3D because it does start to provide functionality we’re used to seeing in 3D toolkits. But we’re working to make sure we do not go too far down this road, and to decide what will go in and what will be left out, so please consider the stuff in our labs release as definite maybes.

    QML and Qt/3D

    There’s a lot of buzz around Declarative UI and its associated language QML.

    Qt/3D will work with Declarative UI by providing QML bindings so 3D functionality can be easily used from Declarative UI programs. There are a few demos of this in the source tree which can be tried out, and you can see Henrik’s short video about QML and Qt/3D.

    We’ll expand on the exciting possibilities of QML and Qt/3D in later posts.

    We hope you like what we’re planning, and look forward to your feedback - keep tuned as there are more blog posts to follow, with some cool examples and things to try with Qt/3D.

    Rhys Weatherley
    Painting
    OpenGL
    Posted by Rhys Weatherley
     in Painting, OpenGL
     on Monday, November 09, 2009 @ 22:44

    For the last year, we have been investigating the APIs that Qt needs to support 3D applications and clever 2.5D effects with OpenGL. When we started all this a year ago, the problem was broken down into three main areas:

    • Enablers - Basic building blocks like matrices, shaders, vertex buffers, etc.
    • Portability API - APIs that make it easier to write code that ports between desktop OpenGL and embedded OpenGL/ES, particularly OpenGL/ES 2.0, which does not have a fixed function pipeline.
    • Real 3D - APIs that take Qt into new application spaces beyond animations and 2D effects.

    Obviously that covers a lot of ground, so in this post we will just focus on a few of the Enablers - specifically the ones that made it into 4.6 as the first taste of Qt/3D.  In future posts, we’ll publish Qt/3D repository details and show you more of our plans for later Qt/3D releases.

    Math3d

    Traditionally, Qt has relied upon the OpenGL library to provide mathematical primitives, using functions like glOrtho(), glRotate(), and so on to manipulate matrices and vectors.  However, with the advent of OpenGL/ES 2.0 it is no longer possible to rely upon the OpenGL library to do the heavy-lifting - the programmer has to do all the work. Also, the traditional OpenGL functions are really only useful when drawing objects - they aren’t of much use when building object meshes in memory and transforming them prior to uploading to the GPU.

    So we really needed a hardcore 3D math library, just like the other 3D toolkits (Coin3D, Ogre, OpenSceneGraph, etc).  But we didn’t want to go overboard - it is very easy to re-invent all of linear algebra and lose sight of the core goal: make typical 3D mathematical operations fast and elegant.  We recognized that libraries like Eigen were very good at doing everything in mathematics, but our own goals were more focused.  So what did we do?

    The central workhorse is of course QMatrix4x4, which is highly optimized for 3D operations. Internally it keeps track of its “type” - whether it is a translation, scale, rotation, etc. - so that it can more efficiently build up transformations than a naive “make matrices and multiply” implementation might. QTransform does the same thing for 2D transformation matrices. The following is an excerpt from the hellogl_es2 example in Qt 4.6 which builds up a modelview matrix and sets it on a shader program:

    QMatrix4x4 modelview;
    modelview.rotate(m_fAngle, 0.0f, 1.0f, 0.0f);
    modelview.rotate(m_fAngle, 1.0f, 0.0f, 0.0f);
    modelview.rotate(m_fAngle, 0.0f, 0.0f, 1.0f);
    modelview.scale(m_fScale);
    modelview.translate(0.0f, -0.2f, 0.0f);
    program1.setUniformValue(matrixUniform1, modelview);

    As can be seen, it is very similar to the traditional OpenGL functions:

    glRotatef(m_fAngle, 0.0f, 1.0f, 0.0f);
    glRotatef(m_fAngle, 1.0f, 0.0f, 0.0f);
    glRotatef(m_fAngle, 0.0f, 0.0f, 1.0f);
    glScalef(m_fScale, m_fScale, m_fScale);
    glTranslatef(0.0f, -0.2f, 0.0f);

    The choice to make the functions similar was deliberate: code that uses the existing OpenGL functions can be quickly converted into more portable code that uses QMatrix4x4.
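    To illustrate both ideas from this section (post-multiplying in the same order as glRotatef() and friends, and tracking a matrix's "type" to skip work), here is a deliberately small 4x4 matrix sketch in plain C++. It is not QMatrix4x4's implementation: the class, its three-state type flag and the shortcut paths are hypothetical simplifications of the technique described above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical sketch, not Qt's QMatrix4x4. Storage is column-major
// (m[column][row]) like OpenGL, and every operation post-multiplies,
// matching the order of glRotatef()/glScalef()/glTranslatef().
class Mat4 {
public:
    enum Type { Identity, Translation, General };

    Mat4() { for (int i = 0; i < 4; ++i) m[i][i] = 1.0f; }

    Type type() const { return t; }

    void translate(float x, float y, float z)
    {
        if (t == Identity || t == Translation) {
            // Fast path: the upper 3x3 is the identity, so only the
            // translation column changes (3 adds instead of 12 multiply-adds).
            m[3][0] += x; m[3][1] += y; m[3][2] += z;
            t = Translation;
            return;
        }
        for (int r = 0; r < 4; ++r)
            m[3][r] += m[0][r] * x + m[1][r] * y + m[2][r] * z;
    }

    void scale(float s)
    {
        for (int c = 0; c < 3; ++c)
            for (int r = 0; r < 4; ++r)
                m[c][r] *= s;
        t = General;
    }

    // Rotate by angle degrees around the axis (x, y, z), as glRotatef() does.
    void rotate(float angle, float x, float y, float z)
    {
        const float len = std::sqrt(x * x + y * y + z * z);
        assert(len > 0.0f);
        x /= len; y /= len; z /= len;
        const float a = angle * 3.14159265358979f / 180.0f;
        const float c = std::cos(a), s = std::sin(a), ic = 1.0f - c;
        Mat4 rot;
        rot.m[0] = {x * x * ic + c,     y * x * ic + z * s, x * z * ic - y * s, 0.0f};
        rot.m[1] = {x * y * ic - z * s, y * y * ic + c,     y * z * ic + x * s, 0.0f};
        rot.m[2] = {x * z * ic + y * s, y * z * ic - x * s, z * z * ic + c,     0.0f};
        rot.t = General;
        *this = *this * rot;
    }

    Mat4 operator*(const Mat4 &o) const
    {
        if (t == Identity) return o;        // identity shortcut: no arithmetic at all
        if (o.t == Identity) return *this;
        Mat4 result;
        for (int c = 0; c < 4; ++c)
            for (int r = 0; r < 4; ++r) {
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)
                    sum += m[k][r] * o.m[c][k];
                result.m[c][r] = sum;
            }
        result.t = General;
        return result;
    }

    // Transform a point (implicit w = 1).
    std::array<float, 3> map(float x, float y, float z) const
    {
        std::array<float, 3> out{};
        for (int r = 0; r < 3; ++r)
            out[r] = m[0][r] * x + m[1][r] * y + m[2][r] * z + m[3][r];
        return out;
    }

private:
    std::array<std::array<float, 4>, 4> m{};  // zeroed here, diagonal set in the constructor
    Type t = Identity;
};
```

    With this ordering the last call is the first one applied to a vertex, exactly as with the fixed-function matrix stack, which is what lets the hellogl_es2 excerpt mirror the glRotatef() sequence line for line.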

    The QGenericMatrix template is used for creating “other” matrix sizes that commonly crop up in OpenGL work: 2×2, 2×3, 2×4, 3×2, 3×3, 3×4, 4×2, and 4×3. It can do a lot more of course, being a template, although we did draw the line at supporting sparse matrices, since the matrix sizes that occur in 3D code are rarely very large. A common question is why we didn’t make QMatrix4x4 an instance or subclass of QGenericMatrix. The main reason is performance: the 4×4 class needs to be very fast, and it is easier to performance-tune a concrete class that isn’t at the mercy of the compiler’s template expansion system. The other reason is to reduce user confusion: the APIs for all QGenericMatrix sizes are exactly the same, but QMatrix4x4 is extremely rich in the additional operations it provides.
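    One benefit of a template for the odd sizes is that the dimensions become part of the type, so the compiler rejects a product whose inner dimensions do not match. A minimal sketch of that idea follows. Note that it takes rows first, whereas QGenericMatrix<N, M, T> takes the column count first; the Matrix template here is hypothetical and much simpler than Qt's.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size matrix template; not Qt's QGenericMatrix.
// In this sketch the row count comes first; sizes are compile-time constants.
template <std::size_t Rows, std::size_t Cols, typename T = float>
struct Matrix {
    T v[Rows][Cols] = {};

    T &operator()(std::size_t r, std::size_t c) { return v[r][c]; }
    const T &operator()(std::size_t r, std::size_t c) const { return v[r][c]; }

    Matrix<Cols, Rows, T> transposed() const
    {
        Matrix<Cols, Rows, T> out;
        for (std::size_t r = 0; r < Rows; ++r)
            for (std::size_t c = 0; c < Cols; ++c)
                out(c, r) = v[r][c];
        return out;
    }
};

// (Rows x Inner) * (Inner x Cols) -> (Rows x Cols). Mismatched inner
// dimensions fail to compile because no overload can be deduced.
template <std::size_t Rows, std::size_t Inner, std::size_t Cols, typename T>
Matrix<Rows, Cols, T> operator*(const Matrix<Rows, Inner, T> &a, const Matrix<Inner, Cols, T> &b)
{
    Matrix<Rows, Cols, T> out;
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            for (std::size_t k = 0; k < Inner; ++k)
                out(r, c) += a(r, k) * b(k, c);
    return out;
}
```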

    QVector2D, QVector3D, QVector4D provide vector classes of various sizes to complement QMatrix4x4. An interesting feature for the purposes of OpenGL is that these classes are guaranteed to use the same floating-point type internally as GLfloat on the system. QPointF wasn’t suitable for our 2D vector needs because it uses qreal, which can either be float or double depending upon the compilation flags passed to Qt’s configure. The GLfloat guarantee is very important when building large 3D object meshes: you want to get the vertex data into the most efficient format as early as possible. If we had made the internal type qreal, then Qt/3D would have needed to do a lot of floating-point conversions when uploading vertex data to the GPU.
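    The conversion cost is easy to see in a small sketch. It assumes GLfloat is a 32-bit float (true on common platforms, but an assumption here) and uses two hypothetical vertex structs: one storing float, as QVector3D does, and one storing double, as a qreal-based type would in a double build.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical vertex types for illustration.
struct Vec3f { float x, y, z; };   // same element type the GPU expects
struct Vec3d { double x, y, z; };  // what a qreal == double build would store

static_assert(sizeof(Vec3f) == 3 * sizeof(float), "Vec3f must be tightly packed");

// float storage: the upload buffer is a straight byte copy of the array.
std::vector<float> packForUpload(const std::vector<Vec3f> &verts)
{
    std::vector<float> out(verts.size() * 3);
    if (!verts.empty())
        std::memcpy(out.data(), verts.data(), verts.size() * sizeof(Vec3f));
    return out;
}

// double storage: every component has to be narrowed one by one first.
std::vector<float> packForUpload(const std::vector<Vec3d> &verts)
{
    std::vector<float> out;
    out.reserve(verts.size() * 3);
    for (const Vec3d &v : verts) {
        out.push_back(static_cast<float>(v.x));
        out.push_back(static_cast<float>(v.y));
        out.push_back(static_cast<float>(v.z));
    }
    return out;
}
```

    Both overloads produce the same buffer, but the double version touches every component and keeps twice as much data in memory beforehand.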

    The QQuaternion class is the last in our current math3d set. It provides an efficient implementation of rotations in 3D space for use with camera positioning, rotation, and animation.
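    For readers new to quaternions, the core operations are short. This standalone sketch (names chosen to echo QQuaternion's, but not Qt's implementation) builds a rotation from an axis and an angle, composes two rotations with the Hamilton product, and rotates a vector using the standard expansion of q v q*: v' = v + 2w(u × v) + 2u × (u × v), where w is the scalar part and u the vector part.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch, not Qt's QQuaternion.
struct V3 { float x, y, z; };

static V3 cross(V3 a, V3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Quat {
    float w, x, y, z;  // scalar part w, vector part (x, y, z)

    // Unit quaternion for a rotation of angleDegrees around axis (normalized here).
    static Quat fromAxisAndAngle(V3 axis, float angleDegrees)
    {
        const float len = std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
        const float half = angleDegrees * 3.14159265358979f / 360.0f;  // half angle, radians
        const float s = std::sin(half) / len;
        return {std::cos(half), axis.x * s, axis.y * s, axis.z * s};
    }

    // Hamilton product: (a * b) rotates by b first, then by a.
    Quat operator*(const Quat &b) const
    {
        return {w * b.w - x * b.x - y * b.y - z * b.z,
                w * b.x + x * b.w + y * b.z - z * b.y,
                w * b.y - x * b.z + y * b.w + z * b.x,
                w * b.z + x * b.y - y * b.x + z * b.w};
    }

    // Rotate v by this unit quaternion: v + 2w(u x v) + 2u x (u x v).
    V3 rotatedVector(V3 v) const
    {
        const V3 u{x, y, z};
        const V3 t = cross(u, v);
        const V3 tt = cross(u, t);
        return {v.x + 2.0f * (w * t.x + tt.x),
                v.y + 2.0f * (w * t.y + tt.y),
                v.z + 2.0f * (w * t.z + tt.z)};
    }
};
```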

    Shader Programs

    The fixed function pipeline in OpenGL is getting very “old school”.  These days, OpenGL is all about shaders, shaders, shaders.  But resolving the extensions and managing the compilation, linking, and use of shader programs can be quite daunting.  In Qt 4.5, we had no less than three different internal shader program wrappers for pixmap filters, the OpenGL2 paint engine, and the boxes demo.  So in Qt 4.6 we have merged all of these efforts and devised a new public API to wrap the extensions.  The result is the QGLShader and QGLShaderProgram classes, which:

    • Support the GLSL and GLSL/ES shader languages.
    • Handle vertex and fragment shaders (geometry shaders are coming in future versions of Qt).
    • Support writing portable shaders that work on both GLSL and GLSL/ES.

    That last point is probably the most interesting for Qt.  GLSL has a lot of built-in variables like gl_Vertex, gl_Normal, gl_ModelViewProjectionMatrix, etc that don’t exist in GLSL/ES.  In turn, GLSL/ES has additional type qualifiers like highp, mediump, and lowp that are used to specify the desired precision.  These issues can make it a pain to port existing shader code from desktop to embedded.  We didn’t want to have to write two sets of shaders for the OpenGL2 paint engine, so a solution needed to be found.

    The solution we chose was to use GLSL/ES as the primary language for writing shaders in Qt, and provide #define’s for the extra keywords to make the code compile on desktop GLSL systems.  It is still possible to use the full GLSL language if you want to, but portability will suffer.
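    The #define approach can be sketched as a source-rewriting step that runs before the shader reaches the driver. This illustrates the idea under our own assumptions and is not QGLShader's code: the function name is hypothetical, and it also keeps any #version line first, since GLSL requires that directive to precede everything except comments and whitespace.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not Qt's API: make GLSL/ES-style source acceptable
// to a desktop GLSL compiler by defining the ES precision qualifiers away.
std::string portableShaderSource(const std::string &esSource, bool desktopGL)
{
    if (!desktopGL)
        return esSource;  // GLSL/ES understands highp/mediump/lowp natively

    static const std::string qualifierDefines =
        "#define lowp\n"
        "#define mediump\n"
        "#define highp\n";

    // A #version directive must stay first, so insert the defines after it.
    if (esSource.compare(0, 8, "#version") == 0) {
        const std::string::size_type eol = esSource.find('\n');
        if (eol == std::string::npos)
            return esSource + "\n" + qualifierDefines;
        return esSource.substr(0, eol + 1) + qualifierDefines + esSource.substr(eol + 1);
    }
    return qualifierDefines + esSource;
}
```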

    The following example demonstrates how to compile and link a simple shader program that can be used to draw triangles with a flat color:

    // 'program' is a QGLShaderProgram (e.g. a widget member); a GL context must be current.
    program.addShaderFromSourceCode(QGLShader::Vertex,
        "attribute highp vec4 vertex;\n"
        "uniform mediump mat4 matrix;\n"
        "void main(void)\n"
        "{\n"
        "   gl_Position = matrix * vertex;\n"
        "}");
    program.addShaderFromSourceCode(QGLShader::Fragment,
        "uniform mediump vec4 color;\n"
        "void main(void)\n"
        "{\n"
        "   gl_FragColor = color;\n"
        "}");
    program.link();
    program.bind();

    // "vertex" is a per-vertex attribute; "matrix" and "color" are uniforms.
    int vertexLocation = program.attributeLocation("vertex");
    int matrixLocation = program.uniformLocation("matrix");
    int colorLocation = program.uniformLocation("color");

    The highp and mediump keywords are added to keep GLSL/ES happy - on desktop they #define to an empty string. Also, we have deliberately used user variables for the vertex position, matrix, and color rather than relying upon the desktop-specific gl_Vertex, gl_ModelViewProjectionMatrix, and gl_Color variables. We can then draw a green triangle as follows:

    QVector3D triangleVertices[] = {
        QVector3D(60.0f,  10.0f,  0.0f),
        QVector3D(110.0f, 110.0f, 0.0f),
        QVector3D(10.0f,  110.0f, 0.0f)
    }; 
    
    QMatrix4x4 pmvMatrix;
    pmvMatrix.ortho(rect()); 
    
    program.enableAttributeArray(vertexLocation);
    program.setAttributeArray(vertexLocation, triangleVertices);
    program.setUniformValue(matrixLocation, pmvMatrix);
    program.setUniformValue(colorLocation, QColor(0, 255, 0, 255)); 
    
    glDrawArrays(GL_TRIANGLES, 0, 3); 
    
    program.disableAttributeArray(vertexLocation);

    Note the use of QMatrix4x4 above to create an orthographic projection matrix to pass to the vertex shader, and the use of QVector3D to build the vertex array. And that’s basically it! Shaders 101.
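    To see what the orthographic matrix does to the triangle, here is the standard glOrtho() formula in plain C++, applied directly to the vertices above. It assumes a 120x120 widget and, as we understand QMatrix4x4::ortho(const QRect &), a projection with top at y = 0, bottom at y = height, and near/far planes at -1 and 1, so that widget coordinates (y growing downwards) land in normalized device coordinates. Treat that mapping as an assumption and check the Qt documentation for your version; orthoMap() is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// glOrtho()-style orthographic projection applied directly to one point.
// Hypothetical helper for illustration; returns normalized device coordinates.
std::array<float, 3> orthoMap(float left, float right, float bottom, float top,
                              float nearPlane, float farPlane,
                              float x, float y, float z)
{
    return {
        2.0f * x / (right - left) - (right + left) / (right - left),
        2.0f * y / (top - bottom) - (top + bottom) / (top - bottom),
        -2.0f * z / (farPlane - nearPlane) - (farPlane + nearPlane) / (farPlane - nearPlane)
    };
}
```

    With left = 0, right = 120, bottom = 120, top = 0, the vertex (60, 10) maps to roughly (0, 0.83, 0): horizontally centered and near the top, just as it appears in the widget.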

    What’s Next?

    Lots and lots of stuff.  Wrapper classes for vertex buffers and textures will probably go into Qt in the near future.  Geometry handling for building object models.  Special-purpose 3D viewing widgets. Integration with Declarative UI for quickly building 3D applications.  And the portability API.  More to come on these in the next post …



    © 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
    All other trademarks are property of their respective owners.