Posted by Donald Carr in OpenGL, Embedded, Build system on Friday, February 05, 2010 @ 10:49

Requirements

1) Tegra 2 platform
2) The latest Nvidia Tegra2 SDK (11.0074_devlite_eula_Beta-RC.zip at this time)

Board bring up

Nvidia have done a pretty good job in documenting bring up and I will not
paraphrase them. I personally used their dev environment exactly as
intended (answering every question posed during installation with an
affirmative), so my dev machine became a DHCP/NFS server serving out on a
secondary network interface. Be sure to select an X enabled SDK during installation as we
currently don’t work out of the box with their OpenKode drivers.

Their supplied documentation (in chm format) is thorough and documents
flashing the latest bootloader to the device (amongst other things), which I would
recommend on every update to ensure targetfs/bootloader compatibility.

Initial rootfs adjustment/configuration

Once you have installed both packages included in
11.0074_devlite_eula_Beta-RC.zip you should have 2 dirs:

emPower-devlite-p1138
toolchains

Go into:

./emPower-devlite-p1138

cp -r include/* targetfs/usr/include
cp -r lib-target/* targetfs/usr/lib

Your targetfs is now primed for GL compilation. You will need to boot your
target prior to proceeding though, as on first boot a host of packages
are installed into the targetfs, including all the X11 headers required to
compile Qt/X11.

Additional headers (dbus, glib, freetype, gstreamer) might be
required for a full featured Qt build, but the packages installed on first
boot will suffice for an OpenGL ES 2 enabled Qt/X11 build with all of Qt’s
core functionality.

Go to town with apt-get according to your needs
(apt-get build-dep libqt4-gui unfortunately fails due to unmet
dependencies) and please be aware of the fact that the chm documentation
covers forwarding net traffic between your Tegra2 target and the external
world via your host machine.

Once this is done, we are ready to build Qt.

Configuring build environment

I use Scratchbox 2 for reasons explained in the appendix.

1) Change to your targetfs directory (cd $NV_ROOT/emPower-devlite-p1138/targetfs)

2) Run:

sb2-init -c /usr/bin/qemu-arm $SB2-TARGETNAME $NV_ROOT/toolchains/tegra2-4.3.2-nv/bin/arm-none-linux-gnueabi-gcc

within this directory, where:

$SB2-TARGETNAME is a suitable name for your target’s scratchbox environment
$NV_ROOT is where you unpacked the archive

You now have a sane Scratchbox 2 environment in which you can compile Qt/X11. If you
were to enter the Scratchbox environment and build Qt now, it would get through
every module until QtOpenGL was reached, at which point you would
witness spaghetti breakage referencing the inclusion of the qdebug/text
streaming classes.

This is because

targetfs/usr/include/EGL/eglplatform.h

includes several X11 headers:

Xlib.h
Xutil.h

as it explicitly references native X11 types. I initially tried to move
these headers out of this header file, but the path of least resistance
ended up being the undefining of conflicting defines at the end of the
eglplatform.h header file.

I introduced the following 4 lines:

#undef None
#undef Status
#undef Unsorted
#undef GrayScale

immediately prior to the final #endif in this file. I know this is dirty,
but it circumvents the point of breakage and resolves all remaining build failures.
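
To see why those four lines help, here is a minimal sketch of the clash in plain C++, without X11 or Qt. The two #define lines are stand-ins for what the X11 headers do (as far as I know, X.h defines None as 0L and Xlib.h defines Status as int, but check your own headers). The Stream struct is a hypothetical stand-in for any class that uses these names as ordinary identifiers; it is not actual Qt code.

```cpp
#include <cassert>

// Stand-ins for the X11 macros (assumption: X.h has "#define None 0L"
// and Xlib.h has "#define Status int").
#define None 0L
#define Status int

// Without the next two lines, the enum below would be preprocessed into
// "enum int { Ok, 0L };" and the build would fail.
#undef None
#undef Status

// Hypothetical class that uses the same names as plain identifiers.
struct Stream {
    enum Status { Ok, None };
    Status status() const { return Ok; }
};

// Returns true when the stream reports no error.
inline bool streamIsOk(const Stream &s) { return s.status() == Stream::Ok; }
```

Commenting out the two #undef lines should reproduce the kind of error described above. The cost of this workaround is that code included after eglplatform.h can no longer use the X11 macros by those names.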

Having done this, we are ready to build Qt/X11.

Building Qt

1) Enter scratchbox 2 with: sb2 -t $SB2-TARGETNAME
2) Enter your Qt directory
3) Configure Qt with (at a minimum):
./configure -xplatform linux-g++ -platform linux-host-g++ -opengl es2 -force-pkg-config ..
(The utilized mkspecs are discussed in the Appendix)
4) Check the output of configure to verify OpenGL ES2 (and any other
functionality you wish to build) has been correctly detected and enabled by
the configure tests.
5) run “make” (Qt should build through to completion)
6) “make install” Qt to its prefix path on the host (if necessary)
7) Use Qt to compile any appropriate Qt applications within the
scratchbox env
8) Deploy Qt to its prefix path (on the target) with any desired
applications

When you boot your Tegra2 platform, start X and launch your
OpenGL ES2 enabled Qt application (either GLES2 content directly
in a QGLWidget à la hellogl_es2, a QGLWidget fronted QGraphicsView,
or via explicit use of the OpenGL ES2 graphics system), everything
should simply work to a greater or lesser extent, and work at pace
at that.

We have not done any profiling of Qt on the Tegra2
hardware in order to quantify where we are today, nor any
dedicated integration in order to maximize our use of the
underlying hardware, but the baseline performance is very solid.

Appendix

mkspec information

linux-g++

the generic linux X11 mkspec, which behind the curtains of scratchbox
is mapped to the (environment creation time) specified cross compiler.

linux-host-g++

a modification of the generic linux X11 mkspec, which maps to the host machine’s compiler. This
mkspec is already present in the Qt maemo5 branch on our git repository. It
is basically linux-g++ with “host-” prefixed on
all the compiler variables (which are normally included from ../common/g++.conf). The only
noteworthy thing about this mkspec is that instead of including
“../common/g++.conf” and modifying select variables accordingly, the
complete include is inlined. This is due to a known issue where overriding
the QMAKE_CXX variables is not respected during the qmake bootstrapping
process.

The linux-host-g++ mkspec looks roughly like this:

=============================================

#
# qmake configuration for linux-g++
#

MAKEFILE_GENERATOR = UNIX
TEMPLATE = app
CONFIG += qt warn_on release incremental link_prl
QT += core gui
QMAKE_INCREMENTAL_STYLE = sublib

#
# qmake configuration for common gcc
#

#inlined ../common/g++.conf follows

QMAKE_CC = host-gcc
QMAKE_CFLAGS += -pipe
QMAKE_CFLAGS_DEPS += -M
QMAKE_CFLAGS_WARN_ON += -Wall -W
QMAKE_CFLAGS_WARN_OFF += -w
QMAKE_CFLAGS_RELEASE += -O2
QMAKE_CFLAGS_DEBUG += -g
QMAKE_CFLAGS_SHLIB += -fPIC
QMAKE_CFLAGS_STATIC_LIB += -fPIC
QMAKE_CFLAGS_YACC += -Wno-unused -Wno-parentheses
QMAKE_CFLAGS_HIDESYMS += -fvisibility=hidden
QMAKE_CFLAGS_PRECOMPILE += -x c-header -c ${QMAKE_PCH_INPUT} -o ${QMAKE_PCH_OUTPUT}
QMAKE_CFLAGS_USE_PRECOMPILE += -include ${QMAKE_PCH_OUTPUT_BASE}

QMAKE_CXX = host-g++
QMAKE_CXXFLAGS += $$QMAKE_CFLAGS
QMAKE_CXXFLAGS_DEPS += $$QMAKE_CFLAGS_DEPS
QMAKE_CXXFLAGS_WARN_ON += $$QMAKE_CFLAGS_WARN_ON
QMAKE_CXXFLAGS_WARN_OFF += $$QMAKE_CFLAGS_WARN_OFF
QMAKE_CXXFLAGS_RELEASE += $$QMAKE_CFLAGS_RELEASE
QMAKE_CXXFLAGS_DEBUG += $$QMAKE_CFLAGS_DEBUG
QMAKE_CXXFLAGS_SHLIB += $$QMAKE_CFLAGS_SHLIB
QMAKE_CXXFLAGS_STATIC_LIB += $$QMAKE_CFLAGS_STATIC_LIB
QMAKE_CXXFLAGS_YACC += $$QMAKE_CFLAGS_YACC
QMAKE_CXXFLAGS_HIDESYMS += $$QMAKE_CFLAGS_HIDESYMS -fvisibility-inlines-hidden
QMAKE_CXXFLAGS_PRECOMPILE += -x c++-header -c ${QMAKE_PCH_INPUT} -o ${QMAKE_PCH_OUTPUT}
QMAKE_CXXFLAGS_USE_PRECOMPILE = $$QMAKE_CFLAGS_USE_PRECOMPILE

QMAKE_LINK = host-g++
QMAKE_LINK_SHLIB = host-g++
QMAKE_LINK_C = host-gcc
QMAKE_LINK_C_SHLIB = host-gcc
QMAKE_LFLAGS +=
QMAKE_LFLAGS_RELEASE += -Wl,-O1
QMAKE_LFLAGS_DEBUG +=
QMAKE_LFLAGS_APP +=
QMAKE_LFLAGS_SHLIB += -shared
QMAKE_LFLAGS_PLUGIN += $$QMAKE_LFLAGS_SHLIB
QMAKE_LFLAGS_SONAME += -Wl,-soname,
QMAKE_LFLAGS_THREAD +=
QMAKE_LFLAGS_NOUNDEF += -Wl,--no-undefined
QMAKE_RPATH = -Wl,-rpath,

QMAKE_PCH_OUTPUT_EXT = .gch

# -Bsymbolic-functions (ld) support
QMAKE_LFLAGS_BSYMBOLIC_FUNC = -Wl,-Bsymbolic-functions
QMAKE_LFLAGS_DYNAMIC_LIST = -Wl,--dynamic-list,

include(../common/linux.conf)

=============================================

Scratchbox

I personally use Scratchbox 2 rather than Scratchbox 1 since I use a 64 bit
distro, and Scratchbox 2 exists in the Ubuntu repositories.

http://packages.ubuntu.com/search?keywords=scratchbox&searchon=names&suite=karmic&section=all

You might get equally good mileage with Scratchbox 1, or entirely without
Scratchbox. I personally opt for the path of least resistance, and this blog is a cart
following that path.

Arguments in favour of Scratchbox

1) pkg-config in Ubuntu 9.10 does not support prefixes (sysroot) correctly,
so you have to directly modify the .pc files or build pkg-config yourself
2) The Nvidia targetfs/usr/lib entries often have fully qualified symlinks which link
into your host machine’s libs without a chroot safety net
3) There were some toolchain/targetfs anomalies which simply vanished when adopting this build approach

Known issues

1) The aforementioned qmake bootstrapping issue preventing QMAKE_CXX being overridden in linux-host-g++
2) 16 bit X is the only environment tested and known to work. Qt GLES2 applications currently segfault
on launch under a 24 bit X session. I am busy investigating this issue. Please ensure that your xorg.conf file is using
16 bit as the default/selected color depth if you intend to run Qt apps.
3) libEGL.so prints a spurious error message:

“Couldn’t load implementation for OpenGL_ES”

due to the absence of a libGLES(1).so. It appears this message can be safely ignored.

Posted by gunnar in Painting, Graphics Dojo, OpenGL on Monday, January 18, 2010 @ 10:00

In my previous post, The Cost of Convenience, we saw quite clearly that text drawing was a major bottleneck. Text drawing is quite common in GUI applications though, so we need a solution for that. If we break down what happens behind QPainter::drawText(), it is split into two distinct parts. The first part is converting the string into a set of positioned glyphs, often referred to as “text layout” because it positions the glyphs and handles text wrapping and adjustments for alignment. The second part is passing the glyphs to the paint engine to be rendered. When the text is the same all the time, the first part can be done once and the glyphs/positions simply reused.
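
The "lay out once, draw every frame" idea can be sketched without Qt. The following is a toy model in standard C++, not how Qt implements it: Glyph, CachedText and the fixed 8 pixel advance are all made up for illustration. It only shows that the expensive step runs once, however many frames are drawn.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical positioned glyph; a real engine stores glyph indices and font data.
struct Glyph { char ch; float x; float y; };

// Hypothetical cache that runs the layout step only on first use.
class CachedText {
public:
    explicit CachedText(const std::string &text) : m_text(text), m_layoutRuns(0) {}

    // Called once per frame; the layout is computed lazily and then reused.
    const std::vector<Glyph> &glyphs() {
        if (m_glyphs.empty() && !m_text.empty())
            layout();
        return m_glyphs;
    }
    int layoutRuns() const { return m_layoutRuns; }

private:
    void layout() {
        ++m_layoutRuns;
        for (std::size_t i = 0; i < m_text.size(); ++i) {
            // Fixed 8 pixel advance per glyph, purely an assumption for this sketch.
            m_glyphs.push_back(Glyph{m_text[i], static_cast<float>(i) * 8.0f, 0.0f});
        }
    }
    std::string m_text;
    std::vector<Glyph> m_glyphs;
    int m_layoutRuns;
};

// Simulates drawing the same text for a number of frames and returns
// how many times the layout step actually ran.
inline int layoutRunsAfterFrames(const std::string &text, int frames) {
    CachedText cached(text);
    for (int f = 0; f < frames; ++f)
        cached.glyphs(); // a real paint call would hand these to the paint engine
    return cached.layoutRuns();
}
```

With this model, layoutRunsAfterFrames("QWERTY", 60) reports a single layout pass for sixty frames, which is the saving the rest of this post is after.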

We have a class in Qt which allows you to cache the first part and only do the drawing for each frame. The class is QTextLayout. This is a low-level class, throwing asserts at you for the most trivial of mistakes. It also comes with a really inconvenient API, but it does reduce the most costly step of text drawing, which is the layout part. It is also only fair to mention that QTextLayout uses a lot more memory than just the glyph-array and positions array, as one could expect, so in a memory constrained setup, it should be used with caution. In 4.7, we plan to introduce an API for static text, which takes care of all the layout and stores only the required parts, reducing the overall memory footprint, but for now, QTextLayout is how you do it.

Going back to my virtual keyboard (updated source code), I’ve changed the “-buttonview” example to make use of QTextLayout. In the constructor, I build the layout:

    ButtonView() {
        // One character per button, each followed by a line separator
        QString content;
        for (int i = '0'; i <= 'Z'; ++i) {
            content += QLatin1Char(i);
            content += QChar(QChar::LineSeparator);
        }
        m_layout = new QTextLayout(content, font());
        QFontMetricsF fm(font());
        m_layout->beginLayout();
        // content holds (character, separator) pairs, hence size() / 2 lines
        for (int i = 0; i < content.size() / 2; ++i) {
            QTextLine line = m_layout->createLine();
            line.setNumColumns(1);
            // Grid position: 10 buttons per row, 32 pixels per cell
            int x = i % 10;
            int y = i / 10;
            QSizeF s = fm.boundingRect(content.at(i*2)).size();
            line.setPosition(QPointF(x * 32, y * 32) + QPointF(16 + s.width() / 2, 16 + s.height() / 2));
        }
        m_layout->endLayout();
        m_layout->setCacheEnabled(true);
    }

If you look at the source code, there is more stuff going on in the constructor than I show above. This is because I extracted the text layout relevant parts only. So what we do is to build a string of the characters. Between each character I insert a LineSeparator. Without this, I wouldn’t be able to split the text into multiple QTextLine objects. From the content string, I construct the layout. For each character, I find its position in the grid, construct a QTextLine and move the line to its position. Each line is one column/character wide. Finally, I enable caching on the layout. This is the step where we start caching the laid out text.

When it comes to the paint method, the code is rather straightforward. All the text is contained inside a single layout object so I can just call its draw function.

    void paint(QPainter *p, const QStyleOptionGraphicsItem *, QWidget *) {

        // Draw background pixmaps...

        m_layout->draw(p, QPointF(0, 0));
    }

Now, let’s have a look at what this gains us:

[Figure: Text Layouts]

The graph shows the number of milliseconds per frame including the blit. Measured on an N900 with composition disabled. Smaller is better!

If we compare “-no-indexing -optimize-flags” to “-no-indexing -optimize-flags -text-layout”, we see that there is a significant reduction per frame. Using a text layout brings raster from 9.3 ms per frame down to 5.5 ms, and OpenGL drops from 16 ms per frame to 9.1 ms. A drop of about 4 ms is also visible in the X11 paint engine.

Needless to say, using the QTextLayout class introduces a huge benefit, but it requires a bit more setup to get there. In this implementation I merged all the text into a single object which also makes it impossible for me to move one item relative to the others, such as adding an offset when a button is pressed. I could have one QTextLayout for each item, which would have been roughly the same performance, but at a higher memory cost.

Until next time, take care!

PS: A small comment on the item cache / X11 numbers. The connection is asynchronous and Qt completes its job at about 2.7 ms per frame. Running with “-sync” on the command line, which makes all X calls synchronous, raises the time to about 10 ms per frame. If I had put a QApplication::syncX() call into each frame, synchronizing once per frame, which is essentially what GL and VG are doing, I would probably get a number somewhere between these two. What this means is that the numbers for X11 in this test are actually quite a bit worse than the graphs show.

Posted by gunnar in Graphics View, Painting, Graphics Dojo, OpenGL on Monday, January 11, 2010 @ 09:25

So, it’s time for my next post. Today’s topic is how convenience relates to performance, specifically in the context of QGraphicsView. My goal is to illustrate that the way to achieve fast graphics is to pack your QPainter draw calls as tightly together as possible. The more stuff that happens in the middle, the slower it gets.

To illustrate this, I’ve implemented a virtual keyboard. Granted, it’s not a very common layout nor is it usable, but the rendering is the point here, not the functionality. The full source code is available, and the keyboard looks like this:

[Figure: Virtual Keyboard Image]

I’ve implemented the keyboard using three different approaches: one using proxy widgets, one using graphics items and one where the entire view is one graphics item. In addition to that, I added a number of options to tweak various properties, such as whether or not the text is drawn. I measured this on an N900 rather than a desktop because the difference becomes more pronounced on a small device. On the desktop it is easy to be fooled because most things complete in a matter of microseconds anyway. It is only when the entire application comes together that one notices things are not as smooth as in the prototype, but by then so much work has been invested in the current design that one loses out on the super-slick feeling application.

QGraphicsProxyWidget

Since we’re implementing a series of clickable buttons, a natural and convenient starting point is to use an existing button class, such as the QPushButton. It already implements the logic for mouse/keyboard interaction and has signals for clicking and all sorts of other useful functionality. To get widgets into QGraphicsView, we use a QGraphicsProxyWidget. To make the test “fair”, I actually use a plain QWidget which just paints a pixmap and draws some text. Had I gone through the styling API, these numbers would have been even worse.

[Figure: ProxyWidget Results]
Milliseconds spent per frame including blit to screen when using QGraphicsProxyWidgets. Lower is better!

If we look at the plain “-proxywidgets” run, the fastest engine was the raster engine, running at 26ms per frame. If I wanted to slide this keyboard onto screen, I have 16ms available if I want it running at 60 FPS and 33ms available if I want to do it at 30 FPS. When each frame takes 26ms, I can barely do 30, but with only a little bit of slack, so if another process is soaking up CPU time, that number is also a bit difficult to reach. So, not very good. (BTW, the exact numbers in the graphs are listed as a comment in the top of the .cpp file I linked above).
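
The frame budget arithmetic used throughout this post is simply 1000 ms divided by the target frame rate. Here is a small sketch in plain C++; the function names are my own, and the post rounds 16.7 and 33.3 down to 16 and 33.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
inline double frameBudgetMs(double fps) { return 1000.0 / fps; }

// Highest whole frame rate a given per-frame cost could sustain in theory,
// ignoring display sync and anything else competing for the CPU.
inline int maxSustainedFps(double frameCostMs) {
    return static_cast<int>(1000.0 / frameCostMs);
}
```

By this measure, a 26ms frame tops out at 38 frames per second on paper, which is why 30 FPS is reachable only with a little slack and 60 FPS is out of the question.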

The first thing I noticed with this approach was that each button now had a gray background. This is of course the widget background. A QWidget embedded in QGraphicsView will be treated as a top-level and will therefore draw its background. I added an option “-no-widget-background” which sets the Qt::WA_NoBackground attribute on the widget. This brings the rendering time with raster down to 22ms. 4ms saved per frame, just by setting a flag, is not too bad, but still pretty far from being awesome.

I’ve mentioned before that text drawing is not as fast as we would like it, so just to compare how it looks without text, I added a “-no-text” option to the test. This brings the raster results down to 13ms. That is pretty nice and below the 16ms threshold required to achieve 60 FPS, but only with a small margin. And I’m not drawing any text! Before I give up on this approach, I’ll enable item caching. By setting ItemCoordinateCache on each button, I cache both the background pixmap and the text in one single pixmap. This brings the raster results down to 8.5ms, and it’s starting to look acceptable. But at a very high memory cost… In my original usecase I had one shared pixmap for all the button backgrounds, but now I have one per button.

You may notice that there was a vast difference between item caching and the proxy widget drawing the pixmap. One thing that adds to the proxy widget cost is that the QPainter is recreated and initialized for each button in the button’s paint event. Also, as I mentioned in my previous post, An Overview, each widget has a system clip and there is an overhead involved with calling the paintEvent. For items in QGraphicsView, there is already a painter, and I don’t need a clip, nor do I need any of the other stuff that goes on behind the scenes there. When we enable item coordinate caching, we don’t leave the graphics view world and we don’t enter the widget world. This crossing is expensive, so by not going into the widget world, we save a lot.

So, if there is a lesson to be learned it is that QGraphicsProxyWidget should be used with extreme caution. If you really need it, use very few of them.

QGraphicsWidget

If proxy widgets are too slow to be usable in this scenario, then the next best thing is to use a QGraphicsWidget. This is a subclass of both QObject and QGraphicsItem, which gives me signals, slots and properties, but it’s not a QWidget and therefore still fairly lightweight. The numbers are as follows:

[Figure: GraphicsWidgets Results]
Milliseconds spent per frame including blit to screen when using QGraphicsWidgets. Lower is better!

Compared to the proxy widgets approach we’re starting out quite a bit better, with raster at 13 ms per frame, OpenGL at 20ms and X11 at 22ms. Below this line is a new line: “-no-indexing -optimize-flags”. QGraphicsView will by default put all the items in a view into a BSP tree for fast lookup. This is beneficial when the scene contains many items and you often need to find items that intersect with a small portion of the scene. In the testcase we’re always doing a full update, so there is no benefit from the index, and it can be disabled by calling scene->setItemIndexMethod(QGraphicsScene::NoIndex). Having a BSP is the default behaviour because graphics view was initially intended to be a static scene for many items. The most common usecase today is a few (a few hundred at most) items which tend to move a lot. For this reason, it is always a good idea to try disabling the BSP and see if it makes a difference in performance. If it helps, then leave it off.

I also know that the items play nice, meaning that they don’t change the clip, translate the painter, change the composition mode or modify any other state that would propagate to other items. This means I can safely set the DontSavePainterState optimization flag. Actually, based on an old habit, I set all possible optimization flags. I only consider unsetting them if my drawing code starts to look weird, at which point I would rather fix the drawing code and keep the flags set. By disabling indexing and enabling the optimization flags, I shave off 2ms per frame for all rendering backends, so that is definitely worth it.

If I don’t do text, the performance is about twice as fast. Again we see that text drawing is a huge cost. We’re working on an API to fix this and we’ll have more information for you when we do. You may notice that enabling item caching drops the performance a bit compared to the “-no-text” case. There isn’t much overhead inside QGraphicsView for this path. A likely reason for the decrease is that reading from multiple memory sources (multiple pixmaps) results in a lot of cache misses, compared to the straight approach which draws the same pixmap over and over.

ButtonView Item

In my previous post I briefly mentioned that there is a slight overhead involved with the use of a QGraphicsItem too. Prior to calling the paint function, the painter is transformed to the coordinate system of the item and the painter state is saved. If the item draws a big polygon, this setup cost can be ignored, but when drawing just a pixmap and a few pixels of text, then it may be worth considering. In the spirit of “The more direct the painting code is, the faster it gets”, I implemented the keyboard as a single item. The numbers are as follows:

[Figure: ButtonView Results]
Milliseconds per frame including blit to screen when using a single item. Lower is better!

Raster is now down to 10ms, which is 1ms better than the QGraphicsWidget approach when all optimizations were enabled, so even though graphics items are cheaper than widgets, they still cost a bit. The keyboard is now rendered in a tight loop, and the major difference in performance here is caused by the fact that items in the scene have a transform associated with them. Prior to calling paint(), a transform is set to match the painter to the item’s local coordinate system. This causes a state change in the paint engine. For each button we’re drawing a 32×32 pixmap, which means alpha blending 1024 pixels, followed by doing text layout and drawing a single character. Even then, we save about 10% of the time by not having a QPainter::translate() in the midst, so bear that in mind. By enabling the optimization flags and disabling the index, raster drops a bit more, so using those is still a good idea.

You may have noticed that there is one dataset that is named “cheat” for OpenGL. I was reluctant to include this, because it’s using a private API that is not, and I really mean NOT, subject to binary compatibility rules. You cannot call this from your application. We’re going to add a public API for this in the future, hopefully in 4.7, so until it’s there, wait. In the interest of showing what we are thinking internally, I thought I would show it.

OpenGL is really great for accelerating graphics, but its way of working does not map optimally to how Qt works. GL is really good at taking a few large datasets of triangles and rendering them, but it’s not so good at drawing loads of small things. Small things like button backgrounds, icons, single text items, etc. However, all the button backgrounds are the same pixmap, so what if I could tell QPainter to draw the same pixmap in multiple places at once? In GL this would correspond to setting up a texture and one vertex and texture coordinate array and drawing some 40 pixmaps in one go. This fits much better with how GL is made to work. The result is that drawing the buttons drops from 5.2ms to 3.9ms, so another piece of juice squeezed out. Naturally, the more times the pixmap is drawn and the smaller the pixmap gets, the more benefit you get from batching commands like this.
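
To make "one vertex and texture coordinate array" concrete, here is a sketch of the data that such a batched draw would need, written in plain C++ with no GL calls. It is not the private Qt API mentioned above; Rect, QuadBatch, buildBatch and buttonGrid are names invented for this illustration, and the 10 column grid of 32×32 cells is borrowed from the keyboard layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { float x, y, w, h; };

// Vertex and texture coordinate arrays for one batched draw.
struct QuadBatch {
    std::vector<float> positions; // x, y per vertex
    std::vector<float> texCoords; // s, t per vertex
    std::size_t vertexCount() const { return positions.size() / 2; }
};

// Appends two triangles (six vertices) per target rectangle.
// Every rectangle maps the full texture, since all buttons share one pixmap.
inline QuadBatch buildBatch(const std::vector<Rect> &targets) {
    static const float corner[6][2] = { {0, 0}, {1, 0}, {0, 1}, {1, 0}, {1, 1}, {0, 1} };
    QuadBatch batch;
    batch.positions.reserve(targets.size() * 12);
    batch.texCoords.reserve(targets.size() * 12);
    for (const Rect &r : targets) {
        for (const auto &c : corner) {
            batch.positions.push_back(r.x + c[0] * r.w);
            batch.positions.push_back(r.y + c[1] * r.h);
            batch.texCoords.push_back(c[0]);
            batch.texCoords.push_back(c[1]);
        }
    }
    return batch;
}

// Builds a 10 column grid of 32x32 button rectangles.
inline std::vector<Rect> buttonGrid(int count) {
    std::vector<Rect> rects;
    for (int i = 0; i < count; ++i)
        rects.push_back(Rect{ (i % 10) * 32.0f, (i / 10) * 32.0f, 32.0f, 32.0f });
    return rects;
}
```

For 40 buttons this yields 240 vertices. With the texture bound once, a single glDrawArrays(GL_TRIANGLES, 0, 240) call could then replace 40 separate pixmap draws.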

There is a second option to OpenGL for the button view case, which is “-ordered”. This was done after Tom brought to my attention that the testcase would do a shader program update for each painter call. In the default buttonview implementation we do:

                    for (int i=0; i < m_rects.size(); ++i) {
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
                    }

Because pixmaps use one shader pipeline and text drawing uses another, the pipeline needs to be switched and reset all the time, which renders at 16ms per frame. To see if it makes a difference, I added a second alternative rendering, “-ordered”, where I do all the pixmaps first, then all the text:

                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));

This prevents the shader pipeline updates and brings the rendering time per frame down to 13ms, so definitely worth it.
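
The effect of the reordering can be modelled by counting how often the active pipeline changes across a sequence of draw calls. This is only a counting model in plain C++ with invented names; what each switch actually costs depends on the driver and the paint engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Pipeline { PixmapPipeline, TextPipeline };

// Counts how often the active pipeline changes across a sequence of draw calls.
// The first call counts as one switch (assumption: nothing is bound initially).
inline int countPipelineSwitches(const std::vector<Pipeline> &calls) {
    int switches = 0;
    for (std::size_t i = 0; i < calls.size(); ++i) {
        if (i == 0 || calls[i] != calls[i - 1])
            ++switches;
    }
    return switches;
}

// Default order: pixmap, text, pixmap, text, ...
inline std::vector<Pipeline> interleavedCalls(int buttons) {
    std::vector<Pipeline> calls;
    for (int i = 0; i < buttons; ++i) {
        calls.push_back(PixmapPipeline);
        calls.push_back(TextPipeline);
    }
    return calls;
}

// "-ordered": all pixmaps first, then all text.
inline std::vector<Pipeline> orderedCalls(int buttons) {
    std::vector<Pipeline> calls(buttons, PixmapPipeline);
    calls.insert(calls.end(), buttons, TextPipeline);
    return calls;
}
```

For 40 buttons the interleaved order gives 80 switches in this model, while the ordered variant gives 2.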

Summing Up

[Figure: Virtual Keyboard Combined Results]
Milliseconds per frame including blit to screen for proxy widgets, graphics widgets and a single item. Lower is better!

OpenGL comes out rather badly in this testcase, which I was a bit disappointed to see, but it did send Tom into an optimization frenzy, so we’re hoping to remove some of the constant overhead. It should also be said that when using the OpenGL graphics system, we enable multisampling by default, which increases rendering time on the N900 by around 30%. A plain QGLWidget would thus perform slightly better. Another aspect of OpenGL is that it uses a dedicated low-power chip, so even though it runs at half the speed for this particular usecase, it also uses a lot less battery, so it may still be the right choice. OpenGL will also scale significantly better than raster and X11 as the pixmaps get bigger or if the content of the button is slightly more advanced, say like a horizontal gradient.

The best numbers are definitely in the button view case, where all the content is rendered as one item, which is what I wanted to highlight with this blog. The button view item also opens up for other optimizations such as batching. We don’t have that many batching functions in QPainter today, only drawRects(), drawLines() and drawPoints(), but we’re considering adding more. We are just not sure how the APIs would look yet.

The bottom line is still that how Qt is used defines how well it performs. On one hand there may be an easy and convenient way to get the job done which performs quite sub-optimally. On the other hand there may be a more involved implementation which performs very well. I’m not trying to suggest that you do one or the other; there are a lot of good reasons for picking either one. But I hope that I’ve illustrated that some features come at a cost and that this is kept in mind, along with what the target is, when designs are evaluated and chosen.

I’ll round off with a question. If you were to implement a particle effect when you press a button, which approach would you choose, having seen the numbers above?

Posted by TomCooksey in Painting, Graphics Dojo, OpenGL, Performance on Wednesday, January 06, 2010 @ 12:01

Introduction

Here’s the next instalment of the graphics performance blog series. We’ll begin by looking at some background about how OpenGL and QPainter work. We’ll then dive into how the two are married together in the OpenGL 2 paint engine and finish off with some advice about how to get the best out of the engine. Enjoy!

Why OpenGL?

Before I dive into the OpenGL paint engine, I want to make sure we all understand the motivation for the OpenGL 2.0 paint engine. I’ve talked about this before in my article about hardware acceleration, but we still frequently get questions like “Why not implement a Direct2D paint engine?”.

Everyone knows OpenGL means fast graphics, right? Well, this is actually a bit of a misconception. What makes graphics fast is a bit of hardware dedicated to computer graphics called a GPU (Graphics Processing Unit). OpenGL 2.x is a software library which often (but not always) uses a particular class of GPU to help satisfy drawing operations (Note: OpenGL 1.x used a different class of GPU). A modern programmable GPU (e.g. nVidia GTX 295) can usually be programmed via any of OpenGL, Direct3D and OpenCL. The only difference then is that Direct3D is only available on the Windows platform and OpenCL is not universally supported.

So the reason we are investing our time and effort into OpenGL, rather than Direct3D or OpenCL, is that OpenGL 2.0 is sufficient to give us access to all the GPU features we currently want to use. It is also available on more platforms, especially if you limit yourself to the ES sub-set. We are also looking into restricting ourselves further to only use APIs in OpenGL 3.2 Core Profile.

This might change in the future if we see a new class of GPU, like ones designed for 2D vector graphics which can’t be abstracted by OpenGL 2.0 very well (enter OpenVG), or, if we want to start using GPU features which OpenGL (ES) 2.0 doesn’t give us access to. Having said that, OpenGL is very good at exposing new GPU features through extensions.

History

Qt has had an OpenGL paint engine since early Qt 4.0 days. This engine was designed for the fixed-function hardware available at the time. As time went on and manufacturers added newer bits of hardware to their GPUs, the OpenGL paint engine was adapted to use those features through OpenGL extensions. Over the last 4 years, lots of people have hacked on the engine and added support for things like ARB fragment programs and even adapted the engine to work on OpenGL ES 1.1. The engine is pretty stable and has lots of fall-backs (or original code-paths, depending on how you look at them) for old hardware missing GL extensions the engine can utilise. But, fundamentally, it is an OpenGL 1.x engine.

In early 2008, around the time of the Falcon project (the Falcon Project was an internal project started for Qt 4.5 which focused on painting performance and architecture), it became increasingly clear that Qt needed to support hardware acceleration using the OpenGL ES 2.0 API which was starting to appear on embedded System-On-Chips like the OMAP3. There were two options available: Extend the existing OpenGL paint engine further still, or develop a new paint engine from scratch. When looking at the existing engine, there was a major problem – although it supported fragment programs, it was heavily reliant on fixed-function vertex processing. A further consideration was that the Falcon project had just kicked off and the future of the QPaintEngine API was uncertain. Both of these factors resulted in a new paint engine being written from scratch for OpenGL ES 2.0. This new engine had a distinct advantage over the existing engine: everything I wanted to use from OpenGL was in the core OpenGL ES 2.0 API. This meant I didn’t need to add fallbacks in case of missing functionality, leading to much cleaner and leaner code.

Another point about OpenGL ES 2.0 is that it doesn’t have much in the way of fixed function features – forcing you to write shader programs. While annoying at the time, this is apparently the best way to do things even on desktop GPUs. This point is important because it quickly became apparent that although the engine was designed for GLES2, not only would it also work on desktop OpenGL 2.0, but it would use that API in a way better suited for modern programmable GPUs. So, in Qt 4.6, the new engine is used by default on both GLES2 and on desktop systems which support OpenGL 2.0.

What does OpenGL (ES) 2 provide?

As I’ve already mentioned, OpenGL ES 2.0 is a pretty lean and mean API which models programmable GPUs. The “programmable” bit is fundamental to the API. It means that you write small programs known as shaders, ask OpenGL to compile and then run them on the GPU to process the data you give it. There are two types of shaders: one type processes positions (vertices) and another type processes pixels (fragments), called the vertex shader and fragment shader, respectively. The idea is that you tell OpenGL you want to draw some triangles and the vertex shader is run to determine the position of each of those triangles. Then, the GPU turns each triangle into a bunch of pixels and the fragment shader is run to determine the colour of each of those pixels. The API provides various ways of passing data from the CPU to the GPU (from textures and lists of triangle positions to individual floats) and ways of passing data from the vertex shader to the fragment shader. That’s basically it. All the complexity lives in the shaders you give to the GPU to run.

What does QPainter require?

The rest of this post assumes you are familiar with the QPainter API (if not, go check the QPainter docs). It might also be a good idea to read through Gunnar’s post about how the Raster engine works.

So, the QPainter API provides more than just triangles. It is therefore the GL paint engine’s job to turn the whole of the QPainter API into “just a bunch of triangles”. To understand its task a little better, you have to split QPainter up into chunks which map better to OpenGL. A great example of this is drawRect(). In QPainter terms, this is a single primitive, but in GL engine terms, it is actually two: A rectangle (the fill) and a (possibly quite complex) line round the outside (the stroke). The OpenGL paint engine tries to keep a fairly clean separation between the shape of something which is drawn and its fill. So, here’s the list of primitives (shapes) QPainter requires the engine to draw:

  • Simple primitives (Rectangles, convex polygons, ellipses, etc.)
  • Text
  • Pixmaps
  • Strokes
  • Complex vector paths (QPainterPath)

In addition to this, we have various fills which we can use on our primitives provided by QBrush:

  • Solid colour
  • Linear gradients
  • Radial gradients
  • Conical gradients
  • Bitmap patterns
  • Textures

Not only do we have different types of fill, but we also support a full 3×3 transformation matrix on the brushes. This allows you to draw a rectangle but use it as a kind-of stencil over (for example) a perspective transformed texture.
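
To make the brush transform concrete, here is a minimal sketch of mapping a position through a 3×3 matrix, including the divide that gives the perspective effect. The Matrix3x3 and Point types and the column-vector convention are assumptions made for illustration; this is not the engine's own code.

```cpp
#include <cassert>
#include <cmath>

// Row-major 3x3 matrix; the last row carries the perspective terms.
struct Matrix3x3 { double m[3][3]; };

struct Point { double x; double y; };

// Maps a point using homogeneous coordinates: (x, y, 1) -> (x', y', w),
// then divides by w to get back to 2D.
Point mapPoint(const Matrix3x3 &t, Point p)
{
    const double x = t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2];
    const double y = t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2];
    const double w = t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][2];
    assert(std::fabs(w) > 1e-12); // points with w == 0 cannot be mapped
    return { x / w, y / w };
}
```

With an affine matrix the last row is (0, 0, 1), so w is always 1 and the divide is a no-op. Only a perspective transform makes w vary across the primitive.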

Finally, QPainter also requires the engine to implement clipping, different composition modes and support for its state stack (QPainter::save() & QPainter::restore()).

Engine Operation

Primitive Rendering

  • Simple Primitives: To render convex primitives such as rounded rectangles, we just generate a GL triangle fan and render it using glDrawArrays
  • Text: For large text, we convert it to a complex path and render it as such. However, for smaller font sizes, we rasterize the individual font glyphs and upload them as a texture (8-bit texture for bitmap & anti-aliased glyphs and 24-bit RGB for sub-pixel anti-aliased glyphs). This glyph texture is used as a mask in the engine’s pixel pipeline (see below). So, in terms of primitives, text is actually rendered as a set of rectangles - one rectangle for each glyph. When rendering with sub-pixel anti-aliased glyphs, it is possible that the engine will need to do two passes (if the brush is not a solid colour). This is because the engine uses a clever trick and sets the brush’s colour as the glBlendColor and outputs the RGB mask in the fragment shader. It is then able to set a glBlendFunc which combines the two and gives per-sub-pixel blending. If you set a more complex brush, the engine has to do two passes - first apply the mask to the destination, then a second pass to apply the brush, with glBlendFunc set to give the correct result.
  • Pixmaps: A pixmap is actually just a rectangle.
  • Strokes: Strokes can be very complex - just take a look at the pathstroke demo! However, even the most complex dashed pattern with rounded joins and end caps can be turned into a GL triangle strip relatively easily. This is done by the QTriangulatingStroker.
  • Complex vector paths: This is where things get tricky. QPainterPaths can have lots of things which break the “turn lineTo, moveTo and curveTo into vertices and render as triangle fan” algorithm…
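
As a rough illustration of the simple-primitive case, the sketch below builds triangle fan vertices for an ellipse. The Vertex type, the function name and the fixed segment count are assumptions; the engine's own tessellation is not shown in this post.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vertex { float x; float y; };

// Builds fan vertices for an ellipse: the centre first, then points around
// the rim, repeating the first rim point at the end to close the shape.
std::vector<Vertex> ellipseFan(float cx, float cy, float rx, float ry, int segments)
{
    assert(segments >= 3);
    const double twoPi = 6.283185307179586;
    std::vector<Vertex> fan;
    fan.reserve(segments + 2);
    fan.push_back({cx, cy});
    for (int i = 0; i <= segments; ++i) {
        const double angle = twoPi * i / segments;
        fan.push_back({cx + rx * static_cast<float>(std::cos(angle)),
                       cy + ry * static_cast<float>(std::sin(angle))});
    }
    return fan;
}
```

An array like this is the kind of data that would be handed to glDrawArrays with GL_TRIANGLE_FAN. Because the shape is convex, every triangle of the fan stays inside the outline and a single pass is enough.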

Rendering Using Stencil Technique

Take the following path as an example:

Convex Path (1)

Here we have a seemingly trivial path with only 4 points. To draw this with GL, you could just convert the path’s points to vertices and draw it as a triangle fan, which results in two triangles: Triangle 1: ABC and Triangle 2: ACD. The problem is that this just looks like a solid triangle, not the path we wanted:

Convex Path (2)

So, to overcome this difficulty, we drop to a 2-pass rendering method which uses the stencil buffer as a temporary scratchpad. So first off, we clear the stencil buffer to all zeros (represented as white):

Stencil Buffer (Clear)

Next, we set the stencil operation to invert, which means instead of setting the stencil value to ‘1′ when a triangle touches a pixel, invert the existing value instead. So 0->1 & 1->0. First we render the first triangle (ABC). As all the pixels are currently 0, every pixel touched by the triangle turns to 1 (represented as black):

Stencil Buffer (Triangle 1)

Next, we draw the second triangle (ACD). Note: We are inverting the stencil’s value, so black pixels touched by the second triangle turn to white and white pixels turn to black:

Stencil Buffer (Triangle 2)

So now the stencil buffer contains the silhouette of our path. All we do now is draw a rectangle into the destination window, but with the stencil test enabled.
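
The invert trick is easy to try out on the CPU. The sketch below emulates a one-bit stencil buffer on an 8×8 grid and flips every pixel covered by each fan triangle. The grid size, the point-in-triangle test and the coordinates used for A, B, C and D are assumptions chosen to resemble the example (the figures' real coordinates are not given).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Point { double x; double y; };

constexpr int kSize = 8;
using StencilBuffer = std::array<std::array<std::uint8_t, kSize>, kSize>;

// Signed area test: the sign tells on which side of the edge a->b p lies.
static double edge(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

static bool insideTriangle(Point a, Point b, Point c, Point p)
{
    const double e0 = edge(a, b, p), e1 = edge(b, c, p), e2 = edge(c, a, p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}

// Emulates a GL_INVERT stencil operation on one bit: every pixel whose
// centre lies inside the triangle has its stencil value flipped.
void invertTriangle(StencilBuffer &stencil, Point a, Point b, Point c)
{
    for (int y = 0; y < kSize; ++y)
        for (int x = 0; x < kSize; ++x)
            if (insideTriangle(a, b, c, {x + 0.5, y + 0.5}))
                stencil[y][x] ^= 1;
}

// Draws the path as a triangle fan around its first point into a cleared
// stencil buffer, leaving the path's silhouette behind.
StencilBuffer pathSilhouette(const Point *points, int count)
{
    StencilBuffer stencil{};   // cleared to all zeros
    for (int i = 1; i + 1 < count; ++i)
        invertTriangle(stencil, points[0], points[i], points[i + 1]);
    return stencil;
}
```

With A=(0,8), B=(4,0), C=(8,8) and D=(4,5), triangle ABC covers the whole outer triangle and ACD covers the notch, so the notch pixels are flipped twice and end up back at 0. Pixels covered an odd number of times stay set, which is why this technique naturally produces the odd-even fill rule.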

In addition to the stencil technique, we are also adding experimental support for triangulating QPainterPaths and caching the triangulation. While this is slower for paths which change often or are zoomed in & out, paths which are relatively static can be triangulated once and rendered multiple times without having to re-triangulate.

Filling Primitives

Now we know how all the different QPainter operations get turned into GL primitives, but we’re still missing how they get filled. As already mentioned, the colour of a pixel is determined by the fragment shader. We therefore have lots of different fragment shaders for different types of fill. However, we also need to support text rendering with arbitrary fills (QPainter lets you fill text with a perspective transformed radial gradient). In the future, we also want to support composition modes which OpenGL doesn’t provide. We’ve also found there are ways we can simplify the shaders for certain situations (and thus improve performance). The result is that Qt needs lots of different shaders. At last count, we’d need over 1000 different shaders to cover all situations. That’s a lot of GLSL to maintain and test, far more than the resources we have available. So instead we split the shaders into different interchangeable “stages”. This is achieved by having each stage in its own GLSL function. As an example, let’s take regular, non sub-pixel anti-aliased text rendering with a transformed radial gradient. Note, this is just an example to demonstrate how the engine operates and you probably shouldn’t do it in performance critical situations.

We render gradients by pre-calculating a 1px high texture (like a 1D texture) on the CPU which we sample from in the fragment shader. However, we calculate the texture coordinates in the vertex shader and pass them to the fragment shader as a varying. This is because it’s a good idea to do as much work as possible in the vertex shader rather than the fragment shader, as the vertex shader is called so much less frequently (once per vertex instead of once per pixel).
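
A minimal CPU-side sketch of the look-up-table idea follows. It assumes a two-stop gradient, a float colour type and nearest-texel sampling, none of which are taken from the engine; the point is only that the expensive interpolation happens once per table entry rather than once per pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Color { float r, g, b, a; };

// Linearly interpolates between two colours.
Color mix(Color from, Color to, float t)
{
    return { from.r + (to.r - from.r) * t, from.g + (to.g - from.g) * t,
             from.b + (to.b - from.b) * t, from.a + (to.a - from.a) * t };
}

// Pre-computes the gradient as a one-pixel-high row of texels.
std::vector<Color> buildGradientRow(Color start, Color end, int width)
{
    assert(width >= 2);
    std::vector<Color> row(width);
    for (int i = 0; i < width; ++i)
        row[i] = mix(start, end, static_cast<float>(i) / (width - 1));
    return row;
}

// Nearest-texel lookup, standing in for the texture fetch that the
// fragment shader would do with the interpolated coordinate.
Color sampleRow(const std::vector<Color> &row, float t)
{
    t = std::min(1.0f, std::max(0.0f, t));   // clamp out-of-range coordinates
    const int index = static_cast<int>(t * (row.size() - 1) + 0.5f);
    return row[index];
}
```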

As already mentioned, we render (non sub-pixel) anti-aliased text by using an 8-bit mask texture. We then multiply the fragment colour by a sample taken from this mask. So, if we’re on the edge of a glyph where the alpha value is <1, we adjust the alpha of the srcPixel by that amount (actually, we also adjust the RGB values too as we use pre-multiplied alpha pixel format internally).
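
In arithmetic terms, the mask step is a single multiply per channel. The sketch below assumes a float pixel type with pre-multiplied alpha; the type and function names are made up for illustration.

```cpp
#include <cassert>

// A pixel in pre-multiplied alpha format: r, g and b are already scaled by a.
struct PremultipliedPixel { float r, g, b, a; };

// Scales all four channels by the glyph coverage read from the mask.
// Because the format is pre-multiplied, RGB must shrink together with alpha.
PremultipliedPixel applyMask(PremultipliedPixel src, float mask)
{
    assert(mask >= 0.0f && mask <= 1.0f);
    return { src.r * mask, src.g * mask, src.b * mask, src.a * mask };
}
```

For example, opaque red (1, 0, 0, 1) on a glyph edge with 25% coverage becomes (0.25, 0, 0, 0.25).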

If there was a non-standard composition mode, we’d then pass the masked pixel to another stage which would blend it with the background (although this isn’t implemented yet!).

So the fragment shader has three different stages. The first stage (srcPixel) determines the brush colour of the fragment. The next stage (applyMask) modulates the pixel by a mask to achieve anti-aliased text rendering. The final stage (compose) then blends the pixel with the background. We also have a similar staging technique for the vertex shader. All this complexity is nicely abstracted by the QGLEngineShaderManager. The paint engine tells the shader manager what it wants to draw and the shader manager selects an appropriate set of shaders. One final note on this: While desktop OpenGL 2 supports linking multiple fragment shaders in a single program, OpenGL ES 2.0 does not. This means that we actually use the different stages by appending them to a single string of GLSL we pass to GL. This also gives the GL implementation the best chance to inline the different stages (without which, performance would suck).
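
The string-appending approach can be sketched as follows. The GLSL snippets, uniform names and the buildFragmentShader() helper are hypothetical and are not the sources that QGLEngineShaderManager actually uses; only the stage names srcPixel and applyMask come from the text above, and the compose stage is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stage sources; each stage lives in its own GLSL function.
const char *const kSolidSrcPixel =
    "uniform lowp vec4 fragmentColor;\n"
    "lowp vec4 srcPixel()\n"
    "{\n"
    "    return fragmentColor;\n"
    "}\n";

const char *const kGradientSrcPixel =
    "uniform sampler2D brushTexture;\n"
    "varying mediump vec2 brushCoords;\n"
    "lowp vec4 srcPixel()\n"
    "{\n"
    "    return texture2D(brushTexture, brushCoords);\n"
    "}\n";

const char *const kGlyphMask =
    "uniform sampler2D maskTexture;\n"
    "varying mediump vec2 maskCoords;\n"
    "lowp vec4 applyMask(lowp vec4 src)\n"
    "{\n"
    "    return src * texture2D(maskTexture, maskCoords).a;\n"
    "}\n";

// The main stage only chains the other stages together.
const char *const kMain =
    "void main()\n"
    "{\n"
    "    gl_FragColor = applyMask(srcPixel());\n"
    "}\n";

// Appends the selected stages into one source string. The stage functions
// come first so that main() can call them.
std::string buildFragmentShader(const std::string &srcPixelStage,
                                const std::string &maskStage)
{
    return srcPixelStage + maskStage + kMain;
}
```

Swapping kGradientSrcPixel for kSolidSrcPixel changes the brush without touching the mask stage, which is how a small set of stage snippets can cover a large number of shader combinations.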

Texture Management

The OpenGL paint engine makes heavy use of textures. For example, even though it’s perfectly possible to calculate colours for gradients in the fragment shader, we still use a texture as a look-up-table as it is so much faster. Repeatedly uploading textures every time we need them would ruin performance. So instead, we keep a per-context cache of which QPixmaps/QImages are already present in texture memory. If two contexts are sharing then we also detect this and don’t duplicate the textures. This functionality is available publicly in QGLContext::bindTexture() too.
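
The caching idea boils down to a map from an image key to a texture id. The sketch below is a simplified model: the TextureCache class is hypothetical, the upload is faked with a counter instead of real GL calls, and context sharing is not modelled. In Qt, a key such as the one returned by QImage::cacheKey() could play the role of imageKey, but whether the engine does exactly that is an assumption here.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using TextureId = unsigned int;

class TextureCache
{
public:
    // Returns the texture for the given image, uploading only on first use.
    TextureId bind(std::uint64_t imageKey)
    {
        const auto it = m_textures.find(imageKey);
        if (it != m_textures.end())
            return it->second;                 // already in texture memory
        const TextureId id = upload();
        m_textures.emplace(imageKey, id);
        return id;
    }

    int uploadCount() const { return m_uploads; }

private:
    // Stand-in for creating a GL texture and uploading the pixel data.
    TextureId upload()
    {
        ++m_uploads;
        return m_nextId++;
    }

    std::unordered_map<std::uint64_t, TextureId> m_textures;
    TextureId m_nextId = 1;
    int m_uploads = 0;
};
```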

On Linux/X11 platforms which support it, Qt will use the GLX/EGL texture-from-pixmap extension. This means that if your QPixmap has a real X11 pixmap backend, we simply bind that X11 pixmap as a texture and avoid copying it. You will be using the X11 pixmap backend if the pixmap was created with QPixmap::fromX11Pixmap() or you’re using the “native” graphics system. Not only does this avoid overhead but it also allows you to write a composition manager or even a widget which shows previews of all your windows.

Antialiasing

The OpenGL paint engine uses OpenGL multisampling to provide anti-aliasing. Typically, this will be 4x/8x FSAA, meaning 4/8 levels of coverage, which is worse quality than the raster engine, which always uses 256 levels of coverage. However, as the DPI of modern displays increases, you can get away with lower-quality anti-aliasing.

Using multisampling also doesn’t affect text rendering as text is anti-aliased using masks rather than multisampling (for smaller font sizes). So text rendered with the OpenGL engine should look almost as good as text rendered with the raster engine (which also does gamma-correction). The only drawback of using multisampling is that some OpenGL implementations don’t support switching multisampling off. Indeed, the OpenGL ES 2.0 specification doesn’t even provide the API to switch it off. The consequence is that non-anti-aliased (a.k.a. aliased) rendering can be broken (Everything gets anti-aliased even when the QPainter::Antialiasing hint isn’t set). There’s little we can do about this. :-(

Clipping

QPainter supports setting an arbitrary clip, including complex QPainterPaths. Qt uses the GL stencil buffer (or more specifically the lower 7 bits of the stencil buffer) to store the clip. The clip is written to in the same way as we render any other primitive, even using the stencil technique for complex paths. However, instead of filling pixel colours into a colour buffer, we fill stencil values into the stencil buffer. The actual value we use depends on the current QPainter stack depth (how many times save() was called minus the number of times restore() was called). This means that if you restrict yourself to intersect clips (Qt::ClipOperation == Qt::IntersectClip), the engine only needs to write to the part of the stencil buffer which is being clipped to. What’s more, the engine doesn’t need to write to the stencil buffer at all when you call restore() - it just changes the value at which the stencil test passes.

In addition to using the stencil buffer for clipping, the OpenGL paint engine can also just use glScissor. This only allows a single, untransformed rectangle to be used as the clip, which can be quite restrictive. However, it is by far the fastest way to do clipping. So if performance is more important to you than utility, only ever use untransformed rectangular clips.

Recommendations

Interleaved Rendering

Unlike OpenGL, QPainter allows an arbitrary number of rendering contexts (QPainters) to be active in the same thread at the same time. For example, in your widget’s paint event, you can begin a painter on your widget and begin another painter on a QPixmap and interleave rendering to them:

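void Widget::paintEvent(QPaintEvent *)
{
    // First painter: renders to the widget itself
    QPainter widgetPainter(this);
    widgetPainter.fillRect(rect(), Qt::blue);

    // Second painter: renders to an off-screen pixmap while the first is still active
    QPixmap pixmap(256, 256);
    QPainter pixmapPainter(&pixmap);
    pixmapPainter.drawPath(myPath);

    // Back to the first painter: this forces a render target switch
    widgetPainter.drawPixmap(0, 0, pixmap);
}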

While this works ok with the OpenGL graphics system, having to switch from doing something with one painter to doing something with a different painter can be very costly and should be avoided whenever possible.

Mixing QPainter and Native OpenGL

As shown in several examples, it is possible to mix your own OpenGL rendering code with QPainter rendering code. However, as OpenGL is a giant state machine, it is very easy for you to accidentally clobber Qt’s GL state and vice-versa. To overcome this, we’ve added some new API to QPainter in Qt 4.6 - QPainter::beginNativePainting() and QPainter::endNativePainting(). To prevent artifacts, you must enclose your custom painting in beginNativePainting() and endNativePainting(). This is very important - even if you’re not seeing any problems now, you might find your code starts failing in a future Qt release in which the GL paint engine works slightly differently. Also, as beginNativePainting() and endNativePainting() set lots of OpenGL state, they can be quite expensive and thus you should try to use them sparingly. Try to batch up all your custom OpenGL code in a single block.

QGLWidget vs OpenGL Graphics-System

Unlike the raster & OpenVG paint engines, you don’t have to use a specific graphics system to render widgets using the OpenGL paint engine. The QtOpenGL module provides several classes, including QGLWidget, which all use the OpenGL paint engine regardless of what graphics system is being used. QGLWidget is basically a regular widget which always has a native window ID and is always rendered to using OpenGL. You are free to choose whichever method you want to get OpenGL rendering (graphics system or QGLWidget). However, using the opengl graphics system can often be slower than using a QGLWidget, as Qt needs the contents of the “back buffer” (or QWindowSurface) to be preserved when flushing the render to the window system. OpenGL does not guarantee this and it is often not the case, so Qt has to use either an FBO or a PBuffer as the back buffer. When the render needs to be flushed, the FBO or PBuffer is bound to a texture, rendered into the window and then the GL buffers are swapped. This extra overhead is avoided by using a QGLWidget. As a consequence, however, it is not possible to redraw a sub-region of a QGLWidget: whenever a QGLWidget is updated, the entire widget must be re-drawn.

It should also be noted that using the OpenGL paint engine isn’t a silver bullet which makes everything faster. For example, the GL engine really sucks at drawing lots of small geometry with state changes between each drawing operation. While we’re working on improving that use case at the moment, the raster paint engine will probably always be faster just because it has so much less overhead. So QGLWidget might be a great way to get the best of both worlds when combined with the raster graphics system - use QGLWidget for operations which GL excels at and the raster engine for everything else.

Tips for Performance (fps)

As a general rule of thumb, OpenGL state changes are expensive. So, use the knowledge you now have of what’s going on under QPainter and try to minimise the number of OpenGL state changes the paint engine needs to do. For example, if you implement a virtual keyboard, you now know that the engine uses a shader for text rendering and a different shader for pixmaps, so draw all the key pixmaps first, then draw all the text on top. That way, the engine only needs to change shaders twice per frame.
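
To put numbers on the virtual keyboard example, the sketch below models a frame as a list of draw commands and counts how often the active shader changes. The DrawCommand type and both helper functions are made up for illustration, and the model only tracks shader changes, not the other GL state the engine has to manage.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Shader { Pixmap, Text };

struct DrawCommand { Shader shader; int key; };

// Counts how often the active shader changes while replaying the commands,
// including the initial bind.
int countShaderSwitches(const std::vector<DrawCommand> &commands)
{
    int switches = 0;
    for (std::size_t i = 0; i < commands.size(); ++i)
        if (i == 0 || commands[i].shader != commands[i - 1].shader)
            ++switches;
    return switches;
}

// Naive order: for each key, draw its pixmap and then its label.
std::vector<DrawCommand> interleavedKeyboard(int keys)
{
    std::vector<DrawCommand> commands;
    for (int key = 0; key < keys; ++key) {
        commands.push_back({Shader::Pixmap, key});
        commands.push_back({Shader::Text, key});
    }
    return commands;
}

// Batched order: all pixmaps first, then all text. This is only valid
// because a key's label never needs to sit underneath another key's pixmap.
std::vector<DrawCommand> batched(std::vector<DrawCommand> commands)
{
    std::stable_partition(commands.begin(), commands.end(),
                          [](const DrawCommand &c) { return c.shader == Shader::Pixmap; });
    return commands;
}
```

For a 30-key keyboard the naive order changes shader 60 times per frame, while the batched order changes it twice.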

  • Never, ever use anything other than intersecting clips
  • Don’t switch render target in the middle of a render
  • Try to use untransformed rectangular clips whenever possible
  • Minimise changing the brush wherever possible
  • Render batches of primitives of the same types together.
  • Avoid drawing translucent pixels & blending (particularly important on mobile GPUs)
  • Try to cache QPainterPaths and re-use them rather than creating & discarding them in your paintEvent
  • Avoid QPainterPaths when there’s a QPainter convenience function for the shape, e.g. rounded rects and ellipses.
  • If you’re drawing lots of small pixmaps, try bunching them up into a single, larger pixmap
  • Prefer to use power-of-two (2^n) widths & heights for QImages and QPixmaps (128×256, 256×256, 512×512, etc)
  • If using QGLWidget and don’t need anti-aliasing, don’t enable sample buffers in the QGLFormat
  • If rendering complex QPainterPaths, try to only use the odd-even fill rule
Posted by gunnar
 in Painting, Graphics Dojo, OpenGL, Performance
 on Wednesday, December 16, 2009 @ 06:54

For this blog series that I’m doing, I figure it’s nice to start with an overview of the whole painter, pixmaps, widgets, graphicsview, backingstore idea.

At the centre of all Qt graphics is the QPainter class. It can render to surfaces, through the QPaintDevice class. Examples of paint devices are QImages, QPixmaps and QWidgets. The way it works is that for a given QPaintDevice implementation we return a custom paint engine which supports rendering to that surface. This is all part of our documentation so perhaps not too interesting. Let’s look at this in more detail.

QWidgets and QWindowSurface

Even though QWidget is a QPaintDevice subclass, one will never render directly into a QWidget’s surface. Instead, during the paintEvent, the painting is redirected to an offscreen surface which is represented by the internal class QWindowSurface. This was traditionally implemented using QPainter::setRedirected(), but has since been replaced by an internal mechanism between QPainter and QWidget which is slightly more optimal.

Sometimes we refer to this surface as “the backingstore”, but it really is just a 2D surface. If you ever looked through the Qt source code and found a class QWidgetBackingStore, this class is responsible for figuring out which parts of the window surface need to be updated prior to showing it on screen, so it’s really a repaint manager. When the concept of backingstore was introduced in Qt 4.1, the two classes were the same, but the introduction of more varying ways to get content to screen made us split it in two.

In the old days widgets were rendered “on screen”. Though the option to paint on screen is still available, using it is not recommended. I believe the only system that remotely supports it is X11, but it is more or less untested and thus often causes artifacts in the more complex styles. Setting the flag Qt::WA_PaintOnScreen means that the repaint manager inside Qt ignores that widget when repainting the window surface and instead sends a special paintEvent to that widget only. Prior to Qt 4.5 there was a significant speed gain to be had when 10-100 widgets updated at max fps, but in Qt 4.5 the repaint manager was optimized to handle this better, so on-screen painting is usually worse than buffered.

Back to the window surface. All widgets are composited into the window surface top to bottom and the top-level widget will fill the surface with its background or with transparency if the Qt::WA_TranslucentBackground attribute is set. All other widgets are considered transparent. A label only draws a bit of text, but doesn’t touch anything else. What that means for the repaint manager is that every widget that overlaps with the label, but stacks behind it, needs to be drawn before it. If the application knows that a certain widget is opaque and will draw every single pixel for every paint event, then one should set the Qt::WA_OpaquePaintEvent attribute, which causes the repaint manager to exclude the widget’s region when painting the widgets behind it.

Since all widgets are repainted into the same surface, we need to make sure that widgets don’t accidentally paint outside their own boundaries and into other widgets. There is no guarantee that widgets will paint inside their bounds, which could lead to painting artifacts, so we set up a clip behind QPainter’s back called the “system clip”. For most widgets the system clip is a rectangle and, looking at the performance section of the QPainter docs, we see that this is not so bad. Rectangular clips, when pixel aligned, are fast. A masked widget, on the other hand, is a performance disaster. It is slower to set up and slower to render. The system clip is the same clip that is passed to the paint event, except that the clip in the paint event has been translated to be relative to the top-left of the widget, rather than to the top-left of the surface. Do NOT set the paint event’s region as a clip on the painter. It is already set up, and we don’t detect that it is the exact same region and just process it fully again. The purpose of the region/rect in the paint event is so that widgets can decide to not draw certain parts. This is primarily useful when you have big scenes in the widgets, such as a map application, graphics view or similar.
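
As a sketch of how a widget with a big scene might use the exposed rectangle, the function below picks only the map tiles that intersect it. The Rect type and tilesToDraw() are hypothetical; in a real widget the exposed rectangle would come from the paint event (for instance QPaintEvent::rect()), and the selected tiles would then be drawn without setting any extra clip.

```cpp
#include <cassert>
#include <vector>

struct Rect { int x, y, w, h; };

// True when the two rectangles share at least one pixel.
bool intersects(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w
        && a.y < b.y + b.h && b.y < a.y + a.h;
}

// Returns the indices of the tiles that overlap the exposed rectangle;
// every other tile can be skipped for this paint event.
std::vector<int> tilesToDraw(const std::vector<Rect> &tiles, Rect exposed)
{
    std::vector<int> visible;
    for (int i = 0; i < static_cast<int>(tiles.size()); ++i)
        if (intersects(tiles[i], exposed))
            visible.push_back(i);
    return visible;
}
```

For a 4×4 grid of 100-pixel tiles and an exposed rectangle of 100×100 pixels at (150, 150), only four of the sixteen tiles need to be drawn.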

In addition to the system clip which is set up prior to calling paintEvent, the painter also needs to be in a clean state, which means setting up brushes, pens, fonts and others. It’s not a huge amount, but if you have many widgets it adds up. So, though widgets are no longer native window handles (aka Alien), there is still a price tag involved in repainting them. Be aware of that when you design your application. For instance, implementing a photo gallery using QLabels with pixmaps in a QScrollArea doesn’t scale. You would have to set up clipping and all the other states per label, even though the label only draws a pixmap. A single “view” widget would scale much better, because the widget can then implement a tight loop that draws pixmaps in the right places.

This whole backingstore and window surface logic only holds on Mac OS X when the raster or opengl graphics systems are used. Personally I would strongly recommend using raster: it implements the full feature set, it is often faster, has the same performance profile as Qt on Windows and painting bugs are prioritized higher for raster than for the CoreGraphics backend. In qt/main we plan to switch the default for Mac OS X to raster; we just have to iron out some window system integration issues.

Graphics systems

The concept of a graphics system was introduced in Qt 4.5. The idea is to be able to select at startup time, on an application level, what kind of graphics stack you should be using. The graphics system is responsible for creating the pixmap backends and the window surface. We currently have graphics systems for raster, OpenGL 1.x, OpenGL/ES 2.0, OpenVG and X11. You can select a graphics system by starting the application with the command line option -graphicssystem raster|opengl|opengl1|x11|native, where “native” means to use the system default. Another option is to pass the exact same option to configure, which will set that option for all applications using Qt. Finally there is the function QApplication::setGraphicsSystem which hardcodes the graphics system for a given application.

In later blogs, we plan to go into each of the paint engines in more detail, but for now, let’s just look at the highlights.

Raster

The raster graphics system is the reference implementation of QPainter. It implements all the features we specify and does it all in software. When a new port is started, such as with S60, we usually start with getting raster running. It is currently the default on Windows, Embedded, S60 and will also be on Mac OS X.

Just a thought. What do you think of raster on X11? If you ignore for a second that you currently get a process-local font cache, it performs quite nicely on X11 and I’ve seen many people switch to it at runtime. If we consider remote displays, this seems daunting, but it still may not be too bad. The way it works in the X11 paint engine today is that any gradient and pixmap transform is done in software anyway and uploaded as an image on a per painter-command level. Why not just do it all client side and upload only the parts that need updating? We can watch HD videos (for some definition of HD, anyway) on youtube, so certainly we can afford to upload a few pixels. This is bound to generate comments on XRender and server-side gradients and transforms, but these have been tried numerous times and the performance is simply not good enough.

The window system integration is handcoded for each platform to make the most out of it. On Windows the window surface is a QImage which shares bits with a DIBSECTION, which results in pretty good blitting speed. On X11 we use MIT Shared Memory Images. We used to use Shared Memory Pixmaps, but this was removed from Xorg; we then got this awesome patch from the community, so we’re back up and running. On Mac OS X, we’re experimenting with using GL texture streaming for getting the backbuffer to screen and we’re seeing some promising numbers with that, so I hope that will make it into Qt 4.7 too.

Because it is just an array of bytes, most native APIs have the ability to render into the same buffer we do. This makes integration with native theming quite straightforward, which is one of the reasons why this is attractive as a default desktop graphics system, despite not being hardware accelerated.

OpenGL

We have two OpenGL based graphics systems in Qt. One is for OpenGL 1.x and is primarily implemented using the fixed functionality pipeline in combination with a few ARB fragment programs. It was written for desktops back in the Qt 4.0 days (2004-2005) and has grown quite a bit since. You can enable it by writing -graphicssystem opengl1 on the command line. It is currently in life-support mode, which means that we will fix critical things like crashes, but otherwise leave it be. It is not a focus for performance from our side, though it does perform quite nicely for many scenarios.

Our primary focus is the OpenGL/ES 2.0 graphics system, which is written to run on modern graphics hardware. It does not use a fixed functionality pipeline, only vertex shaders and fragment shaders. Since Qt 4.6, this is the default paint engine used for QGLWidget. Only when the required feature set is not available will we fall back to using the 1.x engine instead. When we refer to our OpenGL paint engine, it’s the 2.0 engine we’re talking about.

We’ve wanted to have GL as a default graphics system on all our desktop systems for a while, but there are two major problems with it. First, aliased drawing is a pain: it is close to impossible to guarantee that a line goes where you want it for certain drivers. Second, integration with native theming is a pain: it is rarely possible to pass a GL context to a theming function and tell it to draw itself, hence we need to use temporary pixmaps for style elements. On Mac OS X, there is a function to get a CGContext from a GL context, but we’ve so far not managed to get any sensible results out of it. On the other hand, much of the UI content doesn’t depend on these features, which makes GL optimal for typical scene rendering, such as the viewport of a QGraphicsView or a photo gallery view. So as far as how the default setup in Qt will look in the future, we’re considering that the best default setup for desktop may be a combination of raster for the natively themed widgets and GL for one or two high-performance widgets. Nothing is decided on this topic though, we’re just looking at alternatives.

Another problem with using GL by default is font sharing. With raster we could theoretically share pre-rendered glyphs between processes in a cross platform manner using shared memory, with GL this becomes a bit more difficult. On X11, there is an extension to bind textures as XPixmaps which can be shared across processes, but this will usually force the textures into a less optimal format which makes them somewhat slower to draw, so it is still not optimal. On Windows, Mac OS X, S60 or QWS, we would need driver-level support for sharing texture ids, which we currently don’t have.

OpenVG

I’m actually quite blank in this area. I’ve been involved neither in writing it nor in getting it up and running. It sits on top of EGL which makes it quite similar to the OpenGL graphics systems. We expect that OpenVG will be used in a number of mid-range embedded devices.

The cool thing about OpenVG is that it matches the QPainter API quite nicely. It supports paths, pens, brushes, gradients and composition modes, so in theory, the vectorial APIs should run optimally.

Rhys, who wrote the OpenVG paint engine, plans to do a full post on the OpenVG paint engine’s internals in the near future.

Images and Pixmaps

The difference between these two is mostly covered in the documentation, but I would like to highlight a few things nonetheless.

Our documentation says: “QImage is designed and optimized for I/O, and for direct pixel access and manipulation, while QPixmap is designed and optimized for showing images on screen.”

Raster

When using the raster graphics system, pixmaps are implemented as a QImage, with a potentially significant difference. When converting a QImage to a QPixmap, we do a few things.

The image is converted to a pixel format that is fast to render to the backbuffer, meaning ARGB32_Premultiplied, RGB32, ARGB8565_Premultiplied or RGB16. When images are loaded from disk using the PNG plugin or when they are generated in software by the application, the format is often ARGB32 (non-premultiplied) as this is an easy format to work on, pixel-wise. I’ve measured drawing ARGB32_Premultiplied onto RGB32 to be about 2-4x faster than drawing ARGB32 non-premultiplied, depending on the use case.
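
One plausible reason for the difference is that a pre-multiplied source saves a multiplication per colour channel on every blended pixel. The sketch below compares the two source-over blends on 8-bit channels. The Rgba8 type, the rounding in mul255() and the function names are assumptions; this is not the raster engine's actual inner loop.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba8 { std::uint8_t r, g, b, a; };

// Rounded (a * b) / 255 for two 8-bit values.
static std::uint8_t mul255(unsigned a, unsigned b)
{
    return static_cast<std::uint8_t>((a * b + 127u) / 255u);
}

// One-time conversion from straight alpha to pre-multiplied alpha.
Rgba8 premultiply(Rgba8 p)
{
    return { mul255(p.r, p.a), mul255(p.g, p.a), mul255(p.b, p.a), p.a };
}

// Source-over onto an opaque destination with a pre-multiplied source:
// one multiply per colour channel.
Rgba8 blendPremultiplied(Rgba8 src, Rgba8 dst)
{
    const unsigned inv = 255u - src.a;
    return { static_cast<std::uint8_t>(src.r + mul255(dst.r, inv)),
             static_cast<std::uint8_t>(src.g + mul255(dst.g, inv)),
             static_cast<std::uint8_t>(src.b + mul255(dst.b, inv)),
             255 };
}

// The same blend with a straight-alpha source: the source colour has to be
// scaled by alpha again for every pixel, every time it is drawn.
Rgba8 blendStraight(Rgba8 src, Rgba8 dst)
{
    const unsigned inv = 255u - src.a;
    return { static_cast<std::uint8_t>(mul255(src.r, src.a) + mul255(dst.r, inv)),
             static_cast<std::uint8_t>(mul255(src.g, src.a) + mul255(dst.g, inv)),
             static_cast<std::uint8_t>(mul255(src.b, src.a) + mul255(dst.b, inv)),
             255 };
}
```

Both functions produce the same pixel, but the pre-multiplied version pays for the alpha scaling once at conversion time instead of on every draw.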

Secondly, we check the pixel data for transparent pixels and convert it to an opaque format if none are found. This means that if a “.png” file is loaded as ARGB32 from disk, but only contains opaque pixels, it will be rendered as an RGB32, which is also about 2-4x faster.
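
The opacity check itself is a simple scan over the alpha bytes. The sketch below assumes pixels packed as 0xAARRGGBB in a std::vector; the enum and function names are illustrative and not Qt API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class PixelFormat { RGB32, ARGB32_Premultiplied };

// Pixels are assumed to be packed as 0xAARRGGBB.
bool hasTransparentPixels(const std::vector<std::uint32_t> &argb)
{
    for (const std::uint32_t pixel : argb)
        if ((pixel >> 24) != 0xFFu)
            return true;    // found a pixel that is not fully opaque
    return false;
}

// Picks the cheaper opaque format when no pixel needs blending.
PixelFormat chooseFormat(const std::vector<std::uint32_t> &argb)
{
    return hasTransparentPixels(argb) ? PixelFormat::ARGB32_Premultiplied
                                      : PixelFormat::RGB32;
}
```

The scan touches every pixel once at conversion time, which is cheap compared with blending the image on every subsequent draw.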

OpenGL

When using the OpenGL graphics system the actual implementation of the QPixmap varies a bit from setup to setup. The ideal option gets enabled when your GL implementation supports Frame Buffer Objects (FBOs) in combination with the GL_EXT_framebuffer_blit extension. In this case, the pixmap is represented as an OpenGL texture id, and whenever a QPainter is opened on the pixmap we grab an FBO from an internal pool and use the FBO to render into the texture.

Without these extensions available, which is typically the case for OpenGL/ES 2.0 devices, the implementation is a QImage (in optimal layout, same as raster) which is backed by a texture id. When you open a QPainter on the pixmap, you render into the QImage and when the pixmap is drawn to the screen, the texture id is used. Internally there is a syncing process between the two representations, so there will be a one-time hit of re-uploading the texture after drawing into it.
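
The syncing between the two representations can be modelled with a dirty flag, as in the sketch below. The SyncedPixmap class is hypothetical and the upload is replaced by a counter; it only illustrates why the re-upload cost is paid once after painting rather than on every draw.

```cpp
#include <cassert>

// Simplified model of a pixmap that keeps a system-memory image and a
// texture copy of it.
class SyncedPixmap
{
public:
    // Called when a painter has written into the system-memory image.
    void paintInto() { m_textureDirty = true; }

    // Called whenever the pixmap is drawn to the screen.
    void drawToScreen()
    {
        if (m_textureDirty) {
            ++m_uploads;              // stand-in for re-uploading the texture
            m_textureDirty = false;
        }
        // ... the texture would be drawn here ...
    }

    int uploadCount() const { return m_uploads; }

private:
    bool m_textureDirty = true;       // nothing has been uploaded yet
    int m_uploads = 0;
};
```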

In general

If you intend to draw the same QImage twice, always convert it to a QPixmap.

There are some use cases where QPixmap is potentially worse though. We have the functions QPixmap::scaled(), QPixmap::transformed() and friends, which historically are there because we wanted QImage and QPixmap to have similar APIs. We have support for reimplementing this functionality on a per pixmap-backend basis, but currently no engine does this, so for the GL case, or X11 for that matter, calling QPixmap::transformed() implies a conversion from QPixmap into QImage, a software transformation, and then a conversion back to the original format.

By default a QPixmap is treated as opaque. When doing QPixmap::fill(Qt::transparent), it will be made into a pixmap with an alpha channel, which is slower to draw. If the pixmap is going to end up as opaque, initialize it with QPixmap::fill(Qt::white). You can even skip the initialization step altogether if you know that all pixels will be written as opaque when the pixmap is painted into.

Before moving on to something else, I’ll just give a small warning about the functions setAlphaChannel() and setMask() and the innocent-looking alphaChannel() and mask(). These functions are part of the Qt 3 legacy that we didn’t quite manage to clean up when moving to Qt 4. In the past the alpha channel of a pixmap, or its mask, was stored separately from the pixmap data. Depending on which platform you were on, the actual implementation was a bit different. For instance on X11, you had one 1-bit pixmap mask + an 8-bit alpha channel + a 24-bit color buffer. On Windows you had a 1-bit mask + a packed 32-bit ARGB pixel buffer. In Qt 4 we merged all this into one API, so that QPixmap is to be considered a packed data structure of ARGB pixels. What we did not do, however, was remove the functions implementing the old API. In fact, we even added the alpha channel accessors, so we made it worse. The API was to some extent convenient, but all four of those functions imply touching all the data and either merging the source with the pixmap or extracting a new pixmap from the current pixmap content. Bottom line: just don’t call them. With composition modes, you can manipulate the alpha channel of a pixmap using QPainter. This also has the benefit that it will potentially be SSE optimized for raster or done in hardware on OpenGL, so it has the potential to be quite a bit faster. There is also QGraphicsOpacityEffect, which allows you to set a mask on widgets and graphics items, but as of today, it is not as fast as we would like it to be.
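
As an illustration of what a composition mode does to the alpha channel, here is a plain C++ sketch of the Porter-Duff "destination in" operation, which is what QPainter::CompositionMode_DestinationIn is based on. The helper name and the rounding are my own assumptions; pixels are assumed to be premultiplied 0xAARRGGBB values.

```cpp
#include <cstdint>

// Hypothetical helper: Porter-Duff "destination in" for premultiplied
// 0xAARRGGBB pixels. Every channel of dst, alpha included, is scaled by
// the alpha of src; the color channels of src are ignored.
inline std::uint32_t destinationIn(std::uint32_t dst, std::uint32_t src)
{
    const std::uint32_t sa = src >> 24;
    std::uint32_t out = 0;
    for (int shift = 0; shift <= 24; shift += 8) {
        const std::uint32_t channel = (dst >> shift) & 0xff;
        out |= ((channel * sa + 127) / 255) << shift;
    }
    return out;
}
```

Painting a mask image onto a pixmap with this mode keeps the pixmap where the mask is opaque and clears it where the mask is transparent, which covers most of what setMask() and setAlphaChannel() were used for.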

QGraphicsView

I’ll do at least one separate post on graphicsview alone, so I’ll just comment quickly on the difference between using QGraphicsView with items vs QWidget’s. QGraphicsView with its scene populated with items is in many ways very similar to the widgets and their repaint handling. With the addition of layouts and QGraphicsWidgets the line is even more blurry. So which solution should you pick? More and more often, we’re seeing that people choose to create their UI’s in graphics view rather than creating them using traditional widgets.

Compared to widgets, items in a graphics view are very cheap. If we consider the photo gallery again, then using a separate item for each of the items in the view may (I say may) be reasonable. A widget is repainted through its paintEvent. A QGraphicsItem is repainted through its paint function. The good thing about the item’s function is that there is no QPainter::begin, as the painter is already properly set up for rendering. Another good thing is that the painter has less guaranteed state than in the widget case. There may be a transformation and some clip, but no guarantees about fonts, pens or brushes. This makes the setup a bit cheaper.

Another huge improvement over widgets is that items are not clipped by default. They have a bounding rectangle and there is a contract between the subclass implementer and the scene that the item does not paint outside. If we compare this to the system clip we need to set for widgets, then again there is less work to be done for the items. If the item violates this there will be rendering artifacts, but for graphicsview this has proven an acceptable compromise.

Most UI elements are rather simple. A button, for instance, can be composed of a background image and a short text. In QPainter terms that is one call to drawPixmap and one call to drawText. The less time spent between painter calls, the better the performance. The fewer state changes between painter calls, the better the performance. Looking back at how much happens between these calls for a button, you quickly realize that the traditional widgets are quite heavy. If widgets are going to survive the test of time, then they need to behave more like QGraphicsItems.

Some final words

I’ve been rambling on for a while, but hopefully there was some useful information in here. You may have noticed that I do not mention printing, PDF or SVG generation, nor do I focus on X11 or CoreGraphics paint engines in great detail. This is because, as outlined in the painter performance docs, we focus our performance efforts on only a few backends which we consider critical for Qt.

Posted by Donald Carr
 in Qt, Graphics View, Painting, OpenGL, Performance, Embedded, Build system
 on Friday, November 20, 2009 @ 00:53

Introduction

Texas Instruments has a wiki which documents what is required to bring Qt
up on the Beagle board with full OpenGL ES (1/2) support:

http://www.tiexpressdsp.com/index.php/Building_Qt

and I would like to thank one of their engineers, Varun, for his quick turn
around times in addressing any questions I raised.

This blog entry is intended to serve a similar purpose, but is more verbose regarding
Qt considerations and the initial beagle board bring up. It attempts to serve
as a comprehensive independent source of information on getting Qt built
for the Beagle board with full OpenGL ES 2 support.

These instructions are intended for use with Qt 4.6 (and beyond), so grab
the release candidate or check Qt 4.6 out from the public git repository prior
to proceeding.

You can choose to use either Qt/Embedded or Qt/X11; both can
be successfully integrated with the Beagle board’s SGX GPU, and the only
points of divergence in these instructions will be at (Qt) configure time
and in the client side system (run time) configuration. Both implementations
offer window management, via QWS and X11 respectively, and operate at
around 27fps and 22fps respectively when running our hellogl_es2 example
(16bit color depth at 1280×720).

I personally deploy Ångström on my Beagle board; it handles a large amount
of the logistics surrounding cross compilation and is generally very
agreeable, so these instructions are going to be bolted to
Ångström for completeness. If you prefer another distribution, feel free to
establish an environment capable of showing the OpenGL ES examples TI
provide, then follow the Qt level considerations (Configuring Qt) accordingly.

For those holding a dormant Beagle board who are open to the author’s
distribution preferences:

Building the Ångström rootfs

Open Embedded is manifested in a git repository: in this posting we are
working within origin/stable/2009. Please follow the instructions given
here; they are comprehensive and got me completely off the ground.

http://www.angstrom-distribution.org/building-angstrom

These instructions end in you running:

bitbake base-image ; bitbake console-image ; bitbake x11-image

which actually builds an X11 angstrom image for your Beagle board. Please
note, you will need to build the X11-image if you want to build and deploy
the SGX packages (we will do this in the next section) via Ångström as opkg considers
X11 to be a required dependency of libgles-omap3_3.00.00.09. This is due
to one of the encapsulated windowing system libraries being X11 centric:

libpvrPVR2D_X11WSEGL.so

Regardless of the indicated X11 dependency, this package will bestow the required
kernel module on you for general OpenGL ES usage (console or X11). We will be
building our own QWS centric (libpvrPVR2D_X11WSEGL.so equivalent) library
behind the scenes for QWS in the Qt/Embedded instructions given later.

Ångström SGX integration

You now need to integrate the SGX drivers on your Ångström system.

You need to get your paws on:

OMAP35x_Graphics_SDK_setuplinux_3_00_00_09.bin

with the following MD5 checksum:

e15147ad76ddbe7c5aec682f5455b774

Getting this involves following the above link and going through the required registration/request process.
Once you have this file, you drop it in:

$OETREE/openembedded/recipes/powervr-drivers/libgles-omap3

and then run:

bitbake libgles-omap3-3.00.00.09

which generates the following packages:

libgles-omap3_3.00.00.09-r1.1_armv7a.ipk
libgles-omap3-dbg_3.00.00.09-r1.1_armv7a.ipk
libgles-omap3-demos_3.00.00.09-r1.1_armv7a.ipk
libgles-omap3-dev_3.00.00.09-r1.1_armv7a.ipk
libgles-omap3-tests_3.00.00.09-r1.1_armv7a.ipk

Deploy the x11-image to an sd-card, and copy these packages to the sd-card
for deployment on the target. If your beagle board does not have internet
access you will probably also require:

*  devmem2
*  libx11-6 (Only if you insisted on using a console build!)

as opkg will not be able to automatically install the required dependencies
from its repositories and you would hit the following error at deployment:

----------------------------------------------
root@beagleboard:/opt/deploy# opkg install ./libgles-omap3_3.00.00.09-r1.1_armv7a.ipk
Installing libgles-omap3 (3.00.00.09-r1.1) to root…
libgles-omap3: unsatisfied recommendation for libgles-omap3-tests
Collected errors:
* ERROR: Cannot satisfy the following dependencies for libgles-omap3:
*  devmem2 *  libx11-6 (>= 1.1.5) *
----------------------------------------------

Once you have installed all the above packages, please reboot the board.

Your bootargs in U-Boot should look something like:

console=ttyS0,115200n8=noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-16@60

assuming you want to output via DVI and are running a similar kernel
version (2.6.29-omap1 on my beagle) which accepts the same kernel
arguments indicated in the bootargs variable above.

Please note that we are specifying a 16 bit color depth, which is intentional
and discussed in the “Color depth considerations” section of the appendix.

Please run the powervr demos (under X11) to establish that your drivers are
successfully installed and usable.

Configuring Qt

In order to build Qt now, all that is required for each target is an
appropriate mkspec:

For Qt/X11

You would fork your mkspec off the linux-g++ mkspec, the resulting mkspec’s
qmake.conf would resemble:

==================================================================
………….
include(../common/linux.conf)

# modifications to g++.conf
# These release optimization flags are TI supplied
# and a little more aggressive than Qt standard (gentoo types rejoice!)
QMAKE_CFLAGS_RELEASE     = -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
QMAKE_CXXFLAGS_RELEASE     = $$QMAKE_CFLAGS_RELEASE

QMAKE_CC         = $FULLY_QUALIFIED_COMPILER_PREFIX-gcc
QMAKE_CXX         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
QMAKE_LINK         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
QMAKE_LINK_SHLIB     = $FULLY_QUALIFIED_COMPILER_PREFIX-g++

# modifications to linux.conf
QMAKE_LIBS_EGL         = -lEGL -lIMGegl -lsrv_um
QMAKE_LIBS_OPENGL_QT     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um
QMAKE_LIBS_OPENVG     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um -lOpenVG -lOpenVGU

QMAKE_INCDIR         = $TARGET_STAGING_PATH/usr/include
QMAKE_LIBDIR         = $TARGET_STAGING_PATH/usr/lib

QMAKE_AR         = $FULLY_QUALIFIED_COMPILER_PREFIX-ar cqs
QMAKE_OBJCOPY         = $FULLY_QUALIFIED_COMPILER_PREFIX-objcopy
QMAKE_STRIP         = $FULLY_QUALIFIED_COMPILER_PREFIX-strip

load(qt_config)
==================================================================

and you would configure Qt with:

configure -arch arm -xplatform linux-omap3-g++ -opengl es2 -openvg

all that remains is to adjust /etc/powervr.ini on the target to be:

[default]
WindowSystem=libpvrPVR2D_FLIPWSEGL.so

Now compile an example, eg:

./examples/opengl/hellogl_es2

deploy it and Qt to the target and enjoy.

For Qt/Embedded

Since we don’t have the X11 abstraction, we have to interface with the
underlying hardware/interfaces with Qt/Embedded’s gfx abstraction layer. We
are going to be making some heavy use of the powervr driver resident under:

$QTSRCTREE/src/plugins/gfxdrivers/powervr

There is a README file in the powervr directory that is definitely
recommended reading, and lends some serious insight into our powervr driver
and Qt/Embedded in general. The same driver is used for MBX/SGX targets and
hence sees a fair amount of usage on a variety of target devices.

You would fork your mkspec off the qws/linux-arm-g++ mkspec, the resulting mkspec’s
qmake.conf would resemble:

==================================================================
…………….
include(../../common/qws.conf)

# modifications to g++.conf
QMAKE_CFLAGS_RELEASE     = -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
QMAKE_CXXFLAGS_RELEASE     = $$QMAKE_CFLAGS_RELEASE

QMAKE_CC         = $FULLY_QUALIFIED_COMPILER_PREFIX-gcc
QMAKE_CXX         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
QMAKE_LINK         = $FULLY_QUALIFIED_COMPILER_PREFIX-g++
QMAKE_LINK_SHLIB     = $FULLY_QUALIFIED_COMPILER_PREFIX-g++

# modifications to linux.conf
QMAKE_INCDIR         = $TARGET_STAGING_PATH/usr/include
QMAKE_LIBDIR         = $TARGET_STAGING_PATH/usr/lib

QMAKE_LIBS_EGL         = -lEGL -lIMGegl -lsrv_um
QMAKE_LIBS_OPENGL_QT     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um
QMAKE_LIBS_OPENVG     = -lEGL -lGLESv2 -lGLES_CM -lIMGegl -lsrv_um -lOpenVG -lOpenVGU

QMAKE_AR         = $FULLY_QUALIFIED_COMPILER_PREFIX-ar cqs
QMAKE_OBJCOPY         = $FULLY_QUALIFIED_COMPILER_PREFIX-objcopy
QMAKE_STRIP         = $FULLY_QUALIFIED_COMPILER_PREFIX-strip

#These defines are documented in the powervr README, please read it
DEFINES += QT_QWS_CLIENTBLIT QT_NO_QWS_CURSOR

load(qt_config)
==================================================================

and you would configure Qt with:

/opt/dev/source/qt-beagle-4.6/configure -embedded arm -little-endian -xplatform qws/linux-omap3-g++ -opengl es2 -openvg -plugin-gfx-powervr

all that remains is to adjust /etc/powervr.ini on the target to be:

[default]
WindowSystem=libpvrQWSWSEGL.so

Now compile an example, eg:

./examples/opengl/hellogl_es2

deploy it and Qt to your board, and after shutting down X, run the example with
the following arguments:

./hellogl_es2 -qws -display powervr

-qws - starts the application as the QWS server with exclusive access to the
system hardware which manages all subsequent Qt “client” applications

-display powervr - indicates that Qt should use the powervr driver we
compiled earlier

Summary

I hope that this posting encourages people to go forward and experiment
with a fully accelerated Qt 4.6 on the beagle board. Offloading the
painting work onto the GPU drastically reduces the load on the CPU and
broadens the range of applications which can feasibly be run on this
broadly available (cheap!) embedded hardware. The Beagle board has
really nice hardware, and it would be infinitely useful for us to have external
people using our powervr driver and getting it as broadly used/refined as
possible.

Appendix

Additional Benefits to OpenGL ES acceleration

If you take any Qt Graphics View based example and set a QGLWidget as its
viewport, a large amount of work will be offloaded on the GPU leaving your
CPU free to frolic. To put this in perspective, a modified version of:

./examples/animation/animatedtiles

which transitions continually, runs smoothly at 720p on the Beagle board
when rendering in software, but consumes 100% CPU time according to top
(99.3% to be fair). It is therefore CPU bound and you are not going to be
doing anything else in the background.

When backed by a QGLWidget, the CPU usage drops to 20% on the exact same
example in the exact same conditions (720p, at 16bit color depth). The
frame rate suffers slightly, but at least this is mandated by the GPU.
Minor clipping issue evident in hellogl_es2

The bubbles are evidently clipped on the right hand side; I will hopefully
beat you to reporting this at: http://bugreports.qt.nokia.com/secure/Dashboard.jspa

I have not seen any other artifacts, please file any additional bugs you
may encounter at the above URL.

Are these instructions applicable to OMAP3 targets in general?

Yes. There is no theoretical reason these instructions would not suffice
for any OMAP3 based target, although I have not personally verified them
outside of Beagle board usage. Caveat emptor.

No Scratchbox2 usage when cross compiling

The more astute of you will recognize that I bypassed Scratchbox2 when
configuring Qt/X11 this time around. I paid dearly for it, and this X11
build has no fontconfig, dbus or glib support even though the Ångström
subsystem I am building against has support for all of them. If you want a
full fledged X11 build with decent font support and OpenGL ES support,
please either:

1) Invest your time in physically adjusting your MKSPEC (and/or wrestling
pkg-config) to get all desired dependencies detected and built against

-Or-

2) Take the easy road, refer to my previous blog posting “Cross compiling
Qt/X11” and merge the above mkspec changes into the:

./mkspecs/unsupported/linux-scratchbox2-g++

mkspec in your Qt 4.6 source tree.

The same goes for Qt/Embedded, which is more self sufficient but will still
be built without dbus, glib and other external dependency support unless
you modify the mkspec/environment further or use Scratchbox2 to abstract
this away.

Color depth considerations

1) The powervr implementation we are relying on does not support
PVRSRV_PIXEL_FORMAT_RGB888 (24bit color depths); it does, however, support
PVRSRV_PIXEL_FORMAT_RGB565 and PVRSRV_PIXEL_FORMAT_ARGB8888.

2) Ångström is busybox based, and the fbset command you will need to set 32
bit color depths on the console will not work with the default fbset
busybox symlink. You will therefore have to install and use fbset(.real)
in order to get 32bit color depths, which is a simple opkg install away for
the connected Beagle board and a bitbake away for the stranded.
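
To put numbers on the two supported depths, here is a small plain C++ sketch. The helper names are hypothetical, and I am assuming the conventional RGB565 layout with red in the top five bits; the arithmetic itself only depends on the resolution and depth quoted in this posting.

```cpp
#include <cstdint>

// Hypothetical helper: packs 8-bit channels into a 16-bit RGB565 value,
// assuming red occupies the top 5 bits, green the middle 6, blue the low 5.
inline std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Bytes needed for one full frame at the given depth (depth in whole bytes).
constexpr std::uint32_t frameBytes(std::uint32_t width, std::uint32_t height,
                                   std::uint32_t bitsPerPixel)
{
    return width * height * (bitsPerPixel / 8);
}
```

At 1280x720 a 16 bit frame occupies 1,843,200 bytes and a 32 bit frame 3,686,400 bytes, so every full-screen update moves twice as much data at the higher depth.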

Please note the color depth specified in the boot arguments:

console=ttyS0,115200n8=noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-16@60

if you want 32 bit color depth, use:

console=ttyS0,115200n8=noinitrd ip=dhcp rw root=/dev/mmcblk0p2 omapfb.mode=dvi:1280x720MR-24@60

followed by:

/usr/sbin/fbset.real -depth 32 -rgba 8/16,8/8,8/0,8/24

after your Linux kernel drops you in userspace with a kiss on the cheek. A
brave man once tried leaving the color depth at 16 in his boot args, and
jumping all the way to 32bit with fbset so he could change between the more
performant 16 bit color space and the hardware compositing ARGB offering.
Running the dedicated fbset command halved his vertical resolution
regardless of any other parameters he tried to pass fbset and he eventually
ran off to fight another day.
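
For the curious, the -rgba argument given to fbset above is a list of length/offset pairs for red, green, blue and alpha. The plain C++ sketch below shows how such pairs place the channels inside a pixel; the struct and function names are hypothetical, and the reading of each pair as "number of bits / position of the lowest bit" is my understanding of the fbset syntax rather than something taken from the driver.

```cpp
#include <cstdint>

// One "length/offset" pair as passed to fbset -rgba (assumed semantics:
// number of bits, then position of the least significant bit).
struct Bitfield { unsigned length; unsigned offset; };

// Places an 8-bit channel value into a field of 1 to 8 bits.
inline std::uint32_t place(std::uint8_t value, Bitfield field)
{
    const std::uint32_t mask = (1u << field.length) - 1u;
    // Keep the most significant bits when the field is narrower than 8 bits.
    const std::uint32_t scaled = std::uint32_t(value) >> (8u - field.length);
    return (scaled & mask) << field.offset;
}

inline std::uint32_t packPixel(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t a)
{
    // Mirrors -rgba 8/16,8/8,8/0,8/24
    return place(r, {8, 16}) | place(g, {8, 8}) | place(b, {8, 0}) | place(a, {8, 24});
}
```

With these offsets the result is the familiar 0xAARRGGBB layout, which matches the ARGB8888 format mentioned in point 1.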

There is a clear performance hit of 7 fps when running hellogl_es2 in
32bit rather than 16bit, taking you down to 20 fps. This hit is even more
pronounced when setting a QGLWidget on the viewport of a QGraphicsView. I
am not sure who is responsible for this, and will be personally
investigating it in the future. Any conjecture/feedback/research performed
by the reader would be greatly appreciated.

*Edited: Introduce rudimentary formatting to make the blog look less Vim forged

Posted by Sarah Smith
 in Graphics, OpenGL
 on Tuesday, November 17, 2009 @ 23:14

For all you 3D and graphics hackers out there this will not be news:  writing OpenGL code is a pain.

Well, the Qt Graphics team is coming to save your sanity with a new project called Qt/3D.

We teased about Qt/3D by putting a few of the foundations for it in Qt 4.6, which will be released very shortly. See the Qt/3D 4.6 features blog post for more details.

At some point Qt/3D will be available as part of Qt itself - exactly what sort of module or library we are not sure just yet - but for now you can try it out via Qt Labs!

With this post we’re pleased to announce that Qt/3D will be available for experimental use via the new Qt/3D labs repo.

Old School OpenGL code gets the Qt Treatment

The trusty old QGLWidget got you past first base: a nice window set up with an OpenGL context ready to go.

But from there you’re on your own with the OpenGL reference book, tearing your hair out writing code like

void My3DWidget::paintGL()
{
   QColor clearColor(palette().color(backgroundRole()));
   glClearColor(clearColor.redF(), clearColor.greenF(), clearColor.blueF(), clearColor.alphaF());
   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
   QColor color(170, 202, 0, 255);
   glColor4f(color.redF(), color.greenF(), color.blueF(), color.alphaF());
   static float vertices[] = {
      60.0f,  10.0f,  0.0f,
      110.0f, 110.0f, 0.0f,
      10.0f,  110.0f, 0.0f
   };
   glVertexPointer(3, GL_FLOAT, 0, vertices);
   glEnableClientState(GL_VERTEX_ARRAY);
   glDrawArrays(GL_TRIANGLES, 0, 3);
   glDisableClientState(GL_VERTEX_ARRAY);
}

just to paint a triangle on the screen.

But then if you want cross-platform code - something that you can try on your desktop, and then run on your device with OpenGL ES, it starts to look really horrible!

Macros everywhere to cope with the different function signatures and data types - not to mention shaders under ES 2.0 versus classic GL on the desktop, and a swath of other cross-platform difficulties.

With Qt/3D your code looks like this:

void My3DWidget::paintGL()
{
    QGLPainter painter(this);
    painter.setClearColor(palette().color(backgroundRole()));
    painter.clear();
    painter.setStandardEffect(QGL::FlatColor);
    painter.setColor(QColor(170, 202, 0, 255));
    QGLVertexArray vertices(QGL::Position, 3);
    vertices.append(60.0f,  10.0f,  0.0f);
    vertices.append(110.0f, 110.0f, 0.0f);
    vertices.append(10.0f,  110.0f, 0.0f);
    painter.setVertexArray(vertices);
    painter.draw(QGL::Triangles, 3);
}

And what’s more it runs the same on your OpenGL/ES device and your desktop.  (Note that I have elided the view and model transform setup code from both examples above for the sake of space).

As mentioned in the previous blog post, Qt/3D has been in the wings for some time now, and the eagle-eyed might have noticed math classes springing up in Qt’s GUI module.

These classes provide the basis for Qt/3D’s cross platform geometry abstraction: QGLVertexArray. This nifty class also dovetails into the QGLBuffer class to take care of uploading your geometry to VBO’s on the graphics adaptor, as well as coping with differences in platform on data member sizes.

Download the code from the labs repo and try out the examples - the code above comes from the tutorials directory, where you can find out more about writing your traditional OpenGL apps in the Qt cross-platform way.

What’s in Store with Qt/3D

There’s more to come from Qt/3D over and above the Portability tools mentioned in the example above.

The Enablers include encapsulation classes like QGLMaterialParameters, which wrap OpenGL materials in a cross platform and Qt’ish way.

One of the nicest enablers is the QGLView class and its friends.  Doing your GL painting into a view looks pretty much exactly the same as with an OpenGL widget, but a few more things are taken care of for you - no need to set up tricky viewing and model transforms (which is one reason why I elided them from the code above).  But even better, as a bonus you get a pan-rotate-zoom view window for free.   It’s customizable using the QGLCamera class, and with QGLLightParameters you can quickly set up your own lights too.

Then there’s Real 3D bringing basic but powerful geometry management, and model file import functionality. With this stuff we’re just dipping our toes into the world of 3D to allow coding up of basic applications using Qt style containers, QObject based memory management, and the kinds of abstractions you’ve come to expect from Qt. If you’re an Ogre programmer, or used to using Coin3D or CrystalSpace or other powerful 3D and modelling libraries - well, you’ll still need them. We’re not planning to go into competition with those established 3D toolkits.

Instead our aim is to deliver on the promise of Qt: do more with less.  It should be just as easy to use a 3D model file as it is to use a PNG file, and it should be just as easy to set up a cube with a texture on it as it is to create a Qt label. We call this component of Qt/3D Real 3D because it does start to provide functionality we’re used to seeing in 3D toolkits. But we’re working to be sure we do not go too far down this road, and thus to decide what will go in and what will be left out - so please consider the stuff in our labs release as definite maybes.

QML and Qt/3D

There’s a lot of buzz around Declarative UI and its associated language QML.

Qt/3D will work with Declarative UI by providing QML bindings so 3D functionality can be easily used from Declarative UI programs. There are a few demos of this in the source tree which can be tried out, and you can see Henrik’s short video about QML and Qt/3D.

We’ll expand on the exciting possibilities of QML and Qt/3D in later posts.

We hope you like what we’re planning, and look forward to your feedback - keep tuned as there are more blog posts to follow, with some cool examples and things to try with Qt/3D.

Posted by Rhys Weatherley
 in Painting, OpenGL
 on Monday, November 09, 2009 @ 22:44

For the last year, we have been investigating API’s that Qt needs to support 3D applications and clever 2.5D effects with OpenGL.  When we started all this a year ago, the problem was broken down into three main areas:

  • Enablers - Basic building blocks like matrices, shaders, vertex buffers, etc.
  • Portability API - API’s that make it easier to write code that ports between desktop OpenGL and embedded OpenGL/ES.  Particularly OpenGL/ES 2.0 which does not have a fixed function pipeline.
  • Real 3D - API’s that take Qt into new application spaces beyond animations and 2D effects.

Obviously that covers a lot of ground, so in this post we will just focus on a few of the Enablers - specifically the ones that made it into 4.6 as the first taste of Qt/3D.  In future posts, we’ll publish Qt/3D repository details and show you more of our plans for later Qt/3D releases.

Math3d

Traditionally, Qt has relied upon the OpenGL library to provide mathematical primitives, using functions like glOrtho(), glRotate(), and so on to manipulate matrices and vectors.  However, with the advent of OpenGL/ES 2.0 it is no longer possible to rely upon the OpenGL library to do the heavy-lifting - the programmer has to do all the work. Also, the traditional OpenGL functions are really only useful when drawing objects - they aren’t of much use when building object meshes in memory and transforming them prior to uploading to the GPU.

So we really needed a hardcore 3D math library, just like the other 3D toolkits (Coin3D, Ogre, OpenSceneGraph, etc).  But we didn’t want to go overboard - it is very easy to re-invent all of linear algebra and lose sight of the core goal: make typical 3D mathematical operations fast and elegant.  We recognized that libraries like Eigen were very good at doing everything in mathematics, but our own goals were more focused.  So what did we do?

The central workhorse is of course QMatrix4x4, which is highly optimized for 3D operations.  Internally it keeps track of its “type” - whether it is a translation, scale, rotation, etc - so that it can more efficiently build up transformations than a naive “make matrices and multiply” implementation might.  QTransform does the same thing for 2D transformation matrices. The following is an excerpt from the hellogl_es2 example in Qt 4.6 which builds up a modelview matrix and sets it on a shader program:

QMatrix4x4 modelview;
modelview.rotate(m_fAngle, 0.0f, 1.0f, 0.0f);
modelview.rotate(m_fAngle, 1.0f, 0.0f, 0.0f);
modelview.rotate(m_fAngle, 0.0f, 0.0f, 1.0f);
modelview.scale(m_fScale);
modelview.translate(0.0f, -0.2f, 0.0f);
program1.setUniformValue(matrixUniform1, modelview);

As can be seen, it is very similar to the traditional OpenGL functions:

glRotatef(m_fAngle, 0.0f, 1.0f, 0.0f);
glRotatef(m_fAngle, 1.0f, 0.0f, 0.0f);
glRotatef(m_fAngle, 0.0f, 0.0f, 1.0f);
glScalef(m_fScale, m_fScale, m_fScale);
glTranslatef(0.0f, -0.2f, 0.0f);

The choice to make the functions similar was deliberate: code that uses the existing OpenGL functions can be quickly converted into more portable code that uses QMatrix4x4.
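
To illustrate what keeping track of the matrix “type” buys you, here is a heavily simplified sketch in plain C++. TrackedMatrix is a hypothetical class written for this post, not QMatrix4x4’s actual implementation, and it only distinguishes three kinds of matrix.

```cpp
// Hypothetical, heavily simplified model of a 4x4 matrix that remembers
// what kind of transform it holds. Elements are stored as m[row][column].
class TrackedMatrix
{
public:
    enum class Kind { Identity, Translation, General };

    TrackedMatrix()
    {
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                m[r][c] = (r == c) ? 1.0f : 0.0f;
    }

    void translate(float x, float y, float z)
    {
        if (kind == Kind::Identity || kind == Kind::Translation) {
            // Fast path: only the last column changes, no multiplication.
            m[0][3] += x; m[1][3] += y; m[2][3] += z;
            kind = Kind::Translation;
            return;
        }
        // General path: equivalent to right-multiplying by a translation matrix.
        for (int r = 0; r < 4; ++r)
            m[r][3] += m[r][0] * x + m[r][1] * y + m[r][2] * z;
    }

    void scale(float s)
    {
        // Right-multiplying by a uniform scale multiplies the first three columns.
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 3; ++c)
                m[r][c] *= s;
        kind = Kind::General;
    }

    float m[4][4];
    Kind kind = Kind::Identity;
};
```

Two translations in a row cost six additions here instead of a full 4x4 multiply, and the same idea extends to scales and rotations.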

The QGenericMatrix template is used for creating “other” matrix sizes that commonly crop up in OpenGL work: 2×2, 2×3, 2×4, 3×2, 3×3, 3×4, 4×2, and 4×3.  It can do a lot more of course, being a template, although we did draw the line at supporting sparse matrices - the matrix sizes that occur in 3D code are rarely very large.  A common question is why didn’t we make QMatrix4x4 an instance or subclass of QGenericMatrix.  The main reason is performance - the 4×4 class needs to be very fast and it is easier to performance-tune a concrete class that isn’t at the mercy of the compiler’s template expansion system.  The other reason is to reduce user confusion - the API for all QGenericMatrix sizes is exactly the same, but QMatrix4x4 is extremely rich in the additional operations it provides.
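
For readers who have not used a fixed-size matrix template before, the idea looks roughly like this. SmallMatrix and multiply are hypothetical names for this sketch; only the parameter order (columns, rows, element type) is meant to echo QGenericMatrix.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-size matrix template:
// N columns by M rows of element type T, stored as v[row][column].
template <std::size_t N, std::size_t M, typename T>
struct SmallMatrix
{
    T v[M][N] = {}; // zero-initialized
};

// Multiplies a (Rows x K) matrix by a (K x Cols) matrix. Mismatched inner
// dimensions are rejected at compile time because K must deduce consistently.
template <std::size_t K, std::size_t Rows, std::size_t Cols, typename T>
SmallMatrix<Cols, Rows, T> multiply(const SmallMatrix<K, Rows, T> &a,
                                    const SmallMatrix<Cols, K, T> &b)
{
    SmallMatrix<Cols, Rows, T> out;
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            for (std::size_t k = 0; k < K; ++k)
                out.v[r][c] += a.v[r][k] * b.v[k][c];
    return out;
}
```

Because the sizes are template parameters, every combination is a distinct type generated by the compiler, which is exactly the “template expansion” a hand-tuned concrete 4×4 class avoids.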

QVector2D, QVector3D, and QVector4D provide vector classes of various sizes to complement QMatrix4x4. An interesting feature for the purposes of OpenGL is that these classes are guaranteed to use the same floating-point type internally as GLfloat on the system. QPointF wasn’t suitable for our 2D vector needs because it uses qreal, which can either be float or double depending upon the compilation flags passed to Qt’s configure. The GLfloat guarantee is very important when building large 3D object meshes: you want to get the vertex data into the most efficient format as early as possible. If we had made the internal type qreal, then Qt/3D would have needed to do a lot of floating-point conversions when uploading vertex data to the GPU.
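
The cost being avoided can be shown with two hypothetical vector structs in plain C++. Vec3f stands in for a vector class with float members and Vec3d for one built on a double-sized qreal; neither is a Qt type.

```cpp
#include <vector>

// Hypothetical vector type with float members.
struct Vec3f { float x, y, z; };

// With float members an array of vectors is already a packed float array.
static_assert(sizeof(Vec3f) == 3 * sizeof(float), "Vec3f is tightly packed");

// Hypothetical vector type as it would look if built on a double-sized qreal.
struct Vec3d { double x, y, z; };

// Double vertices must be converted, component by component, before upload.
inline std::vector<float> toFloatArray(const std::vector<Vec3d> &in)
{
    std::vector<float> out;
    out.reserve(in.size() * 3);
    for (const Vec3d &v : in) {
        out.push_back(static_cast<float>(v.x));
        out.push_back(static_cast<float>(v.y));
        out.push_back(static_cast<float>(v.z));
    }
    return out;
}

// Float vertices can be handed over as they are: no copy, no conversion.
inline const void *asVertexPointer(const std::vector<Vec3f> &in)
{
    return in.data();
}
```

For a mesh with a few hundred thousand vertices the conversion loop and the temporary copy add up quickly, especially on an embedded CPU.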

The QQuaternion class is the last in our current math3d set. It provides an efficient implementation of rotations in 3D space for use with camera positioning, rotation, and animation.
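
If quaternions are new to you, the following plain C++ sketch shows the core operation: rotating a vector by a unit quaternion built from an axis and an angle. Quat and Vec3 are hypothetical minimal types for illustration and do not reproduce the QQuaternion API.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical minimal quaternion with scalar part w and vector part (x, y, z).
struct Quat
{
    float w, x, y, z;

    // Unit quaternion for a rotation of angleDegrees about a normalized axis.
    static Quat fromAxisAndAngle(Vec3 axis, float angleDegrees)
    {
        const float half = angleDegrees * 3.14159265358979f / 360.0f;
        const float s = std::sin(half);
        return { std::cos(half), axis.x * s, axis.y * s, axis.z * s };
    }

    // Rotates v using v' = v + w * t + q x t, where q = (x, y, z)
    // and t = 2 * (q x v). Assumes this quaternion has unit length.
    Vec3 rotate(Vec3 v) const
    {
        const Vec3 t = { 2.0f * (y * v.z - z * v.y),
                         2.0f * (z * v.x - x * v.z),
                         2.0f * (x * v.y - y * v.x) };
        return { v.x + w * t.x + (y * t.z - z * t.y),
                 v.y + w * t.y + (z * t.x - x * t.z),
                 v.z + w * t.z + (x * t.y - y * t.x) };
    }
};
```

Rotating the x axis by 90 degrees about the z axis with this sketch gives the y axis, up to floating-point error. Four floats per rotation and no trigonometry at rotation time is what makes quaternions attractive for camera work and animation.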

Shader Programs

The fixed function pipeline in OpenGL is getting very “old school”.  These days, OpenGL is all about shaders, shaders, shaders.  But resolving the extensions and managing the compilation, linking, and use of shader programs can be quite daunting.  In Qt 4.5, we had no fewer than three different internal shader program wrappers for pixmap filters, the OpenGL2 paint engine, and the boxes demo.  So in Qt 4.6 we have merged all of these efforts and devised a new public API to wrap the extensions.  The result is the QGLShader and QGLShaderProgram classes, which:

  • Support the GLSL and GLSL/ES shader languages.
  • Handle vertex and fragment shaders (geometry shaders are coming in future versions of Qt).
  • Support writing portable shaders that work on both GLSL and GLSL/ES.

That last point is probably the most interesting for Qt.  GLSL has a lot of built-in variables like gl_Vertex, gl_Normal, gl_ModelViewProjectionMatrix, etc that don’t exist in GLSL/ES.  In turn, GLSL/ES has additional type qualifiers like highp, mediump, and lowp that are used to specify the desired precision.  These issues can make it a pain to port existing shader code from desktop to embedded.  We didn’t want to have to write two sets of shaders for the OpenGL2 paint engine, so a solution needed to be found.

The solution we chose was to use GLSL/ES as the primary language for writing shaders in Qt, and provide #defines for the extra keywords to make the code compile on desktop GLSL systems.  It is still possible to use the full GLSL language if you want to, but portability will suffer.
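The idea can be sketched in a few lines of standalone C++. This is an illustration of the technique only, not Qt's actual implementation; preparePortableSource and targetIsES are hypothetical names.

```cpp
#include <cassert>
#include <string>

// On desktop GLSL, define the GLSL/ES precision qualifiers to nothing so
// that shader source written for GLSL/ES compiles unchanged.
// Hypothetical helper; Qt handles this internally.
std::string preparePortableSource(const std::string &source, bool targetIsES)
{
    if (targetIsES)
        return source; // GLSL/ES understands highp/mediump/lowp natively

    // Desktop GLSL: strip the qualifiers with empty macro definitions.
    return "#define highp\n"
           "#define mediump\n"
           "#define lowp\n" + source;
}
```

Given "uniform mediump vec4 color;", the desktop compiler then effectively sees "uniform vec4 color;". A production version would need more care, for instance keeping any #version directive as the first line of the source.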

The following example demonstrates how to compile and link a simple shader program that can be used to draw triangles with a flat color:

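// Vertex shader: "vertex" is a per-vertex attribute, while "matrix" is a
// uniform because it stays constant for the whole draw call.
program.addShaderFromSourceCode(QGLShader::Vertex,
    "attribute highp vec4 vertex;\n"
    "uniform mediump mat4 matrix;\n"
    "void main(void)\n"
    "{\n"
    "   gl_Position = matrix * vertex;\n"
    "}");
// Fragment shader: paint every fragment with the uniform "color".
program.addShaderFromSourceCode(QGLShader::Fragment,
    "uniform mediump vec4 color;\n"
    "void main(void)\n"
    "{\n"
    "   gl_FragColor = color;\n"
    "}");
program.link();
program.bind();

// Look up the locations used to feed data to the shaders.
int vertexLocation = program.attributeLocation("vertex");
int matrixLocation = program.uniformLocation("matrix");
int colorLocation = program.uniformLocation("color");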

The highp and mediump keywords are added to keep GLSL/ES happy; on desktop they are #defined to an empty string. Also, we have deliberately used user variables for the vertex position, matrix, and color rather than relying upon the desktop-specific gl_Vertex, gl_ModelViewProjectionMatrix, and gl_Color variables. We can then draw a green triangle as follows:

QVector3D triangleVertices[] = {
    QVector3D(60.0f,  10.0f,  0.0f),
    QVector3D(110.0f, 110.0f, 0.0f),
    QVector3D(10.0f,  110.0f, 0.0f)
}; 

QMatrix4x4 pmvMatrix;
pmvMatrix.ortho(rect()); 

program.enableAttributeArray(vertexLocation);
program.setAttributeArray(vertexLocation, triangleVertices);
program.setUniformValue(matrixLocation, pmvMatrix);
program.setUniformValue(colorLocation, QColor(0, 255, 0, 255)); 

glDrawArrays(GL_TRIANGLES, 0, 3); 

program.disableAttributeArray(vertexLocation);

Note the use of QMatrix4x4 above to create an orthographic projection matrix to pass to the vertex shader, and the use of QVector3D to build the vertex array.  And that’s basically it!  Shaders 101.
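If you are curious what an orthographic projection does to the triangle's coordinates, here is a standalone sketch of the standard formula in plain C++. Mat4, Vec4, ortho() and transform() are hypothetical helpers, not Qt classes. The sketch assumes a 120x120 window whose origin is the top-left corner with near and far planes at -1 and 1; that is our understanding of how a window rectangle is mapped, but treat it as an assumption rather than a statement about QMatrix4x4's internals.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 {
    float x, y, z, w;
};

// Hypothetical 4x4 matrix, indexed as m[row][column].
struct Mat4 {
    float m[4][4];
};

// Standard orthographic projection: maps the box
// [left, right] x [bottom, top] x [-nearPlane, -farPlane] to the -1..1 cube.
Mat4 ortho(float left, float right, float bottom, float top,
           float nearPlane, float farPlane)
{
    Mat4 r = {}; // all elements start at zero
    r.m[0][0] = 2.0f / (right - left);
    r.m[1][1] = 2.0f / (top - bottom);
    r.m[2][2] = -2.0f / (farPlane - nearPlane);
    r.m[0][3] = -(right + left) / (right - left);
    r.m[1][3] = -(top + bottom) / (top - bottom);
    r.m[2][3] = -(farPlane + nearPlane) / (farPlane - nearPlane);
    r.m[3][3] = 1.0f;
    return r;
}

// Matrix * column vector, the same operation as "matrix * vertex" in a shader.
Vec4 transform(const Mat4 &a, const Vec4 &v)
{
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += a.m[row][col] * in[col];
    return Vec4{out[0], out[1], out[2], out[3]};
}
```

With ortho(0, 120, 120, 0, -1, 1), the triangle's top vertex (60, 10, 0) lands at roughly (0, 0.83, 0) in clip space: horizontally centered and near the top edge. Passing the window's bottom edge as the larger y value is what flips the y axis so that window coordinates grow downwards.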

What’s Next?

Lots and lots of stuff.  Wrapper classes for vertex buffers and textures will probably go into Qt in the near future.  Geometry handling for building object models.  Special-purpose 3D viewing widgets. Integration with Declarative UI for quickly building 3D applications.  And the portability API.  More to come on these in the next post …



© 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
All other trademarks are property of their respective owners.