Posted by Benjamin
 in WebKit, Performance
 on Tuesday, February 02, 2010 @ 11:13

As with the other parts of Qt, great performance is important for QtWebKit.

Traditionally, QtWebKit has been mostly used on desktop computers for advanced layout, hybrid applications or simply to browse the web. On a modern computer, the speed of WebKit is not a problem.

The world has changed, and QtWebKit is now used on mobile phones running Maemo, Symbian or Windows CE, and it is increasingly used in embedded applications on various devices.

Working on what matters

To improve the performance of WebKit, we work with benchmarks. Those benchmarks serve as use cases when profiling WebKit and are used to evaluate the gain of our patches.
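As a rough illustration of what "evaluating the gain" of a patch can mean, here is a minimal sketch that compares repeated timings of one benchmark before and after a patch. Using the median of the runs is an assumption of this sketch, not a description of how the performance suite reports results, and the function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of timings in milliseconds. Using the median instead of
// the mean is an assumption of this sketch; it damps occasional outliers.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    if (samples.size() % 2 == 1)
        return samples[mid];
    return (samples[mid - 1] + samples[mid]) / 2.0;
}

// Speed-up factor of a patch: values above 1.0 mean the patched build is faster.
double speedup(const std::vector<double>& before, const std::vector<double>& after) {
    return median(before) / median(after);
}
```

With timings of 120, 100 and 110 ms before a patch and 55, 50 and 60 ms after it, the medians are 110 ms and 55 ms, so the sketch reports a speed-up factor of 2.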

Those benchmarks and the tools are available on Gitorious, in the QtWebkit performance repository.

WebKit gives a lot of possibilities, and we try to focus our work on what matters. For the performance work, we use real webpages, and look for ways to improve WebKit in the way it is used on the Web.

How we benchmark WebKit

The performance suite has three kinds of tools:

  • host tools: to manage the data used by the benchmarks
  • tests: the benchmarks
  • reductions: some benchmarks for specific components of WebKit

Let’s have a look at the tools and the tests. The full documentation of the performance suite is on the WebKit wiki.

Mirroring the web

For the benchmarks, we do not want to access the Internet directly. We want to compare the results from one run to the next, so we don’t want the pages to change arbitrarily. Using the live Web for benchmarking would also put a significant load on the servers.

To use real pages without going online, we create databases of web pages with the mirror application:

Process of webpage mirroring

Those databases are snapshots of webpages at a given point in time, and they are used as input of the benchmarks.

The mirror application uses WebKit to load the pages and intercepts all network requests. This means the database also includes resources that are loaded lazily via JavaScript.
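To make the idea of a snapshot concrete, here is a minimal in-memory sketch: every intercepted request is recorded under its URL, and a benchmark later looks responses up instead of going online. The `Resource` and `Snapshot` types are hypothetical and only illustrate the concept; the real mirror application stores its databases on disk in its own format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical record of one intercepted response.
struct Resource {
    std::string contentType;
    std::string body;
};

// A snapshot maps each requested URL to the response seen while mirroring.
class Snapshot {
public:
    // Called for every intercepted request during mirroring, including
    // resources that scripts request after the initial page load.
    void record(const std::string& url, const Resource& resource) {
        assert(!url.empty());
        m_resources[url] = resource;
    }

    // Replay lookup used by a benchmark; returns nullptr if the URL was
    // never seen while mirroring.
    const Resource* lookup(const std::string& url) const {
        std::map<std::string, Resource>::const_iterator it = m_resources.find(url);
        return it == m_resources.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return m_resources.size(); }

private:
    std::map<std::string, Resource> m_resources;
};
```

Because the lookup never touches the network, two benchmark runs against the same snapshot see exactly the same input.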

Using the benchmarks

There are two ways to exploit the databases with the benchmark: the online and offline modes. The difference lies in the way we provide the database’s content to the benchmarks:

Modes of benchmarking

In the “online mode”, we use a basic web server to serve the database over HTTP. The benchmarks use the complete stack to load pages, as they would if we were loading the page from the Internet.

In the “offline mode”, the database is loaded directly by the benchmarks and is used as the source of data. In that case, the network is not involved. This mode is mostly useful for the benchmarks that do not involve the network (like measuring the rendering speed).

What is measured

The benchmark suite is still a work in progress. Currently, there are benchmarks for:

  • the page loading performance (with or without rendering)
  • the rendering performance
  • the scrolling performance

How can you use it?

If you use WebKit, and are interested in great performance, you can use the performance suite to profile the use case you are interested in, and optimize those cases.

If you are evaluating WebKit for embedded devices, you can use the benchmarks to see how well WebKit performs on the hardware.

If you make patches for WebKit’s performance, have a look at how to contribute. You can also join us on IRC in #qtwebkit on freenode.

Posted by Simon
 in WebKit
 on Monday, February 01, 2010 @ 12:56

Hi!

Here’s the weekly summary of Qt related changes to WebKit trunk. Big changes include Yael’s patch for WebSockets support, the beginnings of QtScript on top of JavaScriptCore’s C API, Maemo 5 tweaks and layout test fixes:

  • Janne added the necessary meta-data to make QtWebKit play nicely with Symbian backups (34077).
  • I did some code cleanups in RenderThemeQt and fixed a bug with combo boxes not showing up in Maemo5 (34088).
  • Holger fixed a regression in the JavaScript prompt handling (30914).
  • Jedrzej landed the first files for building QtScript on top of JavaScriptCore’s C API (32565).
  • Diego added history support to the Qt DRT, we now pass the http/tests/history layout tests! (34167)
  • Daniel fixed a bug with the height of button elements (29564).
  • Kent fixed support for ES5 style introspection with QMetaObject methods (34087).
  • Yael implemented the Qt part of WebSocket support, we now pass websocket/tests in the layout tests (34180).
  • Diego fixed more worker layout tests by adding support for counting worker threads in the Qt DRT (34221).
  • Holger found a neat way to speed up the conversion from KURL to QUrl (33873).
  • Trond fixed an endless loop in QWebPage printing (r53997).
  • Kenneth fixed incorrect fonts on comboboxes on Maemo5 and Symbian (r53999).
  • Andreas upstreamed Ralf and Robert’s kinetic scrolling support for QWebView using QAbstractKineticScroller (34267).
  • Benjamin implemented support for the display() method in the Qt DRT (34258).
  • Oswald sped up the conversion between WebCore::String and QString by avoiding QString::fromUtf16() (r54060).
  • Kenneth landed a patch to disable auto-uppercasing and text prediction for password input fields (r54064).
  • Kenneth also continued to clean up the QtLauncher for a future merge with QGVLauncher.
  • Andreas and Kenneth submitted tweaks to the look’n’feel of QtLauncher on Maemo5.
Posted by kkoehne
 in QtCreator, Kinetic, Declarative UI
 on Wednesday, January 27, 2010 @ 11:45

Declarative UI is one of the big things on the Qt Roadmap for Qt 4.7 and 4.6.x. I have enjoyed working with the Qml language and the developers behind it for quite some time - and believe me, this one will fundamentally change the way slick Qt UIs are designed and how they look! If you have not checked it out yet, do so … although it is not yet part of the Qt package, it is very mature, and just fun to experiment with :)

Anyhow, a technology like this of course does not come out of the blue, and neither do the tools to support it - I think the first discussions about a Visual Editor for a language yet-to-be-invented on the basis of the QGraphicsView framework started in summer 2008! In early 2009 I first heard the name “Qml”, and was soon the lead of a development project, which right now has 5 people working on it full time. One of the most important decisions we had to make early on was choosing a cool project name - we eventually settled on “Bauhaus”, as a reference to the famous German school of design.

Back to the facts: What I want to share with you today are our plans for supporting Declarative UI / Qml in QtCreator. This is an important one, because with Declarative UI we target not only the traditional Qt/C++ developers, but also more design centric people - the goal is to let both share the same language, from early prototyping until the final product. We can only achieve this with good tools in place.

Here are the things we are currently working on for the next major version of QtCreator:

Qml Text Editor - We already have some basic syntax highlighting / formatting support in QtCreator 1.3, but in the QtCreator master branch we are right now working on really mature Qml/JavaScript support. This will include all the goodies you kind of expect these days: Code completion, context sensitive help, …

Visual Qml Editor - After all, we are talking about graphical user interfaces, and you will not make UI designers happy just with a text editor ;-) . Here we decided for a fresh start and developed the components for the Visual Editor from the ground up. Interestingly, we are using Qml heavily ourselves here - e.g. in the Property Editor and States View.

Seamless Integration - It is not a choice between text and visual editing: you want to use both, and switch quickly between them. This is why we deeply integrate the Visual Editor into QtCreator, share the same Undo/Redo history and preserve the Qml file formatting as much as possible when making file changes.

Debugging - We integrate the Qml debugger, which allows you to inspect the qml item tree and its properties at runtime, to check framerates, to evaluate JavaScript expressions and so on inside QtCreator.

Enough said - Nigel recorded a nice video showing you what the support currently looks like:

If you want to try it out yourself, we also packaged a technical preview of creator, qmlviewer/qmldebugger and the declarative examples & demos in one installer:

qt-creator-win-opensource-1.3.80-qml-tp1.exe (Windows 32 bit)
qt-creator-linux-x86-opensource-1.3.80-qml-tp1.bin (Linux 32 bit)
qt-creator-mac-opensource-1.3.80-qml-tp1.dmg (Mac OS X)

Disclaimer: The binaries are unsupported, and are only meant for early testing. In fact they are based on an untested snapshot of Qt, and an (almost) untested version of QtCreator! If you want to use qml for production work, stay with QtCreator 1.3.1.

Of course we are interested in all kinds of feedback, preferably via the bug tracker or on the qt-creator mailing list.

Posted by Simon
 in WebKit
 on Sunday, January 24, 2010 @ 09:38

This week has been a very busy week! Here’s a list of the landed changes that affect the Qt port, in chronological order:

  • Tor Arne has ported the Qt build of DumpRenderTree to run on Windows. (r53526, r53543).
  • Luiz continued with cleaning up the combobox popup handling (33418).
  • Ben from the Google Team fixed a bug where touch events weren’t sent to iframes (33894).
  • No’am landed support in the Qt bindings that allows passing pixmaps and images as arguments in signals/slots/properties, which can then easily be turned into image elements or data urls. (32461).
  • Benjamin cleaned up the QWebPage autotest (32216).
  • On Thursday we landed No’am’s implementation of GraphicsLayer. This accelerates the composition of layers with animated opacity or transform attributes, through content caching in pixmaps. Layers are mapped to QGraphicsItems with enabled CacheMode and the animation is driven by the Qt Animation Framework. The feature is currently disabled by default as it’s not stable yet, but it’s a fantastic start! (33514).
  • Diego continued implementing missing functions in the Qt DRT (33945).
  • Jakub fixed a crash with video elements and phonon (33842).
  • Girish fixed a bug with combobox popups in transformed QGraphicsWebViews (r53703, 33887).
  • Robert fixed 5 layout tests by fixing support for window.close() and window.closed() in the Qt DRT (32953).
  • Kent fixed bugs with Object.getOwnPropertyDescriptor (33948, 33946).
Posted by gunnar
 in Threads, Painting, Graphics Dojo, Performance
 on Thursday, January 21, 2010 @ 08:18

In this series that we’ve been doing, I wanted to cover threading, a topic that has been actively discussed amongst some of the trolls over the last few months. We’ve had support for rendering into QImages from non-GUI threads since the early Qt 4.0 days, but it’s only in more recent versions of Qt (4.4, I think) that we got support for rendering text into images. Now that the support is there, it raises the question of how to make proper use of it. Generating the actual content in a thread is one use case, and here is an example of it.

What it means is that instead of rendering all the content of a certain view in the QWidget::paintEvent() or in the QGraphicsItem::paint() function, we use a background thread which produces the cache. The benefit is that even though drawing the actual content can be quite costly, drawing a pre-rendered image is fast, making it possible for the UI to stay 100% responsive while the heavy loading is happening in the background. It does imply that not all content is available at all times, but for many scenarios this is perfectly fine. There is nothing novel about this approach, I just think it’s a nice way to solve a problem that often comes up when dealing with user experience.
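The pattern can be sketched without any Qt classes. The following is a minimal illustration using std::async, not the code from the demo: the paint path asks a cache for a tile and gets either the finished result or an empty value meaning "draw a placeholder", while the expensive rendering runs on another thread. `TileCache`, `renderTile` and the use of a string as stand-in for pixel data are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <map>
#include <string>

// Stand-in for an expensive rendering job; a real application would paint
// into an image here. The string is a placeholder for pixel data.
std::string renderTile(int index) {
    assert(index >= 0);
    return "tile-" + std::to_string(index);
}

class TileCache {
public:
    // Called from the UI thread while painting. Returns the finished tile,
    // or an empty string (meaning "draw a placeholder") if it is not ready.
    std::string tileOrPlaceholder(int index) {
        std::map<int, std::string>::const_iterator done = m_ready.find(index);
        if (done != m_ready.end())
            return done->second;

        std::map<int, std::future<std::string> >::iterator job = m_pending.find(index);
        if (job == m_pending.end()) {
            // First request: start rendering in the background, return at once.
            m_pending[index] = std::async(std::launch::async, renderTile, index);
            return std::string();
        }
        // Poll without blocking; collect the result once the job has finished.
        if (job->second.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            m_ready[index] = job->second.get();
            m_pending.erase(job);
            return m_ready[index];
        }
        return std::string();
    }

    // Blocks until every pending tile is rendered (only useful for testing).
    void waitForAll() {
        for (auto& job : m_pending)
            job.second.wait();
    }

private:
    std::map<int, std::string> m_ready;
    std::map<int, std::future<std::string> > m_pending;
};
```

The UI thread never waits in `tileOrPlaceholder()`; it only polls, which is what keeps panning smooth while tiles are still being produced. A real implementation would also notify the view when a tile completes so that it can repaint.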

This approach is used by Google Maps (actually, what the server does I don’t know, but it sends individual tiles to the browser at least), iPhone and N900 web browsers, and I’ve talked to customers in the past that use this approach for use cases where generating the content is costly, but the user interface needs to stay responsive. In fact, this approach applies to pretty much anything where it is ok that the content is not immediately there, such as data tables like an mp3-index or a contact list, images in a data folder, etc.

The Task

Let’s first look at the task. I’ve done a trivial implementation which looks in a directory and displays all the images in there. Each image is a separate content piece and I’ve put a background, a small frame around it and a drop shadow under it, just so that there is a bit of active work going on. If you are into it, here is the Source Code.

The content pieces could have been tiles in a map of Norway or tiles composing a webpage, but I chose images, because I already had some images around and I figured it made for an ok example. The demo is run on an N900 with compositor disabled using the following command lines:

  • Non-Threaded: ./threaded_tile_generation -no-thread -graphicssystem opengl MyImageFolder
  • Threaded: ./threaded_tile_generation -graphicssystem opengl MyImageFolder

Here’s how it looks when the content is generated in the GUI thread:


The UI is running super-smooth as long as I show only the content that is already loaded. Once work is needed, the entire UI stops and the user experience is really bad. Here is how it looks if we move the work into a background thread.

The algorithm

Don’t use this particular algorithm. It is very crude and written to show an idea. First of all, because I was lazy, I used queued connections rather than a synchronized queue to schedule the pieces to be rendered. This means that the queue is managed by Qt’s event loop, out of my control. So if I pan far out, I will schedule a lot of images to be rendered, then pan beyond them before they are done. In a decent implementation, I would dequeue these and make sure that only the pieces that are directly visible are being processed.
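A minimal sketch of the synchronized queue described above, written with standard C++ rather than Qt classes. The class name, the use of integer tile indices and the idea of a contiguous visible range are assumptions made for the example; the point is only that the UI thread keeps control of the queue and can throw away requests it has panned past.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Thread-safe queue of tile indices waiting to be rendered.
class RenderQueue {
public:
    // UI thread: schedule a tile unless it is already queued.
    void push(int tile) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (std::find(m_tiles.begin(), m_tiles.end(), tile) == m_tiles.end())
            m_tiles.push_back(tile);
    }

    // UI thread: after panning, drop every queued tile outside the visible
    // range [first, last]. Returns how many requests were discarded.
    std::size_t discardOutside(int first, int last) {
        assert(first <= last);
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::size_t before = m_tiles.size();
        m_tiles.erase(std::remove_if(m_tiles.begin(), m_tiles.end(),
                                     [first, last](int t) { return t < first || t > last; }),
                      m_tiles.end());
        return before - m_tiles.size();
    }

    // Worker thread: take the next tile; returns false if the queue is empty.
    bool tryPop(int& tile) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_tiles.empty())
            return false;
        tile = m_tiles.front();
        m_tiles.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<int> m_tiles;
};
```

A real worker would normally block on a condition variable instead of polling `tryPop()`; that part is left out to keep the sketch short.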

The other thing is that there is no logic to “peek ahead”. I schedule images to be generated only when I need them. If I instead scheduled them based on the current panning direction, in addition to not discarding so aggressively, it would probably result in a situation in which most images are rendered ahead of time.
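One possible shape for such a "peek ahead" policy, sketched for a one-dimensional strip of tiles. The function name, the integer tile indices and the fixed look-ahead distance are assumptions for illustration; the demo does not contain this logic.

```cpp
#include <cassert>
#include <vector>

// Returns the tile indices to schedule: the visible range first, then up to
// `lookAhead` extra tiles in the direction of panning (direction > 0 pans
// toward higher indices, direction < 0 toward lower, 0 means standing still).
std::vector<int> tilesToSchedule(int firstVisible, int lastVisible,
                                 int direction, int lookAhead, int tileCount) {
    assert(lookAhead >= 0);
    std::vector<int> tiles;
    for (int t = firstVisible; t <= lastVisible; ++t)
        if (t >= 0 && t < tileCount)
            tiles.push_back(t);
    for (int step = 1; step <= lookAhead && direction != 0; ++step) {
        const int t = direction > 0 ? lastVisible + step : firstVisible - step;
        if (t >= 0 && t < tileCount)
            tiles.push_back(t);
    }
    return tiles;
}
```

Visible tiles come first in the result so that what the user can see is rendered before anything speculative.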

QGraphicsView

It would be kinda cool if this could be applied directly to QGraphicsView. You would set a flag on the item and, instead of generating its cache pixmap in the GUI thread, the work would be offloaded to the worker thread. This is not straightforward, however, because the GUI thread can, as of today at least, continue to modify the state of the item while it’s being rendered in the worker thread. Synching these two becomes a bit of a mess, and how to solve it, if at all, is not something we have a plan for. That doesn’t prevent people from doing this kind of work in their own custom paint() functions, of course.

Posted by Simon
 in WebKit
 on Friday, January 15, 2010 @ 17:49

Here’s the weekly summary of the Qt related changes that landed in WebKit trunk this week:

  • Daniel and Robert added support for the XSS auditor to the Qt DRT (33419).
  • Simon removed unnecessary memory allocations of QPainterPath from WebCore::Path (33466).
  • Zoltan continued to make the world a subclass of FastMallocBase.
  • Diego fixed support for user stylesheet locations in the Qt DRT (33617).
  • Andreas fixed the scrolling performance on pages with embedded widgets (33373).
  • Jocelyn reworked the qmake based build system to separate the generated files from the regular build, fixing longstanding dependency issues (33542).
  • Diego added a missing implementation of fileSystemPath() to the Qt build of KURL (33614).
  • Jakub fixed a bug with XSL stylesheet loading.
  • Ben fixed support for touch events in document.createEvent() (33605) as well as the detection of touch events as user gestures (33597).
  • Kim fixed support for touch event coordinates in zoomed and scrolled pages (32899).
  • Petri fixed an incorrect touch layout test (33465).
  • The Szeged hackers continue to rock our world by keeping the bot green green green!

That’s all for this week, folks :). If I have missed anything or you’d like to mention something in the next digest, please send me an email.

BTW, if you’d like to join the development, please subscribe to the webkit-dev and the webkit-qt mailing lists. You can also join our weekly meeting point on IRC in #qtwebkit on freenode every Monday at 15:00. There’s no fixed agenda, but instead there’s a good chance that most developers will be around, if you’re looking for code reviews, etc. in a particular area. See you there!

Posted by Kent
 in WebKit
 on Friday, January 15, 2010 @ 13:06

Today it’s exactly one month since the press release stating that ECMA-262 5th edition (ES5) has been approved. (Yeah, that’s just a random coincidence.) What’s changed since the 3rd edition? Quoting the spec itself (actually, it’s only the “Final final final final draft” according to the PDF document title :) ):

The [fifth edition of ECMAScript] codifies de facto interpretations of the language specification that have become common among browser implementations and adds support for new features that have emerged since the publication of the third edition. Such features include accessor properties, reflective creation and inspection of objects, program control of property attributes, additional array manipulation functions, support for the JSON object encoding format, and a strict mode that provides enhanced error checking and program security.

Mark Caudill gives a good 10,000-foot overview of the new features. John Resig goes into more detail about Objects and properties and strict mode and more.

Here’s my attempt at an overview of the ES5 implementation status in WebKit/JavaScriptCore (JSC).

Features implemented in JavaScriptCore

This is stuff that’s already in WebKit trunk. I’ve included links to relevant Bugzilla tasks in case you’d like more information (e.g. have a look at the patches).

  • “Array extras”: Array.prototype.{indexOf,lastIndexOf,every,some,forEach,map,filter,reduce,reduceRight}:
    These have been in JSC for years. I’m not sure how conformant the implementations are, though.

Features not implemented in JavaScriptCore

  • Strict mode: I’m not aware of any work that’s been done to support strict mode yet. It involves making the parser/compiler recognize the “use strict” directive and adapting execution according to the rules given in annex C of the ES5 specification. (The annex lists 20 restrictions/exceptions that apply to strict mode.)

Want to get involved or track the status?

Have a look at the open ES5 tasks at bugs.webkit.org. For creating new tasks, use component “JavaScriptCore”, tag “ES5” and (optionally) summary prefix “[ES5]”. In addition to implementing ES5 functionality or playing with it in your WebKit-based app, there are also opportunities to use that functionality within WebKit, such as in the test framework and the Web Inspector (whose front-end is written in JavaScript). For example, Object.getOwnPropertyNames() made it easy to resolve the long-standing issue of the Web Inspector console not auto-completing non-enumerable properties of built-in ECMA objects (https://bugs.webkit.org/show_bug.cgi?id=19119). And Object.getOwnPropertyDescriptor() could potentially be used to display detailed information about variables. I just love those new introspection capabilities! Finally it’s possible to do things directly in JavaScript that you could only do with native (engine-specific) APIs before.

Happy ES5 hacking!

Posted by No'am Rosenthal
 in WebKit, Graphics View, Graphics, Performance
 on Wednesday, January 13, 2010 @ 16:07

I’d like to share with the community a project I’m working on, while it’s still in its development phase (isn’t that what labs is for? :))
The goal of the project is to get CSS3 animations to a reasonable FPS performance, mainly on embedded hardware where it’s a pain.

See http://gitorious.org/~noamr/webkit/noamrs-webkit/commits/accel

The idea is to implement webkit’s GraphicsLayer concept, which allows platform-specific implementations of CSS transform and opacity animations, using the graphics-view and the Qt animation framework as a backend. This would only work for QGraphicsWebView and not for QWebView, as rendering a separate QGraphicsScene inside QWebView would probably not give us much performance benefit.

Preliminary results are very promising - The leaves demo, for example, runs 4 times faster on Maemo Fremantle than it does without the acceleration, and it looks graphically accurate.

The reason this gives us a performance benefit is mainly because of graphics-item caching: when a CSS animation occurs inside webkit, the item that’s being animated has to go through a re-layout and re-draw every so often, while with the accelerated approach we draw it once into a QPixmap (QGraphicsView takes care of that) and then it’s just a series of fast and furious pixmap blts. The hardware acceleration becomes relevant when the images are big and the blt itself becomes a bottleneck.

This project is not ready to go upstream, as it supports many delicate use-cases that need to be tested. But if you’re interested in participating (or to just comment!), this has so far been a fun project to hack on.

Instructions:

  1. Get the Git repo from git://gitorious.org/~noamr/webkit/noamrs-webkit.git, branch accel
  2. Build or get a relatively new version of Qt, possibly without building QtWebkit
  3. Build Webkit from the downloaded Git repo:
    export QTDIR=[my-qt-4.6-root]
    export PATH=$QTDIR/bin:$PATH
    ./WebKitTools/Scripts/build-webkit --qt
  4. Run ./WebKitBuild/Release/bin/QGVLauncher --accel: This will enable the necessary web-setting for composite-layer acceleration. You can also create a small QGraphicsWebView example yourself, as long as you enable the new settings flag: QWebSettings::AcceleratedCompositingEnabled
  5. Load a website with CSS transform/opacity animations: like this one or this one.
  6. Hack. Most of the code is in WebCore/platform/graphics/qt/GraphicsLayerQt.cpp, or you could just search for the term USE(ACCELERATED_COMPOSITING)
  7. Send merge requests through Gitorious or comments on Bugzilla

No’am

Posted by gunnar
 in Graphics View, Painting, Graphics Dojo, OpenGL
 on Monday, January 11, 2010 @ 09:25

So, it’s time for my next post. Today’s topic is how convenience relates to performance, specifically in the context of QGraphicsView. My goal is to illustrate that the way to achieve fast graphics is to pack your QPainter draw calls as tightly together as possible. The more stuff that happens in the middle, the slower it gets.

To illustrate this, I’ve implemented a virtual keyboard. Granted, it’s not a very common layout nor is it usable, but the rendering is the point here, not the functionality. The full source code is here and it looks like this:

Virtual Keyboard Image

I’ve implemented the keyboard using three different approaches: one using proxy widgets, one using graphics items and one where the entire view is one graphics item. In addition to that, I added a number of options to tweak various properties, such as whether or not the text is drawn. I measured this on an N900 rather than a desktop because the difference becomes more pronounced on a small device. On the desktop it is easy to be fooled because most things complete in a matter of microseconds anyway. It is only when the entire application comes together that one notices things are not as smooth as in the prototype, but by then so much work has been invested in the current design that one loses out on the super-slick feeling application.

QGraphicsProxyWidget

Since we’re implementing a series of clickable buttons, a natural and convenient starting point is to use an existing button class, such as the QPushButton. It already implements the logic for mouse/keyboard interaction and has signals for clicking and all sorts of other useful functionality. To get widgets into QGraphicsView, we use a QGraphicsProxyWidget. To make the test “fair”, I actually use a plain QWidget which just paints a pixmap and draws a text. Had I gone through the styling API, these numbers would have been even worse.

ProxyWidget Results
Milliseconds spent per frame including blit to screen when using QGraphicsProxyWidgets. Lower is better!

If we look at the plain “-proxywidgets” run, the fastest engine was the raster engine, running at 26ms per frame. If I wanted to slide this keyboard onto screen, I have 16ms available if I want it running at 60 FPS and 33ms available if I want to do it at 30 FPS. When each frame takes 26ms, I can barely do 30, but with only a little bit of slack, so if another process is soaking up CPU time, that number is also a bit difficult to reach. So, not very good. (BTW, the exact numbers in the graphs are listed as a comment in the top of the .cpp file I linked above).
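The frame budget arithmetic is simple enough to write down. This small sketch (function names are my own) computes the time available per frame for a target frame rate and checks a measured frame time against it; note that the exact budgets are 16.7ms and 33.3ms, which the text rounds down to 16ms and 33ms.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Highest frame rate that can be sustained if every frame takes `frameMs`.
double achievableFps(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}

// True if a frame time fits within the budget of the target frame rate.
bool meetsTarget(double frameMs, double targetFps) {
    return frameMs <= frameBudgetMs(targetFps);
}
```

For the 26ms proxy widget result, `achievableFps(26.0)` is a little over 38 frames per second: enough for a 30 FPS target with about 7ms of slack per frame, but well short of 60 FPS.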

The first thing I noticed with this approach was that each button now had a gray background. This is of course the widget background. A QWidget embedded in QGraphicsView will be treated as a top-level and will therefore draw its background. I added an option “-no-widget-background” which sets the Qt::WA_NoBackground attribute on the widget. This brings the rendering time with raster down to 22ms. 4ms saved per frame, just by setting a flag, is not too bad, but still pretty far from being awesome.

I’ve mentioned before that text drawing is not as fast as we would like it, so just to compare how it looks without text, I added a “-no-text” option to the test. This brings the raster results down to 13ms. That is pretty nice and below the 16ms threshold required to achieve 60 FPS, but only with a small margin. And I’m not drawing any text! Before I give up on this approach, I’ll enable item caching. By setting ItemCoordinateCache on each button, I cache both the background pixmap and the text in one single pixmap. This brings the raster results down to 8.5ms, and it’s starting to look acceptable. But at a very high memory cost… In my original use case I had one shared pixmap for all the button backgrounds, but now I have one per button.

You may notice that there was a vast difference between item caching and the proxy widget drawing the pixmap. One thing that adds to the proxy widget cost is that the QPainter is recreated and initialized for each button in the button’s paint event. Also, as I mentioned in my previous post, An Overview, each widget has a system clip and there is an overhead involved in calling paintEvent(). For items in QGraphicsView, there is already a painter, and I don’t need a clip, nor do I need any of the other stuff that goes on behind the scenes there. When we enable item coordinate caching, we don’t leave the graphics view world and we don’t enter the widget world. This crossing is expensive, so by not going into the widget world, we save a lot.

So, if there is a lesson to be learned it is that QGraphicsProxyWidget should be used with extreme caution. If you really need it, use very few of them.

QGraphicsWidget

If proxy widgets are too slow to be usable in this scenario, then the next best thing is to use a QGraphicsWidget. This is a subclass of both QObject and QGraphicsItem, which gives me signals, slots and properties, but it’s not a QWidget and therefore still fairly lightweight. The numbers are as follows:

GraphicsWidgets Results
Milliseconds spent per frame including blit to screen when using QGraphicsWidgets. Lower is better!

Compared to the proxy widgets approach we’re starting out quite a bit better, with raster at 13ms per frame, OpenGL at 20ms and X11 at 22ms. Below this line is a new line: “-no-indexing -optimize-flags”. QGraphicsView will by default put all the items in a view into a BSP tree for fast lookup. This is beneficial when the scene contains many items and you often need to find items that intersect with a small portion of the scene. In the test case we’re always doing a full update, so there is no benefit from the index, and it can be disabled by calling scene->setItemIndexMethod(QGraphicsScene::NoIndex). Having a BSP is the default behaviour because graphics view was initially intended to be a static scene for many items. The most common use case today is a few (a few hundred at most) items which tend to move a lot. For this reason, it is always a good idea to try to disable the BSP and see if it makes a difference in performance. If it helps, then leave it off.

I also know that the items play nice, meaning that they don’t change the clip, translate the painter, change the composition mode or modify any other state that would propagate to other items. This means I can safely set the DontSavePainterState optimization flag. Actually, based on an old habit, I set all possible optimization flags. I only consider unsetting them if my drawing code starts to look weird, at which point I would rather fix the drawing code and keep the flags set. Disabling indexing and enabling the optimization flags shaves off 2ms per frame for all rendering backends, so that is definitely worth it.

If I don’t do text, the performance is about twice as fast. Again we see that text drawing is a huge cost. We’re working on an API to fix this and we’ll have more information for you when we do. You may notice that enabling item caching drops the performance a bit compared to the “-no-text” case. There isn’t much overhead inside QGraphicsView for this path. A likely reason for the decrease is that reading from multiple memory sources (multiple pixmaps) results in a lot of cache misses, compared to the straight approach which draws the same pixmap over and over.

ButtonView Item

In my previous post I briefly mentioned that there is a slight overhead involved with the use of a QGraphicsItem too. Prior to calling the paint function, the painter is transformed to the coordinate system of the item and the painter state is saved. If the item draws a big polygon, this setup cost can be ignored, but when drawing just a pixmap and a few pixels of text, then it may be worth considering. In the spirit of “The more direct the painting code is, the faster it gets”, I implemented the keyboard as a single item. The numbers are as follows:

ButtonView Results
Milliseconds per frame including blit to screen when using a single item. Lower is better!

Raster is now down to 10ms, which is 1ms better than the QGraphicsWidget approach when all optimizations were enabled, so even though graphics items are cheaper than widgets, they still cost a bit. The keyboard is now rendered in a tight loop, and the major difference in performance here is caused by the fact that items in the scene have a transform associated with them. Prior to calling paint(), a transform is set to match the painter to the item’s local coordinate system. This causes a state change in the paint engine. For each button we’re drawing a 32×32 pixmap, which means alpha blending 1024 pixels, followed by doing text layout and drawing a single character. Even then, we save about 10% of the time by not having a QPainter::translate() in the midst, so bear that in mind. By enabling the optimization flags and disabling the index, raster drops a bit more, so those are still a good idea.

You may have noticed that there is one dataset that is named “cheat” for OpenGL. I was reluctant to include this, because it’s using a private API that is not, and I really mean NOT, subject to binary compatibility rules. You cannot call this from your application. We’re going to add a public API for this in the future, hopefully 4.7, so until it’s there, wait. In the interest of showing what we are thinking internally, I thought I would show it.

OpenGL is really great for accelerating graphics, but its way of working does not map optimally to how Qt works. GL is really good at taking a few large datasets of triangles and rendering them, but it’s not so good at drawing loads of small things. Small things like button backgrounds, icons, single text items, etc. However, all the button backgrounds are the same pixmap, so what if I could tell QPainter to draw the same pixmap in multiple places at once? In GL this would correspond to setting up a texture and one vertex and texture coordinate array and drawing some 40 pixmaps in one go. This fits much better with how GL is made to work. The result is that drawing the buttons drops from 5.2ms to 3.9ms, so another piece of juice squeezed out. Naturally, the more times the pixmap is drawn and the smaller the pixmap gets, the more benefit you get from batching commands like this.
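To make the "one vertex and texture coordinate array" idea a bit more concrete, here is a minimal sketch in plain C++ of how the geometry for many copies of one pixmap could be packed into a single pair of arrays. It is not the private Qt API used for the "cheat" numbers; the names Rect, QuadBatch and buildBatch are made up for illustration, and the sketch assumes each pixmap is drawn as two triangles covering the whole texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not part of Qt or OpenGL.
struct Rect { float x, y, w, h; };

struct QuadBatch {
    std::vector<float> positions;  // x, y per vertex
    std::vector<float> texCoords;  // u, v per vertex
    std::size_t vertexCount() const { return positions.size() / 2; }
};

// Pack one textured quad per target rectangle into shared arrays.
// Each quad is two triangles (six vertices) mapped to the full
// texture, so the whole batch could be submitted with one draw call.
QuadBatch buildBatch(const std::vector<Rect> &targets)
{
    // Unit-square corners in the order TL, TR, BL, BL, TR, BR.
    static const float unit[12] = { 0, 0,  1, 0,  0, 1,
                                    0, 1,  1, 0,  1, 1 };
    QuadBatch batch;
    batch.positions.reserve(targets.size() * 12);
    batch.texCoords.reserve(targets.size() * 12);
    for (const Rect &r : targets) {
        for (int i = 0; i < 12; i += 2) {
            batch.positions.push_back(r.x + unit[i] * r.w);
            batch.positions.push_back(r.y + unit[i + 1] * r.h);
            batch.texCoords.push_back(unit[i]);
            batch.texCoords.push_back(unit[i + 1]);
        }
    }
    assert(batch.positions.size() == batch.texCoords.size());
    return batch;
}
```

For 40 buttons this gives 240 vertices in one array instead of 40 separate four-corner draws, which is the shape of data GL prefers. How the arrays are then handed to GL is left out on purpose.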

There is a second OpenGL option for the button view case, which is “-ordered”. This was done after Tom brought to my attention that the testcase would do a shader program update for each painter call. In the default buttonview implementation we do:

                    for (int i=0; i < m_rects.size(); ++i) {
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
                    }

Because pixmaps use one shader pipeline and text drawing uses another, the pipeline needs to be switched and reset all the time, which renders at 16ms per frame. To see if it makes a difference, I added a second alternative rendering, “-ordered”, where I do all the pixmaps first, then all the text:

                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));

This prevents the shader pipeline updates and brings the rendering time per frame down to 13ms, so definitely worth it.
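The difference between the two loops can be illustrated by simply counting how often the required pipeline changes from one painter call to the next. The following is a toy model in plain C++, not the actual GL paint engine; the Pipeline enum and the helper functions are hypothetical, and it assumes that every change of pipeline costs one switch and that nothing is bound before the first call.

```cpp
#include <cstddef>
#include <vector>

// Toy model: each painter call needs either the pixmap or the text pipeline.
enum class Pipeline { None, Pixmap, Text };

// Default buttonview order: pixmap, text, pixmap, text, ...
std::vector<Pipeline> interleavedCalls(std::size_t buttons)
{
    std::vector<Pipeline> calls;
    for (std::size_t i = 0; i < buttons; ++i) {
        calls.push_back(Pipeline::Pixmap);
        calls.push_back(Pipeline::Text);
    }
    return calls;
}

// "-ordered": all pixmaps first, then all text.
std::vector<Pipeline> orderedCalls(std::size_t buttons)
{
    std::vector<Pipeline> calls(buttons, Pipeline::Pixmap);
    calls.insert(calls.end(), buttons, Pipeline::Text);
    return calls;
}

// Count how many times the active pipeline has to change.
std::size_t countSwitches(const std::vector<Pipeline> &calls)
{
    Pipeline current = Pipeline::None;
    std::size_t switches = 0;
    for (Pipeline needed : calls) {
        if (needed != current) {
            ++switches;
            current = needed;
        }
    }
    return switches;
}
```

With 40 buttons the interleaved order needs 80 switches in this model, while the ordered variant needs only 2. The model says nothing about how expensive a single switch is on a given driver; it only shows why reordering the calls removes most of them.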

Summing Up

Virtual Keyboard Combined Results
Milliseconds per frame including blit to screen for proxy widgets, graphics widgets and a single widget. Lower is better!

OpenGL comes out rather badly in this testcase, which I was a bit disappointed to see, but it did send Tom into an optimization frenzy, so we’re hoping to remove some of the constant overhead. It should also be said that when using the OpenGL graphics system, we enable multisampling by default, which increases rendering time on the N900 by around 30%. A plain QGLWidget would thus perform slightly better. Another aspect of OpenGL is that it uses a dedicated low-power chip, so even though it runs at half the speed for this particular use case, it also uses a lot less battery, so it may still be the right choice. OpenGL will also scale significantly better than raster and X11 as the pixmaps get bigger or if the content of the button is slightly more advanced, say like a horizontal gradient.

The best numbers are definitely in the button view case, where all the content is rendered as one item, which is what I wanted to highlight with this blog. The button view item also opens up for other optimizations such as batching. We don’t have that many batching functions in QPainter today, it’s only drawRects(), drawLines() and drawPoints(), but we’re considering adding more; we are just not sure how the APIs would look yet.

The bottom line is still that how Qt is used defines how well it performs. On one hand there may be an easy and convenient way to get the job done which performs quite sub-optimally. On the other hand there may be a more involved implementation which performs very well. I’m not trying to suggest that you do one or the other; there are a lot of good reasons for picking either one. But I hope that I’ve illustrated that some features come at a cost, and that this is kept in mind along with what the target is when designs are evaluated and chosen.

I’ll round off with a question. If you were to implement a particle effect when you press a button, which approach would you choose, having seen the numbers above?

Posted by Simon
 in Qt, WebKit
 on Sunday, January 10, 2010 @ 10:03

I’d like to give you a brief summary of the commits that landed in the Qt parts of WebKit in the first week of this year:

  • Lots of improvements on the DRT testing tool: Enter key fixes (Jakub), Drag and Drop support (Yael) and Zoom support (Kim, Diego, Afonso). All of these changes improve the layout test coverage. :)
  • Jakub added support for sliders to RenderThemeQt.
  • Andreas fixed a crash with input methods on startup.
  • Kim fixed the semantics of touch events to match the iPhone and Android behaviour better.
  • Luiz and Kenneth landed more code that’ll make it possible in the future to handle list popups in the application.
  • Norbert landed a workaround for an RVCT bug that makes the trunk build again on Symbian.
  • Yael continued work on the network state notifier implementation (using Qt Bearer Management).


© 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
All other trademarks are property of their respective owners.