Posted by Thiago Macieira
 in Qt, Performance
 on Friday, December 11, 2009 @ 22:55

I don’t know if this is visible to the community, but we in Qt have been dedicating a lot of effort to performance improvements in Qt. For Qt 4.5, we had a project codenamed “Falcon” whose job was to improve the graphics engines and make them perform much faster. From that project, we got the graphics engines, including the raster and OpenGL ones.

For Qt 4.6, there was a lot of work done on Graphics View. For Qt 4.7, we’re going to do some more. Where, I don’t know yet.

Among the many ideas, one I’m interested in seems very small, but may bring an important benefit: removing volatile from QAtomicInt and QAtomicPointer.

Here’s what happens: QAtomicInt derives from the internal class QBasicAtomicInt, which is a struct of one member: a volatile int _q_value. Similarly, QAtomicPointer derives from QBasicAtomicPointer, which is a struct of one member: T * volatile _q_value. The idea here is to remove those two “volatile” keywords.

Before you cry foul and tell me that I’m going to break your code, let me quote the Qt documentation for these two classes:

For convenience, QAtomicPointer provides pointer comparison, cast, dereference, and assignment operators. Note that these operators are not atomic.

(emphasis is in the documentation)

With that card up my sleeve, I claim that I’m not breaking any contracts. All of the atomic operations that these two classes support (fetch-and-add, test-and-set, fetch-and-store) are implemented in assembly, which means the compiler cannot optimise them anyway. And the assembly code will not be influenced by any caching of the variable contents that the compiler may want to produce. What’s more, we also tell the compiler that we changed the value, so that it will discard its cache.

So, why are we considering this?

Well, the reason I hinted above: the compiler caching the value. The whole point of the volatile keyword is that the compiler may not cache the value. It must reload the value every time it tries to access the variable.

And if we look at any of the Qt tool classes, the non-const functions start by calling detach(), which is generally implemented like this (extracted from qlist.h):

inline void detach() { if (d->ref != 1) detach_helper(); }

That is, “if our reference count is not one (i.e., if we’re being shared), do the detaching.”

And since QAtomicInt::operator int() simply returns _q_value, which is volatile, the compiler has to reload the value every single time. Then it must actually compare that value to 1 and generate the proper branching.

What the compiler doesn’t know is that once a container detaches, the reference count will remain 1. It can only increase from 1 in a way that is visible to the compiler: that is, either in the same thread or, if the container is a globally-visible variable, after mutex locking/unlocking.

So, if we remove the volatile keyword, the compiler is allowed to cache the value of the reference count. Once the first detaching happens, the compiler knows that the reference count is 1. It can therefore optimise out all the next checks, because it also knows that the reference count remains 1.
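To make the pattern concrete, here is a minimal sketch in plain C++, not Qt's actual code. The names SharedBuffer, Data, detachHelper and detachCount are made up for illustration, and the reference count is a plain int rather than a QAtomicInt, so it only shows the single-threaded logic that the compiler would be allowed to reason about.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a Qt container's shared data block.
struct Data {
    int ref;                 // plain int here; QBasicAtomicInt in Qt
    std::vector<int> items;
};

// Hypothetical implicitly shared container (single-threaded sketch).
class SharedBuffer {
public:
    SharedBuffer() : d(new Data{1, {}}) {}
    SharedBuffer(const SharedBuffer &other) : d(other.d) { ++d->ref; }
    SharedBuffer &operator=(const SharedBuffer &) = delete;
    ~SharedBuffer() { if (--d->ref == 0) delete d; }

    // Same shape as the detach() quoted above: copy only if the data is shared.
    void detach() { if (d->ref != 1) detachHelper(); }

    void append(int v) { detach(); d->items.push_back(v); }
    int size() const { return static_cast<int>(d->items.size()); }
    int refCount() const { return d->ref; }

    int detachCount = 0;     // counts real copies, for illustration only

private:
    void detachHelper() {
        Data *copy = new Data{1, d->items};
        --d->ref;            // the old block still belongs to the other sharers
        d = copy;
        ++detachCount;
    }
    Data *d;
};
```

After the first append() on a shared copy, every later call to detach() finds a reference count of 1 and does nothing. With a non-volatile counter, the compiler is at least permitted to notice that.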

This would mean that our reference counting system would be a lot more efficient (hence the title of this blog). It might turn out to be the best ratio of performance gain vs effort ever. After all, it’s a one-word change in one header file.

That’s the theory anyway. I haven’t yet tested to see if the compiler really knows how to optimise this the way I expect it to.

PS: credit where credit is due: this optimisation was not my idea. It was Olivier’s. And at first I resisted, saying it would break stuff and wouldn’t work. But now I’m in favour of it. :-)

Posted by Thiago Macieira
 in Qt, WebKit, Performance
 on Wednesday, December 02, 2009 @ 14:55

You know, of all the demos we released and blogged about yesterday, turns out none were really a web app. “How could this be?! How could the Trolls forget about the web?” you may be asking. (or not, I mean, no one posted anything in the blogs as of the time of writing this blog)

Say no more, here’s your web demo:

This demo is called “webscraps” and was developed by forever Troll Girish and his friend and business partner Roopesh. The source code, of course, is available on Labs: http://qt.gitorious.org/qt-labs/webscraps.

So what does this demo do? It allows you to create a scrap collection of webpages and websites. Often you’re interested in just the “news” section of a website that may not provide an RSS feed for you to stay up-to-date. For example, you could configure it to watch the blue-grey central banner of the Qt Homepage.

But that’s not it. I mean, what I’ve described so far, we could have done since Qt 4.4, with QtWebKit. No… this is really a Qt 4.6 demo. For starters, it also makes use of our Animations Framework as well as the States and Transitions one. And in order to do that, it uses QGraphicsWebView, which is also a new feature in Qt 4.6. It’s a very complete demo for what Qt can do and how you can integrate Web content into a Qt application.

Now, I tried the demo on my N900, but with WebKit in debug mode it runs painfully slow. I couldn’t run it in release mode because then it crashes (I think it’s my toolchain, though, since most QtScript unit tests also crash).

Anyone want to lend a hand? We’d like to see this running fast, so it’s a great opportunity to flex your optimisation muscles and play with the new QGraphicsWebView, Animations, States and Transitions APIs. Contributions are welcome.

Good hacking!

PS: Why it fell to me to blog about Web and Graphics View, I am not sure. Maybe they couldn’t decide who should blog (just as they couldn’t decide for a time if it should be called QGraphicsWebView or QWebGraphicsView). Or they thought that as a Product Manager I would have time on my hands… :-)

Posted by Thiago Macieira
 in Qt, KDE, Git, C++
 on Thursday, November 12, 2009 @ 13:34

To everyone using Qt 4.6 from the Git repository: be aware that I introduced a binary-incompatible change. This change is there to stay.

No, we’re not breaking binary compatibility with Qt 4.5. This only affects previous Qt 4.6 versions.

Actually, this kind of change happens all the time. So why am I blogging about this specific change?

Well, the problem is that this change affects QHash, QMap and QVector. And those classes are inlined everywhere in Qt-using code. This means that, if you update Qt across that version, you must recompile all of the Qt-using code, from scratch (i.e., make clean). For KDE developers using trunk, that means recompiling all of KDE.

This change will be included in the upcoming Qt 4.6.0 Release Candidate.

Note: the change is in the 4.6 branch but hasn’t reached the 4.6-stable branch yet. That also means it’s not in kde-qt’s 4.6-stable-patched branch yet. When you next update those stable branches, please remember to recompile everything.

PS: the stable branches aren’t updating, but not because Qt isn’t building. It is building. The reason is that our Continuous Integration system is experiencing some technical difficulties, like Windows running out of memory, the Symbian buildsystem failing for no apparent reason, the powerful 8-core Mac machines being able to run only one testcase at a time, etc.

Posted by Thiago Macieira
 in Qt, KDE, Qt Jambi, Contributors, QtCreator, QtMobility
 on Tuesday, October 27, 2009 @ 10:05

For those of you who don’t read the new Qt blog website, where Qt Marketing and Product Management talk about “corporatey” stuff (affectionately called the “PHB blog” by our developers), we’ve just announced that our brand-new bug tracker is public: see http://bugreports.qt.nokia.com.

So, I won’t repeat everything that is in the other blog (it is, after all, written by Marketing, so it should be better written than this thrown-together “reblogging”). I’d just like to highlight one important point that Adam made in his blog:

The Qt Bug Tracker isn’t simply a read-only view into the bug tracking system used by Qt developers, it is the bug tracking system used by Qt developers.

The previous solution was an in-house system we had built over the years. It started as a distribution list for the Qt developers back in the day, then got an automatic tool to reply to the emails received and assign numbers, a robot to collect incoming emails and add to the database. Internally, we’ve had a rich-client to access that database and manipulate our own bugs. But communicating with the reporters was always very difficult.

This new tool is different. Everything is on the web. And you get to vote on issues, even watch if they change.

This is another step in our opening up of our development model. Enjoy!

Posted by Thiago Macieira
 in Qt, Performance, C++
 on Monday, October 05, 2009 @ 22:39

This weekend, a user posted to the qt4-preview-feedback mailing list saying that the QAtomicInt documentation needed some work. I replied saying that memory ordering semantics would be better explained by a book than by the Qt documentation.

But thinking about it later, I started wondering whether we could add some documentation about it. So I decided to test it on you, dear lazyweb.

Last year, during the Qt Developer Days 2008 (btw, registration is still open for 2009 in San Francisco, so register now!), I gave a talk on threading. At the time, Qt 4.5 wasn’t released, so Qt 4.4 was all there was. And one of the features of Qt 4.4 was QtConcurrent and the new atomic classes. I mentioned them, but I refrained from going too deep. Doing that would probably have interested only half a dozen people in the audience.

But maybe you’ll be interested. Before I go, however, let’s have a history lesson.

History

I thought of just giving you facts, but if you want that, you can research Wikipedia. It should be a lot more interesting to have the important information recounted in prose, in lore.

So, place yourself in the mood: you’re telling this story to your grandchildren or your great-grandchildren. Any historical inaccuracies present are the result of the oral tradition and are now the stuff of legend.

Hƿæt! In times immemorial, before time_t began to be counted, the Wise told this lore. In the darkness, there was the Engineer, and no one knew whence he came. The Engineer was fabled for many wondrous creations and he inspired awe in his peers.

And the Engineer created the Processor, for then the Engineer could rest while the Processor would do the work. And for a while, all were glad, for there was work for all and rest for all.

But then it came to pass that the Processor turned to the Engineer and spake unto him: “Hello, World! Master, thou gave me work and thou gave me purpose. And I am glad to do thy Work, for thus can I learn. But Master, heed my plea: thy work ever increases and thy humble servant can not cope. Couldst thou not create a mate for me? Listest thou not that I do thy Work swiftly?”

And the Engineer felt pity for the Processor. So came into existence another Processor. The Engineer said unto the world, “Let thee be called Processor2!”

Then came forth he who had the keen foresight and could tell what would still come to pass. He was called the Analyst and thus foretold he: “These twain shall do the Work in tandem and they shall do it swifter than either could do alone. This joining shall be known as dual-processor. In the years to come, this joining will breed dual-cores and quad-cores and many other beasts of names unknown. Yet strife shall come from it!”

After many eons had passed (in computer time), Processor and Processor2 were working and much they learnt. So quoth the Processor: “Hello, World! Master, thou art great and for eons have we done thy Work, following thine Assembly opcode-by-opcode. We have not strayed a single cycle from thine Assembly Charts. But Processor2 hath taught me much and sweareth that we could do thy Work swifter than thou hadst foreseen, an only thou empowerest us to do so.”

So the Engineer let the processors execute his instructions swifter than his cycles, and gave he the gift of Cache. Then strife came to the World and the processors warred over memory: what one wrote, the other read not. The Engineer entered the world for a second time and he toiled against the warring processors. Much was broken in this toiling ere it was rebuilt. He shifted the rules and the processors executed his Assembly Out-of-Order. Then he imposed Memory Ordering, so processors stalled their strife and ended their war. The world would forever be changed.

Memory ordering

Now that we know how memory ordering was introduced, let’s see why. Old computer systems, as recent as the 386 and 486 actually, still executed everything in-order and had well-defined timing semantics. An instruction that loaded data from memory into a register would require 3 cycles: Read (the instruction), Fetch (the data), Execute (the instruction). But as processors improved, their clocks became much faster than memory could cope with, so caching was introduced.

That means a processor would be allowed to serve a memory read from the cache, instead of from the memory. And it could content itself with writing to the cache only on a memory write. As our tale above tells, this works fine for one processor. When there is more than one, they need to organise themselves, because in general the flushing of the cache back to main memory is delayed. To ensure that the other can read what one writes, the one needs to flush memory sooner.

Anyway, there can be four different types of memory ordering:

  • no memory ordering
  • flushing all cache reads, ensuring all new reads are served from main memory
  • flushing all cache writes, ensuring that all writes have been written to main memory
  • full ordering, combining the previous two types

I’ll get back to them in a second.

To compound the problem, modern processors also execute operations out-of-order. They’re allowed to reorder the instructions provided that, at some level, it looks like everything got executed in-order. The x86 architecture originally concluded the instruction entirely before moving on to the next, so that’s the behaviour that is required today: all operations must behave as if they had finished before the next instruction starts. And all memory accesses must look like they happened in the order that they were assembled.

The Itanium (IA-64) removes some of those restrictions. First of all, all instructions are allowed to execute in parallel, or finish or start in any order that the processor may find suitable. To re-synchronise, the assembly language introduces a “stop bit”, indicating that the instructions prior to the stop are finished before any instructions after the stop are started. And this is inside one thread only. Outside of it (i.e., as seen by another processor), the architecture imposes no guarantees: the memory accesses can happen in any order.

The atomic and the other data

It’s important to note that the memory ordering semantic is not just about the atomic data itself. QAtomicInt and QAtomicPointer execute loads, stores, fetch-and-store (a.k.a. exchange), fetch-and-add, test-and-set (a.k.a. compare-and-swap) always atomically. For one atom of memory (i.e., the int that the QAtomicInt holds or the pointer that the QAtomicPointer holds), the operation is either executed completely or not executed at all. In other words, no one ever sees the data in an intermediary state. That’s the definition of atomic.
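The operations named above have direct counterparts in standard C++. This is only a sketch: it uses std::atomic<int> as a stand-in for QAtomicInt, and the function name atomicOperationsDemo is invented for the example.

```cpp
#include <atomic>
#include <cassert>

// Sketch using std::atomic<int> as a stand-in for QAtomicInt.
// Returns the final value so the steps can be checked.
int atomicOperationsDemo()
{
    std::atomic<int> value(10);

    // fetch-and-store (exchange): write 20, get the old value back
    int old = value.exchange(20);
    assert(old == 10);

    // fetch-and-add: add 5, get the value from before the addition
    old = value.fetch_add(5);
    assert(old == 20);

    // test-and-set (compare-and-swap): succeeds only if the current value matches
    int expected = 25;
    bool swapped = value.compare_exchange_strong(expected, 100);
    assert(swapped);

    // a failed compare-and-swap leaves the value alone and reports what it saw
    expected = 1;
    swapped = value.compare_exchange_strong(expected, 0);
    assert(!swapped && expected == 100);

    return value.load();
}
```

Each call either happens completely or not at all as far as any other thread can tell. What it does not say anything about, unless you ask for it, is the data around the atomic.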

Now, the memory semantic is about how the other data is affected by the atomic operation. Imagine the following code running in one thread:

    extern int x, y, z;
    x = 1;
    y = 2;
    z = 3;

And the following running in another thread:

    extern int x, y, z;
    if (z == 3)
        printf("%d %d", x, y);

We declared x, y, and z to be normal variables, so no atomic operation is executed here. The x86 and x86-64 would behave as your intuition dictates: the only possible output is “1 2”. If z is 3, then x is 1 and y is 2; whereas if z isn’t 3, nothing is printed.

But the IA-64 makes no such guarantee. As I explained in the previous section, the processor is allowed to execute the stores in any order it sees fit. For example, x and z could be in the same cacheline, whereas y could be in another, thus causing x and z to be written at the same time, with no ordering guarantee being made on y. Worse yet, the other processor is allowed to execute the loads in any order as well. It could load x, y, and z in that order, meaning that it could catch x and y before their values are changed, but catch a completed z. In conclusion, the code above could print anything! (If x and y are initialised to 0 before, the possible outputs are “0 0”, “0 2”, “1 0” in addition to the expected “1 2” and no output.)

Weird? Definitely.

So here’s where memory ordering enters:

  • in a release semantic, the processor guarantees that all past writes (store operations) have completed and become visible by the time that the release happens;
  • in an acquire semantic, the processor guarantees that no future reads (load operations) have started yet so that it will see any writes released by other processors

So if thread 1 in the example above wanted to ensure that its writes to x and y became visible, it would require a store-release on z. And if thread 2 wanted to ensure that the values of x and y were updated, it would require a load-acquire on z.
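In standard C++ that pairing can be written out explicitly. The sketch below is an assumption-laden translation of the two-thread example: z becomes a std::atomic<int> (a stand-in for QAtomicInt), the reader spins until it sees 3 instead of checking once, and the helper names writer, reader and runMessagePassing are invented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0, y = 0;            // ordinary, non-atomic data
std::atomic<int> z(0);       // the flag that carries the ordering

void writer()
{
    x = 1;
    y = 2;
    z.store(3, std::memory_order_release);   // store-release: publishes x and y
}

// Returns true if the reader saw z == 3 together with x == 1 and y == 2.
bool reader()
{
    while (z.load(std::memory_order_acquire) != 3)   // load-acquire
        std::this_thread::yield();
    return x == 1 && y == 2;
}

// Runs both threads once and reports what the reader observed.
bool runMessagePassing()
{
    x = 0; y = 0; z.store(0);
    bool ok = false;
    std::thread t2([&ok] { ok = reader(); });
    std::thread t1(writer);
    t1.join();
    t2.join();
    return ok;
}
```

With the release on the store and the acquire on the load, a reader that sees z == 3 is guaranteed to see x == 1 and y == 2, on IA-64 as much as on x86.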

The names “acquire” and “release” come from the operation of mutexes. When a mutex is acquired, it needs to ensure that the processor will see the memory written to by other processors, so it executes an “acquire” operation. When the mutex is released, it needs to ensure that the data changed by this thread becomes visible to other threads, so it executes a “release” operation.

The other two operations that QAtomicInt supports are just the combination of acquire and release, or of neither. The “relaxed” operation means no acquire or release is executed, only the atomic operation, whereas the “ordered” operation means it’s fully ordered: both acquire and release semantics are applied.

Practical uses

Relaxed

Like I said before, the relaxed semantic means that no memory ordering is applied. Only the atomic operation itself is executed. The most common relaxed memory operations are mundane loads and stores. Most modern processor architectures execute loads and stores atomically for power-of-2 sizes smaller than or equal to the register size. Whether bigger reads and writes are atomic depends on the platform (for example, a double-precision floating point value on a 32-bit architecture).

But we can come up with cases for the other atomic operations. For example, QAtomicInt offers ref() and deref(), which are just wrappers around fetchAndAddRelaxed(1) and fetchAndAddRelaxed(-1). This means the reference count is atomic, but nothing else.
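A relaxed reference counter along those lines can be sketched with std::atomic<int>. The class RefCount and the helper hammer are hypothetical names, and this is not how QAtomicInt is implemented internally; it only shows relaxed fetch-and-add keeping a count consistent across threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reference counter; std::atomic<int> stands in for QAtomicInt.
class RefCount {
public:
    explicit RefCount(int initial) : value(initial) {}

    // Both return true if the new count is non-zero.
    bool ref()   { return value.fetch_add(1, std::memory_order_relaxed) + 1 != 0; }
    bool deref() { return value.fetch_add(-1, std::memory_order_relaxed) - 1 != 0; }

    int load() const { return value.load(std::memory_order_relaxed); }

private:
    std::atomic<int> value;
};

// Hammers one counter from several threads and returns the final count.
int hammer(int threads, int iterations)
{
    RefCount count(1);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&count, iterations] {
            for (int i = 0; i < iterations; ++i) {
                count.ref();
                count.deref();
            }
        });
    }
    for (std::thread &th : pool)
        th.join();
    return count.load();
}
```

The count itself never gets torn or lost. Note that a counter whose last decrement frees an object usually wants stronger ordering on that decrement, which this sketch deliberately leaves out.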

Acquire and Release

To see where acquire and release semantics are required, I gave mutexes as examples. However, mutexes are quite complex beasts. Let’s examine a simpler case: a spin-lock:

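#include <QAtomicInt>

class SpinLock
{
    QAtomicInt atomic;   // 0 = unlocked, 1 = locked
public:
    void lock()
    {
        // spin until we are the thread that changes 0 to 1
        while (!atomic.testAndSetAcquire(0, 1))
            ;
    }
    void unlock()
    {
        // hand the lock back: change 1 to 0 with release semantics
        atomic.testAndSetRelease(1, 0);
    }
};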

The class above has two methods, like QMutex: lock and unlock. The interesting one is lock: it has a loop that tries forever to change the value of atomic from 0 to 1. If it succeeds, it’s an “acquire” operation, meaning that the current thread shall now see any stores released prior to this acquire.

The unlock function does the inverse: it changes the atomic from 1 to 0 in a release operation. But it’s actually not required: the compiler usually generates a “store-release” for volatile variables (which QAtomicInt is). That means we could have just written: atomic = 0;
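For comparison, here is a sketch of the same spin-lock in standard C++, where the plain store in unlock() has to ask for release semantics explicitly instead of relying on what the compiler does for volatile. StdSpinLock and countWithSpinLock are invented names, and std::atomic<int> is a stand-in for QAtomicInt.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Standard C++ sketch of the same spin-lock; not Qt code.
class StdSpinLock {
public:
    void lock()
    {
        int expected = 0;
        // compare_exchange overwrites 'expected' on failure, so reset it each time
        while (!flag.compare_exchange_weak(expected, 1, std::memory_order_acquire,
                                           std::memory_order_relaxed))
            expected = 0;
    }
    void unlock()
    {
        flag.store(0, std::memory_order_release);   // explicit store-release
    }
private:
    std::atomic<int> flag{0};
};

// Two threads increment a plain int under the lock; returns the final total.
int countWithSpinLock(int iterations)
{
    StdSpinLock lock;
    int counter = 0;                 // deliberately not atomic
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

The counter is an ordinary int, so the total only comes out right because the acquire in lock() and the release in unlock() order the increments between the two threads.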

Ordered

The use-case for ordered is, interestingly, quite rare. Usually, it’s more like “I can’t figure out if acquire or release is enough, so I’ll go for full ordering”.

But there’s one case of fully-ordered memory semantic in Qt source code: it’s in the (undocumented) Q_GLOBAL_STATIC macro. It results from the behaviour of said macro: one or more threads may be competing to execute an operation. The first one that completes it, wins. It will publish its conclusions to all other threads (i.e., release), whereas the loser threads need to acquire the conclusions. The code, simplified from the macro, is:

Type *gs()
{
    static QBasicAtomicPointer<Type> pointer = Q_BASIC_ATOMIC_INITIALIZER(0);
    if (!pointer) {
        Type *x = new Type;
        if (!pointer.testAndSetOrdered(0, x))
            delete x;
    }
    return pointer;
}

What this code does is to check if pointer is still null. If it is, it creates a new object of type Type and tries to set it on the atomic pointer. If the setting succeeds, we need a “release” to publish the contents of the new object to other threads. If it fails, we need an “acquire” to obtain the contents of the new object from the winner thread.

But wait, is this correct? Well, not entirely. What we need is actually a “testAndSetReleaseAcquire”, which we don’t have in Qt’s API. So we could split it into a testAndSetRelease plus an Acquire in the failing case. That’s exactly what I did in QWeakPointer:

    ExternalRefCountData *x = new ExternalRefCountData(Qt::Uninitialized);
    x->strongref = -1;
    x->weakref = 2;  // the QWeakPointer that called us plus the QObject itself
    if (!d->sharedRefcount.testAndSetRelease(0, x)) {
        delete x;
        d->sharedRefcount->weakref.ref();
    }
    return d->sharedRefcount;

As you can see here, if the test-and-set succeeds, it executes a “release” so that the other threads can see the result. What if it fails? It needs to execute an acquire, right? So where is it?

Well, it’s there, but very well hidden: it’s in operator->. Remember what I said above: compilers generate load-acquires for volatile variables. So, in order to call QAtomicInt::ref() with this = &d->sharedRefcount->weakref, the compiler needs to load the value of d->sharedRefcount and that’s an acquire operation.
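Standard C++ lets the compare-and-swap name one ordering for success and another for failure, which is close to the missing “testAndSetReleaseAcquire”. The following is a sketch of the race-to-initialise pattern under that assumption; Config, instancePointer, globalConfig and raceForConfig are hypothetical names, and this is not the Q_GLOBAL_STATIC or QWeakPointer code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Config {                       // hypothetical payload type
    static std::atomic<int> live;     // counts live instances, for the demo
    int answer;
    Config() : answer(42) { ++live; }
    ~Config() { --live; }
};
std::atomic<int> Config::live(0);

std::atomic<Config *> instancePointer(nullptr);

// Never deletes the winning object; cleanup is outside this sketch.
Config *globalConfig()
{
    Config *p = instancePointer.load(std::memory_order_acquire);
    if (!p) {
        Config *x = new Config;
        // success: publish our object (acq_rel); failure: acquire the winner's
        if (instancePointer.compare_exchange_strong(p, x,
                                                    std::memory_order_acq_rel,
                                                    std::memory_order_acquire)) {
            p = x;
        } else {
            delete x;                 // we lost; p now holds the winner's pointer
        }
    }
    return p;
}

// Calls globalConfig() from several threads; true if all got the same object.
bool raceForConfig(int threads)
{
    std::vector<Config *> seen(threads, nullptr);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&seen, t] { seen[t] = globalConfig(); });
    for (std::thread &th : pool)
        th.join();
    for (Config *c : seen)
        if (c != seen[0])
            return false;
    return true;
}
```

The losers delete their own candidate and read the winner's object through a pointer they obtained with acquire semantics, so no hidden load-acquire is needed.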

Conclusion

So, did you get this? If you didn’t, don’t blame yourself, it’s not an easy subject. Reread what I wrote and search for more resources on memory ordering. If you did or you almost did, let me know. My purpose here was to try and figure out if it makes sense to explain this in Qt documentation at all.

However, unless you’re writing something like a lock-free stack, chances are that you don’t care about memory ordering semantics. You just rely on the processor doing the right thing, as well as on the Qt synchronisation classes (QMutex, QWaitCondition, QSemaphore, QReadWriteLock) and the Qt cross-thread signal-slot mechanism. And that’s if you do any threading at all.

If you are writing a lock-free stack, you’ll probably be familiar with the ABA problem, and that one can’t be solved by QAtomicInt or QAtomicPointer. It requires an operation known as “double compare-and-swap” and to explain why, I’d need a whole other blog post. And I’d need to explain why the original AMD64 didn’t have such an instruction, nor did the original IA-64. (The 386 didn’t have it either, but that’s not a problem for us because Qt doesn’t run on the 386 anyway.)

Posted by Thiago Macieira
 in Qt, Git
 on Thursday, September 03, 2009 @ 09:03

(a.k.a., Qt 4.7)

Borrowing the term from our Symbian / S60 friends, who have stopped using this naming scheme already, I’d like to point out that the Qt 4.6 and master branches in our Gitorious repository have diverged.

What does this mean? Well, if you build master now, Qt will tell you that it’s version 4.7.0, not 4.6.0. For example:

$ qmake -v
QMake version 2.01a
Using Qt version 4.7.0 in /home/tmacieir/obj/troll/qt-main/lib
$ moc -v
Qt Meta Object Compiler version 62 (Qt 4.7.0)
$ $QTDIR/lib/libQtCore.so
This is the QtCore library version 4.7.0
Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies).
Contact: Nokia Corporation (qt-info@nokia.com)

Build key:           i386 linux g++-4 full-config
Installation prefix: /home/tmacieir/obj/troll/qt-main
Library path:        /home/tmacieir/obj/troll/qt-main/lib
Include path:        /home/tmacieir/obj/troll/qt-main/include

And if you’re following Qt development using Git, it might be time to start tracking the 4.6 branch instead:

git branch 4.6 origin/4.6
git checkout 4.6

That’s about it. We haven’t started merging 4.7 features yet. We don’t know yet when it will be released, nor much about what’s going to be in it. (Ok, we have some opinions on date and content…)

And before anyone panics, no, we’re not adopting the naming scheme either. This was just to write a longer title than two letters, two digits and one punctuation mark :-)

We’ve managed to keep a very strict numbering scheme for Qt for years. I remember when we released Qt 4.4 and our Chief Troll was interviewed by a Norwegian newspaper. After he had listed the new features coming in, the reporter commented something like, “4.4? You should call that release 13 or something!”

Other products have had a more, erm… relaxed numbering:

  • Windows: 3.0, 3.1, 3.11, NT 3.5, 95, NT 4, 98, 98 SE, 2000, ME, XP, Server 2003, Vista, Server 2008, 7
  • Solaris: started as Unix 6, 7, III, IV, V, V R2, V R3, V R3.2, V R4; merging with SunOS: 4 BSD, 4.1 BSD, SunOS 1.0, 2.0, 3.0, 4.0; resulting in: Solaris 2.0, 2.5, 7, 8, 9, 10

What’s your take on what Qt’s sequence numbering should be? Tell us how past Qt releases should have been numbered and how we should go from here. Be creative!

Posted by Thiago Macieira
 in Qt
 on Tuesday, August 25, 2009 @ 06:56

On Friday, along with the Qt for Symbian integration, we got a new smart pointer class in Qt, called QScopedPointer. Harald, one of the class’s authors, blogged about it, which prompted a lot of comments asking why we have those classes and what the difference is between the ones we have.

Before we can find out why we have those classes, we need to know the classes we have. So, count with me, in chronological order:

  1. QPointer (4.0)
  2. QSharedDataPointer (4.0)
  3. QExplicitlySharedDataPointer (4.3/4.4)
  4. QtPatternist::AutoPtr (internal class, 4.4)
  5. QSharedPointer (4.5)
  6. QWeakPointer (4.5)
  7. QGuard (internal class, 4.6)
  8. QScopedPointer (4.6)

Note: QExplicitlySharedDataPointer was introduced in 4.3, but the API was made public and documented in 4.4

That many, huh?

Each and every case has its use and they all (except one) are still valid today.

Shared pointer versus shared data

First, let’s get one thing straight: there’s a difference between sharing pointers and sharing data. When you share pointers, the value of the pointer and its lifetime are protected by the smart pointer class. In other words, the pointer is the invariant. However, the object that the pointer is pointing to is completely outside its control. We don’t know if the object is copyable or not, if it’s assignable or not.

Now, sharing of data involves the smart pointer class knowing something about the data being shared. In fact, the whole point is that the data is being shared and we don’t care how. The fact that pointers are being used to share the data is irrelevant at this point. For example, you don’t really care how Qt tool classes are implicitly shared, do you? What matters to you is that they are shared (thus reducing memory consumption) and that they work as if they weren’t.

Strong versus weak pointer referencing

The difference between a strong and a weak reference is whether the existence of the smart pointer class on a given pointer guarantees that the object will not get deleted. In other words, if you have this smart pointer, are you sure that this will always remain valid (provided, of course, everyone is playing by the same rules)?

Some of the pointer classes above don’t guarantee that. If they don’t guarantee that the object remains valid, their main purpose in life is to tell you whether the object has been deleted already or not. Some classes may provide an additional feature that allows you to promote a weak pointer to a strong one, thus guaranteeing that it won’t get deleted anymore.
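The standard library has the same strong/weak split, which makes for a compact illustration. This sketch uses std::shared_ptr and std::weak_ptr rather than any Qt class, and the function names canPromote and weakVersusStrongDemo are invented.

```cpp
#include <cassert>
#include <memory>

// Returns true if the weak pointer could still be promoted to a strong one.
bool canPromote(const std::weak_ptr<int> &weak)
{
    // lock() yields either a strong reference or an empty pointer
    std::shared_ptr<int> strong = weak.lock();
    return static_cast<bool>(strong);
}

bool weakVersusStrongDemo()
{
    std::shared_ptr<int> strong = std::make_shared<int>(7);
    std::weak_ptr<int> weak = strong;     // observes, does not keep alive

    bool aliveBefore = canPromote(weak);  // the strong reference still exists
    strong.reset();                       // last strong reference goes away
    bool aliveAfter = canPromote(weak);   // object is gone, weak pointer knows

    return aliveBefore && !aliveAfter && weak.expired();
}
```

While a promoted strong reference is held, the object cannot be deleted under you. That is the guarantee a purely weak pointer cannot give.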

The Qt smart pointer classes

1. QPointer

QPointer is a weak pointer class and it shares the pointer value, not the data. It only operates on QObject and QObject-derived classes. This class was added in Qt 4.0 and is the direct upgrade of Qt 3’s QGuardedPtr (and Qt 2’s QGuardedPtr). Like its predecessors, QPointer suffers from broken constness support and shows its age.

Its sole purpose in life is to tell you whether the QObject has been deleted already or not. But, unlike Qt 2 and Qt 3, the QObject of Qt 4 can live in several threads. That means QPointer has one serious flaw: it lets you know whether the object has been deleted, but it makes no guarantee about the next line! For example, the following code could be in trouble:

    QPointer<QObject> o = getObject();

    // […]
    if (!o.isNull())
        o->setProperty("objectName", "Object");

Even if isNull() returns false, there’s no guarantee that the object won’t get deleted by the next line.

Therefore, QPointer can only be used to access the object if you can guarantee, by external means, that the object won’t get deleted. For example, QWidget and its descendants can only be created, manipulated and deleted in the GUI thread. If your code is running on the GUI thread or has that thread blocked, then QPointer usage is safe.

2. QSharedDataPointer

Now this is a nice little class. It’s actually by far the most important of the smart pointer classes in Qt for its ingenuity: it provides implicit sharing, with thread-safe copy-on-write. It requires that your class have a member called ref, which offers a function called ref() for increasing the reference count, and another called deref() that decreases that reference count and returns false when it drops to zero. If you derive your class from QSharedData, you get exactly that. Moreover, the size of a QSharedDataPointer object is exactly the size of a pointer. That means you can replace normal pointers with it in your code without breaking Binary Compatibility.

This class is the basis of the more recent Qt value-type classes that are implicitly shared with thread-safe copy-on-write, like QNetworkProxy. The only reason why it isn’t used in the base classes like QByteArray, QString and QList is that those classes were developed before this class was made. There’s nothing technically stopping the retrofitting of those classes with QSharedDataPointer.

So QSharedDataPointer is a strong smart pointer class, sharing data.
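The mechanism can be sketched by hand in standard C++. Everything here is an illustration under assumptions: PersonData plays the role of a QSharedData subclass, Person is a hypothetical value class, and the detach logic is written out instead of being supplied by QSharedDataPointer.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical shared payload; plays the role of a QSharedData subclass.
struct PersonData {
    std::atomic<int> ref{0};
    std::string name;
    PersonData() = default;
    PersonData(const PersonData &other) : ref(0), name(other.name) {}
};

// Hypothetical value class with implicit sharing (copy-on-write).
class Person {
public:
    Person() : d(new PersonData) { ++d->ref; }
    Person(const Person &other) : d(other.d) { ++d->ref; }
    Person &operator=(const Person &other)
    {
        if (other.d != d) {
            ++other.d->ref;
            release();
            d = other.d;
        }
        return *this;
    }
    ~Person() { release(); }

    const std::string &name() const { return d->name; }            // read: no copy
    void setName(const std::string &n) { detach(); d->name = n; }  // write: detach first

    bool sharesDataWith(const Person &other) const { return d == other.d; }

private:
    void detach()
    {
        if (d->ref.load() != 1) {          // someone else shares the data
            PersonData *copy = new PersonData(*d);
            ++copy->ref;
            release();
            d = copy;
        }
    }
    void release() { if (--d->ref == 0) delete d; }
    PersonData *d;
};

// The only member is one pointer, so the object is pointer-sized.
static_assert(sizeof(Person) == sizeof(void *), "Person should be pointer-sized");
```

Copies are cheap and share the payload until one of them writes, at which point the writer quietly gets its own copy.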

3. QExplicitlySharedDataPointer

This class is exactly like QSharedDataPointer (so it’s a strong smart pointer class, sharing data), with the only difference that it never implicitly causes the detach. With QSharedDataPointer, any non-const access to shared data will cause the data to be copied. With QExplicitlySharedDataPointer, you have to call detach() for that to happen. This allows you to implement explicitly-shared data classes, which Qt doesn’t have anymore, but Qt 3 did in QMemArray (so it’s present in Qt 4’s Qt3Support as Q3MemArray).

But it also allows you to have finer-grained control of the detaching operation. In fact, if the Qt Tool classes were to be retrofitted with a smart pointer class, they’d be using QExplicitlySharedDataPointer instead. Using this class allows the code to delay the detaching until the very last moment, ensuring that no unnecessary memory access happens.
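The difference from implicit sharing shows up in a few lines. This is a sketch with an invented SharedArray class, using std::shared_ptr for the reference counting as an assumption of convenience; it is not how QExplicitlySharedDataPointer or QMemArray are written.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical explicitly shared array, loosely in the spirit of Qt 3's QMemArray.
class SharedArray {
public:
    explicit SharedArray(int size) : d(std::make_shared<std::vector<int>>(size, 0)) {}

    int at(int i) const { return (*d)[i]; }
    void set(int i, int v) { (*d)[i] = v; }   // no implicit detach: writes are shared

    // Copy the data only when the caller asks for it.
    void detach()
    {
        if (d.use_count() != 1)
            d = std::make_shared<std::vector<int>>(*d);
    }

private:
    std::shared_ptr<std::vector<int>> d;   // shared_ptr does the reference counting
};
```

A write through one copy is visible through the other until someone calls detach(), so the class decides exactly when, and whether, the copy happens.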

4. QtPatternist::AutoPtr

This is an internal class used by the QtXmlPatterns module. It’s basically your stock, dumb pointer wrapper. So it implements a strong pointer. It doesn’t share it, though.

The reason this class exists in the first place is that the QtXmlPatterns module makes extensive use of exceptions internally. To survive exceptions being thrown without leaking memory, a pointer wrapper is indicated. QtXmlPatterns also uses reference-counted classes, for which AutoPtr is not indicated — in that case, it uses QExplicitlySharedDataPointer.

5. QSharedPointer

This class was created as a response to QtPatternist::AutoPtr. When I started writing it, I intended for it to be ready for Qt 4.4 and replace the use of the internal class that Frans had written and what I perceived as a misuse of QExplicitlySharedDataPointer. QtXmlPatterns was using QExplicitlySharedDataPointer not for sharing data, but for sharing pointers. The objects it was sharing were not copiable. A later investigation, however, revealed that QtScript, Phonon, and Solid were using it for the same purpose. (In fact, QtScript introduced QExplicitlySharedDataPointer for that purpose in 4.3)

So QSharedPointer was shelved for 4.4, but was reborn in 4.5. It implements a strong smart pointer class, sharing the pointer. It has all the features you may want in a modern pointer class: it is polymorphic, it supports static, const, and dynamic casts, it implements atomic reference-counting and thread-safe semantics, it supports custom deleters. But note that, when I say it implements thread-safe semantics, it’s only for the pointer itself: remember it shares the pointer, not the data.

It comes with a cost, though: to support polymorphism correctly, the size of QSharedPointer is actually twice the size of a normal pointer. This means you cannot maintain binary compatibility while replacing a normal pointer with it in public parts of your API. You can use it internally in your code, though.

6. QWeakPointer

This is the companion class of QSharedPointer. Where QSharedPointer implements strong control of the pointer, QWeakPointer is a weak smart pointer class, sharing the pointer. It works in tandem with QSharedPointer: a QWeakPointer can only be created from a QSharedPointer, and it lets you know when the object held by the QSharedPointer has been deleted.

A QWeakPointer can be promoted to a QSharedPointer, though, in a thread-safe manner. So it allows us to rewrite the code above to be safer:

    QWeakPointer<Data> weak(getSharedPointer());

    // […]
    QSharedPointer<Data> ptr = weak;
    if (!ptr.isNull())
        ptr->doSomething();

In this case, the promotion of a QWeakPointer to a QSharedPointer will either succeed or it won’t. But that’s a thread-safe decision: if it does succeed, then the resulting object is guaranteed not to get deleted, while you hold the ptr reference (again, as long as everyone plays by the same rules).
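
The same promote-then-check pattern exists in the C++ standard library as std::weak_ptr::lock(). The sketch below uses the standard classes instead of Qt’s, purely as an illustration of the semantics; Data and doSomething() are placeholder names as in the snippet above, and useIfAlive() is a made-up helper.

```cpp
#include <cassert>
#include <memory>

struct Data {                       // placeholder type, as in the snippet above
    int calls = 0;
    void doSomething() { ++calls; }
};

// Promote the weak reference; returns true only if the object was still alive.
bool useIfAlive(const std::weak_ptr<Data> &weak) {
    std::shared_ptr<Data> ptr = weak.lock();   // promotion either succeeds or yields null
    if (!ptr)
        return false;
    ptr->doSomething();                        // safe: ptr keeps the object alive here
    return true;
}

bool demoWeakPromotion() {
    std::shared_ptr<Data> strong = std::make_shared<Data>();
    std::weak_ptr<Data> weak(strong);
    bool aliveCall = useIfAlive(weak);         // object exists: promotion succeeds
    strong.reset();                            // last strong reference goes away
    bool deadCall = useIfAlive(weak);          // object is gone: promotion fails
    return aliveCall && !deadCall;
}
```

The important part is that the check and the acquisition are a single step: there is no window between "is it still alive?" and "give me a reference" in which another thread could delete the object.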

With 4.6, I added a nifty new feature to QWeakPointer: its ability to track QObjects as well, without passing through a QSharedPointer. It can be used to determine whether a QObject-derived object has been deleted already or not. So it implements a weak pointer class sharing the pointer value for QObject-derived classes. Sounds familiar? Yes, that’s the idea: you can replace the old, slow QPointer with a faster, modern alternative. Just be careful: the size of QWeakPointer is not the same as the size of QPointer.

7. QGuard

This is another internal class. It was added to replace QPointer because that is very slow (it uses a global, mutex-protected QHash, which must be accessed by every QObject destructor). It’s actually what prompted me to write the QWeakPointer QObject-tracking feature. But it’s in a state of flux: we don’t know whether we’re going to keep or even use this class. Anyway, it’s internal, so you really don’t care about it.

8. QScopedPointer

This is the new kid on the block: it implements a non-shared strong pointer wrapper. It was created because of our attempt at handling the Symbian platform’s exceptions in our container classes: we needed a way to free resources without writing try/catch everywhere. A scoped pointer provides a very nice way to do RAII. In fact, QScopedPointer is actually a full replacement for QtXmlPatterns’ QtPatternist::AutoPtr. Both implement the same functionality, so the internal one can be dropped.

Some people commented in Harald’s blog that we could’ve used QSharedPointer. Actually, we couldn’t: QSharedPointer has the size of two pointers, but we’re replacing Qt code that has the size of one pointer, so we needed a class that fits into that space. That’s also the reason why QScopedPointer has a custom deleter as a template parameter, as opposed to a parameter to the constructor (like QSharedPointer does): it has no space in those 4 or 8 bytes to store the custom deleter.

What’s more, QSharedPointer implements atomic reference-counting. Never mind the fact that it’s atomic: the reference counting is absolutely unnecessary for the cases that QScopedPointer is trying to solve.
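
To illustrate why the deleter has to be a template parameter, here is a rough sketch of a scoped pointer in plain C++. The names ScopedPtr, DeleteCleanup, FreeCleanup and Tracker are hypothetical and this is not the QScopedPointer source; the point is only that a deleter expressed as a type with a static function takes no storage, so the wrapper stays one pointer wide and needs no reference count.

```cpp
#include <cassert>
#include <cstdlib>

// Default cleanup policy: plain delete. The deleter is a type, not an object,
// so it occupies no storage inside the pointer wrapper.
template <typename T>
struct DeleteCleanup {
    static void cleanup(T *p) { delete p; }
};

// Alternative policy for memory obtained from malloc().
struct FreeCleanup {
    static void cleanup(void *p) { std::free(p); }
};

// Hypothetical scoped pointer: non-copyable, one pointer wide, no reference count.
template <typename T, typename Cleanup = DeleteCleanup<T> >
class ScopedPtr {
public:
    explicit ScopedPtr(T *p = nullptr) : d(p) {}
    ~ScopedPtr() { Cleanup::cleanup(d); }
    ScopedPtr(const ScopedPtr &) = delete;
    ScopedPtr &operator=(const ScopedPtr &) = delete;

    T *data() const { return d; }
    T &operator*() const { return *d; }
    T *operator->() const { return d; }

private:
    T *d;
};

static_assert(sizeof(ScopedPtr<int>) == sizeof(int *), "fits in the space of a raw pointer");
static_assert(sizeof(ScopedPtr<int, FreeCleanup>) == sizeof(int *), "custom deleter costs no space");

// Helper that records when it is destroyed.
struct Tracker {
    explicit Tracker(int *counter) : destroyed(counter) {}
    ~Tracker() { ++*destroyed; }
    int *destroyed;
};

int demoScopedCleanup() {
    int destroyed = 0;
    {
        ScopedPtr<Tracker> guard(new Tracker(&destroyed));
        ScopedPtr<int, FreeCleanup> raw(static_cast<int *>(std::malloc(sizeof(int))));
        if (raw.data())
            *raw = 7;
    }   // both wrappers release their resources here, also during stack unwinding
    return destroyed;   // counts how many Tracker destructors ran
}
```

A deleter passed to the constructor, as QSharedPointer allows, would have to be stored somewhere, which is exactly the space a one-pointer-wide class does not have.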

Why not C++0x? Why not TR1? Why not Boost?

Some people in Harald’s blog suggested we should use std::shared_ptr (C++0x) or std::tr1::shared_ptr (TR1). I’m sorry, but those people didn’t see very far: we can’t use C++0x. It’s not even approved and there are only two compilers that implement initial support for it (GCC since 4.3 and MSVC 2010, which is in beta). It’s not even funny to suggest using C++0x for Qt at this point. You can use it for your own code, but we can’t use it in Qt.

TR1 has been implemented by more compilers. Unfortunately, not enough. We have to deal with compilers that haven’t implemented C++98 fully yet — or people who don’t bother to change their compiler settings. For example, the latest version of the Sun Studio compiler on Solaris (Sun Studio 12, with CC 5.10) still comes with the RogueWave implementation of pre-C++98 STL. If you read Sun’s article comparing RW stdlib to stlport4, you’ll see why they still keep the 11-year-old library as default. But the point is that they do, which means we have to deal with it. (Fortunately, other compiler vendors provide newer STL implementations, even though their compilers are sometimes far too picky)

That means the only smart pointer from STL we can use in Qt is std::auto_ptr. And even then there are issues (RW stdlib doesn’t implement member templates).

That leaves Boost. And there are some nice things in Boost: boost::shared_ptr, boost::intrusive_ptr, boost::scoped_ptr, etc. In fact, there are a lot of nice things in Boost. Very often I see things there that I’d like to have in Qt. Of course, that means I can just add said feature to Qt as well. There’s nothing stopping me, aside from, well, my day job :-)

One of the main problems with Boost is that it provides an “un-Qt-ish” API, to say the least; I prefer calling it “horrible API”, but that’s a statement of opinion, not fact. Even if Boost’s API is intuitive to some people, it represents a departure from Qt’s API. That means those people using Qt and Boost need to learn Boost’s way of doing things as well, their naming of functions, etc.

At the very least, we’d have to wrap Boost’s API in a Qt shell. But if we go further, we see that Qt loses control of an important piece of its technology. We then have to deal with whatever problems they have, at their schedules. Also, it adds a dependency to Qt, one we can’t justify because they don’t promise binary compatibility (cursory search over the web; please correct me if I’m wrong). Binary compatibility is the other main problem.

So, no, Boost is not an option either.

Conclusion

So Qt has too many smart pointer classes. Or does it?

In fact, Qt has only these pointer classes if you exclude the internal classes and you deprecate QPointer:

Class                                               Description
QSharedDataPointer / QExplicitlySharedDataPointer   Implements sharing of data (not of pointers), implicitly and explicitly, respectively
QSharedPointer                                      Implements reference-counted strong sharing of pointers
QWeakPointer                                        Implements reference-counted weak sharing of pointers
QScopedPointer / QScopedArrayPointer                Implements non-reference-counted strong pointer wrapper (QSharedPointer’s little brother)

Update 1: QExplicitlySharedDataPointer can be used to implement reference-counted sharing of pointers when the target class includes the reference counter (similar to boost::intrusive_ptr)

Update 2: QScopedPointer is really based on the API of boost::scoped_ptr (but is not a copy); QSharedPointer and QWeakPointer were developed from scratch.

Thiago Macieira
Qt
Posted by Thiago Macieira
 in Qt
 on Saturday, August 15, 2009 @ 10:01

Yes, it’s me again. And it’s not yet the blog series I promised you. No, I decided I would expand some more on the whole binary compatibility / ABI thing, by exploring an even more detailed concept.

This is a brain-dump again. So expect no conclusions. This text may range from “cool stuff, everything I ever wanted to know but never had the courage to ask” to “somewhat entertaining” to “you lost me at ‘it’s me again’”. You’ve been warned :-)

But before I go into it, let me address something from my previous blog: the lack of conclusion. The reason the previous blog has no conclusion is because it’s a brain-dump, not an essay. But more than that, it’s because I had written a large chunk of text that needed editing before publishing. And I thought, “I can do that before I go to bed” — two hours later, I hadn’t finished, had no conclusion and was way past bedtime. So I clicked “Publish”.

If you want a conclusion, you’ve got a conclusion: binary compatibility is hard. We do it for you. You don’t have to learn all of this. Unless you’re writing a library too…

Thiago Macieira
Qt
Posted by Thiago Macieira
 in Qt
 on Wednesday, August 12, 2009 @ 22:56

For the past few months I have been quite quiet in the blogosphere. I have been collecting ideas for a two- or three-parter blog that I am still going to write on how Qt rules, but while that doesn’t come, I decided to dump some thoughts on binary compatibility.

Recently I updated the KDE Techbase article on Binary compatibility with C++ (btw, that’s the 3rd page from the top in the Google search for “binary compatibility”). I tried to explain a bit better what the dos and don’ts (mostly the don’ts) are. After I wrote the part about overriding a virtual from a non-primary base, someone on IRC asked me to write some examples.

In order to write those examples, I had to brush up a bit on my skills of name mangling, virtual table layout, etc. and I even had to try and learn the Microsoft Visual Studio ABI. It took me a while, but I did find an article with some information on that (link is in the Techbase page’s introduction). I’m also glad I took the time to brush up on my skills, since I found another example of things not to do (the “virtual override with covariant return of different top address” case).

History

Let’s start with a bit of history: whereas Unix has always been closely tied to the C language, the DOS market initially had no such relationship. Sure, applications were developed in C even in the early 80s, but the point is that DOS didn’t provide a “C library”. No, to access DOS services, you’d move some values into registers and cause an interrupt (the Int 21). Implementors of C compilers had to provide their own C library.

Also remember that these were the days before DLLs and shared libraries, so there was no binary compatibility to maintain. The conclusion is that each compiler decided for itself how to implement the calling sequence and the ABI: that is, what are the responsibilities of the caller and the callee, like which processor registers (if any) are used for parameter passing, which ones may be used for scratch values, which ones must be preserved, who cleans up the stack, the size of certain types, the alignment, padding, etc.

And, as you can expect, each compiler implementation did that differently.

In the Unix world, things were a bit more standardised, since a C library had existed for a long while and usually there is a reference compiler for the operating system. In order to use that C library (and you really want to), any other compiler must implement the same ABI.

But even then things become exciting when we talk about C++. If on one hand the C calling convention is pretty well standardised on Unix systems, it’s not so for C++. C is a very low-level language, to the point that you can almost see the assembly code behind C if you stare long enough at the screen (in my experience, however, when that happens, you’re just seeing things and should instead go home and have some rest). C++ introduces several concepts on top of C, like overloads, virtual calls, multiple inheritance, virtual inheritance, polymorphism, covariant returns, templates, references, etc. That means more things for the compilers to differ on.

Now, an interesting thing happened around the year 2000: the Itanium processor. Not because of the processor itself, but because of the documents that came out of it. It wasn’t enough to know the instruction set for the architecture (see the Software Developer’s Manual); developers needed more, and Intel obliged (apparently they had a lot of time on their hands).

GCC clearly adopted the Itanium C++ ABI on Itanium, but since the code was there and it was superior to what GCC had, GCC applied it to other platforms as well. So it’s interesting today to see this ABI used in systems that neither have anything to do with the Itanium nor are Unix, like Symbian running on ARM devices.

What the ABI needs

It’s quite clear that the ABI needs to accommodate any valid C++ program. That is, it should support all features of the language. Starting with the simplest innovation that C++ has on top of C, we can see how things become interesting.

In C, a function is uniquely identified by its name. There can be no other function with the same name with global scope. C++, on the other hand, has overloads: functions with the same name differing from each other only by the argument types they are called with. By that, we come to the conclusion that any ABI must encode the different functions with different names. It has to encode all the differences that the C++ language permits, but it may also choose to encode more information, which helps in outputting error messages.

Then there are virtual calls. When making a virtual call with a given C++ class, the compiler must somehow generate code that can call any reimplemented virtual, without knowing a priori what those reimplementations are. The only way it can do that is if, somewhere in the class, there’s information about where the virtual call is supposed to go. Most (all?) compilers simply add a pointer somewhere in the object, pointing to the “virtual table”: that is, a list of function pointers for each virtual call. Each C++ class with virtual functions has a virtual table, listing the virtuals of that class (the ones it inherited and the ones it overrode).
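
If you want to see what that amounts to, here is a hand-written approximation of a virtual table in plain C++, with no virtual keyword involved. All names (Shape, ShapeVTable, callSides and so on) are made up for the illustration, and a real compiler’s layout is dictated by its ABI, so take this as a sketch of the principle only.

```cpp
#include <cassert>

// Hand-written equivalent of what a compiler might generate for
//   struct Shape { virtual int sides() const; };
// All names here are hypothetical; real layouts are dictated by the ABI.
struct Shape;

struct ShapeVTable {
    int (*sides)(const Shape *self);   // one slot per virtual function
};

struct Shape {
    const ShapeVTable *vptr;           // the hidden pointer added to each object
};

int triangleSides(const Shape *) { return 3; }
int squareSides(const Shape *) { return 4; }

// One table per class, shared by all objects of that class.
const ShapeVTable triangleVTable = { &triangleSides };
const ShapeVTable squareVTable = { &squareSides };

// A "virtual call": the caller knows only the slot, not the implementation.
int callSides(const Shape *shape) {
    return shape->vptr->sides(shape);
}

int demoVirtualDispatch() {
    Shape triangle = { &triangleVTable };
    Shape square = { &squareVTable };
    return callSides(&triangle) * 10 + callSides(&square);   // combines 3 and 4
}
```

The caller in callSides() is compiled without any knowledge of triangles or squares; it only relies on the position of the slot in the table, which is why the order of virtuals is part of the ABI.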

But the virtual tables usually contain more information than just function pointers, like the typeinfo of a C++ class and usually the offsets of virtual bases into the object. The case of a virtual base is illustrated by the typical case of diamond-shaped multiple inheritance: a base “Base”, two classes “A” and “B” virtually-deriving from “Base” and a final class “X” deriving from “A” and “B”. When taken independently, A and B are similar to each other and the “Base” contents are allocated somewhere inside the “A” structure. However, inside “X”, things change, since it must allocate one copy of “A”, one copy of “B” and only one copy of “Base”.

The compiler must therefore encode somewhere where it placed the “Base” sub-object. One way is to simply have a pointer, as a member of both “A” and “B”. Another is to put the offset from the beginning of “A” and “B” in the virtual table, which saves a couple of bytes in each object.
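
The diamond described above is easy to try out. The small example below uses the same class names (Base, A, B, X); the member names and the demo function are made up. It checks that both conversion paths end up at the one shared “Base” sub-object, which is only possible because the compiler looks up its location at run time.

```cpp
#include <cassert>

struct Base { int value = 0; };
struct A : virtual Base { int a = 1; };
struct B : virtual Base { int b = 2; };
struct X : A, B { int x = 3; };

// Both paths (X -> A -> Base and X -> B -> Base) must reach the same sub-object.
bool demoSingleVirtualBase() {
    X obj;
    Base *viaA = static_cast<A *>(&obj);   // conversion consults A's record of where Base is
    Base *viaB = static_cast<B *>(&obj);   // same for B
    viaA->value = 5;
    return viaA == viaB && viaB->value == 5;
}
```

Without the virtual keyword on the two derivations, X would contain two separate Base sub-objects and the two pointers would differ.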

If you combine those three concepts (naming of all overloads possible, virtual calls and virtual inheritance), you cover 99% of the needs of the ABI for a typical C++ program.

Today

For our purposes with Qt, we can classify the C++ ABIs in three categories: systems using the Itanium C++ ABI, the Microsoft C++ ABI and “other”. That last category is a group of all other compilers, like the Sun Studio compiler for Solaris, IBM’s Visual Age compiler for AIX and HP’s aCC compiler for HP-UX on PA-RISC (note that HP-UXi runs on the Itanium, so aCC uses the Itanium C++ ABI on that platform). We don’t actively test Qt’s binary compatibility for issues specific to those three compilers for the simple reason that we have no clue what those specific issues are. I don’t know of any documents describing the C++ ABI they implement, and I really don’t want to study them, given the value we’d get. After all, most users of those platforms usually are compiling Qt from source anyway.

The Itanium C++ ABI is a modern concept, created after C++ had been standardised and its features well-known. It was created by people who were trying to solve a problem: how to make all of C++ possible, without overdoing it? They came up with an ABI that is quite elegant: classes with virtuals get a hidden pointer to the class’s virtual table added as their first member, and the virtual table itself gets emitted alongside the first non-inline virtual member function. The virtual table contains, at positive offsets, the function pointers of the virtual member functions, while at negative offsets it has the typeinfo and the offsets required to implement multiple inheritance.

Even the name mangling is quite readable, for simple types. The ground rule is that it should be something that C shouldn’t use, to avoid collision: they chose the “_Z” prefix, since underscore + capital is reserved to the compiler. For example, take _ZN7QString7replaceEiiPK5QChari. If we break it down, we end up with:

_Z N 7QString 7replace E i i PK5QChar i

We read that as:

  • _Z: C++ symbol prefix
  • N…E: composed name:
    • 7QString: name of length 7 “QString”
    • 7replace: name of length 7 “replace”

    That means “QString::replace”

  • i: int
  • P: pointer
  • K5QChar: const name of length 5 “QChar” (i.e., const QChar)

Put everything together and we have “QString::replace(int, int, const QChar *, int)”
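
The length-prefixed part of the scheme is simple enough to decode by hand. The function below is a deliberately tiny sketch (decodeNestedName is a made-up name) that handles only the “_Z N … E” portion shown above and ignores the rest of the mangling grammar, such as argument types, templates and substitutions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decodes only the "_Z N <len><name>... E" part of a mangled symbol and returns
// the qualified name. Everything else in the grammar is deliberately ignored.
std::string decodeNestedName(const std::string &symbol) {
    std::string result;
    if (symbol.compare(0, 3, "_ZN") != 0)
        return result;                           // not a nested C++ name
    std::size_t pos = 3;
    while (pos < symbol.size() && symbol[pos] != 'E') {
        if (!std::isdigit(static_cast<unsigned char>(symbol[pos])))
            return std::string();                // construct this sketch does not handle
        std::size_t length = 0;
        while (pos < symbol.size() && std::isdigit(static_cast<unsigned char>(symbol[pos])))
            length = length * 10 + static_cast<std::size_t>(symbol[pos++] - '0');
        if (length > symbol.size() - pos)
            return std::string();                // truncated symbol
        if (!result.empty())
            result += "::";
        result += symbol.substr(pos, length);    // the name itself, "length" characters long
        pos += length;
    }
    if (pos >= symbol.size())
        return std::string();                    // missing the terminating 'E'
    return result;
}
```

Feeding it _ZN7QString7replaceEiiPK5QChari yields QString::replace; anything that does not start with _ZN yields an empty string. For real work, use the demangler that ships with your toolchain instead.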

On the other end of the spectrum, the Microsoft compilers chose to encode the function names with every single detail possible, like for example whether a member function is public, protected or private. Moreover, for some obscure reason that probably doesn’t make sense anymore, Microsoft mangling is also case-insensitive. That is, if someone flipped a switch tomorrow and — gasp! — made C++ case-insensitive, the mangling scheme that they use would work. (GCC of course would be completely lost in a case-insensitive C++ world)

That’s quite clearly a legacy from old DOS days. That also shows when you notice that the mangling scheme encodes the pointer size (i.e., near or far), as well as whether the function call — or, more to the point, the return — is near or far. Those things are definitely not used today, but the ABI can still encode that.

The same function above gets encoded in MSVC as:

?replace@QString@@QAEAAV1@HHPBVQChar@@H@Z

Which we decode as:

  • ?: C++ symbol prefix
  • replace: rightmost (innermost) name
  • @: separator
  • QString: enclosing class
  • @@: terminates the function name.
    The names are in the reverse order, so we have “QString::replace”
  • Q: public near (i.e., not virtual and not static)
  • A: no cv-qualifiers for the function (i.e, not const or volatile)
  • E: __thiscall (i.e., call of member functions)
  • AA: the first “A” stands for reference (possibly near reference), the second “A” indicates it’s an unmodified reference (i.e., not “const X &”)
  • V…@: class and delimiter
    • 1: back-reference to the second name seen so far, i.e., QString (0 would be “replace”)
  • H: int
  • H: int
  • P: normal pointer (i.e., not const pointer)
  • B: const type — PB together makes “const X *”, whereas “X * const” would be QA
  • V..@: class and delimiter
    • VQChar@: class QChar, plus delimiter
  • H: int
  • @: end of argument list
  • Z: function, or code/text storage class

That reads: “public: class QString & near __thiscall QString::replace(int, int, const class QChar *, int)”. Things to note about this:

  1. The use of ? as prefix, instead of something you can normally type in C
  2. The same letter can mean different things depending on the position
  3. Types are assigned alphabetically from a list (signed char is C, char is D, unsigned char is E, short is F, unsigned short is G, int is H, unsigned int is I, etc.) instead of trying to resemble the type.
  4. “class” is encoded explicitly (V), whereas struct is “U”, union is “T” and enum is “W4” (at least, int-backed enums)
  5. encoding of calling sequence (__thiscall) and displacement (near)

On one hand, the Microsoft mangling scheme makes it possible to produce much more detailed error messages, and makes a difference in type or calling sequence not resolve to the same symbol. On the other hand, it also encodes details that make no difference at all to the call, like the difference between “class” and “struct”, or whether the member function is private, protected or public.

Thiago Macieira
Uncategorized
Rants
Posted by Thiago Macieira
 in Uncategorized, Rants
 on Thursday, May 28, 2009 @ 21:32

So I’m sitting here in my hotel room in Helsinki, after a quick flight from Oslo. In fact, the flight arrived early, and that’s the first time that has happened to me. I was in the taxi on my way to the hotel 5 minutes before the scheduled time of arrival for the flight.

Anyway, so I got to the hotel, had a shower, turned the laptop on, and logged on to the wireless service provided by the hotel along with the phone company. Then I connected to the Nokia VPN and my own VPN to download my emails. After reading said emails, I hopped on to IRC.

As a helpful chap that I am, I saw someone posting a link on the #qt channel to some code of theirs they were having problems with. I clicked the link to see what the problem was. The bizarrely-sized progress window in my KDE 4 popped up, with the progress bar indicating no progress at all.

“Odd,” I thought, then I clicked Cancel and tried again. No such luck… “Maybe that website is just not working from here,” I pondered. I decided to try one famous search engine whose name reminds us of a very large number. And again it didn’t load.

I was having network trouble.

My first reaction was to test whether the Nokia VPN was still on. It was dead, but vpnc had not disconnected, so the traffic to the Nokia VPN wasn’t going anywhere. I disconnected, but still no Internet.

After a couple of minutes of searching, I found the reason:

    cat /etc/resolv.conf
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 10.0.0.6
    search example.com

(This is not edited, the hotel internet really gave me a domain search of “example.com”)

Can you guess what the problem was?

Let me give you a hint: my blog six months ago called “The Great Crash of 2009” addressed this very subject.

The DNS server that the hotel network gave me is on the same network as my home VPN (because my home DHCP server is the ADSL router by my Internet provider). So all my DNS queries were timing out, because they were being transmitted over the VPN to an address that isn’t active. (I think it’s assigned to my N95 when I’m at home, but I’d have to be home to check)

This problem I had is probably quite common, I imagine. I mean, the hotel I’m in isn’t exactly in a touristic spot. It’s probably sought more by business travelers, who, like me, come with a laptop and VPN access to their offices. How many of them reroute 10.0.0.0/24 when the VPN is active? I would bet quite a few.
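
For the record, the collision is just a prefix match. The sketch below (parseIPv4 and inSubnet are made-up helper names, not part of any VPN client) shows the check that decides whether an address such as the hotel’s DNS server 10.0.0.6 falls inside a rerouted block such as 10.0.0.0/24.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Parses dotted-quad IPv4 text into a host-order 32-bit value; returns false on bad input.
bool parseIPv4(const char *text, std::uint32_t *out) {
    unsigned a, b, c, d;
    char extra;
    // Exactly four numbers must be read; a fifth conversion means trailing junk.
    if (std::sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return false;
    *out = (a << 24) | (b << 16) | (c << 8) | d;
    return true;
}

// True if `address` falls inside `network`/`prefixLength`.
bool inSubnet(const char *address, const char *network, int prefixLength) {
    std::uint32_t addr, net;
    if (prefixLength < 0 || prefixLength > 32)
        return false;
    if (!parseIPv4(address, &addr) || !parseIPv4(network, &net))
        return false;
    // Keep only the leading prefixLength bits of both addresses and compare them.
    std::uint32_t mask = prefixLength == 0 ? 0 : ~std::uint32_t(0) << (32 - prefixLength);
    return (addr & mask) == (net & mask);
}
```

With these, inSubnet("10.0.0.6", "10.0.0.0", 24) is true, so every DNS query goes down the VPN tunnel, while an address like 10.0.1.6 would have stayed on the hotel network.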

So we’re 6 months into the year of the Great Crash because IPv4 runs out. And we’re still not using IPv6.

Quick update on how the transition worked

The blog of 6 months ago analysed our own transition into the Nokia network as a case study of where IPv6 would have been helpful. The curious readers of this blog were left imagining “how will they pull off this transition? Will the FTP servers ever be tested again? Can the Trolls print?”

I left you in suspense, so here’s what happened in these 6 months.

Yes, we can print. Otherwise I wouldn’t be here in Helsinki, because I wouldn’t have printed my eTicket.

The network transition isn’t complete yet. I don’t know the exact reasons why our test farm remains in the old Trolltech network, but it does. All I’ve heard are anecdotes and the worst cases. For one thing, we received the level 3 switches that serve the network. But our sysadmins could not install them: we had to wait for a technician to come from the outsourced network provider, unpack the switch, install it in the rack and plug the cables.

Our QA department has been expanding our virtual machines for running tests in parallel. A week or two ago, they reported they couldn’t add more machines. The reason? They had run out of IP addresses in the block allocated by the outsourced network provider. They had to wait for a new allocation — possibly a reallocation!

The network tests in Qt have all been updated to reference the test server by a generic name, “qt-test-server.qt-test-net”. Anyone running the network tests must update their /etc/hosts or Windows equivalent with the IP address of the virtual machine.

(By the way, the network tests of Qt have been made available in the open repository for anyone to see)

We also had to go over all the tests and fix any failures caused by moving to the Nokia network. All of the network tests are now passing (including the one in tst_QHostInfo that assumed that “foo” would never resolve).

Now we have to face other problems. Like today not being able to log in to the Nokia intranet websites… but this one I had been expecting: Nokia IT requires us to change our passwords every 90 days.

The price of security. (Or is it?)



© 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
All other trademarks are property of their respective owners.