Posted by englich
 in Qt, Aggregated
 on Wednesday, September 10, 2008 @ 14:35

A couple of weeks ago, I merged the development branch for XSL-T into our main line, heading for Qt 4.5. The idea is that Qt will carry an XSL-T 2.0 implementation that is, as usual, cross-platform, solidly documented, and easy to use.

Using it should be straightforward. Either on the command line:

xmlpatterns yourStylesheet.xsl yourInputDocument -param myParam=myValue

Or using the C++ API[1]:

QXmlQuery myQuery(QXmlQuery::XSLT20);
myQuery.bindVariable("myParam", QVariant("myValue"));
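// Pass a QUrl so the argument is treated as the stylesheet's location rather than as source code.
myQuery.setQuery(QUrl("http://example.com/myStylesheet.xsl"));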
QFile out("outFile.xml");
out.open(QIODevice::WriteOnly);

myQuery.evaluateTo(&out);

See the documentation for the QXmlQuery class on the overloads available for setQuery() and evaluateTo(), for instance.

However, because XSL-T 2.0 is such a beast (I agree that it’s larger than XQuery), we’ve decided to do this according to the “release early, release often” approach. The first release, in Qt 4.5, will carry a subset, which will subsequently be complemented in Qt 4.6. The current status is documented in the main page for the QtXmlPatterns module, which can be viewed in the documentation snapshot.

Therefore, while the current implementation probably falls short on more complex applications (such as DocBook XSL), it can run simpler things, users can plan ahead, and we trolls can receive feedback on which features and APIs are missing, and what needs focus. So feel free to do that: send a mail to qt-bugs@trolltech.com, or say hello on IRC (FransE, on Freenode).

The code is accessible through the Qt snapshots.

What is XSL-T anyway?

XSL-T is a programming language for transforming XML into XML, HTML or text. Some implementations, such as QtXmlPatterns or Saxon, provide mechanisms to map XML to other data sources and hence widen the scope of the language by letting the XML act as an abstract interface. Wikipedia has a good article on XSL-T. Version 2.0 of XSL-T extends the language heavily by putting a rigid type system and data model in the backbone, and adds many features that were a pain to miss when programming in XSL-T 1.0. XSL-T 2.0 uses XPath 2.0, and shares the same large function library as XQuery.

1. Over time, Java bindings through QtJambi and ECMAScript bindings through QtScript will likely arrive.

Posted by englich
 in Uncategorized, Aggregated
 on Tuesday, December 11, 2007 @ 14:12

I have not yet seen an API for XQuery in which integrating the data model, atomic values, nodes and all, into the interfacing language has been a walk in the park.

At the top of the list of things people tend to ask on the forums is “How do I get XML represented as a sequence of bytes in Java/C++ into my query?” The desired result is clear (a tree fragment for the query to operate on), but how to get there is not that obvious, if you ask me.

There is no “bytestream” type in XQuery. Should the user build the tree herself and then pass the tree to the query? Should the implementation, in some voodoo-ish way, be instructed how to treat a string or custom type? Shouldn’t the query engine build the tree, so that its scope of analysis is increased and it’s done the way the engine prefers?

What I sense has been the problem with some solutions is that they mix the data, the bytestream, with its interpretation.

In Qt this manifests itself as the need for the content of a QIODevice to appear in a QXmlQuery. The way it’s now provided is that when a QIODevice is bound to a variable using QXmlQuery::bindVariable(), the query sees a URI (an instance of xs:anyURI) which behind the scenes maps to the QIODevice the user bound. Hence, if the purpose is to build an XML document, one passes the URI to the builtin fn:doc() function.

I hope this is clean. Since it’s handled like any other URI, custom extensions stay at a minimum, error reporting is consistent, and the interpretation hasn’t been coupled with the data. For instance, later on I hope to merge in support for XInclude and XQuery Update, and in those cases the URI is again simply passed to, for instance, fn:put().
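The indirection can be illustrated with a minimal sketch in plain C++. This is not how Qt implements it; StreamRegistry, loadDocument() and the urn:example: URIs are names made up for the illustration. The point it tries to show is that binding only hands out a URI, while interpreting the bytes is left to whoever resolves that URI later.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical registry: hands out opaque URIs for byte streams the caller binds.
class StreamRegistry {
public:
    // Binds a stream and returns the URI a query would see in its place.
    std::string bind(std::istream &stream) {
        const std::string uri = "urn:example:bound-stream:" + std::to_string(next_++);
        assert(streams_.count(uri) == 0); // generated URIs never repeat
        streams_[uri] = &stream;
        return uri;
    }

    // Resolves a URI back to the bound stream, or nullptr if it is unknown.
    std::istream *resolve(const std::string &uri) const {
        const auto it = streams_.find(uri);
        return it == streams_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, std::istream *> streams_;
    int next_ = 0;
};

// Stand-in for a function like fn:doc(): the consumer decides how to
// interpret the bytes. Here it merely reads them; a real engine would
// parse them into a tree.
std::string loadDocument(const StreamRegistry &registry, const std::string &uri) {
    std::istream *stream = registry.resolve(uri);
    if (!stream)
        throw std::runtime_error("cannot retrieve resource: " + uri);
    std::ostringstream content;
    content << stream->rdbuf();
    return content.str();
}
```

A caller would bind a stream once, pass the returned URI around like any other URI, and only the function that finally consumes it (loading, including, writing) attaches meaning to the bytes.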

One can lean quite heavily on URIs and the abstraction the XPath Data Model provides, it seems.

Posted by englich
 in Qt, Aggregated, Patternist
 on Thursday, November 15, 2007 @ 10:52

People have asked for Qt’s XQuery & XPath support to not be locked to a particular tree backend such as QDom, but to be able to work on arbitrary backends.

Any decent implementation (such as XQilla or Saxon) provides that nowadays in some way or another, but I’d say Patternist’s approach is novel, with its own share of advantages. So let me introduce what Qt’s snapshot carries.

<ul>
    {
        for $file in $exampleDirectory//file[@suffix = "cpp"]
        order by xs:integer($file/@size)
        return <li>
                    {string($file/@fileName)}, size: {string($file/@size)}
                  </li>
    }
</ul>

and the query itself was set up with:

QXmlQuery query;
FileTree fileTree(query.namePool());
query.setQuery(&file, QUrl::fromLocalFile(file.fileName()));
query.bindVariable("exampleDirectory", fileTree.nodeFor(QLibraryInfo::location(QLibraryInfo::ExamplesPath)));
if(!query.isValid())
     return InvalidQuery;
QFile out;
out.open(stdout, QIODevice::WriteOnly);
query.serialize(&out);

These two snippets are taken from the example found in examples/xmlpatterns/filetree/, which, in about 250 lines of code, virtualizes the file system as an XML document.

In other words, with the tree backend FileTree that the example has, it’s possible to query the file system, without converting it to a textual XML document or anything like that.

And that’s what the query does: it finds all the .cpp files on any level in Qt’s example directory and generates an HTML list, ordered by file size. Maybe generating a view for image files in a folder would have been a tad more useful.

The usual approach to this is an abstract interface/class for dealing with nodes, which brings disadvantages: heap allocations, and the need to create such node structures, which in turn can force changes onto the implementation of whatever one is going to query.

But a long time ago Patternist was rewritten to use Qt’s items & models pattern, which means any existing structure can be queried without touching it. That’s what the FileTree class does: it subclasses QSimpleXmlNodeModel and hands out QXmlNodeModelIndex instances, which are light, stack-allocated values.

Combined with the fact that the engine tries to evaluate in a streamed and lazy manner to the degree that it thinks it can, this means fairly efficient solutions should be doable.
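To make the pattern concrete, here is a minimal sketch in plain C++. It is not the Qt API: Folder, NodeIndex, NodeModel and FolderModel are hypothetical names, and the real QSimpleXmlNodeModel interface looks different. What it tries to show is that the existing structure stays untouched, while the model hands out small index values that are copied around by value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// An existing application structure that knows nothing about XML or queries.
struct Folder {
    std::string name;
    std::vector<Folder> subfolders;
};

// A light value, cheap to copy and stack allocated: it only points into the
// structure the model exposes and owns nothing.
struct NodeIndex {
    const Folder *folder;
    bool isValid() const { return folder != nullptr; }
};

// Hypothetical model interface that an engine navigates through.
class NodeModel {
public:
    virtual ~NodeModel() = default;
    virtual std::string name(NodeIndex index) const = 0;
    virtual std::size_t childCount(NodeIndex index) const = 0;
    virtual NodeIndex child(NodeIndex index, std::size_t position) const = 0;
};

// Adapts Folder to the model interface without modifying Folder itself.
class FolderModel : public NodeModel {
public:
    explicit FolderModel(const Folder &root) : root_(root) {}

    NodeIndex root() const { return NodeIndex{&root_}; }

    std::string name(NodeIndex index) const override {
        assert(index.isValid());
        return index.folder->name;
    }

    std::size_t childCount(NodeIndex index) const override {
        assert(index.isValid());
        return index.folder->subfolders.size();
    }

    NodeIndex child(NodeIndex index, std::size_t position) const override {
        assert(index.isValid() && position < index.folder->subfolders.size());
        return NodeIndex{&index.folder->subfolders[position]};
    }

private:
    const Folder &root_;
};

// Engine side: counts all descendants of a node, similar in spirit to
// count($node//*), using nothing but the model interface.
std::size_t countDescendants(const NodeModel &model, NodeIndex index) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < model.childCount(index); ++i)
        total += 1 + countDescendants(model, model.child(index, i));
    return total;
}
```

No node objects are allocated on the heap during the traversal; the only thing that travels between engine and model is a pointer-sized value.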

So what does this mean? It means that, if you would like to, you can relatively cheaply use the XQuery language on top of your custom data structure, as long as it is somewhat hierarchical.

For instance, a backend could bridge the QObject tree, such that the XQuery language could be used to find Human Interface Guideline-violations within widgets; molecular patterns in a chemistry application can concisely be identified with a two or three liner XPath expression, and the documentation carries on with a couple of other examples. No need to convert QWidgets to nodes, or force a compact representation to sub-class an abstract interface.

An intriguing case, to me, would be a web robot that models the links between different pages as a graph, and finds invalid documents and broken links using the doc-available() function, or reports URIs that a website shouldn’t be linking to (such as a public site referencing intranet pages).

Our API freeze is approaching. If something is needed but missing, let me know.

Posted by englich
 in Qt, Aggregated, Patternist
 on Tuesday, October 23, 2007 @ 09:26

Attention to detail is fine, but compiler messages have historically not received it. Here’s an example of GCC’s output:

qt/src/xml/query/expr/qcastingplatform.cpp: In member function 'bool CastingPlatform::prepareCasting()':
qt/src/xml/query/expr/qcastas.cpp:117: instantiated from here
qt/src/xml/query/expr/qcastingplatform.cpp:85: error: no matching function for call to ‘locateCaster(int)’
qt/src/xml/query/expr/qcastingplatform.cpp:93: note: candidates are: locateCaster(const bool&)

Typically, compiler messages have been subject to crude printf approaches, and dignity has been left out: localization, translation, consistency in quoting style (for instance), adapting language to users (e.g., not phrasing things the way compiler engineers prefer), good English, and just generally looking sensible.

Solving that requires quite some work, which probably explains why it is often left out. Having line numbers, error codes, names of functions, and whatever else available and flowing through the system requires quite some plumbing and room in the design.

Another thing is that nowadays we really should expect that compiler messages within IDEs or other graphical applications should be sanely typeset. If not, we’ve lost ourselves in all this UNIX stuff. Keywords and important phrases should be italic, emphasized, colorized depending on the GUI style.

For shuffling compiler messages around it is customary to pass a set of properties: a URI, line number, column number, a descriptive string, and possibly an error code. Apart from falling short of the goals outlined in this text, this approach runs into a problem which I think is illustrated in the above example with GCC: what does one do if the message involves several locations?

Even if a message involves several locations, it is still one message and should be treated and presented as such. The approach of using a struct with properties falls short here, and chops the message into as many parts as it has locations.

For Patternist I wanted to make an attempt at improving messages. So far it is an improvement at least. For instance, for this message that the command line tool patternist outputs:

cli.png

the installed QAbstractMessageHandler was passed a QSourceLocation and a message which read:

<p>Operator <span class='XQuery-keyword'>+</span> is not available between atomic values of type <span class='XQuery-type'>xs:integer</span> and <span class='XQuery-type'>xs:string</span>.</p>

It was subsequently converted to local encoding and formatted with ECMA-48 color codes. (The format is not spec’d yet, it will probably be XHTML with specified class ids.)

While using markup for the message is a big improvement (it opens the door for formatting and all), this API still has the problem of dealing with multiple locations.

What is the solution to that?

Striking the balance between programmatic interpretation (such that, for instance, source document navigation is doable) and a message that reads naturally as one coherent unit means… maybe duplicating the information, but each time tailored for a particular consumer?

<p xmlns:l="http://example.com/">In <l:location href="myQuery.xq" line="57" column="3">myQuery.xq at line 57, column 3</l:location>, function <span class="XQuery-keyword">fn:doc()</span> failed with code <span class="XQuery-keyword">XPTY0004</span>: the file <l:location href="myDocument.xml" line="93" column="9">myDocument.xml failed to parse at line 93, column 9</l:location>: unexpected token <span class="XQuery-keyword">&amp;</span>.</p>

This is complicated by the fact that language strings cannot be concatenated together, since that prevents translation. But I think the above paragraph is possible to implement. As above, the message reads coherently, but still allows programmatic extraction. A language string and formatted data sit at opposite extremes, and maybe markup is the balance between them.
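As a rough sketch of what “one message, several consumers” could look like in memory, the following plain C++ models a message as a sequence of segments. Segment, Message, toPlainText() and locations() are hypothetical names, not part of any Qt API, and the sketch deliberately ignores the translation problem: a real implementation would start from one translated template with placeholders rather than from separately written pieces.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Location {
    std::string uri;
    int line;
    int column;
};

// One piece of a message: plain wording, an emphasized keyword, or wording
// that also carries a machine-readable source location.
struct Segment {
    enum Kind { Text, Keyword, SourceLocation } kind;
    std::string text;  // human-readable wording, always present
    Location location; // only meaningful when kind == SourceLocation
};

// A message stays one unit, however many locations it mentions.
using Message = std::vector<Segment>;

// Consumer 1: a reader who wants one coherent sentence.
std::string toPlainText(const Message &message) {
    std::string result;
    for (const Segment &segment : message)
        result += segment.text;
    return result;
}

// Consumer 2: an IDE that wants every location for navigation.
std::vector<Location> locations(const Message &message) {
    std::vector<Location> result;
    for (const Segment &segment : message) {
        if (segment.kind == Segment::SourceLocation)
            result.push_back(segment.location);
    }
    return result;
}
```

A terminal front end could walk the same segments a third time and wrap keywords in color codes, so the formatting decision stays with the consumer as well.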

Would this give good compiler messages and allow slick IDE integration? If not, what would?

Posted by englich
 in Qt, Aggregated, Patternist
 on Tuesday, September 18, 2007 @ 10:03

The Qt snapshots now include support for XPath 2.0 and XQuery 1.0.

Being part of the XML library, the idea is that Qt 4.4 will ship with a C++ API for running and evaluating such queries. On the side, too, is a command line tool called patternist, for quickly testing queries, scripting and old-school web solutions. But who cares, blogs with screenshots are the thing:

cli.png

Stronger XML support in Qt has been consistently asked for by users over a long time, with XPath being one of the main requests. Hopefully Patternist, with the help of KDE folks, users, and customers expressing what’s missing, will meet those needs. Considering the similarities of XQuery and XSL-T, Patternist also serves as a foundation for implementing XSL-T, if so decided.

For KDE folks all this might ring a bell. Patternist was indeed first developed for a long time in the KDE repository, as part of KDOM. We just thought it would be of a lot more use as part of Qt.

And I think exactly that makes this exciting. W3C’s XQuery working group has registered an astonishing number of exciting implementations. But for users, reliability is what matters in the end. Whether bugs will be fixed, whether people can answer questions, whether the piece is maintained and documented. Persistency. Trolltech swiftly carries this on its shoulders (assuming I brush my teeth and all that).

Combined with the fact that Qt is open source, and that the Patternist SDK used for development is as well, this is like eating some nasty chocolate while at the same time singing a little duet with Miss Piggy. I can’t sing, nor can Piggy (although she tries), but you get my point.

Humble modesty aside, it is worth mentioning that this still needs work. About 94% of the test suite is passed, the API needs more work, and there are performance issues.

Nailing test cases and trimming code paths are problems that have known solutions (though typically horrible to carry out). Harder is to know what people need and how they need it. It’s hard to guess what kind of APIs or extensions Amarok or KOffice or a GNOME or web application needs.

If you have input, feel free to add a comment to the blog, send a report to Trolltech, grab me (FransE) on the Open Projects IRC network, or ask a question or two on the qt-interest mailing list.

The documentation starts over here.

Posted by englich
 in Aggregated
 on Thursday, January 11, 2007 @ 11:18

Patternist, the XQuery/XPath/XSL-T framework, is abstracted to be able to use different tree implementations, in concept like Saxon. Up until now, Patternist has been using one that wrapped Qt’s QDom. When I started writing that very first tree backend it was with the purpose of bootstrapping the rest of the code, a temporary solution that got the job done until the solution for production use arrived. QDom’s massive memory usage (my measurements say roughly 18 times the document size) is people’s usual complaint. The reason I stalled was that the XPath Data Model simply couldn’t be implemented with QDom, let alone efficiently. So what now?

This blog entry is tinkering — although without accompanying code — on how to represent XML.

(more…)

Posted by englich
 in Aggregated
 on Tuesday, January 09, 2007 @ 16:09

I wrote a small tool for extracting statistics about XML documents. If I were less lazy, it could be more useful. Still, I think it is of some use.

(more…)

Posted by englich
 in Aggregated
 on Monday, January 08, 2007 @ 09:44

I’ve been reading research papers about XQuery recently and I am impressed. I’ve always had the impression that the number of papers has significantly increased during the XPath 2.0/XQuery 1.0 “era”, but my conviction that the organic nature of XML is hopeless to query and store efficiently had withstood until now, to mention one of the few interesting discoveries I’ve made while scanning papers.

But that’s the positive side of it.

(more…)

Posted by englich
 in Aggregated
 on Sunday, January 07, 2007 @ 17:15

Celeste is doing historical research on KDE’s usability. To her request for my comment on things, I modestly replied:

I think my effective contributions are modest, although one could say I’ve tried. But I can of course always express my view.

Her response, which made me think, was:

Your view is what matters to me, not some generalized or idealistic view from the usability contributors themselves. I have certainly learned some interesting things from the developers.

One could of course take that as a negative comment towards “the usability contributors”, but I think it was meant to address a certain problem.

Aaron wrote in a recent blog entry:

[…] It punishes developers like Tim for speaking openly about the challenges we face. the free software community relies on our ability to speak openly and honestly to each other; if we start to get punished for it then we have a real problem.

Although the blog entry is in general about a certain article, Aaron is in that paragraph simply pointing out that being able to talk and address issues is dead important.

As reply to one of Celeste’s questions on KDE’s usability, I wrote:

One thing I admire the GNOME project much of, is their ability to change. They manage to get ideas /implemented/ in their main line, without getting shot down at the proposal-stage. Those ideas might one disagree with or they are perhaps even downright wrong, but the ability to change, to test new ideas, is a prerequisite for reaching the right ideas. Progress isn’t a linear progression of constantly correct changes, and the working process must be adapted for that.

I can’t name a particular achievement, but each time a usability idea advances from being a proposal to being tried on the practical level, progress is happening.

which as well merely says “don’t shoot down ideas just because they’re different or sound bad.” I’m of course only speculating from my view on things, but it wouldn’t surprise me if many would nod in agreement that it can be difficult to keep an idea from being stalled as early as when it is a suggestion.

Open Source and Free Software, at least if we go back in time, was a liberator from sick things in the IT industry, and will continue to be so, as long as those values are upheld. But perhaps the community is too consumed with its achievements on the democratic side to see the sides of itself that fight its own mindset.

Belief fucks up mankind in spectacular ways. “We just need a revolution from system X to system Y and we will have no more corruption”, “It is ok to reduce the democratic rights for Them because They are not Us”, “We don’t have to listen because we are right”, and countless other examples that demonstrate people thinking there is a difference between people as long as they have a different skin color, operating system, religion, political system, desktop environment, and so on.

My point is simple and well repeated: openness is important. This time, it’s being emphasized for the open source community. Things will stall if ideas from GNOME are touted as evil on mailing lists, if less technically minded users are What’s Wrong, if KDE is considered to always be perfect, or if new ideas are shot down for not being what we have. And blogs aren’t the only way ideas are expressed; which ideas get implemented in software is another way as well.

Posted by englich
 in Aggregated
 on Saturday, November 18, 2006 @ 18:44

How to design APIs for XML is debated daily, and has been for a long time. For too long. Ages ago now, companies gathered at W3C to design the DOM, using language neutrality and document editing with load/save persistence as goals (it seems, and some say). But some needed other things, such as a streamed, less verbose approach, and hence SAX was brought into use. Others found SAX cumbersome to use, and StAX was deployed. And so on, and so on.

One urge I have is to cry out: why can we never design a sensible API? But that reaction wouldn’t be justified. Software is the implementation of ideas. When the software has to change, it’s because the ideas (the requirements) changed.

After all, SAX works splendidly for some scenarios. I don’t expect one tool for all scenarios, because XML is used in too many varied ways. But still, even though one can expect tools to become obsolete and that one size doesn’t fit all, the current situation is worse than what is reasonable.

In Qt, the dilemma the XML community has is present as well, painfully. The QtXml module provides an (in my opinion poor) implementation of DOM, and SAX. Something needs to be added in order to make XML practical to work with using Qt. Some of the ideas I’ve heard are by the book: add StAX as a streamed-but-easy-to-use API, and a XOM-like API for in-memory representation. The latter would be an API that doesn’t carry the legacy of XML’s evolution (the addition of namespaces, for instance) and in general does what an XML API is supposed to do: be an interface for the XML and therefore take care of all the pesky spec details, which XOM does in an excellent way.

If Trolltech added StAX and a XOM-like API to Qt no one could blame them. Others do it, and it is the politically correct alternative at this point of our civilization (just as DOM once was). But I’m starting to suspect that it’s the wrong direction. That the step of learning a lesson by adding yet another API could be skipped, in favour of jumping directly to what would follow.

Let’s look at what XML is:

  • It is a medium, a text format for exchanging data, specified in XML 1.0 and XML namespaces. XML is absolutely terrific at this. The history of IT is plagued with interoperability problems such as encoding issues. XML solves all that in one go. It abstracts away from primitive details, and provides a platform. This is why XML is popular.
  • A set of concepts to express ideas. This is all that about elements and nodes formed in a hierarchical structure (which from a reader’s standpoint can be difficult to distinguish from the text representation, since we humans instantly see the logical structure when looking at an XML document). Exactly what that is, is not so obvious. The different approaches are often referred to collectively as data models, and there are plenty of them: the XPath 2.0 Data Model, the XML Information Set, the PSVI infoset extension, the DOM (that it stands for Document Object Model is a hint), and the list goes on. These are all different ideas of what a sequence of characters arranged to be valid XML actually means.

That one can view XML as consisting of these two parts reveals a bit about how XML has evolved. First XML 1.0 arrived, taking care of syntax details. Later on, this plethora of data models arrived to formally define what XML 1.0 informally specified. Understandably, many want to make the XML specification also specify the data model. The question is of course which one to choose, and what the effects of that are.

But the list of data models doesn’t stop with the above. Those are just examples of standardized models. I believe that one data model exists for each XML scenario.

When a word processor reads in a document with the DOM, the actual data model consists of words, paragraphs, titles, sections and so on. The DOM represents that poorly, but apparently acceptably well. Similarly, when a chemistry program reads in a molecule, its data model consists of atoms.

That XML is used for different things can be seen in the APIs being created. SAX is popular because it easily allows a specialized data model to be created: the programmer receives the XML on a high level and from that builds the perfect data structure. DOM allows sub-classing of node classes by using factories, and attaching user data to nodes, in order to bring the DOM instance closer to the user’s data model.

XML is not wanted. Communication is a necessary evil, and therefore XML is as well. If programs could just mindwarp their ideas, molecules and word processor documents, to another program they would, instead of delving into the perils of communicating through XML.

I believe this is a good background when tackling the big topic of providing tools for working with XML. It’s not questions like “How do we design an API that avoids the namespace problems the DOM exposes?” It starts at a higher level:

How do we allow the user to in the easiest and most efficient way go from XML to the data model of choice?

Ideally, the user shouldn’t care about details such as namespaces and parent/child relationships. If the API has to push that onto the user, it’s a necessary evil. It’s again about not getting far away from the ideal data model. The idea is in general already practiced when it comes to the most primitive part: serialization. It’s widely agreed that a specialized mechanism (a class) should take care of the serialization step.

Let’s try to apply this buzzword talk to Patternist and Qt. A QAbstractItemModel is typically used to represent the data, since the data is practically separated from its presentation, with the model/view concept. The user wants to read an XML file, and produce a QAbstractItemModel instance.

Patternist, just as Saxon, is designed to be able to send its output to different destinations. It’s not constrained to producing text (XML), SAX events or a DOM tree; it just uses a callback. And that callback could just build an item model. It should be possible to write that glue code such that it works for arbitrary models.[1] With such a mechanism, one would only have to write an XQuery query or XSL-T stylesheet that defines a mapping between the XML and the item model, in order to do up and down conversions.
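A minimal sketch of such a callback, in plain C++ and under loud assumptions: Receiver, Item and ItemTreeBuilder are hypothetical names, the three-event interface is far simpler than what Patternist actually uses, and the tree of Item values merely stands in for a real QAbstractItemModel. It only illustrates that the destination of the output is decided by whoever implements the callback.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event interface an engine would send its output to.
class Receiver {
public:
    virtual ~Receiver() = default;
    virtual void startElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
    virtual void endElement() = 0;
};

// Stand-in for an item in an application's model.
struct Item {
    std::string label;
    std::string text;
    std::vector<std::unique_ptr<Item>> children;
};

// Builds a tree of items directly from the events; no XML text, SAX layer
// or DOM tree is created in between.
class ItemTreeBuilder : public Receiver {
public:
    ItemTreeBuilder() { open_.push_back(&root_); }
    ItemTreeBuilder(const ItemTreeBuilder &) = delete;            // open_ points into root_
    ItemTreeBuilder &operator=(const ItemTreeBuilder &) = delete;

    void startElement(const std::string &name) override {
        std::unique_ptr<Item> item(new Item);
        item->label = name;
        Item *raw = item.get();
        open_.back()->children.push_back(std::move(item));
        open_.push_back(raw); // the new item becomes the current parent
    }

    void characters(const std::string &text) override {
        open_.back()->text += text;
    }

    void endElement() override {
        assert(open_.size() > 1); // every end must match an earlier start
        open_.pop_back();
    }

    // Invisible root item holding the top-level items.
    const Item &root() const { return root_; }

private:
    Item root_;
    std::vector<Item *> open_; // stack of currently open items
};
```

With a receiver like this, the query or stylesheet defines which elements come out and in what shape, and the builder turns them into whatever the application considers its data model.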

Using Patternist to directly create item models might not be the way to go. But I do think one should concentrate on what the user wants to achieve instead of trying to fix the current tools (perhaps it doesn’t matter that the hammer is broken, because in either case a screwdriver should be used). And amongst what the user wants to do, I believe converting between XML and the data model of choice is a very common scenario.

1. In general, it all seems interesting to write “interactive” output receivers and trees with Qt. One would be able to write queries/stylesheets that generate widgets, write queries over the file system or QObject tree, etc. But that’s another topic.



© 2008 Nokia Corporation and/or its subsidiaries. Nokia, Qt and their respective logos are trademarks of Nokia Corporation in Finland and/or other countries worldwide.
All other trademarks are property of their respective owners.