Posted on Tuesday, June 28, 2005 8:48 PM by johnwood
XQuery: A Modern Query Language
For the past few weeks I've been intensively learning and applying XQuery, a modern query language being standardized by the W3C. It is rapidly gaining momentum in the industry, not least because of its integration into SQL Server 2005 and other major database products, and because of its data-source-independent design.
XQuery is mostly used to query XML data in situations where XPath alone isn't enough: think sorting results, or nested "for" loops. However, it was designed to achieve much more than that. Because of its open design and its suitability for semi-structured data, many implementers are making it the centerpiece of their enterprise data integration strategies, letting you write queries that join Excel data with DB2 data, SQL data, SOAP results, and your favorite RSS feed, for example.
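To make the "ordering and nested loops" point a little more concrete, here is a rough sketch in Python (standing in for XQuery, not replacing it) of what a simple FLWOR query (for, let, where, order by, return) does that a plain XPath 1.0 expression cannot: select a set of nodes, filter them, and then sort the results. The catalog document, its element names, and the function cheap_titles_sorted are hypothetical, made up for illustration, and the XQuery in the comment is illustrative rather than tested against a particular engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are made up for illustration.
CATALOG = """<catalog>
  <book><title>Zen of Queries</title><price>25</price></book>
  <book><title>Advanced XML</title><price>45</price></book>
  <book><title>Learning XQuery</title><price>30</price></book>
</catalog>"""

# Roughly what this XQuery FLWOR expression asks for (illustrative only):
#   for $b in doc("catalog.xml")/catalog/book
#   where $b/price < 40
#   order by $b/title
#   return $b/title
def cheap_titles_sorted(xml_text, max_price):
    root = ET.fromstring(xml_text)
    titles = [
        book.findtext("title")                         # return $b/title (as text)
        for book in root.findall("book")               # for $b in /catalog/book
        if float(book.findtext("price")) < max_price   # where $b/price < max_price
    ]
    # Sorting the returned titles stands in for "order by $b/title".
    return sorted(titles)


print(cheap_titles_sorted(CATALOG, 40))  # ['Learning XQuery', 'Zen of Queries']
```

The filtering step alone could be written as an XPath predicate, but the sort could not; that is the kind of gap XQuery fills.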
One thing that sets XQuery apart from other query languages (e.g., T-SQL) and from general-purpose languages is the fact that it's declarative rather than procedural. It is also, I find, the most confusing aspect of the language, and one that is very difficult to explain.
In practice, declarative means that you describe the result you want rather than the steps for computing it, and so the query has to be written to conform to a specific structure. That is what newcomers find challenging about the language. Compare this with a traditional procedural language, where each statement must follow a given syntax but a procedure can contain any number and combination of statements you choose; that sequence of statements is what defines how your program executes. In T-SQL, for example, when you have a particular query in mind you often plan how it will execute by thinking in terms of steps, where each step can be a small query in its own right, a WHILE loop, or an IF statement, giving you complete control over the flow of the query. You can even store intermediate results in temporary tables and use them again later.
For declarative languages things are different. Instead, you have to think of the query as a single unit, a bit like a tree, with the design of your query encoded in the nodes of that tree. You have no control over the flow as such; instead, you define the relationships between the different data sources, the filters you want to apply, and the data you want to retrieve. The actual flow is determined by the query processor, which analyzes the intent of your query, applies various optimizations, and constructs a plan. The tree metaphor fits because a data source can itself be the result of another query, so the structure nests recursively, like a tree and its branches.
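The difference is easier to see side by side. Below is a minimal Python sketch (Python is only standing in here; it is not a query language) of the same question asked two ways: which UK customers placed orders of at least a given total. The procedural version spells out every step and keeps an intermediate list, much like a T-SQL temporary table. The declarative-style version is a single expression whose inner query acts as the data source for the outer one, like a branch of the tree. The customer and order data and the function names are hypothetical. Note that, unlike a real XQuery or SQL processor, Python evaluates the expression in the order it is written and performs no optimization; only the shape of the query is being illustrated.

```python
# Hypothetical sample data standing in for two separate data sources.
CUSTOMERS = [
    {"id": 1, "name": "Alice", "country": "UK"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Carol", "country": "UK"},
]
ORDERS = [
    {"customer_id": 1, "total": 120},
    {"customer_id": 2, "total": 80},
    {"customer_id": 3, "total": 200},
    {"customer_id": 1, "total": 40},
]


def big_uk_orders_procedural(min_total):
    """Step by step: we decide the flow and keep intermediate results."""
    # Step 1: materialize an intermediate result, like a temporary table.
    uk_customers = []
    for c in CUSTOMERS:
        if c["country"] == "UK":
            uk_customers.append(c)
    # Step 2: walk the intermediate result and match it against the orders.
    result = []
    for c in uk_customers:
        for o in ORDERS:
            if o["customer_id"] == c["id"] and o["total"] >= min_total:
                result.append((c["name"], o["total"]))
    # Step 3: sort as a separate, explicit step.
    result.sort()
    return result


def big_uk_orders_declarative(min_total):
    """One expression describing the result; the inner query is a data source."""
    return sorted(
        (c["name"], o["total"])
        for c in (cust for cust in CUSTOMERS if cust["country"] == "UK")  # nested query
        for o in ORDERS
        if o["customer_id"] == c["id"] and o["total"] >= min_total
    )


print(big_uk_orders_procedural(100))   # [('Alice', 120), ('Carol', 200)]
print(big_uk_orders_declarative(100))  # [('Alice', 120), ('Carol', 200)]
```

Both return the same answer; the point is that the second version states only what is wanted, which is exactly the information a query processor needs in order to choose its own plan.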
SQL, too, is declarative (its SELECT statements, joins, and subqueries are similar in structure to XQuery), but, alas, most queries are not written in SQL per se; they are written in T-SQL, a procedural language that embeds declarative SQL statements. Most people fall back on T-SQL because SQL on its own is quite primitive, and anything more complex than a handful of joins can easily become unmanageable. Remember that SQL dates back to the 1970s, and it is showing its age. What's more, I find that the same complex queries can look quite elegant in XQuery compared with their T-SQL equivalents.
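As a small illustration of plain declarative SQL, here is a sketch using Python's built-in sqlite3 module with an in-memory SQLite database (not SQL Server, and not T-SQL). The tables, data, and the function big_uk_orders_sql are hypothetical. The whole question is one SELECT statement: the derived table uk is itself a query used as a data source for the join, and SQLite's query planner, not the order in which the statement is written, decides how the join is actually executed.

```python
import sqlite3


def big_uk_orders_sql(min_total):
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema and sample rows, for illustration only.
        conn.executescript("""
            CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
            CREATE TABLE orders (customer_id INTEGER, total INTEGER);
            INSERT INTO customers VALUES (1, 'Alice', 'UK'), (2, 'Bob', 'US'), (3, 'Carol', 'UK');
            INSERT INTO orders VALUES (1, 120), (2, 80), (3, 200), (1, 40);
        """)
        # One declarative statement: a subquery (derived table) feeds the join,
        # and the engine chooses the execution plan.
        return conn.execute("""
            SELECT uk.name, o.total
            FROM (SELECT id, name FROM customers WHERE country = 'UK') AS uk
            JOIN orders AS o ON o.customer_id = uk.id
            WHERE o.total >= ?
            ORDER BY uk.name, o.total
        """, (min_total,)).fetchall()
    finally:
        conn.close()


print(big_uk_orders_sql(100))  # [('Alice', 120), ('Carol', 200)]
```

A procedural T-SQL version of the same query might instead fill a temporary table with the UK customers and then loop over it, which is the style the paragraph above describes.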
The fact that XQuery is declarative gives it some big advantages over T-SQL. It gives database engines a far better understanding of the intent of the query, and therefore more room to optimize when constructing the query plan. Because each part of an XQuery expression is a well-defined part of the whole, the query can be executed with minimal overhead, without, for example, having to create temporary tables to hold intermediate results. It's also harder to write inefficient queries, because you have less freedom to be roundabout. I also find that a declarative approach improves readability and keeps queries sensibly structured.
Will XQuery one day be used in preference to SQL for querying data? Perhaps; who knows? With SQL Server 2005 there's a good reason to learn XQuery today: it lets you make use of the product's new XML features. And I'm sure we'll see more XQuery implementations popping up in the enterprise systems market in the near future.
If you're interested in learning more about XQuery, you can look at the W3C's use cases (which I find to be a long but good starting point for hands-on programmers), or you can look at this introduction at DevX. You can also look out for future posts I'll be writing on the subject.
Please visit my blog at DevAuthority to leave comments!