Tuesday, June 28, 2005 - Posts

Reflection changes in .Net 2.0 - Order of members

Something I think anyone making extensive use of reflection in .Net should know…

I use reflection for all kinds of things in my applications. In one instance, I keep a cache of all the PropertyInfo and FieldInfo members of my classes so that I can set the properties by numeric index later on. However, I have since found that this feature breaks when I run the same code in .Net 2.0.

It turns out that it's to do with the *order* in which fields and properties are returned in .Net. In previous versions you could pretty much rely on the order of members returned from calls such as GetFields: it would match the order in which the members are defined in the source code.

In .Net 2.0, however, that is no longer the case. If you rely on the order of members being returned, your code will break in .Net 2.0, and there's nothing you can do to get Microsoft to fix it. Not only is the order of returned members different, it can change while the application is running!
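A minimal sketch of one possible workaround, assuming all you need is an index that stays stable and does not have to match the source-code order: sort the members by name once and build the cache from the sorted array. The `Customer` type and the `FieldIndexCache` helper are hypothetical names made up for this illustration, not part of my application or of the framework.

```csharp
using System;
using System.Reflection;

// Hypothetical example type, used only for illustration.
public class Customer
{
    public string Name;
    public int Age;
    public string City;
}

public static class FieldIndexCache
{
    // Returns the public instance fields in an order that depends only on
    // the field names, not on the order reflection happens to return them.
    public static FieldInfo[] GetOrderedFields(Type type)
    {
        FieldInfo[] fields = type.GetFields(BindingFlags.Public | BindingFlags.Instance);
        Array.Sort(fields, delegate(FieldInfo a, FieldInfo b)
        {
            return string.CompareOrdinal(a.Name, b.Name);
        });
        return fields;
    }
}

public static class ReflectionDemo
{
    public static void Main()
    {
        FieldInfo[] fields = FieldIndexCache.GetOrderedFields(typeof(Customer));
        Customer c = new Customer();

        // After the ordinal sort: index 0 is Age, 1 is City, 2 is Name.
        fields[2].SetValue(c, "Alice");
        fields[0].SetValue(c, 30);

        Console.WriteLine(c.Name + " " + c.Age); // prints: Alice 30
    }
}
```

The indices here are alphabetical rather than declaration order, so any numeric index stored elsewhere would have to be rebuilt against the sorted array. The sketch also assumes field names are unique within the type, which may not hold if a derived class hides an inherited public field.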

I got a nice, lengthy, detailed explanation from a guy called Jesse Kaplan at Microsoft. I've pasted it below so you can better understand why this change was made:

Your application seems to be hitting a change we made in how items are returned back from reflection. In previous versions of the .Net FX we always chose the same pattern to return items from reflection, unfortunately the problem is that between versions of the FX the pattern remained the same but the order things were returned still changed. For example, if an overload was added to a parent type, what used to be the 6th method in the type was no longer the 6th method and applications would fail.

This type of problem fell into a general bucket we call "deterministic non-determinism." It basically means that for a given platform/os/application layout/FX version the behavior would be deterministic, but that the behavior was not something that could be preserved between releases. This type of issue makes it very hard to get predictable behavior out of your application. What we’ve done in 2.0 is to find cases of this type of behavior and if possible make it completely deterministic to remove any ambiguities or, if completely deterministic behavior was not possible, we would opt for making the behavior non-deterministic enough that it would be hard to write a new application that relied on this behavior.

When it came to the order items were being returned from reflection we couldn’t provide a model where a given field would always be in a specific position in the list, so we moved to a model where the return order was somewhat randomized over the course of the application and thus make it more difficult to take dependencies on this.

In the past we haven’t done such a good job documenting best practices for avoiding these types of issues, but by the time we release the final version of the NetFX 2.0 we will have more documentation that will help you avoid these types of problems in the future.

I'm not 100% sure I accept his explanation. The position of members between sessions is clearly not deterministic or constant, so it isn't something you should rely on, but I can't see what's wrong with relying on the order of the members during the session. This used to work in 1.1 and doesn't in 2.0, it seems, simply because Microsoft don't think people should be relying on the position of members remaining constant between sessions. The solution implemented doesn't just make it difficult to unknowingly rely on the position of the members, it makes it impossible to determine the order of the members -- something which *is* constant and deterministic while the application is running.

Alas I guess it's too late to make any changes...

Please visit my blog at DevAuthority to leave comments!

XQuery: A Modern Query Language

For the past few weeks I've been pretty intensely learning and applying my knowledge of XQuery, a modern query language, standardized by the W3C, that is rapidly gaining momentum in the industry - not least because of its integration into SQL Server 2005 and other top database products, and also its datasource-independent design.

XQuery is mostly used to query XML data, for times when XPath simply isn't enough. Think ordering and nested "for" loops. However, it was designed to achieve much more than that. Because of its open design and suitability for integrating with semi-structured data, many implementors are currently using it as the centerpiece of their enterprise data integration strategies -- letting you perform queries that join Excel data with DB2 data with SQL data with SOAP results with your favorite RSS feed, for example.

One thing that sets XQuery apart from other query languages (e.g. T-SQL) and general-purpose languages is that it's declarative rather than procedural. However, I find that to be the most confusing aspect of the language, and something that is very difficult to explain.

What declarative means to a developer is that the query has to be written to conform to a specific structure. And that's what newcomers find challenging about the language. Compare this to a traditional procedural language where each line must follow a given syntax, but the procedures themselves can contain any number or form of line you choose - and this is what defines the unique execution of your program. In T-SQL, for example, when you bring a particular query to mind you often plan how that will execute by thinking in terms of steps, and each step can be a little query in its own right, a while loop, if statement, letting you have complete control over the flow of the query. You can even store temporary results and then use them again later.

For declarative languages things are different. You instead have to think of the query as just one unit, a bit like a tree, and the design of your query is encoded into nodes of that tree. You have no control over the flow as such, instead you define the relationships between the different data sources, the filters you want to apply and the data that you want to retrieve. The actual flow is determined by the query processor, after analyzing the intent of your query, applying various optimizations and constructing a plan. The tree metaphor is used because a data source can itself be the result of a query, and so the recursive nature of the structure is a little like a tree and its branches.
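The contrast can be sketched outside XQuery too. The following is only a rough analogy in C#, assuming a hypothetical `Book` type and made-up data: the first method spells out each step and keeps its own temporary results, while the second states the source, the filter, the ordering and the result as one expression, which is loosely the shape of an XQuery FLWOR expression (for, where, order by, return).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data type, used only for illustration.
public class Book
{
    public string Title;
    public int Price;

    public Book(string title, int price)
    {
        Title = title;
        Price = price;
    }
}

public static class QueryStyles
{
    // Procedural: we control the flow and store intermediate results ourselves.
    public static List<string> Procedural(Book[] books)
    {
        List<Book> temp = new List<Book>();
        foreach (Book b in books)
        {
            if (b.Price > 10) temp.Add(b);
        }
        temp.Sort((x, y) => x.Price.CompareTo(y.Price));

        List<string> titles = new List<string>();
        foreach (Book b in temp) titles.Add(b.Title);
        return titles;
    }

    // Declarative: a single expression describes what we want, not the steps.
    public static List<string> Declarative(Book[] books)
    {
        return (from b in books
                where b.Price > 10
                orderby b.Price
                select b.Title).ToList();
    }
}

public static class QueryDemo
{
    public static void Main()
    {
        Book[] books = { new Book("A", 65), new Book("B", 8), new Book("C", 30) };

        Console.WriteLine(string.Join(", ", QueryStyles.Procedural(books).ToArray()));  // prints: C, A
        Console.WriteLine(string.Join(", ", QueryStyles.Declarative(books).ToArray())); // prints: C, A
    }
}
```

Both methods return the same titles for this data. The difference is that the second leaves the choice of execution strategy to the library, in the same way an XQuery processor decides on the plan.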

SQL, too, is declarative (the select, joins and sub-queries are similar in structure to XQuery), but alas most queries are not written in SQL per se, but are written in T-SQL, which is a procedural language that encompasses declarative SQL queries. The reason most people fall back on T-SQL for their queries is that SQL is quite primitive, and anything more complex than a handful of joins can easily become unmanageable. You have to remember that SQL dates back to the 1970s: it's really showing its age, and it's not surprising that it looks a little outdated. What's more, I find the same complex queries in XQuery can actually look quite elegant compared to their equivalent T-SQL implementations.

The fact that XQuery is declarative gives it some big advantages over T-SQL. It allows database engines to gain a far better understanding of the intent of the query and therefore apply better optimizations when constructing the query plan. Because each part of the XQuery expression is a well-defined part of the whole, the query can be executed with minimal overhead, without creating temporary tables to store results for example. It's also more difficult to implement queries inefficiently, because you're granted less freedom to be roundabout. I find that a declarative approach to queries also helps readability and helps maintain a sensible structure to the queries.

Will XQuery one day be used in preference to SQL for querying data? Perhaps, who knows. With SQL Server 2005 there's a good reason to learn XQuery today so that you can make use of its new XML features. And I'm sure we'll see more XQuery implementations popping up in the enterprise systems market in the near future.

If you're interested in learning more about XQuery you can look at the W3C's use-cases (which I find to be a long, but good starting point for hands-on programmers), or you can look at this introduction at DevX. You can also look out for some future postings I'll be running on the subject.
