Joe Wood's Blog

Technology




Saturday, January 08, 2005 - Posts

Datasets - The Good and the Bad

The DataSet gives the developer a lot of functionality: transactional modifications, simple ERD-style design of application state, ad-hoc queries and temporary indexes.  It also promotes some good design practices.  Used properly, the DataSet encourages a developer to keep the application state together in one place.  From an implementation perspective there are some trade-offs.  These are deficiencies of the FX 1.1 implementation of DataSets and of using XML Schemas with the typed dataset generator.  The specific issues that hampered a recent project of mine include:

  • Scalability - a DataTable can hold only one index - the primary index.  All other <xs:key> definitions in the XML Schema are ignored by the typed dataset generator.  Running a Select on the table will scan the table, but it also creates a temporary index based on the criteria that you supplied, so subsequent selects on those criteria are very fast.  This is flexible and works well in most static data environments.  The problem is that if the table is modified or added to, all the temporary indexes are removed and you start again from scratch.
    This works fine if you are working with a database: reading once into the DataSet, calculating your results, then writing the results back.  This accounts for most DataSet uses, but in real-time, low-latency applications it isn't acceptable.  If you have a table of 30,000 rows changing in real time, querying the table on anything other than the primary key becomes impossibly slow.
  • Multi-threaded access - the DataSet is not thread-safe.  The DataTable, DataRow and DataSet all need to be wrapped with thread-safe functions if multi-threaded access is required.  It's a shame the designers didn't provide a static .Synchronized method to return a wrapper automatically.
  • Missing Enumerated Types from the typed DataSet - enumerations in the schema are just implemented as strings.  It would be nice if the values were constrained and exposed as a proper enumerated type by the typed wrapper.  As it stands, there are no runtime constraints on these values at all.
  • Missing Inheritance - complex type extensions are ignored in the generated code.  It would be nice to map them to proper wrapper classes mirroring the derived types in the schema, so that a type B extending type A in the schema would produce generated classes B and A with the same relationship.
  • Missing One-to-one mappings - there may be workarounds, but why can't the typed dataset generator generate the relationship properties for one-to-one relationships?  If an element in the XML Schema has a child element where maxOccurs=1, the typed dataset generator shouldn't generate a method returning an array of these types.  There will never be more than one - it should know this and generate the code accordingly.
  • Missing Comments - why can't the typed dataset generator take the <xs:annotation> XML tags from the element definitions and generate XML-doc style comments?  Because it doesn't, the compiler emits a lot of missing-documentation warnings when you're trying to generate your documentation.
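
To make the scalability point concrete, here is a minimal sketch of the alternative to throwing indexes away: a secondary index that is updated incrementally on every write.  It is written in Python as a language-neutral illustration and is not ADO.NET code; the IndexedTable class and its upsert and select methods are hypothetical names invented for this example.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory table with a primary key and one maintained secondary index."""

    def __init__(self, key_field, indexed_field):
        self.key_field = key_field
        self.indexed_field = indexed_field
        self.rows = {}                   # primary key -> row
        self.index = defaultdict(set)    # indexed value -> set of primary keys

    def upsert(self, row):
        """Insert or replace a row, touching only the affected index entries."""
        key = row[self.key_field]
        old = self.rows.get(key)
        if old is not None:
            # Remove just the stale entry rather than discarding the whole index.
            old_value = old[self.indexed_field]
            self.index[old_value].discard(key)
            if not self.index[old_value]:
                del self.index[old_value]
        self.rows[key] = dict(row)
        self.index[row[self.indexed_field]].add(key)

    def select(self, value):
        """Return rows whose indexed field equals value, without scanning the table."""
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]
```

With this shape, a query on the non-primary field stays a dictionary lookup even while rows are changing, at the cost of a little extra work on each write.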

Workarounds do exist for all these shortcomings.  The typed dataset generator itself is just an API call, and the CodeDOM structure it generates can be modified and enriched by referring back to the original schema.  My point here is not to complain about the DataSet class - more to emphasize that this approach is actually a massive step forward.
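
As an illustration of one such workaround, the thread-safety gap can be closed by a wrapper that takes a lock around every operation, in the spirit of the .Synchronized method wished for above.  This is only a sketch in Python, not ADO.NET code; SynchronizedTable and its methods are hypothetical names, and a real wrapper would need to cover the DataSet, DataTable and DataRow together.

```python
import threading


class SynchronizedTable:
    """Wraps a dict-based table so that every operation holds one shared lock."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.RLock()

    def set_row(self, key, row):
        with self._lock:
            self._rows[key] = dict(row)

    def get_row(self, key):
        # Return a copy so callers never hold a reference to shared state.
        with self._lock:
            row = self._rows.get(key)
            return dict(row) if row is not None else None

    def update(self, key, fn):
        """Apply fn to the current row and store the result as one atomic step."""
        with self._lock:
            self._rows[key] = fn(self._rows.get(key))
```

The update method matters most here: a separate get_row followed by set_row would let another thread change the row in between, so read-modify-write has to happen under a single lock acquisition.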

The MVC design pattern and its derivatives combine other design patterns.  A general critique of design patterns (from the Wikipedia page) is that the patterns themselves encourage a navigational style of design rather than a relational one.  Using the DataSet as the model in the MVC design pattern addresses this nicely.  We now have relational-style data modeling behind one of the most powerful design patterns.

The data model is central to the application, and forcing the developer to design it explicitly up-front, just like a database, is a huge step in the right direction.  After all, state in any application is a replication of state elsewhere - maybe in a database, in a file, on a form or even in the user's head.  Data consistency, integrity and correctness are fundamentally important to the process of the application.  How we model this state dictates a large portion of the interfaces between logical units of an application.  Exposing this model in such an explicit way allows us to compare data models, map data structures and integrate functionality.

XML Schema isn't a perfect match for this modeling, but it brings with it a lot of the investment made in web-service-style SOA.  The DataSet class in .NET is where these two worlds meet, being populated by relational data but potentially described by message/document designs.  It will be interesting to watch how ADO.NET evolves in the future.

posted Saturday, January 08, 2005 8:22 PM by joewood with 1 Comments



