posted on Monday, May 30, 2005 11:51 AM by johnwood

Ramblings on Data Layers and Relational Database Shortcomings

Sahil recently posted his thoughts on web services and enterprise system design, and in his usual blunt (yet effective and entertaining) manner questioned the need for business layers that simply delegate to a data layer without any extra work. It got me thinking. I get asked this question too, and my usual answer is usually three fold:

1. "because you'll need to isolate the specifics of the database when you come to change the data model later on" -- I mean everyone has to change their data model at some point, just because the next version needs some new features that require table changes. It's to save time in the long run.

2. In many cases your most important asset is your business logic. You don't want your business logic written in terms of the specifics of your database, because that reduces the value of your business logic and makes it harder to be maintained by the people who know more about the business than the database.

3. Commingling data access code and business logic also makes it harder to have your business logic work on more than one data source, or new data sources in the future. The general rule of thumb is that the data layer *finds* the data, and the business layer *uses* the data. Maybe you'll need to integrate your application with another that passes data in through XML files, how will your SQL-ridden business layer handle that? Maintaining the separation between the data finder and the data consumer ensures your business logic remains agile enough to work with multiple data sources.

Of course for the first point you could say the same thing about the business layer. This too will change going forward and you'll still have to change method parameters, objects etc. for the next version.

The second point deserves some discussion as it makes more sense. But I think it underlines one very simple fact: Databases are too hard. You don't want your business logic and data manipulation code together because they're seemingly incompatible. The data access code isn't just accessing data, it's decoding data. The data layer tries to make sense of the data in the database - but if the database could model data the way we use it, life wouldn't be so complicated in the first place!

Relational databases are very powerful, but they're also very distant from how we interact with that data once we get hold of it. Relational databases were first developed in the 70s, and were in use long before modern object orientated programming languages, and it shows. In fact the funny thing is, commercial relational databases today aren't really all that relational at all -- it's not like relations are a central feature of relational databases - the ability to relate data is central, but not the relationship itself. When I approach the data I'm often faced with having to find the views that give me the model I'm looking for, or recreate the relationships all over again in a new select statement. The list of tables resembles a flat-file. There's no central relational data model as your first point of access.

Also in the majority of cases (in databases in which I've worked on), there is only one primary relationship between two entities. By primary I mean the relationship you would use 90% of the time. For example, tasks and sub-tasks. Folders and files. Projects and tasks. Cars and parts. Patients and their medical history records. Customers and orders. Stocks and their historic prices. Sure you will likely need to relate this data differently at some point, but wouldn't it be useful to at least be given the primary relationships for free without having to do any joins at all? The way relational databases work today, you really have to dig around to find those relationships - even when they're obvious.

The primary relationship also helps identify the meaning of the data. For example, an 'orders' table doesn't tell you much other than the fact it probably contains orders for something. Opening up the table and seeing all those foreign keys probably won't help you much either. But when you can see a primary relationship between customers and orders, such that when you look at a customer it can take you straight to an order, it tells you that these are orders placed by your customers, as opposed to purchase req's for example. It adds meaning to the table that isn't otherwise obvious.

I think it's these failings of relational databases that are central to how unmanageable databases can become, and how complex data layers can develop.

Hierarchical data is a great way to present a primary relationship between entities. Hierarchies provide a very clear relationship -- child and parent -- between the different entities in your database. Relational databases don't make it easy to view your data hierarchically. I think this is why XML databases are coming into fashion. XML databases already have a hierarchical structure in place, and the hierarchy represents the primary relationship between entities. All major relational database vendors seem to be rushing to integrate both XML and semi-structured data access into their databases.

Another good example of hierarchical data is an object database. Object databases usually just persist the data in your objects transparently. Have you ever heard of a data layer in a system using an object database? You don't really need it because the data representation on disk is transparent: the data *is* the business objects. In object databases you maintain a separation between the specific data source and the business logic through interfaces, and this is usually enough of a separation to not require a data layer.

Exposing primary relationships is one of the goals of a good data layer. Yet no matter how good the data layer is, someone usually ends up bypassing it, especially in smaller shops where there's no real DBA. Reports are a classic example of this. How many times have systems spent months designing a great data layer and business layer and then developed their reports as stored procedures and crystal reports files? It's the easiest option and I see it all the time.

Yet if the database was structured and exposed in a more meaningful way not to require a data layer, directly accessing the data might not be such a bad thing. If the query language could return data in a format that fits in with modern programming language structures, then having the business layer talk straight to the query language might not be such a bad thing.

XML databases, XML access layers and the XML query language, XQuery, hold a lot of promise for revolutionizing the way we interact with persistent data. SQLXML, SQL Server 2005's XML integration, Oracle's 10g v2 and IBM's Viper efforts to expose relational data as hierarchical XML are great steps towards solving these problems and taking the focus away from complex data layers.

This brings me on to the third point -- data source dependency. Common wisdom is that your business logic should never request the data itself, it should always be provided the data, so as to remove any dependency on that particular source. Once it has the data, it doesn't matter where it came from so long as it's in a format it can understand -- be that a dataset, collection, XML or whatever. This is the foundation of what's become known as "contract first" programming.

A part of this problem is rooted in the fact that database query languages are so different to one another. Once a query language is adopted, you're bound to a particular data platform. Business logic uses OOP techniques to query data in an object model, the data layer uses SQL to query data in the database, and the user interface calls the business logic methods to query data. If they all used the same query language, we might not need quite so much separation and layering.

In fact, with the same query language, the only extra layer we would require would be a translator that can translate one model to another. And this translation could be accomplished with a declarative language like XSLT, not a general purpose programming language. If the data source was configurable externally and used the same query language, then switching data sources or models wouldn't require any code changes regardless of how many layers there were.

Perhaps W3C's XQuery or language innovations such as Microsoft Research's C-Omega project hold the key to solving a part of this problem, providing one ubiquitous query syntax for all data sources and all layers. Why should my UI be limited to a very restrictive business logic API for querying data? Why shouldn't it be able to run a query on the business layer? After all, objects are data too.

Anyway these are all just ongoing thoughts and a work-in-progress, as such there's probably lots of holes in my reasoning. If you have anything to add on the subject I'd be more than open to hearing your opinions.

Comments