DataRowCollection, Red-Black Tree and a simple explanation.
Last week I posted about
the DataRowCollection and the arraylist that supports it was replaced in Beta2. At that time I didn't know what the implementation was going to be. Turns out, an internal implementation of a red-black tree is the current method used to replace the arraylist. I was also told this could change at any time.
For those of you who don't know, a red-black tree is a type of binary tree with a height of no more than 2log(n+1). Standard operations take O log(n) time. Each node of the tree contains fields for color, key, parent, left, right and rank. The biggest rank of a tree occurs at the root and goes down to zero, so the rank is the height of a node. Two consecutive nodes differ in rank by either zero or one. Every leaf node is black (also a NULL pointer). If a node is red, both its children must be black. Most imporantly, to ensure the balance of the tree, each path from the root node to a leaf contains the exact number of black nodes.
A red-black tree has great performance, and will not suffer the linear performance degredation that a standard binary tree will encounter, due to the number of elements in the standard binary tree. Red-black trees are always balanced, so searching, inserting and deleting all are guaranteed to be O log(n). Re-balancing a tree can be a bit costly, but that still occurs in O log(n) time.
So what does this mean? It means the performance of deletions in a DataRowCollection will be substantially faster, if the DataWorks team continues to use their internal implementation of the red-black tree instead of an arraylist.