March 2004 - Posts

Measuring Performance

In the next few weeks I will be presenting a number of sessions on code optimization in the .NET environment. And one of the key things I will be iterating is the selection of algorithms and of course benchmarking before performing optimizations. So here is what I will be saying on one of my favorite topics ‘Algorithm Selection’.

When evaluating the efficiency of an algorithm, there are a number of factors that affect the timings obtained for an iteration of the algorithm. For this reason the efficiency of an algorithm is not measured in execution time, but on an equation that is a function of the size of the data processed by the algorithm. For example, when searching a document for the occurrences of a specific word, one algorithm would be to search character for character until there is a match on the first letter. Once the first letter is found you can proceed to compare subsequent characters against the remainder of the search string to determine if there is a word match. From this we see that if the document has N characters the algorithm execution time is a function of N. This relationship is represented as O(N). Obviously there are other factors that will affect the execution time, for example the more matches there are of the first letter, the more often the algorithm will perform an alternate sequence of tests. These factors are a function of the nature of the input data and not the algorithm. For that reason are ignored when determining the relationship between the algorithm and the input data, focusing rather on the size of the input data. This approach relates the efficiency of an algorithm in the general case. In the case where the worst-case scenario is likely more prevalent it is worth understanding how the algorithm performs in this case and deciding if this is the best algorithm for the expected data.

As another example, the well-known bubble sort routine can be evaluated. The bubble sort routine iterates the data once for each item in the data, comparing and swapping neighboring items if required. For each iteration one less element is evaluated. The bounding function here is the comparison that is performed for each item. From this we can see that the bubble sort requires N(N-1)/2 or (N2-N)/2 comparisons, where N is the number of items to be sorted. This gives us what is known as the performance equation O((N2-N)/2). Using the performance equation we can calculate that for 10 items 45 comparisons are performed and 100 items will require 4950 comparisons. Looking at the growth in the number of comparisons as the number of items increases clearly the bubble sort is useless for large datasets. Typically algorithms whose performance is N2 or worse are considered unusable. By convention O notation does not place significance on any constants other than those that affect the order of the equation, for this reason the standard representation of the bubble sort algorithm would be O(N2-N). While this does not provide the true number of comparisons performed it clearly conveys the order of the algorithm performance equation, which is what has the greatest impact on performance. I like to call this a performance factor or indicator.

To contrast the above with something more usable, we will look at the performance of the ever-popular quick sort algorithm. In the general case the quick sort has a performance factor of O(N log2 N) for the general case and a worst case performance of O(N2) if the dataset is already in sorted order. Fortunately most modern quick sort implementations guard against the worst cast scenario and reduce the likely hood of the algorithm degenerating into a O(N2) function.

When comparing two algorithms there performance equations are not the only factor to evaluate. While two algorithms might have the same order, it is possible that the work performed by one algorithm is more intensive than the other. In this case two algorithms might have the same order, but one is less process intensive than the other.

Open Source saves my day

This weekend was the big re-install. Among other things, my aging FreeBSD server was replaced by a Windows server which took me deeper into the world of Windows DNS/DHCP and Routing. For the time being I am left without a FreeBSD machine, and that left me without a HTTP proxy. Unfortunately I do not have ISA server, a quick Google and I found a native port of Squid for NT4/2000/XP/2003. You just got to love the open source community. I wish I had the time/courage to contribute in a significant way. Now my next step is to install a Linux box to further explore MONO and dotGNU.

P/Invoke Definitions and Samples

I have been thinking about starting an on-line reference for P/Invoke definitions. Each definition will be accompanied by an example and the .NET managed equivalent if available. Before undertaking this task, I wondered if anyone out there already had something similar. I know that this information is available, but I am not aware of a single consolidated source. A project like this would rely on the inputs of the community to help insure the accuracy and quality of the information provided. Any ideas or suggestions would be appreciated.

Binary data from a structure

As a followup to my previous article Structure from binary data, this article looks at a few of the techniques a .NET developer can use to convert from a structure to a byte array. Please take a look at the artile in my Articles links under the title Binary data from a Structure.

Thumbs up for Oracle Query optimizer

While installing Oracle on my notebook for a .NET project, it occurred to me to perform the same series of test I did for my earlier post on optimizing database queries. Wow, was I surprised, it appears that the Oracle query optimizer is automatically performing the transformation of the disjunctive normal form to conjunctive normal form. This is a real performance gain while maintaining the simplicity of the query.

Of course I am by no means an Oracle expert so any comments or opinions would be greatly appreciated. Does anyone have access to Yukon, I would really like to know how they are dealing with this.

Re-throw InnerException and preserve the original StackTrace

One of the problems when throwing a new exception from with-in a catch handler is that the stack trace information captured for the initial exception is lost and a new stack trace is built from the point of the throw. Of course when throwing the new exception all that is required to circumvent this problem is to set the inner exception of the new exception to the initial exception that was caught by the catch handler.

However, on occasion, you want to throw the InnerException it self. Recently on the SADeveloper forum exactly this scenario arose. A poster on the forum was using reflection to invoke a method. The poster realized that exceptions thrown by methods invoked through reflection were wrapped in the InnerException of a TargetInvocationException. Now what the poster wanted to do was to catch the TargetInvocationException and then re-throw the InnerException. I assume the reason for this was that the caller of the method, which performed the reflection invoke, was not concerned with the current implementation detail and was only required to deal with the original exceptions. The only problem with this solution is that once the InnerException is rethrown, the stack trace no longer contained the original stack trace information.

Naturally curiosity got the better of me and I set out to see if I could find a solution. After all, this is what I enjoy most about my free time (between 2-3 in the morning), I get to do things I enjoy and that is learn how the internals work. On this occasion, I got lucky and the first idea I had workout for me. I new that the remoting infrastructure in fact did something very similar. When an exception is throw from the remoting server, the server side stack trace is preserved on the client side. This is done by setting the private member _remoteStackTraceString to the server side stack trace. Then when the StackTrace property of the exception is called, the _remoteStackTraceString is concatenated with the _stackTraceString and returned to the caller. So I figured that all that was required was the set the _remoteStackTraceString of the InnerException to the current StackTrace of the InnerException. Then when the InnerException is re thrown, the stack trace is replaced with the new stack trace, but the _remoteStackTraceString is preserved. And the end result is a complete stack trace as if the original exception being throw never even saw a TargetInvocationException. So the question is how do we go about manipulating the private _remoteStackTraceString, well you guessed it, reflection.

As usual, before I present the code for this I have to say, just because something works does not mean it is the right thing to do. But is was Fun... And now I know a little more about how the remoting mechanism and exceptions fit together.

public void test1()
{
 
// Throw an exception for testing purposes
 
throw new ArgumentException( "test1" ); 
}

void test2()
{
 
try
 

    MethodInfo mi =
typeof(Form1).GetMethod( "test1" );
    mi.Invoke(
this, null );
  }
 
catch( TargetInvocationException tiex )
  {
   
// NB: Error checking etc. excluded
   
// Get the _remoteStackTraceString of the Exception class
   
FieldInfo remoteStackTraceString = typeof(Exception).GetField("_remoteStackTraceString",
    BindingFlags.Instance | BindingFlags.NonPublic );
   
   
// Set the InnerException._remoteStackTraceString to the current InnerException.StackTrace
   
remoteStackTraceString.SetValue( tiex.InnerException,
    tiex.InnerException.StackTrace + Environment.NewLine );
   
   
// Throw the new exception
   
throw tiex.InnerException;
  }
}

...

// Test the effect of the above code by calling test2() and handling
// the exception that is thrown
try
{
  test2();
// Call the method that uses reflection to call another method
}
catch ( Exception ex )
{
  Console.WriteLine( ex.ToString() );
}

Thanks to my online buddy MaLio (I think this guy would have some interesting blogs!) for pointing me to the original question.