October 2005 - Posts

OT: vb:feeds

I wouldn't normally recommend subscribing to an aggregate feed, but I'm giving vbfeeds a try (via Bill Vaughn, a long time ago). The site states, on the "About" page:
vb:feeds is personal project by Serge Baranovsky. The purpose of this web site is to aggregate Visual Basic/.NET related posts on the Net while keeping unrelated posts filtered out. This site aggregates Visual Basic/.NET related web feeds from blogs, forums and other Visual Basic sites. Most technical weblogs have personal and social aspect to them. All posts collected on vb:feeds go through a review process before appearing on the site and its web feeds so only qualified posts are published here.
So far (1 week) and the site has lived up to its promise. Oh, and my name is on the site too!

OT: In a moment of anger...

...I broke my mouse. It happened when my boss and another staff member were standing in my office, and while they were talking I took a quick moment to recompile and run something - when that something didn't run, I slammed the mouse down in (mock?) frustration, and it broke.

In my defense, I was only mucking around! Normally I wouldn't so such a stupid thing.

But I did it nonetheless, and I'm particularly embarrassed. My boss is OK with it, and he's asked me to order a new mouse (I've gone back to using an old one), so the hunt for The New Mouse is on.

Crosstab Queries in SQL Server 2000

I have had to produce crosstab/pivot queries from SQL Server 2000 before, and it ain't pretty. Typically the data is stored in rows, and the data in one field needs to be turned into a column header. To cut straight to the chase, here's an article and stored procedure on SearchSQLServer.com that will do exactly that: A simple way to perform crosstab operations By Brian Walker

Brilliant! One note: you don't have to put this stored proc in master, it can go in any database.

Keep reading to find my slower, non-dynamic SQL way to do it. Here's some psuedo-code for when you know the columns that will be produced (an example is months of the year, although in this fake code I'm only producing two "pivot" columns):

--this is fake code, it does not work!
SELECT BaseColumn, SUM(X), SUM(Y)
FROM (
      SELECT BaseColumn, 
             IF ColumnHeader = x THEN FigureToBePivoted AS X, 
             IF ColumnHeader = y THEN FigureToBePivoted AS Y
      FROM (
            SELECT BaseColumn, ColumnHeader, FigureToBePivoted
           )
     )
GROUP BY BaseColumn

And here's an example that works in Northwind, with months of the year across the top:

--the outer-most query SUMs the OrderTotal for each month and groups by the ProductName
--so that a product appears once, with all its OrderTotal dollar values in month columns.
--Note there will be NULLs returned where a ProductName had no orders in a month
SELECT  X.ProductName, 
        SUM(X.[01_Raw]) AS [Jan], SUM(X.[02_Raw]) AS [Feb], SUM(X.[03_Raw]) AS [Mar], 
        SUM(X.[04_Raw]) AS [Apr], SUM(X.[05_Raw]) AS [May], SUM(X.[06_Raw]) AS [Jun], 
        SUM(X.[07_Raw]) AS [Jul], SUM(X.[08_Raw]) AS [Aug], SUM(X.[09_Raw]) AS [Sep], 
        SUM(X.[10_Raw]) AS [Oct], SUM(X.[11_Raw]) AS [Nov], SUM(X.[12_Raw]) AS [Dec] 
FROM    (
         --this query splits each OrderMonth to its own column. However after the result of this we
         --end up with staggered results where no two months appear on the same line, so we need to 
         --SUM the results at a later step
         SELECT  I.ProductName, 
                 CASE WHEN I.OrderMonth = 1  THEN I.OrderTotal ELSE NULL END AS [01_Raw], 
                 CASE WHEN I.OrderMonth = 2  THEN I.OrderTotal ELSE NULL END AS [02_Raw], 
                 CASE WHEN I.OrderMonth = 3  THEN I.OrderTotal ELSE NULL END AS [03_Raw], 
                 CASE WHEN I.OrderMonth = 4  THEN I.OrderTotal ELSE NULL END AS [04_Raw], 
                 CASE WHEN I.OrderMonth = 5  THEN I.OrderTotal ELSE NULL END AS [05_Raw], 
                 CASE WHEN I.OrderMonth = 6  THEN I.OrderTotal ELSE NULL END AS [06_Raw], 
                 CASE WHEN I.OrderMonth = 7  THEN I.OrderTotal ELSE NULL END AS [07_Raw], 
                 CASE WHEN I.OrderMonth = 8  THEN I.OrderTotal ELSE NULL END AS [08_Raw], 
                 CASE WHEN I.OrderMonth = 9  THEN I.OrderTotal ELSE NULL END AS [09_Raw], 
                 CASE WHEN I.OrderMonth = 10 THEN I.OrderTotal ELSE NULL END AS [10_Raw], 
                 CASE WHEN I.OrderMonth = 11 THEN I.OrderTotal ELSE NULL END AS [11_Raw], 
                 CASE WHEN I.OrderMonth = 12 THEN I.OrderTotal ELSE NULL END AS [12_Raw] 
         FROM    (
                  --this inner query returns the column(s) we want to return results for (ProductName), 
                  --the data (OrderTotal), and the column headings (OrderMonth)
                  SELECT  Products.ProductName, 
                          MONTH(Orders.OrderDate) AS OrderMonth, 
                          [Order Details].UnitPrice * [Order Details].Quantity AS OrderTotal 
                  FROM    dbo.[Order Details] INNER JOIN 
                              dbo.Orders ON [Order Details].OrderID = Orders.OrderID INNER JOIN 
                              dbo.Products ON [Order Details].ProductID = Products.ProductID 
                  WHERE   YEAR(Orders.OrderDate) = 1997
                 ) I
        ) X
GROUP BY X.ProductName
ORDER BY X.ProductName

You can see that this is a lot of code to be copy-and-pasted, which allows much room for error. It follows my approach of basic, blunt, step-by-step SQL. One advantage of this method (there has to be one good thing, right?) is that you can change the inner-most query to return a different figure, and as long as it's aliased to (in this case) "OrderTotal" you'll still get valid results from the outer queries...however, if you want to add an extra column, you need to propogate that all the way from the inner-most to the outer-most query. I find that if I need extra reference columns, I put the results from the pivot into an @temp table variable and join them later, rather than having a lot of non-pivot columns being mucked around with inside the multi-level select.

There are other alternatives, most of which I've considered and had to discard: Excel Pivot Tables, Analysis Services, and Reporting Services Matrix control.

My ASP.NET 2.0 Project is now underway

I finally started on my new project using ASP.NET 2.0 and Beta 2 of Visual Studio 2005. I'm using an older Pentium 3/512MB RAM machine that I had sitting around which has performed adequately, but I literally can't wait to use Visual Studio - a known memory hog - on my main development machine, a Pentium 4 2.8GHz Hyper Threading machine with 1GB of RAM.

Beta 2 has worked OK for me, and depending on what I'm working on I experience 1 or 2 "The Visual Basic compiler needs to restart - no work has been lost"-type messages a day. Sometimes it's easier if I restart Visual Studio, and thankfully I have never lost any work while this is happening, which is A Good Thing. The number of restarts goes up if I work in the App_Code folder for some reason, although in fairness it might be my setup or even the RAM which is impacting this.

Visual Studio 2005 has a lot of out-of-the box components and a number of "themes" to aid in a modern-looking user interface, yet a strange pull from my old ASP-classic days steers me towards CSS and hand-crafted HTML...so I've been looking around at free ways I can dress up the UI, especially knowing that I am not a graphics designer. I am considering whether I should recommend to my boss to purchase an interface library like ComponentArt or Infragistics, but in the meantime I have to make do with free stuff.

One or two things I want to bookmark for later on:

October SQL Server User Group Melbourne Presentation - "Discover the Power of Row Numbers (Itzik Ben-Gan)"

Last Tuesday Itzik Ben-Gan returned to the Melbourne SQL Server User Group meeting, to talk about row numbers in SQL 2000 as well as SQL 2005. I missed Itzik's presentation earlier this year (fool that I am), but everyone I spoke to about it gave Itzik a big rap. And after hearing him on Tuesday, he certainly deserves lots of credit for his inventive problem-solving and effective teaching style.

Itzik was *so* smart that I had a brain mini-meltdown at around 7:00 trying to work out an answer for a problem he posed (a "why won't this work"-type question). It's not that Itzik strayed up into the mental stratosphere, it's just that even though I could see the results with my own eyes I had difficulty actually believing them (and I couldn't get the answer to his question...time to go back to the books!)

Itzik started by talking about what row numbers are - sequential numbers based on a desired order. I'm sure if you have worked with SQL Server for any length of time the concept will be familiar to you (like it is to me); however Itzik provided a lot more useful information, stuff I had never even considered. He introduced the concept of a "tie-breaker" that would help decide how row numbers are allocated if the ordering field has the same value for two records. He also specified that any solution must meet three simple criteria - it must work, it must work fast and it must be simple.

Itzik kept coming back to those three criteria, both from a benchmark standpoint (how many page reads, etc.) and developer view (how easy is this code to understand?), which was very helpful. He talked about the numerous ways to calculate row numbers today: on the client, or in SQL Server 2000 with sets, cursors, or creating a table with an identity column, and also the answer to the problem in SQL Server 2005: ROW_NUMBER().

I won't go into the timing results and his explanation for each of these techniques (the slides will soon be available on sqlserver.org.au), but I will say that the implementation of SQL Server 2005's ROW_NUMBER() function seems like a high-performance way to get row numbers. It was at this point in the meeting that Itzik posed a question on how long a certain set-based row numbering method would take with 10 million rows, given that the cost is ((n + n²) / 2), and my friend Nirav got closest with his guess of 2 years! Mental note to self: don't to use that method.

Itzik also talked about how ORDER BY returns a cursor and not a set (which is, by definition, not ordered) and how the TOP function in SQL Server 2000 affects this. Then, he went on to discuss some practical applications with row numbers in SQL Server 2005 in paging, getting the top N for a group, trends (matching a current row to the one immediately before), islands, finding the median, deleting duplicate rows and recursive hierarchical data.

These demonstrations using T-SQL were the real highlight of the meeting for me. As previously mentioned, my brain overload came at the point Itzik discussed data islands. Using row numbers, he identified a way to group dates together and find the holes in one SQL Statement! Itzik's coverage of finding the median was also amazing, and finally the recursive Common Table Expressions looked really cool as well (my notes for this part of the evening are summed up with the word "wow").

Greg Linwood arranged for a web cast of the evening, so I'll be interested in seeing how that turned out. It was certainly worth taping to catch some of the insights Itzik gave during his explanations. I had just been asked in the past two weeks how I would find the median and how I would delete duplicate rows - and even though we don't use SQL Server 2005 here yet, I went away with a neat method that worked, worked fast and was simple for both of those two questions.

I just want to say hello to Darren who came up and introduced himself to me after he recognised me from my blog! Thanks Darren, it was great to meet you and now I know who you are when I see your blog postings or comments on the ausdev mailing list.

PDC 2005 PowerPoint Slides

Miguel points out that the PDC 2005 PowerPoint sessions can be downloaded from http://commnet.microsoftpdc.com/content/downloads.aspx.

I'm going to troll through them and pick a couple of interesting ones to look at.