Wednesday, October 19, 2005 - Posts

October SQL Server User Group Melbourne Presentation - "Discover the Power of Row Numbers (Itzik Ben-Gan)"

Last Tuesday Itzik Ben-Gan returned to the Melbourne SQL Server User Group meeting, to talk about row numbers in SQL 2000 as well as SQL 2005. I missed Itzik's presentation earlier this year (fool that I am), but everyone I spoke to about it gave Itzik a big rap. And after hearing him on Tuesday, he certainly deserves lots of credit for his inventive problem-solving and effective teaching style.

Itzik was *so* smart that I had a brain mini-meltdown at around 7:00 trying to work out an answer for a problem he posed (a "why won't this work"-type question). It's not that Itzik strayed up into the mental stratosphere, it's just that even though I could see the results with my own eyes I had difficulty actually believing them (and I couldn't get the answer to his question...time to go back to the books!)

Itzik started by talking about what row numbers are - sequential numbers based on a desired order. I'm sure if you have worked with SQL Server for any length of time the concept will be familiar to you (like it is to me); however Itzik provided a lot more useful information, stuff I had never even considered. He introduced the concept of a "tie-breaker" that would help decide how row numbers are allocated if the ordering field has the same value for two records. He also specified that any solution must meet three simple criteria - it must work, it must work fast and it must be simple.

Itzik kept coming back to those three criteria, both from a benchmark standpoint (how many page reads, etc.) and developer view (how easy is this code to understand?), which was very helpful. He talked about the numerous ways to calculate row numbers today: on the client, or in SQL Server 2000 with sets, cursors, or creating a table with an identity column, and also the answer to the problem in SQL Server 2005: ROW_NUMBER().

I won't go into the timing results and his explanation for each of these techniques (the slides will soon be available on sqlserver.org.au), but I will say that the implementation of SQL Server 2005's ROW_NUMBER() function seems like a high-performance way to get row numbers. It was at this point in the meeting that Itzik posed a question on how long a certain set-based row numbering method would take with 10 million rows, given that the cost is ((n + n²) / 2), and my friend Nirav got closest with his guess of 2 years! Mental note to self: don't to use that method.

Itzik also talked about how ORDER BY returns a cursor and not a set (which is, by definition, not ordered) and how the TOP function in SQL Server 2000 affects this. Then, he went on to discuss some practical applications with row numbers in SQL Server 2005 in paging, getting the top N for a group, trends (matching a current row to the one immediately before), islands, finding the median, deleting duplicate rows and recursive hierarchical data.

These demonstrations using T-SQL were the real highlight of the meeting for me. As previously mentioned, my brain overload came at the point Itzik discussed data islands. Using row numbers, he identified a way to group dates together and find the holes in one SQL Statement! Itzik's coverage of finding the median was also amazing, and finally the recursive Common Table Expressions looked really cool as well (my notes for this part of the evening are summed up with the word "wow").

Greg Linwood arranged for a web cast of the evening, so I'll be interested in seeing how that turned out. It was certainly worth taping to catch some of the insights Itzik gave during his explanations. I had just been asked in the past two weeks how I would find the median and how I would delete duplicate rows - and even though we don't use SQL Server 2005 here yet, I went away with a neat method that worked, worked fast and was simple for both of those two questions.

I just want to say hello to Darren who came up and introduced himself to me after he recognised me from my blog! Thanks Darren, it was great to meet you and now I know who you are when I see your blog postings or comments on the ausdev mailing list.