Moving Forward

Homepage of Andrew Robinson

Grouping by Dates in T-SQL

without comments

Extracting useful data from T-SQL on the fly can be challenging. Writing GROUP BY statements can bend your mind and make life difficult. Like any good SQL programmer I’ve built up a tool chest of SQL queries over the years that serve me well in the field. I’ve had to sort, aggregate, count, average, and pivot data a countless number of times and reaching for the T-SQL documentation becomes a tedious task. One of my favorite queries, and one of the ones I see beginner SQL programmers struggle with most often, certainly has to be grouping a series of discrete, time-stamped events by days and pulling useful information from it. To this end I present my implementation of the T-SQL group-by day and month query. 

GroupEventsByDate.sql

SELECT
	DatePart(month,[Date]) as lMonth,
	DatePart(day,[Date]) as lDay,
	Count(*) as lCount
    FROM [MY].[dbo].[UsageLog]
    Where [Date] > '12/31/2009'
    Group By DatePart(month,[Date]), DatePart(day,[Date])
    Order By lMonth Desc, lDay Desc
GO

Now, for some explanation

I’ve seen many SQLers try to create a string concatenated field from the day and month as a new column. This works but has both poor performance and making sorting nearly impossible. The solution is to GROUP BY multiple columns and select each one as a separate column. This allows us to do a traditional sort with days and months appearing in the right order. The current implementation is shorthanded, I use it to look at data over the past 12 months most often so I do not select the year as a unique identifier, however it should be clear how to accomplish this.

The final question you might have is how to include days which there was no official data for. T-SQL doesn’t offer us a native way to do this so we must resort to a sort of hack. Here’s the procedure you’d follow:

  1. Create a time-dimension table and insert all the dates from the starting date to the current (and beyond). This is best accomplished via a script in your favorite language or this particularity nasty SQL hack, documented very well here.
  2. RIGHT JOIN against the newly created time-dimension table with a GROUP BY clause to ensure all those 0 values are properly displayed

This method works great when the missing dates are mission critical and can’t be added by a scripting language post-humorously.

Written by Andrew Robinson

June 2nd, 2010 at 12:56 am

Posted in Uncategorized

Leave a Reply