Grouping by Dates in T-SQL
Extracting useful data from T-SQL on the fly can be challenging. Writing GROUP BY statements can bend your mind and make life difficult. Like any good SQL programmer I’ve built up a tool chest of SQL queries over the years that serve me well in the field. I’ve had to sort, aggregate, count, average, and pivot data a countless number of times and reaching for the T-SQL documentation becomes a tedious task. One of my favorite queries, and one of the ones I see beginner SQL programmers struggle with most often, certainly has to be grouping a series of discrete, time-stamped events by days and pulling useful information from it. To this end I present my implementation of the T-SQL group-by day and month query.
GroupEventsByDate.sql
SELECT
DatePart(month,[Date]) as lMonth,
DatePart(day,[Date]) as lDay,
Count(*) as lCount
FROM [MY].[dbo].[UsageLog]
Where [Date] > '12/31/2009'
Group By DatePart(month,[Date]), DatePart(day,[Date])
Order By lMonth Desc, lDay Desc
GO
Now, for some explanation
I’ve seen many SQLers try to create a string concatenated field from the day and month as a new column. This works but has both poor performance and making sorting nearly impossible. The solution is to GROUP BY multiple columns and select each one as a separate column. This allows us to do a traditional sort with days and months appearing in the right order. The current implementation is shorthanded, I use it to look at data over the past 12 months most often so I do not select the year as a unique identifier, however it should be clear how to accomplish this.
The final question you might have is how to include days which there was no official data for. T-SQL doesn’t offer us a native way to do this so we must resort to a sort of hack. Here’s the procedure you’d follow:
- Create a time-dimension table and insert all the dates from the starting date to the current (and beyond). This is best accomplished via a script in your favorite language or this particularity nasty SQL hack, documented very well here.
- RIGHT JOIN against the newly created time-dimension table with a GROUP BY clause to ensure all those 0 values are properly displayed
This method works great when the missing dates are mission critical and can’t be added by a scripting language post-humorously.
