Often times we need to deal with spans of time in SQL.  There are challenges to finding and removing overlapping spans.  In this article, I will share two of the methods I’ve used to deal with this problem.  I can’t take credit for initially creating them, but I’m happy to make your work easier by passing them on.

First, we need some tables and and a small set of data.  Below is a small script to create a table and insert two members with multiple overlapping spans.  Please also notice that MemberID 2 has a gap between 4/16/2015 and 4/30/2015.  When these spans are combined we should have one span for MemberID 1 and two spans for MemberID 2.

CREATE TABLE MembershipSpans(
MembershipSpanID INT IDENTITY(1,1) PRIMARY KEY,
MemberID int NOT NULL,
StartDate date NOT NULL,
EndDate DATE NOT NULL)
go

INSERT INTO dbo.MembershipSpans(MemberID, StartDate, EndDate)
VALUES
( 1, ‘1/1/2015’, ‘3/31/2015’),
( 1, ‘1/25/2015’, ‘4/15/2015’),
( 1, ‘3/10/2015’, ‘3/16/2015’),
( 1, ‘4/1/2015’, ‘6/30/2015’),
( 1, ‘5/1/2015′, ’12/31/2015’),

( 2, ‘2/1/2015’, ‘3/31/2015’),
( 2, ‘3/25/2015’, ‘4/15/2015’),
( 2, ‘2/10/2015’, ‘3/16/2015’),
( 2, ‘5/1/2015’, ‘6/30/2015’),
( 2, ‘6/1/2015′, ’12/31/2015’)

Method 1:  Simple but slow

This method creates a record for each day and then creates the spans based on continuous dates.  First, we need an easy way to create the daily records.  Here is a simple function that I found online to help.

CREATE Function [dbo].[fnDateTable]
(
@StartDate datetime,
@EndDate datetime,
@DayPart char(5) — support ‘day’,’month’,’year’,’hour’, default ‘day’
)
Returns @Result Table
([Date] date)
As
Begin
     Declare @CurrentDate datetime
     Set @CurrentDate=@StartDate
     While @CurrentDate<=@EndDate
     Begin
          Insert Into @Result Values (@CurrentDate)
          Select @CurrentDate=
               Case
               When @DayPart=’year’ Then DateAdd(yy,1,@CurrentDate)
               When @DayPart=’month’ Then DateAdd(mm,1,@CurrentDate)
               When @DayPart=’hour’ Then DateAdd(hh,1,@CurrentDate)
               Else
               DateAdd(dd,1,@CurrentDate)
               End
     End
Return
END

Next we create our daily records and group the consecutive days with a common “DateGroup”.  The “DateGroup” works by giving a ranking number to each record and then subtracting that number of days from the date on that member’s daily record.  This gives consecutive days the same “DateGroup”.

SELECT
     m.*,
     dt.[Date],
     DateGroup = DATEADD(day, -DENSE_RANK() OVER (PARTITION BY                                                            MemberID ORDER BY [Date] asc), [Date])
INTO #MemberDays
FROM
     MembershipSpans m
     INNER JOIN
     dbo.fnDateTable(‘1/1/2005′, ’12/31/2017’, ‘day’) dt
          ON
          dt.[Date] BETWEEN m.StartDate AND m.EndDate

Finally, we create spans by taking the minimum and maximum dates for each “DateGroup” per member.  I added an “seq” column just to easily see how many spans are in each member, but it is not necessary.

SELECT
     MemberID,
     MIN([Date]) StartDate,
     MAX([Date]) EndDate,
     seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY                                                                           MIN([Date]) ASC)
FROM
     #MemberDays
GROUP BY
     MemberID,
     DateGroup

Here are the results.

As you can see, the resulting spans correctly represent each member’s time periods.  Also, notice the gap discussed earlier.

Method 2:  Complicated but fast

Method 2 uses multiple CTEs to create a table with a row for each “StartDate” and each “EndDate”.  It also adds a “DateGroup” column and ends by doing the same thing that method 1 did.  See the code below:

WITH C1 AS
(
     SELECT MemberID, StartDate AS ts, +1 AS type, NULL AS e,
                ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY                          StartDate, MemberID) AS s
FROM
     dbo.MembershipSpans

UNION ALL

     SELECT MemberID, EndDate AS ts, -1 AS type,
               ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY                            EndDate, MemberID) AS e,
     NULL AS s
FROM
     dbo.MembershipSpans
),
C2 AS
(
     SELECT C1.*, ROW_NUMBER() OVER(PARTITION by MemberID ORDER                                                   BY ts, type DESC, MemberID) AS se

FROM
     
C1
),
C3 AS
(
     
SELECT MemberID, ts, 
     
FLOOR((ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY                                                                            ts) – 1) / 2 + 1) AS DateGroup

FROM
     C2

WHERE
     COALESCE(s – (se – s) – 1, (se – e) – e) = 0

)
SELECT MemberID, MIN(ts) AS StartDate, max(ts) AS EndDate,
     seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY                                                                             MIN(ts) ASC)

FROM
     C3
GROUP BY
     MemberID ,
     DateGroup;

Here are the results:

As you can see, the results are the same.  To keep this post brief, I’m not going into detail as to exactly how “DateGroup” column is generated or how the “Coalesce” statement in the “Where” clause works.  We’ll just call it crazy math (it really isn’t that crazy).  Feel free to discover how it works for yourself.

Both methods are accurate, so speed is what we care about.  For small sets of data, either method will be fast.  But what about larger data sets?  To test, I created a set of 5,000 members each with 20 spans on my development VM.  Some spans were overlapping and some were not.  Both returned the same results.  Method 1 took too long for me to wait probably due to the huge temp table.  Method 2 only took 364 milliseconds.  As you can see, except for a few special situations, you should pretty much always use Method 2.

In this post I demonstrated two ways to tackle a common SQL problem.  There are numerous other ways and I’ve tested many of them.  In the end, I almost always use Method 2 due it’s speed.  I hope this post helps to make you SQL work a little easier.  As always, feel free to comment or contact me with any questions.