SQL Queries for Overlapping Time Periods on SQL Server
SQL programmers frequently work with datetime fields identifying a time period's start time or end time. Overlapping of two processes or two different periods is one of the most common problems for database developer. In this SQL tutorial, I want to show how database developers can deal with overlapping of time periods identified by a start time and an end time.
Let's start by answering this fundamental question. How can we define overlapping periods? Or when the database developer can say that given two time periods are overlapping?
Though this is a very basic question and seems to be easy yo answer, most of the errors solving such difficult overlapping time problems are caused of missing correct approach to the issue.
Overlapping Time Periods Characteristics
Take some time on thinking the prerequisites for two different time periods to be assumed as overlapping time periods.
First of all, a period must a beginning aka begindate or startdate, etc. Also time periods have an end time or date, like enddate or finishdate, etc.
Basically, a period can be represented by a line fragment on time axis which has two boundaries; starttime and endtime.
To claim two time periods to be overlapping, they must have common datetime values which is between lower and upper limits of both periods.
To make overlapping time periods be easily understandable, I want to visualize the cases where we can say two periods are overlapping.
Let me draw the first case where two periods overlap each other.
Another sample case is just the opposite case where the red and black periods are changed.
Of course, as a programmer or SQL developer, if you miss following cases then your solution will produce wrong results.
This case is where one of the time intervals covers or contains the other time interval totally differing from the above cases where the time coverage is only partially.
Of course there is just the opposite case of the above one.
I guess with this fourth case, we have completed visually drawing of all overlapping time period cases
Sample Database Table for Storing Time Periods
Let's now create a database table to store our test data so we can build our sql query to identify if these two data rows contain two time periods with overlapping interval data.
In fact, in most cases data modellers create the ID field as auto increment or identity column.
And for a better data model, task field should be defined as NOT NULL, just like the startData and endDate fields.
For simplicity I keep table definition or table create DDL script simple in this SQL tutorial. This will also help you to compare your results with the results given in this tutorial.
Let's populate sample database table with some test data
Determine If Two Time Periods are Overlapping Each Other
Following SQL Select statement will enable database developers to bring two time periods side by side so we can compare the start date and end date values of each time intervals to identify whether they are overlapping or seperated from each other.
As database developers can see, I have brought the two periods side by side in a row to compare them easily.
Now, I will share the CASE statement which will definitely help SQL programmers to identify overlapping time periods. I will use this method or solution in following SQL queries built as solution to different SQL problems, too.
If you return back to visual diagrams and assume that black interval is used for t1, and red marked period is for t2, the CASE statement can be read easily. The first criteria maps to first visual diagram, the second criteria maps to the second drawing, and goes on.
The output of the execution of above SQL statement shows us that the two periods are overlapping as expected
SQL Query using Lag Function for OverLapping Time Intervals
Let's now insert another data row into our sample database table which does not overlap with previous time periods.
SQL programmers now can use SQL LAG() function to compare all table rows with the previous row when ordered by ID column.
Maybe it would be better to select the compared previous row using LAG() function sorting rows by StarDate.
If you also believe so, just replace the ORDER BY column ID with StartDate in the following SQL statement.
The output shows that the first row is not overlapping with a previous time period since itself is the first one and there is not any other time interval before.
Data row with ID 2 is overlapping with time period represented by row ID 1.
Data row with ID 3 is not overlapping with time period represented by row ID 2. [OverLapping with Previous] column has value "No"
Now let's clear all table data using following SQL Truncate Table command so we can later populate more sample data
Here is additional data rows for the sample database table
Before SQL database programmers execute previous SELECT query, I would like to notify that now in our table we have two different data categories based on TASK column.
So it will be correct to modify our SELECT query and add PARTITION BY clause to seperate data based on different TASK values.
I will also change the data sorting to make a time based order using STARTDATE instead of ID column value.
Output of the SQL query to determine overlapping time periods of data rows is as follows
Overlapping Time Frame Chains
Altough above SQL query gives database developers if a table entry containing a time period is overlapping with the time period just before that record, it does not provide enough data if as a developer looking for set of rows that overlap as a group.
To be more clear, rows with ids 10, 11 and 12 can be considered as overlapping time periods although row with 11 and 12 do not overlap.
Since time interval of ID 10 is so long that it covers the time period identified by both ID 11 and ID 12 rows.
Perhaps, if I could draw the case it will be more helpful for you to see the situation more clear.
You see in above picture, though red and blue drawn time periods do not overlap, I want to group these three intervals as overlapping group because they are linked one-to-another in a way.
As least black marked time interval contains both of the others in its time frame.
So I want to build a SQL query, that will mark these 3 rows (ID 10,11 and 12) as a group.
Or better, the SQL query should assing the same group number to these 3 rows and assign different group numbers to other rows in similar situation.
For example, in our sample table data, the Task B entries form a single group that they overlap continously. Here is the time frame sketch which is not proportionally.
Let's now build a SQL query to determine and group time frames forming an overlapping chain.
Please check following SQL query where multiple CTE expressions are used. The first SQL CTE expression is just for sorting data entries according to their start datetime values and assinging them a row number
The second CTE expression is a SQL recursive query.
In the anchor query of the SQL recursive CTE, I start with the earliest time period which is identified with rownumber rn value equals to 1.
The recursive part of the SQL CTE expression, reads the next time period by adding one to the row number value.
It checks if there is an overlapping situation between two time intervals of the two entries.
If there is an time overlapping relation between two, the previous GroupId value is kept. Otherwise, group number value is increased by one.
One more interesting calculation takes place in the recursive part of the SQL query.
Each time a new row is read, if it is overlapping then the boundaries of the group is calculated once more.
If there is not an overlapping case, a new group is formed. And the starting and ending time limits of the new group is set by this new data row initially.
And if you wonder how is the output of this SQL recursive CTE query which identifies overlapping data period chains and groups them by assigning a single value to each group, here is the SQL query output with our sample data.
If you want to calculate minimum datetime value and maximum datetime value of a group, in short the boundaries of the time frame of each overlapping chain, simply execute a GROUP BY clauseafter "cte" aliased CTE expression.
In short, comment the last SELECT statement as seen below on the above shared SQL query. Then add below SELECT statement where MIN() and MAX() aggregation functions are used with GROUP BY clause on task and groupid columns.
SQL database developers can see the ID's of the records for that overlapping time periods are listed under column taskList.
I also used one of the new SQL functions String_Agg function for string concatenation introduced with SQL Server 2017.
If you are using a SQL Server version prior to SQL Server 2017, you can comment the string_agg function