How to write a SQL statement to sum data grouped by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data, grouped by the 24th of every two neighboring months, within a certain time range, for example from 2017/7/20 to 2017/10/25 as above.
How do I write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...

One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Any date occurring after the 24th is assigned to the next month, and we then aggregate by this shifted month to obtain the sum of data. Doing the shift with date arithmetic, rather than with MONTH(datetime) + 1, keeps December dates from producing a month 13 and lets the range label be derived from the same value.
WITH cte AS (
    SELECT
        data,
        -- first day of the shifted month; dates after the 24th roll into the next month
        DATEADD(month,
                DATEDIFF(month, 0, datetime)
                    + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
                0) AS period_month
    FROM yourTable
)
SELECT
    -- the period runs from the 25th of the previous month to the 24th of the shifted month
    CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111)
        + '~' +
    CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).

I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
    period_start_dt, period_end_dt, sum(1) as your_data_here
from (
    select
          dateadd(month, m.n, start_dt) period_start_dt
        , dateadd(month, m.n + 1, start_dt) period_end_dt
    from (
        select @start_dt start_dt ) seed
    cross join (
        select 0 n union all
        select 1 union all
        select 2 union all
        select 3 union all
        select 4 union all
        select 5 union all
        select 6 union all
        select 7 union all
        select 8 union all
        select 9 union all
        select 10 union all
        select 11
    ) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
    period_start_dt, period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count rows, because between is inclusive of both the lower and the upper boundary.
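To make the commented-out join concrete, here is a minimal sketch with the join filled in. The table variable @t and its column names are hypothetical stand-ins for the real data, and the four generated periods are just enough to cover the sample rows from the question; the multi-row VALUES insert should work on SQL Server 2008 and later.

```sql
-- Hypothetical sample data standing in for the real table
declare @t table (dt date, data decimal(9,1));
insert into @t (dt, data) values
    ('20170824', 6.0), ('20170825', 5.0),
    ('20170924', 6.0), ('20170925', 6.2),
    ('20171024', 8.1), ('20171025', 8.2);

declare @start_dt date;
set @start_dt = '20170724';

select
      r.period_start_dt
    , r.period_end_dt
    , sum(t.data) as data_sum
from (
    -- four consecutive periods, each running from one 24th to the next
    select
          dateadd(month, m.n, @start_dt)     as period_start_dt
        , dateadd(month, m.n + 1, @start_dt) as period_end_dt
    from (
        select 0 as n union all select 1 union all select 2 union all select 3
    ) m
) r
left join @t t
    on  t.dt >= r.period_start_dt   -- start is inclusive
    and t.dt <  r.period_end_dt     -- end is exclusive, so no row falls into two periods
group by
    r.period_start_dt, r.period_end_dt
order by
    r.period_start_dt;
```

Because of the left join, a period with no matching rows still appears in the result, with a NULL sum.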

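I think the simplest way is to shift each date back so that the 24th lands on the 1st of a month (subtract 23 days), and then aggregate by the shifted year and month:
select year(dateadd(day, -23, datetime)) as yr,
       month(dateadd(day, -23, datetime)) as mon,
       sum(data) as data_sum
from t
group by year(dateadd(day, -23, datetime)),
         month(dateadd(day, -23, datetime));
Here yr and mon identify the period that starts on the 24th of that month; subtract 24 days instead if the period should start on the 25th. You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).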

Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
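As a rough sketch of steps 0 to 2 in T-SQL: the table dbo.Calendar, its column names, and the two-year range are all assumptions made for illustration, and the fiscal month is taken to start on the 24th, as in the question's ranges. yourTable, datetime and data are the placeholder names used in the earlier answers.

```sql
-- Step 0: a hypothetical calendar table; only the columns needed here are shown
create table dbo.Calendar (
    cal_date           date not null primary key,
    cal_year           int  not null,
    cal_month          int  not null,
    fiscal_month_start date not null   -- the 24th that opens the period containing cal_date
);

-- Fill two years of dates from a small generated list of integers (0..999, of which 730 are used)
;with digits as (
    select 0 as d union all select 1 union all select 2 union all select 3 union all select 4
    union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
),
nums as (
    select a.d + b.d * 10 + c.d * 100 as n
    from digits a cross join digits b cross join digits c
)
insert into dbo.Calendar (cal_date, cal_year, cal_month, fiscal_month_start)
select
    x.cal_date,
    year(x.cal_date),
    month(x.cal_date),
    -- shift back 23 days so the 24th lands on the 1st, snap to the 1st of that month,
    -- then move forward to the 24th of that month
    dateadd(day, 23, dateadd(month, datediff(month, 0, dateadd(day, -23, x.cal_date)), 0))
from (
    select dateadd(day, n, cast('20170101' as date)) as cal_date
    from nums
    where n < 730
) x;

-- Steps 1 and 2: join on the calendar table and group by its fiscal month column
select c.fiscal_month_start, sum(t.data) as data_sum
from yourTable t
join dbo.Calendar c
    on c.cal_date = cast(t.datetime as date)
group by c.fiscal_month_start
order by c.fiscal_month_start;
```

Once the fiscal month is a stored column, the reporting query no longer needs any date arithmetic of its own.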

Related

Question: Joining two data sets with date conditions

I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the data, including the most recent data, but is overall less accurate
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say today is April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Potentially a join function along with a date one?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, there is one row per month. You can extract the quarter from a date, keep it as an additional column, and then filter by it down the line.
WITH
-- some sample data: date and amount ...
indata(dt,amount) AS (
SELECT DATE '2020-01-15', 234.45
UNION ALL SELECT DATE '2020-02-15', 344.45
UNION ALL SELECT DATE '2020-03-15', 345.45
UNION ALL SELECT DATE '2020-04-15', 346.45
UNION ALL SELECT DATE '2020-05-15', 347.45
UNION ALL SELECT DATE '2020-06-15', 348.45
UNION ALL SELECT DATE '2020-07-15', 349.45
UNION ALL SELECT DATE '2020-08-15', 350.45
UNION ALL SELECT DATE '2020-09-15', 351.45
UNION ALL SELECT DATE '2020-10-15', 352.45
UNION ALL SELECT DATE '2020-11-15', 353.45
UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
TIMESTAMPADD(
QUARTER
, CAST(EXTRACT(QUARTER FROM dt) AS INTEGER)-1
, TRUNC(dt,'YEAR')
)
AS DATE
) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...
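One way to do the actual combination, sketched under assumptions: dataset_a and dataset_b are hypothetical table names with the same dt and amount columns as the sample above, and the cutoff date is written as a literal for the April 15th example rather than computed.

```sql
-- dataset_a / dataset_b are hypothetical names; the cutoff is the first day NOT covered by Data Set A
SELECT dt, amount, 'A' AS source
FROM dataset_a
WHERE dt < DATE '2020-04-01'      -- everything up to the end of the last refreshed quarter
UNION ALL
SELECT dt, amount, 'B' AS source
FROM dataset_b
WHERE dt >= DATE '2020-04-01';    -- only the recent rows that Data Set A does not have yet
```

The two WHERE clauses are complementary, so no date can come from both sets. The literal could be replaced by whatever expression your database offers for the start of the current quarter.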

Querying across months and days

My access logs database stores time as epoch and extracts year month and day as integers. Further, the partitioning of the database is based on the extracted Y/m/d and I have a 35 day retention.
If I run this query:
select *
from mydb
where year in (2017, 2018)
and month in (12, 1)
and day in (31, 1)
On the 29th of January, 2018, I will get data for 12/31/2017 and 1/1/2018.
On the 5th of January, 2018, I will get data for 12/1/2017, 12/31/2017, and 1/1/2018 (undesirable)
I also realize that I can do something like this:
select *
from mydb
where (year = 2017 and month = 12 and day = 31)
or (year = 2018 and month = 1 and day = 1)
But what I am really looking for is this: a good way to write a query where I give the year month and day number as the start and then a fourth value (number of days +) and then get all the data for 12/31/2017 + 5 days for example.
Is there a native way in SQL to accomplish this? I have an enormous data set and if I don't specify the days and have to rely on the epoch to do this, the query takes forever. I also have no influence over the partitioning configuration.
With Impala as the dbms and SQL dialect you will be able to use common table expressions but not recursion. In addition there may be problems inserting parameters as well.
Below is an untested suggestion that will require you to locate some function alternatives. First it generates a set of rows with an integer from 0 to 999 (in the example); it is quite easy to expand the number of rows if required. Each integer is then added as a number of days to a timestamp literal using date_add(timestamp startdate, int days/interval expression), and year(timestamp date), month(timestamp date) and day(timestamp date) (see Impala's date and time functions) produce the columns needed to match your data.
Overall, you should then be able to build a common table expression that has columns for year, month and day covering the wanted range, and that you can inner join to your source table, thereby implementing a date range filter.
The code below was produced using T-SQL (SQL Server).
-- produce a set of integers, adjust to suit needed number of these
;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, cteTally AS (
SELECT
d1s.digit
+ d10s.digit * 10
+ d100s.digit * 100 /* add more like this as needed */
-- + d1000s.digit * 1000 /* add more like this as needed */
AS num
FROM cteDigits d1s
CROSS JOIN cteDigits d10s
CROSS JOIN cteDigits d100s /* add more like this as needed */
-- CROSS JOIN cteDigits d1000s /* add more like this as needed */
)
, DateRange AS (
select
num
, dateadd(day,num,'20181227') dt
, year(dateadd(day,num,'20181227')) yr
, month(dateadd(day,num,'20181227')) mn
, day(dateadd(day,num,'20181227')) dy
from cteTally
where num < 10
)
select
*
from DateRange
I think these are the Impala equivalents for the function calls used above:
, DateRange AS (
select
num
, date_add(to_timestamp('20181227','yyyyMMdd'),num) dt
, year( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) yr
, month( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) mn
, day( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) dy
from cteTally
where num < 10
)
Hopefully you can work out how to use these. Ultimately the purpose is to use the generated date range like so:
select * from mydb t
inner join DateRange on t.year = DateRange.yr and t.month = DateRange.mn and t.day = DateRange.dy
original post
Well in the absence of knowing what database to propose solutions for, here is a suggestion using SQL Server:
This suggestion involves a recursive common table expression, which may then be used as an inner join to your source data to limit the results to a date range.
--Sql Server 2014 Express Edition
--https://rextester.com/l/sql_server_online_compiler
declare @yr as integer = 2018
declare @mn as integer = 12
declare @dy as integer = 27
declare @du as integer = 10
;with CTE as (
    select
          datefromparts(@yr, @mn, @dy) as dt
        , @yr as yr
        , @mn as mn
        , @dy as dy
    union all
    select
          dateadd(dd,1,cte.dt)
        , datepart(year,dateadd(dd,1,cte.dt))
        , datepart(month,dateadd(dd,1,cte.dt))
        , datepart(day,dateadd(dd,1,cte.dt))
    from cte
    where cte.dt < dateadd(dd,@du-1,datefromparts(@yr, @mn, @dy))
)
)
select
*
from cte
This produces the following result:
+----+---------------------+------+----+----+
| | dt | yr | mn | dy |
+----+---------------------+------+----+----+
| 1 | 27.12.2018 00:00:00 | 2018 | 12 | 27 |
| 2 | 28.12.2018 00:00:00 | 2018 | 12 | 28 |
| 3 | 29.12.2018 00:00:00 | 2018 | 12 | 29 |
| 4 | 30.12.2018 00:00:00 | 2018 | 12 | 30 |
| 5 | 31.12.2018 00:00:00 | 2018 | 12 | 31 |
| 6 | 01.01.2019 00:00:00 | 2019 | 1 | 1 |
| 7 | 02.01.2019 00:00:00 | 2019 | 1 | 2 |
| 8 | 03.01.2019 00:00:00 | 2019 | 1 | 3 |
| 9 | 04.01.2019 00:00:00 | 2019 | 1 | 4 |
| 10 | 05.01.2019 00:00:00 | 2019 | 1 | 5 |
+----+---------------------+------+----+----+
and:
select * from mydb t
inner join cte on t.year = cte.yr and t.month = cte.mn and t.day = cte.dy
Instead of a recursive common table expression, a table of integers (often known as a tally table) may be used, or a set of unioned select queries can generate the integers. The method one chooses will depend on the DBMS type and version being used.
Again, depending on database, it may be more efficient to persist the result seen above as a temporary table and add an index to that.

Postgres query for calendar

I am trying to write a query to retrieve data from an events query for a simple calendar app.
The table structure is as follows:
table name: events
Column | Type
---------+-----------
id | integer
start | timestamp
end | timestamp
the data inside of the table
id| start | end
--+---------------------+--------------------
1 | 2017-09-01 12:00:00 | 2017-09-01 12:00:00
2 | 2017-09-03 10:00:00 | 2017-09-03 12:00:00
3 | 2017-09-08 12:00:00 | 2017-09-11 12:00:00
4 | 2017-09-11 12:00:00 | 2017-09-11 12:00:00
the expected result is
date | event.id
-----------+---------
2017-09-01 | 1
2017-09-03 | 2
2017-09-08 | 3
2017-09-09 | 3
2017-09-10 | 3
2017-09-11 | 3
2017-09-11 | 4
As you can see, only days with an event (not just the start and end days, but also the days in between) are retrieved; days without an event are not retrieved at all.
In the second step I would like to be able to limit the number of distinct days, e.g. "get 4 days with events", which might be more than 4 rows.
Right now I am able to retrieve the events based on start date only using the following query:
SELECT start::date, id FROM events WHERE events.start::date >= '2017-09-01' LIMIT 3
Things I have already thought about are DENSE_RANK and generate_series, but so far I haven't found a way to fill the gaps between start and end without also returning days where there is no data.
So in short:
What I want to get is: the next X days where there is an event. A date with an event is a day where start <= date <= end
Any ideas ?
Edit
Thanks to Tim I have now the following query (modified to use generate_series instead of a table and added a limit using dense_rank):
select date, id FROM (
SELECT
DENSE_RANK() OVER (ORDER BY t1.date) as rank,
t1.date,
events.id
FROM
generate_series([DATE]::date, [DATE]::date + interval '365 day', '1 day') as t1
INNER JOIN
events
ON t1.date BETWEEN events.start::date AND events."end"::date
) as t
WHERE rank <= [LIMIT]
This is working really well, even though I am not 100% sure about the performance hit with this kind of limit.
I think you really need a calendar table here to cover the full range of dates in which your data may appear. In the first CTE below, I generate a table covering the month of September 2017. Then all we need to do is inner join this calendar table with the events table on the criteria of a given day appearing within a given range.
WITH cte AS (
SELECT CAST('2017-09-01' AS DATE) + (n || ' day')::INTERVAL AS date
FROM generate_series(0, 29) n
)
SELECT
t1.date,
t2.id
FROM cte t1
INNER JOIN events t2
ON t1.date BETWEEN CAST(t2.start AS DATE) AND CAST(t2.end AS DATE);
Output:
date id
1 01.09.2017 00:00:00 1
2 03.09.2017 00:00:00 2
3 08.09.2017 00:00:00 3
4 09.09.2017 00:00:00 3
5 10.09.2017 00:00:00 3
6 11.09.2017 00:00:00 3
7 11.09.2017 00:00:00 4
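If you would rather not fix the calendar range up front, a possible alternative is to expand each event into its own days. This is only a sketch: it assumes PostgreSQL 9.3 or later (for LATERAL), and the names d and event_date are made up for illustration.

```sql
-- Expand every event into one row per day it covers; days without events never appear
SELECT d::date AS event_date, e.id
FROM events e
CROSS JOIN LATERAL generate_series(e.start::date, e."end"::date, interval '1 day') AS d
ORDER BY event_date, e.id;
```

Each event contributes exactly the days between its start and end dates, inclusive, so no surrounding date range has to be guessed in advance.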

Joining series of dates and counting continuous days

Let's say I have a table as below
date add_days
2015-01-01 5
2015-01-04 2
2015-01-11 7
2015-01-20 10
2015-01-30 1
what I want to do is to check the days_balance, i.e. if date is greater or smaller than previous date + N days (add_days) and take the cumulated sum of days count if they are a continuous series.
So the algorithm should work like
for i in 2:N_rows {
days_balance[i] := date[i-1] + add_days[i-1] - date[i]
if days_balance[i] >= 0 then
date[i] := date[i] + days_balance[i]
}
The expected result should be as follows
date days_balance
2015-01-01 0
2015-01-04 2
2015-01-11 -3
2015-01-20 -2
2015-01-30 0
Is it possible in pure SQL? I imagine it should be with some conditional joins, but cannot see how it could be implemented.
I'm posting another answer since it may be nice to compare them since they use different methods (this one just does a n^2 style join, other one used a recursive CTE). This one takes advantage of the fact that you don't have to calculate the days_balance for each previous row before calculating it for a particular row, you just need to sum things from previous days....
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
, combinedWithAllPreviousDaysCte as
(
select i [curr_i], date [curr_date], add_days [curr_add_days], days_since_prev [curr_days_since_prev], 0 [prev_add_days], 0 [prev_days_since_prev] from cte where i = 1 --get first row explicitly since it has no preceding rows
UNION ALL
select curr.i [curr_i], curr.date [curr_date], curr.add_days [curr_add_days], curr.days_since_prev [curr_days_since_prev], prev.add_days [prev_add_days], prev.days_since_prev [prev_days_since_prev]
from cte curr
join cte prev on curr.i > prev.i --join to all previous days
)
select curr_i, curr_date, SUM(prev_add_days) - curr_days_since_prev - SUM(prev_days_since_prev) [days_balance]
from combinedWithAllPreviousDaysCte
group by curr_i, curr_date, curr_days_since_prev
order by curr_i
outputs:
+--------+-------------------------+--------------+
| curr_i | curr_date | days_balance |
+--------+-------------------------+--------------+
| 1 | 2015-01-01 00:00:00.000 | 0 |
| 2 | 2015-01-04 00:00:00.000 | 2 |
| 3 | 2015-01-11 00:00:00.000 | -3 |
| 4 | 2015-01-20 00:00:00.000 | -5 |
| 5 | 2015-01-30 00:00:00.000 | -5 |
+--------+-------------------------+--------------+
Well, I think I have it with a recursive CTE (sorry, I only have Microsoft SQL Server available to me at the moment, so it may not comply with PostgreSQL).
Also I think the expected results you had were off (see comment above). If not, this can probably be modified to conform to your math.
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
,recursiveCte (i, date, add_days, days_since_prev, days_balance, math) as
(
select top 1
i,
date,
add_days,
days_since_prev,
0 [days_balance],
CAST('no math for initial one, just has zero balance' as varchar(max)) [math]
from cte where i = 1
UNION ALL --recursive step now
select
curr.i,
curr.date,
curr.add_days,
curr.days_since_prev,
prev.days_balance - curr.days_since_prev + prev.add_days [days_balance],
CAST(prev.days_balance as varchar(max)) + ' - ' + CAST(curr.days_since_prev as varchar(max)) + ' + ' + CAST(prev.add_days as varchar(max)) [math]
from cte curr
JOIN recursiveCte prev ON curr.i = prev.i + 1
)
select i, DATEPART(day,date) [day], add_days, days_since_prev, days_balance, math
from recursiveCTE
order by date
And the results are like so:
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| i | day | add_days | days_since_prev | days_balance | math |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| 1 | 1 | 5 | 0 | 0 | no math for initial one, just has zero balance |
| 2 | 4 | 2 | 3 | 2 | 0 - 3 + 5 |
| 3 | 11 | 7 | 7 | -3 | 2 - 7 + 2 |
| 4 | 20 | 10 | 9 | -5 | -3 - 9 + 7 |
| 5 | 30 | 1 | 10 | -5 | -5 - 10 + 10 |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
I don’t quite get how your algorithm returns your expected results? But let me share a technique I came up with that might help.
This will only work if the end result of your data is to be exported to Excel, and even then it won’t work in all scenarios depending on what format you export your dataset in, but here it is....
If you’re familiar with Excel formulas, what I discovered is that if you write an Excel formula in your SQL as another field, Excel will execute that formula for you as soon as you export to it (the method that works best for me is just copying and pasting the result into Excel, so that it isn’t formatted as text).
So for your example, here’s what you could do (noting again I don’t understand your algorithm, so this is probably wrong, but it’s just to give you the concept)
SELECT
date
, add_days
, '=INDEX($1:$65536,ROW()-1,COLUMN()-2)'
||'+INDEX($1:$65536,ROW()-1,COLUMN()-1)'
||'-INDEX($1:$65536,ROW(),COLUMN()-2)'
AS "days_balance[i]"
,'=IF(INDEX($1:$65536,ROW(),COLUMN()-1)>=0'
||',INDEX($1:$65536,ROW(),COLUMN()-3)'
||'+INDEX($1:$65536,ROW(),COLUMN()-1))'
AS "date[i]"
FROM
myTable
ORDER BY /*Ensure to order by whatever you need for your formula to work*/
The key part to making this work is using the INDEX formula function to select a cell based on the position of the current cell. So ROW()-1 means "take the result from the previous record", and COLUMN()-2 means "take the value from two columns to the left of the current one". You can't use cell references like A2+B2-A3, because the row numbers won't change on export and the references assume fixed column positions.
I used SQL string concatenation with || just so it's easier to read on screen.
I tried this one in excel; it didn’t match your expected results. But if this technique works for you then just correct the excel formula to suit.

SQL Query Compare values in per 15 minutes and display the result per hour

I have a table with 2 columns. UTCTime and Values.
The UTCTime is in 15-minute increments. I want a query that compares each value to the previous value within a one-hour span and displays a number between 0 and 4, depending on how many values stay constant. In other words, there is an entry for every 15-minute increment and the value can stay constant, so I just need to compare each value to the previous one, per hour.
For example
+---------+-------+
| UTCTime | Value |
+---------+-------+
| 12:00   | 18.2  |
| 12:15   | 87.3  |
| 12:30   | 55.91 |
| 12:45   | 55.91 |
| 1:00    | 37.3  |
| 1:15    | 47.3  |
| 1:30    | 47.3  |
| 1:45    | 47.3  |
| 2:00    | 37.3  |
+---------+-------+
In this case, I just want a query that compares the 12:45 value to the 12:30 value, the 12:30 value to the 12:15 value, and so on. Since we are comparing within a one-hour span only, the number of constant values must be between 0 and 4 (0 means there are no constant values, 1 means there is one, as in the example above).
The query should display:
+---------+----------------+
| UTCTime | ConstantValues |
+---------+----------------+
| 12:00   | 1              |
| 1:00    | 2              |
+---------+----------------+
I just wanted to mention that I am new to SQL programming.
Thank you.
Below is a working query for what you need. Note: I changed the timeframe to 24 hrs.
;with SourceData(HourTime, Value, RowNum)
as
(
select
datepart(hh, UTCTime) HourTime,
Value,
row_number() over (partition by datepart(hh, UTCTime) order by UTCTime) RowNum
from foo
union
select
datepart(hh, UTCTime) - 1 HourTime,
Value,
5
from foo
where datepart(mi, UTCTime) = 0
)
select cast(A.HourTime as varchar) + ':00' UTCTime, sum(case when A.Value = B.Value then 1 else 0 end) ConstantValues
from SourceData A
inner join SourceData B on A.HourTime = B.HourTime and
(B.RowNum = (A.RowNum - 1))
group by cast(A.HourTime as varchar) + ':00'
select SUBSTRING_INDEX(UTCTime,':',1) as time,value, count(*)-1 as total
from foo group by value,time having total >= 1;
Mine isn't much different from Vasanth's, same idea different approach.
The idea is to join the data to itself so that each row can be compared with the next one. You could also use the LEAD() function to look at rows ahead of your current row, but in this case that would require a big case statement to cover every outcome.
;WITH T
AS (
SELECT a.UTCTime,b.VALUE,ROW_NUMBER() OVER(PARTITION BY a.UTCTime ORDER BY b.UTCTime DESC)'RowRank'
FROM (SELECT *
FROM #Table1
WHERE DATEPART(MINUTE,UTCTime) = 0
)a
JOIN #Table1 b
ON b.UTCTIME BETWEEN a.UTCTIME AND DATEADD(hour,1,a.UTCTIME)
)
SELECT T.UTCTime, SUM(CASE WHEN T.Value = T2.Value THEN 1 ELSE 0 END)
FROM T
JOIN T T2
ON T.UTCTime = T2.UTCTime
AND T.RowRank = T2.RowRank -1
GROUP BY T.UTCTime
If you run the portion inside the ;WITH T AS ( ) you'll see that it gets us the hour we're looking at and the values ordered by time. That is used in the self-join below it, which evaluates each row against the next row (hence the RowRank - 1 in the JOIN).
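As an alternative sketch, a window function can do the neighbor comparison without a self-join. This assumes SQL Server 2012 or later (for LAG), a table foo(UTCTime, Value) as in the first answer with UTCTime stored as a datetime, and no missing 15-minute readings; the names PrevValue and HourStart are made up for illustration.

```sql
SELECT
    -- a comparison belongs to the hour of the EARLIER reading, so 1:00 vs 12:45 counts for 12:00
    DATEADD(hour, DATEDIFF(hour, 0, DATEADD(minute, -15, UTCTime)), 0) AS HourStart,
    SUM(CASE WHEN Value = PrevValue THEN 1 ELSE 0 END) AS ConstantValues
FROM (
    SELECT
        UTCTime,
        Value,
        LAG(Value) OVER (ORDER BY UTCTime) AS PrevValue   -- value of the previous reading
    FROM foo
) x
WHERE PrevValue IS NOT NULL      -- the very first reading has nothing to compare against
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, DATEADD(minute, -15, UTCTime)), 0)
ORDER BY HourStart;
```

Shifting each reading back by 15 minutes before truncating to the hour is what puts the on-the-hour comparison into the preceding hour's bucket, giving at most four comparisons per hour.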