Grabbing certain columns from the rows that the MAX value occurred - sql

So i am trying to insert only the rows that are correlated to the HOUR where the MAX value occurred . However I am grouping by Day, and later on will be grouping on week, month, year, etc. Is my approach correct? The reason I am using a subquery is because I need to grab the value that goes into the HR column but adding it to the GROUP BY will mess up my daily groupings (if that makes sense). I need certain values from those columns to insert with a Stored Procedure that I wrote. Is the query I have below on the right track? Thanks in Advance
Below is some sample data:
VALUE_ID VALUE HR VALUE_TYPE OFFSET DATA_CODE
1 75 DEC-25-2018 01:00:00 AM Bananas 1 HI
2 10 DEC-25-2018 02:00:00 AM Bananas 1 HI
3 0 DEC-25-2018 03:00:00 AM Bananas 1 HI
4 77 DEC-25-2018 04:00:00 AM Bananas 1 HI
5 787 DEC-25-2018 05:00:00 PM Bananas 1 HI
What I want:
VALUE_ID VALUE HR VALUE_TYPE OFFSET DATA_CODE
5 787 DEC-25-2018 05:00:00 PM Bananas 1 HI
SELECT v.value AS MAX_VALUE
, v.offset
, v.data_code
, v.hr
, v.code
, v.data_date
, to_date(to_char(to_date(lpad(v.data_date, 7, 0), 'DDDYYYY'), 'MM/DD/YYYY'), 'MM/DD/YYYY') as converted_date
FROM value v
inner join sub_value sv on v.value_id = sv.value_id
Where v.Value_id IN (select VALUE_ID from(
select MAX(v.value) as MAX_VALUE
, MAX(v.offset) as OFFSET
, v.data_Code
, MAX(v.value_id) as VALUE_ID
, v.code
, v.data_date
, to_date(to_char(to_date(lpad(v.data_date, 7, 0), 'DDDYYYY'), 'MM/DD/YYYY'), 'MM/DD/YYYY') as converted_date
from value v
inner join sub_value sv on v.value_id = sv.value_id;
Also: Would the process be the same for weekly MAX as well? (of course I am going to convert the date/timestamp to IW (week), but aside from that)

One simple method uses row_number() or rank(). Here is an example, assuming you want the maximum for each data_code:
select t.*
from (select t.*, row_number() over (partition by data_code order by value desc) as seqnum
from t
) t
where seqnum = 1;

Related

SQL : create intermediate data from date range

I have a table as shown here:
USER
ROI
DATE
1
5
2021-11-24
1
4
2021-11-26
1
6
2021-11-29
I want to get the ROI for the dates in between the other dates, expected result will be as below
From 2021-11-24 to 2021-11-30
USER
ROI
DATE
1
5
2021-11-24
1
5
2021-11-25
1
4
2021-11-26
1
4
2021-11-27
1
4
2021-11-28
1
6
2021-11-29
1
6
2021-11-30
You may use a calendar table approach here. Create a table containing all dates and then join with it. Sans an actual table, you may use an inline CTE:
WITH dates AS (
SELECT '2021-11-24' AS dt UNION ALL
SELECT '2021-11-25' UNION ALL
SELECT '2021-11-26' UNION ALL
SELECT '2021-11-27' UNION ALL
SELECT '2021-11-28' UNION ALL
SELECT '2021-11-29' UNION ALL
SELECT '2021-11-30'
),
cte AS (
SELECT USER, ROI, DATE, LEAD(DATE) OVER (ORDER BY DATE) AS NEXT_DATE
FROM yourTable
)
SELECT t.USER, t.ROI, d.dt
FROM dates d
INNER JOIN cte t
ON d.dt >= t.DATE AND (d.dt < t.NEXT_DATE OR t.NEXT_DATE IS NULL)
ORDER BY d.dt;

Count data per day between two dates

Hi i'm trying to count the total late remark per day between two dates inputs by the user.
for example:
ID NAME DATE_TIME REMARKS
1 Aa 2020-01-18 09:57:56 LATE
2 Aa 2020-01-18 10:57:56 LATE
3 Aa 2020-01-19 06:52:56
4 Aa 2020-01-19 09:57:56 LATE
5 Aa 2020-01-19 09:57:56 LATE
6 Aa 2020-01-21 09:57:56 Late
Expected result.
NAME DATE count
Aa 2020-01-18 2
Aa 2020-01-19 2
Aa 2020-01-20 0
Aa 2020-01-21 1
The Data type of DATE_TIME is varhcar2
this is my attemp but i dont know how to achive it.
Select Count(REMARKS) countBT from TBLACCESSLOGS WHERE To_date(DATE_TIME,'YYYY-MM-DD') between To_date('2020-02-18','YYYY-MM-DD') and To_date('2020-02-20','YYYY-MM-DD')
and i get error date format picture ends before converting entire input string pointing on DATE_TIME as i execute.
Hope someone help me with this.
Thank you in advance
Since you face the prospect that there may be missing dates within the range your looking for you need to generate an entry for each date in that range. You the join those dates with your table, counting the number of remarks column.
with date_parms as
(select to_date('&Start_Date','yyyy-mm-dd') start_date
, to_date('&End_Date','yyyy-mm-dd') end_date
from dual
)
, date_list as
(select start_date+lev-1 t_date
from date_parms
, ( select level lev
from dual
connect by level <= (select end_date - start_date + 1
from date_parms
)
)
)
select t_date "Date"
, name
, count(*) "Num Late"
from date_list dl
left join lates l on trunc(l.date_time) = dl.t_date and lower(l.remark) = 'late'
where 1=1 --lower(l.remark) = 'late'
group by trunc(t_time), name;
Note. Once the initial parameters (start and end dates) are converted to from strings to dates no further date-string manipulation is required.
Completely edited; this is a multi-step process, the real important SQL logic is in "THE_GOODS". THe generation of days is in "DAYS" and I took that from here: https://www.zetetic.net/blog/2009/2/12/generating-a-sequential-date-series-in-oracle.html -- I don't understand it much more than ctl-c/ctl-v.
"PERMS" makes the permutation of dates/names, then that is left-joined to THE_GOODS to get the counts. So for each combo of user and dates in range you get one row, and the count from THE_GOODS, or zero if there's no matching row.
to fiddle with it: http://sqlfiddle.com/#!4/42618/8
WITH DAYS AS
(SELECT TO_CHAR(TRUNC(TO_DATE('01-JAN-2020') + ROWNUM - 1, 'DD'), 'YYYY-MM-DD') AS ADAY
FROM (
SELECT ROWNUM FROM (
SELECT 1 FROM DUAL
CONNECT BY LEVEL <= (TO_DATE('08-JAN-2020') - TO_DATE('01-JAN-2020'))
)
)
),
THE_GOODS AS (
select name, to_char(DATE_TIME, 'YYYY-MM-DD') AS ADAY, count(*) AS HOW_MANY
from TBLACCESSLOGS
where trunc(DATE_TIME, 'DD') between to_date('2020-01-01', 'YYYY-MM-DD')
and to_date('2020-01-05', 'YYYY-MM-DD')
and remarks = 'LATE'
group by name, to_char(DATE_TIME, 'YYYY-MM-DD')
)
,
PERMS AS (
SELECT DISTINCT DAYS.ADAY, THE_GOODS.NAME
FROM DAYS
CROSS JOIN
THE_GOODS
)
SELECT p.NAME, p.ADAY, COALESCE(g.HOW_MANY, 0) AS HOWMANY
FROM PERMS p
LEFT JOIN THE_GOODS g
on p.ADAY = g.ADAY
and p.NAME = g.NAME
ORDER BY p.ADAY, g.NAME
Try this ..
SQL> select * from late_remarks;
ID NAME DATE_TIME REMARKS
-- -- ------------------- ----
1 Aa 2020-01-18 09:57:56 LATE
2 Aa 2020-01-18 10:57:56 LATE
3 Aa 2020-01-19 06:52:56
4 Aa 2020-01-19 09:57:56 LATE
5 Aa 2020-01-19 09:57:56 LATE
6 Aa 2020-01-21 09:57:56 LATE
6 rows selected.
SQL> with dates as (
2 select to_date('17-01-2020', 'DD-MM-YYYY') + level "DATE"
3 from dual
4 connect by level <= (to_date('21-01-2020', 'DD-MM-YYYY') - to_date('17-01-2020', 'DD-MM-YYYY'))
5 )
6 select 'Aa' name, d."DATE", count(lr.remarks) count from dates d
7 left outer join late_remarks lr
8 on d."DATE" = trunc(to_timestamp (lr.date_time, 'YYYY-MM-DD HH24:MI:SS'))
9 group by d."DATE"
10 order by d."DATE";
NAME DATE COUNT
-- --------- ----------
Aa 18-JAN-20 2
Aa 19-JAN-20 2
Aa 20-JAN-20 0
Aa 21-JAN-20 1
.. assuming name to be constant

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-4-30
For simplicity sake it is safe to assume that dates are safely consecutive. For contiguous dates To is always 1 day before from.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2015-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL which seems to work. Although I'm not sure if there are better ways/issues with it?
WITH grouped_rates AS
(SELECT
from_date,
to_date,
SUM(grp_start) OVER (ORDER BY from_date, to_date) group
FROM (SELECT
gite_id,
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) from_date,
max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
select r.*,
(case when not exists (select 1
from rates r2
where r2.from < r.from and r2.to >= r.to or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rate r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from < r.from and r2.to >= r.to) or
(r2.from = r.from and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rate r
)
select min(from), max(to)
from (select r.*,
sum(r.StartFlag) over (order by r.from) as grp
from r
) r
group by grp;
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)

SQL issue - calculate max days sequence

There is a table with visits data:
uid (INT) | created_at (DATETIME)
I want to find how many days in a row a user has visited our app. So for instance:
SELECT DISTINCT DATE(created_at) AS d FROM visits WHERE uid = 123
will return:
d
------------
2012-04-28
2012-04-29
2012-04-30
2012-05-03
2012-05-04
There are 5 records and two intervals - 3 days (28 - 30 Apr) and 2 days (3 - 4 May).
My question is how to find the maximum number of days that a user has visited the app in a row (3 days in the example). Tried to find a suitable function in the SQL docs, but with no success. Am I missing something?
UPD:
Thank you guys for your answers! Actually, I'm working with vertica analytics database (http://vertica.com/), however this is a very rare solution and only a few people have experience with it. Although it supports SQL-99 standard.
Well, most of solutions work with slight modifications. Finally I created my own version of query:
-- returns starts of the vitit series
SELECT t1.d as s FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
s
---------------------
2012-04-28 01:00:00
2012-05-03 01:00:00
-- returns end of the vitit series
SELECT t1.d as f FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
f
---------------------
2012-04-30 01:00:00
2012-05-04 01:00:00
So now only what we need to do is to join them somehow, for instance by row index.
SELECT s, f, DATEDIFF(day, s, f) + 1 as seq FROM (
SELECT t1.d as s, ROW_NUMBER() OVER () as o1 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl1 LEFT JOIN (
SELECT t1.d as f, ROW_NUMBER() OVER () as o2 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl2 ON o1 = o2
Sample output:
s | f | seq
---------------------+---------------------+-----
2012-04-28 01:00:00 | 2012-04-30 01:00:00 | 3
2012-05-03 01:00:00 | 2012-05-04 01:00:00 | 2
Another approach, the shortest, do a self-join:
with grouped_result as
(
select
sr.d,
sum((fr.d is null)::int) over(order by sr.d) as group_number
from tbl sr
left join tbl fr on sr.d = fr.d + interval '1 day'
)
select d, group_number, count(d) over m as consecutive_days
from grouped_result
window m as (partition by group_number)
Output:
d | group_number | consecutive_days
---------------------+--------------+------------------
2012-04-28 08:00:00 | 1 | 3
2012-04-29 08:00:00 | 1 | 3
2012-04-30 08:00:00 | 1 | 3
2012-05-03 08:00:00 | 2 | 2
2012-05-04 08:00:00 | 2 | 2
(5 rows)
Live test: http://www.sqlfiddle.com/#!1/93789/1
sr = second row, fr = first row ( or perhaps previous row? ツ ). Basically we are doing a back tracking, it's a simulated lag on database that doesn't support LAG (Postgres supports LAG, but the solution is very long, as windowing doesn't support nested windowing). So in this query, we uses a hybrid approach, simulate LAG via join, then use SUM windowing against it, this produces group number
UPDATE
Forgot to put the final query, the query above illustrate the underpinnings of group numbering, need to morph that into this:
with grouped_result as
(
select
sr.d,
sum((fr.d is null)::int) over(order by sr.d) as group_number
from tbl sr
left join tbl fr on sr.d = fr.d + interval '1 day'
)
select min(d) as starting_date, max(d) as end_date, count(d) as consecutive_days
from grouped_result
group by group_number
-- order by consecutive_days desc limit 1
STARTING_DATE END_DATE CONSECUTIVE_DAYS
April, 28 2012 08:00:00-0700 April, 30 2012 08:00:00-0700 3
May, 03 2012 08:00:00-0700 May, 04 2012 08:00:00-0700 2
UPDATE
I know why my other solution that uses window function became long, it became long on my attempt to illustrate the logic of group numbering and counting over the group. If I'd cut to the chase like in my MySql approach, that windowing function could be shorter. Having said that, here's my old windowing function approach, albeit better now:
with headers as
(
select
d,lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over (order by d) as group_number
from headers
)
select min(d) as starting_date,max(d) as ending_date,count(d) as consecutive_days
from sequence_group
group by group_number
-- order by consecutive_days desc limit 1
Live test: http://www.sqlfiddle.com/#!1/93789/21
In MySQL you could do this:
SET #nextDate = CURRENT_DATE;
SET #RowNum = 1;
SELECT MAX(RowNumber) AS ConecutiveVisits
FROM ( SELECT #RowNum := IF(#NextDate = Created_At, #RowNum + 1, 1) AS RowNumber,
Created_At,
#NextDate := DATE_ADD(Created_At, INTERVAL 1 DAY) AS NextDate
FROM Visits
ORDER BY Created_At
) Visits
Example here:
http://sqlfiddle.com/#!2/6e035/8
However I am not 100% certain this is the best way to do it.
In Postgresql:
;WITH RECURSIVE VisitsCTE AS
( SELECT Created_At, 1 AS ConsecutiveDays
FROM Visits
UNION ALL
SELECT v.Created_At, ConsecutiveDays + 1
FROM Visits v
INNER JOIN VisitsCTE cte
ON 1 + cte.Created_At = v.Created_At
)
SELECT MAX(ConsecutiveDays) AS ConsecutiveDays
FROM VisitsCTE
Example here:
http://sqlfiddle.com/#!1/16c90/9
I know Postgresql has something similar to common table expressions as available in MSSQL. I'm not that familiar with Postgresql, but the code below works for MSSQL and does what you want.
create table #tempdates (
mydate date
)
insert into #tempdates(mydate) values('2012-04-28')
insert into #tempdates(mydate) values('2012-04-29')
insert into #tempdates(mydate) values('2012-04-30')
insert into #tempdates(mydate) values('2012-05-03')
insert into #tempdates(mydate) values('2012-05-04');
with maxdays (s, e, c)
as
(
select mydate, mydate, 1
from #tempdates
union all
select m.s, mydate, m.c + 1
from #tempdates t
inner join maxdays m on DATEADD(day, -1, t.mydate)=m.e
)
select MIN(o.s),o.e,max(o.c)
from (
select m1.s,max(m1.e) e,max(m1.c) c
from maxdays m1
group by m1.s
) o
group by o.e
drop table #tempdates
And here's the SQL fiddle: http://sqlfiddle.com/#!3/42b38/2
All are very good answers, but I think I should contribute by showing another approach utilizing an analytical capability specific to Vertica (after all it is part of what you paid for). And I promise the final query is short.
First, query using conditional_true_event(). From Vertica's documentation:
Assigns an event window number to each row, starting from 0, and
increments the number by 1 when the result of the boolean argument
expression evaluates true.
The example query looks like this:
select uid, created_at,
conditional_true_event( created_at - lag(created_at) > '1 day' )
over (partition by uid order by created_at) as seq_id
from visits;
And output:
uid created_at seq_id
--- ------------------- ------
123 2012-04-28 00:00:00 0
123 2012-04-29 00:00:00 0
123 2012-04-30 00:00:00 0
123 2012-05-03 00:00:00 1
123 2012-05-04 00:00:00 1
123 2012-06-04 00:00:00 2
123 2012-06-04 00:00:00 2
Now the final query becomes easy:
select uid, seq_id, count(1) num_days, min(created_at) s, max(created_at) f
from
(
select uid, created_at,
conditional_true_event( created_at - lag(created_at) > '1 day' )
over (partition by uid order by created_at) as seq_id
from visits
) as seq
group by uid, seq_id;
Final Output:
uid seq_id num_days s f
--- ------ -------- ------------------- -------------------
123 0 3 2012-04-28 00:00:00 2012-04-30 00:00:00
123 1 2 2012-05-03 00:00:00 2012-05-04 00:00:00
123 2 2 2012-06-04 00:00:00 2012-06-04 00:00:00
One final note:
num_days is actually number of rows of the inner query. If there are two '2012-04-28' visits in the original table (i.e. duplicates), you might want to work around that.
The following should be Oracle friendly, and not require recursive logic.
;WITH
visit_dates (
visit_id,
date_id,
group_id
)
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY TRUNC(created_at)),
TRUNC(SYSDATE) - TRUNC(created_at),
TRUNC(SYSDATE) - TRUNC(created_at) - ROW_NUMBER() OVER (ORDER BY TRUNC(created_at))
FROM
visits
GROUP BY
TRUNC(created_at)
)
,
group_duration (
group_id,
duration
)
AS
(
SELECT
group_id,
MAX(date_id) - MIN(date_id) + 1 AS duration
FROM
visit_dates
GROUP BY
group_id
)
SELECT
MAX(duration) AS max_duration
FROM
group_duration
Postgresql:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
,consecutive_list as
(
select d, group_number, count(d) over m as consecutive_count
from sequence_group
window m as (partition by group_number)
)
select * from consecutive_list
Divide-and-conquer approach: 3 steps
1st step, find headers:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
select * from headers
Output:
d | header
---------------------+--------
2012-04-28 08:00:00 | t
2012-04-29 08:00:00 | f
2012-04-30 08:00:00 | f
2012-05-03 08:00:00 | t
2012-05-04 08:00:00 | f
(5 rows)
2nd step, designate grouping:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
select * from sequence_group
Output:
d | group_number
---------------------+--------------
2012-04-28 08:00:00 | 1
2012-04-29 08:00:00 | 1
2012-04-30 08:00:00 | 1
2012-05-03 08:00:00 | 2
2012-05-04 08:00:00 | 2
(5 rows)
3rd step, count max days:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
,consecutive_list as
(
select d, group_number, count(d) over m as consecutive_count
from sequence_group
window m as (partition by group_number)
)
select * from consecutive_list
Output:
d | group_number | consecutive_count
---------------------+--------------+-----------------
2012-04-28 08:00:00 | 1 | 3
2012-04-29 08:00:00 | 1 | 3
2012-04-30 08:00:00 | 1 | 3
2012-05-03 08:00:00 | 2 | 2
2012-05-04 08:00:00 | 2 | 2
(5 rows)
This is for MySQL, the shortest, and uses minimal variable (one variable only):
select
min(d) as starting_date, max(d) as ending_date,
count(d) as consecutive_days
from
(
select
sr.d,
IF(fr.d is null,#group_number := #group_number + 1,#group_number)
as group_number
from tbl sr
left join tbl fr on sr.d = adddate(fr.d,interval 1 day)
cross join (select #group_number := 0) as grp
) as x
group by group_number
Output:
STARTING_DATE ENDING_DATE CONSECUTIVE_DAYS
April, 28 2012 08:00:00-0700 April, 30 2012 08:00:00-0700 3
May, 03 2012 08:00:00-0700 May, 04 2012 08:00:00-0700 2
Live test: http://www.sqlfiddle.com/#!2/65169/1
For PostgreSQL 8.4 or later, there is a short and clean way with window functions and no JOIN.
I'd expect this to be the fastest solution posted so far:
WITH x AS (
SELECT created_at AS d
, lag(created_at) OVER (ORDER BY created_at) = (created_at - 1) AS nu
FROM visits
WHERE uid = 1
)
, y AS (
SELECT d, count(NULLIF(nu, TRUE)) OVER (ORDER BY d) AS seq
FROM x
)
SELECT count(*) AS max_days, min(d) AS seq_from, max(d) AS seq_to
FROM y
GROUP BY seq
ORDER BY 1 DESC
LIMIT 1;
Returns:
max_days | seq_from | seq_to
---------+------------+-----------
3 | 2012-04-28 | 2012-04-30
Assuming that created_at is a date and unique.
In CTE x: for every day our user visits, check if he was here yesterday, too.
To calculate "yesterday" just use created_at - 1 The first row is a special case and will produce NULL here.
In CTE y: calculate a running count of "days without yesterday so far" (seq) for every day. NULL values don't count, so count(NULLIF(nu, TRUE)) is the fastes and shortest way, also covering the special case.
Finally, group days per seq and count the days. While being at it I added first and last day of the sequence.
ORDER BY length of the sequence, and pick the longest one.
Upon seeing OP's query approach for their Vertica database, I tried making the two joins run at the same time:
These Postgresql and Sql Server query versions shall both work in Vertica
Postgresql version:
select
min(gr.d) as start_date,
max(gr.d) as end_date,
date_part('day', max(gr.d) - min(gr.d))+1 as consecutive_days
from
(
select
cr.d, (row_number() over() - 1) / 2 as pair_number
from tbl cr
left join tbl pr on pr.d = cr.d - interval '1 day'
left join tbl nr on nr.d = cr.d + interval '1 day'
where pr.d is null <> nr.d is null
) as gr
group by pair_number
order by start_date
Regarding pr.d is null <> nr.d is null. It means, it's either the previous row is null or next row is null, but they can never both be null, so this basically removes the non-consecutive dates, as non-consecutive dates' previous & next row are nulls (and this basically gives us all dates that are just headers and footers only). This is also called an XOR operation
If we are left with consecutive dates only, we can now pair them via row_number:
(row_number() over() - 1) / 2 as pair_number
row_number() starts with 1, we need to subtract it with 1 (we can also add with 1 instead), then we divide it by two; this makes the paired date adjacent to each other
Live test: http://www.sqlfiddle.com/#!1/fc440/7
This is the Sql Server version:
select
min(gr.d) as start_date,
max(gr.d) as end_date,
datediff(day, min(gr.d),max(gr.d)) +1 as consecutive_days
from
(
select
cr.d, (row_number() over(order by cr.d) - 1) / 2 as pair_number
from tbl cr
left join tbl pr on pr.d = dateadd(day,-1,cr.d)
left join tbl nr on nr.d = dateadd(day,+1,cr.d)
where
case when pr.d is null then 1 else 0 end
<> case when nr.d is null then 1 else 0 end
) as gr
group by pair_number
order by start_date
Same logic as above, except for artificial differences on date functions. And sql Server requires an ORDER BY clause on its OVER, while Postgresql's OVER can be left empty.
Sql Server has no first class boolean, that's why we cannot compare booleans directly:
pr.d is null <> nr.d is null
We must do this in Sql Server:
case when pr.d is null then 1 else 0 end
<> case when nr.d is null then 1 else 0 end
Live test: http://www.sqlfiddle.com/#!3/65df2/17
There have already been several answers to this question. However the SQL statements all seem too complex. This can be accomplished with basic SQL, a way to enumerate rows, and some date arithmetic.
The key observation is that if you have a bunch of days and have a parallel sequence of integers, then the difference is a constant date when the days are in a sequence.
The following query uses this observation to answer the original question:
select uid, min(d) as startdate, count(*) as numdaysinseq
from
(
select uid, d, adddate(d, interval -offset day) as groupstart
from
(
select uid, d, row_number() over (partition by uid order by date) as offset
from
(
SELECT DISTINCT uid, DATE(created_at) AS d
FROM visits
) t
) t
) t
Alas, mysql does not have the row_number() function. However, there is a work-around with variables (and most other databases do have this function).

select maximum score grouped by date, display full datetime

Gday, I have a table that shows a series of scores and datetimes those scores occurred.
I'd like to select the maximum of these scores for each day, but display the datetime that the score occurred.
I am using an Oracle database (10g) and the table is structured like so:
scoredatetime score (integer)
---------------------------------------
01-jan-09 00:10:00 10
01-jan-09 01:00:00 11
01-jan-09 04:00:01 9
...
I'd like to be able to present the results such the above becomes:
01-jan-09 01:00:00 11
This following query gets me halfway there.. but not all the way.
select
trunc(t.scoredatetime), max(t.score)
from
mytable t
group by
trunc(t.scoredatetime)
I cannot join on score only because the same high score may have been achieved multiple times throughout the day.
I appreciate your help!
Simon Edwards
with mytableRanked(d,scoredatetime,score,rk) as (
select
scoredatetime,
score,
row_number() over (
partition by trunc(scoredatetime)
order by score desc, scoredatetime desc
)
from mytable
)
select
scoredatetime,
score
from mytableRanked
where rk = 1
order by date desc
In the case of multiple high scores within a day, this returns the row corresponding to the one that occurred latest in the day. If you want to see all highest scores in a day, remove scoredatetime desc from the order by specification in the row_number window.
Alternatively, you can do this (it will list ties of high score for a date):
select
scoredatetime,
score
from mytable
where not exists (
select *
from mytable as M2
where trunc(M2.scoredatetime) = trunc(mytable.scoredatetime)
and M2.score > mytable.scoredatetime
)
order by scoredatetime desc
First of all, you did not yet specify what should happen if two or more rows within the same day contain an equal high score.
Two possible answers to that question:
1) Just select one of the scoredatetime's, it doesn't matter which one
In this case don't use self joins or analytics as you see in the other answers, because there is a special aggregate function that can do your job more efficient. An example:
SQL> create table mytable (scoredatetime,score)
2 as
3 select to_date('01-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 10 from dual union all
4 select to_date('01-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 11 from dual union all
5 select to_date('01-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 9 from dual union all
6 select to_date('02-jan-2009 00:10:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
7 select to_date('02-jan-2009 01:00:00','dd-mon-yyyy hh24:mi:ss'), 1 from dual union all
8 select to_date('02-jan-2009 04:00:00','dd-mon-yyyy hh24:mi:ss'), 0 from dual
9 /
Table created.
SQL> select max(scoredatetime) keep (dense_rank last order by score) scoredatetime
2 , max(score)
3 from mytable
4 group by trunc(scoredatetime,'dd')
5 /
SCOREDATETIME MAX(SCORE)
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 01:00:00 1
2 rows selected.
2) Select all records with the maximum score.
In this case you need analytics with a RANK or DENSE_RANK function. An example:
SQL> select scoredatetime
2 , score
3 from ( select scoredatetime
4 , score
5 , rank() over (partition by trunc(scoredatetime,'dd') order by score desc) rnk
6 from mytable
7 )
8 where rnk = 1
9 /
SCOREDATETIME SCORE
------------------- ----------
01-01-2009 01:00:00 11
02-01-2009 00:10:00 1
02-01-2009 01:00:00 1
3 rows selected.
Regards,
Rob.
You might need two SELECT statements to pull this off: the first to collect the truncated date and associated max score, and the second to pull in the actual datetime values associated with the score.
Try:
SELECT T.ScoreDateTime, T.Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate, MAX(T.score) BestScore
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON TRUNC(T.ScoreDateTime) = ByDate.ScoreDate and T.Score = ByDate.BestScore
ORDER BY T.ScoreDateTime DESC
This will pull in best score ties as well.
For a version which selects only the most recently-posted high score for each day:
SELECT T.ScoreDateTime, T.Score
FROM
(
SELECT
TRUNC(T.ScoreDateTime) ScoreDate,
MAX(T.score) BestScore,
MAX(T.ScoreDateTime) BestScoreTime
FROM
MyTable T
GROUP BY
TRUNC(T.ScoreDateTime)
) ByDate
INNER JOIN MyTable T
ON T.ScoreDateTime = ByDate.BestScoreTime and T.Score = ByDate.BestScore
ORDER BY T.ScoreDateTime DESC
This may produce multiple records per date if two different scores were posted at exactly the same time.