Totals over rolling timeframe - sql

I have my data arranged like this:
obj_id quantity date
1 3 2014-05-06
2 2 2014-03-12
3 5 2014-10-07
4 7 2014-05-09
2 8 2014-12-31
1 5 2014-01-16
4 1 2014-07-26
3 2 2014-09-15
...
What I need is to find the OBJ_IDs that have SUM(quantity) > MAX over a period of RANGE days.
In my case MAX is 18 and RANGE is 31 days.
In other words, every OBJ_ID receives some QUANTITY from time to time. I need to find the OBJ_IDs that have received more than 18 in total, where the dates on which that OBJ_ID received those quantities span less than 31 days.
I think I need to use LAG here, but I'm not sure how to put the whole query together.
Thanks in advance.

This might need some tweaking as I didn't have the time to decently test it, but maybe it'll get you on the right track:
(I've assumed you want the records where the date is within the last 31 days)
SELECT obj_id, SUM(quantity) AS total_quantity
FROM tblTable
WHERE [date] BETWEEN DATEADD(day, -31, GETDATE()) AND GETDATE()  -- RANGE = 31 days
GROUP BY obj_id
HAVING SUM(quantity) > 18  -- MAX = 18; HAVING must come after GROUP BY

I'm currently testing a solution a colleague of mine has quickly put together:
SELECT A.*
FROM (
    SELECT A.obj_id
         , A.date
         , A.in_month_date
         -- days between the earliest date in the window and this row's date
         , A.date - A.in_month_date AS in_month
         , A.quantity
         , A.in_month_quantity
    FROM (
        SELECT A.obj_id
             , A.date
             -- earliest date among this object's rows in the 31 days up to this row
             , FIRST_VALUE(A.date)
                   OVER (
                       PARTITION BY A.obj_id
                       ORDER BY A.date
                       RANGE BETWEEN 31 PRECEDING
                             AND CURRENT ROW
                   ) AS in_month_date
             , A.quantity
             -- total quantity over the same 31-day window
             , SUM(A.quantity)
                   OVER (
                       PARTITION BY A.obj_id
                       ORDER BY A.date
                       RANGE BETWEEN 31 PRECEDING
                             AND CURRENT ROW
                   ) AS in_month_quantity
        FROM mytable A
    ) A
) A
WHERE A.in_month <= 31   -- already guaranteed by the 31 PRECEDING window frame
  AND A.in_month_quantity > 18
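The query above relies on Oracle's RANGE ... PRECEDING over a DATE column. As a portable cross-check, here is a minimal sketch of the same rolling-window test written as a self-join and run in SQLite from Python. Assumptions: the table name mytable, ISO date strings, SQLite's julianday() for date arithmetic, and "span less than 31 days" read as "dates at most 30 days apart". The EXTRA rows are hypothetical, added only so that one object actually crosses the limit.

```python
import sqlite3

# Rows from the question's sample: (obj_id, quantity, date).
SAMPLE = [
    (1, 3, "2014-05-06"), (2, 2, "2014-03-12"), (3, 5, "2014-10-07"),
    (4, 7, "2014-05-09"), (2, 8, "2014-12-31"), (1, 5, "2014-01-16"),
    (4, 1, "2014-07-26"), (3, 2, "2014-09-15"),
]

# Hypothetical extra rows: object 5 gets 10 + 9 within 19 days,
# object 6 gets 10 + 9 but 32 days apart.
EXTRA = [(5, 10, "2014-02-01"), (5, 9, "2014-02-20"),
         (6, 10, "2014-02-01"), (6, 9, "2014-03-05")]


def objects_over_limit(rows, max_qty=18, range_days=31):
    """Return the obj_ids whose quantities, received on dates at most
    range_days - 1 days apart, add up to more than max_qty."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (obj_id INTEGER, quantity INTEGER, date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Each distinct (obj_id, date) is treated as the end of a window; the
    # self-join pulls in that object's rows from the preceding days.
    sql = """
        SELECT DISTINCT e.obj_id
        FROM (SELECT DISTINCT obj_id, date FROM mytable) AS e
        JOIN mytable AS b
          ON b.obj_id = e.obj_id
         AND julianday(b.date) BETWEEN julianday(e.date) - ? AND julianday(e.date)
        GROUP BY e.obj_id, e.date
        HAVING SUM(b.quantity) > ?
        ORDER BY e.obj_id
    """
    result = [row[0] for row in con.execute(sql, (range_days - 1, max_qty))]
    con.close()
    return result
```

With only the sample rows no object exceeds 18; adding the hypothetical rows flags object 5 but not object 6. Deduplicating the window ends first keeps two rows of the same object on the same day from being double-counted.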

Related

SQL question - how to output using iterative date logic in SQL Server

I have the following sample table (shown with a single ID for simplicity; the same logic has to be applied to every ID):
ID Visit_date
-----------------
ABC 8/7/2019
ABC 9/10/2019
ABC 9/12/2019
ABC 10/1/2019
ABC 10/1/2019
ABC 10/8/2019
ABC 10/15/2019
ABC 10/17/2019
ABC 10/24/2019
Here is the logic needed to produce the sample output:
Mark the first visit as 1 in the "new_visit" column.
Compare the subsequent dates with the first date until one exceeds the 21-day limit. For example, Sep 10 is compared to Aug 7; it does not fall within 21 days of Aug 7, so it is considered another new visit and new_visit is marked 1.
Then compare Sep 10 with the subsequent dates using the same 21-day criterion and mark all of them as follow-ups of the Sep 10 visit. E.g. Sep 12 and Oct 1 are within 21 days of Sep 10, so they are considered follow-up visits and "follow_up" is marked 1.
When a subsequent date exceeds the 21-day limit from the previous new visit (e.g. Oct 8 compared to Sep 10), Oct 8 is considered a new visit, "new_visit" is marked 1, and the subsequent dates are compared against Oct 8.
Sample Output:
Dates       New_Visit  Follow_up
--------------------------------
8/7/2019    1          0
9/10/2019   1          0
9/12/2019   0          1
10/1/2019   0          1
10/1/2019   0          1
10/8/2019   1          0
10/15/2019  0          1
10/17/2019  0          1
10/24/2019  0          1
You need a recursive query for this.
You would enumerate the rows, then walk through the dataset by ascending date, while keeping track of the first visit date of each group; when the interval since the last first visit exceeds 21 days, the date of the first visit resets, and a new group starts.
with
data as (
select t.*, row_number() over(partition by id order by visit_date) rn
from mytable t
),
cte as (
select id, visit_date, visit_date first_visit_date, rn
from data
where rn = 1
union all
select c.id, d.visit_date,
case when d.visit_date > dateadd(day, 21, c.first_visit_date) then d.visit_date else c.first_visit_date end,
d.rn
from cte c
inner join data d on d.id = c.id and d.rn = c.rn + 1
)
select
id,
visit_date,
case when visit_date = first_visit_date then 1 else 0 end as is_new,
case when visit_date = first_visit_date then 0 else 1 end as is_follow_up
from cte
order by id, visit_date
If a patient may have more than 100 visits, then you need to add option (maxrecursion 0) at the very end of the query.
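To check the recursive walk against the expected output, here is a minimal sketch of the same CTE run in SQLite from Python. Assumptions: SQLite 3.25 or later (for ROW_NUMBER), ISO date strings, and julianday() standing in for SQL Server's DATEADD; the table name mytable and the function name classify_visits are illustrative only.

```python
import sqlite3

# Visit dates from the question, as ISO strings.
VISITS = ["2019-08-07", "2019-09-10", "2019-09-12", "2019-10-01", "2019-10-01",
          "2019-10-08", "2019-10-15", "2019-10-17", "2019-10-24"]


def classify_visits(visit_dates, window_days=21):
    """Return (visit_date, is_new, is_follow_up) rows for a single id 'ABC'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id TEXT, visit_date TEXT)")
    con.executemany("INSERT INTO mytable VALUES ('ABC', ?)", [(d,) for d in visit_dates])
    sql = """
        WITH RECURSIVE
        data AS (
            SELECT id, visit_date,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY visit_date) AS rn
            FROM mytable
        ),
        cte AS (
            -- anchor: the first visit of each id starts a group
            SELECT id, visit_date, visit_date AS first_visit_date, rn
            FROM data WHERE rn = 1
            UNION ALL
            -- step to the next visit; reset the group start once the window is exceeded
            SELECT c.id, d.visit_date,
                   CASE WHEN julianday(d.visit_date) > julianday(c.first_visit_date) + ?
                        THEN d.visit_date ELSE c.first_visit_date END,
                   d.rn
            FROM cte c
            JOIN data d ON d.id = c.id AND d.rn = c.rn + 1
        )
        SELECT visit_date,
               CASE WHEN visit_date = first_visit_date THEN 1 ELSE 0 END AS is_new,
               CASE WHEN visit_date = first_visit_date THEN 0 ELSE 1 END AS is_follow_up
        FROM cte
        ORDER BY rn
    """
    result = con.execute(sql, (window_days,)).fetchall()
    con.close()
    return result
```

Oct 1 is exactly 21 days after Sep 10, so the strict ">" comparison keeps it a follow-up, as the question requires. A recursive walk is needed because each group's start date depends on where the previous group started, which a single window function cannot express.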
You need a recursive CTE to handle this. This is the idea, although the exact syntax might vary by database:
with recursive t as (
select id, date,
row_number() over (partition by id order by date) as seqnum
from yourtable
),
cte as (
select id, date, seqnum, date as visit_start, 1 as is_new_visit
from t
where seqnum = 1
union all
select cte.id, t.date, t.seqnum,
(case when t.date <= cte.visit_start + interval '21 day'
then cte.visit_start else t.date
end) as visit_start,
(case when t.date <= cte.visit_start + interval '21 day'
then 0 else 1
end) as is_new_visit
from cte join
t
on t.id = cte.id and t.seqnum = cte.seqnum + 1
)
select *
from cte
where is_new_visit = 1;

How to pick one non null date from dates - if date is null pick next one

I need to pick one date per week, and it has to be the Friday. However, when Friday is NULL (meaning no data was entered), I have to find another day in the same week that has data. Can someone share their views on how to solve this type of situation?
In the following data, the Friday of the 2nd week has a NULL entry, so another day has to be picked.
Day Weekdate Data entry dt Data
1 2/7/2016
2 2/8/2016
3 2/9/2016
4 2/10/2016
5 2/11/2016
6 2/12/2016 2/12/2016 500
7 2/13/2016
1 2/14/2016
2 2/15/2016
3 2/16/2016
4 2/17/2016 2/17/2016 300
5 2/18/2016
6 2/19/2016 NULL NULL
7 2/20/2016
1 2/21/2016
2 2/22/2016
3 2/23/2016
4 2/24/2016
5 2/25/2016
6 2/26/2016 2/26/2016 250
7 2/27/2016
You may try this
--Fridays that have data (dw 6 = Friday with the default DATEFIRST 7 setting)
select * from tblData
where DATEPART(dw,weekDate) = 6 and data is not null
Union
Select data.* from
(
select weekDate
from tblData
where DATEPART(dw,weekDate) = 6 and data is null
) nullData --Fridays with null data
Cross Apply
(
--Find the latest record with non-null data from the Sunday of this week up to this Friday
Select top 1 *
From tblData data
Where
data.weekDate between Dateadd(day, -5, nullData.weekDate) and nullData.weekDate
and data.data is not null
Order by data.weekDate desc
) data
You can try something like this to get the data entered for the latest date (Friday first, then every other day) for each week in your table:
SELECT
Weeks.FirstofWeek,
Detail.Day,
Detail.DataEntryDt,
Detail.Data
FROM
( --master list of weeks
SELECT DISTINCT DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) AS FirstofWeek
FROM dataTable
) AS Weeks
LEFT OUTER JOIN
( --detail
SELECT
--order first by presence of data, then by date, selecting Friday first:
ROW_NUMBER() OVER (PARTITION BY DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) ORDER BY CASE WHEN Data IS NOT NULL THEN 99 ELSE 0 END DESC, CASE WHEN [Day] = 6 THEN 99 ELSE [Day] END DESC) AS RowNum,
[Day],
DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) AS FirstofWeek,
Weekdate,
DataEntryDt,
Data
FROM dataTable
) AS Detail
ON Weeks.FirstofWeek = Detail.FirstofWeek
AND Detail.RowNum = 1 --get only top record for week with data present
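The ROW_NUMBER ranking above can be tried out on the question's sample with a minimal sketch in SQLite, driven from Python. Assumptions: SQLite 3.25 or later, the Day column numbered 1 = Sunday as in the question (so DATEFIRST does not matter), the Data entry dt column omitted, and julianday(weekdate) - day used as a hypothetical week key in place of the DATEADD/DATEPART expression.

```python
import sqlite3
from datetime import date, timedelta

# Days that carry data in the question's sample; every other day is NULL,
# including the second week's Friday (2016-02-19).
DATA = {date(2016, 2, 12): 500, date(2016, 2, 17): 300, date(2016, 2, 26): 250}


def build_rows(first_sunday=date(2016, 2, 7), weeks=3):
    """Rebuild the sample as (day, weekdate, data) with Day 1 = Sunday."""
    rows = []
    for offset in range(7 * weeks):
        d = first_sunday + timedelta(days=offset)
        rows.append((offset % 7 + 1, d.isoformat(), DATA.get(d)))
    return rows


def pick_weekly(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblData (day INTEGER, weekdate TEXT, data INTEGER)")
    con.executemany("INSERT INTO tblData VALUES (?, ?, ?)", rows)
    # julianday(weekdate) - day is constant within a Sunday..Saturday week.
    # Rows with data rank first, then Friday (day 6), then later days first.
    sql = """
        SELECT weekdate, data FROM (
            SELECT weekdate, data,
                   ROW_NUMBER() OVER (
                       PARTITION BY julianday(weekdate) - day
                       ORDER BY CASE WHEN data IS NOT NULL THEN 1 ELSE 0 END DESC,
                                CASE WHEN day = 6 THEN 99 ELSE day END DESC
                   ) AS rn
            FROM tblData
        ) AS ranked
        WHERE rn = 1 AND data IS NOT NULL
        ORDER BY weekdate
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

For the second week the NULL Friday ranks below Wednesday's 300, so one row per week comes back: 2016-02-12 (500), 2016-02-17 (300) and 2016-02-26 (250).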

Averages are too high when getting data from over a month

I was asked to alter a query so that it works with data from a given date range instead of just the current month. The query should get the average sales per hour during that date range. It appears to work fine when selecting one month of data, but when I try to go over a month, the averages appear higher than they ought to be.
I think the problem may have to do with grouping by the day, since the day would be doubled up when data is over a month, but how would I go about fixing it? Thanks in advance.
DECLARE @Start DATETIME
DECLARE @End DATETIME
SET @Start = '6/15/2015'
SET @End = '8/15/2015'
SELECT TheHour, AVG(TheCount) AS SalesPerHour
FROM
(SELECT DATEPART(DAY, DateTimeCreated) AS TheDay,
DATEPART(HOUR, DateTimeCreated) AS TheHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0
AND OrderType = 1
AND BranchID = 4
AND BackOrderedFromID IS NULL
AND DateTimeCreated >= @Start
AND DateTimeCreated < @End
GROUP BY DATEPART(DAY, DateTimeCreated), DATEPART(HOUR, DateTimeCreated)) AS T
GROUP BY TheHour
ORDER BY TheHour
SAMPLE DATA for 6/15/2015 to 7/15/2015
TheHour SalesPerHour
5 2
6 5
7 6
8 5
9 4
10 4
11 2
12 2
13 3
14 2
15 2
16 1
SAMPLE DATA for 7/15/2015 to 8/15/2015
TheHour SalesPerHour
5 1
6 7
7 6
8 5
9 4
10 4
11 4
12 2
13 4
14 2
15 1
SAMPLE DATA for 6/15/2015 to 8/15/2015 (most values are too high?)
TheHour SalesPerHour
5 2
6 10
7 11
8 8
9 7
10 6
11 5
12 3
13 5
14 4
15 2
16 1
Don't use datepart(day). This gives the day of the month. When your time frame spans multiple months, datepart(day) returns the same value for different days (for instance, "1" on the first of any month).
Instead, simply cast the value to a date to remove the time component. The rest of the query remains the same:
SELECT TheHour, AVG(TheCount) AS SalesPerHour
FROM (SELECT CAST(DateTimeCreated as Date) AS TheDay,
DATEPART(HOUR, DateTimeCreated) AS TheHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0 AND OrderType = 1 AND BranchID = 4 AND
BackOrderedFromID IS NULL AND
DateTimeCreated >= @Start AND
DateTimeCreated < @End
GROUP BY CAST(DateTimeCreated as Date), DATEPART(HOUR, DateTimeCreated)
) dh
GROUP BY TheHour
ORDER BY TheHour;
Alternatively, you can do this without the double aggregation:
SELECT DATEPART(HOUR, DateTimeCreated) as TheHour,
(COUNT(*) * 1.0 /
COUNT(DISTINCT CAST(DateTimeCreated as Date))
) as SalesPerHour
FROM OrderHeader oh
WHERE Deleted = 0 AND OrderType = 1 AND BranchID = 4 AND
BackOrderedFromID IS NULL AND
DateTimeCreated >= @Start AND
DateTimeCreated < @End
GROUP BY DATEPART(HOUR, DateTimeCreated);
Also, note that AVG() of an integer value does an integer average. So, the average of 1 and 2 is 1 in SQL Server, not 1.5. In this version the query multiplies the count by 1.0 to get decimal places -- that may or may not be desirable.
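To see why grouping by day of month inflates the averages, here is a minimal sketch that runs the three query shapes from this thread against hypothetical orders in SQLite, loosely modeled on the Jan. 1 / Feb. 1 example given in one of the answers. Assumptions: strftime('%d'), strftime('%H') and date() stand in for DATEPART(DAY), DATEPART(HOUR) and CAST(... as Date), and only the DateTimeCreated column is modeled. Note that SQLite's AVG always returns a floating-point value, so SQL Server's integer averaging is not reproduced here.

```python
import sqlite3

# Hypothetical orders: 10 in the 17:00 hour on Jan 1, 12 on Feb 1, 2 on Jan 2.
ORDERS = (["2015-01-01 17:%02d:00" % m for m in range(10)]
          + ["2015-02-01 17:%02d:00" % m for m in range(12)]
          + ["2015-01-02 17:%02d:00" % m for m in range(2)])

QUERIES = {
    # Original shape: day of month, so Jan 1 and Feb 1 collapse into day "01".
    "day_of_month": """
        SELECT TheHour, AVG(TheCount) FROM (
            SELECT strftime('%d', DateTimeCreated) AS TheDay,
                   strftime('%H', DateTimeCreated) AS TheHour,
                   COUNT(*) AS TheCount
            FROM OrderHeader
            GROUP BY TheDay, TheHour
        ) GROUP BY TheHour""",
    # Fixed shape: group by the calendar date instead.
    "calendar_date": """
        SELECT TheHour, AVG(TheCount) FROM (
            SELECT date(DateTimeCreated) AS TheDay,
                   strftime('%H', DateTimeCreated) AS TheHour,
                   COUNT(*) AS TheCount
            FROM OrderHeader
            GROUP BY TheDay, TheHour
        ) GROUP BY TheHour""",
    # Single aggregation: orders divided by the number of distinct dates.
    "single_pass": """
        SELECT strftime('%H', DateTimeCreated),
               COUNT(*) * 1.0 / COUNT(DISTINCT date(DateTimeCreated))
        FROM OrderHeader
        GROUP BY strftime('%H', DateTimeCreated)""",
}


def hourly_averages(timestamps):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE OrderHeader (DateTimeCreated TEXT)")
    con.executemany("INSERT INTO OrderHeader VALUES (?)", [(t,) for t in timestamps])
    results = {name: dict(con.execute(sql).fetchall()) for name, sql in QUERIES.items()}
    con.close()
    return results
```

The day-of-month version averages 22 and 2 and reports 12 sales for the 17:00 hour, while both calendar-date versions average 10, 12 and 2 and report 8.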
To round a datetime down to its nearest whole hour, use DATEADD and DATEDIFF together:
DECLARE @Start DATETIME
DECLARE @End DATETIME
SET @Start = '6/15/2015'
SET @End = '8/15/2015'
SELECT DATEPART(hour,RoundedHour) as Hour, AVG(TheCount) AS SalesPerHour
FROM
(SELECT DATEADD(hour,DATEDIFF(hour,0,DateTimeCreated),0) as RoundedHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0
AND OrderType = 1
AND BranchID = 4
AND BackOrderedFromID IS NULL
AND DateTimeCreated >= @Start
AND DateTimeCreated < @End
GROUP BY DATEADD(hour,DATEDIFF(hour,0,DateTimeCreated),0)) AS T
GROUP BY DATEPART(hour,RoundedHour)
ORDER BY DATEPART(hour,RoundedHour)
That way you don't have to think about all of the larger components (day, month, year) that you'd also want to group by, for larger ranges.
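The DATEADD/DATEDIFF trick counts whole hours since SQL Server's datetime zero (1900-01-01 00:00) and then adds that many hours back to zero, discarding minutes and seconds. A minimal Python sketch of the same arithmetic, assuming inputs on or after 1900-01-01; the function name round_down_to_hour is illustrative only:

```python
from datetime import datetime, timedelta

# SQL Server converts the integer 0 to the datetime 1900-01-01 00:00:00.
BASE = datetime(1900, 1, 1)
ONE_HOUR = timedelta(hours=1)


def round_down_to_hour(dt):
    """Mimic DATEADD(hour, DATEDIFF(hour, 0, dt), 0) for dt on or after BASE."""
    whole_hours = (dt - BASE) // ONE_HOUR   # like DATEDIFF(hour, 0, dt)
    return BASE + whole_hours * ONE_HOUR    # like DATEADD(hour, n, 0)


print(round_down_to_hour(datetime(2015, 6, 15, 17, 42, 9)))  # 2015-06-15 17:00:00
```

Because the result keeps the full date, each hour of each day becomes its own group, which is why the inner query needs no separate day, month or year columns.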
Since your query uses DAY as the datepart, you're effectively adding together the sales in each hour across all days that share the same day of the month before taking the averages. For example, if a salesperson has 10 sales in the 5pm hour on Jan. 1st and 12 sales in the 5pm hour on Feb. 1st, you get an intermediate value of 22 sales for "day 1". You end up averaging over the distinct day-of-month values (at most 31), not over the actual calendar days.
You could use the DATEPART of DY (day of year) instead, but then your query would experience the same issue if you started to span years. Instead, just CAST the DATETIME as a DATE to get rid of the time portion, or even better yet, use a windowed function to get your numbers, like so:
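;WITH CTE_HourBreakdown AS
(
SELECT DISTINCT
YEAR(DateTimeCreated) AS yr,
DATEPART(DY, DateTimeCreated) AS dy,
DATEPART(HOUR, DateTimeCreated) AS hr,
--DISTINCT keeps one row per (year, day of year, hour) so each hour of each day is averaged once
COUNT(*) OVER (PARTITION BY YEAR(DateTimeCreated), DATEPART(DY, DateTimeCreated), DATEPART(HOUR, DateTimeCreated)) AS cnt
FROM
OrderHeader
WHERE Deleted = 0 AND OrderType = 1 AND BranchID = 4 AND
BackOrderedFromID IS NULL AND
DateTimeCreated >= @Start AND
DateTimeCreated < @End
)
SELECT
hr,
AVG(CAST(cnt AS DECIMAL(10, 2))) AS SalesPerHour
FROM
CTE_HourBreakdown
GROUP BY
hr
ORDER BY
hr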
There's likely a better way to do this with windowed functions, but this was the first thing that came to me. Also, note that if there are no sales in an hour this method does NOT average that into the results. For example, if on one day between 4pm and 5pm there are no sales and the next day there are 2 sales this will show an average of 2 sales between 4pm and 5pm instead of 1 sale on average. If you want to account for that then you'll need a method to distinguish zero-sale hours from hours when no one is working.
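One subtlety with windowed counts: COUNT(*) OVER (...) returns one row per order, so averaging those rows directly weights busy hours more heavily than quiet ones. A minimal sketch in SQLite (3.25 or later) on hypothetical orders, assuming date() and strftime('%H') as stand-ins for the SQL Server date-part expressions:

```python
import sqlite3

# Hypothetical orders in the 16:00 hour: 1 on June 15, 3 on June 16.
ORDERS = ["2015-06-15 16:05:00",
          "2015-06-16 16:10:00", "2015-06-16 16:20:00", "2015-06-16 16:30:00"]

# Windowed count averaged per order row (each order carries its day-hour total).
PER_ORDER = """
    SELECT hr, AVG(cnt) FROM (
        SELECT strftime('%H', DateTimeCreated) AS hr,
               COUNT(*) OVER (PARTITION BY date(DateTimeCreated),
                                           strftime('%H', DateTimeCreated)) AS cnt
        FROM OrderHeader
    ) GROUP BY hr"""

# Same windowed count, collapsed to one row per (date, hour) before averaging.
PER_DAY_HOUR = """
    SELECT hr, AVG(cnt) FROM (
        SELECT DISTINCT date(DateTimeCreated) AS dy,
               strftime('%H', DateTimeCreated) AS hr,
               COUNT(*) OVER (PARTITION BY date(DateTimeCreated),
                                           strftime('%H', DateTimeCreated)) AS cnt
        FROM OrderHeader
    ) GROUP BY hr"""


def averages(timestamps):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE OrderHeader (DateTimeCreated TEXT)")
    con.executemany("INSERT INTO OrderHeader VALUES (?)", [(t,) for t in timestamps])
    result = (dict(con.execute(PER_ORDER).fetchall()),
              dict(con.execute(PER_DAY_HOUR).fetchall()))
    con.close()
    return result
```

Per order row the counts are 1, 3, 3, 3 and average 2.5; after collapsing to one row per day-hour they are 1 and 3 and average 2, which is the intended sales-per-hour figure.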

Oracle select sum by time window

Let's assume that we have an Oracle table with the following format and data:
TIMESTAMP MESSAGENO ORGMESSAGE
------------------------- ---------------------- -------------------------------------
27.04.13 1 START PERIOD
27.04.13 3 10
27.04.13 4 5
28.04.13 5 6
28.04.13 3 20
29.04.13 4 25
29.04.13 5 26
30.04.13 2 END PERIOD
30.04.13 1 START PERIOD
01.05.13 3 10
02.05.13 4 15
02.05.13 5 16
03.05.13 3 30
03.05.13 4 35
04.05.13 5 36
05.05.13 2 END PERIOD
I want to select sum of all the ORGMESSAGE for all the period (window between START PERIOD and END PERIOD) grouped by MESSAGENO.
Example output would be:
PERIOD START PERIOD END MESSAGENO SUM
------------ ------------- -------- ----
27.04.13 30.04.13 3 30
27.04.13 30.04.13 4 30
27.04.13 30.04.13 5 32
30.04.13 05.05.13 3 40
30.04.13 05.05.13 4 50
30.04.13 05.05.13 5 52
I am guessing that an Oracle analytic function would be suitable, but I really don't know how or where to start.
Thanks in advance for any help.
If we assume that period starts and ends always come in matching pairs, then a simple way to assign each message to its period is to count the number of starts up to and including that row. This is a cumulative sum, which is easy in Oracle. The rest is just aggregation:
select min(timestamp) as periodstart, max(timestamp) as periodend, messageno,
sum(to_number(orgmessage)) as total
from (select om.*,
sum(case when messageno = 1 then 1 else 0 end) over (order by timestamp) as grp
from orgmessages om
) om
where messageno not in (1, 2)
group by grp, messageno;
Note that this method (as with the others) really wants the timestamp to be unique on each record. In the data presented, these solutions will work. But if you have multiple starts and ends on the same day, none of them will work if timestamp holds only the date.
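The running-count idea can be tried without an Oracle instance. Here is a minimal sketch in SQLite (3.25 or later) driven from Python, assuming the question's dates rewritten as ISO strings, the TIMESTAMP column renamed to ts, and CAST(... AS INTEGER) in place of to_number. The period number grp stands in for the start/end dates.

```python
import sqlite3

# The question's sample rows: (ts, messageno, orgmessage).
ROWS = [
    ("2013-04-27", 1, "START PERIOD"), ("2013-04-27", 3, "10"), ("2013-04-27", 4, "5"),
    ("2013-04-28", 5, "6"), ("2013-04-28", 3, "20"), ("2013-04-29", 4, "25"),
    ("2013-04-29", 5, "26"), ("2013-04-30", 2, "END PERIOD"),
    ("2013-04-30", 1, "START PERIOD"), ("2013-05-01", 3, "10"), ("2013-05-02", 4, "15"),
    ("2013-05-02", 5, "16"), ("2013-05-03", 3, "30"), ("2013-05-03", 4, "35"),
    ("2013-05-04", 5, "36"), ("2013-05-05", 2, "END PERIOD"),
]

SQL = """
    SELECT grp, messageno, SUM(CAST(orgmessage AS INTEGER)) AS total
    FROM (SELECT om.*,
                 -- running count of START PERIOD rows = period number
                 SUM(CASE WHEN messageno = 1 THEN 1 ELSE 0 END)
                     OVER (ORDER BY ts) AS grp
          FROM orgmessages AS om) AS numbered
    WHERE messageno NOT IN (1, 2)
    GROUP BY grp, messageno
    ORDER BY grp, messageno
"""


def period_sums(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orgmessages (ts TEXT, messageno INTEGER, orgmessage TEXT)")
    con.executemany("INSERT INTO orgmessages VALUES (?, ?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result
```

Rows that share a date are peers in the default window frame, so a data message on the same day as a START would land in the new period. That is harmless here because no data message shares a date with a START or END marker, which matches the caveat above.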
First find, for each period start, the first period end that follows it. Then join with your table to group and sum.
select
dates.start_date,
dates.end_date,
messageno,
sum(to_number(orgmessage)) as period_sum
from mytable
join
(
select start_dates.timestmp as start_date, min(end_dates.timestmp) as end_date
from (select * from mytable where orgmessage = 'START PERIOD') start_dates
join (select * from mytable where orgmessage = 'END PERIOD') end_dates
on start_dates.timestmp < end_dates.timestmp
group by start_dates.timestmp
) dates on mytable.timestmp between dates.start_date and dates.end_date
where mytable.orgmessage not like '%PERIOD%'
group by dates.start_date, dates.end_date, messageno
order by dates.start_date, dates.end_date, messageno;
SQL fiddle: http://www.sqlfiddle.com/#!4/365de/15.
Please try this one; replace rrr with your table name:
select periodstart, periodend, messageno, sum(to_number(orgmessage)) s
from (select TIMESTAMP periodstart,
(select min (TIMESTAMP) from rrr r2 where orgmessage = 'END PERIOD' and r2.TIMESTAMP > r.TIMESTAMP) periodend
from rrr r
where orgmessage = 'START PERIOD'
) borders, rrr r
where r.TIMESTAMP between borders.periodstart and borders.periodend
and r.orgmessage not in ('END PERIOD', 'START PERIOD')
group by periodstart, periodend, messageno
order by periodstart, periodend, messageno

SQL Count Numbers of Projects started Each Day Between Two Dates

I have this table, and I need to count how many projects (jobs) I have started each day.
job start end
1 01-01-2013 04-01-2013
2 01-01-2013 02-01-2013
3 01-01-2013 03-01-2013
4 03-01-2013 04-01-2013
5 03-01-2013 04-01-2013
6 03-01-2013 04-01-2013
...
I want to count how many jobs I have started each day; I mean, how many jobs are open each day.
date count
01-01-2013 3
02-01-2013 3
03-01-2013 5
04-01-2013 4
05-01-2013 0
...
select start, count(*) as jobs_per_day
from your_table
group by start
But this will not return a record for dates where you did not create any job.
The following works for me in PostgreSQL:
with dates as (
select aday::date as aday
from generate_series((select min(start) from your_table),
(select max("end") from your_table),
'1 day'::interval) aday
), flat as (
select *
from dates, your_table
where dates.aday between your_table.start and your_table."end"
)
select
aday,
count(*) as count
from flat
group by aday
order by aday
;
The first CTE generates a series of dates, which might have to be done differently in another RDBMS.
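For example, SQLite has no generate_series by default, but a recursive CTE can produce the date series. Here is a minimal sketch driven from Python. Assumptions: the columns are renamed start_day and end_day (to avoid the END keyword), dates are ISO strings, and report_end is a hypothetical last reporting day. A LEFT JOIN with COUNT(j.job) also returns days on which no job is open, which the inner join above would drop.

```python
import sqlite3

# The question's jobs with dd-mm-yyyy converted to ISO: (job, start_day, end_day).
JOBS = [
    (1, "2013-01-01", "2013-01-04"), (2, "2013-01-01", "2013-01-02"),
    (3, "2013-01-01", "2013-01-03"), (4, "2013-01-03", "2013-01-04"),
    (5, "2013-01-03", "2013-01-04"), (6, "2013-01-03", "2013-01-04"),
]


def open_jobs_per_day(jobs, report_end):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE jobs (job INTEGER, start_day TEXT, end_day TEXT)")
    con.executemany("INSERT INTO jobs VALUES (?, ?, ?)", jobs)
    sql = """
        WITH RECURSIVE dates(aday) AS (
            -- one row per day from the earliest start up to report_end
            SELECT MIN(start_day) FROM jobs
            UNION ALL
            SELECT date(aday, '+1 day') FROM dates WHERE aday < ?
        )
        SELECT d.aday, COUNT(j.job) AS open_jobs
        FROM dates d
        LEFT JOIN jobs j ON d.aday BETWEEN j.start_day AND j.end_day
        GROUP BY d.aday
        ORDER BY d.aday
    """
    result = con.execute(sql, (report_end,)).fetchall()
    con.close()
    return result


print(open_jobs_per_day(JOBS, "2013-01-05"))
```

ISO date strings compare correctly as text, which is what lets BETWEEN and the recursion's stop condition work without converting to a date type.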
select start as date, count(*) as count from table_name
where start >= 'your start date' and "end" <= 'your end date'
group by start;