I want to take the average of multiple rows of table and insert them into a view:
My table:
Date(datetime)
Period(int) -- Half-hour
Price(decimal(18,2)
Sample data:
Date | Period | Price
---------+--------+---------
07-12-17 | 47 | 10
07-12-17 | 48 | 20
07-12-17 | 1 | 30
07-12-17 | 2 | 40
07-12-17 | 3 | 50
07-12-17 | 4 | 60
07-12-17 | 5 | 70
07-12-17 | 6 | 80
07-12-17 | 7 | 10
07-12-17 | 8 | 10
07-12-17 | 9 | 10
07-12-17 | 10 | 10
07-12-17 | 11 | 20
07-12-17 | 12 | 20
07-12-17 | 13 | 20
07-12-17 | 14 | 20
My periods are half hours go from 1-48 and I want to take the average price of intervals of 8 periods and assign that price to all 8 periods in the set. Because of the time difference, I start with half-hour 47 instead of 1:
The intervals:
[48-6], [7-14], [15-22], [23-30], [31-38], [39-46]
I want the resulting view to look like:
Date | Period | Price
---------+--------+---------
07-12-17 | 47 | 45
07-12-17 | 48 | 45
07-12-17 | 1 | 45
07-12-17 | 2 | 45
07-12-17 | 3 | 45
07-12-17 | 4 | 45
07-12-17 | 5 | 45
07-12-17 | 6 | 45
07-12-17 | 7 | 15
07-12-17 | 8 | 15
07-12-17 | 9 | 15
07-12-17 | 10 | 15
07-12-17 | 11 | 15
07-12-17 | 12 | 15
07-12-17 | 13 | 15
07-12-17 | 14 | 15
I have not been able to come up with a complete query, but I am thinking it has to be GROUP BY and maybe with a HAVING Statement.
Hope you can help!
It can be done very easily and efficiently using window functions.
Sample data
DECLARE #T TABLE (dt datetime2(0), Period int, Price money);
INSERT INTO #T VALUES
('2017-12-06', 39, 90),
('2017-12-06', 40, 90),
('2017-12-06', 41, 90),
('2017-12-06', 42, 90),
('2017-12-06', 43, 90),
('2017-12-06', 44, 90),
('2017-12-06', 45, 90),
('2017-12-06', 46, 90),
('2017-12-06', 47, 10),
('2017-12-06', 48, 20),
('2017-12-07', 1, 30),
('2017-12-07', 2, 40),
('2017-12-07', 3, 50),
('2017-12-07', 4, 60),
('2017-12-07', 5, 70),
('2017-12-07', 6, 80),
('2017-12-07', 7, 10),
('2017-12-07', 8, 10),
('2017-12-07', 9, 10),
('2017-12-07', 10, 10),
('2017-12-07', 11, 20),
('2017-12-07', 12, 20),
('2017-12-07', 13, 20),
('2017-12-07', 14, 20),
('2017-12-07', 15, 40),
('2017-12-07', 16, 40),
('2017-12-07', 17, 40),
('2017-12-07', 18, 40),
('2017-12-07', 19, 30),
('2017-12-07', 20, 30),
('2017-12-07', 21, 30),
('2017-12-07', 22, 30);
Query
SELECT
dt
,Period
,Price
,DATEADD(minute, (Period-1)*30 + 60, dt) as d2
,DATEDIFF(hour, '2001-01-01', DATEADD(minute, (Period-1)*30 + 60, dt)) / 4 as d3
,AVG(Price) OVER (PARTITION BY
DATEDIFF(hour, '2001-01-01', DATEADD(minute, (Period-1)*30 + 60, dt)) / 4) AS AvgPrice
FROM #T AS MyTable
ORDER BY dt, Period;
I included intermediary results d2 and d3 in the output to help understand the formula in PARTITION BY.
Result
+---------------------+--------+-------+---------------------+-------+----------+
| dt | Period | Price | d2 | d3 | AvgPrice |
+---------------------+--------+-------+---------------------+-------+----------+
| 2017-12-06 00:00:00 | 39 | 90.00 | 2017-12-06 20:00:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 40 | 90.00 | 2017-12-06 20:30:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 41 | 90.00 | 2017-12-06 21:00:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 42 | 90.00 | 2017-12-06 21:30:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 43 | 90.00 | 2017-12-06 22:00:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 44 | 90.00 | 2017-12-06 22:30:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 45 | 90.00 | 2017-12-06 23:00:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 46 | 90.00 | 2017-12-06 23:30:00 | 37103 | 90.00 |
| 2017-12-06 00:00:00 | 47 | 10.00 | 2017-12-07 00:00:00 | 37104 | 45.00 |
| 2017-12-06 00:00:00 | 48 | 20.00 | 2017-12-07 00:30:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 1 | 30.00 | 2017-12-07 01:00:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 2 | 40.00 | 2017-12-07 01:30:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 3 | 50.00 | 2017-12-07 02:00:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 4 | 60.00 | 2017-12-07 02:30:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 5 | 70.00 | 2017-12-07 03:00:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 6 | 80.00 | 2017-12-07 03:30:00 | 37104 | 45.00 |
| 2017-12-07 00:00:00 | 7 | 10.00 | 2017-12-07 04:00:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 8 | 10.00 | 2017-12-07 04:30:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 9 | 10.00 | 2017-12-07 05:00:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 10 | 10.00 | 2017-12-07 05:30:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 11 | 20.00 | 2017-12-07 06:00:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 12 | 20.00 | 2017-12-07 06:30:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 13 | 20.00 | 2017-12-07 07:00:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 14 | 20.00 | 2017-12-07 07:30:00 | 37105 | 15.00 |
| 2017-12-07 00:00:00 | 15 | 40.00 | 2017-12-07 08:00:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 16 | 40.00 | 2017-12-07 08:30:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 17 | 40.00 | 2017-12-07 09:00:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 18 | 40.00 | 2017-12-07 09:30:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 19 | 30.00 | 2017-12-07 10:00:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 20 | 30.00 | 2017-12-07 10:30:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 21 | 30.00 | 2017-12-07 11:00:00 | 37106 | 35.00 |
| 2017-12-07 00:00:00 | 22 | 30.00 | 2017-12-07 11:30:00 | 37106 | 35.00 |
+---------------------+--------+-------+---------------------+-------+----------+
The code sample below should get you close to where you need to be. I was not sure from your question how you planned on setting up your groups. You will definitely want to use a cross apply if you are using a view. This can be done even more simply with a stored procedure. For more info on Cross Apply, see section L, here https://learn.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql.
CREATE VIEW MyTableAveragePrices
AS
SELECT mt.Date, mt.Period, mtAverageGroup.AveragePrice,
CASE
WHEN mtAvg.Period IN () THEN 'Period Group 1'
WHEN mtAvg.Period IN () THEN 'Period Group 2'
END AS PeriodGroup
FROM MyTable mt
CROSS APPLY
(
SELECT
AVERAGE(mtAvg.Price) AveragePrice,
CASE
WHEN mtAvg.Period IN () THEN 'Period Group 1'
WHEN mtAvg.Period IN () THEN 'Period Group 2'
END AS PeriodGroup
FROM MyTable mtAvg
WHERE
mt.PeriodGroup = PeriodGroup
GROUP BY
CASE
WHEN mtAvg.Period IN () THEN 'Period Group 1'
WHEN mtAvg.Period IN () THEN 'Period Group 2'
END
) mtAverageGroup
GO
Related
I divided the month into four weeks and printed the amount for each week. How do I set this up with a loop for 12 months?
declare
cursor c is
select varis_tar, tutar
from muhasebe.doviz_takip
where trunc(varis_tar) BETWEEN TO_DATE('01/10/2021', 'DD/MM/YYYY') AND
TO_DATE('31/10/2021', 'DD/MM/YYYY')
group by varis_tar,tutar;
tutar1 number(13,2):=0;
tutar2 number(13,2):=0;
tutar3 number(13,2):=0;
tutar4 number(13,2):=0;
begin
for r in c loop
if r.varis_tar between TO_DATE('01/10/2021', 'DD/MM/YYYY') AND
TO_DATE('07/10/2021', 'DD/MM/YYYY') then
tutar1:=(r.tutar)+tutar1;
--message(r.tutar);
elsif r.varis_tar between TO_DATE('07/10/2021', 'DD/MM/YYYY') AND
TO_DATE('14/10/2021', 'DD/MM/YYYY') then
tutar2:=(r.tutar)+tutar2;
--message(r.tutar);
elsif r.varis_tar between TO_DATE('14/10/2021', 'DD/MM/YYYY') AND
TO_DATE('21/10/2021', 'DD/MM/YYYY') then
tutar3:=(r.tutar)+tutar3;
--message(r.tutar);
elsif r.varis_tar between TO_DATE('21/10/2021', 'DD/MM/YYYY') AND
TO_DATE('31/10/2021', 'DD/MM/YYYY') then
tutar4:=(r.tutar)+tutar4;
--message(r.tutar);
end if;
end loop;
I tried to get the dates the same way for all the months. I tried that, but it worked wrong.
where trunc(varis_tar) BETWEEN TO_DATE('1', 'DD') AND
TO_DATE('31', 'DD')
if r.varis_tar between TO_DATE('1', 'DD') AND
TO_DATE('07', 'DD') then
elsif r.varis_tar between TO_DATE('7', 'DD') AND
TO_DATE('14', 'DD') then
elsif r.varis_tar between TO_DATE('14', 'DD') AND
TO_DATE('21', 'DD') then
elsif r.varis_tar between TO_DATE('21', 'DD') AND
TO_DATE('31', 'DD') then
I don't know if I'am understanding it correctly but:
try if extract(day from varis_tar) between 1 and 7
or more complex
l_week := to_char(varis_tar,'W'); --week number
if l_week = 1 then --first week
elsif l_week = 2 etc...
Your code has several issues:
date in Oracle is actually a datetime, so between will not count any time after the midnight of the upper boundary.
you count the midnight of the week's end twice: in current week and in the next week (between includes both boundaries).
you do not need any PL/SQL and especially a cursor loop, because it occupy resources during calculation outside of SQL context.
Use datetime format to calculate weeks, because it is easy to read and understand. Then group by corresponding components.
with a as (
select
date '2021-01-01' - 1 + level as dt
, level as val
from dual
connect by level < 400
)
, b as (
select
dt
, val
/*Map 29, 30 and 31 to 28*/
, to_char(
least(dt, trunc(dt, 'mm') + 27)
, 'yyyymmw'
) as w
from a
)
select
substr(w, 1, 4) as y
, substr(w, 5, 2) as m
, substr(w, -1) as w
, sum(val) as val
, min(dt) as dt_from
, max(dt) as dt_to
from b
group by
w
Y | M | W | VAL | DT_FROM | DT_TO
:--- | :- | :- | ---: | :--------- | :---------
2021 | 01 | 1 | 28 | 2021-01-01 | 2021-01-07
2021 | 01 | 2 | 77 | 2021-01-08 | 2021-01-14
2021 | 01 | 3 | 126 | 2021-01-15 | 2021-01-21
2021 | 01 | 4 | 265 | 2021-01-22 | 2021-01-31
2021 | 02 | 1 | 245 | 2021-02-01 | 2021-02-07
2021 | 02 | 2 | 294 | 2021-02-08 | 2021-02-14
2021 | 02 | 3 | 343 | 2021-02-15 | 2021-02-21
2021 | 02 | 4 | 392 | 2021-02-22 | 2021-02-28
2021 | 03 | 1 | 441 | 2021-03-01 | 2021-03-07
2021 | 03 | 2 | 490 | 2021-03-08 | 2021-03-14
2021 | 03 | 3 | 539 | 2021-03-15 | 2021-03-21
2021 | 03 | 4 | 855 | 2021-03-22 | 2021-03-31
2021 | 04 | 1 | 658 | 2021-04-01 | 2021-04-07
2021 | 04 | 2 | 707 | 2021-04-08 | 2021-04-14
2021 | 04 | 3 | 756 | 2021-04-15 | 2021-04-21
2021 | 04 | 4 | 1044 | 2021-04-22 | 2021-04-30
2021 | 05 | 1 | 868 | 2021-05-01 | 2021-05-07
2021 | 05 | 2 | 917 | 2021-05-08 | 2021-05-14
2021 | 05 | 3 | 966 | 2021-05-15 | 2021-05-21
2021 | 05 | 4 | 1465 | 2021-05-22 | 2021-05-31
2021 | 06 | 1 | 1085 | 2021-06-01 | 2021-06-07
2021 | 06 | 2 | 1134 | 2021-06-08 | 2021-06-14
2021 | 06 | 3 | 1183 | 2021-06-15 | 2021-06-21
2021 | 06 | 4 | 1593 | 2021-06-22 | 2021-06-30
2021 | 07 | 1 | 1295 | 2021-07-01 | 2021-07-07
2021 | 07 | 2 | 1344 | 2021-07-08 | 2021-07-14
2021 | 07 | 3 | 1393 | 2021-07-15 | 2021-07-21
2021 | 07 | 4 | 2075 | 2021-07-22 | 2021-07-31
2021 | 08 | 1 | 1512 | 2021-08-01 | 2021-08-07
2021 | 08 | 2 | 1561 | 2021-08-08 | 2021-08-14
2021 | 08 | 3 | 1610 | 2021-08-15 | 2021-08-21
2021 | 08 | 4 | 2385 | 2021-08-22 | 2021-08-31
2021 | 09 | 1 | 1729 | 2021-09-01 | 2021-09-07
2021 | 09 | 2 | 1778 | 2021-09-08 | 2021-09-14
2021 | 09 | 3 | 1827 | 2021-09-15 | 2021-09-21
2021 | 09 | 4 | 2421 | 2021-09-22 | 2021-09-30
2021 | 10 | 1 | 1939 | 2021-10-01 | 2021-10-07
2021 | 10 | 2 | 1988 | 2021-10-08 | 2021-10-14
2021 | 10 | 3 | 2037 | 2021-10-15 | 2021-10-21
2021 | 10 | 4 | 2995 | 2021-10-22 | 2021-10-31
2021 | 11 | 1 | 2156 | 2021-11-01 | 2021-11-07
2021 | 11 | 2 | 2205 | 2021-11-08 | 2021-11-14
2021 | 11 | 3 | 2254 | 2021-11-15 | 2021-11-21
2021 | 11 | 4 | 2970 | 2021-11-22 | 2021-11-30
2021 | 12 | 1 | 2366 | 2021-12-01 | 2021-12-07
2021 | 12 | 2 | 2415 | 2021-12-08 | 2021-12-14
2021 | 12 | 3 | 2464 | 2021-12-15 | 2021-12-21
2021 | 12 | 4 | 3605 | 2021-12-22 | 2021-12-31
2022 | 01 | 1 | 2583 | 2022-01-01 | 2022-01-07
2022 | 01 | 2 | 2632 | 2022-01-08 | 2022-01-14
2022 | 01 | 3 | 2681 | 2022-01-15 | 2022-01-21
2022 | 01 | 4 | 3915 | 2022-01-22 | 2022-01-31
2022 | 02 | 1 | 1194 | 2022-02-01 | 2022-02-03
db<>fiddle here
Or the same in columns:
with a as (
select
date '2021-01-01' - 1 + level as dt
, level as val
from dual
connect by level < 400
)
, b as (
select
val
/*Map 29, 30 and 31 to 28*/
, to_char(dt, 'yyyymm') as m
, to_char(
least(dt, trunc(dt, 'mm') + 27)
, 'w'
) as w
from a
)
select
substr(m, 1, 4) as y
, substr(m, 5, 2) as m
, tutar1
, tutar2
, tutar3
, tutar4
from b
pivot(
sum(val)
for w in (
1 as tutar1, 2 as tutar2
, 3 as tutar3, 4 as tutar4
)
)
Y | M | TUTAR1 | TUTAR2 | TUTAR3 | TUTAR4
:--- | :- | -----: | -----: | -----: | -----:
2021 | 01 | 28 | 77 | 126 | 265
2021 | 02 | 245 | 294 | 343 | 392
2021 | 03 | 441 | 490 | 539 | 855
2021 | 04 | 658 | 707 | 756 | 1044
2021 | 05 | 868 | 917 | 966 | 1465
2021 | 06 | 1085 | 1134 | 1183 | 1593
2021 | 07 | 1295 | 1344 | 1393 | 2075
2021 | 08 | 1512 | 1561 | 1610 | 2385
2021 | 09 | 1729 | 1778 | 1827 | 2421
2021 | 10 | 1939 | 1988 | 2037 | 2995
2021 | 11 | 2156 | 2205 | 2254 | 2970
2021 | 12 | 2366 | 2415 | 2464 | 3605
2022 | 01 | 2583 | 2632 | 2681 | 3915
2022 | 02 | 1194 | null | null | null
db<>fiddle here
Problem
I am having trouble figuring out how to create a query that can tell if any userentry is preceded by 7 days without any activity (secondsPlayed == 0) and if so, then indicate it with the value of 1, otherwise 0.
This also means that if the user has less than 7 entries, the value will be 0 across all entries.
Input table:
+------------------------------+-------------------------+---------------+
| userid | estimationDate | secondsPlayed |
+------------------------------+-------------------------+---------------+
| a | 2016-07-14 00:00:00 UTC | 192.5 |
| a | 2016-07-15 00:00:00 UTC | 357.3 |
| a | 2016-07-16 00:00:00 UTC | 0 |
| a | 2016-07-17 00:00:00 UTC | 0 |
| a | 2016-07-18 00:00:00 UTC | 0 |
| a | 2016-07-19 00:00:00 UTC | 0 |
| a | 2016-07-20 00:00:00 UTC | 0 |
| a | 2016-07-21 00:00:00 UTC | 0 |
| a | 2016-07-22 00:00:00 UTC | 0 |
| a | 2016-07-23 00:00:00 UTC | 0 |
| a | 2016-07-24 00:00:00 UTC | 0 |
| ---------------------------- | ---------------------- | ---- |
| b | 2016-07-02 00:00:00 UTC | 31.2 |
| b | 2016-07-03 00:00:00 UTC | 42.1 |
| b | 2016-07-04 00:00:00 UTC | 41.9 |
| b | 2016-07-05 00:00:00 UTC | 43.2 |
| b | 2016-07-06 00:00:00 UTC | 91.5 |
| b | 2016-07-07 00:00:00 UTC | 0 |
| b | 2016-07-08 00:00:00 UTC | 0 |
| b | 2016-07-09 00:00:00 UTC | 239.1 |
| b | 2016-07-10 00:00:00 UTC | 0 |
+------------------------------+-------------------------+---------------+
The intended output table should look like this:
Output table:
+------------------------------+-------------------------+---------------+----------+
| userid | estimationDate | secondsPlayed | inactive |
+------------------------------+-------------------------+---------------+----------+
| a | 2016-07-14 00:00:00 UTC | 192.5 | 0 |
| a | 2016-07-15 00:00:00 UTC | 357.3 | 0 |
| a | 2016-07-16 00:00:00 UTC | 0 | 0 |
| a | 2016-07-17 00:00:00 UTC | 0 | 0 |
| a | 2016-07-18 00:00:00 UTC | 0 | 0 |
| a | 2016-07-19 00:00:00 UTC | 0 | 0 |
| a | 2016-07-20 00:00:00 UTC | 0 | 0 |
| a | 2016-07-21 00:00:00 UTC | 0 | 0 |
| a | 2016-07-22 00:00:00 UTC | 0 | 1 |
| a | 2016-07-23 00:00:00 UTC | 0 | 1 |
| a | 2016-07-24 00:00:00 UTC | 0 | 1 |
| ---------------------------- | ----------------------- | ----- | ----- |
| b | 2016-07-02 00:00:00 UTC | 31.2 | 0 |
| b | 2016-07-03 00:00:00 UTC | 42.1 | 0 |
| b | 2016-07-04 00:00:00 UTC | 41.9 | 0 |
| b | 2016-07-05 00:00:00 UTC | 43.2 | 0 |
| b | 2016-07-06 00:00:00 UTC | 91.5 | 0 |
| b | 2016-07-07 00:00:00 UTC | 0 | 0 |
| b | 2016-07-08 00:00:00 UTC | 0 | 0 |
| b | 2016-07-09 00:00:00 UTC | 239.1 | 0 |
| b | 2016-07-10 00:00:00 UTC | 0 | 0 |
+------------------------------+-------------------------+---------------+----------+
Thoughts
At first I was thinking about using the Lag function with a 7 offset, but this would obviously not relate to any of the subjects in between.
I was also thinking about creating a rolling window/average for a period of 7 days and evaluating if this is above 0. However this might be a bit above my skill level.
Any chance anyone has a good solution to this problem.
Assuming that you have data every day (which seems like a reasonable assumption), you can sum a window function:
select t.*,
(case when sum(secondsplayed) over (partition by userid
order by estimationdate
rows between 6 preceding and current row
) = 0 and
row_number() over (partition by userid order by estimationdate) >= 7
then 1
else 0
end) as inactive
from t;
In addition to no holes in the dates, this also assumes that secondsplayed is never negative. (Negative values can easily be incorporated into the logic, but that seems unnecessary.)
In my experience this type of input tables do not consist of inactivity entries and usually look like this (only activity entries are present here)
Input table:
+------------------------------+-------------------------+---------------+
| userid | estimationDate | secondsPlayed |
+------------------------------+-------------------------+---------------+
| a | 2016-07-14 00:00:00 UTC | 192.5 |
| a | 2016-07-15 00:00:00 UTC | 357.3 |
| ---------------------------- | ---------------------- | ---- |
| b | 2016-07-02 00:00:00 UTC | 31.2 |
| b | 2016-07-03 00:00:00 UTC | 42.1 |
| b | 2016-07-04 00:00:00 UTC | 41.9 |
| b | 2016-07-05 00:00:00 UTC | 43.2 |
| b | 2016-07-06 00:00:00 UTC | 91.5 |
| b | 2016-07-09 00:00:00 UTC | 239.1 |
+------------------------------+-------------------------+---------------+
So, below is for BigQuery Standard SQL and input as above
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' userid, TIMESTAMP '2016-07-14 00:00:00 UTC' estimationDate, 192.5 secondsPlayed UNION ALL
SELECT 'a', '2016-07-15 00:00:00 UTC', 357.3 UNION ALL
SELECT 'b', '2016-07-02 00:00:00 UTC', 31.2 UNION ALL
SELECT 'b', '2016-07-03 00:00:00 UTC', 42.1 UNION ALL
SELECT 'b', '2016-07-04 00:00:00 UTC', 41.9 UNION ALL
SELECT 'b', '2016-07-05 00:00:00 UTC', 43.2 UNION ALL
SELECT 'b', '2016-07-06 00:00:00 UTC', 91.5 UNION ALL
SELECT 'b', '2016-07-09 00:00:00 UTC', 239.1
), time_frame AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2016-07-02', '2016-07-24')) day
)
SELECT
users.userid,
day,
IFNULL(secondsPlayed, 0) secondsPlayed,
CAST(1 - SIGN(SUM(IFNULL(secondsPlayed, 0))
OVER(
PARTITION BY users.userid
ORDER BY UNIX_DATE(day)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
)) AS INT64) AS inactive
FROM time_frame tf
CROSS JOIN (SELECT DISTINCT userid FROM `project.dataset.table`) users
LEFT JOIN `project.dataset.table` t
ON day = DATE(estimationDate) AND users.userid = t.userid
ORDER BY userid, day
with result
Row userid day secondsPlayed inactive
...
13 a 2016-07-14 192.5 0
14 a 2016-07-15 357.3 0
15 a 2016-07-15 357.3 0
16 a 2016-07-16 0.0 0
17 a 2016-07-17 0.0 0
18 a 2016-07-18 0.0 0
19 a 2016-07-19 0.0 0
20 a 2016-07-20 0.0 0
21 a 2016-07-21 0.0 0
22 a 2016-07-22 0.0 1
23 a 2016-07-23 0.0 1
24 a 2016-07-24 0.0 1
25 b 2016-07-02 31.2 0
26 b 2016-07-03 42.1 0
27 b 2016-07-04 41.9 0
28 b 2016-07-05 43.2 0
29 b 2016-07-06 91.5 0
30 b 2016-07-07 0.0 0
31 b 2016-07-08 0.0 0
32 b 2016-07-09 239.1 0
33 b 2016-07-10 0.0 0
...
I have a Hive table with some data and i would like to split it in to 15 minutes intervals et return the total call duration for every interval
Hive Table example :
ID Start End Total Duration
1 1502296261 1502325061 28800
My output should be shown as :
ID Interval Duration
1 2017-08-09 18:30:00 839
1 2017-08-09 18:45:00 900
1 2017-08-09 19:00:00 900
...
1 2017-08-10 02:15:00 900
1 2017-08-10 02:30:00 61
What is the best solution to do that in a efficient way ?
Thanks.
This is the basic solution.
The displayed timestamp (Interval) depends on your system timezone.
with t as (select stack(1,1,1502296261,1502325061) as (`ID`,`Start`,`End`))
select t.`ID` as `ID`
,from_unixtime((t.`Start` div (15*60) + pe.pos)*(15*60)) as `Interval`
, case
when pe.pos = t.`End` div (15*60) - t.`Start` div (15*60)
then t.`End`
else (t.`Start` div (15*60) + pe.pos + 1)*(15*60)
end
- case
when pe.pos = 0
then t.`Start`
else (t.`Start` div (15*60) + pe.pos)*(15*60)
end as `Duration`
from t
lateral view
posexplode(split(space(int(t.`End` div (15*60) - t.`Start` div (15*60))),' ')) pe
;
+----+---------------------+----------+
| id | interval | duration |
+----+---------------------+----------+
| 1 | 2017-08-09 09:30:00 | 839 |
| 1 | 2017-08-09 09:45:00 | 900 |
| 1 | 2017-08-09 10:00:00 | 900 |
| 1 | 2017-08-09 10:15:00 | 900 |
| 1 | 2017-08-09 10:30:00 | 900 |
| 1 | 2017-08-09 10:45:00 | 900 |
| 1 | 2017-08-09 11:00:00 | 900 |
| 1 | 2017-08-09 11:15:00 | 900 |
| 1 | 2017-08-09 11:30:00 | 900 |
| 1 | 2017-08-09 11:45:00 | 900 |
| 1 | 2017-08-09 12:00:00 | 900 |
| 1 | 2017-08-09 12:15:00 | 900 |
| 1 | 2017-08-09 12:30:00 | 900 |
| 1 | 2017-08-09 12:45:00 | 900 |
| 1 | 2017-08-09 13:00:00 | 900 |
| 1 | 2017-08-09 13:15:00 | 900 |
| 1 | 2017-08-09 13:30:00 | 900 |
| 1 | 2017-08-09 13:45:00 | 900 |
| 1 | 2017-08-09 14:00:00 | 900 |
| 1 | 2017-08-09 14:15:00 | 900 |
| 1 | 2017-08-09 14:30:00 | 900 |
| 1 | 2017-08-09 14:45:00 | 900 |
| 1 | 2017-08-09 15:00:00 | 900 |
| 1 | 2017-08-09 15:15:00 | 900 |
| 1 | 2017-08-09 15:30:00 | 900 |
| 1 | 2017-08-09 15:45:00 | 900 |
| 1 | 2017-08-09 16:00:00 | 900 |
| 1 | 2017-08-09 16:15:00 | 900 |
| 1 | 2017-08-09 16:30:00 | 900 |
| 1 | 2017-08-09 16:45:00 | 900 |
| 1 | 2017-08-09 17:00:00 | 900 |
| 1 | 2017-08-09 17:15:00 | 900 |
| 1 | 2017-08-09 17:30:00 | 61 |
+----+---------------------+----------+
I need a query to group an aggregate in one table by date ranges in another table.
Table 1
weeknumber | weekyear | weekstart | weekend
------------+----------+------------+------------
18 | 2016 | 2016-02-01 | 2016-02-08
19 | 2016 | 2016-02-08 | 2016-02-15
20 | 2016 | 2016-02-15 | 2016-02-22
21 | 2016 | 2016-02-22 | 2016-02-29
22 | 2016 | 2016-02-29 | 2016-03-07
23 | 2016 | 2016-03-07 | 2016-03-14
24 | 2016 | 2016-03-14 | 2016-03-21
25 | 2016 | 2016-03-21 | 2016-03-28
26 | 2016 | 2016-03-28 | 2016-04-04
27 | 2016 | 2016-04-04 | 2016-04-11
28 | 2016 | 2016-04-11 | 2016-04-18
29 | 2016 | 2016-04-18 | 2016-04-25
30 | 2016 | 2016-04-25 | 2016-05-02
31 | 2016 | 2016-05-02 | 2016-05-09
32 | 2016 | 2016-05-09 | 2016-05-16
33 | 2016 | 2016-05-16 | 2016-05-23
34 | 2016 | 2016-05-23 | 2016-05-30
35 | 2016 | 2016-05-30 | 2016-06-06
36 | 2016 | 2016-06-06 | 2016-06-13
37 | 2016 | 2016-06-13 | 2016-06-20
38 | 2016 | 2016-06-20 | 2016-06-27
39 | 2016 | 2016-06-27 | 2016-07-04
40 | 2016 | 2016-07-04 | 2016-07-11
41 | 2016 | 2016-07-11 | 2016-07-18
42 | 2016 | 2016-07-18 | 2016-07-25
43 | 2016 | 2016-07-25 | 2016-08-01
44 | 2016 | 2016-08-01 | 2016-08-08
45 | 2016 | 2016-08-08 | 2016-08-15
46 | 2016 | 2016-08-15 | 2016-08-22
47 | 2016 | 2016-08-22 | 2016-08-29
48 | 2016 | 2016-08-29 | 2016-09-05
49 | 2016 | 2016-09-05 | 2016-09-12
Table 2
accountid | rdate | fee1 | fee2 | fee3 | fee4
-----------+------------+------+------+------+------
481164 | 2015-12-22 | 8 | 1 | 5 | 1
481164 | 2002-12-22 | 1 | 0 | 0 | 0
481166 | 2015-12-22 | 1 | 0 | 0 | 0
481166 | 2016-10-20 | 14 | 0 | 0 | 0
481166 | 2016-10-02 | 5 | 0 | 0 | 0
481166 | 2016-01-06 | 18 | 4 | 0 | 5
482136 | 2016-07-04 | 18 | 0 | 0 | 0
481164 | 2016-07-04 | 2 | 3 | 4 | 5
481164 | 2016-06-28 | 34 | 0 | 0 | 0
481166 | 2016-07-20 | 50 | 0 | 0 | 69
481166 | 2016-07-13 | 16 | 0 | 0 | 5
481166 | 2016-09-15 | 8 | 0 | 0 | 2
481166 | 2016-10-03 | 8 | 0 | 0 | 0
I need to aggregate fee1+fee2+fee3+fee4 for rdates in each date range(weekstart,weekend) in table 1 and then group by accountid. Something like this:
accountid | weekstart | weekend | SUM
-----------+------------+------------+------
481164 | 2016-02-01 | 2016-02-08 | 69
481166 | 2016-02-01 | 2016-02-08 | 44
481164 | 2016-02-08 | 2016-02-15 | 22
481166 | 2016-02-08 | 2016-02-15 | 12
select accountid, weekstart, weekend,
sum(fee1 + fee2 + fee3 + fee4) as total_fee
from table2
inner join table1 on table2.rdate >= table1.weekstart and table2.rdate < table1.weekend
group by accountid, weekstart, weekend;
Just a thing:
weeknumber | weekyear | weekstart | weekend
------------+----------+------------+------------
18 | 2016 | 2016-02-01 | 2016-02-08
19 | 2016 | 2016-02-08 | 2016-02-15
weekend for week 18 should be 2016-02-07, because 2016-02-08 is weekstart for week 19.
weeknumber | weekyear | weekstart | weekend
------------+----------+------------+------------
18 | 2016 | 2016-02-01 | 2016-02-07
19 | 2016 | 2016-02-08 | 2016-02-14
Check it here: http://rextester.com/NCBO56250
So my data looks like this:
+-----------+---------+-------------+-------+-------------+--+
| time | Outlets | Meal_Period | cover | day_of_week | |
+-----------+---------+-------------+-------+-------------+--+
| 10/1/2013 | 72 | 1 | 0 | Tuesday | |
| 10/1/2013 | 72 | 2 | 31 | Tuesday | |
| 10/1/2013 | 72 | 3 | 116 | Tuesday | |
| 10/1/2013 | 72 | 6 | 32 | Tuesday | |
| 10/1/2013 | 187 | 17 | 121 | Tuesday | |
| 10/1/2013 | 187 | 18 | 214 | Tuesday | |
| 10/1/2013 | 187 | 19 | 204 | Tuesday | |
| 10/1/2013 | 101 | 2 | 0 | Tuesday | |
| 10/1/2013 | 101 | 3 | 0 | Tuesday | |
| 10/1/2013 | 101 | 4 | 0 | Tuesday | |
| 10/1/2013 | 101 | 6 | 0 | Tuesday | |
| 10/1/2013 | 282 | 1 | 17 | Tuesday | |
| 10/1/2013 | 282 | 2 | 207 | Tuesday | |
| 10/1/2013 | 282 | 3 | 340 | Tuesday | |
| 10/1/2013 | 282 | 6 | 4 | Tuesday | |
| 10/1/2013 | 103 | 1 | 0 | Tuesday | |
+-----------+---------+-------------+-------+-------------+--+
The code is:
IF OBJECT_ID('tempdb.dbo.#time') IS NOT NULL
DROP TABLE #time
SELECT DATEADD(dd, 0, DATEDIFF(DD, 0, open_dttime)) AS 'time'
,profit_center_id AS 'Outlets'
,meal_period_id AS 'Meal_Period'
,sum(num_covers) AS 'Number_Covers'
INTO #time
FROM [STOF_Infogen].[dbo].[Order_Header]
WHERE CasinoID = 'csg'
AND profit_center_id IN (
'102'
,'100'
,'283'
,'101'
,'282'
,'187'
,'280'
,'103'
,'281'
,'72'
,'183'
)
AND (
open_dttime BETWEEN '2014-02-01 06:30'
AND '2014-03-01 06:30'
)
GROUP BY profit_center_id
,open_dttime
,meal_period_id
ORDER BY profit_center_id
,meal_period_id
IF OBJECT_ID('tempdb.dbo.#time2') IS NOT NULL
DROP TABLE #time2
SELECT [TIME]
,Outlets AS 'Outlets'
,meal_period AS 'Meal_Period'
,SUM(number_covers) AS 'cover'
,DATENAME(DW, [time]) AS 'day_of_week'
INTO #time2
FROM #time
GROUP BY [TIME]
,Outlets
,Meal_Period
ORDER BY [TIME] ASC
,Outlets
,Meal_Period
SELECT *
FROM #time2
I created temporary drop tables for my date but I'm having two issues;
I will like to group where the profit centres are 187 and 282 while still keeping the other rows.
for some reason I can't tweek the date stamp because it excludes the last day of the month.
As always any help is appreciated.
Making some test data:
DECLARE #MealInfo TABLE
(
MealTime DATETIME,
Outlets VARCHAR(10),
Meal_Period int,
Cover INT
)
INSERT INTO #MealInfo
VALUES
('10/1/2013', '72', 1, 0),
('10/1/2013', '72', 2, 31),
('10/1/2013', '72', 3, 116),
('10/1/2013', '72', 6, 32),
('10/1/2013', '187', 17, 121),
('10/1/2013', '187', 18, 214),
('10/1/2013', '187', 19, 204),
('10/1/2013', '101', 2, 0),
('10/1/2013', '101', 3, 0),
('10/1/2013', '101', 4, 0),
('10/1/2013', '101', 6, 0),
('10/1/2013', '282', 1, 17),
('10/1/2013', '282', 2, 207),
('10/1/2013', '282', 3, 340),
('10/1/2013', '282', 6, 4),
('10/1/2013', '103', 1, 0);
Because you want to group 187 and 282 together, I use a case statement to lump them into one outlet and then we can group on the outlets to break out the sums:
SELECT
m.MealTime,
m.Outlets,
m.Meal_Period,
SUM(m.Cover) AS Number_Covers
FROM
(
SELECT mi.MealTime,
(CASE WHEN mi.Outlets IN ('187', '282') THEN '187+282' ELSE mi.Outlets END) Outlets,
mi.Meal_Period,
mi.Cover
FROM #MealInfo mi
) m
GROUP BY m.MealTime, m.Outlets, m.Meal_Period
Here is the output:
MealTime Outlets Meal_Period Number_Covers
2013-10-01 00:00:00.000 101 2 0
2013-10-01 00:00:00.000 101 3 0
2013-10-01 00:00:00.000 101 4 0
2013-10-01 00:00:00.000 101 6 0
2013-10-01 00:00:00.000 103 1 0
2013-10-01 00:00:00.000 187+282 1 17
2013-10-01 00:00:00.000 187+282 2 207
2013-10-01 00:00:00.000 187+282 3 340
2013-10-01 00:00:00.000 187+282 6 4
2013-10-01 00:00:00.000 187+282 17 121
2013-10-01 00:00:00.000 187+282 18 214
2013-10-01 00:00:00.000 187+282 19 204
2013-10-01 00:00:00.000 72 1 0
2013-10-01 00:00:00.000 72 2 31
2013-10-01 00:00:00.000 72 3 116
2013-10-01 00:00:00.000 72 6 32
If your data had overlapping periods for 187 and 282, the sum total would contain both parts into 1 column.