I find myself in the position of having to formulate a (to me) rather complex SQL query, and I can't seem to get my head around it.
I have a table called orders and a related table order_state_history that logs the state of those orders over time (see below).
I now need to generate a series of rows - one row per day - containing the number of orders that were in particular states at the end of that day (see the expected result below). I also want to consider only orders with orders.type = 1.
The data resides in a PostgreSQL database. I already found out how to generate a time series using GENERATE_SERIES(DATE '2001-01-01', CURRENT_DATE, '1 DAY'::INTERVAL) days, which allows me to generate rows for days on which no state changes were recorded.
My current approach is to join orders, order_state_history, and the generated series of days, filter out all the rows that have DATE(order_state_history.timestamp) > DATE(days), and then somehow get the final state of each order on that day via first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC), but this is where my tiny bit of SQL experience abandons me.
I just can't wrap my head around the problem.
Can this even be solved in a single query, or would I be better advised to compute the data with some kind of intelligent script that performs one query per day?
What would be a reasonable approach to the problem?
orders===
id     type
10000  1
10001  1
10002  2
10003  2
10004  1

order_state_history===
order_id  index  timestamp         new_state
10000     1      01.01.2001 12:00  NEW
10000     2      02.01.2001 13:00  ACTIVE
10000     3      03.01.2001 14:00  DONE
10001     1      02.01.2001 13:00  NEW
10002     1      03.01.2001 14:00  NEW
10002     2      05.01.2001 10:00  ACTIVE
10002     3      05.01.2001 14:00  DONE
10003     1      07.01.2001 04:00  NEW
10004     1      05.01.2001 14:00  NEW
10004     2      10.01.2001 17:30  DONE

Expected result===
date        new_orders  active_orders  done_orders
01.01.2001  1           0              0
02.01.2001  1           1              0
03.01.2001  1           0              1
04.01.2001  1           0              1
05.01.2001  2           0              1
06.01.2001  2           0              1
07.01.2001  2           0              1
08.01.2001  2           0              1
09.01.2001  2           0              1
10.01.2001  1           0              2
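For reference, the idea described in the question (pick each order's latest state on or before each day in the generated series) can be written directly with DISTINCT ON in a LATERAL subquery. A hedged sketch, untested (LATERAL needs PostgreSQL 9.3+, FILTER needs 9.4+):

select d.day::date as date
     , count(*) filter (where s.new_state = 'NEW')    as new_orders
     , count(*) filter (where s.new_state = 'ACTIVE') as active_orders
     , count(*) filter (where s.new_state = 'DONE')   as done_orders
from generate_series(date '2001-01-01', date '2001-01-10', interval '1 day') as d(day)
left join lateral (
    -- latest recorded state per type-1 order, as of the end of d.day
    select distinct on (h.order_id) h.new_state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
      and h.timestamp::date <= d.day::date
    order by h.order_id, h.timestamp desc
) s on true
group by 1
order by 1;

The step-by-step answer below takes an incremental approach instead, which avoids re-scanning the whole history for every generated day.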
Step 1. Calculate a cumulative sum of state for each order, using values NEW = 1, ACTIVE = 1, DONE = 2:
select
order_id, timestamp::date as day,
sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)
order_id | day | state
----------+------------+-------
10000 | 2001-01-01 | 1
10000 | 2001-01-02 | 2
10000 | 2001-01-03 | 4
10001 | 2001-01-02 | 1
10004 | 2001-01-05 | 1
10004 | 2001-01-10 | 3
(6 rows)
Step 2. Calculate a transition matrix for each order based on states from step 1 (2 means NEW->ACTIVE, 3 means NEW->DONE, 4 means ACTIVE->DONE):
select
order_id, day, state,
case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
case when state > 2 then 1 else 0 end as done
from (
select
order_id, timestamp::date as day,
sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)
) s
order_id | day | state | new | active | done
----------+------------+-------+-----+--------+------
10000 | 2001-01-01 | 1 | 1 | 0 | 0
10000 | 2001-01-02 | 2 | -1 | 1 | 0
10000 | 2001-01-03 | 4 | 0 | -1 | 1
10001 | 2001-01-02 | 1 | 1 | 0 | 0
10004 | 2001-01-05 | 1 | 1 | 0 | 0
10004 | 2001-01-10 | 3 | -1 | 0 | 1
(6 rows)
Step 3. Calculate a cumulative sum of each state for a series of days:
select distinct
day::date,
sum(new) over w as new,
sum(active) over w as active,
sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join (
select
order_id, day, state,
case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
case when state > 2 then 1 else 0 end as done
from (
select
order_id, timestamp::date as day,
sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)
) s
) s
using(day)
window w as (order by day)
order by 1
day | new | active | done
------------+-----+--------+------
2001-01-01 | 1 | 0 | 0
2001-01-02 | 1 | 1 | 0
2001-01-03 | 1 | 0 | 1
2001-01-04 | 1 | 0 | 1
2001-01-05 | 2 | 0 | 1
2001-01-06 | 2 | 0 | 1
2001-01-07 | 2 | 0 | 1
2001-01-08 | 2 | 0 | 1
2001-01-09 | 2 | 0 | 1
2001-01-10 | 1 | 0 | 2
(10 rows)
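The same three steps can also be chained as CTEs, which some find easier to read than nested subqueries. A sketch of an equivalent query (same logic as above, untested):

with states as (
  -- step 1: cumulative state sum per order
  select
    order_id, timestamp::date as day,
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
  from order_state_history h
  join orders o on o.id = h.order_id
  where o.type = 1
  window w as (partition by order_id order by timestamp)
), transitions as (
  -- step 2: per-day deltas for each state bucket
  select
    day,
    case when state = 1 then 1 when state in (2, 3) then -1 else 0 end as new,
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    case when state > 2 then 1 else 0 end as done
  from states
)
-- step 3: running totals over the generated series of days
select distinct
  day::date,
  sum(new) over w as new,
  sum(active) over w as active,
  sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join transitions using (day)
window w as (order by day)
order by 1;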
Edit: This is a follow-up from another question. To simplify the question, assume a table:
date | id | type
01/01 | 1 | F
02/01 | 1 | F
02/01 | 1 | F
03/01 | 1 | S
03/01 | 1 | S
04/01 | 1 | F
04/01 | 1 | S
05/01 | 1 | S
I am looking for a way to summarise the above table by combination of transaction types per day. If a person (id) has only one transaction per day it counts as a Single type. If they have more than one it counts as a Multiple one. I've done that with my original query and it works. The output from the above table would be:
date | Single | Multiple
01/01 | 1 | 0
02/01 | 0 | 1
03/01 | 0 | 1
04/01 | 0 | 1
05/01 | 1 | 0
I got that far and it works. What's I'm struggling with (ie. don't have a clue of how to start) is how set up a query to show all possible combinations of Type (SS, FF, FS) instead of just counting the multiple transactions. The desired output would be like:
date | Single | # FF | # FS | # SS
01/01 | 1 | 0 | 0 | 0
02/01 | 0 | 1 | 0 | 0
03/01 | 0 | 0 | 0 | 1
04/01 | 0 | 0 | 1 | 0
05/01 | 1 | 0 | 0 | 0
Any constructive hints or ideas will be much appreciated.
This assumes that you have at most 2 transactions per person per date.
You can use a CASE expression with MIN() and MAX() to check for the combination FF, FS or SS:
select [date],
case when count(*) = 1 then 1 else 0 end as Single,
case when count(*) >= 2
and min([type]) = 'F'
and max([type]) = 'F'
then 1
else 0
end as [# FF],
case when count(*) >= 2
and min([type]) = 'F'
and max([type]) = 'S'
then 1
else 0
end as [# FS],
case when count(*) >= 2
and min([type]) = 'S'
and max([type]) = 'S'
then 1
else 0
end as [# SS]
from yourtable
group by [date]
EDIT:
For more than two transactions per day, just change count(*) = 2 to count(*) >= 2 (as in the query above); this still works as long as the types are only F or S.
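If you need the exact combination for three or more transactions (FFF, FFS, and so on) rather than the min/max buckets above, one option is to build each person's ordered type string per day first and then count the strings. A sketch, assuming SQL Server 2017+ for STRING_AGG and the table/column names from the question:

select [date],
       sum(case when len(combo) = 1 then 1 else 0 end) as Single,
       sum(case when combo = 'FF' then 1 else 0 end)   as [# FF],
       sum(case when combo = 'FS' then 1 else 0 end)   as [# FS],
       sum(case when combo = 'SS' then 1 else 0 end)   as [# SS]
from (
    -- one row per person per day, e.g. 'FS' for one F and one S transaction
    select [date], id,
           string_agg([type], '') within group (order by [type]) as combo
    from yourtable
    group by [date], id
) t
group by [date]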
I have a table which gives information about when a particular user has used an offer. It has 3 columns:
Date: Date at which the offer was used
user_id: Identifier for a particular user
txn_id: Transaction id when a user uses an offer. It is always unique in the table.
The offer is such that a particular user can use it up to 5 times.
For each date, I want to know how many users are in each stage of offer usage.
For example
On day 1 there could be 3 users who have used the offer once (redemption_1) and 2 users who have used it twice (redemption_2).
Now on day 2 there could be users from day 1 (repeat users) as well as users who are using the offer for the first time (new users).
For the new users of day 2 the logic is the same as for day 1 users (maybe 2 new users use the offer once (redemption_1) and 3 new users use it three times (redemption_3)).
But for the repeat users I now want to add to their previous days' usage.
For example
On day 1, 3 users had used the offer once (redemption_1), but if they use it one more time on day 2 they should be counted in redemption_2 (and not in redemption_1, since it is their second usage since the offer started).
In this way I want to keep a cumulative count of the number of times each user has used the offer, and for each date count the users who have used it once (redemption_1), twice (redemption_2), and so on.
Table
+------------+---------+------------+
| Date | user_id | txn_id |
+------------+---------+------------+
| 2019-06-04 | 1 | 1ACSA0-ABA |
| 2019-06-04 | 2 | 1BEAA0-CSC |
| 2019-06-04 | 3 | 1AGHF0-CBA |
| 2019-06-04 | 1 | 1AVFA0-GAA |
| 2019-06-05 | 1 | 1BCFA0-AAA |
| 2019-06-05 | 1 | 1AVFB0-GAC |
| 2019-06-05 | 2 | 1AVFA0-GVA |
| 2019-06-05 | 4 | 1AVFA0-GVB |
| 2019-06-05 | 5 | 1AVFA0-BCF |
| 2019-06-06 | 6 | 1AGHF0-CCA |
| 2019-06-06 | 1 | 1BXHF0-CCA |
| 2019-06-06 | 2 | 1AGHF0-CBG |
| 2019-06-06 | 3 | 1AGHF0-CAW |
| 2019-06-06 | 2 | 1AGHF0-CTU |
+------------+---------+------------+
Desired Output
+------------+--------------+--------------+--------------+--------------+--------------+
| Date | redemption_1 | redemption_2 | redemption_3 | redemption_4 | redemption_5 |
+------------+--------------+--------------+--------------+--------------+--------------+
| 2019-06-04 | 2 | 1 | 0 | 0 | 0 |
| 2019-06-05 | 2 | 1 | 0 | 1 | 0 |
| 2019-06-06 | 1 | 1 | 0 | 1 | 1 |
+------------+--------------+--------------+--------------+--------------+--------------+
I will walk you through the rows of the output for better understanding:
In the first row, with date 2019-06-04, there are two users who used the offer once (2, 3) and one user who used it twice (1).
In the row with date 2019-06-05 there are 2 users who used the offer once (4, 5). Note that they have never used the offer before, so they are counted for redemption_1.
In the same row there is 1 user who has used the offer 2 times (2: once on 2019-06-04 and then on 2019-06-05), so he is counted for redemption_2.
In the same row there is 1 user who has used the offer 4 times (1: twice on 2019-06-04 and then again twice on 2019-06-05), so he is counted for redemption_4.
And so on for the row with date 2019-06-06.
Please let me know if you need any clarification.
Not a paragon of efficiency, but it works.
Test data:
Create Table offer_used(date DateTime, user_id Int, txn_id Varchar(50))
Insert Into dbo.offer_used (date,
user_id,
txn_id)
Values
('2019-06-04', 1, '1ACSA0-ABA'),
('2019-06-04', 2, '1BEAA0-CSC'),
('2019-06-04', 3, '1AGHF0-CBA'),
('2019-06-04', 1, '1AVFA0-GAA'),
('2019-06-05', 1, '1BCFA0-AAA'),
('2019-06-05', 1, '1AVFB0-GAC'),
('2019-06-05', 2, '1AVFA0-GVA'),
('2019-06-05', 4, '1AVFA0-GVB'),
('2019-06-05', 5, '1AVFA0-BCF'),
('2019-06-06', 6, '1AGHF0-CCA'),
('2019-06-06', 1, '1BXHF0-CCA'),
('2019-06-06', 2, '1AGHF0-CBG'),
('2019-06-06', 3, '1AGHF0-CAW'),
('2019-06-06', 2, '1AGHF0-CTU')
Query:
; With
Dates As (Select Distinct date From dbo.offer_used OU),
Users As (Select user_id, FirstTime = Min(date) From dbo.offer_used OU Group By user_id),
UserCounts As (Select
Dates.date,
Users.user_id,
Users.FirstTime,
UsedCount = (Select Count(*) From dbo.offer_used As Used
Where Used.date <= Dates.date
And Used.user_id = Users.user_id)
From
Dates
Cross Join Users)
Select
date = UserCounts.date,
[first time today] = Sum(Case When UserCounts.date = UserCounts.FirstTime
And UserCounts.UsedCount = 1 Then 1 Else 0 End),
[2 times total] = Sum(Case When UserCounts.UsedCount = 2 Then 1 Else 0 End),
[3 times total] = Sum(Case When UserCounts.UsedCount = 3 Then 1 Else 0 End),
[4 times total] = Sum(Case When UserCounts.UsedCount = 4 Then 1 Else 0 End),
[5 times total] = Sum(Case When UserCounts.UsedCount = 5 Then 1 Else 0 End),
[bonus: never] = Sum(Case When UserCounts.UsedCount = 0 Then 1 Else 0 End)
From UserCounts
Group By UserCounts.date
Order By UserCounts.date
Results:
date first time today 2 times total 3 times total 4 times total 5 times total bonus: never
----------- ---------------- ------------- ------------- ------------- ------------- ------------
2019-06-04 2 1 0 0 0 3
2019-06-05 2 1 0 1 0 1
2019-06-06 1 1 0 1 1 0
I think you want conditional aggregation. Because a user can redeem more than once per day, number each user's redemptions with row_number() and keep only the highest sequence number per user per day before aggregating:

select date,
       sum(case when seqnum = 1 then 1 else 0 end) as redemption_1,
       sum(case when seqnum = 2 then 1 else 0 end) as redemption_2,
       sum(case when seqnum = 3 then 1 else 0 end) as redemption_3,
       sum(case when seqnum = 4 then 1 else 0 end) as redemption_4,
       sum(case when seqnum = 5 then 1 else 0 end) as redemption_5
from (select date, user_id, max(seqnum) as seqnum
      from (select t.*,
                   row_number() over (partition by user_id order by date, txn_id) as seqnum
            from offer_used t
           ) t
      group by date, user_id
     ) t
group by date
order by date

The inner group by collapses a user's multiple redemptions on the same day to that user's running total, which is what the desired output counts.
I would like SQL to SUM each column (IPO and UOR) into a TOTAL row second from last, and a GRAND TOTAL (the sum of IPO + UOR) in the last row. Thank you so much.
No  Code     IPO  UOR
----------------------
1   D173       1    0
2   D176       3    0
3   D184       1    1
4   D185B      1    0
5   D187       1    2
6   F042       3    0
7   ML004     12    3
8   TTPMC      2    0
9   Z00204     1    0
----------------------
TOTAL (NOS)   25    6
----------------------
GRAND TOTAL (NOS)  31
Here is my code:
SELECT
SUM(CASE WHEN IPOType = 'IPO' THEN 1 ELSE 0 END) as IPO,
SUM(CASE WHEN IPOType = 'UOR' THEN 1 ELSE 0 END) as UOR
FROM IPO2018
GROUP BY OriProjNo
It can show this:
No  Code     IPO  UOR
----------------------
1   D173       1    0
2   D176       3    0
3   D184       1    1
4   D185B      1    0
5   D187       1    2
6   F042       3    0
7   ML004     12    3
8   TTPMC      2    0
9   Z00204     1    0
Generally speaking, you want to leave totals and sub-totals to whatever tool you are presenting your data in, as they will be able to handle the formatting with significantly more ease. In addition, your desired output does not have the same number of columns (Grand Total row only has one numeric) so even if you did shoehorn this in to the same dataset, the column headings wouldn't make sense.
That said, you can return group totals via WITH ROLLUP. This will provide an additional row with the aggregate totals for the group. Where there is more than one grouping in your data, you will get a sub-total row for each group and a total row for the entire dataset:
declare @t table(c nvarchar(10),t nvarchar(3));
insert into @t values ('D173','IPO'),('D176','IPO'),('D176','IPO'),('D176','IPO'),('D184','IPO'),('D184','UOR'),('D185B','IPO'),('D187','IPO'),('D187','UOR'),('D187','UOR'),('F042','IPO'),('F042','IPO'),('F042','IPO'),('TTPMC','IPO'),('TTPMC','IPO'),('Z00204','IPO'),('ML004','UOR'),('ML004','UOR'),('ML004','UOR'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO'),('ML004','IPO');
select row_number() over (order by grouping(c),c) as n
,case when grouping(c) = 1 then 'TOTAL (NOS)' else c end as c
,sum(case when t = 'IPO' then 1 else 0 end) as IPO
,sum(case when t = 'UOR' then 1 else 0 end) as UOR
from @t
group by c
with rollup
order by grouping(c)
,c;
Output:
+----+-------------+-----+-----+
| n | c | IPO | UOR |
+----+-------------+-----+-----+
| 1 | D173 | 1 | 0 |
| 2 | D176 | 3 | 0 |
| 3 | D184 | 1 | 1 |
| 4 | D185B | 1 | 0 |
| 5 | D187 | 1 | 2 |
| 6 | F042 | 3 | 0 |
| 7 | ML004 | 12 | 3 |
| 8 | TTPMC | 2 | 0 |
| 9 | Z00204 | 1 | 0 |
| 10 | TOTAL (NOS) | 25 | 6 |
+----+-------------+-----+-----+
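If you do want the single GRAND TOTAL figure from SQL as well, one option is to append it with UNION ALL. A sketch reusing @t from above; since the grand-total row has a different shape, the figure is carried in the IPO column and UOR is left NULL:

select case when grouping(c) = 1 then 'TOTAL (NOS)' else c end as c
      ,sum(case when t = 'IPO' then 1 else 0 end) as IPO
      ,sum(case when t = 'UOR' then 1 else 0 end) as UOR
from @t
group by c
with rollup
union all
select 'GRAND TOTAL (NOS)'
      ,count(*)  -- each row in @t is one IPO or UOR event, so count(*) = IPO + UOR
      ,null
from @t;
-- add an explicit sort key and an outer ORDER BY if the row order matters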
My sales data for the first two weeks of June is below (the Monday dates are 1 Jun and 8 Jun):
date | count
2015-06-01 03:25:53 | 1
2015-06-01 03:28:51 | 1
2015-06-01 03:49:16 | 1
2015-06-01 04:54:14 | 1
2015-06-01 08:46:15 | 1
2015-06-01 13:14:09 | 1
2015-06-01 16:20:13 | 5
2015-06-01 16:22:13 | 1
2015-06-01 16:27:07 | 1
2015-06-01 16:29:57 | 1
2015-06-01 19:16:45 | 1
2015-06-08 10:54:46 | 1
2015-06-08 15:12:10 | 1
2015-06-08 20:35:40 | 1
I need to find the weekly average of sales in a given range.
Complex Query:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
)
select date_part('h', date) as h,
       date_part('dow', date) as day_of_week,
       count(sales_count)
from final_result_set
group by h, day_of_week
Output:
h | day_of_week | count
3 | 1 | 3
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 8
19 | 1 | 1
20 | 1 | 1
If I try to apply avg on the above final result, it is not actually fetching the correct answer:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
)
select date_part('h', date) as h,
       date_part('dow', date) as day_of_week,
       avg(sales_count)
from final_result_set
group by h, day_of_week
h | day_of_week | avg
3 | 1 | 1
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 1
19 | 1 | 1
20 | 1 | 1
So I have two Mondays in the given range, but it is not actually dividing by 2. I am not even sure what is happening inside Redshift.
To get "weekly averages" use date_trunc():
SELECT date_trunc('week', my_date_column) as week
, avg(sales_count) AS avg_sales
FROM final_result_set
GROUP BY 1;
I hope you are not actually using date as the name for your date column. It's a reserved word in SQL and a basic type name; don't use it as an identifier.
If you group by the day of the week (DOW) you get averages per weekday, and Sunday is 0. (Use ISODOW to get 7 for Sunday.)
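If the goal is a per-weekday average that divides by the number of Mondays (Tuesdays, ...) in the range, one way is to total the sales per calendar day first and then average those daily totals per weekday. A sketch, untested, using the same placeholder names as above:

SELECT date_part('dow', day) AS day_of_week
     , avg(daily_total)      AS avg_sales
FROM (
   -- one row per calendar day with that day's total sales
   SELECT date_trunc('day', my_date_column) AS day
        , sum(sales_count)                  AS daily_total
   FROM final_result_set
   GROUP BY 1
) d
GROUP BY 1
ORDER BY 1;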
My query fetches data for the last 5 weeks.
select z.week,
       sum(case when i.severity=1 then 1 else 0 end) as [1],
       sum(case when i.severity=2 then 1 else 0 end) as [2],
       sum(case when i.severity=3 then 1 else 0 end) as [3],
       sum(case when i.severity=4 then 1 else 0 end) as [4]
from instance as i
left outer join year as z on convert(varchar(10),z.date,101)=convert(varchar(10),i.created,101)
left outer join year as z on convert(varchar(10),z.date,101)=convert(varchar(10),i.closed,101)
where i.[group] like '%Teams%'
and z.year=2013
and z.week<=6 and z.week>1
group by z.week
There are a few weeks where my instance table has not even a single row. For those weeks I am not getting null or zero; instead the entire row is missing from the output.
My present output:
week | 1 | 2 | 3 | 4
---------------------
2 | 0 | 1 | 8 | 5
3 | 2 | 3 | 4 | 9
5 | 1 | 0 | 0 | 0
But I need output like the below:
week | 1 | 2 | 3 | 4
---------------------
2 | 0 | 1 | 8 | 5
3 | 2 | 3 | 4 | 9
4 | 0 | 0 | 0 | 0
5 | 1 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0
How can I get the desired output in SQL?
Try this:
select z.week,
       sum(case when i.severity=1 then 1 else 0 end) as [1],
       sum(case when i.severity=2 then 1 else 0 end) as [2],
       sum(case when i.severity=3 then 1 else 0 end) as [3],
       sum(case when i.severity=4 then 1 else 0 end) as [4]
from year as z
left outer join instance as i on
    convert(varchar(10),z.date,101)=convert(varchar(10),i.created,101)
    and convert(varchar(10),z.date,101)=convert(varchar(10),i.closed,101)
where (i.[group] is null or i.[group] like '%Teams%')
and z.year=2013
and z.week<=6 and z.week>1
group by z.week
I'm not sure how the query works where you alias year twice to z. But, assuming that's not a problem, you can change the LEFT OUTER JOIN to RIGHT OUTER JOIN. Or, if you don't like the RIGHT OUTER JOIN, rework the SELECT so that the FROM clause references the year table.
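A sketch of that rework, untested, keeping the question's table and column names and joining on created only for brevity. Note the filter on i.[group] moves into the join condition so that weeks with no matching instance rows still come back as all-zero rows:

select z.week,
       sum(case when i.severity = 1 then 1 else 0 end) as [1],
       sum(case when i.severity = 2 then 1 else 0 end) as [2],
       sum(case when i.severity = 3 then 1 else 0 end) as [3],
       sum(case when i.severity = 4 then 1 else 0 end) as [4]
from year as z
left outer join instance as i
    on convert(varchar(10), z.date, 101) = convert(varchar(10), i.created, 101)
    and i.[group] like '%Teams%'
where z.year = 2013
  and z.week <= 6 and z.week > 1
group by z.week
order by z.week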