I want to get the best 3 days for each user between "2014-07-01" and "2014-08-01".
Could someone help me? I've been stuck here for 3 days.
In the real score table, entries run from 10:00 to 22:00, with one entry per hour.
That makes about 12 entries per day for each player (sometimes 1 or 2 fewer).
This is the output I'm trying to get:
ID | User_ID | Username | Sum(Score) | Date
--------------------------------------------------
1 | 1 | Xxx | 52 | 2014-07-01
2 | 1 | Xxx | 143 | 2014-07-02
3 | 2 | Yyy | 63 | 2014-07-01
...
Score table:
ID | User_ID | Score | Datetime
-----------------------------------------
1 | 1 | 35 | 2014-07-01 11:00:00
2 | 1 | 17 | 2014-07-01 12:00:00
3 | 2 | 36 | 2014-07-01 11:00:00
4 | 2 | 27 | 2014-07-01 12:00:00
5 | 1 | 66 | 2014-07-02 11:00:00
6 | 1 | 77 | 2014-07-02 12:00:00
7 | 2 | 93 | 2014-07-02 12:00:00
...
User table :
ID | Username
--------------
1 | Xxx
2 | Yyy
3 | Zzz
...
I think you need to aggregate first by date, and then choose the first three using row_number(). To do the aggregation:
select s.user_id, date_trunc('day', s.datetime) as theday, sum(s.score) as score,
row_number() over (partition by s.user_id order by sum(s.score) desc) as seqnum
from scores s
group by s.user_id, date_trunc('day', s.datetime);
To get the rest of the information, use this as a subquery or CTE:
select u.*, s.theday, s.score
from (select s.user_id, date_trunc('day', s.datetime) as theday, sum(s.score) as score,
row_number() over (partition by s.user_id order by sum(s.score) desc) as seqnum
from scores s
group by s.user_id, date_trunc('day', s.datetime)
) s join
users u
on s.user_id = u.id
where seqnum <= 3
order by u.id, s.score desc;
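For anyone who wants to try the aggregate-then-rank idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. It is not the answer's exact query: SQLite has no date_trunc, so date() stands in for it, the column name scored_at is my own stand-in for Datetime, and the half-open date filter is an assumption about what "between 2014-07-01 and 2014-08-01" should mean. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER,
                         score INTEGER, scored_at TEXT);
    INSERT INTO users VALUES (1, 'Xxx'), (2, 'Yyy'), (3, 'Zzz');
    INSERT INTO scores VALUES
        (1, 1, 35, '2014-07-01 11:00:00'),
        (2, 1, 17, '2014-07-01 12:00:00'),
        (3, 2, 36, '2014-07-01 11:00:00'),
        (4, 2, 27, '2014-07-01 12:00:00'),
        (5, 1, 66, '2014-07-02 11:00:00'),
        (6, 1, 77, '2014-07-02 12:00:00'),
        (7, 2, 93, '2014-07-02 12:00:00');
""")

TOP_DAYS_SQL = """
    WITH daily AS (               -- step 1: one total per user and day
        SELECT user_id, date(scored_at) AS theday, SUM(score) AS total
        FROM scores
        WHERE scored_at >= ? AND scored_at < ?
        GROUP BY user_id, date(scored_at)
    ),
    ranked AS (                   -- step 2: rank each user's days by total
        SELECT daily.*,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY total DESC) AS seqnum
        FROM daily
    )
    SELECT u.id, u.username, r.total, r.theday
    FROM ranked AS r
    JOIN users AS u ON u.id = r.user_id
    WHERE r.seqnum <= 3           -- step 3: keep the best three days
    ORDER BY u.id, r.total DESC
"""

# The upper bound is exclusive, so '2014-08-02' keeps all of 2014-08-01.
top_days = conn.execute(TOP_DAYS_SQL, ("2014-07-01", "2014-08-02")).fetchall()
for row in top_days:
    print(row)
```

With only two days of sample data each user gets two rows back; with a full month the seqnum filter would cap it at three per user.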
SELECT 'group has no id' as ID,
u.ID as User_ID,
u.Username,
sum(s.Score) "Sum(Score)",
s.Datetime::date as Date
FROM User u,
Score s
WHERE u.id = s.User_ID
AND s.Datetime BETWEEN '2014-07-01' AND '2014-08-01 23:59:59'
GROUP BY u.ID, u.Username, s.Datetime::date
ORDER BY sum(s.Score) DESC
LIMIT 3;
I have a PostgreSQL query:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id"
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
I want to add something like: if contract_id occurs more than once, then change_effective_date <= now()
Dataset:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-17 00:00:00
200 | 10 | 1 | 2020-05-16 00:00:00
300 | 10 | 1 | 2020-05-14 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
500 | 30 | 1 | 2020-05-13 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
Expected result:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
200 | 10 | 1 | 2020-05-16 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
If the count of contract_id is more than 1, I want the row with change_effective_date less than or equal to today.
I tried using:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id",
COUNT("contract"."contract_id") AS cnt
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1 AND
CASE WHEN "cnt" > 1 THEN "contract"."change_effective_date" <= now() END
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
but it throws the error: column "cnt" does not exist
Thanks
I have never seen DISTINCT ON combined with GROUP BY. It is possible (aggregate first, then pick rows from that aggregation result), but this is not what you are doing and it is not what you want either. You want the ranking applied by the ORDER BY clause for the DISTINCT ON to take the current date into account.
SELECT DISTINCT ON (contract_id) *
FROM contract_versions
WHERE client_id = 1
ORDER BY
contract_id,
CASE WHEN change_effective_date <= CURRENT_DATE THEN 1 ELSE 2 END,
change_effective_date DESC;
Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=f709d18b23504dfaf9f586ace231be1d
There are 2 problems in your queries:
In the first query you have:
ERROR: column "contract.change_effective_date" must appear in the
GROUP BY clause or be used in an aggregate function
In the second, you cannot reference a column alias ("cnt") in the WHERE clause of the same query; you can only reference it from an outer query that uses the original query as a derived table (inline view).
Here is something that could be a solution:
select * from contract_versions;
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 20 | 1 | 2020-05-16 09:00:00
(4 rows)
Null display is "NULL".
SELECT DISTINCT ON (contract_id) contract_id,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | change_effective_date
-------------+-----------------------
10 | 2020-05-14 14:00:00
20 | 2020-05-16 09:00:00
(2 rows)
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | cnt | change_effective_date
-------------+-----+-----------------------
10 | 3 | 2020-05-14 14:00:00
20 | 1 | 2020-05-16 09:00:00
(2 rows)
SELECT
contract_id,
CASE WHEN cnt > 1 THEN change_effective_date <= now()
END
FROM
(
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC
) v;
contract_id | case
-------------+------
10 | t
20 | NULL
(2 rows)
With row_number() window function:
select t.id, t.contract_id, t.client_id, t.change_effective_date
from (
select *,
row_number() over (
partition by contract_id
order by (change_effective_date > now())::int, change_effective_date desc
) rn
from contract_versions
) t
where t.rn = 1
order by t.id
Your sample data contains only one client_id, so it is not clear from your question whether each contract_id belongs to only one client_id.
If that is not the case, then in the above query you must change:
partition by contract_id
to:
partition by contract_id, client_id
See the demo.
Results:
| id | contract_id | client_id | change_effective_date |
| --- | ----------- | --------- | ------------------------ |
| 200 | 10 | 1 | 2020-05-16 00:00:00.000 |
| 400 | 20 | 1 | 2020-05-17 00:00:00.000 |
| 600 | 30 | 1 | 2020-05-14 00:00:00.000 |
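To check the ranking idea against the question's dataset locally, here is a small sketch with Python's sqlite3 module. A few things are my own assumptions: "now" is passed in as a fixed parameter (2020-05-16 12:00:00, which matches the expected result) instead of calling now(), the boolean is used directly as a sort key because SQLite has no ::int cast, and a client_id filter is added to match the question. It needs SQLite 3.25+ for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_versions (
        id INTEGER, contract_id INTEGER, client_id INTEGER,
        change_effective_date TEXT);
    INSERT INTO contract_versions VALUES
        (100, 10, 1, '2020-05-17 00:00:00'),
        (200, 10, 1, '2020-05-16 00:00:00'),
        (300, 10, 1, '2020-05-14 00:00:00'),
        (400, 20, 1, '2020-05-17 00:00:00'),
        (500, 30, 1, '2020-05-13 00:00:00'),
        (600, 30, 1, '2020-05-14 00:00:00');
""")

LATEST_EFFECTIVE_SQL = """
    SELECT id, contract_id, client_id, change_effective_date
    FROM (
        SELECT cv.*,
               ROW_NUMBER() OVER (
                   PARTITION BY contract_id
                   -- 0 for rows already in effect, 1 for future rows,
                   -- so rows in effect always win when any exist
                   ORDER BY (change_effective_date > :now),
                            change_effective_date DESC
               ) AS rn
        FROM contract_versions AS cv
        WHERE client_id = :client_id
    )
    WHERE rn = 1
    ORDER BY id
"""

current_versions = conn.execute(
    LATEST_EFFECTIVE_SQL,
    {"now": "2020-05-16 12:00:00", "client_id": 1},
).fetchall()
for row in current_versions:
    print(row)
```

Contract 20 only has a future row, so that row is still returned; the first sort key only breaks ties in favour of rows already in effect, it does not filter anything out.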
I have below table, it shows user_id and ride_date.
+---------+------------+
| user_id | ride_date |
+---------+------------+
| 1 | 2019-11-01 |
| 1 | 2019-11-03 |
| 1 | 2019-11-05 |
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-05 |
| 4 | 2019-11-07 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
| 5 | 2019-11-11 |
| 5 | 2019-11-13 |
+---------+------------+
I want the user_ids that took rides on 3 or more consecutive days, along with the days on which they took those consecutive rides.
The desired result is as below:
+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-07 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
+---------+-----------------------+
SQL Fiddle
With LAG() and LEAD() window functions:
with cte as (
select *,
datediff(
day,
lag([ride_date]) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev1,
datediff(
day,
lag([ride_date], 2) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev2,
datediff(
day,
[ride_date],
lead([ride_date]) over (partition by [user_id] order by [ride_date])
) next1,
datediff(
day,
[ride_date],
lead([ride_date], 2) over (partition by [user_id] order by [ride_date])
) next2
from Table1
)
select [user_id], [ride_date]
from cte
where
(prev1 = 1 and prev2 = 2) or
(prev1 = 1 and next1 = 1) or
(next1 = 1 and next2 = 2)
See the demo.
Results:
> user_id | ride_date
> ------: | :---------
> 2 | 03/11/2019
> 2 | 04/11/2019
> 2 | 05/11/2019
> 2 | 06/11/2019
> 3 | 03/11/2019
> 3 | 04/11/2019
> 3 | 05/11/2019
> 3 | 06/11/2019
> 4 | 07/11/2019
> 4 | 08/11/2019
> 4 | 09/11/2019
Here is one way to address this gaps-and-islands problem:
first, assign a rank to each user ride with row_number(), and recover the previous ride_date (aliased lag_ride_date)
then, compare the date of the previous ride to the current one in a conditional sum that increases when the dates are successive; subtracting this sum from the rank of the user ride gives groups (aliased grp) that represent consecutive rides with a 1-day spacing
do a window count of how many records belong to each group (aliased cnt)
filter on records whose window count is at least 3
Query:
select user_id, ride_date
from (
select
t.*,
count(*) over(partition by user_id, grp) cnt
from (
select
t.*,
rn1
- sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
over(partition by user_id order by ride_date) grp
from (
select
t.*,
row_number() over(partition by user_id order by ride_date) rn1,
lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
from Table1 t
) t
) t
) t
where cnt >= 3
Demo on DB Fiddle
This is a typical gaps-and-islands problem.
We can solve it as follows:
with data
as (
select user_id
,ride_date
,dateadd(day
,-row_number() over(partition by user_id order by ride_date asc)
,ride_date) as grp_field
from Table1
)
,consecutive_days
as(
select user_id
,ride_date
,count(*) over(partition by user_id,grp_field) as cnt
from data
)
select *
from consecutive_days
where cnt>=3
order by user_id,ride_date
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=7bb851d9a12966b54afb4d8b144f3d46
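If you don't have SQL Server handy, the "date minus row number" trick can be tried with Python's sqlite3 module. This is only a sketch under a few assumptions: the table is called rides here, the Julian day number stands in for dateadd, and each user has at most one row per day. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

RIDES = [
    (1, "2019-11-01"), (1, "2019-11-03"), (1, "2019-11-05"),
    (2, "2019-11-03"), (2, "2019-11-04"), (2, "2019-11-05"), (2, "2019-11-06"),
    (3, "2019-11-03"), (3, "2019-11-04"), (3, "2019-11-05"), (3, "2019-11-06"),
    (4, "2019-11-05"), (4, "2019-11-07"), (4, "2019-11-08"), (4, "2019-11-09"),
    (5, "2019-11-11"), (5, "2019-11-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (user_id INTEGER, ride_date TEXT)")
conn.executemany("INSERT INTO rides VALUES (?, ?)", RIDES)

STREAKS_SQL = """
    WITH keyed AS (
        -- day number minus row number is constant inside a run of
        -- consecutive days and changes whenever a day is skipped
        SELECT user_id, ride_date,
               CAST(julianday(ride_date) AS INTEGER)
                 - ROW_NUMBER() OVER (PARTITION BY user_id
                                      ORDER BY ride_date) AS grp
        FROM rides
    ),
    counted AS (
        SELECT user_id, ride_date,
               COUNT(*) OVER (PARTITION BY user_id, grp) AS cnt
        FROM keyed
    )
    SELECT user_id, ride_date
    FROM counted
    WHERE cnt >= 3
    ORDER BY user_id, ride_date
"""

streaks = conn.execute(STREAKS_SQL).fetchall()
for row in streaks:
    print(row)
```

For user 4 the ride on 2019-11-05 lands in its own group, so only 2019-11-07 through 2019-11-09 come back.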
There is no need to apply gaps-and-islands methodologies to this problem. The problem is much simpler to solve.
You can return the users and first date just by using LEAD():
SELECT t1.*
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1
WHERE ride_date_2 = DATEADD(day, 2, ride_date);
If you want the actual dates, you can unpivot the results:
SELECT DISTINCT t1.user_id, v.ride_date
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1 CROSS APPLY
(VALUES (t1.ride_date),
(DATEADD(day, 1, t1.ride_date)),
(DATEADD(day, 2, t1.ride_date))
) v(ride_date)
WHERE t1.ride_date_2 = DATEADD(day, 2, t1.ride_date)
ORDER BY t1.user_id, v.ride_date;
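Here is a quick way to see what the LEAD(ride_date, 2) test picks out, sketched with Python's sqlite3 module. The table name rides and the use of date(..., '+2 days') in place of DATEADD are my substitutions, and like the answer it assumes at most one ride per user per day. It only returns the first date of every 3-day window, not the unpivoted dates.

```python
import sqlite3

RIDES = [
    (1, "2019-11-01"), (1, "2019-11-03"), (1, "2019-11-05"),
    (2, "2019-11-03"), (2, "2019-11-04"), (2, "2019-11-05"), (2, "2019-11-06"),
    (3, "2019-11-03"), (3, "2019-11-04"), (3, "2019-11-05"), (3, "2019-11-06"),
    (4, "2019-11-05"), (4, "2019-11-07"), (4, "2019-11-08"), (4, "2019-11-09"),
    (5, "2019-11-11"), (5, "2019-11-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (user_id INTEGER, ride_date TEXT)")
conn.executemany("INSERT INTO rides VALUES (?, ?)", RIDES)

WINDOW_STARTS_SQL = """
    SELECT user_id, ride_date
    FROM (
        SELECT user_id, ride_date,
               -- the ride two rows ahead for the same user
               LEAD(ride_date, 2) OVER (PARTITION BY user_id
                                        ORDER BY ride_date) AS ride_date_2
        FROM rides
    )
    -- if the ride two rows ahead is exactly two days later,
    -- the three rows are three consecutive days
    WHERE ride_date_2 = date(ride_date, '+2 days')
    ORDER BY user_id, ride_date
"""

window_starts = conn.execute(WINDOW_STARTS_SQL).fetchall()
for row in window_starts:
    print(row)
```

Users 2 and 3 each produce two overlapping windows (starting on the 3rd and the 4th), which is why the unpivoting query needs DISTINCT.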
I have an event log table that captures status changes for all users, say status A, status B and status C. They can change status whenever they want. How can I get a snapshot of how many users are in each status at the end of every day (from the earliest day in the event log table to the latest day)?
I'd appreciate it if anyone could show me how to do this in PostgreSQL in an elegant way. Thanks!
Edit: the event log table captures a range of events for every user (a status change is one of them); log_id records the order of the events for that particular user.
user_id | log_time | status | event_A | log_id |
----------------------------------------------------------
456 | 2019-01-05 15:00 | C | | 5 |
123 | 2019-01-05 14:00 | C | | 4 |
123 | 2019-01-05 13:00 | | xxx | 3 |
456 | 2019-01-04 22:00 | B | | 4 |
456 | 2019-01-04 10:00 | C | xxx | 3 |
987 | 2019-01-04 05:00 | C | | 3 |
123 | 2019-01-03 23:00 | B | | 2 |
987 | 2019-01-03 15:00 | | xxx | 2 |
456 | 2019-01-02 22:00 | A | xxx | 2 |
123 | 2019-01-01 23:00 | C | | 1 |
456 | 2019-01-01 09:00 | B | xxx | 1 |
987 | 2019-01-01 04:00 | A | | 1 |
So I want to get the total number of users in each status at the end of each day:
Date | status A | status B | status C |
---------------------------------------------
2019-01-05 | 0 | 0 | 3 |
2019-01-04 | 0 | 2 | 1 |
2019-01-03 | 2 | 1 | 0 |
2019-01-02 | 2 | 0 | 1 |
2019-01-01 | 1 | 1 | 1 |
This was quite challenging to do :). I split the work into sub-queries for readability. It is probably not a very efficient way to do what you want, but it does the job.
-- collect all days to make sure there are no missing days
WITH all_days_cte(dt) as (
SELECT
generate_series(
(SELECT min(date_trunc('day', log_time)) from your_table),
(SELECT max(date_trunc('day', log_time)) from your_table),
'1 day'
)::DATE
),
-- collect all users
all_users_cte as (
select distinct
user_id
from your_table
),
-- setup the table with infos needed, i.e. only the last status by day and user_id
infos_to_aggregate_cte as (
select
s.user_id,
s.dt,
s.status
from (
select
user_id,
date_trunc('day', log_time)::DATE as dt,
status,
row_number() over (partition by user_id, date_trunc('day', log_time) order by log_time desc) rn
from your_table
where status is not null
) s
-- only the last status of the day
where s.rn = 1
),
-- now we still have a problem, we need to find the last status, if there was no change on a day
completed_infos_cte as (
select
u.user_id,
d.dt,
-- not very efficient, but I found no other way (first_value(...) would be nice, but there is no simple way to exclude nulls)
(select
status
from infos_to_aggregate_cte i2
where i2.user_id = u.user_id
and i2.dt <= d.dt
and i2.status is not null
order by i2.dt desc
limit 1) status
from all_days_cte d
-- cross product for all dates and users (that is what we need for our aggregation)
cross join all_users_cte u
left outer join infos_to_aggregate_cte i on u.user_id = i.user_id
and d.dt = i.dt
)
select
c.dt,
sum(case when status = 'A' then 1 else 0 end) status_a,
sum(case when status = 'B' then 1 else 0 end) status_b,
sum(case when status = 'C' then 1 else 0 end) status_c
from completed_infos_cte c
group by c.dt
order by c.dt desc
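The same "calendar x users, then look up the latest status on or before each day" idea can be tried without PostgreSQL. Below is a rough sketch with Python's sqlite3 module and the question's sample rows. It is not a translation of the query above: the table name event_log is made up, a recursive CTE replaces generate_series, and the event_A and log_id columns are left out because they are not needed for the count.

```python
import sqlite3

EVENTS = [
    (456, "2019-01-05 15:00", "C"), (123, "2019-01-05 14:00", "C"),
    (123, "2019-01-05 13:00", None), (456, "2019-01-04 22:00", "B"),
    (456, "2019-01-04 10:00", "C"), (987, "2019-01-04 05:00", "C"),
    (123, "2019-01-03 23:00", "B"), (987, "2019-01-03 15:00", None),
    (456, "2019-01-02 22:00", "A"), (123, "2019-01-01 23:00", "C"),
    (456, "2019-01-01 09:00", "B"), (987, "2019-01-01 04:00", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (user_id INTEGER, log_time TEXT, status TEXT)")
conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", EVENTS)

SNAPSHOT_SQL = """
    WITH RECURSIVE days(dt) AS (          -- every day from first to last event
        SELECT MIN(date(log_time)) FROM event_log
        UNION ALL
        SELECT date(dt, '+1 day') FROM days
        WHERE dt < (SELECT MAX(date(log_time)) FROM event_log)
    ),
    all_users AS (SELECT DISTINCT user_id FROM event_log),
    eod AS (                              -- latest non-null status up to each day
        SELECT d.dt,
               (SELECT e.status
                  FROM event_log AS e
                 WHERE e.user_id = u.user_id
                   AND e.status IS NOT NULL
                   AND date(e.log_time) <= d.dt
                 ORDER BY e.log_time DESC
                 LIMIT 1) AS status
        FROM days AS d CROSS JOIN all_users AS u
    )
    SELECT dt,
           SUM(CASE WHEN status = 'A' THEN 1 ELSE 0 END) AS status_a,
           SUM(CASE WHEN status = 'B' THEN 1 ELSE 0 END) AS status_b,
           SUM(CASE WHEN status = 'C' THEN 1 ELSE 0 END) AS status_c
    FROM eod
    GROUP BY dt
    ORDER BY dt DESC
"""

snapshot = conn.execute(SNAPSHOT_SQL).fetchall()
for row in snapshot:
    print(row)
```

The correlated subquery runs once per user and day, so like the query above this is fine for small logs but would probably need a different shape for large ones.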
Imagine I have a table on Redshift with a structure similar to this. Product_Bill_ID is the primary key of this table.
| Store_ID | Product_Bill_ID | Payment_Date
| 1 | 1 | 01/10/2016 11:49:33
| 1 | 2 | 01/10/2016 12:38:56
| 1 | 3 | 01/10/2016 12:55:02
| 2 | 4 | 01/10/2016 16:25:05
| 2 | 5 | 02/10/2016 08:02:28
| 3 | 6 | 03/10/2016 02:32:09
If I want to count the Product_Bill_IDs that each store sold in the first hour after its first sale, how could I do this?
For this example, the output should be:
| Store_ID | First_Payment_Date | Sold_First_Hour
| 1 | 01/10/2016 11:49:33 | 2
| 2 | 01/10/2016 16:25:05 | 1
| 3 | 03/10/2016 02:32:09 | 1
You need to get the first hour. That is easy enough using window functions:
select s.*,
min(payment_date) over (partition by store_id) as first_payment_date
from sales s
Then, you need to do the date filtering and aggregation:
select store_id, count(*)
from (select s.*,
min(payment_date) over (partition by store_id) as first_payment_date
from sales s
) s
where payment_date <= first_payment_date + interval '1 hour'
group by store_id;
SELECT
store_id,
first_payment_date,
SUM(
CASE WHEN payment_date < DATEADD(hour, 1, first_payment_date) THEN 1 END
) AS sold_first_hour
FROM
(
SELECT
*,
MIN(payment_date) OVER (PARTITION BY store_id) AS first_payment_date
FROM
yourtable
)
parsed_table
GROUP BY
store_id,
first_payment_date
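To sanity-check the window-function approach against the sample rows, here is a small sketch using Python's sqlite3 module instead of Redshift. Assumptions on my side: the table is named sales, the timestamps are rewritten in ISO format (reading 01/10/2016 as 1 October 2016), and datetime(..., '+1 hour') stands in for DATEADD. The first hour is treated as half-open, like the second query.

```python
import sqlite3

SALES = [
    (1, 1, "2016-10-01 11:49:33"),
    (1, 2, "2016-10-01 12:38:56"),
    (1, 3, "2016-10-01 12:55:02"),
    (2, 4, "2016-10-01 16:25:05"),
    (2, 5, "2016-10-02 08:02:28"),
    (3, 6, "2016-10-03 02:32:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store_id INTEGER, product_bill_id INTEGER PRIMARY KEY,"
    " payment_date TEXT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)

FIRST_HOUR_SQL = """
    SELECT store_id, first_payment_date, COUNT(*) AS sold_first_hour
    FROM (
        SELECT s.*,
               -- attach each store's earliest payment to every one of its rows
               MIN(payment_date) OVER (PARTITION BY store_id) AS first_payment_date
        FROM sales AS s
    )
    WHERE payment_date < datetime(first_payment_date, '+1 hour')
    GROUP BY store_id, first_payment_date
    ORDER BY store_id
"""

first_hour = conn.execute(FIRST_HOUR_SQL).fetchall()
for row in first_hour:
    print(row)
```

Store 1's third bill (12:55:02) falls just outside the hour that starts at 11:49:33, which is why its count is 2.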
There is a table post_status_changes, which is history of post status changes
post_id | created_at | status
---------+---------------------+---------
3 | 2016-09-02 04:00:00 | 1
3 | 2016-09-04 19:59:21 | 2
6 | 2016-09-03 15:00:00 | 5
6 | 2016-09-03 19:52:46 | 1
6 | 2016-09-04 20:53:22 | 2
What I want is a list, for each day from DayA to DayB, of each post's status at the end of that day.
DayA = 2016-09-01
DayB = 2016-09-05
post_id | date | status
-----------+-------------+---------
3 | 2016-09-01 | null
3 | 2016-09-02 | 1
3 | 2016-09-03 | 1
3 | 2016-09-04 | 2
3 | 2016-09-05 | 2
6 | 2016-09-01 | null
6 | 2016-09-02 | null
6 | 2016-09-03 | 1
6 | 2016-09-04 | 2
6 | 2016-09-05 | 2
Any solutions?
A solution was found here: PHP: Return all dates between two dates in an array
$period = new DatePeriod(
new DateTime('2010-10-01'),
new DateInterval('P1D'),
new DateTime('2010-10-05')
);
foreach ($period as $each){
//.. QUERY here, where "created_at" = $each
}
with a as
(select convert(varchar(10), created_at, 102) [date], [status],
post_id, rank() over (partition by convert(varchar(10), created_at),
post_id order by created_at desc) as r
from post_status_changes)
select post_id, [date], [status] from a where r =
(select top 1 r from a as a2 where a.[date] =
a2.[date] and a.[post_id] = a2.[post_id])
and @DayA <= [date] and @DayB >= [date] order by post_id, [date];
For each post_id you want as many rows as there are days between the start and end date. This can be done by cross joining the list of dates with the post_ids and then join that result back to the table to get the status for each day:
select x.post_id, t.created, p.status
from generate_series(date '2016-09-01', date '2016-09-05', interval '1' day) as t(created)
cross join (
select distinct post_id
from post_status_changes
) x
left join post_status_changes p on p.created_at::date = t.created and p.post_id = x.post_id
order by 1,2;
Running example: http://rextester.com/CSX38222
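The expected output in the question also carries the last known status forward to days without a change, which needs a "latest row on or before this day" lookup rather than an equality join on the date. Here is a minimal sketch of that variant using Python's sqlite3 module; the recursive CTE stands in for generate_series, and the parameter names day_a and day_b are mine.

```python
import sqlite3

CHANGES = [
    (3, "2016-09-02 04:00:00", 1),
    (3, "2016-09-04 19:59:21", 2),
    (6, "2016-09-03 15:00:00", 5),
    (6, "2016-09-03 19:52:46", 1),
    (6, "2016-09-04 20:53:22", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_status_changes (post_id INTEGER, created_at TEXT, status INTEGER)"
)
conn.executemany("INSERT INTO post_status_changes VALUES (?, ?, ?)", CHANGES)

DAILY_STATUS_SQL = """
    WITH RECURSIVE days(dt) AS (          -- one row per day from DayA to DayB
        SELECT :day_a
        UNION ALL
        SELECT date(dt, '+1 day') FROM days WHERE dt < :day_b
    ),
    posts AS (SELECT DISTINCT post_id FROM post_status_changes)
    SELECT p.post_id, d.dt,
           -- most recent change on or before the end of that day;
           -- NULL when the post has no history yet
           (SELECT c.status
              FROM post_status_changes AS c
             WHERE c.post_id = p.post_id
               AND date(c.created_at) <= d.dt
             ORDER BY c.created_at DESC
             LIMIT 1) AS status
    FROM posts AS p CROSS JOIN days AS d
    ORDER BY p.post_id, d.dt
"""

daily_status = conn.execute(
    DAILY_STATUS_SQL, {"day_a": "2016-09-01", "day_b": "2016-09-05"}
).fetchall()
for row in daily_status:
    print(row)
```

In PostgreSQL the same lookup could probably be written as a LEFT JOIN LATERAL subquery with ORDER BY created_at DESC LIMIT 1, but I have only run the SQLite version above.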