depending on the max aggregate function to count the same column - sql

I'm trying to get the date of the latest game and the number of games a player had in the 1-hour window before that.
Here is the code I want to execute
SELECT MAX(saved_date), COUNT(saved_date) FROM Game
WHERE username = 'user_name'
HAVING saved_date >= MAX(saved_date) - interval '1' hour;
but it gives me the error "not a GROUP BY expression". The error doesn't appear when I add the line GROUP BY saved_date, but then the query doesn't answer my original question. I know I can do it in two statements, but I'd prefer to do it in one.
Do you have any advice or a solution? As this is my first post here, please be indulgent. Thanks!
Additional info :
The Game table is created like this :
CREATE TABLE GAME (
game_id NUMBER GENERATED ALWAYS AS IDENTITY,
...
saved_date DATE,
...
);
output with GROUP BY saved_date :
MAX(saved_date) | COUNT(saved_date)
----------------|------------------
29/03/21 | 1
29/03/21 | 1
29/03/21 | 1
desired output :
MAX(saved_date) | COUNT(saved_date)
----------------|------------------
29/03/21 | 3

You can use analytic functions (instead of aggregation functions):
SELECT saved_date,
COUNT(*) OVER (
ORDER BY saved_date DESC
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS num_games
FROM game
WHERE username = 'user_name'
ORDER BY saved_date DESC
FETCH FIRST ROW ONLY;
or, if you are using Oracle 11g (where FETCH FIRST is not available):
SELECT saved_date,
num_games
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY saved_date DESC ) AS rn,
saved_date,
COUNT(*) OVER (
ORDER BY saved_date DESC
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS num_games
FROM game
WHERE username = 'user_name'
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE GAME (
game_id NUMBER GENERATED ALWAYS AS IDENTITY,
username VARCHAR2(100),
saved_date DATE
);
INSERT INTO game ( username, saved_date )
SELECT 'user_name', DATE '1970-01-01' + ( LEVEL - 1 ) * INTERVAL '10' MINUTE
FROM DUAL
CONNECT BY LEVEL <= 20;
Outputs:
SAVED_DATE | NUM_GAMES
:------------------ | --------:
1970-01-01 03:10:00 | 7

You need a subquery. I would approach this using a window function:
SELECT MAX(saved_date), COUNT(*)
FROM (SELECT G.*,
MAX(G.saved_date) OVER (PARTITION BY G.username) as max_saved_date
FROM Game G
WHERE G.username = 'user_name'
) G
WHERE saved_date >= max_saved_date - interval '1' hour;
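To try the subquery idea without an Oracle instance, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), stores saved_date as ISO-8601 text, and uses datetime(..., '-1 hour') in place of Oracle's interval arithmetic. The function name latest_game_window is made up for this illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def latest_game_window(saved_dates, username="user_name"):
    """Return (latest saved_date, number of games in the hour ending at it)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE game (username TEXT, saved_date TEXT)")
    conn.executemany(
        "INSERT INTO game VALUES (?, ?)",
        [(username, d.strftime("%Y-%m-%d %H:%M:%S")) for d in saved_dates],
    )
    row = conn.execute(
        """
        SELECT MAX(saved_date), COUNT(*)
        FROM (SELECT g.*,
                     -- latest date for the user, repeated on every row
                     MAX(g.saved_date) OVER (PARTITION BY g.username) AS max_saved_date
              FROM game g
              WHERE g.username = ?) AS g
        -- SQLite stand-in for: max_saved_date - interval '1' hour
        WHERE saved_date >= datetime(max_saved_date, '-1 hour')
        """,
        (username,),
    ).fetchone()
    conn.close()
    return row


# Same shape as the sample data above: 20 games, 10 minutes apart.
sample = [datetime(1970, 1, 1) + timedelta(minutes=10 * i) for i in range(20)]
```

Calling latest_game_window(sample) should give the same latest date and count of 7 as the analytic-function answer, since the window from 02:10 to 03:10 is inclusive at both ends.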

Related

How to find the number of occurences within a date range?

Let's say I have hospital visits in the table TestData
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
How would I code this in SQL?
patient_id is stored as TEXT.
date_visit is also TEXT, in the format MM/DD/YYYY.
patient_id | date_visit
-----------|-----------
A123B29133 | 07/12/2011
A123B29133 | 07/14/2011
A123B29133 | 07/20/2011
A123B29134 | 12/05/2016
In the above table, patient A123B29133 fulfills the condition, as they were seen on 07/14/2011, which is less than 7 days after 07/12/2011.
You can use a subquery with exists:
with to_d(id, v_date) as (
select patient_id, substr(date_visit, 7, 4)||'-'||substr(date_visit, 1, 2)||'-'||substr(date_visit, 4, 2) from visits
)
select t2.id from (select t1.id, min(t1.v_date) d1 from to_d t1 group by t1.id) t2
where exists (select 1 from to_d t3 where t3.id = t2.id and t3.v_date != t2.d1 and t3.v_date <= date(t2.d1, '+7 days'))
id
A123B29133
Since your date column is not in YYYY-MM-DD, which is the format expected by SQLite's date functions, the substr function was used to transform your dates into that format. JulianDay was then used to convert the dates to numeric Julian day values, which reduces the 7-day comparison to a simple subtraction. The MIN window function was used to identify the first hospital visit date for each patient. The samples below show the query that was used to transform the data, and its results, before the final query, which filters based on your requirement, i.e. < 7 days. With this window-function approach, you may also retrieve the visit_date and the number of days since the first visit if desired.
You may read more about sqlite date functions here.
Query #1
SELECT
patient_id,
visit_date,
JulianDay(visit_date) -
MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id)
as num_of_days_since_first_visit
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v;
patient_id | visit_date | num_of_days_since_first_visit
-----------|------------|------------------------------
A123B29133 | 2011-07-12 | 0
A123B29133 | 2011-07-14 | 2
A123B29133 | 2011-07-20 | 8
A123B29134 | 2016-12-05 | 0
Query #2
The below is your desired query, which uses the previous query as a CTE and applies the filter for visits less than 7 days. num_of_days <> 0 is applied to remove entries where the first date is also the date of the record.
WITH num_of_days_since_first_visit AS (
SELECT
patient_id,
visit_date,
JulianDay(visit_date) - MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id) num_of_days
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v
)
SELECT DISTINCT
patient_id
FROM
num_of_days_since_first_visit
WHERE
num_of_days <> 0 AND num_of_days < 7;
patient_id
A123B29133
Let me know if this works for you.
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
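You can use lag(). Assuming date_visit is stored in YYYY-MM-DD format (SQLite's date() cannot parse MM/DD/YYYY text, and such text does not sort chronologically), the following gets all rows where a visit falls within 7 days of the patient's previous visit: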
select t.*
from (select t.*,
lag(date_visit) over (partition by patient_id order by date_visit) as prev_date_visit
from t
) t
where prev_date_visit >= date(date_visit, '-7 day');
If you just want the patient_ids, you can use select distinct patient_id.
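For a quick check of the "second visit within 7 days of the first" logic, here is a minimal sketch using Python's sqlite3 module. It assumes the table is called visits, as in the answers above, converts MM/DD/YYYY to ISO text with substr, and treats "within 7 days" as inclusive (<= 7). That cutoff is an assumption, so change it to < if you need the strict version. The function name patients_with_quick_return is invented for the example.

```python
import sqlite3


def patients_with_quick_return(rows, max_days=7):
    """Return patient_ids with another visit within max_days of their first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (patient_id TEXT, date_visit TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    query = """
        WITH iso AS (
            -- MM/DD/YYYY -> YYYY-MM-DD so the text sorts chronologically
            SELECT patient_id,
                   substr(date_visit, 7, 4) || '-' ||
                   substr(date_visit, 1, 2) || '-' ||
                   substr(date_visit, 4, 2) AS visit_date
            FROM visits
        ),
        first_visit AS (
            SELECT patient_id, MIN(visit_date) AS first_date
            FROM iso
            GROUP BY patient_id
        )
        SELECT DISTINCT i.patient_id
        FROM iso i
        JOIN first_visit f ON f.patient_id = i.patient_id
        WHERE i.visit_date <> f.first_date
          AND julianday(i.visit_date) - julianday(f.first_date) <= ?
        ORDER BY i.patient_id
    """
    result = [r[0] for r in conn.execute(query, (max_days,))]
    conn.close()
    return result


sample_visits = [
    ("A123B29133", "07/12/2011"),
    ("A123B29133", "07/14/2011"),
    ("A123B29133", "07/20/2011"),
    ("A123B29134", "12/05/2016"),
]
```

Note that a second visit on the same day as the first is excluded by the <> comparison, which matches the num_of_days <> 0 filter in Query #2.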

Return number of rows (streak count) from entries where each entry is the previous days date

I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date, i.e. where userid = 1 AND date = '2020-07-29'. It should stop counting as soon as there is a gap between consecutive dates.
So here's my table:
userid | date
-------------------
1 | 2020-07-27
1 | 2020-07-28
2 | 2020-07-28
1 | 2020-07-29
The streak for userid 1 and date 2020-07-29 would be 3.
Now if I remove an entry:
userid | date
-------------------
1 | 2020-07-27
2 | 2020-07-28
1 | 2020-07-29
The streak for userid 1 and date 2020-07-29 would be 1. This is because the 2020-07-28 date is missing for the userid.
How could I do this with postgres sql? I have looked into the generate_series function but this requires me to set a start and end date.
You could achieve this using any of the following approaches.
Approach 1
Using window functions, you could achieve this, e.g.:
SELECT
MAX("date") - MIN("date") + 1 as streak
FROM (
SELECT
*,
SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
FROM (
SELECT
* ,
CASE
WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
PARTITION BY "userid"
ORDER BY "date"
) =1 THEN 0
ELSE 1
END as date_cont
FROM t
WHERE "userid"=1 AND "date" <= '2020-07-29'
) t1
) t2
GROUP BY gn, "userid"
ORDER BY gn DESC
LIMIT 1
or replace MAX("date") - MIN("date") + 1 with COUNT(1),
or if you would like the entire row data
SELECT
*,
MAX("date") OVER (PARTITION BY gn,"userid") -
MIN("date") OVER (PARTITION BY gn,"userid") + 1 as streak
FROM (
SELECT
*,
SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
FROM (
SELECT
* ,
CASE
WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
PARTITION BY "userid"
ORDER BY "date"
) =1 THEN 0
ELSE 1
END as date_cont
FROM t
WHERE "userid"=1 AND "date" <= '2020-07-29'
) t1
) t2
ORDER BY gn DESC
LIMIT 1
or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") + 1 with COUNT(1) OVER (PARTITION BY gn,"userid").
NB. Since we have filtered on userid, we could simply partition by gn only.
Approach 2
Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.
CREATE OR REPLACE FUNCTION getStreak(
user_id int,
start_date DATE
) RETURNS int AS
$BODY$
DECLARE
r RECORD;
BEGIN
FOR r IN (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY "userid"
ORDER BY "date" DESC
) as streak,
"date"-LAG("date",1,"date") OVER (
PARTITION BY "userid"
ORDER BY "date"
) as date_diff
FROM t
WHERE "userid"=user_id AND "date" <= start_date
ORDER BY "date" DESC
)
LOOP
IF r.date_diff > 1 THEN
RETURN r.streak;
END IF;
END LOOP;
RETURN COALESCE(r.streak,0);
END;
$BODY$
LANGUAGE plpgsql;
with example usage
SELECT getStreak(1,'2020-07-29');
Approach 3
This approach identifies the streak using the difference from the chosen date and row number
SELECT
MAX(rn) as streak
FROM (
SELECT
* ,
MIN('2020-07-29'::date- "date") OVER (PARTITION BY "userid") as earliest_diff,
ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn,
('2020-07-29'::date- "date") as diff
FROM t
WHERE "date" <='2020-07-29' AND "userid"=1
) t1
WHERE rn = (diff-earliest_diff+1)
Demo
You may view a working demo with test cases here
Let me know if this works for you.
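If it helps to check the expected numbers independently of the database, the streak rule itself can be sketched in a few lines of plain Python. This is only an illustration of the rule from the question (walk back one day at a time until a date is missing), not a translation of any of the queries above. The function name streak and the choice to return 0 when the requested date itself has no entry are assumptions.

```python
from datetime import date, timedelta


def streak(entries, userid, end_date):
    """Count consecutive days with an entry for userid, ending at end_date."""
    days = {d for (u, d) in entries if u == userid}
    count = 0
    current = end_date
    # Walk backwards one day at a time until a day is missing.
    while current in days:
        count += 1
        current -= timedelta(days=1)
    return count


full_table = [
    (1, date(2020, 7, 27)),
    (1, date(2020, 7, 28)),
    (2, date(2020, 7, 28)),
    (1, date(2020, 7, 29)),
]
# Same table with the (1, 2020-07-28) entry removed.
gap_table = [row for row in full_table if row != (1, date(2020, 7, 28))]
```

With the two tables from the question, streak(full_table, 1, date(2020, 7, 29)) gives 3 and streak(gap_table, 1, date(2020, 7, 29)) gives 1.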

BigQuery - Nested Query with different WHERE parameters?

I'm trying to fetch user_counts and new_user_counts by date, where new_user_counts is defined by the condition WHERE date of timestamp event_timestamp = date of timestamp user_first_touch_timestamp, while user_counts is the distinct count of the user_pseudo_id field over the same date range. How can I do this in the same query? Here's how my current query looks.
Eventually, I'd like the result to be as:
|Date | new_user_count | user_counts |
|20200820 | X | Y |
Here is the error I'm getting at line 8 of code:
Syntax error: Function call cannot be applied to this expression. Function calls require a path, e.g. a.b.c() at [8:5]
Thanks.
SELECT
event_date,
COUNT (DISTINCT(user_pseudo_id)) AS new_user_counts FROM
`my-google-analytics-table-name.*`
WHERE DATE(TIMESTAMP_MICROS(event_timestamp)) =
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
AND event_date BETWEEN '20200820' AND '20200831'
(SELECT
COUNT (DISTINCT(user_pseudo_id)) AS user_counts
FROM `my-google-analytics-table-name.*`
WHERE event_date BETWEEN '20200820' AND '20200831'
)
GROUP BY event_date
ORDER BY event_date ASC
Try below (solely based on your original query just fixing the syntax/logic)
SELECT
event_date,
COUNT(DISTINCT IF(
DATE(TIMESTAMP_MICROS(event_timestamp)) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)),
user_pseudo_id,
NULL
)) AS new_user_counts,
COUNT(DISTINCT(user_pseudo_id)) AS user_counts
FROM `my-google-analytics-table-name.*`
WHERE event_date BETWEEN '20200820' AND '20200831'
GROUP BY event_date
ORDER BY event_date ASC
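The core trick in the answer is a conditional distinct count: the condition moves from WHERE into the aggregate, so both counts come out of one pass. Here is a minimal sketch of the same pattern that runs locally in SQLite via Python. It is not BigQuery syntax (CASE WHEN stands in for IF), the first_touch_date column is a simplified stand-in for DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)), and the sample rows are invented purely for illustration.

```python
import sqlite3


def daily_user_counts(events):
    """events: iterable of (event_date, user_pseudo_id, first_touch_date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_date TEXT, user_pseudo_id TEXT, first_touch_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    rows = conn.execute(
        """
        SELECT event_date,
               -- CASE yields NULL for returning users; COUNT ignores NULLs
               COUNT(DISTINCT CASE WHEN first_touch_date = event_date
                                   THEN user_pseudo_id END) AS new_user_counts,
               COUNT(DISTINCT user_pseudo_id) AS user_counts
        FROM events
        GROUP BY event_date
        ORDER BY event_date
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical events: u1 is new on the 20th, u3 is new on the 21st.
sample_events = [
    ("20200820", "u1", "20200820"),
    ("20200820", "u1", "20200820"),
    ("20200820", "u2", "20200815"),
    ("20200821", "u1", "20200820"),
    ("20200821", "u2", "20200815"),
    ("20200821", "u3", "20200821"),
]
```

For these rows the function returns one new user on each day, with 2 and 3 distinct users respectively.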

How to add corresponding date to grouped max/min-value in postgres?

I have a climate timeseries table with different measured parameters for many stations over some years (daily values). I am using postgres 9.4 with pgadmin.
The table looks this way:
Table name kl
station_id [int],
date [date],
temperature [numeric] ...
my select Code:
select
stat_id,
max(temperatur) as "T_max"
from kl
group by stat_id
order by stat_id
gives the maximum temperature value for every station.
Now the question: how do I add, for every T_max value, the corresponding date in another column (the date on which that maximum value was measured)?
Thanks for your help.
You can use row_number() to get the whole row.
PARTITION BY resets the row counter for each station, so you won't need GROUP BY.
WITH cte as (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY station_id
ORDER BY temperature DESC) AS rn
FROM kl
)
SELECT *
FROM cte
WHERE rn = 1
Then just replace * with the field names you need.
select distinct on (stat_id)
stat_id, temperatur, date
from kl
order by stat_id, temperatur desc
Use the date column (a bad name for a column) to break ties:
order by stat_id, temperatur desc, date
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
If you want both the min and max temperatures in the same query:
with kl (stat_id, temperatur, date) as (values
(1, 17.1, '2015-01-01'::date), (1, 17.2, '2015-01-02')
)
select stat_id,
t_max[1]::numeric as t_max,
(date 'epoch' + t_max[2] * interval '1 second')::date as d_max,
t_min[1]::numeric as t_min,
(date 'epoch' + t_min[2] * interval '1 second')::date as d_min
from (
select
stat_id,
max(array[temperatur, extract(epoch from date)::numeric]) as t_max,
min(array[temperatur, extract(epoch from date)::numeric]) as t_min
from kl
group by 1
) s
;
stat_id | t_max | d_max | t_min | d_min
---------+-------+------------+-------+------------
1 | 17.2 | 2015-01-02 | 17.1 | 2015-01-01
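To see the row_number() approach return the date alongside the maximum, here is a minimal sketch that runs against SQLite from Python (assuming SQLite 3.25 or later for window functions). It uses the column names from the table description (station_id, date, temperature), breaks ties on the earliest date as suggested above, and the readings for station 2 are made up to show the tie-break.

```python
import sqlite3


def hottest_day_per_station(readings):
    """readings: iterable of (station_id, date, temperature)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE kl (station_id INTEGER, "date" TEXT, temperature REAL)')
    conn.executemany("INSERT INTO kl VALUES (?, ?, ?)", readings)
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT station_id, temperature, "date",
                   -- rn = 1 marks the hottest reading per station;
                   -- the earliest date wins when temperatures tie
                   ROW_NUMBER() OVER (PARTITION BY station_id
                                      ORDER BY temperature DESC, "date") AS rn
            FROM kl
        )
        SELECT station_id, temperature, "date"
        FROM cte
        WHERE rn = 1
        ORDER BY station_id
        """
    ).fetchall()
    conn.close()
    return rows


sample_readings = [
    (1, "2015-01-01", 17.1),
    (1, "2015-01-02", 17.2),
    (2, "2015-01-01", 20.5),  # hypothetical station with a tied maximum
    (2, "2015-01-03", 20.5),
]
```

Station 1 comes back with 17.2 on 2015-01-02, and station 2 with 20.5 on the earlier of its two tied dates.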

Postgresql: Gaps Between tsranges, empty set

I have a table of reservations for each user:
reservations_development=# \d reservations
Table "public.reservations"
Column | Type | Modifiers
------------+---------+-----------------------------------------------------------
id | integer | not null default nextval('reservations_id_seq'::regclass)
user_id | integer |
occurrence | tsrange |
Indexes:
"reservations_pkey" PRIMARY KEY, btree (id)
"reservations_occurrence_user_id_excl" EXCLUDE USING gist (occurrence WITH &&, user_id WITH =)
I am trying to create a view of the gaps/opening between reservations for each user, and I currently have the following query:
CREATE OR REPLACE VIEW reservation_gaps AS (
with user_mins as (select tsrange(LOCALTIMESTAMP, min(lower(occurrence))), user_id
FROM (
SELECT user_id, occurrence
FROM reservations
WHERE lower(occurrence) >= LOCALTIMESTAMP
) as y
GROUP BY user_id
),
gaps as (select
tsrange(upper(occurrence), lead(lower(occurrence),1, LOCALTIMESTAMP + interval '1 year') over (win_user_gaps)),
user_id
from (
select user_id, occurrence
from reservations
) as x
WINDOW win_user_gaps AS (PARTITION BY user_id ORDER BY occurrence)
UNION ALL SELECT * FROM user_mins
)
select *
FROM gaps
ORDER BY user_id, tsrange
);
It currently gives the expected results as long as the user has at least one reservation, but if the user is new and has no reservations yet, I get an empty result.
I need to in some way append a {tsrange(LOCALTIMESTAMP, LOCALTIMESTAMP + interval '1 year'), user_id} row to the view for each user without a reservation, but I'm currently stumped as to how to do that.
Thanks
You should change the CTE to be a UNION ALL with the artificial rows and then use DISTINCT ON to select one row per user.
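-- tsrange, user_id order matches the SELECT * FROM user_mins used in the gaps CTE
with user_mins as (SELECT DISTINCT ON (user_id) tsrange, user_id FROM (
select tsrange(LOCALTIMESTAMP, min(lower(occurrence))) as tsrange, user_id, 1 as priority
FROM (
SELECT user_id, occurrence
FROM reservations
WHERE lower(occurrence) >= LOCALTIMESTAMP
) as y
GROUP BY user_id
UNION ALL
-- artificial one-year gap for every user; columns must line up with the SELECT above
SELECT tsrange(LOCALTIMESTAMP, LOCALTIMESTAMP + interval '1 year'), user_id,
0
FROM users) as z
-- priority 1 (a real reservation) sorts first, so the artificial row is only kept for users without one
ORDER BY user_id, priority DESC
)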
SQL Fiddle
with this_year as (
select tsrange(
date_trunc('year', current_date)::timestamp,
date_trunc('year', current_date)::timestamp + interval '1' year, '[)'
) as this_year
), gaps as (
select
user_id,
this_year - tsrange(lower(occurrence), 'infinity', '[]') lower_range,
this_year - tsrange('-infinity', upper(occurrence), '[]') upper_range,
this_year
from
reservations
cross join
this_year
)
select *
from (
select
user_id,
upper_range *
lead (lower_range, 1, this_year)
over (partition by user_id order by lower_range, upper_range)
as gap
from gaps
union (
select distinct on (user_id)
user_id,
tsrange(
lower(this_year),
coalesce(upper(lower_range), upper(this_year)),
'[)'
) as gap
from gaps
order by user_id, lower_range
)
) s
where gap != 'empty'
order by user_id, gap
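As a sanity check for either query, the gap logic can be sketched in plain Python. This is not a translation of the view. It is a small reference implementation under stated assumptions: reservations are (user_id, start, end) tuples, the horizon is approximated as 365 days rather than interval '1 year', and users with no reservations get a single gap covering the whole horizon. The function name reservation_gaps is invented for the example.

```python
from datetime import datetime, timedelta


def reservation_gaps(reservations, user_ids, now, horizon=timedelta(days=365)):
    """Return {user_id: [(gap_start, gap_end), ...]} between now and now + horizon."""
    end = now + horizon
    gaps = {}
    for user_id in user_ids:
        # Only reservations that have not already finished matter.
        booked = sorted(
            (start, stop)
            for (uid, start, stop) in reservations
            if uid == user_id and stop > now
        )
        cursor = now
        user_gaps = []
        for start, stop in booked:
            if start > cursor:
                user_gaps.append((cursor, start))
            cursor = max(cursor, stop)
        # Trailing gap; for a user with no reservations this is the only gap.
        if cursor < end:
            user_gaps.append((cursor, end))
        gaps[user_id] = user_gaps
    return gaps


now = datetime(2024, 1, 1)
sample_reservations = [
    (1, datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 12)),
    (1, datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 10)),
]
```

Calling reservation_gaps(sample_reservations, [1, 2], now) gives user 1 three gaps (before, between and after the two reservations) and user 2, who has no reservations, a single gap spanning the whole horizon.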