PostgreSQL: Getting date associated with max value

I have a table in the form of
USER |VALUE |DATE
-------------------------
user1 | 1337 | 2019-11-01
user1 | 1338 | 2019-03-28
user2 | 1234 | 2019-04-23
user2 | 4567 | 2019-05-05
and want to get the maximum value of every user with the associated date. When I do something like
SELECT max(VALUE) FROM table
GROUP BY USER
I have the problem that I can neither aggregate nor group by the DATE. How can I tell PostgreSQL that I just want the date associated with the row where the max value is?
Thanks!

demo: db<>fiddle
You can use DISTINCT ON: this gives you the first record of each ordered group. In your case the group is your user column, which you have to order by value DESC so that the maximum value becomes the first record. That record is the one returned, including its associated date value.
-- "user" is quoted here because user is a reserved word in PostgreSQL
SELECT DISTINCT ON ("user")
    *
FROM
    mytable
ORDER BY "user", value DESC
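With the sample data above, this returns one row per user:
USER |VALUE |DATE
-------------------------
user1 | 1338 | 2019-03-28
user2 | 4567 | 2019-05-05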

If you want other aggregations in the query, you can also use array_agg() to mimic a "first" function:
select "user", max(value), count(*),
       (array_agg(date order by value desc))[1] as date_at_max
from t
group by "user";
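Against the same sample data, this would return user1 | 1338 | 2 | 2019-03-28 and user2 | 4567 | 2 | 2019-05-05.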

Related

Display the previous date for a user in an additional column

I have a list of users and a list of review dates corresponding to each user; a user can have multiple reviews relating to them. What I need to do is create an additional column that shows me the user's previous review date, and if they don't have a previous review I need it to be null.
An example of the result I require is shown below with the column in bold being the column I want to add:
| User | Review Date | Previous Review Date
| ----- | -------------- | ------------------------
| 1122334 | 01/01/2022 | 06/06/2021
| 1122334 | 06/06/2021 | 06/01/2021
| 1122334 | 06/01/2021 | null
| 2244668 | 01/10/2021 | 01/04/2021
| 2244668 | 01/04/2021 | null
| 3344556 | 10/11/2021 | 10/03/2021
| 3344556 | 10/03/2021 | null
You can see in the example that the previous review date for the user on row 1 is the same user's review date on row 2.
I have tried using the below:
select user, lead(review_date) over (order by user, review_date desc) as Previous_review_date
This code works until the value needs to be null, in which case it simply picks up the previous review date from an unrelated user.
Any help would be greatly appreciated.
Pretty sure OUTER APPLY would work here as well using a limit.
Note this could be useful if you need more than just a single column of data.
some docs - outer apply
ask tom - LINQ, cross/outer apply
In essence, OUTER APPLY runs the subquery once for each row in table A, correlating the results between the two. Since we limit and order the results, we only get back 1 record, whose review date is less than the outer row's review date. And because it is an outer apply, we keep all records from A and only show results from Z when they exist, so Z.review_date will be null when no such date/user can be correlated.
SELECT A.user, A.Review_date, Z.review_date as Previous_review_Date
FROM TABLE A
OUTER APPLY (SELECT review_date
             FROM TABLE B
             WHERE A.User = B.User and B.Review_date < A.Review_Date
             ORDER BY review_Date Desc
             FETCH FIRST 1 ROWS ONLY) Z
Depending on the volume of data, one approach vs. the other can be more efficient (see the Ask Tom article).
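Note that PostgreSQL itself has no OUTER APPLY; the closest equivalent is LEFT JOIN LATERAL. A minimal sketch of the same idea, using a hypothetical reviews table with user_id and review_date columns:
-- Hypothetical names (reviews, user_id, review_date); substitute your own.
SELECT a.user_id,
       a.review_date,
       z.review_date AS previous_review_date
FROM reviews a
LEFT JOIN LATERAL (
    SELECT b.review_date
    FROM reviews b
    WHERE b.user_id = a.user_id
      AND b.review_date < a.review_date
    ORDER BY b.review_date DESC
    LIMIT 1              -- keep only the most recent earlier review
) z ON true;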
Using your current approach:
SELECT A.user, A.Review_Date, lead(A.Review_date) over (partition by A.User ORDER BY A.Review_Date DESC) FROM TABLE A
The reason yours isn't working is that it's ordering ALL records by date, not those specific to a user. So you need to "partition" the data by user and only order that user's review dates.
You need to partition the data to identify the lead value:
select user, lead(review_date) over (partition by user order by review_date desc) as Previous_review_date

How to select unique sessions per unique dates with SQL?

I'm struggling with my SQL. I want to select all unique sessions on unique dates from a table. I don't get the results I want.
Example of table:
session_id | date
87654321 | 2020-05-22 09:10:10
12345678 | 2020-05-23 10:19:50
12345678 | 2020-05-23 10:20:23
87654321 | 2020-05-23 12:00:10
This is my SQL right now. I select all distinct dates from a datetime column. I also count all distinct session_id's. I group them by date.
SELECT DISTINCT DATE_FORMAT(`date`, '%d-%m-%Y') as 'date', COUNT(DISTINCT `session_id`) as 'count' FROM `logging` GROUP BY 'date'
What I want to see is (with example above):
date | count
22-05-2020 | 1
23-05-2020 | 2
The result I get with my real table (with 354 sessions on 3 different dates) right now is:
date | count
21-05-2020 | 200
Edit
Changed ` to '.
The name of the field and the name of the alias are the same (date). Try using a different name for the alias to avoid confusion in the GROUP BY part.
You probably want to group on your date expression
SELECT DATE_FORMAT(`date`, '%d-%m-%Y') as `date`,
       COUNT(DISTINCT `session_id`) as `count`
FROM `logging`
GROUP BY DATE_FORMAT(`date`, '%d-%m-%Y')
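Building on the suggestion to rename the alias, a sketch (still assuming MySQL, given DATE_FORMAT and the backticks) that groups by the alias itself, which MySQL permits:
-- The alias name `day` is only illustrative.
SELECT DATE_FORMAT(`date`, '%d-%m-%Y') AS `day`,
       COUNT(DISTINCT `session_id`) AS `count`
FROM `logging`
GROUP BY `day`;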

How to make query that selects based on 1 day interval?

How can I get all IDs that have more than 10 entries on one day?
Here is the sample data:
ID | Time
---+---------------------
 4 | 2019-02-14 17:22:43
 2 | 2019-04-27 07:51:09
83 | 2018-01-07 08:38:37
I am having a hard time using count and finding all of the entries that fall on the same day. The hour:min:sec part is what is causing problems for me.
For MySql it would be:
select distinct id from tablename
group by id, date(time)
having count(*) > 10
The date() function discards the time part of the column, so the grouping is done only by the date part.
For SqlServer you would use:
convert(date, time)
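In PostgreSQL the same idea works by casting the timestamp to a date; a minimal sketch, assuming the same hypothetical tablename and columns:
-- "time" is quoted here only because it is also a type name.
SELECT DISTINCT id
FROM tablename
GROUP BY id, "time"::date
HAVING count(*) > 10;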

How to do a sub-select per result entry in postgresql?

Assume I have a table with only two columns: id, maturity. maturity is some date in the future and represents until when a specific entry will be available. It is thus different for different entries but not necessarily unique, and over time the number of entries that have not yet reached their maturity date changes.
I need to count the number of entries from such a table that were available on a specific date (thus entries that had not yet reached their maturity). So I basically need to join these two queries:
SELECT generate_series as date FROM generate_series('2015-10-01'::date, now()::date, '1 day');
SELECT COUNT(id) FROM mytable WHERE mytable.maturity > now()::date;
where instead of now()::date I need to put each entry from the generated series. I'm sure this has to be simple enough, but I can't quite get my head around it. I need the resulting solution to remain a query, so it seems that I can't use for loops.
Sample table entries:
id | maturity
---+-------------------
1 | 2015-10-03
2 | 2015-10-05
3 | 2015-10-11
4 | 2015-10-11
Expected output:
date | count
------------+-------------------
2015-10-01 | 4
2015-10-02 | 4
2015-10-03 | 3
2015-10-04 | 3
2015-10-05 | 2
2015-10-06 | 2
NOTE: This count doesn't constantly decrease, since new entries are added and this count increases.
You have to use fields of the outer query in the WHERE clause of a subquery. This can be done if the subquery is in the SELECT clause of the outer query:
SELECT generate_series,
(SELECT COUNT(id)
FROM mytable
WHERE mytable.maturity > generate_series)
FROM generate_series('2015-10-01'::date, now()::date, '1 day');
More info: http://www.techonthenet.com/sql_server/subqueries.php
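An alternative sketch that avoids the correlated subquery, using a LEFT JOIN plus GROUP BY over the generated series (same mytable/maturity names as above):
-- Join each generated day to the rows still "alive" on that day, then count
-- per day; count(m.id) yields 0 for days with no matching rows.
SELECT d.day::date AS date, count(m.id) AS count
FROM generate_series('2015-10-01'::date, now()::date, '1 day') AS d(day)
LEFT JOIN mytable m ON m.maturity > d.day
GROUP BY d.day
ORDER BY d.day;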
I think you want to group your data by the maturity Date.
Check this:
select maturity,count(*) as count
from your_table group by maturity;

Trending sum over time

I have a table (in Postgres 9.1) that looks something like this:
CREATE TABLE actions (
    user_id INTEGER,
    date    DATE,
    action  VARCHAR(255),
    count   INTEGER
);
For example:
user_id | date | action | count
---------------+------------+--------------+-------
1 | 2013-01-01 | Email | 1
1 | 2013-01-02 | Call | 3
1 | 2013-01-03 | Email | 3
1 | 2013-01-04 | Call | 2
1 | 2013-01-04 | Voicemail | 2
1 | 2013-01-04 | Email | 2
2 | 2013-01-04 | Email | 2
I would like to be able to view a user's total actions over time for a specific set of actions; for example, Calls + Emails:
user_id | date | count
-----------+-------------+---------
1 | 2013-01-01 | 1
1 | 2013-01-02 | 4
1 | 2013-01-03 | 7
1 | 2013-01-04 | 11
2 | 2013-01-04 | 2
The monstrosity that I've created so far looks like this:
SELECT
date, user_id, SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
FROM
actions
WHERE
action IN ('Call', 'Email')
GROUP BY
user_id, date, count;
This works for single actions, but seems to break for multiple actions when they happen on the same day; for example, instead of the expected 11 on 2013-01-04, we get 9:
date | user_id | count
------------+--------------+-------
2013-01-01 | 1 | 1
2013-01-02 | 1 | 4
2013-01-03 | 1 | 7
2013-01-04 | 1 | 9 <-- should be 11?
2013-01-04 | 2 | 2
Is it possible to tweak my query to resolve this issue? I tried removing the grouping on count, but Postgres doesn't seem to like that:
column "actions.count" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 2: date, user_id, SUM(count) OVER (PARTITION BY user...
^
This query produces the result you are looking for:
SELECT DISTINCT
date, user_id, SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
FROM actions
WHERE
action IN ('Call', 'Email');
The default window frame is already what you want, according to the official docs, and the DISTINCT eliminates the duplicate rows that occur when both Emails and Calls happen on the same day.
See SQL Fiddle.
The table has a column named "count", and the expression in the SELECT clause is also aliased as "count", so the reference is ambiguous.
Read documentation: http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-GROUPBY
In case of ambiguity, a GROUP BY name will be interpreted as an
input-column name rather than an output column name.
That means that your query does not group by the "count" evaluated in the SELECT clause, but rather by the "count" values taken from the table.
This query gives expected results, see SQL Fiddle
SELECT date, user_id, count
from (
Select date, user_id,
SUM(count) OVER (PARTITION BY user_id ORDER BY date) AS count
FROM actions
WHERE
action IN ('Call', 'Email')
) alias
GROUP BY
user_id, date, count;
Asserts
It is unclear whether you want to sort by user_id or date
It is also unclear whether you want to include dates in the result list, for which there is no row in the base table. In this case, refer to this closely related answer:
PostgreSQL: running count of rows for a query 'by minute'
Repair names
First off, I am using this test table instead of your problematic table:
CREATE TEMP TABLE actions (
user_id integer,
thedate date,
action text,
ct integer
);
Your use of reserved words and function names as identifiers (column names) is part of the problem.
Repair query
Combine aggregate and window functions
Since aggregate functions are applied first, your original query lumps the two rows found for user_id = 1 and thedate = '2013-01-04' into one. You have to multiply by count(*) to get the actual running count.
You can do this without a subquery, since you can combine aggregate functions and window functions. Aggregate functions are applied first. You can even have window functions over the result of aggregate functions.
SELECT thedate
, user_id
, sum(ct * count(*)) OVER (PARTITION BY user_id
ORDER BY thedate) AS running_ct
FROM actions
WHERE action IN ('Call', 'Email')
GROUP BY user_id, thedate, ct
ORDER BY user_id, thedate;
Or simplify to:
...
, sum(sum(ct)) OVER (PARTITION BY user_id
ORDER BY thedate) AS running_ct
...
This should also be the fastest of the solutions presented.
Here, the inner sum() is an aggregate function, while the outer sum() is a window function - over the result of the aggregate function.
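Written out in full, the simplified variant would read something like this (a sketch using the same table and filter as above):
SELECT thedate
     , user_id
     , sum(sum(ct)) OVER (PARTITION BY user_id
                          ORDER BY thedate) AS running_ct
FROM   actions
WHERE  action IN ('Call', 'Email')
GROUP  BY user_id, thedate
ORDER  BY user_id, thedate;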
Or use DISTINCT
Another way would be to use DISTINCT or DISTINCT ON, since those are applied after window functions:
DISTINCT - this is possible, since running_ct is guaranteed to be the same for all peers anyway: with the default frame definition of window functions, all peers are summed at once.
SELECT DISTINCT
thedate
, user_id
, sum(ct) OVER (PARTITION BY user_id ORDER BY thedate) AS running_ct
FROM actions
WHERE action IN ('Call', 'Email')
ORDER BY thedate, user_id;
Or simplify with DISTINCT ON:
SELECT DISTINCT ON (thedate, user_id)
...
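For completeness, a sketch of that DISTINCT ON variant written out with the same columns as the DISTINCT query above:
SELECT DISTINCT ON (thedate, user_id)
       thedate
     , user_id
     , sum(ct) OVER (PARTITION BY user_id ORDER BY thedate) AS running_ct
FROM   actions
WHERE  action IN ('Call', 'Email')
ORDER  BY thedate, user_id;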
SQLfiddle demonstrating all variants.