Access Bare Columns w/ Aggregate Function w/o adding to Group By - sql

I have 2 tables in postgres.

users

auth0_id | email
---------+------
123-A    | a#a
123-B    | b#b
123-C    | c#c

auth0_logs

id    | date                     | user_id | client_name
------+--------------------------+---------+---------------
abc-1 | 2021-10-16T00:18:41.381Z | 123-A   | example_client
abc-2 | ...                      | 123-A   | example_client
abc-3 | ...                      | 123-B   | example_client
abc-4 | ...                      | 123-A   | example_client
abc-5 | ...                      | 123-B   | example_client
abc-6 | ...                      | 123-C   | example_client
I am trying to get the last login information (a single row in the auth0_logs table based on MAX(auth0_logs.date)) for each unique user (auth0_logs.user_id), joined to the users table on users.auth0_id.
[
  {
    // auth0_logs information
    user_id: "123-A",
    last_login: "2021-10-16T00:18:41.381Z",
    client_name: "example_client",
    // users information
    email: "a#a"
  },
  {
    user_id: "123-B",
    last_login: "...",
    client_name: "example_client",
    email: "b#b"
  },
  {
    user_id: "123-C",
    last_login: "...",
    client_name: "example_client",
    email: "c#c"
  }
]
I know this is a problem with "bare" columns not being allowed in queries that use aggregate functions unless they are added to the GROUP BY (but adding them to the GROUP BY returned more than 1 row per user), and I cannot get a solution that works from other SO posts (the best post I've found: SQL select only rows with max value on a column). I promise you I have been on this for many hours over the past few days...
-- EDIT: start --
I have removed my incorrect attempts so as not to confuse / misdirect future readers. Please see @MichaelRobellard's answer using the WITH clause based on the above information.
-- EDIT: end --
Any help or further research direction would be greatly appreciated!

with user_data as (
    select user_id, max(date) as date  -- alias needed so the join below can reference the column
    from auth0_logs
    group by user_id
)
select *
from user_data
join auth0_logs on user_data.user_id = auth0_logs.user_id
               and user_data.date = auth0_logs.date
join users on user_data.user_id = users.auth0_id
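Note: if two log rows for the same user share the exact max date, this join returns both of them; the DISTINCT ON approach in the next answer returns exactly one row per user.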

with
t as
(
    select distinct on (user_id) *
    from login_logs
    order by user_id, ldate desc
),
tt as
(
    select auth0_id user_id, ldate last_login, client_name, email
    from t join users on auth0_id = user_id
)
select json_agg(to_json(tt.*)) from tt;
SQL fiddle here.

Related

SQL Query With Max Value from Child Table

Three pertinent tables: tracks (music tracks), users, and follows.
The follows table is a many to many relationship relating users (followers) to users (followees).
I'm looking for this as a final result:
<track_id>, <user_id>, <most popular followee>
The first two columns are simple and result from a relationship between tracks and users. The third is my problem. I can join with the follows table and get all of the followees that each user follows, but how do I get only the followee with the highest number of follows?
Here are the tables with their pertinent columns:
tracks: id, user_id (fk to users.id), song_title
users: id
follows: followee_id (fk to users.id), follower_id (fk to users.id)
Here's some sample data:
TRACKS
1, 1, Some song title
USERS
1
2
3
4
FOLLOWS
2, 1
3, 1
4, 1
3, 4
4, 2
4, 3
DESIRED RESULT
1, 1, 4
For the desired result, the 3rd field is 4 because as you can see in the FOLLOWS table, user 4 has the most number of followers.
I and a few great minds around me are still scratching our heads.
So I threw this into LINQPad because I'm better with LINQ.
Tracks
    .Where(t => t.TrackId == 1)
    .Select(t => new {
        TrackId = t.TrackId,
        UserId = t.UserId,
        MostPopularFolloweeId = Followers
            .GroupBy(f => f.FolloweeId)
            .OrderByDescending(g => g.Count())
            .FirstOrDefault()
            .Key
    });
The resulting SQL query was the following (@p0 being the track id):
-- Region Parameters
DECLARE @p0 Int = 1
-- EndRegion
SELECT [t0].[TrackId], [t0].[UserId], (
    SELECT [t3].[FolloweeId]
    FROM (
        SELECT TOP (1) [t2].[FolloweeId]
        FROM (
            SELECT COUNT(*) AS [value], [t1].[FolloweeId]
            FROM [Followers] AS [t1]
            GROUP BY [t1].[FolloweeId]
            ) AS [t2]
        ORDER BY [t2].[value] DESC
        ) AS [t3]
    ) AS [MostPopularFolloweeId]
FROM [Tracks] AS [t0]
WHERE [t0].[TrackId] = @p0
That outputs the expected response, and should be a start to a cleaner query.
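For reference, a hand-cleaned equivalent might look like this (a sketch assuming the same Tracks and Followers tables; LIMIT 1 would be TOP (1) on SQL Server):

SELECT t.TrackId,
       t.UserId,
       (SELECT f.FolloweeId        -- the most-followed user overall
        FROM Followers f
        GROUP BY f.FolloweeId
        ORDER BY COUNT(*) DESC
        LIMIT 1) AS MostPopularFolloweeId
FROM Tracks t
WHERE t.TrackId = 1;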
This sounds like an aggregation query with row_number(). I'm a little confused on how all the joins come together:
select t.*
from (select t.id, f.followee_id, count(*) as cnt,
             row_number() over (partition by t.id order by count(*) desc) as seqnum
      from followers f join
           tracks t
           on f.follower_id = t.user_id
      group by t.id, f.followee_id
     ) t
where seqnum = 1;

ORDER BY alternative values from main- and subquery

Right now I have a query that looks something like so:
SELECT id
FROM post_table post
WHERE post.post_user_id = user_id
ORDER BY (SELECT max(comment_created_date)
FROM comments_table WHERE comments_post_id = post.id) DESC,
post.post_created_date DESC
My idea for this query was that it would order a series of posts like so
Post 1:
  created Date: Jan 1 2015
  comments: []
Post 2:
  created Date: Jan 5 2015
  comments: []
Post 3:
  created Date: December 1 2014
  comments: [
    0: Created Date: Jan 6 2015
  ]
So in this case the order the posts would be returned in is
Post 3, Post 2, Post 1
Because Post 3 has a comment that is newer than anything else, and Post 2 was created more recently than Post 1.
But when I run the query, the posts are still all sorted by their created date and the query doesn't seem to take into account the comments created date.
It should work like this:
SELECT id
FROM post_table p
WHERE post_user_id = $user_id -- this is your input parameter
ORDER BY GREATEST(
           (SELECT max(comment_created_date)
            FROM comments_table
            WHERE comments_post_id = p.id)
         , post_created_date) DESC NULLS LAST;
You will want to add NULLS LAST if date columns can be NULL.
PostgreSQL sort by datetime asc, null first?
If comments can only be later than posts (would make sense), you can use COALESCE instead of GREATEST.
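A minimal sketch of the COALESCE variant, under that assumption:

SELECT id
FROM post_table p
WHERE post_user_id = $user_id
ORDER BY COALESCE(
           (SELECT max(comment_created_date)
            FROM comments_table
            WHERE comments_post_id = p.id)
         , post_created_date) DESC;  -- falls back to the post date only when a post has no comments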
Cleaner alternative (may or may not be faster, depending on data distribution):
SELECT id
FROM post_table p
LEFT JOIN (
    SELECT comments_post_id AS id, max(comment_created_date) AS max_date
    FROM comments_table
    GROUP BY 1
    ) c USING (id)
WHERE post_user_id = $user_id
ORDER BY GREATEST(c.max_date, p.post_created_date) DESC NULLS LAST;
Since you have pg 9.3 you can also use a LATERAL join. Probably faster:
SELECT id
FROM post_table p
LEFT JOIN LATERAL (
    SELECT max(comment_created_date) AS max_date
    FROM comments_table
    WHERE comments_post_id = p.id
    GROUP BY comments_post_id
    ) c ON TRUE
WHERE post_user_id = $user_id
ORDER BY GREATEST(c.max_date, p.post_created_date) DESC NULLS LAST;
The other thing I would suggest, if this is going to be a heavily hit query, is to create a calculated column date_last_commented on the post_table, and use a trigger to update it on any insert or update of the comments table.
Then you could run the simple query:
SELECT id
FROM post_table post
WHERE post.post_user_id = user_id
ORDER BY post.date_last_commented DESC NULLS LAST, post.post_created_date DESC;
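A minimal sketch of such a trigger, assuming the table and column names used in this thread (post_table.date_last_commented, comments_table.comment_created_date and comments_post_id):

CREATE OR REPLACE FUNCTION trg_update_date_last_commented()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
   -- keep the denormalized column in sync with the latest comment date
   UPDATE post_table
   SET    date_last_commented = NEW.comment_created_date
   WHERE  id = NEW.comments_post_id
   AND   (date_last_commented IS NULL
          OR date_last_commented < NEW.comment_created_date);
   RETURN NEW;
END
$func$;

CREATE TRIGGER comments_touch_post
AFTER INSERT OR UPDATE ON comments_table
FOR EACH ROW EXECUTE PROCEDURE trg_update_date_last_commented();

(EXECUTE PROCEDURE rather than EXECUTE FUNCTION, since the thread mentions pg 9.3.)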

Group by repeating attribute

Basically I have a table messages, with a user_id field that identifies the user who created each message.
When I display a conversation (a set of messages) between two users, I want to be able to group the messages by user_id, but in a tricky way:
Let's say there are some messages (sorted by created_at desc):
id: 1, user_id: 1
id: 2, user_id: 1
id: 3, user_id: 2
id: 4, user_id: 2
id: 5, user_id: 1
I want to get 3 message groups in the below order:
[1,2], [3,4], [5]
It should group by user_id until it sees a different one, and then group by that one.
I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.
Try something like this:
SELECT user_id, array_agg(id)
FROM (
    SELECT id,
           user_id,
           row_number() OVER (ORDER BY created_at) -
           row_number() OVER (PARTITION BY user_id ORDER BY created_at) conv_id
    FROM table1 ) t
GROUP BY user_id, conv_id;
The expression:
row_number() OVER (ORDER BY created_at) -
row_number() OVER (PARTITION BY user_id ORDER BY created_at) conv_id
will give you a special id for every message group (a conv_id value can repeat across different user_ids, but the pair user_id, conv_id identifies every message group uniquely).
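To see why, trace both row numbers over the sample data above:

id | user_id | rn_overall | rn_per_user | conv_id
 1 |       1 |          1 |           1 |       0
 2 |       1 |          2 |           2 |       0
 3 |       2 |          3 |           1 |       2
 4 |       2 |          4 |           2 |       2
 5 |       1 |          5 |           3 |       2

Grouping by (user_id, conv_id) yields exactly [1,2], [3,4], [5]. Note that conv_id = 2 appears for both users, which is why user_id must be part of the grouping key.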
My SQLFiddle with example.
Details: row_number(), OVER (PARTITION BY ... ORDER BY ...)
Proper SQL
I want to get 3 message groups in the below order: [1,2], [3,4], [5]
To get the requested order, add ORDER BY min(id):
SELECT grp, user_id, array_agg(id) AS ids
FROM (
    SELECT id
         , user_id
         , row_number() OVER (ORDER BY id) -
           row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
    FROM tbl
    ORDER BY 1 -- for ordered arrays in result
    ) t
GROUP BY grp, user_id
ORDER BY min(id);
db<>fiddle here
Old sqlfiddle
The addition would barely warrant another answer. The more important issue is this:
Faster with PL/pgSQL
I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.
Pure SQL is all nice and shiny, but a procedural server-side function is much faster for this task. While processing rows procedurally is generally slower, plpgsql wins this competition big-time, because it can make do with a single table scan and a single ORDER BY operation:
CREATE OR REPLACE FUNCTION f_msg_groups()
  RETURNS TABLE (ids int[])
  LANGUAGE plpgsql AS
$func$
DECLARE
   _id   int;
   _uid  int;
   _id0  int; -- id of last row
   _uid0 int; -- user_id of last row
BEGIN
   FOR _id, _uid IN
      SELECT id, user_id FROM messages ORDER BY id
   LOOP
      IF _uid <> _uid0 THEN
         RETURN QUERY VALUES (ids); -- output row (never fires on the first row, while _uid0 is still NULL)
         ids := ARRAY[_id];         -- start new array
      ELSE
         ids := ids || _id;         -- add to array
      END IF;
      _id0  := _id;
      _uid0 := _uid; -- remember last row
   END LOOP;
   RETURN QUERY VALUES (ids); -- output last iteration
END
$func$;
Call:
SELECT * FROM f_msg_groups();
Benchmark and links
I ran a quick test with EXPLAIN ANALYZE on a similar real-life table with 60k rows (executed several times, picking the fastest result to exclude caching effects):
SQL:
Total runtime: 1009.549 ms
PL/pgSQL:
Total runtime: 336.971 ms
Related:
GROUP BY and aggregate sequential numeric values
GROUP BY consecutive dates delimited by gaps
Ordered count of consecutive repeats / duplicates
The GROUP BY clause will collapse the response into 2 records - one with user_id 1 and one with user_id 2 - no matter what the ORDER BY clause says, so I recommend you send just ORDER BY created_at and build the groups in application code:
prev_id = -1
messages.each do |m|
  if m.user_id != prev_id
    prev_id = m.user_id
    # do whatever you want with a new message group
  end
end
You can use chunk:
Message = Struct.new :id, :user_id
messages = []
messages << Message.new(1, 1)
messages << Message.new(2, 1)
messages << Message.new(3, 2)
messages << Message.new(4, 2)
messages << Message.new(5, 1)
messages.chunk(&:user_id).each do |user_id, records|
  p "#{user_id} - #{records.inspect}"
end
The output:
"1 - [#<struct Message id=1, user_id=1>, #<struct Message id=2, user_id=1>]"
"2 - [#<struct Message id=3, user_id=2>, #<struct Message id=4, user_id=2>]"
"1 - [#<struct Message id=5, user_id=1>]"

Optimizing a troublesome query

I'm generating a PDF via PHP from 2 MySQL tables; the PDF contains a table. On larger data sets the script is eating up a lot of memory and is starting to become a problem.
My first table contains "inspections." There are many rows per day. This has a many-to-one relationship with the user table.
Table "inspections"
id
area
inpsection_date
inpsection_agent_1
inpsection_agent_2
inpsection_agent_3
id (int)
area (varchar) - is one of 8 "areas" ie: Concrete, Soils, Earthwork
inspection_date (int) - unix timestamp
inspection_agent_1 (int) - a user id
inspection_agent_2 (int) - a user id
inspection_agent_3 (int) - a user id
The second table is the users' info. All I need is to join the name to each inspection_agent_x column.
Table "users":
id
name
The final table, which is going to be in the PDF, needs to organize the data by day, then by user, showing every "area" that the user "inspected" on that day:

            Concrete   Soils   Earthwork
1/18/2011
Jon Doe        X
Jane Doe       X         X
And so on for each day. Right now I'm just doing a simple join on the names and then organizing everything on the code end. I know I'm leaving a lot on the table as far as the queries go; I just can't think of a way to do it.
Thanks for any and all help.
Select U.name
     , user_inspections.inspection_date
     , Min( Case When user_inspections.area = 'Concrete'  Then 'X' End ) As Concrete
     , Min( Case When user_inspections.area = 'Soils'     Then 'X' End ) As Soils
     , Min( Case When user_inspections.area = 'Earthwork' Then 'X' End ) As Earthwork
From users As U
    Join (
        Select area, inspection_date, inspection_agent_1 As user_id
        From inspections
        Union All
        Select area, inspection_date, inspection_agent_2 As user_id
        From inspections
        Union All
        Select area, inspection_date, inspection_agent_3 As user_id
        From inspections
        ) As user_inspections
        On user_inspections.user_id = U.id
Group By U.name, user_inspections.inspection_date
This is effectively a static crosstab. It means that you will need to know, at design time, all the areas that the query should output.
One of the reasons this query is problematic is that your schema is not normalized. Your inspection table should look like:
Create Table inspections
(
id int...
, area varchar...
, inspection_date date ...
, inspection_agent int References Users ( Id )
)
That would avoid the inner Union All query to get the output you want.
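For illustration, the crosstab against that normalized schema could then drop the derived table entirely (a sketch, assuming the single inspection_agent column above):

Select U.name
     , i.inspection_date
     , Min( Case When i.area = 'Concrete'  Then 'X' End ) As Concrete
     , Min( Case When i.area = 'Soils'     Then 'X' End ) As Soils
     , Min( Case When i.area = 'Earthwork' Then 'X' End ) As Earthwork
From users As U
    Join inspections As i
        On i.inspection_agent = U.id
Group By U.name, i.inspection_date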
I would go like this:
select i.*, u1.name, u2.name, u3.name
from inspections i
left join users u1 on (i.inspection_agent_1 = u1.id)
left join users u2 on (i.inspection_agent_2 = u2.id)
left join users u3 on (i.inspection_agent_3 = u3.id)
order by i.inspection_date asc;
Then select the distinct area names and remember them, or fetch them from an area table if you have one:
select distinct area from inspections;
Then it's just a foreach:
$day = "";
foreach ($inspections as $inspection)
{
    if ($day == "" || $inspection["inspection_date"] != $day)
    {
        $day = $inspection["inspection_date"];
        // start a new row with the date here
    }
    // output a standard row with the user name
}
It isn't clear whether you have to display all users each time (even if some of them did no inspections that day); if you do, you should fetch the users once and, for each $inspection row, loop over $users looking for a match.

SQL query to get duplicate record counts based on other factors

I have a table (participants) which has multiple columns that could all be distinct.
Two columns that are of special interest in this query are the userID and the programID
I have a two-part inquiry here:
1. I want to be able to acquire the list of all userIDs that appear more than once in this table. How do I go about doing that?
2. I want to be able to acquire the count of all programIDs where the same userID appears in multiple programIDs (i.e. the count of programs where the same userID appears in 2 programs, the count where it appears in 3 programs, etc.).
For Example:
programID: prog1
userID: uid1
userID: uid3
userID: uid12
programID: prog2
userID: uid3
userID: uid5
userID: uid14
userID: uid27
programID: prog3
userID: uid3
userID: uid7
userID: uid14
userID: uid30
programID: prog4
userID: uid1
Expected Results:
userID count = 2; programs = 3
userID count = 3; programs = 3
Can anyone please help me with this?
My current code for question 1 is:
SELECT
WPP.USERID,
WPI.EMAIL,
WPI.FIRSTNAME,
WPI.LASTNAME,
WPI.INSTITUTION
FROM WEBPROGRAMPARTICIPANTS WPP
INNER JOIN WEBPERSONALINFO WPI
ON WPP.USERID = WPI.USERID
INNER JOIN WEBPROGRAMS WP
ON WPP.PROGRAMCODE = WP.PROGRAMCODE
WHERE
WP.PROGRAMTYPE IN ('1','2','3','4','5','6', '9', '10')
GROUP BY
WPP.USERID,
WPI.EMAIL,
WPI.FIRSTNAME,
WPI.LASTNAME,
WPI.INSTITUTION
HAVING COUNT(WPP.USERID) > 1
ORDER BY WPI.EMAIL
1.
select userID, count(*) as cnt
from participants
group by userID
having count(*) > 1
Your query for part one looks good. Here's your query for part 2:
SELECT DISTINCT p1.programID, COUNT(p1.userID) AS Multiple
FROM participants p1
JOIN participants p2 ON p2.userID = p1.userID
GROUP BY p1.userID, p1.programID
ORDER BY Multiple, p1.programID
It lists programID and the number of other programIDs that the same userID appears in for each programID. I think your expected results are wrong for your sample data. It should be:
userID count = 1; programs = 3;
userID count = 2; programs = 4;
userID count = 3; programs = 3;
You can use the above as a subquery (derived table) if you want to fine tune the results to look more like your expected results.
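For instance, one plausible reading of the expected results (how many userIDs appear in exactly N programs) can be had with a two-level aggregate over the same participants table; this is only a sketch of that interpretation:

SELECT program_count, COUNT(*) AS user_count
FROM (
    -- programs per user
    SELECT userID, COUNT(DISTINCT programID) AS program_count
    FROM participants
    GROUP BY userID
    ) per_user
WHERE program_count > 1
GROUP BY program_count
ORDER BY program_count;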
This was an issue on my side, with a logic step that was left out.