SQL Order and group by - sql

I'm getting a bit lost in making a query that performs a certain look up.
I have the first part of the query going, which returns all accounts that are missing some entries. Now I need to filter this subset further based on each user's last login attempt.
The table structures are as follows:
The users table contains all user information. We only care about users under project_id 33.
The user_account_list table contains all accounts a user has. We only care about users NOT having an entry for game_id 50.
The user_login_logs table contains all login attempts for a user.
The original query I have is this:
SELECT u.id,
u.login,
u.email,
u.nickname,
b.station_login AS "additionals.station_login",
a.id AS "user_account_list.id",
a.game_id AS "user_account_list.game_id",
a.game_uid AS "user_account_list.game_uid",
c.created_at AS "last login"
FROM users u
LEFT JOIN user_account_list a ON u.id = a.user_id AND a.game_id = 50
LEFT JOIN user_additionals b ON u.id = b.id
LEFT JOIN user_login_logs c ON u.id = c.user_id
WHERE u.project_id = 33
AND u.verified_at IS NOT NULL
AND (a.id IS NULL OR a.game_id IS NULL OR a.game_uid IS NULL)
AND (b.station_login IS NULL OR b.station_login = '')
ORDER BY c.created_at DESC
This returns all users that are registered under project_id 33, have no entry for game_id 50, and have no information stored in their additionals table. (That last condition is optional and not relevant here; it just limits the data returned.) It gives me multiple rows per user, sorted by login date with the latest first.
What I need is only 1 row per user, with their LATEST login date. I tried replacing the ORDER BY with GROUP BY u.id, but this gives me the oldest result back, not the latest.
How can I:
Limit the rows returned to only 1 row per user
Make sure the row is based on the latest login attempt of the user.
EDIT:
This is what the query currently returns:
+----+-------+-----------------+----------+---------------------------+----------------------+---------------------------+----------------------------+---------------------+
| id | login | email | nickname | additionals.station_login | user_account_list.id | user_account_list.game_id | user_account_list.game_uid | last login |
+----+-------+-----------------+----------+---------------------------+----------------------+---------------------------+----------------------------+---------------------+
| 1 | usrnm | someon@mail.com | Nickname | | NULL | NULL | NULL | 2012-10-19 00:00:00 |
| 1 | usrnm | someon@mail.com | Nickname | | NULL | NULL | NULL | 2012-10-18 00:00:00 |
| 1 | usrnm | someon@mail.com | Nickname | | NULL | NULL | NULL | 2012-10-17 00:00:00 |
+----+-------+-----------------+----------+---------------------------+----------------------+---------------------------+----------------------------+---------------------+
3 rows in set (0.08 sec)

One way to do this is to join user_login_logs with the following derived table:
SELECT user_id, MAX(created_at) LatestDate
FROM user_login_logs
GROUP BY user_id
and join it on user_id and created_at = LatestDate. This limits each user's login logs to the row with the latest created_at for that user. Here is your query:
SELECT u.id,
u.login,
u.email,
u.nickname,
b.station_login AS "additionals.station_login",
a.id AS "user_account_list.id",
a.game_id AS "user_account_list.game_id",
a.game_uid AS "user_account_list.game_uid",
c.created_at AS "last login"
FROM users u
LEFT JOIN user_account_list a ON u.id = a.user_id AND a.game_id = 50
LEFT JOIN user_additionals b ON u.id = b.id
LEFT JOIN
(
SELECT user_id, MAX(created_at) AS LatestDate
FROM user_login_logs
GROUP BY user_id
) maxc ON u.id = maxc.user_id
LEFT JOIN user_login_logs c ON c.user_id = maxc.user_id AND c.created_at = maxc.LatestDate -- only the latest log row per user
WHERE u.project_id = 33
AND u.verified_at IS NOT NULL
AND (a.id IS NULL OR a.game_id IS NULL OR a.game_uid IS NULL)
AND (b.station_login IS NULL OR b.station_login = '')
ORDER BY c.created_at DESC;
Note that you are using LEFT JOIN, so rows from the left-hand table that have no match are still included in the result set. If you don't need them in the result set, use INNER JOIN instead.
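For anyone who wants to try the "join against MAX per user" idea in isolation, here is a minimal sketch using Python's built-in sqlite3 module. It is trimmed down to two tables, and the sample rows (including the users "other" and "never_logged_in") are made up for illustration; the real schema has more columns and filters.

```python
import sqlite3

# In-memory database with a cut-down version of the two relevant tables.
# The sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE user_login_logs (user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1, 'usrnm'), (2, 'other'), (3, 'never_logged_in');
INSERT INTO user_login_logs VALUES
    (1, '2012-10-17 00:00:00'),
    (1, '2012-10-18 00:00:00'),
    (1, '2012-10-19 00:00:00'),
    (2, '2012-10-01 00:00:00');
""")

# The derived table has exactly one row per user_id, so joining it
# to users cannot multiply the user rows.
rows = conn.execute("""
SELECT u.id, u.login, maxc.LatestDate AS last_login
FROM users u
LEFT JOIN (
    SELECT user_id, MAX(created_at) AS LatestDate
    FROM user_login_logs
    GROUP BY user_id
) maxc ON u.id = maxc.user_id
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

Because of the LEFT JOIN, the user without any login still shows up, with NULL (None in Python) as the last login.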

Replace your ORDER BY with GROUP BY u.id, as you tried, but also tell the SELECT that you want the latest date in each group. That is, replace
c.created_at AS "last login"
with
MAX(c.created_at) AS "last login"
The GROUP BY returns only one row per user, and MAX() picks the latest login date for each of them.
Edit: I think you should avoid column aliases that contain spaces, to avoid mistakes.
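A small sketch of the GROUP BY plus MAX() variant, again with Python's sqlite3 module and made-up sample rows. Listing every non-aggregated column in the GROUP BY is my own addition here; some databases (or strict SQL modes) reject the query otherwise.

```python
import sqlite3

# Hypothetical sample data, reduced to the columns that matter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE user_login_logs (user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1, 'usrnm'), (2, 'other'), (3, 'never_logged_in');
INSERT INTO user_login_logs VALUES
    (1, '2012-10-17 00:00:00'),
    (1, '2012-10-18 00:00:00'),
    (1, '2012-10-19 00:00:00'),
    (2, '2012-10-01 00:00:00');
""")

# One output row per user; MAX() collapses that user's log rows
# into the latest timestamp, COUNT() shows how many rows were collapsed.
rows = conn.execute("""
SELECT u.id,
       u.login,
       MAX(c.created_at) AS last_login,
       COUNT(c.user_id)  AS attempts
FROM users u
LEFT JOIN user_login_logs c ON u.id = c.user_id
GROUP BY u.id, u.login
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

Note that this only gives you the latest date itself. If you also needed other columns from the same log row, the derived-table join is the safer route.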

Related

Efficiently getting multiple counts of foreign key rows in PostgreSQL

I have a database that consists of users who can perform various actions, which I keep track of in multiple tables. I'm creating a point system, so I need to count how many of each type of action the user did. For example, if I had:
users posts comments shares
id | username id | user_id id | user_id id | user_id
------------- -------------- -------------- --------------
1 | abc 1 | 1 1 | 1 1 | 2
2 | xyz 2 | 1 2 | 2 2 | 2
I would want to return:
user_details
id | username | post_count | comment_count | share_count
---------------------------------------------------------
1 | abc | 2 | 1 | 0
2 | xyz | 0 | 1 | 2
This is slightly different from this question about foreign key counts since I want to return the individual counts per table.
What I've tried so far (example code):
SELECT
users.id,
users.username,
COUNT( DISTINCT posts.id ) as post_count,
COUNT( DISTINCT comments.id ) as comment_count,
COUNT( DISTINCT shares.id ) as share_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
LEFT JOIN comments ON comments.user_id = users.id
LEFT JOIN shares ON shares.user_id = users.id
GROUP BY users.id
While this works, I had to use DISTINCT in all of my counts because the LEFT JOINS were causing high numbers of duplicate rows. I feel like there must be a better way to do this since (please correct me if I'm wrong) on each LEFT JOIN, the DISTINCT is having to filter out an exponentially growing number of duplicated rows.
Thank you so much for any help you could give me with this!
You can join derived tables that already do the aggregation.
SELECT u.id,
u.username,
coalesce(pc.c, 0) AS post_count,
coalesce(cc.c, 0) AS comment_count,
coalesce(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT p.user_id,
count(*) AS c
FROM posts AS p
GROUP BY p.user_id) AS pc
ON pc.user_id = u.id
LEFT JOIN (SELECT c.user_id,
count(*) AS c
FROM comments AS c
GROUP BY c.user_id) AS cc
ON cc.user_id = u.id
LEFT JOIN (SELECT s.user_id,
count(*) AS c
FROM shares AS s
GROUP BY s.user_id) AS sc
ON sc.user_id = u.id;
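To check the pre-aggregation idea against the sample data from the question, here is a rough sketch using Python's sqlite3 module (SQLite rather than PostgreSQL, so treat it as an illustration of the query shape, not a performance test).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts    (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE shares   (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users    VALUES (1, 'abc'), (2, 'xyz');
INSERT INTO posts    VALUES (1, 1), (2, 1);
INSERT INTO comments VALUES (1, 1), (2, 2);
INSERT INTO shares   VALUES (1, 2), (2, 2);
""")

# Each derived table is aggregated to one row per user_id *before*
# the join, so the joins never multiply rows and no DISTINCT is needed.
rows = conn.execute("""
SELECT u.id,
       u.username,
       COALESCE(pc.c, 0) AS post_count,
       COALESCE(cc.c, 0) AS comment_count,
       COALESCE(sc.c, 0) AS share_count
FROM users AS u
LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM posts    GROUP BY user_id) AS pc
       ON pc.user_id = u.id
LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM comments GROUP BY user_id) AS cc
       ON cc.user_id = u.id
LEFT JOIN (SELECT user_id, COUNT(*) AS c FROM shares   GROUP BY user_id) AS sc
       ON sc.user_id = u.id
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

The COALESCE calls turn the NULL you get for a user with no rows in a table into 0.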

PostgreSQL - How to remove duplicates when doing LEFT OUTER JOIN with WHERE clause?

I have 2 tables:
users table
+--------+---------+
| id | integer |
+--------+---------+
| phone | string |
+--------+---------+
| active | boolean |
+--------+---------+
statuses table
+---------+---------+
| id | integer |
+---------+---------+
| user_id | integer |
+---------+---------+
| step_1 | boolean |
+---------+---------+
| step_2 | boolean |
+---------+---------+
I'm doing a LEFT OUTER JOIN of the statuses table onto the users table with a WHERE clause like this:
SELECT users.id, statuses.step_1, statuses.step_2
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
My problem
Some users have the same phone number in the users table, and I want to remove the duplicate users based on the phone number.
I don't want to delete them from the database. I just want to exclude them from this query.
For example, say John (ID: 1) and Sara (ID: 2) share the same phone number (+6012-3456789); removing either one of them, John or Sara, is fine for me.
What I've tried that did not work:
First:
SELECT DISTINCT users.phone
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
ORDER BY users.id DESC
Second:
SELECT users.phone, COUNT(*)
FROM users
LEFT OUTER JOIN statuses ON users.id = statuses.user_id
WHERE (users.active='f')
GROUP BY phone
HAVING COUNT(users.phone) > 1
I would do this before doing the join. In Postgres, select distinct on is a very useful construct:
SELECT u.id, s.step_1, s.step_2
FROM (SELECT distinct on (phone) u.*
FROM users u
WHERE u.active = 'f'
ORDER BY phone
) u LEFT OUTER JOIN
statuses s
ON u.id = s.user_id
WHERE u.active = 'f'
ORDER BY u.id DESC;
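distinct on returns one row for each distinct value of the expression in parentheses. In this case, that is one row per phone (based on "I want remove the duplicate users based on the phone number"). The join then no longer shows these users as duplicates. Which row is kept for each phone is arbitrary unless the ORDER BY lists further columns after phone.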
Here is one way
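Self-join the users table on the phone number and keep, for each phone, only the row for which no other row with a lower id exists (an anti-join).
SELECT u.id, s.step_1, s.step_2
FROM users u
LEFT JOIN users u1
ON u.phone = u1.phone
AND u1.active = 'f'
AND u1.id < u.id -- another inactive user with the same phone and a lower id
LEFT OUTER JOIN statuses s
ON u.id = s.user_id
WHERE u.active = 'f'
AND u1.id IS NULL -- no such user found, so this is the lowest id for its phone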
or use ROW_NUMBER
Number the rows within each phone number and keep only the row numbered 1
SELECT u.id, s.step_1, s.step_2
FROM (SELECT u.*,
Row_number() OVER (PARTITION BY phone ORDER BY id) AS rn
FROM users u
WHERE u.active = 'f') u -- filter before numbering so rn = 1 is always an inactive user
LEFT OUTER JOIN statuses s
ON u.id = s.user_id
WHERE u.rn = 1
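Here is a hedged, self-contained sketch of the ROW_NUMBER approach, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows are invented, and since SQLite has no boolean type, active and the step columns are stored as 0/1 instead of 'f'/'t'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, phone TEXT, active INTEGER);
CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER,
                       step_1 INTEGER, step_2 INTEGER);
-- Users 1 and 2 share a phone number; user 4 is active and must be excluded.
INSERT INTO users VALUES
    (1, '+6012-3456789', 0),
    (2, '+6012-3456789', 0),
    (3, '+6012-0000000', 0),
    (4, '+6012-1111111', 1);
INSERT INTO statuses VALUES (1, 1, 1, 0), (2, 3, 1, 1);
""")

# Number the inactive users within each phone, lowest id first,
# then keep only the first row of every phone group.
rows = conn.execute("""
SELECT u.id, s.step_1, s.step_2
FROM (
    SELECT users.*,
           ROW_NUMBER() OVER (PARTITION BY phone ORDER BY id) AS rn
    FROM users
    WHERE active = 0
) AS u
LEFT JOIN statuses AS s ON u.id = s.user_id
WHERE u.rn = 1
ORDER BY u.id DESC
""").fetchall()

for row in rows:
    print(row)
```

User 2 drops out as the duplicate of user 1. Changing the ORDER BY inside OVER (...) changes which of the duplicates survives.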

sql select where column is a count

I have a table with users and another table with activity; the user who performed each activity is logged in a column. How can I write a query that selects each user along with the count of activities they have?
I really can't think of how to do it, nor what to search for on the web.
so for example
User table
id | name
1 | john
2 | karen
Activity table
id | user_id
1 | 1
2 | 1
3 | 2
Results
name | Count
john | 2
karen| 1
Make use of LEFT JOIN and COUNT aggregate
SELECT name, COUNT(a.user_id) count
FROM [User] u LEFT JOIN Activity a
ON u.id = a.user_id
GROUP BY u.id, u.name
Output:
| name | count |
|-------|-------|
| john | 2 |
| karen | 1 |
Here is a SQLFiddle demo
Recommended reading:
A Visual Explanation of SQL Joins
select name, count(a.Id) as ActivityCount
from [user] u
inner join activity a on u.id = a.user_id
group by name
This is very simple to do. You can combine the two tables with a join. To add the count, use the aggregate function conveniently called COUNT. All together, it would look something like this:
select u.id, u.name, count(a.id) as ct -- count(a.id) gives 0 for users without activity; count(*) would give 1
from tblUser u
left join tblActivity a on u.id = a.user_id
group by u.id, u.name
order by ct desc
select
u.id as user_id, -- name is not necessarily unique
max(u.name) as name,
count(a.Id) as [count]
from
[User] u
left join Activity a -- left join because some users can have no activities
on u.Id = a.user_id
group by u.id
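One detail that is easy to get wrong with the LEFT JOIN versions is what you count. The sketch below (Python's sqlite3 module; the user "pat" with no activity is a made-up addition to the sample data, and the table is called users to sidestep quoting) shows COUNT(a.user_id) next to COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users    VALUES (1, 'john'), (2, 'karen'), (3, 'pat');
INSERT INTO activity VALUES (1, 1), (2, 1), (3, 2);
""")

# COUNT(column) skips NULLs, COUNT(*) counts rows. A user without
# activity still yields one (all-NULL) joined row under LEFT JOIN.
rows = conn.execute("""
SELECT u.name,
       COUNT(a.user_id) AS activity_count,
       COUNT(*)         AS joined_rows
FROM users u
LEFT JOIN activity a ON u.id = a.user_id
GROUP BY u.id, u.name
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

So for a per-user activity count you want COUNT on a column from the activity side. With an INNER JOIN, users without activity would not appear at all.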

Determining relationship between rows using subqueries in PostgreSQL

I'm trying to figure out how to complete the following task in a single query.
Basically, given a user's ID, I want to return the user profiles of all users he is friends with.
If anything is unclear, I'll be happy to go into more detail. Thanks!
table 'users':
user_id | col1 | col2 | etc
-----------------------------------------
a | *** | *** | ***
-----------------------------------------
b | *** | *** | ***
table 'users_friends'
user_id | friend_user_id | status
-----------------------------------------
a | b | 1
-----------------------------------------
b | a | 1
given a value of a, find rows in table users_friends where
user_id = a
status = 1
using the resulting rows of that query, find rows in table users_friends where
user_id = b (column `friend_user_id` from resulting rows)
friend_user_id = a (column `user_id` from resulting rows)
status = 1
if any rows are returned, select rows from table 'users' where
user_id = b (column `user_id` from resulting row)
This is a really rough one I came up with. I think it does what I'm looking for, but I'm sure there are better ways to go about it.
SELECT * FROM users WHERE user_id IN
(SELECT user_id FROM users_friends WHERE friend_user_id IN
(SELECT user_id FROM users_friends WHERE user_id = 'someuserid' AND status = 1 ) AND status = 1 );
select u.*
from
users u
inner join
users_friends f on u.user_id = f.friend_user_id
where
f.status = 1
and f.user_id = 'a'
Assuming there are no duplicates in friends table:
SELECT u.user_id, u.col1, u.col2
FROM users AS u
JOIN users_friends AS f1 ON u.user_id=f1.user_id
JOIN users_friends AS f2 ON f1.user_id=f2.friend_user_id AND f1.friend_user_id=f2.user_id
WHERE f1.status=1 AND f2.status=1 AND f2.user_id='a'
SQL Fiddle
SELECT u.user_id, u.col1
FROM users_friends AS f
JOIN users AS u
ON f.friend_user_id = u.user_id
WHERE f.user_id = 'a'
AND f.status = 1
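To see the difference between a one-directional lookup and the mutual check described in the question, here is a small sketch with Python's sqlite3 module. User 'c' and the unreciprocated row (a, c) are invented test data; the profile columns are reduced to a single col1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id TEXT PRIMARY KEY, col1 TEXT);
CREATE TABLE users_friends (user_id TEXT, friend_user_id TEXT, status INTEGER);
INSERT INTO users VALUES ('a', 'A'), ('b', 'B'), ('c', 'C');
-- a and b list each other; a lists c, but c has not listed a.
INSERT INTO users_friends VALUES ('a', 'b', 1), ('b', 'a', 1), ('a', 'c', 1);
""")

# One direction only: everyone that 'a' has listed as a friend.
one_way = conn.execute("""
SELECT u.user_id
FROM users_friends AS f
JOIN users AS u ON u.user_id = f.friend_user_id
WHERE f.user_id = ? AND f.status = 1
ORDER BY u.user_id
""", ("a",)).fetchall()

# Mutual: f1 is a's row pointing at the friend, f2 is the friend's
# row pointing back at a. Both must exist with status = 1.
mutual = conn.execute("""
SELECT u.user_id
FROM users_friends AS f1
JOIN users_friends AS f2
  ON f2.user_id = f1.friend_user_id
 AND f2.friend_user_id = f1.user_id
JOIN users AS u ON u.user_id = f1.friend_user_id
WHERE f1.user_id = ? AND f1.status = 1 AND f2.status = 1
ORDER BY u.user_id
""", ("a",)).fetchall()

print("one-way:", one_way)
print("mutual: ", mutual)
```

Which of the two you want depends on whether a friendship counts before the other side has confirmed it.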

Subquery to return the latest entry for each parent ID

I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.
I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.
Thus for
DOCUMENTS
ID | NAME
1 | Document 1
2 | Document 2
3 | Document 3
4 | Document 4
5 | Document 5
HISTORY
DOC_ID | USER_ID | TIMESTAMP
1 | 12345 | TODAY
1 | 11111 | IN THE PAST
1 | 11111 | IN THE PAST
1 | 12345 | IN THE PAST
2 | 11111 | TODAY
2 | 12345 | IN THE PAST
3 | 12345 | IN THE PAST
I'd be looking to get a return from my search like
ID | NAME | LAST_USER_ID
1 | Document 1 | 12345
2 | Document 2 | 11111
3 | Document 3 | 12345
4 | Document 4 |
5 | Document 5 |
Can I easily do this with one SQL query and a join between the two tables?
Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":
SELECT d.id, d.name, h.last_user_id
FROM Documents d LEFT JOIN
(SELECT r.doc_id AS id, user_id AS last_user_id
FROM History r JOIN
(SELECT doc_id, MAX("timestamp") AS "timestamp"
FROM History
GROUP BY doc_id
) AS l
ON r."timestamp" = l."timestamp"
AND r.doc_id = l.doc_id
) AS h
ON d.id = h.id
I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter.
The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.
I couldn't get the "HAVING MAX(TIMESTAMP)" to run in SQL Server - I guess having requires a boolean expression like "having max(TIMESTAMP) > 2009-03-05" or something, which doesn't apply in this case. (I might be doing something wrong...)
Here is something that seems to work - note the join has 2 conditions (not sure if this is good or not):
select
d.ID,
d.NAME,
h."USER_ID" as "LAST_USER_ID"
from Documents d
left join History h
on d.ID = h.DOC_ID
and h."TIMESTAMP" =
(
select max("TIMESTAMP")
from "HISTORY"
where "DOC_ID" = d.ID
)
This doesn't use a join, but for some queries like this I like to inline the select for the field. If you want to handle the case where no user has accessed the document, you can wrap it in an NVL().
select a.ID, a.NAME,
(select x.user_id
from HISTORY x
where x.doc_id = a.id
and x.timestamp = (select max(x1.timestamp)
from HISTORY x1
where x1.doc_id = x.doc_id)) as LAST_USER_ID
from DOCUMENTS a
where <your criteria here>
I think it should be something like this:
SELECT ID, Name, b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a LEFT JOIN
( SELECT DOC_ID, USER_ID
FROM HISTORY
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )) as b
ON a.ID = b.DOC_ID
this might work also:
SELECT ID, Name, b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a
LEFT JOIN HISTORY b ON a.ID = b.DOC_ID
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )
Select ID, Name, User_ID
From Documents Left Outer Join
History a on ID = DOC_ID
Where ( TimeStamp = ( Select Max(TimeStamp)
From History b
Where a.DOC_ID = b.DOC_ID ) OR
TimeStamp Is NULL ) /* this accommodates the Left */
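For reference, here is a runnable sketch of the non-correlated "join against MAX per document" variant using Python's sqlite3 module. The concrete dates stand in for the TODAY / IN THE PAST placeholders from the question, and the column name accessed_at is my own substitute for TIMESTAMP to avoid quoting a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE history (doc_id INTEGER, user_id INTEGER, accessed_at TEXT);
INSERT INTO documents VALUES
    (1, 'Document 1'), (2, 'Document 2'), (3, 'Document 3'),
    (4, 'Document 4'), (5, 'Document 5');
INSERT INTO history VALUES
    (1, 12345, '2009-03-05'), (1, 11111, '2009-03-01'),
    (1, 11111, '2009-03-02'), (1, 12345, '2009-02-01'),
    (2, 11111, '2009-03-05'), (2, 12345, '2009-03-01'),
    (3, 12345, '2009-03-01');
""")

# Inner derived table: latest access time per document.
# Middle derived table: the history row(s) carrying that time.
# Outer LEFT JOIN: keeps documents nobody has accessed yet.
rows = conn.execute("""
SELECT d.id, d.name, h.user_id AS last_user_id
FROM documents AS d
LEFT JOIN (
    SELECT r.doc_id, r.user_id
    FROM history AS r
    JOIN (SELECT doc_id, MAX(accessed_at) AS latest
          FROM history
          GROUP BY doc_id) AS l
      ON r.doc_id = l.doc_id AND r.accessed_at = l.latest
) AS h ON h.doc_id = d.id
ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```

One caveat: if two history rows for the same document share the exact same latest timestamp, that document appears twice in the result.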