MySQL group by max value - sql

Sample tables (many2many = users has many tickers and tickers has many users):
#users
id relevance
1 10
2 6
3 8
4 3
5 5
#users_tickers
user_id ticker_id
1 2
1 3
2 4
2 1
2 3
3 2
4 2
...
I must select users with max relevance for each ticker - so for each ticker one user with the best relevance.
How would you do that?

Something like this should do:
SELECT u.*, ut.ticker_id
FROM users u
INNER JOIN users_tickers ut ON ut.user_id = u.id
WHERE NOT EXISTS (
SELECT 1 FROM users u1
INNER JOIN users_tickers ut1 ON ut1.user_id = u1.id
WHERE ut1.ticker_id = ut.ticker_id AND u1.relevance > u.relevance
)
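To check the NOT EXISTS approach against the sample rows from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite stands in for MySQL here (an assumption made only so the example is self-contained), and only the rows actually shown in the question are loaded. Note that if two users tie for the best relevance on a ticker, this query returns both.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, relevance INTEGER);
CREATE TABLE users_tickers (user_id INTEGER, ticker_id INTEGER);
INSERT INTO users VALUES (1, 10), (2, 6), (3, 8), (4, 3), (5, 5);
INSERT INTO users_tickers VALUES
  (1, 2), (1, 3), (2, 4), (2, 1), (2, 3), (3, 2), (4, 2);
""")

# Keep a (user, ticker) pair only if no other user on the same ticker
# has a strictly higher relevance.
query = """
SELECT ut.ticker_id, u.id, u.relevance
FROM users u
INNER JOIN users_tickers ut ON ut.user_id = u.id
WHERE NOT EXISTS (
    SELECT 1
    FROM users u1
    INNER JOIN users_tickers ut1 ON ut1.user_id = u1.id
    WHERE ut1.ticker_id = ut.ticker_id
      AND u1.relevance > u.relevance
)
ORDER BY ut.ticker_id
"""
rows = conn.execute(query).fetchall()
for ticker_id, user_id, relevance in rows:
    print(ticker_id, user_id, relevance)
```

Each printed row is ticker_id, the id of its most relevant user, and that user's relevance.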

Here is how I would do this in T-SQL. I am not familiar with MySQL's dialect.
SELECT * FROM users AS U
INNER JOIN user_tickers t ON t.user_id = U.user_id
WHERE U.RELEVANCE = (
SELECT MAX(RELEVANCE) FROM users as usub WHERE U.user_id = usub.user_id
)
ORDER BY U.relevance DESC
Produces:
user_id relevance user_id ticker_id
1 10 1 2
1 10 1 3
3 8 3 2
2 6 2 4
2 6 2 1
2 6 2 3
4 3 4 2

I think you can do this:
SELECT *,
(SELECT U2.id FROM users as U2, users_tickers as UT2
WHERE UT2.ticker_id = T1.ticker_id
AND UT2.user_id = U2.id
ORDER BY U2.relevance DESC
LIMIT 1) AS best_user_id
FROM tickers as T1
I don't know if this is good code, but I think this is what you're looking for. Last time I did SQL everybody was saying joins are slow, use subqueries. I don't know if that still applies.
EDIT: I tested it in MySQL with the values you gave (and an added tickers table) and it works.
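EDIT: Note that the subquery sits in the select list, so it must return a single value. LIMIT 1 is required there; getting the top n users per ticker would need a different query shape.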

SELECT * FROM users_tickers LEFT JOIN users ON users.id = users_tickers.user_id WHERE ticker_id = 2 ORDER BY relevance DESC LIMIT 1;
This would give you the info you requested for a given ticker. Is this what you are asking for, or do you want a complete list of tickers, each with its most relevant user?

I'm looking for a fast sql. I did it like this...
SELECT *
FROM (
SELECT id, relevance, ticker_id
FROM `users`
INNER JOIN users_tickers ON users_tickers.user_id=users.id
ORDER BY users.relevance DESC, users.created_at DESC
) AS X
GROUP BY ticker_id
ORDER BY relevance DESC
LIMIT 0,10
Still far from optimal :).

Related

Join three tables and retrieve the expected result

I have 3 tables. User Accounts, IncomingSentences and AnnotatedSentences. Annotators annotate the incoming sentences and tag an intent to it. Then, admin reviews those taggings and makes the corrections on the tagged intent.
DB-Fiddle Playground link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=00a770173fa0568cce2c482643de1d79
Assuming myself as the admin, I want to pull the error report per annotator.
My tables are as follows:
User Accounts table:
|userId|userEmail      |userRole|
|------|---------------|--------|
|1     |user1#gmail.com|editor  |
|2     |user2#gmail.com|editor  |
|3     |user3#gmail.com|editor  |
|4     |user4#gmail.com|admin   |
|5     |user5#gmail.com|admin   |
Incoming Sentences Table
|sentenceId|sentence  |createdAt |
|----------|----------|----------|
|1         |sentence1 |2021-01-01|
|2         |sentence2 |2021-01-01|
|3         |sentence3 |2021-01-02|
|4         |sentence4 |2021-01-02|
|5         |sentence5 |2021-01-03|
|6         |sentence6 |2021-01-03|
|7         |sentence7 |2021-02-01|
|8         |sentence8 |2021-02-01|
|9         |sentence9 |2021-02-02|
|10        |sentence10|2021-02-02|
|11        |sentence11|2021-02-03|
|12        |sentence12|2021-02-03|
Annotated Sentences Table
|id|annotatorId|sentenceId|annotatedIntent|
|--|-----------|----------|---------------|
|1 |1          |1         |intent1        |
|2 |4          |1         |intent2        |
|3 |2          |2         |intent4        |
|4 |3          |4         |intent4        |
|5 |1          |5         |intent2        |
|6 |3          |3         |intent3        |
|7 |5          |3         |intent2        |
|8 |1          |6         |intent4        |
|9 |4          |6         |intent1        |
|10|1          |7         |intent1        |
|11|4          |7         |intent3        |
|12|3          |9         |intent3        |
|13|2          |10        |intent3        |
|14|5          |10        |intent1        |
Expected Output:
I want a table that reports, for each editor, the total number of sentences annotated and the number of those sentences that an admin then corrected. I don't want admin rows in the same table; if they do appear, their totalAdminCorrected should be 0.
|userEmail |totalTagged|totalAdminCorrected|
|---------------|------------|---------------------|
|user1#gmail.com| 4 | 3 |
|user2#gmail.com| 2 | 1 |
|user3#gmail.com| 3 | 1 |
Query I wrote: I've tried my best. You can see that in the DB-Fiddle
My query is not resulting in the expected output. Requesting your help to achieve this.
My proposal...
SELECT UserEmail, SUM(EDICount), SUM(ADMCount)
FROM (SELECT UserAccounts.UserEmail, AnnotatedSentences.SentenceID, COUNT(*) AS EDICount
FROM AnnotatedSentences
LEFT JOIN UserAccounts ON UserAccounts.UserID=AnnotatedSentences.AnnotatorID
WHERE UserRole='editor'
GROUP BY UserAccounts.UserEmail, AnnotatedSentences.SentenceID) AS EDI
LEFT JOIN (SELECT AnnotatedSentences.SentenceID, COUNT(*) AS ADMCount
FROM AnnotatedSentences
LEFT JOIN UserAccounts ON UserAccounts.UserID=AnnotatedSentences.AnnotatorID
WHERE UserRole='admin'
GROUP BY AnnotatedSentences.SentenceID) AS ADM ON EDI.SentenceID=ADM.SentenceID
GROUP BY UserEmail
Because a sentence might be annotated by users with different roles, you can use a subquery (INNER JOIN between user_accounts and annotated_sentences) with window functions and conditional aggregation to get the counts your logic needs.
If you don't want to see the admin rows, filter them out with a WHERE clause.
SELECT user_email,
count(Total_Tagged) Total_Tagged,
SUM(totalAdmin) totalAdmin
FROM (
SELECT ist.sentence_id,
user_email,
user_role,
count(CASE WHEN a.user_role = 'editor' THEN 1 END) over(partition by ist.sentence_id) + count(CASE WHEN a.user_role = 'admin' THEN 1 END) over(partition by ist.sentence_id) Total_Tagged,
count(CASE WHEN a.user_role = 'admin' THEN 1 END) over(partition by ist.sentence_id) totalAdmin
FROM user_accounts a
INNER JOIN annotated_sentences ats ON
a.user_id = ats.annotator_id
INNER JOIN incoming_sentences ist
ON ist.sentence_id = ats.sentence_id
) t1
WHERE user_role = 'editor'
GROUP BY user_email
ORDER BY user_email
Okay, I really rushed this, so there might still be an error in the code, but try something like this:
SELECT
a.user_email,
count(ist) Total_Tagged,
sum(innerTable.edits)
FROM
incoming_sentences ist
JOIN annotated_sentences ats ON
ist.sentence_id = ats.sentence_id
JOIN user_accounts a ON
a.user_id = ats.annotator_id
LEFT JOIN ( SELECT ics.sentence_id, count(anno.id) AS edits FROM annotated_sentences anno
LEFT JOIN user_accounts ua ON
ua.user_id = anno.annotator_id
LEFT JOIN incoming_sentences AS ics ON
ics.sentence_id = anno.sentence_id
WHERE user_role LIKE 'admin'
GROUP BY ics.sentence_id ) AS innerTable
ON innerTable.sentence_id = ist.sentence_id
GROUP BY a.user_email
The inner select counts how many admin edits there are per sentence; the outer one then sums that number over every sentence a user annotated.
If it is guaranteed that one sentence can only be annotated once and only be reviewed once, then you can simply group by sentence and get the editor and admin. Then you group by editor and count.
select
editor,
count(*) as total_tagged,
count(admin) as total_admin_corrected
from
(
select
max(ua.user_email) filter (where ua.user_role = 'editor') as editor,
max(ua.user_email) filter (where ua.user_role = 'admin') as admin
from annotated_sentences ans
join user_accounts ua on ua.user_id = ans.annotator_id
group by ans.sentence_id
) with_editor_and_admin
group by editor
order by editor;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e409ec49af25ac8329a99b02161832fb
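As a cross-check on the expected output, here is a minimal sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3 module and counts, per editor, the annotated sentences and how many of them also carry an admin annotation. The snake_case table and column names follow the answers above; SQLite in place of Postgres, and dropping the intent column and the incoming_sentences table (neither is needed for the counts), are assumptions made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_accounts (
    user_id INTEGER PRIMARY KEY, user_email TEXT, user_role TEXT);
CREATE TABLE annotated_sentences (
    id INTEGER PRIMARY KEY, annotator_id INTEGER, sentence_id INTEGER);
INSERT INTO user_accounts VALUES
  (1, 'user1#gmail.com', 'editor'), (2, 'user2#gmail.com', 'editor'),
  (3, 'user3#gmail.com', 'editor'), (4, 'user4#gmail.com', 'admin'),
  (5, 'user5#gmail.com', 'admin');
INSERT INTO annotated_sentences VALUES
  (1, 1, 1), (2, 4, 1), (3, 2, 2), (4, 3, 4), (5, 1, 5), (6, 3, 3),
  (7, 5, 3), (8, 1, 6), (9, 4, 6), (10, 1, 7), (11, 4, 7), (12, 3, 9),
  (13, 2, 10), (14, 5, 10);
""")

# One row per editor annotation; flag it when any admin also annotated
# the same sentence, then aggregate per editor.
query = """
SELECT e.user_email,
       COUNT(*) AS total_tagged,
       SUM(CASE WHEN EXISTS (
               SELECT 1
               FROM annotated_sentences x
               JOIN user_accounts adm ON adm.user_id = x.annotator_id
               WHERE adm.user_role = 'admin'
                 AND x.sentence_id = a.sentence_id
           ) THEN 1 ELSE 0 END) AS total_admin_corrected
FROM annotated_sentences a
JOIN user_accounts e ON e.user_id = a.annotator_id
WHERE e.user_role = 'editor'
GROUP BY e.user_email
ORDER BY e.user_email
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

With the sample data this reproduces the three rows of the expected output table. It counts a sentence as corrected whenever an admin annotated it at all, which is how the expected numbers in the question work out.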

Postgresql query to filter latest data based on 2 columns

Table Structure First
users table
id
1
2
3
sites table
id
1
2
site_memberships table
site_id user_id created_on
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
1 2 2
1 2 3
Assume that the higher the created_on number, the more recent the record.
Expected Output
site_id user_id created_on
1 1 3
2 1 2
1 2 3
Expected output: I need latest record for each user for each site membership.
Tried the following query, but this does not seem to work.
select * from users inner join
(
SELECT ROW_NUMBER () OVER (
PARTITION BY sm.user_id,
sm.created_on
), sm.*
from site_memberships sm
inner join sites s on sm.site_id=s.id
) site_memberships
ON site_memberships.user_id = users.user_id where row_number=1
I think you have overcomplicated the problem you want to solve.
You seem to want aggregation:
select site_id, user_id, max(created_on)
from site_memberships sm
group by site_id, user_id;
If you had additional columns that you wanted, you could use distinct on instead:
select distinct on (site_id, user_id) sm.*
from site_memberships sm
order by site_id, user_id, created_on desc;
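The aggregation version can be tried on the sample rows with a small sketch like the one below. It uses Python's sqlite3 module with an in-memory database as a stand-in for Postgres (an assumption; DISTINCT ON is Postgres-specific and is not shown here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_memberships (
    site_id INTEGER, user_id INTEGER, created_on INTEGER);
INSERT INTO site_memberships VALUES
  (1, 1, 1), (1, 1, 2), (1, 1, 3),
  (2, 1, 1), (2, 1, 2),
  (1, 2, 2), (1, 2, 3);
""")

# One row per (site, user) pair, keeping the highest created_on.
query = """
SELECT site_id, user_id, MAX(created_on) AS created_on
FROM site_memberships
GROUP BY site_id, user_id
ORDER BY site_id, user_id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The three rows match the expected output in the question, only in a different order.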

get sum of count of count postgres

I have two tables jobs and users.
Users has a one-to-many relationship with jobs.
I want to segment users into groups of jobs_done.
In other words, how many users did 1 job, 2 jobs, 3 jobs, etc
The below query does that. However, I would like to lump together all users that have done 3 or more jobs into one group.
Here is the query I currently have
select
jobs_done,
count(1) as number_of_users
from ( select
u.id,
count(*) as jobs_done
from jobs j
JOIN users u on j.user_id = u.id
group by u.id ) a
group by jobs_done
Current Output:
times_used number_of_users
1 255
2 100
3 30
4 10
5 9
Desired Output:
times_used number_of_users
1 255
2 100
3+ 49
You can use a case expression to group values 3+ into one large group. This should work:
select
case
when jobs_done >= 3 then '3+'
else cast(jobs_done as varchar(5))
end as jobs_done,
count(1) as number_of_users
from (
select
u.id,
count(*) as jobs_done
from jobs j
join users u on j.user_id = u.id
group by u.id
) a
group by case when jobs_done >= 3 then '3+'
else cast(jobs_done as varchar(5))
end;
You can group by basically any expression. This is a simplistic example:
test=# SELECT CASE WHEN x < 4 THEN x::text ELSE '4+' END AS y,
count(*)
FROM generate_series(1, 10) AS x
GROUP BY y
ORDER BY 1;
y | count
----+-------
1 | 1
2 | 1
3 | 1
4+ | 7
(4 rows)
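The CASE bucketing can also be tried end to end with made-up data. The sketch below uses Python's sqlite3 module as a stand-in for Postgres; the jobs rows are hypothetical (five users with 1, 1, 2, 3 and 5 jobs), and the join to users is left out because it does not change the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (user_id INTEGER)")
# Hypothetical data: users 1 and 2 did 1 job, user 3 did 2,
# user 4 did 3, user 5 did 5.
conn.executemany(
    "INSERT INTO jobs VALUES (?)",
    [(u,) for u in [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]],
)

# Inner query: jobs per user. Outer query: lump 3 or more into '3+'.
query = """
SELECT CASE WHEN jobs_done >= 3 THEN '3+'
            ELSE CAST(jobs_done AS TEXT)
       END AS bucket,
       COUNT(*) AS number_of_users
FROM (
    SELECT user_id, COUNT(*) AS jobs_done
    FROM jobs
    GROUP BY user_id
) a
GROUP BY bucket
ORDER BY bucket
"""
buckets = conn.execute(query).fetchall()
for bucket, number_of_users in buckets:
    print(bucket, number_of_users)
```

Grouping by the alias works in SQLite and Postgres; on engines that reject it, repeat the CASE expression in the GROUP BY as in the first answer.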

select N items for every group?

I have tables:
Category: Id, Name...
News: Id, Title
News_Category_Mapping: Id, NewsId, CategoryId
Where newsid, categoryid are foreign keys to these 2 tables.
News_category_mapping:
Id NewsID CategoryId
1 1 1
2 2 1
3 3 1
4 4 3
5 5 5
6 6 3
So I may want to get a maximum of 2 news items from every CategoryId, like this:
Id NewsID CategoryId
1 1 1
2 2 1
4 4 3
6 6 3
5 5 5
Sorry for my english.
Let's say you need 2 items each:
Select *
From Category C
CROSS APPLY (Select top 2 Id,CatId,NewsName
From News Nw where Nw.CatId=C.Id) As N
Here is the fiddle sample
Try this:
WITH CTE AS
(SELECT C.Id AS CategoryId, N.Id AS NewsId, N.Title, RN=ROW_NUMBER() OVER (PARTITION BY NC.CategoryId ORDER BY NC.NewsId)
FROM News_Category_Mapping NC JOIN
News N ON NC.NewsId=N.Id JOIN
Category C ON NC.CategoryId=C.Id)
SELECT * FROM CTE WHERE RN<3
Explanation:
Here, the inner query selects the records along with a row number RN that restarts at 1 for each category. To see how the query works, execute the inner query on its own first.
You can use CROSS APPLY, like so:
Select c.*, Sub.*
from
Categories c cross apply
(
select top 2
*
from
News n
where
exists
(
select 1
from NewsCategories nc
where nc.CatId = c.id and n.id = nc.NewsId
)
) Sub
Here is an SQLFiddle for this
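For engines without CROSS APPLY, the ROW_NUMBER approach carries over almost unchanged. The sketch below runs it on the mapping rows from the question using Python's sqlite3 module; SQLite (version 3.25 or later, for window functions) standing in for SQL Server and the snake_case column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news_category_mapping (
    id INTEGER PRIMARY KEY, news_id INTEGER, category_id INTEGER);
INSERT INTO news_category_mapping VALUES
  (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 3), (5, 5, 5), (6, 6, 3);
""")

# Number the rows inside each category, then keep the first two.
query = """
WITH ranked AS (
    SELECT id, news_id, category_id,
           ROW_NUMBER() OVER (
               PARTITION BY category_id ORDER BY news_id
           ) AS rn
    FROM news_category_mapping
)
SELECT id, news_id, category_id
FROM ranked
WHERE rn <= 2
ORDER BY category_id, news_id
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

Changing `rn <= 2` to `rn <= n` gives the top n per category, and the ORDER BY inside OVER decides which items count as the first n.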

SQL to remove specific rows from select

I've got a table:
UserA UserB UserBB UserAA
for example:
1 2 2 1
1 3 3 1
2 1 1 2
2 4 4 2
2 5 5 2
5 2 2 5
What I want to achieve is to remove the mirrored rows (duplicates) so that only these rows are left:
1 2 2 1
1 3 3 1
2 4 4 2
2 5 5 2
2 1 1 2 -> deleted because there is already 1 2 2 1
5 2 2 5 -> deleted because there is already 2 5 5 2
How do I write such a query?
Thanks for help
-- Find Duplicate Rows
SELECT MAX(ID) as ID, CustName, Pincode FROM #Customers
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1
-- Delete Duplicate Rows
DELETE FROM #Customers
WHERE ID IN
( SELECT MAX(ID) FROM #Customers
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1)
Taken from MSDN:
http://archive.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=DuplicateRows
Let me know if you are unable to figure it out from that code.
This may be a little bit closer to your needs:
DELETE FROM TABLE
WHERE USERA IN ( SELECT MAX(USERA) FROM TABLE
GROUP BY USERA, USERB, USERBB, USERAA HAVING COUNT(*) > 1)
The below also covers situations where UserA and UserB are equal between the two rows but UserAA and UserBB are switched and the reverse. Your question is a bit unclear about what exactly constitutes a duplicate. Hopefully this points you in the right direction at the very least though.
I would turn this into a SELECT statement first though and make sure that it is returning the rows that you think should be deleted and only those rows.
DELETE T1
FROM
My_Table T1
INNER JOIN My_Table T2 ON
(
T2.UserA = T1.UserA AND
T2.UserB = T1.UserB AND
T2.UserAA = T1.UserBB AND
T2.UserBB = T1.UserAA AND
T2.UserAA < T2.UserBB
) OR
(
T2.UserA = T1.UserB AND
T2.UserB = T1.UserA AND
T2.UserAA = T1.UserAA AND
T2.UserBB = T1.UserBB AND
T2.UserA < T2.UserB
) OR
(
T2.UserA = T1.UserB AND
T2.UserB = T1.UserA AND
T2.UserAA = T1.UserBB AND
T2.UserBB = T1.UserAA AND
T2.UserA < T2.UserB
)
It was enough just to add:
Where UserA < UserB
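The filter can be checked against the sample rows with the sketch below, which uses Python's sqlite3 module; the table name pairs and the snake_case column names are hypothetical. The plain `UserA < UserB` filter relies on every pair appearing in both orders (or already in ascending order), so the sketch also shows a slightly more defensive variant that drops a row only when its mirror actually exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pairs (
    user_a INTEGER, user_b INTEGER, user_bb INTEGER, user_aa INTEGER);
INSERT INTO pairs VALUES
  (1, 2, 2, 1), (1, 3, 3, 1), (2, 1, 1, 2),
  (2, 4, 4, 2), (2, 5, 5, 2), (5, 2, 2, 5);
""")

# Variant 1: the accepted fix, keep only rows with user_a < user_b.
simple = conn.execute("""
SELECT * FROM pairs
WHERE user_a < user_b
ORDER BY user_a, user_b
""").fetchall()

# Variant 2: also keep a descending row when no mirrored row exists.
defensive = conn.execute("""
SELECT * FROM pairs p
WHERE p.user_a <= p.user_b
   OR NOT EXISTS (
        SELECT 1 FROM pairs q
        WHERE q.user_a = p.user_b AND q.user_b = p.user_a
   )
ORDER BY p.user_a, p.user_b
""").fetchall()

for row in simple:
    print(row)
```

On the sample data both variants return the same four rows that the question wants to keep.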