I have 3 tables: User Accounts, IncomingSentences and AnnotatedSentences. Annotators annotate the incoming sentences and tag each one with an intent. An admin then reviews those tags and corrects the tagged intent where needed.
DB-Fiddle Playground link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=00a770173fa0568cce2c482643de1d79
As the admin, I want to pull an error report per annotator.
My tables are as follows:
User Accounts table:
|userId|userEmail|userRole|
|------|---------------|--------|
|1|user1#gmail.com|editor|
|2|user2#gmail.com|editor|
|3|user3#gmail.com|editor|
|4|user4#gmail.com|admin|
|5|user5#gmail.com|admin|
Incoming Sentences Table
|sentenceId|sentence|createdAt|
|----------|----------|----------|
|1|sentence1|2021-01-01|
|2|sentence2|2021-01-01|
|3|sentence3|2021-01-02|
|4|sentence4|2021-01-02|
|5|sentence5|2021-01-03|
|6|sentence6|2021-01-03|
|7|sentence7|2021-02-01|
|8|sentence8|2021-02-01|
|9|sentence9|2021-02-02|
|10|sentence10|2021-02-02|
|11|sentence11|2021-02-03|
|12|sentence12|2021-02-03|
Annotated Sentences Table
|id|annotatorId|sentenceId|annotatedIntent|
|---|-----------|----------|---------------|
|1|1|1|intent1|
|2|4|1|intent2|
|3|2|2|intent4|
|4|3|4|intent4|
|5|1|5|intent2|
|6|3|3|intent3|
|7|5|3|intent2|
|8|1|6|intent4|
|9|4|6|intent1|
|10|1|7|intent1|
|11|4|7|intent3|
|12|3|9|intent3|
|13|2|10|intent3|
|14|5|10|intent1|
Expected Output:
I want an output table that shows, for each editor, the total number of sentences they annotated and how many of those sentences an admin then corrected. I don't want the admins' own tagging counts in the same table; if admin rows do appear, their totalAdminCorrected should be 0.
|userEmail |totalTagged|totalAdminCorrected|
|---------------|------------|---------------------|
|user1#gmail.com| 4 | 3 |
|user2#gmail.com| 2 | 1 |
|user3#gmail.com| 3 | 1 |
Query I wrote: I've tried my best; you can see my attempt in the DB-Fiddle.
My query does not return the expected output. Could you help me achieve this?
My proposal...
SELECT UserEmail, SUM(EDICount) AS totalTagged, COALESCE(SUM(ADMCount), 0) AS totalAdminCorrected
FROM (SELECT UserAccounts.UserEmail, AnnotatedSentences.SentenceID, COUNT(*) AS EDICount
FROM AnnotatedSentences
LEFT JOIN UserAccounts ON UserAccounts.UserID=AnnotatedSentences.AnnotatorID
WHERE UserRole='editor'
GROUP BY UserAccounts.UserEmail, AnnotatedSentences.SentenceID) AS EDI
LEFT JOIN (SELECT AnnotatedSentences.SentenceID, COUNT(*) AS ADMCount
FROM AnnotatedSentences
LEFT JOIN UserAccounts ON UserAccounts.UserID=AnnotatedSentences.AnnotatorID
WHERE UserRole='admin'
GROUP BY AnnotatedSentences.SentenceID) AS ADM ON EDI.SentenceID=ADM.SentenceID
GROUP BY UserEmail
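To check this approach locally, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It assumes snake_case table and column names (user_accounts, annotated_sentences, user_email, ...) like those used in the other answers, loads only the sample rows from the question (incoming_sentences is not needed for the counts), and wraps the admin sum in COALESCE so an editor with no corrections would get 0 rather than NULL.

```python
import sqlite3

# Sample rows from the question (snake_case names are an assumption).
USERS = [
    (1, "user1#gmail.com", "editor"),
    (2, "user2#gmail.com", "editor"),
    (3, "user3#gmail.com", "editor"),
    (4, "user4#gmail.com", "admin"),
    (5, "user5#gmail.com", "admin"),
]
# (id, annotator_id, sentence_id, annotated_intent)
ANNOTATIONS = [
    (1, 1, 1, "intent1"), (2, 4, 1, "intent2"), (3, 2, 2, "intent4"),
    (4, 3, 4, "intent4"), (5, 1, 5, "intent2"), (6, 3, 3, "intent3"),
    (7, 5, 3, "intent2"), (8, 1, 6, "intent4"), (9, 4, 6, "intent1"),
    (10, 1, 7, "intent1"), (11, 4, 7, "intent3"), (12, 3, 9, "intent3"),
    (13, 2, 10, "intent3"), (14, 5, 10, "intent1"),
]

# Same shape as the proposal: one row per (editor, sentence) in edi,
# LEFT JOINed to a per-sentence admin count in adm, then summed per editor.
QUERY = """
SELECT edi.user_email,
       COUNT(*)                        AS total_tagged,
       COALESCE(SUM(adm.adm_count), 0) AS total_admin_corrected
FROM (
    SELECT ua.user_email, ans.sentence_id
    FROM annotated_sentences AS ans
    JOIN user_accounts AS ua ON ua.user_id = ans.annotator_id
    WHERE ua.user_role = 'editor'
    GROUP BY ua.user_email, ans.sentence_id
) AS edi
LEFT JOIN (
    SELECT ans.sentence_id, COUNT(*) AS adm_count
    FROM annotated_sentences AS ans
    JOIN user_accounts AS ua ON ua.user_id = ans.annotator_id
    WHERE ua.user_role = 'admin'
    GROUP BY ans.sentence_id
) AS adm ON adm.sentence_id = edi.sentence_id
GROUP BY edi.user_email
ORDER BY edi.user_email
"""


def error_report():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_accounts (user_id INTEGER PRIMARY KEY, user_email TEXT, user_role TEXT)"
    )
    conn.execute(
        "CREATE TABLE annotated_sentences (id INTEGER PRIMARY KEY, annotator_id INTEGER, "
        "sentence_id INTEGER, annotated_intent TEXT)"
    )
    conn.executemany("INSERT INTO user_accounts VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO annotated_sentences VALUES (?, ?, ?, ?)", ANNOTATIONS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in error_report():
        print(row)
```

On the sample data this prints one row per editor, matching the expected output table in the question.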
Because a sentence_id may be annotated by users with different roles, you can use a subquery (an INNER JOIN between user_accounts and annotated_sentences) with window functions and conditional aggregation to get the counts according to your logic.
If you don't want to see the admin counts, filter those rows out with a WHERE clause.
SELECT user_email,
count(Total_Tagged) Total_Tagged,
SUM(totalAdmin) totalAdmin
FROM (
SELECT ist.sentence_id,
user_email,
user_role,
count(CASE WHEN a.user_role = 'editor' THEN 1 END) over(partition by ist.sentence_id) + count(CASE WHEN a.user_role = 'admin' THEN 1 END) over(partition by ist.sentence_id) Total_Tagged,
count(CASE WHEN a.user_role = 'admin' THEN 1 END) over(partition by ist.sentence_id) totalAdmin
FROM user_accounts a
INNER JOIN annotated_sentences ats ON
a.user_id = ats.annotator_id
INNER JOIN incoming_sentences ist
ON ist.sentence_id = ats.sentence_id
) t1
WHERE user_role = 'editor'
GROUP BY user_email
ORDER BY user_email
Okay, I really rushed this, so there might still be an error in the code, but try something like this:
SELECT
a.user_email,
count(*) AS Total_Tagged,
coalesce(sum(innerTable.edits), 0) AS Total_Admin_Corrected
FROM
incoming_sentences ist
JOIN annotated_sentences ats ON
ist.sentence_id = ats.sentence_id
JOIN user_accounts a ON
a.user_id = ats.annotator_id
LEFT JOIN ( SELECT ics.sentence_id, count(anno.id) AS edits FROM annotated_sentences anno
LEFT JOIN user_accounts ua ON
ua.user_id = anno.annotator_id
LEFT JOIN incoming_sentences AS ics ON
ics.sentence_id = anno.sentence_id
WHERE ua.user_role = 'admin'
GROUP BY ics.sentence_id ) AS innerTable
ON innerTable.sentence_id = ist.sentence_id
WHERE a.user_role = 'editor'
GROUP BY a.user_email
The inner select counts how many admin edits there are per sentence; the outer one then sums that number over every sentence the editor annotated. The WHERE a.user_role = 'editor' filter keeps admins out of the report, and coalesce turns a missing admin count into 0.
If it is guaranteed that each sentence is annotated only once (by an editor) and reviewed at most once (by an admin), you can simply group by sentence to get its editor and admin, then group by editor and count.
select
editor,
count(*) as total_tagged,
count(admin) as total_admin_corrected
from
(
select
max(ua.user_email) filter (where ua.user_role = 'editor') as editor,
max(ua.user_email) filter (where ua.user_role = 'admin') as admin
from annotated_sentences ans
join user_accounts ua on ua.user_id = ans.annotator_id
group by ans.sentence_id
) with_editor_and_admin
group by editor
order by editor;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e409ec49af25ac8329a99b02161832fb
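The FILTER (WHERE ...) clause on aggregates is standard SQL, but not every engine supports it (MySQL and SQL Server lack it, for example). A portable way to write the same per-sentence pivot is MAX(CASE WHEN ... END). The sketch below is an illustration, not the answer's own code: it runs the CASE version through Python's sqlite3 module against the question's sample rows, with snake_case names as in the answer and an annotated_sentences table trimmed to the two columns the query needs.

```python
import sqlite3

USERS = [
    (1, "user1#gmail.com", "editor"),
    (2, "user2#gmail.com", "editor"),
    (3, "user3#gmail.com", "editor"),
    (4, "user4#gmail.com", "admin"),
    (5, "user5#gmail.com", "admin"),
]
# (annotator_id, sentence_id) for rows id = 1..14 of the Annotated Sentences
# table; the intent column is not needed for the counts.
ANNOTATIONS = [
    (1, 1), (4, 1), (2, 2), (3, 4), (1, 5), (3, 3), (5, 3),
    (1, 6), (4, 6), (1, 7), (4, 7), (3, 9), (2, 10), (5, 10),
]

# MAX(CASE ...) plays the role of MAX(...) FILTER (WHERE ...): per sentence it
# picks the editor's email and the admin's email (NULL if no admin reviewed it).
QUERY = """
SELECT editor,
       COUNT(*)     AS total_tagged,
       COUNT(admin) AS total_admin_corrected
FROM (
    SELECT MAX(CASE WHEN ua.user_role = 'editor' THEN ua.user_email END) AS editor,
           MAX(CASE WHEN ua.user_role = 'admin'  THEN ua.user_email END) AS admin
    FROM annotated_sentences AS ans
    JOIN user_accounts AS ua ON ua.user_id = ans.annotator_id
    GROUP BY ans.sentence_id
) AS with_editor_and_admin
GROUP BY editor
ORDER BY editor
"""


def editor_admin_report():
    """Run the CASE-based pivot on the sample rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_accounts (user_id INTEGER PRIMARY KEY, user_email TEXT, user_role TEXT)"
    )
    conn.execute("CREATE TABLE annotated_sentences (annotator_id INTEGER, sentence_id INTEGER)")
    conn.executemany("INSERT INTO user_accounts VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO annotated_sentences VALUES (?, ?)", ANNOTATIONS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in editor_admin_report():
        print(row)
```

COUNT(admin) skips the NULLs produced for sentences no admin touched, which is what makes it count only corrected sentences.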
Related
I have a list of total store visits for a customer for a month. The customer has a home store but can visit other stores. Like the table below:
|MemberId|HomeStoreId|VisitedStoreId|Month|Visits|
|--------|-----------|--------------|-----|------|
|1|5|5|1|5|
|1|5|3|1|2|
|1|5|2|1|1|
|1|5|4|1|7|
I want my SELECT statement to show, next to each store, the number of visits to the member's home store in that month, like below:
|MemberId|HomeStoreId|VisitedStoreId|Month|Visits|HomeStoreVisits|
|--------|-----------|--------------|-----|------|---------------|
|1|5|5|1|5|5|
|1|5|3|1|2|5|
|1|5|2|1|1|5|
|1|5|4|1|7|5|
I've looked at a SUM with CASE statements inside and OVER with PARTITION but I can't seem to work it out.
Thanks
I would use window functions:
select t.*,
sum(case when homestoreid = visitedstoreid then visits end) over
(partition by memberid, month) as homestorevisits
from t;
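A minimal sketch of this window-function approach, run through Python's sqlite3 module on the sample rows from the question. The table name store_visits and the snake_case column names are assumptions (the question does not name its table), and SQLite needs version 3.25 or newer for window functions.

```python
import sqlite3

# Rows from the question: (member_id, home_store_id, visited_store_id, month, visits).
VISITS = [
    (1, 5, 5, 1, 5),
    (1, 5, 3, 1, 2),
    (1, 5, 2, 1, 1),
    (1, 5, 4, 1, 7),
]

# The CASE keeps only the home-store row's visits (other rows yield NULL,
# which SUM ignores); OVER (PARTITION BY ...) repeats that total on every
# row of the same member and month.
QUERY = """
SELECT member_id, home_store_id, visited_store_id, month, visits,
       SUM(CASE WHEN home_store_id = visited_store_id THEN visits END)
           OVER (PARTITION BY member_id, month) AS home_store_visits
FROM store_visits
ORDER BY member_id, month, visited_store_id
"""


def with_home_store_visits(rows):
    """Return each input row with a home_store_visits column appended."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE store_visits (member_id INTEGER, home_store_id INTEGER, "
        "visited_store_id INTEGER, month INTEGER, visits INTEGER)"
    )
    conn.executemany("INSERT INTO store_visits VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in with_home_store_visits(VISITS):
        print(row)
```

A second member with a different home store gets its own total, because the window is partitioned by member and month.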
SELECT t1.MemberID, t1.HomeStoreID, t1.VisitedStoreID, t1.Month, t1.Visits, h.HomeStoreVisits
FROM Table t1 LEFT OUTER JOIN
(SELECT MemberID, Month, Visits AS HomeStoreVisits
FROM Table WHERE HomeStoreID = VisitedStoreID
) h ON h.MemberID = t1.MemberID AND h.Month = t1.Month
You can achieve this using a simple subquery.
SELECT MemberId, HomeStoreID, VisitedStoreID, Month, Visits,
(SELECT Visits FROM table t2
WHERE t2.MemberId = t1.MemberId
AND t2.HomeStoreId = t1.HomeStoreId
AND t2.Month = t1.Month
AND t2.VisitedStoreId = t2.HomeStoreId) AS HomeStoreVisits
FROM table t1
I have 2 tables named user and statistics
The user table has 3 columns: id, name and category.
The statistics table has 3 columns: id, idUser (a reference to user.id) and cal.
For example:
user
|Id|name|category|
|---|-----|--------|
|1|name1|1|
|2|name2|2|
|3|name3|3|
statistics
|Id|idUser|cal|
|---|------|---|
|1|1|1|
|2|1|1|
|3|1|1|
|4|2|1|
|5|2|1|
How can I write a query that sums the cal column per user category and gives me something like this:
|category|totalcal|
|--------|--------|
|1|3|
|2|2|
|3|0|
You want to do a left join to keep all the categories. The rest is just aggregation:
select u.category, coalesce(sum(s.cal), 0) as totalcal
from users u left join
statistics s
on u.id = s.idUser
group by u.category;
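A small sketch that reproduces this with Python's sqlite3 module and the sample rows from the question. It names the first table users rather than user (user is a reserved word in some databases, such as PostgreSQL); that rename is an assumption. The use_coalesce flag shows what COALESCE changes for category 3.

```python
import sqlite3

# Sample rows from the question: users (id, name, category), statistics (id, idUser, cal).
USERS = [(1, "name1", 1), (2, "name2", 2), (3, "name3", 3)]
STATISTICS = [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1), (5, 2, 1)]


def totals_per_category(use_coalesce=True):
    """Sum cal per category; the LEFT JOIN keeps categories with no statistics rows."""
    # Only two fixed SQL fragments are ever interpolated here, never user input.
    total = "COALESCE(SUM(s.cal), 0)" if use_coalesce else "SUM(s.cal)"
    query = f"""
        SELECT u.category, {total} AS totalcal
        FROM users AS u
        LEFT JOIN statistics AS s ON s.idUser = u.id
        GROUP BY u.category
        ORDER BY u.category
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, category INTEGER)")
    conn.execute("CREATE TABLE statistics (id INTEGER PRIMARY KEY, idUser INTEGER, cal INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO statistics VALUES (?, ?, ?)", STATISTICS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(totals_per_category())
    print(totals_per_category(use_coalesce=False))
```

Without COALESCE, category 3 comes back with NULL (None in Python), because SUM over only NULL-extended rows is NULL.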
Use a LEFT JOIN so that category 3 is still included:
SELECT
user.category
,SUM(statistics.cal) AS totalcal
FROM
user
LEFT JOIN statistics ON statistics.idUser = user.Id
GROUP BY
user.category
Here SUM would return NULL for category=3. To get 0 instead of NULL you can use COALESCE(SUM(statistics.cal), 0).
I'm facing a problem with Postgres. Here is the example:
I have 3 tables: users, items and boxes.
boxes table:
|user_id|item_id|
|-------|-------|
|1|3|
|1|4|
|1|6|
|1|7|
|2|5|
|2|10|
|2|11|
|3|5|
|3|6|
|3|7|
Given this boxes table, I would like to retrieve the items shared by users who have at least 2 items in common. So the expected result of the SQL query is
item_id: 6, 7
because user 1 and user 3 share items 6 and 7.
Users 2 and 3, however, share only one item (item 5), so item 5 is not in the result.
I've tried many approaches without success. Can anyone help?
Try this. It returns 6 and 7 (and 5,6,7 if you add a record "1,5"), but I haven't tested it extensively.
-- The Outer query gets all the item_ids matching the user_ids returned from the subquery
SELECT DISTINCT c.item_id FROM boxes c -- need DISTINCT because we get 1,3 and 3,1...
INNER JOIN boxes d ON c.item_id = d.item_id
INNER JOIN
--- the subquery gets all the combinations of user ids which have more than one shared item_id
(SELECT a.user_id as first_user,b.user_id as second_user FROM
boxes a
INNER JOIN boxes b ON a.item_id = b.item_id AND a.user_id <> b.user_id -- don't pair a user with themselves, otherwise every user would "share" all of their own items
GROUP BY a.user_id,b.user_id
HAVING count(*) > 1) s
ON s.first_user = c.user_id AND s.second_user = d.user_id
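To check the claim above (6 and 7, or 5, 6 and 7 once a row "1,5" is added), here is a sketch that runs the same self-join query through Python's sqlite3 module on the sample boxes rows. Only the layout changes, plus an ORDER BY so the result is deterministic.

```python
import sqlite3

# (user_id, item_id) rows from the question's boxes table.
BOXES = [(1, 3), (1, 4), (1, 6), (1, 7), (2, 5), (2, 10), (2, 11), (3, 5), (3, 6), (3, 7)]

QUERY = """
SELECT DISTINCT c.item_id
FROM boxes AS c
JOIN boxes AS d ON c.item_id = d.item_id
JOIN (
    -- ordered pairs of different users that share more than one item
    SELECT a.user_id AS first_user, b.user_id AS second_user
    FROM boxes AS a
    JOIN boxes AS b ON a.item_id = b.item_id AND a.user_id <> b.user_id
    GROUP BY a.user_id, b.user_id
    HAVING COUNT(*) > 1
) AS s ON s.first_user = c.user_id AND s.second_user = d.user_id
ORDER BY c.item_id
"""


def shared_items(rows):
    """Return the item_ids shared by any pair of users with 2+ common items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE boxes (user_id INTEGER, item_id INTEGER)")
    conn.executemany("INSERT INTO boxes VALUES (?, ?)", rows)
    items = [item for (item,) in conn.execute(QUERY)]
    conn.close()
    return items


if __name__ == "__main__":
    print(shared_items(BOXES))
    print(shared_items(BOXES + [(1, 5)]))
```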
I want to display all available users (user type: Employee) on a given schedule date. A user is not available if they are scheduled for both the AM and the PM slot on that day.
Here are my tables:
User Types
|TypeID|TypeName|
|------|--------|
|1|Admin|
|2|Employee|
Users
|UserID|TypeID|Name|
|------|------|----------|
|1|1|Admin 1|
|2|2|Employee 1|
|3|2|Employee 2|
|4|1|Admin 2|
|5|2|Employee 3|
|6|2|Employee 4|
|7|2|Employee 5|
Schedule
|SchedID|UserID|SchedDate|Day (PM/AM)|
|-------|------|---------|-----------|
|1|2|8/27/2013|PM|
|2|2|8/27/2013|AM|
|3|3|8/27/2013|AM|
|4|5|8/27/2013|PM|
|5|6|8/27/2013|AM|
Expected Result (WHERE SchedDate='8/27/2013')
|UserID|Name|
|------|----------|
|3|Employee 2|
|5|Employee 3|
|6|Employee 4|
|7|Employee 5|
This is my current SQL statement:
SELECT Users.UserID, Users.Name FROM Users LEFT OUTER JOIN
Schedule ON Schedule.UserID = Users.UserID WHERE Users.TypeID = 5
Let's phrase this a little differently. A user is unavailable if they have both AM and PM scheduled (in the Day column) on that date. Otherwise, the user is available.
Given that there are only two values in that column, the following query does the filtering you want:
SELECT u.UserID, u.Name
FROM Users u LEFT OUTER JOIN
Schedule s
ON s.UserID = u.UserID and
s.SchedDate = '2013-08-27'
WHERE u.TypeID = 2
GROUP BY u.UserID, u.Name
HAVING COUNT(distinct s.day) < 2;
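If you know the values are never repeated for a user on a given date, you can change the HAVING clause to:
HAVING COUNT(*) < 2;
This is a bit of a trick. When a user has no match in the schedule table at all, the count is 0 in the first case and 1 in the second (COUNT(*) still counts the single NULL-extended row from the LEFT JOIN); both are below 2, so the user is still returned as available.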
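A sketch of the GROUP BY / HAVING approach on the sample rows, using Python's sqlite3 module. Assumptions: snake_case column names, dates stored as ISO strings ('2013-08-27'), and Employee being type_id 2 as in the User Types table. The date condition sits in the ON clause rather than in WHERE, so employees with no schedule rows that day survive the LEFT JOIN.

```python
import sqlite3

# (user_id, type_id, name); type 1 = Admin, type 2 = Employee.
USERS = [
    (1, 1, "Admin 1"), (2, 2, "Employee 1"), (3, 2, "Employee 2"),
    (4, 1, "Admin 2"), (5, 2, "Employee 3"), (6, 2, "Employee 4"),
    (7, 2, "Employee 5"),
]
# (sched_id, user_id, sched_date, day), dates stored as ISO strings.
SCHEDULE = [
    (1, 2, "2013-08-27", "PM"), (2, 2, "2013-08-27", "AM"),
    (3, 3, "2013-08-27", "AM"), (4, 5, "2013-08-27", "PM"),
    (5, 6, "2013-08-27", "AM"),
]

QUERY = """
SELECT u.user_id, u.name
FROM users AS u
LEFT JOIN schedule AS s
       ON s.user_id = u.user_id
      AND s.sched_date = ?
WHERE u.type_id = 2
GROUP BY u.user_id, u.name
HAVING COUNT(DISTINCT s.day) < 2   -- fewer than both AM and PM booked
ORDER BY u.user_id
"""


def available_employees(date):
    """Return (user_id, name) of employees not booked for both slots on date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, type_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE schedule (sched_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "sched_date TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO schedule VALUES (?, ?, ?, ?)", SCHEDULE)
    rows = conn.execute(QUERY, (date,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(available_employees("2013-08-27"))
```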
SELECT USERS.USERID,
USERS.NAME
FROM USERS
WHERE USERS.TYPEID = 2
AND (NOT EXISTS (SELECT SCHEDID
FROM SCHEDULE
WHERE SCHEDULE.USERID = USERS.USERID
AND SCHEDULE.SCHEDDATE = '2013-08-27'
AND DAY = 'AM')
OR NOT EXISTS (SELECT SCHEDID
FROM SCHEDULE
WHERE SCHEDULE.USERID = USERS.USERID
AND SCHEDULE.SCHEDDATE = '2013-08-27'
AND DAY = 'PM'))
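The NOT EXISTS idea can be checked on the sample data as well. In this sketch a user is excluded only when both an AM and a PM slot exist on the date, so the two NOT EXISTS tests are combined with OR. It assumes Python's sqlite3 module, snake_case column names, ISO date strings and Employee = type_id 2 (from the User Types table).

```python
import sqlite3

# (user_id, type_id, name); type 1 = Admin, type 2 = Employee.
USERS = [
    (1, 1, "Admin 1"), (2, 2, "Employee 1"), (3, 2, "Employee 2"),
    (4, 1, "Admin 2"), (5, 2, "Employee 3"), (6, 2, "Employee 4"),
    (7, 2, "Employee 5"),
]
# (sched_id, user_id, sched_date, day), dates stored as ISO strings.
SCHEDULE = [
    (1, 2, "2013-08-27", "PM"), (2, 2, "2013-08-27", "AM"),
    (3, 3, "2013-08-27", "AM"), (4, 5, "2013-08-27", "PM"),
    (5, 6, "2013-08-27", "AM"),
]

# Keep an employee when the AM slot OR the PM slot is free on :d,
# i.e. NOT (has AM AND has PM).
QUERY = """
SELECT u.user_id, u.name
FROM users AS u
WHERE u.type_id = 2
  AND (
        NOT EXISTS (SELECT 1 FROM schedule AS s
                    WHERE s.user_id = u.user_id
                      AND s.sched_date = :d AND s.day = 'AM')
     OR NOT EXISTS (SELECT 1 FROM schedule AS s
                    WHERE s.user_id = u.user_id
                      AND s.sched_date = :d AND s.day = 'PM')
  )
ORDER BY u.user_id
"""


def available_employees(date):
    """Return (user_id, name) of employees with at least one free slot on date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, type_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE schedule (sched_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "sched_date TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO schedule VALUES (?, ?, ?, ?)", SCHEDULE)
    rows = conn.execute(QUERY, {"d": date}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(available_employees("2013-08-27"))
```

Joining the two tests with AND instead would also drop employees booked for only one slot (users 3, 5 and 6 here), which is not what the question asks for.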
Sample tables (many-to-many: a user has many tickers and a ticker has many users):
#users
|id|relevance|
|---|---------|
|1|10|
|2|6|
|3|8|
|4|3|
|5|5|
#users_tickers
|user_id|ticker_id|
|-------|---------|
|1|2|
|1|3|
|2|4|
|2|1|
|2|3|
|3|2|
|4|2|
...
I need to select the user with the highest relevance for each ticker, i.e. exactly one user per ticker.
How would you do that?
Something like this should do:
SELECT ut.ticker_id, u.id, u.relevance FROM users u
INNER JOIN users_tickers ut ON ut.user_id=u.id
WHERE NOT EXISTS(
SELECT 1 FROM users u1
INNER JOIN users_tickers ut1 ON ut1.user_id=u1.id
WHERE ut1.ticker_id=ut.ticker_id AND u1.relevance > u.relevance
)
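A sketch that runs this NOT EXISTS query through Python's sqlite3 module on the sample rows shown in the question (the rows hidden behind "..." are not included). The function name best_users and the returned column order are illustrative choices. If two users tie on the top relevance for a ticker, this pattern returns both.

```python
import sqlite3

# Sample rows from the question: users (id, relevance), users_tickers (user_id, ticker_id).
USERS = [(1, 10), (2, 6), (3, 8), (4, 3), (5, 5)]
LINKS = [(1, 2), (1, 3), (2, 4), (2, 1), (2, 3), (3, 2), (4, 2)]

# Keep a (user, ticker) row only if no other user on the same ticker
# has a strictly higher relevance.
QUERY = """
SELECT ut.ticker_id, u.id, u.relevance
FROM users AS u
JOIN users_tickers AS ut ON ut.user_id = u.id
WHERE NOT EXISTS (
    SELECT 1
    FROM users AS u1
    JOIN users_tickers AS ut1 ON ut1.user_id = u1.id
    WHERE ut1.ticker_id = ut.ticker_id
      AND u1.relevance > u.relevance
)
ORDER BY ut.ticker_id, u.id
"""


def best_users(users, links):
    """Return (ticker_id, user_id, relevance) for the top user(s) of each ticker."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, relevance INTEGER)")
    conn.execute("CREATE TABLE users_tickers (user_id INTEGER, ticker_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO users_tickers VALUES (?, ?)", links)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(best_users(USERS, LINKS))
```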
Here is how I would do this in T-SQL. I am not familiar with MySQL's dialect.
SELECT * FROM users AS U
INNER JOIN users_tickers t ON t.user_id = U.id
WHERE U.relevance = (
SELECT MAX(usub.relevance) FROM users AS usub
INNER JOIN users_tickers tsub ON tsub.user_id = usub.id
WHERE tsub.ticker_id = t.ticker_id
)
ORDER BY U.relevance DESC
Produces (on the sample rows shown):
id relevance user_id ticker_id
1 10 1 2
1 10 1 3
2 6 2 4
2 6 2 1
I think you can do this:
SELECT *,
(SELECT U2.id FROM users as U2, users_tickers as UT2
WHERE UT2.ticker_id = T1.ticker_id
AND UT2.user_id = U2.id
ORDER BY U2.relevance DESC
LIMIT 1) AS best_user_id
FROM tickers as T1
I don't know if this is good code, but I think this is what you're looking for. Last time I did SQL, everybody was saying joins are slow and to use subqueries instead. I don't know if that still applies.
EDIT: I tested it in MySQL with the values you gave (and an added tickers table) and it works.
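EDIT: Note that this is a scalar subquery, so it can only return a single user per ticker; changing LIMIT 1 to LIMIT n makes MySQL raise a "Subquery returns more than 1 row" error whenever a ticker has more than one user.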
SELECT * FROM users_tickers LEFT JOIN users ON users.id = users_tickers.user_id WHERE ticker_id = 2 ORDER BY relevance DESC LIMIT 1;
This gives you the information you requested for a single ticker. Is this what you are asking for, or do you want a complete list of tickers, each with its most relevant user?
I'm looking for a fast SQL query. I did it like this...
SELECT *
FROM (
SELECT id, relevance, ticker_id
FROM `users`
INNER JOIN users_tickers ON users_tickers.user_id=users.id
ORDER BY users.relevance DESC, users.created_at DESC
) AS X
GROUP BY ticker_id
ORDER BY relevance DESC
LIMIT 0,10
Still far from optimal :).
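Grouping by ticker_id after an ORDER BY in a derived table relies on MySQL picking the first row of each group, which is not guaranteed (and the query is rejected when ONLY_FULL_GROUP_BY is enabled). On engines with window functions (MySQL 8.0+, PostgreSQL, SQLite 3.25+), ROW_NUMBER() is a deterministic alternative; this is a swapped-in technique, not the query above. The sketch runs it through Python's sqlite3 on the sample rows; since those rows have no created_at column, ties are broken by id instead, which is an assumption.

```python
import sqlite3

# Sample rows from the question: users (id, relevance), users_tickers (user_id, ticker_id).
USERS = [(1, 10), (2, 6), (3, 8), (4, 3), (5, 5)]
LINKS = [(1, 2), (1, 3), (2, 4), (2, 1), (2, 3), (3, 2), (4, 2)]

# Number the users of each ticker by relevance (highest first) and keep rank 1,
# so every ticker yields exactly one row even when relevances tie.
QUERY = """
SELECT ticker_id, user_id, relevance
FROM (
    SELECT ut.ticker_id,
           u.id AS user_id,
           u.relevance,
           ROW_NUMBER() OVER (
               PARTITION BY ut.ticker_id
               ORDER BY u.relevance DESC, u.id   -- tie-break by id (assumption)
           ) AS rn
    FROM users AS u
    JOIN users_tickers AS ut ON ut.user_id = u.id
) AS ranked
WHERE rn = 1
ORDER BY relevance DESC, ticker_id
LIMIT ?
"""


def top_users(users, links, limit=10):
    """Return (ticker_id, user_id, relevance): one best user per ticker."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, relevance INTEGER)")
    conn.execute("CREATE TABLE users_tickers (user_id INTEGER, ticker_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO users_tickers VALUES (?, ?)", links)
    rows = conn.execute(QUERY, (limit,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_users(USERS, LINKS))
```

Unlike the NOT EXISTS pattern, a tie for the top relevance still produces a single row per ticker here, decided by the ORDER BY inside OVER (...).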