Left Join with Group By - sql

I am using PostgreSQL 9.4.
I have a table of workouts. Users can create multiple results for each workout, and a result has a score.
Given a list of workout_ids and two user_ids, I want to return the best score for each workout for each user. If the user does not have a result for that workout, I want to return a padded/null result.
SELECT "results".*, "workouts".*
FROM "results" LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
WHERE (
(user_id, workout_id, score) IN
(SELECT user_id, workout_id, MAX(score)
FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3)
GROUP BY user_id, workout_id)
)
In this query, the left join is acting as an inner join; I'm not getting any padding if the user has not got a result for the workout. This query should always return six rows, regardless of how many results exist.
Example data:
results
user_id | workout_id | score
-----------------------------
1 | 1 | 10
1 | 3 | 10
1 | 3 | 15
2 | 1 | 5
Desired result:
results.user_id | results.workout_id | max(results.score) | workouts.name
-------------------------------------------------------------------------
1 | 1 | 10 | Squat
1 | 2 | null | Bench
1 | 3 | 15 | Deadlift
2 | 1 | 5 | Squat
2 | 2 | null | Bench
2 | 3 | null | Deadlift

The WHERE clause filters out your NULL values, which is why the result is not what you expect.
Join to the subquery's results instead of filtering on them in the WHERE clause.
SELECT "results".*, "workouts".*,"max_score".*
FROM "results"
LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
LEFT JOIN (SELECT user_id, workout_id, MAX(score)
FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3)
GROUP BY user_id, workout_id) max_score ON "workouts"."id" = max_score.workout_id;
You need to alter the SELECT to get the correct columns.
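To see why no padding appears, it can help to run the question's query against the sample data. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL (an assumption made so the example is self-contained); the workout names are taken from the desired result. Because the query selects FROM results, a (user, workout) pair without a result row has nothing to be joined from, so only three rows come back.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workouts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE results (user_id INTEGER, workout_id INTEGER, score INTEGER);
INSERT INTO workouts VALUES (1, 'Squat'), (2, 'Bench'), (3, 'Deadlift');
INSERT INTO results VALUES (1, 1, 10), (1, 3, 10), (1, 3, 15), (2, 1, 5);
""")

# The question's query, with explicit columns instead of "*".
question_rows = conn.execute("""
SELECT r.user_id, r.workout_id, r.score, w.name
FROM results r
LEFT JOIN workouts w ON w.id = r.workout_id
WHERE (r.user_id, r.workout_id, r.score) IN (
    SELECT user_id, workout_id, MAX(score)
    FROM results
    WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3)
    GROUP BY user_id, workout_id)
ORDER BY r.user_id, r.workout_id
""").fetchall()

print(question_rows)  # one row per existing best result, no NULL padding
```

The left side of the join is results, so the join can only pad missing workouts for a result, never missing results for a workout.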

SELECT DISTINCT ON (1, 2)
u.user_id
, w.id AS workout_id
, r.score
, w.name AS workout_name
FROM workouts w
CROSS JOIN (VALUES (1), (2)) u(user_id)
LEFT JOIN results r ON r.workout_id = w.id
AND r.user_id = u.user_id
WHERE w.id IN (1, 2, 3)
ORDER BY 1, 2, r.score DESC NULLS LAST;
Step by step explanation
Form a complete Cartesian product of given workouts and users.
Assuming the given workouts always exist.
Assuming that not all given users have results for all given workouts.
LEFT JOIN to results. All conditions go into the ON clause of the LEFT JOIN, not into the WHERE clause, which would exclude (workout_id, user_id) combinations that have no result. See:
Rails includes query with conditions not returning all results from left table
Finally pick the best result per (user_id, workout_id) with DISTINCT ON. While at it, produce the desired sort order. See:
Select first row in each GROUP BY group?
Depending on the size of tables and data distribution there may be faster solutions. See:
Optimize GROUP BY query to retrieve latest row per user
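The steps above can be tried out with a small, self-contained sketch. It runs on Python's built-in sqlite3 module rather than PostgreSQL (an assumption for portability), and because SQLite has no DISTINCT ON, it swaps in GROUP BY with MAX() to pick the best score. The CTE named u is a hypothetical stand-in for the VALUES list of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workouts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE results (user_id INTEGER, workout_id INTEGER, score INTEGER);
INSERT INTO workouts VALUES (1, 'Squat'), (2, 'Bench'), (3, 'Deadlift');
INSERT INTO results VALUES (1, 1, 10), (1, 3, 10), (1, 3, 15), (2, 1, 5);
""")

padded_rows = conn.execute("""
WITH u(user_id) AS (VALUES (1), (2))          -- the given users
SELECT u.user_id, w.id AS workout_id, MAX(r.score) AS score, w.name
FROM workouts w
CROSS JOIN u                                   -- every (workout, user) pair
LEFT JOIN results r ON r.workout_id = w.id     -- conditions stay in ON,
                   AND r.user_id = u.user_id   -- so unmatched pairs survive
WHERE w.id IN (1, 2, 3)
GROUP BY u.user_id, w.id, w.name
ORDER BY u.user_id, w.id
""").fetchall()

for row in padded_rows:
    print(row)
```

With the sample data this yields six rows, with None (NULL) as the score for every pair that has no result.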
Simple version
If all you want is the maximum score for each (user_id, workout_id) combination, there is a simple version:
SELECT user_id, workout_id, max(r.score) AS score
FROM unnest('{1,2}'::int[]) u(user_id)
CROSS JOIN unnest('{1,2,3}'::int[]) w(workout_id)
LEFT JOIN results r USING (user_id, workout_id)
GROUP BY 1, 2
ORDER BY 1, 2;
db<>fiddle here
Old sqlfiddle.

How about using distinct on or row_number()?
SELECT DISTINCT ON (r.user_id, r.workout_id) r.*, w.*
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
ORDER BY r.user_id, r.workout_id, score desc;
The row_number() equivalent requires a subquery:
SELECT rw.*
FROM (SELECT r.*, w.*,
row_number() over (partition by user_id, workout_id order by score desc) as seqnum
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
) rw
WHERE seqnum = 1;
You should choose the columns more judiciously than using a *. The subquery might return errors in the case of duplicate column names.
EDIT:
You need to generate the rows first, and then the results for each. Here is one method, building on the second query:
SELECT u.user_id, w.workout_id, rw.score, rw.name
FROM (SELECT 1 as user_id UNION ALL SELECT 2) u CROSS JOIN
(SELECT 1 as workout_id UNION ALL SELECT 2 UNION ALL SELECT 3) w LEFT JOIN
(SELECT r.*, w.*,
row_number() over (partition by user_id, workout_id order by score desc) as seqnum
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
) rw
ON rw.user_id = u.user_id and rw.workout_id = w.workout_id and
rw.seqnum = 1;

Related

Why count ignores grouping by

I don't understand why my query doesn't group results of count by the column I specified. Instead it counts all occurrences of outcome_id in the 'un' subtable.
What am I missing there?
The full structure of my sample database and the query I tried are here:
https://www.db-fiddle.com/f/4HuLpTFWaE2yBSQSzf3dX4/4
CREATE TABLE combination (
combination_id integer,
ticket_id integer,
outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer,
ticket_id integer,
val double precision
);
insert into combination
values
(510,188,'{52,70,10}'),
(511,188,'{52,56,70,18,10}'),
(512,188,'{55,70,18,10}'),
(513,188,'{54,71,18,10}'),
(514,189,'{52,54,71,18,10}'),
(515,189,'{55,71,18,10,54,56}')
;
insert into outcome
values
(52,188,1.3),
(70,188,2.1),
(18,188,2.6),
(56,188,2),
(55,188,1.1),
(54,188,2.2),
(71,188,3),
(10,188,0.5),
(54,189,2.2),
(71,189,3),
(18,189,2.6),
(55,189,2)
;
with un AS (
SELECT combination_id, unnest(outcomes) outcome
FROM combination c JOIN
outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN
outcome o
on o.outcome_id = un.outcome
GROUP BY 1
) x
GROUP BY 1, 2
ORDER BY 1
Expected result should be:
510 2
511 4
512 2
513 3
514 4
515 4
Assuming you have these PK constraints:
CREATE TABLE combination (
combination_id integer PRIMARY KEY
, ticket_id integer
, outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer
, ticket_id integer
, val double precision
, PRIMARY KEY (ticket_id, outcome_id)
);
and assuming this objective:
For each row in table combination, count the number of array elements in outcomes for which there is at least one row with matching outcome_id and ticket_id in table outcome - and val >= 1.3.
Assuming the above PK, this boils down to a much simpler query:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
JOIN outcome o USING (ticket_id)
WHERE o.outcome_id = ANY (c.outcomes)
AND o.val >= 1.3
GROUP BY 1
ORDER BY 1;
This alternative might be faster with index support:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
CROSS JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
WHERE EXISTS (
SELECT
FROM outcome o
WHERE o.outcome_id = u.outcome_id
AND o.val >= 1.3
AND o.ticket_id = c.ticket_id -- ??
)
GROUP BY 1
ORDER BY 1;
Plus, it does not require the PK on outcome. Any number of matching rows still count as 1, due to EXISTS.
db<>fiddle here
As always, the best answer depends on the exact definition of setup and requirements.
A simpler version of @forpas's answer:
-- You don't need to join to outcomes in the "with" statement.
with un AS (
SELECT combination_id, ticket_id, unnest(outcomes) outcome
FROM combination c
-- no need to join to outcomes here
GROUP BY 1,2,3
)
SELECT combination_id, cnt FROM
(
SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un
JOIN outcome o on o.outcome_id = un.outcome
and o.ticket_id = un.ticket_id
GROUP BY 1
)x
GROUP BY 1,2
ORDER BY 1
As others have pointed out, the expected result for 514 should be 3 based on your input data.
I'd also like to suggest that using full field names in the group by and order by clauses makes queries easier to debug and maintain going forward.
You need to join on ticket_id also:
with un AS (
SELECT c.combination_id, c.ticket_id, unnest(c.outcomes) outcome
FROM combination c JOIN outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2,3
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id, un.ticket_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN outcome o
on o.outcome_id = un.outcome and o.ticket_id = un.ticket_id
GROUP BY 1,2
) x
GROUP BY 1, 2
ORDER BY 1
See the demo.
Results:
> combination_id | cnt
> -------------: | --:
> 510 | 2
> 511 | 4
> 512 | 2
> 513 | 3
> 514 | 3
> 515 | 4
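The effect of the missing ticket_id condition can be reproduced with a short sketch. SQLite has no array type, so the example assumes the outcomes arrays are flattened into a hypothetical combination_outcome table (one row per array element); everything else mirrors the sample data from the question.

```python
import sqlite3

combinations = [
    (510, 188, [52, 70, 10]),
    (511, 188, [52, 56, 70, 18, 10]),
    (512, 188, [55, 70, 18, 10]),
    (513, 188, [54, 71, 18, 10]),
    (514, 189, [52, 54, 71, 18, 10]),
    (515, 189, [55, 71, 18, 10, 54, 56]),
]
outcomes = [
    (52, 188, 1.3), (70, 188, 2.1), (18, 188, 2.6), (56, 188, 2.0),
    (55, 188, 1.1), (54, 188, 2.2), (71, 188, 3.0), (10, 188, 0.5),
    (54, 189, 2.2), (71, 189, 3.0), (18, 189, 2.6), (55, 189, 2.0),
]

conn = sqlite3.connect(":memory:")
# combination_outcome is a hypothetical flattened form of combination.outcomes.
conn.execute("CREATE TABLE combination_outcome "
             "(combination_id INTEGER, ticket_id INTEGER, outcome_id INTEGER)")
conn.execute("CREATE TABLE outcome "
             "(outcome_id INTEGER, ticket_id INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO combination_outcome VALUES (?, ?, ?)",
    [(cid, tid, oid) for cid, tid, oids in combinations for oid in oids],
)
conn.executemany("INSERT INTO outcome VALUES (?, ?, ?)", outcomes)


def count_per_combination(join_on_ticket):
    """Count outcomes with val >= 1.3 per combination."""
    extra = "AND o.ticket_id = co.ticket_id" if join_on_ticket else ""
    sql = f"""
    SELECT co.combination_id,
           COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) AS cnt
    FROM combination_outcome co
    JOIN outcome o ON o.outcome_id = co.outcome_id {extra}
    GROUP BY co.combination_id
    ORDER BY co.combination_id
    """
    return conn.execute(sql).fetchall()


with_ticket = count_per_combination(True)
without_ticket = dict(count_per_combination(False))
print(with_ticket)
print(without_ticket[511])  # inflated: outcome 18 matches both tickets
```

Joining on outcome_id alone lets outcome 18 match its rows for both tickets 188 and 189, which is why combination 511 is counted too high without the ticket_id condition.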

Return most results for a match based on a preferential order of keywords

I have built a program to index keywords in text files and store them in a database.
My tables are simple:
FILE_ID|Name
------------
1 | a.txt
2 | b.txt
3 | c.txt
KEYWORD_ID|FILE_ID|Hits
-----------------------
1 | 1 | 55
2 | 1 | 10
3 | 1 | 88
1 | 2 | 44
2 | 2 | 15
1 | 3 | 199
2 | 3 | 1
3 | 3 | 4
There is no primary key in this table. I didn't find it necessary.
Now I'd like to search which file has most hits to certain keywords.
If I have only one keyword it is easy:
select top 10 *
from words
where keyword_id=1
order by hits desc
Let's say I want to search for files with keywords 1 and 3 (both must be present, and the first keyword has the highest importance). After many hours I came up with this:
select top 10 k.*
from
(
select file_id,
max(hits) as maxhits
from words
where keyword_id=3
group by file_id
) as x
inner join keyword as k
on (k.file_id = x.file_id
and k.keyword=1)
order by k.hits desc
How can I make this right, especially if I want to search with N keywords? Would it be better to use a temp table and work with that?
If searching with keyword 1 and 3 I want FILE_ID 3 and 1 returned, in this order (because file_id 3 has higher hit count for keyword 1)
Not sure, but (based on your comment) maybe this is what you need?
(I used the table declaration from @scsimon's answer)
declare @words table (KEYWORD_ID int, [FILE_ID] int, HITS int)
insert into @words
values
(1,1,55),
(2,1,10),
(3,1,88),
(1,2,44),
(2,2,15),
(1,3,199),
(2,3,1),
(3,3,4)
select [FILE_ID] from (
select *, row_number() over(partition by KEYWORD_ID order by HITS desc) rn from @words
where KEYWORD_ID in(1,3)
)t
where rn = 1
order by hits desc
Assuming that all relevant keywords to be found are stored in a table KTable with two columns, ID and KEYWORD_ID,
the query should be:
SELECT
FileID,
SUM(Hits) NetHits,
SUM(Hits/K.ID) WeightedHits
FROM
Words w JOIN Ktable K
on w.KEYWORD_ID= K.KEYWORD_ID
GROUP BY FileID
HAVING count(1) = (SELECT COUNT(1) FROM Ktable )
ORDER BY 2 DESC,3 DESC
The same query using window functions would be:
SELECT
DISTINCT
FileID,
NetHitsPerFile
FROM
(
SELECT
FileID,
SUM(Hits) OVER (PARTITION BY FileID ORDER BY K.ID ASC) NetHitsPerFile,
SUM(FileID) OVER(PARTITION BY K.ID) Files,
SUM(Hits/K.ID) OVER (PARTITION BY FileID ORDER BY K.ID ASC) weightedHits
FROM
Words w JOIN Ktable K
on w.KEYWORD_ID= K.KEYWORD_ID
)T
WHERE Files= (SELECT COUNT(1) FROM Ktable)
ORDER BY NetHitsPerFile, weightedHits
Here's one way. If you only want to see the rows with the KEYWORD_ID values you specify, add that WHERE clause at the bottom as well. The INNER JOIN limits the FILE_ID values to those that contain every KEYWORD_ID you specify, by checking that the distinct count equals the number of keywords. In the example below, we limit the result set to 2 KEYWORD_ID values and use the HAVING clause to make sure each FILE_ID has 2 distinct KEYWORD_ID values associated with it.
declare @words table (KEYWORD_ID int, [FILE_ID] int, HITS int)
insert into @words
values
(1,1,55),
(2,1,10),
(3,1,88),
(1,2,44),
(2,2,15),
(1,3,199),
(2,3,1),
(3,3,4)
select top 10 w.*
from @words w
inner join
(select [FILE_ID]
from @words
where KEYWORD_ID in (1,3)
group by [FILE_ID]
having count(distinct KEYWORD_ID) = 2
) x on x.[FILE_ID] = w.[FILE_ID]
order by HITS desc
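As a rough illustration of combining the "all keywords present" check with ordering by the most important keyword, here is a sketch using Python's built-in sqlite3 module (an assumption; the thread itself targets SQL Server, so LIMIT replaces TOP). The helper rank_files and its parameters are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (keyword_id INTEGER, file_id INTEGER, hits INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [(1, 1, 55), (2, 1, 10), (3, 1, 88), (1, 2, 44),
     (2, 2, 15), (1, 3, 199), (2, 3, 1), (3, 3, 4)],
)


def rank_files(keyword_ids, limit=10):
    """Files containing every keyword, ordered by hits for keyword_ids[0]."""
    marks = ",".join("?" for _ in keyword_ids)
    sql = f"""
    SELECT w.file_id, w.hits
    FROM words w
    JOIN (SELECT file_id
          FROM words
          WHERE keyword_id IN ({marks})
          GROUP BY file_id
          HAVING COUNT(DISTINCT keyword_id) = ?) x   -- all keywords present
      ON x.file_id = w.file_id
    WHERE w.keyword_id = ?                           -- rank by first keyword
    ORDER BY w.hits DESC
    LIMIT ?
    """
    params = [*keyword_ids, len(set(keyword_ids)), keyword_ids[0], limit]
    return conn.execute(sql, params).fetchall()


ranked = rank_files([1, 3])
print(ranked)
```

For keywords 1 and 3 this returns file 3 before file 1, the order asked for in the question; file 2 is dropped because it has no row for keyword 3.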
You can use top (n) with ties for your query as below:
declare @n int = 10 --10 in your scenario
select top (@n) with ties *
from (
select w.*, f.name from @words w inner join @files f
on w.[FILE_ID] = f.[file_id]
) a
order by (row_number() over (partition by a.[file_id] order by hits desc)-1)/@n +1

Join a dynamic number of rows in postgres

Let's say I have the following tables:
Batch Items
---+----- ---+----------+--------
id | size id | batch_id | quality
---+----- ---+----------+--------
1 | 10 1 | 1 | 9
2 | 2 2 | 1 | 10
3 | 2 | 1
4 | 2 | 2
5 | 2 | 1
6 | 2 | 9
I have batches of items. They are sent in batches of size batch.size. An item is broken if its quality is <= 3.
I want to know the number of broken items in the last batches sent:
batch_id | broken_item_count
---------+---------------------
1 | 0
2 | 2 (and not 3)
My idea is the following:
SELECT batch.id as batch_id, COUNT(broken_items.*) as broken_item_count
FROM batch
INNER JOIN (
SELECT id
FROM items
WHERE items.quality <= 3
ORDER BY items.id asc
LIMIT batch.size -- invalid reference to FROM-clause entry for table "batch"
) broken_items ON broken_items.batch_id = batch.id
(I would ORDER BY items.shipped_at. But for simplicity, I order by items.id)
But this query shows me the error I put as the comment.
How can I limit the number of joined items based on batch.size, which is different for each row?
Is there any other way to achieve what I want?
SELECT b.id AS batch_id
, count(i.quality < 4 OR NULL) AS broken_item_count
FROM batch b
LEFT JOIN (
SELECT batch_id, quality
, row_number() OVER (PARTITION BY batch_id ORDER BY id DESC) AS rn
FROM items
) i ON i.batch_id = b.id
AND i.rn <= b.size
GROUP BY 1
ORDER BY 1;
SQL Fiddle with added examples.
This is much like @Clodoaldos's answer, but with a couple of differences. Most importantly:
You want to count the broken items in the last batches sent, so we have to ORDER BY id DESC
If there can be batches without items at all you need to use LEFT JOIN instead of a plain JOIN or those batches are excluded.
Consequently, the check i.rn <= b.size needs to move from the WHERE clause to the JOIN clause.
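The choice of sort direction inside row_number() decides which items count towards each batch, and with the question's sample data the two directions give different answers. The sketch below shows this using Python's built-in sqlite3 module (an assumption; window functions need SQLite 3.25 or later). The helper broken_counts is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch (id INTEGER PRIMARY KEY, size INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, batch_id INTEGER, quality INTEGER);
INSERT INTO batch VALUES (1, 10), (2, 2);
INSERT INTO items VALUES (1, 1, 9), (2, 1, 10), (3, 2, 1),
                         (4, 2, 2), (5, 2, 1), (6, 2, 9);
""")


def broken_counts(newest_first):
    """Count broken items among the first/last batch.size items per batch."""
    direction = "DESC" if newest_first else "ASC"
    sql = f"""
    SELECT b.id, COUNT(CASE WHEN i.quality <= 3 THEN 1 END)
    FROM batch b
    LEFT JOIN (
        SELECT batch_id, quality,
               ROW_NUMBER() OVER (PARTITION BY batch_id
                                  ORDER BY id {direction}) AS rn
        FROM items
    ) i ON i.batch_id = b.id
       AND i.rn <= b.size        -- kept in ON so empty batches still appear
    GROUP BY b.id
    ORDER BY b.id
    """
    return conn.execute(sql).fetchall()


print(broken_counts(newest_first=False))  # lowest ids first, as in the question
print(broken_counts(newest_first=True))   # highest ids first ("last sent")
```

Ascending order reproduces the count of 2 for batch 2 shown in the question, while descending order keeps items 6 and 5 and counts only 1, so it is worth confirming which reading of "last batches sent" is intended.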
SQL Fiddle
select
b.id as batch_id,
count(quality <= 3 or null) as broken_item_count
from
batch b
inner join (
select
id, quality, batch_id,
row_number() over (partition by batch_id order by id) as rn
from items
) i on i.batch_id = b.id
where rn <= b.size
group by b.id
order by b.id
From what I understand the count of defective items cannot be greater than the batch size.
EDIT: After reading your comments, I think using the RANK() function, and then join by rank and size should work for you. The following query attempts that.
SELECT b.id,
SUM(CASE WHEN i1.quality <= 3 THEN 1 ELSE 0 END) as broken_item_count
FROM BATCH as b
LEFT JOIN (SELECT i.id, i.batch_id, i.quality,
RANK() OVER(PARTITION BY i.batch_id ORDER BY i.id) as RANK
FROM ITEMS as i) as i1 ON b.id = i1.batch_id AND i1.RANK <= b.size
GROUP BY b.id
EDIT2: Updated the query with a LEFT JOIN to cover the case where there are no samples in some batch.

Cross Join with Filter?

I need to write a stored procedure to distribute students to their sections.
The procedure takes two string parameters, StuID and SecID.
Suppose I send '1,2,3,4,5' as StuID and 'a,b' as SecID.
I'm using a splitting function, which will return these tables:
Tb1 | Tb2
1 | a
2 | b
3 |
4 |
5 |
How can I get the following result?
1 a
2 b
3 a
4 b
5 a
....
I've tried to do it via a cross join, but it did not give the result I want:
select US.vItem as UserID,SE.vItem as Section
from split(@pUserID,',') us
cross join split(@pSectionID,',') se
Cross join isn't meant to work like that.
This will give you the results you want, but it's a bodge.
select t1.vItem, t2.VItem from
( select *, ROW_NUMBER() over (order by vItem) r from US ) t1
inner join
( select *, ROW_NUMBER() over (order by vItem desc) -1 r from SE ) t2
on t2.r = t1.r % (select COUNT(*) from SE)
order by t1.vItem
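The same round-robin idea can be written with zero-based row numbers on both sides, which avoids the descending sort trick. This is only a sketch: it uses Python's built-in sqlite3 module (an assumption; window functions need SQLite 3.25 or later), and the students and sections tables are hypothetical stand-ins for the output of the split function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (item INTEGER);   -- stand-in for split(StuID)
CREATE TABLE sections (item TEXT);      -- stand-in for split(SecID)
INSERT INTO students VALUES (1), (2), (3), (4), (5);
INSERT INTO sections VALUES ('a'), ('b');
""")

assignments = conn.execute("""
SELECT st.item, se.item
FROM (SELECT item, ROW_NUMBER() OVER (ORDER BY item) - 1 AS r
      FROM students) st
JOIN (SELECT item, ROW_NUMBER() OVER (ORDER BY item) - 1 AS r
      FROM sections) se
  ON se.r = st.r % (SELECT COUNT(*) FROM sections)  -- wrap around sections
ORDER BY st.item
""").fetchall()

print(assignments)
```

Each student's zero-based position modulo the number of sections picks the section, so the sections repeat a, b, a, b, a for the five students.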

(SQL) Match users belong to which group given user_id[]

user table
ID | name
1 | ada
2 | bob
3 | tom
group Table
ID | name
1 | group A
2 | group B
3 | group C
user_group Table
user_id | group_id
1 | 1
2 | 1
1 | 2
2 | 2
3 | 2
1 | 3
3 | 3
Given group of user ids : [1, 2, 3]
How to query the group that all users in the above list belongs to? (in this case: Group B)
To get all groups that contain exactly the specified users (i.e. all specified users and no other users)
DECLARE @numUsers int = 3
SELECT ug.group_id
--The Max doesn't really do anything here because all
--groups with the same group id have the same name. The
--max is just used so we can select the group name even though
--we aren't aggregating across group names
, MAX(g.name) AS name
FROM user_group ug
--Filter to only groups with three users
JOIN (SELECT group_id FROM user_group GROUP BY group_id HAVING COUNT(*) = @numUsers) ug2
ON ug.group_id = ug2.group_id
JOIN [group] g
ON ug.group_id = g.ID
WHERE user_id IN (1, 2, 3)
GROUP BY ug.group_id
--The distinct is only necessary if user_group
--isn't keyed by group_id, user_id
HAVING COUNT(DISTINCT user_id) = @numUsers
To get groups that contain all specified users:
DECLARE @numUsers int = 3
SELECT ug.group_id
--The Max doesn't really do anything here because all
--groups with the same group id have the same name. The
--max is just used so we can select the group name even though
--we aren't aggregating across group names
--we aren't aggregating across group names
, MAX(g.name) AS name
FROM user_group ug
JOIN [group] g
ON ug.group_id = g.ID
WHERE user_id IN (1, 2, 3)
GROUP BY ug.group_id
--The distinct is only necessary if user_group
--isn't keyed by group_id, user_id
HAVING COUNT(DISTINCT user_id) = 3
SQL Fiddle: http://sqlfiddle.com/#!6/0e968/3
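For a quick check of the "contains all specified users" query against the sample tables, here is a sketch using Python's built-in sqlite3 module (an assumption; the answer itself is written for SQL Server). The helper groups_containing is a hypothetical name, and the user list is passed as bound parameters rather than hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "group" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
INSERT INTO "group" VALUES (1, 'group A'), (2, 'group B'), (3, 'group C');
INSERT INTO user_group VALUES (1, 1), (2, 1), (1, 2), (2, 2),
                              (3, 2), (1, 3), (3, 3);
""")


def groups_containing(user_ids):
    """Groups that every user in user_ids belongs to."""
    marks = ",".join("?" for _ in user_ids)
    sql = f"""
    SELECT g.id, g.name
    FROM user_group ug
    JOIN "group" g ON g.id = ug.group_id
    WHERE ug.user_id IN ({marks})
    GROUP BY g.id, g.name
    HAVING COUNT(DISTINCT ug.user_id) = ?   -- every requested user matched
    ORDER BY g.id
    """
    return conn.execute(sql, [*user_ids, len(set(user_ids))]).fetchall()


print(groups_containing([1, 2, 3]))
```

Grouping by both id and name plays the same role as the MAX(g.name) trick in the answer: it lets the name be selected without changing which groups qualify.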
Try This:
Select t2.name
FROM
(Select group_id
From
user_group
Group by group_id
Having Count(user_id) = (Select Count(*) FROM User_Table)) AS T1
INNER JOIN
Group_Table AS T2
ON T1.group_id = T2.ID
See Fiddle: http://sqlfiddle.com/#!2/fa7250/4
Select UserID,count(*)
From UserGroupTable
group by UserID
This will give a count of 3 where the UserID/GroupID is unique (as zerkms pointed out)
SELECT name FROM group_tbl WHERE id IN (SELECT g_id FROM user_grp GROUP BY g_id HAVING Count(u_id)=(SELECT Count(id) FROM user_tbl));