Text search in PostgreSQL: How to order rows by column

I have a table with people's three desired job positions, ranked from first to third.
The job positions are in a separate table called "job_positions":
job_position_id job_position_title
1 bar manager
2 barista
3 waiter
4 server
The "people" table contains the person_id with the IDs of the job positions they have chosen.
person_id first_position_id second_position_id third_position_id
1 1 2 3
2 2 4
I want to search this table for a job position and order the results so that people who have that job as their first position rank higher than those who have it as their second or third position.
So in this example, if I search for "barista", I expect person_id 2 to be displayed first, then person_id 1.
This is my SQL code:
SELECT person_id,
TS_RANK_CD(TO_TSVECTOR('english', a.job_position_title), query_first, 1) AS first,
TS_RANK_CD(TO_TSVECTOR('english', b.job_position_title), query_second, 1) AS second,
TS_RANK_CD(TO_TSVECTOR('english', c.job_position_title), query_third, 1) AS third
FROM people
LEFT JOIN job_positions a
ON people.first_position_id = a.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_first
ON TO_TSVECTOR ('english', a.job_position_title) @@ query_first
LEFT JOIN job_positions b
ON people.second_position_id = b.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_second
ON TO_TSVECTOR ('english', b.job_position_title) @@ query_second
LEFT JOIN job_positions c
ON people.third_position_id = c.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_third
ON TO_TSVECTOR ('english', c.job_position_title) @@ query_third
WHERE (TO_TSVECTOR (a.job_position_title) @@ query_first OR TO_TSVECTOR (b.job_position_title) @@ query_second OR TO_TSVECTOR (c.job_position_title) @@ query_third)
The SQL returns the correct matches, but not ranked like they should be. Can I add some kind of score/weight to the columns, to rank them by that score?

I replicated your case with
create table job_positions (job_position_id int, job_position_title varchar);
insert into job_positions values (1, 'bar manager');
insert into job_positions values (2, 'barista');
insert into job_positions values (3, 'waiter');
insert into job_positions values (4, 'server');
create table people (person_id int, first_position_id int, second_position_id int, third_position_id int);
insert into people values (1,1,2,3);
insert into people values (2,2,4,NULL);
If I understand you correctly, you want to order based on the position. If so, you can solve the problem with the following:
with unpivoting as (
select person_id, 1 as position, first_position_id as job_position_id from people UNION ALL
select person_id, 2 as position, second_position_id as job_position_id from people UNION ALL
select person_id, 3 as position, third_position_id as job_position_id from people
)
select job_position_title, unpivoting.job_position_id, position, person_id
from unpivoting
join job_positions on unpivoting.job_position_id = job_positions.job_position_id
order by unpivoting.job_position_id, position, person_id;
with the expected result
job_position_title | job_position_id | position | person_id
--------------------+-----------------+----------+-----------
bar manager | 1 | 1 | 1
barista | 2 | 1 | 2
barista | 2 | 2 | 1
waiter | 3 | 3 | 1
server | 4 | 2 | 2
(5 rows)
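To answer the original "barista" search, the unpivoted rows can be filtered by title and ordered by position. The following is a minimal sketch rather than a definitive implementation: it runs the SQL through Python's built-in sqlite3 module (an in-memory SQLite database stands in for PostgreSQL, which is an assumption made for portability), and the helper name rank_people and the column alias pos are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the SQL used is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table job_positions (job_position_id int, job_position_title varchar);
insert into job_positions values (1, 'bar manager'), (2, 'barista'), (3, 'waiter'), (4, 'server');
create table people (person_id int, first_position_id int, second_position_id int, third_position_id int);
insert into people values (1, 1, 2, 3), (2, 2, 4, NULL);
""")


def rank_people(title):
    """Return (person_id, pos) pairs for one job title, best position first."""
    sql = """
    with unpivoting as (
        select person_id, 1 as pos, first_position_id as job_position_id from people union all
        select person_id, 2, second_position_id from people union all
        select person_id, 3, third_position_id from people
    )
    select u.person_id, u.pos
    from unpivoting u
    join job_positions jp on jp.job_position_id = u.job_position_id
    where jp.job_position_title = ?
    order by u.pos, u.person_id
    """
    return conn.execute(sql, (title,)).fetchall()


print(rank_people("barista"))  # [(2, 1), (1, 2)]
```

Person 2 comes first because "barista" is their first choice; person 1 listed it second.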

You want to unpivot the job preferences for each user. In fact, you might want to store the data in the unpivoted way -- which is more commonly called normalized.
In Postgres, you can use a lateral join to unpivot:
select p.*
from people p cross join lateral
(values (1, p.first_position_id),
(2, p.second_position_id),
(3, p.third_position_id)
) v(ord, job_position_id) join
job_positions jp
using (job_position_id)
where jp.job_position_title = ?
order by v.ord;

Related

SQL LEFT JOIN: difference between WHERE and condition inside AND [duplicate]

What's the difference between
select t.*,
a.age
from t
left join a
on t.ID = a.ID and a.column > 10
and
select t.*,
a.age
from t
left join a
on t.ID = a.ID
where a.column > 10
?
Specifically, what is the difference between putting a condition on the joined table in the ON clause (after AND) and putting it in the WHERE clause?
With a LEFT JOIN there is a difference.
With the condition in the ON clause, every row of t is returned; where no row of a satisfies both t.ID = a.ID and a.column > 10, the columns from a are filled with NULLs.
With the condition in the WHERE clause, those rows are filtered out.
With an INNER JOIN there is no difference.
Example:
declare @t table (id int, dummy varchar(20))
declare @a table (id int, age int, col int)
insert into @t
select * from (
values
(1, 'pippo' ),
(2, 'pluto' ),
(3, 'paperino' ),
(4, 'ciccio' ),
(5, 'caio' ),
(5, 'sempronio')
) x (c1,c2)
insert into @a
select * from (
values
(1, 38, 2 ),
(2, 26, 5 ),
(3, 41, 12),
(4, 15, 11),
(5, 39, 7 )
) x (c1,c2,c3)
select t.*, a.age
from @t t
left join @a a on t.ID = a.ID and a.col > 10
Outputs:
id dummy age
1 pippo NULL
2 pluto NULL
3 paperino 41
4 ciccio 15
5 caio NULL
5 sempronio NULL
While
select t.*, a.age
from @t t
left join @a a on t.ID = a.ID
where a.col > 10
Outputs:
id dummy age
3 paperino 41
4 ciccio 15
So with a LEFT JOIN you will ALWAYS get all the rows from the first table.
If the join condition is true, the columns from the joined table are filled with their values; if the condition is false, those columns are NULL.
With a WHERE condition you get only the rows that match the condition.
So what's the difference between them?
An explanation through examples:
CREATE TABLE Students
(
StudentId INT PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE Scores
(
ScoreId INT PRIMARY KEY,
ExamId INT NOT NULL,
StudentId INT NOT NULL,
Score DECIMAL(4,1) NOT NULL DEFAULT 0,
FOREIGN KEY (StudentId)
REFERENCES Students(StudentId)
);
INSERT INTO Students
(StudentId, Name) VALUES
(11,'Joe Shmoe'),
(12,'Jane Doe'),
(47,'Norma Nelson');
INSERT INTO Scores
(ScoreId, ExamId, StudentId, Score) VALUES
(1, 101, 11, 65.2),
(2, 101, 12, 72.6),
(3, 102, 11, 69.6);
--
-- Using an INNER JOIN
--
-- Only Students that have scores
-- So only when there's a match between the 2 tables
--
SELECT stu.Name, sco.Score
FROM Students AS stu
INNER JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 65.2
Joe Shmoe | 69.6
--
-- Using a LEFT JOIN
--
-- All Students, even those without scores
-- Those that couldn't be matched will show NULL's
-- for the fields from the joined table
--
SELECT stu.Name, sco.Score, sco.ScoreId
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
ORDER BY stu.Name
Name | Score | ScoreId
:----------- | :---- | :------
Jane Doe | 72.6 | 2
Joe Shmoe | 65.2 | 1
Joe Shmoe | 69.6 | 3
Norma Nelson | null | null
--
-- Using a LEFT JOIN
-- But with an extra criterion in the ON clause
--
-- All Students again.
-- Scores are shown only when they are >= 66
-- Students without such a score still appear, with NULLs
--
SELECT stu.Name, sco.Score, sco.ScoreId
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
AND sco.Score >= 66.0
ORDER BY stu.Name
Name | Score | ScoreId
:----------- | :---- | :------
Jane Doe | 72.6 | 2
Joe Shmoe | 69.6 | 3
Norma Nelson | null | null
--
-- Using a LEFT JOIN
-- But with an extra criterion in the WHERE clause
--
-- Only students with scores >= 66
-- The WHERE filters out the unmatched.
--
SELECT stu.Name, sco.Score
FROM Students AS stu
LEFT JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
WHERE sco.Score >= 66.0
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 69.6
--
-- Using an INNER JOIN
-- And with an extra criterion in the WHERE clause
--
-- Only Students that have scores >= 66
--
SELECT stu.Name, sco.Score
FROM Students AS stu
INNER JOIN Scores AS sco
ON sco.StudentId = stu.StudentId
WHERE sco.Score >= 66
ORDER BY stu.Name
Name | Score
:-------- | :----
Jane Doe | 72.6
Joe Shmoe | 69.6
Did you notice how the criteria in the WHERE clause can make a LEFT JOIN behave like an INNER JOIN?
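The contrast between the ON clause and the WHERE clause can be reproduced side by side. This is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for the fiddle's database; the variable names base, on_rows and where_rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the database used in the examples above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentId INT PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE Scores (ScoreId INT PRIMARY KEY, ExamId INT NOT NULL,
                     StudentId INT NOT NULL, Score DECIMAL(4,1) NOT NULL DEFAULT 0);
INSERT INTO Students VALUES (11, 'Joe Shmoe'), (12, 'Jane Doe'), (47, 'Norma Nelson');
INSERT INTO Scores VALUES (1, 101, 11, 65.2), (2, 101, 12, 72.6), (3, 102, 11, 69.6);
""")

# Same LEFT JOIN; only the placement of the score condition changes.
base = """
SELECT stu.Name, sco.Score
FROM Students AS stu
LEFT JOIN Scores AS sco ON sco.StudentId = stu.StudentId {extra}
ORDER BY stu.Name
"""

# Condition in ON: unmatched students survive with NULL scores.
on_rows = conn.execute(base.format(extra="AND sco.Score >= 66.0")).fetchall()
# Condition in WHERE: NULL >= 66.0 is not true, so unmatched students are dropped.
where_rows = conn.execute(base.format(extra="WHERE sco.Score >= 66.0")).fetchall()

print(on_rows)     # [('Jane Doe', 72.6), ('Joe Shmoe', 69.6), ('Norma Nelson', None)]
print(where_rows)  # [('Jane Doe', 72.6), ('Joe Shmoe', 69.6)]
```

The WHERE version loses Norma Nelson because the filter is evaluated after the join, when her Score is already NULL.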

Recursive CTE with three tables

I'm using SQL Server 2008 R2 SP1.
I would like to recursively find the first non-null manager for a certain organizational unit by "walking up the tree".
I have three tables: "ORG" contains the organizational units, "ORG_PARENTS" contains the parent of each unit in "ORG", and "ORG_MANAGERS" contains the manager of each unit.
ORG has a column ORG_ID:
ORG_ID
1
2
3
ORG_PARENTS has two columns.
ORG_ID, ORG_PARENT
1, NULL
2, 1
3, 2
MANAGERS has two columns.
ORG_ID, MANAGER
1, John Doe
2, Jane Doe
3, NULL
I'm trying to create a recursive query that will find the first non-null manager for a certain organizational unit.
Basically if I do a query today for the manager for ORG_ID=3 I will get NULL.
SELECT MANAGER FROM ORG_MANAGERS WHERE ORG_ID = '3'
I want the query to use the ORG_PARENTS table to get the parent for ORG_ID=3, in this case get "2" and repeat the query against the ORG_MANAGERS table with ORG_ID=2 and return in this example "Jane Doe".
In case the query also returns NULL I want to repeat the process with the parent of ORG_ID=2, i.e. ORG_ID=1 and so on.
My CTE attempts so far have failed, one example is this:
WITH BOSS (MANAGER, ORG_ID, ORG_PARENT)
AS
( SELECT m.MANAGER, m.ORG_ID, p.ORG_PARENT
FROM dbo.MANAGERS m INNER JOIN
dbo.ORG_PARENTS p ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, m1.ORG_ID, b.ORG_PARENT
FROM BOSS b
INNER JOIN dbo.MANAGERS m1 ON m1.ORG_ID = b.ORG_PARENT
)
SELECT * FROM BOSS WHERE ORG_ID = 3
It returns:
Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
MANAGER ORG_ID ORG_PARENT
NULL 3 2
You need to keep track of the original ID you start with. Try this:
DECLARE @ORG_PARENTS TABLE (ORG_ID INT, ORG_PARENT INT )
DECLARE @MANAGERS TABLE (ORG_ID INT, MANAGER VARCHAR(100))
INSERT @ORG_PARENTS (ORG_ID, ORG_PARENT)
VALUES (1, NULL)
, (2, 1)
, (3, 2)
INSERT @MANAGERS (ORG_ID, MANAGER)
VALUES (1, 'John Doe')
, (2, 'Jane Doe')
, (3, NULL)
;
WITH BOSS
AS
(
SELECT m.MANAGER, m.ORG_ID AS ORI, m.ORG_ID, p.ORG_PARENT, 1 cnt
FROM @MANAGERS m
INNER JOIN @ORG_PARENTS p
ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, b.ORI, m1.ORG_ID, OP.ORG_PARENT, cnt +1
FROM BOSS b
INNER JOIN @ORG_PARENTS AS OP
ON OP.ORG_ID = b.ORG_PARENT
INNER JOIN @MANAGERS m1
ON m1.ORG_ID = OP.ORG_ID
)
SELECT *
FROM BOSS
WHERE ORI = 3
Results in:
+----------+-----+--------+------------+-----+
| MANAGER | ORI | ORG_ID | ORG_PARENT | cnt |
+----------+-----+--------+------------+-----+
| NULL | 3 | 3 | 2 | 1 |
| Jane Doe | 3 | 2 | 1 | 2 |
| John Doe | 3 | 1 | NULL | 3 |
+----------+-----+--------+------------+-----+
General tips:
Don't predefine the columns of a CTE; it's not necessary, and makes maintenance annoying.
With a recursive CTE, always keep a counter so you can limit the recursion depth and keep track of how deep you are.
edit:
By the way, if you want the first non-null manager, you can do it like this, for example (there are many ways):
SELECT BOSS.*
FROM BOSS
INNER JOIN (
SELECT BOSS.ORI
, MIN(BOSS.cnt) cnt
FROM BOSS
WHERE BOSS.MANAGER IS NOT NULL
GROUP BY BOSS.ORI
) X
ON X.ORI = BOSS.ORI
AND X.cnt = BOSS.cnt
WHERE BOSS.ORI IN (3)
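The walk up the tree can also stop as soon as a manager is found. The sketch below is a variation on the CTE above, not the exact query from the answer: it starts from a single ORG_ID, recurses only while MANAGER is still NULL, and picks the non-null row with the smallest cnt. It assumes an in-memory SQLite database (via Python's sqlite3 module) in place of SQL Server, and the helper name first_manager is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; it needs the RECURSIVE keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORG_PARENTS (ORG_ID INT, ORG_PARENT INT);
CREATE TABLE MANAGERS (ORG_ID INT, MANAGER VARCHAR(100));
INSERT INTO ORG_PARENTS VALUES (1, NULL), (2, 1), (3, 2);
INSERT INTO MANAGERS VALUES (1, 'John Doe'), (2, 'Jane Doe'), (3, NULL);
""")


def first_manager(org_id):
    """Return the nearest non-null manager at or above org_id, or None."""
    sql = """
    WITH RECURSIVE BOSS AS (
        -- Anchor: the unit we start from (ORI remembers the original ID).
        SELECT m.MANAGER, m.ORG_ID AS ORI, m.ORG_ID, p.ORG_PARENT, 1 AS cnt
        FROM MANAGERS m
        JOIN ORG_PARENTS p ON p.ORG_ID = m.ORG_ID
        WHERE m.ORG_ID = ?
        UNION ALL
        -- Step: move to the parent, but only while no manager has been found.
        SELECT m1.MANAGER, b.ORI, m1.ORG_ID, op.ORG_PARENT, b.cnt + 1
        FROM BOSS b
        JOIN ORG_PARENTS op ON op.ORG_ID = b.ORG_PARENT
        JOIN MANAGERS m1 ON m1.ORG_ID = op.ORG_ID
        WHERE b.MANAGER IS NULL
    )
    SELECT MANAGER FROM BOSS WHERE MANAGER IS NOT NULL ORDER BY cnt LIMIT 1
    """
    row = conn.execute(sql, (org_id,)).fetchone()
    return row[0] if row else None


print(first_manager(3))  # Jane Doe
```

Filtering in the anchor keeps the recursion limited to one chain instead of expanding every unit in the table.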

Join tables with distinct highest ranked row

I have three tables defined like this:
[tbMember]
memberID | memberName
1 | John
2 | Peter
[tbGroup]
groupID | groupName
1 | Alpha
2 | Beta
3 | Gamma
[tbMemberGroupRelation]
memberID | groupID | memberRank (larger number is higher)
1 | 1 | 0
1 | 2 | 1
2 | 1 | 5
2 | 2 | 3
2 | 3 | 1
Now I want to write a join query that returns each (distinct) member together with his highest-ranked group, one member per row. For the example above, the desired result is:
memberID | memberName | groupName | memberRank
1 | John | Beta | 1
2 | Peter | Alpha | 5
Is there a way to do this in a single SQL statement in the following style?
select * from tbMember m
left join tbMemberGroupRelation mg on (m.MemberID = mg.MemberID and ......)
left join tbGroup g on (mg.GroupID = g.GroupID)
Any other solutions are also appreciated if it is impossible to write in a simple query.
========= UPDATED =========
Only ONE highest rank is allowed in table
One solution would be to create an inverted sequence/rank of the memberRank so that the highest rank per member is always equal to 1.
This is how I achieved it using a sub-query:
SELECT
m.memberID,
m.memberName,
g.groupName,
mg.memberRank
FROM
tbMember m
LEFT JOIN
(
SELECT
memberID,
groupID,
memberRank,
RANK() OVER(PARTITION BY memberID ORDER BY memberRank DESC) AS invRank
FROM
tbMemberGroupRelation
) mg
ON (mg.memberID = m.memberID)
AND (mg.invRank = 1)
LEFT JOIN
tbGroup g
ON (g.groupID = mg.groupID);
An alternative method:
SELECT
M.memberID,
M.memberName,
G.groupName,
MG.memberRank
FROM
tbMember M
LEFT OUTER JOIN tbMemberGroupRelation MG ON MG.memberID = M.memberID
LEFT OUTER JOIN tbMemberGroupRelation MG2 ON
MG2.memberID = M.memberID AND
MG2.memberRank > MG.memberRank
INNER JOIN tbGroup G ON G.groupid = MG.groupid
WHERE
MG2.memberid IS NULL
Might perform better in some situations due to indexing, etc.
create table [tbGroup] (groupid int, groupname varchar(8000))
Insert [tbGroup] Values (1, 'Alpha')
Insert [tbGroup] Values (2, 'Beta')
Insert [tbGroup] Values (3, 'Gamma')
create table [tbMemberGroupRelation] (memberid int, groupid int, memberrank int)
Insert [tbMemberGroupRelation] Values (1,1,0)
Insert [tbMemberGroupRelation] Values (1,2,1)
Insert [tbMemberGroupRelation] Values (2,1,5)
Insert [tbMemberGroupRelation] Values (2,2,3)
Insert [tbMemberGroupRelation] Values (2,3,1)
;With cteMemberGroupRelation As
(
Select *, Row_Number() Over (Partition By MemberID Order By MemberRank Desc) SortOrder
From [tbMemberGroupRelation]
)
Select *
From tbMember M
Join (Select * From cteMemberGroupRelation Where SortOrder = 1) R On R.memberid = M.memberid
Join tbGroup G On G.groupid = R.groupid
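The window-function approach can be checked against the sample data from the question. This is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module, SQLite 3.25 or later for window functions) in place of SQL Server; the CTE name ranked and the alias sortOrder are hypothetical.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for SQL Server (ROW_NUMBER needs window functions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbMember (memberID INT, memberName VARCHAR(50));
CREATE TABLE tbGroup (groupID INT, groupName VARCHAR(50));
CREATE TABLE tbMemberGroupRelation (memberID INT, groupID INT, memberRank INT);
INSERT INTO tbMember VALUES (1, 'John'), (2, 'Peter');
INSERT INTO tbGroup VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO tbMemberGroupRelation VALUES (1, 1, 0), (1, 2, 1), (2, 1, 5), (2, 2, 3), (2, 3, 1);
""")

rows = conn.execute("""
WITH ranked AS (
    -- Number each member's groups from highest rank (1) downwards.
    SELECT memberID, groupID, memberRank,
           ROW_NUMBER() OVER (PARTITION BY memberID ORDER BY memberRank DESC) AS sortOrder
    FROM tbMemberGroupRelation
)
SELECT m.memberID, m.memberName, g.groupName, r.memberRank
FROM tbMember m
LEFT JOIN ranked r ON r.memberID = m.memberID AND r.sortOrder = 1
LEFT JOIN tbGroup g ON g.groupID = r.groupID
ORDER BY m.memberID
""").fetchall()

print(rows)  # [(1, 'John', 'Beta', 1), (2, 'Peter', 'Alpha', 5)]
```

Putting sortOrder = 1 in the ON clause keeps members that belong to no group in the result, with NULLs for the group columns.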

SQL Server : query multivalued table recursive

I have a SQL problem that I cannot solve.
There are two tables, mms and mms_mv, which are linked via object_id.
mms_mv is a multivalue table; it holds group memberships and group managers, and a manager can itself be another group.
This runs on SQL Server.
mms:
|object_id|attribute_type|objectSid|
| 1 | user | a |
| 2 | group | b |
| 3 | group | c |
| 4 | group | d |
| 5 | group | f |
mms_mv:
|object_id|attribute_name|reference_id|
| 2 | member | 1 |
| 3 | manager | 1 |
| 4 | manager | 2 |
I am trying to find out which groups a user can manage, either directly or indirectly via nested groups.
In the example above, user 1 is a member of group 2, and group 2 is the manager of group 4.
User 1 is also the manager of group 3 directly.
Which groups can be managed by the user?
So the output I need is groups 3 and 4.
select
accountname, objectsid, mms1.reference_id as ManagerID,
mms2.object_id
from
dbo.mms_mv_link as mms1 with (nolock)
inner join
dbo.mms_metaverse as mms2 with (nolock) on mms1.object_id = mms2.object_id
where
mms2.object_type ='group'
and mms1.attribute_name = 'manager'
and mms1.reference_id in (1, 3)
This is the best I came up with: it finds which of the group IDs and user IDs I submit are the manager of a group. I used another lookup to get the groups a user is in.
My problem is the nested groups; after a lot of thinking and googling I am not sure whether it is even possible to write such a query.
I can find all groups a user is a member of, but I also need the groups in which these groups are members.
I would be happy if anyone has ideas or hints to help me figure this out.
I would also welcome a recommendation for a good SQL book that covers such complex queries.
Thank you all for helping me.
I think that the following recursive CTE will give you what you want:
;WITH cte AS
(
SELECT m2.object_id AS groupID, m2.attribute_name
FROM @mms AS m1
INNER JOIN @mms_mv AS m2 ON m1.object_id = m2.reference_id
INNER JOIN @mms m3 ON m2.object_id = m3.object_id
WHERE m1.attribute_type = 'user' AND m3.attribute_type = 'group'
UNION ALL
SELECT m.object_id AS groupID, m.attribute_name
FROM cte AS c
INNER JOIN @mms_mv AS m ON c.groupID = m.reference_id
)
SELECT *
FROM cte
WHERE attribute_name <> 'member'
The so-called 'anchor' query of the CTE returns all groups that any user either manages or is a member of. Using recursion, we then get all other groups managed by either the groups of the original set or by any 'intermediate' set.
With these data as input:
DECLARE @mms TABLE (object_id INT, attribute_type VARCHAR(10), objectSid VARCHAR(10))
DECLARE @mms_mv TABLE (object_id INT, attribute_name VARCHAR(10), reference_id INT)
INSERT @mms VALUES
( 1, 'user', 'a'),
( 2, 'group', 'b'),
( 3, 'group', 'c'),
( 4, 'group', 'd'),
( 5, 'group', 'f')
INSERT @mms_mv VALUES
( 2, 'member', 1),
( 3, 'manager', 1),
( 4, 'manager', 2),
( 5, 'manager', 3)
the above query yields the following output:
groupID attribute_name
----------------------
3 manager
5 manager
4 manager
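The same idea can be restricted to a single user and run against the data from the question (without the extra row for group 5). This is a minimal sketch under stated assumptions: an in-memory SQLite database (via Python's sqlite3 module) replaces SQL Server, UNION is used instead of UNION ALL as a guard against cyclic group nesting, and the helper name managed_groups is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mms (object_id INT, attribute_type VARCHAR(10), objectSid VARCHAR(10));
CREATE TABLE mms_mv (object_id INT, attribute_name VARCHAR(10), reference_id INT);
INSERT INTO mms VALUES (1, 'user', 'a'), (2, 'group', 'b'), (3, 'group', 'c'),
                       (4, 'group', 'd'), (5, 'group', 'f');
INSERT INTO mms_mv VALUES (2, 'member', 1), (3, 'manager', 1), (4, 'manager', 2);
""")


def managed_groups(user_id):
    """Return the IDs of groups the user manages directly or via nested groups."""
    sql = """
    WITH RECURSIVE cte(groupID, attribute_name) AS (
        -- Anchor: groups the user is a member or manager of.
        SELECT mv.object_id, mv.attribute_name
        FROM mms_mv mv
        JOIN mms g ON g.object_id = mv.object_id AND g.attribute_type = 'group'
        WHERE mv.reference_id = ?
        UNION
        -- Step: groups that reference an already-found group.
        SELECT mv.object_id, mv.attribute_name
        FROM cte c
        JOIN mms_mv mv ON mv.reference_id = c.groupID
    )
    SELECT DISTINCT groupID FROM cte WHERE attribute_name = 'manager' ORDER BY groupID
    """
    return [row[0] for row in conn.execute(sql, (user_id,))]


print(managed_groups(1))  # [3, 4]
```

Group 3 is managed directly, and group 4 is reached through the user's membership in group 2.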

How to optimise MySQL query containing a subquery?

I have two tables, House and Person. For any row in House, there can be 0, 1 or many corresponding rows in Person. But, of those people, a maximum of one will have a status of "ACTIVE", the others will all have a status of "CANCELLED".
e.g.
SELECT * FROM House LEFT JOIN Person ON House.ID = Person.HouseID
House.ID | Person.ID | Person.Status
1 | 1 | CANCELLED
1 | 2 | CANCELLED
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | 4 | CANCELLED
I want to filter out the cancelled rows, and get something like this:
House.ID | Person.ID | Person.Status
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | NULL | NULL
I've achieved this with the following sub select:
SELECT *
FROM House
LEFT JOIN
(
SELECT *
FROM Person
WHERE Person.Status != "CANCELLED"
) Person
ON House.ID = Person.HouseID
...which works, but breaks all the indexes. Is there a better solution that doesn't?
I'm using MySQL and all relevant columns are indexed. EXPLAIN lists nothing in possible_keys.
Thanks.
How about:
SELECT *
FROM House
LEFT JOIN Person
ON House.ID = Person.HouseID
AND Person.Status != "CANCELLED"
Do you have control of the database structure? If so, I think you could better represent your data by removing the column Status from the Person table and instead adding a column ActivePersonID to the House table. This way you remove all the redundant CANCELLED values from Person and eliminate application or stored procedure code to ensure only one person per household is active.
In addition, you could then represent your query as
SELECT * FROM House LEFT JOIN Person ON House.ActivePersonID = Person.ID
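A rough sketch of that alternative schema follows. It is only an illustration: it uses an in-memory SQLite database (via Python's sqlite3 module) rather than MySQL, and the sample rows are hypothetical, renumbered so that every Person.ID is unique (the question's data reuses ID 1).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, HouseID INT);
-- The active person is recorded on the house, so Person needs no Status column.
CREATE TABLE House (ID INTEGER PRIMARY KEY, ActivePersonID INT REFERENCES Person(ID));
INSERT INTO Person VALUES (1, 1), (2, 1), (3, 1), (4, 4), (5, 2);
INSERT INTO House VALUES (1, 3), (2, 5), (3, NULL), (4, NULL);
""")

rows = conn.execute("""
SELECT House.ID, Person.ID
FROM House
LEFT JOIN Person ON House.ActivePersonID = Person.ID
ORDER BY House.ID
""").fetchall()

print(rows)  # [(1, 3), (2, 5), (3, None), (4, None)]
```

Because a house holds at most one ActivePersonID, the "at most one active person" rule is enforced by the schema itself, and the join is a plain primary-key lookup.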
Use:
SELECT *
FROM HOUSE h
LEFT JOIN PERSON p ON p.houseid = h.id
AND p.status = 'ACTIVE'
This is in SQL Server, but the logic seems to work, echoing Chris above:
declare @house table
(
houseid int
)
declare @person table
(
personid int,
houseid int,
personstatus varchar(20)
)
insert into @house (houseid) VALUES (1)
insert into @house (houseid) VALUES (2)
insert into @house (houseid) VALUES (3)
insert into @house (houseid) VALUES (4)
insert into @person (personid, houseid, personstatus) VALUES (1, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (2, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (3, 1, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (1, 2, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (4, 4, 'CANCELLED')
select * from @house
select * from @person
select *
from @house h LEFT OUTER JOIN @person p ON h.houseid = p.houseid
AND p.personstatus <> 'CANCELLED'