How to optimise MySQL query containing a subquery?

I have two tables, House and Person. For any row in House, there can be 0, 1 or many corresponding rows in Person. But, of those people, a maximum of one will have a status of "ACTIVE", the others will all have a status of "CANCELLED".
e.g.
SELECT * FROM House LEFT JOIN Person ON House.ID = Person.HouseID
House.ID | Person.ID | Person.Status
1 | 1 | CANCELLED
1 | 2 | CANCELLED
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | 4 | CANCELLED
I want to filter out the cancelled rows, and get something like this:
House.ID | Person.ID | Person.Status
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | NULL | NULL
I've achieved this with the following sub select:
SELECT *
FROM House
LEFT JOIN
(
SELECT *
FROM Person
WHERE Person.Status != "CANCELLED"
) Person
ON House.ID = Person.HouseID
...which works, but breaks all the indexes. Is there a better solution that doesn't?
I'm using MySQL and all relevant columns are indexed. EXPLAIN lists nothing in possible_keys.
Thanks.

How about:
SELECT *
FROM House
LEFT JOIN Person
ON House.ID = Person.HouseID
AND Person.Status != "CANCELLED"
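To see why the filter belongs in the ON clause rather than in a WHERE clause, here is a minimal sketch using the sample rows from the question. It runs on SQLite through Python's sqlite3 module purely as a convenient harness (an assumption; the question is about MySQL), and it only checks the result rows, not index usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE House (ID INT);
CREATE TABLE Person (ID INT, HouseID INT, Status TEXT);
INSERT INTO House VALUES (1), (2), (3), (4);
INSERT INTO Person VALUES
    (1, 1, 'CANCELLED'), (2, 1, 'CANCELLED'), (3, 1, 'ACTIVE'),
    (1, 2, 'ACTIVE'), (4, 4, 'CANCELLED');
""")

# Filter inside the ON clause: houses without a matching person are kept.
in_on = conn.execute("""
SELECT House.ID, Person.ID, Person.Status
FROM House
LEFT JOIN Person
       ON House.ID = Person.HouseID
      AND Person.Status != 'CANCELLED'
ORDER BY House.ID
""").fetchall()

# Same filter in WHERE: the NULL-extended rows fail the test and disappear.
in_where = conn.execute("""
SELECT House.ID, Person.ID, Person.Status
FROM House
LEFT JOIN Person ON House.ID = Person.HouseID
WHERE Person.Status != 'CANCELLED'
ORDER BY House.ID
""").fetchall()

print(in_on)     # houses 1-4, with NULLs for houses 3 and 4
print(in_where)  # houses 1 and 2 only
```

The first result matches the output the question asks for; the second shows what is lost if the condition is moved out of the join.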

Do you have control of the database structure? If so, I think you could better represent your data by removing the column Status from the Person table and instead adding a column ActivePersonID to the House table. This way you remove all the redundant CANCELLED values from Person and eliminate application or stored procedure code to ensure only one person per household is active.
In addition, you could then represent your query as
SELECT * FROM House LEFT JOIN Person ON House.ActivePersonID = Person.ID
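As a rough illustration of that restructured schema, the sketch below uses SQLite via Python's sqlite3 module (an assumption; the question targets MySQL). The person IDs are renumbered so that they are unique, which is hypothetical data rather than the question's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: Person no longer has a Status column.
CREATE TABLE Person (ID INTEGER PRIMARY KEY, HouseID INT);
CREATE TABLE House (
    ID INTEGER PRIMARY KEY,
    ActivePersonID INT REFERENCES Person (ID)  -- NULL means nobody is active
);
INSERT INTO Person VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 4);
INSERT INTO House VALUES (1, 3), (2, 4), (3, NULL), (4, NULL);
""")

active = conn.execute("""
SELECT House.ID, Person.ID
FROM House
LEFT JOIN Person ON House.ActivePersonID = Person.ID
ORDER BY House.ID
""").fetchall()

print(active)
```

Because each house stores at most one ActivePersonID, the "at most one active person" rule is carried by the schema itself rather than by a status filter.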

Use:
SELECT *
FROM HOUSE h
LEFT JOIN PERSON p ON p.houseid = h.id
AND p.status = 'ACTIVE'

This is in SQL Server, but the logic seems to work, echoing Chris above:
declare @house table
(
houseid int
)
declare @person table
(
personid int,
houseid int,
personstatus varchar(20)
)
insert into @house (houseid) VALUES (1)
insert into @house (houseid) VALUES (2)
insert into @house (houseid) VALUES (3)
insert into @house (houseid) VALUES (4)
insert into @person (personid, houseid, personstatus) VALUES (1, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (2, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (3, 1, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (1, 2, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (4, 4, 'CANCELLED')
select * from @house
select * from @person
select *
from @house h LEFT OUTER JOIN @person p ON h.houseid = p.houseid
AND p.personstatus <> 'CANCELLED'

Related

Text search in PostgreSQL: How to order rows by column

I have a table with people's three desired job positions, ranked from first to third.
The job positions are in a separate table called "job_positions":
job_position_id | job_position_title
1 | bar manager
2 | barista
3 | waiter
4 | server
The "people" table contains the person_id with the IDs of the job positions they have chosen.
person_id | first_position_id | second_position_id | third_position_id
1 | 1 | 2 | 3
2 | 2 | 4 | NULL
I want to search this table for a job position and order the results so that the person who has that job in their first_position, will be ranked higher than those who have it in their second or third position.
So in this example, if I search for "barista", I expect the person_id 2 to be displayed first, then person_id 1.
This is my SQL code:
SELECT person_id,
TS_RANK_CD(TO_TSVECTOR('english', a.job_position_title), query_first, 1) AS first,
TS_RANK_CD(TO_TSVECTOR('english', b.job_position_title), query_second, 1) AS second,
TS_RANK_CD(TO_TSVECTOR('english', c.job_position_title), query_third, 1) AS third
FROM people
LEFT JOIN job_positions a
ON people.first_position_id = a.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_first
ON TO_TSVECTOR ('english', a.job_position_title) @@ query_first
LEFT JOIN job_positions b
ON people.second_position_id = b.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_second
ON TO_TSVECTOR ('english', b.job_position_title) @@ query_second
LEFT JOIN job_positions c
ON people.third_position_id = c.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_third
ON TO_TSVECTOR ('english', c.job_position_title) @@ query_third
WHERE (TO_TSVECTOR (a.job_position_title) @@ query_first OR TO_TSVECTOR (b.job_position_title) @@ query_second OR TO_TSVECTOR (c.job_position_title) @@ query_third)
The SQL returns the correct matches, but not ranked like they should be. Can I add some kind of score/weight to the columns, to rank them by that score?
I replicated your case with
create table job_positions (job_position_id int, job_position_title varchar);
insert into job_positions values (1, 'bar manager');
insert into job_positions values (2, 'barista');
insert into job_positions values (3, 'waiter');
insert into job_positions values (4, 'server');
create table people (person_id int, first_position_id int, second_position_id int, third_position_id int);
insert into people values (1,1,2,3);
insert into people values (2,2,4,NULL);
If I understand you correctly, you want to order based on the position. If so, you can solve the problem with the following:
with unpivoting as (
select person_id, 1 as position, first_position_id as job_position_id from people UNION ALL
select person_id, 2 as position, second_position_id as job_position_id from people UNION ALL
select person_id, 3 as position, third_position_id as job_position_id from people
)
select job_position_title, unpivoting.job_position_id, position, person_id
from unpivoting
join job_positions on unpivoting.job_position_id = job_positions.job_position_id
order by unpivoting.job_position_id, position, person_id;
with the expected result
job_position_title | job_position_id | position | person_id
--------------------+-----------------+----------+-----------
bar manager | 1 | 1 | 1
barista | 2 | 1 | 2
barista | 2 | 2 | 1
waiter | 3 | 3 | 1
server | 4 | 2 | 2
(5 rows)
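To get the ranking asked for in the question, the unpivoted rows can be filtered on one job title and ordered by position. The sketch below is one way to check this with the replicated data; it assumes SQLite through Python's sqlite3 module rather than PostgreSQL, and it uses an exact title match instead of full-text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_positions (job_position_id INT, job_position_title TEXT);
INSERT INTO job_positions VALUES
    (1, 'bar manager'), (2, 'barista'), (3, 'waiter'), (4, 'server');
CREATE TABLE people (person_id INT, first_position_id INT,
                     second_position_id INT, third_position_id INT);
INSERT INTO people VALUES (1, 1, 2, 3), (2, 2, 4, NULL);
""")

RANKED = """
WITH unpivoting AS (
    SELECT person_id, 1 AS position, first_position_id AS job_position_id FROM people
    UNION ALL
    SELECT person_id, 2, second_position_id FROM people
    UNION ALL
    SELECT person_id, 3, third_position_id FROM people
)
SELECT u.person_id, u.position
FROM unpivoting u
JOIN job_positions jp ON jp.job_position_id = u.job_position_id
WHERE jp.job_position_title = ?   -- exact match; an assumption for this sketch
ORDER BY u.position, u.person_id
"""

# Each tuple is (person_id, position in which the job was chosen).
ranked = conn.execute(RANKED, ("barista",)).fetchall()
print(ranked)
```

For "barista", person 2 (first choice) comes before person 1 (second choice), which is the order the question expects.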
You want to unpivot the job preferences for each user. In fact, you might want to store the data in the unpivoted way -- which is more commonly called normalized.
In Postgres, you can use a lateral join to unpivot:
select p.*
from people p cross join lateral
(values (1, p.first_position_id),
(2, p.second_position_id),
(3, p.third_position_id)
) v(ord, job_position_id) join
job_positions jp
using (job_position_id)
where jp.job_position_title = ?
order by v.ord;

Select one set of duplicate data

Using SQL Server, how do I select only one "set" of relationships? To explain, here is an example. Also if you can help me figure out how to "say" this problem to make it more google-able, that would be stellar. For example, the single table contains two identical, inverted rows.
CREATE TABLE #A (
EmployeeID INT,
EmployeeName NVARCHAR(100),
CoworkerID INT,
CoworkerName NVARCHAR(100)
)
INSERT INTO #A VALUES (1, 'Alice', 2, 'Bob')
INSERT INTO #A VALUES (2, 'Bob', 1, 'Alice')
INSERT INTO #A VALUES (3, 'Charlie', 4, 'Dan')
INSERT INTO #A VALUES (4, 'Dan', 3, 'Charlie')
SELECT *
FROM #A
-- THIS IS WHAT WE WANT TO PROGRAMMATICALLY DETERMINE - which IDs to keep and which to delete.
DELETE FROM #A WHERE EmployeeID = 2
DELETE FROM #A WHERE EmployeeID = 4
SELECT *
FROM #A
So, the net result of the final query is:
|------------|---------------|-------------|--------------|
| EmployeeID | EmployeeName | CoworkerID | CoworkerName |
|------------|---------------|-------------|--------------|
| 1 | Alice | 2 | Bob |
|------------|---------------|-------------|--------------|
| 3 | Charlie | 4 | Dan |
|------------|---------------|-------------|--------------|
Try this:
DELETE FROM #A
WHERE #A.EmployeeID > #a.CoworkerID
AND EXISTS(SELECT 1 FROM #A A2
where a2.CoworkerID = #A.EmployeeID
and a2.employeeID = #a.coworkerID)
This will only delete those with a circular reference. If you have more complex chains, this will not affect them.
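A small check of that behaviour, sketched with SQLite through Python's sqlite3 module (an assumption; the question uses a SQL Server temp table, here replaced by a plain table named A). The row for 'Eve' is hypothetical and is added only to show that a one-way relationship is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (EmployeeID INT, EmployeeName TEXT,
                CoworkerID INT, CoworkerName TEXT);
INSERT INTO A VALUES
    (1, 'Alice', 2, 'Bob'), (2, 'Bob', 1, 'Alice'),
    (3, 'Charlie', 4, 'Dan'), (4, 'Dan', 3, 'Charlie'),
    (5, 'Eve', 1, 'Alice');  -- hypothetical row with no mirrored partner
""")

# Delete the higher-ID row of every mirrored pair.
conn.execute("""
DELETE FROM A
WHERE EmployeeID > CoworkerID
  AND EXISTS (SELECT 1 FROM A AS a2
              WHERE a2.CoworkerID = A.EmployeeID
                AND a2.EmployeeID = A.CoworkerID)
""")

remaining = conn.execute(
    "SELECT EmployeeID, CoworkerID FROM A ORDER BY EmployeeID"
).fetchall()
print(remaining)
```

Rows 2 and 4 are removed because their mirror images exist; the Eve row stays because no row points back at it.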
DELETE a
FROM #A a
WHERE EXISTS (
SELECT *
FROM #A t
WHERE t.CoworkerID = a.EmployeeID
AND t.EmployeeID < a.EmployeeID
)
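WITH t2CTE AS
(
-- Partition by the unordered (EmployeeID, CoworkerID) pair so mirrored rows share a group
SELECT *, ROW_NUMBER() OVER (
PARTITION BY CASE WHEN EmployeeID < CoworkerID THEN EmployeeID ELSE CoworkerID END,
CASE WHEN EmployeeID < CoworkerID THEN CoworkerID ELSE EmployeeID END
ORDER BY EmployeeID) AS counter1
FROM #A
)
DELETE FROM t2CTE WHERE counter1 > 1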

SQLite query - filter name where each associated id is contained within a set of ids

I'm trying to work out a query that will find me all of the distinct Names whose LocationIDs are in a given set of ids. The catch is if any of the LocationIDs associated with a distinct Name are not in the set, then the Name should not be in the results.
Say I have the following table:
ID | LocationID | ... | Name
-----------------------------
1 | 1 | ... | A
2 | 1 | ... | B
3 | 2 | ... | B
I'm needing a query similar to
SELECT DISTINCT Name FROM table WHERE LocationID IN (1, 2);
The problem with the above is that it only checks whether the LocationID is 1 OR 2, so it would return the following:
A
B
But what I need it to return is
B
Since B is the only Name where both of its LocationIDs are in the set (1, 2)
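You can write two subqueries:
one that counts the distinct LocationIDs matching your condition across the whole table,
one that counts the distinct LocationIDs matching your condition for each Name.
Then join them on the count, so that only Names that match every LocationID in your condition are returned.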
Schema (SQLite v3.17)
CREATE TABLE T(
ID int,
LocationID int,
Name varchar(5)
);
INSERT INTO T VALUES (1, 1,'A');
INSERT INTO T VALUES (2, 1,'B');
INSERT INTO T VALUES (3, 2,'B');
Query #1
SELECT t2.Name
FROM
(
SELECT COUNT(DISTINCT LocationID) cnt
FROM T
WHERE LocationID IN (1, 2)
) t1
JOIN
(
SELECT COUNT(DISTINCT LocationID) cnt,Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
) t2 on t1.cnt = t2.cnt;
| Name |
| ---- |
| B |
You can just use aggregation. Assuming no duplicates in your table:
SELECT Name
FROM table
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2;
If Name/LocationID pairs can be duplicated, use HAVING COUNT(DISTINCT LocationID) = 2.
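The difference between the two HAVING conditions only shows up when a Name/LocationID pair is repeated. The sketch below uses Python's sqlite3 module with the table from the question plus one hypothetical duplicate row (ID 4) to make that visible; the table is named T because `table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (ID INT, LocationID INT, Name TEXT);
INSERT INTO T VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'B'),
                     (4, 1, 'A');  -- hypothetical duplicate of (A, 1)
""")

by_rows = [r[0] for r in conn.execute("""
SELECT Name FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2
ORDER BY Name
""")]

by_distinct = [r[0] for r in conn.execute("""
SELECT Name FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(DISTINCT LocationID) = 2
ORDER BY Name
""")]

print(by_rows)      # A is included because its two rows are counted
print(by_distinct)  # only B has both distinct locations
```

With the duplicate present, COUNT(*) lets A through, while COUNT(DISTINCT LocationID) still returns only B.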

Recursive CTE with three tables

I'm using SQL Server 2008 R2 SP1.
I would like to recursively find the first non-null manager for a certain organizational unit by "walking up the tree".
I have one table containing organizational units, "ORG"; one table containing the parent of each org. unit in "ORG", let's call it "ORG_PARENTS"; and one table containing the manager of each organizational unit, let's call it "ORG_MANAGERS".
ORG has a column ORG_ID:
ORG_ID
1
2
3
ORG_PARENTS has two columns.
ORG_ID, ORG_PARENT
1, NULL
2, 1
3, 2
MANAGERS has two columns.
ORG_ID, MANAGER
1, John Doe
2, Jane Doe
3, NULL
I'm trying to create a recursive query that will find the first non-null manager for a certain organizational unit.
Basically if I do a query today for the manager for ORG_ID=3 I will get NULL.
SELECT MANAGER FROM ORG_MANAGERS WHERE ORG_ID = '3'
I want the query to use the ORG_PARENTS table to get the parent for ORG_ID=3, in this case get "2" and repeat the query against the ORG_MANAGERS table with ORG_ID=2 and return in this example "Jane Doe".
In case the query also returns NULL I want to repeat the process with the parent of ORG_ID=2, i.e. ORG_ID=1 and so on.
My CTE attempts so far have failed, one example is this:
WITH BOSS (MANAGER, ORG_ID, ORG_PARENT)
AS
( SELECT m.MANAGER, m.ORG_ID, p.ORG_PARENT
FROM dbo.MANAGERS m INNER JOIN
dbo.ORG_PARENTS p ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, m1.ORG_ID, b.ORG_PARENT
FROM BOSS b
INNER JOIN dbo.MANAGERS m1 ON m1.ORG_ID = b.ORG_PARENT
)
SELECT * FROM BOSS WHERE ORG_ID = 3
It returns:
Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
MANAGER ORG_ID ORG_PARENT
NULL 3 2
You need to keep track of the original ID you start with. Try this:
DECLARE @ORG_PARENTS TABLE (ORG_ID INT, ORG_PARENT INT )
DECLARE @MANAGERS TABLE (ORG_ID INT, MANAGER VARCHAR(100))
INSERT @ORG_PARENTS (ORG_ID, ORG_PARENT)
VALUES (1, NULL)
, (2, 1)
, (3, 2)
INSERT @MANAGERS (ORG_ID, MANAGER)
VALUES (1, 'John Doe')
, (2, 'Jane Doe')
, (3, NULL)
;
WITH BOSS
AS
(
SELECT m.MANAGER, m.ORG_ID AS ORI, m.ORG_ID, p.ORG_PARENT, 1 cnt
FROM @MANAGERS m
INNER JOIN @ORG_PARENTS p
ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, b.ORI, m1.ORG_ID, OP.ORG_PARENT, cnt +1
FROM BOSS b
INNER JOIN @ORG_PARENTS AS OP
ON OP.ORG_ID = b.ORG_PARENT
INNER JOIN @MANAGERS m1
ON m1.ORG_ID = OP.ORG_ID
)
SELECT *
FROM BOSS
WHERE ORI = 3
Results in:
+----------+-----+--------+------------+-----+
| MANAGER | ORI | ORG_ID | ORG_PARENT | cnt |
+----------+-----+--------+------------+-----+
| NULL | 3 | 3 | 2 | 1 |
| Jane Doe | 3 | 2 | 1 | 2 |
| John Doe | 3 | 1 | NULL | 3 |
+----------+-----+--------+------------+-----+
General tips:
Don't predefine the columns of a CTE; it's not necessary, and makes maintenance annoying.
With a recursive CTE, always keep a counter, so you can limit the recursion depth and keep track of how deep you are.
edit:
By the way, if you want the first not null manager, you can do for example (there are many ways) this:
SELECT BOSS.*
FROM BOSS
INNER JOIN (
SELECT BOSS.ORI
, MIN(BOSS.cnt) cnt
FROM BOSS
WHERE BOSS.MANAGER IS NOT NULL
GROUP BY BOSS.ORI
) X
ON X.ORI = BOSS.ORI
AND X.cnt = BOSS.cnt
WHERE BOSS.ORI IN (3)
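For comparison, here is a compact variant of the same idea that starts the recursion from a single org unit and picks the nearest non-null manager by depth. It is only a sketch: it runs on SQLite through Python's sqlite3 module (an assumption; SQL Server writes WITH without the RECURSIVE keyword and uses TOP (1) instead of LIMIT), and the names `chain`, `depth` and `first_manager` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORG_PARENTS (ORG_ID INT, ORG_PARENT INT);
CREATE TABLE MANAGERS (ORG_ID INT, MANAGER TEXT);
INSERT INTO ORG_PARENTS VALUES (1, NULL), (2, 1), (3, 2);
INSERT INTO MANAGERS VALUES (1, 'John Doe'), (2, 'Jane Doe'), (3, NULL);
""")

FIRST_MANAGER = """
WITH RECURSIVE chain (ORG_ID, depth) AS (
    SELECT ?, 1                          -- the org unit we start from
    UNION ALL
    SELECT p.ORG_PARENT, c.depth + 1     -- step up to the parent
    FROM chain c
    JOIN ORG_PARENTS p ON p.ORG_ID = c.ORG_ID
    WHERE p.ORG_PARENT IS NOT NULL
      AND c.depth < 100                  -- counter guards against cycles
)
SELECT m.MANAGER
FROM chain c
JOIN MANAGERS m ON m.ORG_ID = c.ORG_ID
WHERE m.MANAGER IS NOT NULL
ORDER BY c.depth
LIMIT 1
"""

def first_manager(org_id):
    """Return the nearest non-null manager at or above org_id, or None."""
    row = conn.execute(FIRST_MANAGER, (org_id,)).fetchone()
    return row[0] if row else None

print(first_manager(3))
```

Starting from ORG_ID 3, the walk visits 3, 2 and 1 in that order, so the first manager found is the one at ORG_ID 2.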

SQL Server 2014: SELECT only those rows that match all rows in another table

I am a beginner in SQL Server. I am trying to solve this problem:
Select all (distinct) "item_id" values from the "ItemTag" table whose corresponding "tag_id" values include at least all the values in the "UserTagList" table.
I tried the join below, but instead of the result I got, the query should return only item_id 5, since it has both tag_ids 3 and 4.
Any help is deeply appreciated.
Below is the SQL schema
SQL Server 2014 schema setup:
CREATE TABLE UserTagList (id INT);
INSERT INTO UserTagList (id)
VALUES (3), (4);
CREATE TABLE ItemTag (id INT, item_id INT, tag_id INT);
INSERT INTO ItemTag (id, item_id, tag_id)
VALUES (1, 5, 3), (2, 5, 4), (3, 5, 6), (4, 6, 3), (5, 7, 4);
Query 1:
SELECT i.item_id, i.tag_id
FROM ItemTag AS i
JOIN UserTagList AS u ON i.tag_id = u.id
Results:
| item_id | tag_id |
|---------|--------|
| 5 | 3 |
| 5 | 4 |
| 6 | 3 |
| 7 | 4 |
This is a query where group by and having are useful:
select i.item_id
from ItemTag i join
UserTagList u
on i.tag_id = u.id
group by i.item_id
having count(*) = (select count(*) from UserTagList);
I would use a left join here with aggregation:
SELECT t1.item_id
FROM ItemTag t1
LEFT JOIN UserTagList t2
ON t1.tag_id = t2.id
GROUP BY t1.item_id
HAVING SUM(CASE WHEN t2.id IS NULL THEN 1 ELSE 0 END) = 0
The logic here is that if, for a given item_id group, one or more of its tags does not match anything in the UserTagList table, the sum in the HAVING clause counts the resulting NULL row and the group is filtered out.
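A quick way to compare the two HAVING conditions is to run both against the question's sample data. The sketch below does that with SQLite through Python's sqlite3 module (an assumption; the question uses SQL Server 2014). The variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserTagList (id INT);
INSERT INTO UserTagList (id) VALUES (3), (4);
CREATE TABLE ItemTag (id INT, item_id INT, tag_id INT);
INSERT INTO ItemTag (id, item_id, tag_id)
VALUES (1, 5, 3), (2, 5, 4), (3, 5, 6), (4, 6, 3), (5, 7, 4);
""")

# Items that carry every tag in UserTagList (inner join + count comparison).
covers_all = [r[0] for r in conn.execute("""
SELECT i.item_id
FROM ItemTag i
JOIN UserTagList u ON i.tag_id = u.id
GROUP BY i.item_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM UserTagList)
ORDER BY i.item_id
""")]

# Items whose every tag appears in UserTagList (left join + NULL count).
only_listed = [r[0] for r in conn.execute("""
SELECT t1.item_id
FROM ItemTag t1
LEFT JOIN UserTagList t2 ON t1.tag_id = t2.id
GROUP BY t1.item_id
HAVING SUM(CASE WHEN t2.id IS NULL THEN 1 ELSE 0 END) = 0
ORDER BY t1.item_id
""")]

print(covers_all)
print(only_listed)
```

The two conditions answer different questions: the count comparison returns item 5, which has both tags 3 and 4, while the NULL check returns items 6 and 7, because item 5 also carries tag 6, which is not in UserTagList.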