removing rows from a SELECT based on columns in a different table - sql

I'm pretty much looking for a way to filter out rows from a SELECT of one table based on certain values in rows of another table.
I'm experimenting with the example structure below. I've got a table of blog post content (one row per blog post), and another table of metadata about the posts (one row per key-value pair; each row with a column associating it with a blog post; many rows per blog post). I want to pull a row of posts only if there exists no rows in metadata where metadata.pid=posts.pid AND metadata.k='optout'. That is, for the example structure below, I just want to get back the posts.id=1 row.
(Based on what I've tried) JOINs don't end up removing the posts which have some metadata where metadata.k='optout', because the other row of metadata for that pid means it makes it into the results.
mysql> select * from posts;
+-----+-------+--------------+
| pid | title | content |
+-----+-------+--------------+
| 1 | Foo | Some content |
| 2 | Bar | More content |
| 3 | Baz | Something |
+-----+-------+--------------+
3 rows in set (0.00 sec)
mysql> select * from metadata;
+------+-----+--------+-----------+
| mdid | pid | k | v |
+------+-----+--------+-----------+
| 1 | 1 | date | yesterday |
| 2 | 1 | thumb | img.jpg |
| 3 | 2 | date | today |
| 4 | 2 | optout | true |
| 5 | 3 | date | tomorrow |
| 6 | 3 | optout | true |
+------+-----+--------+-----------+
6 rows in set (0.00 sec)
A subquery can give me the inverse of what I want:
mysql> select posts.* from posts where pid = any (select pid from metadata where k = 'optout');
+-----+-------+--------------+
| pid | title | content |
+-----+-------+--------------+
| 2 | Bar | More content |
| 3 | Baz | Something |
+-----+-------+--------------+
2 rows in set (0.00 sec)
...but using pid != any (...) gives me all 3 of the rows in posts, cause every single pid has a metadata row where k!='optout'.

Sounds like you want to do a LEFT JOIN and then check for results in which the value of the joined table is NULL, indicating that no such joined record exists.
For example:
SELECT * FROM posts
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;
This will select any row from the table posts for which no corresponding metadata row exists with a value of k = 'optout'.
edit: Worth noting that this is a key property of a left join and would not work with a regular join; a left join will always return values from the first table, even if no matching values exist in the joined table(s), allowing you to perform selections based on the absence of those rows.
edit 2: Let's clarify what's happening here with respect to the LEFT JOIN versus the JOIN (which I refer to as an INNER JOIN for clarity but is interchangable in MySQL).
Suppose you run either of these two queries:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
INNER JOIN metadata ON posts.pid = metadata.pid;
or
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
LEFT JOIN metadata ON posts.pid = metadata.pid;
Both queries produce the following result set:
+-----+-------+--------------+------+-------+-----------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+-------+-----------+
| 1 | Foo | Some content | 1 | date | yesterday |
| 1 | Foo | Some content | 2 | thumb | img.jpg |
+-----+-------+--------------+------+-------+-----------+
Now, let's suppose we modify the query to add the extra criteria for "optout" that was mentioned. First, the INNER JOIN:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
INNER JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");
As expected, this returns no results:
Empty set (0.00 sec)
Now, changing that to a LEFT JOIN:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");
This DOES produce a result set:
+-----+-------+--------------+------+------+------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+------+------+
| 1 | Foo | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+
The difference between an INNER JOIN and a LEFT JOIN is that an INNER JOIN will only return a result if rows from BOTH joined tables match. In a LEFT JOIN, matching rows from the first table will ALWAYS be returned, regardless of whether anything is found to join to. In a lot of cases it doesn't matter which one you use, but it's important to choose the right one so as not to get unexpected results down the line.
So in this case, the suggested query of:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;
Will return the same result set as above:
+-----+-------+--------------+------+------+------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+------+------+
| 1 | Foo | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+
Hopefully that clears it up! Joins are a great thing to learn about, having a full understanding of when to use which one is a very good thing.

You can try something like
select p.*
from posts p
where NOT EXISTS (
select pid
from metadata
where k = 'optout'
and pid = p.pid
)

Related

Oracle SQL query comparing multiple rows with same identifier

I'm honestly not sure how to title this - so apologies if it is unclear.
I have two tables I need to compare. One table contains tree names and nodes that belong to that tree. Each Tree_name/Tree_node combo will have its own line. For example:
Table: treenode
| TREE_NAME | TREE_NODE |
|-----------|-----------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 1 | E |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | C |
| 3 | D |
| 3 | E |
| 3 | F |
I have another table that contains names of queries and what tree_nodes they use. Example:
Table: queryrecord
| QUERY | TREE_NODE |
|---------|-----------|
| Alpha | A |
| Alpha | B |
| Alpha | D |
| BRAVO | A |
| BRAVO | B |
| BRAVO | D |
| CHARLIE | A |
| CHARLIE | B |
| CHARLIE | F |
I need to create an SQL where I input the QUERY name, and it returns any ‘TREE_NAME’ that includes all the nodes associated with the query. So if I input ‘ALPHA’, it would return TREE_NAME 1 & 2. If I ask it for CHARLIE, it would return nothing.
I only have read access, and don’t believe I can create temp tables, so I’m not sure if this is possible. Any advice would be amazing. Thank you!
You can use group by and having as follows:
Select t.tree_name
From tree_node t
join query_record q
on t.tree_node = q.tree_node
WHERE q.query = 'ALPHA'
Group by t.tree_name
Having count(distinct t.tree_node)
= (Select count(distinct q.tree_node) query_record q WHERE q.query = 'ALPHA');
Using an IN condition (a semi-join, which saves time over a join):
with prep (tree_node) as (select tree_node from queryrecord where query = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
;
:q in the prep subquery (in the with clause) is the bind variable to which you will assign the various QUERY values at runtime.
EDIT
I don't generally set up the test case on online engines; but in a comment below this answer, the OP said the query didn't work for him. So, I set up the example on SQLFiddle, here:
http://sqlfiddle.com/#!4/b575e/2
A couple of notes: for some reason, SQLFiddle thinks table names should be at most eight characters, so I had to change the second table name to queryrec (instead of queryrecord). I changed the name in the query, too, of course. And, second, I don't know how I can give bind values on SQLFiddle; I hard-coded the name 'Alpha'. (Note also that in the OP's sample data, this query value is not capitalized, while the other two are; of course, text values in SQL are case sensitive, so one should pay attention when testing.)
You can do this with a join and aggregation. The trick is to count the number of nodes in query_record before joining:
select qr.query, t.tree_name
from (select qr.*,
count(*) over (partition by query) as num_tree_node
from query_record qr
) qr join
tree_node t
on t.tree_node = qr.tree_node
where qr.query = 'ALPHA'
group by qr.query, t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node;
Here is a db<>fiddle.

1 to Many Query: Help Filtering Results

Problem: SQL Query that looks at the values in the "Many" relationship, and doesn't return values from the "1" relationship.
Tables Example: (this shows two different tables).
+---------------+----------------------------+-------+
| Unique Number | <-- Table 1 -- Table 2 --> | Roles |
+---------------+----------------------------+-------+
| 1 | | A |
| 2 | | B |
| 3 | | C |
| 4 | | D |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
+---------------+----------------------------+-------+
When I run my query, I get multiple, unique numbers that show all of the roles associated to each number like so.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 4 | C |
| 4 | A |
| 5 | B |
| 5 | C |
| 5 | D |
| 6 | D |
| 6 | A |
+---------------+-------+
I would like to be able to run my query and be able to say, "When the role of A is present, don't even show me the unique numbers that have the role of A".
Maybe if SQL could look at the roles and say, WHEN role A comes up, grab unique number and remove it from column 1.
Based on what I would "like" to happen (I put that in quotations as this might not even be possible) the following is what I would expect my query to return.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 5 | B |
| 5 | C |
| 5 | D |
+---------------+-------+
UPDATE:
Query Example: I am querying 8 tables, but I condensed it to 4 for simplicity.
SELECT
c.UniqueNumber,
cp.pType,
p.pRole,
a.aRole
FROM c
JOIN cp ON cp.uniqueVal = c.uniqueVal
JOIN p ON p.uniqueVal = cp.uniqueVal
LEFT OUTER JOIN a.uniqueVal = p.uniqueVal
WHERE
--I do some basic filtering to get to the relevant clients data but nothing more than that.
ORDER BY
c.uniqueNumber
Table sizes: these tables can have anywhere from 50,000 rows to 500,000+
Pretending the table name is t and the column names are alpha and numb:
SELECT t.numb, t.alpha
FROM t
LEFT JOIN t AS s ON t.numb = s.numb
AND s.alpha = 'A'
WHERE s.numb IS NULL;
You can also do a subselect:
SELECT numb, alpha
FROM t
WHERE numb NOT IN (SELECT numb FROM t WHERE alpha = 'A');
Or one of the following if the subselect is materializing more than once (pick the one that is faster, ie, the one with the smaller subtable size):
SELECT t.numb, t.alpha
FROM t
JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') = 0) AS s USING (numb);
SELECT t.numb, t.alpha
FROM t
LEFT JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') > 0) AS s USING (numb)
WHERE s.numb IS NULL;
But the first one is probably faster and better[1]. Any of these methods can be folded into a larger query with multiple additional tables being joined in.
[1] Straight joins tend to be easier to read and faster to execute than queries involving subselects and the common exceptions are exceptionally rare for self-referential joins as they require a large mismatch in the size of the tables. You might hit those exceptions though, if the number of rows that reference the 'A' alpha value is exceptionally small and it is indexed properly.
There are many ways to do it, and the trade-offs depend on factors such as the size of the tables involved and what indexes are available. On general principles, my first instinct is to avoid a correlated subquery such as another, now-deleted answer proposed, but if the relationship table is small then it probably doesn't matter.
This version instead uses an uncorrelated subquery in the where clause, in conjunction with the not in operator:
select num, role
from one_to_many
where num not in (select otm2.num from one_to_many otm2 where otm2.role = 'A')
That form might be particularly effective if there are many rows in one_to_many, but only a small proportion have role A. Of course you can add an order by clause if the order in which result rows are returned is important.
There are also alternatives involving joining inline views or CTEs, and some of those might have advantages under particular circumstances.

Use JOIN on multiple columns multiple times

I am trying to figure out the best way to use a JOIN in MSSQL in order to do the following:
I have two tables. One table contains technician IDs and an example of one data set would be as follows:
+--------+---------+---------+---------+---------+
| tagid | techBid | techPid | techFid | techMid |
+--------+---------+---------+---------+---------+
| 1-1001 | 12 | 0 | 11 | 6 |
+--------+---------+---------+---------+---------+
I have another table that stores the names of these technicians:
+------+-----------+
| TTID | SHORTNAME |
+------+-----------+
| 11 | Steven |
| 12 | Mark |
| 6 | Pierce |
+------+-----------+
If the ID of a technician in the first table is 0, there is no technician of that type for that row (types are either B, P, F, or M).
I am trying to come up with a query that will give me a result that contains all of the data from table 1 along with the shortnames from table 2 IF there is a matching ID, so the result would look something like the following:
+--------+---------+---------+---------+---------+----------------+----------------+----------------+----------------+
| tagid | techBid | techPid | techFid | techMid | techBShortName | techPShortName | techFShortName | techMShortName |
+--------+---------+---------+---------+---------+----------------+----------------+----------------+----------------+
| 1-1001 | 12 | 0 | 11 | 6 | Mark | NULL | Steven | Pierce |
+--------+---------+---------+---------+---------+----------------+----------------+----------------+----------------+
I am trying to use a JOIN to do this, but I cannot figure out how to join on multiple columns multiple times to where it would look something like
Select table1.tagid, table1.techBid, table1.techPid, table1.techFid, table1.techMid, table2.shortname
FROM table1
INNER JOIN table2 on //Dont know what to put here
You need to use left joins like this:
Select table1.tagid, table1.techBid, table1.techPid, table1.techFid, table1.techMid,
t2b.shortname, t2p.shortname, t2f.shortname, t2m.shortname,
FROM table1
LEFT JOIN table2 t2b on table1.techBid = t2b.ttid
LEFT JOIN table2 t2p on table1.techPid = t2p.ttid
LEFT JOIN table2 t2f on table1.techFid = t2f.ttid
LEFT JOIN table2 t2m on table1.techMid = t2m.ttid
you just do mutiple left join
select tech.techPid, techPname.SHORTNAME
, tech.techFid, techFname.SHORTNAME
from tech
left join techName as techPname
on tech.techPid = techPname.TTID
left join techName as techFname
on tech.techFid = techFname.TTID

Oracle 10 SQL: FULL JOIN through Cross Reference Table

http://sqlfiddle.com/#!4/24637/1
I have three tables, (better details/data shown in sqlfiddle link), one replacing another, and a cross reference table in between. One of the fields in each of the table uses the cross reference (version), and another one of the fields in each of the tables is the same (changeID).
I need a query that when passed a list of new_version + new_changeType, along with the equivalent original_version + old_changeType (if there is an old version equivalent) PLUS any old changeIDs that were 'missed' in the conversion of data.
TABLES (fields on the same line are equivalent)
OLD_table | XREF_table | NEW_Table
original_version | original_version |
changeID | | changeID
OLD_changeType | |
| new_version | new_version
| | NEW_changeType
DATA
111,1,CT1 | 111,AAA | AAA,1,ONE
111,2,CT2 | 222,BBB | AAA,2,TWO
222,1,CT1 | 333,DDD | BBB,1,ONE
222,2,CT2 | | BBB,2,TWO
222,3,CT3 | | CCC,1,ONE
333,1,CT1 | |
444,1,CT1 | |
If passed the following list, the result set should look like so. (order doesnt matter)
AAA,BBB,CCC
| NEW_VERSION | NEW_CHANGE_TYPE| ORIGINAL_VERSION | CHANGEID | OLD_CHANGE_TYPE |
|-------------|----------------|------------------|----------|-----------------|
| AAA | ONE | 111 | 1 | CT1 |
| AAA | TWO | 111 | 2 | CT2 |
| BBB | ONE | 222 | 1 | CT1 |
| BBB | TWO | 222 | 2 | CT2 |
| CCC | ONE | (null) | (null) | (null) |
| (null) | (null) | 222 | 3 | CT3 |
I'm having trouble getting ALL the data required. I've played with the following query, however I seem to either 1) miss a row or 2) get additional rows not matching the requirements.
The following queries I've played with are as follows.
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
LEFT OUTER JOIN XREF_TABLE b on a.new_version = b.new_version
FULL OUTER JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (b.new_version in ('AAA','BBB','CCC') or b.new_version is null);
select
a.new_version,
a.Change_type,
c.original_version,
c.changeID,
c.OLD_Change_type
from NEW_TABLE a
FULL JOIN XREF_TABLE b on a.new_version = b.new_version
FULL JOIN OLD_TABLE c on
b.original_version = c.original_version and a.changeID = c.changeID
where (a.new_version in ('AAA','BBB','CCC'));
The first returns one 'extra' row with the 333,DDD data, which is not specified from the input.
The seconds returns one less row (with the changeID from the old table "missed" from when this data was converted over.
Any thoughts or suggestions on how to solve this?
First inner join old_table and xref_table, as you are not interested in any old_table entries without an xref_table entry. Then full outer join new_table. In your WHERE clause be aware that new_table.new_version can be null, so use coalesce to use xref_table.new_version in this case to limit your results to AAA, BBB and CCC. That's all.
select
coalesce(n.new_version, x.new_version) as new_version,
n.change_type,
o.original_version,
o.changeid,
o.old_change_type
from old_table o
inner join xref_table x
on x.original_version = o.original_version
full outer join new_table n
on n.new_version = x.new_version
and n.changeid = o.changeid
where coalesce(n.new_version, x.new_version) in ('AAA','BBB','CCC')
order by 1,2,3,4,5
;
Here is your fiddle: http://sqlfiddle.com/#!4/24637/11.
BTW: Better never use random aliases like a, b and c that don't indicate what table is meant. That makes the query harder to understand. Use the table's first letter(s) or an acronym instead.

SQL LEFT JOIN help

My scenario: There are 3 tables for storing tv show information; season, episode and episode_translation.
My data: There are 3 seasons, with 3 episodes each one, but there is only translation for one episode.
My objetive: I want to get a list of all the seasons and episodes for a show. If there is a translation available in a specified language, show it, otherwise show null.
My attempt to get serie 1 information in language 1:
SELECT
season_number AS season,number AS episode,name
FROM
season NATURAL JOIN episode
NATURAL LEFT JOIN episode_trans
WHERE
id_serie=1 AND
id_lang=1
ORDER BY
season_number,number
result:
+--------+---------+--------------------------------+
| season | episode | name |
+--------+---------+--------------------------------+
| 3 | 3 | Episode translated into lang 1 |
+--------+---------+--------------------------------+
expected result
+-----------------+--------------------------------+
| season | episode| name |
+-----------------+--------------------------------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 3 | 1 | NULL |
| 3 | 2 | NULL |
| 3 | 3 | Episode translated into lang 1 |
+--------+--------+--------------------------------+
Full DB dump
http://pastebin.com/Y8yXNHrH
I tested the following on MySQL 4.1 - it returns your expected output:
SELECT s.season_number AS season,
e.number AS episode,
et.name
FROM SEASON s
JOIN EPISODE e ON e.id_season = s.id_season
LEFT JOIN EPISODE_TRANS et ON et.id_episode = e.id_episode
AND et.id_lang = 1
WHERE s.id_serie = 1
ORDER BY s.season_number, e.number
Generally, when you use ANSI-92 JOIN syntax you need to specify the join criteria in the ON clause. In MySQL, I know that not providing it for INNER JOINs results in a cross join -- a cartesian product.
LEFT JOIN episode_trans
ON episode_trans.id_episode = episode.id_episode
AND episode_trans.id_lang = 1
WHERE id_serie=1
You probably need to move the id_lang = 1 into the LEFT JOIN clause instead of the WHERE clause. Think of it this way... for all of those rows with no translation the LEFT JOIN gives you back NULLs for all of those translation columns. Then in the WHERE clause you are checking to see if that is equal to 1 - which of course evaluates to FALSE.
It would probably be easier if you included your code in the question next time instead of in a link.
Can you try using
LEFT OUTER JOIN
instead of
NATURAL LEFT JOIN