Need help with a complex SQL query - I think I need a two-stage inner join or something like that? - sql

Okay, here's what I'm trying to do. I have a Drupal table (term_data) in MySQL that lists tags and their ID numbers, and a second table (term_node) that shows the relationship between tags and the data the tags refer to. For example, if node 1 had 3 tags, "A", "B" and "C", term_data might look like this:
name tid
A 1
B 2
C 3
and term_node might look like this:
nid tid
1 1
1 2
2 2
3 3
3 2
In this example, node 1 has been tagged with "A" and "B", node 2 has been tagged with "A" and node 3 has been tagged with "B", and "C".
I need to write a query that, given a tag name, lists all the tags that are ever used together with that tag. In the above example, searching on "A" should return "A" and "B" because node 1 uses both, searching on "C" should return "B" and "C", and searching on "B" should return "A", "B" and "C".
Any ideas? I got this far:
select distinct n.nid from term_node n INNER join term_data t where n.tid = t.tid and t.name='A';
Which gives me a list of every node that has been tagged with "A" - but I can't figure out the next step.
Can anyone help me out?

Try:
select distinct d2.name
from term_data d1
join term_node n1 on d1.tid = n1.tid
join term_node n2 on n1.nid = n2.nid
join term_data d2 on n2.tid = d2.tid
where d1.name = 'A'

Updated: Mark pointed out that the query wasn't correct.
SELECT DISTINCT t.name, t2.name Other
FROM
term_data t
INNER JOIN term_node n ON t.tid = n.tid
INNER JOIN term_node n2 ON n2.nid = n.nid
INNER JOIN term_data t2 ON n2.tid = t2.tid
WHERE
t.name = 'A'
Mark's answer should be accepted since he got it right first. Here is a demonstration of a similar query:
https://data.stackexchange.com/stackoverflow/query/13283/demo-for-need-help-with-a-complex-sql-query

Your description of the term_node data and the example do not seem to match, but using the example data provided, I believe the following query will do what you need.
select distinct td.name, td2.name as tagged_name
from term_data td
inner join term_node tn
on tn.tid = td.tid
inner join term_node tn2
on tn2.nid = tn.nid
inner join term_data td2
on td2.tid = tn2.tid
The first join looks up the term_node records that match the name. term_node is then joined to itself to find all other tids for those nodes. Finally, the second term_node is joined to term_data to retrieve the names of those tags.
You need to tack on the appropriate where clause to select just the tag you want.
Result set follows for above:-
name tagged_name
A A
A B
B A
B B
B C
C B
C C
Hope this helps
Ray
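The self-join described above can be tried out without a Drupal install. Below is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's built-in sqlite3 module (the question is about MySQL, so SQLite is an assumption made for convenience) and adds the WHERE clause as a bound parameter. The helper name co_tags is hypothetical.

```python
import sqlite3

# In-memory stand-in for the two Drupal tables, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term_data (name TEXT, tid INTEGER PRIMARY KEY);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);
    INSERT INTO term_data (name, tid) VALUES ('A', 1), ('B', 2), ('C', 3);
    INSERT INTO term_node (nid, tid) VALUES (1, 1), (1, 2), (2, 2), (3, 3), (3, 2);
""")


def co_tags(conn, tag):
    """Return the sorted names of all tags that share at least one node with `tag`."""
    rows = conn.execute(
        """
        SELECT DISTINCT td2.name
        FROM term_data td
        INNER JOIN term_node tn  ON tn.tid  = td.tid   -- nodes carrying the tag
        INNER JOIN term_node tn2 ON tn2.nid = tn.nid   -- every tag row on those nodes
        INNER JOIN term_data td2 ON td2.tid = tn2.tid  -- names of those tags
        WHERE td.name = ?
        ORDER BY td2.name
        """,
        (tag,),
    ).fetchall()
    return [name for (name,) in rows]


for tag in ("A", "B", "C"):
    print(tag, co_tags(conn, tag))
```

With the sample data this agrees with the result set listed above: "A" pairs with A and B, "B" with A, B and C, and "C" with B and C.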

I created the schema in my workbench, and here's the query I came up with:
SELECT * FROM `term_data` WHERE `term_data`.`tid` IN (
SELECT `term_node`.`tid` from `term_node` WHERE `nid` IN (
SELECT `nid` FROM `term_node` JOIN `term_data` ON `term_data`.`tid` = `term_node`.`tid` WHERE `term_data`.`name` = 'A'
)
);
Sorry for the structure ;) Here's SHOW CREATE TABLE for both tables:
CREATE TABLE `term_data` (
`tid` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`tid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `term_node` (
`term_node_id` int(11) NOT NULL,
`nid` int(11) NOT NULL,
`tid` varchar(45) DEFAULT NULL,
PRIMARY KEY (`term_node_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This seemed to work as expected, if I understood your question correctly. So one more time, we have some nodes which are tagged. We'd like to select a tag (A), and then select other tags that were used to tag same nodes as tag A.
Cheers.
P.S. Output is the following:
tid name
/* For tag A */
1 A
2 B
/* For tag B */
1 A
2 B
3 C
/* For tag C */
2 B
3 C

Related

Data Comparison between Two Tables

I have what should be simple (maybe) and I am just struggling with it.
Here is the scenario:
TABLE 1 contains all the data
TABLE 2 contains only a subset
I need a query that will look at table 1 and give a list of items that are not in table 2. Below is what I have, but I know it's not doing that.
SELECT c.[DOC_ID], d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d
LEFT JOIN [dbo].[Custom_SUAM_Content] c ON (c.[DOC_ID] = d.[DOCID])
WHERE c.[DOC_ID] IS NULL
OR d.[DOCID] IS NULL
You are describing a not exists scenario.
You can't expect to return data from c since by definition what you want doesn't exist:
select d.DOCID
from dbo.Custom_SUAM_Docuware d
where not exists(
select * from dbo.Custom_SUAM_Content c
where c.DOC_ID = d.DOCID
);
You can use EXCEPT:
SELECT c.[DOC_ID]
FROM [dbo].[Custom_SUAM_Content] c
EXCEPT
SELECT d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d ;
That would show all ids from c that are not in d.
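To compare the NOT EXISTS and EXCEPT forms side by side, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption made so it runs anywhere). Only the table and column names come from the question; the ids are made up, and the sketch takes Docuware as the full table, following the NOT EXISTS answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample ids; only the table and column names come from the question.
    CREATE TABLE Custom_SUAM_Docuware (DOCID INTEGER);
    CREATE TABLE Custom_SUAM_Content (DOC_ID INTEGER);
    INSERT INTO Custom_SUAM_Docuware (DOCID) VALUES (1), (2), (3), (4);
    INSERT INTO Custom_SUAM_Content (DOC_ID) VALUES (2), (4);
""")

# Anti-join written with NOT EXISTS: keep Docuware rows with no match in Content.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT d.DOCID
    FROM Custom_SUAM_Docuware d
    WHERE NOT EXISTS (
        SELECT 1 FROM Custom_SUAM_Content c WHERE c.DOC_ID = d.DOCID
    )
    ORDER BY d.DOCID
""")]

# The same set difference written with EXCEPT (which also removes duplicates).
except_ids = [row[0] for row in conn.execute("""
    SELECT DOCID FROM Custom_SUAM_Docuware
    EXCEPT
    SELECT DOC_ID FROM Custom_SUAM_Content
    ORDER BY 1
""")]

print(not_exists_ids, except_ids)
```

Both lists contain the ids that are missing from Content. Swapping the two SELECTs around EXCEPT reverses the direction of the comparison.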

How to SELECT a.* FROM a WHERE EXCEPT SELECT b.* FROM b WHERE a.id != b.id

I am having quite a struggle with the presented postgres request.
I have a table objects with a few columns, including an id column.
I have a table object_couples that references couples of objects with id. This table contains in consequence 2 columns of ids.
I have an external variable, like int external_variable = 42.
I am trying to select every entry of the objects table where the id of the selected object and the id of the external_variable does not exist as a couple in the object_couples table.
My request looks like the following :
SELECT id, c1, c2
FROM objects
WHERE condition1 AND condition2
EXCEPT SELECT left_id, right_id
FROM object_couples
WHERE objects.id != object_couples.left_id
AND external_variable != object_couples.right_id;
What can I do?
EDIT 1 :
The following request is not rejected but causes in pycharm a code 137(SIGKILL) :
SELECT id, c1, c2
FROM objects AS S
INNER JOIN object_couples
ON object_couples.left_id != S.id
AND object_couples.right_id != external_variable
WHERE S.c1 > 1234 AND S.c2 < 5678
I am thinking NOT EXISTS:
select o.*
from objects o
where not exists (select 1
from object_couples oc
where (o.id = oc.left_id and 42 = oc.right_id) or
(o.id = oc.right_id and 42 = oc.left_id)
);
For performance, you might find that this works better:
select o.*
from objects o
where not exists (select 1
from object_couples oc
where o.id = oc.left_id and 42 = oc.right_id
) and
not exists (select 1
from object_couples oc
where o.id = oc.right_id and 42 = oc.left_id
);
In particular, if you have indexes on object_couples(left_id, right_id) and object_couples(right_id, left_id), then this might even be fast.
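A sketch of the double NOT EXISTS form, run against SQLite through Python's sqlite3 module instead of Postgres (an assumption for portability). The column names left_id and right_id come from the question; the rows and the helper name unpaired_with are hypothetical, and the external variable is passed as a bound parameter rather than written into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, c1 INTEGER, c2 INTEGER);
    CREATE TABLE object_couples (left_id INTEGER, right_id INTEGER);
    -- Hypothetical rows: 1 and 3 are coupled with 42, 2 is coupled with something else.
    INSERT INTO objects (id, c1, c2) VALUES (1, 0, 0), (2, 0, 0), (3, 0, 0), (42, 0, 0);
    INSERT INTO object_couples (left_id, right_id) VALUES (1, 42), (42, 3), (2, 7);
""")


def unpaired_with(conn, external_id):
    """Ids of objects that form no couple with external_id, in either column order."""
    rows = conn.execute(
        """
        SELECT o.id
        FROM objects o
        WHERE NOT EXISTS (SELECT 1 FROM object_couples oc
                          WHERE oc.left_id = o.id AND oc.right_id = :ext)
          AND NOT EXISTS (SELECT 1 FROM object_couples oc
                          WHERE oc.right_id = o.id AND oc.left_id = :ext)
        ORDER BY o.id
        """,
        {"ext": external_id},
    ).fetchall()
    return [row[0] for row in rows]


print(unpaired_with(conn, 42))
```

Unlike the INNER JOIN on != from the question, which pairs every object with almost every couple row, each subquery here is a simple equality lookup.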

How to create a subset query in sql?

I have two tables as follows:
CREATE TABLE List (
id INTEGER,
type INTEGER REFERENCES Types(id),
data TEXT,
PRIMARY KEY (id, type)
);
CREATE TABLE Types (
id INTEGER PRIMARY KEY,
name TEXT
);
Now I want to create a query that finds all ids in List that have every one of a given set of type names.
For example,
List:
1 0 "Some text"
1 1 "Moar text"
2 0 "Foo"
3 1 "Bar"
3 2 "BarBaz"
4 0 "Baz"
4 1 "FooBar"
4 2 "FooBarBaz"
Types:
0 "Key1"
1 "Key2"
2 "Key3"
Given the input "Key1", "Key2", the query should return 1, 4.
Given the input "Key2", "Key3", the query should return 3, 4.
Given the input "Key2", the query should return 1, 3, 4.
Thanks!
select distinct l.id
from list l
inner join types t on t.id = l.type
where t.name in ('Key1', 'Key2')
group by l.id
having count(distinct t.id) = 2
You have to adjust the having clause to the number of keys you are putting in your where clause. Example for just one key:
select distinct l.id
from list l
inner join types t on t.id = l.type
where t.name in ('Key2')
group by l.id
having count(distinct t.id) = 1
SQlFiddle example
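If the query is issued from application code, the number in the HAVING clause can be derived from the key list instead of being edited by hand. A minimal sketch with Python's sqlite3 module and the question's sample data; the helper name ids_with_all_types is hypothetical, and it assumes each type name occurs only once in Types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE List (
        id INTEGER,
        type INTEGER REFERENCES Types(id),
        data TEXT,
        PRIMARY KEY (id, type)
    );
    INSERT INTO Types (id, name) VALUES (0, 'Key1'), (1, 'Key2'), (2, 'Key3');
    INSERT INTO List (id, type, data) VALUES
        (1, 0, 'Some text'), (1, 1, 'Moar text'), (2, 0, 'Foo'),
        (3, 1, 'Bar'), (3, 2, 'BarBaz'),
        (4, 0, 'Baz'), (4, 1, 'FooBar'), (4, 2, 'FooBarBaz');
""")


def ids_with_all_types(conn, names):
    """Ids in List that carry every type name in `names`."""
    names = sorted(set(names))  # duplicates would inflate the expected count
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT l.id
        FROM List l
        INNER JOIN Types t ON t.id = l.type
        WHERE t.name IN ({placeholders})
        GROUP BY l.id
        HAVING COUNT(DISTINCT t.id) = ?   -- must match every requested key
        ORDER BY l.id
    """
    return [row[0] for row in conn.execute(sql, (*names, len(names)))]


print(ids_with_all_types(conn, ["Key1", "Key2"]))
print(ids_with_all_types(conn, ["Key2", "Key3"]))
print(ids_with_all_types(conn, ["Key2"]))
```

Only the placeholders are built with string formatting; the key names themselves are always passed as bound parameters.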
You can use the following trick to extend Jurgen's idea:
with keys as (
select distinct t.id
from types t
where t.name in ('Key1', 'Key2')
)
select l.id
from list l join
keys k
on l.type = k.id cross join
(select count(*) as keycnt from keys) c
group by l.id
having count(k.id) = max(c.keycnt)
That is, calculate the matching keys in a subquery, and then use this for the counts. This way, you only have to change one line to put in key values, and you can have as many keys as you would like. (Just as a note, I haven't tested this SQL so I apologize for any syntax errors.)
If you can dynamically produce the SQL, this may be one of the most efficient ways in many DBMSs. It joins List to itself once per key:
SELECT l1.id
FROM List l1
JOIN Types t1 ON t1.id = l1.type
JOIN List l2 ON l2.id = l1.id
JOIN Types t2 ON t2.id = l2.type
WHERE t1.name = 'Key1'
AND t2.name = 'Key2' ;
See this similar question, with more than 10 ways to get the same result, plus some benchmarks (for Postgres): How to filter SQL results in a has-many-through relation
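One possible way to produce such a statement dynamically is to emit one List/Types join pair per key. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module with the question's sample data, and the helper names self_join_sql and ids_via_self_join are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE List (id INTEGER, type INTEGER REFERENCES Types(id), data TEXT,
                       PRIMARY KEY (id, type));
    INSERT INTO Types (id, name) VALUES (0, 'Key1'), (1, 'Key2'), (2, 'Key3');
    INSERT INTO List (id, type, data) VALUES
        (1, 0, 'Some text'), (1, 1, 'Moar text'), (2, 0, 'Foo'),
        (3, 1, 'Bar'), (3, 2, 'BarBaz'),
        (4, 0, 'Baz'), (4, 1, 'FooBar'), (4, 2, 'FooBarBaz');
""")


def self_join_sql(key_count):
    """Build a statement with one List/Types join pair per requested key."""
    if key_count < 1:
        raise ValueError("need at least one key")
    lines = ["SELECT l1.id", "FROM List l1", "JOIN Types t1 ON t1.id = l1.type"]
    for i in range(2, key_count + 1):
        # Each extra pair looks at another row of the same List id.
        lines.append(f"JOIN List l{i} ON l{i}.id = l1.id")
        lines.append(f"JOIN Types t{i} ON t{i}.id = l{i}.type")
    conditions = " AND ".join(f"t{i}.name = ?" for i in range(1, key_count + 1))
    lines.append(f"WHERE {conditions}")
    lines.append("ORDER BY l1.id")
    return "\n".join(lines)


def ids_via_self_join(conn, names):
    names = list(dict.fromkeys(names))  # drop repeated keys, keep their order
    return [row[0] for row in conn.execute(self_join_sql(len(names)), names)]


print(self_join_sql(2))
print(ids_via_self_join(conn, ["Key1", "Key2"]))
```

Only aliases and placeholders are generated as text, so the key names never end up inside the SQL string.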

Tricky SQLite query, could use some assistance

I have a rather confusing SQLite query that I can't seem to quite wrap my brain around.
I have the following four tables:
Table "S"
sID (string/guid) | sNum (integer)
-----------------------------------
aaa-aaa 1
bbb-bbb 2
ccc-ccc 3
ddd-ddd 4
eee-eee 5
fff-fff 6
ggg-ggg 7
Table "T"
tID (string/guid) | ... other stuff
-----------------------------------
000
www
xxx
yyy
zzz
Table "S2TMap"
sID | tID
-------------------
aaa-aaa 000
bbb-bbb 000
ccc-ccc xxx
ddd-ddd yyy
eee-eee www
fff-fff 000
ggg-ggg 000
Table "temp"
oldID (string/guid) | newID (string/guid)
------------------------------------------
dont care fff-fff
dont care ggg-ggg
dont care zzz
What I need is the MAX() sNum among the S rows mapped to a specified T, ignoring any S whose sID appears in the temp.newID column.
For example, given the T '000', '000' has S 'aaa-aaa', 'bbb-bbb', 'fff-fff', and 'ggg-ggg' mapped to it. However, both 'fff-fff' and 'ggg-ggg' exist in the TEMP table, which means I need to only look at 'aaa-aaa' and 'bbb-bbb'. Thus, the statement would return "2".
How would I go about doing this?
I was thinking something along the lines of the following for selecting S rows that don't exist in the "temp" table, but I'm not sure how to get the max sNum and restrict it to a specific T:
SELECT s.sID, s.sNum FROM s WHERE NOT EXISTS ( SELECT newID FROM temp WHERE temp.newID = s.sID)
Thanks!
Give this a try:
select max(s.sNum) result from s2tmap st
join s on st.sId = s.sId
where st.tId = '000' and not exists (
select * from temp
where temp.newId = st.sId)
Here is the fiddle to play with.
Another option, probably less efficient, would be:
select max(s.sNum) result from s2tmap st
join s on st.sId = s.sId
where st.tId = '000' and st.sId not in (
select newId from temp)
The following query should give you a list of Ts and their max sNums (as long as all exist in S and S2TMap):
SELECT t.tID, MAX(sNum)
FROM S s
JOIN S2TMap map on s.sID=map.sID
JOIN T t on map.tId=t.tID
LEFT JOIN temp tmp on s.sID=tmp.newID
WHERE tmp.newID IS NULL
GROUP BY t.tID
You were close, you just had to join on S2TMap and then to T in order to restrict the result set to a given T.
SELECT MAX(s.sNum)
FROM s
INNER JOIN S2TMap m on m.sID = s.sID
INNER JOIN t on t.tID = m.tID
WHERE t.tID = '000'
AND NOT EXISTS (
SELECT newID FROM temp WHERE temp.newID = s.sID
)
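The answers above can be checked against the question's sample data. The sketch below uses Python's built-in sqlite3 module; the helper name max_snum is hypothetical, table T is left out because S2TMap already carries the tID, and the temp table is quoted and aliased as a precaution since TEMP is also an SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (sID TEXT PRIMARY KEY, sNum INTEGER);
    CREATE TABLE S2TMap (sID TEXT, tID TEXT);
    -- "temp" is quoted because TEMP is also an SQLite keyword.
    CREATE TABLE "temp" (oldID TEXT, newID TEXT);
    INSERT INTO S (sID, sNum) VALUES
        ('aaa-aaa', 1), ('bbb-bbb', 2), ('ccc-ccc', 3), ('ddd-ddd', 4),
        ('eee-eee', 5), ('fff-fff', 6), ('ggg-ggg', 7);
    INSERT INTO S2TMap (sID, tID) VALUES
        ('aaa-aaa', '000'), ('bbb-bbb', '000'), ('ccc-ccc', 'xxx'), ('ddd-ddd', 'yyy'),
        ('eee-eee', 'www'), ('fff-fff', '000'), ('ggg-ggg', '000');
    INSERT INTO "temp" (oldID, newID) VALUES
        ('dont care', 'fff-fff'), ('dont care', 'ggg-ggg'), ('dont care', 'zzz');
""")


def max_snum(conn, t_id):
    """Highest sNum mapped to t_id, skipping any sID listed in temp.newID."""
    row = conn.execute(
        """
        SELECT MAX(s.sNum)
        FROM S s
        INNER JOIN S2TMap m ON m.sID = s.sID
        WHERE m.tID = ?
          AND NOT EXISTS (SELECT 1 FROM "temp" tmp WHERE tmp.newID = s.sID)
        """,
        (t_id,),
    ).fetchone()
    return row[0]  # None when no S row qualifies


# The LEFT JOIN ... IS NULL variant, grouped so that every T gets its own maximum.
per_t = dict(conn.execute("""
    SELECT m.tID, MAX(s.sNum)
    FROM S s
    INNER JOIN S2TMap m ON m.sID = s.sID
    LEFT JOIN "temp" tmp ON tmp.newID = s.sID
    WHERE tmp.newID IS NULL
    GROUP BY m.tID
"""))

print(max_snum(conn, "000"), per_t)
```

For T '000' only 'aaa-aaa' and 'bbb-bbb' survive the filter, so the maximum is 2, as the question expects.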

MySQL: Select pages that are not tagged?

I have a db with two tables like these below,
page table
pg_id title
1 a
2 b
3 c
4 d
tagged table
tagged_id pg_id
1 1
2 4
I want to select the pages which are not tagged. I tried the query below, but it doesn't work:
SELECT *
FROM root_pages
LEFT JOIN root_tagged ON ( root_tagged.pg_id = root_pages.pg_id )
WHERE root_pages.pg_id != root_tagged.pg_id
It returns zero - Showing rows 0 - 1 (2 total, Query took 0.0021 sec)
But I want it to return
pg_id title
2 b
3 c
My query must have been wrong?
How can I correctly return the pages which are not tagged?
SELECT *
FROM root_pages
LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
WHERE root_tagged.pg_id IS NULL
The != (or <>) operator compares two values, but any comparison involving NULL evaluates to NULL (unknown), never to true:
NULL = NULL evaluates to NULL
NULL = 0 evaluates to NULL
NULL != NULL evaluates to NULL
A WHERE clause keeps only rows whose condition is true, so to check for NULL you should use the IS NULL or IS NOT NULL operator.
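These comparisons are easy to check directly. A minimal sketch using Python's built-in sqlite3 module (an assumption for convenience; the question concerns MySQL, where NULL comparisons follow the same three-valued logic). SQL NULL comes back as Python's None, and true as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary comparison operators propagate NULL; only IS / IS NOT give a definite answer.
eq_null, eq_zero, ne_null, is_null, is_not_null = conn.execute(
    "SELECT NULL = NULL, NULL = 0, NULL != NULL, NULL IS NULL, 0 IS NOT NULL"
).fetchone()

print(eq_null, eq_zero, ne_null)  # comparisons against NULL
print(is_null, is_not_null)       # IS NULL / IS NOT NULL checks
```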
(Struck out in the original answer:) If your density of tags to pages is more than 2:1 or so, then using NOT EXISTS will be faster than using LEFT JOIN + IS NULL
SELECT *
FROM root_pages
WHERE NOT EXISTS (
SELECT *
FROM root_tagged
WHERE root_tagged.pg_id = root_pages.pg_id )
It is an alternative that more clearly states what you are looking for, a non-existence.
For the strikeout text above:
The question is MySQL-specific. Assuming root_tagged.pg_id is not nullable, MySQL implements LEFT JOIN + IS NULL as an anti-join, which is the same strategy it uses for NOT EXISTS. NOT EXISTS seems to add some overhead, though, so LEFT JOIN is supposed to work faster.
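The three variants discussed in this thread can be compared on the question's sample rows. This is a sketch with Python's built-in sqlite3 module rather than MySQL (an assumption), so it shows only that the results agree and says nothing about relative speed; the helper name titles is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE root_pages (pg_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE root_tagged (tagged_id INTEGER PRIMARY KEY, pg_id INTEGER NOT NULL);
    INSERT INTO root_pages (pg_id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO root_tagged (tagged_id, pg_id) VALUES (1, 1), (2, 4);
""")


def titles(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# The question's attempt: matched rows compare as equal, unmatched rows compare
# against NULL, so the != condition is never true.
with_not_equal = titles("""
    SELECT root_pages.title
    FROM root_pages
    LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
    WHERE root_pages.pg_id != root_tagged.pg_id
    ORDER BY root_pages.pg_id
""")

left_join_is_null = titles("""
    SELECT root_pages.title
    FROM root_pages
    LEFT JOIN root_tagged ON root_tagged.pg_id = root_pages.pg_id
    WHERE root_tagged.pg_id IS NULL
    ORDER BY root_pages.pg_id
""")

not_exists = titles("""
    SELECT root_pages.title
    FROM root_pages
    WHERE NOT EXISTS (
        SELECT 1 FROM root_tagged WHERE root_tagged.pg_id = root_pages.pg_id
    )
    ORDER BY root_pages.pg_id
""")

print(with_not_equal, left_join_is_null, not_exists)
```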