sql selection from one-to-many table - sql

I have 3 tables with the columns below:
Topics:
[TopicID] [TopicName]
Messages:
[MessageID] [MessageText]
MessageTopicRelations:
[EntryID] [MessageID] [TopicID]
Messages can be about more than one topic. The question is: given a couple of topics, I need to get the messages that are about ALL of these topics, no fewer, though they may be about other topics too. A message that is about only SOME of the given topics should not be included. I hope I explained my request well; otherwise I can provide sample data. Thanks.

The following queries use x, y, and z to stand in for topic ids, since none were provided in the question.
Using JOINs:
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mx ON mx.messageid = m.messageid
AND mx.topicid = x
JOIN MESSAGETOPICRELATIONS my ON my.messageid = m.messageid
AND my.topicid = y
JOIN MESSAGETOPICRELATIONS mz ON mz.messageid = m.messageid
AND mz.topicid = z
Using GROUP BY/HAVING COUNT(*):
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
JOIN TOPICS t ON t.topicid = mtr.topicid
WHERE t.topicid IN (x, y, z)
GROUP BY m.messageid, m.messagetext
HAVING COUNT(*) = 3
Of the two, the JOIN approach is safer.
The GROUP BY/HAVING approach relies on MESSAGETOPICRELATIONS.TOPICID being either part of the primary key (together with MESSAGEID) or covered by a unique constraint, to ensure there aren't duplicates. Otherwise, you could have 2+ instances of the same topic associated to a message, which would produce a false positive. Using HAVING COUNT(DISTINCT ...) would clear up any false positives, but support depends on the database: MySQL supports it in 5.1+, but not in 4.1. Oracle probably does too; I have to wait till Monday to test on SQL Server...
I looked into Bill's comment about not needing the join to the TOPICS table:
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
AND mtr.topicid IN (x, y, z)
...will return false positives - rows that match at least one of the values defined in the IN clause. And:
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
AND mtr.topicid = x
AND mtr.topicid = y
AND mtr.topicid = z
...won't return anything at all, because the topicid can never be all of the values at once.
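To make the difference between these variants concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the topic ids 1, 2 and 3 stand in for x, y and z. The first query joins the relation table once per required topic; the other two are the single-join variants discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Messages (MessageID INTEGER PRIMARY KEY, MessageText TEXT);
    CREATE TABLE MessageTopicRelations (
        EntryID INTEGER PRIMARY KEY, MessageID INTEGER, TopicID INTEGER);
    -- Hypothetical sample data: message 10 has topics 1, 2, 3 and 4;
    -- message 20 has only topics 1 and 2; message 30 has only topic 3.
    INSERT INTO Messages VALUES
        (10, 'all three plus one'), (20, 'two of three'), (30, 'one of three');
    INSERT INTO MessageTopicRelations (MessageID, TopicID) VALUES
        (10, 1), (10, 2), (10, 3), (10, 4), (20, 1), (20, 2), (30, 3);
""")

def ids(sql):
    """Run a query and return the sorted list of distinct message ids."""
    return sorted({row[0] for row in conn.execute(sql)})

# One join to the relation table per required topic: only full matches survive.
all_topics = ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations r1 ON r1.MessageID = m.MessageID AND r1.TopicID = 1
    JOIN MessageTopicRelations r2 ON r2.MessageID = m.MessageID AND r2.TopicID = 2
    JOIN MessageTopicRelations r3 ON r3.MessageID = m.MessageID AND r3.TopicID = 3
""")

# A single join with IN matches messages that have ANY of the topics.
any_topic = ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations r
      ON r.MessageID = m.MessageID AND r.TopicID IN (1, 2, 3)
""")

# A single join with ANDed equalities can never match a row.
impossible = ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations r ON r.MessageID = m.MessageID
     AND r.TopicID = 1 AND r.TopicID = 2 AND r.TopicID = 3
""")

print(all_topics, any_topic, impossible)
```

With this data, only message 10 is about all three topics, so it is the only id the per-topic joins should return.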

Here's a profoundly inelegant solution
SELECT
m.MessageID
,m.MessageText
FROM
Messages m
WHERE
m.MessageID IN (
SELECT
mt.MessageID
FROM
MessageTopicRelations mt
WHERE
TopicID IN (1,4,5) -- List of topic IDs
GROUP BY
mt.MessageID
HAVING
count(*) = 3 -- Number of topics
)

Edit: thanks to @Paul Creasey and @OMG Ponies for finding the flaws in my approach.
The correct way to do this is with a self-join for each topic; as shown in the leading answer.
Another profoundly inelegant entry:
select m.MessageText
, t.TopicName
from Messages m
inner join MessageTopicRelations mtr
on mtr.MessageID = m.MessageID
inner join Topics t
on t.TopicID = mtr.TopicID
and
t.TopicName = 'topic1'
UNION
select m.MessageText
, t.TopicName
from Messages m
inner join MessageTopicRelations mtr
on mtr.MessageID = m.MessageID
inner join Topics t
on t.TopicID = mtr.TopicID
and
t.TopicName = 'topic2'
...

Re: the answer by OMG Ponies, you don't need to join to the TOPICS table. And the HAVING COUNT(DISTINCT) clause works fine in MySQL 5.1. I just tested it.
This is what I mean:
Using GROUP BY/HAVING COUNT(DISTINCT):
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
WHERE mtr.topicid IN (x, y, z)
GROUP BY m.messageid
HAVING COUNT(DISTINCT mtr.topicid) = 3
The reason that I suggest COUNT(DISTINCT) is that if the columns (messageid,topicid) don't have a unique constraint, you could get duplicates, which would result in a count of 3 in the group, even with fewer than three distinct values.
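The duplicate problem is easy to reproduce. The following is a small sketch with Python's sqlite3 module; the rows are invented, and the relation table deliberately has no unique constraint on (MessageID, TopicID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MessageTopicRelations (
        EntryID INTEGER PRIMARY KEY, MessageID INTEGER, TopicID INTEGER);
    -- Hypothetical data, no unique constraint on (MessageID, TopicID):
    -- message 10 really has topics 1, 2 and 3;
    -- message 20 has topic 1 twice plus topic 2 (three rows, two topics).
    INSERT INTO MessageTopicRelations (MessageID, TopicID) VALUES
        (10, 1), (10, 2), (10, 3), (20, 1), (20, 1), (20, 2);
""")

QUERY = """
    SELECT MessageID FROM MessageTopicRelations
    WHERE TopicID IN (1, 2, 3)
    GROUP BY MessageID
    HAVING {count_expr} = 3
    ORDER BY MessageID
"""

def matching(count_expr):
    """Return the message ids whose group satisfies the HAVING condition."""
    return [row[0] for row in conn.execute(QUERY.format(count_expr=count_expr))]

with_star = matching("COUNT(*)")                    # counts rows, duplicates included
with_distinct = matching("COUNT(DISTINCT TopicID)") # counts different topics only
print(with_star, with_distinct)
```

Here COUNT(*) also accepts message 20, because its three rows cover only two distinct topics, while COUNT(DISTINCT TopicID) accepts message 10 alone.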

Related

SQL query takes too long to run in SSMS

I have the following query, which works absolutely fine in Oracle developer:
select TRIM(a.filterh)
from CHANNEL a,GENRE b
where b.label = 'M001CL01_ABC'
and a.s_r_id = b.r_id
and a.filterh in (select c.filterh
from CHANNEL c, GENRE d
where d.label = 'M001AL03' and c.s_r_id = d.r_id)
I tried to simplify the above query, with some syntax changes for SQL developer, and the query below takes a long time to run in SSMS:
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id
where b.label = 'M001CL01_ABC'
and a.filterh in (select c.filterh
from CHANNEL c
inner join GENRE d on c.s_r_id = d.r_id
where d.label = 'M001AL03')
I would like to understand what I am doing wrong here, how I can improve my query, and why it takes so long to run.
Thank you.
See if using EXISTS instead of IN improves the situation:
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id and b.label = 'M001CL01_ABC'
where exists(select *
from CHANNEL c
inner join GENRE d on c.s_r_id = d.r_id and d.label = 'M001AL03'
where a.filterh = c.filterh)
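Before comparing timings, it may be worth checking that the rewrite returns the same rows. The sketch below runs the IN and EXISTS versions against a tiny in-memory SQLite database; the rows are invented, and SQLite says nothing about how SQL Server will plan either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GENRE (r_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE CHANNEL (s_r_id INTEGER, filterh TEXT);
    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO GENRE VALUES (1, 'M001CL01_ABC'), (2, 'M001AL03'), (3, 'OTHER');
    INSERT INTO CHANNEL VALUES (1, ' A '), (1, 'B'), (2, ' A '), (2, 'C'), (3, 'B');
""")

IN_QUERY = """
    SELECT TRIM(a.filterh)
    FROM CHANNEL a
    INNER JOIN GENRE b ON a.s_r_id = b.r_id
    WHERE b.label = 'M001CL01_ABC'
      AND a.filterh IN (SELECT c.filterh
                        FROM CHANNEL c
                        INNER JOIN GENRE d ON c.s_r_id = d.r_id
                        WHERE d.label = 'M001AL03')
"""

EXISTS_QUERY = """
    SELECT TRIM(a.filterh)
    FROM CHANNEL a
    INNER JOIN GENRE b ON a.s_r_id = b.r_id AND b.label = 'M001CL01_ABC'
    WHERE EXISTS (SELECT *
                  FROM CHANNEL c
                  INNER JOIN GENRE d ON c.s_r_id = d.r_id AND d.label = 'M001AL03'
                  WHERE a.filterh = c.filterh)
"""

in_rows = sorted(row[0] for row in conn.execute(IN_QUERY))
exists_rows = sorted(row[0] for row in conn.execute(EXISTS_QUERY))
print(in_rows, exists_rows)
```

Only the filterh value shared by both genres survives, and both forms agree on it.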
It looks like you can probably do the following, though it is hard to say for sure without seeing the data and being able to test:
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id
where b.label = 'M001CL01_ABC'
and exists (select * from GENRE g where g.r_id=a.s_r_id and g.label='M001AL03')
Here is my version of the query. Please keep in mind that, since there is no data to test on, I couldn't test it properly, so it is up to you to check whether it compiles and returns what you expect.
What I've done here is simply remove the subquery, because subqueries often lead the query optimizer to a poor plan:
select TRIM(a.filterh)
from CHANNEL a
join GENRE b on a.s_r_id = b.r_id
join CHANNEL c on a.filterh = c.filterh
join GENRE d on c.s_r_id = d.r_id
where b.label = 'M001CL01_ABC'
and d.label = 'M001AL03'
Another point to mention is that performance issues may depend on many things, for example the number of rows, indexes, the storage device, etc. If you post a similar question in the future, it is better to provide the execution plans and the statistics collected while executing the query.
If the query runs slowly, try (ON A TEST INSTANCE) creating an index on channel.filterh and another one on genre.label.
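As a rough illustration of the index suggestion, the sketch below creates the two indexes in an in-memory SQLite database and asks SQLite for its query plan. The index names and sample rows are hypothetical, and SQLite's planner is only a stand-in here; on SQL Server you would compare the actual execution plans in SSMS instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GENRE (r_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE CHANNEL (s_r_id INTEGER, filterh TEXT);
    INSERT INTO GENRE VALUES (1, 'M001CL01_ABC'), (2, 'M001AL03');
    INSERT INTO CHANNEL VALUES (1, 'A'), (2, 'A'), (2, 'B');
    -- Index names are hypothetical.
    CREATE INDEX idx_channel_filterh ON CHANNEL (filterh);
    CREATE INDEX idx_genre_label ON GENRE (label);
""")

def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality lookups on the indexed columns.
label_plan = plan("SELECT r_id FROM GENRE WHERE label = 'M001AL03'")
filter_plan = plan("SELECT s_r_id FROM CHANNEL WHERE filterh = 'A'")
for step in label_plan + filter_plan:
    print(step)
```

The printed plan steps name the index that each lookup uses; the exact wording differs between SQLite versions.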
I think you can use aggregation in either database:
select TRIM(c.filterh)
from CHANNEL c join
GENRE g
on c.s_r_id = g.r_id
where g.label in ('M001CL01_ABC', 'M001AL03')
group by TRIM(c.filterh)
having count(distinct g.label) = 2;
You should learn to use meaningful table aliases. Arbitrary letters such as a and b do not help anyone understand the query.

How can this be translated into postgresql?

I am fairly new to SQL in general, and I am trying to work with a database which is extremely large. Among the examples on the website, there is this query:
SELECT m.chembl_id AS compound_chembl_id,
s.canonical_smiles,
r.compound_key,
NVL(TO_CHAR(d.pubmed_id),d.doi) AS pubmed_id_or_doi,
a.description AS assay_description, act.standard_type,
act.standard_relation,
act.standard_value,
act.standard_units,
act.activity_comment
FROM compound_structures s,
molecule_dictionary m,
compound_records r,
docs d,
activities act,
assays a,
target_dictionary t
WHERE s.molregno (+) = m.molregno
AND m.molregno = r.molregno
AND r.record_id = act.record_id
AND r.doc_id = d.doc_id
AND act.assay_id = a.assay_id
AND a.tid = t.tid
AND t.chembl_id = 'CHEMBL1827';
Because of the (+) and the NVL, I assumed it is Oracle. I am working in pgAdmin 4, and all my attempts at translating this, after doing some research, resulted in errors. An example of my attempt and the resulting error are given below:
SELECT m.chembl_id AS compound_chembl_id,
s.canonical_smiles,
r.compound_key,
COALESCE(CAST(d.pubmed_id AS varchar),d.doi) AS pubmed_id_or_doi,
a.description AS assay_description,
act.standard_type,
act.standard_relation,
act.standard_value,
act.standard_units,
act.activity_comment
FROM compound_structures s,
compound_records r,
docs d,
activities act,
assays a,
target_dictionary t
LEFT OUTER JOIN molecule_dictionary m ON s.molregno = m.molregno
WHERE m.molregno = r.molregno
AND r.record_id = act.record_id
AND r.doc_id = d.doc_id
AND act.assay_id = a.assay_id
AND a.tid = t.tid
AND t.chembl_id = 'CHEMBL1827';
ERROR: invalid reference to FROM-clause entry for table "s"
LINE 16: LEFT OUTER JOIN molecule_dictionary m ON s.molregno = m.molr...
^
HINT: There is an entry for table "s", but it cannot be referenced from this part of the
query.
SQL state: 42P01
Character: 492
Could someone help me?
Don't mix the old, ancient implicit joins with the explicit JOIN operator:
SELECT m.chembl_id AS compound_chembl_id,
s.canonical_smiles,
r.compound_key,
COALESCE(CAST(d.pubmed_id AS varchar),d.doi) AS pubmed_id_or_doi,
a.description AS assay_description,
act.standard_type,
act.standard_relation,
act.standard_value,
act.standard_units,
act.activity_comment
FROM molecule_dictionary m
LEFT JOIN compound_structures s ON s.molregno = m.molregno
JOIN compound_records r ON m.molregno = r.molregno
JOIN docs d ON r.doc_id = d.doc_id
JOIN activities act ON r.record_id = act.record_id
JOIN assays a ON act.assay_id = a.assay_id
JOIN target_dictionary t ON a.tid = t.tid
WHERE t.chembl_id = 'CHEMBL1827';
I haven't used that Oracle syntax for decades (note that even Oracle recommends to stop using it), but the (+) marks the optional side of the join, so compound_structures is the outer-joined table and every row of molecule_dictionary is preserved.
Note that the direction matters here: written the other way around (compound_structures LEFT JOIN molecule_dictionary), the following inner join on m.molregno would effectively turn the outer join back into an inner join. With molecule_dictionary as the preserved table, none of the later joins refers to compound_structures, so molecules without a structure are still returned, with canonical_smiles being NULL.
Consider:
SELECT
m.chembl_id AS compound_chembl_id,
s.canonical_smiles,
r.compound_key,
COALESCE((d.pubmed_id)::text, d.doi) AS pubmed_id_or_doi,
a.description AS assay_description,
act.standard_type,
act.standard_relation,
act.standard_value,
act.standard_units,
act.activity_comment
FROM
compound_structures s
INNER JOIN molecule_dictionary m ON s.molregno = m.molregno
INNER JOIN compound_records r ON m.molregno = r.molregno
INNER JOIN docs d ON r.doc_id = d.doc_id
INNER JOIN activities act ON r.record_id = act.record_id
INNER JOIN assays a ON act.assay_id = a.assay_id
INNER JOIN target_dictionary t ON a.tid = t.tid AND t.chembl_id = 'CHEMBL1827'
Rationale:
use standard, explicit joins everywhere; the explicit JOIN syntax has been part of the SQL standard since SQL-92
COALESCE() is standard SQL, whereas NVL() is Oracle-specific
a LEFT JOIN followed by INNER JOINs on columns coming from its right-hand (nullable) table is actually equivalent to an INNER JOIN
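The last point depends on which side of the LEFT JOIN the later joins refer to. Here is a small sketch with Python's sqlite3 module, using three of the tables cut down to a couple of columns and a few invented rows. It compares a LEFT JOIN whose nullable side is used by the next join, the all-INNER version, and a LEFT JOIN whose preserved side is used by the next join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY);
    CREATE TABLE compound_structures (molregno INTEGER, canonical_smiles TEXT);
    CREATE TABLE compound_records (molregno INTEGER, compound_key TEXT);
    -- Hypothetical rows: molecule 2 has a record but no structure,
    -- and structure 3 has no molecule.
    INSERT INTO molecule_dictionary VALUES (1), (2);
    INSERT INTO compound_structures VALUES (1, 'CCO'), (3, 'N');
    INSERT INTO compound_records VALUES (1, 'k1'), (2, 'k2');
""")

SELECT = "SELECT r.compound_key, s.canonical_smiles "

def rows(from_clause):
    """Run the shared SELECT list against the given FROM clause."""
    sql = SELECT + from_clause + " ORDER BY r.compound_key"
    return conn.execute(sql).fetchall()

# The later join uses m, the nullable side of the LEFT JOIN ...
left_then_inner = rows("""
    FROM compound_structures s
    LEFT JOIN molecule_dictionary m ON s.molregno = m.molregno
    JOIN compound_records r ON m.molregno = r.molregno""")

# ... so it returns exactly what plain INNER JOINs return.
all_inner = rows("""
    FROM compound_structures s
    INNER JOIN molecule_dictionary m ON s.molregno = m.molregno
    INNER JOIN compound_records r ON m.molregno = r.molregno""")

# With the tables swapped, the later join uses the preserved side,
# and the molecule without a structure is kept.
preserved = rows("""
    FROM molecule_dictionary m
    LEFT JOIN compound_structures s ON s.molregno = m.molregno
    JOIN compound_records r ON m.molregno = r.molregno""")

print(left_then_inner, all_inner, preserved)
```

Only the third form keeps the record of molecule 2, with a NULL structure.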

Additional select if one select result is not null

I am interested in the results found in table g, which shares a key, sample_name, with tables s and l. In this question the tables are
s - samples,
p - projects,
l - analyses, and
g - analysis g,
all within schema a.
In the interest of optimization, I only want to look for table g after having confirmed that l.analysis_g is NOT NULL.
Given: The only information that I start out with is the project names. The projects table, p, is linked with the other tables through the samples table, s, which is linked to every table. Table l contains the types of analysis, and each of its columns is either NULL or 1.
In the example below I am trying a case but I realize this may be totally incorrect.
SELECT s.sample_name,
s.project_name,
g.*
FROM a.samples s
JOIN a.analyses l
ON s.sample_name = l.sample_name
JOIN a.analysis_g g
ON s.sample_name = g.sample_name
WHERE s.project_name IN (SELECT p.project_name
FROM a.projects p
WHERE p.project_name_other
IN ('PROJ_1',
'PROJ_2'))
;
Then perhaps in the where clause? It's still really hard to understand what you want . . .
SELECT s.sample_name,
s.project_name,
g.*
FROM a.samples s
JOIN a.analyses l
ON s.sample_name = l.sample_name
JOIN a.analysis_g g
ON s.sample_name = g.sample_name
WHERE s.project_name IN (SELECT p.project_name
FROM a.projects p
WHERE p.project_name_other
IN ('PROJ_1',
'PROJ_2'))
and l.analysis_g IS NOT NULL
;
As a side note, I think you could join on p.project_name and avoid the IN subquery in the where clause. AND I think you might want some inner joins -- but I'm not sure.
SELECT s.sample_name,
s.project_name,
g.*
FROM a.samples s
JOIN a.analyses l ON s.sample_name = l.sample_name
JOIN a.analysis_g g ON s.sample_name = g.sample_name
JOIN a.projects p ON s.project_name = p.project_name
WHERE p.project_name_other IN ('PROJ_1', 'PROJ_2')
and l.analysis_g IS NOT NULL
Again: Please show an example! We can't help if we have to guess, but I'll give it a try...
If l.analysis_g contains an ID from table g, then you can just use:
SELECT * FROM g
JOIN l on g.id = l.analysis_g
WHERE blah, blah, blah...
I removed your WHERE clause because you haven't provided enough information to allow anyone to help optimize it (if needed).

speed up a not-exists query

I have an oracle query that works like this:
SELECT *
FROM VW_REQUIRED r
JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
Required is a view detailing all required training materials; Actual is a view detailing the most recent taking of the given courses. Each of these queries by itself takes under 2 seconds to generate between 10k and 100k rows.
What I want to do is something like:
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)
and it takes more than 20 seconds (I didn't let it finish, because that is obviously too long).
So then I decided to do something different, I made the original JOIN a left join so it will show me all required training, and the actual training only if it exists.
That worked, and was super fast still.
But I want a list of just the courses where there is no actual training attached (i.e. the people we need to kick into gear and get their training...)
When I try something like
SELECT *
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
WHERE r.person_id = null
I get no rows back. I'm not sure I can filter out the rows where I have no actual results. Normally I'd use WHERE NOT EXISTS but the performance on it was super slow (and I don't think I can put an index on a view...)
I've managed to make some changes that work, but they seem hacky and I'm sure there's a better solution.
SELECT
who, where_from, mand, target_id, grace_period, date_taken
FROM (
SELECT
r.person_id who,
r.where_from where_from,
r.mand mand,
r.target_id target_id,
r.grace_period grace_period,
nvl(a.date_taken, to_date('1980/01/01','yyyy/mm/dd')) date_taken
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
)
WHERE date_taken = to_date('1980/01/01','yyyy/mm/dd')
I think you only mixed up the table aliases. Can you change the last WHERE to
a.person_id is null
?
So your query should look like this:
SELECT *
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
WHERE a.person_id is null
Or maybe the "old" way?
SELECT *
FROM
VW_REQUIRED r,
VW_ACTUAL a
WHERE r.person_id = a.person_id(+)
AND r.target_id = a.target_id(+)
AND a.person_id is null
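The difference between "= null" and "is null" can be checked on a toy dataset. In the sketch below, plain tables stand in for the two views and the rows are invented; it is run with Python's sqlite3 module rather than Oracle, and it says nothing about the performance of either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain tables stand in for the views; the rows are invented.
    CREATE TABLE VW_REQUIRED (person_id INTEGER, target_id INTEGER);
    CREATE TABLE VW_ACTUAL (person_id INTEGER, target_id INTEGER, date_taken TEXT);
    INSERT INTO VW_REQUIRED VALUES (1, 100), (1, 200), (2, 100);
    INSERT INTO VW_ACTUAL VALUES (1, 100, '2020-01-01');
""")

JOINED = """
    SELECT r.person_id, r.target_id
    FROM VW_REQUIRED r
    LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
"""

def rows(sql):
    """Run a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())

# "= NULL" evaluates to NULL (unknown) for every row, so nothing passes the filter.
equals_null = rows(JOINED + "WHERE a.person_id = NULL")

# "IS NULL" keeps the required rows that found no matching actual row.
is_null = rows(JOINED + "WHERE a.person_id IS NULL")

# NOT EXISTS expresses the same anti-join.
not_exists = rows("""
    SELECT r.person_id, r.target_id
    FROM VW_REQUIRED r
    WHERE NOT EXISTS (SELECT 1 FROM VW_ACTUAL a
                      WHERE a.person_id = r.person_id AND a.target_id = r.target_id)
""")

print(equals_null, is_null, not_exists)
```

The IS NULL filter and NOT EXISTS return the same two required rows that have no training attached, while the "= NULL" comparison returns nothing.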
Maybe your problem comes from a bad plan for the NOT EXISTS query.
Could you please show us the plan for this query?
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)
And try to change the join algorithm (/*+ use_nl(a) */ or /*+ use_hash(a) */).
I think it's currently using a nested loop. Maybe you have to ask for a hash join, like this:
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT /*+use_hash(a)*/ 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)

How to correctly use LEFT JOIN IS NULL or NOT EXISTS in this query?

After a few dozen tries I still got wrong results, so I thought I'd better ask for help.
Tables:
labels
id, user_id, name
messages_labels
id, message_id, label_id
messages_labels.label_id refers to labels.id
How do I get the unused labels, given a message_id and a user_id? By unused labels I mean labels that do not have an entry in messages_labels for the given message_id; basically, select only the labels that can still be added to the message because they are not in use for it yet.
This means something like...
SELECT l.id, l.name
FROM labels l
INNER/LEFT JOIN messages_labels ml ON (l.id=ml.label_id)
WHERE... user_id=:user_id ...
... and `message_id <> :message_id`
??
This should work: LEFT JOIN on the label_id and the message_id; any label without an ML record is what you want:
SELECT
l.id, l.name
FROM labels l
LEFT JOIN messages_labels ml
ON l.id = ml.label_id
AND ml.message_id = :message_id
WHERE l.user_id = :user_id
AND ml.id IS NULL
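It matters that the message_id filter sits in the ON clause and not in the WHERE clause. The sketch below shows both placements with Python's sqlite3 module; the labels, users and messages are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE labels (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
    CREATE TABLE messages_labels (
        id INTEGER PRIMARY KEY, message_id INTEGER, label_id INTEGER);
    -- Hypothetical data: user 7 owns labels 1-3; message 50 uses label 1,
    -- message 60 uses labels 1 and 2.
    INSERT INTO labels VALUES
        (1, 7, 'work'), (2, 7, 'home'), (3, 7, 'todo'), (4, 8, 'other');
    INSERT INTO messages_labels (message_id, label_id) VALUES
        (50, 1), (60, 1), (60, 2);
""")
params = {"user_id": 7, "message_id": 50}

def names(sql):
    """Run a query with the shared parameters and return sorted label names."""
    return sorted(row[0] for row in conn.execute(sql, params))

# Message filter inside ON: rows for other messages never join, so every
# label not used on message 50 comes back NULL-extended.
filter_in_on = names("""
    SELECT l.name FROM labels l
    LEFT JOIN messages_labels ml
           ON l.id = ml.label_id AND ml.message_id = :message_id
    WHERE l.user_id = :user_id AND ml.id IS NULL
""")

# Message filter inside WHERE: it contradicts "ml.id IS NULL", so no row survives.
filter_in_where = names("""
    SELECT l.name FROM labels l
    LEFT JOIN messages_labels ml ON l.id = ml.label_id
    WHERE l.user_id = :user_id AND ml.message_id = :message_id AND ml.id IS NULL
""")

print(filter_in_on, filter_in_where)
```

With the filter in ON, user 7 gets back 'home' and 'todo' for message 50, even though 'home' is in use on another message.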
One method:
SELECT labels.*, count(messages_labels.id) AS mlid
FROM labels
LEFT JOIN messages_labels ON labels.id = messages_labels.label_id
AND (message_id = :message_id)
WHERE (user_id = :user_id)
GROUP BY labels.id
HAVING (mlid = 0)
if I'm reading your question correctly.