many-to-many query - sql

I have the following database structure:
CREATE TABLE IF NOT EXISTS `analyze` (
`disease_id` int(11) NOT NULL,
`symptom_id` int(11) NOT NULL
) ;
CREATE TABLE IF NOT EXISTS `disease` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ;
CREATE TABLE IF NOT EXISTS `symptom` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(4) NOT NULL,
PRIMARY KEY (`id`)
) ;
EDIT:
Sorry, I meant: how do I identify the disease from the symptoms entered?
Example:
If I have the symptoms fever and cough, then I would have influenza.
If I have the symptoms sore throat and fever, then I would have a throat infection.
The inputs are $symptom1, $symptom2, $symptom3, and so on.
Thank you.

SELECT disease_id
FROM `analyze`
GROUP BY disease_id
HAVING COUNT(symptom_id) > 1
Edit: to reply to the edited question
SELECT disease_id, COUNT(DISTINCT symptom_id)
FROM `analyze`
WHERE symptom_id IN ($symptom1, $symptom2, $symptom3)
GROUP BY disease_id
ORDER BY COUNT(DISTINCT symptom_id) DESC
Of course you'll have to replace each $symptomX with its respective ID.
This query lists the diseases which match at least one symptom - the diseases which match the most symptoms are on top.
If you added a unique constraint on (disease_id, symptom_id) in `analyze`, you could drop the DISTINCT:
SELECT disease_id, COUNT(symptom_id)
FROM `analyze`
WHERE symptom_id IN ($symptom1, $symptom2, $symptom3)
GROUP BY disease_id
ORDER BY COUNT(symptom_id) DESC
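For anyone who wants to try the ranking query, here is a minimal sketch using Python's sqlite3 module instead of MySQL. The IDs in the sample data are made up for illustration (diseases 1 = influenza, 2 = throat infection; symptoms 1 = fever, 2 = cough, 3 = sore throat), and the helper name rank_diseases is hypothetical. It passes the symptom IDs as bound parameters rather than pasting $symptomX into the SQL string.

```python
import sqlite3


def rank_diseases(conn, symptom_ids):
    """Return (disease_id, matched_count) rows, best match first."""
    # One "?" placeholder per symptom id, so values are bound, not interpolated.
    placeholders = ",".join("?" for _ in symptom_ids)
    sql = f"""
        SELECT disease_id, COUNT(DISTINCT symptom_id) AS matched
        FROM "analyze"
        WHERE symptom_id IN ({placeholders})
        GROUP BY disease_id
        ORDER BY matched DESC, disease_id
    """
    return conn.execute(sql, list(symptom_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "analyze" (
        disease_id INTEGER NOT NULL,
        symptom_id INTEGER NOT NULL
    );
    -- hypothetical ids: influenza(1) = fever(1) + cough(2),
    --                   throat infection(2) = sore throat(3) + fever(1)
    INSERT INTO "analyze" VALUES (1, 1), (1, 2), (2, 3), (2, 1);
""")

# Patient reports fever (1) and cough (2).
ranking = rank_diseases(conn, [1, 2])
print(ranking)
```

If you only want diseases that match every entered symptom, one option (an addition of mine, not part of the answer above) is to add HAVING COUNT(DISTINCT symptom_id) = N, where N is the number of symptoms passed in.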

select d.id from disease d inner join `analyze` a
on d.id = a.disease_id
group by d.id having count(a.disease_id) > 1

select disease_id, count(*)
from `analyze`
where symptom_id in ($symptom1, $symptom2, $symptom3)
group by disease_id
order by 2 desc;
This will return the matching disease ids in descending order of the number of matching symptoms.

Related

Redshift create list and search different table with it

I think there are a few ways to tackle this, but I'm not sure how to do any of them.
I have two tables. The first has IDs and Numbers. The IDs and Numbers can potentially be listed more than once, so I create a result table that lists the unique Numbers grouped by ID.
My second table has rows (100 million) with the ID and Numbers again. I need to search that table for any ID that has a Number not in the list of Numbers from the result table.
Can Redshift run a query that checks whether the ID matches and the Number exists in the list from the result table? Can this all be done in memory, in one statement?
DROP TABLE IF EXISTS `myTable`;
CREATE TABLE `myTable` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1890),
("UHO21QQY3TW",4370),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1892),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
SELECT ID, listagg(distinct Numbers,',') as Number_List, count(Numbers) as Numbers_Count
FROM myTable
GROUP BY ID
AS result
DROP TABLE IF EXISTS `myTable2`;
CREATE TABLE `myTable2` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`ID` varchar(255),
`Numbers` mediumint default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;
INSERT INTO `myTable2` (`ID`,`Numbers`)
VALUES
("CRQ44MPX1SZ",1870),
("UHO21QQY3TW",4350),
("JTQ62CBP6ER",1825),
("RFD95MLC2MI",5014),
("URZ04HGG2YQ",2859),
("CRQ44MPX1SZ",1891),
("UHO21QQY3TW",4371),
("JTQ62CBP6ER",1826),
("RFD95MLC2MI",5015),
("URZ04HGG2YQ",2860),
("CRQ44MPX1SZ",1882),
("UHO21QQY3TW",4372),
("JTQ62CBP6ER",1827),
("RFD95MLC2MI",5016),
("URZ04HGG2YQ",2861);
Pseudo Code
Select ID, listagg(distinct Numbers) as Violation
Where Numbers NOT IN result.Numbers_List
or possibly: WHERE Numbers NOT LIKE '%' || result.Numbers_List || '%'
Desired Output
("CRQ44MPX1SZ", "1870,1882")
("UHO21QQY3TW", "4350")
EDIT
Going the JOIN route, I am not getting the right results...but I'm pretty sure my WHERE implementation is wrong.
SELECT mytable1.ID, listagg(distinct mytable2.Numbers, ',') as unauth_list, count(mytable2.Numbers) as unauth_count
FROM mytable1
LEFT JOIN mytable2 on mytable1.id = mytable2.id
WHERE (mytable1.id = mytable2.id)
AND (mytable1.Numbers <> mytable2.Numbers)
GROUP BY mytable1.id
Expected output:
("CRQ44MPX1SZ", "1870,1882", 2)
("UHO21QQY3TW", "4350", 1)
Just left join the two tables on both ID and Numbers, then use the WHERE clause to keep only the rows where no match was found. There shouldn't be any need for listagg() or complex comparisons. Or did I miss part of the question?
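A minimal sketch of that left-join idea, run against SQLite through Python rather than Redshift so it is easy to try. The table names allowed and observed are stand-ins I made up for myTable and myTable2, and only part of the sample data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allowed  (ID TEXT, Numbers INTEGER);  -- stands in for myTable
    CREATE TABLE observed (ID TEXT, Numbers INTEGER);  -- stands in for myTable2

    INSERT INTO allowed VALUES
        ('CRQ44MPX1SZ', 1890), ('CRQ44MPX1SZ', 1891), ('CRQ44MPX1SZ', 1892),
        ('UHO21QQY3TW', 4370), ('UHO21QQY3TW', 4371), ('UHO21QQY3TW', 4372),
        ('JTQ62CBP6ER', 1825), ('JTQ62CBP6ER', 1826), ('JTQ62CBP6ER', 1827);

    INSERT INTO observed VALUES
        ('CRQ44MPX1SZ', 1870), ('CRQ44MPX1SZ', 1891), ('CRQ44MPX1SZ', 1882),
        ('UHO21QQY3TW', 4350), ('UHO21QQY3TW', 4371), ('UHO21QQY3TW', 4372),
        ('JTQ62CBP6ER', 1825), ('JTQ62CBP6ER', 1826), ('JTQ62CBP6ER', 1827);
""")

# Anti-join: keep observed rows that have no (ID, Numbers) partner in allowed.
rows = conn.execute("""
    SELECT o.ID, o.Numbers
    FROM observed AS o
    LEFT JOIN allowed AS a
           ON a.ID = o.ID AND a.Numbers = o.Numbers
    WHERE a.ID IS NULL
    ORDER BY o.ID, o.Numbers
""").fetchall()

# Group the violating numbers per ID on the client side.
violations = {}
for row_id, number in rows:
    violations.setdefault(row_id, []).append(number)
print(violations)
```

On Redshift the grouping could presumably be done in the same statement by selecting o.ID, LISTAGG(DISTINCT o.Numbers, ',') and COUNT(*) with GROUP BY o.ID; I have only run the SQLite version above.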

How to sort a table by the count of its column?

I have this table:
CREATE TABLE Publications (
publicationId INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (publicationId),
title VARCHAR(60) NOT NULL UNIQUE,
professorId INT NOT NULL,
autors INT NOT NULL,
magazine VARCHAR(60) NOT NULL,
post_date DATE NOT NULL,
FOREIGN KEY (professorId) REFERENCES Professors (professorId),
CONSTRAINT invalidPublication UNIQUE (professorId, magazine, post_date),
CONSTRAINT invalidAutors CHECK (autors >= 1 AND autors <= 10)
);
And I want to create a view that returns the professors sorted by the amount of publications they have done, so I have created this view:
CREATE OR REPLACE VIEW ViewTopAutors AS
SELECT professorId
FROM publications
WHERE autors < 5
ORDER by COUNT(professorId)
LIMIT 3;
I've populated the main table, but when I run the view it only returns one author (the one with the highest ID).
How can I do this?
I think an aggregation is missing from your query:
CREATE OR REPLACE VIEW ViewTopAutors AS
SELECT professorId
FROM publications
WHERE autors < 5
GROUP BY professorId
ORDER BY COUNT(*)
LIMIT 3;
This would return the 3 professors with the fewest number of publications. To return professors with the 3 greatest, use a DESC sort in the ORDER BY step.
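Here is a small sketch of the DESC variant, run in SQLite via Python. The Publications table is cut down to the columns the view needs and the rows are invented for the demonstration. I also select the count and add professorId as a tie-breaker so the output is deterministic; neither is required by the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical version of the Publications table.
    CREATE TABLE Publications (
        publicationId INTEGER PRIMARY KEY,
        professorId   INTEGER NOT NULL,
        autors        INTEGER NOT NULL
    );
    INSERT INTO Publications (professorId, autors) VALUES
        (1, 2), (1, 3), (1, 1),          -- professor 1: 3 qualifying rows
        (2, 2), (2, 4),                  -- professor 2: 2 qualifying rows
        (3, 1),                          -- professor 3: 1 qualifying row
        (4, 2), (4, 2), (4, 3), (4, 9);  -- professor 4: 3 qualifying (9 is filtered out)

    CREATE VIEW ViewTopAutors AS
    SELECT professorId, COUNT(*) AS publications
    FROM Publications
    WHERE autors < 5
    GROUP BY professorId
    ORDER BY publications DESC, professorId
    LIMIT 3;
""")

# Order again in the outer query; a view's ORDER BY is not a guarantee
# about the order of rows returned by a query that reads from it.
top = conn.execute("""
    SELECT professorId, publications
    FROM ViewTopAutors
    ORDER BY publications DESC, professorId
""").fetchall()
print(top)
```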

How to make a mutually exclusive select query in SQL?

I'm new to SQL, and I need to write a query for a table that looks like this:
CREATE TABLE TESTS (
PATH_ID int PRIMARY KEY,
Day DATE NOT NULL,
Direction varchar(255) NOT NULL,
D_ID int NOT NULL,
FOREIGN KEY (D_ID) REFERENCES Drivers(D_ID)
);
INSERT INTO TESTS(PATH_ID,Day,Direction,D_ID)
VALUES (1,'2021-02-01' ,'Right',001),
(2,'2021-02-01' ,'Left',002),
(3,'2021-02-02','Right',002);
What I need is a query that shows the drivers (D_ID) who have ONLY ever gone Right (Direction), listing the D_ID, the Day, and every time that driver went right.
One method is not exists:
select t.*
from tests t
where not exists (select 1
from tests t2
where t2.d_id = t.d_id and t2.direction <> 'Right'
);
You can also use NOT IN:
select a.* from Tests a where D_ID not in (
select D_ID from Tests where direction <>'Right'
)
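A quick way to check both variants against the sample rows is the sketch below, which uses SQLite through Python. The foreign key to Drivers is left out because that table isn't shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests (
        path_id   INTEGER PRIMARY KEY,
        day       TEXT NOT NULL,
        direction TEXT NOT NULL,
        d_id      INTEGER NOT NULL
    );
    INSERT INTO tests VALUES
        (1, '2021-02-01', 'Right', 1),
        (2, '2021-02-01', 'Left',  2),
        (3, '2021-02-02', 'Right', 2);
""")

# Variant 1: no row exists for this driver with another direction.
not_exists_rows = conn.execute("""
    SELECT t.*
    FROM tests AS t
    WHERE NOT EXISTS (SELECT 1
                      FROM tests AS t2
                      WHERE t2.d_id = t.d_id
                        AND t2.direction <> 'Right')
    ORDER BY t.path_id
""").fetchall()

# Variant 2: the driver is not in the set of drivers who ever went elsewhere.
not_in_rows = conn.execute("""
    SELECT a.*
    FROM tests AS a
    WHERE a.d_id NOT IN (SELECT d_id FROM tests WHERE direction <> 'Right')
    ORDER BY a.path_id
""").fetchall()

print(not_exists_rows)
print(not_in_rows)
```

Driver 2 went Left once, so only driver 1's row comes back from either query. The two forms agree here because D_ID is declared NOT NULL; NOT IN behaves differently when the subquery can return NULLs.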

How to show which students are still in school using sql

This table records students entering and leaving the school. IN means a student entered the school and OUT means a student left. I am wondering how to show which students are still in school.
I have tried a lot but still cannot figure it out. Can anyone help me? Thank you so much.
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`id` int(11) NOT NULL auto_increment,
`time` varchar(128) default NULL,
`status` varchar(128) default NULL,
`stu_id` varchar(128) default NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `student` (`id`, `time`, `status`, `stu_id`) VALUES
(1,'11AM','IN','1'),
(2,'11AM','IN','2'),
(3,'12AM','OUT','1'),
(4,'12AM','IN','3'),
(5,'1PM','OUT','3'),
(6,'2PM','IN','3'),
(11,'2PM','IN','4');
I expect the answer to be 2, 3, 4.
The number of students in the school is the sum of the ins minus the sum of the outs:
select sum(case when status = 'in' then 1
when status = 'out' then -1
else 0
end)
from student;
Basically to see the students who are in the school, you want the students whose last status is in. One way uses a correlated subquery:
select s.stu_id
from student s
where s.time = (select max(s2.time)
from student s2
where s2.stu_id = s.stu_id
) and
s.status = 'in';
If status is only ever IN or OUT, can't you do this?
SELECT * from student WHERE status="IN"
Here's a query that relies on the auto-increment id:
select t2.* from
student t2
left join (select ROW_NUMBER() OVER(PARTITION by stu_id ORDER BY id desc) as row_num, id from student) t1 on t1.id = t2.id
where t1.row_num = 1 and t2.status = 'IN'
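To try the "last status is IN" idea without window functions, here is a hedged sketch in SQLite via Python. It assumes, as the query above does, that the auto-increment id grows in chronological order. The time column is a varchar such as '11AM', so comparing it with MAX() is a string comparison, and the id is the safer thing to order by. SQLite compares strings case-sensitively by default, so the sketch uses upper-case 'IN' and 'OUT'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        id     INTEGER PRIMARY KEY,
        time   TEXT,
        status TEXT,
        stu_id TEXT
    );
    INSERT INTO student (id, time, status, stu_id) VALUES
        (1,  '11AM', 'IN',  '1'),
        (2,  '11AM', 'IN',  '2'),
        (3,  '12AM', 'OUT', '1'),
        (4,  '12AM', 'IN',  '3'),
        (5,  '1PM',  'OUT', '3'),
        (6,  '2PM',  'IN',  '3'),
        (11, '2PM',  'IN',  '4');
""")

# Students whose most recent record (highest id) is an IN.
still_in = [row[0] for row in conn.execute("""
    SELECT s.stu_id
    FROM student AS s
    WHERE s.id = (SELECT MAX(s2.id)
                  FROM student AS s2
                  WHERE s2.stu_id = s.stu_id)
      AND s.status = 'IN'
    ORDER BY s.stu_id
""")]

# Head count: every IN adds one, every OUT subtracts one.
head_count = conn.execute("""
    SELECT SUM(CASE WHEN status = 'IN'  THEN 1
                    WHEN status = 'OUT' THEN -1
                    ELSE 0 END)
    FROM student
""").fetchone()[0]

print(still_in, head_count)
```

Student 1 is excluded because their latest record is an OUT, which a plain WHERE status = 'IN' filter would miss.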

What's wrong with this query?

I'm trying to write a query that selects from four tables
campaignSentParent csp
campaignSentEmail cse
campaignSentFax csf
campaignSentSms css
Each of the cse, csf, and css tables is linked to the csp table by csp.id = (cse/csf/css).parentId.
The csp table has a column called campaignId.
What I want to do is end up with rows that look like:
| id | dateSent | emailsSent | faxsSent | smssSent |
| 1 | 2011-02-04 | 139 | 129 | 140 |
But instead I end up with a row that looks like:
| 1 | 2011-02-03 | 2510340 | 2510340 | 2510340 |
Here is the query I am trying
SELECT csp.id id, csp.dateSent dateSent,
COUNT(cse.parentId) emailsSent,
COUNT(csf.parentId) faxsSent,
COUNT(css.parentId) smsSent
FROM campaignSentParent csp,
campaignSentEmail cse,
campaignSentFax csf,
campaignSentSms css
WHERE csp.campaignId = 1
AND csf.parentId = csp.id
AND cse.parentId = csp.id
AND css.parentId = csp.id;
Adding GROUP BY did not help, so I am posting the create statements.
csp
CREATE TABLE `campaignsentparent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`campaignId` int(11) NOT NULL,
`dateSent` datetime NOT NULL,
`account` int(11) NOT NULL,
`status` varchar(15) NOT NULL DEFAULT 'Creating',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
cse/csf (same structure, different names)
CREATE TABLE `campaignsentemail` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parentId` int(11) NOT NULL,
`contactId` int(11) NOT NULL,
`content` text,
`subject` text,
`status` varchar(15) DEFAULT 'Pending',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=140 DEFAULT CHARSET=latin1
css
CREATE TABLE `campaignsentsms` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parentId` int(11) NOT NULL,
`contactId` int(11) NOT NULL,
`content` text,
`status` varchar(15) DEFAULT 'Pending',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=141 DEFAULT CHARSET=latin1
You need to aggregate the counts separately, one sub-query per child table, not in a single join as shown in the question.
SELECT csp.id, csp.dateSent dateSent,
e.email_count, f.fax_count, s.sms_count
FROM campaignSentParent AS csp
JOIN (SELECT cse.ParentId, COUNT(*) AS email_count
FROM campaignSentEmail cse
GROUP BY cse.ParentID) AS e ON e.parentID = csp.id
JOIN (SELECT csf.ParentId, COUNT(*) AS fax_count
FROM campaignSentFax csf
GROUP BY csf.ParentID) AS f ON f.ParentID = csp.id
JOIN (SELECT css.ParentID, COUNT(*) AS sms_count
FROM campaignSentSms css
GROUP BY css.ParentId) AS s ON s.ParentID = csp.id
WHERE csp.campaignId = 1
To do this, you pretty much have to use the JOIN notation as shown.
Depending on the quality of your optimizer, the cardinalities of the various tables, and the available indexes, you might find it effective to include a join with CampaignSentParent in each of the sub-queries with the csp.CampaignID = 1 condition, so as to limit the data aggregated by the sub-queries.
You might notice that the result count you get is 2510340. The prime factorization of 2510340 is 2 × 2 × 3 × 5 × 7 × 43 × 139, and your expected answer is 139, 129, and 140. You can get 3 × 43 = 129; 2 × 2 × 5 × 7 = 140; and 139 = 139. In other words, the original query is generating the Cartesian product of all the rows in the three dependent tables and counting the product, rather than counting the relevant rows from each dependent table separately.
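The multiplication effect is easy to reproduce on a tiny data set. The sketch below uses SQLite via Python with shortened, made-up table names (parent, email, fax, sms) and 2, 3, and 4 child rows for a single parent, so the naive join should count 2 × 3 × 4 = 24 in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced versions of the csp/cse/csf/css tables.
    CREATE TABLE parent (id INTEGER PRIMARY KEY, campaignId INTEGER NOT NULL);
    CREATE TABLE email  (id INTEGER PRIMARY KEY, parentId INTEGER NOT NULL);
    CREATE TABLE fax    (id INTEGER PRIMARY KEY, parentId INTEGER NOT NULL);
    CREATE TABLE sms    (id INTEGER PRIMARY KEY, parentId INTEGER NOT NULL);

    INSERT INTO parent VALUES (1, 1);
    INSERT INTO email (parentId) VALUES (1), (1);            -- 2 emails
    INSERT INTO fax   (parentId) VALUES (1), (1), (1);       -- 3 faxes
    INSERT INTO sms   (parentId) VALUES (1), (1), (1), (1);  -- 4 sms
""")

# Joining all three child tables at once pairs every email with every
# fax and every sms, so each COUNT sees the full cross product.
naive = conn.execute("""
    SELECT COUNT(e.parentId), COUNT(f.parentId), COUNT(s.parentId)
    FROM parent p, email e, fax f, sms s
    WHERE p.campaignId = 1
      AND e.parentId = p.id
      AND f.parentId = p.id
      AND s.parentId = p.id
""").fetchone()

# Counting each child table in its own sub-query avoids the blow-up.
separate = conn.execute("""
    SELECT p.id, e.email_count, f.fax_count, s.sms_count
    FROM parent AS p
    JOIN (SELECT parentId, COUNT(*) AS email_count
          FROM email GROUP BY parentId) AS e ON e.parentId = p.id
    JOIN (SELECT parentId, COUNT(*) AS fax_count
          FROM fax GROUP BY parentId) AS f ON f.parentId = p.id
    JOIN (SELECT parentId, COUNT(*) AS sms_count
          FROM sms GROUP BY parentId) AS s ON s.parentId = p.id
    WHERE p.campaignId = 1
""").fetchone()

print(naive, separate)
```

Note that the inner joins drop any parent with no rows in one of the child tables; switching to LEFT JOIN with COALESCE(count, 0) would keep such parents, if that matters for your report.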
You're missing a GROUP BY clause at the end. I can't tell from your example which columns you want to group by, so I can't give you the exact code.
Add GROUP BY dateSent to the end of your query.
Try adding a group by clause.
SELECT csp.id id, csp.dateSent dateSent,
COUNT(cse.parentId) emailsSent,
COUNT(csf.parentId) faxsSent,
COUNT(css.parentId) smsSent
FROM campaignSentParent csp,
campaignSentEmail cse,
campaignSentFax csf,
campaignSentSms css
WHERE csp.campaignId = 1
AND csf.parentId = csp.id
AND cse.parentId = csp.id
AND css.parentId = csp.id
GROUP BY csp.id, csp.dateSent
When you select plain columns alongside aggregate functions, you normally need a GROUP BY clause that lists those columns.