Nested Sql select statement - sql

Can anyone tell me what is wrong with the following SQL query?
Select *,
(SELECT [DiseaseID], COUNT(*) AS [Rank] FROM [DiseaseSymptom] WHERE
([SymptomID] IN(1, 5)) GROUP BY [DiseaseID] ORDER BY [Rank] DESC)
FROM Disease WHERE GenderID in (1, 3)
I have two tables. One contains each disease and the gender it is associated with:
Disease
+-----------+-------------------+----------+
| DiseaseID | DiseaseName | GenderID |
+-----------+-------------------+----------+
| 1 | Fever | 3 |
| 2 | Flu | 3 |
| 3 | Lady Disease | 2 |
| 4 | Gentlemen Disease | 1 |
+-----------+-------------------+----------+
Gender 1 = Male, 2 = Female, 3 = Common
And a Symptom Disease Matrix like this
DiseaseSymptom
+-----------+-----------+----------+
| DiseaseID | SymptomID | DissymID |
+-----------+-----------+----------+
| 1 | 1 | 1 |
| 1 | 2 | 3 |
| 1 | 4 | 4 |
| 2 | 1 | 5 |
| 2 | 3 | 9 |
| 2 | 4 | 6 |
| 2 | 5 | 7 |
+-----------+-----------+----------+
I get symptoms from the user, match them against the DiseaseSymptom table, and rank the diseases by the number of symptoms matched (the inner SQL statement).
In the outer statement I simply want to take the result of the inner statement and check whether each disease belongs to a specific gender. The error I get when I try to run the above query is:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

Subqueries in the SELECT clause must return a single scalar value, not a result set with multiple columns or rows. If you want both columns, put the subquery in the FROM clause as a derived table, join it to the outer table on the key, and refer to the two values in the SELECT clause:
Select d.*, z.DiseaseID, z.[Rank]
FROM Disease d
join (SELECT DiseaseID, COUNT(*) [Rank]
FROM DiseaseSymptom
WHERE SymptomID IN(1, 5)
GROUP BY DiseaseID) Z
On z.DiseaseID = d.DiseaseID
WHERE d.GenderID in (1, 3)
Order By z.[Rank] DESC

You are using a subquery with group by. Your intention is to have a correlated subquery. The problem is that the subquery is returning more than one row. I think this is what you want:
Select d.*,
(SELECT COUNT(*)
FROM [DiseaseSymptom] ds
WHERE ds.[SymptomID] IN (1, 5) AND ds.DiseaseId = d.DiseaseId
) AS [Rank]
FROM Disease d
WHERE GenderID in (1, 3);
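To see how a correlated scalar subquery behaves on the sample data from the question, here is a minimal sketch. It runs the query through Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained). The bracketed identifiers are dropped, and the alias MatchCount is a name chosen here, not one from the question. Note that a disease with no matching symptoms still appears, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Disease (DiseaseID INTEGER PRIMARY KEY, DiseaseName TEXT, GenderID INTEGER);
CREATE TABLE DiseaseSymptom (DiseaseID INTEGER, SymptomID INTEGER, DissymID INTEGER);
INSERT INTO Disease VALUES
  (1, 'Fever', 3), (2, 'Flu', 3), (3, 'Lady Disease', 2), (4, 'Gentlemen Disease', 1);
INSERT INTO DiseaseSymptom VALUES
  (1, 1, 1), (1, 2, 3), (1, 4, 4), (2, 1, 5), (2, 3, 9), (2, 4, 6), (2, 5, 7);
""")

# The subquery is evaluated per outer row and returns exactly one value each time.
correlated = conn.execute("""
SELECT d.DiseaseID, d.DiseaseName,
       (SELECT COUNT(*)
        FROM DiseaseSymptom ds
        WHERE ds.SymptomID IN (1, 5)
          AND ds.DiseaseID = d.DiseaseID) AS MatchCount
FROM Disease d
WHERE d.GenderID IN (1, 3)
ORDER BY MatchCount DESC, d.DiseaseID
""").fetchall()

for row in correlated:
    print(row)
```

On this data the rows come back as Flu (2 matches), Fever (1 match), and Gentlemen Disease (0 matches).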

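You can also use a common table expression (CTE). GenderID lives in Disease, so the CTE has to join that table, and the ORDER BY has to move to the outer query:
with cte as (SELECT ds.[DiseaseID], d.GenderID, COUNT(*) AS [Rank]
FROM [DiseaseSymptom] ds JOIN Disease d ON d.DiseaseID = ds.DiseaseID
WHERE ds.[SymptomID] IN (1, 5) GROUP BY ds.[DiseaseID], d.GenderID)
select * FROM cte WHERE GenderID in (1, 3) ORDER BY [Rank] DESC
Hope this helps ;)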

There is really no need to have a nested query, just join and filter
SELECT d.DiseaseID, d.DiseaseName, d.GenderID
, Symptoms = Count(ds.SymptomID)
FROM Disease d
INNER JOIN DiseaseSymptom ds ON d.DiseaseID = ds.DiseaseID
WHERE ds.SymptomID IN (1, 5)
AND d.GenderID IN (1, 3)
GROUP BY d.DiseaseID, d.DiseaseName, d.GenderID
ORDER BY Count(SymptomID) Desc
SQLFiddle Demo
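The join-and-filter approach can be checked against the sample tables from the question with a short sketch. It uses Python's sqlite3 module in place of SQL Server (an assumption for the sake of a runnable example), so the T-SQL "alias = expression" form is written as "expression AS alias". Because this is an inner join, diseases with no matching symptom drop out of the result entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Disease (DiseaseID INTEGER PRIMARY KEY, DiseaseName TEXT, GenderID INTEGER);
CREATE TABLE DiseaseSymptom (DiseaseID INTEGER, SymptomID INTEGER, DissymID INTEGER);
INSERT INTO Disease VALUES
  (1, 'Fever', 3), (2, 'Flu', 3), (3, 'Lady Disease', 2), (4, 'Gentlemen Disease', 1);
INSERT INTO DiseaseSymptom VALUES
  (1, 1, 1), (1, 2, 3), (1, 4, 4), (2, 1, 5), (2, 3, 9), (2, 4, 6), (2, 5, 7);
""")

# Join first, filter both tables in WHERE, then count matches per disease.
ranked = conn.execute("""
SELECT d.DiseaseID, d.DiseaseName, COUNT(ds.SymptomID) AS Symptoms
FROM Disease d
INNER JOIN DiseaseSymptom ds ON ds.DiseaseID = d.DiseaseID
WHERE ds.SymptomID IN (1, 5)
  AND d.GenderID IN (1, 3)
GROUP BY d.DiseaseID, d.DiseaseName
ORDER BY Symptoms DESC
""").fetchall()

for row in ranked:
    print(row)
```

With symptoms 1 and 5, Flu matches both and Fever matches one, so Flu ranks first.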

Related

Unique query give me duplicate rows

I have a SQL Server database with one table from which I have to display unique values based on column_one without using DISTINCT, so I came up with this solution:
select p.id, p.one, two, w.five, p.eight
from table_one p with (nolock)
join table_two w with (nolock) on w.one = p.one
where
w.eight between convert(date, '10/05/2020', 103) and dateadd(d, 7, convert(date, '10/05/2020', 103)) and
p.twelve = 2
and p.id in (SELECT max(id) FROM table_one a with(nolock) GROUP BY two)
order by p.id desc
I should get two rows, but I get three: the second row is duplicated. Why is that? I googled for examples and found my own solution among them, so what is wrong with it? Any suggestions would be helpful.
PS. I can confirm that the subquery select max(id)... gives me unique values.
EDITED
Sorry for leaving out the example earlier.
I hope it is now clearer what I want to achieve.
table_one
id | one | two | eight| twelve
-------------------------------------
1 | value_1 | r1c2 | r1c8 | 2
2 | value_1 | r2c2 | r2c8 | 2
3 | value_2 | r3c2 | r3c8 | 2
4 | value_2 | r4c2 | r4c8 | 2
table_two
id | one | five | eight
---------------------------------
1 | value_1 | r1c5 | 22/03/2020
2 | value_1 | r2c5 | 24/03/2020
3 | value_2 | r3c5 | 24/03/2020
4 | value_2 | r4c5 | 25/04/2020
result expected:
id | one | two | eight
-----------------------------------
2 | value_1 | r2c2 | 24/03/2020
4 | value_2 | r4c2 | 25/04/2020
I think I figured it out, but please correct me if I am wrong: is it because I am joining the tables on column one, which is not unique?
It's difficult without sample data and expected output, but I think that the following approach using ROW_NUMBER() is a possible option. You need to use the correct columns in the PARTITION BY and ORDER BY clauses:
SELECT *
FROM (
select
p.id, p.one, p.two, w.five, p.eight,
ROW_NUMBER() OVER (PARTITION BY p.two ORDER BY p.id DESC) AS rn
from table_one p with (nolock)
join table_two w with (nolock) on w.one= p.one
where
w.eight between convert(date, '10/05/2020', 103) and dateadd(d, 7, convert(date, '10/05/2020', 103)) and
p.twelve = 2
) t
WHERE t.rn = 1
ORDER by t.id DESC
That's right: when you join two tables on a column that has duplicate values, you get duplicate rows in the result. For your task, you can use a window function like this:
SELECT *
FROM (
select
p.*,ROW_NUMBER() OVER (PARTITION BY w.one ORDER BY w.eight DESC) AS rn
from table_one p
join table_two w on w.one= p.one
) t
WHERE t.rn = 1
ORDER by t.id asc
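The effect of joining on a non-unique column, and the ROW_NUMBER() fix, can be reproduced with the sample rows from the question. The sketch below is run through Python's sqlite3 module rather than SQL Server, which is an assumption made for portability and needs SQLite 3.25 or newer for window functions. The dates are stored in ISO format so that text ordering matches date order, and the question's date filter is left out because none of the sample dates fall inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_one (id INTEGER, one TEXT, two TEXT, eight TEXT, twelve INTEGER);
CREATE TABLE table_two (id INTEGER, one TEXT, five TEXT, eight TEXT);
INSERT INTO table_one VALUES
  (1, 'value_1', 'r1c2', 'r1c8', 2), (2, 'value_1', 'r2c2', 'r2c8', 2),
  (3, 'value_2', 'r3c2', 'r3c8', 2), (4, 'value_2', 'r4c2', 'r4c8', 2);
INSERT INTO table_two VALUES
  (1, 'value_1', 'r1c5', '2020-03-22'), (2, 'value_1', 'r2c5', '2020-03-24'),
  (3, 'value_2', 'r3c5', '2020-03-24'), (4, 'value_2', 'r4c5', '2020-04-25');
""")

# Each table_one row matches two table_two rows, so the plain join multiplies rows.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM table_one p JOIN table_two w ON w.one = p.one"
).fetchone()[0]

# Number the joined rows within each value of "one" and keep only the first.
latest = conn.execute("""
SELECT id, one, two, eight
FROM (
    SELECT p.id, p.one, p.two, w.eight,
           ROW_NUMBER() OVER (PARTITION BY p.one
                              ORDER BY p.id DESC, w.eight DESC) AS rn
    FROM table_one p
    JOIN table_two w ON w.one = p.one
) t
WHERE rn = 1
ORDER BY id
""").fetchall()

print(joined_rows)
print(latest)
```

The plain join yields 8 rows from 4, while the numbered version keeps one row per value of one, matching the expected result in the question.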

Multiple select from CTE with different number of rows in a StoredProcedure

How can I run two SELECTs with joins against the same CTEs and get back two separate result sets, each with its own columns?
I tried UNION ALL, but that appends everything to a single list, and there is no way to tell the two parts apart for further use.
WITH campus AS
(SELECT DISTINCT CampusName, DistrictName
FROM dbo.file
),creditAcceptance AS
(SELECT CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal, COUNT(id) AS N
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%') AND (CollegeCreditEarnedFinal = 'Yes') AND (CollegeCreditAcceptedFinal = 'Yes')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
),eligibility AS
(SELECT CampusName, EligibilityStatusFinal, COUNT(id) AS N, CollegeCreditAcceptedFinal
FROM dbo.file
WHERE (EligibilityStatusFinal LIKE 'Eligible%')
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal
)
SELECT a.CampusName, c.[EligibilityStatusFinal], SUM(c.N) AS creditacceptCount
FROM campus as a FULL OUTER JOIN creditAcceptance as c ON a.CampusName=c.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName ,c.EligibilityStatusFinal
Union ALL
SELECT a.CampusName , b.[EligibilityStatusFinal], SUM(b.N) AS eligible
From Campus as a FULL OUTER JOIN eligibility as b ON a.CampusName = b.CampusName
WHERE (a.DistrictName = 'xy')
group by a.CampusName,b.EligibilityStatusFinal
Expected output:
+------------+------------------------+--------------------+
| CampusName | EligibilityStatusFinal | creditacceptCount |
+------------+------------------------+--------------------+
| M | G | 1 |
| E | NULL | NULL |
| A | G | 4 |
| B | G | 8 |
+------------+------------------------+--------------------+
+------------+------------------------+----------+
| CampusName | EligibilityStatusFinal | eligible |
+------------+------------------------+----------+
| A | G | 8 |
| C | G | 9 |
| A | T | 9 |
+------------+------------------------+----------+
As you can see here CTEs can be used in a single statement only, so you can't get the expected output with CTEs.
Here is an excerpt from Microsoft docs:
A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE
statement that references some or all the CTE columns. A CTE can also
be specified in a CREATE VIEW statement as part of the defining SELECT
statement of the view.
You can use table variables (declare @campus table(...)) or temp tables (create table #campus (...)) instead.
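The temp-table idea can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module and SQLite's CREATE TEMP TABLE ... AS SELECT instead of a SQL Server #temp table, the table is called file rather than dbo.file, the four rows are made up, and the CollegeCreditEarnedFinal condition from the question is left out to keep it short. The point is that the intermediate result is built once and then read by two separate statements, each returning its own result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (
    id INTEGER PRIMARY KEY, CampusName TEXT, DistrictName TEXT,
    EligibilityStatusFinal TEXT, CollegeCreditAcceptedFinal TEXT);
-- Made-up rows, only to have something to aggregate.
INSERT INTO file (CampusName, DistrictName, EligibilityStatusFinal, CollegeCreditAcceptedFinal) VALUES
  ('A', 'xy', 'Eligible G', 'Yes'),
  ('A', 'xy', 'Eligible G', 'No'),
  ('B', 'xy', 'Eligible T', 'Yes'),
  ('C', 'zz', 'Eligible G', 'Yes');

-- Unlike a CTE, the temp table outlives the statement that creates it.
CREATE TEMP TABLE eligibility AS
SELECT CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal, COUNT(id) AS N
FROM file
WHERE DistrictName = 'xy' AND EligibilityStatusFinal LIKE 'Eligible%'
GROUP BY CampusName, EligibilityStatusFinal, CollegeCreditAcceptedFinal;
""")

# First result set: only rows where credit was accepted.
accepted = conn.execute("""
SELECT CampusName, EligibilityStatusFinal, SUM(N) AS creditacceptCount
FROM eligibility
WHERE CollegeCreditAcceptedFinal = 'Yes'
GROUP BY CampusName, EligibilityStatusFinal
ORDER BY CampusName
""").fetchall()

# Second result set: all eligible rows, read from the same temp table.
eligible = conn.execute("""
SELECT CampusName, EligibilityStatusFinal, SUM(N) AS eligible
FROM eligibility
GROUP BY CampusName, EligibilityStatusFinal
ORDER BY CampusName
""").fetchall()

print(accepted)
print(eligible)
```

In a stored procedure the same shape applies: populate the temp table or table variable first, then issue one SELECT per result set.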

Select IDs from multiple rows where column values satisfy one condition but not another

Hello I have the following problem.
I have a table like the one in this sql fiddle
This table defines a relationship and it contains IDs from two other tables
example values
| FirstID | SecondID |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I want to select all the FirstIDs that satisfy the following criteria.
Their corresponding SecondIDs are in the range 1-3 AND NOT in the range 4-5
For example in this case we would want FirstIDs 1 and 3.
I have tried the following queries
SELECT FirstID from table
WHERE SecondID IN (1,2,3) AND SecondID NOT IN (4,5)
SELECT FirstID,SecondID
FROM(
SELECT FirstID, SecondID
FROM table
WHERE SecondID in (1,2,3,4,5) )
WHERE SecondID NOT IN (4,5)
but I don't get the correct results I am aiming for.
What is the correct query to get the data I want?
SELECT FirstID
FROM test
WHERE SecondId in (1,2,3) --Included values
AND FirstID NOT IN (SELECT FirstID FROM test
WHERE SecondId IN (4,5)) --Excluded values
How about min() and max():
select firstid
from t
group by firstid
having min(secondId) between 1 and 3 and
max(secondid) between 1 and 3;
Assuming 1 is the minimum, then this can be simplified to:
having max(secondid) <= 3;
For arbitrary ranges, you can use sum(case):
having sum(case when secondId between 1 and 3 then 1 else 0 end) > 0 and
sum(case when secondId between 4 and 5 then 1 else 0 end) = 0;
I think Gonzalo Lorieto probably has the best answer to this question already, but depending on the size of your data, SELECT statements in a WHERE clause can get really slow, and the query below might be significantly faster (although it's not clear that it's worth the reduced readability):
SELECT inrange.FirstId FROM
t inrange
LEFT OUTER JOIN
(SELECT FirstID FROM t
WHERE SecondId IN (4,5)) outrange
ON inrange.firstID = outrange.firstId
WHERE SecondID IN (1,2,3)
AND outrange.firstId IS NULL
GROUP BY inrange.FirstId
You will want to use NOT EXISTS to exclude the FirstIDs that have an invalid SecondID. Here is an example:
SELECT FirstID from test Has123
WHERE SecondID IN (1,2,3)
AND NOT EXISTS (
SELECT 1 FROM test Not45
WHERE Has123.FirstID = Not45.FirstID
AND Not45.SecondID IN (4,5)
)
GROUP BY FirstID
SqlFiddle
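As a quick check that the approaches above agree, the following sketch loads the example values from the question into an in-memory SQLite database (via Python's sqlite3 module, an assumption made so the example is self-contained) and runs both the NOT EXISTS form and the conditional-aggregation form. The table name test is taken from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (FirstID INTEGER, SecondID INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5),
     (3, 1), (3, 2), (3, 3)],
)

# Keep FirstIDs with a SecondID in 1-3 for which no row in 4-5 exists.
not_exists = conn.execute("""
SELECT has123.FirstID
FROM test has123
WHERE has123.SecondID IN (1, 2, 3)
  AND NOT EXISTS (
      SELECT 1 FROM test not45
      WHERE not45.FirstID = has123.FirstID
        AND not45.SecondID IN (4, 5))
GROUP BY has123.FirstID
ORDER BY has123.FirstID
""").fetchall()

# Same condition expressed per group with SUM(CASE ...).
conditional = conn.execute("""
SELECT FirstID
FROM test
GROUP BY FirstID
HAVING SUM(CASE WHEN SecondID BETWEEN 1 AND 3 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN SecondID BETWEEN 4 AND 5 THEN 1 ELSE 0 END) = 0
ORDER BY FirstID
""").fetchall()

print(not_exists)
print(conditional)
```

Both queries return FirstIDs 1 and 3. The asker's first attempt fails because a WHERE clause tests one row at a time, while the requirement is about all rows sharing a FirstID.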

Comparing different columns in SQL for each row

After some transformation I have the result of a cross join (of tables a and b) that I want to analyse. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
Each row always pairs two different IDs with each other. The other columns always hold either 0 or 1.
I am now trying to find out how many values two IDs have in common on average (meaning both have "1" in 10_1, 10_2, etc.), but I don't really know how to do that.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two IDs have 10_1 in common. I could do something like this across different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
This counts how often two IDs have at least one thing in common, but it also counts pairs that have two or more things in common. Plus, I would also like to know how often two IDs have exactly two things, three things, etc. in common.
Another "problem" in my case is that I have about 30 columns to look at, so I can hardly write out every possible combination by hand.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as (shown here for table a; table b needs the same treatment):
select id, '10_1' as code from a where "10_1" = 1
union
select id, '10_2' as code from a where "10_2" = 1
union
select id, '11_1' as code from a where "11_1" = 1
union
select id, '11_2' as code from a where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model:
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
The proposed solution first takes one step back from the cross-join table, as the identical column names are awkward to work with. Instead, we take the ids from the two tables and put them in a temporary table. The following query produces the result asked for in the question. It assumes table_a and table_b from the question are the same table, called tbl, but this assumption is not required: tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses a JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)
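For comparison, the "normalize first, then count" idea can be tried on the five ids visible in the question's sample. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module instead of PostgreSQL, the table names a_ids, b_ids, a_codes and b_codes are hypothetical, and the code rows are the already-normalized form of the sample (one row per id and per column holding 1). It counts only shared 1s, and it pairs every id from a with every id from b, which gives six pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_ids (id INTEGER PRIMARY KEY);
CREATE TABLE b_ids (id INTEGER PRIMARY KEY);
CREATE TABLE a_codes (id INTEGER, code TEXT);
CREATE TABLE b_codes (id INTEGER, code TEXT);
INSERT INTO a_ids VALUES (111), (112);
INSERT INTO b_ids VALUES (222), (333), (444);
INSERT INTO a_codes VALUES (111, '10_1'), (111, '11_1'), (112, '10_2'), (112, '11_1');
INSERT INTO b_codes VALUES (222, '10_1'), (222, '11_1'),
                           (444, '10_1'), (444, '11_1'), (444, '11_2');
""")

# Step 1 (pairs): count the codes shared by each (a, b) pair; the LEFT JOINs
# keep pairs with nothing in common, which then get a count of 0.
# Step 2: count how many pairs fall into each in_common bucket.
histogram = conn.execute("""
WITH pairs AS (
    SELECT a.id AS a_id, b.id AS b_id, COUNT(bc.code) AS in_common
    FROM a_ids a
    CROSS JOIN b_ids b
    LEFT JOIN a_codes ac ON ac.id = a.id
    LEFT JOIN b_codes bc ON bc.id = b.id AND bc.code = ac.code
    GROUP BY a.id, b.id
)
SELECT in_common, COUNT(*) AS pair_count
FROM pairs
GROUP BY in_common
ORDER BY in_common
""").fetchall()

for row in histogram:
    print(row)
```

On this sample, two pairs share nothing, two share one code, and two share two codes. Adding a thirtieth column then means adding rows to the normalized tables, not another branch in the query.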

SQL Select a group when attributes match at least a list of values

Given a table with a (non-distinct) identifier and a value:
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
How can you select the grouped identifiers, which have values for a given list? (e.g. ('B', 'C'))
This list might also be the result of another query (like SELECT Value from Table1 WHERE ID = '2' to find all IDs which have a superset of values, compared to ID=2 (only ID=1 in this example))
Result
| ID |
|----|
| 1 |
| 2 |
1 and 2 are part of the result, as they have both B and C in their Value column. 3 is not included, as it is missing C.
Thanks to the answer to this question: SQL Select only rows where exact multiple relationships exist, I created a query which works for a fixed list. However, I need to be able to use the results of another query without changing the query. (It also requires the Access-specific IIF function):
SELECT ID FROM Table1
GROUP BY ID
HAVING SUM(Value NOT IN ('A', 'B')) = 0
AND SUM(IIF(Value='A', 1, 0)) = 1
AND SUM(IIF(Value='B', 1, 0)) = 1
In case it matters: The SQL is run on a Excel-table via VBA and ADODB.
In the WHERE clause, filter on the list of values you would like to see, group by id, and in the HAVING clause keep only those ids which have 3 matching rows.
select id from table1
where value in ('A', 'B', 'C') --you can use a result of another query here
group by id
having count(*)=3
If you can have the same id - value pair more than once, then you need to slightly alter the having clause: having count(distinct value)=3
If you want to make it completely dynamic based on a subquery, then:
select id, min(valcount) as minvalcount from table1
cross join (select count(*) as valcount from table1 where id=2) as t1
where value in (select value from table1 where id=2) --you can use a result of another query here
group by id
having count(*)=minvalcount
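The filter, group and count pattern can be verified on the table from the question with a small sketch. It is run through Python's sqlite3 module as a stand-in (an assumption; the question uses Excel via ADODB, and whether COUNT(DISTINCT ...) is available there is not checked here). The dynamic variant compares against a scalar subquery instead of a cross join, which is a variation on the answer, not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, 'A'), (1, 'B'), (1, 'C'), (1, 'D'),
     (2, 'A'), (2, 'B'), (2, 'C'),
     (3, 'A'), (3, 'B')],
)

# Fixed list ('B', 'C'): an ID qualifies when it has both values.
fixed = conn.execute("""
SELECT ID
FROM Table1
WHERE Value IN ('B', 'C')
GROUP BY ID
HAVING COUNT(DISTINCT Value) = 2
ORDER BY ID
""").fetchall()

# List taken from another query: every value that ID 2 has.
dynamic = conn.execute("""
SELECT ID
FROM Table1
WHERE Value IN (SELECT Value FROM Table1 WHERE ID = 2)
GROUP BY ID
HAVING COUNT(DISTINCT Value) =
       (SELECT COUNT(DISTINCT Value) FROM Table1 WHERE ID = 2)
ORDER BY ID
""").fetchall()

print(fixed)
print(dynamic)
```

Both queries return IDs 1 and 2. In the dynamic case ID 2 trivially matches itself, so a condition such as ID <> 2 would be needed in the WHERE clause to get only the other IDs.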