Get related columns in group by SQL query - sql

I have a database with data related to countries. Simplified, it looks something like this:
ID | country_id | country_ISO | var_value
1 | 1 | FR | 10
2 | 2 | BE | 15
3 | 3 | NL | 20
4 | 1 | FR | 6
5 | 2 | BE | 8
6 | 2 | BE | 12
I would like to get the sum of the values of "var_value", but along with that I want to have the country_id as well as the country_ISO.
I can do this:
SELECT
country_ISO,
SUM(var_value) AS sum_of_value
FROM
table_name
GROUP BY
country_ISO;
This query will give me the sum of var_value and the country ISO, but I also want to get the country_id. How do I subquery/self join to get extra columns that are related (in a unique way) to for example country_ISO?

Since country_iso depends on country_id anyway, just extend the GROUP BY by country_id.
SELECT country_id,
country_iso,
sum(var_value) sum_of_value
FROM table_name
GROUP BY country_id,
country_iso;
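To check the extended GROUP BY against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. SQLite and Python are assumptions made for illustration only; the question does not name a database engine.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "ID INTEGER PRIMARY KEY, country_id INTEGER, country_ISO TEXT, var_value INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [(1, 1, "FR", 10), (2, 2, "BE", 15), (3, 3, "NL", 20),
     (4, 1, "FR", 6), (5, 2, "BE", 8), (6, 2, "BE", 12)],
)

# Grouping by both columns still yields one row per country,
# because country_id and country_ISO map 1:1.
rows = conn.execute(
    """
    SELECT country_id, country_ISO, SUM(var_value) AS sum_of_value
    FROM table_name
    GROUP BY country_id, country_ISO
    ORDER BY country_id
    """
).fetchall()

for row in rows:
    print(row)
```

If the two columns were not 1:1, the extra column would split a country into several groups, so this only works because of that dependency.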

Just include that column in the GROUP BY clause as well:
SELECT country_id, country_ISO, SUM(var_value) AS sum_of_value
FROM table_name tn
GROUP BY country_id, country_ISO;

If country_id and country_ISO match 1:1, then just group by both country_id and country_ISO.
SELECT
country_id,
country_ISO,
SUM(var_value) AS sum_of_value
FROM
table_name
GROUP BY
country_id,
country_ISO;

Related

select max group by without id for join

I have table like this :
id name value
1 roger 43
2 phil 12
3 zac 14
4 phil 42
5 maurice 450
...
and I'm trying to retrieve the max value for each name in order to do a join later.
I'm expecting the intermediate result to be something like this:
name value
roger 43
zac 14
phil 42
maurice 450
This is easily achieved using select name,max(value) from table group by name
My issue is that I NEED the id in order to do my join later, but if I add the id to the select list and the group by, it messes up the result and shows all rows, since every id is different.
So the true expected result is more like this:
id name value
1 roger 43
3 zac 14
4 phil 42
5 maurice 450
I have seen many questions about similar issues, but none where the id needs to be retrieved without being included in the group by. I want uniqueness on the name and only need the id for my join.
From what you describe, you just want the max of id:
select max(id) as id, name, max(value)
from table
group by name;
Or, what I think you want is the row with the max value (note that DISTINCT ON is PostgreSQL-specific):
select distinct on (name) t.*
from t
order by name, value desc;
With not exists:
select min(t.id) id, t.name, t.value
from tablename t
where not exists (
select 1 from tablename
where name = t.name and value > t.value
)
group by t.name, t.value
order by id
Results:
| id | name | value |
| --- | ------- | ----- |
| 1 | roger | 43 |
| 3 | zac | 14 |
| 4 | phil | 42 |
| 5 | maurice | 450 |
Try this:
WITH X (n, v) AS (
SELECT name, MAX(value) FROM tbl GROUP BY name
)
SELECT T.*
FROM tbl AS T INNER JOIN X ON T.name = X.n AND T.value = X.v
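As a quick check of the CTE join on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite and the ORDER BY on id are assumptions added for illustration; the table name tbl follows the answer above.

```python
import sqlite3

# Assumption: in-memory SQLite database; the thread does not name an engine
# for this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "roger", 43), (2, "phil", 12), (3, "zac", 14),
     (4, "phil", 42), (5, "maurice", 450)],
)

# X holds the max value per name; joining back on (name, value)
# recovers the full row, including the id.
rows = conn.execute(
    """
    WITH X (n, v) AS (
        SELECT name, MAX(value) FROM tbl GROUP BY name
    )
    SELECT T.id, T.name, T.value
    FROM tbl AS T
    INNER JOIN X ON T.name = X.n AND T.value = X.v
    ORDER BY T.id
    """
).fetchall()

for row in rows:
    print(row)
```

One design point to keep in mind: if two rows with the same name tie for the max value, this join returns both of them, whereas the NOT EXISTS variant with min(id) keeps only one.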

sql GROUP within group

Let's say I have this database:
ID | Name | City
1 | John | TLV
2 | Abe | JLM
3 | John | JLM
I want to know how many people with different names are in each city.
I tried to use GROUP BY like this:
SELECT `city`, count(`index`) as `num` FROM `people`
GROUP BY `city`, `name`
But this seems to group by both.
City | num
TLV | 1
JLM | 1
What I want to do is to group by city, and group the results by name.
City | num
TLV | 1
JLM | 2
How can I do this?
I think you want this:
SELECT `city`, count(distinct name) as `num`
FROM `people`
GROUP BY `city`;
You might want just count(name); I'm not sure what you mean by "differently named". count(name) is preferable if you don't need the distinct, since it skips the de-duplication step.
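The difference between the two counts only shows up when a name repeats within a city. The sketch below uses Python's built-in sqlite3 module and adds one hypothetical row (4, John, JLM) that is not in the question, purely to make the two counts diverge.

```python
import sqlite3

# Assumption: in-memory SQLite database standing in for the asker's MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "John", "TLV"), (2, "Abe", "JLM"), (3, "John", "JLM"),
     (4, "John", "JLM")],  # hypothetical extra row, not in the question
)

# COUNT(DISTINCT name) counts each name once per city;
# COUNT(name) counts every non-NULL name.
rows = conn.execute(
    """
    SELECT city,
           COUNT(DISTINCT name) AS distinct_names,
           COUNT(name) AS all_names
    FROM people
    GROUP BY city
    ORDER BY city
    """
).fetchall()

for row in rows:
    print(row)
```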

Nested Sql select statement

Can anyone tell me what is wrong with the following sql query ?
Select *,
(SELECT [DiseaseID], COUNT(*) AS [Rank] FROM [DiseaseSymptom] WHERE
([SymptomID] IN(1, 5)) GROUP BY [DiseaseID] ORDER BY [Rank] DESC)
FROM Disease WHERE GenderID in (1, 3)
I have 2 tables. One contains each disease and the gender it is associated with:
Disease
+-----------+-------------------+----------+
| DiseaseID | DiseaseName | GenderID |
+-----------+-------------------+----------+
| 1 | Fever | 3 |
| 2 | Flu | 3 |
| 3 | Lady Disease | 2 |
| 4 | Gentlemen Disease | 1 |
+-----------+-------------------+----------+
Gender 1 = Male, 2 = Female, 3 = Common
And a Symptom Disease Matrix like this
DiseaseSymptom
+-----------+-----------+----------+
| DiseaseID | SymptomID | DissymID |
+-----------+-----------+----------+
| 1 | 1 | 1 |
| 1 | 2 | 3 |
| 1 | 4 | 4 |
| 2 | 1 | 5 |
| 2 | 3 | 9 |
| 2 | 4 | 6 |
| 2 | 5 | 7 |
+-----------+-----------+----------+
I get symptoms from the user, match them in the DiseaseSymptom table, and rank each disease by the number of symptoms matched (the inner SQL statement).
In the outer statement I simply want to take the result of the inner statement and check whether it belongs to a specific gender. The error I get when I try to run the above query is
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
A subquery in the select clause must generate a single scalar value, not a result set with multiple columns or rows. If you want both columns, put the subquery in the from clause (properly joined) and refer to the two different values in the select clause:
Select d.*, z.DiseaseID, z.[Rank]
FROM Disease d
join (SELECT DiseaseID, COUNT(*) AS [Rank]
FROM DiseaseSymptom
WHERE SymptomID IN(1, 5)
GROUP BY DiseaseID) z
On z.DiseaseID = d.DiseaseID
WHERE d.GenderID in (1, 3)
Order By z.[Rank]
You are using a subquery with GROUP BY, but your intention is a correlated subquery. The problem is that the subquery returns more than one column and more than one row. I think this is what you want:
Select d.*,
(SELECT COUNT(*)
FROM [DiseaseSymptom] ds
WHERE ds.[SymptomID] IN (1, 5) AND ds.DiseaseId = d.DiseaseId
) AS [Rank]
FROM Disease d
WHERE GenderID in (1, 3);
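You could use a common table expression (CTE). Note that GenderID lives in Disease, so the CTE has to join to it, and the ORDER BY has to move to the outer query:
with cte as (SELECT ds.[DiseaseID], d.GenderID, COUNT(*) AS [Rank]
FROM [DiseaseSymptom] ds
JOIN [Disease] d ON d.[DiseaseID] = ds.[DiseaseID]
WHERE ds.[SymptomID] IN (1, 5)
GROUP BY ds.[DiseaseID], d.GenderID)
select * FROM cte WHERE GenderID in (1, 3) ORDER BY [Rank] DESC
Hope this helps ;)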
There is really no need to have a nested query, just join and filter
SELECT d.DiseaseID, d.DiseaseName, d.GenderID
, Symptoms = Count(ds.SymptomID)
FROM Disease d
INNER JOIN DiseaseSymptom ds ON d.DiseaseID = ds.DiseaseID
WHERE ds.SymptomID IN (1, 5)
AND d.GenderID IN (1, 3)
GROUP BY d.DiseaseID, d.DiseaseName, d.GenderID
ORDER BY Count(SymptomID) Desc
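To see how the join-and-filter approach behaves on the sample tables, and how it differs from a correlated scalar subquery, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption; the T-SQL "Symptoms = Count(...)" alias form is rewritten with AS because SQLite would treat it as a comparison.

```python
import sqlite3

# Assumption: in-memory SQLite database instead of the asker's SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Disease (DiseaseID INTEGER PRIMARY KEY, DiseaseName TEXT, GenderID INTEGER)"
)
conn.execute(
    "CREATE TABLE DiseaseSymptom (DiseaseID INTEGER, SymptomID INTEGER, DissymID INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO Disease VALUES (?, ?, ?)",
    [(1, "Fever", 3), (2, "Flu", 3), (3, "Lady Disease", 2), (4, "Gentlemen Disease", 1)],
)
conn.executemany(
    "INSERT INTO DiseaseSymptom VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 3), (1, 4, 4), (2, 1, 5), (2, 3, 9), (2, 4, 6), (2, 5, 7)],
)

# Join and filter: only diseases with at least one matching symptom appear.
join_rows = conn.execute(
    """
    SELECT d.DiseaseID, d.DiseaseName, COUNT(ds.SymptomID) AS Symptoms
    FROM Disease d
    INNER JOIN DiseaseSymptom ds ON d.DiseaseID = ds.DiseaseID
    WHERE ds.SymptomID IN (1, 5) AND d.GenderID IN (1, 3)
    GROUP BY d.DiseaseID, d.DiseaseName
    ORDER BY Symptoms DESC
    """
).fetchall()

# Correlated scalar subquery: every gender-matching disease appears,
# with a count of 0 when no symptom matches.
subquery_rows = conn.execute(
    """
    SELECT d.DiseaseID, d.DiseaseName,
           (SELECT COUNT(*)
            FROM DiseaseSymptom ds
            WHERE ds.SymptomID IN (1, 5) AND ds.DiseaseID = d.DiseaseID) AS Symptoms
    FROM Disease d
    WHERE d.GenderID IN (1, 3)
    ORDER BY Symptoms DESC, d.DiseaseID
    """
).fetchall()

print(join_rows)
print(subquery_rows)
```

Which form fits depends on whether diseases with zero matching symptoms should be listed.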

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table: expected result:
+--------+-----+--- +--------+-----+
|category|value|... |category|value|
+--------+-----+--- +--------+-----+
| 1 | a | | 1 | a |
| 1 | a | | 2 | b |
| 1 | b | | 3 | a # or b
| 2 | a | +--------+-----+
| 2 | b |
| 2 | c |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | b |
| 3 | b |
+--------+-----+---
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
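Since the question mentions SQLite, the correlated-subquery approach can be tried directly with Python's built-in sqlite3 module. This is a minimal sketch on the sample rows from the question; the ORDER BY category is added here to match the requested ordering and is not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
# Sample rows copied from the question (the extra "..." columns are omitted).
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"),
     (2, "a"), (2, "b"), (2, "c"), (2, "b"),
     (3, "a"), (3, "a"), (3, "b"), (3, "b")],
)

# For each category, the subquery picks the value with the highest count.
rows = conn.execute(
    """
    SELECT DISTINCT category,
           (SELECT value
            FROM some_table t2
            WHERE t2.category = t.category
            GROUP BY value
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS mode_value
    FROM some_table t
    ORDER BY category
    """
).fetchall()

for row in rows:
    print(row)
```

Category 3 is a tie between a and b, so either value may come back for it; the question states that this is acceptable.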
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value`=(
SELECT `value`
FROM `some_table`
WHERE `category`=`the_category`
GROUP BY `value`
ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
You can replace a part of this with WHERE `id`=( SELECT `id` if the table has a unique/primary key column, then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
This returns the number of occurrences of each value in each category.
select category, value
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
We actually need only the first value per category, because the rows are sorted by count.
I am not sure SQLite keeps the first one, and I can't test it, but IMHO it should work.
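One way to avoid depending on row order, sketched here as an SQLite-specific variant rather than as the answer's own query: SQLite documents that when a query has a single max() aggregate, the other bare columns are taken from the row that holds the maximum. The sketch below relies on that behavior, using Python's built-in sqlite3 module; other engines either reject such a query or do not guarantee which row is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"),
     (2, "a"), (2, "b"), (2, "c"), (2, "b"),
     (3, "a"), (3, "a"), (3, "b"), (3, "b")],
)

# Assumption: SQLite-only behavior. With a single MAX() in the select list,
# the bare column "value" comes from the row where value_count is largest.
rows = conn.execute(
    """
    SELECT category, value, MAX(value_count) AS value_count
    FROM (
        SELECT category, value, COUNT(*) AS value_count
        FROM some_table
        GROUP BY category, value
    ) AS sub
    GROUP BY category
    ORDER BY category
    """
).fetchall()

for row in rows:
    print(row)
```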

SQL - select distinct only on one column [duplicate]

I have searched far and wide for an answer to this problem. I'm using Microsoft SQL Server. Suppose I have a table that looks like this:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 2 | 3968 | Spain | Spanish |
| 3 | 3968 | USA | English |
| 4 | 1234 | Greece | Greek |
| 5 | 1234 | Italy | Italian |
I want to perform one query which selects only one row per unique 'NUMBER' value (whether it is the first or the last row doesn't bother me). So this would give me:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 4 | 1234 | Greece | Greek |
How is this achievable?
A very typical approach to this type of problem is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by number order by id) as seqnum
from t
) t
where seqnum = 1;
This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.
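The row_number() approach can be tried on the sample rows with Python's built-in sqlite3 module. This is a minimal sketch under two assumptions: SQLite stands in for SQL Server, and the bundled SQLite library is version 3.25 or newer, which is when window functions were added. The table name t follows the answer above; the derived-table alias ranked is introduced here for readability.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) instead of SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, number INTEGER, country TEXT, lang TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 3968, "UK", "English"), (2, 3968, "Spain", "Spanish"),
     (3, 3968, "USA", "English"), (4, 1234, "Greece", "Greek"),
     (5, 1234, "Italy", "Italian")],
)

QUERY = """
SELECT id, number, country, lang
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY number ORDER BY id) AS seqnum
    FROM t
) AS ranked
WHERE seqnum <= ?
ORDER BY id
"""

# seqnum = 1 keeps the lowest id per number; raising the limit keeps more rows.
first_rows = conn.execute(QUERY, (1,)).fetchall()
first_two_rows = conn.execute(QUERY, (2,)).fetchall()

print(first_rows)
print([row[0] for row in first_two_rows])
```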
Since you don't care, I chose the max ID for each number.
select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id
Query Explanation
select
tbl.* -- give me all the data from the base table (tbl)
from
tbl
inner join ( -- only return rows in tbl which match this subquery
select
max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
from
tbl
group by
NUMBER -- how to group rows for the MAX aggregation
) maxID
on maxID.maxID = tbl.id -- join condition ie only return rows in tbl
-- whose ID is also a MAX ID for a given NUMBER
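To make the effect of the aggregate choice concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server as an assumption. It runs the same join once with MAX and once with MIN; the alias pickedID and the string templating are illustrative choices, not part of the answer.

```python
import sqlite3

# Assumption: in-memory SQLite database instead of the asker's SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, number INTEGER, country TEXT, lang TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 3968, "UK", "English"), (2, 3968, "Spain", "Spanish"),
     (3, 3968, "USA", "English"), (4, 1234, "Greece", "Greek"),
     (5, 1234, "Italy", "Italian")],
)

# {agg} is filled with a fixed aggregate name below, never with user input.
QUERY = """
SELECT tbl.*
FROM tbl
INNER JOIN (
    SELECT {agg}(id) AS pickedID, number
    FROM tbl
    GROUP BY number
) AS picked ON picked.pickedID = tbl.id
ORDER BY tbl.id
"""

max_rows = conn.execute(QUERY.format(agg="MAX")).fetchall()  # last row per number
min_rows = conn.execute(QUERY.format(agg="MIN")).fetchall()  # first row per number

print(max_rows)
print(min_rows)
```

With MAX the query returns the USA and Italy rows; swapping in MIN returns the UK and Greece rows shown in the question's expected output.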
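On engines that allow selecting non-grouped columns (for example MySQL with ONLY_FULL_GROUP_BY disabled, or SQLite), you could use the following query:
SELECT * FROM [table] GROUP BY NUMBER;
Where [table] is the name of the table.
This provides a unique listing for the NUMBER column. However, the other columns may be meaningless depending on the vendor implementation; that is, together they may not correspond to any specific row. SQL Server, which the question uses, rejects this query because the selected columns are neither aggregated nor listed in the GROUP BY.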