SQL: How many rows have the largest value for a column

I am sure this has a very simple answer, though I have not turned anything up, mostly because I am probably phrasing the question wrong.
Anyway, let's say I have this very simple table:
Table: election_candidates
id | candidate_id | election_id | votes
---------------------------------------
1 | 2 | 1 | 3
2 | 5 | 1 | 3
3 | 3 | 1 | 2
I need to know if two candidates are tied, that is, whether more than one candidate has the most votes for an election.
I know I can use the MAX function to get the largest value for an election, but is there an easy query to get how many candidates have the MAX for a given election?
I'm using PHP and the Codeigniter framework, though just a general example of a query that could work is just fine.

Most databases support ANSI-standard window functions. One way to do this is using rank():
select ec.election_id, count(*) as NumTies
from (select ec.*, rank() over (partition by election_id order by votes desc) as seqnum
from election_candidates ec
) ec
where seqnum = 1
group by ec.election_id;
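As a minimal sketch, the rank() query can be tried against the sample table from the question using Python's built-in sqlite3 module. The in-memory database, the function name `count_top_ties`, and the use of SQLite (which needs version 3.25 or later for window functions) are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

def count_top_ties():
    """Return {election_id: number of candidates sharing the top vote count}."""
    conn = sqlite3.connect(":memory:")  # throwaway database, assumed for the demo
    conn.executescript("""
        CREATE TABLE election_candidates (
            id INTEGER, candidate_id INTEGER, election_id INTEGER, votes INTEGER);
        INSERT INTO election_candidates VALUES
            (1, 2, 1, 3), (2, 5, 1, 3), (3, 3, 1, 2);
    """)
    rows = conn.execute("""
        SELECT election_id, COUNT(*) AS num_ties
        FROM (SELECT ec.*,
                     RANK() OVER (PARTITION BY election_id
                                  ORDER BY votes DESC) AS seqnum
              FROM election_candidates ec) ranked
        WHERE seqnum = 1          -- keep only the candidates ranked first
        GROUP BY election_id
    """).fetchall()
    conn.close()
    return dict(rows)

print(count_top_ties())  # {1: 2}
```

A count greater than 1 for an election means the top spot is tied; here candidates 2 and 5 both have 3 votes in election 1.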

Couldn't you just do something like:
select e.*
from election_candidates e
inner join (
select election_id, max(votes) as maxVotes
from election_candidates
group by election_id
) maxVotesPerElectionId on e.election_Id = maxVotesPerElectionId.election_id
and e.votes = maxVotesPerElectionId.maxVotes
This should get you the candidates (per election) with the max votes.

Just the winner:
SELECT *
from election_candidates
ORDER BY votes DESC
LIMIT 0,1

This uses rank() to rank the candidates within each election by votes cast and lists them in order of placement.
All candidates are listed, showing how they did in each election.
DECLARE @T AS TABLE (id INT, candidate_id INT, election_id INT, votes INT)
INSERT INTO @T VALUES
(1 ,2,1,3),(2 ,5,1,3),(3 ,3,1,2),(4 ,2,2,3),(5 ,5,3,1),(6 ,6,1,4),(7 ,2,3,3),(8 ,1,4,3),
(9 ,1,5,2),(10,4,5,3),(11,5,5,3),(12,6,5,4)
SELECT
election_id,
votes,
RANK() OVER (PARTITION BY election_id ORDER BY votes DESC) AS RANKING,
candidate_id
FROM @T
ORDER BY election_id,
RANKING


SQL get all columns from max aggregation [duplicate]

I have a table like this:
ID (Not PK) | time_to_prioritize | extra_info_1 | extra_info_2
------------|--------------------|--------------|-------------
001 | 0 | info_1 | info_1
001 | 1 | info_1 | info_1
001 | 2 | info_1_last | info_1_last
002 | 1 | info_2 | info_2
002 | 2 | info_2_last | info_2_last
003 | 0 | info_3_last | info_3_last
My objective is to get, for each distinct ID, the row with the max(time_to_prioritize) along with its extra columns, like this:
ID (Not PK) | time_to_prioritize | extra_info_1 | extra_info_2
------------|--------------------|--------------|-------------
001 | 2 | info_1_last | info_1_last
002 | 2 | info_2_last | info_2_last
003 | 0 | info_3_last | info_3_last
I got stuck at
SELECT TOP 1 * FROM my_table
ORDER BY time_to_prioritize DESC
I am trying to join it with itself, but with no results.
What is the next step to achieve the result?
Thanks.
P.S. The answers to "SQL MAX of multiple columns?" do not help me, because that question is about the max of every column. I need the max of only one column, along with the rest of the data.
You may use the ROW_NUMBER function as follows:
SELECT T.ID, T.time_to_prioritize, T.extra_info_1, T.extra_info_2
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY time_to_prioritize DESC) rn
FROM my_table
) T
WHERE T.rn=1
ORDER BY T.ID
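To see what the ROW_NUMBER approach returns on the question's data, here is a small sketch that runs it through Python's sqlite3 module. The in-memory SQLite database (3.25+ for window functions), the `ROWS` list, and the function name `latest_row_per_id` are assumptions for illustration; the question itself appears to target SQL Server.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    ("001", 0, "info_1", "info_1"),
    ("001", 1, "info_1", "info_1"),
    ("001", 2, "info_1_last", "info_1_last"),
    ("002", 1, "info_2", "info_2"),
    ("002", 2, "info_2_last", "info_2_last"),
    ("003", 0, "info_3_last", "info_3_last"),
]

def latest_row_per_id():
    """Return, for each ID, the full row with the highest time_to_prioritize."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (ID TEXT, time_to_prioritize INTEGER, "
        "extra_info_1 TEXT, extra_info_2 TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute("""
        SELECT T.ID, T.time_to_prioritize, T.extra_info_1, T.extra_info_2
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY ID
                                        ORDER BY time_to_prioritize DESC) AS rn
              FROM my_table) T
        WHERE T.rn = 1            -- rn = 1 is the newest row within each ID
        ORDER BY T.ID
    """).fetchall()
    conn.close()
    return result

for row in latest_row_per_id():
    print(row)
```

Because ROW_NUMBER assigns exactly one row the value 1 per partition, each ID appears once even if two rows share the same maximum time_to_prioritize; which of the tied rows wins is then arbitrary.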
Your approach with TOP 1 can work, though it needs some changes.
TOP 1 returns only the first row with respect to the ordering. To get the first row for each ID, use TOP 1 WITH TIES, which also returns every row that ties with the first row on the ORDER BY value. To make the three rows of interest tie, order by the ROW_NUMBER window function, which evaluates to 1 for the latest row of each ID:
SELECT TOP 1 WITH TIES *
FROM my_table
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY time_to_prioritize DESC)
Try this:
SELECT t.*
FROM my_table t
INNER JOIN (SELECT ID, MAX(time_to_prioritize) AS max_time
FROM my_table GROUP BY ID) m
ON m.ID = t.ID AND m.max_time = t.time_to_prioritize
ORDER BY t.ID

How can I select the top n records per category without any duplicates in SQL?

I am trying to select a set of recipes in a database by category. The criterion is that I need n recipes per category with no repeats. So, given a dataset recipes:
id | category
---|---------
1 | dairy
1 | eggs
1 | vegetarian
2 | dairy
2 | dessert
3 | thanksgiving
...
Is it possible to perform a select in such a way that my resulting dataset looks like this, where n=1?
id | category
----|----------
1 | dairy
2 | dessert
3 | thanksgiving
I happen to be using Presto to query this dataset, and there are about 30 categories total. I originally thought that maybe I could do some nested UNION statements, but a) that would be tedious for the number of categories I have and b) I don't think it will work since each UNION is kind of its own thing and has no knowledge of the past. I also considered using
select id from (
select id, category, row_number() over (partition by category order by id) as row_num
from recipes)
where row_num < 2
which would let me set how many ids I want back from each category, but it doesn't deal with removing duplicates.
Ultimately I have a feeling this isn't possible in SQL, and that I should move it into Python or something, but if it's possible I'm very interested to see it in action!
You are close. Use partition by id instead:
select id, category
from (select id, category,
row_number() over (partition by id order by id) as seqnum
from recipes
)
where seqnum = 1;
The order by only makes a difference if you want to determine which row you want -- the first category alphabetically for instance.
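To illustrate the point about the order by, here is a sketch that orders by category inside each id partition, so the alphabetically first category is kept. It runs on SQLite through Python's sqlite3 module rather than Presto; the in-memory database and the function name `one_category_per_id` are assumptions for illustration.

```python
import sqlite3

def one_category_per_id():
    """Keep one row per recipe id: the alphabetically first category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recipes (id INTEGER, category TEXT)")
    conn.executemany("INSERT INTO recipes VALUES (?, ?)", [
        (1, "dairy"), (1, "eggs"), (1, "vegetarian"),
        (2, "dairy"), (2, "dessert"),
        (3, "thanksgiving"),
    ])
    rows = conn.execute("""
        SELECT id, category
        FROM (SELECT id, category,
                     ROW_NUMBER() OVER (PARTITION BY id
                                        ORDER BY category) AS seqnum
              FROM recipes) ranked
        WHERE seqnum = 1
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

print(one_category_per_id())  # [(1, 'dairy'), (2, 'dairy'), (3, 'thanksgiving')]
```

Each id now appears exactly once, but a category can still be chosen for more than one id (dairy appears twice here), so this removes duplicate recipes rather than guaranteeing n recipes per category.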
As a note: If you wanted one id per category, then I might suggest aggregation:
select category, min(id)
from recipes
group by category;

Performance issue on selecting n newest rows in subselect

I have a database with courses. Each course contains a set of nodes, and some nodes contain a set of answers from students. The Answer table looks (simplified) like this:
Answer
id | courseId | nodeId | answer
------------------------------------------------
1 | 1 | 1 | <- text ->
2 | 2 | 2 | <- text ->
3 | 1 | 1 | <- text ->
4 | 1 | 3 | <- text ->
5 | 2 | 2 | <- text ->
.. | .. | .. | ..
When a teacher opens a course (i.e. courseId = 1) I want to pick the node that has received the most answers lately. I can do this using the following query:
with Answers as
(
select top 50 id, nodeId from Answer A where courseId=1 order by id desc
)
select top 1 nodeId from Answers group by nodeId order by count(id) desc
or equally using this query:
select top 1 nodeId from
(select top 50 id, nodeId from Answer A where courseId=1 order by id desc) as Answers
group by nodeId order by count(id) desc
In both queries the newest 50 answers (those with the highest ids) are selected and then grouped by nodeId so I can pick the one with the highest frequency. My problem, however, is that the query is very slow. If I run only the subselect, it takes less than a second, and grouping 50 rows should be fast, but when I run the entire query it takes about 10 seconds! My guess is that SQL Server does the select and grouping first, and only afterwards applies the top 50 and top 1, which in this case leads to terrible performance.
So, how can I rewrite the query to be efficient?
You can add indexes to make your queries more efficient. For this query:
with Answers as (
select top 50 id, nodeId
from Answer A
where courseId = 1
order by id desc
)
select top 1 nodeId
from Answers
group by nodeId
order by count(id) desc;
The best index is Answer(courseId, id, nodeid).
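The following is a rough sketch of creating that index and running the query, using Python's sqlite3 module so it can be executed anywhere. It is not SQL Server: SQLite has no TOP, so LIMIT stands in for it, and the index name `ix_answer_course_id_node`, the function name `busiest_recent_node`, and the tiny data set are all assumptions for illustration. It shows the shape of the index and query, not the performance gain.

```python
import sqlite3

def busiest_recent_node(course_id, recent=50):
    """Among the `recent` newest answers of a course, return the most frequent nodeId."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Answer (id INTEGER PRIMARY KEY, courseId INTEGER, "
        "nodeId INTEGER, answer TEXT)"
    )
    conn.executemany("INSERT INTO Answer VALUES (?, ?, ?, ?)", [
        (1, 1, 1, "text"), (2, 2, 2, "text"), (3, 1, 1, "text"),
        (4, 1, 3, "text"), (5, 2, 2, "text"),
    ])
    # Composite index: filter column first, then the sort column, then the
    # column the outer query reads, so the inner query can be answered from
    # the index alone.
    conn.execute(
        "CREATE INDEX ix_answer_course_id_node ON Answer (courseId, id, nodeId)"
    )
    row = conn.execute("""
        WITH Answers AS (
            SELECT id, nodeId
            FROM Answer
            WHERE courseId = ?
            ORDER BY id DESC
            LIMIT ?               -- SQLite stand-in for TOP 50
        )
        SELECT nodeId
        FROM Answers
        GROUP BY nodeId
        ORDER BY COUNT(id) DESC
        LIMIT 1                   -- SQLite stand-in for TOP 1
    """, (course_id, recent)).fetchone()
    conn.close()
    return row[0] if row else None

print(busiest_recent_node(1))  # 1
```

The column order matters: with courseId leading and id second, the engine can seek to one course and read its rows already sorted by id, stopping after 50.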
To be more insightful we'd need to see the indexes on that table and the execution plans you're getting (one plan for the inner query on its own, one plan for the full query).
I'd even recommend doing the same analysis again after adding the index mentioned elsewhere on this page.
Without that information, all we can recommend is trial and error.
For example, try avoiding TOP (this shouldn't matter, but we're guessing while we can't see your indexes and execution plans):
WITH
Answers AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY id DESC) AS rowId,
id,
nodeId
FROM
Answer
WHERE
courseId = 1
),
top50 AS
(
SELECT
nodeId,
COUNT(*) AS row_count
FROM
Answers
WHERE
rowId <= 50
GROUP BY
nodeId
),
ranked AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY row_count DESC, nodeId DESC) AS ordinal,
nodeID
FROM
top50
)
SELECT
nodeID
FROM
ranked
WHERE
ordinal = 1
This is massively over the top. It is functionally the same as the query in your OP, but sufficiently different to potentially get a different execution plan.
Alternatively (and not very nice), just put the results of your inner query into a table variable, then run the outer query on the table variable.
I still expect, however, that adding the index will be the least-worst option.

Oracle - With a one to many relationship, select distinct rows based on a min value

This question is the same as In one to many relationship, return distinct rows based on MIN value with the exception that I'd like to see what the answer looks like in other dialects, particularly in Oracle.
Reposting from the original description:
Let's say a patient makes many visits. I want to write a query that returns distinct patient rows based on their earliest visit. For example, consider the following rows.
patients
-------------
id name
1 Bob
2 Jim
3 Mary
visits
-------------
id patient_id visit_date reference_number
1 1 6/29/14 09f3be26
2 1 7/8/14 34c23a9e
3 2 7/10/14 448dd90a
What I want to see returned by the query is:
id name first_visit_date reference_number
1 Bob 6/29/14 09f3be26
2 Jim 7/10/14 448dd90a
In the other question, using postgresql, the best solution seemed to be to use distinct on, but that is not available in other dialects.
Typically, one uses row_number():
select id, name, visit_date as first_visit_date, reference_number
from (select p.id, p.name, v.visit_date, v.reference_number,
row_number() over (partition by p.id order by v.visit_date) as seqnum
from visits v join
patients p
on v.patient_id = p.id
) t
where seqnum = 1;
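A sketch of the row_number() approach on the sample patients and visits follows. It runs on SQLite via Python's sqlite3 module rather than Oracle, and it stores the visit dates as ISO strings (for example '2014-06-29') so that they sort chronologically; both choices, along with the function name `first_visits`, are assumptions made for illustration.

```python
import sqlite3

def first_visits():
    """Return one row per patient with their earliest visit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (id INTEGER, name TEXT);
        CREATE TABLE visits (id INTEGER, patient_id INTEGER,
                             visit_date TEXT, reference_number TEXT);
        INSERT INTO patients VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Mary');
        INSERT INTO visits VALUES
            (1, 1, '2014-06-29', '09f3be26'),
            (2, 1, '2014-07-08', '34c23a9e'),
            (3, 2, '2014-07-10', '448dd90a');
    """)
    rows = conn.execute("""
        SELECT id, name, visit_date AS first_visit_date, reference_number
        FROM (SELECT p.id, p.name, v.visit_date, v.reference_number,
                     ROW_NUMBER() OVER (PARTITION BY p.id
                                        ORDER BY v.visit_date) AS seqnum
              FROM visits v
              JOIN patients p ON v.patient_id = p.id) t
        WHERE seqnum = 1          -- ascending order, so 1 is the earliest visit
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

for row in first_visits():
    print(row)
```

Mary has no visits, so the inner join leaves her out, which matches the expected output in the question.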

Summing and ordering at once

I have a table of orders. There I need to find out which 3 partner_id's have made the largest sum of amount_totals, and sort those 3 from biggest to smallest.
testdb=# SELECT amount_total, partner_id FROM sale_order;
amount_total | partner_id
--------------+------------
1244.00 | 9
3065.90 | 12
3600.00 | 3
2263.00 | 25
3000.00 | 10
3263.00 | 3
123.00 | 25
5400.00 | 12
(8 rows)
Just starting SQL, I find it confusing ...
Aggregated amounts
If you want to list aggregated amounts, it can be as simple as:
SELECT partner_id, sum(amount_total) AS amount_supertotal
FROM sale_order
GROUP BY 1
ORDER BY 2 DESC
LIMIT 3;
The 1 in GROUP BY 1 is a positional reference to the first item in the SELECT list. It is just a notational shortcut for GROUP BY partner_id in this case.
This ignores the special case where more than three partners would qualify and picks 3 arbitrarily (for lack of definition).
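Here is a small sketch of the aggregated query run against the eight sample orders, using Python's sqlite3 module in place of PostgreSQL. The in-memory database, the `ORDERS` list, and the function name `top_partners` are assumptions for illustration; column names are spelled out instead of using positional references.

```python
import sqlite3

# (amount_total, partner_id) pairs copied from the question.
ORDERS = [
    (1244.00, 9), (3065.90, 12), (3600.00, 3), (2263.00, 25),
    (3000.00, 10), (3263.00, 3), (123.00, 25), (5400.00, 12),
]

def top_partners(limit=3):
    """Return (partner_id, summed amount) for the `limit` largest totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale_order (amount_total REAL, partner_id INTEGER)")
    conn.executemany("INSERT INTO sale_order VALUES (?, ?)", ORDERS)
    rows = conn.execute("""
        SELECT partner_id, SUM(amount_total) AS amount_supertotal
        FROM sale_order
        GROUP BY partner_id
        ORDER BY amount_supertotal DESC
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return rows

for partner_id, total in top_partners():
    print(partner_id, round(total, 2))
```

On this data the three largest totals belong to partners 12, 3, and 10, in that order.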
Individual amounts
SELECT partner_id, amount_total
FROM sale_order
JOIN (
SELECT partner_id, rank() OVER (ORDER BY sum(amount_total) DESC) AS rnk
FROM sale_order
GROUP BY 1
ORDER BY 2
LIMIT 3
) top3 USING (partner_id)
ORDER BY top3.rnk;
This one, on the other hand, includes all peers if more than 3 partners qualify for the top 3. The window function rank() gives you that.
The technique here is to group by partner_id in the subquery top3 and have the window function rank() attach ranks after the aggregation (window functions execute after aggregate functions). ORDER BY is applied after window functions and LIMIT is applied last. All in one subquery.
Then I join the base table to this subquery, so that only the top dogs remain in the result and order by rnk.
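The rank-then-join idea can be sketched as follows, again on SQLite through Python's sqlite3 module rather than PostgreSQL. This is a variation, not the query above: it sums in one step, ranks in a second, and filters on `rnk <= 3` instead of using LIMIT, so that partners tied on their total would all be kept. The CTE names `totals` and `ranked`, the `ORDERS` list, and the function name `orders_of_top_partners` are assumptions for illustration.

```python
import sqlite3

# (amount_total, partner_id) pairs copied from the question.
ORDERS = [
    (1244.00, 9), (3065.90, 12), (3600.00, 3), (2263.00, 25),
    (3000.00, 10), (3263.00, 3), (123.00, 25), (5400.00, 12),
]

def orders_of_top_partners(top_n=3):
    """Return the individual orders of the partners ranked in the top `top_n` by total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale_order (amount_total REAL, partner_id INTEGER)")
    conn.executemany("INSERT INTO sale_order VALUES (?, ?)", ORDERS)
    rows = conn.execute("""
        WITH totals AS (            -- step 1: one summed row per partner
            SELECT partner_id, SUM(amount_total) AS total
            FROM sale_order
            GROUP BY partner_id
        ),
        ranked AS (                 -- step 2: rank partners by that sum
            SELECT partner_id, RANK() OVER (ORDER BY total DESC) AS rnk
            FROM totals
        )
        SELECT so.partner_id, so.amount_total
        FROM sale_order so
        JOIN ranked r ON r.partner_id = so.partner_id
        WHERE r.rnk <= ?            -- ties on the total share a rank and are all kept
        ORDER BY r.rnk, so.amount_total DESC
    """, (top_n,)).fetchall()
    conn.close()
    return rows

for row in orders_of_top_partners():
    print(row)
```

The result lists every order of partners 12, 3, and 10, grouped by rank, while the orders of partners 25 and 9 are filtered out by the join.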
Window functions require PostgreSQL 8.4 or later.
This is rather advanced stuff. You should start learning SQL with something simpler probably.
select amount_total, partner_id
from (
select
sum(amount_total) amount_total,
partner_id
from sale_order
group by partner_id
) s
order by amount_total desc
limit 3