MySQL's alternative to T-SQL's WITH TIES

I have a table from which I want to get the top N records. The records are ordered by values and some records have the same values. What I'd like to do here is to get a list of top N records, including the tied ones. This is what's in the table:
+------+-------+
| Name | Value |
+------+-------+
| A    |    10 |
| B    |    30 |
| C    |    40 |
| D    |    40 |
| E    |    20 |
| F    |    50 |
+------+-------+
Now if I want to get the top 3 like so
SELECT * FROM `table` ORDER BY Value DESC LIMIT 3
I get this:
+------+-------+
| Name | Value |
+------+-------+
| F    |    50 |
| C    |    40 |
| D    |    40 |
+------+-------+
What I would like to get is this
+------+-------+
| Name | Value |
+------+-------+
| F    |    50 |
| C    |    40 |
| D    |    40 |
| B    |    30 |
+------+-------+
I calculate the rank of each record so what I would really like is to get the first N ranked records instead of the first N records ordered by value. This is how I calculate the rank:
SELECT t1.Value AS Val, (SELECT COUNT(DISTINCT t2.Value) + 1 FROM `table` t2 WHERE t2.Value > t1.Value) AS `Rank` FROM `table` t1
In T-SQL something like this is achievable by doing this:
SELECT TOP (3) WITH TIES * FROM [table] ORDER BY Value DESC
Does anyone have an idea how to do this in MySQL? I understand it could be done with subqueries or temporary tables but I don't have enough knowledge to accomplish this. I'd prefer a solution without using temporary tables.

Does this work for you?
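select Name, Value from `table` where Value in (
select distinct Value from `table` order by Value desc limit 3
) order by Value desc
Or perhaps the following. MySQL does not support LIMIT inside an IN (...) subquery, so the first query fails there, while this derived-table form works:
select a.Name, a.Value
from `table` a
join (select distinct Value from `table` order by Value desc limit 3) b
on a.Value = b.Value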
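As a self-contained check of the derived-table version against the question's sample data, here is a minimal sketch. The table name scores_join is a hypothetical stand-in for `table`; it uses no window functions, so it is meant to run on older MySQL versions as well as on SQLite. It returns the four rows the question asks for.

```sql
-- Hypothetical demo table standing in for `table`; rows copied from the question.
DROP TABLE IF EXISTS scores_join;
CREATE TABLE scores_join (Name VARCHAR(10), Value INT);
INSERT INTO scores_join (Name, Value) VALUES
  ('A', 10), ('B', 30), ('C', 40), ('D', 40), ('E', 20), ('F', 50);

-- b holds the three highest *distinct* values (50, 40, 30);
-- joining back keeps every row that carries one of them.
SELECT a.Name, a.Value
FROM scores_join a
JOIN (SELECT DISTINCT Value FROM scores_join ORDER BY Value DESC LIMIT 3) b
  ON a.Value = b.Value
ORDER BY a.Value DESC, a.Name;
-- F 50, C 40, D 40, B 30
```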

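select a.Name, a.Value
from `table` a
join (select min(Value) as cutoff
      from (select Value from `table` order by Value desc limit 3) top3) b
on a.Value >= b.cutoff
order by a.Value desc
This is like @Fosco's answer, but without DISTINCT in the subquery. His version returns the players with the top N scores, not the top N players (plus ties). E.g. if the scores are 50, 50, 50, 40, 40, 30, 20, he'll return 6 players (3x50, 2x40, 1x30), but you presumably just want 3x50. Joining on equality to the non-distinct top-3 list would repeat each row once per matching entry in that list (9 rows for three 50s), so the query compares against the smallest of the top 3 values instead.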

Starting with MySQL 8, you can use window functions to emulate the WITH TIES semantics, by filtering on RANK(). For example:
SELECT Name, Value
FROM (
SELECT Name, Value, RANK() OVER (ORDER BY Value DESC) AS rk
FROM `table`
) t
WHERE rk <= 3
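Note that, reading your question more closely, this doesn't do exactly what you seem to want: your expected output (which includes B) corresponds to filtering on DENSE_RANK() rather than RANK(). It does, however, do exactly what T-SQL does with the TOP n WITH TIES clause.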
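The difference between the two readings can be seen side by side on the question's data. A minimal sketch, assuming MySQL 8 or SQLite 3.25+ (both support window functions) and a hypothetical table named scores_rank holding the question's rows: RANK() reproduces TOP 3 WITH TIES, while DENSE_RANK() produces the output the question asks for.

```sql
-- Hypothetical demo table; rows copied from the question.
DROP TABLE IF EXISTS scores_rank;
CREATE TABLE scores_rank (Name VARCHAR(10), Value INT);
INSERT INTO scores_rank (Name, Value) VALUES
  ('A', 10), ('B', 30), ('C', 40), ('D', 40), ('E', 20), ('F', 50);

-- TOP 3 WITH TIES semantics: RANK() leaves gaps (50 -> 1, 40 -> 2, 30 -> 4)
-- Returns F 50, C 40, D 40
SELECT Name, Value
FROM (SELECT Name, Value, RANK() OVER (ORDER BY Value DESC) AS rk FROM scores_rank) t
WHERE rk <= 3
ORDER BY Value DESC, Name;

-- Top 3 distinct values: DENSE_RANK() has no gaps (50 -> 1, 40 -> 2, 30 -> 3)
-- Returns F 50, C 40, D 40, B 30
SELECT Name, Value
FROM (SELECT Name, Value, DENSE_RANK() OVER (ORDER BY Value DESC) AS drk FROM scores_rank) t
WHERE drk <= 3
ORDER BY Value DESC, Name;
```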

Related

how to sum multiple rows with same id in SQL Server

Let's say I have the following table:
id | name | no
--------------
1 | A | 10
1 | A | 20
1 | A | 40
2 | B | 20
2 | B | 20
I want to run a SELECT query in SQL Server that sums the "no" field over the rows that have the same id.
The result should look like this:
id | name | no
--------------
1 | A | 70
2 | B | 40
Simple GROUP BY and SUM should work.
SELECT ID, NAME, SUM([NO])
FROM Your_TableName
GROUP BY ID, NAME;
Use SUM and GROUP BY
SELECT ID,NAME, SUM(NO) AS TOTAL_NO FROM TBL_NAME GROUP BY ID, NAME
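SELECT id, name, SUM(no) AS no FROM TABLE_NAME GROUP BY id, name
This returns one row per id and name, with the no column summed. A bare SELECT * would be rejected by SQL Server here, because every selected column must either be aggregated or appear in the GROUP BY.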

Filter query with a GROUP BY based on column not in GROUP BY statement

Given the following table structure and sample data:
+------------+------+----------+
| EmployeeID | Name | WorkWeek |
+------------+------+----------+
| 1          | A    | 1        |
| 2          | B    | 1        |
| 2          | B    | 2        |
| 3          | C    | 1        |
| 3          | C    | 2        |
| 4          | D    | 2        |
+------------+------+----------+
I am looking to select all employees that only worked week 1 (so in this example, only EmployeeID = 1 would be returned). I am able to get the data with the following query:
SELECT EmployeeId, Name
FROM SomeTable
GROUP BY EmployeeId, Name
HAVING SUM ( WorkWeek ) = 1;
To me, the HAVING SUM( WorkWeek ) = 1 is a hack and this should be handled with some form of a GROUP BY and COUNT but I cannot wrap my head around how that query would be structured.
Any help would be useful and enlightening.
HAVING SUM( WorkWeek ) = 1 may work for week 1 or 2, but the same trick fails for week 3, because an employee who worked weeks 1 and 2 also has SUM( WorkWeek ) = 3.
Use NOT EXISTS operator with a subquery instead:
SELECT EmployeeId, Name
FROM SomeTable t1
WHERE NOT EXISTS (
SELECT * FROM SomeTable t2
WHERE t1.EmployeeId = t2.EmployeeId
AND t2.WorkWeek <> 1
)
Actually, that's exactly what the HAVING clause is for: filtering records according to aggregated values.
From the w3schools SQL tutorial:
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
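The GROUP BY / COUNT form the question was reaching for can be written with conditional counts in HAVING. A minimal sketch, assuming a hypothetical work_log table that copies SomeTable's sample rows. COUNT(expression) skips NULLs, so each CASE counts only the rows matching its condition, and unlike SUM the test does not depend on which week numbers happen to add up to the target.

```sql
-- Hypothetical demo table mirroring SomeTable from the question.
DROP TABLE IF EXISTS work_log;
CREATE TABLE work_log (EmployeeId INT, Name VARCHAR(10), WorkWeek INT);
INSERT INTO work_log (EmployeeId, Name, WorkWeek) VALUES
  (1, 'A', 1), (2, 'B', 1), (2, 'B', 2), (3, 'C', 1), (3, 'C', 2), (4, 'D', 2);

-- Keep an employee only if no row falls outside week 1
-- and at least one row is in week 1. Returns only 1 | A.
SELECT EmployeeId, Name
FROM work_log
GROUP BY EmployeeId, Name
HAVING COUNT(CASE WHEN WorkWeek <> 1 THEN 1 END) = 0
   AND COUNT(CASE WHEN WorkWeek = 1 THEN 1 END) > 0;
```

To check a different week, only the constant in the two CASE expressions changes.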

Hive query selecting top 2 rows by percentage count and display as columns

I have a table like the one below in my Hadoop cluster:
ID | CATEGORY | COUNT
101 | A | 40
101 | B | 40
101 | C | 20
102 | D | 10
102 | A | 20
102 | E | 30
102 | F | 40
I have to write a Hive query which will show IDs and top 2 categories by percentage count as columns. So my result table should look like
ID | CAT1 | % | CAT2 | %
101 | A | 40 | B | 40
102 | F | 40 | E | 30
Please keep in mind that this is only a sample table, which I have kept very simple for explanation purposes.
To get the top 2 per ID, you can use the rank() function.
To get percentage out of overall, you can join on ID with an aggregate table:
select ID,sum(count) as sum from input_table group by ID
And finally, if you want to turn the table from ID, Cat, % to one ID per row, you would need to use collect_list for Cat and % in a sub query and then create a column for the array elements
SELECT ID, categories[0] AS cat1, pcts[0] AS pct1, categories[1] AS cat2, pcts[1] AS pct2
FROM (
  SELECT a.ID, collect_list(a.CATEGORY) AS categories,
         collect_list(100 * a.`COUNT` / b.total) AS pcts
  FROM (
    -- rank categories within each ID by COUNT, highest first
    SELECT ID, CATEGORY, `COUNT`,
           rank() OVER (PARTITION BY ID ORDER BY `COUNT` DESC) AS rnk
    FROM input_table
  ) a
  -- per-ID total, used to turn counts into percentages
  JOIN (SELECT ID, sum(`COUNT`) AS total FROM input_table GROUP BY ID) b ON a.ID = b.ID
  WHERE a.rnk <= 2
  GROUP BY a.ID
) t;
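Because collect_list does not guarantee the order of the collected elements, CAT1 is not guaranteed to be the higher-ranked category in the array-based query. A sketch of an alternative that pivots with conditional aggregation (MAX(CASE ...)) and computes the per-ID total with SUM() OVER instead of a join. The demo table setup is hypothetical and written for a generic engine such as SQLite 3.25+ or MySQL 8; only the final SELECT is meant to carry over to Hive, which also supports ROW_NUMBER() and SUM() OVER, although it has not been verified there.

```sql
-- Hypothetical demo data from the question; COUNT is quoted because it is also a function name.
DROP TABLE IF EXISTS input_table;
CREATE TABLE input_table (ID INT, CATEGORY VARCHAR(10), `COUNT` INT);
INSERT INTO input_table (ID, CATEGORY, `COUNT`) VALUES
  (101, 'A', 40), (101, 'B', 40), (101, 'C', 20),
  (102, 'D', 10), (102, 'A', 20), (102, 'E', 30), (102, 'F', 40);

SELECT ID,
       MAX(CASE WHEN rn = 1 THEN CATEGORY END) AS CAT1,
       MAX(CASE WHEN rn = 1 THEN pct END)      AS PCT1,
       MAX(CASE WHEN rn = 2 THEN CATEGORY END) AS CAT2,
       MAX(CASE WHEN rn = 2 THEN pct END)      AS PCT2
FROM (
  SELECT ID, CATEGORY,
         -- share of this category within its ID, as a percentage
         100.0 * `COUNT` / SUM(`COUNT`) OVER (PARTITION BY ID) AS pct,
         -- exactly one row per position; CATEGORY breaks ties deterministically
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY `COUNT` DESC, CATEGORY) AS rn
  FROM input_table
) ranked
WHERE rn <= 2
GROUP BY ID
ORDER BY ID;
```

On the sample data this gives A and B (40 each) for ID 101, and F (40) and E (30) for ID 102. Using ROW_NUMBER() rather than rank() caps the result at two categories per ID even when three or more are tied.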

SQL Change Rank based on any value in group of values

I'm not looking for the answer as much as what to search for as I think this is possible. I have a query where the result can be as such:
| ID | CODE | RANK |
I want to base the rank on the code so that I get these results:
| 1 | A | 1 |
| 1 | B | 1 |
| 2 | A | 1 |
| 2 | C | 1 |
| 3 | B | 2 |
| 3 | C | 2 |
| 4 | C | 3 |
Basically, based on the group of IDs, if any of the CODEs = a certain value I want to adjust the rank so then I can order by rank first and then other columns. Never sure how to phrase things in SQL.
I tried
CASE WHEN CODE = 'A' THEN 1 WHEN CODE = 'B' THEN 2 ELSE 3 END rank
ORDER BY rank DESC
But I want to keep the ids together; I don't want them broken apart. If I can't solve it another way, I was thinking of giving all rows of an id the same rank, based on the highest one.
Any thoughts on an SQL function to look at?
You could use the MIN() OVER() analytic function to get the minimum rank value per group, and just order by that:
WITH cte AS (
SELECT id, code,
MIN(CASE WHEN code='A' THEN 1 WHEN code='B' THEN 2 ELSE 3 END)
OVER (PARTITION BY id) rank
FROM mytable
)
SELECT * FROM cte
ORDER BY rank, id, code
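On engines without window functions (MySQL before 8.0, for example), the same per-id minimum can be computed once with GROUP BY and joined back. A sketch with a hypothetical codes_demo table holding the ids and codes from the question's desired output; the alias grp_rank avoids rank, which is a reserved word in MySQL 8.

```sql
-- Hypothetical demo table; ids and codes from the question's desired output.
DROP TABLE IF EXISTS codes_demo;
CREATE TABLE codes_demo (id INT, code VARCHAR(1));
INSERT INTO codes_demo (id, code) VALUES
  (1, 'A'), (1, 'B'), (2, 'A'), (2, 'C'), (3, 'B'), (3, 'C'), (4, 'C');

-- Compute the best (lowest) code rank per id once, then join it back
-- so every row of an id carries the same group rank and stays together.
SELECT m.id, m.code, g.grp_rank
FROM codes_demo m
JOIN (
  SELECT id,
         MIN(CASE WHEN code = 'A' THEN 1 WHEN code = 'B' THEN 2 ELSE 3 END) AS grp_rank
  FROM codes_demo
  GROUP BY id
) g ON g.id = m.id
ORDER BY g.grp_rank, m.id, m.code;
```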

select top 1 with max 2 fields

I have this table :
+----+-----+-------+
| id | rev | class |
+----+-----+-------+
|  1 |  10 |     2 |
|  1 |  10 |     5 |
|  2 |  40 |     6 |
|  2 |  50 |     6 |
|  2 |  52 |     1 |
|  3 |  33 |     3 |
|  3 |  63 |     5 |
+----+-----+-------+
For each id, I only need the row with the highest rev and then, among those, the highest class:
+----+-----+-------+
| id | rev | class |
+----+-----+-------+
|  1 |  10 |     5 |
|  2 |  52 |     1 |
|  3 |  63 |     5 |
+----+-----+-------+
Query cost is important for me.
Do you want only the rows that have both maximum values?
SELECT h.id, h.rev, h.class
FROM ( SELECT id,
MAX( rev ) rev,
MAX( class ) class
FROM Herp
GROUP BY id ) derp
INNER JOIN Herp h
ON h.id = derp.id
AND h.rev = derp.rev
AND h.class = derp.class;
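Under this "both maximums" reading, an id whose highest rev and highest class sit on different rows returns nothing. A self-contained sketch on the question's sample data, using a hypothetical table named herp_demo, shows id 2 dropping out: its maximum rev (52) has class 1, while its maximum class is 6.

```sql
-- Hypothetical demo table; rows copied from the question.
DROP TABLE IF EXISTS herp_demo;
CREATE TABLE herp_demo (id INT, rev INT, class INT);
INSERT INTO herp_demo (id, rev, class) VALUES
  (1, 10, 2), (1, 10, 5), (2, 40, 6), (2, 50, 6), (2, 52, 1), (3, 33, 3), (3, 63, 5);

-- Keep a row only if it holds both the per-id max rev and the per-id max class.
-- Returns (1, 10, 5) and (3, 63, 5); id 2 has no row with rev = 52 and class = 6.
SELECT h.id, h.rev, h.class
FROM (SELECT id, MAX(rev) AS rev, MAX(class) AS class FROM herp_demo GROUP BY id) derp
JOIN herp_demo h
  ON h.id = derp.id AND h.rev = derp.rev AND h.class = derp.class
ORDER BY h.id;
```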
The fastest way might be to have indexes on t(id, rev) and t(id, class) and then do:
select t.*
from table t
where not exists (select 1
from table t2
where t2.id = t.id and t2.rev > t.rev
) and
not exists (select 1
from table t2
where t2.id = t.id and t2.class > t.class
);
SQL Server is pretty smart in terms of optimization, so the aggregation approach might be just as good. However, in terms of performance, this is just a bunch of index lookups.
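If the intent is the order the expected output shows (highest rev, then highest class among rows with that rev), the same NOT EXISTS pattern can compare both columns in one condition. A sketch on the question's sample data, using a hypothetical table named sample_demo; it returns (1, 10, 5), (2, 52, 1) and (3, 63, 5).

```sql
-- Hypothetical demo table; rows copied from the question.
DROP TABLE IF EXISTS sample_demo;
CREATE TABLE sample_demo (id INT, rev INT, class INT);
INSERT INTO sample_demo (id, rev, class) VALUES
  (1, 10, 2), (1, 10, 5), (2, 40, 6), (2, 50, 6), (2, 52, 1), (3, 33, 3), (3, 63, 5);

-- Keep a row unless another row with the same id beats it on rev,
-- or ties on rev and beats it on class (i.e. ORDER BY rev DESC, class DESC).
SELECT t.id, t.rev, t.class
FROM sample_demo t
WHERE NOT EXISTS (
  SELECT 1
  FROM sample_demo t2
  WHERE t2.id = t.id
    AND (t2.rev > t.rev OR (t2.rev = t.rev AND t2.class > t.class))
)
ORDER BY t.id;
```

If an id has two rows identical in both rev and class, both survive; a ROW_NUMBER() approach would keep only one.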
Here is a SQL Server 2012 example. It's very straightforward using the table from the question (named sample here) and ROW_NUMBER() with a PARTITION BY clause.
Basically, with each ID as a partition/group, sort the values of the other fields in a descending order assigning each one an incrementing RowId, then only take the first one.
select id, rev, [class]
from
(
SELECT id, rev, [class],
ROW_NUMBER() OVER(PARTITION BY id ORDER BY rev DESC, [class] desc) AS RowId
FROM sample
) t
where RowId = 1
Keep in mind, this works with the criteria in the example dataset, and not the MAX of two fields as stated in the question's title.
I guess you mean: the max of rev and the max of class. If not, please clarify what to do when there is no row where both fields have the highest value.
select id
, max(rev)
, max(class)
from table
group
by id
If you mean the highest total of rev and class, use this:
select t.id
     , t.rev
     , t.class
from table t
where t.rev + t.class =
      ( select max(t2.rev + t2.class)
        from table t2
        where t2.id = t.id
      )