Complex MySQL rank query with ties - sql

Let's say I had the following table:
id | name | points
-------------------
1 | joe | 100
2 | bob | 95
3 | max | 95
4 | leo | 90
Can I produce a reversed rank recordset like so:
id | name | points | rank
--------------------------
4 | leo | 90 | 1
3 | max | 95 | 2.5
2 | bob | 95 | 2.5
1 | joe | 100 | 4

This is a fully working example, with this sample table
create table tpoints (id int, name varchar(10), points int);
insert tpoints values
(1 ,'joe', 100 ),
(2 ,'bob', 95 ),
(3 ,'max', 95 ),
(4 ,'leo', 90 );
The MySQL query
select t.*, sq.`rank`
from
(
select
points,
#rank := case when #g = points then #rank else #rn + (c-1)/2.0 end `rank`,
#g := points,
#rn := #rn + c
from
(select #g:=null, #rn:=1) g,
(select points, count(*) c
from tpoints
group by points
order by points asc) p
) sq inner join tpoints t on t.points = sq.points
order by t.points asc;
It also has the benefit of performing very well compared to performing a correlated cross (self) join.
1x pass through tpoints to aggregate into groups
calculation of rank with ties
1x join to table to put ranks against the records.

Won't do "2.5" as a rank value, but duplicates will have the same number if you use:
SELECT x.id,
x.name,
x.points,
(SELECT COUNT(*)
FROM YOUR_TABLE y
WHERE y.points <= x.points) AS rank
FROM YOUR_TABLE x
ORDER BY x.points

Related

How to make sure the sql result is continued range?

I have table like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
2 | 13 | 33
-------------------------------
3 | 15 | 36
-------------------------------
4 | 33 | 50
-------------------------------
5 | 35 | 52
...
-------------------------------
17 | 52 | 80
I want to get result like:
id | low_number | high_number
-------------------------------
1 | 12 | 32
-------------------------------
4 | 33 | 50
-------------------------------
17 | 52 | 80
that is because the low_number bigger than the pervious row high_number.
How to write sql to get these result? I use postgresql
This seems like a recursive CTE problem. You want to choose the first row (by id) and then choose the next row based on that.
The idea is to cycle through the rows, one at a time. Then when the condition is met, transition to that row. And so on.
As a query, this looks like:
with recursive tt as (
select id, low_number, high_number, row_number() over (order by id) as seqnum
from t
),
cte as (
select id, low_number, high_number, seqnum, true as is_change, id as grouping_id
from tt
where seqnum = 1
union all
select tt.id, tt.low_number, tt.high_number, tt.seqnum, tt.low_number > t.high_number,
(case when tt.low_number > t.high_number then tt.id else cte.grouping_id end)
from cte join
t
on cte.grouping_id = t.id join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte
where is_change;
Here is a db<>fiddle.
Use the window function LAG() to get a value of a previous row, e.g.
WITH j AS (
SELECT
id,low_number,high_number,
LAG(high_number) OVER (ORDER BY id) AS prev_high_number
FROM t)
SELECT id,low_number,high_number FROM j
WHERE low_number > prev_high_number OR prev_high_number IS NULL;
Demo: db<>fiddle

SQL Server - Insert lines with null values when month doesn't exist

I have a table like this one:
Yr | Mnth | W_ID | X_ID | Y_ID | Z_ID | Purchases | Sales | Returns |
2015 | 10 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2015 | 12 | 1 | 5210 | 1402 | 2 | 12000.00 | etc | etc |
2016 | 1 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2016 | 3 | 1 | 5210 | 1402 | 2 | etc | etc | etc |
2014 | 3 | 9 | 880 | 2 | 7 | etc | etc | etc |
2014 | 12 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 5 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 7 | 9 | 880 | 2 | 7 | etc | etc | etc |
For each combination of (W, X, Y, Z) I would like to insert the months that don't appear in the table and are between the first and last month.
In this example, for combination (W=1, X=5210, Y=1402, Z=2), I would like to have additional rows for 2015/11 and 2016/02, where Purchases, Sales and Returns are NULL. For combination (W=9, X=880, Y=2, Z=7) I would like to have additional rows for months between 2014/4 and 2014/11, 2015/01 and 2015/04, 2016/06.
I hope I have explained myself correctly.
Thank you in advance for any help you can provide.
The process is rather cumbersome in this case, but quite possible. One method uses a recursive CTE. Another uses a numbers table. I'm going to use the latter.
The idea is:
Find the minimum and maximum values for the year/month combination for each set of ids. For this, the values will be turned into months since time 0 using the formula year*12 + month.
Generate a bunch of numbers.
Generate all rows between the two values for each combination of ids.
For each generated row, use arithmetic to re-extract the year and month.
Use left join to bring in the original data.
The query looks like:
with n as (
select row_number() over (order by (select null)) - 1 as n -- start at 0
from master.spt_values
),
minmax as (
select w_id, x_id, y_id, z_id, min(yr*12 + mnth) as minyyyymm,
max(yr*12 + mnth) as maxyyyymm
from t
group by w_id, x_id, y_id, z_id
),
wxyz as (
select minmax.*, minmax.minyyyymm + n.n,
(minmax.minyyyymm + n.n) / 12 as yyyy,
((minmax.minyyyymm + n.n) % 12) + 1 as mm
from minmax join
n
on minmax.minyyyymm + n.n <= minmax.maxyyyymm
)
select wxyz.yyyy, wxyz.mm, wxyz.w_id, wxyz.x_id, wxyz.y_id, wxyz.z_id,
<columns from t here>
from wxyz left join
t
on wxyz.w_id = t.w_id and wxyz.x_id = t.x_id and wxyz.y_id = t.y_id and
wxyz.z_id = t.z_id and wxyz.yyyy = t.yr and wxyz.mm = t.mnth;
Thank you for your help.
Your solution works, but I noticed it is not very good in terms of performance, but meanwhile I have managed to get a solution for my problem.
DECLARE #start_date DATE, #end_date DATE;
SET #start_date = (SELECT MIN(EOMONTH(DATEFROMPARTS(Yr , Mnth, 1))) FROM Table_Input);
SET #end_date = (SELECT MAX(EOMONTH(DATEFROMPARTS(Yr , Mnth, 1))) FROM Table_Input);
DECLARE #tdates TABLE (Period DATE, Yr INT, Mnth INT);
WHILE #start_date <= #end_date
BEGIN
INSERT INTO #tdates(PEriod, Yr, Mnth) VALUES(#start_date, YEAR(#start_date), MONTH(#start_date));
SET #start_date = EOMONTH(DATEADD(mm,1,DATEFROMPARTS(YEAR(#start_date), MONTH(#start_date), 1)));
END
DECLARE #pks TABLE (W_ID NVARCHAR(50), X_ID NVARCHAR(50)
, Y_ID NVARCHAR(50), Z_ID NVARCHAR(50)
, PerMin DATE, PerMax DATE);
INSERT INTO #pks (W_ID, X_ID, Y_ID, Z_ID, PerMin, PerMax)
SELECT W_ID, X_ID, Y_ID, Z_ID
, MIN(EOMONTH(DATEFROMPARTS(Ano, Mes, 1))) AS PerMin
, MAX(EOMONTH(DATEFROMPARTS(Ano, Mes, 1))) AS PerMax
FROM Table1
GROUP BY W_ID, X_ID, Y_ID, Z_ID;
INSERT INTO Table_Output(W_ID, X_ID, Y_ID, Z_ID
, ComprasLiquidas, RTV, DevManuais, ComprasBrutas, Vendas, Stock, ReceitasComerciais)
SELECT TP.DB, TP.Ano, TP.Mes, TP.Supplier_Code, TP.Depart_Code, TP.BizUnit_Code
, TA.ComprasLiquidas, TA.RTV, TA.DevManuais, TA.ComprasBrutas, TA.Vendas, TA.Stock, TA.ReceitasComerciais
FROM
(
SELECT W_ID, X_ID, Y_ID, Z_ID
FROM #tdatas CROSS JOIN #pks
WHERE Period BETWEEN PerMin And PerMax
) AS TP
LEFT JOIN Table_Input AS TA
ON TP.W_ID = TA.W_ID AND TP.X_ID = TA.X_ID AND TP.Y_ID = TA.Y_ID
AND TP.Z_ID = TA.Z_ID
AND TP.Yr = TA.Yr
AND TP.Mnth = TA.Mnth
ORDER BY TP.W_ID, TP.X_ID, TP.Y_ID, TP.Z_ID, TP.Yr, TP.Mnth;
I do the following:
Get the Min and Max date of the entire table - #start_date and #end_date variables;
Create an auxiliary table with all dates between Min and Max - #tdates table;
Get all the combinations of (W_ID, X_ID, Y_ID, Z_ID) along with the min and max dates of that combination - #pks table;
Create the cartesian product between #tdates and #pks, and in the WHERE clause I filter the results between the Min and Max of the combination;
Compute a LEFT JOIN of the cartesian product table with the input data table.

Return every 200:th rows, plus last row

I want to return every 200:th row and also the last row. If less than 200 rows, just return the last one.
If my table has e.g. 333 rows, return row 200 and row 333.
I try to used mod(SQN,200)='0', but the No.333 record will be missing.
Thank your help.
There is a function in oracle to know the rowid rownum. Specially if SQN isnt consecutive.
I create a cte to add the newID
SQL Fiddle Demo
WITH maxN as (
Select count(*) as N
from Employee
),
rowmod as (
SELECT "Name", "id", rownum as newID
FROM Employee
)
SELECT *
FROM rowmod, maxN
WHERE mod(newID, 21) = 0
or newID = N
OUTPUT
| Name | id | NEWID | N |
|----------|----|-------|-----|
| Jocelyn | 89 | 21 | 100 |
| Quail | 8 | 42 | 100 |
| Theodore | 28 | 63 | 100 |
| Shelley | 90 | 84 | 100 |
| Malik | 49 | 100 | 100 |
If you want "every row for which SQN is an integer multiple of 200, and the row having the maximum value of SQN", then:
select *
from (select *,
max(sqn) over () max_sqn
from table)
where mod(sqn,200) = 0
or sqn = max_sqn
If SQN was not a sequential integer, but you wanted every 200th row etc when order by sqn, then:
select *
from (select *,
row_number() over (order by sqn) rownumber,
count(*) over () row_count
from table)
where mod(rownumber,200) = 0
or rownumber = row_count
This is a very hard problem -- you need to use a logical OR.
SELECT *
FROM (
SELECT *, MAX(SQN) as SQNmax
FROM TABLE
)
WHERE MOD(SQN,200) = 0 OR SQN = SQNmax
If the max(sqn) does not work you can do this:
SELECT *
FROM TABLE
JOIN (SELECT MAX(SQN) as SQNmax FROM TABLE) sub
WHERE MOD(SQN,200) = 0 OR SQN = sub.SQNmax

How can I calculate the remaining amount per row?

I have a table that I want to find for each row id the amount remaining from the total. However, the order of amounts is in an ascending order.
id amount
1 3
2 2
3 1
4 5
The results should look like this:
id remainder
1 10
2 8
3 5
4 0
Any thoughts on how to accomplish this? I'm guessing that the over clause is the way to go, but I can't quite piece it together.Thanks.
Since you didn't specify your RDBMS, I will just assume it's Postgresql ;-)
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl;
Output:
| ID | AMOUNT | REMAINDER |
---------------------------
| 3 | 1 | 10 |
| 2 | 2 | 8 |
| 1 | 3 | 5 |
| 4 | 5 | 0 |
How it works: http://www.sqlfiddle.com/#!1/c446a/5
It works in SQL Server 2012 too: http://www.sqlfiddle.com/#!6/c446a/1
Thinking of solution for SQL Server 2008...
Btw, is your ID just a mere row number? If it is, just do this:
select
row_number() over(order by amount) as rn
, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl
order by rn;
Output:
| RN | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
But if you really need the ID intact and move the smallest amount on top, do this:
with a as
(
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder,
row_number() over(order by id) as id_sort,
row_number() over(order by amount) as amount_sort
from tbl
)
select a.id, sort.remainder
from a
join a sort on sort.amount_sort = a.id_sort
order by a.id_sort;
Output:
| ID | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
See query progression here: http://www.sqlfiddle.com/#!6/c446a/11
I just want to offer a simpler way to do this in descending order:
select id, sum(amount) over (order by id desc) as Remainder
from t
This will work in Oracle, SQL Server 2012, and Postgres.
The general solution requres a self join:
select t.id, coalesce(sum(tafter.amount), 0) as Remainder
from t left outer join
t tafter
on t.id < tafter.id
group by t.id
SQL Server 2008 answer, I can't provide an SQL Fiddle, it seems it strips the begin keyword, resulting to syntax errors. I tested this on my machine though:
create function RunningTotalGuarded()
returns #ReturnTable table(
Id int,
Amount int not null,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
as
begin
insert into #ReturnTable(id, amount, RunningTotal)
select id, amount, 0 from tbl order by amount;
declare #RunningTotal numeric(16,4) = 0;
declare #rn_check int = 0;
update #ReturnTable
set
#rn_check = #rn_check + 1
,#RunningTotal =
case when rn = #rn_check then
#RunningTotal + Amount
else
1 / 0
end
,RunningTotal = #RunningTotal;
return;
end;
To achieve your desired output:
with a as
(
select *, sum(amount) over() - RunningTotal as remainder
, row_number() over(order by id) as id_order
from RunningTotalGuarded()
)
select a.id, amount_order.remainder
from a
inner join a amount_order on amount_order.rn = a.id_order;
Rationale for guarded running total: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Choose the lesser evil ;-)

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a and b refer to point on a different table. distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.
min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.
I'm using PostgeSQL.
Try this:
SELECT *
FROM (
SELECT a_id, (
SELECT b_id
FROM mytable mib
WHERE mib.a_id = ma.a_id
ORDER BY
dist DESC
LIMIT 1 OFFSET s
) AS b_id
FROM (
SELECT DISTINCT a_id
FROM mytable mia
) ma, generate_series (1, 10) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love postgres, so it took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable
WHERE id IN
(SELECT id FROM mytable WHERE t1.a_id = t2.a_id
ORDER BY distance LIMIT 10);
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1342.2
Does PostgreSQL have the analytic function rank()? If so try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
from mytable
) where rnk <= 10;
This SQL should find you the Nth lowest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS: (note: low performance because of subquery)
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.