Top Dense_Rank row based on other fields - sql

I have several tables tied together in sql that I am trying to display only the MAX number from a column formulated using DENSE RANK but I need to keep in mind 2 other fields when pulling the TOP row.
Here is a sample of my result:
| sa_id | price | threshold | role_id | rk
1 | 37E41 | 40.00 | NULL | A38D67A | 1
2 | 37E41 | 40.00 | NULL | 46B9D4E | 1
3 | 1CFC1 | 40.00 | NULL | 58C1E03 | 1
4 | BF0D3 | 40.00 | NULL | 28D465B | 1
5 | F914B | 40.00 | NULL | 2920EBD | 1
6 | F3CA1 | 40.00 | NULL | D5E7584 | 1
7 | 0D8C1 | 40.00 | NULL | EECDB5A | 1
8 | A6503 | 40.00 | NULL | B680CB4 | 1
9 | 9BB96 | 40.00 | 0.01 | D66E612 | 1
10 | 9BB96 | 40.00 | 20.03 | D66E612 | 2
11 | 9BB96 | 40.00 | 40.03 | D66E612 | 3
12 | 9BB96 | 40.00 | 60.03 | D66E612 | 4
13 | 9BB96 | 40.00 | 80.03 | D66E612 | 5
What I am hoping to accomplish is to display all columns in this screenshot using the highest value for rk (calculated using DENSE RANK) where price > threshold and the sa_id & role_id are unique.
In this case I would want to display the following rows only: 1, 2, 3, 4, 5, 6, 7, 8, 10
Is this possible?
SELECT
servicerate_audit_id as sa_id
,ticket_price as price
,threshold_threshold/100.00 as threshold
,charge_role.chargerole_id as role_id
,DENSE_RANK() OVER(
PARTITION BY threshold_audit_id
ORDER BY
ISNULL(threshold_threshold,9999999),
threshold_threshold
) as rk
FROM sts_service_charge_rate
INNER JOIN ts_threshold
ON threshold_id = servicerate_threshold_id
INNER JOIN ts_charge_role as charge_role
ON chargerole_id = servicerate_charge_role_id

If you can modify your original query:
SELECT *
FROM (
SELECT
servicerate_audit_id as sa_id
,ticket_price as price
,threshold_threshold/100.00 as threshold
,charge_role.chargerole_id as role_id
,DENSE_RANK() OVER(
PARTITION BY threshold_audit_id
ORDER BY
ISNULL(threshold_threshold,9999999),
threshold_threshold
) as rk
,DENSE_RANK() OVER(
ORDER BY
ISNULL(threshold_threshold,9999999) DESC,
threshold_threshold DESC
) as rk_inverse
FROM sts_service_charge_rate
INNER JOIN ts_threshold
ON threshold_id = servicerate_threshold_id
INNER JOIN ts_charge_role as charge_role
ON chargerole_id = servicerate_charge_role_id
) t
WHERE price > COALESCE(threshold, 0)
AND t.rk_inverse = 1
Observe I just added an inverse calculation of your ranking and filtered for the top rk_inverse per partition. I'm assuming that the PARTITION BY threshold_audit_id and your requirement of having unique (sa_id, role_id) tuples are functionally dependent. Otherwise, your rk_inverse calculation would need to take into consideration a different PARTITION BY clause.
If you cannot modify your original query:
You can calculate another window function that orders your rk values descendingly (highest first) per your partition (sa_id, role_id), and then take only the top one per partition:
SELECT sa_id, price, threshold, role_id, rk
FROM (
SELECT result.*, row_number() OVER (PARTITION BY sa_id, role_id ORDER BY rk DESC) rn
FROM (... original query ...)
WHERE price > COALESCE(threshold, 0)
) t
WHERE rn = 1

Related

Select pairs of values based on condition in other column - PostgreSQL

I've been trying to solve an issue for the past couple of days, but couldn't figure out what the solution would be...
I have a table as the following:
+--------+-----------+-------+
| ShopID | ArticleID | Price |
+--------+-----------+-------+
| 1 | 3 | 150 |
| 1 | 2 | 80 |
| 3 | 3 | 100 |
| 4 | 2 | 95 |
+--------+-----------+-------+
And I woud like to select pairs of shop IDs for which the price of the same article is higher.
F.e. this should look like:
+----------+----------+---------+
| ShopID_1 | ShopID_2 |ArticleID|
+----------+----------+---------+
| 4 | 1 | 2 |
| 1 | 3 | 3 |
+----------+----------+---------+
... showing that Article 2 ist more expensive in ShopID 4 than in ShopID 2. Etc
My code so far looks as following:
SELECT ShopID AS ShopID_1, ShopID AS ShopID_2, ArticleID FROM table
WHERE table.ArticleID=table.ArticleID and table.Price > table.Price
But it doesn't give the result I am searching for.
Can anyone help me with this objective? Thank you very much.
The problem here is about calculating Top N items per Group.
Assuming you have the following data, in table sales.
# select * from sales;
shopid | articleid | price
--------+-----------+-------
1 | 2 | 80
3 | 3 | 100
4 | 2 | 95
1 | 3 | 150
5 | 3 | 50
With the following query we can create a partition for each ArticleId
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank from sales;
This will result:
articleid | shopid | price | price_rank
-----------+--------+-------+------------
2 | 4 | 95 | 1
2 | 1 | 80 | 2
3 | 1 | 150 | 1
3 | 3 | 100 | 2
3 | 5 | 50 | 3
Then we simply select Top 2 items for each AritcleId:
select
ArticleID,
ShopID,
Price
from (
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank
from sales) sales_rank
where Price_Rank <= 2;
which will result:
articleid | shopid | price
-----------+--------+-------
2 | 4 | 95
2 | 1 | 80
3 | 1 | 150
3 | 3 | 100
Finally, we can use crosstab function to get the expected pivot view.
select *
from crosstab(
'select
ArticleID,
ShopID,
ShopID
from (
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank
from sales) sales_rank
where Price_Rank <= 2')
AS sales_top_2("ArticleID" INT, "ShopID_1" INT, "ShopID_2" INT);
And the result:
ArticleID | ShopID_1 | ShopID_2
-----------+----------+----------
2 | 4 | 1
3 | 1 | 3
Note:
You may need to call CREATE EXTENSION tablefunc; in case if you get the error function crosstab(unknown) does not exist.
This query should work:
SELECT t1.ShopID AS ShopID_1, t2.ShopID AS ShopID_2, t1.ArticleID
FROM <yourtable> t1 JOIN
<yourtable> t2
ON t1.ArticleID = t2.ArticleID AND t1.Price > t2.Price;
That is, you need a self-join and appropriate table aliases.

eSQL multiple join but with conditions

I've 3 tables as under
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the left joined table from the last two tables with the first one such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of merch_num and merch_div
the code I came up with is as under, but I'm not sure how do I eliminate the duplicate row just for the awd
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', ‘INDIA’)
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
You have to use 'DISTINCT' keyword to get unique values, but if your 'Priority table' & 'Order table' contains different values for Same MERCH_NUM then the final result contains the repetation of the 'MERCH_NUM'.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM
A way around can be to add one new Row_Number() in the outermost query having Partition by MERCH_SUB_DIV + all the columns in the final list and then filter final results based on the New Row_Number() . Follows a pseudo code that might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
Sample code here .. Hope this helps!!

In Redshift, how do I run the opposite of a SUM function

Assuming I have a data table
date | user_id | user_last_name | order_id | is_new_session
------------+------------+----------------+-----------+---------------
2014-09-01 | A | B | 1 | t
2014-09-01 | A | B | 5 | f
2014-09-02 | A | B | 8 | t
2014-09-01 | B | B | 2 | t
2014-09-02 | B | test | 3 | t
2014-09-03 | B | test | 4 | t
2014-09-04 | B | test | 6 | t
2014-09-04 | B | test | 7 | f
2014-09-05 | B | test | 9 | t
2014-09-05 | B | test | 10 | f
I want to get another column in Redshift which basically assigns session numbers to each users session. It starts at 1 for the first record for each user and as you move further down, if it encounters a true in the "is_new_session" column, it increments. Stays the same if it encounters a false. If it hits a new user, the value resets to 1. The ideal output for this table would be:
1
1
2
1
2
3
4
4
5
5
In my mind it's kind of the opposite of a SUM(1) over (Partition BY user_id, is_new_session ORDER BY user_id, date ASC)
Any ideas?
Thanks!
I think you want an incremental sum:
select t.*,
sum(case when is_new_session then 1 else 0 end) over (partition by user_id order by date) as session_number
from t;
In Redshift, you might need the windowing clause:
select t.*,
sum(case when is_new_session then 1 else 0 end) over
(partition by user_id
order by date
rows between unbounded preceding and current row
) as session_number
from t;

Select latest values for group of related records

I have a table that accommodates data that is logically groupable by multiple properties (foreign key for example). Data is sequential over continuous time interval; i.e. it is a time series data. What I am trying to achieve is to select only latest values for each group of groups.
Here is example data:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 1 | 01.01.2016 | 1 |
| A | 2 | 02.01.2016 | 1 |
| A | 3 | 03.01.2016 | 1 |
| A | 4 | 01.01.2016 | 2 |
| A | 5 | 02.01.2016 | 2 |
| A | 6 | 03.01.2016 | 2 |
| B | 1 | 01.01.2016 | 1 |
| B | 2 | 02.01.2016 | 1 |
| B | 3 | 03.01.2016 | 1 |
| B | 4 | 01.01.2016 | 2 |
| B | 5 | 02.01.2016 | 2 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
And here is example of desired output:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 3 | 03.01.2016 | 1 |
| A | 6 | 03.01.2016 | 2 |
| B | 3 | 03.01.2016 | 1 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
To put this in perspective — for every related object I want to select each code with latest date.
Here is a select I came with. I've used ROW_NUMBER OVER (PARTITION BY...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit, x.value, x.date, x.ticker, x.name
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
companies.ticker, companies.name
FROM fundamentals t
INNER JOIN companies on companies.id = t.company_id
WHERE companies.sic_id = 89
) x
INNER JOIN indicators on indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies where sic_id = 89)
It works but the problem is that it is painfully slow; when working with about 5% of production data which equals to roughly 3 million fundamentals records this select take about 10 seconds to finish. My guess is that happens due to subselect selecting huge amounts of records first.
Is there any way to speed this query up or am I digging in wrong direction trying to do it the way I do?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
So your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except for date? Assuming you don't have multiple most recent dates, something like this should work. Basically don't use the window function, use a proper group by, and your engine should optimize the query better.
SELECT mytable.code,
mytable.value,
mytable.date,
mytable.relation_id
FROM mytable
JOIN (
SELECT code,
max(date) as date,
relation_id
FROM mytable
GROUP BY code, relation_id
) Q1
ON Q1.code = mytable.code
AND Q1.date = mytable.date
AND Q1.relation_id = mytable.relation_id
Other option:
SELECT DISTINCT Code,
Relation_ID,
FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This will return top value for what ever you partition by, and for whatever you order by.
I believe we can try something like this
SELECT CODE,Relation_ID,Date,MAX(value)value FROM mytable
GROUP BY CODE,Relation_ID,Date

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to make dynamic group of lines (of product by TYPE & COLOR in fact)
I don't know if it's possible just with one select query.
But : I want to create group of lines (A PRODUCT is a TYPE and a COLOR) as per the number_per_group column and I want to do this grouping depending on the date order (Order By DATE)
A single product with a NB_PER_GROUP number 2 is exclude from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.