Postgres get first and last version for individual vendor

I have a mapping table that stores each vendor's bid amount for an RFQ (request for quotation), together with a version number.
Table :
id   rfq_id(FK)   vendor_id(FK)   amount   version
--------------------------------------------------
1    1            1               100      1
2    1            1               90       2
3    1            1               80       3
4    1            2               50       1
5    1            7               500      1
6    1            7               495      2
7    1            7               500      3
8    1            7               525      4
9    1            7               450      5
10   1            7               430      6
11   2            1               200      1
12   2            2               300      1
13   2            2               350      2
14   2            3               40       1
15   3            4               70       1
From the above table, I want each vendor's first and last bid for a particular rfq_id.
Expected output for rfq_id=1 :
vendor_id   first_bid   last_bid
---------------------------------
1           100         80
2           50          50
7           500         430
From the question "Postgres : get min and max rows count in many to many relation table" I learned about window functions and PARTITION BY, so I tried the query below.
SELECT
vendor_id,
version,
amount,
first_value(amount) over w as first_bid,
last_value(amount) over w as last_bid,
row_number() over w as rn
FROM
rfq_vendor_version_mapping
where
rfq_id=1
WINDOW w AS (PARTITION BY vendor_id order by version)
ORDER by vendor_id;
With the above query, the row with the highest rn for each vendor is the output I want.
http://sqlfiddle.com/#!15/f19a0/7

Window functions add columns to all the existing rows, instead of grouping input rows into a single output row. Since you are only interested in the bid values, use a DISTINCT clause on the fields of interest.
Note that you need a frame clause in the WINDOW definition to make sure that all rows in the partition are considered. By default, when the window has an ORDER BY, the frame (the rows that are used in the calculation) runs from the beginning of the partition to the current row. Therefore, the last_value() window function returns the value of the current row (or of its last peer, if several rows tie on the ORDER BY value); use a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to extend the frame to the entire partition.
SELECT DISTINCT
vendor_id,
first_value(amount) OVER w AS first_bid,
last_value(amount) OVER w AS last_bid
FROM
rfq_vendor_version_mapping
WHERE rfq_id = 1
WINDOW w AS (PARTITION BY vendor_id ORDER BY version
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY vendor_id;
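To see the effect of the frame clause, here is a minimal sketch that runs against an in-memory SQLite database instead of Postgres (an assumption made only so the example is self-contained; SQLite 3.25 or later is needed for window functions). The table name bids is hypothetical and holds just the rfq_id = 1 rows from the question.

```python
import sqlite3

# Hypothetical in-memory copy of the rfq_id = 1 rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bids (vendor_id INTEGER, version INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 90), (1, 3, 80), (2, 1, 50),
     (7, 1, 500), (7, 2, 495), (7, 3, 500), (7, 4, 525), (7, 5, 450), (7, 6, 430)],
)

# Default frame: last_value() only sees rows up to the current one,
# so it simply echoes each row's own amount.
default_frame = conn.execute("""
    SELECT version,
           last_value(amount) OVER (PARTITION BY vendor_id ORDER BY version)
    FROM bids
    WHERE vendor_id = 1
    ORDER BY version
""").fetchall()

# Explicit frame over the whole partition: every row of a vendor carries the
# same first/last pair, and DISTINCT collapses them to one row per vendor.
full_frame = conn.execute("""
    SELECT DISTINCT
           vendor_id,
           first_value(amount) OVER (PARTITION BY vendor_id ORDER BY version
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_bid,
           last_value(amount) OVER (PARTITION BY vendor_id ORDER BY version
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_bid
    FROM bids
    ORDER BY vendor_id
""").fetchall()

print(default_frame)
print(full_frame)
```

With the default frame, vendor 1 gets 100, 90 and 80 back as the "last" value on its three rows; with the explicit frame the result matches the expected output in the question.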

You have to GROUP BY vendor_id because you want just one row per vendor_id:
SELECT
vendor_id,
MAX(CASE WHEN rn = 1 THEN amount END) AS first_bid,
MAX(CASE WHEN rn2 = 1 THEN amount END) AS last_bid
FROM (
SELECT
vendor_id,
version,
amount,
row_number() over (PARTITION BY vendor_id order BY version) as rn,
row_number() over (PARTITION BY vendor_id order BY version DESC) as rn2
FROM
rfq_vendor_version_mapping
WHERE
rfq_id=1) AS t
GROUP BY vendor_id
ORDER by vendor_id;
The query uses conditional aggregation in order to extract amount values that correspond to first and last bid.
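The conditional aggregation can be followed step by step in this minimal sketch, again using an in-memory SQLite database as a stand-in for Postgres (an assumption for the sake of a runnable example). The table name bids is hypothetical and contains only vendors 2 and 7 from rfq_id = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bids (vendor_id INTEGER, version INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?)",
    [(2, 1, 50),
     (7, 1, 500), (7, 2, 495), (7, 3, 500), (7, 4, 525), (7, 5, 450), (7, 6, 430)],
)

# Step 1: number each vendor's rows from both ends.
# rn = 1 marks the first bid, rn2 = 1 marks the last bid.
numbered = conn.execute("""
    SELECT vendor_id, amount,
           row_number() OVER (PARTITION BY vendor_id ORDER BY version) AS rn,
           row_number() OVER (PARTITION BY vendor_id ORDER BY version DESC) AS rn2
    FROM bids
    ORDER BY vendor_id, version
""").fetchall()

# Step 2: the CASE expressions yield NULL for every row except the marked one,
# and MAX ignores NULLs, so each aggregate picks exactly one amount per vendor.
first_last = conn.execute("""
    SELECT vendor_id,
           MAX(CASE WHEN rn = 1 THEN amount END) AS first_bid,
           MAX(CASE WHEN rn2 = 1 THEN amount END) AS last_bid
    FROM (
        SELECT vendor_id, amount,
               row_number() OVER (PARTITION BY vendor_id ORDER BY version) AS rn,
               row_number() OVER (PARTITION BY vendor_id ORDER BY version DESC) AS rn2
        FROM bids
    ) AS t
    GROUP BY vendor_id
    ORDER BY vendor_id
""").fetchall()

print(numbered)
print(first_last)
```

A vendor with a single bid, such as vendor 2, has rn = 1 and rn2 = 1 on the same row, so first_bid and last_bid are equal.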

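Without ORDER BY, OLAP functions default to a frame that covers the whole partition (equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), but with ORDER BY the default changes to RANGE UNBOUNDED PRECEDING, i.e. from the start of the partition up to the current row.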
You were quite close, but you need two different windows:
select vendor_id, amount as first_bid, last_bid
from
(
SELECT
vendor_id,
version,
amount,
last_value(amount) -- highest version's bid
over (PARTITION BY vendor_id
order by version
rows between unbounded preceding and unbounded following) as last_bid,
row_number()
over (PARTITION BY vendor_id
order by version) as rn
FROM
rfq_vendor_version_mapping
where
rfq_id=1
) as dt
where rn = 1 -- row with first version/bid
ORDER by vendor_id;

Related

SQL subquery with comparison

On a Rails (5.2) app with PostgreSQL I have 2 tables: Item and ItemPrice where an item has many item_prices.
Table Item
id   name
1    poetry book
2    programming book
Table ItemPrice
id   item_id   price
1    1         4
2    2         20
3    1         8
4    1         6
5    2         22
I am trying to select all the items whose last price (the price of the most recent item_price attached to the item) is smaller than the one before it.
So in this example, my request should return only item 1, because 6 < 8, and not item 2, because 22 > 20.
I tried various combinations of Active Record and SQL subqueries to compare the last price with the second-to-last price, but have failed so far.
For example: Item.all.joins(:item_prices).where('EXISTS(SELECT price FROM item_prices ORDER BY ID DESC LIMIT 1 as last_price WHERE (SELECT price FROM item_prices ... but I can't work it out.

You can do it as follows using ROW_NUMBER and LAG:
ROW_NUMBER (ordered by id descending) marks the latest price row of each item, and LAG fetches the price from the previous row of the same item.
WITH ranked_items AS (
SELECT m.*,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY id DESC) AS rn,
LAG(price,1) OVER (PARTITION BY item_id ORDER BY id ) previous_price
FROM ItemPrice AS m
)
SELECT it.*
FROM ranked_items itp
inner join Item it on it.id = itp.item_id
WHERE rn = 1 and price < previous_price
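As a quick check of the query above, the following sketch loads the sample rows into an in-memory SQLite database (a stand-in for PostgreSQL, assumed here only to keep the example self-contained) and runs essentially the same statement. The table names Item and ItemPrice follow the question; in a real Rails schema they would more likely be items and item_prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE ItemPrice (id INTEGER PRIMARY KEY, item_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO Item VALUES (?, ?)",
                 [(1, "poetry book"), (2, "programming book")])
conn.executemany("INSERT INTO ItemPrice VALUES (?, ?, ?)",
                 [(1, 1, 4), (2, 2, 20), (3, 1, 8), (4, 1, 6), (5, 2, 22)])

# rn = 1 is the newest price row per item; previous_price is the price of the
# row just before it (NULL for an item's very first price).
items_with_price_drop = conn.execute("""
    WITH ranked_items AS (
        SELECT m.*,
               ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY id DESC) AS rn,
               LAG(price, 1) OVER (PARTITION BY item_id ORDER BY id) AS previous_price
        FROM ItemPrice AS m
    )
    SELECT it.id, it.name
    FROM ranked_items itp
    INNER JOIN Item it ON it.id = itp.item_id
    WHERE itp.rn = 1 AND itp.price < itp.previous_price
""").fetchall()

print(items_with_price_drop)
```

An item with only one price has a NULL previous_price, so the comparison is not true and the item is left out.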

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need a SQL select query (for MS SQL Server) that returns a single ID for each group: the one with the lowest rank. Each group must return only one ID, even if two rows share the same lowest rank (as IDs 1 and 3 do in the above table). I've tried selecting the min value, but the requirement that only one row be returned, and that the value returned be the ID column, is throwing me.
Does anyone know how to do this?

Use row_number():
select t.*
from (select t.*,
row_number() over (partition by groupid order by rank) as seqnum
from t
) t
where seqnum = 1;
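Because two rows in a group can share the lowest rank, ordering by rank alone leaves the choice between them to the engine. The sketch below adds id as a second sort key so that the lowest ID wins a tie; that tie-break rule is an assumption, since the question only asks for a single ID. It runs on an in-memory SQLite database rather than SQL Server, with a hypothetical table t whose rank column is named rnk to sidestep any keyword clash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, group_id INTEGER, rnk INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 10), (5, 2, 1), (6, 3, 1), (7, 4, 5)],
)

# Number the rows of each group by rank, breaking ties by the lower id,
# then keep only the first row of each group.
best_per_group = conn.execute("""
    SELECT group_id, id
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY group_id ORDER BY rnk, id) AS seqnum
        FROM t
    ) AS numbered
    WHERE seqnum = 1
    ORDER BY group_id
""").fetchall()

print(best_per_group)
```

Group 1 contains IDs 1 and 3 with the same rank, and the secondary sort key makes ID 1 the one returned every time.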

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.

One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
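The two queries above can be tried on the sample data with this minimal sketch. It uses an in-memory SQLite database in place of Hive (an assumption made so the example is self-contained), and the table name tbl follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1)],
)

# Step 1: the difference of the two row numbers stays constant within a run
# of equal values and changes whenever a different value interrupts the run.
groups = conn.execute("""
    SELECT id, value,
           row_number() OVER (ORDER BY id)
             - row_number() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
    FROM tbl
    ORDER BY id
""").fetchall()

# Step 2: number the rows inside each (value, rnum_diff) group.
positions = conn.execute("""
    SELECT id, value,
           row_number() OVER (PARTITION BY value, rnum_diff ORDER BY id) AS pos_in_grp
    FROM (
        SELECT t.*,
               row_number() OVER (ORDER BY id)
                 - row_number() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
        FROM tbl t
    ) AS t
    ORDER BY id
""").fetchall()

print(groups)
print(positions)
```

Ids 1 and 2 get a difference of 0 while ids 5 to 7 get a difference of 2, which is what keeps the two runs of value 1 apart even though they share the same value.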

How to get max of the rows above in each group(Sql or SAS)

Data:
ID step order
100 1 1
100 2 2
100 3 3
100 1 4
200 2 5
200 3 6
200 1 7
Desired Result( I want to get the max of the rows above in each group)
ID step max_step
100 1 1
100 2 2
100 3 3
100 1 3
200 2 2
200 3 3
200 1 3
Thank you very much!:)

If your database supports windowed aggregates, then:
SELECT id,
step,
Max(step) OVER( partition BY ID ORDER BY "order") as max_step
From yourtable
If you want the max step from the rows above irrespective of ID, then remove the PARTITION BY:
SELECT id,
step,
Max(step) OVER(ORDER BY "order") as max_step
From yourtable
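Both variants can be compared on the sample data with the sketch below. It runs on an in-memory SQLite database (an assumption, since the question does not name a database), and the ordering column is called ord here, a hypothetical name chosen to avoid quoting the reserved word order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, step INTEGER, ord INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(100, 1, 1), (100, 2, 2), (100, 3, 3), (100, 1, 4),
     (200, 2, 5), (200, 3, 6), (200, 1, 7)],
)

# Running maximum that restarts for every ID.
per_id = conn.execute("""
    SELECT id, step,
           max(step) OVER (PARTITION BY id ORDER BY ord) AS max_step
    FROM yourtable
    ORDER BY ord
""").fetchall()

# Running maximum over the whole table, ignoring ID.
overall = conn.execute("""
    SELECT id, step,
           max(step) OVER (ORDER BY ord) AS max_step
    FROM yourtable
    ORDER BY ord
""").fetchall()

print([row[2] for row in per_id])
print([row[2] for row in overall])
```

The two results differ on the first row of ID 200: with the partition the running maximum restarts at 2, while without it the value 3 carries over from ID 100.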

If you want to have some idea of row order, then SAS is going to be the easier answer here.
data want;
set have;
by ID;
retain max_step;
if first.id then call missing(max_step);
max_step = max(step,max_step);
run;

You need a Cumulative Max:
max(step)
over(partition by id
order by ordercol
rows unbounded preceding)
Teradata doesn't follow Standard SQL, which defaults to RANGE UNBOUNDED PRECEDING, so you need to add the frame explicitly (which is recommended anyway).
Only last_value defaults to cumulative:
last_value(step)
over(partition by id
order by ordercol)
Of course you could add rows unbounded preceding, too.

How to find the SQL medians for a grouping

I am working with SQL Server 2008.
If I have a table such as:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227

The solution using rank works nicely when each group has an odd number of members, i.e. when the median exists within the sample. When a group has an even number of members, the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
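The NTILE approach can be checked against the question's data with this minimal sketch. It runs on an in-memory SQLite database rather than SQL Server (an assumption for the sake of a runnable example), so the T-SQL column aliases of the form [half1] = ... are rewritten with AS, and Python's statistics.median is used as an independent cross-check.

```python
import sqlite3
from statistics import median

rows = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Code INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)", rows)

# Split each group into two halves from both ends; the median is the average
# of the largest value in the lower half and the smallest in the upper half.
sql_medians = dict(conn.execute("""
    WITH CTE AS (
        SELECT Code, Value,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value) AS half1,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value DESC) AS half2
        FROM T
        WHERE Value IS NOT NULL
    )
    SELECT Code,
           (MAX(CASE WHEN half1 = 1 THEN Value END) +
            MIN(CASE WHEN half2 = 1 THEN Value END)) / 2.0 AS Median
    FROM CTE
    GROUP BY Code
""").fetchall())

# Independent cross-check in plain Python, skipping NULLs the same way.
codes = {code for code, _ in rows}
py_medians = {
    code: median(v for c, v in rows if c == code and v is not None)
    for code in codes
}

print(sql_medians)
print(py_medians)
```

For a group with an odd number of rows, both halves contain the middle row, so the average of the two picked values is the middle value itself.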
In SQL Server 2012 and later you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;

SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.
