Selecting the topmost row in BigQuery based on conditions - sql

I have a huge table in which one product ID sometimes has multiple specifications. I want to select the newest one, but unfortunately I don't have any date information. Please consider this example dataset:
Row  ID  Type  Sn    Sn_Ind
1    3   SLN   SL20  20
2    1   SL    SL    0
3    2   SL    SL    0
4    1   M     SL21  10
5    3   M     SL21  10
6    1   SLN   SL20  20
I used the query below to group the products and give them row numbers:
with cleanedMasterData as(
SELECT *
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Sn DESC, Sn_Ind DESC) AS rn
FROM `project.dataset.table`
)
-- where rn = 1
)
select * from cleanedMasterData
Please find below the example table after cleaning
Row  ID  Type  Sn    Sn_Ind  rn
1    1   SL    SL    0       1
2    1   M     SL21  10      2
3    1   SLN   SL20  20      3
4    2   SL    SL    0       1
5    3   M     SL21  10      1
6    3   SLN   SL20  20      2
For IDs 2 and 3, I can simply select the top row with where rn = 1,
but for ID 1 my preferred row would be row 2, because that is the newest.
My question is: how do I prioritise a value in a column so that I get the desired result, like this:
Row  ID  Type  Sn    Sn_Ind  rn
1    1   M     SL21  10      1
2    2   SL    SL    0       1
3    3   M     SL21  10      1
The values in the Sn column are fixed (for example SL, SL20, SL19, SL21). Could I somehow assign a weight to each of these values, create a temporary column holding the weight, and sort on it?
Thank you for your support in advance!!

Consider below
SELECT *
FROM `project.dataset.table`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY IF(Sn = 'SL', 0, 1) DESC, Sn DESC) = 1
If applied to the sample data in your question, this returns one row per ID, matching the desired result shown there.
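For anyone who wants to try the prioritisation without a BigQuery project, here is a minimal sketch that runs the same idea on SQLite through Python's sqlite3 module. The table name products and the lowercase column names are assumptions for illustration. SQLite has no QUALIFY, so the row number is filtered in an outer query, and CASE stands in for IF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the rows are the sample data from the question.
conn.execute("CREATE TABLE products (id INTEGER, type TEXT, sn TEXT, sn_ind INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (3, "SLN", "SL20", 20),
        (1, "SL", "SL", 0),
        (2, "SL", "SL", 0),
        (1, "M", "SL21", 10),
        (3, "M", "SL21", 10),
        (1, "SLN", "SL20", 20),
    ],
)

# The CASE expression plays the role of IF(Sn = 'SL', 0, 1): plain 'SL' gets
# the lowest priority, and the remaining values are compared as strings.
query = """
SELECT id, type, sn, sn_ind
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE WHEN sn = 'SL' THEN 0 ELSE 1 END DESC, sn DESC
           ) AS rn
    FROM products
)
WHERE rn = 1
ORDER BY id
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

Note that Sn DESC is a string comparison, so it only ranks the versions correctly while the numeric suffixes have the same number of digits (SL19, SL20, SL21).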

It wasn't difficult; I tried a few things and it worked out. If anyone can optimize the solution below further, that would be awesome.
First, the dataset:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 'SLN' Type, 'SL20' Sn, 20 Sn_Ind UNION ALL
SELECT 1 , 'SL' , 'SL' , 0 UNION ALL
SELECT 2 , 'SL' , 'SL' , 0 UNION ALL
SELECT 1 , 'M' , 'SL21' , 10 UNION ALL
SELECT 3 , 'M' , 'SL21' , 10 UNION ALL
SELECT 1 , 'SLN' , 'SL20' , 20
), weightage as(
SELECT
*,
MAX(CASE Sn WHEN 'SL' THEN 0 ELSE 1 END) OVER (PARTITION BY Sn) AS weightt,
FROM
`project.dataset.table`
ORDER BY
weightt DESC, Sn DESC
), main as (
select * EXCEPT(rn, weightt)
from (
select * ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY weightt DESC, Sn DESC) AS rn
from weightage )
where rn = 1
)
select * from main
After this, I get the desired result:
Row  ID  Type  Sn    Sn_Ind
1    1   M     SL21  10
2    2   SL    SL    0
3    3   M     SL21  10
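On the optimization question: inside a PARTITION BY Sn window every row has the same Sn, so the MAX(...) OVER (...) wrapper should return exactly what the bare CASE returns. A small sketch to check that claim, run on SQLite via Python's sqlite3 rather than BigQuery (the table name products is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the rows are the sample data from the question.
conn.execute("CREATE TABLE products (id INTEGER, type TEXT, sn TEXT, sn_ind INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (3, "SLN", "SL20", 20),
        (1, "SL", "SL", 0),
        (2, "SL", "SL", 0),
        (1, "M", "SL21", 10),
        (3, "M", "SL21", 10),
        (1, "SLN", "SL20", 20),
    ],
)

# Weight computed as in the answer: a window MAX over each Sn partition.
windowed = conn.execute("""
    SELECT id, sn,
           MAX(CASE sn WHEN 'SL' THEN 0 ELSE 1 END) OVER (PARTITION BY sn) AS weightt
    FROM products
    ORDER BY id, sn
""").fetchall()

# Weight computed with the bare CASE expression, no window function.
plain = conn.execute("""
    SELECT id, sn,
           CASE sn WHEN 'SL' THEN 0 ELSE 1 END AS weightt
    FROM products
    ORDER BY id, sn
""").fetchall()

print(windowed == plain)
```

If the two lists agree, the weight can be computed inline in the ORDER BY of ROW_NUMBER(), which removes the weightage CTE and its ORDER BY altogether.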

Related

SQL Server - group and number matching contiguous values

I have a list of stock transactions and I am using Over(Partition By) to calculate the running totals (positions) by security. Over time a holding in a particular security can be long, short or flat. I am trying to find an efficient way to extract only the transactions relating to the current position for each security.
I have created a simplified sqlfiddle to show what I have so far. The cte query generates the running total for each security (code_id) and identifies when the holdings are long (L), short (S) or flat (F). What I need is to group and number matching contiguous values of L, S or F for each code_id.
What I have so far is this:
; WITH RunningTotals as
(
SELECT
*,
RunningTotal = sum(qty) OVER (Partition By code_id Order By id)
FROM
TradeData
), LongShortFlat as
(
SELECT
*,
LSF = CASE
WHEN RunningTotal > 0 THEN 'L'
WHEN RunningTotal < 0 THEN 'S'
ELSE 'F'
END
FROM
RunningTotals
)
SELECT
*
FROM
LongShortFlat r
I think what I need to do is create a GroupNum column by applying a row_number for each group of L, S and F within each code_id so the results look like this:
id  code_id  qty  RunningTotal  LSF  GroupNum
1   1        5    5             L    1
2   1        2    7             L    1
3   1        7    14            L    1
4   1        -3   11            L    1
5   1        -5   6             L    1
6   1        -6   0             F    2
7   1        5    5             L    3
8   1        5    10            L    3
9   1        -2   8             L    3
10  1        -4   4             L    3
11  2        5    5             L    1
12  2        3    8             L    1
13  2        -4   4             L    1
14  2        -2   2             L    1
15  2        -2   0             F    2
16  2        6    6             L    3
17  2        -5   1             L    3
18  2        -5   -4            S    4
19  2        2    -2            S    4
20  2        4    2             L    5
21  2        -5   -3            S    6
22  2        -2   -5            S    6
23  3        5    5             L    1
24  3        2    7             L    1
25  3        1    8             L    1
I am struggling to generate the GroupNum column.
Thanks in advance for your help.
[Revised]
Sorry about that, I read your question too quickly. I came up with a solution using a recursive common table expression (below), then saw that you've worked out a solution using LAG. I'll post my revised query anyway, for posterity. Either way, the resulting query is (imho) pretty ugly.
;WITH cteBaseAgg
as (
-- Build the "sum increases over time" data
SELECT
row_number() over (partition by td.code_id order by td.code_id, td.Id) RecurseKey
,td.code_id
,td.id
,td.qty
,sum(tdPrior.qty) RunningTotal
,case
when sum(tdPrior.qty) > 0 then 'L'
when sum(tdPrior.qty) < 0 then 'S'
else 'F'
end LSF
from dbo.TradeData td
inner join dbo.TradeData tdPrior
on tdPrior.code_id = td.code_id -- All for this code_id
and tdPrior.id <= td.Id -- For this and any prior Ids
group by
td.code_id
,td.id
,td.qty
)
,cteRecurse
as (
-- "Set" the first row for each code_id
SELECT
RecurseKey
,code_id
,id
,qty
,RunningTotal
,LSF
,1 GroupNum
from cteBaseAgg
where RecurseKey = 1
-- For each successive row in each set, check whether GroupNum needs to be incremented
UNION ALL SELECT
agg.RecurseKey
,agg.code_id
,agg.id
,agg.qty
,agg.RunningTotal
,agg.LSF
,rec.GroupNum + case when rec.LSF = agg.LSF then 0 else 1 end
from cteBaseAgg agg
inner join cteRecurse rec
on rec.code_id = agg.code_id
and agg.RecurseKey - 1 = rec.RecurseKey
)
-- Show results
SELECT
id
,code_id
,qty
,RunningTotal
,LSF
,GroupNum
from cteRecurse
order by
code_id
,id
Sorry for making this question a bit more complicated than it needed to be but for the sake of closure I have found a solution using the lag function.
In order to achieve what I wanted I continued my cte above with the following:
, a as
(
SELECT
*,
Lag(LSF, 1, LSF) OVER(Partition By code_id ORDER BY id) AS prev_LSF,
Lag(code_id, 1, code_id) OVER(Partition By code_id ORDER BY id) AS prev_code
FROM
LongShortFlat
), b as
(
SELECT
id,
LSF,
code_id,
Sum(CASE
WHEN LSF <> prev_LSF AND code_id = prev_code
THEN 1
ELSE 0
END) OVER(Partition By code_id ORDER BY id) AS grp
FROM
a
)
select * from b order by id
Here is the updated sqlfiddle.
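For readers without access to the fiddle, here is a self-contained sketch of the LAG approach, run on SQLite through Python's sqlite3 instead of SQL Server. The quantities are copied from the expected-results table in the question; the CTE name Flagged is made up for this example. It uses LAG without a default value: on the first row of each code_id the previous value is NULL, the comparison is not true, and no increment happens. Adding 1 to the running sum gives the 1-based GroupNum from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TradeData (id INTEGER PRIMARY KEY, code_id INTEGER, qty INTEGER)")

# Quantities per security, in id order, taken from the question's table.
qty_by_code = {
    1: [5, 2, 7, -3, -5, -6, 5, 5, -2, -4],
    2: [5, 3, -4, -2, -2, 6, -5, -5, 2, 4, -5, -2],
    3: [5, 2, 1],
}
rows = [(code, qty) for code, qtys in qty_by_code.items() for qty in qtys]
conn.executemany("INSERT INTO TradeData (code_id, qty) VALUES (?, ?)", rows)

query = """
WITH RunningTotals AS (
    SELECT *, SUM(qty) OVER (PARTITION BY code_id ORDER BY id) AS RunningTotal
    FROM TradeData
), LongShortFlat AS (
    SELECT *,
           CASE WHEN RunningTotal > 0 THEN 'L'
                WHEN RunningTotal < 0 THEN 'S'
                ELSE 'F'
           END AS LSF
    FROM RunningTotals
), Flagged AS (
    -- Previous LSF within the same security; NULL on the first row.
    SELECT *, LAG(LSF) OVER (PARTITION BY code_id ORDER BY id) AS prev_LSF
    FROM LongShortFlat
)
SELECT id, code_id, qty, RunningTotal, LSF,
       -- Running count of L/S/F changes, shifted to start at 1.
       1 + SUM(CASE WHEN LSF <> prev_LSF THEN 1 ELSE 0 END)
               OVER (PARTITION BY code_id ORDER BY id) AS GroupNum
FROM Flagged
ORDER BY id
"""
result = conn.execute(query).fetchall()
group_nums = [row[5] for row in result]
print(group_nums)
```

Because the window is partitioned by code_id, the prev_code comparison from the query above is not needed in this variant.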

How to Generate Row number Partition by two column match in sql

Tbl1
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
2 2-1-18 0 3
3 3-1-18 2 3
4 4-1-18 3 3
5 5-1-18 2 3
6 6-1-18 0 3
7 7-1-18 1 3
8 8-1-18 0 3
---------------------------------------------------------
I want the result like below
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
5 5-1-18 2 3
---------------------------------------------------------
The rows form groups that each end at the row where Qty equals ReOrder, and I want the first row of each group (here Id 1, and Id 5 after the reorder at Id 4).
You can use a cumulative approach with the row_number() function:
select top (1) with ties *
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from Tbl1
) t
order by row_number() over(partition by grp order by id);
Unfortunately, this requires SQL Server (for top (1) with ties), but you can also do:
select *
from (select *, row_number() over(partition by grp order by id) seq
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from Tbl1
) t
) t
where seq = 1;
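A minimal sketch of the second query against the sample data, run on SQLite via Python's sqlite3 (an assumption; the answer targets SQL Server, and the table name Tbl1 is taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl1 (Id INTEGER, Date TEXT, Qty INTEGER, ReOrder INTEGER)")
conn.executemany(
    "INSERT INTO Tbl1 VALUES (?, ?, ?, ?)",
    [
        (1, "1-1-18", 1, 3),
        (2, "2-1-18", 0, 3),
        (3, "3-1-18", 2, 3),
        (4, "4-1-18", 3, 3),
        (5, "5-1-18", 2, 3),
        (6, "6-1-18", 0, 3),
        (7, "7-1-18", 1, 3),
        (8, "8-1-18", 0, 3),
    ],
)

# Walking from the highest Id downwards, grp stays NULL until the row where
# Qty = ReOrder is reached and is 'v' from there on, so rows after the
# reorder point and rows up to it land in two different partitions.
query = """
SELECT Id, Date, Qty, ReOrder
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY Id) AS seq
    FROM (
        SELECT *,
               MAX(CASE WHEN Qty = ReOrder THEN 'v' END) OVER (ORDER BY Id DESC) AS grp
        FROM Tbl1
    )
)
WHERE seq = 1
ORDER BY Id
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

The 'v' marker only separates "before" from "after" a single reorder row. With several reorder rows in the table, a running count of matches instead of MAX would be needed to keep the groups apart.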

How to merge order by records from two different queries in SQL Server

In the following scenario, I need the top 3 records from a table that meet a specific condition, merged with the records of a second query that excludes the first query's results.
Table A:
Id  Name  Flag
--------------
1   A     1
2   B     0
3   C     0
4   D     1
5   E     0
6   F     1
7   G     0
8   H     0
Page index 1 with row count 5: the latest 3 records with Flag = 1 are required first, followed by the latest remaining records:
Id  Name  Flag
--------------
6   F     1
4   D     1
1   A     1
--------------
8   H     0
7   G     0
Page index 2 with row count 5, again with the latest records first:
Id  Name  Flag
--------------
6   F     1
5   E     0
3   C     0
2   B     0
How can I achieve this with a SQL query?
IF (@PageNumber = 0)
BEGIN
SELECT TOP (@RowsPerPage) [Id], [Name], [Flag]
FROM
(SELECT [Id], [Name], [Flag]
FROM
(SELECT TOP 3 [Id], [Name], [Flag]
FROM [A]
WHERE Flag = 1
ORDER BY Id DESC
UNION
SELECT [Id], [Name], [Flag]
FROM [A]
ORDER BY Id DESC) T
END
ELSE
BEGIN
-- Normal paging query excluding top 3 flag records.
END
Try this
WITH CTE
AS
(
SELECT
Seq1 = ROW_NUMBER() OVER(PARTITION BY Flag ORDER BY Id DESC),
Seq2 = ROW_NUMBER() OVER(ORDER BY Id DESC),
Id,
Name,
Flag
FROM t1
)
SELECT
Seq1,
id,
name,
flag
FROM CTE
WHERE Seq1 < 4
UNION ALL
SELECT
Seq2,
id,
name,
flag
FROM CTE
WHERE Seq1 >3
ORDER BY 4 desc,1
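To add the paging on top of this ordering, here is a sketch of a slightly different formulation: a single ORDER BY that puts the latest 3 flagged rows first and everything else after them by Id descending. It runs on SQLite via Python's sqlite3, so it pages with LIMIT/OFFSET; the helper fetch_page and its parameters are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Id INTEGER, Name TEXT, Flag INTEGER)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 0), (3, "C", 0), (4, "D", 1),
     (5, "E", 0), (6, "F", 1), (7, "G", 0), (8, "H", 0)],
)

PAGE_QUERY = """
WITH ranked AS (
    SELECT Id, Name, Flag,
           ROW_NUMBER() OVER (PARTITION BY Flag ORDER BY Id DESC) AS Seq1
    FROM A
)
SELECT Id, Name, Flag
FROM ranked
-- Bucket 0: the latest 3 rows with Flag = 1; bucket 1: all other rows.
ORDER BY CASE WHEN Flag = 1 AND Seq1 <= 3 THEN 0 ELSE 1 END, Id DESC
LIMIT ? OFFSET ?
"""

def fetch_page(page_number, rows_per_page):
    """Return one page of rows; page_number is 1-based (hypothetical helper)."""
    offset = (page_number - 1) * rows_per_page
    return conn.execute(PAGE_QUERY, (rows_per_page, offset)).fetchall()

page1_ids = [row[0] for row in fetch_page(1, 5)]
page2_ids = [row[0] for row in fetch_page(2, 5)]
print(page1_ids, page2_ids)
```

On SQL Server 2012 or later, the LIMIT/OFFSET line would become OFFSET ... ROWS FETCH NEXT ... ROWS ONLY after the ORDER BY.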

MS Sql Server, same column with a different row neighbors

I need a little help on a SQL query. I could not get the result that I wanted.
ID  I10  H  10NS  HNS  CC  NSCC
0   1    1  1     1    14  14
1   0    1  0     1    6   2
1   0    2  0     2    12  2
1   0    3  0     3    17  4
1   0    3  0     3    18  4
1   0    3  0     3    19  4
1   0    3  0     3    20  4
What I want is one row for each ID: the one with the highest CC.
For example,
ID  I10  H  10NS  HNS  CC  NSCC
0   1    1  1     1    14  14
1   0    3  0     3    20  4
I tried with this code:
SELECT a.ID, b.name, a.i10 as[i-10-index], a.h as[h-index], 10ns as[i-10-index based on non-self-citation], a.hns as [h-index based on non-self-citation],
max(a.[Citation Count]), (a.[Non-Self-Citation Count])
FROM tbl_lpNumerical as a
join tbl_lpAcademician as b
on a.ID= (b.ID-1)
GROUP BY a.ID, b.name, a.i10, a.h, a.10ns, a.hns,
a.[Non-Self-Citation Count]
order by a.ID desc
However, I could not get the desired results.
Thank you for your time.
You can simply select every row for which no other row with the same ID has a higher CC:
SELECT n.*
FROM tbl_lpNumerical n
WHERE NOT EXISTS ( SELECT 'b'
FROM tbl_lpNumerical n2
WHERE n2.ID = n.ID
AND n2.CC > n.CC
)
In SQL Server, you can use row_number() for this. Based on your sample data, something like:
select sd.*
from (select sd.*, row_number() over (partition by id order by cc desc) as seqnum
from sampledata sd
) sd
where seqnum = 1;
I have no idea what your query has to do with the sample data. If it generates the data, then you can use a CTE:
with sampledata as (
<some query here>
)
select sd.*
from (select sd.*, row_number() over (partition by id order by cc desc) as seqnum
from sampledata sd
) sd
where seqnum = 1;
The following query will select a single row from each ID partition: the one with the highest CC value:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CC DESC) AS rn
FROM mytable) t
WHERE t.rn = 1
If there can be multiple rows having the same CC max value and you want all of them selected, then you can replace ROW_NUMBER() with RANK().
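A quick sketch comparing the NOT EXISTS and ROW_NUMBER() approaches on the sample data, run on SQLite via Python's sqlite3 (an assumption; the answers target SQL Server). Only the ID, CC and NSCC columns are kept to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the sample table: only the columns needed here.
conn.execute("CREATE TABLE tbl_lpNumerical (ID INTEGER, CC INTEGER, NSCC INTEGER)")
conn.executemany(
    "INSERT INTO tbl_lpNumerical VALUES (?, ?, ?)",
    [(0, 14, 14), (1, 6, 2), (1, 12, 2), (1, 17, 4),
     (1, 18, 4), (1, 19, 4), (1, 20, 4)],
)

# Approach 1: keep rows for which no row with the same ID has a higher CC.
not_exists_rows = conn.execute("""
    SELECT n.ID, n.CC, n.NSCC
    FROM tbl_lpNumerical n
    WHERE NOT EXISTS (SELECT 1
                      FROM tbl_lpNumerical n2
                      WHERE n2.ID = n.ID
                        AND n2.CC > n.CC)
    ORDER BY n.ID
""").fetchall()

# Approach 2: number the rows per ID by CC descending and keep the first.
row_number_rows = conn.execute("""
    SELECT ID, CC, NSCC
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CC DESC) AS rn
          FROM tbl_lpNumerical)
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(not_exists_rows, row_number_rows)
```

The two agree here because no ID has a tie on its highest CC. With ties, NOT EXISTS (like RANK()) returns every tied row, while ROW_NUMBER() picks just one of them.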

Select and aggregate last records based on order

I have different versions of the charges in a table. I want to grab the last charge for each Type and sum them.
So here I want to add 9.87, 9.63 and 1.65.
The query should return the Parent ID and sum(9.87 + 9.63 + 1.65).
We use MSSQL.
ID  ORDER  CHARGES  TYPE  PARENT ID
1   1      6.45     1     1
2   2      1.25     1     1
3   3      9.87     1     1
4   1      6.54     2     1
5   2      5.64     2     1
6   3      0.84     2     1
7   4      9.63     2     1
8   1      7.33     3     1
9   2      5.65     3     1
10  3      8.65     3     1
11  4      5.14     3     1
12  5      1.65     3     1
WITH recordsList
AS
(
SELECT Type, Charges,
ROW_NUMBER() OVER (PARTITION BY TYPE
ORDER BY [ORDER] DESC) rn
FROM tableName
)
SELECT SUM(Charges) totalCharge
FROM recordsList
WHERE rn = 1
SQLFiddle Demo
Use row_number() to identify the rows to be summed, and then sum them:
select SUM(charges)
from (select t.*,
ROW_NUMBER() over (PARTITION by type order by id desc) as seqnum
from t
) t
where seqnum = 1
Alternatively you could use a window aggregate MAX():
SELECT SUM(Charges)
FROM (
SELECT
[ORDER],
Charges,
MaxOrder = MAX([ORDER]) OVER (PARTITION BY [TYPE])
FROM atable
) s
WHERE [ORDER] = MaxOrder
;
SELECT t.PARENT_ID, SUM(t.CHARGES)
FROM dbo.test73 t
WHERE EXISTS (
SELECT 1
FROM dbo.test73
WHERE [TYPE] = t.[TYPE]
HAVING MAX([ORDER]) = t.[ORDER]
)
GROUP BY t.PARENT_ID
Demo on SQLFiddle
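Since the question also asks for the Parent ID in the result, here is a sketch of the row_number() variant grouped by parent, run on SQLite via Python's sqlite3 (an assumption; the question is about MSSQL). The table name charge_versions and the column name ord (standing in for the reserved word ORDER) are made up for this example, and partitioning by parent_id as well as type is an assumption about the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names: charge_versions for the table, ord for the ORDER column.
conn.execute(
    "CREATE TABLE charge_versions "
    "(id INTEGER, ord INTEGER, charges REAL, type INTEGER, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO charge_versions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 6.45, 1, 1), (2, 2, 1.25, 1, 1), (3, 3, 9.87, 1, 1),
        (4, 1, 6.54, 2, 1), (5, 2, 5.64, 2, 1), (6, 3, 0.84, 2, 1),
        (7, 4, 9.63, 2, 1), (8, 1, 7.33, 3, 1), (9, 2, 5.65, 3, 1),
        (10, 3, 8.65, 3, 1), (11, 4, 5.14, 3, 1), (12, 5, 1.65, 3, 1),
    ],
)

# rn = 1 marks the last version (highest ord) of each type within a parent;
# those rows are then summed per parent.
query = """
SELECT parent_id, SUM(charges) AS total_charge
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY parent_id, type ORDER BY ord DESC) AS rn
    FROM charge_versions
)
WHERE rn = 1
GROUP BY parent_id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

For the sample data this should give a single row for parent 1 with a total of about 21.15 (9.87 + 9.63 + 1.65), subject to the usual floating-point rounding of REAL columns.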