Finding top n for each unique row - sql

I'm trying to get the top N records for each unique combination of values in a table. I'm grouping on columns b, c and d; column a is the unique identifier and column e is the score, of which I want the top 1 in this case.
a b c d e
2 38 NULL NULL 141
1 38 NULL NULL 10
1 38 1 NULL 10
2 38 1 NULL 1
1 38 1 8 10
2 38 1 8 1
2 38 16 NULL 140
2 38 16 12 140
E.g., from this data I would like to find the following rows:
a b c d e
2 38 NULL NULL 141
1 38 1 NULL 10
1 38 1 8 10
2 38 16 NULL 140
2 38 16 12 140
Can someone please point me in the right direction to solve this?

Your example doesn't show, and you don't explain, how you determine which row is the "top" one, so I've put ??????? in the query where you need to provide a ranking expression, such as a desc. In any case, this is exactly what the analytic functions in SQL Server 2005 and later are for.
declare @howmany int = 3;
with TRanked (a,b,c,d,e,rk) as (
select
a,b,c,d,e,
rank() over (
partition by b,c,d
order by ???????
)
from T
)
select a,b,c,d,e
from TRanked
where rk <= @howmany;

The nulls are a pain, but something like this:
select * from table1 t1
where a in (
select top 1 a from table1 t2
where (t1.b = t2.b or (t1.b is null and t2.b is null))
and (t1.c = t2.c or (t1.c is null and t2.c is null))
and (t1.d = t2.d or (t1.d is null and t2.d is null))
order by e desc
)
or better yet:
select * from (
select *, seqno = row_number() over (partition by b, c, d order by e desc)
from table1
) a
where seqno = 1
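As a quick check of the row_number() approach above, here is a minimal sketch that runs it against the sample rows from the question. It assumes SQLite (through Python's sqlite3 module, with window-function support) as a stand-in for SQL Server, so the seqno = alias syntax is replaced by AS seqno; the table name table1 comes from the answer, everything else is illustrative.

```python
import sqlite3

# Sample rows from the question: a = identifier, (b, c, d) = group, e = score.
rows = [
    (2, 38, None, None, 141),
    (1, 38, None, None, 10),
    (1, 38, 1, None, 10),
    (2, 38, 1, None, 1),
    (1, 38, 1, 8, 10),
    (2, 38, 1, 8, 1),
    (2, 38, 16, None, 140),
    (2, 38, 16, 12, 140),
]

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INT, b INT, c INT, d INT, e INT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)

# PARTITION BY puts NULLs in the same partition, so no IS NULL handling is needed.
top_per_group = conn.execute(
    """
    SELECT a, b, c, d, e
    FROM (
        SELECT a, b, c, d, e,
               ROW_NUMBER() OVER (PARTITION BY b, c, d ORDER BY e DESC) AS seqno
        FROM table1
    ) AS ranked
    WHERE seqno = 1
    """
).fetchall()

for row in top_per_group:
    print(row)
```

Changing seqno = 1 to seqno <= N gives the top N per group instead of the top 1.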

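I believe this will do what you said (extending the idea from here). The rank has to be computed in a derived table, because a window function's alias cannot be referenced in the WHERE clause of the same query level:
select b,c,d,e,rnk
from (
select b,c,d,e,
rank() over
(partition by b,c,d order by e desc) rnk
from t1
) ranked
where rnk < 5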
Cheers.

Related

Countif statement in Postgresql

How can I use countif statement in PostgreSQL?
max(COUNTIF(t1.A1:C10,t2.a1),COUNTIF(t1.A1:C10,t2.b1),COUNTIF(t1.A1:C10,t2.c1))
I have table1, which has more than a million rows:
a b c M5
16 27 31
3 7 27
and table2, which has more than 100 rows, including a date column after column c:
a b c
10 15 16
30 40 50
60 70 80
16 18 37
5 12 16
8 31 28
11 12 13
7 9 31
2 7 21
20 16 27
8 12 17
2 8 14
3 14 15
The outcome should be something like this
a b c M5
16 27 31 3
3 7 27 2
Tried the below query but the outcome is not correct
UPDATE table1 SET m5 = greatest(
case When a in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END,
case When b in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END,
case When c in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END)
Assuming your columns are fixed and predictable, I think you could put all possible table values into a single column and then do counts for each occurrence:
with exploded as (
select a from table2
union all
select b from table2
union all
select c from table2
)
select a, count (*) as count
from exploded e
group by a
So for example, the value 7 occurs twice (which would be reflected in this output).
From there, you can just do the updates from the CTE:
with exploded as (
select a from table2
union all
select b from table2
union all
select c from table2
),
counted as (
select a, count (*) as count
from exploded e
group by a
)
update table1 t
set m5 = greatest (ca.count, cb.count, cc.count)
from
counted ca,
counted cb,
counted cc
where
t.a = ca.a and
t.b = cb.a and
t.c = cc.a
The only issue I see is if one of the values does not come up (the inner join fails), but in your example that doesn't seem to happen.
If it is possible, I would think that could be resolved with one more CTE to fill in missing values from table1 in the set of possible occurrences.
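One possible way to cover the missing-value case, sketched here as a SELECT rather than an UPDATE, is to replace the inner joins with left joins and default absent counts to 0. This is only a sketch under assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL, so the scalar max(...) plays the role of greatest(...); the third table1 row (99, 7, 27) is hypothetical, added to show a value that never occurs in table2; and the date column is ignored, so every table2 row shown in the question is counted (which gives 4 for the first row rather than the 3 in the question's expected outcome).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INT, b INT, c INT)")
conn.execute("CREATE TABLE table2 (a INT, b INT, c INT)")

# First two rows are from the question; (99, 7, 27) is a hypothetical row
# whose first value does not occur anywhere in table2.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(16, 27, 31), (3, 7, 27), (99, 7, 27)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        (10, 15, 16), (30, 40, 50), (60, 70, 80), (16, 18, 37), (5, 12, 16),
        (8, 31, 28), (11, 12, 13), (7, 9, 31), (2, 7, 21), (20, 16, 27),
        (8, 12, 17), (2, 8, 14), (3, 14, 15),
    ],
)

m5_rows = conn.execute(
    """
    WITH exploded(val) AS (
        SELECT a FROM table2
        UNION ALL
        SELECT b FROM table2
        UNION ALL
        SELECT c FROM table2
    ),
    counted AS (
        SELECT val, COUNT(*) AS cnt
        FROM exploded
        GROUP BY val
    )
    SELECT t.a, t.b, t.c,
           -- SQLite's multi-argument max() is the scalar analogue of greatest()
           MAX(COALESCE(ca.cnt, 0), COALESCE(cb.cnt, 0), COALESCE(cc.cnt, 0)) AS m5
    FROM table1 AS t
    LEFT JOIN counted AS ca ON ca.val = t.a
    LEFT JOIN counted AS cb ON cb.val = t.b
    LEFT JOIN counted AS cc ON cc.val = t.c
    ORDER BY t.rowid
    """
).fetchall()

for row in m5_rows:
    print(row)
```

With inner joins the hypothetical third row would drop out entirely; with left joins and COALESCE it is kept and its missing value simply counts as 0.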

Removing pairs of transactions

I am attempting to remove transactions that have been reversed from a table. The table has Account, Date, Amount and Row columns. If a transaction has been reversed, Account will match and the Amounts will be inverses of each other.
Example Table
Account Date Amount Row
12 1/1/18 45 72 -- Case 1
12 1/2/18 50 73
12 1/2/18 -50 74
12 1/3/18 52 75
15 1/1/18 51 76 -- Case 2
15 1/2/18 51 77
15 1/2/18 -51 78
15 1/2/18 51 79
18 1/2/18 50 80 -- Case 3
18 1/2/18 50 81
18 1/2/18 -50 82
18 1/2/18 -50 83
18 1/3/18 50 84
18 1/3/18 50 85
20 1/1/18 57 88 -- Case 4
20 1/2/18 57 89
20 1/4/18 -57 90
20 1/5/18 57 91
Desired Results Table
Account Date Amount Row
12 1/1/18 45 72 -- Case 1
12 1/3/18 52 75
15 1/1/18 51 76 -- Case 2
15 1/2/18 51 79
18 1/3/18 50 84 -- Case 3
18 1/3/18 50 85
20 1/1/18 57 88 -- Case 4
20 1/5/18 57 91
Removing all instances of inverse transactions does not work when there are multiple transactions when all other columns are the same. My attempt was to count all duplicate transactions, count of all inverse duplicate transactions, subtracting those to get the number of rows I needed from each transactions group. I was going to pull the first X rows but found in most cases I want the last X rows of each group, or even a mix (the first and last in Case 2).
I either need a method of removing pairs from the original table, or working from what I have so far, a method of distinguishing which transactions to pull.
Code so far:
--adding row Numbers
with a as (
select
account a,
date d,
amount f,
row_number() over(order by account, date) r
from table),
--counting Duplicates
b as (
select a.a, a.f, Dups
from a join (
select a, f, count(*) Dups
from a
group by a.a, a.f
having count(*)>1
) b
on a.a=b.a and
b.f=a.f
where a.f>0
),
--counting inverse duplicates
c as (
select a.a, a.f, InvDups
from a join (
select a, f, count(*) InvDups
from a
group by a.a, a.f
having count(*)>1
) b
on a.a=b.a and
-b.f=a.f
where a.f>0
),
--combining c and d to get desired number of rows of each transaction group
d as (
select
b.a, b.f, dups, InvDups, Dups-InvDups TotalDups
from b join c
on b.a=c.a and
b.f=c.f
),
--getting the number of rows from the beginning of each transaction group
select d.a, d.d, d.f
from
(select
a, d, f, row_number() over (group by a, d, f) r2
from a) e
join d
on e.a=d.a and
TotalDups<=r2
You can try this.
SELECT T_P.* FROM
( SELECT *, ROW_NUMBER() OVER(PARTITION BY Account, Amount ORDER BY [Row] ) RN from #MyTable WHere Amount > 0 ) T_P
LEFT JOIN
( SELECT *, ROW_NUMBER() OVER(PARTITION BY Account, Amount ORDER BY [Row] ) RN from #MyTable WHere Amount < 0 ) T_N
ON T_P.Account = T_N.Account
AND T_P.Amount = ABS(T_N.Amount)
AND T_P.RN = T_N.RN
WHERE
T_N.Account IS NULL
The following handles your three cases:
with t as (
select t.*,
row_number() over (partition by account, date, amount order by row) as seqnum
from table t
)
select t.*
from t
where not exists (select 1
from t t2
where t2.account = t.account and t2.date = t.date and
t2.amount = -t.amount and t2.seqnum = t.seqnum
);
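As a rough check of the seqnum query above, the following sketch runs it against the sample table from the question. Assumptions: SQLite (via Python's sqlite3) stands in for the original database, the table is called txn, and the Date and Row columns are renamed to the hypothetical txn_date and row_id to avoid keyword clashes. Dates are stored as the literal strings from the question, since only equality is needed.

```python
import sqlite3

# (account, txn_date, amount, row_id), copied from the question's example table.
transactions = [
    (12, "1/1/18", 45, 72), (12, "1/2/18", 50, 73), (12, "1/2/18", -50, 74),
    (12, "1/3/18", 52, 75),
    (15, "1/1/18", 51, 76), (15, "1/2/18", 51, 77), (15, "1/2/18", -51, 78),
    (15, "1/2/18", 51, 79),
    (18, "1/2/18", 50, 80), (18, "1/2/18", 50, 81), (18, "1/2/18", -50, 82),
    (18, "1/2/18", -50, 83), (18, "1/3/18", 50, 84), (18, "1/3/18", 50, 85),
    (20, "1/1/18", 57, 88), (20, "1/2/18", 57, 89), (20, "1/4/18", -57, 90),
    (20, "1/5/18", 57, 91),
]

# Assumption: in-memory SQLite with illustrative table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (account INT, txn_date TEXT, amount INT, row_id INT)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", transactions)

# Number identical (account, date, amount) rows, then drop every row that has
# an opposite-amount row with the same sequence number on the same date.
kept_rows = [
    r[0]
    for r in conn.execute(
        """
        WITH t AS (
            SELECT account, txn_date, amount, row_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY account, txn_date, amount
                       ORDER BY row_id
                   ) AS seqnum
            FROM txn
        )
        SELECT row_id
        FROM t
        WHERE NOT EXISTS (
            SELECT 1
            FROM t AS t2
            WHERE t2.account = t.account
              AND t2.txn_date = t.txn_date
              AND t2.amount = -t.amount
              AND t2.seqnum = t.seqnum
        )
        ORDER BY row_id
        """
    )
]

print(kept_rows)
```

On this data the query reproduces the desired rows for Cases 1 to 3. For Case 4 the reversal falls on a different date than the original transactions, so a date-based match removes nothing and rows 89 and 90 remain alongside 88 and 91.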
Use This
;WITH CTE
AS
(
SELECT
[Row]
FROM YourTable YT
WHERE Amount > 0
AND EXISTS
(
SELECT 1 FROM YourTable WHERE Account = YT.Account
AND [Date] = YT.[Date]
AND (Amount+YT.Amount)=0
)
UNION ALL
SELECT
[Row]
FROM YourTable YT
WHERE Amount < 0
AND EXISTS
(
SELECT 1 FROM YourTable WHERE Account = YT.Account
AND [Date] = YT.[Date]
AND (Amount+YT.Amount)>0
)
)
SELECT * FROM YourTable
WHERE EXISTS
(
SELECT 1 FROM CTE WHERE [Row] = YourTable.[Row]
)

Use a standard single sql to group data (SQL Server)

Raw data with 2 columns:
0 33
2 null
0 44
2 null
2 null
2 null
0 55
2 null
2 null
.....
Results I want:
2 33
2 44
2 44
2 44
2 55
2 55
....
Can I use a single SQL statement to accomplish this? It should return only the rows with 2, filled with the value from the closest preceding row that has 0. There can be many '2 null' rows between two 0 rows.
This way
with s as (
select *
from
(values
(1,0,33 ),
(2,2,null),
(3,0,44 ),
(4,2,null),
(5,2,null),
(6,2,null),
(7,0,55 ),
(8,2,null),
(9,2,null)
) T(id,a,b)
)
select s1.a, t.b
from s s1
cross apply (
select top(1) s2.b
from s s2
where s2.id < s1.id and s2.b is not null and s2.a = 0
order by s2.id desc ) t
where s1.a = 2
order by s1.id;
I use CROSS APPLY so the query may be easily extended to get other columns from the relevant '0' row.
First, for a given outer row (aliased tbl below), find the ID of the closest preceding row whose value is not null:
SELECT MAX(t.ID) FROM your_tbl t WHERE t.ID < tbl.ID AND t.col2 IS NOT NULL
Then use that as a correlated subquery to look up the value for every row you want to keep:
SELECT col1, (
SELECT col2 FROM your_tbl WHERE ID = (SELECT MAX(t.ID) FROM your_tbl t
WHERE t.ID < tbl.ID AND t.col2 IS NOT NULL)) AS col2
FROM your_tbl tbl WHERE col1 <> 0;
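The correlated-subquery approach can be tried out with a small sketch like the one below. It assumes SQLite (via Python's sqlite3) in place of SQL Server and, like both answers, assumes the table has an ordering column (here id), which the raw data in the question does not show.

```python
import sqlite3

# Assumption: an id column defines the row order; the question's data has none.
rows = [
    (1, 0, 33), (2, 2, None), (3, 0, 44), (4, 2, None), (5, 2, None),
    (6, 2, None), (7, 0, 55), (8, 2, None), (9, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_tbl (id INTEGER PRIMARY KEY, col1 INT, col2 INT)")
conn.executemany("INSERT INTO your_tbl VALUES (?, ?, ?)", rows)

# For each kept row, look up col2 from the closest earlier row with a non-null col2.
filled = conn.execute(
    """
    SELECT tbl.col1,
           (SELECT prev.col2
            FROM your_tbl AS prev
            WHERE prev.id = (SELECT MAX(t.id)
                             FROM your_tbl AS t
                             WHERE t.id < tbl.id
                               AND t.col2 IS NOT NULL)) AS col2
    FROM your_tbl AS tbl
    WHERE tbl.col1 <> 0
    ORDER BY tbl.id
    """
).fetchall()

for row in filled:
    print(row)
```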

Get next minimum, greater than or equal to a given value for each group

Given the following Table1:
RefID intVal SomeVal
----------------------
1 10 val01
1 20 val02
1 30 val03
1 40 val04
1 50 val05
2 10 val06
2 20 val07
2 30 val08
2 40 val09
2 50 val10
3 12 val11
3 14 val12
4 10 val13
5 100 val14
5 150 val15
5 1000 val16
and Table2 containing some RefIDs and intVals like
RefID intVal
-------------
1 11
1 28
2 9
2 50
2 51
4 11
5 1
5 150
5 151
I need a SQL statement that returns, for each Table2 row, the next intVal in Table1 that is greater than or equal to it within the same RefID, or NULL if none is found.
The following is the expected result:
RefID intVal nextGt SomeVal
------------------------------
1 11 20 val01
1 28 30 val03
2 9 10 val06
2 50 50 val10
2 51 NULL NULL
4 11 NULL NULL
5 1 100 val14
5 150 150 val15
5 151 1000 val16
Help would be appreciated!
Derived table a retrieves the minimal matching intVal from table1 for each refid and intVal in table2; the outer query only adds SomeVal.
select a.refid, a.intVal, a.nextGt, table1.SomeVal
from
(
select table2.refid, table2.intval, min (table1.intVal) nextGt
from table2
left join table1
on table2.refid = table1.refid
and table2.intVal <= table1.intVal
group by table2.refid, table2.intval
) a
-- table1 is joined again to retrieve SomeVal
left join table1
on a.refid = table1.refid
and a.nextGt = table1.intVal
Here is Sql Fiddle with live test.
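For readers without access to the fiddle, here is a minimal sketch that runs the same min-then-rejoin query on the question's sample data. It assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server; the aliases t1 and t2 are illustrative.

```python
import sqlite3

table1 = [
    (1, 10, "val01"), (1, 20, "val02"), (1, 30, "val03"), (1, 40, "val04"),
    (1, 50, "val05"), (2, 10, "val06"), (2, 20, "val07"), (2, 30, "val08"),
    (2, 40, "val09"), (2, 50, "val10"), (3, 12, "val11"), (3, 14, "val12"),
    (4, 10, "val13"), (5, 100, "val14"), (5, 150, "val15"), (5, 1000, "val16"),
]
table2 = [
    (1, 11), (1, 28), (2, 9), (2, 50), (2, 51), (4, 11), (5, 1), (5, 150), (5, 151),
]

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (RefID INT, intVal INT, SomeVal TEXT)")
conn.execute("CREATE TABLE Table2 (RefID INT, intVal INT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", table1)
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", table2)

next_greater = conn.execute(
    """
    SELECT a.RefID, a.intVal, a.nextGt, t1.SomeVal
    FROM (
        -- smallest Table1.intVal that is >= the Table2 value, per Table2 row
        SELECT t2.RefID, t2.intVal, MIN(t1.intVal) AS nextGt
        FROM Table2 AS t2
        LEFT JOIN Table1 AS t1
               ON t1.RefID = t2.RefID
              AND t1.intVal >= t2.intVal
        GROUP BY t2.RefID, t2.intVal
    ) AS a
    -- join Table1 again to pick up SomeVal for the chosen intVal
    LEFT JOIN Table1 AS t1
           ON t1.RefID = a.RefID
          AND t1.intVal = a.nextGt
    ORDER BY a.RefID, a.intVal
    """
).fetchall()

for row in next_greater:
    print(row)
```

Note that for RefID 1 and intVal 11 the query returns the SomeVal stored next to 20 in Table1 (val02), since it reads the value from the matched row.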
You can solve this using the ROW_NUMBER() function:
SELECT
RefID,
intVal,
NextGt,
SomeVal
FROM
(
SELECT
t2.RefID,
t2.intVal,
t1.intVal AS NextGt,
t1.SomeVal,
ROW_NUMBER() OVER (PARTITION BY t2.RefID, t2.intVal ORDER BY t1.intVal) AS rn
FROM
dbo.Table2 AS t2
LEFT JOIN dbo.Table1 AS t1 ON t1.RefID = t2.RefID AND t1.intVal >= t2.intVal
) s
WHERE
rn = 1
;
The derived table matches each Table2 row with all Table1 rows that have the same RefID and an intVal that is greater than or equal to Table2.intVal. Each subset of matches is ranked and the first row is returned by the main query.
The nested query uses an outer join, so that those Table2 rows that have no Table1 matches are still returned (with nulls substituted for the Table1 columns).
Alternatively you can use OUTER APPLY:
SELECT
t2.RefID,
t2.intVal,
t1.intVal AS NextGt,
t1.SomeVal
FROM
dbo.Table2 AS t2
OUTER APPLY
(
SELECT TOP (1)
t1.intVal,
t1.SomeVal
FROM
dbo.Table1 AS t1
WHERE
t1.RefID = t2.RefID
AND t1.intVal >= t2.intVal
ORDER BY
t1.intVal ASC
) AS t1
;
This method is arguably more straightforward: for each Table2 row, get all matches from Table1 based on the same set of conditions, sort the matches in the ascending order of Table1.intVal and take the topmost intVal.
This can be done with a join, a group by, a case expression, and a trick:
select t2.refid, t2.intval,
min(case when t1.intval >= t2.intval then t1.intval end) as min_greater_than_ref,
substring(min(case when t1.intval >= t2.intval
then right('00000000'+cast(t1.intval as varchar(255)), 8)+t1.SomeVal
end), 9, 1000) as SomeVal
from table2 t2 left join
table1 t1
on t1.refid = t2.refid
group by t2.refid, t2.intval
So, the trick is to prepend the integer value, zero-padded (in this case to 8 characters), to SomeVal. You get something like: "00000020val02". Because the padded strings sort in the same order as the integers, the minimum of this column belongs to the row with the minimum integer. The final step is to strip the 8-character prefix to extract the value.
For this example, I used SQL Server syntax for the concatenation. In other databases you might use CONCAT() or ||.

SQL Server query for finding the sum of 4 consecutive values

Can somebody help me find the sum of 4 consecutive values, i.e. a rolling sum of the last 4 values?
Like:
VALUE SUM
1 NULL
2 NULL
3 NULL
4 10
5 14
6 18
7 22
8 26
9 30
10 34
11 38
12 42
13 46
14 50
15 54
16 58
17 62
18 66
19 70
20 74
21 78
22 82
23 86
24 90
25 94
26 98
27 102
28 106
29 110
30 114
31 118
32 122
33 126
34 130
35 134
36 138
37 142
38 146
Thanks,
select sum(Value)
from (select top 4 Value from [table] order by Value desc) x
or, perhaps (assuming the values are consecutive integers)
select sum(Value)
from [Table]
where Value > (select max(Value) from [Table]) - 4
I haven't actually tried either of those, and can't at the moment, but they should get you pretty close.
Quick attempt, which gets the results you've posted in your question (except the 1st 3 rows are not NULL). Assumes that VALUE field is unique and in ascending order:
-- Create test TABLE with 38 values in
DECLARE @T TABLE (Value INTEGER)
DECLARE @Counter INTEGER
SET @Counter = 1
WHILE (@Counter <= 38)
BEGIN
INSERT @T VALUES(@Counter)
SET @Counter = @Counter + 1
END
-- This gives the results
SELECT t1.VALUE, x.Val
FROM @T t1
OUTER APPLY(SELECT SUM(VALUE) FROM (SELECT TOP 4 VALUE FROM @T t2 WHERE t2.VALUE <= t1.VALUE ORDER BY t2.VALUE DESC) x) AS x(Val)
ORDER BY VALUE
At the very least, you should see the kind of direction I was heading in.
Assuming ID can give you the last 4 rows.
SELECT SUM([SUM])
FROM
(
SELECT TOP 4 [SUM] FROM myTable ORDER BY ID DESC
) foo
Each time you query it, it will read the last 4 rows.
If this is wrong (e.g. you want the sum of each consecutive 4 rows), then please give sample output
Following would work if your Value column is sequential
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
SELECT q.Value
, CASE WHEN q.Value >= 4 THEN q.Value * 4 - 6 ELSE NULL END
FROM q
otherwise you might use something like this
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
, Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM q
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1
Note that the table q in the examples is a stub for your actual table. The actual statement then becomes
;WITH Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM YourTable
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1
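To see the self-join version produce the NULLs for the first three rows, here is a small sketch that runs it on the values 1 to 38 from the question. It assumes SQLite (via Python's sqlite3) in place of SQL Server, so the [SUM] = ... alias syntax becomes AS RollingSum; YourTable is the placeholder name used above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Value INT)")
conn.executemany("INSERT INTO YourTable VALUES (?)", [(v,) for v in range(1, 39)])

rolling = conn.execute(
    """
    WITH Sequential AS (
        SELECT ROW_NUMBER() OVER (ORDER BY Value) AS ID, Value
        FROM YourTable
    )
    SELECT s1.Value,
           -- any missing predecessor makes the whole sum NULL
           s1.Value + s2.Value + s3.Value + s4.Value AS RollingSum
    FROM Sequential AS s1
    LEFT OUTER JOIN Sequential AS s2 ON s2.ID = s1.ID - 1
    LEFT OUTER JOIN Sequential AS s3 ON s3.ID = s2.ID - 1
    LEFT OUTER JOIN Sequential AS s4 ON s4.ID = s3.ID - 1
    ORDER BY s1.Value
    """
).fetchall()

for value, rolling_sum in rolling[:6]:
    print(value, rolling_sum)
```

Because the rows are numbered with ROW_NUMBER() first, the values themselves do not need to be consecutive; only their order matters.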