Select distinct for one column - sql

I have a compound primary key where the single parts are potentially random. They aren't in any particular order and one can be unique or they can be all the same.
I do not care which row I get. This is like "Just pick one from each group".
My table:
KeyPart1 KeyPart2 KeyPart3 colA colB colD
11 21 39 d1
11 22 39 d2
12 21 39 d2
12 22 39 d3
13 21 38 d3
13 22 38 d5
Now what I want is to get for each entry in colD one row, I do not care which one.
KeyPart1 KeyPart2 KeyPart3 colA colB colD
11 21 39 d1
12 21 39 d2
12 22 39 d3
13 22 38 d5

For rows that are unique by colD, you will have to decide which other column values will be discarded. Here, within the over clause I have use partition by colD which provides the wanted uniqueness by that column, but the order by is arbitrary and you may want to change it to suit your needs.
select
d.*
from (
select
t.*
, row_number() over (partition by t.colD
order by t.KeyPart1,t.KeyPart2,t.KeyPart) as rn
from yourtable t
) d
where d.rn = 1;

The following should work in almost any version of DB2:
select t.*
from (select t.*,
row_number() over (partition by KeyPart1, KeyPart2
order by KeyPart1
) as seqnum
from t
) t
where seqnum = 1;
If you only care about column d, and the first two key parts, then you can use group by:
select KeyPart1, KeyPart2, min(colD)
from t
group by KeyPart1, KeyPart2;

Change 'order by' if necessary
with D as (
select distinct ColdD from yourtable
)
select Y.* from D
inner join lateral
(
select * from yourtable X
where X.ColdD =D.ColdD
order by X.KeyPart1, X.KeyPart2, X.KeyPart3
fetch first rows only
) Y on 1=1

Related

Aggregating rows in SQL with missing Booleans

I have the below SQL script which returns the following data from a PostgreSQL DB view table.
SELECT
"V_data".macaddr,
"V_data".sensorid,
"V_data".ts,
"V_data".velocity,
"V_data".temp,
"V_data".highspeed,
"V_data".hightemp,
"V_data".distance,
FROM
sensordb."V_data"
WHERE
"V_data".macaddr like '%abcdef'
AND
(
("V_data".sensorid = 'abc1') or ("V_data".sensorid = 'a2bc') or ("V_data".sensorid = 'ab3c')
)
AND
"V_data".ts >= 1616370867000
ORDER BY
"V_data".ts DESC;
Output
macaddr
sensorid
ts
velocity
temp
highspeed
hightemp
distance
abcdef
abc1
1616370867010
25
32
52
abcdef
a2bc
1616370867008
27
35
T
51
abcdef
ab3c
1616370867006
26
30
50
abcdef
abc1
1616370867005
24
36
T
50
abcdef
a2bc
1616370867004
27
31
50
abcdef
abc1
1616370867002
21
30
T
48
abcdef
ab3c
1616370867000
22
33
F
46
I want to aggregate the rows such that I have the latest readings per sensorid for ts, velocity, temp, distance.
For the Booleans highspeed and hightemp, I want the latest available Boolean value or an empty cell if no Boolean value was available.
Expected output
macaddr
sensorid
ts
velocity
temp
highspeed
hightemp
distance
abcdef
abc1
1616370867010
25
32
T
T
52
abcdef
a2bc
1616370867008
27
35
T
51
abcdef
ab3c
1616370867006
26
30
F
50
How could I simplify this task?
Thanks.
You can use DISTINCT ON (available only in PostgreSQL afaik) to simplify this query. You can do:
with
q as (
-- your query here
)
select
l.macaddr, l.sensorid, l.ts, l.velocity, l.temp,
s.highspeed, t.hightemp,
l.distance
from (
select distinct on (sensorid) *
from q
order by sensorid, ts desc
) l
left join (
select distinct on (sensorid) *
from q
where highspeed is not null
order by sensorid, ts desc
) s on s.sensorid = l.sensorid
left join (
select distinct on (sensorid) *
from q
where hightemp is not null
order by sensorid, ts desc
) t on t.sensorid = l.sensorid
Hmmm . . . For all but the boolean columns DISTINCT ON would work. But those booleans are tricky. You could use some tricks on booleans.
Instead, let's go for ROW_NUMBER() to get the most recent row. And fiddle with arrays to get the most recent boolean values:
SELECT d.macaddr, d.sensorid,
MAX(d.ts) as ts,
MAX(d.velocity) FILTER (WHERE seqnum = 1) as velocity,
MAX(d.temp) FILTER (WHERE seqnum = 1) as temp,
(ARRAY_REMOVE(ARRAY_AGG(d.highspeed ORDER BY ts DESC), NULL))[1] as highspeed,
(ARRAY_REMOVE(ARRAY_AGG(d.hightemp ORDER BY ts DESC), NULL))[1] as hightemp
MAX(d.distance) FILTER (WHERE seqnum = 1)
FROM (SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY d.macaddr, d.sensorid ORDER BY ts DESC) as seqnum
FROM sensordb."V_data" d
WHERE d.macaddr like '%abcdef' AND
d.sensorid IN ('abc1', 'a2bc', 'ab3c') AND
d.ts >= 1616370867000
) d
GROUP BY d.macaddr, d.sensorid
ORDER BY d.ts DESC;

Removing pairs of transactions

I am attempting to remove transactions that have been reversed from a table. the table has Account, Date, Amount and Row. If a transaction has been reversed Account will match and Amount will be inverse of each other.
Example Table
Account Date Amount Row
12 1/1/18 45 72 -- Case 1
12 1/2/18 50 73
12 1/2/18 -50 74
12 1/3/18 52 75
15 1/1/18 51 76 -- Case 2
15 1/2/18 51 77
15 1/2/18 -51 78
15 1/2/18 51 79
18 1/2/18 50 80 -- Case 3
18 1/2/18 50 81
18 1/2/18 -50 82
18 1/2/18 -50 83
18 1/3/18 50 84
18 1/3/18 50 85
20 1/1/18 57 88 -- Case 4
20 1/2/18 57 89
20 1/4/18 -57 90
20 1/5/18 57 91
Desired Results Table
Account Date Amount Row
12 1/1/18 45 72 -- Case 1
12 1/3/18 52 75
15 1/1/18 51 76 -- Case 2
15 1/2/18 51 79
18 1/3/18 50 84 -- Case 3
18 1/3/18 50 85
20 1/1/18 57 88 -- Case 4
20 1/5/18 57 91
Removing all instances of inverse transactions does not work when there are multiple transactions when all other columns are the same. My attempt was to count all duplicate transactions, count of all inverse duplicate transactions, subtracting those to get the number of rows I needed from each transactions group. I was going to pull the first X rows but found in most cases I want the last X rows of each group, or even a mix (the first and last in Case 2).
I either need a method of removing pairs from the original table, or working from what I have so far, a method of distinguishing which transactions to pull.
Code so far:
--adding row Numbers
with a as (
select
account a,
date d,
amount f,
row_number() over(order by account, date) r
from table),
--counting Duplicates
b as (
select a.a, a.f, Dups
from a join (
select a, f, count(*) Dups
from a
group by a.a, a.f
having count(*)>1
) b
on a.a=b.a and
b.f=a.f
where a.f>0
),
--counting inverse duplicates
c as (
select a.a, a.f, InvDups
from a join (
select a, f, count(*) InvDups
from a
group by a.a, a.f
having count(*)>1
) b
on a.a=b.a and
-b.f=a.f
where a.f>0
),
--combining c and d to get desired number of rows of each transaction group
d as (
select
b.a, b.f, dups, InvDups, Dups-InvDups TotalDups
from b join c
on b.a=c.a and
b.f=c.f
),
--getting the number of rows from the beginning of each transaction group
select d.a, d.d, d.f
from
(select
a, d, f, row_number() over (group by a, d, f) r2
from a) e
join d
on e.a=d.a and
TotalDups<=r2
You can try this.
SELECT T_P.* FROM
( SELECT *, ROW_NUMBER() OVER(PARTITION BY Account, Amount ORDER BY [Row] ) RN from #MyTable WHere Amount > 0 ) T_P
LEFT JOIN
( SELECT *, ROW_NUMBER() OVER(PARTITION BY Account, Amount ORDER BY [Row] ) RN from #MyTable WHere Amount < 0 ) T_N
ON T_P.Account = T_N.Account
AND T_P.Amount = ABS(T_N.Amount)
AND T_P.RN = T_N.RN
WHERE
T_N.Account IS NULL
The following handles your three cases:
with t as (
select t.*,
row_number() over (partition by account, date, amount order by row) as seqnum
from table t
)
select t.*
from t
where not exists (select 1
from t t2
where t2.account = t.account and t2.date = t.date and
t2.amount = -t.amount and t2.seqnum = t.seqnum
);
Use This
;WITH CTE
AS
(
SELECT
[Row]
FROM YourTable YT
WHERE Amount > 0
AND EXISTS
(
SELECT 1 FROM YourTable WHERE Account = YT.Account
AND [Date] = YT.[Date]
AND (Amount+YT.Amount)=0
)
UNION ALL
SELECT
[Row]
FROM YourTable YT
WHERE Amount < 0
AND EXISTS
(
SELECT 1 FROM YourTable WHERE Account = YT.Account
AND [Date] = YT.[Date]
AND (Amount+YT.Amount)>0
)
)
SELECT * FROM YourTable
WHERE EXISTS
(
SELECT 1 FROM CTE WHERE [Row] = YourTable.[Row]
)

Different values of column 1 based on column 2 SQL Server

I have a table with columns ID and Val. For each value of ID we can have either same or different values of Val.
ID Val
1 A
1 NULL
2 00
2 00
2 00
2 00
3 00
3 A
4 A
5 00
5 00
5 A
6 A
6 A
6 NULL
6 00
From above table, I am looking for IDs which has different values in Val column. If for any given ID all values of Val column are same then it should not come in result.
So result would be something like.
D Val
1 A
1 NULL
3 00
3 A
5 00
5 00
5 A
6 A
6 A
6 NULL
6 00
Id 2 should not come in result because for Id 2, Val column has same data.
Similarly ID 4 will not come in result as ID 4 has only one row.
For each ID if we have more than one value in Val column then is it should show in result.
Thanks for the Help!
For the ids that meet the condition of having different values:
select id
from t
group by id
having min(id) <> max(id);
You can then incorporate this into a query as:
select t.*
from t join
(select id
from t
group by id
having min(id) <> max(id)
) tt
on t.id = tt.id;
Or, you can use window functions:
select t.id, t.val
from (select t.*,
min(val) over (partition by id) as minval,
max(val) over (partition by id) as maxval
from t
) t
where minval <> maxval;
Try this:
SELECT ID, Val
FROM mytable
WHERE ID IN (SELECT ID
FROM mytable
GROUP BY ID
HAVING COUNT(DISTINCT CASE
WHEN Val IS NULL THEN ''
ELSE Val
END) > 1
I've made the assumption that Val field is of type VARCHAR and that it can be either NULL or <> ''.
Build your query at three steps:
Select distinct values id, val (to ensure you will get the null safe count)
Count distinct values for each id
Show results from source table
Try to use inner select instead of subselect to speed up the query.
The solution is written in the query below:
SELECT
t.*
FROM
-- select only ids with distinct count > 1
(
SELECT
id
FROM
-- select distinct values to ensure your count of null values is real
(
SELECT DISTINCT
id, val
FROM
t
) AS td
GROUP BY
id
HAVING
COUNT(*) > 1
) AS tc
-- join the source table
INNER JOIN
t
ON
t.id = tc.id

SQL Server query for finding the sum of 4 consecutive values

Can somebody help me in finding the sum of 4 consecutive values i.e rolling sum of last 4 values.
Like:
VALUE SUM
1 NULL
2 NULL
3 NULL
4 10
5 14
6 18
7 22
8 26
9 30
10 34
11 38
12 42
13 46
14 50
15 54
16 58
17 62
18 66
19 70
20 74
21 78
22 82
23 86
24 90
25 94
26 98
27 102
28 106
29 110
30 114
31 118
32 122
33 126
34 130
35 134
36 138
37 142
38 146
Thanks,
select sum(select top 4 Value from [table] order by Value Desc)
or, perhaps
select sum(value)
from [Table]
where Value >= (Max(Value) - 4)
I haven't actually tried either of those- and can't at the moment, but they should get you pretty close.
Quick attempt, which gets the results you've posted in your question (except the 1st 3 rows are not NULL). Assumes that VALUE field is unique and in ascending order:
-- Create test TABLE with 38 values in
DECLARE #T TABLE (Value INTEGER)
DECLARE #Counter INTEGER
SET #Counter = 1
WHILE (#Counter <= 38)
BEGIN
INSERT #T VALUES(#Counter)
SET #Counter = #Counter + 1
END
-- This gives the results
SELECT t1.VALUE, x.Val
FROM #T t1
OUTER APPLY(SELECT SUM(VALUE) FROM (SELECT TOP 4 VALUE FROM #T t2 WHERE t2.VALUE <= t1.VALUE ORDER BY t2.VALUE DESC) x) AS x(Val)
ORDER BY VALUE
At the very least, you should see the kind of direction I was heading in.
Assuming ID can give you the last 4 rows.
SELECT SUM([SUM])
FROM
(
SELECT TOP 4 [SUM] FROM myTable ORDER BY ID DESC
) foo
Each time you query it, it will read the last 4 rows.
If this is wrong (e.g. you want the sum of each consecutive 4 rows), then please give sample output
Following would work if your Value column is sequential
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
SELECT q.Value
, CASE WHEN q.Value >= 4 THEN q.Value * 4 - 6 ELSE NULL END
FROM q
otherwise you might use something like this
;WITH q (Value) AS (
SELECT 1
UNION ALL
SELECT q.Value + 1
FROM q
WHERE q.Value < 38
)
, Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM q
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1
Note that the table qin the examples is a stub for your actual table. The actual statement then becomes
;WITH Sequential (ID, Value) AS (
SELECT ID = ROW_NUMBER() OVER (ORDER BY Value)
, Value
FROM YourTable
)
SELECT s1.Value
, [SUM] = s1.Value + s2.Value + s3.Value + s4.Value
FROM Sequential s1
LEFT OUTER JOIN Sequential s2 ON s2.ID = s1.ID - 1
LEFT OUTER JOIN Sequential s3 ON s3.ID = s2.ID - 1
LEFT OUTER JOIN Sequential s4 ON s4.ID = s3.ID - 1

Finding top n for each unique row

I'm trying to get the top N records for each unique row of data in a table (I'm grouping on columns b,c and d, column a is the unique identifier and column e is the score of which i want the top 1 in this case).
a b c d e
2 38 NULL NULL 141
1 38 NULL NULL 10
1 38 1 NULL 10
2 38 1 NULL 1
1 38 1 8 10
2 38 1 8 1
2 38 16 NULL 140
2 38 16 12 140
e.g. from this data i would like to find the following rows:
a b c d e
2 38 NULL NULL 141
1 38 1 NULL 10
1 38 1 8 10
2 38 16 NULL 140
2 38 16 12 140
can someone please point me in the right direction to solve this?
Your example doesn't show, and you don't explain how you determine which row is the "top" one, so I've put ?????? in the query where you need to provide a ranking column, such as
a desc
for example. In any case, this is exactly what the analytic functions in SQL Server 2005 and later are for.
declare #howmany int = 3;
with TRanked (a,b,c,d,e,rk) as (
select
a,b,c,d,e,
rank() over (
partition by b,c,d
order by ???????
)
from T
)
select a,b,c,d,e
from TRanked
where rk <= #howmany;
The nulls are a pain, but something like this:
select * from table1 t1
where a in (
select top 1 a from table1 t2
where (t1.b = t2.b or (t1.b is null and t2.b is null))
and (t1.c = t2.c or (t1.c is null and t2.c is null))
and (t1.d = t2.d or (t1.d is null and t2.d is null))
order by e desc
)
or better yet:
select * from (
select *, seqno = row_number() over (partition by b, c, d order by e desc)
from table1
) a
where seqno = 1
I believe this will do what you said (extending the idea from here):
select b,c,d,e,
rank() over
(partition by b,c,d order by e desc) "rank"
from t1 where rank < 5
Cheers.