Aggregate and count distinct values with a join


This question is a mix of those duplicates:
Get unique values using STRING_AGG in SQL Server
SQL/mysql - Select distinct/UNIQUE but return all columns?
I just can't manage to make it work all at once.
I have two tables :
TABLE A
IntervalId Starts Ends
-------------------------
1 0 10
2 10 25
3 25 32
4 32 40
TABLE B
Id ErrorType Starts Ends
----------------------------------------
1 666 0 25
2 666 10 32
3 777 0 32
4 666 25 40
5 777 10 25
Starting from the time intervals in table B, I'm trying to count and list, for each interval of table A, the error types that might have happened during that interval, with duplicates removed.
Please note that there isn't any Start or End in table B that does not exist in Table A (Table A is generated from them).
The result with duplicates would be this:
Starts Ends ErrorsCount Errors
-----------------------------------------------
0 10 2 666, 777
10 25 4 666, 666, 777, 777
25 32 3 666, 777, 666
32 40 1 666
The result I'm looking for is without duplicates:
Starts Ends DistinctErrorsCnt DistinctErrors
-----------------------------------------------
0 10 2 666, 777
10 25 2 666, 777
25 32 2 666, 777
32 40 1 666
Here is my attempt, but I can't understand how to get ErrorType out of the part that does the "distinct" without SQL Server complaining that it's not in an aggregate or GROUP BY. And as soon as I put it into a GROUP BY, all the different error types are wiped out by the first one that comes around: I end up with only 666 everywhere.
SELECT
IntervalId,
Starts,
Ends,
COUNT([TableB].ErrorType) as DistinctErrorsCnt,
DistinctErrors= STRING_AGG([TableB].ErrorType, ',')
FROM
(
SELECT DISTINCT
[TableA].IntervalId,
FROM TableB LEFT JOIN TableA ON
(
[TableA].Starts= [TableB].Starts
OR [TableA].Ends = [TableB].Ends
OR ([TableA].Starts >= [TableB].Starts AND [TableA].Ends <= [TableB].Ends)
)
GROUP BY
[TableA].IntervalId,
[TableA].Starts,
[TableA].Ends,
) NoDuplicates
GROUP BY
NoDuplicates.IntervalId,
NoDuplicates.Starts,
NoDuplicates.Starts
Again: This is not syntactically correct, for the reason I explained above.

You can use aggregation:
select
a.starts,
a.ends,
count(distinct b.errorType) DistinctErrorsCnt,
string_agg(b.errorType, ', ') within group(order by b.starts) DistinctErrors
from tablea a
-- match every error whose time range fully covers the interval
inner join tableb b on b.starts <= a.starts and b.ends >= a.ends
group by a.intervalId, a.starts, a.ends
If you want to avoid duplicates, you could use a subquery, or better yet, cross apply:
select
a.starts,
a.ends,
count(*) DistinctErrorsCnt,
string_agg(b.errorType, ', ') within group(order by b.errorType) DistinctErrors
from tablea a
cross apply (
-- one row per distinct error type whose time range covers the interval
select distinct errorType from tableb b where b.starts <= a.starts and b.ends >= a.ends
) b
group by a.intervalId, a.starts, a.ends
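
The "de-duplicate first, then aggregate" idea can be checked against the sample data from the question. The following is only a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. SQLite has no STRING_AGG ... WITHIN GROUP, so the distinct rows are selected in SQL and joined into a string in Python; the helper names build_db and distinct_errors are made up for this illustration.

```python
import sqlite3
from itertools import groupby


def build_db():
    # Sample data copied from the question (TABLE A and TABLE B).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (IntervalId INTEGER, Starts INTEGER, Ends INTEGER);
        CREATE TABLE TableB (Id INTEGER, ErrorType INTEGER, Starts INTEGER, Ends INTEGER);
        INSERT INTO TableA VALUES (1, 0, 10), (2, 10, 25), (3, 25, 32), (4, 32, 40);
        INSERT INTO TableB VALUES
            (1, 666, 0, 25), (2, 666, 10, 32), (3, 777, 0, 32),
            (4, 666, 25, 40), (5, 777, 10, 25);
    """)
    return conn


def distinct_errors(conn):
    # DISTINCT removes repeated (interval, error type) pairs before any counting.
    rows = conn.execute("""
        SELECT DISTINCT a.IntervalId, a.Starts, a.Ends, b.ErrorType
        FROM TableA a
        JOIN TableB b ON b.Starts <= a.Starts AND b.Ends >= a.Ends
        ORDER BY a.IntervalId, b.ErrorType
    """).fetchall()
    result = []
    # Rows arrive sorted by interval, so groupby sees each interval exactly once.
    for (_, starts, ends), grp in groupby(rows, key=lambda r: r[:3]):
        errors = [str(r[3]) for r in grp]
        result.append((starts, ends, len(errors), ", ".join(errors)))
    return result


if __name__ == "__main__":
    for row in distinct_errors(build_db()):
        print(row)
```

On the sample data this yields one row per interval with two distinct error types for the first three intervals and one for the last, matching the desired result in the question.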

Related

Nested sum loop until foreign key 'dies out'

I am pulling my hair out over a data retrieval function I'm trying to write. In essence this query is meant to SUM up the count of all voorwerpnummers in the Voorwerp_in_Rubriek table, grouped by their rubrieknummer gathered from Rubriek.
After that I want to keep rolling the sums upward until they reach their 'top level parent'. Rubriek has a foreign key reference to itself, 'hoofdrubriek', which is easiest to think of as its parent in a category tree.
This also means categories can be nested. A value of NULL in the hoofdrubriek column means that the row is a top-level parent. The idea behind this query is to SUM up the count of voorwerpnummers in Voorwerp_in_rubriek, and add them together until they are at their 'top level parent'.
As the database and test data are quite large, I've decided not to paste the code directly into this question and to link a dbfiddle instead, so there's more structure.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=8068a52da6a29afffe6dc793398f0998
I got it working in some degree using this query:
SELECT R2.hoofdrubriek ,
COUNT(Vr.rubrieknummer) AS aantal
FROM Rubriek R1
RIGHT OUTER JOIN Rubriek R2 ON R1.rubrieknummer = R2.hoofdrubriek
INNER JOIN Voorwerp_in_rubriek Vr ON R2.rubrieknummer = Vr.rubrieknummer
WHERE NOT EXISTS ( SELECT *
FROM Rubriek
WHERE hoofdrubriek = R2.rubrieknummer )
AND R1.hoofdrubriek IS NOT NULL
GROUP BY Vr.rubrieknummer ,
R2.hoofdrubriek
But that doesn't get back all items and flops in general. I hope someone can help me.

If I understood the requirement correctly:
-- direct item count per category, stored in a table variable
declare @t table (
rubrieknummer int,
cnt int);
INSERT @t(rubrieknummer, cnt)
SELECT R.rubrieknummer, COUNT(Vr.voorwerpnummer)
FROM Rubriek R
INNER JOIN voorwerp_in_rubriek Vr ON R.rubrieknummer = Vr.rubrieknummer
GROUP BY Vr.rubrieknummer, R.rubrieknummer;
--select * from @t;
-- recursive CTE: pass each count up to the parent category, level by level
with t as(
select rubrieknummer, cnt
from @t
union all
select r.hoofdrubriek, cnt
from t
join Rubriek r on t.rubrieknummer = r.rubrieknummer
)
select rubrieknummer, sum(cnt) cnt
from t
group by rubrieknummer;
Applied to your fiddle data, this returns:
rubrieknummer cnt
<null> 42
100 42
101 26
102 6
103 10
10000 8
10100 4
10101 1
10102 3
10500 4
10501 2
10502 2
15000 18
15100 6
15101 2
15102 2
15103 2
15500 12
15501 4
15502 3
15503 5
20000 6
20001 2
20002 1
20003 1
20004 2
25000 4
25001 1
25002 1
25003 1
25004 1
30001 2
30002 1
30004 3
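
The roll-up technique in this answer can be tried on a much smaller tree. This is a hedged sketch, not the fiddle's schema: it assumes SQLite (via Python's sqlite3) instead of SQL Server, a four-node toy tree invented for the illustration, and an extra filter that stops at top-level categories so no NULL row is produced. The helper names build_db and rolled_up_counts are hypothetical.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Toy tree (made up): 1 is the root, 2 and 3 are its children, 4 is a child of 2.
        CREATE TABLE Rubriek (rubrieknummer INTEGER, hoofdrubriek INTEGER);
        CREATE TABLE Voorwerp_in_rubriek (voorwerpnummer INTEGER, rubrieknummer INTEGER);
        INSERT INTO Rubriek VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
        INSERT INTO Voorwerp_in_rubriek VALUES (100, 4), (101, 4), (102, 3), (103, 2);
    """)
    return conn


def rolled_up_counts(conn):
    return conn.execute("""
        WITH RECURSIVE
        direct(rubrieknummer, cnt) AS (
            -- items attached directly to each category
            SELECT rubrieknummer, COUNT(*)
            FROM Voorwerp_in_rubriek
            GROUP BY rubrieknummer
        ),
        rolled(rubrieknummer, cnt) AS (
            SELECT rubrieknummer, cnt FROM direct
            UNION ALL
            -- hand every count one level up, until the parent is NULL
            SELECT r.hoofdrubriek, t.cnt
            FROM rolled t
            JOIN Rubriek r ON r.rubrieknummer = t.rubrieknummer
            WHERE r.hoofdrubriek IS NOT NULL
        )
        SELECT rubrieknummer, SUM(cnt)
        FROM rolled
        GROUP BY rubrieknummer
        ORDER BY rubrieknummer
    """).fetchall()


if __name__ == "__main__":
    print(rolled_up_counts(build_db()))
```

Each category ends up with its own items plus those of all its descendants, so the root's total equals the number of rows in Voorwerp_in_rubriek.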

Need to find the count of user who belongs to different depts

I have a table with dept, user and other columns. I need to count the users that belong to each combination of departments.
Let's say I have a table like this:
dept user
1 33
1 33
1 45
2 11
2 12
3 33
3 15
First I have to find the unique user and dept combinations, something like this:
select distinct dept,user from x;
Which will give me a result like:
Dept user
1 33
1 45
2 11
2 12
3 33
3 15
This removes the duplicate combinations.
And here's what I need to do.
My output should look like this:
dep_1_1 dep_1_2 dep_1_3 dep_2_2 dep_2_1 dep_2_3 Dep_3_1 Dep_3_2 Dep_3_3
2 0 1 2 0 0 1 0 2
So, basically, I need to find the count of common users between all combinations of departments.
Thanks for the help

You can get a row for each department combination using a self-join of your Distinct Select:
with cte as
(
select distinct dept,user from x
)
select t1.dept, t2.dept, count(*)
from cte as t1 join cte as t2
on t1.user = t2.user -- same user
and t1.dept < t2.dept -- different department
group by t1.dept, t2.dept
order by t1.dept, t2.dept
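
The self-join above can be run against the question's sample rows. This is a sketch under a few assumptions: SQLite via Python's sqlite3 as the engine, a column named user_id instead of user (which is a reserved word in several databases), and a hypothetical include_same_dept flag that switches the comparison from < to <= so the dep_1_1 style counts also appear.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE x (dept INTEGER, user_id INTEGER);
        -- Sample rows from the question, including the duplicate (1, 33).
        INSERT INTO x VALUES (1, 33), (1, 33), (1, 45), (2, 11), (2, 12), (3, 33), (3, 15);
    """)
    return conn


def common_users(conn, include_same_dept=False):
    # "<" keeps each unordered pair of different departments once;
    # "<=" additionally pairs a department with itself.
    op = "<=" if include_same_dept else "<"
    sql = f"""
        WITH cte AS (SELECT DISTINCT dept, user_id FROM x)
        SELECT t1.dept, t2.dept, COUNT(*)
        FROM cte AS t1
        JOIN cte AS t2
          ON t1.user_id = t2.user_id
         AND t1.dept {op} t2.dept
        GROUP BY t1.dept, t2.dept
        ORDER BY t1.dept, t2.dept
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(common_users(conn))
    print(common_users(conn, include_same_dept=True))
```

Department pairs with no user in common produce no row at all, so the zero columns of the desired output would still have to be filled in by the caller or by a pivot step.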

Query to return all results except for the first record

I have an archive table that holds transaction records per locationId.
A location will have 0, 1 or many rows in this table.
I need a SELECT query that will return rows for any location that has more than 1 row, and to skip the first entry.
e.g.
Transactions table
transactionId locationId amount
1 11 2343
2 11 23434
3 25 342
4 32 234
5 77 234
6 11 38938
7 43 234
8 43 1235
So given the above, since locationIds 11 and 43 have multiple rows, I should get back all of their rows except the first one (lowest transactionId):
2 11 23434
6 11 38938
8 43 1235

You can use row_number to do this. This assumes there are no duplicate transactionid values.
select transactionid,locationid,amount
from
(select t.*, row_number() over(partition by locationid order by transactionid) as rn
from transactions t) t
where rn > 1

The other answer is fine. You could also write it this way, it might give you a little insight into grouping practices:
SELECT Transactions.TransactionID, Transactions.locationID, Transactions.amount
FROM Transactions INNER JOIN
(SELECT locationID, MIN(TransactionID) AS MinTransaction,
COUNT(TransactionID) AS CountTransaction
FROM Transactions
GROUP BY locationID) TableSum ON Transactions.locationID = TableSum.locationID
WHERE (Transactions.TransactionID <> TableSum.MinTransaction) AND
(TableSum.CountTransaction > 1)
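
Both answers can be compared on the sample table. The sketch below assumes SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer) rather than the asker's database; the constant and function names are invented for the illustration. The second query drops the COUNT condition, on the reasoning that a location with a single row has that row as its minimum and is excluded anyway.

```python
import sqlite3

# Variant 1: number the rows per location and skip number 1.
WINDOW_SQL = """
    SELECT transactionId, locationId, amount
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY locationId ORDER BY transactionId) AS rn
        FROM Transactions t
    ) ranked
    WHERE rn > 1
    ORDER BY transactionId
"""

# Variant 2: join to the lowest transactionId per location and exclude it.
MIN_SQL = """
    SELECT t.transactionId, t.locationId, t.amount
    FROM Transactions t
    JOIN (
        SELECT locationId, MIN(transactionId) AS first_id
        FROM Transactions
        GROUP BY locationId
    ) f ON f.locationId = t.locationId
    WHERE t.transactionId <> f.first_id
    ORDER BY t.transactionId
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Transactions (transactionId INTEGER, locationId INTEGER, amount INTEGER);
        INSERT INTO Transactions VALUES
            (1, 11, 2343), (2, 11, 23434), (3, 25, 342), (4, 32, 234),
            (5, 77, 234), (6, 11, 38938), (7, 43, 234), (8, 43, 1235);
    """)
    return conn


def all_but_first(conn, sql=WINDOW_SQL):
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(all_but_first(conn, WINDOW_SQL))
    print(all_but_first(conn, MIN_SQL))
```

Both variants return the same three rows for the sample data, which are the rows listed as expected in the question.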

How to get only the rows whose D column holds the nearest lower number to the C column?

------------------------------------------
ID Name C D
------------------------------------------
1 AK-47 10 5
2 RPG 10 20
3 Mp5 20 15
4 Sniper 20 18
5 Tank 90 80
6 Space12 90 20
7 Rifle 90 110
8 Knife 90 85
Consider IDs 1,2 ; 3,4 ; 5,6,7,8 as separate groups (rows sharing the same C value).
For each group, I need the row whose D column holds the nearest number below the C column.
So the Expected Result is :
------------------------------------------
ID Name C D
------------------------------------------
1 AK-47 10 5
4 Sniper 20 18
8 Knife 90 85
How can I achieve this ?

select t1.*
from your_table t1
join
(
select c, min(abs(c-d)) as near
from your_table
group by c
) t2 on t1.c = t2.c and abs(t1.c-t1.d) = t2.near

Here is the syntax for another way of doing this. This uses a cte and will only hit the base table once.
with MySortedData as
(
select ID, Name, C, D, ROW_NUMBER() over(PARTITION BY C order by ABS(C - D)) as RowNum
from Something
)
select *
from MySortedData
where RowNum = 1
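
Both queries above rank by ABS(C - D), which picks the nearest value on either side of C. If "nearest lower" is meant strictly, one possible variation is to filter on D < C before ranking. The sketch below is an interpretation, not part of either answer; it assumes SQLite via Python's sqlite3 (3.25 or newer for ROW_NUMBER) and reuses the table name your_table from the first answer.

```python
import sqlite3

NEAREST_LOWER_SQL = """
    WITH ranked AS (
        SELECT ID, Name, C, D,
               -- among rows with D below C, the smallest gap C - D ranks first
               ROW_NUMBER() OVER (PARTITION BY C ORDER BY C - D) AS rn
        FROM your_table
        WHERE D < C
    )
    SELECT ID, Name, C, D
    FROM ranked
    WHERE rn = 1
    ORDER BY ID
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE your_table (ID INTEGER, Name TEXT, C INTEGER, D INTEGER);
        INSERT INTO your_table VALUES
            (1, 'AK-47', 10, 5), (2, 'RPG', 10, 20), (3, 'Mp5', 20, 15),
            (4, 'Sniper', 20, 18), (5, 'Tank', 90, 80), (6, 'Space12', 90, 20),
            (7, 'Rifle', 90, 110), (8, 'Knife', 90, 85);
    """)
    return conn


def nearest_lower(conn):
    return conn.execute(NEAREST_LOWER_SQL).fetchall()


if __name__ == "__main__":
    print(nearest_lower(build_db()))
```

On the sample rows the filtered version and the ABS version agree; they would differ only when a value above C is closer than every value below it.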

SELECT clause with SUM condition

I have these tables:
//TEST
NUMBER TOTAL
----------------------------
1 158
2 355
3 455
//TEST1
NUMBER QUANTITY UNITPRICE
--------------------------------------------
1 3 5
1 3 6
1 3 4
2 4 8
3 5 4
I used following query:
SELECT t.NUMBER,sum(t.TOTAL),NVL(SUM(t2.quantity*t2.unitprice),0)
FROM test t INNER JOIN test1 t2 ON t.NUMBER=t2.NUMBER
GROUP BY t.NUMBER;
OUTPUT:
NUMBER SUM(TOTAL) SUM(t2.quantity*t2.unitprice)
-----------------------------------------------------------
1 474 45 <--- only this wrong
2 355 32
It seems the TOTAL for NUMBER 1 is counted three times, once per matching row in TEST1, so 158*3 = 474 ends up in the result.
EXPECTED OUTPUT:
NUMBER SUM(TOTAL) SUM(t2.quantity*t2.unitprice)
-----------------------------------------------------------
1 158 45
2 355 32

You have to understand that the result of your join is something like this:
//TEST1
NUMBER QUANTITY UNITPRICE TOTAL
--------------------------------------------------------------
1 3 5 158
1 3 6 158
1 3 4 158
2 4 8 355
3 5 4 455
It means you don't need to apply a SUM on TOTAL
SELECT t.NUMBER,t.TOTAL,NVL(SUM(t2.quantity*t2.unitprice),0)
FROM test t INNER JOIN test1 t2 ON t.NUMBER=t2.NUMBER
GROUP BY t.NUMBER, t.TOTAL;

Something like this should work using a subquery separating the sums:
select t.num,
sum(t.total),
test1sum
from test t
join (
select num, sum(qty*unitprice) test1sum
from test1
group by num
) t2 on t.num = t2.num
group by t.num, test1sum
In regards to your sample data, you may not even need the additional group by on the test total field. If that table only contains distinct ids, then this would work the same:
select t.num,
t.total,
sum(qty*unitprice)
from test t
join test1 t2 on t.num = t2.num
group by t.num, t.total
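
The row multiplication described in the answers is easy to reproduce. This sketch assumes SQLite via Python's sqlite3 instead of Oracle, and the column names num, qty and unitprice used in the second answer; the function names are made up for the illustration. It runs the inflated query and the pre-aggregated one side by side.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test (num INTEGER, total INTEGER);
        CREATE TABLE test1 (num INTEGER, qty INTEGER, unitprice INTEGER);
        INSERT INTO test VALUES (1, 158), (2, 355), (3, 455);
        INSERT INTO test1 VALUES (1, 3, 5), (1, 3, 6), (1, 3, 4), (2, 4, 8), (3, 5, 4);
    """)
    return conn


def naive_sums(conn):
    # The join repeats test.total once per matching test1 row, so SUM inflates it.
    return conn.execute("""
        SELECT t.num, SUM(t.total), SUM(t2.qty * t2.unitprice)
        FROM test t
        JOIN test1 t2 ON t.num = t2.num
        GROUP BY t.num
        ORDER BY t.num
    """).fetchall()


def preaggregated_sums(conn):
    # Collapse test1 to one row per num first; the join then cannot duplicate totals.
    return conn.execute("""
        SELECT t.num, t.total, s.line_total
        FROM test t
        JOIN (
            SELECT num, SUM(qty * unitprice) AS line_total
            FROM test1
            GROUP BY num
        ) s ON s.num = t.num
        ORDER BY t.num
    """).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(naive_sums(conn))
    print(preaggregated_sums(conn))
```

Only num 1 differs between the two results, because it is the only key with more than one row in test1.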
