Summing the distinct elements of query result

I have the following three tables representing a tree structure. Every row in #A is an ancestor of zero or more rows in #B. Similarly, every row in #B is an ancestor of zero or more rows in #C. Table #B contains a column value. I need to find the sum of value for all rows in #B that belong to a given ancestor.
For example, consider the following table contents:
CREATE TABLE #A (id varchar(10));
CREATE TABLE #B (id varchar(10), value int);
CREATE TABLE #C (id varchar(10), a_id varchar(10), b_id varchar(10));
INSERT INTO #A(id) VALUES ('A1'), ('A2');
INSERT INTO #B(id, value) VALUES('B1', 41), ('B2', 43), ('B3', 47);
INSERT INTO #C(id, a_id, b_id) VALUES('C1', 'A1', 'B1'), ('C2', 'A1', 'B1'),
('C3', 'A1', 'B2'), ('C4', 'A2', 'B3');
The above content represents the following structure:
A1
|--- B1 (41)
|    |-------- C1
|    |-------- C2
|
|--- B2 (43)
     |-------- C3
A2
|--- B3 (47)
     |-------- C4
The parent-child relationship is defined in an unusual way. Table #B has no column of its own saying which row in #A is its ancestor; all mappings must be derived from table #C. Columns a_id and b_id in #C designate the grandparent row in #A and the parent row in #B respectively. If there is a row Z in #C where a_id is X and b_id is Y, then X is the ancestor of Y and Y is the ancestor of Z. There will be no conflicting mappings in #C.
Problem statement: for a given id A1, find the sum of column value for all rows in #B whose parent is A1. Here A1 has two children, B1 with value 41 and B2 with value 43, so the expected answer is 84.
If I do something like the following:
SELECT SUM(#B.value) FROM #B
INNER JOIN #C ON #B.id = #C.b_id
INNER JOIN #A ON #C.a_id = #A.id
WHERE #A.id = 'A1'
I get 125, i.e. 41 + 41 + 43, instead of 84, because two rows in #C (C1 and C2) map to B1, so B1 is counted twice. I can write the query below to get the values associated with distinct rows in #B, i.e. 41 and 43, but I do not know how to sum the resulting values. Can I get the expected result without creating a temporary table?
SELECT MAX(#B.value) FROM #B
INNER JOIN #C ON #B.id = #C.b_id
INNER JOIN #A ON #C.a_id = #A.id
WHERE #A.id = 'A1'
GROUP BY #B.id;
I am not a SQL expert, so there is probably a very simple solution to this.

You don't need table #A here, because the IDs are in table #C and the values in table #B. That is all you need. No need to join either. Simply select the IDs needed from #C, then use them to select from #B.
select sum(value)
from #B
where id in
(
select b_id
from #C
where a_id = 'A1'
);

You could do this:
SELECT SUM(#B.value)
FROM #B
WHERE EXISTS
(
SELECT NULL FROM #C
INNER JOIN #A ON #C.a_id = #A.id
WHERE #B.id = #C.b_id
AND #A.id = 'A1'
)
This sums only the #B values for which a matching row exists in the other tables.
The result will be: 84

A slightly different approach.
SELECT SUM(#B.value) FROM #B
INNER JOIN (
SELECT DISTINCT a_id, b_id FROM #C
) temp
ON
temp.b_id=#B.ID
WHERE temp.a_id='A1';
This has the advantage that you can change the WHERE temp.a_id='A1' to GROUP BY temp.a
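As a rough sketch of that variation: the sentence above is cut off, so the grouping column used here (a_id) is an assumption about what was meant. Using the sample data from the question, this returns one sum per ancestor instead of filtering on a single one.

```sql
-- Sample data from the question (temp tables).
CREATE TABLE #A (id varchar(10));
CREATE TABLE #B (id varchar(10), value int);
CREATE TABLE #C (id varchar(10), a_id varchar(10), b_id varchar(10));
INSERT INTO #A (id) VALUES ('A1'), ('A2');
INSERT INTO #B (id, value) VALUES ('B1', 41), ('B2', 43), ('B3', 47);
INSERT INTO #C (id, a_id, b_id) VALUES
    ('C1', 'A1', 'B1'), ('C2', 'A1', 'B1'),
    ('C3', 'A1', 'B2'), ('C4', 'A2', 'B3');

-- Collapse #C to distinct (a_id, b_id) pairs first, so each #B row
-- is counted once per ancestor, then aggregate per ancestor.
-- Assumption: a_id is the intended grouping column.
SELECT pairs.a_id, SUM(b.value) AS total_value
FROM #B AS b
INNER JOIN (SELECT DISTINCT a_id, b_id FROM #C) AS pairs
    ON pairs.b_id = b.id
GROUP BY pairs.a_id
ORDER BY pairs.a_id;
-- With the sample data: A1 -> 84, A2 -> 47
```

The DISTINCT inside the derived table is what removes the double count of B1; the GROUP BY only decides how the de-duplicated rows are bucketed.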

Related

set value using a conditional of a subquery

Sorry if I am not explaining my issue the best, but basically I have two tables.
Table A has a reference column to table B. On table B there is column X where for each referenced row, there is an unreferenced row with that same value of column X (table B has double the rows of table A). I want to update the reference on table A to be the row of table B that is not currently referenced of the two rows that have the same value on column X.
In pseudo code...
update tableA
set refCol = (select tableB.refCol
from tableB
where colX = (select colX
from tableB
where tableB.refCol = tableA.refCol)
and tableB.refCol != tableA.refCol)
The innermost query returns two rows, the outer query returns one row
Sample tables:
Table A
refCol
------
1
3
Table B
refCol  colX
------  -----
1       hello
2       hello
3       hi
4       hi
Expected output:
Table A
refCol
------
2
4
Any help would be much appreciated.

Refer to the working example below.
create table #tableA(
id int)
create table #tableB(
id int,
name varchar(10)
)
insert into #tableA values(1)
insert into #tableA values(3)
insert into #tableA values(5)
insert into #tableA values(6)
insert into #tableA values(7)
insert into #tableA values(8)
insert into #tableB values (1,'A')
insert into #tableB values (2,'A')
insert into #tableB values (3,'C')
insert into #tableB values (4,'C')
select * from #tableA
select * from #tableB
update aa
set aa.id = ab.id
from #tableA aa
inner join (
select b.id, b.name, a.id as ta
from (select b.* from #tableB b left join #tableA a on a.id = b.id where a.id is null) b -- rows of #tableB not referenced by #tableA
inner join (select b.* from #tableA a inner join #tableB b on a.id = b.id) a -- rows of #tableB currently referenced
on a.name = b.name
) ab on aa.id = ab.ta
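A shorter variant of the same idea is sketched below. It assumes, as the question states, that every value in the name column appears on exactly two rows of #tableB and that only one of them is referenced; if a value could appear on more rows, the join would match several partners and the result would be unpredictable. The aliases cur and partner are just illustrative names.

```sql
-- Reduced sample data matching the question's example.
CREATE TABLE #tableA (id int);
CREATE TABLE #tableB (id int, name varchar(10));
INSERT INTO #tableA (id) VALUES (1), (3);
INSERT INTO #tableB (id, name) VALUES (1, 'A'), (2, 'A'), (3, 'C'), (4, 'C');

-- cur     = the #tableB row currently referenced by #tableA
-- partner = the other #tableB row sharing the same name
UPDATE a
SET a.id = partner.id
FROM #tableA AS a
INNER JOIN #tableB AS cur
    ON cur.id = a.id
INNER JOIN #tableB AS partner
    ON partner.name = cur.name
   AND partner.id <> cur.id;

SELECT id FROM #tableA ORDER BY id;
-- With the sample data: 2 and 4
```

Rows in #tableA with no match in #tableB are left untouched, because the inner joins simply produce no row for them.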

T-SQL output inserted clause - access data not in the inserted/deleted tables

I want to collect a value from the source table of a SELECT statement used in an INSERT statement, that is NOT inserted into the target table
I am using Microsoft SQL Server 2017
I think the following code explains what I'm trying to do: Just cut and paste into SSMS to reproduce the error
DECLARE @CrossRef TABLE (
MyTable_ID INT,
C_C VARCHAR(10)
);
DECLARE @MyData TABLE (
A VARCHAR(10),
B VARCHAR(10),
C VARCHAR(10) );
INSERT INTO @MyData (A, B, C)
VALUES ('A1', 'B1', 'C1'), ('A2', 'B2', 'C2'),('A3', 'B3', 'C3');
DECLARE @MyTable TABLE (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
A VARCHAR(10),
B VARCHAR(10) );
INSERT INTO @MyTable (A, B)
OUTPUT INSERTED.Id, MyData.C
INTO @CrossRef (MyTable_ID, C_C)
SELECT A, B
FROM @MyData AS MyData
-- Error: The multi-part identifier "MyData.C" could not be bound.
-- DESIRED OUTPUT
SELECT * FROM @MyTable
/*
ID A B
----------
1 A1 B1
2 A2 B2
3 A3 B3
*/
SELECT * FROM @CrossRef
/*
MyTable_ID C_C
---------------
1 C1
2 C2
3 C3
*/
The OUTPUT clause cannot access anything not in the INSERTED or DELETED internal tables - which is the cause of the error.
However this example Microsoft T-SQL OUTPUT CLAUSE (albeit about DELETED) seems to suggest you can access other tables.
Note - The example has been highly simplified to make the issue as clear as possible
It may seem trivial to get the desired output by other means, but like anything in production the real situation is much more complex

Using the MERGE statement, as suggested by Tab Alleman, here is the solution:
DECLARE @CrossRef TABLE (
MyTable_ID INT,
C_C VARCHAR(10)
);
DECLARE @MyData TABLE (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
A VARCHAR(10),
B VARCHAR(10),
C VARCHAR(10) );
INSERT INTO @MyData (A, B, C)
VALUES ('A1', 'B1', 'C1'), ('A2', 'B2', 'C2'),('A3', 'B3', 'C3');
DECLARE @MyTable TABLE (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
A VARCHAR(10),
B VARCHAR(10) );
-- MERGE statement does UPDATE where join condition exists and INSERT where it does not
MERGE @MyTable
USING (SELECT A, B, C FROM @MyData) AS [Source]
ON (1=0) -- join never true so everything inserted, nothing updated
WHEN NOT MATCHED THEN
INSERT (A, B)
VALUES ([Source].A, [Source].B)
OUTPUT INSERTED.Id, [Source].C
INTO @CrossRef (MyTable_ID, C_C);
SELECT * FROM @MyData
SELECT * FROM @MyTable
SELECT * FROM @CrossRef

Left join with nearest value without duplicates

I want to achieve something like the below in MS SQL, using two tables and a join instead of iteration.
For each row in table A, I want to find its nearest value in table B, and once a value has been selected, it cannot be re-used. Please help if you've done something like this before. Thank you in advance! #SOreadyToAsk

Below is a set-based solution using CTEs and windowing functions.
The ranked_matches CTE assigns a closest match rank for each row in TableA along with a closest match rank for each row in TableB, using the index value as a tie breaker.
The best_matches CTE returns rows from ranked_matches that have the best rank (rank value 1) for both rankings.
Finally, the outer query uses a LEFT JOIN from TableA to the best_matches CTE to include the TableA rows that were not assigned a best match because their closest match was already assigned.
Note that this does not return a match for the index 3 TableA row indicated in your sample results. The closest match for this row is TableB index 3, a difference of 83. However, that TableB row is a closer match to the TableA index 2 row, a difference of 14, so it was already assigned. Please clarify your question if this isn't what you want. I think this technique can be tweaked accordingly.
CREATE TABLE dbo.TableA(
[index] int NOT NULL
CONSTRAINT PK_TableA PRIMARY KEY
, value int
);
CREATE TABLE dbo.TableB(
[index] int NOT NULL
CONSTRAINT PK_TableB PRIMARY KEY
, value int
);
INSERT INTO dbo.TableA
( [index], value )
VALUES ( 1, 123 ),
( 2, 245 ),
( 3, 342 ),
( 4, 456 ),
( 5, 608 );
INSERT INTO dbo.TableB
( [index], value )
VALUES ( 1, 152 ),
( 2, 159 ),
( 3, 259 );
WITH
ranked_matches AS (
SELECT
a.[index] AS a_index
, a.value AS a_value
, b.[index] b_index
, b.value AS b_value
, RANK() OVER(PARTITION BY a.[index] ORDER BY ABS(a.Value - b.value), b.[index]) AS a_match_rank
, RANK() OVER(PARTITION BY b.[index] ORDER BY ABS(a.Value - b.value), a.[index]) AS b_match_rank
FROM dbo.TableA AS a
CROSS JOIN dbo.TableB AS b
)
, best_matches AS (
SELECT
a_index
, a_value
, b_index
, b_value
FROM ranked_matches
WHERE
a_match_rank = 1
AND b_match_rank= 1
)
SELECT
TableA.[index] AS a_index
, TableA.value AS a_value
, best_matches.b_index
, best_matches.b_value
FROM dbo.TableA
LEFT JOIN best_matches ON
best_matches.a_index = TableA.[index]
ORDER BY
TableA.[index];
EDIT:
Although this method uses CTEs, recursion is not used and is therefore not limited to 32K recursions. There may be room for improvement here from a performance perspective, though.

I don't think it is possible without a cursor.
Even if it is possible to do it without a cursor, it would definitely require self-joins, maybe more than one. As a result, performance is likely to be poor, probably worse than a straightforward cursor, and the logic would be hard to understand and to maintain later. Sometimes cursors are useful.
The main difficulty is this part of the question:
when value has been selected, that value cannot re-used.
There was a similar question just a few days ago.
The logic is straightforward. The cursor loops through all rows of table A and with each iteration adds one row to the temporary destination table. To determine the value to add, I use the EXCEPT operator, which takes all values from table B and removes from them all values that have been used before. My solution assumes that there are no duplicate values in table B, because EXCEPT removes duplicates. If values in table B are not unique, the temporary table would hold the unique indexB instead of valueB, but the main logic remains the same.
Here is SQL Fiddle.
Sample data
DECLARE @TA TABLE (idx int, value int);
INSERT INTO @TA (idx, value) VALUES
(1, 123),
(2, 245),
(3, 342),
(4, 456),
(5, 608);
DECLARE @TB TABLE (idx int, value int);
INSERT INTO @TB (idx, value) VALUES
(1, 152),
(2, 159),
(3, 259);
The main query inserts the result into the table variable @TDst. It is possible to write that INSERT without the explicit variable @CurrValueB, but it looks a bit cleaner with the variable.
DECLARE @TDst TABLE (idx int, valueA int, valueB int);
DECLARE @CurrIdx int;
DECLARE @CurrValueA int;
DECLARE @CurrValueB int;
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM @TA
ORDER BY idx;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
SET @CurrValueB =
(
SELECT TOP(1) Diff.valueB
FROM
(
SELECT B.value AS valueB
FROM @TB AS B
EXCEPT -- remove values that have been selected before
SELECT Dst.valueB
FROM @TDst AS Dst
) AS Diff
ORDER BY ABS(Diff.valueB - @CurrValueA)
);
INSERT INTO @TDst (idx, valueA, valueB)
VALUES (@CurrIdx, @CurrValueA, @CurrValueB);
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT * FROM @TDst ORDER BY idx;
Result
idx valueA valueB
1 123 152
2 245 259
3 342 159
4 456 NULL
5 608 NULL
It would help to have the following indexes:
TableA - (idx) include (value), because we SELECT idx, value ORDER BY idx;
TableB - (value) unique; temp destination table - (valueB) unique, filtered NOT NULL, to help EXCEPT. So it may be better to use a temporary #table (or a permanent table) for the result instead of a table variable, because indexes can't be added to a table variable with CREATE INDEX.
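The index suggestions above could look roughly like the following if the three tables were temporary tables rather than table variables. This is only a sketch: the temp table names (#TA, #TB, #TDst) and the index names are made up for illustration, and the cursor code would have to reference these tables instead of @TA, @TB and @TDst.

```sql
-- Hypothetical temp-table versions of the table variables used above.
CREATE TABLE #TA (idx int NOT NULL, value int NOT NULL);
CREATE TABLE #TB (idx int NOT NULL, value int NOT NULL);
CREATE TABLE #TDst (idx int NOT NULL, valueA int NOT NULL, valueB int NULL);

-- Covers the cursor's SELECT idx, value ... ORDER BY idx.
CREATE INDEX IX_TA_idx ON #TA (idx) INCLUDE (value);

-- Values in table B are assumed to be unique.
CREATE UNIQUE INDEX IX_TB_value ON #TB (value);

-- Unique only over non-NULL values, because rows of A that find
-- no remaining match are stored with valueB = NULL.
CREATE UNIQUE INDEX IX_TDst_valueB ON #TDst (valueB)
    WHERE valueB IS NOT NULL;
```

Whether these indexes pay off depends on the row counts; for a handful of rows like the sample data they are unlikely to matter.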
Another possible method would be to delete a row from table B (from original or from a copy) as its value is inserted into result. In this method we can avoid performing EXCEPT again and again and it could be faster overall, especially if it is OK to leave table B empty in the end. Still, I don't see how to avoid cursor and processing individual rows in sequence.
SQL Fiddle
DECLARE @TDst TABLE (idx int, valueA int, valueB int);
DECLARE @CurrIdx int;
DECLARE @CurrValueA int;
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM @TA
ORDER BY idx;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
WITH
CTE
AS
(
SELECT TOP(1) B.idx, B.value
FROM @TB AS B
ORDER BY ABS(B.value - @CurrValueA)
)
DELETE FROM CTE
OUTPUT @CurrIdx, @CurrValueA, deleted.value INTO @TDst;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT
A.idx
,A.value AS valueA
,Dst.valueB
FROM
@TA AS A
LEFT JOIN @TDst AS Dst ON Dst.idx = A.idx
ORDER BY idx;

I strongly believe THIS IS NOT GOOD PRACTICE, because it bypasses the rule SQL Server enforces that functions must not have side effects (INSERT, UPDATE, DELETE). But because I wanted to solve this without resorting to iteration, I came up with the following, and it gave me a better view of things.
create table tablea
(
num INT,
val MONEY
)
create table tableb
(
num INT,
val MONEY
)
I created a permanent working table, temp_tableb, which I drop and recreate as needed.
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
I created a function that executes xp_cmdshell (this is where the side-effect bypassing happens)
CREATE FUNCTION [dbo].[GetNearestMatch]
(
@ParamValue MONEY
)
RETURNS MONEY
AS
BEGIN
DECLARE @ReturnNum MONEY
, @ID INT
SELECT TOP 1
@ID = num
, @ReturnNum = val
FROM temp_tableb ORDER BY ABS(val - @ParamValue)
DECLARE @SQL varchar(500)
SELECT @SQL = 'osql -S' + @@servername + ' -E -q "delete from test..temp_tableb where num = ' + CONVERT(NVARCHAR(150),@ID) + ' "'
EXEC master..xp_cmdshell @SQL
RETURN @ReturnNum
END
and my usage in my query simply looks like this.
-- initialize temp
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
-- query nearest match
select
*
, dbo.GetNearestMatch(a.val) AS [NearestValue]
from tablea a
and gave me this..

Find Groups that don't contain all records

I feel like I should be able to get this and I'm just having a brain fart. I've simplified the problem to the following example:
DECLARE @A TABLE (ID int);
DECLARE @B TABLE (GroupID char(1), ID int);
INSERT @A VALUES (1);
INSERT @A VALUES (2);
INSERT @A VALUES (3);
INSERT @B VALUES ('X', 1);
INSERT @B VALUES ('X', 2);
INSERT @B VALUES ('X', 3);
INSERT @B VALUES ('Y', 1);
INSERT @B VALUES ('Y', 2);
INSERT @B VALUES ('Z', 1);
INSERT @B VALUES ('Z', 2);
INSERT @B VALUES ('Z', 3);
INSERT @B VALUES ('Z', 4);
So table A contains a set of some records. Table B contains multiple copies of the set contained in A with Group IDs. But some of those groups may be missing one or more records of the set. I want to find the groups that are missing records. So in the above example, my results should be:
GroupID
-------
Y
But for some reason I just can't wrap my head around this, today. Any help would be appreciated.

Awesome use-case for relational division! (Here's a must-read blog post about it)
SELECT DISTINCT b1.GroupID
FROM @B b1
WHERE EXISTS (
SELECT 1
FROM @A a
WHERE NOT EXISTS (
SELECT 1
FROM @B b2
WHERE b1.GroupID = b2.GroupID
AND b2.ID = a.ID
)
);
How to read this?
I want all distinct GroupIDs in @B for which there is a record in @A that has no record in @B with the same ID in that group.
In fact, this is the "remainder" of the relational division.
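The same "which groups are missing something" test can also be written with EXCEPT, which some people find easier to read than the double EXISTS. This is just an alternative sketch over the question's sample data, not a replacement for the query above; the alias g is an illustrative name.

```sql
-- Sample data from the question.
DECLARE @A TABLE (ID int);
DECLARE @B TABLE (GroupID char(1), ID int);
INSERT @A VALUES (1), (2), (3);
INSERT @B VALUES ('X', 1), ('X', 2), ('X', 3),
                 ('Y', 1), ('Y', 2),
                 ('Z', 1), ('Z', 2), ('Z', 3), ('Z', 4);

-- For each group, take the full set of IDs in @A and subtract the IDs
-- the group actually has. If anything is left over, the group is incomplete.
SELECT g.GroupID
FROM (SELECT DISTINCT GroupID FROM @B) AS g
WHERE EXISTS (
    SELECT a.ID FROM @A AS a
    EXCEPT
    SELECT b.ID FROM @B AS b WHERE b.GroupID = g.GroupID
);
-- With the sample data: Y
```

Group Z has an extra ID (4) that is not in @A, but that does not matter here, because only IDs from @A that are absent from the group are considered.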

Try this:
SELECT b.GroupID, COUNT(b.GroupID)
FROM @A a INNER JOIN @B b
ON a.ID = b.ID
GROUP BY b.GroupID
HAVING COUNT(b.GroupID) < (SELECT COUNT(*) FROM @A)

This will give you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from @A a
cross join @B b) FullList
left join @B b
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID

This will give you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from @A a
cross join (select distinct db.GroupId from @B db) b
) as FullList
left join @B b
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID

SQL-Query to get the first level child details from a single table based on certain conditions

A table in my database holds data as below,
TBLlocations
-------------------------------------------------------
LocationId LocationName RegisteredUnder Type
--------------------------------------------------------
LOC100 Location1 0 0
LOC201 Location2 LOC100 2
LOC102 Location3 LOC201 1
LOC302 Location4 LOC201 1
LOC103 Location5 LOC201 1
LOC104 Location6 LOC201 1
LOC105 Location7 LOC104 1
LOC106 Location8 LOC105 1
LOC107 Location9 LOC106 1
Now I have to select the first-level locations from the above table, i.e. the locations whose type is '1' and that are the first level of child locations with type '1'. In the above table, Locations 3 to 6 are the first-level locations, so the query should return the following:
---------------
Location3
Location4
Location5
Location6
I tried to join the same table providing a condition for the 'Type'.
This is the query I built:
Select Distinct t1.LocationId,t1.LocationName,t1.RegisteredUnder from TBLlocations t1
join TBLlocations t2 on t2.RegisteredUnder!=t1.LocationId
where t1.Type='1' and t2.Type='1'
order by t1.RegisteredUnder
The above query returned all the locations under type '1' as shown below:
--------------------------------------------------
LocationId LocationName RegisteredUnder
--------------------------------------------------
LOC102 Location3 LOC201
LOC302 Location4 LOC201
LOC103 Location5 LOC201
LOC104 Location6 LOC201
LOC105 Location7 LOC104
LOC106 Location8 LOC105
LOC107 Location9 LOC106
Hence, I need a query that returns exactly the result shown earlier. The only parameter I can use in the query is 'Type', and it is always '1'.
PS: I am using SQL Server 2008.

After the question changed:
Declare @a table (LocationId Varchar(100), LocationName Varchar(100), RegisteredUnder Varchar(100), Type int)
Insert into @a Values('LOC100','Location1','0',0)
Insert into @a Values('LOC201','Location2','LOC100',2)
Insert into @a Values('LOC102','Location3','LOC201',1)
Insert into @a Values('LOC302','Location4','LOC201',1)
Insert into @a Values('LOC103','Location5','LOC201',1)
Insert into @a Values('LOC104','Location6','LOC201',1)
Insert into @a Values('LOC105','Location7','LOC104',1)
Insert into @a Values('LOC106','Location8','LOC105',1)
Insert into @a Values('LOC107','Location9','LOC106',1)
;With CTE as
(
Select 0 as level,* from @a where Type=1
UNION ALL
Select c.Level+1, a.* from @a a
join CTE c on c.LocationId=a.RegisteredUnder and a.Type=1
)
Select c1.* from CTE c1
Left Join CTE c2 on c2.LocationId=c1.LocationId and c2.level>0
where c2.LocationId is NULL
order by LEVEL desc,LocationName
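If only the first level is needed, recursion may not be necessary. Here is a minimal sketch, assuming "first level" means a type-1 row whose parent (RegisteredUnder) is not itself a type-1 row. The table variable name @loc and the aliases c and p are chosen here for illustration only.

```sql
-- Sample data from the question.
DECLARE @loc TABLE (LocationId varchar(100), LocationName varchar(100),
                    RegisteredUnder varchar(100), Type int);
INSERT INTO @loc VALUES
    ('LOC100', 'Location1', '0',      0),
    ('LOC201', 'Location2', 'LOC100', 2),
    ('LOC102', 'Location3', 'LOC201', 1),
    ('LOC302', 'Location4', 'LOC201', 1),
    ('LOC103', 'Location5', 'LOC201', 1),
    ('LOC104', 'Location6', 'LOC201', 1),
    ('LOC105', 'Location7', 'LOC104', 1),
    ('LOC106', 'Location8', 'LOC105', 1),
    ('LOC107', 'Location9', 'LOC106', 1);

-- Keep type-1 rows (c) that have no type-1 parent row (p).
SELECT c.LocationId, c.LocationName, c.RegisteredUnder
FROM @loc AS c
WHERE c.Type = 1
  AND NOT EXISTS (
      SELECT 1
      FROM @loc AS p
      WHERE p.LocationId = c.RegisteredUnder
        AND p.Type = 1
  )
ORDER BY c.LocationName;
-- With the sample data: Location3, Location4, Location5, Location6
```

The multi-row VALUES list works on SQL Server 2008 and later, which matches the version mentioned in the question.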
The answer before the question changed:
Declare @a table (LocationId Varchar(100), LocationName Varchar(100), RegisteredUnder Varchar(100), Type int)
Insert into @a Values('LOC100','Location1','0',0)
Insert into @a Values('LOC201','Location2','LOC100',2)
Insert into @a Values('LOC102','Location3','LOC201',1)
Insert into @a Values('LOC302','Location4','LOC201',1)
Insert into @a Values('LOC103','Location5','LOC201',1)
Insert into @a Values('LOC104','Location6','LOC201',1)
Insert into @a Values('LOC105','Location7','LOC104',1)
Insert into @a Values('LOC106','Location8','LOC105',1)
Insert into @a Values('LOC107','Location9','LOC106',1)
;With CTE as
(
Select 0 as level,* from @a where RegisteredUnder='LOC201'
UNION ALL
Select c.Level+1, a.* from @a a
join CTE c on c.RegisteredUnder=a.LocationId
)
Select DISTINCT * from CTE
where level<2
order by LEVEL desc, LocationName
