SQL Query to Find Most Effective Data

I want to write a query that finds the most effective rows. I have these tables:
Sellers
Id Name
1 Mark
2 Julia
3 Peter
Stocks
Id SellerId ProductCode StockCount
1 1 30A 10
2 2 20A 4
3 1 20A 2
4 3 42B 3
Here is an SQL Fiddle: http://sqlfiddle.com/#!6/fe5b1/1/0
My intent is to find the optimum set of sellers for a requested stock.
If a client wants products 30A, 20A and 42B, I need to return "Mark" and "Peter": Mark has both 30A and 20A, so Julia is not needed.
How can I solve this in SQL?

Got it to work with the help of temporary tables
SELECT
s.SellerId,
ProductList = STUFF((
SELECT ',' + ProductCode FROM Stocks
WHERE s.SellerId = Stocks.SellerId
ORDER BY ProductCode FOR XML PATH('')
)
, 1, 1, ''), COUNT(*) AS numberOfProducts
INTO #tmptable
FROM
Stocks s
WHERE
s.ProductCode IN ('30A','20A','42B')
AND s.StockData > 0
GROUP BY s.SellerId;
/*this second temp table is necessary, so we can delete from one of them*/
SELECT * INTO #tmptable2 FROM #tmptable;
DELETE t1 FROM #tmptable t1
WHERE EXISTS (SELECT 1 FROM #tmptable2 t2
WHERE t1.SellerId != t2.SellerId
AND t2.ProductList LIKE '%' + t1.ProductList + '%'
AND t2.numberOfProducts > t1.numberOfProducts)
;
SELECT Name FROM #tmptable t INNER JOIN Sellers ON t.SellerId = Sellers.Id;
UPDATE:
Please try it with static (permanent) tables instead:
CREATE TABLE tmptable (SellerId int, ProductList nvarchar(max), numberOfProducts int);
Create tmpTable2 the same way, then change the code above to:
INSERT INTO tmpTable
SELECT
s.SellerId,
ProductList = STUFF((
SELECT ',' + ProductCode FROM Stocks
WHERE s.SellerId = Stocks.SellerId
ORDER BY ProductCode FOR XML PATH('')
)
, 1, 1, ''), COUNT(*) AS numberOfProducts
FROM
Stocks s
WHERE
s.ProductCode IN ('30A','20A','42B')
AND s.StockData > 0
GROUP BY s.SellerId;
INSERT INTO tmpTable2 SELECT * FROM tmpTable;
DELETE t1 FROM tmptable t1
WHERE EXISTS (SELECT 1 FROM tmptable2 t2
WHERE t1.SellerId != t2.SellerId
AND t2.ProductList LIKE '%' + t1.ProductList + '%'
AND t2.numberOfProducts > t1.numberOfProducts)
;
SELECT * FROM tmpTable;
DROP TABLE tmpTable, tmpTable2;
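The elimination step above can also be pictured outside SQL. The following is a minimal Python sketch, not part of the original answer, that assumes the four stock rows from the question and treats each seller's offer as a set; the names `stocks`, `wanted` and `useful_sellers` are made up for illustration. A seller is dropped when their set of wanted products is a proper subset of another seller's set. Comparing sets instead of comma-separated strings also covers the case where the smaller list is not a contiguous part of the larger one (for example '20A,42B' inside '20A,30A,42B'), which the LIKE comparison on the concatenated list would miss. Like the SQL above, this only removes sellers that are dominated by a single other seller; it is not a general minimum set cover.

```python
# Stock rows from the question: (seller, product_code, stock_count)
stocks = [
    ("Mark", "30A", 10),
    ("Julia", "20A", 4),
    ("Mark", "20A", 2),
    ("Peter", "42B", 3),
]
wanted = {"30A", "20A", "42B"}


def useful_sellers(stocks, wanted):
    # Collect, per seller, the wanted products they actually have in stock.
    offers = {}
    for seller, code, count in stocks:
        if code in wanted and count > 0:
            offers.setdefault(seller, set()).add(code)
    # Keep a seller unless some other seller offers a strict superset.
    return sorted(
        seller
        for seller, codes in offers.items()
        if not any(codes < other_codes
                   for other, other_codes in offers.items()
                   if other != seller)
    )


print(useful_sellers(stocks, wanted))  # prints ['Mark', 'Peter']
```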

I think this might be what you are looking for?
Select name,sum(stockdata) as stockdata from sellers s1 join Stocks s2 on s1.id=s2.sellerid
where ProductCode in ('30A','20A','42B')
group by name
order by sum(stockdata) desc
I hope it helps.
If you only want the top two sellers, write:
Select top 2 name,sum(stockdata) as stockdata from sellers s1 join Stocks s2 on s1.id=s2.sellerid
where ProductCode in ('30A','20A','42B')
group by name
order by sum(stockdata) desc
I think this is what you are looking for: as I understand it, you want to select the two sellers with the highest total stockdata.

Related

How to combine multiple records from a joined table

I'm out of my depth with this SQL query.
I have two tables A and B with shared data based on a serial number. In table A, there is a unique Serial Nr field, while in B, details relating to a particular serial number are linked vertically over multiple records by a common Group ID. The serial number entry occurs as one of those records in the MyData field. I want to concatenate all records that share the same "Group ID" to a single field in A. For example:
Table A
Serial Nr Name Part Nr
2950 Prod1 1234
2955 Prod2 2345
Table B
Group ID MyData Comments
1 2950 serial nr
1 2016-10 build month
2 2955 serial nr
2 2015-11 build month
and I want Table AxB
Serial Nr Name Part Nr Table B data
2950 Prod1 1234 serial nr, 2950, build month, 2016-10
2955 Prod2 2345 serial nr, 2955, build month, 2015-11
I don't actually want the shared Group ID, but need it as concatenation key.
I have tried to do this with STUFF, but to no avail. Any ideas?

I assume your "TableB" also has a "SerialNr" field (which you just did not model);
That is to say, your Table B actually looks like this:
Serial Nr Group ID MyData Comments
2950 1 2950 serial nr
2950 1 2016-10 build month
2955 2 2955 serial nr
2955 2 2015-11 build month
If so, the following query will aggregate the "Comments" and "MyData" columns into a single row per SerialNumber:
SELECT serialno ,STUFF((SELECT ', ' + Comments + ' - ' + MyData [text()]
FROM TableB
WHERE SerialNo = t.SerialNo
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,' ') AggregatedData
FROM TableB t
GROUP BY serialno
You could then join that query to your original TableA to obtain the result set you posted, ie:
select *
from TableA
join (SELECT serialno ,STUFF((SELECT ', ' + Comments + ' - ' + MyData [text()]
FROM TableB
WHERE SerialNo = t.SerialNo
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,' ') AggregatedData
FROM TableB t
GROUP BY serialno
) AggregatedTableB
on TableA.SerialNo = AggregatedTableB.SerialNo
UPDATED:
OK, so given that your "TableB" doesn't have its own SerialNr column and the serial number is instead buried within the row data, you need a way to extract that row-level value into a column.
Here is a query that can do that:
select tableB.GroupId, MyData, Comments, SerialNo
from tableB
join
(select MyData as serialNo, groupId
from tableb
where Comments ='serial nr') TableBWithSerialNo on tableB.GroupId = TableBWithSerialNo.GroupId
Now that you have this query which adds the Serial No as a column, you can use it in place of just using TableB in the above query. Here's what it would look like:
SELECT SerialNo ,STUFF((SELECT ', ' + Comments + ' - ' + MyData [text()]
FROM
(select tableB.GroupId, MyData, Comments, SerialNo
from tableB
join
(select MyData as serialNo, groupId
from tableb
where Comments ='serial nr') TableBWithSerialNo on tableB.GroupId = TableBWithSerialNo.GroupId
) t1
WHERE t1.SerialNo = t2.SerialNo
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,' ') AggregatedData
FROM (select tableB.GroupId, MyData, Comments, SerialNo
from tableB
join
(select MyData as serialNo, groupId
from tableb
where Comments ='serial nr') TableBWithSerialNo on tableB.GroupId = TableBWithSerialNo.GroupId
) t2
GROUP BY SerialNo
Granted - this is one ugly query - but that's one ugly table you are working with ;-)
If anything, I'd recommend making the first query into a view, and then using that view in the second query - that way you aren't repeating so much code, ie;
create view TableBView as
select tableB.GroupId, MyData, Comments, SerialNo
from tableB
join
(select MyData as serialNo, groupId
from tableb
where Comments ='serial nr') TableBWithSerialNo on tableB.GroupId = TableBWithSerialNo.GroupId
go
SELECT SerialNo ,STUFF((SELECT ', ' + Comments + ' - ' + MyData [text()]
FROM
TableBView t1
WHERE t1.SerialNo = t2.SerialNo
FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)'),1,2,' ') AggregatedData
FROM TableBView t2
GROUP BY SerialNo
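The two steps of the updated approach (first recover the serial number of each group, then concatenate all rows of that group) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are the sample data from the question, every group is assumed to contain exactly one 'serial nr' row, and the names `table_b` and `aggregate_by_serial` are hypothetical. The output format follows the "Table AxB" column the question asks for rather than the ' - ' separator used in the SQL above.

```python
# Rows of Table B from the question: (group_id, my_data, comments)
table_b = [
    (1, "2950", "serial nr"),
    (1, "2016-10", "build month"),
    (2, "2955", "serial nr"),
    (2, "2015-11", "build month"),
]


def aggregate_by_serial(rows):
    # Step 1: find the serial number hidden in each group (the "serial nr" row).
    serial_of_group = {group: data
                       for group, data, comment in rows
                       if comment == "serial nr"}
    # Step 2: collect "comment, data" pairs per serial number, in row order.
    pieces = {}
    for group, data, comment in rows:
        pieces.setdefault(serial_of_group[group], []).extend([comment, data])
    return {serial: ", ".join(parts) for serial, parts in pieces.items()}


print(aggregate_by_serial(table_b)["2950"])
# prints: serial nr, 2950, build month, 2016-10
```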

How to synthesize attribute for joined tables

I have a view defined like this:
CREATE VIEW [dbo].[PossiblyMatchingContracts] AS
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts
FROM [dbo].AllContracts AS C
INNER JOIN [dbo].AllContracts AS CC
ON C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND C.AssociatedUser IS NULL
AND C.UniqueID <> CC.UniqueID
This basically finds contracts where, for example, the first name and the birthday match. This works great. Now I want to add a synthetic attribute to each row, with the value taken from only one source row.
Let me give you an example to make it clearer. Suppose I have the following table:
UniqueID | FirstName | LastName | Birthday
1 | Peter | Smith | 1980-11-04
2 | Peter | Gray | 1980-11-04
3 | Peter | Gray-Smith| 1980-11-04
4 | Frank | May | 1985-06-09
5 | Frank-Paul| May | 1985-06-09
6 | Gina | Ericson | 1950-11-04
The resulting view should look like this:
UniqueID | PossiblyMatchingContracts | SyntheticID
1 | 2 | PeterSmith1980-11-04
1 | 3 | PeterSmith1980-11-04
2 | 1 | PeterSmith1980-11-04
2 | 3 | PeterSmith1980-11-04
3 | 1 | PeterSmith1980-11-04
3 | 2 | PeterSmith1980-11-04
4 | 5 | FrankMay1985-06-09
5 | 4 | FrankMay1985-06-09
6 | NULL | NULL [or] GinaEricson1950-11-04
Notice that the SyntheticID column uses ONLY values from one of the matching source rows. It doesn't matter which one. I am exporting this view to another application and need to be able to identify each "match group" afterwards.
Is it clear what I mean? Any ideas how this could be done in sql?
Maybe it helps to elaborate a bit on the actual use case:
I am importing contracts from different systems. To account for the possibility of typos or people that have married but the last name was only updated in one system, I need to find so called 'possible matches'. Two or more contracts are considered a possible match if they contain the same birthday plus the same first, last or birth name. That implies, that if contract A matches contract B, contract B also matches contract A.
The target system uses multivalue reference attributes to store these relationships. The ultimate goal is to create user objects for these contracts. The first catch is that there shall be only one user object for multiple matching contracts, which is why I'm creating these matches in the view. The second catch is that the user objects are created by workflows, which run in parallel for each contract. To avoid creating multiple user objects for matching contracts, each workflow needs to check whether there is already a matching user object, or another workflow that is about to create one. Because the workflow engine is extremely slow compared to SQL, the workflows should not repeat the whole matching test. So the idea is to let each workflow check only the 'syntheticID'.

I have solved it with a multi-step approach:
Create the list of possible first-level matches for each contract
Create the base group list, assigning a different group to each contract (as if it were not related to any other)
Iterate over the match list, updating the group list whenever more contracts need to be added to a group
Recursively build up the SyntheticID from the final group list
Output the results
First of all, let me explain what I have understood, so you can tell me whether my approach is correct.
1) Matching propagates in "cascade"
I mean, if "Peter Smith" is grouped with "Peter Gray", then all Smiths and all Grays are related (if they have the same birth date), so Luke Smith can be in the same group as John Gray.
2) I have not understood what you mean by "birth name"
You say contracts match on "first, last or birth name". Sorry, I'm Italian and I thought birth name and first name were the same; also, there is no such column in your data. Maybe it is related to the dash between names?
When FirstName is Frank-Paul, does it mean it should match both Frank and Paul?
When LastName is Gray-Smith, does it mean it should match both Gray and Smith?
In the following code I have simply ignored this problem, but it could be handled if needed (I already tried splitting the names, unpivoting them and treating them as a double match).
Step Zero: some declaration and prepare base data
declare @cli as table (UniqueID int primary key, FirstName varchar(20), LastName varchar(20), Birthday varchar(20))
declare @comb as table (id1 int, id2 int, done bit)
declare @grp as table (ix int identity primary key, grp int, id int, unique (grp,ix))
declare @str_id as table (grp int primary key, SyntheticID varchar(1000))
declare @id1 as int, @g int
;with
t as (
select *
from (values
(1 , 'Peter' , 'Smith' , '1980-11-04'),
(2 , 'Peter' , 'Gray' , '1980-11-04'),
(3 , 'Peter' , 'Gray-Smith', '1980-11-04'),
(4 , 'Frank' , 'May' , '1985-06-09'),
(5 , 'Frank-Paul', 'May' , '1985-06-09'),
(6 , 'Gina' , 'Ericson' , '1950-11-04')
) x (UniqueID , FirstName , LastName , Birthday)
)
insert into @cli
select * from t
Step One: Create the list of possible 1st level matches for each contract
;with
p as(select UniqueID, Birthday, FirstName, LastName from @cli),
m as (
select p.UniqueID UniqueID1, p.FirstName FirstName1, p.LastName LastName1, p.Birthday Birthday1, pp.UniqueID UniqueID2, pp.FirstName FirstName2, pp.LastName LastName2, pp.Birthday Birthday2
from p
join p pp on (pp.Birthday=p.Birthday) and (pp.FirstName = p.FirstName or pp.LastName = p.LastName)
where p.UniqueID<=pp.UniqueID
)
insert into @comb
select UniqueID1,UniqueID2,0
from m
Step Two: Create the base groups list
insert into @grp
select ROW_NUMBER() over(order by id1), id1 from @comb where id1=id2
Step Three: Iterate the matches list updating the group list
Only loop on contracts that have possible matches and updates only if needed
set @id1 = 0
while not(@id1 is null) begin
set @id1 = (select top 1 id1 from @comb where id1<>id2 and done=0)
if not(@id1 is null) begin
set @g = (select grp from @grp where id=@id1)
update g set grp= @g
from @grp g
inner join @comb c on g.id = c.id2
where c.id2<>@id1 and c.id1=@id1
and grp<>@g
update @comb set done=1 where id1=@id1
end
end
Step Four: Build up the SyntheticID
Recursively add ALL (distinct) first and last names of group to SyntheticID.
I used '_' as separator for birth date, first names and last names, and ',' as separator for the list of names to avoid conflicts.
;with
c as(
select c.*, g.grp
from @cli c
join @grp g on g.id = c.UniqueID
),
d as (
select *, row_number() over (partition by g order by t,s) n1, row_number() over (partition by g order by t desc,s desc) n2
from (
select distinct c.grp g, 1 t, FirstName s from c
union
select distinct c.grp, 2, LastName from c
) l
),
r as (
select d.*, cast(CONVERT(VARCHAR(10), t.Birthday, 112) + '_' + s as varchar(1000)) Names, cast(0 as bigint) i1, cast(0 as bigint) i2
from d
join @cli t on t.UniqueID=d.g
where n1=1
union all
select d.*, cast(r.names + IIF(r.t<>d.t,'_',',') + d.s as varchar(1000)), r.n1, r.n2
from d
join r on r.g = d.g and r.n1=d.n1-1
)
insert into @str_id
select g, Names
from r
where n2=1
Step Five: Output results
select c.UniqueID, case when id2=UniqueID then id1 else id2 end PossibleMatchingContract, s.SyntheticID
from @cli c
left join @comb cb on c.UniqueID in(id1,id2) and id1<>id2
left join @grp g on c.UniqueID = g.id
left join @str_id s on s.grp = g.grp
Here are the results:
UniqueID PossibleMatchingContract SyntheticID
1 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
1 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
4 5 1985-06-09_Frank,Frank-Paul_May
5 4 1985-06-09_Frank,Frank-Paul_May
6 NULL 1950-11-04_Gina_Ericson
I think that in this way the resulting SyntheticID should also be "unique" for each group
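The grouping in Steps One to Three amounts to labelling connected groups of contracts, where two contracts are linked when they share the birthday and either the first or the last name. As a rough cross-check outside SQL, here is a small Python sketch using a union-find structure. It is a swapped-in technique, not the loop from the answer, and it builds the SyntheticID the way the question's expected output does (first name, last name and birthday of one member, here the one with the smallest UniqueID). The names `contracts` and `synthetic_ids` are hypothetical.

```python
from itertools import combinations

# Sample contracts from the question: UniqueID -> (FirstName, LastName, Birthday)
contracts = {
    1: ("Peter", "Smith", "1980-11-04"),
    2: ("Peter", "Gray", "1980-11-04"),
    3: ("Peter", "Gray-Smith", "1980-11-04"),
    4: ("Frank", "May", "1985-06-09"),
    5: ("Frank-Paul", "May", "1985-06-09"),
    6: ("Gina", "Ericson", "1950-11-04"),
}


def synthetic_ids(contracts):
    parent = {uid: uid for uid in contracts}

    def find(x):
        # Follow parent links to the group's root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(contracts), 2):
        first_a, last_a, birth_a = contracts[a]
        first_b, last_b, birth_b = contracts[b]
        if birth_a == birth_b and (first_a == first_b or last_a == last_b):
            root_a, root_b = find(a), find(b)
            # The smaller id becomes the root, so each group has a stable representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Every contract gets the concatenated attributes of its group's representative.
    return {uid: "".join(contracts[find(uid)]) for uid in contracts}


ids = synthetic_ids(contracts)
print(ids[3])  # prints PeterSmith1980-11-04
print(ids[5])  # prints FrankMay1985-06-09
```

Because the links are followed transitively, a contract that matches only one member of a group still ends up with that group's SyntheticID.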

This creates a synthetic value and is easy to change to suit your needs.
DECLARE @T TABLE (
UniqueID INT
,FirstName VARCHAR(200)
,LastName VARCHAR(200)
,Birthday DATE
)
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 1,'Peter','Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 2,'Peter','Gray','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 3,'Peter','Gray-Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 4,'Frank','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 5,'Frank-Paul','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 6,'Gina','Ericson','1950-11-04'
DECLARE @PossibleMatches TABLE (UniqueID INT,[PossibleMatch] INT,SynKey VARCHAR(2000)
)
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,t2.UniqueID,'Ln=' + t1.LastName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,pm.UniqueID,'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
LEFT JOIN @PossibleMatches pm on pm.UniqueID=t1.UniqueID
WHERE pm.UniqueID IS NULL
SELECT *
FROM @PossibleMatches
ORDER BY UniqueID,[PossibleMatch]

I think this will work for you
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C INNER JOIN
[dbo].AllContracts AS CC ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN(
SELECT UniqueID FROM [dbo].DefinitiveMatches)
AND C.AssociatedUser IS NULL

You can try this:
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C
INNER JOIN
[dbo].AllContracts AS CC
ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND
C.AssociatedUser IS NULL
This will generate one extra row (because we left out C.UniqueID <> CC.UniqueID), but it will give you the right solution.

Here is an example using sample data taken from your original post. The idea: generate all SyntheticIDs in a CTE, query all records that have a possible match, and UNION the result with all records that are not yet included:
DECLARE @t TABLE(
UniqueID int
,FirstName nvarchar(20)
,LastName nvarchar(20)
,Birthday datetime
)
INSERT INTO @t VALUES (1, 'Peter', 'Smith', '1980-11-04');
INSERT INTO @t VALUES (2, 'Peter', 'Gray', '1980-11-04');
INSERT INTO @t VALUES (3, 'Peter', 'Gray-Smith', '1980-11-04');
INSERT INTO @t VALUES (4, 'Frank', 'May', '1985-06-09');
INSERT INTO @t VALUES (5, 'Frank-Paul', 'May', '1985-06-09');
INSERT INTO @t VALUES (6, 'Gina', 'Ericson', '1950-11-04');
WITH ctePrep AS(
SELECT UniqueID, FirstName, LastName, BirthDay,
ROW_NUMBER() OVER (PARTITION BY FirstName, BirthDay ORDER BY FirstName, BirthDay) AS k,
FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
),
cteKeys AS(
SELECT FirstName, BirthDay, SyntheticID
FROM ctePrep
WHERE k = 1
),
cteFiltered AS(
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
keys.SyntheticID
FROM @t AS C
JOIN @t AS CC ON C.FirstName = CC.FirstName
AND C.Birthday = CC.Birthday
JOIN cteKeys AS keys ON keys.FirstName = c.FirstName
AND keys.Birthday = c.Birthday
WHERE C.UniqueID <> CC.UniqueID
)
SELECT UniqueID, PossiblyMatchingContracts, SyntheticID
FROM cteFiltered
UNION ALL
SELECT UniqueID, NULL, FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
WHERE UniqueID NOT IN (SELECT UniqueID FROM cteFiltered)
Hope this helps. The result looked OK to me:
UniqueID PossiblyMatchingContracts SyntheticID
---------------------------------------------------------------
2 1 PeterSmith1980-11-04
3 1 PeterSmith1980-11-04
1 2 PeterSmith1980-11-04
3 2 PeterSmith1980-11-04
1 3 PeterSmith1980-11-04
2 3 PeterSmith1980-11-04
4 NULL FrankMay1985-06-09
5 NULL Frank-PaulMay1985-06-09
6 NULL GinaEricson1950-11-04

Tested in SSMS, it works perfect. :)
--create table structure
create table #temp
(
uniqueID int,
firstname varchar(15),
lastname varchar(15),
birthday date
)
--insert data into the table
insert #temp
select 1, 'peter','smith','1980-11-04'
union all
select 2, 'peter','gray','1980-11-04'
union all
select 3, 'peter','gray-smith','1980-11-04'
union all
select 4, 'frank','may','1985-06-09'
union all
select 5, 'frank-paul','may','1985-06-09'
union all
select 6, 'gina','ericson','1950-11-04'
select * from #temp
--solution is as below
select ab.uniqueID
, PossiblyMatchingContracts
, c.firstname+c.lastname+cast(c.birthday as varchar) as synID
from
(
select a.uniqueID
, case
when a.uniqueID < min(b.uniqueID)over(partition by a.uniqueid)
then a.uniqueID
else min(b.uniqueID)over(partition by a.uniqueid)
end as SmallestID
, b.uniqueID as PossiblyMatchingContracts
from #temp a
left join #temp b
on (a.firstname = b.firstname OR a.lastname = b.lastname) AND a.birthday = b.birthday AND a.uniqueid <> b.uniqueID
) as ab
left join #temp c
on ab.SmallestID = c.uniqueID

Say we have following table (a VIEW in your case):
UniqueID PossiblyMatchingContracts SyntheticID
1 2 G1
1 3 G2
2 1 G3
2 3 G4
3 1 G4
3 4 G6
4 5 G7
5 4 G8
6 NULL G9
In your case you can set the initial SyntheticID to a string like PeterSmith1980-11-04, built from the row identified by UniqueID on each line. Here is a recursive CTE query: it divides all lines into unconnected groups and selects MAX(SyntheticID) within each group as the new SyntheticID for all lines in that group.
WITH CTE AS
(
SELECT CAST(','+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' as Varchar(MAX)) as GroupCont,
SyntheticID
FROM PossiblyMatchingContracts
UNION ALL
SELECT CAST(GroupCont+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' AS Varchar(MAX)) as GroupCont,
pm.SyntheticID
FROM CTE
JOIN PossiblyMatchingContracts as pm
ON
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
AND NOT
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
AND
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
)
SELECT pm.UniqueID,
pm.PossiblyMatchingContracts,
ISNULL(
(SELECT MAX(SyntheticID) FROM CTE WHERE
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
))
,pm.SyntheticID) as SyntheticID
FROM PossiblyMatchingContracts pm

edit and Update records using reference id

I have a table with multiple records that share the same value in a column named Comments. With my ASPX code, each comment gets inserted as three rows with different requirementcommentid values, but the comment text is the same in all three.
To retrieve distinct comments I used this query:
SELECT distinct (
select top 1 requirementcommentid
from Requirementcomment
where requirementcomment=rc.requirementcomment
and fcr.SectionID in(
SELECT sectionid
FROM [dbo].udfGetSectionID_allComYear(2151)
)
AND fcr.FirmID = 20057
),
rc.IsRejected,
fcr.SectionID,
rc.UserID,
rc.RequirementComment,
convert(varchar(25), dateadd(hour, -5, rc.InsertDate),101) as InsertDate,
Department.DeptName,
FirmUser.DepartmentID,
rc.FirmComplianceYearID
FROM RequirementComment rc
INNER JOIN FirmComplianceRequirement fcr ON fcr.FirmComplianceRequirementID = rc.FirmComplianceRequirementID
INNER JOIN FirmUser ON FirmUser.FirmUserID =rc.UserID
INNER JOIN Department ON Department.DeptID = FirmUser.DepartmentID WHERE rc.IsRejected = 1
AND fcr.SectionID in(SELECT sectionid FROM [dbo].udfGetSectionID_allComYear (2151))
AND fcr.FirmID = 20057 AND rc.RequirementComment!=''
How can I edit this distinct comment and update it? At the moment only one comment row gets edited; the other two rows keep the old value in the comment column.
I want the remaining rows to be updated automatically when I click edit and update a single record.

If you cannot solve this with a procedure when storing the data, or in .NET, consider using a trigger. I have made a generic example, since your example code is a bit complex :)
CREATE TABLE TMP_TriggerTable
(
ID INT IDENTITY(1,1) PRIMARY KEY
, ID2 INT NOT NULL
, Comment VARCHAR(255) NOT NULL
)
GO
INSERT INTO TMP_TriggerTable
SELECT 1, 'asd'
UNION ALL
SELECT 1, 'asd'
UNION ALL
SELECT 1, 'asd'
UNION ALL
SELECT 2, 'asd'
UNION ALL
SELECT 2, 'asd'
UNION ALL
SELECT 2, 'asd'
GO
CREATE TRIGGER TRG_TMP_TriggerTable ON TMP_TriggerTable
AFTER UPDATE
AS
BEGIN
WITH InsertedIDPriority AS
(
--Handle if more than one related comment was updated
SELECT Prio = ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID)
, ID
, ID2
, Comment
FROM INSERTED
)
UPDATE t SET Comment = i.Comment FROM TMP_TriggerTable t
JOIN InsertedIDPriority i ON
t.ID2 = i.ID2 --Select all related comments
AND t.ID != i.ID --No need to update the directly edited row a second time
AND i.Prio = 1 --Handle if more than one related comment was updated
END
GO
UPDATE TMP_TriggerTable SET Comment = 'asd2' WHERE ID = 1
/*
SELECT * FROM TMP_TriggerTable
--Returns--
ID ID2 Comment
1 1 asd2
2 1 asd2
3 1 asd2
4 2 asd
5 2 asd
6 2 asd
*/
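To try the idea without a SQL Server instance, the same propagation can be reproduced with SQLite through Python's built-in `sqlite3` module. This is a sketch under assumptions, not a port of the T-SQL trigger: SQLite uses row-level `FOR EACH ROW` triggers with `NEW` instead of the `INSERTED` pseudo-table, the table and trigger names are hypothetical, and the extra `Comment <> NEW.Comment` condition is added here so that the propagation stops even if recursive triggers are enabled.

```python
import sqlite3


def propagate_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TriggerTable (
            ID      INTEGER PRIMARY KEY AUTOINCREMENT,
            ID2     INTEGER NOT NULL,
            Comment TEXT NOT NULL
        );
        INSERT INTO TriggerTable (ID2, Comment)
        VALUES (1, 'asd'), (1, 'asd'), (1, 'asd'),
               (2, 'asd'), (2, 'asd'), (2, 'asd');

        -- Copy an edited comment to every other row with the same ID2.
        CREATE TRIGGER trg_copy_comment
        AFTER UPDATE OF Comment ON TriggerTable
        FOR EACH ROW
        BEGIN
            UPDATE TriggerTable
            SET Comment = NEW.Comment
            WHERE ID2 = NEW.ID2
              AND ID <> NEW.ID
              AND Comment <> NEW.Comment;
        END;
    """)
    # Edit a single row; the trigger updates its siblings.
    con.execute("UPDATE TriggerTable SET Comment = 'asd2' WHERE ID = 1")
    rows = con.execute(
        "SELECT ID, ID2, Comment FROM TriggerTable ORDER BY ID").fetchall()
    con.close()
    return rows


for row in propagate_demo():
    print(row)
```

Rows 1 to 3 end up with 'asd2' while rows 4 to 6 keep 'asd', matching the result shown in the comment block above.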

SUM by two different GROUP BY

I'm getting the wrong result from my report. Maybe i'm missing something simple.
The report is an inline table-valued-function that should count goods movement in our shop and how often these spareparts are claimed(replaced in a repair).
The problem: different spareparts in the shop-table(lets call it SP) can be linked to the same sparepart in the "repair-table"(TSP). I need the goods movement of every sparepart in SP and the claim-count of every distinct sparepart in TSP.
This is a very simplified excerpt of the relevant part:
create table #tsp(id int, name varchar(20),claimed int);
create table #sp(id int, name varchar(20),fiTsp int,ordered int);
insert into #tsp values(1,'1235-6044',300);
insert into #tsp values(2,'1234-5678',400);
insert into #sp values(1,'1235-6044',1,30);
insert into #sp values(2,'1235-6044',1,40);
insert into #sp values(3,'1235-6044',1,50);
insert into #sp values(4,'1234-5678',2,60);
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id
)
SELECT TspName, SUM(Claimed) As Claimed, Sum(Ordered) As Ordered
FROM cte
Group By TspName
drop table #tsp;
drop table #sp;
Result:
TspName Claimed Ordered
1234-5678 400 60
1235-6044 900 120
The Ordered-count is correct but the Claimed-count should be 300 instead of 900 for TspName='1235-6044'.
I need to group by Tsp.ID for the claim-count and group by Sp.ID for the order-count. But how in one query?
Edit: The actual TVF looks like this (note that getOrdered and getClaimed are SVFs and that I'm grouping on TSP's Category in the outer select):
CREATE FUNCTION [Gambio].[rptReusedStatistics](
@fromDate datetime
,@toDate datetime
,@fromInvoiceDate datetime
,@toInvoiceDate datetime
,@idClaimStatus varchar(50)
,@idSparePartCategories varchar(1000)
,@idSpareParts varchar(1000)
)
RETURNS TABLE AS
RETURN(
WITH ExclusionCat AS(
SELECT idSparePartCategory AS ID From tabSparePartCategory
WHERE idSparePartCategory IN(- 3, - 1, 6, 172,168)
), Report AS(
SELECT Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,SP.Inventory
,Gambio.getGoodsIn(SP.idSparePart,@FromDate,@ToDate) GoodsIn
,Gambio.getOrdered(SP.idSparePart,@FromDate,@ToDate) Ordered
--,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
-- Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,NULL)END AS Claimed
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1)END AS ClaimedReused
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus)END AS Costsaving
FROM Gambio.SparePart AS SP
INNER JOIN tabSparePart AS TSP ON SP.fiTabSparePart = TSP.idSparePart
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
WHERE Cat.idSparePartCategory NOT IN(SELECT ID FROM ExclusionCat)
AND (@idSparePartCategories IS NULL
OR TSP.fiSparePartCategory IN(
SELECT Item From dbo.Split(@idSparePartCategories,',')
)
)
AND (@idSpareParts IS NULL
OR TSP.idSparePart IN(
SELECT Item From dbo.Split(@idSpareParts,',')
)
)
)
SELECT Category
--, Part
--, PartNumber
, SUM(Inventory)As InventoryCount
, SUM(GoodsIn) As GoodsIn
, SUM(Ordered) As Ordered
--, SUM(Claimed) As Claimed
, SUM(ClaimedReused)AS ClaimedReused
, SUM(Costsaving) As Costsaving
, Count(*) AS PartCount
FROM Report
GROUP BY Category
)
Solution:
Thanks to Aliostad i've solved it by first grouping and then joining(actual TVF, reduced to a minimum):
WITH Report AS(
SELECT Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,SP.Inventory
,SP.GoodsIn
,SP.Ordered
,Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1) AS ClaimedReused
,Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus) AS Costsaving
FROM (
SELECT GSP.fiTabSparePart
,SUM(GSP.Inventory)AS Inventory
,SUM(Gambio.getGoodsIn(GSP.idSparePart,@FromDate,@ToDate))AS GoodsIn
,SUM(Gambio.getOrdered(GSP.idSparePart,@FromDate,@ToDate))AS Ordered
FROM Gambio.SparePart GSP
GROUP BY GSP.fiTabSparePart
)As SP
INNER JOIN tabSparePart TSP ON SP.fiTabSparePart = TSP.idSparePart
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
)
SELECT Category
, SUM(Inventory)As InventoryCount
, SUM(GoodsIn) As GoodsIn
, SUM(Ordered) As Ordered
, SUM(ClaimedReused)AS ClaimedReused
, SUM(Costsaving) As Costsaving
, Count(*) AS PartCount
FROM Report
GROUP BY Category

You are JOINing first and then GROUPing by. You need to reverse it, GROUP BY first and then JOIN.
So here in my subquery, I group by first and then join:
select
claimed,
ordered
from
#tsp
inner JOIN
(select
fitsp,
SUM(ordered) as ordered
from
#sp
group by
fitsp) as SUMS
on
SUMS.fiTsp = id;

I think you just need to select Claimed and add it to the Group By in order to get what you are looking for.
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id )
SELECT TspName, Claimed, Sum(Ordered) As Ordered
FROM cte
Group By TspName, Claimed

Your cte is an inner join between tsp and sp, which means that the data you're querying looks like this:
SpID Ordered TspID TspName Claimed
1 30 1 1235-6044 300
2 40 1 1235-6044 300
3 50 1 1235-6044 300
4 60 2 1234-5678 400
Notice how TspID, TspName and Claimed all get repeated. Grouping by TspName means that the data gets grouped in two groups, one for 1235-6044 and one for 1234-5678. The first group has 3 rows on which to run the aggregate functions, the second group only one. That's why your sum(Claimed) will get you 300*3=900.
As Aliostad suggested, you should first group by TspID and do the sum of Ordered and then join to tsp.
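The difference between the two orders of operation can be checked on the sample data. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server, with ordinary tables `tsp` and `sp` in place of the temp tables `#tsp` and `#sp`; the variable names are made up for illustration. The first query joins and then groups, which repeats each claimed value once per matching sp row; the second aggregates sp per fiTsp first and joins afterwards.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tsp (id INT, name TEXT, claimed INT);
    CREATE TABLE sp  (id INT, name TEXT, fiTsp INT, ordered INT);
    INSERT INTO tsp VALUES (1, '1235-6044', 300), (2, '1234-5678', 400);
    INSERT INTO sp  VALUES (1, '1235-6044', 1, 30), (2, '1235-6044', 1, 40),
                           (3, '1235-6044', 1, 50), (4, '1234-5678', 2, 60);
""")

# JOIN first, GROUP BY afterwards: claimed is summed once per joined sp row.
join_then_group = con.execute("""
    SELECT tsp.name, SUM(tsp.claimed), SUM(sp.ordered)
    FROM sp
    JOIN tsp ON sp.fiTsp = tsp.id
    GROUP BY tsp.name
    ORDER BY tsp.name
""").fetchall()

# GROUP BY first (inside the subquery), JOIN afterwards: claimed stays intact.
group_then_join = con.execute("""
    SELECT tsp.name, tsp.claimed, sums.ordered
    FROM tsp
    JOIN (SELECT fiTsp, SUM(ordered) AS ordered
          FROM sp
          GROUP BY fiTsp) AS sums ON sums.fiTsp = tsp.id
    ORDER BY tsp.name
""").fetchall()
con.close()

print(join_then_group)  # [('1234-5678', 400, 60), ('1235-6044', 900, 120)]
print(group_then_join)  # [('1234-5678', 400, 60), ('1235-6044', 300, 120)]
```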

No need to join, just subselect:
create table #tsp(id int, name varchar(20),claimed int);
create table #sp(id int, name varchar(20),fiTsp int,ordered int);
insert into #tsp values(1,'1235-6044',300);
insert into #tsp values(2,'1234-5678',400);
insert into #sp values(1,'1235-6044',1,30);
insert into #sp values(2,'1235-6044',1,40);
insert into #sp values(3,'1235-6044',1,50);
insert into #sp values(4,'1234-5678',2,60);
WITH cte AS(
select tsp.id As TspID,tsp.name as TspName,tsp.claimed As Claimed
,sp.id As SpID,sp.name As SpName,sp.ordered As Ordered
from #sp sp inner join #tsp tsp
on sp.fiTsp=tsp.id
)
SELECT id, name, SUM(claimed) as Claimed, (SELECT SUM(ordered) FROM #sp WHERE #sp.fiTsp = #tsp.id GROUP BY #sp.fiTsp) AS Ordered
FROM #tsp
GROUP BY id, name
drop table #tsp;
drop table #sp;
Produces:
id | name      | Claimed | Ordered
-----------------------------------
1  | 1235-6044 | 300     | 120
2  | 1234-5678 | 400     | 60
-- EDIT --
Based on the additional info, this is how I might try to split the CTE to form the data as per the example. I fully admit that Aliostad's approach may yield a cleaner query but here's an attempt (completely blind) using the subselect:
CREATE FUNCTION [Gambio].[rptReusedStatistics](
@fromDate datetime
,@toDate datetime
,@fromInvoiceDate datetime
,@toInvoiceDate datetime
,@idClaimStatus varchar(50)
,@idSparePartCategories varchar(1000)
,@idSpareParts varchar(1000)
)
RETURNS TABLE AS
RETURN(
WITH ExclusionCat AS (
SELECT idSparePartCategory AS ID From tabSparePartCategory
WHERE idSparePartCategory IN(- 3, - 1, 6, 172,168)
), ReportSP AS (
SELECT fiTabSparePart
,Inventory
,Gambio.getGoodsIn(idSparePart,@FromDate,@ToDate) GoodsIn
,Gambio.getOrdered(idSparePart,@FromDate,@ToDate) Ordered
FROM Gambio.SparePart
), ReportTSP AS (
SELECT TSP.idSparePart
,Cat.SparePartCategoryName AS Category
,TSP.SparePartDescription AS Part
,TSP.SparePartName AS PartNumber
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getClaimed(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus,1)END AS ClaimedReused
,CASE WHEN TSP.idSparePart IS NULL THEN 0 ELSE
Gambio.getCostSaving(TSP.idSparePart,@FromInvoiceDate,@ToInvoiceDate,@idClaimStatus)END AS Costsaving
FROM tabSparePart AS TSP
INNER JOIN tabSparePartCategory AS Cat
ON Cat.idSparePartCategory=TSP.fiSparePartCategory
WHERE Cat.idSparePartCategory NOT IN(SELECT ID FROM ExclusionCat)
AND (@idSparePartCategories IS NULL
OR TSP.fiSparePartCategory IN(
SELECT Item From dbo.Split(@idSparePartCategories,',')
)
)
AND (@idSpareParts IS NULL
OR TSP.idSparePart IN(
SELECT Item From dbo.Split(@idSpareParts,',')
)
)
)
SELECT Category
--, Part
--, PartNumber
, (SELECT SUM(Inventory) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS Inventory
, (SELECT SUM(GoodsIn) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS GoodsIn
, (SELECT SUM(Ordered) FROM ReportSP WHERE ReportSP.fiTabSparePart = idSparePart GROUP BY fiTabSparePart) AS Ordered
, Claimed
, ClaimedReused
, Costsaving
, Count(*) AS PartCount
FROM ReportTSP
GROUP BY Category
)
Without a better understanding of the whole schema it's difficult to cover every eventuality, but whether or not this works (I suspect PartCount will be 1 for all instances), hopefully it gives you some fresh ideas for alternative approaches.

SELECT
tsp.name
,max(tsp.claimed) as claimed
,sum(sp.ordered) as ordered
from #sp sp
inner join #tsp tsp
on sp.fiTsp=tsp.id
GROUP BY tsp.name

Creating a SQL query that performs math with variables from multiple tables

Here's an example of what I've attempted thus far:
A mockup of what the tables look like:
Inventory
ID | lowrange | highrange
-------------------------------
1 | 15 | 20
2 | 21 | 30
Audit (not used in this query aside from the join)
MissingOrVoid
ID | Item | Missing | Void
---------------------------------
1 | 17 | 1 | 0
1 | 19 | 1 | 0
The most recent query I've attempted to use:
SELECT I.*,
SUM(
(I.HIGHRANGE - I.LOWRANGE + 1)
- (Count(M.Missing) from M where M.ID = I.ID)
- (Count(M.Void) from M where M.ID = I.ID)) AS Item_Quantity
FROM Inventory I
JOIN Audit A
ON A.ID = I.ID
JOIN MissingOrVoid M
ON M.ID = I.ID
The result should be:
ID | lowrange | highrange | Item_Quantity
-----------------------------------------------
1 | 15 | 20 | 4
2 | 21 | 30 | 10
I can't remember exactly where I've made changes, but a previous attempt failed with the error "Cannot perform an aggregate function on an expression containing an aggregate or a subquery." The current error is incorrect syntax near "from" (the one beside M.Missing). With my minimal knowledge of SQL, it appears that these syntax issues cause an outright failure, so there may be underlying issues with the query that won't be visible until all of the syntax problems are fixed.
The part I'm really struggling with is obviously the SUM() section. I am far from a database architect, so could someone explain how to do this correctly and possibly point me to a resource for learning about this type of function?
Thanks

You almost had it right. I am guessing missing/void are BIT types, which you cannot SUM directly.
SELECT I.*,
(I.HIGHRANGE - I.LOWRANGE + 1)
- (select Count(nullif(M.Missing,0)) from MissingOrVoid M where M.ID = I.ID)
- (select Count(nullif(M.Void,0)) from MissingOrVoid M where M.ID = I.ID)
AS Item_Quantity
FROM Inventory I
If an item cannot both be missing and void, then
SELECT I.*,
I.HIGHRANGE - I.LOWRANGE + 1
- (select Count(case when M.Missing=1 or M.Void=1 then 1 end)
from MissingOrVoid M where M.ID = I.ID)
AS Item_Quantity
FROM Inventory I
In fact, if it is only present in MissingOrVoid when it is missing or void, then the CASE in the above query will always be true, so this simplifies to
SELECT I.*,
I.HIGHRANGE - I.LOWRANGE + 1
- (select Count(*) from MissingOrVoid M where M.ID = I.ID)
AS Item_Quantity
FROM Inventory I
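As a rough check of the variants above, here is a sketch that runs them against the mockup data from the question. It assumes SQLite via Python's standard `sqlite3` module instead of SQL Server, and stores Missing and Void as integers rather than BIT; the helper `quantities` is made up for this illustration. It also includes a plain `COUNT(M.Void)` to show why the `NULLIF` is there: `COUNT` of a column counts every non-NULL value, including the zeros.

```python
import sqlite3

# In-memory SQLite database with the question's mockup rows (engine is an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (ID INTEGER, lowrange INTEGER, highrange INTEGER);
CREATE TABLE MissingOrVoid (ID INTEGER, Item INTEGER, Missing INTEGER, Void INTEGER);
INSERT INTO Inventory VALUES (1, 15, 20), (2, 21, 30);
INSERT INTO MissingOrVoid VALUES (1, 17, 1, 0), (1, 19, 1, 0);
""")

def quantities(count_expr):
    """Hypothetical helper: subtract a correlated count from each range size."""
    sql = f"""
    SELECT I.ID,
           (I.highrange - I.lowrange + 1)
           - (SELECT {count_expr} FROM MissingOrVoid M WHERE M.ID = I.ID)
    FROM Inventory I
    ORDER BY I.ID
    """
    return conn.execute(sql).fetchall()

# COUNT(column) counts non-NULL values, so the Void = 0 rows are counted too.
naive = quantities("COUNT(M.Missing) + COUNT(M.Void)")
# NULLIF(x, 0) turns 0 into NULL, so only rows with the flag set are counted.
flags = quantities("COUNT(NULLIF(M.Missing, 0)) + COUNT(NULLIF(M.Void, 0))")
# If every row in MissingOrVoid is missing or void, COUNT(*) is enough.
rows = quantities("COUNT(*)")

print(naive)  # [(1, 2), (2, 10)]
print(flags)  # [(1, 4), (2, 10)]
print(rows)   # [(1, 4), (2, 10)]
```

Because the count is a correlated subquery rather than a join, ID 2 (which has no rows in MissingOrVoid) still appears, with a count of 0.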

Initially I have a question as to whether you need to sum those values. If your inventory table has one row per item, that shouldn't be necessary. I'll assume that your table can have multiple rows for a given item, though, and proceed from there.
I think the issue is just a problem with the construction of the subquery. I haven't tested this, but I think it should look more like:
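select I.ID,
I.Item,
SUM(I.HighRange - I.LowRange + 1)
- (
-- Missing and Void are columns of MissingOrVoid, so the subquery reads from that table;
-- ISNULL covers IDs with no matching rows (assumes the columns are numeric, not BIT)
select ISNULL(SUM(M.Missing + M.Void), 0)
from dbo.MissingOrVoid M
where M.ID = I.ID
) as Item_Quantity
from Inventory I
group by I.ID, I.Item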

Is this what you're trying to do? I'm not sure what the numbers in the missing and void columns are unless they're just flags...
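SELECT I.ID, I.lowrange, I.highrange,
((I.highrange - I.lowrange + 1)
- SUM(M.Missing)
- SUM(M.Void)) AS Item_Quantity
FROM Inventory I
JOIN MissingOrVoid M
ON M.ID = I.ID
-- the aggregates need a GROUP BY on the non-aggregated columns;
-- note that the inner join omits inventory rows with no MissingOrVoid entries
GROUP BY I.ID, I.lowrange, I.highrange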

The following query works. This assumes there is only one highrange and lowrange for each id.
CREATE TABLE #Inventory (ID INT,Lowrange INT,highrange INT)
CREATE TABLE #MissingOrVoid (Id INT,item INT, missing INT, void INT)
INSERT #Inventory
( ID, Lowrange, highrange )
VALUES ( 1, -- ID - int
15, -- Lowrange - int
20 -- highrange - int
)
INSERT #Inventory
( ID, Lowrange, highrange )
VALUES ( 2, -- ID - int
21, -- Lowrange - int
30 -- highrange - int
)
INSERT #MissingOrVoid
( Id, item, missing, void )
VALUES ( 1, -- Id - int
17, -- item - int
1, -- missing - int
0 -- void - int
)
INSERT #MissingOrVoid
( Id, item, missing, void )
VALUES ( 1, -- Id - int
19, -- item - int
1, -- missing - int
0 -- void - int
)
SELECT #Inventory.ID,
#Inventory.highrange,
#Inventory.Lowrange,
highrange-Lowrange+1
-SUM(ISNULL(missing,0))
-SUM(ISNULL(void,0)) AS ITEM_QUANTITY
FROM #Inventory
left JOIN #MissingOrVoid ON #Inventory.ID = #MissingOrVoid.Id
GROUP BY #Inventory.ID,#Inventory.highrange,#Inventory.Lowrange
DROP TABLE #Inventory
DROP TABLE #MissingOrVoid

I'd say this would work:
SELECT I.ID,I.Lowrange as Lowrange,
I.highrange as Highrange,
-- NULLIF turns 0 into NULL so COUNT only counts rows where the flag is set
I.highrange-I.Lowrange+1-COUNT(NULLIF(J.missing,0))-COUNT(NULLIF(J.void,0)) AS ITEM_QUANTITY
FROM Inventory I
left JOIN ( select missing as missing, void as void, id from MissingOrVoid
) J
ON I.ID = J.Id
JOIN Audit A
ON A.ID = I.ID
GROUP BY I.ID,I.highrange,I.Lowrange
But it looks a lot like what RemoteSojourner suggested (and his version is also more elegant).

I'm going to give the derived table approach, as it could be faster than a correlated subquery (which may be evaluated row by row).
SELECT I.*,
I.HIGHRANGE - I.LOWRANGE + 1 - ISNULL(M.MissingVoidCount, 0) AS Item_Quantity
FROM Inventory I
LEFT JOIN
(SELECT ID,Count(*) AS MissingVoidCount FROM MissingOrVoid GROUP BY ID) M
on M.ID = I.ID
The LEFT JOIN with ISNULL keeps ranges that have no rows in MissingOrVoid (ID 2 in the question's example). Of course, in real life I would never use select *. You could also use a CTE approach.
;WITH MissingVoid(ID, MissingVoidCount) AS
(
SELECT ID, Count(*) FROM MissingOrVoid GROUP BY ID
)
SELECT
I.*,
I.HIGHRANGE - I.LOWRANGE + 1 - ISNULL(M.MissingVoidCount, 0) AS Item_Quantity
FROM Inventory I
LEFT JOIN MissingVoid M
on M.ID = I.ID
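A quick way to see how the join type affects ranges that have no rows in MissingOrVoid (ID 2 in the question's mockup) is the sketch below. It assumes SQLite via Python's standard `sqlite3` module rather than SQL Server, and uses `COALESCE` where T-SQL code would often use `ISNULL`; the alias `cnt` is just an illustrative name.

```python
import sqlite3

# In-memory SQLite database with the question's mockup rows (engine is an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (ID INTEGER, lowrange INTEGER, highrange INTEGER);
CREATE TABLE MissingOrVoid (ID INTEGER, Item INTEGER, Missing INTEGER, Void INTEGER);
INSERT INTO Inventory VALUES (1, 15, 20), (2, 21, 30);
INSERT INTO MissingOrVoid VALUES (1, 17, 1, 0), (1, 19, 1, 0);
""")

# Inner join against the derived table: IDs without MissingOrVoid rows disappear.
inner_join = conn.execute("""
SELECT I.ID, I.highrange - I.lowrange + 1 - M.cnt
FROM Inventory I
JOIN (SELECT ID, COUNT(*) AS cnt FROM MissingOrVoid GROUP BY ID) M
  ON M.ID = I.ID
ORDER BY I.ID
""").fetchall()

# Left join plus COALESCE: unmatched IDs are kept and treated as a count of 0.
left_join = conn.execute("""
SELECT I.ID, I.highrange - I.lowrange + 1 - COALESCE(M.cnt, 0)
FROM Inventory I
LEFT JOIN (SELECT ID, COUNT(*) AS cnt FROM MissingOrVoid GROUP BY ID) M
  ON M.ID = I.ID
ORDER BY I.ID
""").fetchall()

print(inner_join)  # [(1, 4)]
print(left_join)   # [(1, 4), (2, 10)]
```

The second result matches the output the question asks for, including the row for ID 2.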
