T-SQL Query - Get Unique Rows Across 2 Columns

I have a set of data with columns x and y. For any two given values A and B, the set contains a row with A and B in columns x and y respectively, and a second row with B and A in columns x and y respectively.
E.g.:
        Column X  Column Y
Row 1   A         B
Row 2   B         A
There are multiple pairs of data in this set that follow this rule. For every row with A, B in columns X and Y, there will always be a row with B, A in X and Y.
Columns X and Y are of type int.
I need a T-Sql query that given a set with the rules above will return me either Row 1 or Row 2, but not both.
Either the answer is very difficult, or it's so easy that I can't see the forest for the trees; either way, it's driving me up the wall.

Add to your query the predicate,
where X < Y
and you can never get row two, but will always get row one.
(This assumes that when you wrote "two given values" you meant two distinct given values; if the two values can be the same, add the predicate where X <= Y (to get rid of all "reversed" rows where X > Y) and then add a distinct to your select list (to collapse any two rows where X == Y into one row).)
In reply to comments:
That is, if currently your query is select foo, x, y from sometable where foo < 3; change it to select foo, x, y from sometable where foo < 3 and x < y;, or for the second case (where X and Y are not distinct values) select distinct foo, x, y from sometable where foo < 3 and x <= y;.
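A quick way to sanity-check both predicates is sketched below with Python's built-in `sqlite3` module standing in for SQL Server; the table name `pairs` and the sample rows are hypothetical. The `WHERE` clauses are plain SQL and should behave the same way in T-SQL:

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server.
# Table name "pairs" and the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (x INTEGER, y INTEGER)")
# Every (A, B) row has a mirrored (B, A) row; (7, 7) covers the X == Y case.
conn.executemany(
    "INSERT INTO pairs (x, y) VALUES (?, ?)",
    [(1, 2), (2, 1), (3, 4), (4, 3), (7, 7), (7, 7)],
)

# Distinct values: keep only the row whose x is the smaller value.
strict = conn.execute(
    "SELECT x, y FROM pairs WHERE x < y ORDER BY x"
).fetchall()

# Values may be equal: x <= y drops the reversed rows and DISTINCT
# collapses the two (7, 7) rows into one.
non_strict = conn.execute(
    "SELECT DISTINCT x, y FROM pairs WHERE x <= y ORDER BY x"
).fetchall()

print(strict)      # [(1, 2), (3, 4)]
print(non_strict)  # [(1, 2), (3, 4), (7, 7)]
```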

This should work.
Declare @t Table (PK Int Primary Key Identity(1, 1), A int, B int);
Insert into @t values (1, 2);
Insert into @t values (2, 1);
Insert into @t values (3, 4);
Insert into @t values (4, 3);
Insert into @t values (5, 6);
Insert into @t values (6, 5);
Declare @Table Table (ID Int Primary Key Identity(1, 1), PK Int, A Int, B Int);
Declare @Current Int;
Declare @A Int;
Insert Into @Table
Select PK, A, B
From @t;
Set @Current = 1;
While (@Current <= (Select Max(ID) From @Table)) Begin
Select @A = A
From @Table
Where ID = @Current;
If (@A Is Not Null) Begin
Delete From @Table Where B = @A;
If ((Select COUNT(*) From @Table Where A = @A) > 1) Begin
Delete From @Table Where ID = @Current;
End
End
Set @A = Null;
Set @Current = @Current + 1;
End
Select a.*
From @t As a
Inner Join @Table As b On a.PK = b.PK

SELECT O.X, O.Y
FROM myTable O
WHERE O.X < O.Y
AND EXISTS (SELECT X, Y FROM myTable I WHERE I.X = O.Y AND I.Y = O.X)
I have not tried this, but it should work.

To get the highest and lowest of each pair, you could use:
(X+Y+ABS(X-Y)) / 2 as High, (X+Y-ABS(X-Y)) / 2 as Low
Then use DISTINCT to get each pair once:
SELECT DISTINCT
(X+Y+ABS(X-Y)) / 2 as High, (X+Y-ABS(X-Y)) / 2 as Low
FROM YourTable
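The formulas work because x + y + |x - y| is always 2 * max(x, y) and x + y - |x - y| is always 2 * min(x, y), so the integer division by 2 is exact. Here is a small check, again using Python's `sqlite3` as a stand-in with a hypothetical `pairs` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (x INTEGER, y INTEGER)")
# Hypothetical mirrored pairs, including a negative value.
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [(1, 2), (2, 1), (3, 4), (4, 3), (-5, 6), (6, -5)])

# x + y + |x - y| equals 2 * max(x, y) and x + y - |x - y| equals
# 2 * min(x, y), so dividing by 2 never truncates.
rows = conn.execute("""
    SELECT DISTINCT
        (x + y + ABS(x - y)) / 2 AS high,
        (x + y - ABS(x - y)) / 2 AS low
    FROM pairs
    ORDER BY low
""").fetchall()
print(rows)  # [(6, -5), (2, 1), (4, 3)]
```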

Related

Intersecting a dynamic number of tables

I have two relations, a and b, with attributes given by
CREATE TABLE a (id int, b_id int)
CREATE TABLE b (id int)
for which I can assume that all pairs of values in a and all values in b are unique, and which will be based in an SQL Server 2016 database.
A given element of b defines a subset of a.id given by those elements for which the corresponding a.b_id is the given value, and my goal is to produce the intersection of all those subsets.
Say, for instance, that a contains the six values,
INSERT INTO a VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 2), (3, 3)
Then the expected results would include the following:
b: (1), (2), (3). Expected result: (1)
b: (1), (2). Expected result: (1)
b: (2). Expected result: (1), (2), (3)
b: (2), (3). Expected result: (1), (3)
b: [Empty]. Expected result: (1), (2), (3)
Using uniqueness, for the case of non-empty b, this can be achieved through
SELECT a.id FROM a
JOIN b on a.b_id = b.id
GROUP BY a.id
HAVING COUNT(a.id) = (SELECT COUNT(*) FROM b)
but this feels clunky, given that SQL has an INTERSECT operator readily available, and were I to write the same query in, say, LINQ, I would simply aggregate intersections. It also fails to produce the desired result in the case of empty b without treating that as a special case.
So, the question becomes: Is there a more idiomatic way of performing the above query, which also works properly for trivial b?
What you are trying to do is called a relational division [1, 2].
CREATE TABLE a (id INT, b_id INT);
CREATE TABLE b (id INT);
INSERT INTO a VALUES
(1, 1), (1, 2), (1, 3),
(2, 2), (3, 2), (3, 3);
DECLARE @i INT = 0;
WHILE @i < 5 BEGIN
TRUNCATE TABLE b;
-- when @i = 4 nothing is inserted, so b stays empty (the trivial case)
IF @i = 0 INSERT b VALUES (1), (2), (3);
IF @i = 1 INSERT b VALUES (1), (2);
IF @i = 2 INSERT b VALUES (2);
IF @i = 3 INSERT b VALUES (2), (3);
SELECT DISTINCT x.id
FROM a AS x
WHERE NOT EXISTS (
SELECT *
FROM b AS y
WHERE NOT EXISTS (
SELECT *
FROM a AS z
WHERE z.id = x.id AND z.b_id=y.id
)
)
SET @i = @i + 1;
END;
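The double NOT EXISTS reads as "keep each id for which there is no value in b that the id lacks". The sketch below runs the same query against all five test cases from the question, using Python's `sqlite3` as a stand-in (SQLite has no TRUNCATE TABLE, so DELETE FROM b is used instead). The empty-b case needs no special handling: with no rows in b, the outer NOT EXISTS is always true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, b_id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 2), (3, 2), (3, 3)])

# Relational division: keep every a.id for which no b.id exists
# that is missing from that id's set of b_id values.
DIVIDE = """
    SELECT DISTINCT x.id
    FROM a AS x
    WHERE NOT EXISTS (
        SELECT * FROM b AS y
        WHERE NOT EXISTS (
            SELECT * FROM a AS z
            WHERE z.id = x.id AND z.b_id = y.id))
    ORDER BY x.id
"""

def divide(b_values):
    """Reload b with b_values and return the ids that cover all of them."""
    conn.execute("DELETE FROM b")  # SQLite has no TRUNCATE TABLE
    conn.executemany("INSERT INTO b VALUES (?)", [(v,) for v in b_values])
    return [row[0] for row in conn.execute(DIVIDE)]

results = {
    (1, 2, 3): divide([1, 2, 3]),
    (1, 2): divide([1, 2]),
    (2,): divide([2]),
    (2, 3): divide([2, 3]),
    (): divide([]),
}
print(results)
```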
You could extend your approach to handle the empty-set case by using:
SELECT a.id
FROM a
LEFT JOIN b on a.b_id = b.id
GROUP BY a.id
HAVING COUNT(b.id) = (SELECT COUNT(*) FROM b);
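This works because COUNT(b.id) skips the NULLs the LEFT JOIN produces for unmatched rows, and with an empty b both sides of the HAVING comparison are 0. Here is a minimal check in Python's `sqlite3` (a stand-in for SQL Server). Like the original query, it relies on the (id, b_id) pairs in a being unique, as the question states:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, b_id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 2), (3, 2), (3, 3)])

# COUNT(b.id) ignores NULLs from unmatched rows, so an id qualifies when
# its matched count equals the size of b; an empty b gives 0 = 0.
QUERY = """
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.b_id = b.id
    GROUP BY a.id
    HAVING COUNT(b.id) = (SELECT COUNT(*) FROM b)
    ORDER BY a.id
"""

def run(b_values):
    """Reload b with b_values and return the qualifying ids."""
    conn.execute("DELETE FROM b")
    conn.executemany("INSERT INTO b VALUES (?)", [(v,) for v in b_values])
    return [row[0] for row in conn.execute(QUERY)]

print(run([2, 3]))  # [1, 3]
print(run([]))      # [1, 2, 3]
```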
EDIT:
Another approach that handles the empty set and keeps the INNER JOIN:
SELECT a.id FROM a
JOIN b on a.b_id = b.id
GROUP BY a.id
HAVING COUNT(a.id) = (SELECT COUNT(*) FROM b)
UNION
SELECT a.id
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b);

Use subselect in insert to temp-table

I'm trying to insert the result of a subselect into a temp table using a single insert. In this insert, I want to put my subselect into the third column of the temp table. I know that the first select only returns 2 columns; the trick is the third. How do I fill Col03 using 1 insert and 1 subselect? I get the error message:
Msg 120, Level 15, State 1, Procedure Stored_Procedure, Line 24
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
This is my code.
insert into #Temp (Col01,Col02,Col03)
select X, Y from Table
where Y = CONVERT(varchar,Dateadd(DD,-0,GETDATE()),112)
and Z = '8:00' (Select X from Table
where Datum = CONVERT(varchar,Dateadd(DD,-0,GETDATE()),112)
and Z = '17:00')
The number of columns in the insert list does not match the number of values in the select list:
insert into #Temp (Col01,Col02,Col03) --3 columns
select X, Y --2 values
Add a third item to the select list so the counts match:
insert into #Temp (Col01,Col02,Col03)
select X, Y, Z from Table
where Y = CONVERT(varchar,Dateadd(DD,-0,GETDATE()),112)
and Z = '8:00'
This is how I solved it instead. If you want to use one insert with a subselect, you should do it this way:
insert into #ExcelPrint (Col01,Col02,Col03)
select
Z,
X as X_8,
(Select Kl_17.X from Table as Kl_17
where Kl_17.Y = CONVERT(varchar,Dateadd(DD,-0,GETDATE()),112)
and Kl_17.X = '17:00'
and Kl_17.Z = Table.Z)
as X_17
from Table
where Datum = CONVERT(varchar,Dateadd(DD,-0,GETDATE()),112)
and Z = '8:00'
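The key point is that the scalar subquery counts as one item in the select list, so three insert columns get three values. Below is a minimal sketch of the pattern with Python's `sqlite3`; the `readings` table, its columns, and `temp_out` are hypothetical stand-ins for the anonymized `Table` and `#ExcelPrint`. Unlike SQLite, SQL Server raises an error if such a subquery returns more than one row, so the correlation has to identify a single row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical readings table: one value per item per time slot.
conn.execute("CREATE TABLE readings (item TEXT, slot TEXT, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("A", "8:00", 10), ("A", "17:00", 15),
    ("B", "8:00", 20), ("B", "17:00", 22),
])
conn.execute(
    "CREATE TEMP TABLE temp_out (col01 TEXT, col02 INTEGER, col03 INTEGER)"
)

# Three insert columns, three select-list items: the third is a scalar
# subquery correlated to the outer 8:00 row through item.
conn.execute("""
    INSERT INTO temp_out (col01, col02, col03)
    SELECT r8.item,
           r8.value,
           (SELECT r17.value FROM readings AS r17
            WHERE r17.item = r8.item AND r17.slot = '17:00')
    FROM readings AS r8
    WHERE r8.slot = '8:00'
""")
rows = conn.execute("SELECT * FROM temp_out ORDER BY col01").fetchall()
print(rows)  # [('A', 10, 15), ('B', 20, 22)]
```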

SQL Trigger to split string during insert without a common delimiter and store it into another table

Currently I have a system that is dumping data into a table with the format:
Table1
Id#, row#, row_dump
222, 1, "set1 = aaaa set2 =aaaaaa aaaa dd set4=1111"
I want to take the row dump and transpose it into rows and insert it into another table of the format:
Table2
Id#, setting, value
222, 'set1','aaaa'
222, 'set2','aaaaaa aaaa dd'
222, 'set4','1111'
Is there a way to make a trigger in MSSQL that will parse this string on insert in Table1 and insert it into Table2 properly?
All of the examples I’ve found required a common delimiter. ‘=’ separates the setting from the value, space(s) separate a value from a setting but a value could have spaces in it (settings do not have spaces in them so the last word before the equal sign is the setting name but there could be spaces between the setting name and equal sign).
There could be 1-5 settings and values in any given row. The values can have spaces. There may or may not be space between the setting name and the ‘=’ sign.
I have no control over the original insert process or format as it is used for other purposes.
You could use 'set' as a delimiter. This is a simple sample; it will probably have to be adapted to your environment.
use tempdb
GO
IF OBJECT_ID('dbo.fn_TVF_Split') IS NOT NULL
DROP FUNCTION dbo.fn_TVF_Split;
GO
CREATE FUNCTION dbo.fn_TVF_Split(@arr AS NVARCHAR(2000), @sep AS NCHAR(3))
RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1) --2 rows
,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B) --4 rows (2x2)
,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B) --16 rows (4x4)
,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B) --256 rows (16x16)
,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B) --65536 rows (256x256)
,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B) --4,294,967,296 rows (65536x65536)
,Nums AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)
SELECT
(n - 1) - LEN(REPLACE(LEFT(@arr, n-1), @sep, N'')) + 1 AS pos,
SUBSTRING(@arr, n, CHARINDEX(@sep, @arr + @sep, n) - n) AS element
FROM Nums
WHERE
n <= LEN(@arr) + 3
AND SUBSTRING(@sep + @arr, n, 3) = @sep
AND N<=100000
GO
declare @t table(
Id int,
row int,
row_dump varchar(Max)
);
insert into @t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set4=1111')
insert into @t values(111, 2, ' set1 =cx set2 =4444set4=124')
DECLARE @t2 TABLE(
Id int,
Setting VARCHAR(6),
[Value] VARCHAR(50)
)
insert into @t2 (Id,Setting,Value)
select
Id,
[Setting]='set' + left(LTRIM(element),1),
[Value]=RIGHT(element,charindex('=',reverse(element))-1)
from @t t
cross apply dbo.fn_TVF_Split(row_dump,'set')
where pos > 1
order by
id asc,
'set' + left(LTRIM(element),1) asc
select *
from @t2
Update
You could do something like this. It is not optimal and could probably be better handled in the transformation tool or application. Anyway here we go.
Note: You will need the split function I posted before.
declare @t table(
Id int,
row int,
row_dump varchar(Max)
);
insert into @t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set3=abc set4=1111 set5=7373')
insert into @t values(111, 2, 'set1 =cx set2 = 4444 set4=124')
DECLARE @t2 TABLE(
Id int,
Setting VARCHAR(6),
[Value] VARCHAR(50)
)
if OBJECT_ID('tempdb.dbo.#Vals') IS NOT NULL
BEGIN
DROP TABLE #Vals;
END
CREATE TABLE #Vals(
Id INT,
Row INT,
Element VARCHAR(MAX),
pos int,
value VARCHAR(MAX)
);
insert into #Vals
select
Id,
row,
element,
pos,
Value=STUFF(LEFT(element,len(element) - CHARINDEX(' ',reverse(element))),1,1,'')
from(
select
Id,
row,
row_dump = REPLACE(REPLACE(REPLACE(row_dump,'= ','='),' =','='),'=','=|')
from @t
) AS t
cross apply dbo.fn_TVF_Split(row_dump,'=')
where pos >=1 and pos < 10
insert into @t2 (Id,Setting,Value)
select
t1.Id,
Setting =
(
SELECT TOP 1
CASE WHEN t2.pos = 1
THEN LTRIM(RTRIM(t2.element))
ELSE LTRIM(RTRIM(RIGHT(t2.element,CHARINDEX(' ',REVERSE(t2.element)))))
END
FROM #Vals t2
where
t2.Id = t1.id
and t2.row = t1.row
and t2.pos < t1.pos
ORDER BY t2.pos DESC
),
t1.Value
from #Vals t1
where t1.pos > 1 and t1.pos < 10
order by t1.id,t1.row,t1.pos
select * from @t2
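As the answer says, this may be better handled outside SQL. Below is a minimal sketch of the same rule in Python (the function name `parse_row_dump` is hypothetical). It assumes what the question states: setting names contain no spaces, so the last word before each `=` is a setting name. It also assumes that values never contain `=`. It splits on `=` and then peels the last word off each middle piece as the next setting name, similar to the updated query above. Input such as `4444set4=124`, with no space before the setting name, breaks that assumption and would need extra rules.

```python
def parse_row_dump(row_dump):
    """Split 'name = value name2=value two ...' into (name, value) pairs.

    Assumes setting names contain no spaces (so the last word before each
    '=' is a setting name) and that values never contain '='.
    """
    parts = row_dump.split("=")
    if len(parts) < 2:
        return []
    pairs = []
    name = parts[0].strip()
    for middle in parts[1:-1]:
        # Everything before the last word is the previous setting's value;
        # the last word is the next setting's name.
        value, _, next_name = middle.strip().rpartition(" ")
        pairs.append((name, value.strip()))
        name = next_name
    # The final piece is the value of the last setting.
    pairs.append((name, parts[-1].strip()))
    return pairs

print(parse_row_dump("set1 = aaaa set2 =aaaaaa aaaa dd set4=1111"))
# [('set1', 'aaaa'), ('set2', 'aaaaaa aaaa dd'), ('set4', '1111')]
```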

SQL Group By Modulo of Row Count

I have the following sample data:
Id Name Quantity
1 Red 1
2 Red 3
3 Blue 1
4 Red 1
5 Yellow 3
So for this example, there are a total of 5 Red, 1 Blue, and 3 Yellow. I am looking for a way to group them by Color, but with a maximum of 2 items per group (sorting is not important). Like so:
Name QuantityInPackage
Red 2
Red 2
Red 1
Blue 1
Yellow 2
Yellow 1
Any suggestions on how to accomplish this using T-SQL on MS-SQL 2005?
I would define a table containing sequential numbers, say 1 to 1000, and join to that table (unless your database supports generating these numbers in the query, like Oracle with CONNECT BY):
Table nums
n
1
2
3
...
I tried the following query using Oracle (should work with TSQL too):
With summed_colors As (
Select name, Sum(quantity) quantity
From colors
Group By name
)
Select
name,
Case When n*2-1 = quantity Then 1 Else 2 End quantityInPackage
From summed_colors
Join nums On ( n*2-1 <= quantity )
Order By name, quantityInPackage Desc
and it returns
Blue 1
Red 2
Red 2
Red 1
Yellow 2
Yellow 1
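The package arithmetic generalizes to any size p: package n of a color exists while (n - 1) * p < total and holds MIN(p, total - (n - 1) * p) items. Here is a sketch in Python's `sqlite3`, where a recursive CTE stands in for the permanent numbers table and `PACKAGE_SIZE` is a hypothetical parameter. SQLite's two-argument MIN is a scalar function; in T-SQL you would write a CASE expression instead, as the query above does. Also, a SQL Server recursive CTE that counts to 1000 would need OPTION (MAXRECURSION ...), so a real numbers table is simpler there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (id INTEGER, name TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO colors VALUES (?, ?, ?)", [
    (1, "Red", 1), (2, "Red", 3), (3, "Blue", 1),
    (4, "Red", 1), (5, "Yellow", 3),
])

PACKAGE_SIZE = 2  # hypothetical parameter; the question uses 2

# nums stands in for a numbers table 1..1000. Package n of a color
# starts after (n - 1) * size items and holds at most size items.
rows = conn.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM nums WHERE n < 1000
    ),
    summed AS (
        SELECT name, SUM(quantity) AS total FROM colors GROUP BY name
    )
    SELECT name, MIN(:size, total - (n - 1) * :size) AS quantity_in_package
    FROM summed
    JOIN nums ON (n - 1) * :size < total
    ORDER BY name, quantity_in_package DESC
""", {"size": PACKAGE_SIZE}).fetchall()
print(rows)
```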
You need to use a numbers table to unpivot your data to make multiple rows:
DECLARE @PackageSize AS int
SET @PackageSize = 2
DECLARE @numbers AS TABLE (Number int)
INSERT INTO @numbers
VALUES (1)
INSERT INTO @numbers
VALUES (2)
INSERT INTO @numbers
VALUES (3)
INSERT INTO @numbers
VALUES (4)
INSERT INTO @numbers
VALUES (5)
INSERT INTO @numbers
VALUES (6)
INSERT INTO @numbers
VALUES (7)
INSERT INTO @numbers
VALUES (8)
INSERT INTO @numbers
VALUES (9)
INSERT INTO @numbers
VALUES (10)
DECLARE @t AS TABLE
(
Id int
,Nm varchar(6)
,Qty int
)
INSERT INTO @t
VALUES (1, 'Red', 1)
INSERT INTO @t
VALUES (2, 'Red', 3)
INSERT INTO @t
VALUES (3, 'Blue', 1)
INSERT INTO @t
VALUES (4, 'Red', 1)
INSERT INTO @t
VALUES (5, 'Yellow', 3) ;
WITH Totals
AS (
SELECT Nm
,SUM(Qty) AS TotalQty
,SUM(Qty) / @PackageSize AS NumCompletePackages
,SUM(Qty) % @PackageSize AS PartialPackage
FROM @t
GROUP BY Nm
)
SELECT Totals.Nm
,@PackageSize AS QuantityInPackage
FROM Totals
INNER JOIN @numbers AS numbers
ON numbers.Number <= Totals.NumCompletePackages
UNION ALL
SELECT Totals.Nm
,PartialPackage AS QuantityInPackage
FROM Totals
WHERE PartialPackage <> 0
It's not grouping or modulo/division that's the hard part here, it's the fact that you need to do an aggregate (sum) and then explode the data again. There aren't actually any "Red 2" rows, you have to create them somehow.
For SQL Server 2005+, I would probably use a function to do the "exploding":
CREATE FUNCTION dbo.CreateBuckets
(
@Num int,
@MaxPerGroup int
)
RETURNS TABLE
AS RETURN
WITH First_CTE AS
(
SELECT CASE
WHEN @MaxPerGroup < @Num THEN @MaxPerGroup
ELSE @Num
END AS Seed
),
Sequence_CTE AS
(
SELECT Seed AS [Current], Seed AS Total
FROM First_CTE
UNION ALL
SELECT
CASE
WHEN (Total + @MaxPerGroup) > @Num THEN (@Num - Total)
ELSE @MaxPerGroup
END,
Total + @MaxPerGroup
FROM Sequence_CTE
WHERE Total < @Num
)
SELECT [Current] AS Num
FROM Sequence_CTE
Then, in the main query, group (sum) the data first and then use the bucket function:
WITH Totals AS
(
SELECT Name, SUM(Quantity) AS Total
FROM Table
GROUP BY Name
)
SELECT Name, b.Num AS QuantityInPackage
FROM Totals
CROSS APPLY dbo.CreateBuckets(Total, 2) b
This should work for any bucket size, doesn't have to be 2 (just change the parameter).
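SQLite has no CROSS APPLY or inline table-valued functions, so the sketch below (Python's `sqlite3`, with a hypothetical `packages` helper) runs the same peel-off-one-bucket recursion for all colors at once. Each recursive step emits the next bucket while anything remains. Calling it with 3 shows that the bucket size is just a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (name TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO colors VALUES (?, ?)",
                 [("Red", 1), ("Red", 3), ("Blue", 1), ("Red", 1), ("Yellow", 3)])

def packages(max_per_group):
    """Return (name, quantity) rows, one per package of at most max_per_group."""
    return conn.execute("""
        WITH RECURSIVE
        totals AS (
            SELECT name, SUM(quantity) AS total FROM colors GROUP BY name
        ),
        buckets(name, num, remaining) AS (
            -- first bucket: the smaller of the bucket size and the total
            SELECT name, MIN(:m, total), total - MIN(:m, total) FROM totals
            UNION ALL
            -- next bucket, as long as something is left over
            SELECT name, MIN(:m, remaining), remaining - MIN(:m, remaining)
            FROM buckets WHERE remaining > 0
        )
        SELECT name, num FROM buckets ORDER BY name, num DESC
    """, {"m": max_per_group}).fetchall()

print(packages(3))  # [('Blue', 1), ('Red', 3), ('Red', 2), ('Yellow', 3)]
```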
This is very crude, but it works.
CREATE TABLE #Colors
(
Id int,
Name varchar(50),
Quantity int
)
INSERT INTO #Colors VALUES (1, 'Red', 1)
INSERT INTO #Colors VALUES (2, 'Red', 3)
INSERT INTO #Colors VALUES (3, 'Blue', 1)
INSERT INTO #Colors VALUES (4, 'Red', 1)
INSERT INTO #Colors VALUES (5, 'Yellow', 3)
INSERT INTO #Colors VALUES (6, 'Green', 2)
SELECT
Name,
SUM(Quantity) AS TotalQuantity
INTO #Summed
FROM
#Colors
GROUP BY
Name
SELECT
Name,
TotalQuantity / 2 AS RecordsWithQuantity2,
TotalQuantity % 2 AS RecordsWithQuantity1
INTO #SortOfPivot
FROM
#Summed
ORDER BY
Name
DECLARE @RowCount int
SET @RowCount = (SELECT COUNT(*) FROM #SortOfPivot)
DECLARE @Name varchar(50)
DECLARE @TwosInsertCount int
DECLARE @OnesInsertCount int
CREATE TABLE #Result (Name varchar(50), Quantity int)
WHILE @RowCount > 0
BEGIN
-- read all three values from the same row in a single statement
SELECT TOP 1
@Name = Name,
@TwosInsertCount = RecordsWithQuantity2,
@OnesInsertCount = RecordsWithQuantity1
FROM #SortOfPivot
ORDER BY Name
WHILE @TwosInsertCount > 0
BEGIN
INSERT INTO #Result (Name, Quantity) VALUES (@Name, 2)
SET @TwosInsertCount = @TwosInsertCount - 1
END
WHILE @OnesInsertCount > 0
BEGIN
INSERT INTO #Result (Name, Quantity) VALUES (@Name, 1)
SET @OnesInsertCount = @OnesInsertCount - 1
END
DELETE FROM #SortOfPivot WHERE Name = @Name
SET @RowCount = (SELECT COUNT(*) FROM #SortOfPivot)
END
SELECT * FROM #Result
DROP TABLE #Colors
DROP TABLE #Result
DROP TABLE #Summed
DROP TABLE #SortOfPivot

Delete duplicated rows and Update references

How do I delete duplicated rows in one table and update references in another table so that they point to the remaining row? The duplication only occurs in the name. The Id columns are identity columns.
Example:
Assume we have two tables Doubles and Data.
Doubles table (
Id int,
Name varchar(50)
)
Data Table (
Id int,
DoublesId int
)
Now I have two entries in the Doubles table:
Id Name
1 Foo
2 Foo
And two entries in the Data Table:
ID DoublesId
1 1
2 2
At the end there should be only one entry in the Doubles Table:
Id Name
1 Foo
And two entries in the Data Table:
Id DoublesId
1 1
2 1
In the Doubles table there can be any number of duplicated rows per name (up to 30), as well as regular 'single' rows.
I've not run this, but hopefully it should be correct, and close enough to the final soln to get you there. Let me know any mistakes if you like and I'll update the answer.
--updates the data table to the min ids for each name
update Data
set DoublesId = final_id
from
Data
join
Doubles
on Doubles.id = Data.DoublesId
join
(
select
name,
min(id) as final_id
from Doubles
group by name
) min_ids
on min_ids.name = Doubles.name
--deletes redundant ids from the Doubles table
delete
from Doubles
where id not in
(
select
min(id) as final_id
from Doubles
group by name
)
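The order of the two statements matters: references are repointed first, then the duplicates are deleted. Below is a minimal sketch of the same two steps with Python's `sqlite3`. It uses a correlated subquery instead of UPDATE ... FROM (which only SQLite 3.33+ supports). The extra `Bar` and `Baz` rows are hypothetical test data mixing single and duplicated names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Doubles (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Data (Id INTEGER PRIMARY KEY, DoublesId INTEGER)")
conn.executemany("INSERT INTO Doubles VALUES (?, ?)",
                 [(1, "Foo"), (2, "Foo"), (3, "Bar"), (4, "Baz"), (5, "Baz")])
conn.executemany("INSERT INTO Data VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 5)])

# Step 1: point every Data row at the lowest Doubles.Id with the same name.
# Assumes every DoublesId exists in Doubles; an orphaned one becomes NULL.
conn.execute("""
    UPDATE Data
    SET DoublesId = (
        SELECT MIN(d2.Id)
        FROM Doubles AS d1
        JOIN Doubles AS d2 ON d2.Name = d1.Name
        WHERE d1.Id = Data.DoublesId)
""")

# Step 2: with nothing referencing them, delete the non-minimal duplicates.
conn.execute("""
    DELETE FROM Doubles
    WHERE Id NOT IN (SELECT MIN(Id) FROM Doubles GROUP BY Name)
""")

doubles = conn.execute("SELECT * FROM Doubles ORDER BY Id").fetchall()
data = conn.execute("SELECT * FROM Data ORDER BY Id").fetchall()
print(doubles)  # [(1, 'Foo'), (3, 'Bar'), (4, 'Baz')]
print(data)     # [(1, 1), (2, 1), (3, 3), (4, 4)]
```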
Note: I have taken the liberty of renaming your Ids to DoubleID and DataID respectively. I find that easier to work with.
DECLARE @Doubles TABLE (DoubleID INT, Name VARCHAR(50))
DECLARE @Data TABLE (DataID INT, DoubleID INT)
INSERT INTO @Doubles VALUES (1, 'Foo')
INSERT INTO @Doubles VALUES (2, 'Foo')
INSERT INTO @Doubles VALUES (3, 'Bar')
INSERT INTO @Doubles VALUES (4, 'Bar')
INSERT INTO @Data VALUES (1, 1)
INSERT INTO @Data VALUES (2, 2)
INSERT INTO @Data VALUES (3, 3)
INSERT INTO @Data VALUES (4, 4)
SELECT * FROM @Doubles
SELECT * FROM @Data
UPDATE dt
SET DoubleID = dbmin.MinDoubleID
FROM @Data dt
INNER JOIN @Doubles db ON db.DoubleID = dt.DoubleID
INNER JOIN (
SELECT db.Name, MinDoubleID = MIN(db.DoubleID)
FROM @Doubles db
GROUP BY db.Name
) dbmin ON dbmin.Name = db.Name
/* Kudos to quassnoi */
;WITH q AS (
-- order by DoubleID so the row kept per name is the MIN() used above
SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY DoubleID) AS rn
FROM @Doubles
)
DELETE
FROM q
WHERE rn > 1
SELECT * FROM @Doubles
SELECT * FROM @Data
Take a look at this one; I have tried it and it works fine.
--create table Doubles ( Id int, Name varchar(50))
--create table Data( Id int, DoublesId int)
--select * from doubles
--select * from data
Declare @NonDuplicateID int
Declare @NonDuplicateName varchar(max)
DECLARE @sqlQuery nvarchar(max)
DECLARE DeleteDuplicate CURSOR FOR
SELECT Max(id),name AS SingleID FROM Doubles
GROUP BY [NAME]
OPEN DeleteDuplicate
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID, @NonDuplicateName
--Fetch next record
WHILE @@FETCH_STATUS = 0
BEGIN
--select b.ID , b.DoublesID, a.[name],a.id asdasd
--from doubles a inner join data b
--on
--a.ID=b.DoublesID
--where b.DoublesID<>@NonDuplicateID
--and a.[name]=@NonDuplicateName
print '---------------------------------------------';
select
@sqlQuery =
'update b
set b.DoublesID=' + cast(@NonDuplicateID as varchar(50)) + '
from
doubles a
inner join
data b
on
a.ID=b.DoublesID
where b.DoublesID<>' + cast(@NonDuplicateID as varchar(50)) +
' and a.[name]=''' + cast(@NonDuplicateName as varchar(max)) +'''';
print @sqlQuery
exec sp_executeSQL @sqlQuery
print '---------------------------------------------';
-- now move the cursor
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID ,@NonDuplicateName
END
CLOSE DeleteDuplicate --Close cursor
DEALLOCATE DeleteDuplicate --Deallocate cursor
---- Delete duplicate rows from original table
DELETE
FROM doubles
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM doubles
GROUP BY [NAME]
)
Please try and let me know if this helped you
Thanks
~ Aamod
If you are using MySQL, the following worked for me. I did it in two steps:
Step 1 -> Update all Data rows to one Double table reference (with lowest id)
Step 2 -> Delete all duplicates with keeping lowest id
Step 1 ->
update Data
join
Doubles
on Data.DoublesId = Doubles.id
join
(
select name, min(id) as final_id
from Doubles
group by name
) min_ids
on min_ids.name = Doubles.name
set DoublesId = min_ids.final_id;
Step 2 ->
DELETE c1 FROM Doubles c1
INNER JOIN Doubles c2
WHERE
c1.id > c2.id AND
c1.name = c2.name;