Select one set of duplicate data - sql

Using SQL Server, how do I select only one "set" of relationships? If you can also help me figure out how to phrase this problem to make it more googleable, that would be stellar. For example, the table below contains pairs of rows that are identical except that the employee and the coworker are swapped.
CREATE TABLE #A (
EmployeeID INT,
EmployeeName NVARCHAR(100),
CoworkerID INT,
CoworkerName NVARCHAR(100)
)
INSERT INTO #A VALUES (1, 'Alice', 2, 'Bob')
INSERT INTO #A VALUES (2, 'Bob', 1, 'Alice')
INSERT INTO #A VALUES (3, 'Charlie', 4, 'Dan')
INSERT INTO #A VALUES (4, 'Dan', 3, 'Charlie')
SELECT *
FROM #A
-- THIS IS WHAT WE WANT TO PROGRAMMATICALLY DETERMINE: which IDs to keep and which to delete.
DELETE FROM #A WHERE EmployeeID = 2
DELETE FROM #A WHERE EmployeeID = 4
SELECT *
FROM #A
So, the net result of the final query is:
|------------|---------------|-------------|--------------|
| EmployeeID | EmployeeName | CoworkerID | CoworkerName |
|------------|---------------|-------------|--------------|
| 1 | Alice | 2 | Bob |
|------------|---------------|-------------|--------------|
| 3 | Charlie | 4 | Dan |
|------------|---------------|-------------|--------------|

Try this:
DELETE FROM #A
WHERE #A.EmployeeID > #a.CoworkerID
AND EXISTS(SELECT 1 FROM #A A2
where a2.CoworkerID = #A.EmployeeID
and a2.employeeID = #a.coworkerID)
See SqlFiddle
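To check the logic of the DELETE above without a SQL Server instance, here is a minimal sketch that runs the same statement against SQLite through Python's sqlite3 module. SQLite is only a stand-in here, and the plain table name A replaces the temp table #A; both are assumptions of the sketch, not part of the answer.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (
        EmployeeID INTEGER, EmployeeName TEXT,
        CoworkerID INTEGER, CoworkerName TEXT
    );
    INSERT INTO A VALUES (1, 'Alice', 2, 'Bob'), (2, 'Bob', 1, 'Alice'),
                         (3, 'Charlie', 4, 'Dan'), (4, 'Dan', 3, 'Charlie');
""")

# Delete a row only if its mirrored row exists, and only on the side
# whose EmployeeID is the larger of the two IDs.
conn.execute("""
    DELETE FROM A
    WHERE EmployeeID > CoworkerID
      AND EXISTS (SELECT 1 FROM A AS A2
                  WHERE A2.CoworkerID = A.EmployeeID
                    AND A2.EmployeeID = A.CoworkerID)
""")

remaining = conn.execute("SELECT * FROM A ORDER BY EmployeeID").fetchall()
for row in remaining:
    print(row)
```

The `EmployeeID > CoworkerID` test is what breaks the tie: of each mirrored pair, exactly one row satisfies it, so exactly one row is removed.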

This will only delete those with a circular reference. If you have more complex chains, this will not affect them.
DELETE a
FROM #A a
WHERE EXISTS (
SELECT *
FROM #A t
WHERE t.CoworkerID = a.EmployeeID
AND t.EmployeeID < a.EmployeeID
)

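-- Partition by the unordered (EmployeeID, CoworkerID) pair so that mirrored rows
-- land in the same partition; partitioning by employeename alone finds no duplicates here.
;WITH t2CTE AS
(
SELECT *, ROW_NUMBER() OVER (
PARTITION BY
CASE WHEN EmployeeID < CoworkerID THEN EmployeeID ELSE CoworkerID END,
CASE WHEN EmployeeID < CoworkerID THEN CoworkerID ELSE EmployeeID END
ORDER BY EmployeeID) AS counter1
FROM #A
)
DELETE FROM t2CTE WHERE counter1 > 1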
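One way to phrase the problem is "keep one row per unordered pair": each relationship is identified by the smaller and the larger of the two IDs, regardless of which column they sit in. The sketch below illustrates that idea with ROW_NUMBER. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module as a stand-in for SQL Server, and it uses SQLite's two-argument min()/max() and rowid, which SQL Server does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (
        EmployeeID INTEGER, EmployeeName TEXT,
        CoworkerID INTEGER, CoworkerName TEXT
    );
    INSERT INTO A VALUES (1, 'Alice', 2, 'Bob'), (2, 'Bob', 1, 'Alice'),
                         (3, 'Charlie', 4, 'Dan'), (4, 'Dan', 3, 'Charlie');
""")

# Number the rows inside each unordered pair (smaller ID, larger ID);
# every row numbered 2 or higher is a mirrored duplicate and is deleted.
conn.execute("""
    DELETE FROM A
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY min(EmployeeID, CoworkerID),
                                    max(EmployeeID, CoworkerID)
                       ORDER BY EmployeeID
                   ) AS rn
            FROM A
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT * FROM A ORDER BY EmployeeID").fetchall()
for row in remaining:
    print(row)
```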

Related

Create a table with unique values from another table

I am using MS SQL Server Management Studio. I have the following table:
+--------+----------+
| Num_ID | Alpha_ID |
+--------+----------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | B |
| 2 | C |
| 3 | A |
| 4 | C |
| 5 | A |
| 5 | B |
+--------+----------+
I want to create another table with 2 columns from this table, so that column_1 holds the unique values of Num_ID (i.e. 1, 2, 3, 4 and so on) and column_2 holds unique values of Alpha_ID (A, B, C and so on).
But if a letter has already been used, it must not be used again. So the output will be something like this:
Col_1 Col_2
================
1 - A
----------------
2 - B
----------------
3 - NULL (as A has been chosen by 1, it cannot occur next to 3)
----------------
4 - C
----------------
5 - NULL (both 5 A and 5 B cannot be chosen as A and B were picked up by 1 and 2)
----------------
Hope that makes sense.
I would like to clarify that the IDs in the input table are not numerical as I have shown, but both Num_ID and Alpha_ID are complex strings. I have simplified them to 1,2,3,... and A, B, C .... for the purpose of this question
I don't think this can be done without a cursor.
I added a few more rows to your sample data to test how it works with other cases.
The logic is straightforward. First, get a list of all distinct values of Num_ID. Then loop through them and, with each iteration, add one row to the destination table. To determine the Alpha_ID value to add, I use the EXCEPT operator: it takes all available Alpha_ID values for the current Num_ID from the source table and removes from them all values that have been used before.
It is possible to write that INSERT without the explicit variable @CurrAlphaID, but it looks a bit cleaner with the variable.
Here is SQL Fiddle.
DECLARE @TSrc TABLE (Num_ID varchar(10), Alpha_ID varchar(10));
INSERT INTO @TSrc (Num_ID, Alpha_ID) VALUES
('1', 'A'),
('1', 'B'),
('1', 'C'),
('2', 'B'),
('2', 'C'),
('3', 'A'),
('3', 'C'),
('4', 'A'),
('4', 'C'),
('5', 'A'),
('5', 'B'),
('5', 'C'),
('6', 'D'),
('6', 'E');
DECLARE @TDst TABLE (Num_ID varchar(10), Alpha_ID varchar(10));
DECLARE @CurrNumID varchar(10);
DECLARE @CurrAlphaID varchar(10);
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
    SELECT DISTINCT Num_ID
    FROM @TSrc
    ORDER BY Num_ID;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrNumID;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
    SET @CurrAlphaID =
    (
        SELECT TOP(1) Diff.Alpha_ID
        FROM
        (
            SELECT Src.Alpha_ID
            FROM @TSrc AS Src
            WHERE Src.Num_ID = @CurrNumID
            EXCEPT
            SELECT Dst.Alpha_ID
            FROM @TDst AS Dst
        ) AS Diff
        ORDER BY Diff.Alpha_ID
    );
    INSERT INTO @TDst (Num_ID, Alpha_ID)
    VALUES (@CurrNumID, @CurrAlphaID);
    FETCH NEXT FROM @VarCursor INTO @CurrNumID;
    SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT * FROM @TDst;
Result
Num_ID Alpha_ID
1 A
2 B
3 C
4 NULL
5 NULL
6 D
An index on (Num_ID, Alpha_ID) on the source table would help. An index on (Alpha_ID) on the destination table would help as well.
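The greedy logic of the cursor loop can also be sketched outside the database. The following is a plain-Python illustration, not a replacement for the T-SQL above: the function name assign_unique is hypothetical, and it assumes that Python's default string ordering matches the collation order SQL Server would use for these IDs.

```python
def assign_unique(pairs):
    """For each Num_ID in sorted order, pick the smallest Alpha_ID not used so far."""
    candidates = {}
    for num_id, alpha_id in pairs:
        candidates.setdefault(num_id, set()).add(alpha_id)

    used = set()
    result = []
    for num_id in sorted(candidates):                  # the cursor over DISTINCT Num_ID
        available = sorted(candidates[num_id] - used)  # the EXCEPT step
        choice = available[0] if available else None   # TOP(1) ... ORDER BY Alpha_ID
        if choice is not None:
            used.add(choice)
        result.append((num_id, choice))
    return result


# Same extended sample data as in the answer above.
pairs = [
    ('1', 'A'), ('1', 'B'), ('1', 'C'),
    ('2', 'B'), ('2', 'C'),
    ('3', 'A'), ('3', 'C'),
    ('4', 'A'), ('4', 'C'),
    ('5', 'A'), ('5', 'B'), ('5', 'C'),
    ('6', 'D'), ('6', 'E'),
]
result = assign_unique(pairs)
for num_id, alpha_id in result:
    print(num_id, alpha_id)
```

Each choice depends on every choice made before it, which is why the problem resists a single set-based statement.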
I think I've found a way that avoids iteration (a cursor or a WHILE loop).
First, I created a table with the rows.
create table #tmptest
(
Num_ID int
, Alpha_ID varchar(50)
)
insert into #tmptest (Num_ID, Alpha_ID) values
(1,'A'),
(1,'B'),
(1,'C'),
(2,'B'),
(2,'C'),
(3,'A'),
(4,'C'),
(5,'A'),
(5,'B')
-- this one, with a row column (ordered by Alpha_ID so that row = 1 is deterministic)
SELECT
ROW_NUMBER() OVER (PARTITION BY Num_ID ORDER BY Alpha_ID ASC) as row
, *
INTO #tmp_withrow
FROM #tmptest
and these were the results
Lastly, I made an inner query (could possibly be a left join or better).
SELECT DISTINCT
Num_ID
, (
SELECT
TOP 1
Alpha_ID
FROM #tmp_withrow in1
WHERE
in1.Num_ID = t.Num_ID
AND in1.Alpha_ID NOT IN (
SELECT
Alpha_ID
FROM #tmp_withrow in2
WHERE
in2.Num_ID < in1.Num_ID
AND in2.row = 1
)
ORDER BY in1.Alpha_ID ASC
) AS [NonRepeatingAlpha]
from #tmptest t
and these were the results
Note: I created a flag (row) that lets you look at every Num_ID lower than the current one (in2.Num_ID < in1.Num_ID), find out which letters were already used there (in2.row = 1), and then skip any letter that has already been taken by another Num_ID:
WHERE in1.Num_ID = t.Num_ID
AND in1.Alpha_ID NOT IN (
SELECT
Alpha_ID
FROM #tmp_withrow in2
WHERE
in2.Num_ID < in1.Num_ID
AND in2.row = 1
)
I hope this helps. Thanks!

SQL Ignore duplicate primary keys

Imagine you have a set of results from a SELECT statement:
ID (pk) Name Address
1 a b
1 c d
1 e f
2 a b
3 a d
2 a d
Is it possible to alter the SQL statement to get one record ONLY for the record with ID 1?
I have a SELECT statement that displays multiple values which can have the same primary key. I want to only take one of those records, if say, I have 5 records with the same primary key.
SQL: http://pastebin.com/cFCBA2Uy
Screenshot: http://i.imgur.com/UlMBZhC.png
What I want is to show only one file which is for e.g. File Number: 925, 890
You stated that it does not matter which row is chosen when there is more than one row for the same id; you just want one row for each id.
The following query does what you asked for:
DECLARE @T table
(
    id int,
    name varchar(50),
    address varchar(50)
)
INSERT INTO @T VALUES
(1, 'a', 'b'),
(1, 'c', 'd'),
(1, 'e', 'f'),
(2, 'a', 'b'),
(3, 'a', 'd'),
(2, 'a', 'd');
WITH A AS
(
    SELECT
        t.id, t.name, t.address,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY (SELECT NULL)) AS RowNumber
    FROM
        @T t
)
SELECT
    A.id, A.name, A.address
FROM
    A
WHERE
    A.RowNumber = 1
But I think there should be a criterion. If you find one, express it as the ORDER BY inside the OVER clause.
EDIT:
Here you have the result:
+----+------+---------+
| id | name | address |
+----+------+---------+
| 1 | a | b |
| 2 | a | b |
| 3 | a | d |
+----+------+---------+
Disclaimer: the query I wrote is non-deterministic; different conditions (indexes, statistics, etc.) might lead to different results.
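To illustrate the point about putting a criterion into the ORDER BY of the OVER clause, here is a minimal sketch. It assumes SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for SQL Server, a plain table T instead of the table variable, and "lowest name, then lowest address" as a purely hypothetical criterion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (id INTEGER, name TEXT, address TEXT);
    INSERT INTO T VALUES (1, 'a', 'b'), (1, 'c', 'd'), (1, 'e', 'f'),
                         (2, 'a', 'b'), (3, 'a', 'd'), (2, 'a', 'd');
""")

# With an explicit ORDER BY in the window, the row numbered 1 in each
# partition is always the same one, whatever the physical row order is.
rows = conn.execute("""
    WITH numbered AS (
        SELECT id, name, address,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name, address) AS rn
        FROM T
    )
    SELECT id, name, address
    FROM numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```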

CONCAT(column) OVER(PARTITION BY...)? Group-concatenating rows without grouping the result itself

I need a way to make a concatenation of all rows (per group) in a kind of window function like how you can do COUNT(*) OVER(PARTITION BY...) and the aggregate count of all rows per group will repeat across each particular group. I need something similar but a string concatenation of all values per group repeated across each group.
Here is some example data and my desired result to better illustrate my problem:
grp | val
------------
1 | a
1 | b
1 | c
1 | d
2 | x
2 | y
2 | z
And here is what I need (the desired result):
grp | val | groupcnct
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
Here is the really tricky part of this problem:
My particular situation prevents me from being able to reference the same table twice (I'm actually doing this within a recursive CTE, so I can't do a self-join of the CTE or it will throw an error).
I'm fully aware that one can do something like:
SELECT a.*, b.groupcnct
FROM tbl a
CROSS APPLY (
SELECT STUFF((
SELECT '' + aa.val
FROM tbl aa
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '') AS groupcnct
) b
But as you can see, that is referencing tbl two times in the query.
I can only reference tbl once, hence why I'm wondering if windowing the group-concatenation is possible (I'm a bit new to TSQL since I come from a MySQL background, so not sure if something like that can be done).
Create Table:
CREATE TABLE tbl
(grp int, val varchar(1));
INSERT INTO tbl
(grp, val)
VALUES
(1, 'a'),
(1, 'b'),
(1, 'c'),
(1, 'd'),
(2, 'x'),
(2, 'y'),
(2, 'z');
In SQL Server 2017 and later you can use the STRING_AGG function:
SELECT STRING_AGG(T.val, ',') AS val
, T.grp
FROM @tbl AS T
GROUP BY T.grp
I tried a pure CTE approach (see "Which is the best way to form the string value using column from a Table with rows having same ID?"), thinking it would be faster.
But the benchmark says otherwise: a subquery (or CROSS APPLY) over FOR XML PATH results is faster.
DECLARE @tbl TABLE
(
    grp INT
    ,val VARCHAR(1)
);
BEGIN
    INSERT INTO @tbl(grp, val)
    VALUES
    (1, 'a'),
    (1, 'b'),
    (1, 'c'),
    (1, 'd'),
    (2, 'x'),
    (2, 'y'),
    (2, 'z');
END;
----------- Your Required Query
SELECT ST2.grp,
    SUBSTRING(
        (
            SELECT ','+ST1.val AS [text()]
            FROM @tbl ST1
            WHERE ST1.grp = ST2.grp
            ORDER BY ST1.val
            FOR XML PATH ('')
        ), 2, 1000
    ) groupcnct
FROM @tbl ST2
Is it possible for you to just put your STUFF expression in the SELECT list instead, or do you run into the same issue? (I replaced 'tbl' with 'TEMP.TEMP123'.)
Select
A.*
, [GROUPCNT] = STUFF((
SELECT '' + aa.val
FROM TEMP.TEMP123 AA
WHERE aa.grp = a.grp
FOR XML PATH('')
), 1, 0, '')
from TEMP.TEMP123 A
This worked for me -- wanted to see if this worked for you too.
I know this post is old, but in case someone is still wondering, you can create a scalar function that concatenates row values.
IF OBJECT_ID('dbo.fnConcatRowsPerGroup','FN') IS NOT NULL
    DROP FUNCTION dbo.fnConcatRowsPerGroup
GO
CREATE FUNCTION dbo.fnConcatRowsPerGroup
(@grp as int) RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @val AS VARCHAR(MAX)
    SELECT @val = COALESCE(@val,'')+val
    FROM tbl
    WHERE grp = @grp
    RETURN @val;
END
GO
select *, dbo.fnConcatRowsPerGroup(grp)
from tbl
Here is the result set I got from querying a sample table:
grp | val | (No column name)
---------------------------------
1 | a | abcd
1 | b | abcd
1 | c | abcd
1 | d | abcd
2 | x | xyz
2 | y | xyz
2 | z | xyz
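For comparison, some engines do accept a concatenating aggregate as a window function, which yields the desired result with a single reference to the table. The sketch below shows this with SQLite's group_concat (SQLite 3.25 or later, through Python's sqlite3 module). It only illustrates the shape of the requested query: as far as I know, SQL Server's STRING_AGG cannot be combined with OVER, so this does not carry over to T-SQL as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (grp INTEGER, val TEXT);
    INSERT INTO tbl VALUES (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'),
                           (2, 'x'), (2, 'y'), (2, 'z');
""")

# The frame spans the whole partition, so every row of a group carries
# the concatenation of all values in that group, in val order.
rows = conn.execute("""
    SELECT grp, val,
           group_concat(val, '') OVER (
               PARTITION BY grp
               ORDER BY val
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS groupcnct
    FROM tbl
    ORDER BY grp, val
""").fetchall()

for row in rows:
    print(row)
```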

IF NOT EXISTS in Merge statement?

I want to do the following: when the primary keys match and there are no rows with Active = 'Y', insert records. Is this possible?
I tried this:
-- Merge statement
MERGE INTO table1 AS DST
USING table2 AS SRC
ON (SRC.Code = DST.Code)
--Existing records updated if data changes
WHEN MATCHED
AND IF NOT EXISTS (WHERE active='Y' FROM table1 )
THEN
INSERT INTO table1 (colum)
SELECT value
+------+----------------+--------+
| Code | description    | Active |
+------+----------------+--------+
| AB   | just something | No     |
+------+----------------+--------+
Only when there is no active record with the same Code do I want to insert a record. The new record would look like this:
+------+----------------+--------+
| Code | description    | Active |
+------+----------------+--------+
| AB   | something else | YES    |
+------+----------------+--------+
I hope that makes it clearer.
Edit: Never mind, it's not possible. I just got this error message:
An action of type 'INSERT' is not allowed in the 'WHEN MATCHED' clause of a MERGE statement.
If I understand you correctly, you want to insert the rows from @T2 that are not already in @T1 with Active = 'y'.
declare @T1 table
(
    Code char(2),
    Descr varchar(10),
    Active char(1)
)
declare @T2 table
(
    Code char(2),
    Descr varchar(10)
)
insert into @T1 values
('1', 'Desc 1', 'y'),
('2', 'Desc 2', 'n')
insert into @T2 values
('1', 'Desc 1'),
('2', 'Desc 2'),
('3', 'Desc 3')
merge @T1 as D
using @T2 as S
on D.Code = S.Code and
   D.Active = 'y'
when not matched then
    insert (Code, Descr, Active)
    values (Code, Descr, 'y');
select *
from @T1
Result:
Code Descr Active
---- ---------- ------
1 Desc 1 y
2 Desc 2 n
2 Desc 2 y
3 Desc 3 y
The row with Code 3 will also be inserted. If you do not want that, meaning that you only want to insert a row into @T1 if @T1 already contains a row with a matching Code but Active = 'n', you could use this instead.
merge @T1 as D
using (select Code,
              Descr
       from @T2
       where Code in (select Code
                      from @T1
                      where Active = 'n')) as S
on D.Code = S.Code and
   D.Active = 'y'
when not matched then
    insert (Code, Descr, Active)
    values (Code, Descr, 'y');
Result:
Code Descr Active
---- ---------- ------
1 Desc 1 y
2 Desc 2 n
2 Desc 2 y
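The same "insert only when no active row exists" rule can also be written without MERGE, as INSERT ... SELECT ... WHERE NOT EXISTS. The sketch below is an illustration of that equivalent logic rather than the MERGE itself: it runs on SQLite (which has no MERGE statement) through Python's sqlite3 module, and it uses plain tables T1 and T2 in place of the table variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Code TEXT, Descr TEXT, Active TEXT);
    CREATE TABLE T2 (Code TEXT, Descr TEXT);
    INSERT INTO T1 VALUES ('1', 'Desc 1', 'y'), ('2', 'Desc 2', 'n');
    INSERT INTO T2 VALUES ('1', 'Desc 1'), ('2', 'Desc 2'), ('3', 'Desc 3');
""")

# Insert every source row for which the target has no active row with
# the same Code; this mirrors the WHEN NOT MATCHED branch of the first MERGE.
conn.execute("""
    INSERT INTO T1 (Code, Descr, Active)
    SELECT S.Code, S.Descr, 'y'
    FROM T2 AS S
    WHERE NOT EXISTS (SELECT 1 FROM T1 AS D
                      WHERE D.Code = S.Code AND D.Active = 'y')
""")

rows = conn.execute(
    "SELECT Code, Descr, Active FROM T1 ORDER BY Code, Active"
).fetchall()
for row in rows:
    print(row)
```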

How to optimise MySQL query containing a subquery?

I have two tables, House and Person. For any row in House, there can be 0, 1 or many corresponding rows in Person. But, of those people, a maximum of one will have a status of "ACTIVE", the others will all have a status of "CANCELLED".
e.g.
SELECT * FROM House LEFT JOIN Person ON House.ID = Person.HouseID
House.ID | Person.ID | Person.Status
1 | 1 | CANCELLED
1 | 2 | CANCELLED
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | 4 | CANCELLED
I want to filter out the cancelled rows, and get something like this:
House.ID | Person.ID | Person.Status
1 | 3 | ACTIVE
2 | 1 | ACTIVE
3 | NULL | NULL
4 | NULL | NULL
I've achieved this with the following sub select:
SELECT *
FROM House
LEFT JOIN
(
SELECT *
FROM Person
WHERE Person.Status != "CANCELLED"
) Person
ON House.ID = Person.HouseID
...which works, but breaks all the indexes. Is there a better solution that doesn't?
I'm using MySQL and all relevant columns are indexed. EXPLAIN lists nothing in possible_keys.
Thanks.
How about:
SELECT *
FROM House
LEFT JOIN Person
ON House.ID = Person.HouseID
AND Person.Status != "CANCELLED"
Do you have control of the database structure? If so, I think you could better represent your data by removing the column Status from the Person table and instead adding a column ActivePersonID to the House table. This way you remove all the redundant CANCELLED values from Person and eliminate application or stored procedure code to ensure only one person per household is active.
In addition, you could then represent your query as
SELECT * FROM House LEFT JOIN Person ON House.ActivePersonID = Person.ID
Use:
SELECT *
FROM HOUSE h
LEFT JOIN PERSON p ON p.houseid = h.id
AND p.status = 'ACTIVE'
This is in SQL Server, but the logic seems to work, echoing Chris above:
declare @house table
(
    houseid int
)
declare @person table
(
    personid int,
    houseid int,
    personstatus varchar(20)
)
insert into @house (houseid) VALUES (1)
insert into @house (houseid) VALUES (2)
insert into @house (houseid) VALUES (3)
insert into @house (houseid) VALUES (4)
insert into @person (personid, houseid, personstatus) VALUES (1, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (2, 1, 'CANCELLED')
insert into @person (personid, houseid, personstatus) VALUES (3, 1, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (1, 2, 'ACTIVE')
insert into @person (personid, houseid, personstatus) VALUES (4, 4, 'CANCELLED')
select * from @house
select * from @person
select *
from @house h LEFT OUTER JOIN @person p ON h.houseid = p.houseid
    AND p.personstatus <> 'CANCELLED'
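The answers above all move the status filter into the ON clause of the LEFT JOIN. The following sketch shows why that placement matters by comparing it with the same filter in a WHERE clause. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL, with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE House (ID INTEGER);
    CREATE TABLE Person (ID INTEGER, HouseID INTEGER, Status TEXT);
    INSERT INTO House VALUES (1), (2), (3), (4);
    INSERT INTO Person VALUES (1, 1, 'CANCELLED'), (2, 1, 'CANCELLED'),
                              (3, 1, 'ACTIVE'), (1, 2, 'ACTIVE'),
                              (4, 4, 'CANCELLED');
""")

# Filter in ON: non-matching people are dropped before the outer join,
# so every house survives, padded with NULLs where nobody is active.
in_on = conn.execute("""
    SELECT h.ID, p.ID, p.Status
    FROM House AS h
    LEFT JOIN Person AS p ON p.HouseID = h.ID AND p.Status = 'ACTIVE'
    ORDER BY h.ID
""").fetchall()

# Filter in WHERE: applied after the join, so the NULL-padded rows
# (houses 3 and 4) are removed as well.
in_where = conn.execute("""
    SELECT h.ID, p.ID, p.Status
    FROM House AS h
    LEFT JOIN Person AS p ON p.HouseID = h.ID
    WHERE p.Status = 'ACTIVE'
    ORDER BY h.ID
""").fetchall()

print(in_on)
print(in_where)
```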