Keep one instance of duplicate appearing in one of two columns


I've got a table with one column holding a unique ID and another column holding that person's spouse ID (if they have a spouse). The problem is that a spouse ID usually also appears in the unique ID column, so when I pull a list and try to treat a couple as a single unit, I often double-count a couple.
What's a good, efficient way of taking a given list of unique IDs, checking to see if their spouse is also in the same list of unique IDs, and returning only one unique ID per couple?
The issue is a little more complicated because sometimes only one of the spouses is included in the list, so it's not simply a matter of keeping one person if they're married. If the spouse isn't also in the same list, I want to make sure to retain the one who is. I also want to make sure I'm retaining all people who have a NULL value in the spouse ID column.
Subset of table in question:
Unique_ID Spouse_ID
1 2
2 1
3 NULL
4 NULL
5 10
6 25
7 NULL
8 9
9 8
10 5
In this excerpt, IDs 3, 4, and 7 are all single. IDs 1, 2, 5, 8, 9, and 10 have spouses that appear in the Unique_ID column. ID 6 has a spouse whose ID does not appear in the Unique_ID column. So I'd want to keep IDs 1 (or 2), 3, 4, 5 (or 10), 6, 7, and 8 (or 9). Hope that makes sense.

My inclination would be to combine the two lists and remove duplicates:
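select distinct id
from ((select id
from t
where spouse_id is null or
spouse_id not in (select id from t)
) union all
(select (case when id < spouse_id then id else spouse_id end)
from t
where spouse_id in (select id from t)
)
) u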
But your question asked for an efficient way. Another way to think about this is to add a new column that holds the spouse id if the spouse is in the id list, or NULL otherwise (this uses a left outer join). Then there are three cases:
There is no spouse id (or the spouse is not in the list), so use the id.
The original id is less than the spouse id. Use it.
The spouse id is less than the original id. Discard this record, because the spouse's record is being used instead.
Here is an explicit way of expressing this:
select IdToUse
from (select t.*, tspouse.id tsid,
(case when tspouse.id is null then t.id
when t.id < tspouse.id then t.id
else NULL
end) as IdToUse
from t left outer join
t tspouse
on t.spouse_id = tspouse.id
) t
where IdToUse is not null;
You can simplify this to:
select t.id as IdToUse
from t left outer join
t tspouse
on t.spouse_id = tspouse.id
where tspouse.id is null or
t.id < tspouse.id
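To check the left-join version against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. The table name t follows the answer. The in-memory database and the helper name one_id_per_couple are illustrative assumptions, and other engines may differ in minor details.

```python
import sqlite3

# Sample data from the question as (Unique_ID, Spouse_ID); None is stored as NULL.
ROWS = [(1, 2), (2, 1), (3, None), (4, None), (5, 10),
        (6, 25), (7, None), (8, 9), (9, 8), (10, 5)]

QUERY = """
SELECT t.id
FROM t
LEFT OUTER JOIN t AS tspouse
  ON t.spouse_id = tspouse.id
WHERE tspouse.id IS NULL   -- single, or the spouse is not in the list
   OR t.id < tspouse.id    -- both spouses listed: keep the smaller ID
ORDER BY t.id
"""


def one_id_per_couple(rows):
    """Load rows into an in-memory table t and return one ID per unit."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    ids = [row[0] for row in con.execute(QUERY)]
    con.close()
    return ids


print(one_id_per_couple(ROWS))  # [1, 3, 4, 5, 6, 7, 8]
```

A person whose spouse is missing from the list is kept regardless of which ID is smaller, because the join finds no spouse row.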

Two tables is just plain bad design
Combine the tables
select id
from t
where id < spouse_id
or spouse_id is null
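This shorter filter gives the same result on the sample. However, it never checks whether the spouse is actually in the list, so a person whose absent spouse has a smaller ID is dropped. The following sqlite3 sketch shows this. The table name t and the extra row (ID 30 married to ID 20) are hypothetical.

```python
import sqlite3

SAMPLE = [(1, 2), (2, 1), (3, None), (4, None), (5, 10),
          (6, 25), (7, None), (8, 9), (9, 8), (10, 5)]


def simple_filter(rows):
    """Apply 'id < spouse_id OR spouse_id IS NULL' to an in-memory table t."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    ids = [row[0] for row in con.execute(
        "SELECT id FROM t WHERE id < spouse_id OR spouse_id IS NULL ORDER BY id")]
    con.close()
    return ids


print(simple_filter(SAMPLE))               # [1, 3, 4, 5, 6, 7, 8]
# Hypothetical extra row: ID 30's spouse (ID 20) is not in the list.
print(simple_filter(SAMPLE + [(30, 20)]))  # 30 is dropped: [1, 3, 4, 5, 6, 7, 8]
```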

Related

Select all related records

I have a table (in SQL Server) that stores records as shown below. The purpose of Old_Id is change tracking.
That is, when I want to update a record, the original record has to stay unchanged. Instead, a new record is inserted with a new Id, the updated values, and the modified record's Id in the Old_Id column.
Id Name Old_Id
---------------------
1 Paul null
2 Paul 1
3 Jim null
4 Paul 2
5 Tim null
My question is:
When I search for id = 1 or 2 or 4, I want to select all related records.
In this case I want to see the records with ids 1, 2, and 4.
How can it be written in a stored procedure?
Even if this is bad practice, I can't change the logic because it's a legacy database, and quite a large one.
Can anyone help with this?

You can do that with a recursive common table expression (CTE):
WITH cte_history AS (
SELECT
h.id,
h.name,
h.old_id
FROM
history h
WHERE old_id IS NULL
and id in (1,2,4)
UNION ALL
SELECT
e.id,
e.name,
e.old_id
FROM
history e
INNER JOIN cte_history o
ON o.id = e.old_id
)
SELECT * FROM cte_history;
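The anchor member above starts only from rows whose old_id is NULL, so the search list has to contain the original record (id 1 here). Searching for 4 alone returns no rows. One way around that is to first walk up the Old_Id chain to the original record, then walk back down. The sketch below shows this in SQLite through Python's sqlite3. SQLite requires WITH RECURSIVE, whereas SQL Server uses plain WITH. The helper name related_ids and the :search_id parameter are illustrative assumptions.

```python
import sqlite3

HISTORY = [(1, "Paul", None), (2, "Paul", 1), (3, "Jim", None),
           (4, "Paul", 2), (5, "Tim", None)]  # (id, name, old_id)

QUERY = """
WITH RECURSIVE
  -- Walk up from the searched id to the original record (old_id IS NULL).
  up(id, old_id) AS (
    SELECT id, old_id FROM history WHERE id = :search_id
    UNION ALL
    SELECT h.id, h.old_id FROM history h JOIN up ON h.id = up.old_id
  ),
  -- Walk down from that original record through every later version.
  down(id, name, old_id) AS (
    SELECT h.id, h.name, h.old_id
    FROM history h JOIN up ON h.id = up.id
    WHERE up.old_id IS NULL
    UNION ALL
    SELECT h.id, h.name, h.old_id FROM history h JOIN down ON h.old_id = down.id
  )
SELECT id FROM down ORDER BY id
"""


def related_ids(search_id):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, name TEXT, old_id INTEGER)")
    con.executemany("INSERT INTO history VALUES (?, ?, ?)", HISTORY)
    ids = [row[0] for row in con.execute(QUERY, {"search_id": search_id})]
    con.close()
    return ids


print(related_ids(4))  # [1, 2, 4]
```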

Excluding null entries from multiples values with SQL

From 3 different tables, I want to know whether a person (table1), with multiple visits to a store (table2), has purchased toys and enjoyed them (table3). In table3, 0 stands for either negative (not enjoyed) or not bought, and 1 stands for positive. Every visit has its own identification number.
My problem is that for every ID in table1 I have multiple entries in table2, and for each of those I have multiple entries in table3, only one of which is null.
Person
ID age
1 12
2 10

Visit
Number Visit ID
1 1 1
2 1 2
3 2 1
4 2 2
(goes on for every id)

Toy
number name value
1 Plane NULL
1 Train 1
2 Plane 1
2 Train 0
3 Plane 0
3 Train 1
(goes on for every visit)
I want to know how many people have enjoyed a certain toy. However, since some of the info is null, I have trouble keeping only the people for whom I have a value for both of their visits. For instance, the following code works only if the null condition is placed on just one of the visits:
Select p.id, max(t.value) as value
from person p
join visit v on p.id = v.id
join toy t on v.number = t.number
where
((t.name='plane' and v.visit=1)
or (t.name='plane' and v.visit=2))
and (
(v.visit=1 and ((t.value=1 or t.value=0) is not null))
---and (v.visit=2 and ((t.value=1 or t.value=0) is not null))
)
group by p.id
order by p.id
I have tried many ways of writing this. It works if I try each null condition on its own, but if I remove the -- and apply the condition to both visit 1 and visit 2, it doesn't work. Note that I am using max on the value because I want a positive value whenever possible.

If you want to know how many people have enjoyed a certain toy, then you may simply write this:
select count(*) from toy t where t.name='TOY NAME' and t.value=1;
If you want something else, then kindly clarify.
Edited query:
Select p.id, max(t.value) as value
from person p
join visit v on p.id = v.id
join toy t on v.number = t.number
where
t.name='plane'
and t.value is not null
group by p.id
order by p.id

I used count as a way to eliminate all the null entries. The sum of NULL and a value is always NULL, and COUNT(t.value) skips NULLs, so adding the restriction count = 2 (in a HAVING clause) eliminates the people with a null.
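The WHERE version cannot work for both visits at once, because each joined row belongs to a single visit, so no row satisfies visit = 1 and visit = 2 together. Moving the check into HAVING applies it per person instead. The sketch below uses Python's sqlite3 with the tables reconstructed from the question. The Toy rows for visit number 4 are hypothetical, since the question's data stops at visit 3. Note that SQLite compares strings case-sensitively, so the toy name matches the data ('Plane').

```python
import sqlite3

PERSONS = [(1, 12), (2, 10)]                           # (id, age)
VISITS = [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2)]  # (number, visit, id)
TOYS = [(1, "Plane", None), (1, "Train", 1),           # (number, name, value)
        (2, "Plane", 1), (2, "Train", 0),
        (3, "Plane", 0), (3, "Train", 1),
        (4, "Plane", 0), (4, "Train", 1)]              # visit 4 rows: hypothetical

QUERY = """
SELECT p.id, MAX(t.value) AS value
FROM person p
JOIN visit v ON p.id = v.id
JOIN toy t ON v.number = t.number
WHERE t.name = :toy
GROUP BY p.id
HAVING COUNT(t.value) = 2   -- COUNT(column) skips NULLs: both visits need a value
ORDER BY p.id
"""


def enjoyed(toy):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (id INTEGER, age INTEGER)")
    con.execute("CREATE TABLE visit (number INTEGER, visit INTEGER, id INTEGER)")
    con.execute("CREATE TABLE toy (number INTEGER, name TEXT, value INTEGER)")
    con.executemany("INSERT INTO person VALUES (?, ?)", PERSONS)
    con.executemany("INSERT INTO visit VALUES (?, ?, ?)", VISITS)
    con.executemany("INSERT INTO toy VALUES (?, ?, ?)", TOYS)
    rows = con.execute(QUERY, {"toy": toy}).fetchall()
    con.close()
    return rows


print(enjoyed("Plane"))  # [(2, 1)]: person 1 has a NULL for visit 1
```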

SQL: Tree structure without parent key

Note: The data schema cannot be changed. I'm stuck with it.
Database: SQLite
I have a simple tree structure, without parent keys, that is only 1 level deep. I have simplified the data for clarity:
ID Content Title
1 Null Canada
2 25 Toronto
3 33 Vancouver
4 Null USA
5 45 New York
6 56 Dallas
The structure is also ordinal, so all Canadian cities have IDs greater than Canada's ID (1) and less than the USA's ID (4).
Question: How do I select all of a nation's cities when I do not know how many there are?

My query returns every country together with its cities, rather than just one nation's cities, which is probably not exactly what you want, but:
http://sqlfiddle.com/#!5/94d63/3
SELECT *
FROM (
SELECT
place.Title AS country_name,
place.ID AS id,
(SELECT MIN(ID)
FROM place AS next_place
WHERE next_place.ID > place.ID
AND next_place.Content IS NULL
) AS next_id
FROM place
WHERE place.Content IS NULL
) AS country
INNER JOIN place
ON place.ID > country.id
AND CASE WHEN country.next_id IS NOT NULL
THEN place.ID < country.next_id
ELSE 1 END
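Because the question targets SQLite, the fiddle query can be run directly with Python's sqlite3. In the sketch below, the select list is narrowed to the country name and city title, and an ORDER BY is added. Both changes are assumptions made to keep the output readable. The CASE in the ON clause evaluates to 0 or 1, which SQLite accepts as a join condition.

```python
import sqlite3

PLACES = [(1, None, "Canada"), (2, 25, "Toronto"), (3, 33, "Vancouver"),
          (4, None, "USA"), (5, 45, "New York"), (6, 56, "Dallas")]

QUERY = """
SELECT country.country_name, place.Title
FROM (
  SELECT place.Title AS country_name,
         place.ID AS id,
         (SELECT MIN(ID)                      -- next country after this one
            FROM place AS next_place
           WHERE next_place.ID > place.ID
             AND next_place.Content IS NULL) AS next_id
  FROM place
  WHERE place.Content IS NULL                 -- countries only
) AS country
INNER JOIN place
  ON place.ID > country.id
 AND CASE WHEN country.next_id IS NOT NULL
          THEN place.ID < country.next_id
          ELSE 1 END                          -- last country: no upper bound
ORDER BY place.ID
"""


def cities_by_country():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE place (ID INTEGER PRIMARY KEY, Content INTEGER, Title TEXT)")
    con.executemany("INSERT INTO place VALUES (?, ?, ?)", PLACES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for country, city in cities_by_country():
    print(country, city)
```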

select * from tbl
where id > 1
and id < (select min(id) from tbl where content is null and id > 1)
EDIT
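I just realized the above does not work if there are no countries with a greater ID. This should fix it (here for the USA, id 4):
select a.* from tbl a
join tbl c on c.id = 4
where a.id > c.id
and a.id < coalesce((select min(b.id) from tbl b where b.content is null and b.id > c.id), a.id+1)
Edit 2 - The subquery is now correlated with the country row c, so you only have to change the country id in one place. The coalesce sits outside the subquery, because a subquery that finds no row yields NULL rather than running the coalesce.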
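A single-country query of this shape can be tried with Python's sqlite3. This sketch makes two assumptions. First, it adds a hypothetical third country (Mexico, id 7, with one hypothetical city) to show that a country in the middle does not pick up the next country's rows. Second, it binds the country id as the parameter :country_id instead of hard-coding it.

```python
import sqlite3

# Sample data plus a hypothetical third country (Mexico) and city.
ROWS = [(1, None, "Canada"), (2, 25, "Toronto"), (3, 33, "Vancouver"),
        (4, None, "USA"), (5, 45, "New York"), (6, 56, "Dallas"),
        (7, None, "Mexico"), (8, 61, "Monterrey")]

QUERY = """
SELECT a.Title
FROM tbl a
JOIN tbl c ON c.ID = :country_id            -- the country row
WHERE a.ID > c.ID
  AND a.ID < COALESCE(
        (SELECT MIN(b.ID) FROM tbl b        -- next country, or NULL if none
          WHERE b.Content IS NULL AND b.ID > c.ID),
        a.ID + 1)                           -- no next country: no upper bound
ORDER BY a.ID
"""


def cities_of(country_id):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (ID INTEGER PRIMARY KEY, Content INTEGER, Title TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    titles = [row[0] for row in con.execute(QUERY, {"country_id": country_id})]
    con.close()
    return titles


print(cities_of(4))  # ['New York', 'Dallas']
```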

You have several things to consider here: one is whether your data is going to change, and the other is whether it isn't. For the first case there are two solutions, and for the second, just one.
If your data is organized as shown in your example, you can skip a fixed number of rows with a subquery (SQLite has no TOP, so this uses ORDER BY ... LIMIT), e.g.:
SELECT * FROM CITIES WHERE ID NOT IN (SELECT ID FROM CITIES ORDER BY ID LIMIT 3)
You can also create another table where you specify which city belongs to which parent, and build the hierarchy yourself.
I recommend the second one.

Sql COALESCE entire rows?

I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table with matching id's, but gets from second table if no id is found.
So the resulting query would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick were taken from the first table, and Mary from the second.

SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInCompany). It then appends the results of the second query to the "bottom" of that set of rows. The second query gets all tbl_Customers rows with Ids in your target list but excludes any with Ids that appear in tbl_Employees.
The combined list contains the people you want: all Ids from tbl_PeopleInCompany, with preference given to Employees and missing records pulled from Customers.
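The UNION ALL approach can be checked against the question's sample with Python's sqlite3. For brevity, this sketch keeps only the Id and Name columns (Email and Etc are omitted as an assumption) and adds an ORDER BY.

```python
import sqlite3

EMPLOYEES = [(1, "Sue"), (2, "Rick")]
CUSTOMERS = [(1, "Bob"), (2, "Dan"), (3, "Mary")]
PEOPLE_IN_COMPANY = [(1, 1), (2, 1), (3, 1)]  # (Id, CompanyId)

QUERY = """
SELECT Id, Name FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
  AND Id NOT IN (SELECT Id FROM tbl_Employees)   -- employees take precedence
ORDER BY Id
"""


def people_in_company():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_Employees (Id INTEGER PRIMARY KEY, Name TEXT)")
    con.execute("CREATE TABLE tbl_Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
    con.execute("CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER)")
    con.executemany("INSERT INTO tbl_Employees VALUES (?, ?)", EMPLOYEES)
    con.executemany("INSERT INTO tbl_Customers VALUES (?, ?)", CUSTOMERS)
    con.executemany("INSERT INTO tbl_PeopleInCompany VALUES (?, ?)", PEOPLE_IN_COMPANY)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(people_in_company())  # [(1, 'Sue'), (2, 'Rick'), (3, 'Mary')]
```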

You can also do this:
1) Start from tbl_PeopleInCompany and left outer join both tables to it on Id. This keeps every Id from your list and leaves the columns of a table null when that table has no matching row.
2) Use CASE WHEN to select either the tbl_Employees column or the tbl_Customers column, based on whether tbl_Employees.Id IS NULL, like this:
CASE WHEN tbl_Employees.Id IS NOT NULL THEN tbl_Employees.Name ELSE tbl_Customers.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
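The outer-join-plus-CASE technique produces the same rows. The sketch below runs it with Python's sqlite3 on the question's sample, again keeping only Id and Name (an assumption for brevity). The CASE expression has to be repeated for each column you want to merge.

```python
import sqlite3

EMPLOYEES = [(1, "Sue"), (2, "Rick")]
CUSTOMERS = [(1, "Bob"), (2, "Dan"), (3, "Mary")]
PEOPLE_IN_COMPANY = [(1, 1), (2, 1), (3, 1)]  # (Id, CompanyId)

QUERY = """
SELECT pic.Id,
       CASE WHEN e.Id IS NOT NULL THEN e.Name   -- prefer the employee row
            ELSE c.Name END AS Name             -- otherwise use the customer row
FROM tbl_PeopleInCompany pic
LEFT OUTER JOIN tbl_Employees e ON e.Id = pic.Id
LEFT OUTER JOIN tbl_Customers c ON c.Id = pic.Id
ORDER BY pic.Id
"""


def merged_people():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_Employees (Id INTEGER PRIMARY KEY, Name TEXT)")
    con.execute("CREATE TABLE tbl_Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
    con.execute("CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER)")
    con.executemany("INSERT INTO tbl_Employees VALUES (?, ?)", EMPLOYEES)
    con.executemany("INSERT INTO tbl_Customers VALUES (?, ?)", CUSTOMERS)
    con.executemany("INSERT INTO tbl_PeopleInCompany VALUES (?, ?)", PEOPLE_IN_COMPANY)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(merged_people())  # [(1, 'Sue'), (2, 'Rick'), (3, 'Mary')]
```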

This should be pretty performant. It uses a CTE to basically build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
SELECT C.Id, C.Name, C.Email, C.Etc
FROM tbl_Customers C
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
LEFT JOIN tbl_Employees E
ON C.Id = E.Id
WHERE E.Id IS NULL
)
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
ON E.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN Operator can be rather taxing on large queries as it might have to evaluate the subquery for each record being processed.

I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except that it accepts more than two arguments and returns the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain its application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.
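COALESCE works value by value within a single row, which is why it cannot pick a whole row from another table. This minimal sqlite3 sketch mirrors the Production.Product example with a hypothetical three-row table (the column and product values are made up for illustration).

```python
import sqlite3

# Hypothetical rows standing in for Production.Product: (class, color, product_number).
PRODUCTS = [(None, None, "AR-5381"), ("L", "Black", "BK-M68B-38"),
            (None, "Silver", "CH-0234")]


def first_not_null():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product (class TEXT, color TEXT, product_number TEXT)")
    con.executemany("INSERT INTO product VALUES (?, ?, ?)", PRODUCTS)
    # COALESCE returns its first non-NULL argument, evaluated per row.
    values = [row[0] for row in con.execute(
        "SELECT COALESCE(class, color, product_number) FROM product ORDER BY rowid")]
    con.close()
    return values


print(first_not_null())  # ['AR-5381', 'L', 'Silver']
```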

Select values in SQL that do not have other corresponding values except those that i search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need to make a query that, for a given set of arguments, selects the values of Name whose Element values are exactly these values and no others.
For example:
with arguments 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2, 3, 5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do I have to add to this query to make it not return 4?

A simple way to do it (assuming each (name, element) pair appears only once):
SELECT name
FROM aggregations
GROUP BY name
HAVING COUNT(*) = 2
AND SUM(CASE WHEN element IN (2,3) THEN 1 ELSE 0 END) = 2
If you want to add more, you'll need to change both the IN (2,3) list and the two counts in the HAVING part:
SELECT name
FROM aggregations
GROUP BY name
HAVING COUNT(*) = 3
AND SUM(CASE WHEN element IN (2,3,5) THEN 1 ELSE 0 END) = 3
A more robust way would be to check that a name has no element outside your set:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
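The GROUP BY / HAVING form can be exercised against the sample with Python's sqlite3. In this sketch the wanted elements are passed as bound parameters. The helper name names_with_exactly is hypothetical. The sketch assumes that (name, element) pairs are unique and that the argument list has no repeats.

```python
import sqlite3

ROWS = [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)]  # (name, element)


def names_with_exactly(elements):
    """Names whose set of elements equals `elements`."""
    wanted = list(elements)
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT name
        FROM aggregations
        GROUP BY name
        HAVING COUNT(*) = ?                        -- as many rows as wanted elements
           AND SUM(CASE WHEN element IN ({placeholders})
                        THEN 1 ELSE 0 END) = ?     -- and every row is a wanted element
        ORDER BY name
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE aggregations (name INTEGER, element INTEGER)")
    con.executemany("INSERT INTO aggregations VALUES (?, ?)", ROWS)
    params = [len(wanted)] + wanted + [len(wanted)]
    names = [row[0] for row in con.execute(query, params)]
    con.close()
    return names


print(names_with_exactly([2, 3]))     # [1]
print(names_with_exactly([2, 3, 5]))  # [4]
```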

Create a temporary table, fill it with your values and query like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), since it will stop checking a name as soon as it finds the first row that doesn't have a match (either in aggregations or in temptable)
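FULL OUTER JOIN is not available in every engine (SQLite only added it in version 3.39). The same temp-table check can be written as two NOT EXISTS tests instead: nothing in the name's elements is missing from temptable, and nothing in temptable is missing from the name's elements. This is a swapped-in formulation rather than the answer's query. The sketch runs it with Python's sqlite3, and the helper name exact_match is hypothetical.

```python
import sqlite3

ROWS = [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)]  # (name, element)

QUERY = """
SELECT n.name
FROM (SELECT DISTINCT name FROM aggregations) n
WHERE NOT EXISTS (            -- n has no element outside temptable
        SELECT 1 FROM aggregations a
        WHERE a.name = n.name
          AND a.element NOT IN (SELECT element FROM temptable))
  AND NOT EXISTS (            -- temptable has no element that n lacks
        SELECT 1 FROM temptable tt
        WHERE NOT EXISTS (SELECT 1 FROM aggregations a
                          WHERE a.name = n.name AND a.element = tt.element))
ORDER BY n.name
"""


def exact_match(elements):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE aggregations (name INTEGER, element INTEGER)")
    con.executemany("INSERT INTO aggregations VALUES (?, ?)", ROWS)
    con.execute("CREATE TEMP TABLE temptable (element INTEGER)")
    con.executemany("INSERT INTO temptable VALUES (?)", [(e,) for e in elements])
    names = [row[0] for row in con.execute(QUERY)]
    con.close()
    return names


print(exact_match([2, 3]))  # [1]
```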

This isn't tested, but I would usually do this with a subquery in the WHERE clause for a small amount of data. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (select COUNT(ag2.Name)
FROM aggregations ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.Name
HAVING COUNT(DISTINCT ag1.Element) = 2;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"

We Keep Coding
