SQL DELETE group of records based on opposite group being empty - sql

In table T, I'm trying to delete all records in a group sharing the same value of A, but only if every member of that group has B set to 'x'.
Given the Table T:
+-------+--------+
| A | B |
+-------+--------+
| 2 | '' |
| 2 | 'x' |
| 2 | '' |
| 8 | 'x' |
| 8 | 'x' |
| 15 | '' |
| 15 | '' |
+-------+--------+
The two records with A == 8 have to be deleted because both of them have B == 'x'. The group with A == 2 has mixed values of B, so it stays. The group with A == 15 has no B equal to 'x', so it also stays.
Is this possible to do by one query?
If not, any other way that is fast enough for a table with a lot of records?

You can try this query (it assumes B is numeric and holds 0 or 1):
delete from T
where A in (
select A
from T
group by A
having sum(B) = count(*)
)
If column B can contain values other than 0 and 1, you can add additional conditions:
having sum(B) = count(*) and min(b)=1 and max(b)=1
If you can't use numeric values, you can just use min/max, like this:
having min(b)='x' and max(b)='x'
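To check the min/max variant against the sample table from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The function name delete_all_x_groups and the choice of SQLite are assumptions made for illustration; other engines may behave differently (MySQL, for instance, restricts subqueries on the table being deleted from).

```python
import sqlite3


def delete_all_x_groups(rows):
    """Delete every group of A whose B values are all 'x'; return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (A INTEGER, B TEXT)")
    conn.executemany("INSERT INTO T (A, B) VALUES (?, ?)", rows)
    # A group qualifies only when its smallest and largest B are both 'x',
    # i.e. no other non-NULL value occurs in the group.
    conn.execute(
        """
        DELETE FROM T
        WHERE A IN (
            SELECT A
            FROM T
            GROUP BY A
            HAVING MIN(B) = 'x' AND MAX(B) = 'x'
        )
        """
    )
    remaining = conn.execute("SELECT A, B FROM T ORDER BY A, rowid").fetchall()
    conn.close()
    return remaining


sample = [(2, ""), (2, "x"), (2, ""), (8, "x"), (8, "x"), (15, ""), (15, "")]
print(delete_all_x_groups(sample))
```

Only the A = 8 group is removed. Note that MIN and MAX skip NULLs, so if B can be NULL, a group holding only 'x' and NULL values would also be deleted; comparing a count of 'x' rows against COUNT(*) avoids that.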

Try this. GROUP BY and HAVING with an aggregate should work:
DELETE FROM tablename
WHERE a IN(SELECT a
FROM tablename
GROUP BY a
HAVING count(case when b='x' then 1 end) = Count(*))

Related

Removing duplicates in SQL based on column

I'm trying to remove orphaned rows from a database. Let's say I have this table t:
session | name | record_date | uniqueid
1 | a | 2019-04-03 | 1x
2 | a | 2019-09-19 | 1x
3 | b | 2019-08-09 | zr
4 | c | 2019-09-19 | ww
5 | d | 2019-09-03 | yy
6 | d | 2019-09-25 | rr
7 | e | 2019-09-28 | dd
8 | e | 2019-04-19 |
I'm trying to remove duplicate entries, dropping the ones with the oldest record_date, while evaluating both name and uniqueid to ensure they're actual duplicates (not duplicates based on name alone). The catch that keeps me from simply evaluating uniqueid alone is that some rows have a null value for uniqueid. So in my example table, I'd want to remove the first and last rows.
You can use delete:
delete from t
where t.record_date < (select max(t2.record_date)
from t t2
where t2.name = t.name and
t2.uniqueid = t.uniqueid
);
Note: The above keeps only the most recent record for name/uniqueid pairs.
If you want the unique rows in a query, I would recommend:
select t.*
from t
where t.record_date = (select max(t2.record_date)
from t t2
where t2.name = t.name and
t2.uniqueid = t.uniqueid
);
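As a quick check of the delete above, this sketch loads the sample rows into an in-memory SQLite database and returns the surviving session numbers. The helper name keep_latest_per_pair and the use of SQLite with ISO-formatted text dates are assumptions for illustration only.

```python
import sqlite3


def keep_latest_per_pair(rows):
    """Run the correlated delete and return the session ids that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (session INTEGER, name TEXT, record_date TEXT, uniqueid TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # A row is deleted when a newer row exists with the same name and uniqueid.
    conn.execute(
        """
        DELETE FROM t
        WHERE record_date < (SELECT MAX(t2.record_date)
                             FROM t t2
                             WHERE t2.name = t.name
                               AND t2.uniqueid = t.uniqueid)
        """
    )
    survivors = [r[0] for r in conn.execute("SELECT session FROM t ORDER BY session")]
    conn.close()
    return survivors


sample_rows = [
    (1, "a", "2019-04-03", "1x"),
    (2, "a", "2019-09-19", "1x"),
    (3, "b", "2019-08-09", "zr"),
    (4, "c", "2019-09-19", "ww"),
    (5, "d", "2019-09-03", "yy"),
    (6, "d", "2019-09-25", "rr"),
    (7, "e", "2019-09-28", "dd"),
    (8, "e", "2019-04-19", None),  # uniqueid is NULL
]
print(keep_latest_per_pair(sample_rows))
```

Session 1 is removed, but session 8 stays: a NULL uniqueid never compares equal to anything (including another NULL), so the subquery finds no match for that row. Removing it as well would need an explicit rule for NULL values.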
I think you are looking for max() and do not want to consider null uniqueid values, so use a where clause to filter out the nulls:
select name,max(record_date),uniqueid from table_name
where uniqueid is not null
group by name,uniqueid
You can try the query below, which uses aggregation and group by with a where clause to filter out null uniqueid values:
select name,uniqueid,max(record_date)
from tablename
where uniqueid is not null
group by name,uniqueid

Returning rows with the same ID but exclude some on second column

I've seen similar questions, but none quite hit the nail on the head for what I need. Let's say I have a table.
+-----+-------+
| ID | Value |
+-----+-------+
| 123 | 1 |
| 123 | 2 |
| 123 | 3 |
| 456 | 1 |
| 456 | 2 |
| 456 | 4 |
| 789 | 1 |
| 789 | 2 |
+-----+-------+
I want to return DISTINCT IDs but exclude those that have a certain value. For example, let's say I don't want any IDs that have a 3 as a value. My results should look like this:
+-----+
| ID |
+-----+
| 456 |
| 789 |
+-----+
I hope this makes sense. If more information is needed please ask and if this has been answered before please point me in the right direction. Thanks.
You can use group by and having:
select id
from t
group by id
having sum(case when value = 3 then 1 else 0 end) = 0;
The having clause counts the number of "3"s for each id. The = 0 returns only groups where the count is 0 (i.e. there are no "3"s).
You can use not exists :
select distinct t.id
from table t
where not exists (select 1 from table t1 where t1.id = t.id and t1.value = 3);
Try this:
select id from tablename
group by id
having sum(case when value=3 then 1 else 0 end)=0
You can also use EXCEPT to subtract the IDs that have the unwanted value from the full set of IDs, which gives the desired result:
select distinct Id from ValuesTbl
except
select Id from ValuesTbl where Value = 3
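To compare the approaches suggested above side by side, here is a small sketch that runs the HAVING, NOT EXISTS and EXCEPT queries against the sample data in an in-memory SQLite database. The table name ValuesTbl is borrowed from the last answer; the function name ids_without_value_3 and the use of SQLite are assumptions for illustration.

```python
import sqlite3

ROWS = [(123, 1), (123, 2), (123, 3), (456, 1), (456, 2), (456, 4), (789, 1), (789, 2)]

QUERIES = {
    "having": """
        SELECT Id FROM ValuesTbl
        GROUP BY Id
        HAVING SUM(CASE WHEN Value = 3 THEN 1 ELSE 0 END) = 0
    """,
    "not_exists": """
        SELECT DISTINCT t.Id FROM ValuesTbl t
        WHERE NOT EXISTS (SELECT 1 FROM ValuesTbl t1
                          WHERE t1.Id = t.Id AND t1.Value = 3)
    """,
    "except": """
        SELECT DISTINCT Id FROM ValuesTbl
        EXCEPT
        SELECT Id FROM ValuesTbl WHERE Value = 3
    """,
}


def ids_without_value_3(rows):
    """Return, per approach, the sorted IDs that never have Value = 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ValuesTbl (Id INTEGER, Value INTEGER)")
    conn.executemany("INSERT INTO ValuesTbl (Id, Value) VALUES (?, ?)", rows)
    results = {
        name: sorted(row[0] for row in conn.execute(sql))
        for name, sql in QUERIES.items()
    }
    conn.close()
    return results


print(ids_without_value_3(ROWS))
```

All three queries return 456 and 789 for the sample data; which one performs best depends on the engine and the indexes available.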

SQL : Getting duplicate rows along with other variables

I am working on Teradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
+---------+----------+----------------------+
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
+---------+----------+----------------------+
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
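The join against a grouped subquery can be tried on the sample input with the sketch below, which uses an in-memory SQLite database rather than Teradata. The function name rows_with_duplicate_ids and the column types are assumptions for illustration.

```python
import sqlite3


def rows_with_duplicate_ids(rows):
    """Return every row whose id occurs more than once, with the per-id count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table_NAME (id INTEGER, name TEXT, date TEXT)")
    conn.executemany("INSERT INTO Table_NAME VALUES (?, ?, ?)", rows)
    # The derived table counts rows per id; the join attaches that count
    # back to each original row so the other columns remain available.
    result = conn.execute(
        """
        SELECT TN.id, TN.name, cnt.Count_Dups
        FROM Table_NAME TN
        INNER JOIN (
            SELECT id, COUNT(1) AS Count_Dups
            FROM Table_NAME
            GROUP BY id
            HAVING COUNT(1) > 1
        ) cnt
        ON cnt.id = TN.id
        ORDER BY TN.id, TN.name
        """
    ).fetchall()
    conn.close()
    return result


sample_input = [
    (1, "abc", "21.03.2015"),
    (1, "def", "22.04.2015"),
    (2, "ajk", "22.03.2015"),
    (3, "ghi", "23.03.2015"),
    (3, "ghi", "23.03.2015"),
]
print(rows_with_duplicate_ids(sample_input))
```

With the HAVING filter in place, id 2 is dropped because it occurs only once. Removing the HAVING line would return every row with its count, which matches the expected output shown in the question.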
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

Loop through table, use conditionals, then store value in another table

The procedure is to fill the "City" column in Table B based on the "Letter" column from Table A.
TABLE A
+----------+-------+
| Number | Letter|
+----------+-------+
| 1 | A |
| 1 | |
| 1 | |
| 2 | |
| 2 | |
| 3 | |
| 3 | B |
| 3 | |
| 3 | C |
+----------+-------+
TABLE B
+----------+-------+
| AC | City |
+----------+-------+
| 1 | A |
| 1 | A |
| 1 | A |
| 1 | A |
| 2 | |
| 2 | |
| 2 | |
| 2 | |
| 3 | B |
| 3 | B |
| 3 | B |
+----------+-------+
If AC=1, refer to Number=1, and loop through the "Letter" values from top to bottom to get the top-most value.
For Number=1, the topmost value is A, so for AC=1, fill in all "City" column as A.
For AC=2, Number=2, and there are no values in Table A, so fill in all "City" for each AC=2 as blank.
For AC=3, Number=3, and the top-most value is B, so fill in all "City" for each AC=3 as B.
How do you code this in standard SQL?
I am using the Caspio software and will be inserting the SQL into the "City" column itself, but that shouldn't interfere too much with the code.
This is what I have so far:
SELECT Letter
FROM TableA
WHERE TableA.Number = TableB.AC
AND TableA.Number != ""
LIMIT 1
But it doesn't seem to be working, and I think it's necessary to loop through Table A to find the City value for each AC=Number.
Thanks for any help.
EDIT:
I have figured out the solution:
SELECT TOP 1 Letter
FROM TableA
WHERE Letter !='' AND Number=AC
Thanks.
It doesn't work because you are not including TableB in your FROM clause or joining it. You can try this one:
SELECT Letter FROM TableA WHERE Number IN
(SELECT AC FROM TableB WHERE City!='' AND City IS NOT NULL)
AND Letter!='' AND LETTER IS NOT NULL
First things first: don't think of "looping" in SQL; it means you're thinking about the problem the wrong way. You need to use set-based thinking.
So think about what you want to do, not how you want to do it.
You want to update the TableB.City based on the value of TableA.Letter
UPDATE TableB
SET City = Letter
FROM
(
SELECT Number, Letter,ROW_NUMBER () OVER ( PARTITION BY Number order by number ) AS SortOrder
FROM TableA
WHERE Letter IS NOT NULL AND Letter != ''
) AS A
WHERE A.SortOrder = 1 AND TableB.AC = A.number
I have included the ROW_NUMBER sorting to ensure you get the first letter. Please note that you should order by your PK rather than by Number, assuming you have one and assuming that it's an IDENTITY and an int.
See the sqlFiddle
EDIT
Sure, you can just do a select.
SELECT TableB.AC, A.Letter
FROM
(
SELECT Number, Letter,ROW_NUMBER () OVER ( PARTITION BY Number order by number ) AS SortOrder
FROM TableA
WHERE Letter IS NOT NULL AND Letter != ''
) AS A
LEFT OUTER JOIN TableB ON TableB.AC = A.number
WHERE A.SortOrder = 1
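The same set-based idea can also be written as a correlated subquery instead of UPDATE ... FROM with ROW_NUMBER. The sketch below does that in an in-memory SQLite database. It is a different technique from the answer above, and it assumes both tables have an integer primary key named id that reflects the top-to-bottom row order; that column, the function name fill_city and the use of SQLite are all hypothetical.

```python
import sqlite3


def fill_city(table_a, table_b_acs):
    """Set TableB.City to the first non-empty Letter for the matching Number."""
    conn = sqlite3.connect(":memory:")
    # The id columns are an assumption: SQL tables have no inherent row order,
    # so "top-most" needs a column to order by.
    conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, Number INTEGER, Letter TEXT)")
    conn.execute("CREATE TABLE TableB (id INTEGER PRIMARY KEY, AC INTEGER, City TEXT)")
    conn.executemany("INSERT INTO TableA (Number, Letter) VALUES (?, ?)", table_a)
    conn.executemany("INSERT INTO TableB (AC) VALUES (?)", [(ac,) for ac in table_b_acs])
    # For each TableB row, pick the earliest non-empty Letter with the same
    # Number; COALESCE turns "no match" into an empty string.
    conn.execute(
        """
        UPDATE TableB
        SET City = COALESCE((
            SELECT a.Letter
            FROM TableA a
            WHERE a.Number = TableB.AC
              AND a.Letter IS NOT NULL
              AND a.Letter <> ''
            ORDER BY a.id
            LIMIT 1
        ), '')
        """
    )
    result = conn.execute("SELECT AC, City FROM TableB ORDER BY id").fetchall()
    conn.close()
    return result


table_a = [(1, "A"), (1, ""), (1, ""), (2, ""), (2, ""),
           (3, ""), (3, "B"), (3, ""), (3, "C")]
table_b_acs = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(fill_city(table_a, table_b_acs))
```

For the sample data this fills City with A for AC = 1, leaves it blank for AC = 2, and uses B for AC = 3. On SQL Server the subquery would use TOP 1 in place of LIMIT 1.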

Finding records sets with GROUP BY and SUM

I'd like to do a query for every GroupID (which always come in pairs) in which both entries have a value of 1 for HasData.
|GroupID | HasData |
|--------|---------|
| 1 | 1 |
| 1 | 1 |
| 2 | 0 |
| 2 | 1 |
| 3 | 0 |
| 3 | 0 |
| 4 | 1 |
| 4 | 1 |
So the result would be:
1
4
Here's what I'm trying, but I can't seem to get it right. Whenever I do a GROUP BY on the GroupID, I only have access to that column in the select list:
SELECT GroupID
FROM Table
GROUP BY GroupID, HasData
HAVING SUM(HasData) = 2
But I get the following error message because HasData is actually a bit:
Operand data type bit is invalid for sum operator.
Can I do a count of two where both records are true?
Just exclude those group IDs that have a record where HasData = 0.
select distinct a.groupID
from table1 a
where not exists(select * from table1 b where b.HasData = 0 and b.groupID = a.groupID)
You can use the having clause to check that all values are 1:
select GroupId
from table
group by GroupId
having sum(cast(HasData as int)) = 2
That is, simply remove the HasData column from the group by columns and then check on it.
One more option
SELECT GroupID
FROM table
WHERE HasData <> 0
GROUP BY GroupID
HAVING COUNT(*) > 1
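The three suggestions above can be compared on the sample data with the following sketch. It runs in an in-memory SQLite database, which has no bit type, so HasData is stored as an INTEGER here; that substitution, the table name Table1 and the function name groups_with_all_data are assumptions for illustration.

```python
import sqlite3

GROUP_QUERIES = {
    "not_exists": """
        SELECT DISTINCT a.GroupID
        FROM Table1 a
        WHERE NOT EXISTS (SELECT * FROM Table1 b
                          WHERE b.HasData = 0 AND b.GroupID = a.GroupID)
    """,
    "sum_cast": """
        SELECT GroupID
        FROM Table1
        GROUP BY GroupID
        HAVING SUM(CAST(HasData AS INTEGER)) = 2
    """,
    "filtered_count": """
        SELECT GroupID
        FROM Table1
        WHERE HasData <> 0
        GROUP BY GroupID
        HAVING COUNT(*) > 1
    """,
}


def groups_with_all_data(rows):
    """Return, per approach, the sorted GroupIDs whose rows all have HasData = 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (GroupID INTEGER, HasData INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)
    results = {
        name: sorted(row[0] for row in conn.execute(sql))
        for name, sql in GROUP_QUERIES.items()
    }
    conn.close()
    return results


pairs = [(1, 1), (1, 1), (2, 0), (2, 1), (3, 0), (3, 0), (4, 1), (4, 1)]
print(groups_with_all_data(pairs))
```

Each query returns groups 1 and 4 for the sample data. The SUM and COUNT variants rely on every group having exactly two rows, as stated in the question; the NOT EXISTS variant does not depend on the group size.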