SQL Select WHERE duplicate in one column x AND column y != z

SQL Select WHERE duplicate in one column x AND column y != z - sql

So we had a duplicate SQL scripts running on our server and didn't realize it till just recently. Essentially I have many rows where there are 2 entries with the same column x (crn).
The initially got entered with the same column y (status) as well. Our application has users update the column y (status). However now we have 2 rows one with a status of 'S' and one with a status of something other than 'S'. My goal:
DELETE everything from the table WHERE there is a duplicate CRN and the STATUS is S. I don't want to delete rows unless there is a duplicate, but if there is, I only want to delete the row with a status of 'S'. Also, I'd rather not delete both records if both have a status of S, but if I do, that isn't such a big deal because I will get the courses again in the next download.
I have started making a select statement to query the rows I want, but don't know how to do the ONLY SELECT IF DUPLICATE EXISTS part. I feel like I need to UNION or LEFT JOIN or something to only get records if a duplicate exists.
SELECT * FROM
cas_usuECourses
WHERE
crn IN (SELECT crn FROM cas_usuECourses GROUP BY crn having count(1) > 1)
AND status = 'S'
AND termCode = 201320
EDIT: Is there a way to say... the above, but if both dups have 'S' only delete one of them?
EDIT: I "think" this looks good to me. Any thoughts?
SELECT id FROM (
SELECT id, Row_Number() Over (Partition By crn ORDER BY id DESC) as ranking
FROM cas_usuECourses
WHERE status = 'S'
AND termCode = 201320
) as ranking
WHERE ranking = 1
I think this will give me all the ids where the status is 'S', and if there are two with 'S' this will give me the one that was created second. I found out that every entry in our termCode has duplicates, so... don't need to worry about checking for the duplicates.

If you could add one column to your table and fill it with distinct values, it would be trivial - you could target each row.
Otherwise, after your initial step, I would generally open a cursor on your subselect with status S to select only crn's where both statuses are 'S', and in each loop iteration delete top 1 record with appropriate crn. That way you can get rid of duplicate crn/status pairs.

Related

SQL How to update every nth row which meets requirement

I have a table that I would like to update one column data on every nth row if it meets row requirement.
My table has many columns but the key are Object_Id (in case this could be useful for creating temp table)
But the one I'm trying to update is online_status, it looks like below, but on bigger scales so I usually have 10rows that has same time but they all have %Online% in it and in total around 2000 rows (with Online and about another 2000 with Offline). I just need to update every 2-4 rows of those 10 that are repeating itself.
Table picture here: (for some reason table formatting doesn't come up good)
Table
So what I tried is: This pulls a list of every 3rd record that matches criteria Online, I just need a way to update it but can't get through this.
SELECT * FROM (SELECT *, row_number() over() rn FROM people
WHERE online_status LIKE '%Online%') foo WHERE online_status LIKE '%Online%' AND foo.rn % 3 =0
What I also tried is:
However this has updated every single row. not the ones I needed.
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE people.Object_id IN
(SELECT *
FROM
(SELECT people.Object_id, row_number() over() rn FROM people
WHERE online_status LIKE '%Online%') foo WHERE people LIKE '%Online%' AND foo.rn % 3 =0);
Is there a way to take list from Select code above and simply update it or run a few scripts that could add it to like temp table and store object ids, and the next script would update main table if object id would match temp table.
Thank you for any help :)

Don't select other columns but Object_id in the subquery at WHERE people.Object_id IN (..)
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE Object_id IN
( SELECT Object_id
FROM
( SELECT p.Object_id, row_number() over() rn
FROM people p
WHERE p.online_status LIKE '%Online%') foo
WHERE foo.rn % 3 = 0
);

SQL Server Sum multiple rows into one - no temp table

I would like to see a most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how to then delete the duplicates. In other words I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think that I'll need to use OVER PARTITION BY but just not sure. Just trying to challenge myself in not doing it the temp table way. Working with SQL Server 2008 R2
Simply put, give me a concise stmt getting from my input to my output in the same table. So if my table name is People if I do a select * from People on it before the operation that I am asking in this question I get the first set above and then when I do a select * from People after the operation, I get the second set of data above.

Not sure why not using Temp table but here's one way to avoid it (tho imho this is an overkill):
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
WITH DUP_TABLE AS
(SELECT ROW_NUMBER()
OVER (PARTITION BY Person ORDER BY Person) As ROW_NO
FROM MyTable)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
First query updates every duplicate person to the summary value. Second query removes duplicate persons.
Demo: http://sqlfiddle.com/#!3/db7aa/11

All you're asking for is a simple SUM() aggregate function and a GROUP BY
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
The SUM() by itself would sum up the values in a column, but when you add a secondary column and GROUP BY it, SQL will show distinct values from the secondary column and perform the aggregate function by those distinct categories.

Delete multiple occurrences of the same ID # and code in a junction table

enter code here
My problem is this: in this database the junction table contains some rows where the kha_id and the icd_fk are the same. While it's OK that kha_id appears in icd_junction more than once , it has to be with a separate icd_fk. I can run a query and get all of the ID#s and the codes which are listed more than once, but is there an industry-standard way of going about deleting all but one occurrence of each?
example: what i have is above
KHA_ID: 123456 V23
123456 V23
123456 V24
I need one of the rows kha_id=123456 and ICD_FK=V23 taken out.

This:
DELETE j1
FROM ICD_Junction AS j1
WHERE EXISTS
( SELECT 1
FROM ICD_Junction AS j2
WHERE j2.KHA_ID = j1.KHA_ID
AND j2.ICD_FK = j1.ICD_FK
AND j2.ID < j1.ID
)
;
will delete, for each KHA_ID and ICD_FK, all but one relevant row of ICD_Junction. (Specifically, it will keep the one with the least ID, and delete the rest.)
Once you've run the above, you should fix whatever code caused the duplication, and add a unique constraint to prevent this from happening again.
(Disclaimer: Not tested, and it's been a while since I last used SQL Server.)
Edited to add: If I'm understanding your comment correctly, you also need help with the query to find duplicates? For that, you can write:
SELECT KHA_ID,
ICD_FK,
COUNT(1) -- the number of duplicates
FROM ICD_Junction
GROUP
BY KHA_ID,
ICD_FK
HAVING COUNT(1) > 1
;

The original question was delete but the comment was find
Select jDup.*
FROM ICD_Junction AS j
JOIN ICD_Junction AS jDup
On j.KHA_ID = jDup.KHA_ID
AND j.ICD_FK = jDup.ICD_FK
AND j.ID < jDup.ID
Select max(jDup.ID), min(jDup.ID), count(*), jDup.KHA_ID, jDup.ICD_FK
FROM ICD_Junction AS jDup
Group By jDup.KHA_ID, jDup.ICD_FK
Having Count(*) > 1

You want something that uses ROW_NUMBER() and partition by. The reason is that it will let you pick one row to keep from a table that doesn't have a unique id. Like if this was a pure intersection table with no identity, you could use a variation on this to delete all rows where RowID > 1, leaving you just the unique rows. And it works just as well when you do have a unique id, where you can choose to preserve the earliest id.
select * from (select KHA_ID, ICD_FK, ROW_NUMBER()
OVER(PARTITION BY KHA_ID, ICD_FK
ORDER BY ID ASC) AS RowID
from ICD_Junction ) ordered where RowID > 1

Getting the last record in SQL in WHERE condition

i have loanTable that contain two field loan_id and status
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this Situation i need to show the last Status of loan_id 1 i.e is status 4. Can please help me in this query.

Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.

Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC

I assume that with "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add timestamp into your table where you store the date and time when the record was added. RDBMS don't keep any internal order of the records.

But if last = last inserted, that's not possible for current schema, until a PK addition:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK

Use a data reader. When it exits the while loop it will be on the last row. As the other posters stated unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
while (rdr.Read())
{
}
string lastVal = rdr[0].ToString()
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.

In oracle database this is very simple.
select * from (select * from loanTable order by rownum desc) where rownum=1

Hi if this has not been solved yet.
To get the last record for any field from a table the easiest way would be to add an ID to each record say pID. Also say that in your table you would like to hhet the last record for each 'Name', run the simple query
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName]
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last records for each Name, for the first record run the same queries as above just replace the Max by Min above.
This should be easy to follow and should run quicker as well

If you don't have any identifying columns you could use to get the insert order. You can always do it like this. But it's hacky, and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are using the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key. Rather than risking having some automation mechanism change the 'natural order', breaking the data needed.

I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM LoanTable L1
WHERE RN = ( SELECT max(RN)
FROM LoanTable L2
WHERE L2.LoanID = L1.LoanID)

How to mark duplicates in an SQL query

I have an SQL query which looks at date-of-birth, last name and a soundex of first name to identify duplicates. The following query finds some 8,000 rows (which I assume means there are around 8,000 duplicate records).
select dob,last_name,soundex(first_name),count(*)
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1
Almost all of the results have a count of 2, a few have a count of 3 where obviously the record existed twice in one of the two databases which were merged.
The next step I need to take is to mark one of the rows, doesn't really matter, with a duplicate flag and to mark each row with the opposite rows key. Is there a way of doing this using SQL?

This should do what you are after, the UPDATE in one go.
UPDATE FROM clients c
INNER JOIN
(
select dob,last_name,soundex(first_name),MIN(id) as keep
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1
) k
ON c.dob=k.dob AND c.last_name=k.last_name AND soundex(c.first_name)=soundex(k.first_name)
SET duplicateid = NULLIF(k.keep, c.id),
hasduplicate = (k.keep = c.id)
It assumes you have 3 columns not stated in the question
id: primary key
duplicateid: points to the dup being kept
hasduplicate: boolean, marks the one to keep

Well, you could use SELECT DISTINCT, and then mark a single row as "not duplicate" -- then search for rows that are "not duplicate" to find the duplicate.

Here is a query that will give you not only the duplicates, but also the first id inserted (assuming Id is the sequential primary-key column) and the newest id.
OTTOMH
select dob, last_name, soundex(first_name) firstnamesoundex, min (Id) OldestId, max (Id) NewestId, Count (*) NumRows
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1
You can use this in a JOIN to do your update
UPDATE Clients
SET OppositeRowId = DuplicateRows.NewestId
FROM
(
select dob, last_name, soundex(first_name) firstnamesoundex, min (Id) OldestId, max (Id) NewestId, Count (*) NumRows
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1
) DuplicateRows
WHERE
DuplicateRows.OldestId = Clients.Id
All of this assumes that you have one duplicate. If you have more than one, you are going to have to try something different.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Select WHERE duplicate in one column x AND column y != z - sql

Related

SQL How to update every nth row which meets requirement

SQL Server Sum multiple rows into one - no temp table

Delete multiple occurrences of the same ID # and code in a junction table

Getting the last record in SQL in WHERE condition

How to mark duplicates in an SQL query

Categories

Resources