How can I delete completely duplicate rows from a query, without having a unique value for it? - sql

I'm having an issue getting information from an MS Access Database table. I need a count of a code but I don't have to take into account duplicate rows, which means that I need to delete all duplicate rows.
Here's an example to illustrate what I need:
Code | Name
12 | George
20 | John
12 | George
33 | John
I will need first to delete both rows with the same code, and then I need a count for the name the rest of the table data for example this will be the result that I'm expecting:
Name | Count
John | 2
I already have a query that does that for me, but is taking around 1 hour to get me around 5000 rows and I need something more efficient. My query:
select name, count(*) from Table
where name = '" + input_name + "'
and code in (select code from Table group by code
having count(code) = 1)
group by name
order by count(name) desc;
I would appreciate any suggestion.

Rather than using in, I might suggest filtering the original dataset in a subquery, e.g.:
select u.name, count(*)
from (select t.code, t.name from yourtable t group by t.code, t.name having count(*) = 1) u
group by u.name
Here, change yourtable to the name of your table.

Related

MS Access equivalent for using dense_rank in select

In MS Access, I have a table with 2 million account records/rows with various columns of data. I wish to apply a sequence number to every account record. (i.e.- 1 for the first account record ABC111, 2 for the second account record DEF222..., etc.)
Then, I would like to assign a batch number sequence for every 5 distinct account number. (i.e - record 1 with account number ABC111 being associated with batch number 101, record 2 with account number DEF222 being associated with batch number of 101)
This is how I would do it with a sql server query:
select distinct(p.accountnumber),FLOOR(((50 + dense_rank() over(order by
p.accountnumber)) - 1)/5) + 100 As BATCH from
db2inst1.account_table p
Raw Data:
AccountNumber
ABC111
DEF222
GHI333
JKL444
MNO555
PQR666
STU777
Resulting Data:
RecordNumber AccountNumber BatchNumber
1 ABC111 101
2 DEF222 101
3 GHI333 101
4 JKL444 101
5 MNO555 101
6 PQR666 102
7 STU777 102
I tried to make a query that uses SELECT as well as DENSE_RANK but I couldn't figure out how to make it work.
Thanks for reading my question
Something like this would probably work.
I'd first create a temporary table to hold the distinct account numbers, then I'd do an update query to assign the ranking.
CREATE TABLE tmpAccountRank
(AccountNumber TEXT(10)
CONSTRAINT PrimaryKey PRIMARY KEY,
AccountRank INTEGER NULL);
Then I'd use this table to generate the account ranking.
DELETE FROM tmpAccountRank;
INSERT INTO tmpAccountRank(AccountNumber)
SELECT DISTINCT AccountNumber FROM db2inst1.account_table;
UPDATE tmpAccountRank
SET AccountRank =
DCOUNT('AccountNumber', 'tmpAccountRank',
'AccountNumber < ''' + AccountNumber + '''') \ 5 + 101
I use DCOUNT and integer division (\ 5) to generate the ranking. This probably will have terrible performance but I think it's the way you would do it in MS Access.
If you want to skip the temp table, you can do it all in a nested subquery, but I don't think it's a great practice to do too much in a single query, especially in MS Access.
SELECT AccountNumber,
(SELECT COUNT(*) FROM
(SELECT DISTINCT AccountNumber
FROM db2inst1.account_table
WHERE AccountNumber < t.AccountNumber) q)) \ 5 + 101
FROM db2inst1.account_table t
Actually, this won't work in MS Access; apparently you can't reference tables outside of multiple levels of nesting in a subquery.
You can do dense_rank() with a correlated subquery. The logic is:
select a.*,
(select count(distinct a2.accountnumber)
from db2inst1.account_table as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;
Then, you can use this for getting the batch number. Unfortunately, I don't follow the logic in your question (dense_rank() produces a number but your batch number is not numeric). However, this should answer your question.
EDIT:
Oh, that's right. In MS Access you need nested subqueries:
select a.*,
(select count(*)
from (select distinct a2.accountnumber
from db2inst1.account_table as a2
) as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;

SQL Remove Duplicates, save lowest of certain column

I've been looking for an answer to this but couldn't find anything the same as this particular situation.
So I have a one table that I want to remove duplicates from.
__________________
| JobNumber-String |
| JobOp - Number |
------------------
So there are multiples of these two values, together they make the key for the row. I want keep all distinct job numbers with the lowest job op. How can I do this? I've tried a bunch of things, mainly trying the min function, but that only seems to work on the entire table not just the JobNumber sets. Thanks!
Original Table Values:
JobNumber Jobop
123 100
123 101
456 200
456 201
780 300
Code Ran:
DELETE FROM table
WHERE CONCAT(JobNumber,JobOp) NOT IN
(
SELECT CONCAT(JobNumber,MIN(JobOp))
FROM table
GROUP BY JobNumber
)
Ending Table Values:
JobNumber Jobop
123 100
456 200
780 300
With SQL Server 2008 or higher you can enhance the MIN function with an OVER clause specifying a PARTITION BY section.
Please have a look at https://msdn.microsoft.com/en-us/library/ms189461.aspx
You can simply select the values you want to keep:
select jobOp, min(number) from table group by jobOp
Then you can delete the records you don't want:
DELETE t FROM table t
left JOIN (select jobOp, min(number) as minnumber from table group by jobOp ) e
ON t.jobob = e.jobob and t.number = e.minnumber
Where e.jobob is null
I like to do this with window functions:
with todelete as (
select t.*, min(jobop) over (partition by numbers) as minjop
from table t
)
delete from todelete
where jobop > minjop;
It sounds like you are not using the correct GROUP BY clause when using the MIN function. This sql should give you the minimum JobOp value for each JobNumber:
SELECT JobNumber, MIN(JobOp) FROM test.so_test GROUP BY JobNumber;
Using this in a subquery, along with CONCAT (this is from MySQL, SQL Server might use different function) because both fields form your key, gives you this sql:
SELECT * FROM so_test WHERE CONCAT(JobNumber,JobOp)
NOT IN (SELECT CONCAT(JobNumber,MIN(JobOp)) FROM test.so_test GROUP BY JobNumber);

Search for occurrences of string in one table field in the field of another

Let's say I want to find mentions of names listed in one table within another. So for instance I have this table:
ID | Name
----+-----------------------
1 | PersonA
2 | PersonB
3 | PersonC
4 | PersonD
Now I want to search a field in another table for these persons' names and produce a count for each. Here's what I've tried, to no avail:
select
Name,
sum(
select
count(*)
from Posts
where Posts.Body like '%[^N]' + [Name] + '%'
) as [Count]
from NamesTable
order by Name;
I am using Data Explorer here on SE, so whatever syntax will work there is what I need. I'm not sure how to get this working or if this is even the best approach.
Your query is very close. You just don't need the sum() in the outer query:
select Name,
(select count(*)
from Posts
where Posts.Body like '%[^N]' + [Name] + '%'
) as [Count]
from NamesTable
order by Name;

SQL - Removing Duplicate without 'hard' coding?

Heres my scenario.
I have a table with 3 rows I want to return within a stored procedure, rows are email, name and id. id must = 3 or 4 and email must only be per user as some have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
Ok fairly simple but there are some users whose have entries that are both 3 and 4 so they appear twice, if they appear twice I want only those with ids of 4 remaining. I'll give another example below as its hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3.
The solution provided before with the select MAX(ID) from table sounds good for this case.
This maybe an alternative solution.
What RDMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3

write a query to identify discrepancy

I have a table with Student ID's and Student Names. There has been issues with assigning unique Student Id's to students and Hence I want to find the duplicates
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display count of duplicate records.
For e.g. "Duplicate_Count" would display "3" for Student ID = 1 and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc,
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId, this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold dupcount, this query can be used in an update statement to update that column in the table
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by #HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.