Remove Duplicate Records While Retaining One Record


I have a table of phone records as such:
ID int (Primary Key)
company varchar
dbaname varchar
coaddress varchar
cocity varchar
costate varchar
cozip varchar
phonenum varchar
What I want to accomplish is to remove all the duplicate phone numbers (phonenum field) but retain one occurrence of each.
When doing a duplicate check, I see there are over 41,000 duplicate phone numbers in the table (total of about 141,000).
How would I go about doing this based on the phone number?

Assuming you want to keep only the latest record:
DELETE T
FROM yourTable T
LEFT JOIN
    ( SELECT MAX(ID) [ID] -- the latest row for each phone number
      FROM yourTable
      GROUP BY Phonenum
    ) MaxT
    ON MaxT.ID = T.ID
WHERE MaxT.ID IS NULL -- no match means the row is not the latest one, so it is deleted
I'd definitely archive what you are deleting into another table though, as there is no guarantee you are removing the correct record without checking manually or adding further criteria to the DELETE statement.
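As a rough, runnable sketch of the archive-then-delete idea, here is the same logic against an in-memory SQLite database driven from Python's sqlite3 module. The table names `phones` and `phones_archive` and the sample rows are assumptions for illustration only. SQLite does not accept the T-SQL `DELETE ... FROM ... JOIN` form, so this uses an equivalent `NOT IN` subquery instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phones (id INTEGER PRIMARY KEY, company TEXT, phonenum TEXT);
    INSERT INTO phones (id, company, phonenum) VALUES
        (1, 'Acme',     '555-0100'),
        (2, 'Acme Co',  '555-0100'),
        (3, 'Bolt',     '555-0101'),
        (4, 'Acme Inc', '555-0100');
""")

# Rows to keep: the highest id for each phone number.
KEEP = "SELECT MAX(id) FROM phones GROUP BY phonenum"

# Archive the rows that are about to be removed, then delete them.
conn.execute(
    f"CREATE TABLE phones_archive AS SELECT * FROM phones WHERE id NOT IN ({KEEP})"
)
conn.execute(f"DELETE FROM phones WHERE id NOT IN ({KEEP})")
conn.commit()

remaining = conn.execute("SELECT id, phonenum FROM phones ORDER BY id").fetchall()
archived = conn.execute("SELECT id FROM phones_archive ORDER BY id").fetchall()
print(remaining)
print(archived)
```

Keeping the archive table around makes it possible to restore a row if the "latest ID wins" rule turns out to be wrong for some phone numbers.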

Related

Complex SQL query Help Pls

I have two tables named 'A' and 'B'.
Table A holds many records, each with a 'SENDER_CONTACT_NUMBER' column; there can be many invoices from the same sender.
Table B holds 'SENDER_CONTACT_NUMBER' and 'STATUS' (1 = active, 0 = not active), and in this table 'SENDER_CONTACT_NUMBER' is unique. I want a query that lists the 'SENDER_CONTACT_NUMBER' values from both tables in one column and, in a second column, shows the count of matching rows in table A for each one. I solved the first part, but the second is the real problem for me: a 'SENDER_CONTACT_NUMBER' in table B may not exist in table A, and in that case I couldn't show the counts.
TABLE A
INVOICE_TYPE_CODE varchar no
SENDER_CONTACT_NUMBER varchar no
SENDER_IDENTIFIER varchar no
SENDER_NAME varchar no
RECEIVER_CONTACT varchar no
RECEIVER_IDENTIFIER varchar no
RECEIVER_NAME varchar no
TABLE B
SENDER_CONTACT_NUMBER varchar no
URUN varchar no
BASTAR char no
BITTAR char no
STATUS smallint no
Result that I want to see
COLUMN 1
Union SENDER_CONTACT_NUMBER LIST BY STATUS=1
COLUMN 2
For each SENDER_CONTACT_NUMBER in column 1, show its count in table A; if there is no record, put 0

Assuming you want a count of the number of occurrences of each SENDER_CONTACT_NUMBER in table A, including records from B that have no corresponding record in A, then this should work:
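SELECT B.SENDER_CONTACT_NUMBER, COUNT(A.INVOICE_TYPE_CODE) AS Invoices
FROM B
LEFT JOIN A ON B.SENDER_CONTACT_NUMBER = A.SENDER_CONTACT_NUMBER
WHERE B.STATUS = 1 -- Only include active contacts
GROUP BY B.SENDER_CONTACT_NUMBER -- One row per contact; COUNT ignores the NULLs from unmatched rows, giving 0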
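To see why the LEFT JOIN produces a 0 for contacts with no invoices, here is a small sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. The tables are cut down to the columns the query needs, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (INVOICE_TYPE_CODE TEXT, SENDER_CONTACT_NUMBER TEXT);
    CREATE TABLE B (SENDER_CONTACT_NUMBER TEXT PRIMARY KEY, STATUS INTEGER);
    INSERT INTO A VALUES ('SALE', '100'), ('SALE', '100'), ('RETURN', '300');
    INSERT INTO B VALUES ('100', 1), ('200', 1), ('300', 0);
""")

rows = conn.execute("""
    SELECT B.SENDER_CONTACT_NUMBER, COUNT(A.INVOICE_TYPE_CODE) AS Invoices
    FROM B
    LEFT JOIN A ON B.SENDER_CONTACT_NUMBER = A.SENDER_CONTACT_NUMBER
    WHERE B.STATUS = 1                 -- only active contacts
    GROUP BY B.SENDER_CONTACT_NUMBER   -- one output row per contact
    ORDER BY B.SENDER_CONTACT_NUMBER
""").fetchall()

# Contact '200' is active but has no invoices, so it appears with a count of 0.
# Contact '300' has an invoice but is inactive, so it is filtered out.
print(rows)
```

Counting a column from A rather than using COUNT(*) is what gives 0 here: the unmatched row still exists after the LEFT JOIN, but its A columns are NULL and COUNT skips NULLs.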

Using another table as value type

I have been searching for this and cannot find a proper answer as to whether I need a JOIN or a SUBQUERY. I have tried multiple ways of doing this and, honestly, I am hitting a major wall. I am trying to do something simple and I don't know how to progress.
I have two tables I am trying to use: table 1) data 2) mapping
table 1 is like this the headers are :
Date
Value1
Value2
Value3
Value4
Etc.
Value in CSV style for example would be:
1/1/10,1,2,3,4
1/2/10,5,6,7,8
1/3/10,9,10,11,12
table 2 has only one row though, here are the headers and one row
Value1
Value2
Value3
Value4
The one row would be like:
Description1, Description2, Description3, Description4
So I want to be able to, for example, do a SELECT from table 1 and join in the description for each matching row where the column names are the same. Sample output based on the above would look like this:
1/1/10,1,Description1,2,Description2,3,Description3,4,Description4
1/2/10,5,Description1,6,Description2,7,Description3,8,Description4
Etc

Since there's just one row in table2 and no key, you can simply cross join it.
select *
from table1
cross join table2;
Since there's just one row in table2 it's questionable why it exists at all. This could be done without a join at all.
select date, value1, 'Description1', value2, 'Description2', value3, 'Description3', value4, 'Description4'
from table1;
There's likely a better way to do this. Having columns like value1, value2, and value3 usually indicates poor table design. Instead of having four value columns, you should have four value rows. And instead of having a table with four columns of descriptions, it should be four rows of descriptions.
For example, let's say you're storing items in an order. An order can have many items. Rather than having a column for each item in an order like item1, item2, item3, you'd have a row for each item in a join table. Below, that's order_items. Descriptions of the items are stored separately in their own table, one row per item.
users
-----
id bigint primary key
name text not null
items
-----
id bigint primary key
name text not null
orders
------
id bigint primary key
user_id bigint references users(id)
created_at datetime
order_items
-----------
order_id bigint references orders(id)
item_id bigint references items(id)
If you want to look up all the items in an order, with their names, you'd use the order_items table to get all the items in an order, and join with the items table to get each item's name.
select i.name
from order_items oi
join items i on i.id = oi.item_id
where oi.order_id = ?
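A minimal sketch of that lookup, run against an in-memory SQLite database through Python's sqlite3 module. The schema is trimmed to the three tables the query touches, and the item names and order IDs are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(id),
        item_id  INTEGER REFERENCES items(id)
    );
    INSERT INTO items VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO orders VALUES (10, '2010-01-01'), (11, '2010-01-02');
    INSERT INTO order_items VALUES (10, 1), (10, 3), (11, 2);
""")

def item_names(order_id):
    """Return the names of all items in one order, sorted by name."""
    rows = conn.execute("""
        SELECT i.name
        FROM order_items oi
        JOIN items i ON i.id = oi.item_id
        WHERE oi.order_id = ?
        ORDER BY i.name
    """, (order_id,))
    return [name for (name,) in rows]

print(item_names(10))
print(item_names(11))
```

Adding a fourth or fifth item to an order is then just another row in order_items, with no schema change.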

Comma separated lists of values are awkward to handle.
Rather than a CSV, storing the values in a mapping table is typically used.
If I understand correctly then I believe that the following may demonstrate along the lines of what you want:-
DROP TABLE IF EXISTS table1 /* Assigned Values per date (the mapping table)*/;
DROP TABLE IF EXISTS table2 /* Values */;
CREATE TABLE IF NOT EXISTS table2 (valueid INTEGER PRIMARY KEY,value_description TEXT);
CREATE TABLE IF NOT EXISTS table1 (date TEXT, valueid_reference INTEGER REFERENCES table2(valueid),value INTEGER, UNIQUE(date, valueid_reference));
INSERT INTO table2 VALUES (1,'Value1 Description'),(2,'Value2 Description'),(3,'Value3 Description'),(4,'Value4 Description');
INSERT INTO table1 VALUES
('1/1/10',1,1),('1/1/10',2,2),('1/1/10',3,3),('1/1/10',4,4),
('1/2/10',1,5),('1/2/10',2,6),('1/2/10',3,7),('1/2/10',4,8)
;
SELECT date||','||group_concat(value||','||value_description) AS all_values_and_descriptions FROM table1 JOIN table2 ON valueid_reference = valueid GROUP BY date;
SELECT * FROM table1;
This results in :-
Note that the REFERENCES clause (foreign key) is a no-op unless foreign key support is enabled. The queries still work without it, however.
As can be seen each value per date is an individual row in table 1 (so 4 rows per date). It is the group_concat function that is used to get all the values per date in conjunction with the GROUP BY clause which creates a set of rows (a Group) for each date.
The 2nd SELECT shows the rows in table1 :-

Update a table and insert missing records

I have a table with a foreign key column that is NULL in some records. I can select the records where it is missing like this:
SELECT * FROM Outgoing WHERE Receipt_Id IS NULL
Now for each of these records I want to insert a new record in the table Receipts, get the inserted record's Id and set it as the value for Receipt_Id in this record.
Is this possible in a query?

It seems you are looking for the OUTPUT clause and its INSERTED table:
INSERT INTO Receipts (col1, col2....)
OUTPUT INSERTED.*
INTO @CreatedIds -- table variable that holds the newly inserted data, including Receipt_Id (PK)
SELECT col1, col2....
FROM Outgoing
WHERE Receipt_Id IS NULL
To see the recently inserted records:
SELECT c.*
FROM @CreatedIds c -- Note: this is a table variable that you need to declare manually beforehand.

Update: Since you are using the Receipts table only as a sequence table, you should follow the updated approach, which uses sequences.
Updated Answer:
All you need to do is to create a sequence say Receipts instead of a table with one column. And then update the Outgoing table with sequence numbers.
--create table Outgoing ( id int Primary Key IDENTITY(1,1),data nvarchar(100), record_id int);
--insert into Outgoing values ('john',NULL),('jane',NULL),('jean',NULL);
create sequence dbo.receipts as int start with 1 increment by 1;
update Outgoing
set record_id= NEXT VALUE FOR dbo.receipts
where record_id is null
select * from Outgoing
Old Answer below
If you have ID column in both tables you can update Receipt_Id based on this column back into the Outgoing table
So your steps are:
1. Insert the records
DECLARE @LastRID bigint
SELECT @LastRID = ISNULL(MAX(Id), 0) FROM Receipts -- 0 when Receipts is empty, so the comparison below still works
INSERT INTO Receipts(<col list>)
SELECT <col list> FROM Outgoing WHERE Receipt_Id IS NULL
2. Update the records, matching on all the columns inserted from Outgoing into Receipts by using the CHECKSUM function
UPDATE O
SET O.Receipt_Id = R.Id
FROM Outgoing O
JOIN Receipts R
ON CHECKSUM(O.<col list>) = CHECKSUM(R.<col list>)
AND R.Id > @LastRID
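If the work can be done from application code rather than in a single statement, a different and simpler technique is to insert one receipt per row and write the generated key straight back. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server, and the `Note` and `Data` columns and the sample rows are assumptions for illustration. It avoids the CHECKSUM matching step entirely, at the cost of one round trip per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Receipts (Id INTEGER PRIMARY KEY AUTOINCREMENT, Note TEXT);
    CREATE TABLE Outgoing (
        Id INTEGER PRIMARY KEY,
        Data TEXT,
        Receipt_Id INTEGER REFERENCES Receipts(Id)
    );
    INSERT INTO Receipts (Note) VALUES ('existing');
    INSERT INTO Outgoing VALUES (1, 'john', 1), (2, 'jane', NULL), (3, 'jean', NULL);
""")

# Fetch the rows that still need a receipt before modifying anything.
missing = conn.execute(
    "SELECT Id, Data FROM Outgoing WHERE Receipt_Id IS NULL ORDER BY Id"
).fetchall()

with conn:  # commits on success, rolls back if any statement raises
    for outgoing_id, data in missing:
        cur = conn.execute("INSERT INTO Receipts (Note) VALUES (?)", (data,))
        # lastrowid is the key generated by the INSERT just executed on this cursor.
        conn.execute(
            "UPDATE Outgoing SET Receipt_Id = ? WHERE Id = ?",
            (cur.lastrowid, outgoing_id),
        )

result = conn.execute("SELECT Id, Receipt_Id FROM Outgoing ORDER BY Id").fetchall()
print(result)
```

For large tables a set-based statement on the server will usually be faster, so treat this as a fallback when exact row-to-receipt pairing matters more than speed.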

Find which column differs between 2 rows?

We are using audit tables for each operational table, which stores the previous value of its operational equivalent plus change date, change type (UPDATE or DELETE) and its own auto incremental Primary Key.
So, for a table Users with columns UserID, Name, Email there would be a table xUsers with columns ID, OperationType, OperationDate, UserID, Name, Email.
See that the xTable contains every column that its 'parent' does with 3 extra fields. This pattern is repeated for all tables used by our system.
table Users:
UserID int
Name nvarchar
Email nvarchar
table xUsers:
xUserID int
OperationType int
OperationDate datetime
UserID int
Name nvarchar
Email nvarchar
Now, my question:
If I have a certain UserID for which there are two entries in the xUsers table because the email was changed twice,
how would I construct a query that identifies which columns (can be more than 1) differ between the two rows in the audit table?

If I'm understanding this correctly, you'd like to create a query passing in the UserID as a parameter, which I'll call @UserID for the following example.
This query will select all rows from xUsers joined onto itself where there is a difference in a non-UserID column, using a series of CASE expressions (one per column) to pull out specifically which columns differ.
SELECT *
    , CASE
        WHEN a.OperationType <> b.OperationType
        THEN 1
        ELSE 0
      END AS OperationTypeDiffers
    , CASE
        WHEN a.OperationDate <> b.OperationDate
        THEN 1
        ELSE 0
      END AS OperationDateDiffers
    -- repeat the CASE expression for Name, Email, etc.
FROM xUsers a
JOIN xUsers b
    ON a.xUserID < b.xUserID -- pair each audit row only with later ones
    AND a.UserID = b.UserID
    AND (a.OperationType <> b.OperationType
        OR a.OperationDate <> b.OperationDate) -- etc.
WHERE a.UserID = @UserID

You can put the rows of xUsers in a temporary table and then use a WHILE loop to go through each one and compare the results.
OR
You can do some dynamic SQL and use sysobjects and syscolumns tables to compare each result. It would be more dynamic and then it would be easy to implement for other tables.
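As a sketch of the "dynamic" idea, the comparison can also be done outside SQL by reading the column names from the result set instead of hard-coding them. The example below uses SQLite through Python's sqlite3 module in place of sysobjects and syscolumns. The function name `differing_columns` and the sample audit rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose their column names via .keys()
conn.executescript("""
    CREATE TABLE xUsers (
        xUserID INTEGER PRIMARY KEY, OperationType INTEGER, OperationDate TEXT,
        UserID INTEGER, Name TEXT, Email TEXT
    );
    INSERT INTO xUsers VALUES
        (1, 1, '2020-01-01', 7, 'Ann', 'ann@old.example'),
        (2, 1, '2020-02-01', 7, 'Ann', 'ann@new.example');
""")

def differing_columns(table, key_column, key_a, key_b, ignore=()):
    """Return the names of the columns whose values differ between two rows."""
    # table and key_column are interpolated into the SQL text,
    # so only pass trusted identifiers, never user input.
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {key_column} IN (?, ?) ORDER BY {key_column}",
        (key_a, key_b),
    ).fetchall()
    if len(rows) != 2:
        raise ValueError("expected exactly two rows")
    first, second = rows
    return [
        col for col in first.keys()
        if col not in ignore and first[col] != second[col]
    ]

changed = differing_columns(
    "xUsers", "xUserID", 1, 2, ignore=("xUserID", "OperationDate")
)
print(changed)
```

Because the comparison happens in Python, a NULL on one side and a value on the other counts as a difference, which a plain `<>` in SQL would not report. The same function works for any audit table that follows the pattern.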

Underlying rows in Group By

I have a table with a certain number of columns and a primary key column (suppose OriginalKey). I perform a GROUP BY on a certain sub-set of those columns and store them in a temporary table with primary key (suppose GroupKey). At a later stage, I may need to get more details about one or more of those groupings (which can be found in the temporary table) i.e. I need to know which were the rows from the original table that formed that group. Simply put, I need to know the mappings between GroupKey and OriginalKey. What's the best way to do this? Thanks in advance.
Example:
Table Student (
StudentID INT PRIMARY KEY,
Level INT, -- Grade/Class/Level, depending on which country you are from
HomeTown TEXT,
Gender CHAR)
INSERT INTO TempTable SELECT HomeTown, Gender, COUNT(*) AS NumStudents FROM Student GROUP BY HomeTown, Gender
At a later date, I would like to find all towns that have more than 50 male students and get the details of every one of those students.

How about joining the 2 tables using the GroupKey, which, you say, are the same?
Or how about doing:
select * from OriginalTable where
GroupKey in (select GroupKey from my_temp_table)

You'd need to store the fields you grouped on in your temporary table, so you can join back to the original table. e.g. if you grouped on fieldA, fieldB, and fieldC, you'd need something like:
select original.id
from original
inner join temptable on
temptable.fieldA = original.fieldA and
temptable.fieldB = original.fieldB and
temptable.fieldC = original.fieldC
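Applied to the Student example from the question, the join back on the grouping columns could look like the sketch below, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and the threshold is lowered from 50 to 1 so that the tiny data set returns something:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY, Level INTEGER, HomeTown TEXT, Gender TEXT
    );
    INSERT INTO Student VALUES
        (1, 3, 'Springfield', 'M'), (2, 4, 'Springfield', 'M'),
        (3, 3, 'Springfield', 'F'), (4, 5, 'Shelbyville', 'M');

    -- The temporary table keeps the grouped columns, not just the count.
    CREATE TABLE TempTable AS
        SELECT HomeTown, Gender, COUNT(*) AS NumStudents
        FROM Student
        GROUP BY HomeTown, Gender;
""")

# Join each group back to the rows that formed it, using the grouped columns as the key.
members = conn.execute("""
    SELECT s.StudentID
    FROM Student s
    JOIN TempTable t
        ON t.HomeTown = s.HomeTown
        AND t.Gender = s.Gender
    WHERE t.Gender = 'M' AND t.NumStudents > 1
    ORDER BY s.StudentID
""").fetchall()
print(members)
```

No separate GroupKey-to-OriginalKey mapping table is needed as long as the grouped columns are stored, since together they identify the group.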
