Import Data and Ignore Duplicates in SQL Server

I'm using SQL Server 2008 R2. I have a source table of data (I_Vendor) that may have duplicates in the CompanyName column. I want to import that data into a new table (Vendor), but the new table has a Name column (which corresponds to CompanyName) with a unique constraint on it. It's been a while since I've done SQL, but I saw the MERGE statement, and it appears to fit the bill. I wrote the following:
MERGE Vendor AS T
USING I_Vendor AS S
ON (T.Name = S.CompanyName)
WHEN NOT MATCHED BY TARGET
THEN INSERT(VendorId, Name, ContactName, ContactInfoId)
VALUES(S.Vendor_ID, S.CompanyName, S.ContactName, S.Vendor_ID+10000);
It generates a "Violation of UNIQUE KEY constraint" and gives the name of the unique constraint on Vendor.Name. Anybody know what I'm doing wrong?

Your MERGE statement will insert all rows from I_Vendor that do not have a matching row in the Vendor table.
For example, suppose there are two rows in the I_Vendor table with company name "X", and further suppose that company name "X" does not appear in the Vendor table, then both rows will be inserted into the Vendor table, violating the constraint.
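To confirm that duplicates in the source are the cause, a query along these lines lists every CompanyName that appears more than once in I_Vendor. This is a sketch in SQL Server syntax; the #I_Vendor temp table and its rows are made-up sample data standing in for the real table.

```sql
IF OBJECT_ID('tempdb..#I_Vendor') IS NOT NULL DROP TABLE #I_Vendor;

-- Made-up sample data standing in for I_Vendor: company "X" appears twice.
CREATE TABLE #I_Vendor (Vendor_ID int, CompanyName nvarchar(100), ContactName nvarchar(100));
INSERT INTO #I_Vendor (Vendor_ID, CompanyName, ContactName)
VALUES (1, N'X', N'Ann'), (2, N'X', N'Bob'), (3, N'Y', N'Cal');

-- Every CompanyName the original MERGE would insert more than once
-- (as long as that name is not already in Vendor).
SELECT CompanyName, COUNT(*) AS Copies
FROM #I_Vendor
GROUP BY CompanyName
HAVING COUNT(*) > 1;
```

Against the real data you would run the same SELECT on I_Vendor; each name it returns is a source of the unique key violation.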
To fix the problem, you need to ensure that there is only one row per company name in the source data of the MERGE statement. The following MERGE statement does this, but as Aditya Naidu has already pointed out, we don't know what you want to do when there are multiple records with the same company name in the I_Vendor table:
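MERGE Vendor AS T
USING (SELECT MAX(Vendor_ID) AS Vendor_ID,    -- each aggregate needs an alias so S.Vendor_ID resolves
              CompanyName,
              MAX(ContactName) AS ContactName
       FROM I_Vendor
       GROUP BY CompanyName) AS S             -- one row per CompanyName
ON (T.Name = S.CompanyName)                   -- MERGE requires a join condition
WHEN NOT MATCHED BY TARGET
THEN INSERT(VendorId, Name, ContactName, ContactInfoId)
VALUES(S.Vendor_ID, S.CompanyName, S.ContactName, S.Vendor_ID+10000);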
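Taking MAX of each column separately means Vendor_ID and ContactName may come from different source rows. If you would rather keep one complete row per company, a sketch of an alternative uses ROW_NUMBER() (available in SQL Server 2005 and later) to keep only the first row of each CompanyName. Here the lowest Vendor_ID wins, which is an assumption; change the ORDER BY to pick a different row. The #Vendor and #I_Vendor temp tables and their rows are made-up stand-ins for the real tables.

```sql
IF OBJECT_ID('tempdb..#Vendor') IS NOT NULL DROP TABLE #Vendor;
IF OBJECT_ID('tempdb..#I_Vendor') IS NOT NULL DROP TABLE #I_Vendor;

-- Stand-in for Vendor, with the unique constraint on Name.
CREATE TABLE #Vendor (
    VendorId      int           NOT NULL PRIMARY KEY,
    Name          nvarchar(100) NOT NULL UNIQUE,
    ContactName   nvarchar(100) NULL,
    ContactInfoId int           NULL
);
-- Made-up source data: company "X" appears twice.
CREATE TABLE #I_Vendor (Vendor_ID int, CompanyName nvarchar(100), ContactName nvarchar(100));
INSERT INTO #I_Vendor (Vendor_ID, CompanyName, ContactName)
VALUES (1, N'X', N'Ann'), (2, N'X', N'Bob'), (3, N'Y', N'Cal');

MERGE #Vendor AS T
USING (
    -- Number the rows inside each CompanyName and keep only rn = 1,
    -- so every column of S comes from the same source row.
    SELECT Vendor_ID, CompanyName, ContactName
    FROM (
        SELECT Vendor_ID, CompanyName, ContactName,
               ROW_NUMBER() OVER (PARTITION BY CompanyName ORDER BY Vendor_ID) AS rn
        FROM #I_Vendor
    ) AS numbered
    WHERE rn = 1
) AS S
ON (T.Name = S.CompanyName)
WHEN NOT MATCHED BY TARGET
THEN INSERT (VendorId, Name, ContactName, ContactInfoId)
VALUES (S.Vendor_ID, S.CompanyName, S.ContactName, S.Vendor_ID + 10000);

SELECT VendorId, Name, ContactName, ContactInfoId
FROM #Vendor
ORDER BY VendorId;
```

With the sample rows, "X" is inserted once, with Vendor_ID 1 and contact Ann taken together from the same row, instead of Vendor_ID 2 and contact Bob mixed from different rows as MAX would do.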

Related

Set column value to foreign key based on another column

I am using SQL Server and have imported data from an Excel file into my tables.
My tables consist of:
BH_Overview (foreign key table)
BH_Equipment (primary key table)
I have different types of equipment and am looking to split them out into their own table called BH_Equipment and link it to the main table BH_Overview.
I have my tables created and constraints made; however, when the data was imported I just stored the equipment name in the BH_Overview table, in a column "Equipment" that isn't linked to BH_Equipment.
I'm wondering how I go about updating the equipmentId column based on what is in the equipment column in the BH_Overview table to match the Id in the BH_Equipment table.
You can see I have the foreign keys done for Factory Area and Responsibility; that was done manually with UPDATE statements, as there were only a few foreign keys to link, but for equipment there are 291 types in the BH_Equipment table.
I have tried an UPDATE with inner joins, but can't get my head around it. Apologies if I have gone about this an awful way; I'm relatively new to SQL, so please show me if there is a much easier way, or if this has been asked before, please link it and I'll give it a look.
UPDATE:
@Charlieface - error message appearing
The other answer is good, but for SQL Server you can update much more easily directly through a join:
update o
set equipmentId = e.id
from BH_OverView o
join BH_Equipment e on e.equipment = o.Equipment;
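A similar UPDATE through a join also exists in Postgres (UPDATE ... FROM), MySQL/MariaDB (UPDATE ... JOIN ... SET) and SQLite 3.33 and later (UPDATE ... FROM), although the exact syntax differs from SQL Server's.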
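A self-contained sketch of the SQL Server update-through-join, with temp tables and made-up rows standing in for BH_Equipment and BH_OverView. OverviewId is a hypothetical key column; the other names follow the question.

```sql
IF OBJECT_ID('tempdb..#BH_Equipment') IS NOT NULL DROP TABLE #BH_Equipment;
IF OBJECT_ID('tempdb..#BH_OverView') IS NOT NULL DROP TABLE #BH_OverView;

CREATE TABLE #BH_Equipment (id int PRIMARY KEY, equipment nvarchar(100) NOT NULL);
CREATE TABLE #BH_OverView (OverviewId int PRIMARY KEY, Equipment nvarchar(100) NULL, equipmentId int NULL);

INSERT INTO #BH_Equipment (id, equipment) VALUES (1, N'Pump'), (2, N'Valve');
INSERT INTO #BH_OverView (OverviewId, Equipment)
VALUES (10, N'Pump'), (11, N'Valve'), (12, N'Pump'), (13, N'Crane');

-- Each overview row takes the id of the equipment row with the same name.
-- Rows with no match ('Crane' here) are not touched, so equipmentId stays NULL.
UPDATE o
SET equipmentId = e.id
FROM #BH_OverView AS o
JOIN #BH_Equipment AS e ON e.equipment = o.Equipment;

SELECT OverviewId, Equipment, equipmentId
FROM #BH_OverView
ORDER BY OverviewId;
```

One difference from the subquery version: if BH_Equipment contains the same name twice, this UPDATE does not raise an error; SQL Server just uses one of the matching ids, and which one is undefined. Checking for duplicate names first is still worthwhile.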
First, make sure the content of the Equipment column in the BH_OverView table matches the content of the equipment column in the BH_Equipment table.
Then the following SQL statement populates the corresponding equipmentId in the BH_OverView table:
update BH_OverView
set equipmentId = (select id from BH_Equipment
where BH_Equipment.equipment=BH_OverView.Equipment)
After verifying the content of equipmentId in the BH_OverView table, you may drop the Equipment column from the BH_OverView table with:
alter table BH_OverView drop column Equipment
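Before running that DROP, it is worth checking that nothing was left unmatched. A sketch in SQL Server syntax, again with temp tables and made-up rows (OverviewId is a hypothetical key column): any row that still has Equipment text but no equipmentId usually points to a spelling difference between the two tables.

```sql
IF OBJECT_ID('tempdb..#BH_Equipment') IS NOT NULL DROP TABLE #BH_Equipment;
IF OBJECT_ID('tempdb..#BH_OverView') IS NOT NULL DROP TABLE #BH_OverView;

CREATE TABLE #BH_Equipment (id int PRIMARY KEY, equipment nvarchar(100) NOT NULL);
CREATE TABLE #BH_OverView (OverviewId int PRIMARY KEY, Equipment nvarchar(100) NULL, equipmentId int NULL);

INSERT INTO #BH_Equipment (id, equipment) VALUES (1, N'Pump'), (2, N'Valve');
-- State after the UPDATE: row 11 has a typo, so it could not be matched.
INSERT INTO #BH_OverView (OverviewId, Equipment, equipmentId)
VALUES (10, N'Pump', 1), (11, N'Vlave', NULL), (12, NULL, NULL);

-- Rows with Equipment text but no equipmentId: the text matched no BH_Equipment row
-- (or the UPDATE has not been run yet). If this returns nothing, the column can go.
SELECT OverviewId, Equipment
FROM #BH_OverView
WHERE Equipment IS NOT NULL
  AND equipmentId IS NULL;
```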
This is standard SQL, so it should work on most databases.
Based on your comment, you got the error message:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
This means that your BH_Equipment table contains more than one row with the same equipment name, so the subquery returns several ids for a single BH_OverView row.
To list these equipment names and the number of times each one occurs, use the following SQL statement:
select equipment, count(id)
from BH_Equipment
group by equipment
having count(id)>1
Delete the repeated rows so that each equipment name appears only once, and the error message will go away.
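One way to delete the repeats in SQL Server while keeping one row per name is to number the rows of each equipment name with ROW_NUMBER() and delete everything after the first. This sketch keeps the lowest id, which is an assumption; the temp table and its rows are made up. If other tables already reference the ids being deleted, repoint those references first, or the foreign key constraints will block the delete.

```sql
IF OBJECT_ID('tempdb..#BH_Equipment') IS NOT NULL DROP TABLE #BH_Equipment;

CREATE TABLE #BH_Equipment (id int PRIMARY KEY, equipment nvarchar(100) NOT NULL);
INSERT INTO #BH_Equipment (id, equipment) VALUES (1, N'Pump'), (2, N'Valve'), (3, N'Pump');

-- rn = 1 is the lowest id for each name; every row with rn > 1 is a repeat.
WITH numbered AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY equipment ORDER BY id) AS rn
    FROM #BH_Equipment
)
DELETE FROM numbered
WHERE rn > 1;

SELECT id, equipment
FROM #BH_Equipment
ORDER BY id;
```

Adding a UNIQUE constraint on BH_Equipment.equipment afterwards would stop new duplicates from appearing.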

one-to-one tables relationship, linked by the autonumber in the main table

I have a main table that contains the customers' details. I built another table that contains, for example, yes/no fields about whether the customer paid his taxes; the two tables are linked by the autonumber from the main table.
I always want to keep both tables with the same number of records (that means every customer has a record in the second table, even if that record is empty apart from the primary key field).
I need that because with missing records I cannot run an update query to auto-fill the second table, and I get a validation rule violation error.
I use this SQL:
update clients LEFT JOIN MonthlyTbl ON clients.SerialNo = MonthlyTbl.serialno
set sReport04='ready';
I have almost 700 records in the main table and only 80 records in the second, and when I run the SQL it updates only 80!
Thanks for the help
Please use the query below:
update clients set sReport04='ready' where SerialNo in
(select serialno from MonthlyTbl);
Here is the right answer.
First run this SQL:
INSERT INTO monthlytbl ( serialno )
SELECT clients.serialno FROM clients
WHERE (((clients.[serialno]) Not In (select serialno from monthlytbl)));
and then:
UPDATE monthlytbl
SET sReport04 = 'ready';
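The first query can also be written with a LEFT JOIN that keeps only the clients with no MonthlyTbl row yet; on larger tables this form is often faster than NOT IN in Access. A sketch in Access SQL using the table and column names from the question:

```sql
INSERT INTO MonthlyTbl ( serialno )
SELECT c.SerialNo
FROM clients AS c LEFT JOIN MonthlyTbl AS m ON c.SerialNo = m.serialno
WHERE m.serialno IS NULL;
```

Running it a second time inserts nothing, because every client then has a matching row, so it can be run before each update to keep the two tables at the same number of records.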

Moving specific data from one database to another

This is for SQL Server. I have tables Product (product_id pk) and Customers (cust_id pk). And a few other tables that have the above as foreign key.
I need to come up with a good set of INSERT statements that can move rows from the tables above, for a specific Product from one database to another. Is there a good tool that can do this?
The twist is also that the different databases have different ids for products and customers - so inserts should first look up the ids based on something else like product name and customer name (assuming there are no duplicates).
If the databases are within the same server, you may use the following assuming they have the same table structure
USE [TESTDB]
SELECT *
INTO #values
FROM producttbl
USE [OTHERDB]
INSERT INTO tbl_product
SELECT *
FROM #values
The twist is also that the different databases have different ids for products and customers - so inserts should first look up the ids based on something else like product name and customer name (assuming there are no duplicates).
For this, the INSERT ... SELECT has to look up the target ids by joining on those other columns (product name, customer name) instead of copying the source ids.
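A sketch of such a lookup in SQL Server: each source row is joined to its product and customer names in the source database, then to the rows with the same names in the target database, and the target ids are what get inserted. The #Src*/#Dst* temp tables, the name columns, and the order-style child table (product_id, cust_id, qty) are hypothetical stand-ins; against real databases you would use three-part names such as [TESTDB].dbo.producttbl.

```sql
IF OBJECT_ID('tempdb..#SrcProduct') IS NOT NULL DROP TABLE #SrcProduct;
IF OBJECT_ID('tempdb..#SrcCustomer') IS NOT NULL DROP TABLE #SrcCustomer;
IF OBJECT_ID('tempdb..#SrcOrder') IS NOT NULL DROP TABLE #SrcOrder;
IF OBJECT_ID('tempdb..#DstProduct') IS NOT NULL DROP TABLE #DstProduct;
IF OBJECT_ID('tempdb..#DstCustomer') IS NOT NULL DROP TABLE #DstCustomer;
IF OBJECT_ID('tempdb..#DstOrder') IS NOT NULL DROP TABLE #DstOrder;

-- Hypothetical source database tables.
CREATE TABLE #SrcProduct  (product_id int PRIMARY KEY, product_name nvarchar(100));
CREATE TABLE #SrcCustomer (cust_id int PRIMARY KEY, cust_name nvarchar(100));
CREATE TABLE #SrcOrder    (product_id int, cust_id int, qty int);
-- Hypothetical target database tables: same names, different ids.
CREATE TABLE #DstProduct  (product_id int PRIMARY KEY, product_name nvarchar(100));
CREATE TABLE #DstCustomer (cust_id int PRIMARY KEY, cust_name nvarchar(100));
CREATE TABLE #DstOrder    (product_id int, cust_id int, qty int);

INSERT INTO #SrcProduct  VALUES (1, N'Widget'), (2, N'Gadget');
INSERT INTO #SrcCustomer VALUES (1, N'Acme'), (2, N'Globex');
INSERT INTO #SrcOrder    VALUES (1, 1, 5), (1, 2, 3), (2, 1, 7);
INSERT INTO #DstProduct  VALUES (50, N'Gadget'), (51, N'Widget');
INSERT INTO #DstCustomer VALUES (80, N'Globex'), (81, N'Acme');

-- Copy the rows for one product, translating both ids through the names.
INSERT INTO #DstOrder (product_id, cust_id, qty)
SELECT dp.product_id, dc.cust_id, so.qty
FROM #SrcOrder AS so
JOIN #SrcProduct  AS sp ON sp.product_id   = so.product_id
JOIN #SrcCustomer AS sc ON sc.cust_id      = so.cust_id
JOIN #DstProduct  AS dp ON dp.product_name = sp.product_name
JOIN #DstCustomer AS dc ON dc.cust_name    = sc.cust_name
WHERE sp.product_name = N'Widget';

SELECT product_id, cust_id, qty
FROM #DstOrder
ORDER BY cust_id;
```

Because these are inner joins, a row whose product or customer name does not exist in the target database is silently skipped; running the same joins as a SELECT with LEFT JOINs first shows which names are missing.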

Assign unique ID to duplicates in Access

I had a very big excel spreadsheet that I moved into Access to try to deal with it easier. I'm very much a novice. I'm trying to use SQL via Access.
I need to assign a unique identifier to duplicates. I've seen people use DENSE_RANK in SQL but I can't get it to work in Access.
Here's what I'm trying to do: I have a large amount of patient and sample data (20k rows). My columns are called FULL_NAME, SAMPLE_NUM, and DATE_REC. Some patients have come in more than once and have multiple samples. I want to give each patient a unique ID that I want to call PATIENT_ID.
I can't figure out how to do this, aside from typing it out on each row. I would greatly appreciate help as I really don't know what I'm doing and there is no one at my work who can help.
To illustrate the previous answers' textual explanation, consider the following SQL action queries, which can be run one by one in an Access query window or as VBA string queries with DAO's CurrentDb.Execute or DoCmd.RunSQL. The ALTER statements can be done in MSAccess.exe.
Create a Patients table (make-table query)
SELECT DISTINCT s.FULL_NAME INTO myPatientsTable
FROM mySamplesTable s
WHERE s.FULL_NAME IS NOT NULL;
Add an autonumber field to new Patients table as a Primary Key
ALTER TABLE myPatientsTable ADD COLUMN PATIENT_ID AUTOINCREMENT NOT NULL PRIMARY KEY;
Add a blank Patient_ID column to Samples table
ALTER TABLE mySamplesTable ADD COLUMN PATIENT_ID INTEGER;
Update Patient_ID Column in Samples table using FULL_NAME field
UPDATE mySamplesTable s
INNER JOIN myPatientsTable p
ON s.[FULL_NAME] = p.[FULL_NAME]
SET s.PATIENT_ID = p.PATIENT_ID;
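Before dropping FULL_NAME in the next step, a query like this shows any sample that did not receive a PATIENT_ID from the update, for example rows with an empty FULL_NAME, which the make-table query skipped. It is written in Access SQL and uses the SAMPLE_NUM and DATE_REC columns named in the question.

```sql
SELECT s.SAMPLE_NUM, s.DATE_REC
FROM mySamplesTable AS s
WHERE s.PATIENT_ID IS NULL;
```

If it returns rows, those samples would lose their only patient information when FULL_NAME is dropped, so they need handling first.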
Maintain third normal form (3NF) principles of relational databases and remove the FULL_NAME field from the Samples table
ALTER TABLE mySamplesTable DROP COLUMN FULL_NAME;
Then in a separate query, add a foreign key constraint on PATIENT_ID
ALTER TABLE mySamplesTable
ADD CONSTRAINT PatientRelationship
FOREIGN KEY (PATIENT_ID)
REFERENCES myPatientsTable (PATIENT_ID);
Sounds like FULL_NAME is currently the unique identifier. However, names make very poor unique identifiers and name parts should be in separate fields. Are you sure you don't have multiple patients with same name, e.g. John Smith?
You need a PatientInfo table and then the SampleData table. Run a query that pulls DISTINCT patient info (apparently this is only one field, FULL_NAME) into a new table that generates a unique ID with an autonumber field. Then build a query that joins the tables on the two FULL_NAME fields and updates a new field in SampleData called PatientID. Delete the FULL_NAME field from SampleData.
The command to number rows in your table is [1]
ALTER TABLE MyTable ADD COLUMN ID AUTOINCREMENT;
Anyway, as June7 pointed out, it might not be a good idea to combine records based only on the patient name, as there might be duplicates. A better way is to treat each record as a unique patient for now and have a way to fix the patient ID when the patient comes back. I would suggest going this way:
1. Create two new columns in your samples table: ID, an autoincrement column as per the query above, and patientID, into which you copy the values from the ID column. For now they will be the same, but in future they will diverge.
2. Copy the columns patientID and patientName into a separate table, patients.
3. Now you can delete the patientName column from the samples table.
4. Add a column imported to the patients table to indicate that there might be other records that belong to this patient.
5. When a patient comes back, open his record, update all the other info like address, phone, ..., and look for all the sample records that might belong to him. If there are any, fix the patient id in those records.
6. Now you can switch off the imported indicator, because this patient's data is up to date.
7. After fixing patientID in the sample records, you will end up with patients that have no record in the samples table, so you can go and delete them.
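The steps above could look roughly like this in Access SQL, with each statement saved and run as its own query, in order. The table names mySamplesTable (borrowed from the earlier answer) and patients are assumptions, FULL_NAME plays the role of patientName, and the final DELETE is meant for later, once the patient ids have been fixed.

```sql
ALTER TABLE mySamplesTable ADD COLUMN ID AUTOINCREMENT;
ALTER TABLE mySamplesTable ADD COLUMN patientID LONG;
UPDATE mySamplesTable SET patientID = ID;
SELECT patientID, FULL_NAME INTO patients FROM mySamplesTable;
ALTER TABLE mySamplesTable DROP COLUMN FULL_NAME;
ALTER TABLE patients ADD COLUMN imported YESNO;
UPDATE patients SET imported = True;
DELETE FROM patients WHERE patientID NOT IN (SELECT patientID FROM mySamplesTable);
```

Fixing a returning patient is then an ordinary UPDATE that sets patientID on that patient's other sample rows, followed by setting imported to False on the patient row that is kept.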
Unless you already have a natural key you will be corrupting this data when you run the distinct query and build a key from it. From your posting I would guess a natural key would be SAMPLE_NUM. Another problem is that if you roll up by last name you will almost certainly be combining different patients into one.

Updating id references into additional table

I'm having a little trouble (possibly codeblind currently) when it comes to migrating some data.
I have 2 tables, one is an appliance table, the other lists manufacturers. The original database stores all the data in a single table, which I'm splitting into multiple tables. I've managed to extract the manufacturers fine, as with the rest of the appliance details to the relevant tables. What I'm failing to do is link the id of the manufacturer to the appliance.
So what I want is for the id in the appliance table to be the corresponding id relative to the manufacturer name in the other table, but done in a single query from the original source material.
My original insert code as follows:
insert into c_appliance (app_serial, property_id, app_location,
app_installdate, app_warrantyexp, app_nextservice)
select [Serial No#], [Customer Number], location,
installed, [Expiry Date], [Service Due]
from dbo.[Customer Table]
This doesn't add the manufacturer into the appliance table, which I'm aware of. The manufacturer column currently remains null while I attempt to figure out what I'm missing.
Any help would be greatly appreciated!
First of all, create a unique key column in the original table (temporarily) if there isn't one.
Then insert into the first table, as the query above does for c_appliance, and also copy the temporary unique key column.
Insert into the manufacturer table in the same way, again with the temporary unique key column.
Now you can update the primary and foreign keys by joining on this unique key column.
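Since the manufacturers are already extracted, another approach that avoids the temporary key is to look the manufacturer id up by name inside the appliance INSERT ... SELECT itself, which gives the single query the question asks for. A sketch in SQL Server syntax; the Manufacturer column in the source, the c_manufacturer(id, name) table and c_appliance.manufacturer_id are hypothetical names, and the temp tables and rows are made up.

```sql
IF OBJECT_ID('tempdb..#CustomerTable') IS NOT NULL DROP TABLE #CustomerTable;
IF OBJECT_ID('tempdb..#c_manufacturer') IS NOT NULL DROP TABLE #c_manufacturer;
IF OBJECT_ID('tempdb..#c_appliance') IS NOT NULL DROP TABLE #c_appliance;

-- Stand-ins: #CustomerTable for dbo.[Customer Table] (with an assumed Manufacturer column),
-- #c_manufacturer for the manufacturer table, #c_appliance for c_appliance.
CREATE TABLE #CustomerTable ([Serial No#] nvarchar(50), [Customer Number] int, Manufacturer nvarchar(100));
CREATE TABLE #c_manufacturer (id int IDENTITY(1,1) PRIMARY KEY, name nvarchar(100) NOT NULL UNIQUE);
CREATE TABLE #c_appliance (app_id int IDENTITY(1,1) PRIMARY KEY, app_serial nvarchar(50),
                           property_id int, manufacturer_id int NULL);

INSERT INTO #CustomerTable ([Serial No#], [Customer Number], Manufacturer)
VALUES (N'SN-1', 100, N'Bosch'), (N'SN-2', 101, N'Miele'), (N'SN-3', 102, N'Bosch'), (N'SN-4', 103, NULL);

-- 1) One manufacturer row per distinct name (the step the question already has working).
INSERT INTO #c_manufacturer (name)
SELECT DISTINCT Manufacturer
FROM #CustomerTable
WHERE Manufacturer IS NOT NULL;

-- 2) Appliances, with the manufacturer id looked up by name in the same statement.
--    LEFT JOIN keeps appliances with no manufacturer; their manufacturer_id stays NULL.
INSERT INTO #c_appliance (app_serial, property_id, manufacturer_id)
SELECT c.[Serial No#], c.[Customer Number], m.id
FROM #CustomerTable AS c
LEFT JOIN #c_manufacturer AS m ON m.name = c.Manufacturer;

SELECT a.app_serial, a.property_id, m.name AS manufacturer
FROM #c_appliance AS a
LEFT JOIN #c_manufacturer AS m ON m.id = a.manufacturer_id
ORDER BY a.app_serial;
```

In the real query this means adding the manufacturer id column to the c_appliance column list and joining dbo.[Customer Table] to the manufacturer table on the name.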