MSSQL insert rows into relational tables. - sql

I have two tables. Person and Phones. Many phones numbers could be associated with one person by foreign key. If I want to add a phone number and map it to particular person, how my SQL should look like?
In my understanding:
SQL statement should be transact, therefore first I have to insert person into Person table and after insert phone number in Phones and map it with just inserted row in Person table.
What if row is already exist in one of another table? How should I handle it?
I am Looking for a clean and simple solution or sql example.
Note: I don't have access for creating stored procedures.

If you're inserting a new Person with new Phones, then you would
Insert into the Person table.
Use LAST_INSERT_ID() to get the ID which was just generated on that insert.
Use that ID to insert records into the Phone table.
If you're inserting a new Phones for an existing Person, then you would
Select the Person to get its ID if you don't already have it
Use that ID to insert records into the Phone table.
What if row is already exist in one of another table? How should I handle it?
Define "already exists" in this context. What defines uniqueness in your data? In cases like this you may want to consider incorporating that definition of uniqueness into the primary key in that table. (Which can be composed of more than one column.) Otherwise you'll have to SELECT from the table to see if the row already exists. If it does, update it. If it doesn't, insert it. (Or however you want to handle already-existing data logically in your domain.)
Keep in mind that it's easy to go overboard with uniqueness in cases like this. For example, you might be tempted to try to create a many-to-many relationship between these tables so that you can avoid having duplicate phone numbers. In real world scenarios this ends up being a bad idea because it's possible that:
Two people share the same phone number.
One of those two people changes his/her number, but the other one doesn't.
In an overly-normalized scenario, the above events would result in one of the following:
Both users' phone numbers are updated when only one of them actually updates it, resulting in incorrect data for the other user.
You have to write overly-complicated code to check for this scenario and create a new record (disassociating the previous many-to-many relationship), resulting in a lot of unnecessary code and points of failure.

Related

advantages and disadvantages of database automatic number generator for each row vs manual numbering for each row

Imagine two tables that implemented like the following description:
The first table rows numbers created by database system administration automatically.
The second table rows numbers created manually by the programmer in a sequential order.
The main question is what are the advantages and disadvantages of these two approaches?
One distinct advantage of having the database manage auto-numbering over manually creating them is that the database implementation is thread safe - and manually creating them is usually (99.9% of the cases) is not (It's hard to do it correctly).
On the other hand, the database implementation does not guarantee sequential numbering - there can be gaps in the numbers.
Given these two facts, an auto-increment column should be used only as a surrogate key, when the values of this column does not have any business meaning - but they are simple used as a simple row identifier.
Please note that when using a surrogate key, it's best to also enforce uniqueness of a natural key - otherwise you might get rows where all the data is duplicated except the surrogate key.
When the database automatically create numbers, you habe less work.
Think about a sign up system, you have fields like name, email, password and so one:
1.) the number is generated by the database, so you can just insert the data into the table.
2.) if this is not the case you have to get the last number, so before the insert into you have to get the last id so instead a insert into you have a select + insert into.
Another reason is, what happened when you delete a row in your table?
Maybe in a forum, you want to delete the account but not all of his posts, so you can work with a workaround and when a post has a user_id not given you know this is/was a deleted or banned account - if you give a new user the number from a deleted user you will come in trouble.

Inserting test data which references another table without hard coding the foreign key values

I'm trying to write a SQL query that will insert test data into two tables, one of which references the other.
Tables are created from something like the following:
CREATE TABLE address (
address_id INTEGER IDENTITY PRIMARY KEY,
...[irrelevant columns]
);
CREATE TABLE member (
...[irrelevant columns],
address INTEGER,
FOREIGN KEY(address) REFERENCES address(address_id)
);
I want ids in both tables to auto increment, so that I can easily insert new rows later without having to look into the table for ids.
I need to insert some test data into both tables, about 25 rows in each. Hardcoding ids for the insert causes issues with inserting new rows later, as the automatic values for the id columns try and start with 1 (which is already in the database). So I need to let the ids be automatically generated, but I also need to know which ids are in the database for inserting test data into the member database - I don't believe the autogenerated ones are guaranteed to be consecutive, so can't assume I can safely hardcode those.
This is test data - I don't care which record I link each member row I am inserting to, only that there is an address record in the address table with that id.
My thoughts for how to do this so far include:
Insert addresses individually, returning the id, then use that to insert an individual member (cons: potentially messy, not sure of the syntax, harder to see expected sets of addresses/members in the test data)
Do the member insert with a SELECT address_id FROM address WHERE [some condition that will only give one row] for the address column (cons: also a bit messy, involves a quite long statement for something I don't care about)
Is there a neater way around this problem?
I particularly wonder if there is a way to either:
Let the auto increment controlling functions be aware of manually inserted id values, or
Get the list of inserted ids from the address table into a variable which I can use values from in turn to insert members.
Ideally, I'd like this to work with as many (irritatingly slightly different) database engines as possible, but I need to support at least postgresql and sqlite - ideally in a single query, although I could have two separate ones. (I have separate ones for creating the tables, the sole difference being INTEGER GENEREATED BY DEFAULT AS IDENTITY instead of just IDENTITY.)
https://www.postgresql.org/docs/8.1/static/functions-sequence.html
Sounds like LASTVAL() is what you're looking for. It was also work in the real world to maintain transactional consistency between multiple selects, as it's scoped to your sessions last insert.

Adding record with new foreign key

I have few tables to store company information in my database, but I want to focus on two of them. One table, Company, contains CompanyID, which is autoincremented, and some other columns that are irrelevant for now. The problem is that companies use different versions of names (e.g. IBM vs. International Business Machines) and I want/need to store them all for futher use, so I can't keep names in Company table. Therefore I have another table, CompanyName that uses CompanyID as a foreign key (it's one-to-many relation).
Now, I need to import some new companies, and I have names only. Therefore I want to add them to CompanyName table, but create new records in Company table immediately, so I can put right CompanyID in CompanyName table.
Is it possible with one query? How to approach this problem properly? Do I need to go as far as writing VBA procedure to add records one by one?
I searched Stack and other websites, but I didn't find any solution for my problem, and I can't figure it out myself. I guess it could be done with form and subform, but ultimately I want to put all my queries in macro, so data import would be done automatically.
I'm not database expert, so maybe I just designed it badly, but I didn't figure out another way to cleanly store multiple names of the same entity.
The table structure you setup appears to be a good way to do this. But there's not a way to insert records into both tables at the same time. One option is to write two queries to insert records into Company and then CompanyName. After inserting records into Company you will need to create a query that joins from the source table to the Company table joining it on a field that uniquely defines the record beside the autoincrement key. That will allow you to get the key field from Company for use when you insert into CompanyName.
The other option, is to write some VBA code to loop through the source data inserting records into both. The would be preferable since it should be more reliable.

Current primary key is ineffective at preventing duplicates. Does this sound like a good way to rearchitect my tables?

Every so often, I update our research recruitment database with those who responded to our Craigslist ad. Each respondent is given a unique respondentID, which is the primary key.
Sometimes, people respond to these Craigslist ads multiple times. I think we may have duplicate people in our database, which is bad.
I would like to change the primary key of all my recruitment tables from respondentID to Email, which will prevent duplicates and make it easier to look up information. There are probably duplicate email records in my database already, and I need to clean this up if so.
Here's the current architecture for my three recruitment tables:
demographic - contains columns like RespondentID (PK), Email (I want this to be PK), Phone, etc
genre - contains columns like RespondentID (PK), Horror, etc
platform - contains columns like RespondentID (PK), TV, etc.
I want to join all three tables together at some point so we can get a better understanding of someone.
Here are my questions:
How can I eliminate duplicate respondents already in my database? (I can tell if they are duplicates because they will have the same Email value.)
Given my current architecture, how can I transition my database to have Email as the primary key without messing up my data?
After transitioning to a new architecture, what is the process I can use to delete duplicates in my Craigslist ad spreadsheet before I append them to Demo, Genre, and Platform tables?
Here are my ideas about solutions:
Create backup tables. Join the three tables and export the big table to Excel. In Excel, use Data Filtering and Conditional Formatting to find the duplicate entries, and delete them by hand. Unfortunately, I have 20,000 records which will crash Excel. :( The chief issue is that I don't know how to remove duplicate entries within a table using SQL. (Also, if I have two entries by bobdole#republican.com, one entry should remain.) Can you come up with a smarter solution involving SQL and Access?
After each Email record is unique, I will create new tables with each using Email as the primary key.
When I want to remove duplicates within the data I'd like to import, I should be able to easily do it within Excel. Next, I will use this SQL command to deduplicate between the current database and the incoming data:
DELETE * from newParticipantsList
WHERE Email in (SelectEmail from Demo)
I'm going to try to duplicate my current architecture in a small test table in Access and see if I can figure it out. Overall, I don't have much experience with joining tables and removing data in SQL, so it's a little scary.
Maybe I'm just being thick, but why don't you just create a new Identity column in the existing table? You can always remove those records you deem duplicates, but the Identity column is guaranteed to be unique under all circumstances.
It will be up to you to make sure that any new records inserted into the table are not duplicates, by checking the Email column.
To remove duplicates from demographic table you could do something like:
WITH RecordsToKeep AS (
SELECT MIN(RespondentID) as RespondentID
FROM demographic
GROUP BY Email
) DELETE demographic
FROM demographic
LEFT JOIN RecordsToKeep on RecordsToKeep.RespondentID = demographic.RespondentID
where RecordsToKeep.RespondentID IS NULL
This will keep the first record for each email address and delete the rest. You will need to remap the genre and platform tables before you delete the source.
In terms of what to do in the future, you could get SQL to do all the de-duplicating for you by importing the data into a staging table and then only import distinct records to the final when the address isn't already in the demographic table.
There is no reason to change the Email Address to be the primary key. String's aren't great primary keys for a number of reasons. The problem you have isn't with duplicate keys, the problem is how you are inserting the data.

Fixing DB Inconsistencies - ID Fields

I've inherited a (Microsoft?) SQL database that wasn't very pristine in its original state. There are still some very strange things in it that I'm trying to fix - one of them is inconsistent ID entries.
In the accounts table, each entry has a number called accountID, which is referenced in several other tables (notes, equipment, etc. ). The problem is that the numbers (for some random reason) - range from about -100000 to +2000000 when there are about only 7000 entries.
Is there any good way to re-number them while changing corresponding numbers in the other tables? At my disposal I also have ColdFusion, so any thing that works with SQL and/or that I'll accept.
For surrogate keys, they are meant to be meaningless, so unless you actually had a database integrity issue (like there were no foreign key contraints properly defined) or your identity was approaching the maximum for its datatype, I would leave them alone and go after some other low hanging fruit that would have more impact.
In this instance, it sounds like "why" is a better question than "how". The OP notes that there is a strange problem that needs to be fixed but doesn't say why it is a problem. Is it causing problems? What positive impact would changing these numbers have? Unless you originally programmed the system and understand precisely why the number is in its current state, you are taking quite a risky making changes like this.
I would talk to an accountant (or at least your financial people) before messing in anyway with the numbers in the accounts tables if this is a financial app. The Table of accounts is very critical to how finances are reported. These IDs may have meaning you don't understand. No one puts in a negative id unless they had a reason. I would under no circumstances change that unless I understood why it was negative to begin with. You could truly screw up your tax reporting or some other thing by making an uneeded change.
You could probably disable the foreign key relationships (if you're able to take it offline temporarily) and then update the primary keys using a script. I've used this update script before to change values, and you could pretty easily wrap this code in a cursor to go through the key values in question, one by one, and update the arbitrary value to an incrementing value you're keeping track of.
Check out the script here: http://vyaskn.tripod.com/sql_server_search_and_replace.htm
If you just have a list of tables that use the primary key, you could set up a series of UPDATE statements that run inside your cursor, and then you wouldn't need to use this script (which can be a little slow).
It's worth asking, though, why these values appear out of wack. Does this database have values added and deleted constantly? Are the primary key values really arbitrary, or do they just appear to be, but they really have meaning? Though I'm all for consolidating, you'd have to ensure that there's no purpose to those values.
With ColdFusion this shouldn't be a herculean task, but it will be messy and you'll have to be careful. One method you could use would be to script the database and then generate a brand new, blank table schema. Set the accountID as an identity field in the new database.
Then, using ColdFusion, write a query that will pull all of the old account data and insert them into the new database one by one. For each row, let the new database assign a new ID. After each insert, pull the new ID (using either ##IDENTITY or MAX(accountID)) and store the new ID and the old ID together in a temporary table so you know which old IDs belong to which new IDs.
Next, repeat the process with each of the child tables. For each old ID, pull its child entries and re-insert them into the new database using the new IDs. If the primary keys on the child tables are fine, you can insert them as-is or let the server assign new ones if they don't matter.
Assigning new IDs in place by disabling relationships temporarily may work, but you might also run into conflicts if one of the entries is assigned an ID that is already being used by the old data which could cause conflicts.
Create a new column in the accounts table for your new ID, and new column in each of your related tables to reference the new ID column.
ALTER TABLE accounts
ADD new_accountID int IDENTITY
ALTER TABLE notes
ADD new_accountID int
ALTER TABLE equipment
ADD new_accountID int
Then you can map the new_accountID column on each of your referencing tables to the accounts table.
UPDATE notes
SET new_accountID = accounts.new_accountID
FROM accounts
INNER JOIN notes ON (notes.accountID = accounts.accountID)
UPDATE equipment
SET new_accountID = accounts.new_accountID
FROM accounts
INNER JOIN equipment ON (equipment.accountID = accounts.accountID)
At this point, each table has both accountID with the old keys, and new_accountID with the new keys. From here it should be pretty straightforward.
Break all of the foreign keys on accountID.
On each table, UPDATE [table] SET accountID = new_accountID.
Re-add the foreign keys for accountID.
Drop new_accountID from all of the tables, as it's no longer needed.