Using a Delete query on a single table when referencing other tables - sql

I want to run a delete query to remove certain data from a table in a SharePoint list using an MS Access query. However, I want to be sure to delete only from a single list, based on the values in another table.
The table is TMainData: This consists solely of number fields that are references to the key fields in other tables, such as TPrograms which has a program name, or TContacts which has the point of contact, or TPositionTitle which has a title like Site Director.
So a TMainData entry looks something like
ProgramID, which links to TPrograms: 4
ContactID, which links to TContacts: 42
PositionTitle, which links to TPositionTitle: 3
This tells me that the Site Director (TPositionTitle 3) of Anesthesiology (ProgramID 4) is John Smith (ContactID 42).
Here's where it gets tricky: I have a reference under TPrograms to TProgramType. I want to delete all records under TMainData where they link to a certain Program Type, because that program type is going away. HOWEVER... I don't want to delete the program itself (yet), just the lines referencing that program in TMainData.
The "manual" way I see to do this is to run queries that identify what the ProgramIDs are of the programs I want to delete the contacts for, and then use those IDs in a delete query that only references the TMainData query. I'm wondering if there's a way to use referential data, because I may have to be running some ridiculous update queries at a later time that would need this same info.
I dug through https://support.office.com/en-us/article/Use-queries-to-delete-one-or-more-records-from-a-database-A323BF1A-C9B4-4C86-8719-BE58BDF1B10C but it doesn't seem to cover deleting from one table based on values referenced in another table.

You already seem to understand what you need to do to achieve the desired result when you state:
...run queries that identify what the ProgramIDs are of the programs I want to delete the contacts for, and then use those IDs in a delete query that only references the TMainData query.
If I've understood your description correctly, I would suggest something along the lines of:
delete from tmaindata
where tmaindata.programid in
(
    select tprograms.programid
    from tprograms
    where tprograms.tprogramtype = 'YourProgramType'
)
Always take a backup of your data before running delete queries - there is no undo.
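As a minimal, runnable sketch of the subquery delete above, the following uses Python's built-in sqlite3 module rather than Access, so the SQL dialect may differ slightly from what Access accepts. The ProgramType text column and the sample rows ('Residency', 'Fellowship') are assumptions made for illustration; in the real schema the type is probably a numeric key pointing at TProgramType.

```python
import sqlite3


def delete_main_data_for_type(conn, program_type):
    """Delete TMainData rows whose program has the given type; TPrograms is untouched."""
    cur = conn.execute(
        """
        DELETE FROM TMainData
        WHERE ProgramID IN (
            SELECT ProgramID FROM TPrograms WHERE ProgramType = ?
        )
        """,
        (program_type,),
    )
    return cur.rowcount  # number of TMainData rows removed


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical cut-down schema; ProgramType is stored as text here.
    CREATE TABLE TPrograms (ProgramID INTEGER PRIMARY KEY, ProgramName TEXT, ProgramType TEXT);
    CREATE TABLE TMainData (ProgramID INTEGER, ContactID INTEGER, PositionTitle INTEGER);
    INSERT INTO TPrograms VALUES (4, 'Anesthesiology', 'Residency'), (5, 'Cardiology', 'Fellowship');
    INSERT INTO TMainData VALUES (4, 42, 3), (4, 43, 1), (5, 44, 3);
    """
)

deleted = delete_main_data_for_type(conn, "Residency")
remaining = conn.execute("SELECT ProgramID, ContactID FROM TMainData").fetchall()
program_count = conn.execute("SELECT COUNT(*) FROM TPrograms").fetchone()[0]
```

Running the inner SELECT on its own first is a cheap way to preview which ProgramIDs the delete will touch.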

Related

How to remove Duplicates and update all relations from PostgreSQL database?

I'm working on updating a local dataset that has a lot of cases with the following structure:
The elements in table A that share the same refId for the main entity are virtually the same, so I want to remove all these duplicates and update tables B, C and so on.
The idea is to group all elements in table A that share the same refId, choose one, remove all the others and change the references in other tables to the one I chose. However, I'm having problems with the last part, updating the other tables.
Is there a quick way of doing this? Doing it manually has been a pain.
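One possible shape for the idea described in the question (pick one row per refId, repoint the references, then remove the rest) is sketched below. It runs against SQLite through Python's sqlite3 module, not PostgreSQL, and the table names a and b and the columns id, ref_id and a_id are hypothetical stand-ins for the real schema. "Choose one" is interpreted here as "keep the lowest id", which is only an assumption.

```python
import sqlite3


def merge_duplicates(conn):
    """Keep the lowest id per ref_id in a, repoint b.a_id to it, delete the other rows."""
    with conn:  # both statements commit together, or roll back together on error
        # Step 1: point every reference at the surviving row of its group.
        conn.execute(
            """
            UPDATE b
            SET a_id = (
                SELECT MIN(k.id)
                FROM a AS d
                JOIN a AS k ON k.ref_id = d.ref_id
                WHERE d.id = b.a_id
            )
            WHERE a_id IN (SELECT id FROM a)
            """
        )
        # Step 2: only now is it safe to drop the non-surviving rows.
        conn.execute(
            "DELETE FROM a WHERE id NOT IN (SELECT MIN(id) FROM a GROUP BY ref_id)"
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, ref_id TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'x'), (3, 'y'), (4, 'x');
    INSERT INTO b VALUES (10, 2), (11, 4), (12, 3), (13, 1);
    """
)
merge_duplicates(conn)
a_ids = [row[0] for row in conn.execute("SELECT id FROM a ORDER BY id")]
b_refs = conn.execute("SELECT id, a_id FROM b ORDER BY id").fetchall()
```

The UPDATE would have to be repeated for each referencing table (C and so on) before the DELETE runs; the order matters because the DELETE removes the rows the UPDATE uses to find each group.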

Adding record with new foreign key

I have a few tables to store company information in my database, but I want to focus on two of them. One table, Company, contains CompanyID, which is autoincremented, and some other columns that are irrelevant for now. The problem is that companies use different versions of their names (e.g. IBM vs. International Business Machines) and I need to store them all for further use, so I can't keep names in the Company table. Therefore I have another table, CompanyName, that uses CompanyID as a foreign key (it's a one-to-many relation).
Now, I need to import some new companies, and I have names only. Therefore I want to add them to the CompanyName table, but create new records in the Company table immediately, so I can put the right CompanyID in the CompanyName table.
Is it possible with one query? How should I approach this problem properly? Do I need to go as far as writing a VBA procedure to add records one by one?
I searched Stack and other websites, but I didn't find any solution for my problem, and I can't figure it out myself. I guess it could be done with a form and subform, but ultimately I want to put all my queries in a macro, so the data import would be done automatically.
I'm not a database expert, so maybe I just designed it badly, but I couldn't figure out another way to cleanly store multiple names of the same entity.
The table structure you set up appears to be a good way to do this. But there's no way to insert records into both tables at the same time. One option is to write two queries that insert records into Company and then into CompanyName. After inserting records into Company, you will need a query that joins the source table to the Company table on a field that uniquely identifies the record besides the autoincrement key. That will allow you to get the key field from Company for use when you insert into CompanyName.
The other option is to write some VBA code to loop through the source data, inserting records into both tables. This would be preferable since it should be more reliable.
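The looping approach can be sketched as follows. This is Python with the built-in sqlite3 module standing in for VBA and Access, so it only illustrates the control flow: insert the parent row, read back the generated key, then insert the child row. The Name column and the sample names 'Acme' and 'Globex' are assumptions for the example.

```python
import sqlite3


def import_companies(conn, names):
    """Create one Company row per name, then a CompanyName row that points at it."""
    new_ids = []
    with conn:  # commit all inserts together, or roll back if any of them fails
        for name in names:
            # Company holds only the autoincrement key in this sketch.
            company_id = conn.execute("INSERT INTO Company DEFAULT VALUES").lastrowid
            conn.execute(
                "INSERT INTO CompanyName (CompanyID, Name) VALUES (?, ?)",
                (company_id, name),
            )
            new_ids.append(company_id)
    return new_ids


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Company (CompanyID INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE CompanyName (
        CompanyID INTEGER NOT NULL REFERENCES Company (CompanyID),
        Name TEXT NOT NULL
    );
    -- One existing company with two name variants.
    INSERT INTO Company DEFAULT VALUES;
    INSERT INTO CompanyName VALUES (1, 'IBM'), (1, 'International Business Machines');
    """
)
new_ids = import_companies(conn, ["Acme", "Globex"])
new_names = conn.execute(
    "SELECT CompanyID, Name FROM CompanyName WHERE CompanyID > 1 ORDER BY CompanyID"
).fetchall()
```

Reading the key back immediately after each parent insert avoids the need for a second field that uniquely identifies the new Company row, which the two-query approach depends on.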

MSSQL insert rows into relational tables.

I have two tables, Person and Phones. Many phone numbers can be associated with one person through a foreign key. If I want to add a phone number and map it to a particular person, what should my SQL look like?
In my understanding:
The SQL should be transactional: first I have to insert the person into the Person table, and then insert the phone number into Phones and map it to the row just inserted into Person.
What if a row already exists in one of the tables? How should I handle it?
I am looking for a clean and simple solution or an SQL example.
Note: I don't have access for creating stored procedures.
If you're inserting a new Person with new Phones, then you would:
1. Insert into the Person table.
2. Use SCOPE_IDENTITY() (SQL Server's counterpart to MySQL's LAST_INSERT_ID()) to get the ID which was just generated by that insert.
3. Use that ID to insert records into the Phones table.
If you're inserting new Phones for an existing Person, then you would:
1. Select the Person to get its ID, if you don't already have it.
2. Use that ID to insert records into the Phones table.
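Both sets of steps can be sketched as below. The example uses Python's sqlite3 module instead of SQL Server, so cursor.lastrowid plays the role of the "ID just generated" lookup, and the `with conn:` block stands in for an explicit transaction. The column names, the UNIQUE constraint on Name, and looking a person up by name are all assumptions for illustration.

```python
import sqlite3


def add_person_with_phones(conn, name, numbers):
    """New person: insert the Person row, then its Phones rows, in one transaction."""
    with conn:
        person_id = conn.execute(
            "INSERT INTO Person (Name) VALUES (?)", (name,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO Phones (PersonID, Number) VALUES (?, ?)",
            [(person_id, number) for number in numbers],
        )
    return person_id


def add_phone_to_existing(conn, name, number):
    """Existing person: look up the ID first, then insert the Phones row."""
    row = conn.execute("SELECT PersonID FROM Person WHERE Name = ?", (name,)).fetchone()
    if row is None:
        raise LookupError(f"no person named {name!r}")
    with conn:
        conn.execute(
            "INSERT INTO Phones (PersonID, Number) VALUES (?, ?)", (row[0], number)
        )
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE Phones (
        PhoneID INTEGER PRIMARY KEY,
        PersonID INTEGER NOT NULL REFERENCES Person (PersonID),
        Number TEXT NOT NULL
    );
    """
)
alice_id = add_person_with_phones(conn, "Alice", ["555-0100", "555-0101"])
same_id = add_phone_to_existing(conn, "Alice", "555-0102")
phone_count = conn.execute(
    "SELECT COUNT(*) FROM Phones WHERE PersonID = ?", (alice_id,)
).fetchone()[0]
```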
What if a row already exists in one of the tables? How should I handle it?
Define "already exists" in this context. What defines uniqueness in your data? In cases like this you may want to consider incorporating that definition of uniqueness into the primary key in that table. (Which can be composed of more than one column.) Otherwise you'll have to SELECT from the table to see if the row already exists. If it does, update it. If it doesn't, insert it. (Or however you want to handle already-existing data logically in your domain.)
Keep in mind that it's easy to go overboard with uniqueness in cases like this. For example, you might be tempted to try to create a many-to-many relationship between these tables so that you can avoid having duplicate phone numbers. In real world scenarios this ends up being a bad idea because it's possible that:
Two people share the same phone number.
One of those two people changes his/her number, but the other one doesn't.
In an overly-normalized scenario, the above events would result in one of the following:
Both users' phone numbers are updated when only one of them actually updates it, resulting in incorrect data for the other user.
You have to write overly-complicated code to check for this scenario and create a new record (disassociating the previous many-to-many relationship), resulting in a lot of unnecessary code and points of failure.

Current primary key is ineffective at preventing duplicates. Does this sound like a good way to rearchitect my tables?

Every so often, I update our research recruitment database with those who responded to our Craigslist ad. Each respondent is given a unique respondentID, which is the primary key.
Sometimes, people respond to these Craigslist ads multiple times. I think we may have duplicate people in our database, which is bad.
I would like to change the primary key of all my recruitment tables from respondentID to Email, which will prevent duplicates and make it easier to look up information. There are probably duplicate email records in my database already, and I need to clean this up if so.
Here's the current architecture for my three recruitment tables:
demographic - contains columns like RespondentID (PK), Email (I want this to be PK), Phone, etc
genre - contains columns like RespondentID (PK), Horror, etc
platform - contains columns like RespondentID (PK), TV, etc.
I want to join all three tables together at some point so we can get a better understanding of someone.
Here are my questions:
How can I eliminate duplicate respondents already in my database? (I can tell if they are duplicates because they will have the same Email value.)
Given my current architecture, how can I transition my database to have Email as the primary key without messing up my data?
After transitioning to a new architecture, what is the process I can use to delete duplicates in my Craigslist ad spreadsheet before I append them to Demo, Genre, and Platform tables?
Here are my ideas about solutions:
Create backup tables. Join the three tables and export the big table to Excel. In Excel, use Data Filtering and Conditional Formatting to find the duplicate entries, and delete them by hand. Unfortunately, I have 20,000 records which will crash Excel. :( The chief issue is that I don't know how to remove duplicate entries within a table using SQL. (Also, if I have two entries by bobdole@republican.com, one entry should remain.) Can you come up with a smarter solution involving SQL and Access?
After each Email record is unique, I will create new tables with each using Email as the primary key.
When I want to remove duplicates within the data I'd like to import, I should be able to easily do it within Excel. Next, I will use this SQL command to deduplicate between the current database and the incoming data:
DELETE * FROM newParticipantsList
WHERE Email IN (SELECT Email FROM Demo)
I'm going to try to duplicate my current architecture in a small test table in Access and see if I can figure it out. Overall, I don't have much experience with joining tables and removing data in SQL, so it's a little scary.
Maybe I'm just being thick, but why don't you just create a new Identity column in the existing table? You can always remove those records you deem duplicates, but the Identity column is guaranteed to be unique under all circumstances.
It will be up to you to make sure that any new records inserted into the table are not duplicates, by checking the Email column.
To remove duplicates from demographic table you could do something like:
WITH RecordsToKeep AS (
    SELECT MIN(RespondentID) AS RespondentID
    FROM demographic
    GROUP BY Email
)
DELETE demographic
FROM demographic
LEFT JOIN RecordsToKeep ON RecordsToKeep.RespondentID = demographic.RespondentID
WHERE RecordsToKeep.RespondentID IS NULL
This will keep the first record for each email address and delete the rest. You will need to remap the genre and platform tables before you delete the source.
In terms of what to do in the future, you could get SQL to do all the de-duplicating for you by importing the data into a staging table and then importing only distinct records into the final table when the address isn't already in the demographic table.
There is no reason to change the Email Address to be the primary key. Strings aren't great primary keys for a number of reasons. The problem you have isn't with duplicate keys; the problem is how you are inserting the data.
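The two ideas in this answer, removing existing duplicates while keeping the lowest RespondentID and then importing through a staging table, can be sketched together. The example runs on SQLite via Python's sqlite3 module, so it uses a NOT IN subquery instead of the CTE form, which may also be closer to what Access accepts. The staging table name, the Phone column handling, and the sample addresses are assumptions, and the sketch does not remap the genre and platform tables.

```python
import sqlite3


def remove_duplicate_emails(conn):
    """Keep the lowest RespondentID for each Email and delete the other rows."""
    with conn:
        cur = conn.execute(
            """
            DELETE FROM demographic
            WHERE RespondentID NOT IN (
                SELECT MIN(RespondentID) FROM demographic GROUP BY Email
            )
            """
        )
    return cur.rowcount


def import_from_staging(conn):
    """Copy one row per new Email from staging; skip addresses already present."""
    with conn:
        conn.execute(
            """
            INSERT INTO demographic (Email, Phone)
            SELECT s.Email, MIN(s.Phone)
            FROM staging AS s
            WHERE NOT EXISTS (SELECT 1 FROM demographic AS d WHERE d.Email = s.Email)
            GROUP BY s.Email
            """
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographic (RespondentID INTEGER PRIMARY KEY, Email TEXT, Phone TEXT);
    CREATE TABLE staging (Email TEXT, Phone TEXT);
    INSERT INTO demographic VALUES
        (1, 'a@example.com', '111'), (2, 'b@example.com', '222'), (3, 'a@example.com', '333');
    INSERT INTO staging VALUES
        ('b@example.com', '999'), ('c@example.com', '444'),
        ('c@example.com', '444'), ('d@example.com', '555');
    """
)
removed = remove_duplicate_emails(conn)
import_from_staging(conn)
emails = [row[0] for row in conn.execute("SELECT Email FROM demographic ORDER BY Email")]
```

A unique index on Email, added once the existing duplicates are gone, would let the database reject repeats without making Email the primary key.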

How do I know if record from an SQL database is being used elsewhere?

Is there a way to know that a record is being used by another record in the database?
Using deleting as an example: When I create an SQL statement trying to delete a group in dbo.group I get this error:
The DELETE statement conflicted with the REFERENCE constraint "FK_MyTable". The conflict occurred in database "MyDB", table "dbo.User", column 'Group_ID'.
Since I have a user that has a foreign key relationship to the group I am not able to delete the group. I want to be able to know if the record is linked to other records before I run the delete statement. Is there a way to do that?
Basically, I want to show the user that the record they are viewing is undeletable. I do not want to try to delete the record.
Other folks are suggesting ways to detect dependent rows, but the problem with this is that there's a race condition: if your test finds no dependent rows, and you try to delete the group, there might be another client application that adds a dependent row in the brief moment between your two queries. So your information that the group is unused is potentially outdated as soon as you get it.
The better approach is to try to delete the group, and handle any errors that are raised.
This is also better for performance, because you don't have to run any SELECT query to check if the dependent rows exist.
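A minimal sketch of the "try the delete and handle the error" approach follows, using Python's sqlite3 module rather than SQL Server. In SQLite, foreign keys are only enforced after PRAGMA foreign_keys = ON, and a violation surfaces as sqlite3.IntegrityError; a SQL Server client would catch its own driver's exception instead. The Groups and Users table layout is an assumption.

```python
import sqlite3


def try_delete_group(conn, group_id):
    """Attempt the delete; return False if a foreign key reference blocks it."""
    try:
        with conn:  # rolls back automatically when the delete raises
            conn.execute("DELETE FROM Groups WHERE GroupID = ?", (group_id,))
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE Groups (GroupID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Users (
        UserID INTEGER PRIMARY KEY,
        GroupID INTEGER REFERENCES Groups (GroupID)
    );
    INSERT INTO Groups VALUES (1, 'in use'), (2, 'empty');
    INSERT INTO Users VALUES (1, 1);
    """
)
blocked = try_delete_group(conn, 1)   # group 1 still has a user
deleted = try_delete_group(conn, 2)   # group 2 has none
left = [row[0] for row in conn.execute("SELECT GroupID FROM Groups")]
```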
Re your comment and edited question: Okay, fair enough, it makes sense to use this information to give hints to the user (e.g. display a "delete group" button or else gray out the button).
In that case, you can use one of the suggestions from other folks, such as querying the count of dependent rows in the Users table. If you need this information for multiple groups, I'd do one query that joins Groups to Users and groups by the group ID.
SELECT g.groupid, COUNT(*) AS user_count
FROM dbo.Groups g JOIN dbo.Users u ON (g.groupid = u.groupid)
GROUP BY g.groupid;
That'd be better than running a separate SQL query for each group to get the count of users.
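Note that the inner join above returns only groups that have at least one user; groups with no users are simply absent from the result. If the UI needs an explicit zero for those, a LEFT JOIN variant is one option. The sketch below uses Python's sqlite3 module, and the Groups and Users layout is an assumption.

```python
import sqlite3


def user_counts_by_group(conn):
    """Return {GroupID: number of users}, including groups with zero users."""
    rows = conn.execute(
        """
        SELECT g.GroupID, COUNT(u.GroupID) AS user_count
        FROM Groups AS g
        LEFT JOIN Users AS u ON u.GroupID = g.GroupID
        GROUP BY g.GroupID
        ORDER BY g.GroupID
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Groups (GroupID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, GroupID INTEGER);
    INSERT INTO Groups VALUES (1, 'in use'), (2, 'empty');
    INSERT INTO Users VALUES (1, 1), (2, 1);
    """
)
counts = user_counts_by_group(conn)
# Groups whose delete button could be enabled as a hint to the user.
deletable = [group_id for group_id, n in counts.items() if n == 0]
```

COUNT(u.GroupID) counts only matched rows, whereas COUNT(*) would report 1 for a group with no users because the LEFT JOIN still produces one row for it.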
If you don't know how many tables may depend on Groups, you should learn to use the INFORMATION_SCHEMA system views to query for metadata. I don't think this is the case for you, so I won't go into detail.
Do a query which checks whether there are users that have the column group_id set to the ID you want to delete. If the query returns a count of 0, you can delete without an exception.
SELECT count(group_id)
FROM dbo.[User]
WHERE group_id = [yourgroupidtodeletevalue]
You can set up cascading deletes.
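For illustration, a cascading delete declared on the foreign key looks roughly like this. The sketch uses SQLite through Python's sqlite3 module; SQL Server also supports ON DELETE CASCADE on foreign keys, but the table layout here is an assumption. Be aware that this changes the behaviour from "the delete is refused" to "the dependent users are deleted too", which may or may not be what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to act on the FK
conn.executescript(
    """
    CREATE TABLE Groups (GroupID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Users (
        UserID INTEGER PRIMARY KEY,
        GroupID INTEGER REFERENCES Groups (GroupID) ON DELETE CASCADE
    );
    INSERT INTO Groups VALUES (1, 'going away'), (2, 'staying');
    INSERT INTO Users VALUES (1, 1), (2, 1), (3, 2);
    """
)

with conn:
    # Deleting the group also removes every user that references it.
    conn.execute("DELETE FROM Groups WHERE GroupID = ?", (1,))

users_left = conn.execute("SELECT UserID, GroupID FROM Users ORDER BY UserID").fetchall()
```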
You can query the foreign key table. The error tells you which table depends on the foreign key lookup.