How to migrate IDs from JOIN table into foreign key column in PostgreSQL - sql

I have the following tables in my PostgreSQL database:
CREATE TABLE "User" (
id VARCHAR(25) PRIMARY KEY NOT NULL
);
CREATE TABLE "Post" (
id VARCHAR(25) PRIMARY KEY NOT NULL
);
CREATE TABLE "_PostToUser" (
"A" VARCHAR(25) NOT NULL REFERENCES "Post"(id) ON DELETE CASCADE,
"B" VARCHAR(25) NOT NULL REFERENCES "User"(id) ON DELETE CASCADE
);
The relationship between User and Post right now is managed via the _PostToUser JOIN table.
However, I want to get rid of this extra JOIN table and simply have a foreign key reference from Post to User, so I ran this query to create the foreign key:
ALTER TABLE "Post" ADD COLUMN "authorId" VARCHAR(25);
ALTER TABLE "Post"
ADD CONSTRAINT fk_author
FOREIGN KEY ("authorId")
REFERENCES "User"("id");
Now, I'm wondering what SQL query I need to run in order to migrate the data from the JOIN table to the new authorId column? If I understand correctly, I need a query that reads all the rows from the _PostToUser relation table and for each row:
Finds the respective Post record by looking up the value from column A
Inserts the value from column B as the value for authorId into that Post record
Edit: As mentioned by @Nick in the comments, I should have clarified that I do want to change the relationship from m-n and restrict it to 1-n: one post can have at most one author, and one author/user can write many posts.

Your current design is already correct, and uses a proper junction table to store the relationships between users and their posts. In this design, a given relationship only requires storing two ID values, which is lean. Going in the direction you suggest is denormalizing your data, and will result in data duplication. To see why this is the case, your suggested table will now store metadata from the author table. This metadata will, in principle, be repetitive, since a given author's metadata would be the same for every record in the new posts table.
Instead, I suggest indexing the junction table:
CREATE INDEX idx ON "_PostToUser" ("B", "A");
As an example, the above index should help the following query:
SELECT u.*, p.*
FROM "User" u
INNER JOIN "_PostToUser" pu ON pu."B" = u.id -- index helps here
INNER JOIN "Post" p ON p.id = pu."A"; -- Post.id is already a primary key
The join to the junction table should now be faster, because Postgres can use the index to take a given user id value and find the corresponding "A" value on the other side of the junction.

As long as you are happy to restrict the relationship between Posts and Users to N:1, and you only store a foreign key to User in Post, then I think what you are doing is fine. The query to update the Post table would be:
UPDATE "Post" p
SET "authorId" = pu."B"
FROM "_PostToUser" pu
WHERE pu."A" = p."id"
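The migration can be tried end to end with a small script. This is only a sketch: SQLite (through Python's sqlite3 module) stands in for PostgreSQL, the sample rows are invented, and a correlated subquery replaces UPDATE ... FROM so that it runs on older SQLite builds. It also checks first for posts that are linked to more than one user, since those cannot be mapped to a single authorId.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (id VARCHAR(25) PRIMARY KEY NOT NULL);
CREATE TABLE "Post" (
    id VARCHAR(25) PRIMARY KEY NOT NULL,
    "authorId" VARCHAR(25) REFERENCES "User"(id)
);
CREATE TABLE "_PostToUser" (
    "A" VARCHAR(25) NOT NULL REFERENCES "Post"(id) ON DELETE CASCADE,
    "B" VARCHAR(25) NOT NULL REFERENCES "User"(id) ON DELETE CASCADE
);
-- invented sample data: p3 has no author row
INSERT INTO "User" (id) VALUES ('u1'), ('u2');
INSERT INTO "Post" (id) VALUES ('p1'), ('p2'), ('p3');
INSERT INTO "_PostToUser" ("A", "B") VALUES ('p1', 'u1'), ('p2', 'u2');
""")

# Step 1: find posts with more than one author; these need a manual decision.
ambiguous = conn.execute("""
    SELECT "A", COUNT(*) FROM "_PostToUser"
    GROUP BY "A" HAVING COUNT(*) > 1
""").fetchall()

# Step 2: copy column B into authorId for every post that has a junction row.
with conn:
    conn.execute("""
        UPDATE "Post"
        SET "authorId" = (SELECT pu."B" FROM "_PostToUser" pu
                          WHERE pu."A" = "Post".id)
        WHERE EXISTS (SELECT 1 FROM "_PostToUser" pu WHERE pu."A" = "Post".id)
    """)

migrated = conn.execute('SELECT id, "authorId" FROM "Post" ORDER BY id').fetchall()
print(ambiguous, migrated)
```

Once the ambiguity check comes back empty and the copied values look right, the junction table can be dropped.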

Related

Performing SQL Query to Remove Unused Users from a Database

I'm currently working with a database that consists of a users table, a permissions table, a set of documents-related tables, and several miscellaneous tables that have foreign key dependencies on rows in the user table.
I'm trying to remove all user entries from the 'Users' table that meet the following criteria:
Not referenced by an entry in one of the documents tables.
Not referenced by an entry in the permissions table.
Contains a null value in the 'Customer ID' column of the User row.
I'm able to create a query that gets all users, which looks like this:
SELECT id
INTO MyTableVar
FROM Users
WHERE
(NOT EXISTS (SELECT Author_Id FROM ItemInstances_DocumentInstance
WHERE Users.Id = ItemInstances_DocumentInstance.Author_Id)
AND NOT EXISTS (SELECT CompletedBy_Id FROM TaskInstanceUser
WHERE Users.Id = TaskInstanceUser.CompletedBy_Id)
AND Cust_Id IS NULL
AND Id > 4)
SELECT *
FROM MyTableVar
This query gets all the Ids of the users that I want to remove, but I get an error when I try to delete these entries:
The DELETE statement conflicted with the REFERENCE constraint "FK_MessageUser_User".
I'm stumped as to how I should use the ID's I've queried to remove entries in the MessageUser_User table that correspond to users I want to delete. I feel like this should be easy, but I can't figure out a way to do it with SQL syntax.
PS: I'd also appreciate some feedback on how I wrote what I have so far for my query. I'd love to know what I could do to make it cleaner. I'm new to SQL and need all the help I can get.
I'm guessing that the foreign key on the MessageUser_User table was not created with ON DELETE CASCADE.
If you have the ability to alter constraints on your table, you can do this, which will permit the referencing table to automatically delete records that reference a deleted row from the main table.
ALTER TABLE MessageUser_User DROP
CONSTRAINT FK_MessageUser_User;
ALTER TABLE MessageUser_User ADD
CONSTRAINT FK_MessageUser_User
FOREIGN KEY (<<IdColumnName>>)
REFERENCES Users (Id)
ON DELETE CASCADE;
Otherwise, you can use a separate query to delete the rows from MessageUser_User whose foreign key column contains the IDs you want to delete:
DELETE FROM MessageUser_User WHERE <<IdColumnName>> IN (SELECT Id FROM MyTableVar);
Regarding the style of your delete query - I usually prefer to do left joins then delete the records where there is a null in the right table(s):
SELECT Users.Id
INTO MyTableVar
FROM Users
LEFT JOIN ItemInstances_DocumentInstance ON Author_Id = Users.Id
LEFT JOIN TaskInstanceUser ON CompletedBy_Id = Users.Id
WHERE
Author_Id IS NULL
AND CompletedBy_Id IS NULL
AND Cust_Id IS NULL
AND Users.Id > 4
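To see the delete order in action, here is a rough sketch that removes the child rows first and then the users, inside one transaction. SQLite (via Python's sqlite3) stands in for SQL Server, the sample rows are invented, and User_Id is a hypothetical name for the foreign key column of MessageUser_User, which the question does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE Users (Id INTEGER PRIMARY KEY, Cust_Id INTEGER);
CREATE TABLE ItemInstances_DocumentInstance (Author_Id INTEGER REFERENCES Users(Id));
CREATE TABLE TaskInstanceUser (CompletedBy_Id INTEGER REFERENCES Users(Id));
-- User_Id is a hypothetical column name (not given in the question)
CREATE TABLE MessageUser_User (User_Id INTEGER NOT NULL REFERENCES Users(Id));
INSERT INTO Users (Id, Cust_Id) VALUES (5, NULL), (6, NULL), (7, 42), (8, NULL);
INSERT INTO ItemInstances_DocumentInstance (Author_Id) VALUES (6);
INSERT INTO MessageUser_User (User_Id) VALUES (5), (7);
""")

# Users with no documents, no tasks, and no customer id (same criteria as the question).
UNUSED = """
    SELECT u.Id FROM Users u
    WHERE u.Cust_Id IS NULL AND u.Id > 4
      AND NOT EXISTS (SELECT 1 FROM ItemInstances_DocumentInstance d
                      WHERE d.Author_Id = u.Id)
      AND NOT EXISTS (SELECT 1 FROM TaskInstanceUser t
                      WHERE t.CompletedBy_Id = u.Id)
"""

with conn:  # one transaction: both deletes are committed together or not at all
    # Child rows first, otherwise the foreign key blocks the second delete.
    conn.execute(f"DELETE FROM MessageUser_User WHERE User_Id IN ({UNUSED})")
    conn.execute(f"DELETE FROM Users WHERE Id IN ({UNUSED})")

remaining_users = [r[0] for r in conn.execute("SELECT Id FROM Users ORDER BY Id")]
remaining_messages = [r[0] for r in conn.execute("SELECT User_Id FROM MessageUser_User")]
print(remaining_users, remaining_messages)
```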

How to delete records from two different tables that are linked with FK? SQL

I have two tables City and Buildings in my database. They are linked with city_number that is Primary Key in City table and Foreign Key in Buildings table. If user wants to delete record from City table I want to remove any records from the Buildings table that is tied to that City. I use unique auto incremented id passed through the argument to remove these records. My SQL Query looks like this:
DELETE C.*, B.*
FROM City AS C
INNER JOIN Buildings AS B
ON C.c_number = B.b_district
WHERE C.c_id = 'some id example: 107';
The query above won't work since SQL only allows records from one table to be removed with an INNER JOIN, so I will have to use two separate DELETE statements like this:
DELETE
FROM City
WHERE c_id = '107'
DELETE
FROM Buildings
WHERE b_city = 'city that is tied to unique id 107'
My question is, what is the best practice for removing records that are tied together in two tables? Since I have to use two separate SQL statements, should I pass the City value and then delete the record(s) from the Buildings table? Or should I create another query that pulls the City from the City table based on the unique id and then removes the record(s) from Buildings? If anyone knows a better way to do this, please let me know.
I believe the easiest way to accomplish your goal would be to set up your foreign key with ON DELETE CASCADE. That way, whenever a row in the parent table is deleted, any related rows in the child table will be deleted automatically.
Here is an example of a way to alter a table in order to create a foreign key with ON DELETE CASCADE:
ALTER TABLE child_table
ADD CONSTRAINT fk_name
FOREIGN KEY (child_col1, child_col2, ... child_col_n)
REFERENCES parent_table (parent_col1, parent_col2, ... parent_col_n)
ON DELETE CASCADE;
In your case, the child table would be Buildings and the parent table would be City. It sounds like you would have just city_number for the column. You'll have to fill in the name of your foreign key.
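A small sketch of the cascade behaviour, using SQLite through Python's sqlite3 as a stand-in for the asker's database. The column names c_id, c_number and b_district come from the question; b_id, the constraint name fk_buildings_city and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for cascades to fire
conn.executescript("""
CREATE TABLE City (
    c_number INTEGER PRIMARY KEY,
    c_id INTEGER NOT NULL UNIQUE
);
CREATE TABLE Buildings (
    b_id INTEGER PRIMARY KEY,          -- hypothetical column
    b_district INTEGER NOT NULL,
    CONSTRAINT fk_buildings_city FOREIGN KEY (b_district)
        REFERENCES City (c_number) ON DELETE CASCADE
);
INSERT INTO City (c_number, c_id) VALUES (1, 107), (2, 108);
INSERT INTO Buildings (b_id, b_district) VALUES (10, 1), (11, 1), (12, 2);
""")

# A single DELETE on the parent row; the child rows follow automatically.
with conn:
    conn.execute("DELETE FROM City WHERE c_id = ?", (107,))

buildings_left = conn.execute(
    "SELECT b_id, b_district FROM Buildings ORDER BY b_id").fetchall()
print(buildings_left)
```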
Like Shannon mentioned, you can use ON DELETE CASCADE to delete data from parent and child tables.
Here is a working example:
http://sqlfiddle.com/#!18/f5860/10
Without writing out the code, here's what I would do:
Select all the ids to be deleted for buildings belonging to a city
Delete all the buildings
Delete the city
Put it in a stored procedure
Re-usable, self-contained, and clear.
This is a violation of SRP, however; let me know if you care about that and I'll post an SRP-based SQL solution.
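The steps above could look roughly like this when no cascade is defined. It is a sketch only: a Python function wrapping a transaction takes the place of a stored procedure, SQLite stands in for the real database, and delete_city, b_id and the sample rows are hypothetical.

```python
import sqlite3

def delete_city(conn, c_id):
    """Delete a city and its buildings in one transaction.

    Returns the number of buildings that were removed.
    """
    with conn:  # commits on success, rolls back if either delete fails
        cur = conn.execute(
            "DELETE FROM Buildings WHERE b_district IN "
            "(SELECT c_number FROM City WHERE c_id = ?)",
            (c_id,),
        )
        conn.execute("DELETE FROM City WHERE c_id = ?", (c_id,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE City (c_number INTEGER PRIMARY KEY, c_id INTEGER NOT NULL UNIQUE);
-- plain foreign key without ON DELETE CASCADE, so the order of deletes matters
CREATE TABLE Buildings (
    b_id INTEGER PRIMARY KEY,
    b_district INTEGER NOT NULL REFERENCES City (c_number)
);
INSERT INTO City (c_number, c_id) VALUES (1, 107), (2, 108);
INSERT INTO Buildings (b_id, b_district) VALUES (10, 1), (11, 1), (12, 2);
""")

removed = delete_city(conn, 107)
cities_left = conn.execute("SELECT c_number, c_id FROM City").fetchall()
print(removed, cities_left)
```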

SQL query, search data and return project name

Hey I'm looking for some help in creating a stored procedure.
Here are the details
I have a table called Partners which holds the partner information (columns: PartnerID and partnername). I also have another table called ProjectPartners which holds the link between the project and the partners (columns: PPID, Partner1, partner2, partner3....partner25), and a further table called ProjectDetails which holds the information on the project (columns: ProjectDID, Project). The foreign key for ProjectPartners is within ProjectDetails.
I'm looking to create a stored procedure that allows me to enter a partner name, this then displays the projects they are included within. I already have some mock code but it doesn't seem to work.
@partnername nvarchar(50)
AS
SET NOCOUNT ON;
SELECT ProjectDID, Project
FROM Projectdetails
WHERE Partners.PartnerName = @partnername
Any help will be much appreciated
You are missing the joins through your table schema to get the necessary data.
Take a read of this MSDN article about joins.
select ProjectDetails.ProjectDID, ProjectDetails.Project
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDID = ProjectDetails.ProjectDID
join Partners on Partners.PartnerId = ProjectPartners.PPID
where Partners.PartnerName = @partnerName
You haven't described the relationship between ProjectPartners and Partner, so I am assuming that the PPID column on ProjectPartners is the relationship
You have also mentioned that your ProjectPartners table has the columns PPID, Partner1, partner2, partner3....partner25. Are you only planning on having 25 partners? If you have 26, will you add a new column? You might want to address that.
Also in column naming conventions, some are a bit muddled.
You have PPID on ProjectPartners. I presume this means ProjectPartnersId.
On the table ProjectDetails you have the column ProjectDID.
This is slightly inconsistent. I guess it should either be PDID on ProjectDetails or ProjectPID on ProjectPartners
Personally, I have always had a preference for plain old Id as my Identity column.
UPDATE:
Based on your comments below, it sounds like you might have something a little fundamental wrong with your tables:
create table Partners (
Id int not null primary key identity,
PartnerName nvarchar(100) not null)
go
create table ProjectDetails(
Id int not null primary key identity,
Project nvarchar(100) not null)
go
create table ProjectPartners (
PartnersId int not null,
ProjectDetailsId int not null
)
go
alter table ProjectPartners add constraint FK_ProjectPartners_PartnersId_Partners_Id foreign key (PartnersId) references Partners(Id)
alter table ProjectPartners add constraint FK_ProjectPartners_ProjectDetailsId_ProjectDetails_Id foreign key (ProjectDetailsId) references ProjectDetails(Id)
go
I would suggest changing your database schema to one that is a bit more flexible as per the one provided above.
This will prevent the ever growing ProjectPartners table by adding a new column each time you have a new partner.
It will fix all issues with your foreign keys and make your tables a bit more intuitive.
This would now yield the SQL:
select ProjectDetails.Project, ProjectDetails.Id
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDetailsId = ProjectDetails.Id
join Partners on Partners.Id = ProjectPartners.PartnersId
where Partners.PartnerName = @partnerName
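As a quick check of the junction-table design, the sketch below builds the three tables and runs the lookup by partner name. SQLite via Python's sqlite3 stands in for SQL Server (so INTEGER PRIMARY KEY replaces identity), and projects_for_partner plus the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Partners (Id INTEGER PRIMARY KEY, PartnerName TEXT NOT NULL);
CREATE TABLE ProjectDetails (Id INTEGER PRIMARY KEY, Project TEXT NOT NULL);
CREATE TABLE ProjectPartners (
    PartnersId INTEGER NOT NULL REFERENCES Partners(Id),
    ProjectDetailsId INTEGER NOT NULL REFERENCES ProjectDetails(Id)
);
-- invented sample data: one row per (partner, project) link
INSERT INTO Partners (Id, PartnerName) VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO ProjectDetails (Id, Project) VALUES (1, 'Bridge'), (2, 'Tunnel');
INSERT INTO ProjectPartners (PartnersId, ProjectDetailsId)
VALUES (1, 1), (1, 2), (2, 2);
""")

def projects_for_partner(conn, partner_name):
    """Return (Project, Id) rows for every project the named partner is linked to."""
    return conn.execute("""
        SELECT pd.Project, pd.Id
        FROM ProjectDetails pd
        JOIN ProjectPartners pp ON pp.ProjectDetailsId = pd.Id
        JOIN Partners p ON p.Id = pp.PartnersId
        WHERE p.PartnerName = ?
        ORDER BY pd.Id
    """, (partner_name,)).fetchall()

print(projects_for_partner(conn, "Acme"))
```

Adding a 26th partner to a project is then just one more row in ProjectPartners, not a new column.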

Update trigger old values natural key

I have an accounts table with the account owner as the primary key. In the update trigger, I want to update some accounts to new owners. Since this table doesn't have an id field, how do I use the inserted/updated tables in the trigger? DB is sql server 2008.
CREATE TRIGGER accounts_change_owner on accounts AFTER INSERT
AS BEGIN
MERGE INTO accounts t
USING
(
SELECT *
FROM inserted e
INNER JOIN deleted f ON
e.account_owner = f.account_owner ---this won't work since the new account owner value is diff
) d
ON (t.account_owner = d.account_owner)
WHEN MATCHED THEN
UPDATE SET t.account_owner = d.account_owner
END
I think I understood your question, but I am not sure. You want to be able to update the account owner name in one table and have this update propagated to the referencing tables?
If so you don't really need a trigger, you can use on update cascade foreign key.
Like this:
create table AccountOwner
(
Name varchar(100) not null
constraint PK_AccountOwner primary key
)
create table Account
(
AccountName varchar(100) not null,
AccountOwnerName varchar(100) not null
constraint FK_Account_AccountOwnerName references AccountOwner(Name) on update cascade
)
insert AccountOwner values('Owner1')
insert Account values('Account1', 'Owner1')
Now if I update table AccountOwner like this
update AccountOwner
set Name = 'Owner2'
where Name = 'Owner1'
it will automatically update table 'Account'
select *
from Account
AccountName AccountOwnerName
----------- -----------------
Account1 Owner2
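The same behaviour can be reproduced in a throwaway script. This sketch uses SQLite through Python's sqlite3 rather than SQL Server, so it only illustrates the cascade idea; the table and column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades to apply
conn.executescript("""
CREATE TABLE AccountOwner (Name VARCHAR(100) NOT NULL PRIMARY KEY);
CREATE TABLE Account (
    AccountName VARCHAR(100) NOT NULL,
    AccountOwnerName VARCHAR(100) NOT NULL
        REFERENCES AccountOwner(Name) ON UPDATE CASCADE
);
INSERT INTO AccountOwner VALUES ('Owner1');
INSERT INTO Account VALUES ('Account1', 'Owner1');
""")

# Renaming the owner in the parent table propagates to the referencing row.
with conn:
    conn.execute("UPDATE AccountOwner SET Name = 'Owner2' WHERE Name = 'Owner1'")

accounts = conn.execute("SELECT AccountName, AccountOwnerName FROM Account").fetchall()
print(accounts)
```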
I think you need to modify the design of your table. Recall that the three attributes of a primary key are that the primary key must be
Non-null
Unique
Unchanging
(If the primary key consists of multiple columns, all columns must follow the rules above). Most databases enforce #1 and #2, but the enforcement of #3 is usually left up to the developers.
Changing a primary key value is a classic Bad Idea in a relational database. You can probably come up with a way to do it; that doesn't change the fact that it's a Bad Idea. Your best choice is to add an artificial primary key to your table, put NOT NULL and UNIQUE constraints on the ACCOUNT_OWNER field (assuming that this is the case), and change any referencing tables to use the artificial key.
The next question is, "What's so bad about changing a primary key value?". Changing the primary key value alters the unique identifier for that particular data; if something else is counting on having the original value point back to a particular row, such as a foreign key relationship, after such a change the original value will no longer point where it's supposed to point.
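To make the artificial-key suggestion concrete, here is a minimal sketch in which the owner name can change freely because nothing references it. SQLite via Python's sqlite3 stands in for SQL Server, and the account_notes table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,                -- artificial key, never updated
    account_owner TEXT NOT NULL UNIQUE     -- natural value, free to change
);
-- account_notes is a hypothetical referencing table
CREATE TABLE account_notes (
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    note TEXT NOT NULL
);
INSERT INTO accounts (id, account_owner) VALUES (1, 'alice');
INSERT INTO account_notes (account_id, note) VALUES (1, 'opened');
""")

# Changing the owner touches one row; referencing rows still point at id = 1.
with conn:
    conn.execute(
        "UPDATE accounts SET account_owner = ? WHERE account_owner = ?",
        ("bob", "alice"),
    )

joined = conn.execute("""
    SELECT a.account_owner, n.note
    FROM accounts a
    JOIN account_notes n ON n.account_id = a.id
""").fetchall()
print(joined)
```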
Good luck.

a sql question about linking tables

I wonder why these statements are not the same (the first one worked for me)
-- first statement
AND user_thread_map.user_id = $user_id
AND user_thread_map.thread_id = threads.id
-- second statement
AND user_thread_map.user_id = users.id
AND user_thread_map.thread_id = threads.id
AND users.id = $user_id
Shouldn't they be the same? In the 2nd one I linked all the tables in the first 2 lines, then I tell it to select where users.id = $user_id.
Can someone explain why the 2nd statement doesn't work? Because I thought it would.
Assuming you're getting no rows returned (you don't really say what the problem is, so I'm guessing a bit here), my first thought is that there are no rows in users where id is equal to $user_id.
That's the basic difference between those two SQL segments: the second joins the user_thread_map, threads and users tables, while the first does not join with users at all, so that's where I'd be looking for the problem.
It appears that your user_thread_map table is a many-to-many relationship between users and threads. If that is true, are you sure you have a foreign key constraint between the ID fields in that table to both corresponding other tables, something like:
users:
id integer primary key
name varchar(50)
threads:
id integer primary key
thread_text varchar(100)
user_thread_map:
user_id integer references users(id)
thread_id integer references threads(id)
If you have those foreign key constraints, it should be impossible to end up with a user_thread_map(user_id) value that doesn't have a corresponding users(id) value.
If those constraints aren't there, a query can tell you which values need to be fixed before immediately adding the constraints (this is important to prevent the problem from re-occurring), something like:
select user_thread_map.user_id
from user_thread_map
left join users
on user_thread_map.user_id = users.id
where users.id is null
The first one selects records from user_thread_map with user_id = $user_id, irrespective of whether a record with that id exists in users. The second query only returns something if the related record in users is found.
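The difference shows up as soon as user_thread_map contains a user_id with no matching users row. In the sketch below, the SELECT list and FROM clauses are assumptions (the question only shows the AND conditions), SQLite via Python's sqlite3 is used for convenience, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(50));
CREATE TABLE threads (id INTEGER PRIMARY KEY, thread_text VARCHAR(100));
-- no foreign key constraints here, so an orphaned user_id can slip in
CREATE TABLE user_thread_map (user_id INTEGER, thread_id INTEGER);
INSERT INTO users VALUES (1, 'ann');
INSERT INTO threads VALUES (10, 'hello');
INSERT INTO user_thread_map VALUES (1, 10), (2, 10);  -- user 2 does not exist
""")

# Assumed shape of the first statement: users is not joined at all.
FIRST = """
    SELECT threads.id FROM user_thread_map, threads
    WHERE user_thread_map.user_id = ?
      AND user_thread_map.thread_id = threads.id
"""
# Assumed shape of the second statement: a matching users row is required.
SECOND = """
    SELECT threads.id FROM user_thread_map, threads, users
    WHERE user_thread_map.user_id = users.id
      AND user_thread_map.thread_id = threads.id
      AND users.id = ?
"""

def run(sql, user_id):
    return conn.execute(sql, (user_id,)).fetchall()

print(run(FIRST, 2), run(SECOND, 2))
```

For a user that does exist (id 1 here) both statements return the same rows; they only diverge for the orphaned id.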