Recommended way to deal with updating m2m table postgres

I have the below tables
A project table
project_id,project_name
A skill table
skill_id,skill_name
A project_skill table (many to many relationship)
project_skill_id,project_id,skill_id
The browser will have a form that asks the user to enter a project name, with an SO-style autocomplete for tags. I'm sending the JSON below back to the database for insertion:
{"project_name":"foo","skills":["bar","baz"]}
My question concerns the case where the user edits an existing project. Suppose the user removes "baz" from the skills and adds "zed" and "biz". How do I properly update the many-to-many table?
{"project_name":"foo","skills":["bar","zed","biz"]}
Do I remove all records from the m2m table and do a fresh insert with the new skills?
remove all records for the project_id
insert new records for bar, zed, biz
Or do I work out on the server what was removed/added and change only those rows?
remove baz from the table
add zed and biz
The same applies to modifying project_name etc. Do I check what was modified and update only that, or perform a complete delete and insert?

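I'd use a CTE with a MERGE. Note that this is SQL Server syntax. PostgreSQL has had MERGE since version 15, but as far as I know WHEN NOT MATCHED BY SOURCE only arrived in version 17, so older versions need a separate DELETE: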
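;WITH src AS
(
    SELECT p.project_id, s.skill_id
    FROM dbo.project AS p
    INNER JOIN @input AS i ON p.project_name = i.project_name
    INNER JOIN dbo.skill AS s ON i.skill_name = s.skill_name
)
MERGE INTO dbo.project_skill AS tgt
USING src
ON tgt.project_id = src.project_id AND tgt.skill_id = src.skill_id
-- skill is in the new list but not linked yet: add the link
WHEN NOT MATCHED BY TARGET THEN
    INSERT (project_id, skill_id) VALUES (src.project_id, src.skill_id)
-- skill is linked but no longer in the list: remove the link.
-- The extra condition limits the delete to the projects named in @input;
-- without it, the links of every other project would be deleted as well.
WHEN NOT MATCHED BY SOURCE
    AND tgt.project_id IN (SELECT p.project_id
                           FROM dbo.project AS p
                           INNER JOIN @input AS i ON p.project_name = i.project_name) THEN
    DELETE;
where @input holds the new values (it must be declared and filled before the MERGE runs):
DECLARE @input TABLE
(
    project_name VARCHAR(100),
    skill_name VARCHAR(100)
);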
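The "remove only what was removed, add only what was added" option from the question can also be sketched without MERGE. The example below is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the function name sync_project_skills and the sample rows are assumptions made up for illustration. Skill names that do not exist in the skill table are silently skipped here.

```python
import sqlite3


def sync_project_skills(conn, project_id, skill_names):
    """Make project_skill match skill_names for one project, changing only the rows that differ."""
    # Resolve the submitted names to skill ids (unknown names are skipped).
    wanted = {
        row[0]
        for name in skill_names
        for row in conn.execute(
            "SELECT skill_id FROM skill WHERE skill_name = ?", (name,)
        )
    }
    # Skill ids currently linked to this project.
    current = {
        row[0]
        for row in conn.execute(
            "SELECT skill_id FROM project_skill WHERE project_id = ?", (project_id,)
        )
    }
    to_insert = wanted - current
    to_delete = current - wanted
    with conn:  # both statements run in one transaction
        conn.executemany(
            "DELETE FROM project_skill WHERE project_id = ? AND skill_id = ?",
            [(project_id, s) for s in to_delete],
        )
        conn.executemany(
            "INSERT INTO project_skill (project_id, skill_id) VALUES (?, ?)",
            [(project_id, s) for s in to_insert],
        )
    return to_insert, to_delete


# Made-up sample data mirroring the question: project "foo" has bar and baz.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT UNIQUE);
CREATE TABLE project_skill (
    project_skill_id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project (project_id),
    skill_id INTEGER REFERENCES skill (skill_id),
    UNIQUE (project_id, skill_id)
);
INSERT INTO project VALUES (1, 'foo'), (2, 'other');
INSERT INTO skill VALUES (1, 'bar'), (2, 'baz'), (3, 'zed'), (4, 'biz');
INSERT INTO project_skill (project_id, skill_id) VALUES (1, 1), (1, 2), (2, 2);
""")

added, removed = sync_project_skills(conn, 1, ["bar", "zed", "biz"])
```

The delete is keyed on project_id, so links that belong to other projects are left alone. That is the same concern the extra condition on WHEN NOT MATCHED BY SOURCE addresses in a MERGE.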

Related

Using PolyBase and a stored procedure to update a dbo table from several external tables

I need some help with this.
I have 3 external tables:
CREATE EXTERNAL TABLE ext.titanic
(
    PassengerId INT,
    Pclass INT,
    Pname VARCHAR(100),
    Gender VARCHAR(20),
    Ticket VARCHAR(30),
    Cabin VARCHAR(30)
)
WITH (
    LOCATION = '/titanic.csv',
    DATA_SOURCE = blob1,
    FILE_FORMAT = TextFileFormat1
);
CREATE EXTERNAL TABLE ext.titanic2
(
    Pclass INT,
    Pname VARCHAR(100)
)
WITH (
    LOCATION = '/titanic2.csv',
    DATA_SOURCE = blob1,
    FILE_FORMAT = TextFileFormat1
);
CREATE EXTERNAL TABLE ext.titanic3
(
    PassengerId INT,
    Pname VARCHAR(100)
)
WITH (
    LOCATION = '/titanic3.csv',
    DATA_SOURCE = blob1,
    FILE_FORMAT = TextFileFormat1
);
and I have a dbo table created:
CREATE TABLE dbo.titanic
WITH
(
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
    t.PassengerId,
    t.Pclass,
    t.Pname,
    t.Gender,
    t.Ticket,
    t.Cabin,
    t3.PassengerId AS T3_PassengerId,
    t3.Pname AS T3_Pname,
    t2.Pclass AS T2_Pclass,
    t2.Pname AS T2_Pname
FROM ext.titanic AS t
-- titanic2 only has Pclass and Pname, so it can only be joined on Pclass
FULL JOIN ext.titanic2 AS t2 ON t2.Pclass = t.Pclass
-- titanic3 only has PassengerId and Pname, so it is joined on PassengerId
FULL JOIN ext.titanic3 AS t3 ON t3.PassengerId = t.PassengerId;
I have to join them and update dbo.titanic with a stored procedure.
Do I need an additional external table to join them in, and then merge that with dbo.titanic? Or is there a simpler way to do it?
I also need help with dbo.titanic and the joins.
There are more unique PassengerIds in titanic3 than in titanic,
but I need all PassengerIds from the two tables in one column. The same goes for Pclass from titanic and titanic2. That is what is bugging me.
For reference: titanic has around 100000 rows (800 unique PassengerIds), and titanic2 and titanic3 have 5000 unique rows in total for PassengerId and Pclass.
The final table must look like dbo.titanic but without T3_PassengerId and T2_Pclass, as those must somehow be merged into PassengerId and Pclass.
I spent a lot of time looking for something like this but didn't find anything close enough.
This is the best I could find:
https://www.sqlservercentral.com/articles/access-external-data-from-azure-synapse-analytics-using-polybase
I want to thank the author,
but to use it I have 3 main issues:
it does not cover 3 external tables with different columns that need to be joined
it has no update that can be run after the tables are created (as I understand it, UPDATE can't be used with external tables)
it does not use a stored procedure for the update
Can I use something like this?
INSERT INTO table1(column1, column2,...) SELECT column1, column2,... FROM table2 WHERE condition( compare value in table1 <> value in table 2)
Thanks in advance.

You don't need to create another external table. PolyBase loads the external data into temp tables, and the result can then be merged into dbo.titanic.
Use an outer join if the tables don't share the same IDs but you need all of them: left/right when one side is complete, full when either side can have IDs the other lacks.
Use a statement of the following shape; it is then easy to wrap in a stored procedure:
;WITH [MyCTE] AS (SELECT ...) UPDATE dbo.titanic SET ...;
You can't update an external table through PolyBase. If the external data itself has to change, you will have to create a new file, e.g. titanic4.csv, that contains the joined records.
Please try it and post an update with your progress, so I can help you further.
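One way to get every PassengerId from two tables into a single column is to build the list of IDs first and then left join each table onto it. The sketch below illustrates that idea only; it uses Python's built-in sqlite3 module rather than Synapse, the sample rows are made up, and it assumes one row per PassengerId in each table (with repeated IDs, as in the real titanic file, the joins would multiply rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-ins for ext.titanic and ext.titanic3 with made-up rows.
CREATE TABLE titanic  (PassengerId INTEGER, Pclass INTEGER, Pname TEXT);
CREATE TABLE titanic3 (PassengerId INTEGER, Pname TEXT);
INSERT INTO titanic  VALUES (1, 3, 'Allen'), (2, 1, 'Brown');
INSERT INTO titanic3 VALUES (2, 'Brown B.'), (7, 'Smith');
""")

rows = conn.execute("""
    WITH ids AS (
        -- every PassengerId that appears in either table, once
        SELECT PassengerId FROM titanic
        UNION
        SELECT PassengerId FROM titanic3
    )
    SELECT ids.PassengerId,
           t.Pclass,
           -- prefer the name from titanic, fall back to titanic3
           COALESCE(t.Pname, t3.Pname) AS Pname
    FROM ids
    LEFT JOIN titanic  AS t  ON t.PassengerId  = ids.PassengerId
    LEFT JOIN titanic3 AS t3 ON t3.PassengerId = ids.PassengerId
    ORDER BY ids.PassengerId
""").fetchall()

for row in rows:
    print(row)
```

Passenger 7 exists only in titanic3, so it comes through with no Pclass and with the titanic3 name. On a full join the same effect comes from wrapping the two key columns in COALESCE.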

I got this working.
It may not be the most elegant way, but it works: a left join, with an additional stg.titanic table (same structure as dbo.titanic) that combines the 3 external tables, followed by a merge of the stg and dbo tables.
MERGE dbo.titanic AS [Target]
USING (SELECT
           column1, column2, column3,
           UpdateTime
       FROM stg.titanic) AS [Source]
ON  [Target].PassengerId = [Source].PassengerId
AND [Target].Pclass = [Source].Pclass
AND [Target].Pname = [Source].Pname   -- the match condition
WHEN MATCHED THEN
    UPDATE SET [Target].UpdateTime = GETDATE()
-- when any of the 3 conditions is not met, insert a new row
WHEN NOT MATCHED THEN
    INSERT (column1, column2, column3,
            UpdateTime)
    VALUES ([Source].column1, [Source].column2, [Source].column3,
            [Source].UpdateTime);
If someone knows a better way, please share it.
Thanks.

What could be the workaround to avoid the MERGE issue i.e. The target of a MERGE statement cannot be a remote table?

I am copying data from one table to another. The tables are on different servers but have similar structures.
I ended up with this:
declare @ClassIds table (OldClassId int, NewClassId int);
merge into newDB.dbo.tblClasses as target
using
(
    select
        Id = Id * (-1),
        [Name]
    from
        oldDB.dbo.tblClasses
)
as source on source.Id = target.Id
when not matched by target then
    insert ([Name])
    values (source.[Name])
output source.Id * (-1), inserted.Id -- the trick is here
into @ClassIds (OldClassId, NewClassId);
insert into newDB.dbo.tblStudents
select
    s.Id,
    s.[Name],
    ClassId = ids.NewClassId
from
    oldDB.dbo.tblStudents s
    inner join @ClassIds ids on ids.OldClassId = s.ClassId;
but I get this error:
The target of a MERGE statement cannot be a remote table, a remote view, or a view over remote tables.
One workaround would be to reverse the direction, i.e. swap which server is the target, but that's not ideal in my situation.
What should I do?
Reason to do this:
I am copying parent-child data, and the primary keys in the target are auto-generated. A new parent record gets a new Id in the target, while the child still carries the old parent Id from the source, so the reference to the parent would be lost. The MERGE is there to make sure the child records are updated with the new parent Ids.
Edit:
newDB is on a different server, i.e. [192.168.xxx.xxx].newDB.dbo.tblStudents

If you are not able to change the remote DB structure, I would suggest building the ClassId mapping through the target Classes table itself, by temporarily storing the source Id in its Name column:
drop table if exists #ClassIdMap;
create table #ClassIdMap (SourceClassId int, TargetClassId int);
declare @Prefix varchar(10) = 'MyClassId=';
insert into targetServer.targetDb.dbo.Classes
    ([Name])
select
    -- insert the source class id values with some unique prefix
    [Name] = concat(@Prefix, Id)
from
    sourceServer.sourceDb.dbo.Classes;
-- then create the ClassId mapping table,
-- reading the SourceClassId back from the target Name column
insert #ClassIdMap (
    SourceClassId,
    TargetClassId)
select
    SourceClassId = cast(replace([Name], @Prefix, '') as int),
    TargetClassId = Id
from
    targetServer.targetDb.dbo.Classes
where
    [Name] like @Prefix + '%';
-- replace the placeholder names with the real source Name values
update target set
    [Name] = source.[Name]
from
    targetServer.targetDb.dbo.Classes target
    inner join #ClassIdMap map on map.TargetClassId = target.Id
    inner join sourceServer.sourceDb.dbo.Classes source on source.Id = map.SourceClassId;
-- and use the ClassId mapping table
-- to insert Students into the correct classes
insert into targetServer.targetDb.dbo.Students (
    [Name],
    ClassId)
select
    s.[Name],
    ClassId = map.TargetClassId
from
    sourceServer.sourceDb.dbo.Students s
    inner join #ClassIdMap map on map.SourceClassId = s.ClassId;
The risk with this script is that it is not idempotent: if it is executed twice, it creates duplicates.
To eliminate this risk, you need to record somewhere on the source side what has already been inserted.
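The prefix trick can be tried out locally before pointing it at linked servers. The following is a minimal sketch using Python's built-in sqlite3 module: the src_ and tgt_ tables stand in for the source and target servers, and all table names and sample rows are made up for illustration. Like the script above, it is not idempotent.

```python
import sqlite3

PREFIX = "MyClassId="

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- src_* play the role of the source server, tgt_* the target server.
CREATE TABLE src_classes  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE src_students (id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER);
CREATE TABLE tgt_classes  (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE tgt_students (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, class_id INTEGER);
INSERT INTO src_classes  VALUES (10, 'Math'), (20, 'Art');
INSERT INTO src_students VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 10);
INSERT INTO tgt_classes (name) VALUES ('Existing');
""")

with conn:
    # 1. Insert placeholder rows whose name carries the source id.
    conn.execute(
        "INSERT INTO tgt_classes (name) SELECT ? || id FROM src_classes", (PREFIX,)
    )
    # 2. Read the placeholders back to build {source id: target id}.
    id_map = {
        int(name[len(PREFIX):]): new_id
        for new_id, name in conn.execute(
            "SELECT id, name FROM tgt_classes WHERE name LIKE ? || '%'", (PREFIX,)
        )
    }
    # 3. Replace the placeholder names with the real names.
    conn.executemany(
        "UPDATE tgt_classes "
        "SET name = (SELECT name FROM src_classes WHERE id = ?) "
        "WHERE id = ?",
        list(id_map.items()),
    )
    # 4. Copy the students, translating class ids through the map.
    students = conn.execute(
        "SELECT name, class_id FROM src_students ORDER BY id"
    ).fetchall()
    conn.executemany(
        "INSERT INTO tgt_students (name, class_id) VALUES (?, ?)",
        [(name, id_map[class_id]) for name, class_id in students],
    )
```

The map is derived from the placeholder names rather than from insertion order, so it does not depend on which ids the target happens to generate.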

Need to duplicate a row and its related data in other tables (revising a row)

My company has a database with Project related data. At times, they would like to Revise a project, keeping the old version and copying it so they can work on a copied version. The project table has a revision field that defaults to 0 and should increment by one when they click a revise button on the front-end website. The hierarchy would look like:
Project(ProjectID)
Project_Details: (ID) | (ProjectID)
Activities: (ID) | (ProjectID)
Activity_Details: (ID) | (ActivitiesID)
ProjectID links all my tables together. I have an Activities table that contains the activities for a project, so the relationship is one to many. The Activities table links to its own child tables by ActivityID.
What I have so far, just to test it out:
DECLARE @newprojectid INT;
INSERT INTO Project
SELECT projectnumber, MAX(Revision) + 1 FROM Project WHERE projectnumber = '23.444.555' GROUP BY projectnumber;
-- capture the new ProjectID once; @@IDENTITY changes after every later insert
SET @newprojectid = SCOPE_IDENTITY();
INSERT INTO Project_Details SELECT @newprojectid, Rate, Department FROM Project_Details WHERE projectid = @projectid;
INSERT INTO Activities SELECT @newprojectid, Area_No, Completed_Date FROM Activities WHERE projectid = @projectid;
This is where I am not sure what to do next. I need to copy all the rows from the Activity_Details table that relate to my Activities table by ActivityID. However, there are multiple rows in my Activities table with the same ProjectID.
So it looks something like: for each row in Activities with ProjectID = @projectid, get the ActivityID of that row and copy all rows in Activity_Details with that ActivityID.
How do I accomplish that?

No need for a loop. What you need is a mapping between the 'old' and 'new' activity records and use that mapping to create the Activity_details with the correct ActivityID.
If you can add another field on Activities, which will store the last ActivityID that record was copied from, you can use that in the join to insert into activities details:
INSERT INTO Activities (ProjectID, Area_No, Completed_Date, Last_ActivityID)
SELECT @newprojectid, Area_No, Completed_Date, ActivityID FROM Activities WHERE projectid = @projectid
INSERT INTO Activity_Details (ActivityID, Details)
SELECT Activities.ActivityID, Details FROM Activity_Details
INNER JOIN Activities ON Activity_Details.ActivityID = Activities.Last_ActivityID
WHERE Activities.projectid = @newprojectid
If you cannot (or don't want to) add that field, you will have to rely on a MERGE statement with an OUTPUT clause to get the mapping. That is quite a bit trickier, but still doable, and probably best left to a different answer if desired.
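The Last_ActivityID approach can be checked end to end on a small data set. Below is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server; the function name copy_activities and the sample rows are assumptions for illustration, and the Completed_Date column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (
    ActivityID INTEGER PRIMARY KEY AUTOINCREMENT,
    ProjectID INTEGER,
    Area_No TEXT,
    Last_ActivityID INTEGER      -- the ActivityID this row was copied from
);
CREATE TABLE Activity_Details (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ActivityID INTEGER,
    Details TEXT
);
-- Made-up data: project 1 has two activities with three detail rows.
INSERT INTO Activities (ProjectID, Area_No) VALUES (1, 'A'), (1, 'B');
INSERT INTO Activity_Details (ActivityID, Details)
VALUES (1, 'a-1'), (1, 'a-2'), (2, 'b-1');
""")


def copy_activities(conn, old_project_id, new_project_id):
    """Copy the activities of one project, and their details, to another project."""
    with conn:
        # Copy the activities and remember which row each copy came from.
        conn.execute(
            """
            INSERT INTO Activities (ProjectID, Area_No, Last_ActivityID)
            SELECT ?, Area_No, ActivityID
            FROM Activities
            WHERE ProjectID = ?
            """,
            (new_project_id, old_project_id),
        )
        # Copy the details, re-pointing them at the new activity rows.
        conn.execute(
            """
            INSERT INTO Activity_Details (ActivityID, Details)
            SELECT a.ActivityID, d.Details
            FROM Activity_Details AS d
            JOIN Activities AS a ON d.ActivityID = a.Last_ActivityID
            WHERE a.ProjectID = ?
            """,
            (new_project_id,),
        )


copy_activities(conn, 1, 2)
```

Filtering the second statement on the new ProjectID matters once a project has been revised more than once, because older copies also carry a Last_ActivityID.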

How to insert multiple instances based on an array

We have two roles: Admin and Customer. There are a number of default users with email addresses following the pattern:
An Admin - admin1#.com, admin2#.com etc.
A Customer - user1#.com, user2#.com etc.
Then, we run the script for each combination (and in case with admins, it's done twice, because they're customers too).
insert into AspNetUserRoles values(
(select Id from AspNetUsers where Email = 'AAA'),
(select Id from AspNetRoles where Name = 'BBB'))
Now, based on my question, you can guess how it's handled right now. For each new email, we add a statement or two. If we added a new role, we'd have to add a number of statements, possibly as many as the number of registered emails.
I sense there's a way to declare a matrix of the form:
a#.com, role1, role2
b#.com, role1,
c#.com
d#.com, role1, role3, role4
I've tried for a while but couldn't figure out the syntax, though. The actual DBA says it's not (easily) doable and that the script we have right now is as it's supposed to be done.
I suspect he's full of Christmas candy having been processed but, not being a DBA myself, I can't really argue, unless I have something that works. I also suspect that I didn't google the right way (i.e. I used wrong terms to describe what I want, due to my ignorance).
Edit
Realizing that the question might be misleading, I'll give an example in pseudo-code to illustrate my intention.
List<Link> links = new List<Link> {
new {a1,b1}, new {a1,b2},
new {a2,b2},
new {a3,b1}, new {a3,b3}, new {a3,b4} }
foreach (Link link in links)
ExecuteSql(
"insert into Links values(
(select Id from FirstTable where Name = link.A),
(select Id from SecondTable where Name = link.B))"
);
The part I can't figure out is how to declare such a list and how to loop through it.

1) Say we start by creating a temp table.
-- Create temp table for user and roles
CREATE TABLE #temp(
AspNetUser varchar(1000) ,
AspNetRoles varchar(1000));
2a) populate it from a File (eg userroles.csv)
a#.com,role1|b#.com,role1|c#.com,|d#.com,role1 role3 role4
Like this
-- Read from csv
BULK INSERT #temp FROM 'D:\userroles.csv'
WITH (
FIELDTERMINATOR =','
,ROWTERMINATOR ='|');
2b) OR do your own inserts in the script
INSERT INTO #temp
(AspNetUser, AspNetRoles)
VALUES
('a#.com','role1'),
('b#.com','role1'),
('c#.com',null),
('d#.com','role1 role3 role4')
3) Insert all combinations into the table by looking up the id's
-- Insert all found combinations
INSERT INTO AspNetUserRoles
SELECT users.Id, roles.Id
FROM
(
SELECT AspNetUser,
CAST ('<Role>' + REPLACE(AspNetRoles, ' ', '</Role><Role>') + '</Role>' AS XML) AS Data
FROM #temp
) AS A
CROSS APPLY Data.nodes ('/Role') AS Split(a)
INNER JOIN AspNetUsers users ON users.Email = AspNetUser
INNER JOIN AspNetRoles roles ON roles.Name = Split.a.value('.', 'VARCHAR(100)')
-- Clean up
drop table #temp;
You can change the delimiters (space, comma and |) to whatever you like.
You may want to add error checking for typos!
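If the script is driven from application code rather than from a SQL file, the matrix from the question can be declared as an ordinary dictionary and flattened into (email, role) pairs. The sketch below is illustrative only: it uses Python's built-in sqlite3 module, the email addresses and Id values are made up, and the UserId and RoleId column names are an assumption about the AspNetUserRoles table.

```python
import sqlite3

# Hypothetical matrix: one entry per user, listing that user's roles.
USER_ROLES = {
    "a@example.com": ["role1", "role2"],
    "b@example.com": ["role1"],
    "c@example.com": [],
    "d@example.com": ["role1", "role3", "role4"],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AspNetUsers (Id TEXT PRIMARY KEY, Email TEXT UNIQUE);
CREATE TABLE AspNetRoles (Id TEXT PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE AspNetUserRoles (
    UserId TEXT,
    RoleId TEXT,
    PRIMARY KEY (UserId, RoleId)
);
""")
# Made-up users (ids u-a .. u-d) and roles (ids r-1 .. r-4).
conn.executemany(
    "INSERT INTO AspNetUsers VALUES (?, ?)",
    [(f"u-{email[0]}", email) for email in USER_ROLES],
)
conn.executemany(
    "INSERT INTO AspNetRoles VALUES (?, ?)",
    [(f"r-{n}", f"role{n}") for n in range(1, 5)],
)

# Flatten the matrix into one (email, role name) pair per link.
pairs = [(email, role) for email, roles in USER_ROLES.items() for role in roles]

with conn:
    # Look up both ids by name; a pair with an unknown email or role inserts nothing.
    conn.executemany(
        """
        INSERT INTO AspNetUserRoles (UserId, RoleId)
        SELECT u.Id, r.Id
        FROM AspNetUsers AS u
        CROSS JOIN AspNetRoles AS r
        WHERE u.Email = ? AND r.Name = ?
        """,
        pairs,
    )
```

Adding a user or a role then means editing one line of the dictionary rather than adding a statement per combination.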

Update a SQL table with values from another nested query

I am currently using a SQL Server Agent job to create a master user table for my in-house web applications, pulling data from 3 other databases: SharePoint, our practice management system (PMS) and our HR database.
Currently it goes...
truncate table my_tools.dbo.tb_staff
go
insert into my_tools.dbo.tb_staff
(username
,firstname
,surname
,chargeoutrate)
select right(wss.nt_user_name,
,hr.firstname
,hr.surname
,pms.chargeoutrate
from sqlserver.pms.dbo.staff as pms
inner join sqlserver.wss_content.dbo.vw_staffwss as wss
on pms.nt_user_name = wss.nt_user_name
inner join sqlserver.hrdb.dbo.vw_staffdetails as hr
on wss.fullname = hr.knownas
go
The problem is that the entire table is cleared as the first step, so my auto-increment primary key/identity on tb_staff is certain to change. Also, if someone is removed from SharePoint or the PMS, they will not be recreated in this table, and this will cause inconsistencies throughout the database.
I want to preserve entries in this table, even after they are removed from one of the other systems.
I suppose what I want to do is:
1) Mark all existing entries in tb_staff as inactive (using a column called active and setting it to false)
2) Run the query on the three joined tables and update every record found, also marking it as active.
I can't see how I can nest a SELECT statement within an UPDATE statement the way I have here with the INSERT statement.
How can I achieve this please?
*please note I have edited my SQL down to 4 columns and simplified it so small errors are probably due to rushed editing. The real query is far bigger.

WITH source AS(
SELECT RIGHT(wss.nt_user_name, 10) nt_user_name, /*Or whatever - this is invalid in the original SQL*/
hr.firstname,
hr.surname,
pms.chargeoutrate
FROM staff AS pms
INNER JOIN vw_staffwss AS wss
ON pms.nt_user_name = wss.nt_user_name
INNER JOIN vw_staffdetails AS hr
ON wss.fullname = hr.knownas
)
MERGE
INTO tb_staff
USING source
ON source.nt_user_name= tb_staff.username /*Or whatever you are using as the key */
WHEN MATCHED
THEN UPDATE SET active=1 /*Can synchronise other columns here if needed*/
WHEN NOT MATCHED BY TARGET
THEN INSERT (username, firstname, surname, chargeoutrate, active) VALUES (nt_user_name,firstname, surname, chargeoutrate, 1)
WHEN NOT MATCHED BY source
THEN UPDATE SET active=0;
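On a database without MERGE, the same three branches can be written as three plain statements. The following is a minimal sketch using Python's built-in sqlite3 module; the source table stands in for the joined query over the three systems, and the function name sync_staff and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_staff (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE,
    firstname TEXT,
    surname TEXT,
    chargeoutrate REAL,
    active INTEGER NOT NULL DEFAULT 1
);
-- Stand-in for the result of the joined query over the three systems.
CREATE TABLE source (username TEXT, firstname TEXT, surname TEXT, chargeoutrate REAL);
INSERT INTO tb_staff (username, firstname, surname, chargeoutrate)
VALUES ('alice', 'Alice', 'A', 100), ('bob', 'Bob', 'B', 80);
INSERT INTO source VALUES ('alice', 'Alice', 'A', 120), ('carol', 'Carol', 'C', 90);
""")


def sync_staff(conn):
    """Apply the three MERGE branches as separate statements in one transaction."""
    with conn:
        # NOT MATCHED BY SOURCE: keep the row, but mark it inactive.
        conn.execute("""
            UPDATE tb_staff SET active = 0
            WHERE NOT EXISTS (
                SELECT 1 FROM source AS s WHERE s.username = tb_staff.username)
        """)
        # MATCHED: mark active and refresh a column from the source.
        conn.execute("""
            UPDATE tb_staff
            SET active = 1,
                chargeoutrate = (SELECT s.chargeoutrate FROM source AS s
                                 WHERE s.username = tb_staff.username)
            WHERE EXISTS (
                SELECT 1 FROM source AS s WHERE s.username = tb_staff.username)
        """)
        # NOT MATCHED BY TARGET: insert the new people.
        conn.execute("""
            INSERT INTO tb_staff (username, firstname, surname, chargeoutrate, active)
            SELECT s.username, s.firstname, s.surname, s.chargeoutrate, 1
            FROM source AS s
            WHERE NOT EXISTS (
                SELECT 1 FROM tb_staff AS t WHERE t.username = s.username)
        """)


sync_staff(conn)
```

Existing rows keep their id values because nothing is truncated, which is the property the question asks for.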
