Conditionally insert multiple data rows into multiple Postgres tables - sql

I have multiple rows of data. And this is just some made up data in order to make an easy example.
The data has name, age and location.
I want to insert the data into two tables, persons and locations, where locations has a FK to persons.
Nothing should be inserted if there already is a person with that name, or if the age is below 18.
I need to use COPY (in my real world example I'm using NPQSQL for .NET and I think that's the fastest way of inserting a lot of data).
So I'll do the following in a transaction (not 100% sure on the syntax, not on my computer right now):
-- Create a temp table
DROP TABLE IF EXISTS tmp_x;
CREATE TEMPORARY TABLE tmp_x
(
name TEXT,
age INTEGER,
location TEXT
);
-- Read the data into the temp table
COPY tmp_x (name, age) FROM STDIN (FORMAT BINARY);
-- Conditionally insert the data into persons
WITH insertedPersons AS (
INSERT INTO persons (name, age)
SELECT name, age
FROM tmp_x tmp
LEFT JOIN persons p ON p.name = tmp.name
WHERE
p IS NULL
AND tmp.age >= 18
RETURNING id, name
),
-- Use the generated ids to insert the relational data into locations
WITH insertedLocations AS (
INSERT INTO locations (personid, location)
SELECT ip.id, tmp.location
FROM tmp_x tmp
INNER JOIN insertedPersons ip ON ip.name = tmp.name
),
DROP TABLE tmp_x;
Is there a better/easier/more efficient way to do this?
Is there a better way to "link" the inserts instead of INNER JOIN insertedPersons ip ON ip.name = tmp.name. What if name wasn't unique? Can I update tmp_x with the new person ids and use that?

Related

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn’t appear to be any obvious relationship between city and person which will make your life hard
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn’t really anything to join on because your one-column tables have no column in common. Provide eg a cityname in Person (because it seems more likely that one city has many person) then you can do
INSERT INTO employee(personname,cityname)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the tables are related between themselves and don’t need the third table so it’s perhaps something of an academic exercise only, not something you’d do in the real world
If you just want to mix every person with every city you can do:
INSERT INTO employee(personname,cityname)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned, two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities, 800 rows. Fairly useless imho)
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.

How do I insert data from one table to another when there is unequal number of rows in one another?

I have a table named People that has 19370 rows with playerID being the primary column. There is another table named Batting, which has playerID as a foreign key and has 104324 rows.
I was told to add a new column in the People table called Total_HR, which is included in the Batting table. So, I have to insert that column data from the Batting table into the People table.
However, I get the error:
Msg 515, Level 16, State 2, Line 183 Cannot insert the value NULL into
column 'playerID', table 'Spring_2019_BaseBall.dbo.People'; column
does not allow nulls. INSERT fails. The statement has been terminated.
I have tried UPDATE and INSERT INTO SELECT, however got the same error
insert into People (Total_HR)
select sum(HR) from Batting group by playerID
I expect the output to populate the column Total_HR in the People table using the HR column from the Batting table.
You could use a join
BEGIN TRAN
Update People
Set Total_HR = B.HR_SUM
from PEOPLE A
left outer join
(Select playerID, sum(HR) HR_SUM
from Batting
group by playerID) B on A.playerID = B.playerID
Select * from People
ROLLBACK
Notice that I've put this code in a transaction block so you can test the changes before you commit
From the error message, it seems that playerID is a required field in table People.
You need to specify all required fields of table People in the INSERT INTO clause and provide corresponding values in the SELECT clause.
I added field playerID below, but you might need to add additional required fields as well.
insert into People (playerID, Total_HR)
select playerID, sum(HR) from Batting group by playerID
It is strange, however, that you want to insert rows in a table that should already be there. Otherwise, you could not have a valid foreign key on field playerID in table Batting... If you try to insert such rows from table Batting into table People, you might get another error (violation of PRIMARY KEY constraint)... Unless... you are creating the database just now and you want to populate empty table People from filled/imported table Batting before adding the actual foreign key constraint to table People. Sorry, I will not question your intentions. I personally would consider to update the query somewhat so that it will not attempt to insert any rows that already exist in table People:
insert into People (playerID, Total_HR)
select Batting.playerID, sum(Batting.HR)
from Batting
left join People on People.playerID = Batting.playerID
where People.playerID is null and Batting.playerID is not null
group by playerID
You need to calculate the SUM and then join this result to the People table to bring into the same rows both Total_HR column from People and the corresponding SUM calculated from Batting.
Here is one way to write it. I used CTE to make is more readable.
WITH
CTE_Sum
AS
(
SELECT
Batting.playerID
,SUM(Batting.HR) AS TotalHR_Src
FROM
Batting
GROUP BY
Batting.playerID
)
,CTE_Update
AS
(
SELECT
People.playerID
,People.Total_HR
,CTE_Sum.TotalHR_Src
FROM
CTE_Sum
INNER JOIN People ON People.playerID = CTE_Sum.playerID
)
UPDATE CTE_Update
SET
Total_HR = TotalHR_Src
;

Copying data from one table to another different column names

I'm having an issue copying one table's data to another. I have around 100 or so individual tables that have generally the same field names but not always. I need to be able to copy and map the fields. example: source table is BROWARD and has column names broward_ID, name, dob, address (the list goes on). The temp table I want to copy it to has ID, name, dob, address etc.
I'd like to map the fields like broward_ID = ID, name = name, etc. But many of the other tables are different in column name, so I will have to write a query for each one. Once I figure out the first on, I can do the rest. Also the column in both tables are not in order either..thanks in advance for the TSQL...
With tables:
BROWARD (broward_ID, name, dob, address) /*source*/
TEMP (ID, name, address,dob) /*target*/
If you want to copy information from BROWARD to TEMP then:
INSERT INTO TEMP SELECT broward_ID,NAME,ADDRESS,DOB FROM BROWARD --check that the order of columns in select represents the order in the target table
If you want only copy values of broward_ID and name then:
INSERT INTO TEMP(ID, name) SELECT broward_ID,NAME FROM BROWARD
Your question will resolve using update
Let's consider we have two different table
Table A
Id Name
1 abc
2 cde
Table B
Id Name
1
2
In above case want to insert Table A Name column data into Table B Name column
update B inner join on B.Id = A.Id set B.Name = A.Name where ...

copying values with one to many relationship from external file in PostgreSQL

Let say I want to copy new values from external sql file. The values are representing tables with one-to-many relationship, like: books(book_id, title, author_id, subject), author(author_id, name, field, status). The ids are starting from 1 and incrementing. But in the database there are already data/values. So how can I copy the new values, so that they get the right id values and keeping the relationship?
Thanks
One way would be to load the new data into temporary tables new_author and new_book, then insert rows from new_author into author where they didn't already exist, and then insert from new_book to book, using the ids from the newly created author records.
create temp table new_book(like book);
create temp table new_author(like author);
-- load data into new_book and new_author
\copy new_author from ~/new_author_file
\copy new_book from ~/new_book_file
insert into author (name, field, status)
select name, field, status
from new_author
where name not in (select name from author);
insert into book (title, author_id, subject)
select b.title, a.author_id, b.subject
from new_book b
join new_author na on b.author_id = na.author_id
join author a on na.name = a.name;
All of this assumes that author.name is unique, and that you know that you need to add all of the books.

Transact-SQL: Getting ID of Lookup Value if Data isn't Related

I have 3 tables and two are related (Name and Gender):
Staging(id, name, gender)
Name(id, name genderID)
Gender(id, gender)
The data has been "dumped" into Staging(id, name, gender) in a denormalized fashion and now I'm trying to normalize the data.
I need to be able to use t-sql to do the following
Insert the name from the Staging table into the Name table
Get the id from the Gender table and insert into the Name table as a foreign key
The problem is the Gender and Name tables aren't related to Staging so I'm trying to understand the logic of how this transaction should work.
My assumption was that I needed to somehow to an INSERT INTO SELECT with some type of subquery, but I'm just at a lost. Thanks.
Yes, you need INSERT INTO...SELECT. Join table staging with Gender via column gender so you can get the ID.
INSERT INTO Name (ID, Name, GenderID)
SELECT s.id, s.name, g.id
FROM Staging s
INNER JOIN Gender g
ON s.gender = g.gender