Insert data and set foreign keys with Postgres - sql

I have to migrate a large amount of existing data in a Postgres DB after a schema change.
In the old schema a country attribute would be stored in the users table. Now the country attribute has been moved into a separate address table:
users:
country # OLD
address_id # NEW [1:1 relation]
addresses:
id
country
The schema is actually more complex and the address contains more than just the country. Thus, every user needs to have his own address (1:1 relation).
When migrating the data, I'm having problems setting the foreign keys in the users table after inserting the addresses:
INSERT INTO addresses (country)
SELECT country FROM users WHERE address_id IS NULL
RETURNING id;
How do I propagate the IDs of the inserted rows and set the foreign key references in the users table?
The only solution I could come up with so far is creating a temporary user_id column in the addresses table and then updating the address_id:
UPDATE users SET address_id = a.id FROM addresses AS a
WHERE users.id = a.user_id;
However, this turned out to be extremely slow (despite using indices on both users.id and addresses.user_id).
The users table contains about 3 million rows with 300k missing an associated address.
Is there any other way to insert derived data into one table and set the foreign key reference to the inserted data in the other (without changing the schema itself)?
I'm using Postgres 8.3.14.
Thanks
I have now solved the problem by migrating the data with a Python/sqlalchemy script. It turned out to be much easier (for me) than trying the same with SQL. Still, I'd be interested if anybody knows a way to process the RETURNING result of an INSERT statement in Postgres SQL.
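The client-side approach mentioned above can be sketched roughly as follows. This is not the asker's actual script: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres and sqlalchemy, and the table layout and the function name migrate_addresses are assumptions made for illustration. With a Postgres driver, the generated key would typically be read from INSERT ... RETURNING id rather than from cursor.lastrowid.

```python
import sqlite3


def migrate_addresses(conn):
    """Give every user without an address its own new address row (1:1)."""
    cur = conn.cursor()
    # Fetch the pending users up front so the table is not modified
    # while we are still reading from it.
    pending = cur.execute(
        "SELECT id, country FROM users WHERE address_id IS NULL"
    ).fetchall()
    for user_id, country in pending:
        cur.execute("INSERT INTO addresses (country) VALUES (?)", (country,))
        new_address_id = cur.lastrowid  # key generated by the INSERT above
        cur.execute(
            "UPDATE users SET address_id = ? WHERE id = ?",
            (new_address_id, user_id),
        )
    conn.commit()
    return len(pending)


# Hypothetical miniature of the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        country TEXT,
        address_id INTEGER REFERENCES addresses (id)
    );
    INSERT INTO users (country) VALUES ('DE'), ('DE'), ('FR');
""")

migrated = migrate_addresses(conn)
rows = conn.execute("""
    SELECT u.id, u.country, a.country
    FROM users u
    JOIN addresses a ON a.id = u.address_id
    ORDER BY u.id
""").fetchall()
distinct_addresses = conn.execute(
    "SELECT COUNT(DISTINCT address_id) FROM users"
).fetchone()[0]
```

Because each key is read back immediately after its INSERT, duplicate countries cause no ambiguity here. The price is one round trip per row, which may or may not be acceptable for 300k rows.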

The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.
You can solve this rather elegantly with data-modifying CTEs introduced with PostgreSQL 9.1:
country is unique
The whole operation is rather trivial in this case:
WITH i AS (
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL
RETURNING id, country
)
UPDATE users u
SET address_id = i.id
FROM i
WHERE i.country = u.country;
You mention version 8.3 in your question. Upgrade! Postgres 8.3 has reached end of life.
Be that as it may, this is simple enough with version 8.3. You just need two statements:
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL;
UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE u.address_id IS NULL
AND a.country = u.country;
country is not unique
That's more challenging. You could just create one address and link to it multiple times. But you did mention a 1:1 relationship that rules out such a convenient solution.
WITH s AS (
SELECT users_id, country
, row_number() OVER (PARTITION BY country) AS rn
FROM users
WHERE address_id IS NULL
)
, i AS (
INSERT INTO addresses (country)
SELECT country
FROM s
RETURNING id, country
)
, r AS (
SELECT *
, row_number() OVER (PARTITION BY country) AS rn
FROM i
)
UPDATE users u
SET address_id = r.id
FROM r
JOIN s USING (country, rn) -- select exactly one id for every user
WHERE u.users_id = s.users_id
AND u.address_id IS NULL;
As there is no way to unambiguously assign exactly one id returned from the INSERT to every user in a set with identical country, I use the window function row_number() to make them unique.
Not as straightforward with Postgres 8.3. One possible way:
INSERT INTO addresses (country)
SELECT DISTINCT country -- pick just one per set of dupes
FROM users
WHERE address_id IS NULL;
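UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE a.country = u.country
AND u.address_id IS NULL
AND NOT EXISTS ( -- only use an address that is not linked to any user yet
SELECT * FROM users x
WHERE x.address_id = a.id
)
AND NOT EXISTS (
SELECT * FROM users b
WHERE b.country = u.country
AND b.address_id IS NULL
AND b.users_id < u.users_id
); -- effectively picking the smallest users_id per set of dupes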
Repeat this until the last NULL value is gone from users.address_id.

Related

Insert values from another table and update original table with returning values

I'm new to PostgreSQL (and even Stack Overflow).
Say, I have two tables Order and Delivery:
Order
id   product   address        delivery_id
--------------------------------------------------
1    apple     mac street     (null)
3    coffee    java island    (null)
4    window    micro street   (null)
Delivery
id   address
----------------
Delivery.id and Order.id are auto-incrementing serial columns.
The table Delivery is currently empty.
I would like to move Order.address to Delivery.address and its Delivery.id to Order.delivery_id to arrive at this state:
Order
id   product   address        delivery_id
--------------------------------------------------
1    apple     mac street     1
3    coffee    java island    2
4    window    micro street   3
Delivery
id   address
---------------------
1    mac street
2    java island
3    micro street
I'll then remove Order.address.
I found a similar question for Oracle but failed to convert it to PostgreSQL:
How to insert values from one table into another and then update the original table?
I still think it should be possible to use a plain SQL statement with the RETURNING clause and a following INSERT in Postgres.
I tried this (as well as some variants):
WITH ids AS (
INSERT INTO Delivery (address)
SELECT address
FROM Order
RETURNING Delivery.id AS d_id, Order.id AS o_id
)
UPDATE Order
SET Delivery_id = d_id
FROM ids
WHERE Order.id = ids.o_id;
This latest attempt failed with:
ERROR: missing FROM-clause entry for table "Delivery" LINE 1: ...address Order RETURNING Delivery.id...
How to do this properly?
First of all, ORDER is a reserved word. Don't use it as an identifier. Assuming orders as the table name instead.
WITH ids AS (
INSERT INTO delivery (address)
SELECT DISTINCT address
FROM orders
ORDER BY address -- optional
RETURNING *
)
UPDATE orders o
SET delivery_id = i.id
FROM ids i
WHERE o.address = i.address;
You have to account for possible duplicates in orders.address. SELECT DISTINCT produces unique addresses.
In the outer UPDATE we can now join back on address because delivery.address is unique. You should probably keep it that way beyond this statement and add a UNIQUE constraint on the column.
This effectively results in a one-to-many relationship between delivery and orders: one row in delivery can have many corresponding rows in orders. Consider enforcing that by adding a FOREIGN KEY constraint accordingly.
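To see the one-to-many outcome on a small data set, here is a rough sketch that runs the same two steps from Python's built-in sqlite3 module. SQLite is only a stand-in for Postgres here; the table definitions and the extra order 8 (a second order for 'mac street') are assumptions added for illustration. The data-modifying CTE is split into a plain INSERT followed by an UPDATE with a correlated subquery, which is a different technique from the single-statement CTE above, chosen so that the sketch does not depend on RETURNING support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE delivery (
        id INTEGER PRIMARY KEY,
        address TEXT UNIQUE            -- keeps the join back unambiguous
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product TEXT,
        address TEXT,
        delivery_id INTEGER REFERENCES delivery (id)
    );
    INSERT INTO orders (id, product, address) VALUES
        (1, 'apple',  'mac street'),
        (3, 'coffee', 'java island'),
        (4, 'window', 'micro street'),
        (8, 'pear',   'mac street');   -- hypothetical duplicate address
""")

# Step 1: one delivery row per distinct address.
conn.execute("""
    INSERT INTO delivery (address)
    SELECT DISTINCT address FROM orders ORDER BY address
""")

# Step 2: join back on the (unique) address to fill in the foreign key.
conn.execute("""
    UPDATE orders
    SET delivery_id = (
        SELECT d.id FROM delivery d WHERE d.address = orders.address
    )
""")
conn.commit()

linked = conn.execute("""
    SELECT o.id, d.address
    FROM orders o
    JOIN delivery d ON d.id = o.delivery_id
    ORDER BY o.id
""").fetchall()
delivery_count = conn.execute("SELECT COUNT(*) FROM delivery").fetchone()[0]
shared = conn.execute(
    "SELECT COUNT(DISTINCT delivery_id) FROM orders WHERE id IN (1, 8)"
).fetchone()[0]
```

Orders 1 and 8 end up pointing at the same delivery row, which is the one-to-many relationship described above.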
This statement enjoys the benefit of starting out on an empty delivery table. If delivery wasn't empty, we'd have to work with an UPSERT instead of the INSERT. See:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Related:
Insert data in 3 tables at a time using Postgres
About the cause for the error message you got:
RETURNING causes error: missing FROM-clause entry for table
Use legal, lower-case identifiers exclusively, if you can. See:
Are PostgreSQL column names case-sensitive?
You can't return columns from the FROM relation in the RETURNING clause of the CTE query. You'll have to either manage this in a cursor, or add an order_id column to the Delivery table, something like this:
ALTER TABLE Delivery ADD COLUMN order_id INTEGER;
-- "orders" stands in for the table Order here, because ORDER is a reserved word
INSERT INTO Delivery (address, order_id)
SELECT address, id
FROM orders
;
WITH q_ids AS
(
SELECT id, order_id
FROM Delivery
)
UPDATE orders
SET delivery_id = q_ids.id
FROM q_ids
WHERE orders.id = q_ids.order_id;

How to prevent from updating the record in sql which is used in another table?

Tables
Country (country_id, country_name)
Company (company_id, country_id, Company_name)
How to prevent updating the row in the country table which is used in Company table?
UPDATE country SET your_values_here
WHERE
country_id = 123 AND
NOT EXISTS (SELECT 1 FROM Company WHERE country_id = 123)
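A minimal sketch of how this guarded UPDATE behaves, using Python's built-in sqlite3 module with an in-memory database. The sample rows, the lower-case table names, and the helper name rename_country_if_unused are assumptions for illustration; only the NOT EXISTS guard comes from the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        country_id INTEGER REFERENCES country (country_id),
        company_name TEXT
    );
    INSERT INTO country VALUES (1, 'Norway'), (2, 'Chile');
    INSERT INTO company VALUES (10, 1, 'Acme');  -- country 1 is in use
""")


def rename_country_if_unused(conn, country_id, new_name):
    """Update the country only if no company row references it.

    Returns True if a row was changed, False if the guard blocked the update.
    """
    cur = conn.execute(
        """
        UPDATE country
        SET country_name = ?
        WHERE country_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM company
              WHERE company.country_id = country.country_id
          )
        """,
        (new_name, country_id),
    )
    conn.commit()
    return cur.rowcount == 1


used_result = rename_country_if_unused(conn, 1, "Norge")
unused_result = rename_country_if_unused(conn, 2, "Republic of Chile")
names = conn.execute(
    "SELECT country_name FROM country ORDER BY country_id"
).fetchall()
```

The update on country 1 is silently skipped rather than rejected with an error, so the caller has to inspect the affected row count to notice it.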
You will have to use a stored procedure to do that. Here's the pseudocode:
First, check whether the country_id exists in Company:
select 1 from Company where country_id = #country_id
If it exists, throw an error; otherwise do the update:
update country set your_values_here where country_id = #country_id
Company::country_id is just a posted foreign key so an update to one table shouldn't affect the other.
Are you talking about insertions or deletes? If so you would either alter Company so that country_id can be null (i.e. not requiring a corresponding record in Country table) or you would turn off cascading deletes.
Hopefully one of those and Google will sort it for you.

update table from another table's data

I have 2 tables : customers (customer,city_name,postal_code) and postal_codes(city_name,postal_code).
In the customers table the postal_code entries are missing but the city is there.
How can I update the customers table from the postal_codes table so the missing postal_code gets updated in the customers table where the postal_code is missing?
This may be a duplicate but I could not make work any of the suggestions from similar threads. They all use some kind of abbreviations for table names which I find hard to follow.
Tried this but it does not seem to work :
UPDATE customers
SET postal_code = postal_codes.postal_code
FROM postal_codes.postal_code INNER JOIN postal_codes.city_name ON customers.city_name
SQLite versions before 3.33.0 do not support the UPDATE ... FROM syntax, but this query should work on any version:
UPDATE customers
SET postal_code = (
SELECT postal_codes.postal_code
FROM postal_codes
WHERE postal_codes.city_name = customers.city_name
)
WHERE EXISTS (
SELECT *
FROM postal_codes
WHERE postal_codes.city_name = customers.city_name
AND (customers.postal_code IS NULL -- a plain <> comparison is never true for NULL
OR postal_codes.postal_code <> customers.postal_code)
);
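Here is a small, self-contained sketch of the correlated-subquery approach, run through Python's built-in sqlite3 module. It assumes that a missing postal code is stored as NULL and that each city_name appears only once in postal_codes; the sample rows are made up. Unlike the query above, it only touches rows whose postal_code is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postal_codes (city_name TEXT PRIMARY KEY, postal_code TEXT);
    CREATE TABLE customers (customer TEXT, city_name TEXT, postal_code TEXT);
    INSERT INTO postal_codes VALUES
        ('Springfield', '11111'),
        ('Shelbyville', '22222');
    INSERT INTO customers VALUES
        ('alice', 'Springfield', NULL),   -- missing, city is known
        ('bob',   'Shelbyville', '22222'),-- already filled in
        ('carol', 'Ogdenville',  NULL);   -- missing, city not in lookup table
""")

cur = conn.execute("""
    UPDATE customers
    SET postal_code = (
        SELECT postal_codes.postal_code
        FROM postal_codes
        WHERE postal_codes.city_name = customers.city_name
    )
    WHERE postal_code IS NULL
      AND EXISTS (
          SELECT 1
          FROM postal_codes
          WHERE postal_codes.city_name = customers.city_name
      )
""")
conn.commit()

updated = cur.rowcount  # number of rows the UPDATE changed
result = conn.execute(
    "SELECT customer, postal_code FROM customers ORDER BY customer"
).fetchall()
```

The EXISTS guard matters: without it, customers whose city has no entry in postal_codes would be rewritten with the NULL returned by the subquery.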

How do I process data in Multivalued Column in Oracle PLSQL?

I am working on creating a small database in Oracle for a project at work. One of the columns will need to have multiple values recorded in it. What's the query to create a multivalued column?
If you need a user to enter multiple email addresses, I would consider creating a USER_EMAIL table to store such records.
Create Table User_Email (User_Id int, Email varchar(100));
User_Id would be a foreign key that goes back to your USER table.
Then you can have 1 to n email addresses per user. This is generally the best practice for database normalization. If your emails have different types (e.g. work, personal), you could have another column in that table for the type.
If you need to return the rows in a single column, you could then look at using LISTAGG:
select u.id,
listagg(ue.email, ', ') within group (order by ue.email) email_addresses
from users u
left join user_email ue on u.id = ue.user_id
group by u.id
SQL Fiddle Demo
You can try using a VARRAY column in an Oracle table.
Look at this page: https://www.orafaq.com/wiki/VARRAY
You can see there:
Declaration of a type:
CREATE OR REPLACE TYPE vcarray AS VARRAY(10) OF VARCHAR2(128);
Declaration of a table:
CREATE TABLE varray_table (id number, col1 vcarray);
Insertion:
INSERT INTO varray_table VALUES (3, vcarray('D', 'E', 'F'));
Selection:
SELECT t1.id, t2.column_value
FROM varray_table t1, TABLE(t1.col1) t2
WHERE t2.column_value = 'A' OR t2.column_value = 'D'

Underlying rows in Group By

I have a table with a certain number of columns and a primary key column (suppose OriginalKey). I perform a GROUP BY on a certain sub-set of those columns and store them in a temporary table with primary key (suppose GroupKey). At a later stage, I may need to get more details about one or more of those groupings (which can be found in the temporary table) i.e. I need to know which were the rows from the original table that formed that group. Simply put, I need to know the mappings between GroupKey and OriginalKey. What's the best way to do this? Thanks in advance.
Example:
Table Student(
StudentID INT PRIMARY KEY,
Level INT, --Grade/Class/Level depending on which country you are from)
HomeTown TEXT,
Gender CHAR)
INSERT INTO TempTable SELECT HomeTown, Gender, COUNT(*) AS NumStudents FROM Student GROUP BY HomeTown, Gender
On a later date, I would like to find out details about all towns that have more than 50 male students and know details of every one of them.
How about joining the 2 tables using the GroupKey, which, you say, are the same?
Or how about doing:
select * from OriginalTable where
GroupKey in (select GroupKey from my_temp_table)
You'd need to store the fields you grouped on in your temporary table, so you can join back to the original table. e.g. if you grouped on fieldA, fieldB, and fieldC, you'd need something like:
select original.id
from original
inner join temptable on
temptable.fieldA = original.fieldA and
temptable.fieldB = original.fieldB and
temptable.fieldC = original.fieldC
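Applied to the Student example from the question, the join back could look roughly like this. The sketch uses Python's built-in sqlite3 module; the snake_case column names, the temp_groups table with its group_key column, the sample rows, and the threshold of 2 (instead of 50) are all assumptions chosen to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        level INTEGER,
        home_town TEXT,
        gender TEXT
    );
    -- The grouped columns are stored next to the aggregate so that
    -- each group can be joined back to the rows that formed it.
    CREATE TABLE temp_groups (
        group_key INTEGER PRIMARY KEY,
        home_town TEXT,
        gender TEXT,
        num_students INTEGER
    );
    INSERT INTO student VALUES
        (1, 3, 'Oslo',   'M'),
        (2, 4, 'Oslo',   'M'),
        (3, 3, 'Oslo',   'F'),
        (4, 5, 'Bergen', 'M');
    INSERT INTO temp_groups (home_town, gender, num_students)
    SELECT home_town, gender, COUNT(*)
    FROM student
    GROUP BY home_town, gender;
""")

# All male students from towns with at least `min_students` male students.
min_students = 2
members = conn.execute(
    """
    SELECT s.student_id
    FROM temp_groups g
    JOIN student s
      ON s.home_town = g.home_town
     AND s.gender = g.gender
    WHERE g.gender = 'M'
      AND g.num_students >= ?
    ORDER BY s.student_id
    """,
    (min_students,),
).fetchall()
```

Selecting g.group_key alongside s.student_id in the same join would give the GroupKey to OriginalKey mapping directly, without storing it separately.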