PostgreSQL Deleting Records Matching Primary Key - sql

We are getting data via a batch process from a third-party data provider. For the schema they provide, they issue delimited text files indicating records to delete, followed by a different text file of records to add.
My plan for removing the records is to load the records to delete into a temporary table, and then delete records through an inner join. For example:
-- main table
CREATE TABLE main_table (
city VARCHAR(40) NOT NULL,
state CHAR(2) NOT NULL,
PRIMARY KEY (city, state));
-- removing records
CREATE TEMPORARY TABLE delete_table (
city VARCHAR(40) NOT NULL,
state CHAR(2) NOT NULL,
PRIMARY KEY (city, state));
INSERT INTO main_table (city, state) VALUES
('Chicago', 'IL'),
('Seattle', 'WA'),
('New York', 'NY'),
('Springfield', 'IL'),
('Springfield', 'MA');
INSERT INTO delete_table (city, state) VALUES
('Chicago', 'IL'),
('New York', 'NY'),
('Springfield', 'MA');
-- Delete statement
DELETE FROM main_table t
USING delete_table d
WHERE t.city=d.city AND t.state=d.state;
Is this typically the best way to delete records in this case? The deletions can range from a few rows to a million, depending on the table and the day, across about one hundred tables. Additionally, any table with a composite primary key requires a customized WHERE clause. Is there a way to use a natural join, or a USING (column list) predicate instead?
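On the last point, one possible way to avoid spelling out a comparison per column is a row-value comparison, which PostgreSQL accepts in a WHERE clause. The sketch below reuses main_table and delete_table from above. Whether it performs as well as the join form on large batches is an assumption worth checking with EXPLAIN.

```sql
-- Row-value comparison: the key columns are listed once per side.
DELETE FROM main_table t
USING delete_table d
WHERE (t.city, t.state) = (d.city, d.state);

-- Alternative form with a subquery instead of USING.
-- Either statement on its own removes the matching rows.
DELETE FROM main_table
WHERE (city, state) IN (SELECT city, state FROM delete_table);
```

The column list still has to be written per table, but it is a plain list of key columns, so it is easier to generate from the catalog than a chain of AND conditions.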

Related

Is it possible to store a query in a variable and use that variable in Insert query? "#countrid =SELECT id FROM COUNTRIES WHERE description = 'asdf';"

I've been writing SQL migrations that insert data sequentially, from parent to child.
I've inserted data into the parent table. Now I have to store the primary key value of that specific row (identified by a WHERE condition such as "where description = '1234'") in a variable.
Then, while inserting data into the child table, I have to use that stored primary key value for the child table's foreign key column ("country_code_id").
I'm using PostgreSQL.
CREATE TABLE Countries
(
id SERIAL,
description VARCHAR(100),
CONSTRAINT coutry_pkey PRIMARY KEY (id)
);
CREATE TABLE Cities
(
country_code_id int ,
city_id int,
description VARCHAR(100),
CONSTRAINT cities_pkey PRIMARY KEY (city_id),
CONSTRAINT fk_cities_countries FOREIGN KEY (country_code_id) REFERENCES Countries (id)
);
INSERT INTO COUNTRIES (description) VALUES('asdf');
#countrid = SELECT id FROM COUNTRIES WHERE description = 'asdf';
INSERT INTO cities VALUES (countrid, 1 , 'abc');
SQL does not have variables. The normal way to do this is to use INSERT ... RETURNING:
INSERT INTO countries (description) VALUES ('1234')
RETURNING id;
This will return the automatically generated primary key. You store that in a variable on the client side and run a second statement:
INSERT INTO cities (country_code_id, city_id, description)
VALUES (4711, 1, 'abc');
where 4711 is the value returned from the first statement. To avoid hard-coding the value, you can use a prepared statement with a bind parameter, which can also save parsing work when the statement is executed repeatedly.
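A minimal sketch of the prepared-statement idea at the SQL level, assuming PostgreSQL's PREPARE and EXECUTE. The statement name insert_city is made up for illustration, and 4711 still stands for whatever id the first statement returned. In practice the client driver usually does this for you through bind parameters.

```sql
-- Declare the statement once, with typed placeholders.
PREPARE insert_city (int, int, varchar) AS
    INSERT INTO cities (country_code_id, city_id, description)
    VALUES ($1, $2, $3);

-- Run it with the id obtained from INSERT ... RETURNING.
EXECUTE insert_city(4711, 1, 'abc');

-- Prepared statements live until the session ends or they are deallocated.
DEALLOCATE insert_city;
```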
An alternative, more complicated, solution is to run both statements in a single statement using a common table expression:
WITH country_ids AS (
INSERT INTO countries (description) VALUES ('1234')
RETURNING id
)
INSERT INTO cities (country_code_id, city_id, description)
SELECT id, 1, 'abc'
FROM country_ids;
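If the parent row already exists, as in the question where the INSERT into COUNTRIES has already run, a scalar subquery can look the id up in place. This is only a sketch, and it assumes that description identifies exactly one country.

```sql
-- The subquery stands in for the "variable" from the question.
INSERT INTO cities (country_code_id, city_id, description)
VALUES (
    (SELECT id FROM countries WHERE description = 'asdf'),
    1,
    'abc'
);
```

If several countries share the description, the subquery raises an error. If none match, it yields NULL, which the nullable country_code_id column would accept silently.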

I have come across a foreign key error in SQL and cannot fix it. See description for more information

I have been receiving error code 1452. I am trying to add keys to a table to keep data unique and usable in other tables. I have created the tables and can use the information already entered, but I want to build the database properly, so I am trying to use keys. Please refer to the code below.
CREATE TABLE CUSTOMERS (
CustID varchar(50) NOT NULL,
Client_Name varchar(50) NOT NULL,
Client_Address varchar(80) NOT NULL,
PRIMARY KEY (CustID)
);
CREATE TABLE ORDERS (
Order_ID VARCHAR(10) NOT NULL,
Client_NameID varchar(50) NOT NULL,
Dates varchar(10) NOT NULL,
PRIMARY KEY (Order_ID),
FOREIGN KEY (Client_NameID) REFERENCES CUSTOMERS(CustID)
);
SELECT * FROM CUSTOMERS;
SELECT * FROM ORDERS;
DESCRIBE Orders; /*Used to display the Table*/
ALTER TABLE ORDERS ADD Dates VARCHAR(10); /*Used to add columns into the table*/
ALTER TABLE ORDERS DROP COLUMN Date; /*Used to remove column from the table*/
INSERT INTO CUSTOMERS (CustID, Client_Name, Client_Address) VALUES
('168', 'Coventry Building Services', 'Units 2-4, Binley Industrial Estate, CV3 2WL'), /*Used to insert values into the columns*/
('527', 'Allied Construction LTD', '34, Lythalls La Industrial Estate, NG18 5AH'),
('169', 'Ricoh Builds Ltd', 'Unit 12, Stoneleigh Park, CV8 2UV'),
('32', 'British Embassy in Tehran', '198 Ferdowski Avenue Tehran 11316-91144 Iran');
INSERT INTO ORDERS (Order_ID, Client_NameID, Dates) VALUES
('CON-2237', 'Coventry Building Services', '2014-12-14'),
('CON-3664', 'Allied Construction LTD', '2015-01-16'),
('CON-2356', 'Ricoh Builds Ltd', '2015-02-12'),
('CON-1234', 'British Embassy in Tehran', '2015-04-16');
DELETE FROM ORDERS WHERE Client_Name='Coventry Building Services'; /*Used to delete specific
data from the specific row and column wherever applicable*/
DROP TABLE CUSTOMERS;
DROP TABLE ORDERS;
Below are the tables I'm trying to work with; pretty much all of them will have a key that links them together if necessary.
The CUSTOMERS table, which only includes a primary key
The ORDERS table, which includes a primary and a foreign key
The problem is with the inserts into table ORDERS. Your foreign key on Client_NameID references CUSTOMERS(CustID), but you are supplying values from CUSTOMERS(Client_Name) instead.
You probably want:
INSERT INTO ORDERS (Order_ID, Client_NameID, Dates) VALUES
('CON-2237', '168', '2014-12-14'),
('CON-3664', '527', '2015-01-16'),
('CON-2356', '169', '2015-02-12'),
('CON-1234', '32', '2015-04-16');
Notes:
You can perform all inserts in a single query by passing several tuples of values, as shown above.
Don't store dates as strings; instead, use the date datatype, which exists for that purpose. The values above are written as proper date literals, which would fit in a date column.
It is unclear why you want string values for the primary keys of customers and orders; to me, this makes things harder to follow. I would recommend just using auto-incremented primary keys.
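A sketch of what the last two notes could look like in MySQL (error code 1452 is MySQL's foreign key failure). The column names CustID in ORDERS and Order_Date are illustrative choices, not part of the original schema.

```sql
CREATE TABLE CUSTOMERS (
    CustID INT NOT NULL AUTO_INCREMENT,   -- generated key instead of a string
    Client_Name VARCHAR(50) NOT NULL,
    Client_Address VARCHAR(80) NOT NULL,
    PRIMARY KEY (CustID)
);

CREATE TABLE ORDERS (
    Order_ID VARCHAR(10) NOT NULL,
    CustID INT NOT NULL,                  -- holds the customer's key, not the name
    Order_Date DATE NOT NULL,             -- a real date instead of VARCHAR(10)
    PRIMARY KEY (Order_ID),
    FOREIGN KEY (CustID) REFERENCES CUSTOMERS (CustID)
);

INSERT INTO CUSTOMERS (Client_Name, Client_Address) VALUES
    ('Coventry Building Services', 'Units 2-4, Binley Industrial Estate, CV3 2WL');

-- Look the generated key up by name rather than hard-coding it.
INSERT INTO ORDERS (Order_ID, CustID, Order_Date)
SELECT 'CON-2237', CustID, '2014-12-14'
FROM CUSTOMERS
WHERE Client_Name = 'Coventry Building Services';
```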

Copying two columns from a foreign table into another table

INSERT INTO Confirmed (TotalDeaths, Population)
SELECT TotalDeaths, Population
FROM Deaths
WHERE UID IS NOT NULL;
I want to copy the values of the TotalDeaths and Population columns from the Deaths table to the Confirmed table (same column names; both tables have a UID primary key).
Failed to execute query. Error:
Cannot insert the value NULL into column 'UID', table 'dbo.Confirmed'; column does not allow nulls. INSERT fails. The statement has been terminated.
I keep running into errors because the primary key does not allow nulls. I'm not sure where the null keys are even coming from, since UID shouldn't be null to begin with.
Both tables have very similar columns, but in this case all that needs to be mentioned is that there are three columns in both tables that are crucial, which are: UID int NOT NULL, TotalDeaths int NOT NULL, Population int NOT NULL.
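It seems UID is a NOT NULL column, and the INSERT does not supply it, so each new row gets a NULL UID. Include UID in the column list, as in the query below: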
INSERT INTO Confirmed (UID, TotalDeaths, Population)
SELECT UID, TotalDeaths, Population
FROM Deaths
WHERE UID IS NOT NULL;
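If the matching rows already exist in Confirmed and only the two columns need to be filled in, an UPDATE with a join may be closer to the intent than an INSERT. This is a sketch that assumes SQL Server (the dbo.Confirmed in the error message suggests it) and that UID identifies the same entity in both tables.

```sql
-- Copy the two columns across for rows that share a UID.
UPDATE c
SET c.TotalDeaths = d.TotalDeaths,
    c.Population  = d.Population
FROM Confirmed AS c
JOIN Deaths AS d
    ON d.UID = c.UID;
```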

How can I insert a row that references another postgres table via foreign key, and creates the foreign row too if it doesn't exist?

In Postgres, is there a way to atomically insert a row into a table, where one column references another table, such that we look up whether the desired row exists in the referenced table and insert it as well if it does not?
For example, say we have a US states table and a cities table which references the states table:
CREATE TABLE states (
state_id serial primary key,
name text
);
CREATE TABLE cities (
city_id serial,
name text,
state_id int references states(state_id)
);
When I want to add the city of Austin, Texas, I want to be able to see whether Texas exists in the states table, and if so use its state_id in the new row I'm inserting in the cities table. If Texas doesn't exist in the states table, I want to create it and then use its id in the cities table.
I tried this query, but I got an error saying
ERROR: WITH clause containing a data-modifying statement must be at the top level
LINE 2: WITH inserted AS (
^
WITH state_id AS (
WITH inserted AS (
INSERT INTO states(name)
VALUES ('Texas')
ON CONFLICT DO NOTHING
RETURNING state_id),
already_there AS (
SELECT state_id FROM states
WHERE name='Texas')
SELECT * FROM inserted
UNION
SELECT * FROM already_there)
INSERT INTO cities(name, state_id)
VALUES
('Austin', (SELECT state_id FROM state_id));
Am I overlooking a simple solution?
Here is one option:
with inserted as (
insert into states(name) values ('Texas')
on conflict do nothing
returning state_id
)
insert into cities(name, state_id)
values (
'Dallas',
coalesce(
(select state_id from inserted),
(select state_id from states where name = 'Texas')
)
);
The idea is to attempt to insert in a CTE, and then, in the main insert, check if a value was inserted, else select it.
For this to work properly, you need a unique constraint on states(name):
create table states (
state_id serial primary key,
name text unique
);
You can force the insert statement to return a value:
WITH inserted AS (
INSERT INTO states (name)
VALUES ('Texas')
ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.NAME
RETURNING state_id
)
. . .
The DO UPDATE SET forces the INSERT to return something.
I notice that you don't have a unique constraint, so you also need that:
ALTER TABLE states ADD CONSTRAINT unq_state_name
UNIQUE (name);
Otherwise the ON CONFLICT doesn't have anything to work with.
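Putting the pieces of this second approach together, the full statement might look like the sketch below. It assumes the unq_state_name constraint above is in place. The final INSERT ... SELECT is one way to complete the elided part, not necessarily what the answer intended.

```sql
WITH inserted AS (
    INSERT INTO states (name)
    VALUES ('Texas')
    -- The no-op update makes RETURNING yield a row even when Texas exists.
    ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name
    RETURNING state_id
)
INSERT INTO cities (name, state_id)
SELECT 'Austin', state_id
FROM inserted;
```

One trade-off of this variant: DO UPDATE rewrites the existing states row on every call, even though the value does not change.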

Database Normalization using Foreign Key

I have a sample table like below where Course Completion Status of a Student is being stored:
Create Table StudentCourseCompletionStatus
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
AlgorithmCourseStatus nvarchar(30),
DatabaseCourseStatus nvarchar(30),
NetworkingCourseStatus nvarchar(30),
MathematicsCourseStatus nvarchar(30),
ProgrammingCourseStatus nvarchar(30)
)
Insert into StudentCourseCompletionStatus Values (1, 'In Progress', 'In Progress', 'Not Started', 'Completed', 'Completed')
Insert into StudentCourseCompletionStatus Values (2, 'Not Started', 'In Progress', 'Not Started', 'Not Applicable', 'Completed')
Now as part of normalizing the schema I have created two other tables - CourseStatusType and Status for storing the Course Status names and Status.
Create Table CourseStatusType
(
CourseStatusTypeID int primary key identity(1,1),
CourseStatusType nvarchar(100) not null
)
Insert into CourseStatusType Values ('AlgorithmCourseStatus')
Insert into CourseStatusType Values ('DatabaseCourseStatus')
Insert into CourseStatusType Values ('NetworkingCourseStatus')
Insert into CourseStatusType Values ('MathematicsCourseStatus')
Insert into CourseStatusType Values ('ProgrammingCourseStatus')
Insert into CourseStatusType Values ('OperatingSystemsCourseStatus')
Insert into CourseStatusType Values ('CompilerCourseStatus')
Create Table Status
(
StatusID int primary key identity(1,1),
StatusName nvarchar (100) not null
)
Insert into Status Values ('Completed')
Insert into Status Values ('Not Started')
Insert into Status Values ('In Progress')
Insert into Status Values ('Not Applicable')
The modified table is as below:
Create Table StudentCourseCompletionStatus1
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
CourseStatusTypeID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_CourseStatusType] FOREIGN KEY (CourseStatusTypeID) REFERENCES dbo.CourseStatusType (CourseStatusTypeID),
StatusID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_Status] FOREIGN KEY (StatusID) REFERENCES Status (StatusID),
)
I have a few questions about this:
Is this the correct way to normalize it? The old table made it easy to get the data: I could store a student's course status in a single row, but now 5 rows are required. Is there a better way to do it?
Moving the data from the old table to this new table does not seem to be an easy task. Can I achieve this with a query, or do I have to do it manually?
Any help is appreciated.
You could also consider storing the results in a flat table like this:
studentID,courseID,status
1,1,"completed"
1,2,"not started"
2,1,"not started"
2,3,"in progress"
You will also need an additional courses table like this:
courseID,courseName
1, math
2, programming
3, networking
and a students table:
studentID,name
1,"john smith"
2,"perry clam"
3,"john deere"
etc. You could also optionally create a status table to store the distinct status strings and refer to their PK instead of the strings:
studentID,courseID,status
1,1,1
1,2,2
2,1,2
2,3,3
... etc
and status table
id,status
1,"completed"
2,"not started"
3,"in progress"
The beauty of this representation is that it is quite easy to filter and aggregate data. For example, it is easy to query which subjects a particular person has completed, how many subjects the average student has completed, and so on. These things are much more difficult in a columnar design like the one you had. You can also easily add new subjects without needing to adapt your tables or even your queries; they will just work.
You can also always use SQL's PIVOT to get back to the familiar columnar presentation, like
name,mathstatus,programmingstatus,networkingstatus,etc..
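A sketch of both kinds of query against the first flat layout above (status stored as text). The table names student_course_status and courses are hypothetical, since the answer does not name its tables. The second query uses conditional aggregation rather than the PIVOT operator, because it works the same way across most SQL dialects.

```sql
-- Which courses has student 1 completed?
SELECT c.courseName
FROM student_course_status AS s
JOIN courses AS c ON c.courseID = s.courseID
WHERE s.studentID = 1
  AND s.status = 'completed';

-- Back to one row per student with one column per course.
-- A course the student has no row for comes out as NULL.
SELECT s.studentID,
       MAX(CASE WHEN c.courseName = 'math'        THEN s.status END) AS mathstatus,
       MAX(CASE WHEN c.courseName = 'programming' THEN s.status END) AS programmingstatus,
       MAX(CASE WHEN c.courseName = 'networking'  THEN s.status END) AS networkingstatus
FROM student_course_status AS s
JOIN courses AS c ON c.courseID = s.courseID
GROUP BY s.studentID;
```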
"but now 5 rows are required"
No, it's still just one row. That row simply contains identifiers for values stored in other tables.
There are pros and cons to this. One of the main reasons to normalize in this way is to protect the integrity of the data. If a column is just a string then anything can be stored there. But if there's a foreign key relationship to a table containing a finite set of values then only one of those options can be stored there. Additionally, if you ever want to change the text of an option or add/remove options, you do it in a centralized place.
"Moving the data from the old table to this new table seems to be not an easy task."
No problem at all. Create your new numeric columns on the data table and populate them with the identifiers of the lookup table records associated with each data table record. If they're nullable, you can make them foreign keys right away. If they're not nullable then you need to populate them before you can make them foreign keys. Once you've verified that the data is correct, remove the old de-normalized columns. Done.
In StudentCourseCompletionStatus1 you still need two associations, one to Status and one to CourseStatusType. So I think you should consider the following variant of normalization:
Your StudentCourseCompletionStatus would hold only one CourseStatusID, and another table, CourseStatus, would hold the associations to CourseType and Status.
To move your data, you can certainly use a query.
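As a sketch of such a query, written for SQL Server and for the question's own StudentCourseCompletionStatus1 layout rather than the variant just described: each old row is unpivoted into one (type name, status name) pair per status column, and each name is then mapped to its identifier in the lookup tables. It assumes the text in the old columns matches the StatusName values exactly.

```sql
INSERT INTO StudentCourseCompletionStatus1 (StudentID, CourseStatusTypeID, StatusID)
SELECT old.StudentID, t.CourseStatusTypeID, s.StatusID
FROM StudentCourseCompletionStatus AS old
-- One output row per status column of the old table.
CROSS APPLY (VALUES
    (N'AlgorithmCourseStatus',   old.AlgorithmCourseStatus),
    (N'DatabaseCourseStatus',    old.DatabaseCourseStatus),
    (N'NetworkingCourseStatus',  old.NetworkingCourseStatus),
    (N'MathematicsCourseStatus', old.MathematicsCourseStatus),
    (N'ProgrammingCourseStatus', old.ProgrammingCourseStatus)
) AS v (CourseStatusType, StatusName)
JOIN CourseStatusType AS t ON t.CourseStatusType = v.CourseStatusType
JOIN Status AS s ON s.StatusName = v.StatusName;
```

Old columns that are NULL or hold a value missing from Status drop out at the inner join, so it is worth comparing row counts afterwards.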