How do I insert data from multiple tables into one table - sql

This is the question:
Using the SQL CREATE TABLE statement, create a table, MOVSTARDIR, with attributes for the movie number, star number, and director number and the 4 acting awards. The primary key is the movie number, star number and director number (all 3), with referential integrity enforced. The director number is the director for that movie, and the star must have appeared in that movie.
Load MOVSTARDIR (from existing tables) using INSERT INTO.
This is my current solution:
CREATE TABLE MOVSTARDIR
(
MVNUM SHORT NOT NULL,
STARNUM SHORT NOT NULL,
DIRNUM SHORT NOT NULL,
BESTF TEXT,
BESTM TEXT,
SUPM TEXT,
SUPF TEXT
);
ALTER TABLE MOVSTARDIR ADD CONSTRAINT PrimeKey PRIMARY KEY(MVNUM, STARNUM, DIRNUM)
INSERT INTO MOVSTARDIR
SELECT MOVIE.MVNUM, STAR.STARNUM, DIRECTOR.DIRNUM, BESTF, BESTM, SUPF, SUPM
FROM MOVIE, STAR, DIRECTOR, MOVSTAR, MOVDIR
WHERE MOVSTAR.MVNUM = MOVIE.MVNUM
AND MOVDIR.MVNUM = MOVSTAR.MVNUM
AND MOVDIR.DIRNUM = DIRECTOR.DIRNUM
My issue is that the created table is still blank. How do I fill it up with the required data?

I would try this:
CREATE TABLE MOVSTARDIR AS
(SELECT MOVIE.MVNUM, STAR.STARNUM, DIRECTOR.DIRNUM, BESTF, BESTM, SUPF, SUPM
FROM MOVIE, STAR, DIRECTOR, MOVSTAR, MOVDIR
WHERE MOVSTAR.MVNUM = MOVIE.MVNUM
AND MOVDIR.MVNUM = MOVSTAR.MVNUM
AND MOVDIR.DIRNUM = DIRECTOR.DIRNUM)

Step 1 : Check if the select returns any data
For testing you don't need it to return the complete data.
Just include a WHERE clause to test for a smaller resultset.
Step 2 : Make sure that the expected result of that select is correct.
Step 3.1 : Do the insert from the select into the new table.
Step 3.2 : If autocommit is turned on for your session and there was no error,
those records should be in the new table.
If autocommit is off, you still need to COMMIT your changes.
Remark:
Your query still uses the older comma-separated style to join tables.
It is better to use the standard explicit JOIN syntax.
It is more readable,
and it makes it more obvious which join conditions are missing.
You don't want a Cartesian join by accident.
For example:
SELECT
MOVIE.MVNUM, STAR.STARNUM, DIRECTOR.DIRNUM, BESTF, BESTM, SUPF, SUPM
FROM MOVIE
JOIN MOVSTAR ON MOVSTAR.MVNUM = MOVIE.MVNUM
LEFT JOIN STAR ON STAR.STARNUM = MOVSTAR.STARNUM
LEFT JOIN MOVDIR ON MOVDIR.MVNUM = MOVSTAR.MVNUM
LEFT JOIN DIRECTOR ON DIRECTOR.DIRNUM = MOVDIR.DIRNUM
Please notice that a join on STAR was included.
Without knowing the tables, that's of course just an assumption.
But you get the idea.
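
The steps above (check the SELECT, insert from it, commit) can be sketched end to end as follows. This is only a minimal sketch: it uses Python's built-in sqlite3 module rather than the asker's database, the sample rows are invented, and the award columns are left out because the question does not say which table holds them. Inner joins are used instead of LEFT JOIN because all three key columns of MOVSTARDIR are NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema and rows; only the key columns are modelled.
conn.executescript("""
    CREATE TABLE MOVIE    (MVNUM INTEGER PRIMARY KEY);
    CREATE TABLE STAR     (STARNUM INTEGER PRIMARY KEY);
    CREATE TABLE DIRECTOR (DIRNUM INTEGER PRIMARY KEY);
    CREATE TABLE MOVSTAR  (MVNUM INTEGER, STARNUM INTEGER);
    CREATE TABLE MOVDIR   (MVNUM INTEGER, DIRNUM INTEGER);
    CREATE TABLE MOVSTARDIR (
        MVNUM   INTEGER NOT NULL,
        STARNUM INTEGER NOT NULL,
        DIRNUM  INTEGER NOT NULL,
        PRIMARY KEY (MVNUM, STARNUM, DIRNUM)
    );
    INSERT INTO MOVIE VALUES (1), (2);
    INSERT INTO STAR VALUES (10), (11);
    INSERT INTO DIRECTOR VALUES (100);
    INSERT INTO MOVSTAR VALUES (1, 10), (1, 11), (2, 10);
    INSERT INTO MOVDIR VALUES (1, 100), (2, 100);
""")

# Every table in the FROM list has a join condition, so no Cartesian product.
select_sql = """
    SELECT MOVSTAR.MVNUM, MOVSTAR.STARNUM, MOVDIR.DIRNUM
    FROM MOVIE
    JOIN MOVSTAR  ON MOVSTAR.MVNUM = MOVIE.MVNUM
    JOIN STAR     ON STAR.STARNUM = MOVSTAR.STARNUM
    JOIN MOVDIR   ON MOVDIR.MVNUM = MOVIE.MVNUM
    JOIN DIRECTOR ON DIRECTOR.DIRNUM = MOVDIR.DIRNUM
"""

# Steps 1 and 2: run the SELECT on its own and inspect what it returns.
preview = conn.execute(select_sql).fetchall()

# Step 3.1: insert from the same SELECT, naming the target columns.
conn.execute("INSERT INTO MOVSTARDIR (MVNUM, STARNUM, DIRNUM) " + select_sql)
# Step 3.2: sqlite3 opens a transaction for the INSERT, so commit explicitly.
conn.commit()

loaded = conn.execute(
    "SELECT MVNUM, STARNUM, DIRNUM FROM MOVSTARDIR ORDER BY 1, 2, 3"
).fetchall()
print(loaded)
```

Naming the target columns in the INSERT also guards against the SELECT list and the table definition listing columns in a different order.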

Related

Single columns from several rows into several columns in one record, but allow NULL in the rows

I'm trying to combine single rows from multiple records into several columns in one record.
Say we have a database of people and they've all chosen 2 numbers. But, some people have only chosen 1 number and haven't submitted their 2nd number.
This is a simplified example; in my actual production database, it's scaled up to several of these 'numbers'.
From this example, person 3 hasn't chosen their 2nd number yet.
I tried this query:
SELECT ppl.*,
cn1.chosen_num AS first_num,
cn2.chosen_num AS second_num
FROM people AS ppl
LEFT JOIN
chosenNumbers AS cn1
LEFT JOIN
chosenNumbers AS cn2
WHERE ppl.numid = cn1.personid
AND ppl.numid = cn2.personid
AND cn1.type = 'first'
AND cn2.type = 'second'
But it doesn't return any information on Person3, since they haven't chosen their second number yet. However, I want data on EVERYONE involved in this number guessing, but I want to be able to see the first and second numbers they've guessed.
Here is a dump of the sample database where I'm testing this.
BEGIN TRANSACTION;
CREATE TABLE people (numid INTEGER PRIMARY KEY, name TEXT);
INSERT INTO people VALUES(1,'Person1');
INSERT INTO people VALUES(2,'Person2');
INSERT INTO people VALUES(3,'Person3');
CREATE TABLE chosenNumbers (numid INTEGER PRIMARY KEY, chosen_num INTEGER, type TEXT, personid INTEGER, FOREIGN KEY(personid) REFERENCES people(numid));
INSERT INTO chosenNumbers VALUES(1,101,'first',1);
INSERT INTO chosenNumbers VALUES(2,102,'second',1);
INSERT INTO chosenNumbers VALUES(3,201,'first',2);
INSERT INTO chosenNumbers VALUES(4,202,'second',2);
-- Person 3 hasn't chosen their 2nd number yet..
-- But I want data on them, and the query above
-- doesn't work.
INSERT INTO chosenNumbers VALUES(5,301,'first',3);
COMMIT;
I'd also appreciate being told how I could scale this up to say, 3 numbers, or 4 numbers, or even more than that.
You can use conditional aggregation:
SELECT p.*,
MAX(CASE WHEN c.type = 'first' THEN c.chosen_num END) AS first_num,
MAX(CASE WHEN c.type = 'second' THEN c.chosen_num END) AS second_num
FROM people AS p LEFT JOIN chosenNumbers AS c
ON p.numid = c.personid
GROUP BY p.numid;
You can expand the code for more columns.
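
As a rough illustration of the conditional aggregation above, here it is run against the sample dump from the question using Python's sqlite3 module. The 'third' type is hypothetical and is only there to show how the column list scales; the list of types is a fixed, trusted list, which is why it is safe to build the SQL text from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (numid INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO people VALUES (1, 'Person1');
    INSERT INTO people VALUES (2, 'Person2');
    INSERT INTO people VALUES (3, 'Person3');
    CREATE TABLE chosenNumbers (
        numid INTEGER PRIMARY KEY, chosen_num INTEGER, type TEXT,
        personid INTEGER, FOREIGN KEY(personid) REFERENCES people(numid));
    INSERT INTO chosenNumbers VALUES (1, 101, 'first', 1);
    INSERT INTO chosenNumbers VALUES (2, 102, 'second', 1);
    INSERT INTO chosenNumbers VALUES (3, 201, 'first', 2);
    INSERT INTO chosenNumbers VALUES (4, 202, 'second', 2);
    INSERT INTO chosenNumbers VALUES (5, 301, 'first', 3);
""")

# One MAX(CASE ...) column per type; 'third' is a hypothetical extra type.
types = ["first", "second", "third"]
columns = ",\n           ".join(
    f"MAX(CASE WHEN c.type = '{t}' THEN c.chosen_num END) AS {t}_num"
    for t in types
)

query = f"""
    SELECT p.numid, p.name,
           {columns}
    FROM people AS p
    LEFT JOIN chosenNumbers AS c ON p.numid = c.personid
    GROUP BY p.numid, p.name
    ORDER BY p.numid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Person3 still appears because the LEFT JOIN keeps every row of people, and the missing numbers come back as NULL (None in Python).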

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn't appear to be any obvious relationship between city and person, which will make your life hard.
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn't really anything to join on, because your one-column tables have no column in common. Add, for example, a cityname column to person (because it seems more likely that one city has many persons); then you can do:
INSERT INTO employee(ename, ecity)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the two tables are already related to each other and don't need the third table, so it's perhaps something of an academic exercise only, not something you'd do in the real world.
If you just want to mix every person with every city you can do:
INSERT INTO employee(ename, ecity)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned: two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities give 800 rows, which is fairly useless imho).
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.
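
To make the "write the SELECT first, then put INSERT INTO on top" pattern concrete, here is a small sketch. It runs on SQLite through Python's sqlite3 module rather than on PostgreSQL, which is an assumption made purely so the example is self-contained; the INSERT ... SELECT statement itself is standard SQL. The rows 'personTwo' and 'cityTwo' are invented to show that the WHERE clause picks exactly one pair out of the cross join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE person (pname VARCHAR(255) NOT NULL PRIMARY KEY);
    CREATE TABLE city   (cname VARCHAR(255) NOT NULL PRIMARY KEY);
    CREATE TABLE employee (
        ename VARCHAR(255) NOT NULL REFERENCES person(pname),
        ecity VARCHAR(255) NOT NULL REFERENCES city(cname),
        PRIMARY KEY (ename, ecity)
    );
    INSERT INTO person VALUES ('personOne'), ('personTwo');
    INSERT INTO city   VALUES ('cityOne'), ('cityTwo');
""")

# The SELECT yields one (person, city) pair; INSERT INTO sits on top of it.
conn.execute(
    """
    INSERT INTO employee (ename, ecity)
    SELECT p.pname, c.cname
    FROM person AS p
    CROSS JOIN city AS c
    WHERE p.pname = ? AND c.cname = ?
    """,
    ("personOne", "cityOne"),
)
conn.commit()

employees = conn.execute("SELECT ename, ecity FROM employee").fetchall()
print(employees)
```

If either name does not exist in its table, the SELECT returns no rows and nothing is inserted, which is the one practical difference from inserting the two literals directly.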

SQL Queries instead of Cursors

I'm creating a database for a hypothetical video rental store.
All I need to do is write a procedure that checks the availability of a specific movie (obviously the movie can have several copies). So I have to check if there is a copy available for rent, and take the number of that copy (because it'll affect another trigger later).
I already did everything with the cursors and it works very well actually, but I need (i.e. "must") to do it without using cursors but just using "pure sql" (i.e. queries).
I'll explain briefly the scheme of my DB:
The tables that this procedure is going to use are 3: 'Copia Film' (Movie Copy) , 'Include' (Includes) , 'Noleggio' (Rent).
Copia Film Table has these attributes:
idCopia
Genere (FK references to Film)
Titolo (FK references to Film)
dataUscita (FK references to Film)
Include Table:
idNoleggio (FK references to Noleggio. Means idRent)
idCopia (FK references to Copia film. Means idCopy)
Noleggio Table:
idNoleggio (PK)
dataNoleggio (dateOfRent)
dataRestituzione (dateReturn)
dataRestituito (dateReturned)
CF (FK to Person)
Prezzo (price)
Every movie can have more than one copy.
Every copy can be available in two cases:
The copy ID is not present in the Include Table (that means that the specific copy has never been rented)
The copy ID is present in the Include Table and the dataRestituito (dateReturned) is not null (that means that the specific copy has been rented but has already been returned)
The query I've tried to do is the following and is not working at all:
SELECT COUNT(*)
FROM NOLEGGIO
WHERE dataNoleggio IS NOT NULL AND dataRestituito IS NOT NULL AND idNoleggio IN (
SELECT N.idNoleggio
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio=I.idNoleggio
WHERE idCopia IN (
SELECT idCopia
FROM COPIA_FILM
WHERE titolo='Pulp Fiction')) -- Of course the title is just an example
Well, from the query above I can't figure if a copy of the movie selected is available or not AND I can't take the copy ID if a copy of the movie were available.
(If you want, I can paste the cursors lines that work properly)
------ USING THE 'WITH SOLUTION' ----
I modified your code slightly, to this:
WITH film
as
(
SELECT idCopia,titolo
FROM COPIA_FILM
WHERE titolo = 'Pulp Fiction'
),
copy_info as
(
SELECT N.idNoleggio, N.dataNoleggio, N.dataRestituito, I.idCopia
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio = I.idNoleggio
),
avl as
(
SELECT film.titolo, copy_info.idNoleggio, copy_info.dataNoleggio,
copy_info.dataRestituito, film.idCopia
FROM film LEFT OUTER JOIN copy_info
ON film.idCopia = copy_info.idCopia
)
SELECT COUNT(*),idCopia FROM avl
WHERE(dataRestituito IS NOT NULL OR idNoleggio IS NULL)
GROUP BY idCopia
As I said in the comment, this code works properly if I use it just in a query, but once I try to make a procedure from this, I got errors.
The problem is the final SELECT:
SELECT COUNT(*), idCopia INTO CNT,COPYFILM
FROM avl
WHERE (dataRestituito IS NOT NULL OR idNoleggio IS NULL)
GROUP BY idCopia
The error is:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "VIDEO.PR_AVAILABILITY", line 9.
So it seems the Into clause is wrong because obviously the query returns more rows. What can I do ? I need to take the Copy ID (even just the first one on the list of rows) without using cursors.
You can try this -
WITH film
as
(
SELECT idCopia, titolo
FROM COPIA_FILM
WHERE titolo='Pulp Fiction'
),
copy_info as
(
select N.idNoleggio, N.dataNoleggio, N.dataRestituito, I.idCopia
FROM NOLEGGIO N JOIN INCLUDE I ON N.idNoleggio=I.idNoleggio
),
avl as
(
select film.titolo, copy_info.idNoleggio, copy_info.dataNoleggio,
copy_info.dataRestituito
from film LEFT OUTER JOIN copy_info
ON film.idCopia = copy_info.idCopia
)
select * from avl
where (dataRestituito IS NOT NULL OR idNoleggio IS NULL);
You should think in terms of sets, rather than records.
If you find the set of all the films that are out, you can exclude them from your stock, and the rest is rentable.
select copiafilm.* from #f copiafilm
left join
(
select idCopia from #r Noleggio
inner join #i include on Noleggio.idNoleggio = include.idNoleggio
where dateRestituito is null
) out
on copiafilm.idCopia = out.idCopia
where out.idCopia is null
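
A minimal sketch of this set-based approach, run on SQLite through Python's sqlite3 module with invented sample rows (the real schema has more columns, and the original database is Oracle, so treat this as an illustration only). It also shows one way to get exactly one row back: an aggregate query without GROUP BY always returns a single row, so COUNT(*) together with MIN(idCopia) gives both the number of available copies and one copy ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: copy 1 was rented and returned, copy 2 is still out,
# copy 3 was never rented, copy 4 belongs to another film.
conn.executescript("""
    CREATE TABLE COPIA_FILM (idCopia INTEGER PRIMARY KEY, titolo TEXT);
    CREATE TABLE NOLEGGIO (idNoleggio INTEGER PRIMARY KEY,
                           dataNoleggio TEXT, dataRestituito TEXT);
    CREATE TABLE INCLUDE (idNoleggio INTEGER, idCopia INTEGER);
    INSERT INTO COPIA_FILM VALUES
        (1, 'Pulp Fiction'), (2, 'Pulp Fiction'),
        (3, 'Pulp Fiction'), (4, 'Another Film');
    INSERT INTO NOLEGGIO VALUES
        (10, '2020-01-01', '2020-01-05'),
        (11, '2020-02-01', NULL);
    INSERT INTO INCLUDE VALUES (10, 1), (11, 2);
""")

# Anti-join: find the copies currently out, keep the copies with no match.
available_from = """
    FROM COPIA_FILM AS cf
    LEFT JOIN (
        SELECT i.idCopia
        FROM NOLEGGIO AS n
        JOIN INCLUDE AS i ON i.idNoleggio = n.idNoleggio
        WHERE n.dataRestituito IS NULL
    ) AS rented_out ON rented_out.idCopia = cf.idCopia
    WHERE cf.titolo = ? AND rented_out.idCopia IS NULL
"""

available = [
    row[0]
    for row in conn.execute(
        "SELECT cf.idCopia" + available_from + "ORDER BY cf.idCopia",
        ("Pulp Fiction",),
    )
]

# No GROUP BY here, so exactly one row comes back even when nothing matches.
summary = conn.execute(
    "SELECT COUNT(*), MIN(cf.idCopia)" + available_from, ("Pulp Fiction",)
).fetchone()
none_found = conn.execute(
    "SELECT COUNT(*), MIN(cf.idCopia)" + available_from, ("No Such Film",)
).fetchone()
print(available, summary, none_found)
```

Because the aggregate form always yields one row, a SELECT ... INTO built on it would see neither too many rows nor no rows; a count of 0 signals that no copy is available.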
I solved the problem editing the last query into this one:
SELECT COUNT(*),idCopia INTO CNT,idCopiaFilm
FROM avl
WHERE (dataRestituito IS NOT NULL OR idNoleggio IS NULL) AND rownum = 1
GROUP BY idCopia;
IF CNT > 0 THEN
-- FOUND AVAILABLE COPY
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
-- NOT FOUND AVAILABLE COPY
Thank you @Aditya Kakirde! Your suggestion almost solved the problem.

in SQL, best way to join first and last instance of child table without NOT EXISTS?

in PostgreSQL, have issue table and child issue_step table - an issue contains one or more steps.
the view issue_v pulls things from the issue and the first and last step: author and from_ts are pulled from the first step, while status and thru_ts are pulled from the last step.
the tables
create table if not exists seeplai.issue(
isu_id serial primary key,
subject varchar(240)
);
create table if not exists seeplai.issue_step(
stp_id serial primary key,
isu_id int not null references seeplai.issue on delete cascade,
status varchar(12) default 'open',
stp_ts timestamp(0) default current_timestamp,
author varchar(40),
notes text
);
the view
create view seeplai.issue_v as
select isu.*,
first.stp_ts as from_ts,
first.author as author,
first.notes as notes,
last.stp_ts as thru_ts,
last.status as status
from seeplai.issue isu
join seeplai.issue_step first on( first.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id<first.stp_id ) )
join seeplai.issue_step last on( last.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id>last.stp_id ) );
note1: issue_step.stp_id is guaranteed to be chronologically sequential, so using it instead of stp_ts because it's already indexed
this works, but ugly as sin, and cannot be the most efficient query in the world.
In this code, I use a sub-query to find the first and last step IDs, and then join to the two instances of the step table by using those found values.
SELECT ISU.*
,S1.STP_TS AS FROM_TS
,S1.AUTHOR AS AUTHOR
,S1.NOTES AS NOTES
,S2.STP_TS AS THRU_TS
,S2.STATUS AS STATUS
FROM SEEPLAI.ISSUE ISU
INNER JOIN
(
SELECT ISU_ID
,MIN(STP_ID) AS MIN_ID
,MAX(STP_ID) AS MAX_ID
FROM SEEPLAI.ISSUE_STEP
GROUP BY
ISU_ID
) SQ
ON SQ.ISU_ID = ISU.ISU_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S1
ON S1.STP_ID = SQ.MIN_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S2
ON S2.STP_ID = SQ.MAX_ID
Note: you really shouldn't be using a select * in a view. It is much better practice to explicitly list all the fields that you need in the view.
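
Here is a runnable sketch of the MIN/MAX sub-query approach with an explicit column list. It uses SQLite through Python's sqlite3 module instead of PostgreSQL, drops the seeplai schema prefix, and invents three sample steps; all of that is assumed for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: issue 1 has two steps, issue 2 has one.
conn.executescript("""
    CREATE TABLE issue (isu_id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE issue_step (
        stp_id INTEGER PRIMARY KEY,
        isu_id INTEGER NOT NULL REFERENCES issue(isu_id),
        status TEXT DEFAULT 'open',
        stp_ts TEXT,
        author TEXT,
        notes  TEXT
    );
    INSERT INTO issue VALUES (1, 'printer jam'), (2, 'login fails');
    INSERT INTO issue_step VALUES
        (1, 1, 'open',   '2024-01-01 09:00:00', 'ann', 'reported'),
        (2, 2, 'open',   '2024-01-01 10:00:00', 'bob', 'reported'),
        (3, 1, 'closed', '2024-01-02 09:00:00', 'cid', 'fixed');
""")

rows = conn.execute("""
    SELECT isu.isu_id, isu.subject,
           s1.stp_ts AS from_ts, s1.author, s1.notes,   -- from the first step
           s2.stp_ts AS thru_ts, s2.status              -- from the last step
    FROM issue AS isu
    JOIN (
        SELECT isu_id, MIN(stp_id) AS min_id, MAX(stp_id) AS max_id
        FROM issue_step
        GROUP BY isu_id
    ) AS sq ON sq.isu_id = isu.isu_id
    JOIN issue_step AS s1 ON s1.stp_id = sq.min_id
    JOIN issue_step AS s2 ON s2.stp_id = sq.max_id
    ORDER BY isu.isu_id
""").fetchall()
for row in rows:
    print(row)
```

An issue with a single step joins to that same step twice, so from_ts and thru_ts are equal for it.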
Have you considered using window functions?
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html
A starting point:
select steps.*,
first_value(steps.stp_id) over w as first_id,
last_value(steps.stp_id) over w as last_id
from issue_step steps
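window w as (partition by steps.isu_id order by steps.stp_id
rows between unbounded preceding and unbounded following)
-- the explicit frame matters: with the default frame, last_value() returns the current row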
Btw, if you know the IDs in advance, you'll be much better off getting details in a separate query. (Trying to fetch everything in one go will just yield poor plans due to subqueries or joins on aggregates, which will result in inefficiently considering/joining the entire tables together.)

DB design: Should I use constraints within a table or a new table

I inherited a large existing DB and I'd like to know if I should refactor it because 95% of my queries require joining at least 4 tables.
The DB has 5 tables that only have an ID and Name column with less than 20 rows. I assume the author did this so he could change the names there and not change them in the other tables, but many of those tables are only referenced in one other table. Should I refactor these small 2-column tables into the larger table and add a constraint to the column so users can't input incorrect names, instead of having separate tables?
Resist that urge. From your description I can deduce that the existing design is solid and probably well normalized. Your refactoring may actually undo a good db structure.
If you are bothered by writing a lot of joins in your queries I would suggest creating views to mitigate the boilerplate.
...the author did this so he could change the names there not change
them in the other tables...
That is evidence of good design and exactly what you should strive for in a normalized database.
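
To illustrate the view suggestion, here is a small sketch with two lookup tables hidden behind a view. The table, column, and view names (status, category, ticket, ticket_v) are hypothetical, since the question does not show the real schema, and SQLite via Python's sqlite3 module is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: two small lookup tables and one table referencing them.
conn.executescript("""
    CREATE TABLE status   (status_id INTEGER PRIMARY KEY,
                           name TEXT NOT NULL UNIQUE);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY,
                           name TEXT NOT NULL UNIQUE);
    CREATE TABLE ticket (
        ticket_id   INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        status_id   INTEGER NOT NULL REFERENCES status(status_id),
        category_id INTEGER NOT NULL REFERENCES category(category_id)
    );

    -- The joins are written once, here, instead of in every query.
    CREATE VIEW ticket_v AS
    SELECT t.ticket_id, t.title, s.name AS status, c.name AS category
    FROM ticket AS t
    JOIN status   AS s ON s.status_id = t.status_id
    JOIN category AS c ON c.category_id = t.category_id;

    INSERT INTO status   VALUES (1, 'open'), (2, 'closed');
    INSERT INTO category VALUES (1, 'bug'), (2, 'feature');
    INSERT INTO ticket   VALUES (1, 'crash on save', 1, 1),
                                (2, 'dark mode', 2, 2);
""")

# Renaming a lookup value touches one row and shows up everywhere.
conn.execute("UPDATE status SET name = 'in progress' WHERE status_id = 1")
conn.commit()

tickets = conn.execute(
    "SELECT ticket_id, title, status, category FROM ticket_v ORDER BY ticket_id"
).fetchall()
print(tickets)
```

Queries against ticket_v need no joins of their own, and the lookup tables keep their foreign key protection.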
no.
your db is normalized and proper.
and you save space, lookup time, and indexing by storing an int rather than a varchar name
small tables are optimized away if they are properly keyed.
Sounds like what you have are lookup tables. Let me tell you what happens when you decide to put all lookups in one table with an additional column to specify which type it is. First, instead of joining to 4 different tables in one query, you have to join to the same table 4 times. There ends up being more contention for the resources in the "one table to rule them all". Further, you lose FK constraints. That means you eventually lose data integrity. So if one lookup is state, nothing will prevent you from putting the id values for a different lookup, such as customer type, in the stateid column in the customeraddress table. When the lookups are separate you can enforce that relationship.
Suppose instead of one big table you decide to have a constraint on the column for customer type. Constraints are now enforced, but you have a problem when they need to change. Now you have to alter the database in order to add a new type. Again, this is usually a very bad idea, especially when the table gets large.
Short story: Replacing strings with ID numbers has nothing to do with normalization. Using natural keys in your case might improve performance. In my tests, queries using natural keys were faster by 1 or 2 orders of magnitude.
You might have accepted an answer too quickly.
The DB has 5 tables that only have an ID and Name column with less
than 20 rows.
I'm assuming these tables have a structure something like this.
create table a (
a_id integer primary key,
a_name varchar(30) not null unique
);
create table b (...
-- Just like a
create table your_data (
yet_another_id integer primary key,
a_id integer not null references a (a_id),
b_id integer not null references b (b_id),
c_id integer not null references c (c_id),
d_id integer not null references d (d_id),
unique (a_id, b_id, c_id, d_id),
-- other columns go here
);
And it's obvious that your_data will require four joins (at least) to get usable information from it.
But the names in tables a, b, c, and d are unique (ahem), so you can use the unique names as targets for foreign key references. You could rewrite the table your_data like this.
create table your_data (
yet_another_id integer primary key,
a_name varchar(30) not null references a (a_name),
b_name varchar(30) not null references b (b_name),
c_name varchar(30) not null references c (c_name),
d_name varchar(30) not null references d (d_name),
unique (a_name, b_name, c_name, d_name),
-- other columns go here
);
Replacing id numbers with strings doesn't change the normal form. (And replacing strings with id numbers doesn't have anything to do with normalization.) If the original table were in 5NF, then this rewrite will be in 5NF, too.
But what about performance? Aren't id numbers plus joins supposed to be faster than strings?
I tested that by inserting 20 rows into each of the four tables a, b, c, and d. Then I generated a Cartesian product to fill one test table written with id numbers, and another using the names. (So, 160K rows in each.) I updated the statistics, and ran a couple of queries.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id)
inner join b on (b.b_id = your_data_id.b_id)
inner join c on (c.c_id = your_data_id.c_id)
inner join d on (d.d_id = your_data_id.d_id)
...
Total runtime: 808.472 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
Total runtime: 132.098 ms
The query using id numbers takes a lot longer to execute. I used a WHERE clause on all four columns, which returns a single row.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id and a.a_name = 'a one')
inner join b on (b.b_id = your_data_id.b_id and b.b_name = 'b one')
inner join c on (c.c_id = your_data_id.c_id and c.c_name = 'c one')
inner join d on (d.d_id = your_data_id.d_id and d.d_name = 'd one')
...
Total runtime: 14.671 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
where a_name = 'a one' and b_name = 'b one' and c_name = 'c one' and d_name = 'd one';
...
Total runtime: 0.133 ms
The tables using id numbers took about 100 times longer to query.
Tests used PostgreSQL 9.something.
My advice: Try before you buy. I mean, test before you invest. Try rewriting your data table to use natural keys. Think carefully about ON UPDATE CASCADE and ON DELETE CASCADE. Test performance with representative sample data. Edit your original question and let us know what you found.
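
As a starting point for such a test, the sketch below shows a foreign key that targets the unique name column, with ON UPDATE CASCADE so that a rename in the lookup table propagates. It only covers table a, uses SQLite through Python's sqlite3 module instead of PostgreSQL, and the renamed value 'a uno' is invented; it demonstrates the mechanics, not the performance figures quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE a (
        a_id   INTEGER PRIMARY KEY,
        a_name VARCHAR(30) NOT NULL UNIQUE
    );
    -- The foreign key references the unique name, not the id.
    CREATE TABLE your_data (
        yet_another_id INTEGER PRIMARY KEY,
        a_name VARCHAR(30) NOT NULL
            REFERENCES a (a_name) ON UPDATE CASCADE
    );
    INSERT INTO a VALUES (1, 'a one'), (2, 'a two');
    INSERT INTO your_data VALUES (1, 'a one'), (2, 'a two');
""")

# Renaming in the lookup table cascades to every referencing row.
conn.execute("UPDATE a SET a_name = 'a uno' WHERE a_id = 1")
conn.commit()
names = [
    row[0]
    for row in conn.execute(
        "SELECT a_name FROM your_data ORDER BY yet_another_id"
    )
]

# A name that is not in the lookup table is still rejected.
try:
    conn.execute("INSERT INTO your_data VALUES (3, 'no such name')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(names, rejected)
```

The cascade is the cost to weigh: a rename now rewrites every referencing row instead of a single lookup row, which is why testing with representative data matters.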