How to deal with an id that needs to be matched to multiple ids in SQLite? - sql

I'm new to databases, so I'll start by showing what I would do if I was using a simple table in a csv file. Presently, I'm building a Shiny (R) app to keep track of people taking part in studies. The idea is to make sure no one is doing more than one study at the same time, and that enough time has passed between studies.
A single table would look like something like this:
study_title contact_person tasks first_name last_name
MX9345-3 John Doe OGTT Michael Smith
MX9345-3 John Doe PVT Michael Smith
MX9345-3 John Doe OGTT Julia Barnes
MX9345-3 John Doe PVT Julia Barnes
...
So each study has a single contact person, but multiple tasks. It is possible other studies will use the same tasks.
Each task should have a description
Each person can be connected to multiple studies (the final database has timestamps to make sure this does not happen at the same time), and consequently repeat the same tasks.
the SQLite code could look something like this
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
Now I'm stuck. If I add a primary key and foreign keys (say in studies an ID for each study, and foreign keys for each task and person), the primary keys will repeat, which is not possible. A Study is defined by the tasks it contains (akin to an album and music tracks).
How should I approach this situation in SQLite? And importantly, how are the INSERTs done in these situations? I've seen lots of SELECT examples, but few INSERTs that match all IDs in each table, for example when adding a new person to a running study.

What you do is use tables to map/reference/relate/associate.
The first step would be to utilise alias's of the rowid so instead of :-
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
you would use :-
CREATE TABLE studies (id INTEGER PRIMARY KEY,study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participants (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
With SQLite INTEGER PRIMARY KEY (or INTEGER PRIMARY KEY AUTOINCREMENT) makes the column (id in the above although they can have any valid column name) and alias of the rowid (max of 1 per table), which uniquely identifies the rows.
Why not to use AUTOINCREMENT plus more seeSQLite Autoincrement
Insert some data for demonstration :-
INSERT INTO studies (study_title, contact_person)
VALUES ('Maths','Mr Smith'),('English','Mrs Taylor'),('Geography','Mary White'),('Phsyics','Mr Smith');
INSERT INTO tasks (task_name,description)
VALUES ('Task1','Do task 1'),('Task2','Do task 2'),('Task3','Do task 3'),('Task4','Do task 4'),('Mark','Mark the sudies');
INSERT INTO participants (first_name,last_name)
VALUES ('Tom','Jones'),('Susan','Smythe'),('Sarah','Toms'),('Alan','Francis'),('Julian','MacDonald'),('Fred','Bloggs'),('Rory','Belcher');
First mapping/reference... Table :-
CREATE TABLE IF NOT EXISTS study_task_relationship (study_reference INTEGER, task_reference INTEGER, PRIMARY KEY (study_reference,task_reference));
Map/relate Study's with Tasks (many-many possible)
Do some mapping (INSERT some data) :-
INSERT INTO study_task_relationship
VALUES
(1,2), -- Maths Has Task2
(1,5), -- Maths has Mark Questions
(2,1), -- English has Task1
(2,4), -- English has Task4
(2,5), -- English has Mark questions
(3,3), -- Geography has Task3
(3,1), -- Geoegrapyh has Task1
(3,2), -- Geography has Task2
(3,5), -- Geography has Mark Questions
(4,4) -- Physics has Task4
;
- See comments on each line
List the Studies along with the tasks
SELECT study_title, task_name -- (just want the Study title and task name)
FROM study_task_relationship -- use the mapping table as the main table
JOIN studies ON study_reference = studies.id -- get the related studies
JOIN tasks ON task_reference = tasks.id -- get the related tasks
ORDER BY study_title -- Order by Study title
results in :-
List each study with all it's tasks
SELECT study_title, group_concat(task_name,'...') AS tasklist
FROM study_task_relationship
JOIN studies ON study_reference = studies.id
JOIN tasks ON task_reference = tasks.id
GROUP BY studies.id
ORDER by study_title;
results in :-
Add study-participants associative table and populate it :-
CREATE TABLE IF NOT EXISTS study_participants_relationship (study_reference INTEGER, participant_reference INTEGER, PRIMARY KEY (study_reference,participant_reference));
INSERT INTO study_participants_relationship
VALUES
(1,1), -- Maths has Tom Jones
(1,5), -- Maths has Julian MacDonald
(1,6), -- Maths has Fred Bloggs
(2,4), -- English has Alan Francis
(2,7), -- English has Rory Belcher
(3,3), -- Geography has Sarah Toms
(3,2) -- Susan Smythe
;
You can now, as an example, get a list of participants the tasks along with the study title :-
SELECT study_title, task_name, participants.first_name ||' '||participants.last_name AS fullname
FROM study_task_relationship
JOIN tasks ON study_task_relationship.task_reference = tasks.id
JOIN studies On study_task_relationship.study_reference = studies.id
JOIN study_participants_relationship ON study_task_relationship.study_reference = study_participants_relationship.study_reference
JOIN participants ON study_participants_relationship.participant_reference = participants.id
ORDER BY fullname, study_title
which would result in :-
FOREIGN KEYS
As you can see there is no actual need for defining FOREIGN KEYS. They are really just an aid to stop you inadvertently doing something like :-
INSERT INTO study_participants_relationship VALUES(30,25);
No such study nor no such participant
To utilise FOREIGN KEYS you have to ensure that they are enabled, the simplest is just to issue the command to turn them on (as if it were a normal SQL statment).
PRAGMA foreign_keys=1
A FOREIGN KEY is a constraint, it stops you INSERTING, UPDATING or DELETING a row that would violate the constraint/rule.
Basically the rule is that the column to which the FOREIGN key is defined (the child) must have a value that is in the referenced table/column the parent.
So assumning that FOREIGN KEYS are turned on then coding :-
CREATE TABLE IF NOT EXISTS study_participants_relationship
(
study_reference INTEGER REFERENCES studies(id), -- column foreign key constraint
participant_reference INTEGER,
FOREIGN KEY (participant_reference) REFERENCES participants(id) -- table foreign key constraint
PRIMARY KEY (study_reference,participant_reference
)
);
Would result in INSERT INTO study_participants_relationship VALUES(30,25); failing e.g.
FOREIGN KEY constraint failed: INSERT INTO study_participants_relationship VALUES(30,25);
It fails as there is no row in studies with an id who's value is 30 (the first column foreign key constraint). If the value 30 did exist then the second constraint would kick in as there is no row in participants with an id of 25.
There is no difference between a column Foreign key constraint and a table Foreign key constraint other than where and how they are coded.
However, the above wouldn't stop you deleting all rows from the study_participants_relationship table as it would stop you deleting a row from the studies or participants table if they were referenced by the study_participants_relationship table.

"deal with an id that needs to be matched to multiple ids in SQLite?"
For many-to-many couplings, make extra coupling tables, like the study_task and participent_task tables below. This is many-to-many since a task can be on many studies and a study can have many tasks.
"make sure no one is doing more than one study at the same time"
That could be handled by letting each participant only have a column for current study (no place for more than one then).
PRAGMA foreign_keys = ON;
CREATE TABLE study (id INTEGER PRIMARY KEY, study_title TEXT, contact_person TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participant (
id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT,
id_current_study INTEGER references study(id),
started_current_study DATE
);
CREATE TABLE study_task (
id_study INTEGER NOT NULL references study(id),
id_task INTEGER NOT NULL references task(id),
primary key (id_study,id_task)
);
CREATE TABLE participant_task (
id_participant INTEGER NOT NULL references participant(id),
id_task INTEGER NOT NULL references task(id),
status TEXT check (status in ('STARTED', 'DELIVERED', 'PASSED', 'FAILED')),
primary key (id_participant,id_task)
);
insert into study values (1, 'MX9345-3', 'John Doe');
insert into study values (2, 'MX9300-2', 'Jane Doe');
insert into participant values (1001, 'Michael', 'Smith', 1,'2018-04-21');
insert into participant values (1002, 'Julia', 'Barnes', 1, '2018-04-10');
insert into task values (51, 'OGTT', 'Make a ...');
insert into task values (52, 'PVT', 'Inspect the ...');
insert into study_task values (1,51);
insert into study_task values (1,52);
insert into study_task values (2,51);
--insert into study_task values (2,66); --would fail since 66 doesnt exists (controlled and enforced by foreign key)
The PRAGMA on the first line is needed to make SQLite (above version 3.6 from 2009 I think) enforce foreign keys, without it it just accepts the foreign key syntax, but no controlling is done.

Related

Directionless relationship failing in PostgreSQL

I am trying to create a 2-way relationship table in PostgreSQL for my 3 objects. This idea has stemmed from the following question https://dba.stackexchange.com/questions/48568/how-to-relate-two-rows-in-the-same-table where I also want to store the relationship and its reverse between rows.
For context on my database: Object 1 which contains (aka relates to many) object2s. In turn, these object2s also relate to many object3s. A 1-to-many relationship (object 1 to object 2) and many-to-many relationship (object 2 to object 3)
Each of the objects have been assigned a UUID in other tables which contain info regarding them. Based on their UUID's I want to be able to query them and get the associated objects UUID as well. This in turn will show me the associations and direct me as to which object I should be looking at for location, info, etc just by knowing the UUID.
PLEASE NOTE - THAT ONE BOX MAY HAVE A RELTIONSHIP OF 10 SLOTS. THEREFORE THAT ONE UUID ASSIGNED FOR THE BOX WILL APPEAR IN MY UUID1 COLUMN 10 TIMES!! THIS IS A MUST!
My next step was to try and create a directionless relationship using this query:
CREATE TABLE bridge_x
(uuid1 UUID NOT NULL REFERENCES temp (uuid1), uuid2 UUID NOT NULL REFERENCES temp (uuid2),
PRIMARY KEY(uuid1, uuid2),
CONSTRAINT temp_temp_directionless
FOREIGN KEY (uuid2, uuid1)
REFERENCES bridge_x (uuid1, uuid2)
);
Is there any other way I can store ALL the information mentioned and be able to query the UUID in order to see the relationship between the objects?
You'll need a composite primary key in the bridge table. An example, using polygameous marriages:
CREATE TABLE person
(person_id INTEGER NOT NULL PRIMARY KEY
, name varchar NOT NULL
);
CREATE TABLE marriage
( person1 INTEGER NOT NULL
, person2 INTEGER NOT NULL
, comment varchar
, CONSTRAINT marriage_1 FOREIGN KEY (person1) REFERENCES person(person_id)
, CONSTRAINT marriage_2 FOREIGN KEY (person2) REFERENCES person(person_id)
, CONSTRAINT order_in_court CHECK (person1 < person2)
, CONSTRAINT polygamy_allowed UNIQUE (person1,person2)
);
INSERT INTO person(person_id,name) values (1,'Bob'),(2,'Alice'),(3,'Charles');
INSERT INTO marriage(person1,person2, comment) VALUES(1,2, 'Crypto marriage!') ; -- Ok
INSERT INTO marriage(person1,person2, comment) VALUES(2,1, 'Not twice!' ) ; -- Should fail
INSERT INTO marriage(person1,person2, comment) VALUES(3,3, 'No you dont...' ) ; -- Should fail
INSERT INTO marriage(person1,person2, comment) VALUES(2,3, 'OMG she did it again.' ) ; -- Should fail (does not)
INSERT INTO marriage(person1,person2, comment) VALUES(3,4, 'Non existant persons are not allowed to marry !' ) ; -- Should fail
SELECT p1.name, p2.name, m.comment
FROM marriage m
JOIN person p1 ON m.person1 = p1.person_id
JOIN person p2 ON m.person2 = p2.person_id
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 3
INSERT 0 1
ERROR: new row for relation "marriage" violates check constraint "order_in_court"
DETAIL: Failing row contains (2, 1, Not twice!).
ERROR: new row for relation "marriage" violates check constraint "order_in_court"
DETAIL: Failing row contains (3, 3, No you dont...).
INSERT 0 1
ERROR: insert or update on table "marriage" violates foreign key constraint "marriage_2"
DETAIL: Key (person2)=(4) is not present in table "person".
name | name | comment
-------+---------+-----------------------
Bob | Alice | Crypto marriage!
Alice | Charles | OMG she did it again.
(2 rows)

Build correct logical db

I use 4 table in database.
Example:
Create table applicant(
Id int not null primary key,
IdName integer
idSkill integer,
idContact integer
Constraint initial foreign key (idName) References Initiale(id)
CONSTRAINT contacT foreign key (idContact) References contact(id)
CONSTRAINT Skills foreign key (idSkill) references skill(id))
Create table Initiale(
Id int not null primary key,
firstname text,
middlename text)
Create table contact(
Id int not null primary key,
phone text,
email text)
Create tabke Skills(
Id int not null primary key,,
Nane text)
I want insert data promptly in 4 table,but i not understand, what get id and insert in applicant.
Create tabke Skills ..... will fail. You should use Create table Skills.
Nor should you have Id int not null primary key,, (two commas), this would fail.
You should very likely be using Id INTEGER PRIMARY KEY not Id int not null primary key.
That is because generally an Id column should be a unique identifier of the row. With SQLite INTEGER PRIMARY KEY has a special meaining whilst INT PRIMARY KEY does not.
That is, if INTEGER PRIMARY KEY is used then the column will be an alias of the rowid column which has to be a unique signed 64 bit integer and importantly if no value is provided when inserting a row then SQLite will assign a unique integer. i.e. the all important Id.
This Id will initially be 1, then likely 2, then likely 3 and so on, although there is no guarantee that the Id will be monotonically increasing.
There are additional errors, mainly comma's omitted. The following should work :-
CREATE TABLE IF NOT EXISTS applicant(
Id INTEGER PRIMARY KEY,
IdName integer,
idSkill integer,
idContact integer,
Constraint initial foreign key (idName) References Initiale(id),
CONSTRAINT contacT foreign key (idContact) References contact(id),
CONSTRAINT Skills foreign key (idSkill) references Skills(id));
Create table IF NOT EXISTS Initiale(
Id INTEGER PRIMARY KEY,
firstname text,
middlename text);
Create table IF NOT EXISTS contact(
Id INTEGER PRIMARY KEY,
phone text,
email text);
Create table IF NOT EXISTS Skills(
Id INTEGER PRIMARY KEY,
Nane text);
You could then insert data along the lines of :-
INSERT INTO Initiale (firstname,middlename) -- Note absence of Id so SQLite will generate
VALUES
('Fred','James'), -- very likely id 1
('Alan','Roy'), -- very likely id 2
('Simon','Gerorge')-- very likely id 3
;
INSERT INTO contact -- alternative way of getting Id generated (specify null for Id)
VALUES
(null,'0123456789','email01#email.com'), -- very likely id 1
(null,'0987654321','email02#email.com'), -- very likely id 2
(null,'3333333333','email03.#email.com') -- very likely id 3
;
INSERT INTO Skills (Nane)
VALUES
('Skill01'),('Skill02'),('Skill03') -- very likely id's 1,2 and 3
;
INSERT INTO applicant (IdName,idSkill,idContact)
VALUES
-- First applicant
(2, -- Alan Roy
3, -- Skill 3
1), -- Contact 0123456789 )
-- Second Applicant
(3, -- Simon George
3, -- Skill 3
2), -- Contact 0987654321 )
-- Third Applicant
(2, -- Alan Roy again????
1, -- Skill 1
3), -- contact 3333333333)
(1,1,1) -- Fred James/ Skill 1/ Contact 0123456789
--- etc
;
Noting that Rows on the Initiale, Contact and Skills table MUST exists before insertions can be made into the Applicant table.
You could then run a query such as :-
SELECT * FROM applicant
JOIN Initiale ON Initiale.Id = idName
JOIN contact ON contact.Id = idContact
JOIN Skills ON Skills.Id = idSkill
This would result in (using the data as inserted above) :-

How do I check the value of a foreign key on insert?

I'm teaching myself SQL using Sqlite3, well suited for my forever-game project (Don't we all have one?) and have the following tables:
CREATE TABLE equipment_types (
row_id INTEGER PRIMARY KEY,
type TEXT NOT NULL UNIQUE);
INSERT INTO equipment_types (type) VALUES ('gear'), ('weapon');
CREATE TABLE equipment_names (
row_id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE);
INSERT INTO equipment_names (name) VALUES ('club'), ('band aids');
CREATE TABLE equipment (
row_id INTEGER PRIMARY KEY,
name INTEGER NOT NULL UNIQUE REFERENCES equipment_names,
type INTEGER NOT NULL REFERENCES equipment_types);
INSERT INTO equipment (name, type) VALUES (1, 2), (2, 1);
So now we have a 'club' that is a 'weapon', and 'band aids' that are 'gear'. I now want to make a weapons table; it will have an equipment_id that references the equipment table and weapon properties like damage and range, etc. I want to constrain it to equipment that is a 'weapon' type.
But for the life of me I can't figure it out. CHECK, apparently, only allows expressions, not subqueries, and I've been trying to craft a TRIGGER that might do the job, but in short, I can't quite figure out the query and syntax, or how to check the result that as I understand it will be in the form of a table, or null.
Also, are there good online resources for learning SQL more advanced than W3School? Add them as a comment, please.
Just write a query that looks up the type belonging to the new record:
CREATE TRIGGER only_weapons
BEFORE INSERT ON weapons
FOR EACH ROW
WHEN (SELECT et.type
FROM euqipment_types AS et
JOIN equipment AS e ON e.type = et.equipment_type_id
WHERE e.row_id = NEW.equipment_id
) != 'weapon'
BEGIN
SELECT RAISE(FAIL, "not a weapon");
END;
The foreign key references should be to the primary key and to the same time. I would phrase this as:
CREATE TABLE equipment_types (
equipment_type_id INTEGER PRIMARY KEY,
type TEXT NOT NULL UNIQUE
);
INSERT INTO equipment_types (type) VALUES ('gear'), ('weapon');
CREATE TABLE equipment_names (
equipment_name_id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
INSERT INTO equipment_names (name) VALUES ('club'), ('band aids');
CREATE TABLE equipment (
equipment_id INTEGER PRIMARY KEY,
equipment_name_id INTEGER NOT NULL UNIQUE REFERENCES equipment_names(equipment_name_id),
equipement_type_id INTEGER NOT NULL REFERENCES equipment_types(equipement_type_id)
);
I would not use the name row_id for the primary key. That is the built-inn default, so the name is not very good. In SQLite, an integer primary key is automatically auto-incremented (see here).

Database Normalization using Foreign Key

I have a sample table like below where Course Completion Status of a Student is being stored:
Create Table StudentCourseCompletionStatus
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
AlgorithmCourseStatus nvarchar(30),
DatabaseCourseStatus nvarchar(30),
NetworkingCourseStatus nvarchar(30),
MathematicsCourseStatus nvarchar(30),
ProgrammingCourseStatus nvarchar(30)
)
Insert into StudentCourseCompletionStatus Values (1, 'In Progress', 'In Progress', 'Not Started', 'Completed', 'Completed')
Insert into StudentCourseCompletionStatus Values (2, 'Not Started', 'In Progress', 'Not Started', 'Not Applicable', 'Completed')
Now as part of normalizing the schema I have created two other tables - CourseStatusType and Status for storing the Course Status names and Status.
Create Table CourseStatusType
(
CourseStatusTypeID int primary key identity(1,1),
CourseStatusType nvarchar(100) not null
)
Insert into CourseStatusType Values ('AlgorithmCourseStatus')
Insert into CourseStatusType Values ('DatabaseCourseStatus')
Insert into CourseStatusType Values ('NetworkingCourseStatus')
Insert into CourseStatusType Values ('MathematicsCourseStatus')
Insert into CourseStatusType Values ('ProgrammingCourseStatus')
Insert into CourseStatusType Values ('OperatingSystemsCourseStatus')
Insert into CourseStatusType Values ('CompilerCourseStatus')
Create Table Status
(
StatusID int primary key identity(1,1),
StatusName nvarchar (100) not null
)
Insert into Status Values ('Completed')
Insert into Status Values ('Not Started')
Insert into Status Values ('In Progress')
Insert into Status Values ('Not Applicable')
The modified table is as below:
Create Table StudentCourseCompletionStatus1
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
CourseStatusTypeID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_CourseStatusType] FOREIGN KEY (CourseStatusTypeID) REFERENCES dbo.CourseStatusType (CourseStatusTypeID),
StatusID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_Status] FOREIGN KEY (StatusID) REFERENCES Status (StatusID),
)
I have few question on this:
Is this the correct way to normalize it ? The old table was very helpful to get data easily - I can store a student's course status in a single row, but now 5 rows are required. Is there a better way to do it?
Moving the data from the old table to this new table seems to be not an easy task. Can I achieve this using a query or I have to manually to do this ?
Any help is appreciated.
vou could also consider storing results in flat table like this:
studentID,courseID,status
1,1,"completed"
1,2,"not started"
2,1,"not started"
2,3,"in progress"
you will also need additional Courses table like this
courserId,courseName
1, math
2, programming
3, networking
and a students table
students
1 "john smith"
2 "perry clam"
3 "john deere"
etc..you could also optionally create a status table to store the distinct statusstrings statusstings and refer to their PK instead ofthestrings
studentID,courseID,status
1,1,1
1,2,2
2,1,2
2,3,3
... etc
and status table
id,status
1,"completed"
2,"not started"
3,"in progress"
the beauty of this representation is: it is quite easy to filter and aggregate data , i.e it is easy to query which subjects a particular person have completed, how many subjects are completed by an average student, etc. this things are much more difficult in the columnar design like you had. you can also easily add new subjects without the need to adapt your tables or even queries they,will just work.
you can also always usin SQLs PIVOT query to get it to a familiar columnar presentation like
name,mathstatus,programmingstatus,networkingstatus,etc..
but now 5 rows are required
No, it's still just one row. That row simply contains identifiers for values stored in other tables.
There are pros and cons to this. One of the main reasons to normalize in this way is to protect the integrity of the data. If a column is just a string then anything can be stored there. But if there's a foreign key relationship to a table containing a finite set of values then only one of those options can be stored there. Additionally, if you ever want to change the text of an option or add/remove options, you do it in a centralized place.
Moving the data from the old table to this new table seems to be not an easy task.
No problem at all. Create your new numeric columns on the data table and populate them with the identifiers of the lookup table records associated with each data table record. If they're nullable, you can make them foreign keys right away. If they're not nullable then you need to populate them before you can make them foreign keys. Once you've verified that the data is correct, remove the old de-normalized columns. Done.
In StudentCourseCompletionStatus1 you still need 2 associations to Status and CourseStatusType. So I think you should consider following variant of normalization:
It means, that your StudentCourseCompletionStatus would hold only one CourseStatusID and another table CourseStatus would hold the associations to CourseType and Status.
To move your data you can surely use a query.

Insert into table from temporary table takes a long time

Morning folks,
I have a temporary table with 135,000 rows and 24 columns, of those rows I need to insert about 8,000 of them into an 8 column table. If run my insert the first time round (i.e. when my 8 column table is empty) it runs in about 6 seconds. When I run the same query again (this time round it shouldn't insert anything as the rows have already been inserted) it takes 30 minutes!!
I've been unable to re-create this with a small simplified sample, but here's some sql for you to run anyway. It's running the last insert when the programme table has entries which causes the problems. Can anyone shed some light as to why this might be?
CREATE TEMPORARY TABLE TVTEMPTABLE (
PROGTITLE TEXT, YR YEAR, DIRECTOR TEXT, GENRE TEXT
);
CREATE TABLE GENRE (
GENREID INT NOT NULL AUTO_INCREMENT, GENRE TEXT, PRIMARY KEY(GENREID)
) ENGINE=INNODB;
CREATE TABLE PROGRAMME (
PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE TEXT, YR YEAR,
DIRECTOR TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID)
) ENGINE=INNODB;
INSERT INTO GENRE(GENRE) VALUES
('Consumer'),('Entertainment'),('Comedy'),('Film'),('Drama'),('Sport'),
('Sitcom'),('Travel'),('Documentary'),('Factual');
INSERT INTO TVTEMPTABLE(PROGTITLE, YR, DIRECTOR, GENRE) VALUES
('Breakfast','2011','n/a','Consumer'),('Breakfast','2011','n/a','Consumer'),
('Wanted Down Under','2011','n/a','Entertainment'),('Wanted Down Under','2011','n/a','Entertainment'),
('Lorraine','2011','n/a','Comedy'),('Lorraine','2011','n/a','Comedy'),
('Supernanny USA','2011','n/a','Film'),('Supernanny USA','2011','n/a','Film'),
('Three Coins in the Fountain','2011','n/a','Drama'),('Three Coins in the Fountain','2011','n/a','Drama'),
('The Wright Stuff','2011','n/a','Sport'),('The Wright Stuff','2011','n/a','Sport'),
('This Morning','2011','n/a','Sitcom'),('This Morning','2011','n/a','Sitcom'),
('Homes Under the Hammer','2011','n/a','Travel'),('Homes Under the Hammer','2011','n/a','Travel'),
('LazyTown','2011','n/a','Documentary'),('LazyTown','2011','n/a','Documentary'),
('Jeremy Kyle','2011','n/a','Factual'),('Jeremy Kyle','2011','n/a','Factual');
INSERT INTO PROGRAMME (
PROGTITLE, GENREID, YR,
DIRECTOR)
SELECT
T.PROGTITLE, MAX(G.GENREID),
MAX(T.YR), MAX(T.DIRECTOR)
FROM
TVTEMPTABLE T
INNER JOIN GENRE G ON G.GENRE=T.GENRE
LEFT JOIN PROGRAMME P ON P.PROGTITLE=T.PROGTITLE
WHERE P.PROGTITLE IS NULL
GROUP BY T.PROGTITLE;
Edit: Is this what you mean by index?
CREATE TEMPORARY TABLE TVTEMPTABLE (
PROGTITLE VARCHAR(50), YR YEAR, DIRECTOR TEXT, GENRE VARCHAR(50), INDEX(PROGTITLE,GENRE)
);
CREATE TABLE PROGRAMME (
PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE VARCHAR(50), YR YEAR,
DIRECTOR TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID,PROGTITLE), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID)
) ENGINE=INNODB;
Edit 2: This is the result from the desc extended. After indexing (I may have done this wrong?). The insert still takes a long time
Ok yes the answer was to properly index my tables, what I didn't realise however was
INDEX(A,B,C);
Is different from
INDEX(A),INDEX(B),INDEX(C);