Outputting the name of the column in SQLite - sql

I have created two tables and now I want to find the movie that yielded the highest revenue for each platform(Hulu, Disney and Netflix). The problem here is I do not know how to output the names of the platform as it is a column title. Can anyone help me?
CREATE TABLE "StreamedMovies" (
"Title" TEXT,
"Netflix" INTEGER, -- 1 if the movie is streamed in this platform, 0 otherwise
"Hulu" INTEGER, -- 1 if the movie is streamed in this platform, 0 otherwise
"Disney" INTEGER, -- 1 if the movie is streamed in this platform, 0 otherwise
"ScreenTime" REAL,
PRIMARY KEY("Title")
)
CREATE TABLE "MovieData" (
"Title" TEXT,
"Genre" TEXT,
"Director" TEXT,
"Casting" TEXT,
"Rating" REAL,
"Revenue" REAL,
PRIMARY KEY("Title")
)

You'll have to write a case statement.
select
Title,
case
when Netflix == 1 then 'Netflix'
when Hulu = 1 then 'Hulu'
when Disney = 1 then 'Disney'
end as Platform
from StreamedMovies
This indicates a flaw in your design. A number of flaws. For example, there's nothing stopping a row from having multiple platforms. Or no platforms. Or having a platform set to 42.
Instead, add a platforms table and a join table to indicate which movies are streaming on which platforms.
While we're at it we'll fix some other issues.
Titles can change. Use a simple integer primary key.
Don't quote column and table names, it makes them case sensitive.
Declare your foreign keys.
Use not null to require important data.
-- The platforms available for streaming.
create table platforms (
id integer primary key,
name text not null
);
insert into platforms (id, name)
values ('Netflix'), ('Hulu'), ('Disney+');
-- The movies.
create table movies (
id integer primary key,
title text not null
);
insert into movies (title) values ('Bad Taste');
-- A join table for which platforms movies are streaming on.
create table streamed_movies (
movie_id integer not null references movies,
platform_id integer not null references platforms
);
insert into streamed_movies (movie_id, platform_id) values (1, 1), (1, 3);
select
movies.title, platforms.name
from streamed_movies sm
join movies on sm.movie_id = movies.id
join platforms on sm.platform_id = platforms.id
title name
--------- -------
Bad Taste Netflix
Bad Taste Disney+

Related

How to make sure only one column is not null in postgresql table

I'm trying to setup a table and add some constraints to it. I was planning on using partial indexes to add constraints to create some composite keys, but ran into the problem of handling NULL values. We have a situation where we want to make sure that in a table only one of two columns is populated for a given row, and that the populated value is unique. I'm trying to figure out how to do this, but I'm having a tough time. Perhaps something like this:
CREATE INDEX foo_idx_a ON foo (colA) WHERE colB is NULL
CREATE INDEX foo_idx_b ON foo (colB) WHERE colA is NULL
Would this work? Additionally, is there a good way to expand this to a larger number of columns?
Another way to write this constraint is to use the num_nonulls() function:
create table table_name
(
a integer,
b integer,
check ( num_nonnulls(a,b) = 1)
);
This is especially useful if you have more columns:
create table table_name
(
a integer,
b integer,
c integer,
d integer,
check ( num_nonnulls(a,b,c,d) = 1)
);
You can use the following check:
create table table_name
(
a integer,
b integer,
check ((a is null) != (b is null))
);
If there are more columns, you can use the trick with casting boolean to integer:
create table table_name
(
a integer,
b integer,
...
n integer,
check ((a is not null)::integer + (b is not null)::integer + ... + (n is not null)::integer = 1)
);
In this example only one column can be not null (it simply counts not null columns), but you can make it any number.
One can do this with an insert/update trigger or checks, but having to do so indicates it could be done better. Constraints exist to give you certainty about your data so you don't have to be constantly checking if the data is valid. If one or the other is not null, you have to do the checks in your queries.
This is better solved with table inheritance and views.
Let's say you have (American) clients. Some are businesses and some are individuals. Everyone needs a Taxpayer Identification Number which can be one of several things such as a either a Social Security Number or Employer Identification Number.
create table generic_clients (
id bigserial primary key,
name text not null
);
create table individual_clients (
ssn numeric(9) not null
) inherits(generic_clients);
create table business_clients (
ein numeric(9) not null
) inherits(generic_clients);
SSN and EIN are both Taxpayer Identification Numbers and you can make a view which will treat both the same.
create view clients as
select id, name, ssn as tin from individual_clients
union
select id, name, ein as tin from business_clients;
Now you can query clients.tin or if you specifically want businesses you query business_clients.ein and for individuals individual_clients.ssn. And you can see how the inherited tables can be expanded to accommodate more divergent information between types of clients.

How to deal with an id that needs to be matched to multiple ids in SQLite?

I'm new to databases, so I'll start by showing what I would do if I was using a simple table in a csv file. Presently, I'm building a Shiny (R) app to keep track of people taking part in studies. The idea is to make sure no one is doing more than one study at the same time, and that enough time has passed between studies.
A single table would look like something like this:
study_title contact_person tasks first_name last_name
MX9345-3 John Doe OGTT Michael Smith
MX9345-3 John Doe PVT Michael Smith
MX9345-3 John Doe OGTT Julia Barnes
MX9345-3 John Doe PVT Julia Barnes
...
So each study has a single contact person, but multiple tasks. It is possible other studies will use the same tasks.
Each task should have a description
Each person can be connected to multiple studies (the final database has timestamps to make sure this does not happen at the same time), and consequently repeat the same tasks.
the SQLite code could look something like this
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
Now I'm stuck. If I add a primary key and foreign keys (say in studies an ID for each study, and foreign keys for each task and person), the primary keys will repeat, which is not possible. A Study is defined by the tasks it contains (akin to an album and music tracks).
How should I approach this situation in SQLite? And importantly, how are the INSERTs done in these situations? I've seen lots of SELECT examples, but few INSERTs that match all IDs in each table, for example when adding a new person to a running study.
What you do is use tables to map/reference/relate/associate.
The first step would be to utilise alias's of the rowid so instead of :-
CREATE TABLE studies (study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (task_name TEXT, description TEXT);
CREATE TABLE participants (first_name TEXT, last_name TEXT);
you would use :-
CREATE TABLE studies (id INTEGER PRIMARY KEY,study_title TEXT, contact_person TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participants (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
With SQLite INTEGER PRIMARY KEY (or INTEGER PRIMARY KEY AUTOINCREMENT) makes the column (id in the above although they can have any valid column name) and alias of the rowid (max of 1 per table), which uniquely identifies the rows.
Why not to use AUTOINCREMENT plus more seeSQLite Autoincrement
Insert some data for demonstration :-
INSERT INTO studies (study_title, contact_person)
VALUES ('Maths','Mr Smith'),('English','Mrs Taylor'),('Geography','Mary White'),('Phsyics','Mr Smith');
INSERT INTO tasks (task_name,description)
VALUES ('Task1','Do task 1'),('Task2','Do task 2'),('Task3','Do task 3'),('Task4','Do task 4'),('Mark','Mark the sudies');
INSERT INTO participants (first_name,last_name)
VALUES ('Tom','Jones'),('Susan','Smythe'),('Sarah','Toms'),('Alan','Francis'),('Julian','MacDonald'),('Fred','Bloggs'),('Rory','Belcher');
First mapping/reference... Table :-
CREATE TABLE IF NOT EXISTS study_task_relationship (study_reference INTEGER, task_reference INTEGER, PRIMARY KEY (study_reference,task_reference));
Map/relate Study's with Tasks (many-many possible)
Do some mapping (INSERT some data) :-
INSERT INTO study_task_relationship
VALUES
(1,2), -- Maths Has Task2
(1,5), -- Maths has Mark Questions
(2,1), -- English has Task1
(2,4), -- English has Task4
(2,5), -- English has Mark questions
(3,3), -- Geography has Task3
(3,1), -- Geoegrapyh has Task1
(3,2), -- Geography has Task2
(3,5), -- Geography has Mark Questions
(4,4) -- Physics has Task4
;
- See comments on each line
List the Studies along with the tasks
SELECT study_title, task_name -- (just want the Study title and task name)
FROM study_task_relationship -- use the mapping table as the main table
JOIN studies ON study_reference = studies.id -- get the related studies
JOIN tasks ON task_reference = tasks.id -- get the related tasks
ORDER BY study_title -- Order by Study title
results in :-
List each study with all it's tasks
SELECT study_title, group_concat(task_name,'...') AS tasklist
FROM study_task_relationship
JOIN studies ON study_reference = studies.id
JOIN tasks ON task_reference = tasks.id
GROUP BY studies.id
ORDER by study_title;
results in :-
Add study-participants associative table and populate it :-
CREATE TABLE IF NOT EXISTS study_participants_relationship (study_reference INTEGER, participant_reference INTEGER, PRIMARY KEY (study_reference,participant_reference));
INSERT INTO study_participants_relationship
VALUES
(1,1), -- Maths has Tom Jones
(1,5), -- Maths has Julian MacDonald
(1,6), -- Maths has Fred Bloggs
(2,4), -- English has Alan Francis
(2,7), -- English has Rory Belcher
(3,3), -- Geography has Sarah Toms
(3,2) -- Susan Smythe
;
You can now, as an example, get a list of participants the tasks along with the study title :-
SELECT study_title, task_name, participants.first_name ||' '||participants.last_name AS fullname
FROM study_task_relationship
JOIN tasks ON study_task_relationship.task_reference = tasks.id
JOIN studies On study_task_relationship.study_reference = studies.id
JOIN study_participants_relationship ON study_task_relationship.study_reference = study_participants_relationship.study_reference
JOIN participants ON study_participants_relationship.participant_reference = participants.id
ORDER BY fullname, study_title
which would result in :-
FOREIGN KEYS
As you can see there is no actual need for defining FOREIGN KEYS. They are really just an aid to stop you inadvertently doing something like :-
INSERT INTO study_participants_relationship VALUES(30,25);
No such study nor no such participant
To utilise FOREIGN KEYS you have to ensure that they are enabled, the simplest is just to issue the command to turn them on (as if it were a normal SQL statment).
PRAGMA foreign_keys=1
A FOREIGN KEY is a constraint, it stops you INSERTING, UPDATING or DELETING a row that would violate the constraint/rule.
Basically the rule is that the column to which the FOREIGN key is defined (the child) must have a value that is in the referenced table/column the parent.
So assumning that FOREIGN KEYS are turned on then coding :-
CREATE TABLE IF NOT EXISTS study_participants_relationship
(
study_reference INTEGER REFERENCES studies(id), -- column foreign key constraint
participant_reference INTEGER,
FOREIGN KEY (participant_reference) REFERENCES participants(id) -- table foreign key constraint
PRIMARY KEY (study_reference,participant_reference
)
);
Would result in INSERT INTO study_participants_relationship VALUES(30,25); failing e.g.
FOREIGN KEY constraint failed: INSERT INTO study_participants_relationship VALUES(30,25);
It fails as there is no row in studies with an id who's value is 30 (the first column foreign key constraint). If the value 30 did exist then the second constraint would kick in as there is no row in participants with an id of 25.
There is no difference between a column Foreign key constraint and a table Foreign key constraint other than where and how they are coded.
However, the above wouldn't stop you deleting all rows from the study_participants_relationship table as it would stop you deleting a row from the studies or participants table if they were referenced by the study_participants_relationship table.
"deal with an id that needs to be matched to multiple ids in SQLite?"
For many-to-many couplings, make extra coupling tables, like the study_task and participent_task tables below. This is many-to-many since a task can be on many studies and a study can have many tasks.
"make sure no one is doing more than one study at the same time"
That could be handled by letting each participant only have a column for current study (no place for more than one then).
PRAGMA foreign_keys = ON;
CREATE TABLE study (id INTEGER PRIMARY KEY, study_title TEXT, contact_person TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, task_name TEXT, description TEXT);
CREATE TABLE participant (
id INTEGER PRIMARY KEY,
first_name TEXT,
last_name TEXT,
id_current_study INTEGER references study(id),
started_current_study DATE
);
CREATE TABLE study_task (
id_study INTEGER NOT NULL references study(id),
id_task INTEGER NOT NULL references task(id),
primary key (id_study,id_task)
);
CREATE TABLE participant_task (
id_participant INTEGER NOT NULL references participant(id),
id_task INTEGER NOT NULL references task(id),
status TEXT check (status in ('STARTED', 'DELIVERED', 'PASSED', 'FAILED')),
primary key (id_participant,id_task)
);
insert into study values (1, 'MX9345-3', 'John Doe');
insert into study values (2, 'MX9300-2', 'Jane Doe');
insert into participant values (1001, 'Michael', 'Smith', 1,'2018-04-21');
insert into participant values (1002, 'Julia', 'Barnes', 1, '2018-04-10');
insert into task values (51, 'OGTT', 'Make a ...');
insert into task values (52, 'PVT', 'Inspect the ...');
insert into study_task values (1,51);
insert into study_task values (1,52);
insert into study_task values (2,51);
--insert into study_task values (2,66); --would fail since 66 doesnt exists (controlled and enforced by foreign key)
The PRAGMA on the first line is needed to make SQLite (above version 3.6 from 2009 I think) enforce foreign keys, without it it just accepts the foreign key syntax, but no controlling is done.

How do I check the value of a foreign key on insert?

I'm teaching myself SQL using Sqlite3, well suited for my forever-game project (Don't we all have one?) and have the following tables:
CREATE TABLE equipment_types (
row_id INTEGER PRIMARY KEY,
type TEXT NOT NULL UNIQUE);
INSERT INTO equipment_types (type) VALUES ('gear'), ('weapon');
CREATE TABLE equipment_names (
row_id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE);
INSERT INTO equipment_names (name) VALUES ('club'), ('band aids');
CREATE TABLE equipment (
row_id INTEGER PRIMARY KEY,
name INTEGER NOT NULL UNIQUE REFERENCES equipment_names,
type INTEGER NOT NULL REFERENCES equipment_types);
INSERT INTO equipment (name, type) VALUES (1, 2), (2, 1);
So now we have a 'club' that is a 'weapon', and 'band aids' that are 'gear'. I now want to make a weapons table; it will have an equipment_id that references the equipment table and weapon properties like damage and range, etc. I want to constrain it to equipment that is a 'weapon' type.
But for the life of me I can't figure it out. CHECK, apparently, only allows expressions, not subqueries, and I've been trying to craft a TRIGGER that might do the job, but in short, I can't quite figure out the query and syntax, or how to check the result that as I understand it will be in the form of a table, or null.
Also, are there good online resources for learning SQL more advanced than W3School? Add them as a comment, please.
Just write a query that looks up the type belonging to the new record:
CREATE TRIGGER only_weapons
BEFORE INSERT ON weapons
FOR EACH ROW
WHEN (SELECT et.type
FROM euqipment_types AS et
JOIN equipment AS e ON e.type = et.equipment_type_id
WHERE e.row_id = NEW.equipment_id
) != 'weapon'
BEGIN
SELECT RAISE(FAIL, "not a weapon");
END;
The foreign key references should be to the primary key and to the same time. I would phrase this as:
CREATE TABLE equipment_types (
equipment_type_id INTEGER PRIMARY KEY,
type TEXT NOT NULL UNIQUE
);
INSERT INTO equipment_types (type) VALUES ('gear'), ('weapon');
CREATE TABLE equipment_names (
equipment_name_id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
INSERT INTO equipment_names (name) VALUES ('club'), ('band aids');
CREATE TABLE equipment (
equipment_id INTEGER PRIMARY KEY,
equipment_name_id INTEGER NOT NULL UNIQUE REFERENCES equipment_names(equipment_name_id),
equipement_type_id INTEGER NOT NULL REFERENCES equipment_types(equipement_type_id)
);
I would not use the name row_id for the primary key. That is the built-inn default, so the name is not very good. In SQLite, an integer primary key is automatically auto-incremented (see here).

Distinct top 10 from multiple tables

I have these two tables in SQLite
CREATE TABLE "freq" (
`id` INTEGER,
`translation_id` INTEGER,
`freq` INTEGER DEFAULT NULL,
`year` INTEGER,
PRIMARY KEY(`id`),
FOREIGN KEY(`translation_id`) REFERENCES `translation`(`id`) DEFERRABLE INITIALLY DEFERRED
)
CREATE TABLE translation (
id INTEGER PRIMARY KEY,
w_id INTEGER,
word TEXT DEFAULT NULL,
located_in TEXT,
UNIQUE (word, language)
ON CONFLICT ABORT
)
Based on the values from these tables I want to create a third one which contains the top 10 words for every translation.located_in for every freq.year. This could look like this:
CREATE TABLE top_ten_words_by_country (
id INTEGER PRIMARY KEY,
located_in TEXT,
year INTEGER,
`translation_id` INTEGER,
freq INTEGER,
FOREIGN KEY(`translation_id`) REFERENCES `translation`(`id`) DEFERRABLE INITIALLY DEFERRED
)
Thats what I tried (for one country and one year) so far:
SELECT * FROM freq f, translation t
WHERE t.located_in = 'Germany' ORDER BY f.freq DESC
which has these problems:
it doesn't add up multiple words from translation which have the same w_id (which means they are a translation from each other)
it only works for one year and one country
it takes veeeeery long (I know joins are expensive, so its not that important to speed this up)
it contains duplicate translation.word
So can anyone provide me a way to do what I want?
The speed is the least important thing here for me.
Look, you have a cartesian product(there's no relation between your tables).
Besides, you have to use 'group by' clause.
And you can create a view instead a table.
Change your query to:
SELECT sum(f.freq) total_freq
, t.w_id
, t.located_in
, f.year
FROM freq f
, translation t
WHERE f.translation_id = t.id
group by t.w_id
, t.located_in
, f.year
ORDER BY total_freq DESC

Insert into table from temporary table takes a long time

Morning folks,
I have a temporary table with 135,000 rows and 24 columns, of those rows I need to insert about 8,000 of them into an 8 column table. If run my insert the first time round (i.e. when my 8 column table is empty) it runs in about 6 seconds. When I run the same query again (this time round it shouldn't insert anything as the rows have already been inserted) it takes 30 minutes!!
I've been unable to re-create this with a small simplified sample, but here's some sql for you to run anyway. It's running the last insert when the programme table has entries which causes the problems. Can anyone shed some light as to why this might be?
CREATE TEMPORARY TABLE TVTEMPTABLE (
PROGTITLE TEXT, YR YEAR, DIRECTOR TEXT, GENRE TEXT
);
CREATE TABLE GENRE (
GENREID INT NOT NULL AUTO_INCREMENT, GENRE TEXT, PRIMARY KEY(GENREID)
) ENGINE=INNODB;
CREATE TABLE PROGRAMME (
PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE TEXT, YR YEAR,
DIRECTOR TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID)
) ENGINE=INNODB;
INSERT INTO GENRE(GENRE) VALUES
('Consumer'),('Entertainment'),('Comedy'),('Film'),('Drama'),('Sport'),
('Sitcom'),('Travel'),('Documentary'),('Factual');
INSERT INTO TVTEMPTABLE(PROGTITLE, YR, DIRECTOR, GENRE) VALUES
('Breakfast','2011','n/a','Consumer'),('Breakfast','2011','n/a','Consumer'),
('Wanted Down Under','2011','n/a','Entertainment'),('Wanted Down Under','2011','n/a','Entertainment'),
('Lorraine','2011','n/a','Comedy'),('Lorraine','2011','n/a','Comedy'),
('Supernanny USA','2011','n/a','Film'),('Supernanny USA','2011','n/a','Film'),
('Three Coins in the Fountain','2011','n/a','Drama'),('Three Coins in the Fountain','2011','n/a','Drama'),
('The Wright Stuff','2011','n/a','Sport'),('The Wright Stuff','2011','n/a','Sport'),
('This Morning','2011','n/a','Sitcom'),('This Morning','2011','n/a','Sitcom'),
('Homes Under the Hammer','2011','n/a','Travel'),('Homes Under the Hammer','2011','n/a','Travel'),
('LazyTown','2011','n/a','Documentary'),('LazyTown','2011','n/a','Documentary'),
('Jeremy Kyle','2011','n/a','Factual'),('Jeremy Kyle','2011','n/a','Factual');
INSERT INTO PROGRAMME (
PROGTITLE, GENREID, YR,
DIRECTOR)
SELECT
T.PROGTITLE, MAX(G.GENREID),
MAX(T.YR), MAX(T.DIRECTOR)
FROM
TVTEMPTABLE T
INNER JOIN GENRE G ON G.GENRE=T.GENRE
LEFT JOIN PROGRAMME P ON P.PROGTITLE=T.PROGTITLE
WHERE P.PROGTITLE IS NULL
GROUP BY T.PROGTITLE;
Edit: Is this what you mean by index?
CREATE TEMPORARY TABLE TVTEMPTABLE (
PROGTITLE VARCHAR(50), YR YEAR, DIRECTOR TEXT, GENRE VARCHAR(50), INDEX(PROGTITLE,GENRE)
);
CREATE TABLE PROGRAMME (
PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE VARCHAR(50), YR YEAR,
DIRECTOR TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID,PROGTITLE), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID)
) ENGINE=INNODB;
Edit 2: This is the result from the desc extended. After indexing (I may have done this wrong?). The insert still takes a long time
Ok yes the answer was to properly index my tables, what I didn't realise however was
INDEX(A,B,C);
Is different from
INDEX(A),INDEX(B),INDEX(C);