SQL Relation and Query - sql

I am trying to create a database that contains two tables. I have included the create_tables.sql code if this helps. I am trying to set the relationship to make the STKEY the defining key so that a query can be used to search for thr key and show what issues this student has been having. At the moment when I search using:
SELECT *
FROM student, student_log
WHERE 'tilbun' like student.stkey
It shows all the issues in the table regardless of the STKEY. I think I may have the foreign key set incorrectly. I have included the create_tables.sql here.
CREATE TABLE `student`
(
`STKEY` VARCHAR(10),
`first_name` VARCHAR(15),
`surname` VARCHAR(15),
`year_group` VARCHAR(4),
PRIMARY KEY (STKEY)
)
;
CREATE TABLE `student_log`
(
`issue_number` int NOT NULL AUTO_INCREMENT,
`STKEY` VARCHAR(10),
`date_field` DATETIME,
`issue` VARCHAR(150),
PRIMARY KEY (issue_number),
INDEX (STKEY),
FOREIGN KEY (STKEY) REFERENCES student (STKEY)
)
;
Cheers for the help.

Though you have correctly defined the foreign key relationship in the tables, you must still specify a join condition when performing the query. Otherwise, you'll get a cartesian product of the two tables (all rows of one times all rows of the other)
SELECT
student.*,
student_log.*
FROM student INNER JOIN student_log ON student.STKEY = student_log.STKEY
WHERE student.STKEY LIKE 'tilbun'
And note that rather than using an implicit join (comma-separated list of tables), I have used an explicit INNER JOIN, which is the preferred modern syntax.
Finally, there's little use to using a LIKE clause instead of = unless you also use wildcard characters
WHERE student.STKEY LIKE '%tilbun%'

Related

Inner join performance

I have a table that has a lot of foreign keys that I will need to inner join so I can search. There might be upwards of 10 of them, meaning I'd have to do 10 inner joins. Each of the tables being joined may only be a few rows, compared to the massive (millions of rows) table that I am joining them with.
I just need to know if the joins are a fast way (using only Postgres) to do this, or if there might be a more clever way I can do it using subqueries or something.
Here is some made up data as an example:
create table a (
a_id serial primary key,
name character varying(32)
);
create table b (
b_id serial primary key,
name character varying(32)
);
--just 2 tables for simplicity, but i really need like 10 or more
create table big_table (
big_id serial primary key,
a_id int references a(a_id),
b_id int references b(b_id)
);
--filter big_table based on the name column of a and b
--big_table only contains fks to a and b, so in this example im using
--left joins so i can compare by the name column
select big_id,a.name,b.name from big_table
left join a using (a_id)
left join b using (b_id)
where (? is null or a.name=?) and (? is null or b.name=?);
Basically, joins are a fast way. Which way might be the fastest depends on the exact requirements. A couple of hints:
The purpose of your WHERE clause is unclear. It seems you intend to join to all look-up tables and include a condition for each, while you only actually need some of them. That's inefficient. Rather use dynamic-sql and only include in the query what you actually need.
With the current query, since all of your fk columns in the main table can be NULL, you must use LEFT JOIN instead of JOIN or you will exclude rows with NULL values in the fk columns.
The name columns in the look-up tables should certainly be defined NOT NULL. And I would not use the non-descriptive column name "name", that's an unhelpful naming convention. I also would use text instead of varchar(32).

SQL query, search data and return project name

Hey I'm looking for some help in creating a stored procedure.
Here are the details
I have a table called Partners which holds the partner information (Columns, PartnerID and partnername) I also have another table called ProjectPartners which holds the link between the project and the partners columns( PPID, Partner1, partner2, partner3....partner25) and I have a further table called ProjectDetails which holds the information on the project columns( ProjectDID, Project) The foreign key for projectpartners is within Projectdetails.
I'm looking to create a stored procedure that allows me to enter a partner name, this then displays the projects they are included within. I already have some mock code but it doesn't seem to work.
#partnername nvarchar(50)
AS
SET NOCOUNT ON;
SELECT ProjectDID, Project
FROM Projectdetails
WHERE Partners.PartnerName = #partnername
Any help will be much appreciated
You are missing the joins through your table schema to get the necessary data.
Take a read of this MSDN article about joins.
select ProjectDetails.ProjectDID, ProjectDetails.Project
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDID = ProjectDetails.ProjectDID
join Partners on Partners.PartnerId = ProjectPartners.PPID
where Partners.PartnerName = #partnerName
You haven't described the relationship between ProjectPartners and Partner, so I am assuming that the PPID column on ProjectPartners is the relationship
You have also mentioned that your ProjectPartners table has the columns PPID, Partner1, partner2, partner3....partner25. Are you only planning on having 25 partners. If you have 26 will you add a new column? You might want to address that.
Also in column naming conventions, some are a bit muddled.
You have PPID on ProjectPartners. I presume this means ProjectPartnersId.
On the table ProjectDetails you have the column ProjectDID.
This is slightly inconsistent. I guess it should either be PDID on ProjectDetails or ProjectPID on ProjectPartners
Personally, I have always had always had a preference for plain old Id as my Identity column.
UPDATE:
Based on your comments below, it sounds like you might have something a little fundamental wrong with your tables:
create table Partners (
Id int not null primary key identity,
PartnerName nvarchar(100) not null)
go
create table ProjectDetails(
Id int not null primary key identity,
Project nvarchar(100) not null)
go
create table ProjectPartners (
PartnersId int not null,
ProjectDetailsId int not null
)
go
alter table ProjectPartners add constraint FK_ProjectPartners_PartnersId_Partners_Id foreign key (PartnersId) references Partners(Id)
alter table ProjectPartners add constraint FK_ProjectPartners_ProjectDetailsId_ProjectDetails_Id foreign key (ProjectDetailsId) references ProjectDetails(Id)
go
I would suggest changing your database schema to one that is a bit more flexible as per the one provided above.
This will prevent the ever growing ProjectPartners table by adding a new column each time you have a new partner.
It will fix all issues with your foreign keys and make your tables a bit more intuitive.
This would now yield the SQL:
select ProjectDetails.Project, ProjectDetails.Id
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDetailsId = ProjectDetails.Id
join Partners on Partners.Id = ProjectPartners.PartnersId
where Partners.PartnerName= #partnerName

Matching delimited string to table rows

So I have two tables in this simplified example: People and Houses. People can own multiple houses, so I have a People.Houses field which is a string with comma delimeters (eg: "House1, House2, House4"). Houses can have multiple people in them, so I have a Houses.People field, which works the same way ("Sam, Samantha, Daren").
I want to find all the rows in the People table corresponding to the the names of people in the given house, and vice versa for houses belong to people. But I can't figure out how to do that.
This is as close as I've come up with so far:
SELECT People.*
FROM Houses
LEFT JOIN People ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'
But I get some false positives (eg: Sam and Samantha might both get grabbed when I just want Samantha. And likewise with House3, House34, and House343, when I want House343).
I thought I might try and write a SplitString function so I could split a string (using a list of delimiters) into a set, and do some subquery on that set, but MySQL functions can't have tables as return values.
Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I can think of some different ways to get what I want but I'm wondering if there isn't a nice solution.
Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I hope that's not true. Representing "arrays" in SQL databases shouldn't be in a comma-delimited format, but the problem can be correctly solved by using a junction table. Comma-separated fields should have no place in relational databases, and they actually violates the very first normal form.
You'd want your table schema to look something like this:
CREATE TABLE people (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE houses (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
) ENGINE=INNODB;
Then searching for people will be as easy as this:
SELECT p.*
FROM houses h
JOIN people_houses ph ON ph.house_id = h.id
JOIN people p ON p.id = ph.person_id
WHERE h.name = 'SomeArbitraryHouseImInterestedIn';
No more false positives, and they all lived happily ever after.
The nice solution is to redesign your schema so that you have the following tables:
People
------
PeopleID (PK)
...
PeopleHouses
------------
PeopleID (PK) (FK to People)
HouseID (PK) (FK to Houses)
Houses
------
HouseID (PK)
...
Short Term Solution
For your immediate problem, the FIND_IN_SET function is what you want to use for joining:
For People
SELECT p.*
FROM PEOPLE p
JOIN HOUSES h ON FIND_IN_SET(p.name, h.people)
WHERE h.name = ?
For Houses
SELECT h.*
FROM HOUSES h
JOIN PEOPLE p ON FIND_IN_SET(h.name, p.houses)
WHERE p.name = ?
Long Term Solution
Is to properly model this by adding a table to link houses to people, because you're likely storing redundant relationships in both tables:
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
)
The problem is that you have to use another schema, like the one proposed by #RedFilter. You can see it as:
People table:
PeopleID
otherFields
Houses table:
HouseID
otherFields
Ownership table:
PeopleID
HouseID
otherFields
Hope that helps,
Hi you just change the table name places, left side is People and then right side is Houses:
SELECT People.*
FROM People
LEFT JOIN Houses ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'

Multiple-reference foreign keys in table definition?

Summary
How do I make it easy for non-programmers to write a query such as the following?
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = foreign_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = foreign_1_id
;
Context
I have a situation like this:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) null
);
...in which both foreign_key_1_id and foreign_key_2_id reference foreign_table. (Obviously, this is simplified and abstracted.) To query and get the respective values, I might do something like this:
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = foreign_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = foreign_1_id
;
(That is, alias foreign_table in the join to link things up correctly.) This works fine. However, some of my clients want to use SQL Maestro to query the tables. This program uses the foreign key information to link up tables using a fairly straightforward interface ("Visual Query Builder"). For instance, the user can pick multiple tables and SQL Maestro will fill in the joins, like seen here:
(source: sqlmaestro.com)
(That's a diagram from their website, just for illustration.)
This strategy works great as long as foreign keys only reference one table. The multiple-reference situation seems to be confusing it because it gneerates SQL like this:
SELECT
table_name.some_other_column,
foreign_table.name
FROM
table_name
INNER JOIN foreign_table ON (table_name.foreign_key_1_id = foreign_table.id)
AND (table_name.foreign_key_2_id = foreign_table.id)
...because the foreign keys are defined as follows:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
---------------------------
-- The part that changed:
---------------------------
, foreign key (foreign_key_1_id) references foreign_table(id)
, foreign key (foreign_key_2_id) references foreign_table(id)
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) not null
);
That's a problem because you only get 1 foreign_table.name value back, whereas there are often 2 separate values.
Question
How would I go about defining foreign keys to handle this situation? (Is it possible or does it make sense to do so? I wouldn't think that it would make a big difference in constraint checking, so I've thought that that is a reason I can't find any information.) My end goal is to make querying this information easy for my clients to do by themselves, and although this situation doesn't happen every day, it's time-consuming / frustrating to have to help people through it every time it comes up.
If there isn't a way of solving my foreign key problem this way, can you suggest any alternatives? I already have some ways of getting people this information though views, but people often need to have more flexibility than that.
It seems to me that your definition is just fine, and SQL Maestro is incorrectly interpreting two foreign keys to the same table as the same foreign key, so I would alert them to that fact so that they can fix it.

Polymorphism in SQL database tables?

I currently have multiple tables in my database which consist of the same 'basic fields' like:
name character varying(100),
description text,
url character varying(255)
But I have multiple specializations of that basic table, which is for example that tv_series has the fields season, episode, airing, while the movies table has release_date, budget etc.
Now at first this is not a problem, but I want to create a second table, called linkgroups with a Foreign Key to these specialized tables. That means I would somehow have to normalize it within itself.
One way of solving this I have heard of is to normalize it with a key-value-pair-table, but I do not like that idea since it is kind of a 'database-within-a-database' scheme, I do not have a way to require certain keys/fields nor require a special type, and it would be a huge pain to fetch and order the data later.
So I am looking for a way now to 'share' a Primary Key between multiple tables or even better: a way to normalize it by having a general table and multiple specialized tables.
Right, the problem is you want only one object of one sub-type to reference any given row of the parent class. Starting from the example given by #Jay S, try this:
create table media_types (
media_type int primary key,
media_name varchar(20)
);
insert into media_types (media_type, media_name) values
(2, 'TV series'),
(3, 'movie');
create table media (
media_id int not null,
media_type not null,
name varchar(100),
description text,
url varchar(255),
primary key (media_id),
unique key (media_id, media_type),
foreign key (media_type)
references media_types (media_type)
);
create table tv_series (
media_id int primary key,
media_type int check (media_type = 2),
season int,
episode int,
airing date,
foreign key (media_id, media_type)
references media (media_id, media_type)
);
create table movies (
media_id int primary key,
media_type int check (media_type = 3),
release_date date,
budget numeric(9,2),
foreign key (media_id, media_type)
references media (media_id, media_type)
);
This is an example of the disjoint subtypes mentioned by #mike g.
Re comments by #Countably Infinite and #Peter:
INSERT to two tables would require two insert statements. But that's also true in SQL any time you have child tables. It's an ordinary thing to do.
UPDATE may require two statements, but some brands of RDBMS support multi-table UPDATE with JOIN syntax, so you can do it in one statement.
When querying data, you can do it simply by querying the media table if you only need information about the common columns:
SELECT name, url FROM media WHERE media_id = ?
If you know you are querying a movie, you can get movie-specific information with a single join:
SELECT m.name, v.release_date
FROM media AS m
INNER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If you want information for a given media entry, and you don't know what type it is, you'd have to join to all your subtype tables, knowing that only one such subtype table will match:
SELECT m.name, t.episode, v.release_date
FROM media AS m
LEFT OUTER JOIN tv_series AS t USING (media_id)
LEFT OUTER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If the given media is a movie,then all columns in t.* will be NULL.
Consider using a main basic data table with tables extending off of it with specialized information.
Ex.
basic_data
id int,
name character varying(100),
description text,
url character varying(255)
tv_series
id int,
BDID int, --foreign key to basic_data
season,
episode
airing
movies
id int,
BDID int, --foreign key to basic_data
release_data
budget
What you are looking for is called 'disjoint subtypes' in the relational world. They are not supported in sql at the language level, but can be more or less implemented on top of sql.
You could create one table with the main fields plus a uid then extension tables with the same uid for each specific case. To query these like separate tables you could create views.
Using the disjoint subtype approach suggested by Bill Karwin, how would you do INSERTs and UPDATEs without having to do it in two steps?
Getting data, I can introduce a View that joins and selects based on specific media_type but AFAIK I cant update or insert into that view because it affects multiple tables (I am talking MS SQL Server here). Can this be done without doing two operations - and without a stored procedure, natually.
Thanks
Question is quite old but for modern postresql versions it's also worth considering using json/jsonb/hstore type.
For example:
create table some_table (
name character varying(100),
description text,
url character varying(255),
additional_data json
);