Inner join performance - sql

I have a table with a lot of foreign keys that I will need to inner join so I can search. There might be upwards of 10 of them, meaning I'd have to do 10 inner joins. Each of the joined tables may have only a few rows, compared to the massive (millions of rows) table that I am joining them to.
I just need to know whether joins are a fast way to do this (using only Postgres), or whether there is a cleverer way using subqueries or something similar.
Here is some made up data as an example:
create table a (
a_id serial primary key,
name character varying(32)
);
create table b (
b_id serial primary key,
name character varying(32)
);
--just 2 tables for simplicity, but i really need like 10 or more
create table big_table (
big_id serial primary key,
a_id int references a(a_id),
b_id int references b(b_id)
);
--filter big_table based on the name column of a and b
--big_table only contains fks to a and b, so in this example im using
--left joins so i can compare by the name column
select big_id,a.name,b.name from big_table
left join a using (a_id)
left join b using (b_id)
where (? is null or a.name=?) and (? is null or b.name=?);

Basically, joins are a fast way. Which way might be the fastest depends on the exact requirements. A couple of hints:
The purpose of your WHERE clause is unclear. It seems you intend to join to all look-up tables and include a condition for each, even when only some of the filters are actually set. That's inefficient. Instead, use dynamic SQL and include only the joins and conditions the current search actually needs.
With the current query, since all of your fk columns in the main table can be NULL, you must use LEFT JOIN instead of JOIN or you will exclude rows with NULL values in the fk columns.
The name columns in the look-up tables should certainly be defined NOT NULL. And I would not use the non-descriptive column name "name", that's an unhelpful naming convention. I also would use text instead of varchar(32).
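The dynamic-SQL hint can be sketched as follows. This is a minimal illustration, not the answer's own code: it uses Python's sqlite3 module instead of Postgres so it runs standalone, and `LOOKUPS`, `build_query`, and the sample rows are hypothetical names and data invented for the example. It adds a join only for filters that were supplied; an inner join is enough there, because an equality filter on the look-up name discards rows with a NULL fk anyway.

```python
import sqlite3

# Hypothetical whitelist: filter key -> (look-up table, fk column).
# Table and column names come from this mapping, never from user input.
LOOKUPS = {"a_name": ("a", "a_id"), "b_name": ("b", "b_id")}

def build_query(filters):
    """Return (sql, params) that joins only the look-up tables being filtered on."""
    joins, conditions, params = [], [], []
    for key, value in filters.items():
        if value is None:
            continue  # filter not set: skip both the join and the condition
        table, fk = LOOKUPS[key]
        joins.append(f"JOIN {table} USING ({fk})")
        conditions.append(f"{table}.name = ?")
        params.append(value)  # values are always passed as bound parameters
    sql = "SELECT big_id FROM big_table " + " ".join(joins)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY big_id", params

# Made-up sample data mirroring the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE big_table (
        big_id INTEGER PRIMARY KEY,
        a_id INT REFERENCES a(a_id),
        b_id INT REFERENCES b(b_id)
    );
    INSERT INTO a VALUES (1, 'red'), (2, 'blue');
    INSERT INTO b VALUES (1, 'small'), (2, 'large');
    INSERT INTO big_table VALUES (1, 1, 1), (2, 1, 2), (3, 2, NULL), (4, NULL, 2);
""")

# Only a.name is filtered, so table b is never joined.
sql, params = build_query({"a_name": "red", "b_name": None})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

If the look-up names are also needed in the output for unfiltered tables, those tables would still have to be joined (with LEFT JOIN, as noted above); the sketch only covers the filtering side.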

Related

Does it matter whether you first write down the primary key or foreign key, when you use join?

When I use join, by the ON part I write down the connection, foreign key = primary key.
But when I reverse the fk and pk I still get the same result, does this mean that it doesn't matter which one goes first?
Example:
select
movie.title,
director.firstname,
director.lastname
from movie
join director on movie.director = director.directorcode
(This is the one with the right order FK=PK)
select
movie.title,
director.firstname,
director.lastname
from movie
join director on director.directorcode = movie.director
This happens because equality is symmetric: both conditions compare the same pair of columns.
In a SQL join, the result is the same regardless of which side of the = the FK and the PK appear on.
The purpose of the join condition is to match up rows from the two tables based on the values in the FK and PK columns.
It does not matter which operand you write first in an equality comparison.
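A quick way to convince yourself is to run both forms against the same data and compare the results. The sketch below uses Python's sqlite3 module and made-up rows (the director and movie values are invented for illustration); the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE director (directorcode INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE movie (title TEXT, director INTEGER REFERENCES director(directorcode));
    INSERT INTO director VALUES (1, 'Ada', 'Example'), (2, 'Bo', 'Sample');
    INSERT INTO movie VALUES ('First', 1), ('Second', 2), ('Third', 1);
""")

# Same query twice; only the operand order in the ON condition differs.
template = """
    SELECT movie.title, director.firstname, director.lastname
    FROM movie
    JOIN director ON {condition}
    ORDER BY movie.title
"""
fk_first = conn.execute(
    template.format(condition="movie.director = director.directorcode")
).fetchall()
pk_first = conn.execute(
    template.format(condition="director.directorcode = movie.director")
).fetchall()

print(fk_first == pk_first)
```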

Is there a way to make a "natural join" with a table having a foreign key on another table?

This is just a dummy example.
In reality I have a foreign key that references many more columns. That is why I'm trying to replace the normal join with a "natural join".
I have a table which has a foreign key on another table.
The columns don't have the same name.
I would like to avoid writing join on this_column_of_tableA = this_column_of_tableB
If the foreign key column of tableB had the same name as the referenced column, I could do a natural join.
I tried it nonetheless. As expected, it didn't work (I got a cross product).
But Oracle should know which columns to use for the join, because that is in the table definition.
Is it possible to write the join in this case without restating which column matches which column? Restating them is error-prone and a waste of time.
-- TA is created first so that TB's foreign key can reference it
create table TA (
TA_1 number,
constraint pk_departments primary key (TA_1)
);
create table TB (
TB_1 number,
constraint fk_TA foreign key (TB_1)
REFERENCES TA(TA_1)
);
INSERT INTO TA (TA_1)
VALUES (1);
INSERT INTO TA (TA_1)
VALUES (2);
INSERT INTO TA (TA_1)
VALUES (3);
INSERT INTO TB(TB_1)
VALUES (1);
INSERT INTO TB(TB_1)
VALUES (2);
select * from TA natural join TB
No, there is no simple way to do this, and you should avoid it.
Natural joins are risky and should be avoided whenever possible.
A natural join matches on every column that has the same name in both tables, and those columns must have compatible data types. This means the query breaks, or silently changes meaning, whenever a name or type is changed in just one of the joined tables, and then you have to find out where and what.
Correct code is code that works reliably and without such risks, not the shortest possible code.
You could probably build an unpleasant workaround to make it possible, but I recommend just using an explicit join.
You can do something like this:
WITH A AS (SELECT TA_1 FROM TA)
, B AS (SELECT TB_1 TA_1 FROM TB)
SELECT TA_1 FROM A NATURAL JOIN B;
A fundamental feature of relational databases is that joins are defined by queries, never by the schema. You should generally specify the columns required from your tables and not rely on SELECT *.
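The three variants discussed in this thread can be compared side by side. The sketch below runs them with Python's sqlite3 module rather than Oracle, which is an assumption made purely so the example is self-contained; SQLite's NATURAL JOIN also degrades to a cross product when no column names match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TA (TA_1 INTEGER PRIMARY KEY);
    CREATE TABLE TB (TB_1 INTEGER REFERENCES TA(TA_1));
    INSERT INTO TA VALUES (1), (2), (3);
    INSERT INTO TB VALUES (1), (2);
""")

# No shared column name, so the natural join pairs every TA row with every TB row.
natural = conn.execute("SELECT * FROM TA NATURAL JOIN TB").fetchall()

# Explicit join: the matching columns are stated in the query itself.
explicit = conn.execute("""
    SELECT TA.TA_1
    FROM TA
    JOIN TB ON TB.TB_1 = TA.TA_1
    ORDER BY TA.TA_1
""").fetchall()

# Workaround from the answer: rename TB_1 in a CTE so the names line up.
renamed = conn.execute("""
    WITH A AS (SELECT TA_1 FROM TA),
         B AS (SELECT TB_1 AS TA_1 FROM TB)
    SELECT TA_1 FROM A NATURAL JOIN B
    ORDER BY TA_1
""").fetchall()

print(len(natural), explicit, renamed)
```

The renaming workaround still spells out which column maps to which, just in a different place, so it saves nothing over the explicit ON clause.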

Table just for grouping

Is it a common case to have a table with a single column for the purpose of grouping rows in another table?
I'm inserting data in batches and I want to have an autoincrement key for each batch to be able to group data based on generated id.
Concretely I want to get from this
A
id, x, y, b_id
id PRIMARY KEY
b_id FOREIGN KEY REFERENCES B.id
B
id, timestamp
id PRIMARY KEY
SELECT count(*) as number, B.timestamp FROM A inner join B on A.b_id=B.id
where A.x='value' and A.y='value'
group by B.id;
to
A
id, x, y, timestamp, b_id
id PRIMARY KEY
b_id FOREIGN KEY REFERENCES B.id
B
id
id PRIMARY KEY
SELECT count(*) as number, A.timestamp FROM A
where A.x='value' and A.y='value'
group by A.b_id, A.timestamp;
So basically, move timestamp to A (denormalize) and use the foreign key only for grouping. I want to avoid a join whose only purpose is to fetch the timestamp stored in B. The tables are quite big (60M rows) and the join is very slow. If I filter on A alone and keep the foreign key only for grouping, that should speed things up a lot.
Concretely, I'm using MySQL.
Denormalization can be acceptable for performance reasons. Just make sure that the performance improvement outweighs the cost of that denormalization. The costs include not only additional space requirements (which can cause their own performance issues), but also potential data errors, for example when two rows in table "A" have the same b_id but different timestamp values.
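The data-error risk can at least be monitored. The following sketch is one possible consistency check, shown with Python's sqlite3 module instead of MySQL so it runs standalone; the sample rows and timestamps are made up, and the second batch deliberately contains a drifted timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        x TEXT,
        y TEXT,
        timestamp TEXT,
        b_id INTEGER REFERENCES B(id)
    );
    INSERT INTO B VALUES (1), (2);
    INSERT INTO A (x, y, timestamp, b_id) VALUES
        ('value', 'value', '2024-01-01 10:00:00', 1),
        ('value', 'value', '2024-01-01 10:00:00', 1),
        ('value', 'value', '2024-01-02 09:00:00', 2),
        ('value', 'value', '2024-01-02 09:30:00', 2);  -- same batch, different timestamp
""")

# Batches whose rows disagree about the denormalized timestamp.
inconsistent = conn.execute("""
    SELECT b_id, COUNT(DISTINCT timestamp) AS versions
    FROM A
    GROUP BY b_id
    HAVING COUNT(DISTINCT timestamp) > 1
""").fetchall()

print(inconsistent)
```

A check like this scans the whole table, so on a 60M-row table it is something to run occasionally rather than on every insert.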

SQL Relation and Query

I am trying to create a database that contains two tables. I want to set up the relationship so that STKEY is the defining key, and a query can search for the key and show which issues that student has been having. At the moment, when I search using:
SELECT *
FROM student, student_log
WHERE 'tilbun' like student.stkey
It shows all the issues in the table regardless of the STKEY. I think I may have the foreign key set incorrectly. I have included the create_tables.sql here.
CREATE TABLE `student`
(
`STKEY` VARCHAR(10),
`first_name` VARCHAR(15),
`surname` VARCHAR(15),
`year_group` VARCHAR(4),
PRIMARY KEY (STKEY)
)
;
CREATE TABLE `student_log`
(
`issue_number` int NOT NULL AUTO_INCREMENT,
`STKEY` VARCHAR(10),
`date_field` DATETIME,
`issue` VARCHAR(150),
PRIMARY KEY (issue_number),
INDEX (STKEY),
FOREIGN KEY (STKEY) REFERENCES student (STKEY)
)
;
Cheers for the help.
Though you have correctly defined the foreign key relationship in the tables, you must still specify a join condition when performing the query. Otherwise, you'll get a cartesian product of the two tables (all rows of one times all rows of the other):
SELECT
student.*,
student_log.*
FROM student INNER JOIN student_log ON student.STKEY = student_log.STKEY
WHERE student.STKEY LIKE 'tilbun'
Note that rather than using an implicit join (a comma-separated list of tables), I have used an explicit INNER JOIN, which is the preferred modern syntax.
Finally, there's little point in using LIKE instead of = unless you also use wildcard characters:
WHERE student.STKEY LIKE '%tilbun%'
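The difference between the two queries is easy to see on a few rows. This sketch uses Python's sqlite3 module in place of MySQL, and the student and log rows (including the 'smijo' key) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        STKEY TEXT PRIMARY KEY,
        first_name TEXT,
        surname TEXT,
        year_group TEXT
    );
    CREATE TABLE student_log (
        issue_number INTEGER PRIMARY KEY AUTOINCREMENT,
        STKEY TEXT REFERENCES student(STKEY),
        date_field TEXT,
        issue TEXT
    );
    INSERT INTO student VALUES
        ('tilbun', 'T', 'Example', '10'),
        ('smijo', 'J', 'Sample', '11');
    INSERT INTO student_log (STKEY, issue) VALUES
        ('tilbun', 'late'), ('smijo', 'absent'), ('smijo', 'no homework');
""")

# No join condition: the one matching student is paired with every log row.
cross = conn.execute("""
    SELECT student.STKEY, student_log.issue
    FROM student, student_log
    WHERE student.STKEY = 'tilbun'
""").fetchall()

# With the join condition: only this student's own log rows come back.
joined = conn.execute("""
    SELECT student.STKEY, student_log.issue
    FROM student
    INNER JOIN student_log ON student.STKEY = student_log.STKEY
    WHERE student.STKEY = 'tilbun'
""").fetchall()

print(len(cross), joined)
```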

Multiple-reference foreign keys in table definition?

Summary
How do I make it easy for non-programmers to write a query such as the following?
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = table_name.foreign_key_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = table_name.foreign_key_2_id
;
Context
I have a situation like this:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) null
);
...in which both foreign_key_1_id and foreign_key_2_id reference foreign_table. (Obviously, this is simplified and abstracted.) To query and get the respective values, I might do something like this:
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = table_name.foreign_key_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = table_name.foreign_key_2_id
;
(That is, alias foreign_table in the join to link things up correctly.) This works fine. However, some of my clients want to use SQL Maestro to query the tables. This program uses the foreign key information to link up tables using a fairly straightforward interface ("Visual Query Builder"). For instance, the user can pick multiple tables and SQL Maestro will fill in the joins, as seen here:
(source: sqlmaestro.com)
(That's a diagram from their website, just for illustration.)
This strategy works great as long as each table is referenced by only one foreign key. The multiple-reference situation seems to confuse it, because it generates SQL like this:
SELECT
table_name.some_other_column,
foreign_table.name
FROM
table_name
INNER JOIN foreign_table ON (table_name.foreign_key_1_id = foreign_table.id)
AND (table_name.foreign_key_2_id = foreign_table.id)
...because the foreign keys are defined as follows:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
---------------------------
-- The part that changed:
---------------------------
, foreign key (foreign_key_1_id) references foreign_table(id)
, foreign key (foreign_key_2_id) references foreign_table(id)
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) not null
);
That's a problem because you only get 1 foreign_table.name value back, whereas there are often 2 separate values.
Question
How would I go about defining foreign keys to handle this situation? (Is it possible or does it make sense to do so? I wouldn't think that it would make a big difference in constraint checking, so I've thought that that is a reason I can't find any information.) My end goal is to make querying this information easy for my clients to do by themselves, and although this situation doesn't happen every day, it's time-consuming / frustrating to have to help people through it every time it comes up.
If there isn't a way of solving my foreign key problem this way, can you suggest any alternatives? I already have some ways of getting people this information through views, but people often need more flexibility than that.
It seems to me that your definition is just fine, and SQL Maestro is incorrectly interpreting two foreign keys to the same table as the same foreign key, so I would alert them to that fact so that they can fix it.
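To see why the generated SQL loses information while the hand-written aliased query does not, both can be run on the same data. The sketch below uses Python's sqlite3 module with the table definitions from the question; the two sample rows are made up, one with different foreign keys and one where both point at the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foreign_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(255) NOT NULL
    );
    CREATE TABLE table_name (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        foreign_key_1_id INTEGER NOT NULL REFERENCES foreign_table(id),
        foreign_key_2_id INTEGER NOT NULL REFERENCES foreign_table(id),
        some_other_column VARCHAR(255) NULL
    );
    INSERT INTO foreign_table (name) VALUES ('alpha'), ('beta');
    INSERT INTO table_name (foreign_key_1_id, foreign_key_2_id, some_other_column)
    VALUES (1, 2, 'mixed'), (2, 2, 'same');
""")

# Hand-written form: one alias of foreign_table per foreign key.
aliased = conn.execute("""
    SELECT t.some_other_column, f1.name, f2.name
    FROM table_name t
    LEFT OUTER JOIN foreign_table f1 ON f1.id = t.foreign_key_1_id
    LEFT OUTER JOIN foreign_table f2 ON f2.id = t.foreign_key_2_id
    ORDER BY t.id
""").fetchall()

# Shape of the generated query: both keys must match the SAME foreign_table row.
single = conn.execute("""
    SELECT table_name.some_other_column, foreign_table.name
    FROM table_name
    INNER JOIN foreign_table
        ON (table_name.foreign_key_1_id = foreign_table.id)
        AND (table_name.foreign_key_2_id = foreign_table.id)
""").fetchall()

print(aliased)
print(single)
```

The single-join form not only returns one name column, it also drops every row whose two foreign keys differ, which is why the aliased form (or a view wrapping it) is needed here.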