How do you keep a JOIN table performant? - sql

I'm drawing up plans for a few new features on my site, and one could be "solved" using a join table.
Example schema:
Person table
    PK PersonId
    Name
    Age ...
PersonCheckin table
    PK FK PersonId
    PK FK CheckinId
    Date ...
Checkin table
    PK CheckinId
    CheckedInto ...
A join would be run to get the check in data for a person (connected by the PersonCheckin table). Since every person could check in an unlimited number of times, the PersonCheckin table could become very large.
I'd imagine this would cause some performance issues. What are typical ways this is handled to keep performance high?

A join is considered the best-performing means of connecting related tables.
But it really depends on the query, because it might not need to be a JOIN. JOINing can inflate the record set on the parent table's side if there is more than one related child record, which means there could be a need for either GROUP BY or DISTINCT. EXISTS or IN is a better choice in such situations...
Indexes can help on the column(s) used in the JOIN criteria, on both sides of the relationship. In this example both sides are primary keys, which typically have the best index automatically created when the primary key is defined...
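To make the row-inflation point concrete, here is a minimal sketch using Python's built-in sqlite3 module and the schema from the question. The sample rows and the index name IX_PersonCheckin_CheckinId are made up for illustration, and other databases may plan these queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Checkin (CheckinId INTEGER PRIMARY KEY, CheckedInto TEXT);
    CREATE TABLE PersonCheckin (
        PersonId  INTEGER NOT NULL REFERENCES Person(PersonId),
        CheckinId INTEGER NOT NULL REFERENCES Checkin(CheckinId),
        Date      TEXT,
        PRIMARY KEY (PersonId, CheckinId)
    );
    -- The composite key leads with PersonId, so lookups by CheckinId
    -- alone get their own (hypothetically named) index.
    CREATE INDEX IX_PersonCheckin_CheckinId ON PersonCheckin (CheckinId);

    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Checkin VALUES (10, 'Cafe'), (11, 'Gym'), (12, 'Library');
    INSERT INTO PersonCheckin VALUES
        (1, 10, '2020-01-01'), (1, 11, '2020-01-02'), (1, 12, '2020-01-03');
""")

# JOIN: one output row per matching child row, so Ann shows up three times.
joined = conn.execute("""
    SELECT p.Name
    FROM Person p
    JOIN PersonCheckin pc ON pc.PersonId = p.PersonId
""").fetchall()

# EXISTS: one output row per parent that has at least one child row.
exists = conn.execute("""
    SELECT p.Name
    FROM Person p
    WHERE EXISTS (SELECT 1 FROM PersonCheckin pc WHERE pc.PersonId = p.PersonId)
""").fetchall()

print(len(joined), len(exists))
```

If the question is only "which people have checked in at all", the EXISTS form avoids the extra DISTINCT or GROUP BY step.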

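If you are going to execute this query very often and want better performance, create a view in the database that contains the join query. Note that a plain view only stores the query definition; it is an indexed (materialized) view that actually precomputes the join.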

Related

How to understand this query?

SELECT DISTINCT
...
...
...
FROM Reviews Rev
INNER JOIN Reviews SubRev ON Subrev.W_ID=Rev.ID
WHERE Rev.Status='Approved'
This is a small part of a long query that I've been trying to understand for a day now. What is happening with the join? Reviews table appears to be joined with itself, under different aliases. Why is this done? What does it achieve? Also, ID field of the Reviews table is null for the entries that are nevertheless selected and returned. This is correct, but I don't understand how that can happen if the W_ID field is not null.
It allows you to join one row from the table to a different row in the table.
I've both seen this done, and used it myself, in cases where you maybe have a relationship between those rows.
Real-world examples:
An old version of a record and a newer version
Some sort of hierarchical relationship (e.g. if the table contains records of people, you can record that someone is a parent of someone else). There are probably plenty of other possible use cases, too.
SQL allows you to create a foreign key that relates two different columns in the same table.
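A small sketch of such a self-join, using Python's sqlite3 module. It assumes W_ID holds the ID of a parent review, which the question does not confirm, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: W_ID points at the ID of another row in the same table.
    CREATE TABLE Reviews (
        ID     INTEGER PRIMARY KEY,
        W_ID   INTEGER REFERENCES Reviews(ID),
        Status TEXT
    );
    INSERT INTO Reviews VALUES
        (1, NULL, 'Approved'),   -- a parent review
        (2, 1,    'Pending'),    -- child of review 1
        (3, 1,    'Approved'),   -- child of review 1
        (4, NULL, 'Approved');   -- approved, but has no children
""")

# Rev and SubRev are two aliases over the same table:
# Rev plays the parent role, SubRev the child role.
pairs = conn.execute("""
    SELECT Rev.ID, SubRev.ID
    FROM Reviews Rev
    INNER JOIN Reviews SubRev ON SubRev.W_ID = Rev.ID
    WHERE Rev.Status = 'Approved'
    ORDER BY SubRev.ID
""").fetchall()

print(pairs)
```

Review 4 is approved but drops out of the result, because the inner join keeps only parents that have at least one matching child row.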

Extending table with another table ... sort of

I have a DB about renting cars.
I created a CarModels table (ModelID as PK).
I want to create a second table with the same primary key as CarModels has.
This table contains only the number of times each model was searched for on my website.
So let's say you visit my website; you can check a list of the most commonly rented cars.
It is a "Most popular Cars" table.
It's not a one-to-one relationship, that's for sure.
Is there any SQL code to connect two primary keys together?
select m.ModelID, m.Field1, m.Field2,
t.TimesSearched
from CarModels m
left outer join Table2 t on m.ModelID = t.ModelID
But why not simply add the TimesSearched field to the CarModels table?
Then you don't need another table.
Easiest is to just use a new primary key on the new table with a foreign key to the CarModels table, like [CarModelID] INT NOT NULL. You can put an index and a unique constraint on the FK.
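A rough sketch of that layout, run through Python's sqlite3 module. The table name ModelSearchCounts, the Id and Name columns, and the sample rows are all hypothetical; only CarModels, ModelID, CarModelID and TimesSearched come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE CarModels (ModelID INTEGER PRIMARY KEY, Name TEXT);

    -- Own surrogate key, plus a UNIQUE foreign key back to CarModels,
    -- so each model can have at most one counter row.
    CREATE TABLE ModelSearchCounts (
        Id            INTEGER PRIMARY KEY,
        CarModelID    INTEGER NOT NULL UNIQUE REFERENCES CarModels(ModelID),
        TimesSearched INTEGER NOT NULL DEFAULT 0
    );

    INSERT INTO CarModels VALUES (1, 'Golf'), (2, 'Corolla');
    INSERT INTO ModelSearchCounts (CarModelID, TimesSearched) VALUES (1, 7);
""")

# Models that were never searched have no counter row, so COALESCE
# turns the NULL from the outer join into 0.
rows = conn.execute("""
    SELECT m.ModelID, m.Name, COALESCE(c.TimesSearched, 0)
    FROM CarModels m
    LEFT OUTER JOIN ModelSearchCounts c ON c.CarModelID = m.ModelID
    ORDER BY m.ModelID
""").fetchall()

print(rows)
```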
If you really want them to be the same, you can jump through a bunch of hoops that will make your life hell, like creating the table from the CarModels table, then setting that field as the primary key. Then, whenever you add a new CarModel, you'll need a trigger that will SET IDENTITY_INSERT ON so you can add the new one, and you'll have to remember to SET IDENTITY_INSERT OFF when you're done.
Personally, I'd create a CarsSearched table that holds ThisUser selected ThisCarModel on ThisDate: then you can start doing some fun data analysis like [are some cars more popular in certain zip codes or certain times of year?], or [this company rents three cars every year in March, so I'll send them a coupon in January].
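Here is a minimal sketch of the search-log idea using Python's sqlite3 module. The column names UserID and SearchedOn, the Name column, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CarModels (ModelID INTEGER PRIMARY KEY, Name TEXT);

    -- One row per search event: this user looked at this model on this date.
    CREATE TABLE CarsSearched (
        UserID     INTEGER NOT NULL,
        ModelID    INTEGER NOT NULL REFERENCES CarModels(ModelID),
        SearchedOn TEXT    NOT NULL
    );
    CREATE INDEX IX_CarsSearched_ModelID ON CarsSearched (ModelID);

    INSERT INTO CarModels VALUES (1, 'Golf'), (2, 'Corolla');
    INSERT INTO CarsSearched VALUES
        (100, 1, '2020-03-01'),
        (101, 1, '2020-03-02'),
        (100, 2, '2020-03-02'),
        (102, 1, '2020-03-05');
""")

# The "most popular cars" list is derived from the log rather than stored.
popular = conn.execute("""
    SELECT m.Name, COUNT(*) AS TimesSearched
    FROM CarsSearched s
    JOIN CarModels m ON m.ModelID = s.ModelID
    GROUP BY m.ModelID, m.Name
    ORDER BY TimesSearched DESC, m.Name
""").fetchall()

print(popular)
```

The trade-off is that the counter is computed on demand, so on a busy site the aggregate may need caching or a periodically refreshed summary table.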
You are not extending anything (that would mean modifying the actual model of the table). You simply need an INNER JOIN of the two tables on their primary keys being equal.
It could be an outer join, as has been suggested, but if it's 1:1 like you said (the second table will have exactly the same keys, all of them, I assume), an inner join will be enough, as both tables would have the same set of primary keys.
As a bonus, an inner join returns fewer rows when some keys are missing, which is a nice reminder that not all PKs matched.
That being said, do you have a strong reason not to keep that number in the same table? You are basically modeling a 1:1 relationship for one extra column (and a small one too, by data type).
You could instead extend the table (now this is extending the table's model) with an additional integer attribute that keeps that number for you.
The latter is preferred for simplicity and lower query times.

SQL One-to-one relationship join

I have 2 tables; one is an extension of the other, so it is currently a simple one-to-one relationship (this is likely to become one-to-many in the future). I need to join from one table to the other to pull a value out of a column in the extension.
So table A contains basic details including an id, and table B uses a FK reference to the id column in table A. I need to pull out column X from table B.
To add complexity sometimes there won't be a matching entry in table B but in that case it needs to return null. Also the value of X could be null.
I know I can use a left outer join but is there a more efficient way to perform the join?
Left outer join is the way. In order to make it most efficient, make sure you index the FK column in table B. It will be super-fast with the index.
You don't need to index the primary key in table A for this query (and most databases already index primary keys anyway).
The MySQL syntax for creating the index:
CREATE INDEX `fast_lookups` ON `table_b` (`col_name`);
You can name it whatever, I picked "fast_lookups."
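A quick sketch of the whole pattern with Python's sqlite3 module. The names table_a, table_b, a_id and x are placeholders standing in for the tables and columns described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL REFERENCES table_a(id),
        x    TEXT
    );
    -- Index the FK column that the join looks up.
    CREATE INDEX fast_lookups ON table_b (a_id);

    INSERT INTO table_a VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO table_b VALUES (1, 1, 'foo'), (2, 2, NULL);
""")

rows = conn.execute("""
    SELECT a.id, b.x
    FROM table_a a
    LEFT OUTER JOIN table_b b ON b.a_id = a.id
    ORDER BY a.id
""").fetchall()

print(rows)
```

Note that a NULL in x can mean either "no row in table B" (id 3) or "row exists but X is NULL" (id 2). If that difference matters, also select b.id, which is NULL only when there is no matching row.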

Unable to join 6 tables with out any primary or foreign keys - Out of memory

I have six tables, which unfortunately do not have any primary/foreign key relations encoded. I've tried to create a view; however, I am more than happy with just a table.
I've managed to join the six tables, but after returning 33.5 million rows (cleanInventtrans has 1.7 million rows and every time more tables are added, I am assuming the tables are multiplied by the combination of the previous table) I run out of memory. Now, I realize this is not the correct way of doing this, if it was I'd get a result I am assuming.
I've looked at a couple of questions online
http://www.daniweb.com/web-development/databases/ms-sql/threads/123446/problem-using-join-with-six-tables
SQL joining 6 tables
Unable to relate two MySQL tables (foreign keys)
And I've looked at
http://www.techonthenet.com/sql/joins.php
However, they assume there is a primary-foreign key relationship, my tables do not have this but there are fields which correspond, between the various tables, as seen in the SQL code below. I use these to match up the tables, but somewhere along the line, I am clearly doing it wrong.
I understand that by doing it like this I am essentially multiplying results with each table I join with. I was hoping there was a more clever/correct way of doing it which reduces the number of results to a manageable size.
I am unfortunately unable to provide a create statement.
The code I am using:
SELECT dbo.AX_SALESLINE.SALESID, dbo.AX_SALESLINE.ITEMID, dbo.AX_SALESLINE.QTYORDERED, dbo.AX_SALESLINE.SALESPRICE, dbo.AX_SALESLINE.LINEPERCENT,
dbo.AX_SALESLINE.LINEAMOUNT, dbo.AX_SALESLINE.SALESQTY, dbo.AX_SALESLINE.CONFIRMEDDLV, dbo.CleanInventTrans.COSTAMOUNTPOSTED,
dbo.CleanInventTrans.DATEPHYSICAL, dbo.AX_CUSTPACKINGSLIPJOUR.DELIVERYDATE, dbo.AX_SALESTABLE.CUSTACCOUNT, dbo.AX_SALESTABLE.SALESTYPE,
dbo.AX_SALESTABLE.SALESSTATUS, dbo.AX_CUSTPACKINGSLIPJOUR.QTY, dbo.AX_PRODTABLE.PRODID
FROM dbo.AX_CUSTPACKINGSLIPJOUR INNER JOIN
dbo.AX_SALESTABLE INNER JOIN
dbo.CleanInventTrans INNER JOIN
dbo.AX_INVENTTABLE ON dbo.CleanInventTrans.ITEMID = dbo.AX_INVENTTABLE.ITEMID INNER JOIN
dbo.AX_PRODTABLE ON dbo.AX_INVENTTABLE.ITEMID = dbo.AX_PRODTABLE.ITEMID INNER JOIN
dbo.AX_SALESLINE ON dbo.AX_INVENTTABLE.ITEMID = dbo.AX_SALESLINE.ITEMID ON dbo.AX_SALESTABLE.SALESID = dbo.AX_SALESLINE.SALESID ON
dbo.AX_CUSTPACKINGSLIPJOUR.SALESID = dbo.AX_SALESLINE.SALESID
Edit:
Salesline and salestable can be related via P/F-keys; however, inventtable and salesline do not have a relationship beyond having a column named itemid, which is the same in both. But these have duplicates and cannot function as keys.
Edit2:
Salesline and inventtable cannot be related for some reason:
'AX_INVENTTABLE' table saved successfully
'AX_SALESLINE' table
- Unable to create relationship 'FK_AX_SALESLINE_AX_INVENTTABLE'.
The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_AX_SALESLINE_AX_INVENTTABLE". The conflict occurred in database "VS1", table "dbo.AX_INVENTTABLE", column 'ITEMID'.
I've checked the itemid in inventtable, it is unique, and there are multiple of the itemid in the salesline - however not all ID's are numerical in nature, some contain letters. I am not sure if this is what is causing the problem yet.
Edit3:
The foreign-primary relation fails because there are IDs in salesline which are not present in the inventtable. Very odd.
Don't bring back all of the rows from these 6 tables. If some of the tables have more than a million rows, at least filter on one of them with a WHERE clause, e.g. the CleanInventTrans table. If you're not going to add primary and foreign keys, at least add indexes to help improve performance.
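On the failed foreign key from Edit3: one way to see exactly which IDs are in the way is an anti-join. This is a sketch in Python's sqlite3 module with the two tables cut down to the columns that matter; the sample IDs are invented, and the same SELECT should carry over to SQL Server largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (most columns omitted).
    CREATE TABLE AX_INVENTTABLE (ITEMID TEXT);
    CREATE TABLE AX_SALESLINE (SALESID TEXT, ITEMID TEXT);

    -- No keys are declared, so at least index the join columns.
    CREATE INDEX IX_INVENTTABLE_ITEMID ON AX_INVENTTABLE (ITEMID);
    CREATE INDEX IX_SALESLINE_ITEMID ON AX_SALESLINE (ITEMID);

    INSERT INTO AX_INVENTTABLE VALUES ('A1'), ('B2');
    INSERT INTO AX_SALESLINE VALUES
        ('S1', 'A1'), ('S2', 'ZZ9'), ('S3', 'ZZ9'), ('S4', 'B2');
""")

# Anti-join: sales lines whose ITEMID has no match in the item table.
# These are the rows that make the FOREIGN KEY constraint fail.
orphans = conn.execute("""
    SELECT DISTINCT s.ITEMID
    FROM AX_SALESLINE s
    LEFT JOIN AX_INVENTTABLE i ON i.ITEMID = s.ITEMID
    WHERE i.ITEMID IS NULL
    ORDER BY s.ITEMID
""").fetchall()

print(orphans)
```

Once the orphaned IDs are either fixed or added to the item table, the constraint can be created.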

How to Delete all data from a table which contain self referencing foreign key

I have a table which has employee relationship defined within itself.
i.e.
EmpID Name SeniorId
-----------------------
1 A NULL
2 B 1
3 C 1
4 D 3
and so on...
where SeniorId is a foreign key that references the EmpID column of the same table.
I want to clear all rows from this table without removing any constraint. How can I do this?
The deletion needs to be performed in this order:
4, 3, 2, 1
EDIT:
Jhonny's answer is working for me, but which of the answers is more efficient?
I don't know if I am missing something, but maybe you can try this.
UPDATE employee SET SeniorID = NULL
DELETE FROM employee
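A small sketch of this two-step approach with Python's sqlite3 module, using the sample rows from the question. SQLite stands in for SQL Server here, so treat it as an illustration of the idea rather than a test of SQL Server's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE employee (
        EmpID    INTEGER PRIMARY KEY,
        Name     TEXT,
        SeniorID INTEGER REFERENCES employee(EmpID)
    );
    INSERT INTO employee VALUES
        (1, 'A', NULL), (2, 'B', 1), (3, 'C', 1), (4, 'D', 3);
""")

# Deleting a referenced row on its own is rejected by the constraint.
try:
    conn.execute("DELETE FROM employee WHERE EmpID = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Step 1: break every self-reference. Step 2: delete everything.
conn.execute("UPDATE employee SET SeniorID = NULL")
conn.execute("DELETE FROM employee")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(blocked, remaining)
```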
If the table is very large (cardinality of millions), and there is no need to log the DELETE transactions, dropping the constraint and TRUNCATEing and recreating constraints is by far the most efficient way. Also, if there are foreign keys in other tables (and in this particular table design it would seem to be so), those rows will all have to be deleted first in all cases, as well.
Normalization says nothing about recursive/hierarchical/tree relationships, so I believe that is a red herring in your reply to DVK's suggestion to split this into its own table - it certainly is viable to make a vertical partition of this table already and also to consider whether you can take advantage of that to get any of the other benefits I list below. As DVK alludes to, in this particular design, I have often seen a separate link table to record self-relationships and other kinds of relationships. This has numerous benefits:
have many to many up AND down instead of many-to-one (uncommon, but potentially useful)
track different types of direct relationships - manager, mentor, assistant, payroll approver, expense approver, technical report-to - with rows in the relationship and relationship type tables instead of new columns in the employee table
track changing hierarchies in a temporally consistent way (including terminated employee hierarchy history) by including active indicators and effective dates on the relationship rows - this is only fully possible when normalizing the relationship into its own table
no NULLs in the SeniorID (actually on either ID) - this is a distinct advantage in avoiding bad logic, but NULLs will usually appear in views when you have to left join to the relationship table anyway
a better dedicated indexing strategy - as opposed to adding SeniorID to selected indexes you already have on Employee (especially as the number of relationship types grows)
And of course, the more information you attach to this relationship, the more strongly it is indicated that the relationship itself merits a table (i.e. it is a "relation" in the true sense of the word as used in relational databases: related data is stored in a relation or table, related to a primary key). Thus a normal form for relationships might strongly indicate that the relationship table be created instead of a simple foreign key relationship in the employee table.
Benefits also include its straightforward delete scenario:
DELETE FROM EmployeeRelationships;
DELETE FROM Employee;
You'll note a striking equivalence to the accepted answer here on SO, since, in your case, employees with no senior relationship have a NULL - so in that answer the poster set all to NULL first to eliminate relationships and then remove the employees.
There is a possibly appropriate usage of TRUNCATE depending upon constraints (EmployeeRelationships is typically able to be TRUNCATEd since its primary key is usually a composite and not a foreign key in any other table).
Try this
DELETE FROM employee;
Inside a loop, run a command that deletes all rows with an unreferenced EmpID until there are zero rows left. There are a variety of ways to write that inner DELETE command:
DELETE FROM employee WHERE EmpID NOT IN (SELECT SeniorID FROM employee WHERE SeniorID IS NOT NULL)
DELETE FROM employee WHERE NOT EXISTS
(SELECT * FROM employee e2 WHERE e2.SeniorID = employee.EmpID)
and probably a third one using a JOIN, but I'm not familiar with the SQL Server syntax for that.
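One caveat with the NOT IN form: if the subquery returns even a single NULL (and the top-level employee has a NULL SeniorID), NOT IN matches no rows at all, so the subquery needs a SeniorID IS NOT NULL filter. A short sketch in Python's sqlite3 module, using the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        EmpID    INTEGER PRIMARY KEY,
        Name     TEXT,
        SeniorID INTEGER REFERENCES employee(EmpID)
    );
    INSERT INTO employee VALUES
        (1, 'A', NULL), (2, 'B', 1), (3, 'C', 1), (4, 'D', 3);
""")

# Unfiltered subquery: it contains a NULL, so "x NOT IN (...)" is never
# true under SQL's three-valued logic and no row qualifies.
naive = conn.execute("""
    SELECT COUNT(*) FROM employee
    WHERE EmpID NOT IN (SELECT SeniorID FROM employee)
""").fetchone()[0]

# Filtered subquery: only real references are compared, so the
# unreferenced employees (2 and 4) qualify.
safe = conn.execute("""
    SELECT COUNT(*) FROM employee
    WHERE EmpID NOT IN (SELECT SeniorID FROM employee
                        WHERE SeniorID IS NOT NULL)
""").fetchone()[0]

print(naive, safe)
```

The NOT EXISTS form does not have this problem, which is one reason it is often preferred.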
One solution is to normalize this by splitting out "senior" relationship into a separate table. For the sake of generality, make that second table "empID1|empID2|relationship_type".
Barring that, you need to do this in a loop. One way is to do it:
declare @count int
select @count = count(1) from employee
while (@count > 0)
BEGIN
    -- remove every employee that nobody reports to
    delete employee WHERE NOT EXISTS
        (select 1 from employee e_senior
         where employee.EmpID = e_senior.SeniorID)
    select @count = count(1) from employee
END
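For anyone who wants to try the loop without a SQL Server instance, here is roughly the same idea driven from Python's sqlite3 module, with the foreign key enforced and the sample rows from the question. It assumes the hierarchy has no cycles; with a cycle, no row would ever become unreferenced and the loop would not terminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE employee (
        EmpID    INTEGER PRIMARY KEY,
        Name     TEXT,
        SeniorID INTEGER REFERENCES employee(EmpID)
    );
    INSERT INTO employee VALUES
        (1, 'A', NULL), (2, 'B', 1), (3, 'C', 1), (4, 'D', 3);
""")

def row_count(connection):
    """Return the number of rows left in employee."""
    return connection.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

passes = 0
while row_count(conn) > 0:
    # Each pass removes the current "leaves": employees nobody reports to.
    conn.execute("""
        DELETE FROM employee
        WHERE NOT EXISTS (SELECT 1 FROM employee e_senior
                          WHERE e_senior.SeniorID = employee.EmpID)
    """)
    passes += 1
conn.commit()

print(passes, row_count(conn))
```

With the sample data the passes remove {2, 4}, then {3}, then {1}, so the number of passes equals the depth of the hierarchy rather than the number of rows.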