SQL table performance with foreign key

I have a website that needs to do a lot of active searching of users. I have a User table which links to all the full user details, but those details are only really of interest when you are looking at your own account. When searching for other users, very little information is needed, so to make searches faster and more efficient, every time you update your user details the code writes an entry to a separate table called UserLight, which only contains about 8 columns and is all pure data - i.e. no links to other child tables or collection objects, just string data for speed. Each user can only have one UserLight entry at a time, which is the summary representation of how their account appears to other users.
My question is about performance: does it matter that I am making UserId a foreign key constraint to the User table? That way you cannot create a UserLight entry without the corresponding row in User, and when you delete the User row it automatically cascades and deletes the UserLight entry. That is ideal and how I would like to have it, but I'm wondering whether having this FK constraint on the UserLight table slows down read or write operations on this table in any way. If it does, I am happy to drop the FK constraint, have a completely isolated table with no constraints or external references to other objects, and just manage the housekeeping manually; but if the FK constraint doesn't affect performance at all, I would prefer to keep it.

It will not hamper your performance; instead, it is preferred to keep the data constrained so as to avoid insert/update/delete anomalies.
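For illustration, a minimal sketch of the constraint being described, assuming SQL Server-style syntax and made-up column names (only UserId comes from the question):

CREATE TABLE UserLight (
    UserId      INT          NOT NULL PRIMARY KEY,  -- one summary row per user
    DisplayName VARCHAR(100) NOT NULL,
    -- ... the remaining handful of plain string columns ...
    CONSTRAINT FK_UserLight_User
        FOREIGN KEY (UserId) REFERENCES [User] (UserId)
        ON DELETE CASCADE  -- deleting the User row removes its UserLight row
);

The FK adds a small lookup against [User] when a UserLight row is inserted or its UserId changed, and when a User row is deleted; it does not affect reads from UserLight.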


SQLite - any long-term downsides to using unique, non-PK columns as FKs?

In my design, I have many tables which use FKs. The issue is that because certain records will be deleted and re-added at various points in time, as they are linked to specific project files, the references will always be inaccurate if I rely on the traditional auto-incrementing ID (because each time they are re-added they will be given a new ID).
I previously asked a question (Sqlite - composite PK with two auto-incrementing values) as to whether I could create a composite auto-incrementing ID; however, it appears not to be possible, as answered in the question I was linked to.
The only automatic value I can think of that'll always be unique and never repeated is a full date value, down to the second - however, the idea of using a date for the tables' IDs feels like bad design. So, if I instead place a full date field in every table and use these as the FK reference, am I looking at any potential issues down the line? And am I correct in thinking it would be more efficient to store it as an integer rather than as a text value?
Thanks for the help
Update
To clarify, I am not asking in regard to primary keys. The PK will be a standard auto-incrementing ID. I am asking in regard to basing hundreds of FKs on dates.
Thank you for the replies below. The difficulty I'm having is that I can't find a similar model to learn from. The end result is that I'd like the application to use project files (like Word has its docx files) to import data into the database. Once a new project is loaded, the previous project's records are cleared, but their data is preserved in the project file (the application's custom file format / a txt file) so they can be added once again. The FKs will all be project-based, so they will only be referencing records that exist in the database at the time. For example, as it's a world-building application, let's say a user adds a subject type that would be relevant to any project (e.g. mathematics); due to the form it's entered on in the application, the record is given a type number of 1, meaning it's something that persists regardless of the project loaded. Another subject type, however, may be Demonology, which only applies to the specific project loaded (e.g. a fantasy world). A school_subject junction table needs both of these in the same table to reference as the FK. So let's say Demonology is the second record in the subject types table: it has an auto-increment value of 2, and thus the junction table records 2 as its FK value. The issue is that before this project is re-opened again, the user may have added 10 more subject types that are universal and persist, so the next time the project's subject type and school_subject records are added back, Demonology is given the ID of 11, while the school_subject junction table is re-created with the same record still holding 2 as its value. This is why I'd like a FK which will always remain the same. I don't want all projects to be present in the database, because I want users to be able to back up and duplicate individual projects, as well as know that even if the application is deleted, they can re-download and re-open their project files.
This is a bit long for a comment.
Something seems wrong with your design. When you delete a row in a table, there should be no foreign key references to that key. The entity is gone. Does not exist (as far as the database is concerned). Under most circumstances, you will get an error if you try to delete a row in one table where another row refers to that row using a foreign key reference.
When you insert a row into a table, the database becomes aware of that entity. There should not already be references to it.
Hence, you have an unusual situation. It sounds like you have primary keys that represent something in the real world -- such as a social security number or vehicle identification number. If that is the case, you might want this id to be the primary key of the table.
Another option is soft deletion. Once one of these rows is inserted in the table, it cannot be deleted. However, you can set a flag that says that it is deleted. Then, foreign key references can stay to the "soft" deleted row.
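A minimal sketch of the soft-deletion idea, using SQLite syntax and table and column names assumed from the example above:

-- Flag rows as deleted instead of removing them, so FK references stay valid.
ALTER TABLE subject_type ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0;

-- "Delete" Demonology (id 2) without breaking school_subject rows that reference it:
UPDATE subject_type SET is_deleted = 1 WHERE id = 2;

-- Everyday queries simply ignore soft-deleted rows:
SELECT * FROM subject_type WHERE is_deleted = 0;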

Constraints at different moments

I have a table of schedules.
So my question is this: how can I make a constraint that forbids a vessel from being scheduled more than once a day?
Thanks in advance.
Simply add a unique constraint/index on the vessel and date:
create unique index unq_tourschedule_vesselid_tourdate on tourschedule(vesselid, tourdate);
(A unique constraint is implemented using a unique index.)
You should do this in the database, so even manual changes to the data enforce this constraint.
It depends on what level you need to "prevent" the scheduling. Do you want to prevent it from the UI, the middle-tier, or at the database level?
UI - Do an AJAX check against DB or middle-tier check and prevent insertion of the record there (not a secure solution, but worth mentioning because it informs your users of an existing record).
Middle Tier - best place. Query your DB to see if a record exists with that given vesselID and TourDate. If any records are returned, do not allow insertion. You could then redirect to the page with a helpful message to the user. Business logic goes here typically, and it is best to decouple your business logic from your database.
Database level - most robust, but least maintainable and bad practice for business logic visibility. Many options, all of them cumbersome:
Stored procedure - upon insert, check the records, same procedure as middle tier, but you have to funnel your "error" message up through all the tiers.
A compound key using vesselID and TourDate automatically ensures that only unique entries can be inserted.
Constraint on the table data upon insertion - not just an index, which is for searching optimization, but an actual constraint. This constraint may be added to an existing table or be part of the table creation statement itself.
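For that last option, a hedged example of what adding such a constraint to an existing table might look like (the constraint name is made up; the columns come from the other answer):

-- A table-level UNIQUE constraint: the database itself rejects a second schedule
-- for the same vessel on the same date, no matter which tier attempts the insert.
ALTER TABLE tourschedule
    ADD CONSTRAINT unq_tourschedule_vesselid_tourdate UNIQUE (vesselid, tourdate);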
Yes, I have created a unique index and everything worked out all right. Thank you for helping me out.

How to validate user's access to specific rows in a SQL table?

I'm working on a project and I'm new to both web apps and SQL, so bear with me. I'm building an API and I want to make sure that my users only have access to certain rows in a specific table: rows that have a foreign key to their customer id in another table, but that have to be validated by user id in yet another table. (A single Customer has multiple Users and owns multiple Assets. For right now, all of a Customer's Users can access any of its Assets, but no Customers share an Asset or a User.) The way I can think of to do this is:
SELECT * FROM [Asset] WHERE Id=#AssetId AND CustomerId=(SELECT CustomerId FROM [User] WHERE UserId=#UserId);
This is great, but with many entries in the Asset and User tables, this query could take a ton of time. This is bad, since every request made to my API that needs the Asset data should be doing this check. I could set up an index, and in fact UserId is a secondary key in User because it's a unique identifier from the auth provider, but I'm not sure if I should add an index for CustomerId in Asset. The Asset table should grow relatively slowly compared to some other tables (there is a messaging record table for auditing purposes), but I'm not sure if that's the right answer, or if there's some simpler answer that's more optimized. Or is this kind of query so fast at scale that I have nothing to worry about?
For your particular case, it looks like the perfect context to build a junction table between the User table and the Asset table. Both fields together will become the primary key. Individually, AssetId and UserId will be foreign keys.
Let's say the junction table is called AssetUser.
Foreign keys :
CONSTRAINT [FK_AssetUser_User] FOREIGN KEY ([UserId]) REFERENCES [User]([UserId])
CONSTRAINT [FK_AssetUser_Asset] FOREIGN KEY ([AssetId]) REFERENCES [Asset]([AssetId])
Primary key :
CONSTRAINT [PK_AssetUser] PRIMARY KEY ([AssetId], [UserId])
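Put together, the junction table might be created like this (a sketch; the INT type for the ids is an assumption):

CREATE TABLE [AssetUser] (
    [AssetId] INT NOT NULL,
    [UserId]  INT NOT NULL,
    CONSTRAINT [PK_AssetUser]       PRIMARY KEY ([AssetId], [UserId]),
    CONSTRAINT [FK_AssetUser_User]  FOREIGN KEY ([UserId])  REFERENCES [User] ([UserId]),
    CONSTRAINT [FK_AssetUser_Asset] FOREIGN KEY ([AssetId]) REFERENCES [Asset] ([AssetId])
);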
You shouldn't worry about scale too much unless you are going to have a lot of data and/or performance is critical in your application. If so, you have the option of using Hadoop or migrating to a NoSQL database.

DELETE FROM table becomes heavy as the number of records in table's children increase

I have a main table called Campaign. Campaign's Id is a foreign key in another table CampaignRun and CampaignRun's Id is a foreign key in a third table CampaignRecipient. Due to my CASCADE requirements I am using
DELETE FROM Campaign WHERE Id = x
to remove all the associated information about a campaign. But this statement becomes very heavy on the server and of course locks the tables while running. I was wondering if there is a faster way of dealing with DELETE FROM. TRUNCATE is faster, but unfortunately it accepts no condition.
Will appreciate any working suggestions.
Maybe you can check this out; interrupt() might be the answer.
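One common mitigation, sketched here with MySQL-style syntax (the CampaignRunId column name and the batch size are assumptions), is to delete the child rows in batches from the bottom up before deleting the parent, so no single statement has to lock and log everything at once:

-- Repeat each batched DELETE until it affects zero rows, then move up one level.
DELETE FROM CampaignRecipient
WHERE CampaignRunId IN (SELECT Id FROM CampaignRun WHERE CampaignId = @CampaignId)
LIMIT 10000;

DELETE FROM CampaignRun
WHERE CampaignId = @CampaignId
LIMIT 10000;

-- Once the children are gone, the parent delete is cheap.
DELETE FROM Campaign WHERE Id = @CampaignId;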

updating primary key of master and child tables for large tables

I have a fairly huge database with a master table that has a single GUID column (a custom GUID-like algorithm) as its primary key, and 8 child tables that have foreign key relationships with this GUID column. All the tables have approximately 3-8 million records. None of these tables have any BLOB/CLOB/TEXT or any other fancy data types, just normal numbers, varchars, dates, and timestamps (about 15-45 columns in each table). There are no partitions, and no indexes other than the primary and foreign keys.
Now, the custom GUID algorithm has changed and though there are no collisions I would like to migrate all the old data to use GUIDs generated using the new algorithm. No other columns need to be changed. Number one priority is data integrity and performance is secondary.
Some of the possible solutions that I could think of were (as you will probably notice they all revolve around one idea only)
add new column ngu_id and populate with new gu_id; disable constraints; update child tables with ngu_id as gu_id; rename ngu_id->gu_id; re-enable constraints
read one master record and its dependent child records from child tables; insert into the same table with new gu_id; remove all records with old gu_ids
drop constraints; add a trigger to the master table such that all the child tables are updated; start updating old gu_ids with new gu_ids; re-enable constraints
add a trigger to the master table such that all the child tables are updated; start updating old gu_ids with new gu_ids
create new column ngu_ids on all master and child tables; create foreign key constraints on ngu_id columns; add update trigger to the master table to cascade values to child tables; insert new gu_id values into ngu_id column; remove old foreign key constraints based on gu_id; remove gu_id column and rename ngu_id to gu_id; recreate constraints if necessary;
use on update cascade if available?
My questions are:
Is there a better way? (Can't bury my head in the sand, gotta do this.)
What is the most suitable way to do this? (I have to do this in Oracle, SQL Server and MySQL 4, so vendor-specific hacks are welcome.)
What are the typical points of failure for such an exercise and how to minimize them?
If you are with me so far, thank you and hope you can help :)
Your ideas should work. The first is probably the one I would use. Some cautions and things to think about when doing this:
Do not do this unless you have a current backup.
I would leave both values in the main table. That way if you ever have to figure out from some old paperwork which record you need to access, you can do it.
Take the database down for maintenance while you do this and put it in single user mode. The very last thing you need while doing something like this is a user attempting to make changes while you are in midstream. Of course, the first action once in single user mode is the above-mentioned backup. You probably should schedule the downtime for some time when the usage is lightest.
Test on dev first! This should also give you an idea as to how long you will need to close production for. Also, you can try several methods to see which is the fastest.
Be sure to communicate in advance to users that the database will be going down at the scheduled time for maintenance and when they can expect to have it be available again. Make sure the timing is ok. It really makes people mad when they plan to stay late to run the quarterly reports and the database is not available and they didn't know it.
Since there are a fairly large number of records, you might want to run the updates of the child tables in batches (one reason not to use cascading updates). This can be faster than trying to update 5 million records with one update. However, don't try to update one record at a time or you will still be here next year doing this task.
Drop indexes on the GUID field in all the tables and recreate after you are done. This should improve the performance of the change.
Create a new table with the old and the new pk values in it. Place unique constraints on both columns to ensure you haven't broken anything so far.
Disable constraints.
Run updates against all the tables to modify the old value to the new value.
Enable the PK, then enable the FK's.
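A hedged sketch of those steps in generic SQL (the mapping-table name, GUID length, and the correlated-subquery UPDATE style are assumptions; Oracle and MySQL accept this form, while SQL Server would use UPDATE ... FROM):

-- 1. Old-to-new GUID map; unique constraints on both columns catch collisions early.
CREATE TABLE guid_map (
    old_guid VARCHAR(64) NOT NULL UNIQUE,
    new_guid VARCHAR(64) NOT NULL UNIQUE
);

-- 2. With the PK/FK constraints disabled and the GUID indexes dropped,
--    rewrite the master table and each of the eight child tables from the map.
UPDATE master_table t
SET    t.gu_id = (SELECT m.new_guid FROM guid_map m WHERE m.old_guid = t.gu_id);

UPDATE child_table1 c
SET    c.gu_id = (SELECT m.new_guid FROM guid_map m WHERE m.old_guid = c.gu_id);
-- ... repeat for the remaining child tables, ideally in bounded batches ...

-- 3. Recreate the indexes, then re-enable the PK followed by the FKs.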
It's difficult to say what the "best" or "most suitable" approach is as you have not described what you are looking for in a solution. For example, do the tables need to be available for query while you are migrating to new IDs? Do they need to be available for concurrent modification? Is it important to complete the migration as fast as possible? Is it important to minimize the space used for migration?
Having said that, I would prefer #1 over your other ideas, assuming they all met your requirements.
Anything that involves a trigger to update the child tables seems error-prone and overcomplicated, and likely will not perform as well as #1.
Is it safe to assume that new IDs will never collide with old IDs? If not, solutions based on updating the IDs one at a time will have to worry about collisions -- this will get messy in a hurry.
Have you considered using CREATE TABLE AS SELECT (CTAS) to populate new tables with the new IDs? You'll be making a copy of your existing tables and this will require additional space, however it is likely to be faster than updating the existing tables in place. The idea is: (i) use CTAS to create new tables with new IDs in place of the old, (ii) create indexes and constraints as appropriate on the new tables, (iii) drop the old tables, (iv) rename the new tables to the old names.
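A rough Oracle-flavoured illustration of the CTAS idea (table, column, and mapping-table names are made up):

-- Build the replacement child table with new GUIDs swapped in via a mapping table.
CREATE TABLE child1_new AS
SELECT m.new_guid AS gu_id,
       c.col_a,
       c.col_b  -- ... and the rest of the columns, unchanged ...
FROM   child1 c
JOIN   guid_map m ON m.old_guid = c.gu_id;

-- Then create indexes and PK/FK constraints on child1_new,
-- drop child1, and rename child1_new to child1.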
In fact, it depends on your RDBMS.
Using Oracle, the simplest choice is to make all of the foreign key constraints "deferred" (checked on commit), perform the updates in a single transaction, then commit.
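As a hedged Oracle sketch (constraint, table, and mapping names are assumptions; note that an existing non-deferrable constraint cannot be altered in place and has to be dropped and recreated as deferrable):

-- Recreate each FK as deferrable so it is only checked at commit time.
ALTER TABLE child1 DROP CONSTRAINT fk_child1_master;
ALTER TABLE child1 ADD CONSTRAINT fk_child1_master
    FOREIGN KEY (gu_id) REFERENCES master_table (gu_id)
    DEFERRABLE INITIALLY IMMEDIATE;

SET CONSTRAINTS ALL DEFERRED;

-- Update parent and children inside one transaction; nothing is checked until COMMIT.
UPDATE master_table t SET t.gu_id = (SELECT m.new_guid FROM guid_map m WHERE m.old_guid = t.gu_id);
UPDATE child1 c SET c.gu_id = (SELECT m.new_guid FROM guid_map m WHERE m.old_guid = c.gu_id);
-- ... repeat for the other child tables ...

COMMIT;  -- all deferred constraints are validated here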