While creating a tkinter application to store book information, I realized that simply deleting a row of information from the SQL database does not update the indexes. It's kind of hard to explain, but here is a picture of what I mean:
link to picture. (still young on this account, so pictures can't be embedded, sorry for the inconvenience)
As you can see, the first column represents the index and index 3 is missing because I deleted it. Is there a way such that upon deleting a row, anything below it just shifts up to cover for the empty spot?
Your use of the word "index" must be based on the application language, not the database language. In databases, indexes are additional data structures that speed certain operations on tables.
You are referring to an "id" column, presumably one that is defined automatically as identity, auto_increment, serial, or whatever the underlying database uses.
A very important point is that deleting a row from a table does not affect other rows in the table (unless you have gone through the work of writing triggers to make that happen). It just deletes that row.
The second, more important point is that you do not want to change the "identity" of rows -- and that is what the column you are calling an "index" is doing. It identifies the row. It not only identifies the row today, it identifies the same row tomorrow and, if it existed, yesterday. That is, you don't want to change the identity.
This is even more important when you have foreign key relationships -- that is, other tables that refer to this row. Those relationships could get all messed up if the ids start changing.
SQL does offer a simple way to get a number with no gaps:
select row_number() over (order by "index") as seqnum
from t;
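For example, the same window function can carry the rest of the row along, so the renumbering stays purely cosmetic; the table and column names below (books, title, author) are hypothetical:
select row_number() over (order by "index") as seqnum,
       title,
       author
from books;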
Related
In my design, I have many tables which use FKs. The issue is that because certain records will be deleted and re-added at various points in time, as they are linked to specific project files, the references will always be inaccurate if I rely on the traditional auto-incrementing ID (because each time they are re-added they will be given a new ID).
I previously asked a question (Sqlite - composite PK with two auto-incrementing values) as to whether I can create a composite auto-incrementing ID however it appears to not be possible as answered by the question I was linked.
The only automatic value I can think of that'll always be unique and never repeated is a full date value, down to the second - however, the idea of using a date for the tables' IDs feels like bad design. So, if I instead place a full date field in every table and use these as the FK reference, am I looking at any potential issues down the line? And am I correct in thinking it would be more efficient to store it as an integer rather than as a text value?
Thanks for the help
Update
To clarify, I am not asking about Primary Keys. The PK will be a standard auto-incrementing ID. I am asking about basing hundreds of FKs on dates.
Thank you for the replies below. The difficulty I'm having is that I can't find a similar model to learn from. The end result is that I'd like the application to use project files (like Word has its docx files) to import data into the database. Once a new project is loaded, the previous project's records are cleared, but their data is preserved in the project file (the application's custom file format / a txt file) so they can be added once again. The FKs will all be project-based, so they will only be referencing records that exist at the time in the database.

For example, as it's a world-building application, let's say a user adds a subject type that would be relevant to any project (e.g. mathematics); due to the form it's entered on in the application, the record is given a type number of 1, meaning it's something that persists regardless of the project loaded. Another subject type, however, may be Demonology, which only applies to the specific project loaded (e.g. a fantasy world). A school_subject junction table needs both of these in the same table to reference as the FK. So let's say Demonology is the second record in the subject types table: it has an auto-increment value of 2, so the junction table records 2 as its FK value. The issue is that, before this project is re-opened again, the user may have added 10 more subject types that are universal and persist, so the next time the project's subject type records and school_subject records are added back, Demonology is now given the ID of 11. However, the school_subject junction table is re-created with the same record having 2 as its value. This is why I'd like a FK which will always remain the same.

I don't want all projects to be present in the database, because I want users to be able to back up and duplicate individual projects, as well as know that even if the application is deleted, they can re-download and re-open their project files.
This is a bit long for a comment.
Something seems wrong with your design. When you delete a row in a table, there should be no foreign key references to that key. The entity is gone. Does not exist (as far as the database is concerned). Under most circumstances, you will get an error if you try to delete a row in one table where another row refers to that row using a foreign key reference.
When you insert a row into a table, the database becomes aware of that entity. Before that, there should not be any references to it.
Hence, you have an unusual situation. It sounds like you have primary keys that represent something in the real world -- such as a social security number or vehicle identification number. If that is the case, you might want this id to be the primary key of the table.
Another option is soft deletion. Once one of these rows is inserted in the table, it is never actually deleted. Instead, you set a flag that says it is deleted. Foreign key references can then continue to point to the "soft" deleted row.
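A minimal sketch of soft deletion, assuming a hypothetical subject_types table (SQLite syntax, since that is what the question mentions):
-- add a flag column; existing rows default to "not deleted"
alter table subject_types add column is_deleted integer not null default 0;

-- "deleting" a subject only sets the flag, so FK references stay valid
update subject_types set is_deleted = 1 where id = 2;

-- everyday queries simply filter the flag out
select * from subject_types where is_deleted = 0;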
Imagine two tables implemented as follows:
In the first table, row numbers are created automatically by the database system.
In the second table, row numbers are created manually by the programmer, in sequential order.
The main question is what are the advantages and disadvantages of these two approaches?
One distinct advantage of having the database manage auto-numbering over creating the numbers manually is that the database implementation is thread safe - and a manual implementation usually (in 99.9% of cases) is not (it's hard to do correctly).
On the other hand, the database implementation does not guarantee sequential numbering - there can be gaps in the numbers.
Given these two facts, an auto-increment column should be used only as a surrogate key, when the values of this column do not have any business meaning - they are simply used as a row identifier.
Please note that when using a surrogate key, it's best to also enforce uniqueness of a natural key - otherwise you might get rows where all the data is duplicated except the surrogate key.
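A short sketch of that combination, using SQLite syntax as one concrete example and illustrative names: the surrogate key identifies the row, while a unique constraint on the natural key prevents the rest of the data being duplicated:
create table customers (
    customer_id integer primary key autoincrement,  -- surrogate key, no business meaning
    email       text not null unique,               -- natural key, kept unique
    full_name   text not null
);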
When the database automatically creates the numbers, you have less work.
Think about a sign-up system where you have fields like name, email, password and so on:
1.) If the number is generated by the database, you can just insert the data into the table.
2.) If this is not the case, you have to get the last number first, so instead of a single INSERT you have a SELECT plus an INSERT.
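A rough sketch of the difference, with a hypothetical users table (case 2 is also where the thread-safety problem mentioned above bites: two concurrent sessions can read the same MAX value):
-- 1.) database-generated id: the insert is all you need
insert into users (name, email, password) values ('alice', 'alice@example.com', 'hash');

-- 2.) manual numbering: a select has to run first (or be folded into the insert),
--     and two concurrent sessions can pick the same number
insert into users (id, name, email, password)
select coalesce(max(id), 0) + 1, 'bob', 'bob@example.com', 'hash'
from users;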
Another reason is: what happens when you delete a row in your table?
Maybe in a forum you want to delete an account but not all of its posts. You can use a workaround: when a post has a user_id that no longer exists, you know it is/was a deleted or banned account. If you give a new user the number of a deleted user, you will run into trouble.
I have finished all my changes to a database table in sql server management studio 2012, but now I have a large gap between some values due to editing. Is there a way to keep my data, but re-assign all the ID's from 1 up to my last value?
I would like this cleaned up, as I populate dropdown lists with these values and then interact with my database on the assumption that my dropdown list index and the table's ID match up, which is not the case right now.
My current DB has a large gap from 7 to 28. I would like to shift everything from 28 and up back down to 8, 9, 10, 11, etc., so that my database has NO gaps from 1 onward.
If the solution is tricky please give me some steps as I am new to SQL.
Thank you!
Yes, there are any number of ways to "close the gaps" in an auto generated sequence. You say you're new to SQL so I'll assume you're also new to relational concepts. Here is my advice to you: don't do it.
The ID field is a surrogate key. There are several aspects of surrogates one must be mindful of when using them, but the one I want to impress upon you is,
-- A surrogate key is used to make the row unique. Other than the guarantee that
-- the value is unique, no other assumptions may be made concerning the value.
-- In particular, no meaning may be derived from the value as to the contents of
-- the row or the row's relationship to any other row.
You have designed your app with a built-in assumption of the value of the key field (that they will be consecutive). Already it is causing you problems. Do you really want to go through this every time you make changes to the table? And suppose a future feature requires you to filter out some of the choices according to an option the user has selected? Or enable the user to specify the order of the items? Not going to be easy. So what is the solution?
You can create an additional (non-visible) field in the dropdown list that contains the key value. When the user makes a selection, use that index to get the key value of the selection, then go out to the database and get whatever additional data you need. This will work whether you populate the list from the entire table, select just a few rows according to some as-yet-unknown filtering criteria, or change the order in any way.
Voilà. You never have this problem again, no matter how often you add and remove rows in the table.
However, on the off chance that you are as stubborn as me (not likely!) or just refuse to listen to the melodious voice of reason and experience, then try this:
Create a new table exactly like the old table, including auto incrementing PK.
Populate the new table using a Select from the old table. You can specify any order you want.
Drop the old table.
Rename the new table to the old table name.
You will have to drop and redefine any FKs from other tables. But this entire process can be placed in a script, because if you do this once, you'll probably do it again.
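A sketch of those steps in T-SQL, assuming a hypothetical dbo.Books table with columns Title and Author besides the identity column:
CREATE TABLE dbo.Books_new (
    ID     int IDENTITY(1,1) PRIMARY KEY,    -- auto-incrementing PK, just like the old table
    Title  nvarchar(200) NOT NULL,
    Author nvarchar(200) NOT NULL
);

INSERT INTO dbo.Books_new (Title, Author)    -- IDs are regenerated 1, 2, 3, ... with no gaps
SELECT Title, Author
FROM dbo.Books
ORDER BY ID;                                 -- keep the original order

DROP TABLE dbo.Books;                        -- drop/redefine any FKs pointing here first
EXEC sp_rename 'dbo.Books_new', 'Books';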
Now all the values are consecutive. Until you edit the table again...
You should refactor the code for your dropdown list and not the PK of the table.
If you do not agree, you can do one of the following:
Insert another column holding the dropdown's "order of appearance", make a unique index on it and fill this by hand (or programmatically).
Replacing the SERIAL with a plain INT would also work; make a unique index on the column and fill it by hand (or programmatically).
Remove the large ids and reseed your serial - the exact code depends on your DBMS.
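For SQL Server (the question mentions SSMS 2012), the reseed part of that last option would look roughly like this, with a hypothetical table name. Note that it only controls where the next generated value starts; it does not renumber existing rows:
-- after removing (or renumbering) the rows with the large ids:
DBCC CHECKIDENT ('dbo.Books', RESEED, 7);   -- the next inserted row gets ID 8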
This happens to me all the time. If you don't have any foreign key constraints then it should be an easy fix.
Remember a DELETE statement will remove the record but keep the identity seed the same. (If I remove id # 5 and #5 was the last record inserted then SQL server still stores the identity seed value at "6").
TRUNCATING the table will reset the identity seed back to its original value.
SET IDENTITY_INSERT [TABLE] ON can also be used to insert the correct data in the correct order if truncating cannot happen.
-- copy everything to a temp table
SELECT *
INTO #tempTable
FROM [TableTryingToFix];

-- truncating resets the identity seed
TRUNCATE TABLE [TableTryingToFix];

-- reinsert the data; the identity column is regenerated without gaps
INSERT INTO [TableTryingToFix] (COL1, COL2, COL3, ETC)
SELECT COL1, COL2, COL3, ETC
FROM #tempTable
ORDER BY oldTableID;
I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already satisfies a given normal form or it doesn't.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and would therefore introduce redundancy (violating 3NF) in the main table if you left them there, then that would be normalization.
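As a rough illustration in Oracle syntax (the question mentions 10g), assuming the text column in the main table is replaced by a status_id column that references the lookup table; all of these names (orders, order_status, status_id, status_name) are hypothetical:
-- the lookup table holds the 10-15 permitted values
CREATE TABLE order_status (
    status_id   NUMBER(4)    PRIMARY KEY,
    status_name VARCHAR2(20) NOT NULL UNIQUE
);

-- the main table stores only the id and may only use values present in the lookup table
ALTER TABLE orders ADD (status_id NUMBER(4));
ALTER TABLE orders ADD CONSTRAINT fk_orders_status
    FOREIGN KEY (status_id) REFERENCES order_status (status_id);

-- a concise list of permitted values, without SELECT DISTINCT on the big table
SELECT status_name FROM order_status;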
If you want normalization, break it out.
I think of these types of data in DBs as the equivalent of enums in C, C++, or C#. Mostly you put them in the table as documentation.
I often have ID, Name, Description, and auditing columns for them (e.g. modified by, modified date, created date, created by, active). The Description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
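A rough sketch of that pattern (T-SQL flavoured; names and types are only illustrative):
CREATE TABLE Gender (
    ID           int          PRIMARY KEY,
    Name         varchar(50)  NOT NULL,
    Description  varchar(200) NULL,        -- rarely used, but handy as documentation
    CreatedBy    varchar(50)  NULL,        -- auditing columns
    CreatedDate  datetime     NULL,
    ModifiedBy   varchar(50)  NULL,
    ModifiedDate datetime     NULL,
    Active       bit          NOT NULL DEFAULT 1
);

INSERT INTO Gender (ID, Name) VALUES (1, 'Male'), (2, 'Female');

-- contacts reference the lookup table
ALTER TABLE Contacts ADD GenderID int
    CONSTRAINT FK_Contacts_Gender REFERENCES Gender (ID);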
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.
I've inherited a (Microsoft?) SQL database that wasn't very pristine in its original state. There are still some very strange things in it that I'm trying to fix - one of them is inconsistent ID entries.
In the accounts table, each entry has a number called accountID, which is referenced in several other tables (notes, equipment, etc.). The problem is that the numbers, for some random reason, range from about -100000 to +2000000 when there are only about 7000 entries.
Is there any good way to re-number them while changing the corresponding numbers in the other tables? At my disposal I also have ColdFusion, so anything that works with SQL and/or ColdFusion I'll accept.
Surrogate keys are meant to be meaningless, so unless you actually had a database integrity issue (like there were no foreign key constraints properly defined) or your identity was approaching the maximum for its datatype, I would leave them alone and go after some other low-hanging fruit that would have more impact.
In this instance, it sounds like "why" is a better question than "how". The OP notes that there is a strange problem that needs to be fixed but doesn't say why it is a problem. Is it causing problems? What positive impact would changing these numbers have? Unless you originally programmed the system and understand precisely why the numbers are in their current state, you are taking quite a risk making changes like this.
I would talk to an accountant (or at least your financial people) before messing in any way with the numbers in the accounts tables if this is a financial app. The table of accounts is very critical to how finances are reported. These IDs may have meaning you don't understand. No one puts in a negative id unless they had a reason. I would under no circumstances change that unless I understood why it was negative to begin with. You could truly screw up your tax reporting or something else by making an unneeded change.
You could probably disable the foreign key relationships (if you're able to take it offline temporarily) and then update the primary keys using a script. I've used this update script before to change values, and you could pretty easily wrap this code in a cursor to go through the key values in question, one by one, and update the arbitrary value to an incrementing value you're keeping track of.
Check out the script here: http://vyaskn.tripod.com/sql_server_search_and_replace.htm
If you just have a list of tables that use the primary key, you could set up a series of UPDATE statements that run inside your cursor, and then you wouldn't need to use this script (which can be a little slow).
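Taking the relationships offline temporarily would look something like this in T-SQL (table names are hypothetical, and the constraints must be re-validated afterwards):
ALTER TABLE notes     NOCHECK CONSTRAINT ALL;            -- stop enforcing the FKs
ALTER TABLE equipment NOCHECK CONSTRAINT ALL;

-- ... run the renumbering UPDATEs here ...

ALTER TABLE notes     WITH CHECK CHECK CONSTRAINT ALL;   -- re-enable and re-validate
ALTER TABLE equipment WITH CHECK CHECK CONSTRAINT ALL;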
It's worth asking, though, why these values appear out of whack. Does this database have values added and deleted constantly? Are the primary key values really arbitrary, or do they just appear to be while actually having meaning? Though I'm all for consolidating, you'd have to ensure that there's no purpose to those values.
With ColdFusion this shouldn't be a herculean task, but it will be messy and you'll have to be careful. One method you could use would be to script the database and then generate a brand new, blank table schema. Set the accountID as an identity field in the new database.
Then, using ColdFusion, write a query that will pull all of the old account data and insert it into the new database one row at a time. For each row, let the new database assign a new ID. After each insert, pull the new ID (using either @@IDENTITY or MAX(accountID)) and store the new ID and the old ID together in a temporary table so you know which old IDs belong to which new IDs.
Next, repeat the process with each of the child tables. For each old ID, pull its child entries and re-insert them into the new database using the new IDs. If the primary keys on the child tables are fine, you can insert them as-is or let the server assign new ones if they don't matter.
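One iteration of that copy loop, as the ColdFusion code might issue it against SQL Server. The table and column names are hypothetical, and SCOPE_IDENTITY() is used here instead of @@IDENTITY so that triggers on other tables cannot interfere:
-- run once, to hold the old-to-new mapping
CREATE TABLE id_map (old_accountID int PRIMARY KEY, new_accountID int NOT NULL);

-- per row: copy the account, then record which new id replaced which old one
DECLARE @old_id int = 12345;

INSERT INTO new_accounts (name, balance)          -- accountID is IDENTITY in the new schema
SELECT name, balance
FROM old_accounts
WHERE accountID = @old_id;

INSERT INTO id_map (old_accountID, new_accountID)
VALUES (@old_id, SCOPE_IDENTITY());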
Assigning new IDs in place by disabling relationships temporarily may work, but you might run into conflicts if one of the entries is assigned an ID that is already in use by the old data.
Create a new column in the accounts table for your new ID, and new column in each of your related tables to reference the new ID column.
ALTER TABLE accounts
ADD new_accountID int IDENTITY
ALTER TABLE notes
ADD new_accountID int
ALTER TABLE equipment
ADD new_accountID int
Then you can map the new_accountID column on each of your referencing tables to the accounts table.
UPDATE notes
SET new_accountID = accounts.new_accountID
FROM accounts
INNER JOIN notes ON (notes.accountID = accounts.accountID)
UPDATE equipment
SET new_accountID = accounts.new_accountID
FROM accounts
INNER JOIN equipment ON (equipment.accountID = accounts.accountID)
At this point, each table has both accountID with the old keys, and new_accountID with the new keys. From here it should be pretty straightforward.
Break all of the foreign keys on accountID.
On each table, UPDATE [table] SET accountID = new_accountID.
Re-add the foreign keys for accountID.
Drop new_accountID from all of the tables, as it's no longer needed.
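Those last steps, sketched out. The constraint names are hypothetical, and this assumes accountID on accounts is a plain int column, because an IDENTITY column cannot be updated in place:
ALTER TABLE notes     DROP CONSTRAINT FK_notes_accounts;        -- break the FKs on accountID
ALTER TABLE equipment DROP CONSTRAINT FK_equipment_accounts;

UPDATE accounts  SET accountID = new_accountID;                 -- swap in the new keys
UPDATE notes     SET accountID = new_accountID;
UPDATE equipment SET accountID = new_accountID;

ALTER TABLE notes     ADD CONSTRAINT FK_notes_accounts          -- re-add the FKs
    FOREIGN KEY (accountID) REFERENCES accounts (accountID);
ALTER TABLE equipment ADD CONSTRAINT FK_equipment_accounts
    FOREIGN KEY (accountID) REFERENCES accounts (accountID);

ALTER TABLE accounts  DROP COLUMN new_accountID;                -- no longer needed
ALTER TABLE notes     DROP COLUMN new_accountID;
ALTER TABLE equipment DROP COLUMN new_accountID;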