I have a table with lots of columns, and I'd like to add two more (date and time) to the front of the existing table.
There is no data in the table right now, but I'm wondering what the best way is to get the table into the format I need.
I could just drop the table and create a new one with the correct configuration, but I'm wondering if there is a better way?
This is currently not possible. You have to drop and recreate the table.
Theoretically you could add the column, drop and re-add all other columns, but that's hardly practical.
It's an ongoing discussion and an open TODO-item of the Postgres project to allow reordering of columns. But a lot of dependencies and related considerations make that hard.
Quoting the Postgres project's ToDo List:
Allow column display reordering by recording a display, storage, and
permanent id for every column?
Contrary to what some believe, the order of columns in a table is not irrelevant, for multiple reasons.
The default order is used for statements like INSERT without column definition lists.
Or SELECT *, which returns columns in the predefined order. (See the sketch after this list.)
The composite type of the table uses the same order of columns.
The order of columns is relevant for storage optimization (padding and alignment matter). More:
Calculating and saving space in PostgreSQL
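A minimal sketch of those first two points (the table, columns, and values are invented for illustration):
CREATE TABLE t (id int, created date, note text);
INSERT INTO t VALUES (1, '2013-01-01', 'hi');  -- values are matched to columns by position
SELECT * FROM t;  -- returns id, created, note, in exactly that order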
People may be confusing this with the order of rows, which is undefined in a table.
In relational databases the order of columns in a table is irrelevant
Create a view that shows you the columns in the order you want
If you still want to, drop the table and recreate it
If I have a table with columns a, b, c and later I run an ALTER TABLE command to add a new column "d", is it possible to add it between a and b, for example, and not at the end?
I heard that the position of the columns affects performance.
It's not possible to add a column between two existing columns with an ALTER TABLE statement in SQLite. This works as designed.
The new column is always appended to the end of the list of existing
columns.
As far as I know, MySQL is the only SQL(ish) DBMS that lets you determine the placement of new columns.
To add a column at a specific position within a table row, use FIRST
or AFTER col_name. The default is to add the column last. You can also
use FIRST and AFTER in CHANGE or MODIFY operations to reorder columns
within a table.
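For example (table and column names are invented; MODIFY must restate the full column definition, here assumed to be VARCHAR(100)):
ALTER TABLE orders ADD COLUMN order_time TIME FIRST;
ALTER TABLE orders ADD COLUMN order_date DATE AFTER order_time;
ALTER TABLE orders MODIFY COLUMN note VARCHAR(100) AFTER order_date;  -- reorder an existing column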
But this isn't a feature I'd use regularly, so "as far as I know" isn't really very far.
With every SQL platform I've seen, the only way to do this is to drop the table and re-create it.
However, I question whether the position of the column affects performance. In what way would it? What operations are you doing where you think it will make a difference?
I will also note that dropping the table and recreating it is often not a heavy lift. Making a backup of a table and restoring that table is easy on all major platforms, so scripting a backup - drop - create - restore is an easy task for a competent DBA.
In fact I've done so often when users ask -- but I always find it a little silly. The most common reason given is that the tool of choice behaves nicer when the columns are created in a certain order. (This was also @Jarad's reason below.) So this is a good lesson for tool makers: make your tool able to reorder columns (and remember it between runs) -- then everyone is happy.
I use DB.compileStatement:
SQLiteStatement sql = DB.compileStatement("INSERT INTO tableX VALUES (?,?,?);");
sql.bindString(1, "value for column 1");
sql.bindString(2, "value for column 2");
sql.bindString(3, "value for column 3");
sql.executeInsert();  // executeInsert() is the call for INSERTs; executeUpdateDelete() is for UPDATE/DELETE
So it makes a big difference if the order of the columns is not correct.
Unfortunately, adding columns at a specific position is not possible using ALTER TABLE, at least not in SQLite (in MySQL it is possible). The workaround is recreating the table (and backing up and restoring the data).
I would like to know if there's a way to add a column to an SQL Server table after it's created, and in a specific position?
Thanks.
You can do that in Management Studio. You can examine the way this is accomplished by generating the SQL script BEFORE saving the change. Basically it's achieved by:
removing all foreign keys
creating a new table with the added column
copying all data from the old into the new table
dropping the old table
renaming the new table to the old name
recreating all the foreign keys
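A hand-rolled sketch of that same pattern (names are invented; constraints, indexes, and error handling omitted):
BEGIN TRANSACTION;
CREATE TABLE dbo.MyTable_new (id int NOT NULL, added_col int NULL, existing_col varchar(50) NULL);
INSERT INTO dbo.MyTable_new (id, existing_col)
  SELECT id, existing_col FROM dbo.MyTable;
DROP TABLE dbo.MyTable;
EXEC sp_rename 'dbo.MyTable_new', 'MyTable';
COMMIT;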
In addition to all the other responses, remember that you can reorder and rename columns in VIEWs. So, if you find it necessary to store the data in one format but present it in another, you can simply add the column on to the end of the table and create a single table view that reorders and renames the columns you want to show. In almost every circumstance, this view will behave exactly like the original table.
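For example (names are invented):
CREATE VIEW dbo.MyTableView AS
SELECT date_added AS created_on, a, b, c  -- reordered and renamed
FROM dbo.MyTable;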
The safest way to do this is:
Create your new table with the correct column order
Copy the data from the old table.
Drop the Old Table.
The only safe way of doing that is creating a new table (with the column where you want it), migrating the data, dropping the original table, and renaming the new table to the original name.
This is what Management Studio does for you when you insert columns.
As others have pointed out, you can do this by creating a temp table, moving the data, dropping the original table, and then renaming the other table. This is a stupid thing to do, though. If your table is large, it could be very time-consuming, and users will be locked out during the process. This is something you NEVER want to do to any table in production.
There is absolutely no reason to ever care what order the columns are in a table, since you should not be relying on column order anyway (what if someone else did this same stupid thing?). No queries should use SELECT * or ordinal positions to get columns. If you are doing this now, this is broken code and needs to be fixed immediately, as the results are not always going to be as expected. For instance, if you do insert a column where you want it and someone else is using SELECT * for a report, suddenly the part number is showing up in the spot that used to hold the price.
By doing what you want to do, you may break much more than you fix by putting the column where you personally want it. Column order in tables should always be irrelevant. You should not be doing this every time you want columns to appear in a different order.
With SQL Server Management Studio you can open the table in design view and drag and drop the column wherever you want.
As Kane says, it's not possible in a direct way. You can see how Management Studio does it by adding a column in the design mode and checking out the change script.
If the column is not in the last position, the script basically drops the table and recreates it, with the new column in the desired position.
In databases, table columns don't have an order.
Write a proper SELECT statement and create a view.
No.
Basically, SSMS behind the scenes will copy the table, constraints, etc., drop the old table, and rename the new one.
The reason is simple - columns are not meant to be ordered (nor are rows), so you're always meant to list which columns you want in a result set (SELECT * is a bit of a hack).
I wanted to know what I should consider when deciding whether to create a new table or modify an existing table in a SQL database. I use both MySQL and SQLite.
Edit: I always thought that if I can put a column into a table where it makes sense and it can be used by every row, then I would always modify the existing table. However, at work, if it's for a different 'release' we put it in a different table.
You can modify existing tables, as long as
you are keeping the database Normalized
you are not breaking code that uses the table
You can create new tables even if 1. and 2. are true, for the following reasons:
Performance reasons
Clarity in your schema logic.
Not sure if I'm understanding your question correctly, but one thing I always try to consider is the impact on existing data.
Taking the case of an application which relies on a database...
When you update the application (including database schema updates), it is important to ensure that any existing, in-use databases will either be backwards compatible with the application, or that there is a way to migrate and update the existing database.
Generally if the data is in a one-to-one relationship with the existing data in the table and if the table row size is not too large already and if there aren't too many records in the table, then I usually alter the table to accept the new column.
However, suppose I want to add a column with a default value to a table where it doesn't exist. Adding it to the table with 50 million records might not be so speedy a process and it might lock up the table on production when we move the change up. In this case, putting it into a separate table and adding the records to it may work out better. In general, I wouldn't do this unless my testing has shown that adding and populating the column will take an unacceptably long time. I would prefer to keep the record together where possible.
Same thing with the overall record size. SQL Server has a limit on the number of bytes that can be in a record; it will allow you to create a structure that is potentially larger than that, but it will not allow you to put more than the byte limit into a specific record. Further, less wide tables tend to be faster to access due to how they are stored. Frequently, people will create a table in a one-to-one relationship (we call them extended tables in our structure) for additional columns that are not as frequently used. If the fields from both tables will be frequently used, often they still create two tables but have a view that will pick out all the columns needed.
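A sketch of that extended-table pattern plus the covering view (names are invented; batch separators omitted):
CREATE TABLE Product (ProductId int PRIMARY KEY, Name varchar(100) NOT NULL);
CREATE TABLE ProductExtended (
  ProductId int PRIMARY KEY REFERENCES Product (ProductId),
  LongDescription varchar(8000) NULL  -- the wide, rarely used columns
);
CREATE VIEW ProductFull AS
SELECT p.ProductId, p.Name, e.LongDescription
FROM Product p LEFT JOIN ProductExtended e ON e.ProductId = p.ProductId;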
And of course if the data is in a one to many relationship, you need a related table not just a new column.
Incidentally, you should always do an ALTER TABLE through a script rather than the SSMS GUI, as it is more efficient and easier to move to prod.
Working on a project at the moment and we have to implement soft deletion for the majority of users (user roles). We decided to add an is_deleted='0' field on each table in the database and set it to '1' if particular user roles hit a delete button on a specific record.
For future maintenance now, each SELECT query will need to ensure they do not include records where is_deleted='1'.
Is there a better solution for implementing soft deletion?
Update: I should also note that we have an Audit database that tracks changes (field, old value, new value, time, user, ip) to all tables/fields within the Application database.
I would lean towards a deleted_at column that contains the datetime of when the deletion took place. Then you get a little bit of free metadata about the deletion. For your SELECT just get rows WHERE deleted_at IS NULL
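A sketch (table and column names are assumed; ALTER syntax varies slightly by platform):
ALTER TABLE users ADD deleted_at datetime NULL;
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 42;  -- soft delete
SELECT * FROM users WHERE deleted_at IS NULL;  -- live rows only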
You could perform all of your queries against a view that contains the WHERE IS_DELETED='0' clause.
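For example (names are invented):
CREATE VIEW active_records AS
SELECT id, name, created_on FROM records WHERE is_deleted = '0';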
Having an is_deleted column is a reasonably good approach.
If it is in Oracle, to further increase performance I'd recommend partitioning the table by creating a list partition on the is_deleted column.
Then deleted and non-deleted rows will physically be in different partitions, though for you it'll be transparent.
As a result, if you type a query like
SELECT * FROM table_name WHERE is_deleted = 1
then Oracle will perform the 'partition pruning' and only look into the appropriate partition. Internally a partition is a different table, but it is transparent for you as a user: you'll be able to select across the entire table no matter if it is partitioned or not. But Oracle will be able to query ONLY the partition it needs. For example, let's assume you have 1000 rows with is_deleted = 0 and 100000 rows with is_deleted = 1, and you partition the table on is_deleted. Now if you include the condition
WHERE ... AND IS_DELETED=0
then Oracle will ONLY scan the partition with 1000 rows. If the table weren't partitioned, it would have to scan 101000 rows (both partitions).
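A sketch of such a partitioned table (Oracle syntax; names are invented):
CREATE TABLE orders (
  order_id   NUMBER PRIMARY KEY,
  is_deleted NUMBER(1) NOT NULL
)
PARTITION BY LIST (is_deleted) (
  PARTITION p_live    VALUES (0),
  PARTITION p_deleted VALUES (1)
);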
The best response, sadly, depends on what you're trying to accomplish with your soft deletions and the database you are implementing this within.
In SQL Server, the best solution would be to use a deleted_on/deleted_at column with a type of SMALLDATETIME or DATETIME (depending on the necessary granularity) and to make that column nullable. In SQL Server, the row header data contains a NULL bitmask for each of the columns in the table so it's marginally faster to perform an IS NULL or IS NOT NULL than it is to check the value stored in a column.
If you have a large volume of data, you will want to look into partitioning your data, either through the database itself or through two separate tables (e.g. Products and ProductHistory) or through an indexed view.
I typically avoid flag fields like is_deleted, is_archive, etc., because they only carry one piece of meaning. Nullable deleted_at and archived_at fields provide an additional level of meaning to yourself and to whoever inherits your application. And I avoid bitmask fields like the plague, since they require an understanding of how the bitmask was built in order to grasp any meaning.
If the table is large and performance is an issue, you can always move 'deleted' records to another table, which has additional info like time of deletion, who deleted the record, etc.
That way you don't have to add another column to your primary table.
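A sketch of that move (names are invented; run both statements in one transaction):
INSERT INTO users_deleted (id, name, deleted_at, deleted_by)
  SELECT id, name, CURRENT_TIMESTAMP, CURRENT_USER FROM users WHERE id = 42;
DELETE FROM users WHERE id = 42;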
That depends on what information you need and what workflows you want to support.
Do you want to be able to:
know what information was there (before it was deleted)?
know when it was deleted?
know who deleted it?
know in what capacity they were acting when they deleted it?
be able to un-delete the record?
be able to tell when it was un-deleted?
etc.
If the record was deleted and un-deleted four times, is it sufficient for you to know that it is currently in an un-deleted state, or do you want to be able to tell what happened in the interim (including any edits between successive deletions!)?
Careful of soft-deleted records causing uniqueness constraint violations.
If your DB has columns with unique constraints then be careful that the prior soft-deleted records don’t prevent you from recreating the record.
Think of the cycle:
create user (login=JOE)
soft-delete (set deleted column to non-null.)
(re) create user (login=JOE). ERROR. LOGIN=JOE is already taken
Second create results in a constraint violation because login=JOE is already in the soft-deleted row.
Some techniques:
1. Move the deleted record to a new table.
2. Make your uniqueness constraint across the login and deleted_at timestamp column
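A sketch of technique 2 (names are invented). One caveat: on platforms where NULLs count as distinct in unique indexes (e.g. PostgreSQL), the composite constraint alone won't stop two live rows with the same login; a partial/filtered unique index on live rows does:
ALTER TABLE users ADD CONSTRAINT uq_login_deleted UNIQUE (login, deleted_at);
CREATE UNIQUE INDEX uq_live_login ON users (login) WHERE deleted_at IS NULL;  -- PostgreSQL / SQL Server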
My own opinion is +1 for moving to a new table. It takes lots of
discipline to maintain the *AND deleted_at IS NULL* across all your
queries (for all of your developers).
You will definitely have better performance if you move your deleted data to another table, like Jim said, as well as having a record of when it was deleted, why, and by whom.
Adding where deleted=0 to all your queries will slow them down significantly, and hinder the usage of any indexes you may have on the table. Avoid having "flags" in your tables whenever possible.
You don't mention which product, but SQL Server 2008 and PostgreSQL (and others, I'm sure) allow you to create filtered indexes, so you could create a covering index where is_deleted=0, mitigating some of the negatives of this particular approach.
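For example (names are invented):
CREATE INDEX ix_live_rows ON accounts (account_name) WHERE is_deleted = 0;  -- filtered (SQL Server) / partial (PostgreSQL) index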
Something that I use on projects is a statusInd tinyint not null default 0 column.
Using statusInd as a bitmask allows me to perform data management (delete, archive, replicate, restore, etc.). Using this in views, I can then do the data distribution, publishing, etc. for the consuming applications. If performance is a concern regarding views, use small fact tables to support this information; dropping the fact drops the relation and allows for scaled deletes.
Scales well and is data-centric, keeping the data footprint pretty small - key for 350 GB+ DBs with real-time concerns. Using alternatives - tables, triggers - has some overhead that, depending on the need, may or may not work for you.
SOX-related audits may require more than a field to help in your case, but this may help.
Enjoy
Use a view, function, or procedure that checks is_deleted = 0; i.e. don't select directly on the table in case the table needs to change later for other reasons.
And index the is_deleted column for larger tables.
Since you already have an audit trail, tracking the deletion date is redundant.
I prefer to keep a status column, so I can use it for several different states, e.g. published, private, deleted, needsApproval...
Create another schema and grant it all on your data schema.
Implement VPD (Virtual Private Database) on your new schema so that each and every query will have a predicate appended that allows selection of non-deleted rows only.
http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/cmntopc.htm#CNCPT62345
@AdditionalCriteria("this.status <> 'deleted'")
Put this on top of your @Entity.
http://wiki.eclipse.org/EclipseLink/Examples/JPA/SoftDelete