Postgres - add sequential column to existing data - sql

I have an existing table that currently doesn't have an id column and a lot of duplicate rows on what should be a unique pair - it's messy. Example:
fips | customer_id
-------+------------
17043 | 2085
17043 | 2085
42091 | 4426
42091 | 4426
customer_id/fips should be unique, but the current code and schema don't enforce that. There also isn't an id column, so I have no unique way to reference a single row.
I'd like to add an id column and assign sequential integers so I can have a unique primary key. How can I go about that?

Postgres 10 added IDENTITY columns (as demonstrated in Gordon's answer).
In Postgres 9.6 (or any version) you can use use a serial column instead.
Either way, make it the PRIMARY KEY in the same command. That's cheaper for big tables:
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;
Or:
ALTER TABLE tbl ADD COLUMN tbl_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
db<>fiddle here
IDENTITY columns are not PRIMARY KEY automatically. Postgres allows multiple IDENTITY columns for the same table (even if that's rarely useful).
See:
Auto increment table column
Or you clean up the mess to make (fips, customer_id) unique. Then that can be your PK. See:
How to delete duplicate rows without unique identifier

You can simply add an identity column:
alter table t add column id int generated always as identity;
Here is a db<>fiddle.

Related

Sqlite - composite PK with two auto-incrementing values [duplicate]

I have a composite primary key {shop_id, product_id} for SQLite
Now, I want an auto-increment value for product_id which resets to 1 if shop id is changed. Basically, I want auto-generated composite key
e.g.
Shop ID Product Id
1 1
1 2
1 3
2 1
2 2
3 1
Can I achieve this with auto-increment? How?
Normal Sqlite tables are B*-trees that use a 64-bit integer as their key. This is called the rowid. When inserting a row, if a value is not explicitly given for this, one is generated. An INTEGER PRIMARY KEY column acts as an alias for this rowid. The AUTOINCREMENT keyword, which can only be used on said INTEGER PRIMARY KEY column, contrary to the name, merely alters how said rowid is calculated - if you leave out a value, one will be created whether that keyword is present or not, because it's really the rowid and must have a number. Details here. (rowid values are generally generated in increasing, but not necessarily sequential, order, and shouldn't be treated like a row number or anything like that, btw).
Any primary key other than a single INTEGER column is treated as a unique index, while the rowid remains the true primary key (Unless it's a WITHOUT ROWID table), and is not autogenerated. So, no, you can't (easily) do what you want.
I would probably work out a database design where you have a table of shops, a table of products, each with their own ids, and a junction table that establishes a many-to-many relation between the two. This keeps the product id the same between stores, which is probably going to be less confusing to people - I wouldn't expect the same item to have a different SKU in two different stores of the same chain, for instance.
Something like:
CREATE TABLE stores(store_id INTEGER PRIMARY KEY
, address TEXT
-- etc
);
CREATE TABLE product(prod_id INTEGER PRIMARY KEY
, name TEXT
-- etc
);
CREATE TABLE inventory(store_id INTEGER REFERENCES stores(store_id)
, prod_id INTEGER REFERENCES product(prod_id)
, PRIMARY KEY(store_id, prod_id)) WITHOUT ROWID;

SQL primary key on lookup table or unique constraint?

I want to create a lookup table 'orderstatus'. i.e. below, just to clarify this is to be used in a Data Warehouse. I will need to join through OrderStatus to retrieve the INT (if i create one) to be used elsewhere if need be. Like in a fact table for example, I would store the int in the fact table to link to the lookup table.
+------------------------+------------------+
| OrderStatus | ConnectionStatus |
+------------------------+------------------+
| CLOSED | APPROVE |
+------------------------+------------------+
| COMPLETED | APPROVE |
+------------------------+------------------+
| FULFILLED | APPROVE |
+------------------------+------------------+
| CANCELLED | CLOSED |
+------------------------+------------------+
| DECLINED | CLOSED |
+------------------------+------------------+
| AVS_CHECK_SYSTEM_ERROR | CLOSED |
+------------------------+------------------+
What is best practise in terms of primary key/unique key? Should i just create an OrderStatusKey INT as PrimaryKey with identity? Or create a unique constraint on order status (unique)? Thanks.
For this, I would suggest you create an Identity column, and make that the clustered primary key.
It is considered best practice for tables to have a primary key of some kind, but having a clustered index for a table like this is the fastest way to allow for the use of this table in multi table queries ( with joins ).
Here is a sample as to how to add it:
ALTER TABLE dbo.orderstatus
ADD CONSTRAINT PK_orderstatus_OrderStatusID PRIMARY KEY CLUSTERED (OrderStatusID);
GO
Article with more details MSDN
And here is another resource for explaining a primary key Primary Key Primer
If OrderStatus is unique and the primary identifier AND you will be reusing this status code directly in related tables (and not a numeric pointer to this status code) then keep the columns as is and make OrderStatus the primary clustered index.
A little explanation:
A primary key is unique across the table; a clustered index ties all record data back to that index. It is not always necessary to have the primary key also be the clustered index on the table but usually this is the case.
If you are going to be linking to the order status using something other than the status code then create another column of type int as an IDENTITY and make that the primary clustered key. Also add a unique non-clustered index to OrderStatus to ensure that no duplicates could ever be added.
Either way you go every table should have a primary key as well as a clustered index (again, usually they are the same index).
Here are some things to consider:
PRIMARY KEY ensures that there is no NULL values or duplicates in the table
UNIQUE KEY can contain NULL and (by the ANSI standard) any number of NULLs. (This behavior depends on SQL Server settings and possible index filters r not null constraints)
The CLUSTERED INDEX contains all the data related to a row on the leaves.
When the CLUSTERED INDEX is not unique (and not null), the SQL Server will add a hidden GUID to each row.
SQL Server add a hidden GUID column to the key column list when the key columns are not unique to distinguish the individual records)
All indexes are using either values of the key columns of the clustered index or the rowid of a heap table.
The query optimizer uses the index stats to find out the best way to execute a query
For small tables, the indexes are ignore usually, since doing an index scan, then a lookup for each values is more expensive than doing a full table scan (which will read one or two pages when you have really small tables)
Status lookup tables are usually very small and can be stored on one page.
The referencing tables will store the PK value (or unique) in their structure (this is what you'll use to do a join too). You can have a slight performance benefit if you have an integer key to use as reference (aka IDENTITY in SQL Server).
If you usually don't want to list the ConnectionStatus, then using the actual display value (OrderStatus) can be beneficial, since you don't have to join the lookup table.
You can store both values in the referencing tables, but the maintaining both columns have some overhead and more space for errors.
The clustered/non-clustered question depends on the use cases of this table. If you usually use the OrderStatus for filtering (using the textual form), a NON CLUSTERED IDENTITY PK and a CLUESTERED UNIQUE on the OrderStatus can be beneficial. However (as you can read it above), in small tables the effect/performance gain is usually negligible.
If you are not familiar with the above things and you feel it safer, then create an identity clustered PK (OrderKey or OrderID) and a unique non clustered key on the OrderStatus.
Use the PK as referencing/referenced column in foreign keys.
One more thing: if this column will be referenced by only one table, you may want to consider to create an indexed view which contains both table's data.
Also, I would suggest to add a dummy value what you can use if there is no status set (and use it as default for all referencing columns). Because not set is still a status, isn't it?

Sql combine value of two columns as primary key

I have a SQL server table on which I insert account wise data. Same account number should not be repeated on the same day but can be repeated if the date changes.
The customer retrieves the data based on the date and account number.
In short the date + account number is unique and should not be duplicate.
As both are different fields should I concatenate both and create a third field as primary key or there is option of having a primary key on the merge value.
Please guide with the optimum way.
You can create a composite primary key. When you create the table, you can do this sort of thing in SQL Server;
CREATE TABLE TableName (
Field1 varchar(20),
Field2 INT,
PRIMARY KEY (Field1, Field2))
Take a look at this question which helps with each flavour of SQL
How can I define a composite primary key in SQL?
PLEASE HAVE A LOOK, IT WILL CLEAR MOST OF THE DOUBTS !
We can state 2 or more columns combined as a primary key.
In that case every column included in primary key will be called : Composite Key
And mind you Composite keys can never be null !!
Now, first let me show you how to make 2 or more columns as primary key.
create table table_name ( col1 type, col2 type, primary key(col1, col2));
The benefit is :
col1 has value (X) and col2 has value (Y) then no other row can have col1 as (X) and col2 as (Y).
col1, col2 must have some values, they can't be null !!
HOPE THIS HELPS !
Not at all. Just use a primary key constraint:
alter table t add constraint pk_accountnumber_date primary key (accountnumber, date)
You can also include this in the create table statement.
I might suggest, however, that you use an auto-incrementing/identity/serial primary key -- a unique number for each row. Then declare the account number/date combination as a unique key. I prefer such synthetic primary keys for several reasons:
They make it easy to refer to a row in foreign key relationships.
They show the insert order into the table, so you can readily see the last inserted rows.
They make it simple to identify a single row for updates and deletes.
They hide the "id" information of the row from referring tables and applications.
The alternative is to have a PK which is an autoincrementing number and then put a unique unique index on the natural key. In this way uniqueness is preserved but you have the fastest possible joining to any child tables. If the table will not ever have child tables, the composite PK is a good idea. If there will be many child tables, this is could be a better choice.

SQL Primary Key Duplicate Values

I have a table with 2 primary key columns : ID and StudentID.
ID column is set to isIdentity = Yes with auto increment.
I've tested it multiple times before, but for some reason this time, when I insert a duplicate value on StudentID, it does not throw the error but instead added it on to the database. 2 of the same values are displayed when I show the table data.
What can be the problem here?
You have a compound primary key on ID and StudentID. That means you the combination of ID and StudentID together must be unique. Since ID is an identity column that combination of ID and StudentID will always be unique (because ID is already unique on its own).
You can change the primary key to be on ID only. Then you can add a unique index on StudentID. For example:
create unique index idx_studentID on yourTable(StudentID)
That will insure that the StudentID column, in fact, contains only unique values.
It seems like you may not actually need ID column, but that's a little wider discussion than your original question.
You can't have 2 "primary keys". You can have a compound primary key (meaning the combination needs to be unique, which is what it sounds like you have now. Or, You can have one "primary" key and one "unique" constraint which is what it sounds like you want.
You cannot have 2 Primary Keys. You can have multiple Unique Keys if needed, which should help you in your case. Make sure to go back to your table creation and double check which column is your Primary Key and work from there.
Do not mix up identity, primary key and unique key.
Any table can have identity key which you can setup on table. Here seed can be say 1, then increment it by 1. So incremental order will like 1,2,3...and so on.
Primary key, one can define on specific column of the table. Identity key can be used as primary key. But you can have identity column as well primary key on same table. Primary key is one and only for the table.So if you are treating identity as primary key, then you will have no further table column as primary key.
Unique key, can be more than one column with your table.
While fetching rows from table data, if you provide combination of identity key, primary key and unique key then search will be fastest
During my first response, I have mentioned that one can generate identity column by soft coding and it will not be treated as primary key.Following is syntax one can use while creating table.
1] If one wish to set identity column as primary key
--id int identity(1,1) primary key
2] If one doesn't wish to set identity column as primary key and still wish
to us identity column then donot us word primary key for identity column.
--id int identity(1,1)
In this 2] case scenario, one may create primary key on other table column.

sqlite: how can I add an autoincrementing id to an existing table?

I would like to add an autoincrementing integer field called uid to an existing table assoc, but it doesn't look like I can do that unless it's a primary key.
I have fields local_id and remote_id which are the existing primary key pair, and I do that so that I can INSERT OR IGNORE INTO assoc so that I don't get duplicate primary keys, but if I have a pair of columns as a primary key, I can't seem to use them as an update (see other SO question).
Could anyone suggest how to restructure the table (and implement that restructuring using ALTER TABLE) so that I can get the behavior I need:
a single autoincrementing key, so I can use that for UPDATEs
a pair of fields local_id and remote_id so that the pair (local_id, remote_id) remains unique in the table
In this case, you could drop the primary key on your existing columns, create the new primary key integer autoincrementing column, then create a UNIQUE index on the other two columns.
Aha, I don't need to -- there's a builtin rowid column.