Enumerated text columns in SQL - sql

I have a number of tables that have text columns that contain only a few different distinct values. I often play the tradeoff between the benefits (primarily reduced row size) of extracting the possible values into a lookup table and storing a small index in the table against the amount of work required to do so.
For the columns that have a fixed set of values known in advance (enumerated values), this isn't so bad, but the more painful case is when I know I have a small set of unique values, but I don't know in advance what they will be.
For example, if I have a table that stores log information on different URLs in a web application:
CREATE TABLE [LogData]
(
ResourcePath varchar(1024) NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
I waste a lot of space by repeating the ResourcePath for every request. There will be a very large number of duplicate entries in this table. I usually end up with something like this:
CREATE TABLE [LogData]
(
ResourcePathId smallint NOT NULL,
EventTime datetime NOT NULL,
ExtraData varchar(MAX) NOT NULL
)
CREATE TABLE [ResourcePaths]
(
ResourcePathId smallint NOT NULL,
ResourceName varchar(1024) NOT NULL
)
In this case, however, I no longer have a simple way to append data to the LogData table. I have to do a lookup on the ResourcePaths table to get the Id, add it if it is missing, and only then can I perform the actual insert. This makes the code much more complicated and changes my write-only logging function to require some sort of transaction against the lookup table.
Am I missing something obvious?

If you have a unique index on ResourceName, the lookup should be very fast even on a big table. However, it has disadvantages. For instance, if you log a lot of data and have to archive it off periodically, and you want to archive the previous month or year of LogData, you are forced to keep all of ResourcePaths. You can come up with solutions for all of that.

Yes: do the lookup as part of the insert, selecting from the existing data.
Given @resource, @time and @data as inputs:
insert into LogData (ResourcePathId, EventTime, ExtraData)
select ResourcePathId, @time, @data
from ResourcePaths
where ResourceName = @resource
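One way to fold the "add it if it is missing" step into the logging call is a small stored procedure. This is only a sketch: the procedure name is made up, and it assumes ResourcePathId is declared as an IDENTITY column and that ResourceName has a unique index, neither of which appears in the schema from the question.

```sql
-- Hypothetical procedure; assumes ResourcePaths.ResourcePathId is an IDENTITY
-- column and ResourcePaths.ResourceName has a unique index.
CREATE PROCEDURE dbo.AppendLogData
    @resource varchar(1024),
    @time     datetime,
    @data     varchar(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Add the path only if it is not there yet. The lock hints keep two
    -- concurrent callers from both deciding that the row is missing.
    INSERT INTO ResourcePaths (ResourceName)
    SELECT @resource
    WHERE NOT EXISTS (SELECT 1
                      FROM ResourcePaths WITH (UPDLOCK, HOLDLOCK)
                      WHERE ResourceName = @resource);

    -- The lookup happens as part of the insert itself.
    INSERT INTO LogData (ResourcePathId, EventTime, ExtraData)
    SELECT ResourcePathId, @time, @data
    FROM ResourcePaths
    WHERE ResourceName = @resource;
END
```

The caller then stays a single statement, for example EXEC dbo.AppendLogData '/home/index', '2016-06-20', 'some extra data'.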

Related

Optimize SQL Query (If possible) using CONVERT(INT, SUBSTRING( and LEN FUNCTION

My situation is like that :
I have these tables:
CREATE TABLE [dbo].[HeaderResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[ReportDescription] CHAR(255) NOT NULL,
[CatalogNumber] NCHAR(255) NOT NULL,
[WorkerName] NCHAR(255) DEFAULT ('') NOT NULL,
[LastCalibrationDate] DATETIME NOT NULL,
[NextCalibrationDate] DATETIME NOT NULL,
[MachineNumber] INT NOT NULL,
[EditTime] DATETIME NOT NULL,
[Age] NCHAR(255) DEFAULT ((1)) NOT NULL,
[Current] INT DEFAULT ((-1)) NOT NULL,
[Time] BIGINT DEFAULT ((-1)) NOT NULL,
[MachineName] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
[BatchNumber] NVARCHAR(MAX) DEFAULT ('') NOT NULL,
CONSTRAINT [PK_HeaderResultPulser]
PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[ResultPulser]
(
[Id] BIGINT IDENTITY (1, 1) NOT NULL,
[ReportNumber] CHAR(255) NOT NULL,
[BatchNumber] CHAR(255) NOT NULL,
[DateTime] DATETIME NOT NULL,
[Ocv] FLOAT(53) NOT NULL,
[OcvMin] FLOAT(53) NOT NULL,
[OcvMax] FLOAT(53) NOT NULL,
[Ccv] FLOAT(53) NOT NULL,
[CcvMin] FLOAT(53) NOT NULL,
[CcvMax] FLOAT(53) NOT NULL,
[Delta] BIGINT NOT NULL,
[DeltaMin] BIGINT NOT NULL,
[DeltaMax] BIGINT NOT NULL,
[CurrentFail] BIT DEFAULT ((0)) NOT NULL,
[NumberInTest] INT NOT NULL
);
For every row in HeaderResultPulser I have multiple rows in ResultPulser.
My key is [HeaderResultPulser].[ReportNumber]: I use it to get the matching data in ResultPulser, where many rows share the same [ResultPulser].[ReportNumber].
Each of those rows has a different [ResultPulser].[NumberInTest] value.
For example: in the ResultPulser table the data can look like this:
ReportNumber | NumberInTest
-------------+-------------
0000006211 | 1
0000006211 | 2
0000006211 | 3
0000006211 | 4
0000006211 | 5
0000006211 | 6
0000006212 | 1
0000006212 | 2
0000006212 | 3
0000006212 | 4
0000006212 | 5
NumberInTest can be 200, 500, 10000 and sometimes even more.
The report number column contains two parts: the first 7 characters are the machine number and the rest is an incrementing number.
For example, 0000006212 is [0000006][212] == [the machine number][the incrementing number]
My query for example :
select
[HeaderResultPulser].[ReportNumber],
max(NumberInTest) as TotalCells
from
ResultPulser, HeaderResultPulser
where
((([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8, LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
and ([HeaderResultPulser].[ReportNumber] = [ResultPulser].[ReportNumber])))
group by
[HeaderResultPulser].[ReportNumber]
What I actually want is all the rows for machine number 0000006 whose incrementing number is between 211 and 815 (inclusive).
This query takes about 6-7 seconds.
There is a lot of data: the ResultPulser table has hundreds of millions to billions of rows and will keep growing, and the HeaderResultPulser table can reach tens of thousands of rows.
The select returns only a few hundred rows, or in the worst case one or two thousand, but to get max(NumberInTest) it may have to read a few million rows from ResultPulser.
Is there any way to optimize my query? Or, with this much data, is this simply how long it takes?
The way you are doing joins is no longer standard. It's also hard to read, and dangerous if you ever need to use left joins. Instead of joining this way:
select *
from T1, T2
where T1.column = T2.column
Use ANSI-92 join syntax instead:
select *
from T1
join T2 on T1.column = T2.column
You said that your "key" was ReportNumber. Why isn't that declared in your schema? It sounds like you want a unique constraint on HeaderResultPulser.ReportNumber, and a foreign key on the ResultPulser table, such that ReportNumber references HeaderResultPulser (ReportNumber).
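As a rough sketch of those two constraints: the constraint names are invented here, and the foreign key can only be created if every ReportNumber in ResultPulser already exists in HeaderResultPulser.

```sql
-- Constraint names are placeholders.
alter table HeaderResultPulser
add constraint UQ_HeaderResultPulser_ReportNumber unique (ReportNumber);

-- Fails if ResultPulser contains a ReportNumber with no matching header row.
alter table ResultPulser
add constraint FK_ResultPulser_HeaderResultPulser
foreign key (ReportNumber) references HeaderResultPulser (ReportNumber);
```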
Since your report number column seems to contain two different values, your table is not in First Normal Form. This is making things difficult for you. Why not split the two parts of the "report number" into two different columns when the data is entered? This will significantly improve your query performance, because you no longer need to perform an expression against the data in the table at query time to separate the ReportNumber into atomic values.
Your comment says that the first 7 characters of the ReportNumber are the MachineNumber. But you already have MachineNumber in the HeaderResultPulser table. So why not just add a separate column for Increment? If you still need ReportNumber to exist as a column, you can make it a calculated column, as the concatenation of MachineNumber and Increment.
If you don't want to touch the "existing" schema, we can do a similar thing in reverse. Your query will not be completely sargable unless you can do something to the schema, because you have to perform some kind of expression on the data in the ReportNumber column. But maybe you have the option to use a calculated column to do this up front:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(rtrim(ReportNumber)) - 7);
Now we have the increment as a column in its own right. But it's still being calculated at query time, because it's not persisted. We can make it persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(rtrim(ReportNumber)) - 7) persisted;
We can also index a computed column. Since your required expression is deterministic and precise (see Indexes on Computed Columns), we don't actually have to mark it as persisted:
alter table HeaderResultPulser
add Increment as right(rtrim(ReportNumber), len(rtrim(ReportNumber)) - 7);
create index ix_headerresultpulser_increment on HeaderResultPulser(Increment);
You could do a similar set of operations to create the Increment and MachineNumber on the ResultPulser table. If you always want to use both values, create an index on the combination of (MachineNumber, Increment).
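A sketch of what that could look like on ResultPulser. The column and index names (MachinePart, IncrementPart, ix_resultpulser_machine_increment) are placeholders, and it assumes that everything from character 8 onward is always digits. The go separators are needed because the new columns must exist before a later batch can refer to them.

```sql
-- Split the report number into its two parts as computed columns.
alter table ResultPulser
add MachinePart as left(ReportNumber, 7),
    IncrementPart as convert(int, rtrim(substring(ReportNumber, 8, 248)));
go

-- Index on both parts; the included columns let the query below
-- be answered from the index alone.
create index ix_resultpulser_machine_increment
on ResultPulser (MachinePart, IncrementPart)
include (ReportNumber, NumberInTest);
go

-- The range condition is now a plain comparison on indexed columns.
select ReportNumber, max(NumberInTest) as TotalCells
from ResultPulser
where MachinePart = '0000006'
  and IncrementPart between 211 and 815
group by ReportNumber;
```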
The biggest performance gain might come from eliminating the outer GROUP BY by using a correlated subquery or lateral join:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber and
right(rtrim(rp.ReportNumber), 3) between '211' and '815'
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber like '0000006%';
Your logic looks like it only wants the last three characters of the ReportNumber, so I simplified the logic. I'm not 100% sure that is the case -- it just seems reasonable. Regardless, there is no need to convert the values to integers and then compare as strings. And similar logic can be used even for longer report numbers.
You also want an index on ResultPulser(ReportNumber, NumberInTest) :
create index idx_resultpulser_reportnumber_numberintest on ResultPulser(ReportNumber, NumberInTest)
EDIT:
Actually, I notice that the report number matches between the two tables. So this seems simplest:
select hrp.[ReportNumber],
(select max(rp.NumberInTest)
from ResultPulser rp
where rp.ReportNumber = hrp.ReportNumber
) as TotalCells
from HeaderResultPulser hrp
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815';
You still want to be sure you have the above index on ResultPulser.
If the ReportNumber is not a fixed 10 digits, then you can use:
where hrp.ReportNumber >= '0000006211' and
hrp.ReportNumber <= '0000006815' and
len(hrp.ReportNumber) = 10
This should also use the index and return exactly what you want.
Performance optimization of any query depends on many factors, including the environment in which you host and run it. Hardware and software both play an important part in optimizing heavy database queries. In your case you can look into the following things:
Use ANSI-92 JOIN syntax instead of the old comma-separated join,
e.g.
select *
from T1
join T2 on T1.column = T2.column
Put indexes on columns like
[ReportNumber]
[NumberInTest]
Note: You may need an index on each column used in the join that is not a primary key.
Remember that MAX over many rows can be expensive, and that could be the main problem in your query.
Finally, you can look into optimizing your query further with the following online tool, where you can specify your actual query and the environment you are using:
https://www.eversql.com/
Hope it helps.
If you really want to optimize performance, I propose to add a bit of logic beyond SQL structures.
Is it possible that a particular value of ReportNumber is present in table ResultPulser, but not in table HeaderResultPulser? If not, and I suppose so, there is no reason to join table HeaderResultPulser.
Then, I propose to take advantage of the fact that the condition on ReportNumber can be expressed equivalently without splitting it into substrings. For your example, the condition
([ResultPulser].[ReportNumber] like '0000006%' and
CONVERT(INT, SUBSTRING([ResultPulser].[ReportNumber], 8,
LEN([ResultPulser].[ReportNumber]))) BETWEEN '211' AND '815')
is equivalent to:
([ResultPulser].[ReportNumber] BETWEEN '0000006211' and '0000006815')
So the proposal is:
Create index on table ResultPulser(ReportNumber, NumberInTest)
Use selections similar to this:
select ReportNumber, max(NumberInTest) as TotalCells
from ResultPulser
where
ReportNumber BETWEEN '0000006211' and '0000006815'
group by
ReportNumber
(Please, add brackets or double quotes and capitalizations as necessary for MS SQL Server and your taste)
I would expect a good database to execute this query with index-only access, which is optimal from an execution point of view.
Performance depends not only on the execution path, but also on setup and hardware. Please make sure that your database has enough cache and fast disk access. Concurrent load is also very important.
Simply splitting the field ReportNumber into [the machine number] and [the incrementing number] will probably not improve the performance of the query in the form I proposed. But it may be very convenient for other forms of access (other WHERE clauses), and it will reflect the structure of the case. Even more important: it will release you from imposed limits. Currently, you have 3 digits for [the incrementing number]. Are you sure it will never be necessary to have more than 999 of them for a single [the machine number]?
Why does the field ReportNumber have type char(255) when only 10 characters are used? char(255) has a fixed length, so it is a terrible waste of space. Only database compression can help. Used space has a strong influence on performance. Please consider the above remark about the database cache.
If both of these fields, [the machine number] and [the incrementing number], are integers, why not split ReportNumber and use an integer type for them?
Side remark: The field names suggest that you are looking for the total number of rows in table ResultPulser that belong to a single entry in table HeaderResultPulser. The proposed query will deliver this only if the numbers in NumberInTest are consecutive, without gaps. If that is not guaranteed, you have to count them rather than seek the maximum.
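If NumberInTest can have gaps, the counting variant of the same query would look roughly like this (same table, same range condition, only the aggregate changes):

```sql
-- Counts the rows per report instead of taking the highest NumberInTest.
select ReportNumber, count(*) as TotalCells
from ResultPulser
where ReportNumber between '0000006211' and '0000006815'
group by ReportNumber;
```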

Adding Row in existing table (SQL Server 2005)

I want to add another row in my existing table and I'm a bit hesitant if I'm doing the right thing because it might skew the database. I have my script below and would like to hear your thoughts about it.
I want to add another row for 'Jane' in the table, which will be 'SKATING' in the ACT column.
Table: [Emp_table].[ACT].[LIST_EMP]
My script is:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
Will this do the trick?
Your statement looks ok. If the database has a problem with it (for example, due to a foreign key constraint violation), it will reject the statement.
If any of the fields in your table are numeric (and not varchar or char), just remove the quotes around the corresponding field. For example, if emp_cod and line_no are int, insert the following values instead:
('REG','EMP',45233,'2016-06-20 00:00:00:00',2,'SKATING','JANE')
Inserting records into a database has always been the most common reason why I've lost a lot of the hair on my head!
SQL is great when it comes to SELECT or even UPDATEs but when it comes to INSERTs it's like someone from another planet came into the SQL standards committee and managed to get their way of doing it implemented into the final SQL standard!
If your table does not have a primary key that is generated automatically on every insert, then you have to write the code to avoid duplicates yourself.
Start by writing a normal SELECT to check that the record(s) you're going to add don't already exist. But as Robert implied, your table may not have a primary key because it looks like a LOG table to me. So insert away!
If it does require a unique record every time, then I strongly suggest you create a primary key for the table, either an auto-generated one or a combination of your existing columns.
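For the "combination of existing columns" option, a sketch could look like the following. The constraint name is made up, and it assumes the five columns are declared NOT NULL and that the table holds no duplicate combinations yet; otherwise the statement is rejected.

```sql
-- Hypothetical composite primary key over the first five columns.
ALTER TABLE [Emp_table].[ACT].[LIST_EMP]
ADD CONSTRAINT PK_LIST_EMP
PRIMARY KEY ([ENTITY], [TYPE], [EMP_COD], [DATE], [LINE_NO]);
```

With such a key in place, the database itself rejects a second insert of the same combination.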
Assuming the first five columns combined make a unique key, this select will determine whether the data you're inserting already exists...
SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND [LINE_NO] = wsLineno
You will have to replace the wsXXX names with direct values or have them DECLAREd earlier in your script.
If you run this alone and receive a value of 1 or more, then the data already exists in your table, at least for those first 5 columns. A true duplicate test would require you to test EVERY column in your table, but it should give you an idea.
To do it all as one statement, note that an INSERT ... VALUES cannot take a WHERE clause, so use INSERT ... SELECT with NOT EXISTS instead:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
SELECT 'REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE'
WHERE NOT EXISTS (SELECT 1 FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND
[EMP_COD] = wsEmpCod AND [DATE] = wsDate AND
[LINE_NO] = wsLineno)
Just replace the wsXXX variables with the values you want to insert.
I hope that made sense.

Setting field size (per column) while generating table in Access

I am trying to export my database as a .dbf using a VBA script, but the .dbf format requires the columns to have certain sizes.
When I leave the columns as they are in Access, I get an error saying
field will not fit in record
How can I set the column size for each column separately? Preferably while generating the table, so I don't have to do it manually every time I generate a new table with queries.
And where do I set them? (In a query or in SQL?)
Thanks in advance!
Edit:
I have made sure that it's the field size value that is giving me the error. I changed all the field size values manually by opening the table in Design View.
So now the second part of my question is becoming more crucial: whether or not it is possible to set the field size while generating the table.
Edit2:
I am currently using SQL in a query to create the table as follows:
SELECT * INTO DB_Total
FROM Tags_AI_DB;
After the initial DB_Total is made, I use several Insert into queries to add other rows:
INSERT INTO DB_TOTAL
SELECT a.*
FROM Tags_STS_ENA_DB AS a
LEFT JOIN DB_TOTAL AS b
ON a.NAME = b.NAME
WHERE b.NAME IS NULL;
If I set the column values in the DB_Total table while generating it with the Select into query, will they still have those values after using the Insert Into queries to insert more rows?
Edit3:
I decided (after a few of your suggestions and some pointers from colleagues) that it would be better to first make my table and afterwards update this table with queries.
However, it seems like I have run into a dead end with Access. This is the code I am using:
CREATE TABLE DB_Total ("NAME" char(79),"TYPE" char(16), "UNIT" char(31),
"ADDR" char(254), "RAW_ZERO" char(11), "RAW_FULL" char(11), "ENG_ZERO" char(11),
"ENG_FULL" char(11), "ENG_UNIT" char(8), "FORMAT" char(11), "COMMENT" char(254),
"EDITCODE" char(8), "LINKED" char(1), "OID" char(10), "REF1" char(11), "REF2" char(11),
"DEADBAND" char(11), "CUSTOM" char(128), "TAGGENLINK" char(32), "CLUSTER" char(16),
"EQUIP" char(254), "ITEM" char(63), "HISTORIAN" char(6),
"CUSTOM1" char(254), "CUSTOM2" char(254), "CUSTOM3" char(254), "CUSTOM4" char(254),
"CUSTOM5" char(254), "CUSTOM6" char(254), "CUSTOM7" char(254), "CUSTOM8" char(254))
These are all the columns required for me to make a DBF file that is accepted by the application we are using it with.
You'll understand my sadness when this generated the following error:
Record is too large
Is there anything I can do to make this table work?
UPDATE
The maximum record size for Access 2007 is around 2kB (someone will no doubt correct that value)
When you create CHAR(255) it will use 255 bytes of space regardless as to what is in the field.
By contrast, VARCHARs do not use up space (only enough to define them) until you put something in the field, they grow dynamically.
Changing the CHAR(x)s to VARCHAR(x)s you will shrink the length of your table to within permitted values. Be aware that you may come into trouble if the row you are trying to insert is larger than the 2kB limit.
Previous
The way to specify column lengths when generating the table is to use a CREATE TABLE statement instead of a SELECT * INTO.
CREATE TABLE DB_Total
(
Column1Name NVARCHAR(255) --Use whatever datatype and length you need
,Column2Name NUMERIC(18,0) --Use whatever datatype and length you need
,...
) ;
INSERT INTO DB_Total
....
If you use a SELECT * INTO statement, SQL will use whatever field lengths and types it finds in the existing data.
It is also better practice to list the column names in your insert statement, so instead of
INSERT INTO DB_TOTAL
SELECT a.*
You should put:
INSERT INTO DB_Total
(
Column1Name
,Column2Name
,...
)
SELECT a.Column1Name
,a.Column2Name
,...
FROM ...
WHERE ... ;
In Edit2, you indicated your process starts with a "make table" (SELECT INTO) query which creates DB_Total and loads it with data from Tags_AI_DB. Then you run a series of "append" (INSERT) queries to add data from other tables.
Now your problem is that you need specific field size settings for DB_Total, but it is impossible to define those sizes with a "make table" query.
I think you should create DB_Total one time and set the field sizes as you wish. Do that manually with the table in Design View, or execute a CREATE TABLE statement if you prefer.
Then forget about the "make table" query and use only "append" queries to add the data.
If the issue is that this is a recurring operation and you want to discard previous data before importing the new, execute DELETE FROM DB_Total instead of DROP TABLE DB_Total. That will allow you to preserve the structure of the (now empty) DB_Total table so you needn't fiddle with setting the field sizes again.
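As a sketch, the recurring refresh then becomes two saved queries run in order (Access executes one statement per query). It assumes the field names in Tags_AI_DB line up with those of DB_Total, as they did for the original make-table query.

```sql
-- Query 1: empty the table but keep its structure and field sizes.
DELETE FROM DB_Total;

-- Query 2: append the first source table; the other append queries follow as before.
INSERT INTO DB_Total
SELECT a.*
FROM Tags_AI_DB AS a;
```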
Seems to me the only potential issue then might be if the structure of the source tables changes. If that happens, revise the structure of DB_Total so that it's compatible again.

What the best way to self-document "codes" in a SQL based application?

Q: Is there any way to implement self-documenting enumerations in "standard SQL"?
EXAMPLE:
Column: PlayMode
Legal values: 0=Quiet, 1=League Practice, 2=League Play, 3=Open Play, 4=Cross Play
What I've always done is just define the field as "char(1)" or "int", and define the mnemonic ("league practice") as a comment in the code.
Any BETTER suggestions?
I'd definitely prefer using standard SQL, so database type (MySQL, MSSQL, Oracle, etc.) shouldn't matter. I'd also prefer using any application language (C, C#, Java, etc.), so programming language shouldn't matter, either.
Thank you VERY much in advance!
PS:
It's my understanding that using a second table - to map a code to a description, for example "table playmodes (char(1) id, varchar(10) name)" - is very expensive. Is this necessarily correct?
The normal way is to use a static lookup table, sometimes called a "domain table" (because its purpose is to restrict the domain of a column variable.)
It's up to you to keep the underlying values of any enums or the like in sync with the values in the database (you might write a code generator that generates the enum from the domain table and gets invoked when something in the domain table changes).
Here's an example:
--
-- the domain table
--
create table dbo.play_mode
(
id int not null primary key clustered ,
description varchar(32) not null unique nonclustered
)
insert dbo.play_mode values ( 0 , 'Quiet' )
insert dbo.play_mode values ( 1 , 'LeaguePractice' )
insert dbo.play_mode values ( 2 , 'LeaguePlay' )
insert dbo.play_mode values ( 3 , 'OpenPlay' )
insert dbo.play_mode values ( 4 , 'CrossPlay' )
--
-- A table referencing the domain table. The column playmode_id is constrained to
-- one of the values contained in the domain table play_mode.
--
create table dbo.game
(
id int not null primary key clustered ,
team1_id int not null foreign key references dbo.team( id ) ,
team2_id int not null foreign key references dbo.team( id ) ,
playmode_id int not null foreign key references dbo.play_mode( id )
)
go
Some people for reasons of "economy" might suggest using a single catch-all table for all such code, but in my experience, that ultimately leads to confusion. Best practice is a single small table for each set of discrete values.
Add a foreign key to a "codes" table.
The codes table would have the code value as its PK, plus a string description column where you enter the description of the value.
table: PlayModes
Columns: PlayMode number --primary key
Description string
I can't see this as being very expensive, databases are based on joining tables like this.
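A minimal sketch of that layout and the join it implies. The Games table and the column types are assumptions made for illustration; only PlayModes, PlayMode and Description come from the outline above.

```sql
-- The codes table: the code value itself is the primary key.
create table PlayModes
(
    PlayMode int not null primary key,
    Description varchar(32) not null
);

-- Hypothetical table that stores only the code.
create table Games
(
    GameId int not null primary key,
    PlayMode int not null references PlayModes (PlayMode)
);

-- Reading the description back is a primary-key lookup on a tiny table.
select g.GameId, pm.Description
from Games g
join PlayModes pm on pm.PlayMode = g.PlayMode;
```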
That information should be in the database somewhere and not in comments.
So, you should have a table containing those codes and probably a FK on your table to it.
I agree with @Nicholas Carey (+1): Static data table with two columns, say “Key” or “ID” and “Description”, with foreign key constraints on all tables using the codes. Often the ID columns are simple surrogate keys (1, 2, 3, etc., with no significance attached to the value), but when reasonable I go a step further and use “special” codes. Following are a few examples.
If the values are a sequence (say, Ordered, Paid, Processed, Shipped), I might use 1, 2, 3, 4, to indicate sequence. This can make things easier if you want to find everything “up through” a given stage, such as all orders that have not yet been shipped (ID < 4). If you are into planning ahead, make them 10, 20, 30, 40; this will allow you to add values “in between” existing values, if/when new codes or statuses come along. (Yes, you cannot and should not try to anticipate everything and anything that might have to be done some day, but a bit of pre-planning like this can make some changes that much simpler.)
Keys/Ids are often integers (1 byte, 2 byte, 4 byte, whatever). There’s little cost to making them character values (1 char, 2 char, 3 char, 4 char). That’s character, not variable character. Done this way, you can have mnemonics on your codes, such as
O, P, R, S
Or, Pd, Pr, Sh
Ordr, Paid, Proc, Ship
…or whatever floats your boat. Done this way, I have found that it can save a lot of time when analyzing or debugging. You still want the lookup table, for relational integrity as well as a reminder for the more obscure codes.
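To make the mnemonic variant concrete, here is a small sketch using the four-character codes. All table and column names are invented for the example.

```sql
-- Lookup table keyed by a fixed-width mnemonic code.
create table OrderStatus
(
    StatusCode char(4) not null primary key,
    Description varchar(32) not null unique
);

insert into OrderStatus (StatusCode, Description) values ('Ordr', 'Ordered');
insert into OrderStatus (StatusCode, Description) values ('Paid', 'Paid');
insert into OrderStatus (StatusCode, Description) values ('Proc', 'Processed');
insert into OrderStatus (StatusCode, Description) values ('Ship', 'Shipped');

-- Hypothetical referencing table; the foreign key keeps the codes valid.
create table Orders
(
    OrderId int not null primary key,
    StatusCode char(4) not null references OrderStatus (StatusCode)
);

-- The code is readable on its own, so ad hoc queries need no join.
select OrderId from Orders where StatusCode = 'Ship';
```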

Share auto-incremented primary key between two tables

Hi, I want to have two tables that each have an INT "id" column which auto-increments, but I don't want the two "id" columns to ever share the same number. What is this called and what's the best way to do it? Sequence? Iterator? Index? Incrementor?
Motivation: we're migrating from one schema to another and have a web page that reads both tables and shows the (int) ID, but I can't have the same ID used for both tables.
I'm using SQL Server 9.0.3068.
Thanks!
Just configure the identity increment to be >1 e.g. table one uses IDENTITY (1, 10) [1,11,21...] and table two uses IDENTITY (2, 10) [2,12,22...]. This will also give you some room for expansion if needed later.
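A small sketch of that setup; the table names and the payload column are placeholders.

```sql
-- Both tables step by 10 but start at different offsets,
-- so their id values can never collide.
CREATE TABLE TableOne
(
    id int IDENTITY(1, 10) NOT NULL PRIMARY KEY,
    payload varchar(50) NOT NULL
);

CREATE TABLE TableTwo
(
    id int IDENTITY(2, 10) NOT NULL PRIMARY KEY,
    payload varchar(50) NOT NULL
);

INSERT INTO TableOne (payload) VALUES ('a');  -- id 1
INSERT INTO TableOne (payload) VALUES ('b');  -- id 11
INSERT INTO TableTwo (payload) VALUES ('c');  -- id 2
```

Offsets 3 through 10 stay free for further tables that need to share the same number space.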
I think using a GUID would be the most straightforward way, if I understand you correctly.
SELECT NEWID()
Use a column with the GUID (Globally Unique Identifier) type. It is 16 bytes and will always be unique for each row.
Just be aware that you'll get a significant performance hit compared to normal integer keys.
Use another table with an ID key of type int default it to 1, called KeyID or whatever.
Have a stored procedure retrieve the value, add 1, then update the KeyID, then return this to the stored procedure which is updating your two tables which needs the new unique key.
This will ensure the ID is an int, and that it's unique between the set of tables which are using the stored procedure to generate new ID's.
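A sketch of such a key table and procedure. The names KeyID, NextId and dbo.GetNextKey are made up, and RealTable1 stands in for one of your two tables. The value is read and incremented in a single UPDATE so that two callers cannot receive the same key.

```sql
-- One-row table holding the next key to hand out.
CREATE TABLE KeyID (NextId int NOT NULL);
INSERT INTO KeyID (NextId) VALUES (1);
GO

CREATE PROCEDURE dbo.GetNextKey
    @newId int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    -- @newId receives the value NextId had before the update,
    -- and NextId is incremented in the same statement.
    UPDATE KeyID
    SET @newId = NextId,
        NextId = NextId + 1;
END
GO

-- Usage: fetch a key, then use it for the insert (RealTable1 is hypothetical).
DECLARE @id int;
EXEC dbo.GetNextKey @newId = @id OUTPUT;
INSERT INTO RealTable1 (id) VALUES (@id);
```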
You can define an IDENTITY column in a third table, use that to generate ID values, but you always roll back any inserts you make into the table (to avoid making it grow). Rolling back the transaction doesn't roll back the fact that the ID was generated.
I'm not a regular user of Microsoft SQL Server, so please forgive any syntax gaffes. But something like the following is what I have in mind:
CREATE TABLE AlwaysRollback (
id INT IDENTITY(1,1)
);
BEGIN TRANSACTION;
INSERT INTO AlwaysRollback DEFAULT VALUES;
ROLLBACK TRANSACTION;
INSERT INTO RealTable1 (id, ...) VALUES (SCOPE_IDENTITY(), ...);
BEGIN TRANSACTION;
INSERT INTO AlwaysRollback DEFAULT VALUES;
ROLLBACK TRANSACTION;
INSERT INTO RealTable2 (id, ...) VALUES (SCOPE_IDENTITY(), ...);
I don't know what you would call it.
If you don't want to use a GUID or a separate table, you could also create a function that looks at the max values of the ids from both tables and adds one to that value (or something like that).
You could then call that function in an insert trigger on both tables.
I am personally a fan of the GUID solution, but here is a viable option.
Many solutions to this problem have avoided GUID and used good old integer. This is common also with merge replication situations where many satellite sites merge with a master and key conflicts need to be avoided.
If GUID will not work for you, and you absolutely must have int, bigint, or the like, you can always just use an IDENTITY column and have each table with a different value for SEED. Those datatypes have a very wide range, and it is not too hard to split the range into usable segments, especially if all you want is two splits. As an example, basic int has a range from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). This is more than enough for a customer table, for example.
Transact-SQL Reference (SQL Server 2000)
int, bigint, smallint, and tinyint
Example:
--Create table with a seed of 1 billion and an increment of 1
CREATE TABLE myTable
(
primaryKey int IDENTITY (1000000000, 1),
columnOne varchar(10) NOT NULL
)
If you really need to do this with an int and you have an auto-incrementing number, the way I have done this before is to change the id field's auto-increment function to use the sequence of the other table. I am not too sure about MS SQL or MySQL, but in PostgreSQL that means that in the SQL you would have this field
id integer NOT NULL DEFAULT nextval('table_two_seq'::regclass),
where table_two_seq is the sequence for the other table. Then test it out by inserting some data. I am really sorry if this won't work in MS SQL; I try to steer clear of it, to be honest. Failing that, the GUID is the best way, as has been mentioned by others. Or you could put an algorithm in the code that does the inserting, but it could get messy.
Alternatively, think about having the data in one table, as this would be a way around it. If you need to, you could have a view simulating two tables. Just a thought.
Hope I have helped.
Starting with SQL Server 2012 you can declare a sequence object
https://msdn.microsoft.com/en-us/library/ff878091.aspx which is exactly what you need.
It should be pretty trivial to emulate a sequence object with a table containing the next sequence value and a stored procedure that atomically selects the value and increments it. [You'd like to use a function, but functions can't have side effects.]
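For reference, a sketch of the sequence-object approach on SQL Server 2012 or later (so not on the 9.0 / SQL Server 2005 instance from the question). The sequence, table and column names are placeholders.

```sql
-- One sequence shared by both tables.
CREATE SEQUENCE dbo.SharedId AS int START WITH 1 INCREMENT BY 1;
GO

-- Each table draws its default id from the same sequence,
-- so a value is never handed out twice across the two tables.
CREATE TABLE TableOne
(
    id int NOT NULL DEFAULT (NEXT VALUE FOR dbo.SharedId) PRIMARY KEY,
    payload varchar(50) NOT NULL
);

CREATE TABLE TableTwo
(
    id int NOT NULL DEFAULT (NEXT VALUE FOR dbo.SharedId) PRIMARY KEY,
    payload varchar(50) NOT NULL
);

INSERT INTO TableOne (payload) VALUES ('a');
INSERT INTO TableTwo (payload) VALUES ('b');
```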
How about this hack? Create a table (MySequence) with two columns: an identity column (SequenceValue) and a dummy column (DummyValue), and use this stored procedure to get a new sequence value. The only row in the table will be the last sequence value retrieved.
CREATE PROCEDURE GetNextValue
AS
BEGIN
DECLARE @value int;
-- Insert a dummy row to generate the next identity value
INSERT into MySequence (DummyValue) Values (null);
SET @value = SCOPE_IDENTITY();
-- Keep only the newest row so the table does not grow
DELETE from MySequence where SequenceValue <> @value
SELECT @value as Sequence
return @value
END
To use the sequence you'd have to manage the inserts to the target tables--a trigger would probably work.