Note: This is not about database design nor about the general use of a GUID. This is about deterministically create such GUID's for test data, on a Microsoft SQL server.
We are migrating our database away from integer identifiers to the uniqueidentifier data type.
For test purposes we want to migrate our test data sets to known GUID values, deterministically based on our former integer values
UPDATE Unit
SET UnitGuid = NEWID(UnitId)
Obviously this does not work right away. How to use the UnitId to create a deterministic GUID?
You could create keymap table:
CREATE TABLE tab_map(id_old INT PRIMARY KEY, guid UNIQUEIDENTIFIER);
INSERT INTO tab_map(id_old, guid)
SELECT id, NEWID()
FROM src_table;
DBFiddle Demo
After that you could use simple query or wrap with a function:
SELECT guid
FROM tab_map
WHERE id_old = ?
Stop thinking about the problem from a "string" perspective. an int is made up of 4 bytes. A uniqueidentifier is made up of 16 bytes. you can easily take 12 fixed bytes and append the four bytes from an int to the end of those, and get a solution that works for all int values:
declare #Unit table
(
UniqueColumn UNIQUEIDENTIFIER DEFAULT NEWID(),
Characters VARCHAR(10),
IntegerId int
)
-- Add *3* data rows
INSERT INTO #Unit(Characters, IntegerId) VALUES ('abc', 1111),('def', 2222),('ghi',-17)
-- Deterministically creates a uniqueidentifier value out of an integer value.
DECLARE #GuidPrefix binary(12) = 0xefbeadde0000000000000000
UPDATE #Unit
SET UniqueColumn = CONVERT(uniqueidentifier,#GuidPrefix + CONVERT(binary(4),IntegerId))
-- Check the result
SELECT * FROM #Unit
Result:
UniqueColumn Characters IntegerId
------------------------------------ ---------- -----------
DEADBEEF-0000-0000-0000-000000000457 abc 1111
DEADBEEF-0000-0000-0000-0000000008AE def 2222
DEADBEEF-0000-0000-0000-0000FFFFFFEF ghi -17
(For various reasons, we have to provide the first four bytes in a different order than the one that is used by default when displaying a uniqueidentifier as a string, which is why if we want to display DEADBEEF, we had to start our binary as efbeadde)
Also, of course, insert usual warnings that if you're creating guids/uniqueidentifiers but not using one of the prescribed methods for generating them, then you cannot assume any of the usual guarantees about uniqueness.
I ended up solving this for myself. Here's my solution for future reference:
I create a prefix part of the GUID in the form of deadbeef-0000-0000-0000-, then append a "stringified", zero-padded version of the Id column's integer value to it, like 000000000001, wich results in
DEADBEEF-0000-0000-0000-000000000001
in this example.
Here's the SQL command for this action on a whole table:
-- Deterministically creates a uniqueidentifier value out of an integer value.
DECLARE #GuidPrefix nvarchar(max) = N'deadbeef-0000-0000-0000-'; -- without the last 12 digits
UPDATE Unit
SET UniqueColumn =
(SELECT #GuidPrefix + RIGHT('000000000000' + CAST(IntegerId AS NVARCHAR (12)), 12 ) AS NUMBER_CONVERTED)
Warnings:
This implementation works only for positive int values (which are up
to 2147483647 max)
This is only intended for test data! Use is
strongly discouraged for production data!
And here's a complete working example:
-- Create an example table with random GUID's
CREATE TABLE Unit
(
UniqueColumn UNIQUEIDENTIFIER DEFAULT NEWID(),
Characters VARCHAR(10),
IntegerId int
)
-- Add 2 data rows
INSERT INTO Unit(Characters, IntegerId) VALUES ('abc', 1111)
INSERT INTO Unit(Characters, IntegerId) VALUES ('def', 2222)
-- Deterministically creates a uniqueidentifier value out of an integer value.
DECLARE #GuidPrefix nvarchar(max) = N'deadbeef-0000-0000-0000-'; -- without the last 12 digits
UPDATE Unit
SET UniqueColumn =
(SELECT #GuidPrefix + RIGHT('000000000000' + CAST(IntegerId AS NVARCHAR (12)), 12 ) AS NUMBER_CONVERTED)
-- Check the result
SELECT * FROM Unit
Result:
UniqueColumn Characters IntegerId
--------------------------------------- ---------- ---------
DEADBEEF-0000-0000-0000-000000001111 abc 1111
DEADBEEF-0000-0000-0000-000000002222 def 2222
Related
We have a table with GUID primary keys. When I search for a specific key, I can use either:
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE6'
SELECT * FROM products WHERE productID LIKE '34594289-16B9-4EEF-9A1E-B35066531DE6'
RESULT (for both):
product_ID Prd_Model
------------------------------------ --------------------------------------------------
34594289-16B9-4EEF-9A1E-B35066531DE6 LW-100
(1 row affected)
We have a customer who uses our ID but adds more text to it to create some kind of compound field in their own system. They sent me one of these values to look up and I had an unexpected result. I meant to trim the suffix but forgot, so I ran this:
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
When I ran it, I unexpectedly got the same result:
product_ID Prd_Model
------------------------------------ --------------------------------------------------
34594289-16B9-4EEF-9A1E-B35066531DE6 LW-062
(1 row affected)
Now if I trim a value off the end of the GUID when searching I get nothing (GUID is 1 digit short):
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE'
Result:
product_ID Prd_Model
------------------------------------ --------------------------------------------------
(0 rows affected)
When using the LIKE command instead of '=' and if I add the suffix to the end, the statement returns zero results. This is what I would expect.
So why does the longer string with the suffix added to the end return a result when using '=' in the statement? It's obviously ignoring anything beyond the 36-character GUID length but I'm not sure why.
This behaviour is documented:
Converting uniqueidentifier Data
The uniqueidentifier type is considered a character type for the purposes of conversion from a character expression, and therefore is subject to the truncation rules for converting to a character type. That is, when character expressions are converted to a character data type of a different size, values that are too long for the new data type are truncated. See the Examples section.
So, the string value '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD' is truncated to '34594289-16B9-4EEF-9A1E-B35066531DE6' when it is implicitly cast (due to Data Type Precedence) to a uniqueidentifier and, unsurprisingly, '34594289-16B9-4EEF-9A1E-B35066531DE6' equals itself so the row is returned.
And the documentation does indeed give an example:
The following example demonstrates the truncation of data when the value is too long for the data type being converted to. Because the uniqueidentifier type is limited to 36 characters, the characters that exceed that length are truncated.
DECLARE #ID NVARCHAR(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT #ID, CONVERT(uniqueidentifier, #ID) AS TruncatedValue;
Here is the result set.
String TruncatedValue
-------------------------------------------- ------------------------------------
0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong 0E984725-C51C-4BF4-9960-E1C80E27ABA0
I, however, find it odd that you say that the statement below returns no rows:
SELECT *
FROM products
WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE'
Though true, it won't return rows, it will also error:
Conversion failed when converting from a character string to uniqueidentifier.
The fact it doesn't implies your column isn't a uniqueidentifier which would mean that your first statement isn't true; as the longer string would not be truncated. This means that one of the statements in the question is likely wrong; either your column is a uniqueidentifier and thus you get results but get an error in the latter, or it isn't and neither statement would return a result set. As you can see in this demonstration:
CREATE TABLE dbo.YourTable (UID uniqueidentifier, String varchar(36));
INSERT INTO dbo.YourTable (UID,String)
VALUES('34594289-16B9-4EEF-9A1E-B35066531DE6','34594289-16B9-4EEF-9A1E-B35066531DE6');
GO
--Returns data
SELECT *
FROM dbo.YourTable
WHERE UID = '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
GO
--Errors
SELECT *
FROM dbo.YourTable
WHERE UID = '34594289-16B9-4EEF-9A1E-B35066531DE';
GO
--Returns no data
SELECT *
FROM dbo.YourTable
WHERE String = '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
GO
--Returns no data
SELECT *
FROM dbo.YourTable
WHERE String = '34594289-16B9-4EEF-9A1E-B35066531DE'
GO
DROP TABLE dbo.YourTable;
db<>fiddle
I have a table that looks like this:
memberno(int)|member_mouth (varchar)|Inspected_Date (varchar)
-----------------------------------------------------------------------------
12 |'1;2;3;4;5;6;7' |'12-01-01;12-02-02;12-03-03' [7 members]
So by looking at how this table has been structured (poorly yes)
The values in the member_mouth field is a string that is delimited by a ";"
The values in the Inspected_Date field is a string that is delimited by a ";"
So - for each delimited value in member_mouth there is an equal inspected_date value delimited inside the string
This table has about 4Mil records, we have an application written in C# that normalizes the data and stores it in a separate table. The problem now is because of the size of the table it takes a long time for this to process. (the example above is nothing compared to the actual table, it's much larger and has a couple of those string "array" fields)
My question is this: What would be the best and fastest way to normilize this data in MSSQL proc? let MSSQL do the work and not a C# app?
The best way will be SQL itself. The way followed in the below code is something which worked for me well with 2-3 lakhs of data.
I am not sure about the below code when it comes to 4 Million, but may help.
Declare #table table
(memberno int, member_mouth varchar(100),Inspected_Date varchar(400))
Insert into #table Values
(12,'1;2;3;4;5;6;7','12-01-01;12-02-02;12-03-03;12-04-04;12-05-05;12-07-07;12-08-08'),
(14,'1','12-01-01'),
(19,'1;5;8;9;10;11;19','12-01-01;12-02-02;12-03-03;12-04-04;12-07-07;12-10-10;12-12-12')
Declare #tableDest table
(memberno int, member_mouth varchar(100),Inspected_Date varchar(400))
The table will be like.
Select * from #table
See the code from here.
------------------------------------------
Declare #max_len int,
#count int = 1
Set #max_len = (Select max(Len(member_mouth) - len(Replace(member_mouth,';','')) + 1)
From #table)
While #count <= #max_len
begin
Insert into #tableDest
Select memberno,
SUBSTRING(member_mouth,1,charindex(';',member_mouth)-1),
SUBSTRING(Inspected_Date,1,charindex(';',Inspected_Date)-1)
from #table
Where charindex(';',member_mouth) > 0
union
Select memberno,
member_mouth,
Inspected_Date
from #table
Where charindex(';',member_mouth) = 0
Delete from #table
Where charindex(';',member_mouth) = 0
Update #table
Set member_mouth = SUBSTRING(member_mouth,charindex(';',member_mouth)+1,len(member_mouth)),
Inspected_Date = SUBSTRING(Inspected_Date,charindex(';',Inspected_Date)+1,len(Inspected_Date))
Where charindex(';',member_mouth) > 0
Set #count = #count + 1
End
------------------------------------------
Select *
from #tableDest
Order By memberno
------------------------------------------
Result.
You can take a reference here.
Splitting delimited values in a SQL column into multiple rows
Do it on SQl server side, if possible a SSIS package would be great.
I am asked to generate custom ID values for primary key columns. The query is as follows,
SELECT * FROM SC_TD_GoodsInward WHERE EntityId = #EntityId
SELECT #GoodsInwardId=IIF((SELECT COUNT(*) FROM SC_TD_GoodsInward)>0, (Select 'GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'+CONVERT(varchar,#EntityId)+'_'+(SELECT RIGHT('0000'+CONVERT(VARCHAR,CONVERT(INT,RIGHT(MAX(GoodsInwardId),4))+1),4) from SC_TD_GoodsInward)), (SELECT 'GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'+CONVERT(varchar,#EntityId)+'_0001'))
Here the SC_TD_GoodsInward is a table, GoodsInwardId is the value to be generated. I am getting the desired outputs too. Examples.
GI_131118_1_0001
GI_131212_1_0002
GI_131212_1_0003
But, the above condition fails when the last digits reach 9999. I simulated the query and the results were,
GI_131226_1_9997
GI_140102_1_9998
GI_140102_1_9999
GI_140102_1_0000
GI_140102_1_0000
GI_140102_1_0000
GI_140102_1_0000
GI_140102_1_0000
After 9999, it goes to 0000 and does not increment thereafter. So, in the future, I will eventually run into a PK duplicate error. How can i recycle the values so that after 9999, it goes on as 0000, 0001 ... etc. What am I missing in the above query?
NOTE: Please consider the #EntityId value to be 1 in the query.
I am using SQL SERVER 2012.
Before giving a solution for the question few points on your question:
As the Custom primary key consists of mainly three parts Date(140102), physical location where transaction takes place (entityID), 4 place number(9999).
According to the design on a single date in a single physical location there cannot be more than 9999 transactions -- My Solution will also contain the same limitation.
Some points on my solution
The 4 place digit is tied up to the date which means for a new date the count starts from 0000. For Example
GI_140102_1_0001,
GI_140102_1_0002,
GI_140102_1_0003,
GI_140103_1_0000,
GI_140104_1_0000
Any way the this field will be unique.
The solution compares the latest date in the record to the current date.
The Logic:
If current date and latest date in the record matches
Then it increments 4 place digit by the value by 1
If the current date and the latest date in the record does not matched
The it sets the 4 place digit by the value 0000.
The Solution: (Below code gives out the value which will be the next GoodsInwardId, Use it as per requirement to fit in to your solution)
declare #previous nvarchar(30);
declare #today nvarchar(30);
declare #newID nvarchar(30);
select #previous=substring(max(GoodsInwardId),4,6) from SC_TD_GoodsInward;
Select #today=RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)
+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2);
if #previous=#today
BEGIN
Select #newID='GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)
+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)
+'_'+CONVERT(varchar,1)+'_'+(SELECT RIGHT('0000'+
CONVERT(VARCHAR,CONVERT(INT,RIGHT(MAX(GoodsInwardId),4))+1),4)
from SC_TD_GoodsInward);
END
else
BEGIN
SET #newID='GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)
+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)
+'_'+CONVERT(varchar,1)+'_0000';
END
select #newID;
T-SQL to create the required structure (Probable Guess)
For the table:
CREATE TABLE [dbo].[SC_TD_GoodsInward](
[EntityId] [int] NULL,
[GoodsInwardId] [nvarchar](30) NULL
)
Sample records for the table:
insert into dbo.SC_TD_GoodsInward values(1,'GI_140102_1_0000');
insert into dbo.SC_TD_GoodsInward values(1,'GI_140101_1_9999');
insert into dbo.SC_TD_GoodsInward values(1,'GI_140101_1_0001');
**Its a probable solution in your situation although the perfect solution would be to have identity column (use reseed if required) and tie it with the current date as a computed column.
You get this problem because once the last 4 digits reach 9999, 9999 will remain the highest number no matter how many rows are inserted, and you are throwing away the most significant digit(s).
I would remodel this to track the last used INT portion value of GoodsInwardId in a separate counter table (as an INTEGER), and then MODULUS (%) this by 10000 if need be. If there are concurrent calls to the PK generator, remember to lock the counter table row.
Also, even if you kept all the digits (e.g. in another field), note that ordering a CHAR is as follows
1
11
2
22
3
and then applying MAX() will return 3, not 22.
Edit - Clarification of counter table alternative
The counter table would look something like this:
CREATE TABLE PK_Counters
(
TableName NVARCHAR(100) PRIMARY KEY,
LastValue INT
);
(Your #EntityID might be another candidate for the counter PK column.)
You then increment and fetch the applicable counter on each call to your custom PK Key generation PROC:
UPDATE PK_Counters
SET LastValue = LastValue + 1
WHERE TableName = 'SC_TD_GoodsInward';
Select
'GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)
+RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)
+RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'
+CONVERT(varchar,#EntityId)+'_'
+(SELECT RIGHT('0000'+ CONVERT(NVARCHAR, LastValue % 10000),4)
FROM PK_Counters
WHERE TableName = 'SC_TD_GoodsInward');
You could also modulo the LastValue in the counter table (and not in the query), although I believe there is more information about the number of records inserted by leaving the counter un-modulo-ed.
Fiddle here
Re : Performance - Selecting a single integer value from a small table by its PK and then applying modulo will be significantly quicker than selecting MAX from a SUBSTRING (which would almost certainly be a scan)
DECLARE #entityid INT = 1;
SELECT ('GI_'
+ SUBSTRING(convert(varchar, getdate(), 112),3,6) -- yymmdd today DATE
+ '_' + CAST(#entityid AS VARCHAR(50)) + '_' --#entity parameter
+ CASE MAX(t.GI_id + 1) --take last number + 1
WHEN 10000 THEN
'0000' --reset
ELSE
RIGHT( CAST('0000' AS VARCHAR(4)) +
CAST(MAX(t.GI_id + 1) AS VARCHAR(4))
, 4)
END) PK
FROM
(
SELECT TOP 1
CAST(SUBSTRING(GoodsInwardId,11,1) AS INT) AS GI_entity,
CAST(SUBSTRING(GoodsInwardId,4,6) AS INT) AS GI_date,
CAST(RIGHT(GoodsInwardId,4) AS INT) AS GI_id
FROM SC_TD_GoodsInward
WHERE CAST(SUBSTRING(GoodsInwardId,11,1) AS INT) = #entityid
ORDER BY gi_date DESC, rowTimestamp DESC, gi_id DESC
) AS t
This should take the last GoodInwardId record, ordered by date DESC and take its numeric "id". Then add + 1 to return the NEW id and combine it with today's date and the #entityid you passed. If >9999, start again from 0000.
You need a timestamp type column tho, to order two inserted in the same date + same transaction time. Otherwise you could get duplicates.
I have simplified the answer even more and arrived with the following query.
IF (SELECT COUNT(GoodsInwardId) FROM SC_TD_GoodsInward WHERE EntityId = #EntityId)=0
BEGIN
SELECT #GoodsInwardId= 'GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'+
CONVERT(varchar,#EntityId)+'_0001'
END
ELSE
BEGIN
SELECT * FROM SC_TD_GoodsInward WHERE EntityId = #EntityId AND CONVERT(varchar,CreatedOn,103) = CONVERT(varchar,GETDATE(),103)
SELECT #GoodsInwardId=IIF(##ROWCOUNT>0,
(Select 'GI_'+
RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'+
CONVERT(varchar,#EntityId)+'_'+
(SELECT RIGHT('0000'+CONVERT(VARCHAR,CONVERT(INT,RIGHT(MAX(GoodsInwardId),4))+1),4) from SC_TD_GoodsInward WHERE CONVERT(varchar,CreatedOn,103) = CONVERT(varchar,GETDATE(),103))),
(SELECT 'GI_'+RIGHT('00'+CONVERT(varchar,datepart(YY,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(MM,getdate())),2)+
RIGHT('00'+CONVERT(varchar,datepart(DD,getdate())),2)+'_'+
CONVERT(varchar,#EntityId)+'_0001'))
END
select * from SC_TD_GoodsInward
I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:
190230943204;190234443204;
However, some rows have more numbers than others, so in the database you can have
190230943204;190234443204;
121340944534;340212343204;134530943204
I've seen some solutions for splitting one column into a specific number of columns, but not variable columns. The columns that have less data (2 series of strings separated by commas instead of 3) will have nulls in the third place.
Ideas? Let me know if I must clarify anything.
Splitting this data into separate columns is a very good start (coma-separated values are an heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.
CREATE TABLE main_entity (
id INT PRIMARY KEY,
other_fields INT
);
CREATE TABLE entity_properties (
main_entity_id INT PRIMARY KEY,
property_value INT,
FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);
entity_properties.main_entity_id is a foreign key to main_entity.id.
Congratulations, you are on the right path, this is called normalisation. You are about to reach the First Normal Form.
Beweare, however, these properties should have a sensibly similar nature (ie. all phone numbers, or addresses, etc.). Do not to fall into the dark side (a.k.a. the Entity-Attribute-Value anti-pattern), and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.
If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):
select substring(col, 1+13*(n-1), 12) as val
from t join
(select 1 as n union all select union all select 3
) n
on len(t.col) <= 13*n.n
This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.
IF I were you, I would create a simple function that is dividing values separated with ';' like this:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[fn_Split_List]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_Split_List](#List NVARCHAR(512))
RETURNS #ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
DECLARE #XML xml = N'<r><![CDATA[' + REPLACE(#List, ';', ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO #ResultRowset ([Value])
SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
FROM #xml.nodes('//r') Tbl(Col)
RETURN
END
GO
Than simply called in this way:
SET NOCOUNT ON
GO
DECLARE #RawData TABLE( [Value] NVARCHAR(256))
INSERT INTO #RawData ([Value] )
VALUES ('1111111;22222222')
,('3333333;113113131')
,('776767676')
,('89332131;313131312;54545353')
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
SET NOCOUNT OFF
GO
The result is as the follow:
Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131
Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.
Note: There is not limitations of how many values you will have separated with ';', but there are length limitation in the function that you can set to NVARCHAR(MAX) if you need.
EDIT:
As I can see, there are some rows in your example that will caused the function to return empty strings. For example:
number;number;
will return:
number
number
'' (empty string)
To clear them, just add the following where clause to the statement above like this:
SELECT SL.[Value]
FROM #RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
WHERE LEN(SL.[Value]) > 0
I am creating a customer table with a parent table that is company.
It has been dictated(chagrin) that I shall create a primary key for the customer table that is a combination of the company id which is an existing varchar(4) column in the customer table, e.g. customer.company
The rest of the varchar(9) primary key shall be a zero padded counter incrementing through the number of customers within that company.
E.g. where company = MSFT and this is a first insert of an MSFT record: the PK shall be MSFT00001
on subsequent inserts the PK would be MSFT00001, MSFT00002 etc.
Then when company = INTL and its first record is inserted, the first record would be INTL00001
I began with an instead of trigger and a udf that I created from other stackoverflow responses.
ALTER FUNCTION [dbo].[GetNextID]
(
#in varchar(9)
)
RETURNS varchar(9) AS
BEGIN
DECLARE #prefix varchar(9);
DECLARE #res varchar(9);
DECLARE #pad varchar(9);
DECLARE #num int;
DECLARE #start int;
if LEN(#in)<9
begin
set #in = Left(#in + replicate('0',9) , 9)
end
SET #start = PATINDEX('%[0-9]%',#in);
SET #prefix = LEFT(#in, #start - 1 );
declare #tmp int;
set #tmp = len(#in)
declare #tmpvarchar varchar(9);
set #tmpvarchar = RIGHT( #in, LEN(#in) - #start + 1 )
SET #num = CAST( RIGHT( #in, LEN(#in) - #start + 1 ) AS int ) + 1
SET #pad = REPLICATE( '0', 9 - LEN(#prefix) - CEILING(LOG(#num)/LOG(10)) );
SET #res = #prefix + #pad + CAST( #num AS varchar);
RETURN #res
END
How would I write my instead of trigger to insert the values and increment this primary key. Or should I give it up and start a lawnmowing business?
Sorry for that tmpvarchar variable SQL server was giving me strange results without it.
Whilst I agree with the naysayers, the principle of "accepting that which cannot be changed" tends to lower the overall stress level, IMHO. Try the following approach.
Disadvantages
Single-row inserts only. You won't be doing any bulk inserts to your new customer table as you'll need to execute the stored procedure each time you want to insert a row.
A certain amount of contention for the key generation table, hence a potential for blocking.
On the up side, though, this approach doesn't have any race conditions associated with it, and it isn't too egregious a hack to really and truly offend my sensibilities. So...
First, start with a key generation table. It will contain 1 row for each company, containing your company identifier and an integer counter that we'll be bumping up each time an insert is performed.
create table dbo.CustomerNumberGenerator
(
company varchar(8) not null ,
curr_value int not null default(1) ,
constraint CustomerNumberGenerator_PK primary key clustered ( company ) ,
)
Second, you'll need a stored procedure like this (in fact, you might want to integrate this logic into the stored procedure responsible for inserting the customer record. More on that in a bit). This stored procedure accepts a company identifier (e.g. 'MSFT') as its sole argument. This stored procedure does the following:
Puts the company id into canonical form (e.g. uppercase and trimmed of leading/trailing whitespace).
Inserts the row into the key generation table if it doesn't already exist (atomic operation).
In a single, atomic operation (update statement), the current value of the counter for the specified company is fetched and then incremented.
The customer number is then generated in the specified way and returned to the caller via a 1-row/1-column SELECT statement.
Here you go:
create procedure dbo.GetNewCustomerNumber
#company varchar(8)
as
set nocount on
set ansi_nulls on
set concat_null_yields_null on
set xact_abort on
declare
#customer_number varchar(32)
--
-- put the supplied key in canonical form
--
set #company = ltrim(rtrim(upper(#company)))
--
-- if the name isn't already defined in the table, define it.
--
insert dbo.CustomerNumberGenerator ( company )
select id = #company
where not exists ( select *
from dbo.CustomerNumberGenerator
where company = #company
)
--
-- now, an interlocked update to get the current value and increment the table
--
update CustomerNumberGenerator
set #customer_number = company + right( '00000000' + convert(varchar,curr_value) , 8 ) ,
curr_value = curr_value + 1
where company = #company
--
-- return the new unique value to the caller
--
select customer_number = #customer_number
return 0
go
The reason you might want to integrate this into the stored procedure that inserts a row into the customer table is that it makes globbing it all together into a single transaction; without that, your customer numbers may/will get gaps when an insert fails land gets rolled back.
As others said before me, using a primary key with calculated auto-increment values sounds like a very bad idea!
If you are allowed to and if you can live with the downsides (see at the bottom), I would suggest the following:
Use a normal numeric auto-increment key and a char(4) column which only contains the company id.
Then, when you select from the table, you use row_number on the auto-increment column and combine that with the company id so that you have an additional column with a "key" that looks like you wanted (MSFT00001, MSFT00002, ...)
Example data:
create table customers
(
Id int identity(1,1) not null,
Company char(4) not null,
CustomerName varchar(50) not null
)
insert into customers (Company, CustomerName) values ('MSFT','First MSFT customer')
insert into customers (Company, CustomerName) values ('MSFT','Second MSFT customer')
insert into customers (Company, CustomerName) values ('ABCD','First ABCD customer')
insert into customers (Company, CustomerName) values ('MSFT','Third MSFT customer')
insert into customers (Company, CustomerName) values ('ABCD','Second ABCD customer')
This will create a table that looks like this:
Id Company CustomerName
------------------------------------
1 MSFT First MSFT customer
2 MSFT Second MSFT customer
3 ABCD First ABCD customer
4 MSFT Third MSFT customer
5 ABCD Second ABCD customer
Now run the following query on it:
select
Company + right('00000' + cast(ROW_NUMBER() over (partition by Company order by Id) as varchar(5)),5) as SpecialKey,
*
from
customers
This returns the same table, but with an additional column with your "special key":
SpecialKey Id Company CustomerName
---------------------------------------------
ABCD00001 3 ABCD First ABCD customer
ABCD00002 5 ABCD Second ABCD customer
MSFT00001 1 MSFT First MSFT customer
MSFT00002 2 MSFT Second MSFT customer
MSFT00003 4 MSFT Third MSFT customer
You could create a view with this query and let everyone use that view, to make sure everyone sees the "special key" column.
However, this solution has two downsides:
You need at least SQL Server 2005 in
order for row_number to work.
The numbers in the special key will change when you delete companies from the table. So, if you don't want the numbers to change, you have to make sure that nothing is ever deleted from that table.