Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way I would prefer to do this is to build up a list of available IDs via an SQL query; from there, I can do all the checks within my application code.
So, in summary, I would like an SQL query that retrieves all available IDs between 1 and 99999 from a specific table column.

First build a table of all N IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
    insert into @allPossibleIds
    select @currentId
    select @currentId = @currentId + 1
end
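As an aside, a WHILE loop works but is slow for a million single-row inserts; on SQL Server 2005 you could fill the same table variable set-based with a recursive CTE. A minimal sketch, reusing @allPossibleIds from above:
;with numbers(id) as
(
    select 1
    union all
    select id + 1 from numbers where id < 999999
)
insert into @allPossibleIds (id)
select id from numbers
option (maxrecursion 0); -- allow recursion past the default 100 levels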
Then, left join that table to your real table. You can select MIN if you want, or you could limit @allPossibleIds to values less than the max ID in your table:
select a.id
from @allPossibleIds a
left outer join YourTable t
    on a.id = t.Id
where t.id is null

Don't go for identity. Let me give you an easy option while I work on a proper one.
Store the integers from 1 to 999999 in a table, say Insert_sequence.
Then write a stored procedure for insertion: you can easily identify the minimum value that is present in Insert_sequence but not in your main table, store this value in a variable, and insert the row with the ID taken from that variable.
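A minimal sketch of that lookup (MainTable and its columns are illustrative assumptions, not names from the question):
declare @nextId int

-- smallest reserved ID not yet used by the main table
select @nextId = min(s.id)
from Insert_sequence s
where not exists (select 1 from MainTable m where m.ID = s.id)

insert into MainTable (ID, Data)
values (@nextId, '...')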
Regards
Ashutosh Arya

You could also loop through the keys and, when you hit a free one, select it and exit the loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
    IF NOT EXISTS(SELECT [Key] FROM [Table] WHERE [Key] = @intStart)
    BEGIN
        SELECT @intStart as 'FreeKey'
        SET @loop = 0
    END
    SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Adding an @intStop to limit the loop would be no problem.

Why do you need a table of 1..999999? All the information you need is in your source table. Here is a query that gives you the minimal ID to insert in a gap.
It works for all combinations:
(2,3,4,5) -> 1
(1,2,3,5) -> 4
(1,2,3,4) -> 5
-- join each candidate id to the row with id one greater;
-- where that row is missing, id+1 is the first free ID
select min(t1.id) + 1
from
(
    select id from t
    union
    select 0 -- covers the case where ID 1 itself is free
) t1
left join t as t2 on t1.id = t2.id - 1
where t2.id is null

Many people use an auto-incrementing integer or long value for the primary key of their tables, and it is often called ID or MyEntityID or something similar. Since it is just an auto-incrementing integer, this column often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database couldn't care less about which IDs are in use and which are not.
I would highly suggest you forget trying to do this and just leave the ID column to auto-increment. You should also create an index on your table made up of the (subset of) columns that uniquely identify each record (and even consider using that index as your primary key index). In the rare case where you would need all columns to accomplish that, an auto-incrementing primary key ID is extremely useful, because it may not be performant to create an index over all columns in the table. Even so, the database engine couldn't care less about this ID (e.g. which ones are in use, which are not, etc.).
Also consider that an integer-based ID has a maximum of about 4.2 billion values. It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument that this sort of thing is a waste of time and resources.


Definition of a quirky update

Based on Itzik Ben-Gan's article in ITProToday
Microsoft’s implementation follows the physical data independence
principle, and therefore does not guarantee that you will get the data
back from a query in any particular order unless you add an ORDER BY
clause in the outer query. A similar violation of the principle is
when people update data and the solution’s correctness relies on the
data being updated in clustered index order (do a Web search on
“quirky update” to see what I mean).
I tried to find out what a quirky update means, but in vain.
I am looking for an example to understand the concept.
Here's an example of the "Quirky Update":
use tempdb
go
drop table if exists t
go
create table t(id int primary key, Amount int, RunningTotal int)
insert into t(id, Amount, RunningTotal) values (1,4,0),(2,2,0),(3,6,0)
declare @t int = 0
-- each row's assignment sees @t as left by the previously updated row,
-- so correctness relies on rows being updated in clustered index order
update t set @t = RunningTotal = @t + Amount
select * from t
outputs
id Amount RunningTotal
----------- ----------- ------------
1 4 4
2 2 6
3 6 12
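For contrast, the documented way to get the same running total on SQL Server 2012 and later is a windowed aggregate, which does not depend on physical update order:
select id, Amount,
       sum(Amount) over (order by id
                         rows between unbounded preceding and current row) as RunningTotal
from t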

Replace specific part in my data

I would like to know the most efficient and secure way to replace some numbers. In my table I have two columns: Nummer and Vater. The Nummer column stores article numbers. The number ending in .1 is the 'main' article and the rest are its combinations (sometimes a main article has no combinations); together they make up a concrete product with its combinations. Numbers always consist of three parts separated by dots. The Vater for all of them is always the main article number, as shown below:
Example 1:
Nummer | Vater
-------------------------------
003.10TT032.1 | 003.10TT032.1
003.10TT032.2L | 003.10TT032.1
003.10TT032.UY | 003.10TT032.1
Nummer column = varchar
Vater column = varchar
I want the ability to change the first two parts, n.n.
For example, I want to say via SQL query that I want to replace them with: 9.4R53. Based on our example, the final results should be as follows:
Nummer | Vater
----------------------
9.4R53.1 | 9.4R53.1
9.4R53.2L | 9.4R53.1
9.4R53.UY | 9.4R53.1
Example 2:
Current:
Nummer | Vater
-------------------------------
12.90D.1 | 12.90D.1
12.90D.089 | 12.90D.1
12.90D.2 | 12.90D.1
Replace to: 829.12
Result should be:
Nummer | Vater
-------------------------------
829.12.1 | 829.12.1
829.12.089 | 829.12.1
829.12.2 | 829.12.1
I made queries as follows:
Example 1 query:
update temp SET Nummer = replace(Nummer, '003.10TT032.', '9.4R53.'),
Vater = replace(Vater, '003.10TT032.1', '9.4R53.1')
WHERE Vater = '003.10TT032.1'
Example 2 query:
update temp SET Nummer = replace(Nummer, '12.90D.', '829.12.'),
Vater = replace(Vater, '12.90D.1', '829.12.1')
WHERE Vater = '12.90D.1'
In my database I have thousands of records, so I want to be sure this query is safe and will not produce any wrong results. Please advise whether it can stay like this or not.
Therefore, my questions:
Is this query fine given how my articles are stored? (I want to avoid wrong replacements that could make a mess of the production data.)
Is there a better solution?
To answer your questions: yes, your solution works, and yes, there is something you can do to make it bullet-proof. Making it bullet-proof and reversible is what I suggest below. You will sleep better if you know you can:
A. Answer any question from angry people who ask you "what did you do to my product table?"
B. Reverse any change you've made to this table (without restoring a backup), including other people's mistakes (like wrong instructions).
So if you really have to be 100% confident of the output, I would not run it in one go. I suggest preparing the queries in a separate table, then running them in a loop using dynamic SQL.
It is a little bit cumbersome, but you can do it like this: create a dedicated table with the columns you need (batch_id, insert_date, etc.) and a column named execute_query NVARCHAR(MAX).
Then load that table by running a SELECT DISTINCT of the section you need to replace in your source table (using CHARINDEX to locate the second dot; start that CHARINDEX search from the position of the first dot + 1).
In other words: you prepare all the queries (like the ones in your examples) one by one and store them in a table.
If you want to be totally safe, the update queries can include a WHERE source_table_id BETWEEN n AND n' clause (which you build with a GROUP BY on the source table). This ensures you can track which records you updated if you have to answer questions later.
Once this is done, you run a loop which executes each line one by one.
The advantage of this approach is that you keep track of your changes; you can also prepare the rollback query at the same time as the update query. Then you know you can safely revert every change you have ever made to your product table.
Never truncate that table: it is your audit table. If someone asks what you did to the product catalogue, you can answer any question, even five years from now.
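A minimal sketch of that pattern (the audit table layout and names are my own assumptions, not from the question):
CREATE TABLE dbo.ReplaceAudit (
    batch_id       INT IDENTITY(1,1) PRIMARY KEY,
    insert_date    DATETIME NOT NULL DEFAULT GETDATE(),
    execute_query  NVARCHAR(MAX) NOT NULL,
    rollback_query NVARCHAR(MAX) NOT NULL
)

-- run every prepared statement one by one
DECLARE @id INT, @sql NVARCHAR(MAX)
SELECT @id = MIN(batch_id) FROM dbo.ReplaceAudit
WHILE @id IS NOT NULL
BEGIN
    SELECT @sql = execute_query FROM dbo.ReplaceAudit WHERE batch_id = @id
    EXEC sp_executesql @sql
    SELECT @id = MIN(batch_id) FROM dbo.ReplaceAudit WHERE batch_id > @id
END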
This is a separate answer to show how to split the product ID into separate sections. If you have to update sections of the product ID, I think it is better to store them in separate columns:
DECLARE @ProductRef TABLE
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 SrcNummer VARCHAR(255), DisplayNummer VARCHAR(255), SrcVater VARCHAR(255), DisplayVater VARCHAR(255),
 NummerSectionA VARCHAR(255), NummerSectionB VARCHAR(255), NummerSectionC VARCHAR(255),
 VaterSectionA VARCHAR(255), VaterSectionB VARCHAR(255), VaterSectionC VARCHAR(255) )
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.1','003.10TT032.1')
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.2L','003.10TT032.1')
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.UY','003.10TT032.1')
DECLARE @Separator CHAR(1)
SET @Separator = '.'
;WITH SeparatorPosition (ID, SrcNummer, NumFirstSeparator, NumSecondSeparator, SrcVater, VatFirstSeparator, VatSecondSeparator)
AS ( SELECT
        ID,
        SrcNummer,
        CHARINDEX(@Separator, SrcNummer, 0) AS NumFirstSeparator,
        CHARINDEX(@Separator, SrcNummer, CHARINDEX(@Separator, SrcNummer, 0) + 1) AS NumSecondSeparator,
        SrcVater,
        CHARINDEX(@Separator, SrcVater, 0) AS VatFirstSeparator,
        CHARINDEX(@Separator, SrcVater, CHARINDEX(@Separator, SrcVater, 0) + 1) AS VatSecondSeparator
     FROM @ProductRef )
UPDATE T
SET
    NummerSectionA = SUB.NummerSectionA, NummerSectionB = SUB.NummerSectionB, NummerSectionC = SUB.NummerSectionC,
    VaterSectionA = SUB.VaterSectionA, VaterSectionB = SUB.VaterSectionB, VaterSectionC = SUB.VaterSectionC
FROM @ProductRef T
JOIN
(
    SELECT
        t.ID,
        t.SrcNummer,
        SUBSTRING(t.SrcNummer, 0, s.NumFirstSeparator) AS NummerSectionA,
        SUBSTRING(t.SrcNummer, s.NumFirstSeparator + 1, s.NumSecondSeparator - s.NumFirstSeparator - 1) AS NummerSectionB,
        RIGHT(t.SrcNummer, LEN(t.SrcNummer) - s.NumSecondSeparator) AS NummerSectionC,
        t.SrcVater,
        SUBSTRING(t.SrcVater, 0, s.VatFirstSeparator) AS VaterSectionA,
        SUBSTRING(t.SrcVater, s.VatFirstSeparator + 1, s.VatSecondSeparator - s.VatFirstSeparator - 1) AS VaterSectionB,
        RIGHT(t.SrcVater, LEN(t.SrcVater) - s.VatSecondSeparator) AS VaterSectionC
    FROM @ProductRef t
    JOIN SeparatorPosition s
        ON t.ID = s.ID
) SUB
    ON T.ID = SUB.ID
Then you only work on the correct product ID section.

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) CSV file that I need to import into a staging table in SQL Server, then run a script to split each row up and insert the data into the relevant tables of a normalized database. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (SELECT TOP 1 * FROM staging)
BEGIN
    INSERT INTO People DEFAULT VALUES
    SET @LastInsertedID = SCOPE_IDENTITY() -- might use the OUTPUT clause to get this instead
    INSERT INTO Names (personID, lName, fName)
    SELECT TOP 1 @LastInsertedID, lName, fName FROM staging
    INSERT INTO Licenses (personID, number, issuer)
    SELECT TOP 1 @LastInsertedID, licenseNumber1, licenseIssuer1 FROM staging
    IF (SELECT TOP 1 licenseNumber2 FROM staging) IS NOT NULL
    BEGIN
        INSERT INTO Licenses (personID, number, issuer)
        SELECT TOP 1 @LastInsertedID, licenseNumber2, licenseIssuer2 FROM staging
    END
    -- Repeat the above 49 times, etc...
    DELETE TOP (1) FROM staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...@LicenseNumber1, @LicenseIssuer1, @LicenseNumber2, @LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) VALUES
(@LicenseNumber1, @LicenseIssuer1),
(@LicenseNumber2, @LicenseIssuer2),
... Repeat 48 more times...

INSERT INTO Licenses (personID, number, issuer)
SELECT @LastInsertedID, number, issuer
FROM #LicenseTemp
WHERE number IS NOT NULL
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of direct answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
OK, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As @Shnugo suggested before me, calculating the personId in the staging table will ease the next steps.
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use one for every person insert, you'll never risk overlapping IDs. If (as it seems) you have personIds that were loaded before the sequence existed, you can create the sequence with the first free personID as its starting value.
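A minimal sketch, assuming the highest existing personID is 100000 (the start value is purely illustrative):
CREATE SEQUENCE dbo.PersonIdSeq
    AS INT
    START WITH 100001 -- first free personID after the existing data
    INCREMENT BY 1;

-- usage:
SELECT NEXT VALUE FOR dbo.PersonIdSeq;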
Create a numbers table
Create a utility table holding the numbers from 1 to n (n needs to be at least 50; you can look at this question for some implementations).
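One simple implementation, sketched here with a loop (any of the set-based techniques from that question work too):
CREATE TABLE dbo.numbers (n INT PRIMARY KEY);

DECLARE @i INT = 1;
WHILE @i <= 50
BEGIN
    INSERT INTO dbo.numbers (n) VALUES (@i);
    SET @i = @i + 1;
END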
Use set logic to do the insert
I'd avoid cursors and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say you should strive for one access per target table.
You could proceed like this:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
Here we'll need the numbers table:
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) lic WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) spec WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) idf WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would:
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname, lname)
add an index on this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your Names table
And then use this ID for a set-based insert into your three side tables. Do it like this:
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
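On SQL Server 2008 and later, a CROSS APPLY (VALUES ...) unpivot is an equivalent, often tidier, alternative to the UNION ALL stack; a sketch for the license columns (extend the VALUES list to all 50 pairs):
INSERT INTO Licenses (personID, number, issuer)
SELECT s.PersonID, v.number, v.issuer
FROM StagingTable s
CROSS APPLY (VALUES
    (s.licenseNumber1, s.licenseIssuer1),
    (s.licenseNumber2, s.licenseIssuer2)
    -- ... up to (s.licenseNumber50, s.licenseIssuer50)
) AS v(number, issuer)
WHERE v.number IS NOT NULL;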
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell us anything about the existing People...
Is there any reliable and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID = xyz WHERE ...
to set the existing PersonIDs in your staging table, and then use something like
UPDATE StagingTable
SET PersonID = DENSE_RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for the PersonIDs still being NULL.

finding consecutive date pairs in SQL

I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce to just the useful entries. It's a history table, and it works like this: if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry is used when historical data is requested for a date between that entry's effective date and the next non-duplicate entry's.
The data looks something like this:
id user_id effective_date important_value useless_value
1 1 1/3/2007 3 0
2 1 1/4/2007 3 1
3 1 1/6/2007 NULL 1
4 1 2/1/2007 3 0
5 2 1/5/2007 12 1
6 3 1/1/1899 7 0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete the row with id=2, preserving the information from 1/3/2007, showing that the important_value changed on 1/6/2007, and then showing the relevant change again on 2/1/2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate over the user_id values (since that breaks the huge table into manageable pieces) and creates a temp table of just the rows for that user. To get consecutive entries, it joins the temp table to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or both NULL.
WHILE (@@FETCH_STATUS <> -1)
BEGIN
    SELECT * INTO #history FROM History WHERE user_id = @UserId
    --return entries to delete
    SELECT h2.id
    INTO #delete_history_ids
    FROM #history h1
    JOIN #history h2 ON
        h1.effective_date < h2.effective_date
        AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value) = 1
    WHERE NOT EXISTS (SELECT 1 FROM #history hx
                      WHERE hx.effective_date > h1.effective_date
                        AND hx.effective_date < h2.effective_date)
    DELETE h1
    FROM History h1
    JOIN #delete_history_ids dh ON
        h1.id = dh.id
    FETCH NEXT FROM UserCursor INTO @UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process takes the table of values and self-joins on equal user_id, equal important_value, and prior effective_date. Then we do one more self-join on user_id that effectively checks whether the two joined records are sequential, by verifying that no effective_date record occurs between them.
It's just a SELECT statement for now; it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
    *
from
    History tcheck
    inner join History tprev
        on tprev.[user_id] = tcheck.[user_id]
        and tprev.important_value = tcheck.important_value
        and tprev.effective_date < tcheck.effective_date
    left join History checkbtwn
        on tcheck.[user_id] = checkbtwn.[user_id]
        and checkbtwn.effective_date < tcheck.effective_date
        and checkbtwn.effective_date > tprev.effective_date
where
    checkbtwn.[user_id] is null
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
    id int,
    user_id int,
    effective_date datetime,
    important_value int,
    useless_value int,
    idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date, so idx numbers them in order
INSERT INTO #history (id, user_id, effective_date, important_value, useless_value)
SELECT id, user_id, effective_date, important_value, useless_value
FROM History
WHERE user_id = @user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
    h1.idx + 1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.
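For readers on SQL Server 2005 or later, ROW_NUMBER() removes the temp-table step entirely; a sketch of the same pairing:
with ordered as (
    select id, user_id, effective_date, important_value,
           row_number() over (partition by user_id order by effective_date) as rn
    from History
)
select o2.id
from ordered o1
join ordered o2
    on o1.user_id = o2.user_id
    and o1.rn + 1 = o2.rn
where o1.important_value = o2.important_value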

Reset or Update Row Position Integer in Database Table

I am working on a stored procedure in SQL Server 2008 for resetting an integer column in a database table. This integer column stores or persists the display order of the item rows. Users are able to drag and drop items in a particular sort order and we persist that order in the database table using this "Order Rank Integer".
Display queries for items always append a "ORDER BY OrderRankInt" when retrieving data so the user sees the items in the order they previously specified.
The problem is that this integer column collects a lot of duplicate values after the table items are re-ordered a bit. Hence...
Table
--------
Name | OrderRankInt
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
After a lot of reordering by the user becomes....
Table
--------
Name | OrderRankInt
a | 1
b | 2
c | 2
d | 2
e | 2
f | 6
These duplicates arise primarily from insertions and user-specified order numbers. We're not trying to prevent duplicate order ranks, but we'd like a way to 'fix' the table on item inserts/modifies.
Is there a way I can reset the OrderRankInt column with a single UPDATE query?
Or do I need to use a cursor? What would the syntax for that cursor look like?
Thanks,
Kervin
EDIT
Updated with Remus Rusanu's solution. Thanks!!
CREATE PROCEDURE EPC_FixTableOrder
    @sectionId int = 0
AS
BEGIN
    -- "Common Table Expression" to append a row number to the table
    WITH tempTable AS
    (
        SELECT OrderRankInt, ROW_NUMBER() OVER (ORDER BY OrderRankInt) AS rn
        FROM dbo.[Table]
        WHERE sectionId = @sectionId -- fix a specified section
    )
    UPDATE tempTable
    SET OrderRankInt = rn; -- set the order number to the row number via the CTE
END
GO
with cte as (
    select OrderId, row_number() over (order by Name) as rn
    from [Table])
update cte
set OrderId = rn;
This doesn't account for any foreign key relationships; I hope you have taken care of those.
Fake it. Make the column nullable, set the values to NULL, alter it to be an autonumber, and then turn off autonumber and nullability.
(You could skip the nullable steps.)
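Note that SQL Server cannot alter an existing column into an IDENTITY column, so in practice "faking it" means rebuilding the column. A sketch, assuming OrderRankInt has no constraints or dependencies (the table name is a placeholder), with the caveat that the new numbering is not guaranteed to follow the previous order, which is why the CTE approach above is usually preferable:
ALTER TABLE dbo.[Table] DROP COLUMN OrderRankInt;
ALTER TABLE dbo.[Table] ADD OrderRankInt INT IDENTITY(1,1);
-- to make the values user-editable again, copy them into a plain INT
-- column and drop the identity column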