Find a column where the identity column is breaking - sql

I have a table, e.g.
cust_ord_key
1
2
3
4
5
7
9
How do I write a query to find out if the numbers are in sequence and not breaking anywhere?

In SQL Server:
SELECT SeqID AS MissingSeqID
FROM (SELECT ROW_NUMBER() OVER (ORDER BY column_id) SeqID from sys.columns) LkUp
LEFT JOIN dbo.TestData t ON t.ID = LkUp.SeqID

You may do something like:
DECLARE #RESULT int;
SET #RESULT = 0;
DECLARE #FIRST_ID int;
DECLARE #LAST_ID int;
DECLARE #THIS_VALUE int;
DECLARE #NEXT_VALUE int;
SELECT #FIRST_ID = min(ID), #LAST_ID = max(ID) from table_name;
WHILE(#FIRST_ID <= #LAST_ID)
BEGIN
SELECT #THIS_VALUE = your_field_with_keys from table_name where ID = #FIRST_ID;
SELECT #NEXT_VALUE = your_field_with_keys from table_name where ID = (#FIRST_ID + 1);
if #THIS_VALUE > #NEXT_VALUE
SET #RESULT = #FIRST_ID;
--break your query here or do anything else
SET #FIRST_ID = #FIRST_ID + 1;
END
What does this query do? We declare #RESULT variable for taking an ID of key, where you key is breaking. #FIRST_ID and #LAST_ID are the minimal and maximal IDs from your table, we will use them later. #THIS_VALUE and #NEXT_VALUE are two variables for two keys to be compared.
Then we execute loop over our IDs. Then setting up #THIS_VALUE and #NEXT_VALUE with corresponding keys (this and the next). If #THIS_VALUE more than #NEXT_VALUE, it means that the key is breaking here (if previous key is more than next key), and we take the ID of element, where key is broken. And there you may stop your query or do some required logic.

This is not perfect, but definitely does the job, and is universal across all DB engines.
SELECT t1.id FROM myTable t1
LEFT JOIN myTable t2 ON t1.id+1 = t2.id
WHERE t2.id IS NULL
When the identity doesn't break anywhere, this will only return the last entry. You can compensate for that with a procedural language, by first getting the MAX(ID) adding that to the WHERE clause like this:
WHERE t2.id IS NULL AND t1.id<>5643
where 5643 is the max id (either a variable introduced in the query string, or can be a variable in the procedural SQL language of whatever DB engine you're using). The point is that it's the maximum value of the identity on that table.
OR, you can just dismiss the last row from the result set if you're doing it in PHP or whatever.

Related

SQL: Delete Rows from Dynamic list of tables where ID is null

I'm a SQL novice, and usually figure things out via Google and SO, but I can't wrap my head around the SQL required for this.
My question is similar to Delete sql rows where IDs do not have a match from another table, but in my case I have a middle table that I have to query, so here's the scenario:
We have this INSTANCES table that basically lists all the occurrences of files sent to the database, but have to join with CROSS_REF so our reporting application knows which table to query for the report, and we just have orphaned INSTANCES rows I want to clean out. Each DETAIL table contains different fields from the other ones.
I want to delete all single records from INSTANCES if there are no records for that Instance ID in any DETAIL table. The DETAIL table got regularly cleaned of old files, but the Instance record wasn't cleaned up, so we have a lot of INSTANCE records that don't have any associated DETAIL data. The thing is, I have to select the Table Name from CROSS_REF to know which DETAIL_X table to look up the Instance ID.
In the below example then, since DETAIL_1 doesn't have a record with Instance ID = 1001, I want to delete the 1001 record from INSTANCES.
INSTANCES
Instance ID
Detail ID
1000
123
1001
123
1002
234
CROSS_REF
Detail ID
Table Name
123
DETAIL_1
124
DETAIL_2
125
DETAIL_3
DETAIL_1
Instance ID
1000
1000
2999
Storing table names or column names in a database is almost always a sign for a bad database design. You may want to change this and thus get rid of this problem.
However, when knowing the possible table names, the task is not too difficult.
delete from instances i
where not exists
(
select null
from cross_ref cr
left join detail_1 d1 on d1.instance_id = i.instance_id and cr.table_name = 'DETAIL_1'
left join detail_2 d2 on d2.instance_id = i.instance_id and cr.table_name = 'DETAIL_2'
left join detail_3 d3 on d3.instance_id = i.instance_id and cr.table_name = 'DETAIL_3'
where cr.detail_id = i.detail_id
and
(
d1.instance_id is not null or
d2.instance_id is not null or
d3.instance_id is not null
)
);
(You can replace is not null by = i.instance_id, if you find that more readable. In that case you could even remove these criteria from the ON clauses.)
Much thanks to #DougCoats, this is what I ended up with.
So here's what I ended up with (#Doug, if you want to update your answer, I'll mark yours correct).
DECLARE #Count INT, #Sql VARCHAR(MAX), #Max INT;
SET #Count = (SELECT MIN(DetailID) FROM CROSS_REF)
SET #Max = (SELECT MAX(DetailID) FROM CROSS_REF)
WHILE #Count <= #Max
BEGIN
IF (select count(*) from CROSS_REF where file_id = #count) <> 0
BEGIN
SET #sql ='DELETE i
FROM Instances i
WHERE NOT EXISTS
(
SELECT InstanceID
FROM '+(SELECT TableName FROM Cross_Ref WHERE DetailID=#Count)+' d
WHERE d.InstanceId=i.InstanceID
AND i.detailID ='+ cast(#Count as varchar) +'
)
AND i.detailID ='+ cast(#Count as varchar)
EXEC(#sql);
SET #Count=#Count+1
END
END
this answer assumes you have sequential data in the CROSS_REF table. If you do not, you'll need to alter this to account it (as it will bomb due to missing object reference).
However, this should give you an idea. It also could probably be written to do a more set based approach, but my answer is to demonstrate dynamic sql use. Be careful when using dynamic SQL though.
DECLARE #Count INT, #Sql VARCHAR(MAX), #Max INT;
SET #Count = (SELECT MIN(DetailID) FROM CROSS_REF)
SET #Max = (SELECT MAX(DetailID) FROM CROSS_REF)
WHILE #Count <= #Max
BEGIN
IF (select count(*) from CROSS_REF where file_id = #count) <> 0
BEGIN
SET #sql ='DELETE i
FROM Instances i
WHERE NOT EXISTS
(
SELECT InstanceID
FROM '+(SELECT TableName FROM Cross_Ref WHERE DetailID=#Count)+' d
WHERE d.InstanceId=i.InstanceID
AND i.detailID ='+ cast(#Count as varchar) +'
)
AND i.detailID ='+ cast(#Count as varchar)
EXEC(#sql);
SET #Count=#Count+1
END
END

SQL Server run SELECT for each in list

I won't be surprised if SQL just doesn't work this way at all, but:
If we run two SELECT statements in a query, we get a split "Results" pane. I'm wondering if I can add variables to a list, and then have the number of result pane splits match the length of that list.
If I were to mix languages:
id_list = [26275, 54374, 84567]
for i in id_list:
SELECT * FROM table WHERE id = i
I'm just trying to easily compare results of a query while keeping distinct groups, with a changing number of variables. Since loops never seem to be the answer in SQL, I'd be just as happy inserting something like a blank line or horizontal rule, etc. Not sure if that's possible either though...
There is no concept of "lists" (as a separate data structure) in T-SQL. Does this do what you want?
SELECT *
FROM table
WHERE id IN (26275, 54374, 84567);
declare #i int = 0;
declare #Id int;
declare #Ids table (Id int);
insert #Ids select Id from (values (26275), (54374), (84567)) t(Id);
-- OR: insert #Ids select * from string_split('26275, 54374, 84567', ',');
declare #Count int = (select count(*) from #Ids);
while #i < #Count
begin
select #Id = Id, #i = #i + 1
from #Ids order by Id
offset #i rows fetch next 1 rows only;
select * from dbo.MyTable where Id = #Id;
end
You can use UNION ALL:
SELECT * FROM table WHERE id = 26275
UNION ALL
SELECT * FROM table WHERE id = 54374
UNION ALL
SELECT * FROM table WHERE id = 84567

Repeat query if no results came up

Could someone please advise on how to repeat the query if it returned no results. I am trying to generate a random person out of the DB using RAND, but only if that number was not used previously (that info is stored in the column "allready_drawn").
At this point when the query comes over the number that was drawn before, because of the second condition "is null" it does not display a result.
I would need for query to re-run once again until it comes up with a number.
DECLARE #min INTEGER;
DECLARE #max INTEGER;
set #min = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
set #max = (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
select
ordial,
name_surname
from [dbo].[persons]
where id = ROUND(((#max - #min) * RAND() + #min), 0) and allready_drawn is NULL
The results (two possible outcomes):
Any suggestion is appreciated and I would like to thank everyone in advance.
Just try this to remove the "id" filter so you only have to run it once
select TOP 1
ordial,
name_surname
from [dbo].[persons]
where allready_drawn is NULL
ORDER BY NEWID()
#gbn that's a correct solution, but it's possible it's too expensive. For very large tables with dense keys, randomly picking a key value between the min and max and re-picking until you find a match is also fair, and cheaper than sorting the whole table.
Also there's a bug in the original post, as the min and max rows will be selected only half as often as the others, as each maps to a smaller interval. To fix generate a random number from #min to #max + 1, and truncate, rather than round. That way you map the interval [N,N+1) to N, ensuring a fair chance for each N.
For this selection method, here's how to repeat until you find a match.
--drop table persons
go
create table persons(id int, ordial int, name_surname varchar(2000), sector int, allready_drawn bit)
insert into persons(id,ordial,name_surname,sector, allready_drawn)
values (1,1,'foo',8,null),(2,2,'foo2',8,null),(100,100,'foo100',8,null)
go
declare #min int = (select top 1 id from [dbo].[persons] where sector = 8 order by id ASC);
declare #max int = 1+ (select top 1 id from [dbo].[persons] where sector = 8 order by id DESC);
set nocount on
declare #results table(ordial int, name_surname varchar(2000))
declare #i int = 0
declare #selected bit = 0
while #selected = 0
begin
set #i += 1
insert into #results(ordial,name_surname)
select
ordial,
name_surname
from [dbo].[persons]
where id = ROUND(((#max - #min) * RAND() + #min), 0, 1) and allready_drawn is NULL
if ##ROWCOUNT > 0
begin
select *, #i tries from #results
set #selected = 1
end
end

Issues with SQL Max function: "Incorrect syntax new the keyword 'SELECT' "

I'm trying to write a stored procedure to return the maximum value of a column + 1 but for some reason it doesn't want to work.
DECLARE #ID int;
SET #ID = SELECT MAX(ID) + 1 FROM tbl;
I can't for the life of me see what is wrong.
It gives me the error of:
incorrect syntax new the keyword 'SELECT'
No need for SET. Select value directly:
DECLARE #ID int;
SELECT #ID = MAX(ID) + 1 FROM tbl;
Use parentheses ( ... ):
DECLARE #ID int;
SET #ID = (SELECT MAX(ID) + 1 FROM tbl);
or SELECT as suggested by Giorgi. SET is the ANSI standard way of assigning values to variables, SELECT is not. Apart from that using SELECT to assign values to variables is fine, it allows even multiple assignments with one SELECT.
But in general your query seems to be a race condition. Use an IDENTITY column if you want to autoincrement a value. Auto increment primary key in SQL Server Management Studio 2012
You need to consider a scenario when there is no value in the table and MAX returns NULL.
DECLARE #ID int;
SELECT #ID = ISNULL(MAX(ID) , 0) + 1 FROM tbl;
Other adding 1 to null will always yield null.
DECLARE #ID int;
SET #ID = (SELECT MAX(ID) + 1 FROM tbl);
parentheses operator ()
for more information
https://msdn.microsoft.com/en-us/library/ms190276.aspx

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure that needs to "clean" a 70 mill table from about 50 mill rows to another table, the id_col is integer (primary identity key)
According to the last running I made it is working good but it is expected to last for about 200 days:
SET NOCOUNT ON
-- define the last ID handled
DECLARE #LastID integer
SET #LastID = 0
declare #tempDate datetime
set #tempDate = dateadd(dd,-20,getdate())
-- define the ID to be handled now
DECLARE #IDToHandle integer
DECLARE #iCounter integer
DECLARE #watch1 nvarchar(50)
DECLARE #watch2 nvarchar(50)
set #iCounter = 0
-- select the next to handle
SELECT TOP 1 #IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> #LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,#tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
-- as long as we have s......
WHILE #IDToHandle IS NOT NULL
BEGIN
IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = #IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =#IDToHandle )=0)
BEGIN
INSERT INTO SECONDERY_TABLE
SELECT col1,col2,col3.....
FROM MAIN_TABLE WHERE id_col = #IDToHandle
EXEC [dbo].[DeleteByID] #ID = #IDToHandle --deletes the row from 2 other tables that is related to the MAIN_TABLE and than from the MAIN_TABLE
set #iCounter = #iCounter +1
END
IF (#iCounter % 1000 = 0)
begin
set #watch1 = 'iCounter - ' + CAST(#iCounter AS VARCHAR)
set #watch2 = 'IDToHandle - '+ CAST(#IDToHandle AS VARCHAR)
raiserror ( #watch1, 10,1) with nowait
raiserror (#watch2, 10,1) with nowait
end
-- set the last handled to the one we just handled
SET #LastID = #IDToHandle
SET #IDToHandle = NULL
-- select the next to handle
SELECT TOP 1 #IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> #LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,#tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
END
Any ideas or directions to improve this procedure run-time will be welcomed
Yes, try this:
Declare #Ids Table (id int Primary Key not Null)
Insert #Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
And someDateCol < #tempDate -- If there are times in these datetime fields,
-- then you may need to modify this condition.
And some_other_int_col In (1745, 1548, 4785)
And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
Where some_int_col = m.id_col)
And Not Exists (Select * From A_70k_rows_table
Where some_int_col = m.id_col)
Select id from #Ids -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified #Ids is correct
-- Once you have verified that above #Ids is correctly populated,
-- then delete or comment out the select and return lines above so insert runs.
Begin Transaction
Delete OT -- eliminate row-by-row call to second stored proc
From OtherTable ot
Join MAIN_TABLE m On m.id_col = ot.FKCol
Join #Ids i On i.Id = m.id_col
Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
FROM MAIN_TABLE m Join #Ids i On i.Id = m.id_col
Delete m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
Join #Ids i On i.Id = m.id_col
Commit Transaction
Explaanation.
You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan for every iteration of your loop, instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. This eliminates the opportunity for the query optimizer to use an index.
You were executing the inserts one at a time... Way better to generate a list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.