Insert missing values from table - sql

I have a table with a PK that grows fairly quickly, but since rows are fairly consistently deleted, it becomes a very sparse table quickly as such:
ID VALUE
----------------
1 'Test'
5 'Test 2'
24 'Test 3'
67 'Test 4'
Is there a way that I can automatically insert the next value in the missing IDs so that I don't grow that ID extremely large? For example, I'd like to insert 'Test 5' with ID 2.

I wouldn't do that.
As already explained by others in the comments, you gain nothing by re-filling gaps in the numbers.
Plus, you might even unintentionally mess up your data if you refer to these IDs anywhere else:
Let's say that there once was a row with ID 2 and you deleted it.
Then you insert a complete new row and re-use ID 2.
Now if you have any data anywhere that references ID 2, it suddenly links to the new value instead of the old one.
(Note to nit-pickers: Yes, this should not happen if referential integrity is set up properly. But this is not the case everywhere, so who knows...)

I'm not suggesting doing what you're trying to do, but if you want to do it, this is how. I am only answering the question, not solving the problem.
In your proc, you'd want to lock your table while doing this so that you don't get one that sneaks in, by using something like this:
EXEC @result = sp_getapplock @Resource = @LockResource,
@LockMode = 'Exclusive'
AND
EXEC sp_releaseapplock @Resource = @LockResource
TABLE
DECLARE @table TABLE ( id INT, val VARCHAR(20) )
DATA
INSERT INTO @table
(
id,
val
)
SELECT 1,
'Test'
UNION ALL
SELECT 2,
'Test'
UNION ALL
SELECT 5,
'Test 2'
UNION ALL
SELECT 24,
'Test 3'
UNION ALL
SELECT 67,
'Test 4'
Queries
INSERT INTO @table
SELECT TOP 1
id + 1,
'TEST'
FROM @table t1
WHERE NOT EXISTS ( SELECT TOP 1
1
FROM @table
WHERE id = t1.id + 1 )
ORDER BY id
INSERT INTO @table
SELECT TOP 1
id + 1,
'TEST'
FROM @table t1
WHERE NOT EXISTS ( SELECT TOP 1
1
FROM @table
WHERE id = t1.id + 1 )
ORDER BY id
SELECT *
FROM @table
RESULT
id val
1 Test
2 Test
5 Test 2
24 Test 3
67 Test 4
3 TEST
4 TEST

I deleted my answer about identity since they are not involved. It would be interesting to see if you are using this as a clustered index key, since to fill in gaps would violate the rule of thumb of strictly increasing values.
To just fill in gaps is relatively simple with a self-join and since you have a primary key, this query should run quickly to find the first gap (but of course, how are you handling simultaneous inserts and locks?):
SELECT lhs.ID + 1 AS firstgap
FROM tablename AS lhs
LEFT JOIN tablename AS rhs
ON rhs.ID = lhs.ID + 1
WHERE rhs.ID IS NULL
And inserting batches of records requires each insert to be done separately, while IDENTITY can handle that for you...
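To try the gap query without a server, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite stands in for whatever DBMS the table actually lives in, and the helper names (make_demo_db, insert_into_first_gap) are made up for illustration. It takes the smallest gap start from the self-join above and inserts there.

```python
import sqlite3

def make_demo_db():
    """Hypothetical demo table mirroring the sparse IDs from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (ID INTEGER PRIMARY KEY, VALUE TEXT)")
    conn.executemany(
        "INSERT INTO tablename (ID, VALUE) VALUES (?, ?)",
        [(1, "Test"), (5, "Test 2"), (24, "Test 3"), (67, "Test 4")],
    )
    conn.commit()
    return conn

# Same self-join as above; MIN picks the first ID whose successor is missing.
FIRST_GAP_SQL = """
SELECT MIN(lhs.ID) + 1 AS firstgap
FROM tablename AS lhs
LEFT JOIN tablename AS rhs ON rhs.ID = lhs.ID + 1
WHERE rhs.ID IS NULL
"""

def insert_into_first_gap(conn, value):
    """Find the first gap and insert there, inside one transaction."""
    with conn:  # commits on success, rolls back on error
        new_id = conn.execute(FIRST_GAP_SQL).fetchone()[0]
        conn.execute(
            "INSERT INTO tablename (ID, VALUE) VALUES (?, ?)", (new_id, value)
        )
    return new_id
```

Note that this only finds gaps after an existing ID, so a hole below the smallest ID is never reused, and on a real multi-user server the lookup and insert would still need the locking discussed above.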

As said before: don't worry about the unused IDs.
It is however good practise to optimize the table when a lot of deletes happen.
In MySQL you can do this with:
optimize table tablename

Related

SQL UPDATE query - value depends on another rows

There is a temporary table in a SQL Server database; call it TableA. Its structure is as follows:
CREATE TABLE #TableA
(
ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
MapVal1 BIGINT NOT NULL,
MapVal2 BIGINT NOT NULL,
IsActual BIT NULL
)
The table is already filled with some mappings of MapVal1 to MapVal2. The issue is that not all of the mappings should be flagged as actual; that is what the IsActual column is for. Currently IsActual is NULL for every row. The task is to write a query that updates the IsActual column. The UPDATE query should follow these conditions:
1) If MapVal1 is unique and MapVal2 is unique (one-to-one mapping), then this mapping should be flagged as actual, so IsActual = 1;
2) If MapVal1 is not unique, then the actual mapping should be the one from the current MapVal1 to the smallest MapVal2, and this MapVal2 must not be mapped to any other MapVal1 that is smaller than the current MapVal1;
3) If MapVal2 is not unique, then the actual mapping should be the one from the current MapVal2 to the smallest MapVal1, and this MapVal1 must not be mapped to any other MapVal2 that is smaller than the current MapVal2;
4) All rows that do not fulfill any of conditions 1), 2) or 3) should be flagged as not actual, so IsActual = 0.
I believe there is a relation between conditions 2) and 3): for every row, either both are fulfilled or neither is.
To make it clear, here is an example of result I want to obtain:
The result should be that every MapVal1 is mapped to just one MapVal2 and, vice versa, every MapVal2 is mapped to just one MapVal1.
I have created a SQL query to solve my task:
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
BEGIN
DROP TABLE #TableA
END
CREATE TABLE #TableA
(
ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
MapVal1 BIGINT NOT NULL,
MapVal2 BIGINT NOT NULL,
IsActual BIT NULL
)
-- insert input data
INSERT INTO #TableA (MapVal1, MapVal2)
SELECT 1, 1
UNION ALL SELECT 1, 3
UNION ALL SELECT 1, 4
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 2, 4
UNION ALL SELECT 3, 3
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 3
UNION ALL SELECT 4, 4
UNION ALL SELECT 6, 7
UNION ALL SELECT 7, 8
UNION ALL SELECT 7, 9
UNION ALL SELECT 8, 8
UNION ALL SELECT 8, 9
UNION ALL SELECT 9, 8
UNION ALL SELECT 9, 9
CREATE NONCLUSTERED INDEX IX_Mapping_MapVal1 ON #TableA (MapVal1);
CREATE NONCLUSTERED INDEX IX_Mapping_MapVal2 ON #TableA (MapVal2);
-- UPDATE of #TableA is starting here
-- every one-to-one mapping should be actual
UPDATE m1 SET
m1.IsActual = 1
FROM #TableA m1
LEFT JOIN #TableA m2
ON m1.MapVal1 = m2.MapVal1 AND m1.ID <> m2.ID
LEFT JOIN #TableA m3
ON m1.MapVal2 = m3.MapVal2 AND m1.ID <> m3.ID
WHERE m2.ID IS NULL AND m3.ID IS NULL
-- update for every one-to-many or many-to-many mapping is more complicated
-- would be great to change this part of query to make it without any LOOP
DECLARE @MapVal1 BIGINT
DECLARE @MapVal2 BIGINT
DECLARE @i BIGINT
DECLARE @iMax BIGINT
DECLARE @LoopCount INT = 0
SELECT
@iMax = MAX (m.ID)
FROM #TableA m
SELECT
@i = MIN (m.ID)
FROM #TableA m
WHERE m.IsActual IS NULL
WHILE @i <= @iMax
BEGIN
SELECT @LoopCount = @LoopCount + 1
SELECT
@MapVal1 = m.MapVal1,
@MapVal2 = m.MapVal2
FROM #TableA m
WHERE m.ID = @i
IF EXISTS
(
SELECT *
FROM #TableA m
WHERE
m.ID < @i
AND
(m.MapVal1 = @MapVal1
OR m.MapVal2 = @MapVal2)
AND m.IsActual IS NULL
)
BEGIN
UPDATE m SET
m.IsActual = 0
FROM #TableA m
WHERE m.ID = @i
END
SELECT @i = MIN (m.ID)
FROM #TableA m
WHERE
m.ID > @i
AND m.IsActual IS NULL
END
UPDATE m SET
m.IsActual = 1
FROM #TableA m
WHERE m.IsActual IS NULL
SELECT * FROM #TableA
But, as expected, the performance of the query with the LOOP is very bad, especially when the input table holds millions of rows. I spent a lot of time trying to write a query without a LOOP to reduce the execution time, but without success.
Could anybody advise me on how to improve the performance of my query? It would be great to get a query without a LOOP.
Using a loop does not imply you need to update the table one record at a time.
It may help if each individual UPDATE statement updates multiple records.
Consider all possible combinations of MapVal1 and MapVal2 as a matrix.
Every time you flag a cell as 'actual', you can flag an entire row and an entire column as 'not actual'.
The simplest way to do this is by following these steps:
1. Of all mappings with IsActual = NULL, take the first one (smallest MapVal1, together with the smallest MapVal2 it is mapped to).
2. Flag this mapping as actual (IsActual = 1).
3. Flag all other mappings with the same MapVal1 as non-actual (IsActual = 0).
4. Flag all other mappings with the same MapVal2 as non-actual (IsActual = 0).
5. Repeat from step 1 until no more records with IsActual = NULL exist.
Here's an implementation:
SELECT 0 -- force @@ROWCOUNT initially 1
WHILE @@ROWCOUNT > 0
WITH MakeActual AS (
SELECT TOP 1 MapVal1, MapVal2
FROM #TableA
WHERE IsActual IS NULL
ORDER BY MapVal1, MapVal2
)
UPDATE a
SET IsActual = CASE WHEN a.MapVal1 = m.MapVal1 AND a.MapVal2 = m.MapVal2 THEN 1 ELSE 0 END
FROM #TableA a
INNER JOIN MakeActual m ON a.MapVal1 = m.MapVal1 OR a.MapVal2 = m.MapVal2
The number of loop iterations equals the number of 'actual' mappings.
The actual performance gain depends a lot on the data.
If the majority of mappings is one-to-one (i.e. hardly any non-actual mappings), then my algorithm will make little difference.
Therefore, it may be wise to keep the initial UPDATE statement from your own code sample (the one with the comment "every one-to-one mapping should be actual").
It may also help to play around with the indexes.
This one should help to further optimize the clause ORDER BY MapVal1, MapVal2:
CREATE NONCLUSTERED INDEX IX_MapVals ON #TableA (MapVal1, MapVal2)
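The set-based loop described in this answer can be tried out on the question's sample data with a small sketch. It uses Python's standard-library sqlite3 module rather than SQL Server, so LIMIT 1 replaces TOP 1 and the loop ends when no NULL row is left instead of checking the row count. The function name flag_actual is made up for illustration.

```python
import sqlite3

# Sample mappings from the question.
PAIRS = [
    (1, 1), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 3), (3, 4),
    (4, 3), (4, 4), (6, 7), (7, 8), (7, 9), (8, 8), (8, 9), (9, 8), (9, 9),
]

def flag_actual(pairs):
    """Run the greedy loop and return the mappings flagged IsActual = 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableA (ID INTEGER PRIMARY KEY, MapVal1 INTEGER NOT NULL,"
        " MapVal2 INTEGER NOT NULL, IsActual INTEGER)"
    )
    conn.executemany("INSERT INTO TableA (MapVal1, MapVal2) VALUES (?, ?)", pairs)
    while True:
        # Step 1: first unflagged mapping, ordered by MapVal1 then MapVal2.
        row = conn.execute(
            "SELECT MapVal1, MapVal2 FROM TableA WHERE IsActual IS NULL"
            " ORDER BY MapVal1, MapVal2 LIMIT 1"
        ).fetchone()
        if row is None:
            break  # Step 5: nothing left to flag.
        v1, v2 = row
        # Steps 2-4: one UPDATE flags the chosen mapping as actual and every
        # other mapping sharing its MapVal1 or MapVal2 as non-actual.
        conn.execute(
            "UPDATE TableA SET IsActual ="
            " CASE WHEN MapVal1 = ? AND MapVal2 = ? THEN 1 ELSE 0 END"
            " WHERE MapVal1 = ? OR MapVal2 = ?",
            (v1, v2, v1, v2),
        )
    return conn.execute(
        "SELECT MapVal1, MapVal2 FROM TableA WHERE IsActual = 1"
        " ORDER BY MapVal1, MapVal2"
    ).fetchall()
```

On the sample data the loop runs six times, once per mapping that ends up actual, which matches the point above about the number of iterations.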

SQL CTE Syntax to DELETE / INSERT rows

What's the CTE syntax to delete from a table, then insert to the same table and return the values of the insert?
Operating on 2 hours of sleep and something doesn't look right (besides the fact that this won't execute):
WITH delete_rows AS (
DELETE FROM <some_table> WHERE id = <id_value>
RETURNING *
)
SELECT * FROM delete_rows
UNION
(
INSERT INTO <some_table> ( id, text_field )
VALUES ( <id_value>, '<text_field_value>' )
RETURNING *
)
The expected behavior is to first clear all the records for an ID, then insert records for the same ID (intentionally not an upsert) and return those inserted records (not the deletions).
Your question update made clear that you cannot do this in a single statement.
Packed into CTEs of the same statement, both operations (INSERT and DELETE) would see the same snapshot of the table and execute virtually at the same time. I.e., the INSERT would still see all rows that you thought to be deleted already. The manual:
All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" one another's effects on the target tables.
You can wrap them as two independent statements into the same transaction - which doesn't seem strictly necessary either, but it would allow the whole operation to succeed / fail atomically:
BEGIN;
DELETE FROM <some_table> WHERE id = <id_value>;
INSERT INTO <some_table> (id, text_field)
VALUES ( <id_value>, '<text_field_value>')
RETURNING *;
COMMIT;
Now, the INSERT can see the results of the DELETE.
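The "succeed or fail atomically" part can be shown with a minimal sketch. This one uses Python's standard-library sqlite3 module instead of PostgreSQL, so it reads the inserted rows back with a SELECT rather than RETURNING. The table layout and the helper names (make_demo_db, replace_rows) are assumptions for the demo.

```python
import sqlite3

def make_demo_db():
    """Hypothetical table with two rows for id 1 and one for id 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (id INTEGER, text_field TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO some_table (id, text_field) VALUES (?, ?)",
        [(1, "old a"), (1, "old b"), (2, "other")],
    )
    conn.commit()
    return conn

def replace_rows(conn, id_value, texts):
    """Delete all rows for id_value, then insert the new ones, atomically."""
    with conn:  # COMMIT on success, ROLLBACK if an exception escapes
        conn.execute("DELETE FROM some_table WHERE id = ?", (id_value,))
        conn.executemany(
            "INSERT INTO some_table (id, text_field) VALUES (?, ?)",
            [(id_value, t) for t in texts],
        )
    return conn.execute(
        "SELECT id, text_field FROM some_table WHERE id = ? ORDER BY text_field",
        (id_value,),
    ).fetchall()
```

If the INSERT fails (for example, a NULL in the NOT NULL column), the DELETE is rolled back with it, so the old rows for that id are still there afterwards.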
CREATE TABLE test_table (value TEXT UNIQUE);
INSERT INTO test_table SELECT 'value 1';
INSERT INTO test_table SELECT 'value 2';
WITH delete_row AS (DELETE FROM test_table WHERE value='value 2' RETURNING 0)
INSERT INTO test_table
SELECT DISTINCT 'value 2'
FROM (SELECT 'dummy') dummy
LEFT OUTER JOIN delete_row ON TRUE
RETURNING *;
The query above handles the situations when DELETE deletes 0/1/some rows.
Elaborating on skif1979's "DelSert" CTE method, the "Logged DelSert:"
-- setups
DROP TABLE IF EXISTS _zx_t1 ;
CREATE TEMP TABLE
IF NOT EXISTS
_zx_t1
( id bigint
, fld2 bigint
, UNIQUE (id)
);
-- unique records
INSERT INTO _zx_t1 SELECT 1, 99;
INSERT INTO _zx_t1 SELECT 2, 98;
WITH
_cte_del_row AS
( DELETE
FROM _zx_t1
WHERE id = 2
RETURNING id as _b4_id, fld2 as _b4_fld2 -- returns complete deleted row
)
, _cte_delsert AS
( INSERT
INTO _zx_t1
SELECT DISTINCT
_cte_del_row._b4_id
, _cte_del_row._b4_fld2 + 1
from (SELECT null::integer AS _zunk) _zunk -- skif1979's trick here
LEFT OUTER JOIN _cte_del_row -- clever LOJ magic
ON TRUE -- LOJ cartesian product
RETURNING id as _aft_id , fld2 as _aft_fld2 -- return newly "delserted" rows
)
SELECT * -- returns before & after snapshots from CTE's
FROM
_cte_del_row
, _cte_delsert ;
RESULT:
_b4_id | _b4_fld2 | _aft_id | _aft_fld2
--------+----------+---------+-----------
2 | 209 | 2 | 210
AFAICT these all occur linearly w/in a unit of work, akin to a journaled or logged update.
Workable for
Child records
OR Schema w/ no FK
OR FK w/ cascading deletes
Not workable for
Parent records w/ FK & no cascading deletes
A related (& IMO better) answer, akin to the "Logged DelSert" is this, a logged "SelUp" :
-- setups
DROP TABLE IF EXISTS _zx_t1 ;
CREATE TEMP TABLE
IF NOT EXISTS
_zx_t1
( id bigint
, fld2 bigint
, UNIQUE (id)
);
-- unique records
INSERT INTO _zx_t1 SELECT 1, 99;
INSERT INTO _zx_t1 SELECT 2, 98;
WITH
_cte_sel_row AS
( SELECT -- start unit of work with read
id as _b4_id -- fields need to be aliased
,fld2 as _b4_fld2 -- to prevent ambiguous column errors
FROM _zx_t1
WHERE id = 2
FOR UPDATE
)
, _cte_sel_up_ret AS -- we're in the same UOW
( UPDATE _zx_t1 -- actual table
SET fld2 = _b4_fld2 + 1 -- some actual work
FROM _cte_sel_row
WHERE id = _b4_id
AND fld2 < _b4_fld2 + 1 -- gratuitous but illustrates the point
RETURNING id as _aft_id, fld2 as _aft_fld2
)
SELECT
_cte_sel_row._b4_id
,_cte_sel_row._b4_fld2 -- before
,_cte_sel_up_ret._aft_id
,_cte_sel_up_ret._aft_fld2 -- after
FROM _cte_sel_up_ret
INNER JOIN _cte_sel_row
ON TRUE AND _cte_sel_row._b4_id = _cte_sel_up_ret._aft_id
;
RESULT:
_b4_id | _b4_fld2 | _aft_id | _aft_fld2
--------+----------+---------+-----------
2 | 209 | 2 | 210
See also:
https://rob.conery.io/2018/08/13/transactional-data-operations-in-postgresql-using-common-table-expressions/

Optimizing deletes from table using UDT (tsql)

SQL Server.
I have a proc that takes a user defined table (readonly) and is about 7500 records large. Using that UDT, I run about 15 different delete statements:
delete from table1
where id in (select id from @table)
delete from table2
where id in (select id from @table)
delete from table3
where id in (select id from @table)
delete from table4
where id in (select id from @table)
....
This operation, as expected, does take a while (about 7-10 minutes). These columns are indexed. However, I suspect there is a more efficient way to do this. I know deletes are traditionally slower, but I wasn't expecting this slow.
Is there a better way to do this?
You can test/try "exists" instead of "IN". I really don't like IN clauses for anything besides casual lookup-queries. (Some people will argue about IN until they are blue in the face)
Delete deleteAlias
from table1 deleteAlias
where exists ( select null from @table vart where vart.Id = deleteAlias.Id )
You can populate a #temp table instead of a @variableTable. Again, over the years, this has been a matter of trial and error. @variable vs #temp, most of the time, doesn't make that big of a difference. But in about 4 situations I had, going to a #temp table made a big impact.
You can also experiment with putting an index on the #temp table (the "joining" column, 'Id' in this example )
IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
begin
drop table #Holder
end
CREATE TABLE #Holder
(ID INT )
/* simulate your insert */
INSERT INTO #HOLDER (ID)
select 1 union all select 2 union all select 3 union all select 4
/* CREATE CLUSTERED INDEX IDX_TempHolder_ID ON #Holder (ID) */
/* optional, create an index on the "join" column of the #temp table */
CREATE INDEX IDX_TempHolder_ID ON #Holder (ID)
Delete deleteAlias
from table1 deleteAlias
where exists ( select null from #Holder holder where holder.Id = deleteAlias.Id )
IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
begin
drop table #Holder
end
IMHO, there is no clear-cut answer; sometimes you gotta experiment a little.
And "how your tempdb is set up" is a huge fork in the road that can affect #temp table performance. But try the suggestions above first.
And one last experiment
Delete deleteAlias
from table1 deleteAlias
where exists ( select 1 from @table vart where vart.Id = deleteAlias.Id )
change the null to "1".... once I saw this affect something. Weird, right?
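To experiment with the "indexed holder table plus EXISTS" pattern, here is a minimal sketch using Python's standard-library sqlite3 module. SQLite is only a stand-in, so it says nothing about SQL Server timings; it just shows the shape of the statements. The function name delete_matching and the two-table default are assumptions for the demo.

```python
import sqlite3

def delete_matching(conn, ids, tables=("table1", "table2")):
    """Load ids into an indexed temp table, then delete matches from each table.

    Table names come from a fixed tuple supplied by the caller, not from user
    input, so formatting them into the SQL text is acceptable for this demo.
    """
    # The PRIMARY KEY gives the holder table an index on the join column.
    conn.execute("CREATE TEMP TABLE Holder (ID INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO Holder (ID) VALUES (?)", [(i,) for i in ids]
    )
    deleted = {}
    with conn:  # run all deletes in one transaction
        for table in tables:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE EXISTS"
                f" (SELECT 1 FROM Holder WHERE Holder.ID = {table}.ID)"
            )
            deleted[table] = cur.rowcount
    conn.execute("DROP TABLE Holder")
    return deleted
```

The returned per-table counts make it easy to compare variants (IN vs EXISTS, indexed vs not) against the same data when testing on the real server.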

Use Check Constraints to determine if a bit column can be set to true based on another column value, possible?

To elaborate: is it possible to enforce a rule where only one record can have the 'IsPrimaryUser' column set to true, while all other records in the same group are set to false? The grouping is determined by the CompanyId column.
I am only interested in whether it can be done using check constraints. Obviously, there is a SQL approach to something like this.
Example:
User Table
int UserId | int CompanyId | bit IsPrimaryUser
Data:
UserId | CompanyId | IsPrimaryUser
1 1 1
2 1 0
3 1 0
4 1 0
5 2 1
6 2 0
7 2 0
8 2 0
Check Constraints only work on a single row, but you can use scalar UDFs within the constraint.
You can breach the single-row check by using UDFs that check other rows in the table. Although unlike a trigger where you can access the DELETED virtual table and process individually, SQL Server seems to hold the records in a sort of transaction, and perform a CHECK on EACH row after the change, then finally accepting or aborting the CRUD in batch.
See this test case
Create table
create table usertable (UserId int,
CompanyId int, IsPrimaryUser int)
Populate
insert usertable select
1, 1, 1 union all select
2, 1, 0 union all select
3, 1, 0 union all select
4, 1, 0 union all select
5, 2, 1 union all select
6, 2, 0 union all select
7, 2, 0 union all select
8, 2, 0
Scalar function helper
create function dbo.anyprimaryuser(@userid int, @company int) returns bit as
begin
return
case when exists (
select * from usertable
where companyid=@company and isprimaryuser=1 and userid<>@userid)
then 1 else 0 end
end
The CHECK constraint
alter table usertable add constraint usertable_ck1
check (isprimaryuser=0 or dbo.anyprimaryuser(userid,companyid)=0)
Tests
insert usertable select 9,2,1 -- fail
insert usertable select 9,2,0 -- ok
insert usertable select 19,4,1 union all select 20,4,0 -- ok
insert usertable select 19,3,1 union all select 20,3,0 union all select 21,3,1
-- not ok, accepting the multi-row insert will breach the constraint
update usertable set IsPrimaryUser=1-IsPrimaryUser where CompanyId=4
-- ok! sets one and unsets the other in one go
(note) I updated the answer after Martin's comment below

Add empty row to query results if no results found

I'm writing stored procs that are being called by a legacy system. One of the constraints of the legacy system is that there must be at least one row in the single result set returned from the stored proc. The standard is to return a zero in the first column (yes, I know!).
The obvious way to achieve this is create a temp table, put the results into it, test for any rows in the temp table and either return the results from the temp table or the single empty result.
Another way might be to do an EXISTS against the same where clause that's in the main query before the main query is executed.
Neither of these are very satisfying. Can anyone think of a better way. I was thinking down the lines of a UNION kind of like this (I'm aware this doesn't work):
--create table #test
--(
-- id int identity,
-- category varchar(10)
--)
--go
--insert #test values ('A')
--insert #test values ('B')
--insert #test values ('C')
declare @category varchar(10)
set @category = 'D'
select
id, category
from #test
where category = @category
union
select
0, ''
from #test
where @@rowcount = 0
Very few options I'm afraid.
You always have to touch the table twice, whether COUNT, EXISTS before, EXISTs in UNION, TOP clause etc
select
id, category
from mytable
where category = @category
union all --edit, of course it's quicker
select
0, ''
where NOT EXISTS (SELECT * FROM mytable where category = @category)
An EXISTS solution is better than COUNT because it will stop when it finds a row. COUNT will traverse all rows to actually count them.
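The UNION ALL / NOT EXISTS query can be checked quickly with a minimal sketch in Python's standard-library sqlite3 module. SQLite stands in for SQL Server here, and the function name rows_or_default is made up for illustration. The category value is passed as a named parameter instead of a T-SQL variable.

```python
import sqlite3

# Same shape as the query above: real rows, or one default row if none match.
QUERY = """
SELECT id, category FROM mytable WHERE category = :category
UNION ALL
SELECT 0, '' WHERE NOT EXISTS
    (SELECT 1 FROM mytable WHERE category = :category)
"""

def make_demo_db():
    """Hypothetical table with categories A, B and C (ids 1, 2, 3)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY AUTOINCREMENT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO mytable (category) VALUES (?)", [("A",), ("B",), ("C",)]
    )
    conn.commit()
    return conn

def rows_or_default(conn, category):
    """Return matching rows, or a single (0, '') row when nothing matches."""
    return conn.execute(QUERY, {"category": category}).fetchall()
```

With this data, asking for category 'D' gives back only the (0, '') placeholder row, while 'B' gives back its real row and no placeholder.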
It's an old question, but I had the same problem.
The solution is really simple, WITHOUT a double select:
select top(1) WITH TIES * FROM (
select
id, category, 1 as orderdummy
from #test
where category = @category
union select 0, '', 2) AS t ORDER BY orderdummy
With "WITH TIES" you get ALL rows (all have 1 as "orderdummy", so all are ties), or, if there is no result, you get your default row.
You can use a full outer join. Something to the effect of ...
declare @category varchar(10)
set @category = 'D'
select t.id, ISNULL(t.category, @category) as category from (
select
id, category
from #test
where category = @category
) as t
FULL OUTER JOIN (Select @category as CategoryHelper ) as EmptyHelper on 1=1
Currently performance testing this scenario myself so not sure on what kind of impact this would have but it will give you a blank row with Category populated.
This is @swe's answer, just reformatted:
CREATE FUNCTION [mail].[f_GetRecipients]
(
@MailContentCode VARCHAR(50)
)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 1 WITH TIES -- Returns either all Priority 1 rows or, if none exist, all Priority 2 rows
[To],
CC,
BCC
FROM (
SELECT
[To],
CC,
BCC,
1 AS Priority
FROM mail.Recipients
WHERE 1 = 1
AND IsActive = 1
AND MailContentCode = @MailContentCode
UNION ALL
SELECT
*,
2 AS Priority
FROM (VALUES
(N'system@company.com', NULL, NULL),
(N'author@company.com', NULL, NULL)
) defaults([To], CC, BCC)
) emails
ORDER BY Priority
)
I guess you could try:
Declare @count int
set @count = 0
Begin
Select @count = Count([Column])
From -- your query
if(@count = 0)
select 0
else -- run your query
End
The downside is that you're effectively running your query twice; the upside is that you're skipping the temp table.
To avoid duplicating the selecting query, how about a temp table to store the query result first? And based on the temp table, return default row if the temp table is empty or return the temp when it has result?