The ID column value jumps by about 10000 each time.
For example:
From index 5 it goes to 10006,
then continues 10007, 10008, 10009,
and then jumps to 20003, 20004, ...
How can I fix the ID values and put them back in order like before?
I have also found something about a Reseed function, but I do not know what it is or how to use it.
I'm assuming you're using an identity column:
ID INT NOT NULL IDENTITY(1,1)
There's no guarantee that identity values will remain in sequence. It was annoying when it first became more obvious (it didn't appear to happen in older versions of SQL Server, but apparently it could), but it was also always by design. The skipping became very apparent around the SQL Server 2012 release. If maintaining a steady sequence is required - e.g. invoice numbers - I believe you're now supposed to use a SEQUENCE:
https://dba.stackexchange.com/questions/62151/what-could-cause-an-auto-increment-primary-key-to-skip-numbers
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver15
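As a rough illustration of the SEQUENCE approach (the object names here are made up, not from your schema), a sequence can be created and bound to a column via a default constraint. Note that a sequence can still produce gaps (e.g. on rollback or with caching), so it isn't a guaranteed gap-free counter either:
--hypothetical example: a sequence driving an invoice number column
CREATE SEQUENCE dbo.InvoiceNumbers
    AS INT
    START WITH 1
    INCREMENT BY 1;

CREATE TABLE dbo.Invoice
(
    InvoiceNumber INT NOT NULL
        CONSTRAINT DF_Invoice_Number DEFAULT (NEXT VALUE FOR dbo.InvoiceNumbers),
    CustomerName  NVARCHAR(100) NOT NULL
);

--InvoiceNumber is filled from the sequence
INSERT INTO dbo.Invoice (CustomerName) VALUES (N'Test customer');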
It may also appear to skip if you perform an INSERT and it fails, but this typically only skips 1. That has always happened and is by design - you need to reseed your identity to overcome it. Something like:
DBCC CHECKIDENT ("dbo.MyTable", RESEED, 10)
This will make the next identity value 11, provided the other kind of skipping doesn't also occur.
EDIT:
In relation to re-aligning your existing entries: I'm no DB expert, but I did do this the other day on a table using a fairly rudimentary approach. It's only a small table, though, and there's probably a better way to do it:
BEGIN TRAN
--TABLE VARIABLE TO HOLD THE EXISTING ROWS
DECLARE @Tooltip TABLE
(
    [TooltipId]  INT           NOT NULL,
    [TooltipKey] NVARCHAR(100) NOT NULL,
    [Name]       NVARCHAR(255) NOT NULL
)

--COPY EXISTING ROWS INTO THE TABLE VARIABLE
INSERT INTO @Tooltip (TooltipId, TooltipKey, Name)
SELECT TooltipId, TooltipKey, Name
FROM dbo.Tooltip
ORDER BY TooltipId

--CLEAR ACTUAL TABLE
TRUNCATE TABLE dbo.Tooltip

--RESET IDENTITY TO 1
DBCC CHECKIDENT ("dbo.Tooltip", RESEED, 1)

--REINSERT FROM THE TABLE VARIABLE INTO THE ACTUAL TABLE (IDENTITY RENUMBERS IN ORDER)
INSERT INTO dbo.Tooltip (TooltipKey, Name)
SELECT TooltipKey, Name
FROM @Tooltip
ORDER BY TooltipId
--TEST OUTPUT
SELECT * FROM dbo.Tooltip
--DO THIS FOR TESTING
ROLLBACK TRAN
--DO THIS WHEN YOU'RE CERTAIN YOU WANT TO PERFORM THE ACTION
--COMMIT TRAN
Bear in mind that if you have foreign keys or other references, the TRUNCATE won't work and you'll have to do something more complex - particularly if you have foreign keys referencing your existing (incorrect) IDs.
This is not a problem. This is a performance feature of SQL Server.
SQL Server is designed to handle many concurrent transactions -- think dozens or hundreds of inserts per second. It can do this on systems with multiple processors.
In such an environment, "adding just 1" to the maximum can have a lot of overhead -- all the different processors have to agree on what the maximum is. This involves complex locking or sequencing of the transactions -- which slows things down.
To prevent performance bottlenecks, SQL Server will sometimes pre-allocate identity values. This can result in gaps if the numbers are not used.
If you don't like this feature, you can work around it by using a sequence and a trigger to assign the value. Just be warned that alternative approaches have performance implications.
Have you been running large deletes?
Delete doesn't reset the identity, so if you had rows 1-10000, then deleted all of them, the identity would still continue from 10001 when you added a new row.
Truncate does reset the identity, but it always removes ALL rows (with minimal logging).
You could also use a reseed to reset the identity, but that wouldn't be helpful in this case, since you'd slowly increment back into IDs already used by existing data.
I have a table (let's say ErrorLog)
CREATE TABLE [dbo].[ErrorLog]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Created] [datetime] NOT NULL,
[Message] [varchar](max) NOT NULL,
CONSTRAINT [PK_ErrorLog]
PRIMARY KEY CLUSTERED ([Id] ASC)
)
I want to remove all records that are older than 3 months.
I have a non-clustered index on the Created column (ascending).
I am not sure which one of these is better (they seem to take the same time).
Query #1:
DELETE FROM ErrorLog
WHERE Created <= DATEADD(month, - 3, GETDATE())
Query #2:
DECLARE @id INT
SELECT @id = MAX(l.Id)
FROM ErrorLog l
WHERE l.Created <= DATEADD(month, - 3, GETDATE())
DELETE FROM ErrorLog
WHERE Id <= @id
Once you know the maximum clustered key you want to delete, it is definitely faster to use that key. The question is whether it is worth selecting this key first using the date. The right decision depends on the size of the table and on what portion of the data you need to delete. The smaller the table and the smaller the number of records to delete, the more efficient the first option (Query #1) should be. However, if the number of records to delete is large enough, the non-clustered index on the Date column will be ignored and SQL Server will start scanning the base table. In that case the second option (Query #2) might be more optimal. There are usually other factors to consider as well.
I solved a similar issue recently (deleting about 600 million old records - roughly 2/3 of a 1.5 TB table) and decided on the second approach in the end. There were several reasons for it, but the main ones were as follows.
The table had to remain available for new inserts while the old records were being deleted. So I could not delete the records in one monstrous DELETE statement; instead I had to use several smaller batches in order to avoid lock escalation to the table level. The smaller batches also kept the transaction log size within reasonable limits. Furthermore, I had only about a one-hour maintenance window each day, so it was not possible to delete all the required records within one day.
With the above in mind, the fastest solution for me was to select the maximum Id I needed to delete according to the Date column and then just delete from the beginning of the clustered index up to that Id, one batch after the other (DELETE TOP (@BatchSize) FROM ErrorLog WITH (PAGLOCK) WHERE Id <= @myMaxId). I used the PAGLOCK hint in order to increase the batch size without escalating the lock to the table level. In the end I deleted several batches each day.
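Something along these lines (just a sketch, not my exact production code; the batch size is a placeholder):
DECLARE @myMaxId INT
DECLARE @BatchSize INT = 10000

SELECT @myMaxId = MAX(l.Id)
FROM ErrorLog l
WHERE l.Created <= DATEADD(month, -3, GETDATE())

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) FROM ErrorLog WITH (PAGLOCK)
    WHERE Id <= @myMaxId

    IF @@ROWCOUNT = 0 BREAK   -- nothing left below the cutoff
END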
I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource intensive but can take a while with all the parsing and SMTP overhead, etc.
To speed things up I can easily run multiple instances of the mailer process over multiple servers, but I worry that if two were to start at almost the same time they might still overlap a bit and send the same records.
Simplified for the question, my table looks something like this, with each record holding the data for one email.
queueItem
======================
queueItemID PK
...data...
processed bit
priority int
queuedStart datetime
rowLockName varchar
rowLockDate datetime
Batch 1 (Server 1)
starts at 12:00PM
lock/reserve the first 5000 rows (1-5000)
select the newly reserved rows
begin work
Batch 2 (Server 2)
starts at 12:15PM
lock/reserve the next 5000 rows (5001-10000)
select the newly reserved rows
begin work
To lock the rows I have been using the following:
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000
update queueItem
set rowLockName = @lockName,
    rowLockDate = getdate()
where queueitemID in (
    select top (@batchsize) queueitemID
    from queueItem
    where processed = 0
      and rowLockName is null
      and queuedStart <= getdate()
    order by priority, queueitemID
)
If I'm not mistaken, the query would start executing the SELECT subquery first and then lock the rows in preparation for the UPDATE; this is fast but not instantaneous.
My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not be completed yet, so Batch 2's SELECT would see the records as still available and attempt (and succeed) to overwrite Batch 1's reservation - a sort of race condition?
I have run some tests and so far haven't had an issue with batches overlapping, but is this a valid concern that will come back to haunt me at the worst possible time?
Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
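For reference, the follow-up read that pulls back the reserved batch would presumably be something along these lines (using the @lockName just assigned; the column list is only illustrative):
select queueItemID, priority, queuedStart -- ...plus the data columns
from queueItem
where rowLockName = @lockName
order by priority, queueItemID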
We have a staging table that looks like this; it stores all our data in 15-minute intervals:
CREATE TABLE [dbo].[15MinDataRawStaging](
[RawId] [int] IDENTITY(1,1) NOT NULL,
[CityId] [varchar](15) NOT NULL,
[Date] [int] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
[CounterValue] [int] NOT NULL,
[CounterName] [varchar](40) NOT NULL
)
It currently stores 20 different Counters, which means that we insert about 400K rows every hour of every day to this table.
Right now, I'm deleting data from before 03/2016, but even with the first 8 days of March data, there's over 58M rows.
Once all the hourly data is stored in [15MinDataRawStaging], we start copying data from this table to other tables, which are then used for the reports.
So, for example, we have a Kpi called Downtime, which is composed of counters VeryLongCounterName1 and VeryLongCounterName2. Once the hourly data is stored in [15MinDataRawStaging], we run a stored procedure that inserts these counters to its own table, called [DownTime]. It looks something like this:
insert into [DownTime] (CityKey, Datekey, HourKey, MinuteKey, DownTime, DowntimeType)
select CityId, [date], [hour], [minute], CounterValue, CounterName
From [15MinDataRawStaging] p
where
[date] = @Date
and [Hour] = @Hour
and CounterName in ('VeryLongCounterName1', 'VeryLongCounterName2')
and CounterValue > 0
This runs automatically every hour (through a C# console app), and I've noticed that with this query I'm getting timeout issues. I just ran it, and it indeed takes about 35 seconds to complete.
So my questions are:
Is there a way to optimize the structure of the staging table so these types of INSERTs to other tables don't take that long?
Or is it possible to optimize the INSERT query? The reason I have the staging table is because I need to store the data, even if it's for the current month. No matter what's done, the staging table will have tons of rows.
Do you guys have any other suggestions?
Thanks.
It sounds like you want to partition 15MinDataRawStaging into daily or hourly chunks. The documentation explains how to do this (better than a Stack Overflow answer).
Partitioning basically stores the table in multiple different files (at least conceptually). Certain actions can be very efficient. For instance, dropping a partition is much, much faster than dropping the individual records. In addition, fetching data from a single partition should be fast -- and in your case, the most recent partition will be in memory, making everything faster.
Depending on how the data is used, indexes might also be appropriate. But for this volume of data and the way you are using it, partitions seem like the key idea.
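For a very rough idea of the moving parts (the boundary values and names below are made up, the [Date] column is assumed to hold yyyymmdd-style integers, and a real setup would split/merge boundaries on a schedule), a daily partition function and scheme might look like:
CREATE PARTITION FUNCTION pfRawStagingByDay (INT)
AS RANGE RIGHT FOR VALUES (20160301, 20160302, 20160303);

CREATE PARTITION SCHEME psRawStagingByDay
AS PARTITION pfRawStagingByDay ALL TO ([PRIMARY]);

--the staging table (or its clustered index) would then be created ON psRawStagingByDay([Date])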
Assuming that the staging table has only one purpose, viz. the INSERT into DownTime, you can trade off a small amount of storage and insert performance (into the staging table) to improve the final ETL performance by adding a clustered index that matches the query used in extraction:
CREATE UNIQUE CLUSTERED INDEX MyIndex
ON [15MinDataRawStaging]([Date], [Hour], [Minute], RawId);
I've added RawId in order to allow uniqueness (otherwise a 4-byte uniquifier would have been added in any event).
You'll also want to do some trial and error to see whether adding [CounterName] and/or [CounterValue] to the index (but before RawId) improves the overall process throughput (i.e. both the staging insertion and the extraction into the final DownTime table).
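For example, one variant worth benchmarking might be the following (same names as above; DROP_EXISTING rebuilds the index in place):
CREATE UNIQUE CLUSTERED INDEX MyIndex
ON [15MinDataRawStaging]([Date], [Hour], [Minute], [CounterName], RawId)
WITH (DROP_EXISTING = ON);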
I have a table with about 10 fields that stores GPS info for customers. Over time, as we have added more customers, that table has grown to about 14 million rows. As the GPS data comes in, a service constantly inserts a row into the table. 90% of the data is not relevant, i.e. the customer does not care where the vehicle was 3 months ago, but the most recent data is used to generate tracking reports. My goal is to write a SQL script to purge the data that is older than a month.
Here is my problem: I can NOT use TRUNCATE TABLE, as I would lose everything.
Yesterday I wrote a DELETE statement with a WHERE clause. When I ran it on a test system, it locked up my table and the simulated GPS inserts were intermittently failing. Also, my transaction log grew to over 6 GB as it attempted to log each delete.
My first thought was to delete the data a little at a time, starting with the oldest, but I was wondering if there is a better way.
My 2 cents:
If you are using SQL 2005 or above, you can consider partitioning your table based on the date field, so the table doesn't get locked when deleting old records.
Maybe, if you are in a position to make DBA decisions, you can temporarily change your recovery model to Simple, so the log won't grow too fast. It will still grow, but it won't be as detailed.
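A minimal sketch of that (the database name is a placeholder; note that switching back to FULL afterwards needs a fresh full or differential backup to restart the log chain):
ALTER DATABASE YourGpsDb SET RECOVERY SIMPLE;
-- ...run the purge...
ALTER DATABASE YourGpsDb SET RECOVERY FULL;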
Try this
WHILE EXISTS (SELECT * FROM Table WHERE (condition for deleting))
BEGIN
    SET ROWCOUNT 1000
    DELETE FROM Table WHERE (condition for deleting)
    SET ROWCOUNT 0
END
This will delete the rows in groups of 1000
Better is to create a temporary table and insert only the data you want to keep. Then truncate your original table and copy the kept data back.
Oracle syntax (SQL Server is similar)
create table keep as select * from source where data_is_good = 1;
truncate table source;
insert into source select * from keep;
You'll need to disable foreign keys, if there are any on the source table.
In Oracle, index names must be unique across the entire schema, not just per table. In SQL Server, you can further optimize this by simply renaming "keep" to "source", as you can easily create indexes of the same name on both tables.
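For reference, a rough SQL Server rendering of the same idea (same hypothetical source/keep names; if source has an identity column you would also need SET IDENTITY_INSERT and an explicit column list):
SELECT * INTO keep FROM source WHERE data_is_good = 1;
TRUNCATE TABLE source;
INSERT INTO source SELECT * FROM keep;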
If you're using SQL Server 2005 or 2008, sliding window partitioning is the perfect solution for this - instant archiving or purging without any perceptible locking. Have a look here for further information.
Welcome to data warehousing. You need to split your data into two parts.
The actual application, with current data only.
The history.
You need to write a little "ETL" job to move data from current to history and then delete the history rows that were moved.
You need to run this periodically. Daily, weekly, monthly, quarterly - it doesn't matter technically. What matters is what use the history has and who uses it.
Can you copy recent data to a new table, truncate the table, then copy it back?
Of course, then you're going to need to worry about doing that again in 6 months or a year.
I would do a manual delete by day/month (whatever is the largest unit you can get away with). Once you've done that first purge, write a stored proc that kicks off every day and deletes the oldest data you don't need.
DELETE FROM TABLENAME
WHERE datediff(day, tableDateTime, getdate()) > 90
Personally, I hate doing stuff to production datasets where one missed key results in some really bad things happening.
I would probably do it in batches as you have already come up with. Another option would be to insert the important data into another table, truncate the GPS table, then reinsert the important data. You would have a small window where you would be missing the recent historical data. How small that window is would depend on how much data you needed to reinsert. Also, you would need to be careful if the table uses autoincrementing numbers or other defaults so that you use the original values.
Once you have the table cleaned up, a regular cleaning job should be scheduled. You might also want to look into partitioning depending on your RDBMS.
I assume you can't take the production system down (or queue up the GPS results for insertion after the purge is complete).
I'd go with your inclination of deleting a fraction of it at a time (perhaps 10%) depending on the performance you find on your test system.
Is your table indexed? That might help, but the indexing process may have similar effects on the system as doing the one great purge.
Keep in mind that most databases lock the neighboring records in an index during a transaction, so keeping your operations short will be helpful. I'm assuming that your insertions are failing on lock wait timeouts, so delete your data in small, bursty transactions. I'd suggest a single-threaded Perl script that loops through, deleting the oldest 1,000 rows per chunk. I hope your primary key (and hopefully your clustered index, in case they somehow ended up being two different things) can be correlated to time, as that would be the best thing to delete by.
PseudoSQL:
Select max(primId) where date < 3_months_ago
Delete from table where primId < maxPrimId limit 1000
Now, here's the really fun part: All these deletions MAY make your indexes a mess and require that they be rebuilt to keep the machine from getting slow. In that case, you'll either have to swap in an up-to-date slave, or just suffer some downtime. Make sure you test for this possible case on your test machine.
If you are using Oracle, I would set up a partition by date on your tables and the indexes. Then you delete the data by dropping the partition... the data will magically go away with the partition.
This is an easy step - and doesn't clog up your redo logs etc.
There's a basic intro to all this here
Does the DELETE statement use any of the indexes on the table? Often a huge performance improvement can be obtained by either modifying the statement to use an existing index, or adding an index on the table that helps the query the DELETE statement performs.
Also, as other mentioned, the deletes should be done in multiple chunks instead of one huge statement. This prevents the table from getting locked too long, and having other processes time out waiting for the delete to finish.
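For instance, if the purge filters on an UpdateDate column, a supporting index (table and column names hypothetical) would look something like this; whether the optimizer actually uses it depends on how much you're deleting:
CREATE INDEX IX_tblGPSVehicleInfoLog_UpdateDate
ON dbo.tblGPSVehicleInfoLog (UpdateDate);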
Performance is pretty fast when dropping a table, even a very large one. So here is what I would do: script out your table, complete with indexes, from Management Studio. Edit the script and run it to create a copy of your table; call it table2. Do a select-insert to park the data you want to retain in table2. Rename the old table to, say, tableOld, and rename table2 to the original name. Wait. If no one screams at you, drop tableOld.
There is some risk.
1) Check whether there are triggers or constraints defined on the original table. They may not be included in the script generated by Management Studio.
2) If the original table has identity fields, you may have to turn on IDENTITY_INSERT before inserting into the new table.
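A sketch of that swap (yourTable and gpsDateTime are placeholders; table2 would already have been created from the scripted DDL, and see point 2 above if identity columns are involved):
--park the data you want to retain in table2 (adjust the filter to your retention rule)
INSERT INTO dbo.table2
SELECT * FROM dbo.yourTable
WHERE gpsDateTime >= DATEADD(month, -1, GETDATE());

--swap the tables
EXEC sp_rename 'dbo.yourTable', 'tableOld';
EXEC sp_rename 'dbo.table2', 'yourTable';

--once you're confident nothing broke:
--DROP TABLE dbo.tableOld;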
I came up with the following T-SQL script which gets an arbitrary amount of recent data.
IF EXISTS(SELECT name FROM sys.tables WHERE name = 'tmp_xxx_tblGPSVehicleInfoLog')
BEGIN
PRINT 'Dropping temp table tmp_xxx_tblGPSVehicleInfoLog'
DROP TABLE tmp_xxx_tblGPSVehicleInfoLog
END
GO
PRINT 'Creating temp table tmp_xxx_tblGPSVehicleInfoLog'
CREATE TABLE [dbo].[tmp_xxx_tblGPSVehicleInfoLog](
[GPSVehicleInfoLogId] [uniqueidentifier] NOT NULL,
[GPSVehicleInfoId] [uniqueidentifier] NULL,
[Longitude] [float] NULL,
[Latitude] [float] NULL,
[GroundSpeed] [float] NULL,
[Altitude] [float] NULL,
[Heading] [float] NULL,
[GPSDeviceTimeStamp] [datetime] NULL,
[Milliseconds] [float] NULL,
[DistanceNext] [float] NULL,
[UpdateDate] [datetime] NULL,
[Stopped] [nvarchar](1) NULL,
[StopTime] [datetime] NULL,
[StartTime] [datetime] NULL,
[TimeStopped] [nvarchar](100) NULL
) ON [PRIMARY]
GO
PRINT 'Inserting data from tblGPSVehicleInfoLog to tmp_xxx_tblGPSVehicleInfoLog'
INSERT INTO tmp_xxx_tblGPSVehicleInfoLog
SELECT * FROM tblGPSVehicleInfoLog
WHERE tblGPSVehicleInfoLog.UpdateDate between '03/30/2009 23:59:59' and '05/19/2009 00:00:00'
GO
PRINT 'Truncating table tblGPSVehicleInfoLog'
TRUNCATE TABLE tblGPSVehicleInfoLog
GO
PRINT 'Inserting data from tmp_xxx_tblGPSVehicleInfoLog to tblGPSVehicleInfoLog'
INSERT INTO tblGPSVehicleInfoLog
SELECT * FROM tmp_xxx_tblGPSVehicleInfoLog
GO
To keep the transaction log from growing out of control, modify it in the following way:
DECLARE @i INT
SET @i = 1
SET ROWCOUNT 10000
WHILE @i > 0
BEGIN
    BEGIN TRAN
    DELETE FROM dbo.SuperBigTable
    WHERE RowDate < '2009-01-01'
    SELECT @i = @@ROWCOUNT  --capture before COMMIT, which resets @@ROWCOUNT
    COMMIT
END
SET ROWCOUNT 0
And here is a version using the preferred TOP syntax for SQL 2005 and 2008:
DECLARE @i INT
SET @i = 1
WHILE @i > 0
BEGIN
    BEGIN TRAN
    DELETE TOP (1000) FROM dbo.SuperBigTable
    WHERE RowDate < '2009-01-01'
    SELECT @i = @@ROWCOUNT  --capture before COMMIT, which resets @@ROWCOUNT
    COMMIT
END
I'm sharing my solution. I indexed the date field. While the procedure was running, I tested record counts, inserts, and updates; they were all able to complete while the procedure ran. On an Azure managed instance running the absolute lowest configuration (General Purpose, 4 cores), I was able to purge 1 million rows in about a minute (roughly 55 seconds).
CREATE PROCEDURE [dbo].[PurgeRecords] (
    @iPurgeDays INT = 2,
    @iDeleteRows INT = 1000,
    @bDebug BIT = 1 --defaults to debug mode
)
AS
SET NOCOUNT ON

DECLARE @iRecCount INT = 0
DECLARE @iCycles INT = 0
DECLARE @iRowCount INT = 1
DECLARE @dtPurgeDate DATETIME = GETDATE() - @iPurgeDays

SELECT @iRecCount = COUNT(1) FROM YOURTABLE WHERE [Created] <= @dtPurgeDate
SELECT @iCycles = @iRecCount / @iDeleteRows
SET @iCycles = @iCycles + 1 --add one more cycle to pick up the remainder

--purge the rows in groups
WHILE @iRowCount <= @iCycles
BEGIN
    BEGIN TRY
        IF @bDebug = 0
        BEGIN
            --delete a group of records
            DELETE TOP (@iDeleteRows) FROM YOURTABLE WHERE [Created] <= @dtPurgeDate
        END
        ELSE
        BEGIN
            --display the delete that would have taken place
            PRINT 'DELETE TOP (' + CONVERT(VARCHAR(10), @iDeleteRows) + ') FROM YOURTABLE WHERE [Created] <= ''' + CONVERT(VARCHAR(25), @dtPurgeDate) + ''''
        END
        SET @iRowCount = @iRowCount + 1
    END TRY
    BEGIN CATCH
        --if there are any issues with the delete, raise an error and back out
        RAISERROR('Error purging YOURTABLE Records', 16, 1)
        RETURN
    END CATCH
END
GO
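For example, a dry run followed by the real purge could be invoked like this (parameter values are only illustrative):
--debug mode: prints the DELETE statement for each cycle instead of executing it
EXEC dbo.PurgeRecords @iPurgeDays = 90, @iDeleteRows = 5000, @bDebug = 1

--live mode: actually purges, 5000 rows per cycle
EXEC dbo.PurgeRecords @iPurgeDays = 90, @iDeleteRows = 5000, @bDebug = 0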