I'd like to use the following statement to update a column of a single row:
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1
The above seems to work. Is this safe in SQL Server? I remember reading about possible deadlocks when using similar statements in a non-SQL Server DBMS (I think it was related to PostgreSQL).
Example of a table and corresponding stored procs:
CREATE TABLE Test (Id int IDENTITY(1,1) NOT NULL, Column1 int NOT NULL, CONSTRAINT PK_Test PRIMARY KEY (Id ASC))
GO
INSERT INTO Test (Column1) Values(255)
GO
-- this will always affect a single row only
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1
For the table structure you have shown, both the UPDATE and the SELECT are standalone transactions and can use clustered index seeks to do their work without reading unnecessary rows or taking unnecessary locks, so I would not be particularly concerned about deadlocks with this procedure.
I would be more concerned about the fact that you don't have the UPDATE and SELECT inside the same transaction. So the X lock on the row will be released as soon as the update statement finishes and it will be possible for another transaction to change the column value (or even delete the whole row) before the SELECT is executed.
If you execute both statements inside the same transaction then I still wouldn't be concerned about deadlock potential, as the exclusive lock is taken first (it would be a different matter if the SELECT happened before the UPDATE).
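For illustration, here is a minimal sketch of that pattern, assuming the accompanying SELECT simply reads Column1 back for the same row (the question does not show it):
BEGIN TRANSACTION;
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1;
-- The X lock taken by the UPDATE is held until COMMIT, so this SELECT sees the updated value.
SELECT Column1 FROM Test WHERE Id = 1;
COMMIT TRANSACTION;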
You can also address the concurrency issue by getting rid of the SELECT entirely and using the OUTPUT clause to return the post-update value to the client.
UPDATE Test SET Column1 = Column1 & ~2
OUTPUT INSERTED.Column1
WHERE Id = 1
What do you mean "is it safe"?
Your id is a unique identifier for each row. I would strongly encourage you to declare it as a primary key. But you should have an index on the column.
Without an index, you do have a potential issue with performance (and deadlocks) because SQL Server has to scan the entire table. But with the appropriate primary key declaration (or another index), then you are only updating a single row in a single table. If you have no triggers on the table, then there is not much going on that can interfere with other transactions.
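Purely as an illustrative sketch (the DDL in the question already declares PK_Test inline), this is how the primary key or a supporting index could be added if it were missing:
-- Illustrative only: the question's table already has this constraint.
ALTER TABLE Test ADD CONSTRAINT PK_Test PRIMARY KEY CLUSTERED (Id);
-- Or, alternatively, a plain unique index on the lookup column:
CREATE UNIQUE INDEX IX_Test_Id ON Test (Id);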
I have a few jumps of 1000 in my table's identity values. I've figured out the reason, rather late I would say, which is frequent server failures and restarts, and I executed
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF.
Hopefully, these large jumps will not occur again. Now I want to reuse the numbers in the gaps for new entries. What is the best way to do this? Is changing the seed value possible? Please note that I cannot alter any existing data.
Also, note that the rate at which new entries are added is slow (fewer than 10 entries daily), and I can keep an eye on this database and change the seed value again manually when necessary.
Thank you very much.
You can write a script for each instance using SET IDENTITY_INSERT table_name ON and SET IDENTITY_INSERT table_name OFF at the start and end of your script. The full documentation is here. You can only use it on one table at a time.
Changing the seed will have no effect as the next highest value will always be used.
The following script will help with identifying gaps.
SELECT TOP 1 id + 1
FROM mytable mo
WHERE NOT EXISTS
(
    SELECT NULL
    FROM mytable mi
    WHERE mi.id = mo.id + 1
)
ORDER BY id
which is from this question/answer.
UPDATE
A possible strategy would be to take the database offline, use SET IDENTITY_INSERT to fill the gaps/jumps with the required IDs but otherwise minimal/empty data, and then bring it back online. Then use the empty records as new entries arrive until all are used, and then revert to the previous method.
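A hedged sketch of that step, assuming a hypothetical table mytable(id IDENTITY, payload) where payload is nullable and id = 42 is a known gap:
SET IDENTITY_INSERT mytable ON;
-- Placeholder row with minimal/empty data, to be reused by a later entry.
INSERT INTO mytable (id, payload) VALUES (42, NULL);
SET IDENTITY_INSERT mytable OFF;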
I don't think these numbers are important to users, but you could consider a one-time bulk operation to fix them. Don't try to insert in between them.
Setting IDENTITY_INSERT on and off at the table level changes the structure of your table and is not transaction safe.
You would need to write a T-SQL script that alters your base table and its dependent tables at the same time.
I have created a table with an identity column. When I insert values into that table, the identity column shows huge gaps between values: the identity value jumps from 6 to 10001. This is the output ordered by DepartmentID:
Output Screenshot Here
This is table I have created:
Create Table STG2.Department
(
DepartmentID int GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 Cycle),
Name varchar(100),
GroupName varchar(100)
)
PRIMARY INDEX (DepartmentID);
This is how I am inserting values into the Department table:
insert into STG2.Department (Name, GroupName)
Select Distinct
Department, GroupName
from DP_STG.HR;
What am I doing wrong?
What am I doing wrong?
What you are doing wrong is worrying about gaps in the identity column. These are a natural part of using databases. The most typical causes are deletes or failed inserts. The only guarantee (I think) is that the numbers are increasing and not duplicated.
In your case, my guess is that Teradata is reserving a bunch of numbers for some good reason -- for parallelism or some other efficiency (I know SQL Server does this). The gaps will cause no harm, and the order of the inserts should be pretty well preserved.
Maintaining gapless identity columns is a huge overhead for a database, particularly a powerful, parallel database such as Teradata. In essence, it means that each insert has to wait for all other work on the table to complete, lock the table, find the maximum value, add one, and use that. The people who write databases know what a performance killer this is and have looser requirements for such columns.
As already stated, the gaps are due to each AMP (the logical processing unit that gives Teradata its MPP architecture) having its own range of IDs. So it's not wrong to have these gaps; it's by design.
If you rely on IDs without gaps (for any reason), you have to handle it yourself, either before loading in your ETL process or during/after loading by defining "ID = ROW_NUMBER() + MAX(ID)" (pseudo code), as sketched below.
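A sketch of that pseudo code, assuming DepartmentID is declared GENERATED BY DEFAULT (or is a plain integer column) so that explicit values are allowed:
-- Assign gapless IDs during the load instead of relying on the identity column.
INSERT INTO STG2.Department (DepartmentID, Name, GroupName)
SELECT m.MaxId + ROW_NUMBER() OVER (ORDER BY src.Department),
       src.Department,
       src.GroupName
FROM (SELECT DISTINCT Department, GroupName FROM DP_STG.HR) AS src
CROSS JOIN (SELECT COALESCE(MAX(DepartmentID), 0) AS MaxId FROM STG2.Department) AS m;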
Suppose we have a table of payments with 35 columns, a primary key (auto-incrementing bigint), and 3 non-clustered, non-unique indexes (each on a single int column).
Among the table's columns we have two datetime fields:
payment_date datetime NOT NULL
edit_date datetime NULL
The table has about 1 200 000 rows.
Only ~1,000 rows have edit_date = NULL.
About 9,000 rows have edit_date NOT NULL and not equal to payment_date.
The rest have edit_date = payment_date.
When we run the following query 1:
select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc
the server needs a couple of seconds to do it. But if we run query 2:
select top 1 *
from payments
where edit_date is not null
order by payment_date desc
the execution ends with the error "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."
If we replace * with a specific column, as in query 3:
select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc
it also finishes in a couple of seconds.
Where is the magic?
EDIT
I've changed query 1 so that it operates over exactly the same number of rows as the 2nd query. And still it returns in a second, while query 2 fills tempdb.
ANSWER
I followed the advice to add an index, and did this for both date fields -- everything started working quickly, as expected. Still, the question was: why, in this exact situation, does SQL Server behave differently on similar queries (query 1 vs query 2)? I wanted to understand the logic of the server's optimization. I would agree if both queries used tempdb similarly, but they didn't.
In the end I marked as the answer the first one, where I saw the likely symptoms of my problem as well as the first thoughts on how to avoid it (i.e. indexes).
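For reference, a hedged sketch of the indexes described above (column names are from the question; the index names are hypothetical):
-- Supports ORDER BY payment_date DESC ... TOP 1.
CREATE NONCLUSTERED INDEX IX_payments_payment_date ON dbo.payments (payment_date DESC);
-- Supports the edit_date IS NOT NULL filter.
CREATE NONCLUSTERED INDEX IX_payments_edit_date ON dbo.payments (edit_date);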
This is happening because certain steps in an execution plan can trigger writes to tempdb, in particular certain sorts and joins involving lots of data.
Since you are sorting a table with a boatload of columns, SQL Server decides it would be crazy to perform the sort alone in tempdb without the associated data. If it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.
Follow these rules:
Try to select only the data you need
Size tempdb appropriately; if you need to run crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb
Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth.
Many people think that tempdb is only used for #temp tables, when in fact you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;
DBCC DBREINDEX or similar DBCC commands with the 'Sort in tempdb' option set will also potentially fill up tempdb;
large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.
USE tempdb
GO
SELECT name
FROM tempdb..sysobjects
SELECT OBJECT_NAME(id), rowcnt
FROM tempdb..sysindexes
WHERE OBJECT_NAME(id) LIKE '#%'
ORDER BY rowcnt DESC
The higher rowcnt values will likely indicate the biggest temporary tables that are consuming space.
Short-term fix
DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')
DBCC INPUTBUFFER(<number>)
KILL <number>
Long-term prevention
-- SQL Server 7.0, should show 'trunc. log on chkpt.'
-- or 'recovery=SIMPLE' as part of status column:
EXEC sp_helpdb 'tempdb'
-- SQL Server 2000, should yield 'SIMPLE':
SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
ALTER DATABASE tempdb SET RECOVERY SIMPLE
Reference : https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references : http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76
I have a table with about 10 fields storing GPS info for customers. Over time, as we have added more customers, that table has grown to about 14 million rows. As the GPS data comes in, a service constantly inserts a row into the table. 90% of the data is not relevant, i.e. the customer does not care where the vehicle was 3 months ago, but the most recent data is used to generate tracking reports. My goal is to write SQL to purge the data that is older than a month.
Here is my problem: I can NOT use TRUNCATE TABLE, as I would lose everything.
Yesterday I wrote a DELETE statement with a WHERE clause. When I ran it on a test system, it locked up my table and the simulated GPS inserts were intermittently failing. Also, my transaction log grew to over 6GB as it attempted to log each delete.
My first thought was to delete the data a little at a time, starting with the oldest first, but I was wondering if there is a better way.
My 2 cents:
If you are using SQL 2005 and above, you can consider partitioning your table on the date field, so the table doesn't get locked when deleting old records.
Also, if you are in a position to make DBA decisions, you can temporarily change your recovery model to SIMPLE so the log doesn't grow too fast; it will still grow, but it won't be as detailed.
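A hedged sketch of that recovery-model switch, assuming a database named GpsDb and that losing point-in-time recovery during the purge window is acceptable:
ALTER DATABASE GpsDb SET RECOVERY SIMPLE;
-- ... run the batched purge here ...
ALTER DATABASE GpsDb SET RECOVERY FULL;
-- Take a full (or differential) backup afterwards to restart the log backup chain.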
Try this
WHILE EXISTS (SELECT * FROM Table WHERE (condition for deleting))
BEGIN
    SET ROWCOUNT 1000
    DELETE Table WHERE (condition for deleting)
    SET ROWCOUNT 0
END
This will delete the rows in groups of 1000
Better is to create a temporary table and insert only the data you want to keep. Then truncate your original table and copy the kept data back.
Oracle syntax (SQL Server is similar)
create table keep as select * from source where data_is_good = 1;
truncate table source;
insert into source select * from keep;
You'll need to disable foreign keys, if there are any on the source table.
In Oracle, index names must be unique in the entire schema, not just per table. In SQL Server, you can further optimize this by just renaming "keep" to "source", as you can easily create indexes of the same name on both tables.
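A hedged SQL Server rendering of the same idea (table and column names as in the Oracle example; if source has an identity column you would also need SET IDENTITY_INSERT and an explicit column list):
SELECT * INTO keep FROM source WHERE data_is_good = 1;
TRUNCATE TABLE source;   -- requires that no foreign keys reference source
INSERT INTO source SELECT * FROM keep;
DROP TABLE keep;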
If you're using SQL Server 2005 or 2008, sliding window partitioning is the perfect solution for this - instant archiving or purging without any perceptible locking. Have a look here for further information.
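As a rough, hypothetical sketch of what sliding-window partitioning involves (names invented; the table would have to be created or rebuilt on the partition scheme, keyed on its date column):
CREATE PARTITION FUNCTION pfGpsByMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('2009-04-01', '2009-05-01', '2009-06-01');
CREATE PARTITION SCHEME psGpsByMonth
    AS PARTITION pfGpsByMonth ALL TO ([PRIMARY]);
-- Purging the oldest month is then a metadata-only operation, roughly:
-- ALTER TABLE dbo.tblGPSVehicleInfoLog SWITCH PARTITION 1 TO dbo.tblGPSVehicleInfoLog_stage;
-- TRUNCATE TABLE dbo.tblGPSVehicleInfoLog_stage;
-- ALTER PARTITION FUNCTION pfGpsByMonth() MERGE RANGE ('2009-04-01');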
Welcome to Data Warehousing. You need to split your data into two parts.
The actual application, with current data only.
The history.
You need to write a little "ETL" job to move data from current to history and delete the moved rows from current.
You need to run this periodically. Daily, weekly, monthly, quarterly -- it doesn't matter technically. What matters is what use the history has and who uses it.
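A minimal sketch of such a job, assuming a history table with the same column layout (and no conflicting identity column, triggers, or foreign keys on it) and a 30-day window on the current table:
-- Move everything older than 30 days to history and delete it from current in one statement.
DELETE cur
OUTPUT DELETED.* INTO dbo.tblGPSVehicleInfoLog_History
FROM dbo.tblGPSVehicleInfoLog AS cur
WHERE cur.UpdateDate < DATEADD(DAY, -30, GETDATE());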
Can you copy recent data to a new table, truncate the table, then copy it back?
Of course, then you're going to need to worry about doing that again in 6 months or a year.
I would do a manual delete by day/month (whatever is the largest unit you can get away with). Once you do that first one, write a stored proc to kick off every day that deletes the oldest data you don't need.
DELETE FROM TABLENAME
WHERE datediff(day, tableDateTime, getdate()) > 90
Personally, I hate doing stuff to production datasets where one missed key results in some really bad things happening.
I would probably do it in batches as you have already come up with. Another option would be to insert the important data into another table, truncate the GPS table, then reinsert the important data. You would have a small window where you would be missing the recent historical data. How small that window is would depend on how much data you needed to reinsert. Also, you would need to be careful if the table uses autoincrementing numbers or other defaults so that you use the original values.
Once you have the table cleaned up, a regular cleaning job should be scheduled. You might also want to look into partitioning depending on your RDBMS.
I assume you can't take down the production system (or queue up the GPS results for insertion after the purge is complete).
I'd go with your inclination of deleting a fraction of it at a time (perhaps 10%) depending on the performance you find on your test system.
Is your table indexed? That might help, but the indexing process may have similar effects on the system as doing the one great purge.
Keep in mind that most databases lock the neighboring records in an index during a transaction, so keeping your operations short will be helpful. I'm assuming that your insertions are failing on lock wait timeouts, so delete your data in small, bursty transactions. I'd suggest a single-threaded Perl script that loops through the oldest rows in 1,000-row chunks. I hope your primary key (and hopefully your clustered index, in case they somehow ended up being two different things) can be correlated to time, as that would be the best thing to delete by.
PseudoSQL:
SELECT MAX(primId) FROM table WHERE date < 3_months_ago   -- call it maxPrimId
DELETE FROM table WHERE primId < maxPrimId LIMIT 1000
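A hedged T-SQL rendering of that pseudo-SQL (the table name dbo.GpsLog and the date column CreatedDate are hypothetical):
DECLARE @maxPrimId BIGINT;
-- Find the highest key that is older than three months.
SELECT @maxPrimId = MAX(primId)
FROM dbo.GpsLog
WHERE CreatedDate < DATEADD(MONTH, -3, GETDATE());
-- Delete everything below that key in 1,000-row chunks.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM dbo.GpsLog WHERE primId < @maxPrimId;
    IF @@ROWCOUNT = 0 BREAK;
END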
Now, here's the really fun part: All these deletions MAY make your indexes a mess and require that they be rebuilt to keep the machine from getting slow. In that case, you'll either have to swap in an up-to-date slave, or just suffer some downtime. Make sure you test for this possible case on your test machine.
If you are using Oracle, I would set up partitioning by date on your tables and their indexes. Then you delete the data by dropping the partition... the data will magically go away with the partition.
This is an easy step - and doesn't clog up your redo logs etc.
There's a basic intro to all this here
Does the DELETE statement use any of the indexes on the table? Often a huge performance improvement can be obtained either by modifying the statement to use an existing index or by adding an index to the table that improves the performance of the query the DELETE statement performs.
Also, as others mentioned, the deletes should be done in multiple chunks instead of one huge statement. This prevents the table from being locked too long and stops other processes from timing out while waiting for the delete to finish.
Dropping a table is pretty fast, even for a very large one. So here is what I would do. Script out your table, complete with indexes, from Management Studio. Edit the script and run it to create a copy of your table; call it table2. Do an INSERT ... SELECT to park the data you want to retain in the new table2. Rename the old table to, say, tableOld, and rename table2 to the original name. Wait. If no one screams at you, drop tableOld.
There is some risk.
1) Check whether there are triggers or constraints defined on the original table. They may not get included in the script generated by Management Studio.
2) If the original table has identity fields, you may have to turn on IDENTITY_INSERT before inserting into the new table.
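A hedged sketch of the rename step, using the question's table name and a hypothetical copy called tblGPSVehicleInfoLog2:
EXEC sp_rename 'dbo.tblGPSVehicleInfoLog', 'tblGPSVehicleInfoLog_Old';
EXEC sp_rename 'dbo.tblGPSVehicleInfoLog2', 'tblGPSVehicleInfoLog';
-- Later, once you are confident nothing is missing:
-- DROP TABLE dbo.tblGPSVehicleInfoLog_Old;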
I came up with the following T-SQL script which gets an arbitrary amount of recent data.
IF EXISTS(SELECT name FROM sys.tables WHERE name = 'tmp_xxx_tblGPSVehicleInfoLog')
BEGIN
PRINT 'Dropping temp table tmp_xxx_tblGPSVehicleInfoLog'
DROP TABLE tmp_xxx_tblGPSVehicleInfoLog
END
GO
PRINT 'Creating temp table tmp_xxx_tblGPSVehicleInfoLog'
CREATE TABLE [dbo].[tmp_xxx_tblGPSVehicleInfoLog](
[GPSVehicleInfoLogId] [uniqueidentifier] NOT NULL,
[GPSVehicleInfoId] [uniqueidentifier] NULL,
[Longitude] [float] NULL,
[Latitude] [float] NULL,
[GroundSpeed] [float] NULL,
[Altitude] [float] NULL,
[Heading] [float] NULL,
[GPSDeviceTimeStamp] [datetime] NULL,
[Milliseconds] [float] NULL,
[DistanceNext] [float] NULL,
[UpdateDate] [datetime] NULL,
[Stopped] [nvarchar](1) NULL,
[StopTime] [datetime] NULL,
[StartTime] [datetime] NULL,
[TimeStopped] [nvarchar](100) NULL
) ON [PRIMARY]
GO
PRINT 'Inserting data from tblGPSVehicleInfoLog to tmp_xxx_tblGPSVehicleInfoLog'
INSERT INTO tmp_xxx_tblGPSVehicleInfoLog
SELECT * FROM tblGPSVehicleInfoLog
WHERE tblGPSVehicleInfoLog.UpdateDate between '03/30/2009 23:59:59' and '05/19/2009 00:00:00'
GO
PRINT 'Truncating table tblGPSVehicleInfoLog'
TRUNCATE TABLE tblGPSVehicleInfoLog
GO
PRINT 'Inserting data from tmp_xxx_tblGPSVehicleInfoLog to tblGPSVehicleInfoLog'
INSERT INTO tblGPSVehicleInfoLog
SELECT * FROM tmp_xxx_tblGPSVehicleInfoLog
GO
To keep the transaction log from growing out of control, modify it in the following way:
DECLARE @i INT
SET @i = 1
SET ROWCOUNT 10000
WHILE @i > 0
BEGIN
    BEGIN TRAN
    DELETE FROM dbo.SuperBigTable
    WHERE RowDate < '2009-01-01'
    COMMIT
    SELECT @i = @@ROWCOUNT
END
SET ROWCOUNT 0
And here is a version using the preferred TOP syntax for SQL 2005 and 2008:
DECLARE @i INT
SET @i = 1
WHILE @i > 0
BEGIN
    BEGIN TRAN
    DELETE TOP (1000) FROM dbo.SuperBigTable
    WHERE RowDate < '2009-01-01'
    COMMIT
    SELECT @i = @@ROWCOUNT
END
I'm sharing my solution. I did index the date field. While the procedure was running, I tested record counts, inserts, and updates; they were all able to complete. On an Azure SQL managed instance running the absolute lowest configuration (General Purpose, 4 cores), I was able to purge 1 million rows in about a minute (roughly 55 seconds).
CREATE PROCEDURE [dbo].[PurgeRecords] (
    @iPurgeDays INT = 2,
    @iDeleteRows INT = 1000,
    @bDebug BIT = 1 --defaults to debug mode
)
AS
SET NOCOUNT ON
DECLARE @iRecCount INT = 0
DECLARE @iCycles INT = 0
DECLARE @iRowCount INT = 1
DECLARE @dtPurgeDate DATETIME = GETDATE() - @iPurgeDays
SELECT @iRecCount = COUNT(1) FROM YOURTABLE WHERE [Created] <= @dtPurgeDate
SELECT @iCycles = @iRecCount / @iDeleteRows
SET @iCycles = @iCycles + 1 --add one more cycle to pick up the remainder
--purge the rows in groups
WHILE @iRowCount <= @iCycles
BEGIN
    BEGIN TRY
        IF @bDebug = 0
        BEGIN
            --delete a group of records
            DELETE TOP (@iDeleteRows) FROM YOURTABLE WHERE [Created] <= @dtPurgeDate
        END
        ELSE
        BEGIN
            --display the delete that would have taken place
            PRINT 'DELETE TOP (' + CONVERT(VARCHAR(10), @iDeleteRows) + ') FROM YOURTABLE WHERE [Created] <= ''' + CONVERT(VARCHAR(25), @dtPurgeDate) + ''''
        END
        SET @iRowCount = @iRowCount + 1
    END TRY
    BEGIN CATCH
        --if there are any issues with the delete, raise an error and back out
        RAISERROR('Error purging YOURTABLE Records', 16, 1)
        RETURN
    END CATCH
END
GO
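For example, the procedure above could be invoked like this (run with @bDebug = 1 first to preview the deletes, then with @bDebug = 0 to actually purge):
EXEC dbo.PurgeRecords @iPurgeDays = 30, @iDeleteRows = 1000, @bDebug = 0;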