Virtual Machine SQL Log growing when adding fields - sql

I realize everyone wants to just look the other way on this question. I'd appreciate it if you continue reading. Of course the log grows when adding a field to a large table. Let me explain to the best of my ability:
We have a database upgrade utility that we deploy to our customers. In that utility we manipulate the database with changes that are specific to our version.
Our testing department is seeing varying results between local (physical) machines and VMs, and even between one VM and another. Some VMs do not have much log growth while others grow by 30 GB. The database is set to SIMPLE recovery. The transaction log shouldn't be used, "technically". I understand that the log is used as a cache until a disk is free enough to accept the requested change. I know there is not much to be done on the SQL side. We are stuck with some sort of shrink to handle the change after the upgrade is complete.
I am curious why physical and VM environments would act differently, and what to look for in a VM environment to see if this is going to be problematic. Do I look at something on the disk, MSINFO32, CPU? I have checked to make sure there is no compression on the VM. I also ran Profiler traces and looked at fn_dblog to see the indexes growing on the specific table. I just can't figure out why some grow exponentially while others do not grow at all. Also, if you know of any db_owner-permission-level shrink-style commands I would appreciate it. Currently we are testing a CHECKPOINT, since ShrinkDB will not be available to us due to permission level.
--The table usually has between 10 and 20 million records. It has 10 strictly typed fields.
IF col_length('[dbo].[foo]','Field1') IS NULL
BEGIN
ALTER TABLE [dbo].[foo] ADD [Field1] smalldatetime NOT NULL CONSTRAINT [df_Field1] DEFAULT (GETDATE())
END
--This has growth and can get a bit out of control on VM. Physical machines it does not affect as much.
--Try 2, hoping GETDATE() was the issue
IF col_length('[dbo].[foo]','Field1') IS NULL
BEGIN
ALTER TABLE [dbo].[foo] ADD [Field1] smalldatetime NULL
END
DECLARE @TheDate smalldatetime = GETDATE()
UPDATE [dbo].[foo] SET [Field1] = @TheDate
--This was even more problematic to the log and took considerably longer

Doing a CHECKPOINT after each column change would be my first try under the SIMPLE recovery model, honestly. SQL Server doesn't give us much control over how the transaction log behaves other than the recovery model, backups, growth size, and checkpoints.
Check how many VLFs exist and their sizes with sys.dm_db_log_info or DBCC LOGINFO.
Check the recovery model: maybe it's either not SIMPLE, or not always SIMPLE, on some machines?
Check indexes and index fragmentation; that difference might matter, and it will change after each column change. Make sure you have a clustered index.
Check total table size.
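A rough sketch of those checks, assuming a database named YourDb (a placeholder) and SQL Server 2016 SP2 or later for sys.dm_db_log_info; on older builds DBCC LOGINFO gives similar information:
-- Confirm the recovery model really is SIMPLE on this machine
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'YourDb';
-- Count the VLFs and look at their sizes
SELECT COUNT(*) AS vlf_count,
       SUM(vlf_size_mb) AS total_log_mb
FROM sys.dm_db_log_info(DB_ID('YourDb'));
-- Under SIMPLE recovery, a manual CHECKPOINT after each schema change lets the
-- inactive portion of the log be reused before the next change starts
ALTER TABLE [dbo].[foo] ADD [Field1] smalldatetime NOT NULL
    CONSTRAINT [df_Field1] DEFAULT (GETDATE());
CHECKPOINT;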

Related

Auto-increment value does NOT gradually increase [duplicate]

In one of my tables, Fee, the identity increment on the "ReceiptNo" column in a SQL Server 2012 database suddenly started jumping by hundreds instead of by 1.
If it is 1205446 it jumps to 1206306, if it is 1206321 it jumps to 1207306, and if it is 1207314 it jumps to 1208306. What I want you to note is that the last three digits remain constant, i.e. 306, whenever the jump occurs.
This problem occurs when I restart my computer.
You are encountering this behaviour due to a performance improvement since SQL Server 2012.
By default it now uses a cache size of 1,000 when allocating IDENTITY values for an int column, and restarting the service can "lose" unused values (the cache size is 10,000 for bigint/numeric).
This is mentioned in the documentation
SQL Server might cache identity values for performance reasons and
some of the assigned values can be lost during a database failure or
server restart. This can result in gaps in the identity value upon
insert. If gaps are not acceptable then the application should use its
own mechanism to generate key values. Using a sequence generator with
the NOCACHE option can limit the gaps to transactions that are never
committed.
From the data you have shown it looks like this happened after the data entry for 22 December; when SQL Server restarted it reserved the values 1206306 - 1207305. After data entry for 24 - 25 December was done there was another restart, and SQL Server reserved the next range, 1207306 - 1208305, visible in the entries for the 28th.
Unless you are restarting the service with unusual frequency any "lost" values are unlikely to make any significant dent in the range of values allowed by the datatype so the best policy is not to worry about it.
If this is for some reason a real issue for you some possible workarounds are...
You can use a SEQUENCE instead of an identity column, defining a smaller cache size for it, and use NEXT VALUE FOR in a column default.
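A minimal sketch of that approach, reusing the Fee/ReceiptNo names from the question; the sequence name, start value and cache size are placeholders, and it assumes ReceiptNo is a plain int column rather than an existing IDENTITY column:
CREATE SEQUENCE dbo.Seq_ReceiptNo
    AS int
    START WITH 1208500      -- placeholder: pick a value above the current maximum
    INCREMENT BY 1
    CACHE 10;               -- a small cache bounds how many values a restart can lose

ALTER TABLE dbo.Fee
    ADD CONSTRAINT DF_Fee_ReceiptNo
    DEFAULT (NEXT VALUE FOR dbo.Seq_ReceiptNo) FOR ReceiptNo;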
Or apply trace flag 272, which makes the IDENTITY allocation logged as in versions up to 2008 R2. This applies globally to all databases.
Or, for recent versions, execute ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF to disable the identity caching for a specific database.
You should be aware that none of these workarounds ensure there will be no gaps. That has never been guaranteed by IDENTITY, as it would only be possible by serializing inserts to the table. If you need a gapless column you will need to use a different solution than either IDENTITY or SEQUENCE.
This problem occurs after restarting SQL Server.
The solution is:
Run SQL Server Configuration Manager.
Select SQL Server Services.
Right-click SQL Server and select Properties.
In the window that opens, under Startup Parameters, type -T272 and click Add, then press the Apply button and restart.
From SQL Server 2017+ you could use ALTER DATABASE SCOPED CONFIGURATION:
IDENTITY_CACHE = { ON | OFF }
Enables or disables identity cache at the database level. The default
is ON. Identity caching is used to improve INSERT performance on
tables with Identity columns. To avoid gaps in the values of the
Identity column in cases where the server restarts unexpectedly or
fails over to a secondary server, disable the IDENTITY_CACHE option.
This option is similar to the existing SQL Server Trace Flag 272,
except that it can be set at the database level rather than only at
the server level.
(...)
G. Set IDENTITY_CACHE
This example disables the identity cache.
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE=OFF ;
I know my answer might be late to the party, but I solved it another way, by adding a startup stored procedure in SQL Server 2012.
Create the following stored procedure in the master DB.
USE [master]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[ResetTableNameIdentityAfterRestart]
AS
BEGIN
BEGIN TRAN
DECLARE @id int = 0
SELECT @id = MAX(id) FROM [DatabaseName].dbo.[TableName]
--PRINT @id
DBCC CHECKIDENT ('[DatabaseName].dbo.[TableName]', RESEED, @id)
COMMIT
END
Then register it as a startup procedure using the following syntax.
EXEC sp_procoption 'ResetTableNameIdentityAfterRestart', 'startup', 'on';
This is a good idea if you have only a few tables, but if you have to do it for many tables this method still works; it just isn't a good idea at that scale.
This is still a very common issue among many developers and applications regardless of size.
Unfortunately the suggestions above do not fix all scenarios, e.g. shared hosting, where you cannot rely on your host to set the -T272 startup parameter.
Also, if you have existing tables that use these identity columns for primary keys, it is a HUGE effort to drop those columns and recreate new ones to use the BS sequence workaround. The sequence workaround is only good if you are designing the tables from scratch on SQL Server 2012+.
Bottom line is, if you are on SQL Server 2008 R2, then STAY ON IT. Seriously, stay on it. Until Microsoft admits that they introduced a HUGE bug, which is still there even in SQL Server 2016, we should not upgrade until they own it and FIX IT.
Microsoft straight up introduced a breaking change; they broke a working API that no longer works as designed, because their system forgets its current identity on a restart. Cache or no cache, this is unacceptable, and the Microsoft developer by the name of Bryan needs to own it, instead of telling the world that it is "by design" and a "feature". Sure, the caching is a feature, but losing track of what the next identity should be IS NOT A FEATURE. It's a fricken BUG!!!
I will share the workaround that I used, because my DBs are on shared hosting servers and I am not dropping and recreating my primary key columns; that would be a huge PITA.
Instead, this is my shameful hack (but not as shameful as this POS bug that Microsoft has introduced).
Hack/Fix:
Before your insert commands, just reseed the identity. This fix is only recommended if you don't have admin control over your SQL Server instance; otherwise I suggest reseeding on restart of the server.
declare @newId int -- where int is the datatype of your PKey or Id column
select @newId = max(YourBuggedIdColumn) from YOUR_TABLE_NAME
DBCC CHECKIDENT('YOUR_TABLE_NAME', RESEED, @newId)
Just those 3 lines immediately before your insert, and you should be good to go. It really won't affect performance that much, i.e. it will be unnoticeable.
Good luck.
There are many possible reasons for jumping identity values. They range from rolled-back inserts to identity management for replication. What is causing this in your case I can't tell without spending some time on your system.
You should know, however, that you can never assume an identity column to be contiguous. There are just too many things that can cause gaps.
You can find a little more information about this here: http://sqlity.net/en/792/the-gap-in-the-identity-value-sequence/

Improve update performance when setting column to null

Bit of a long shot here, but I have a simple query below:
begin transaction
update s
set s.SomeField = null
from someTable s (NOLOCK)
rollback transaction
This runs in ~30 seconds sitting close to the SQL Server box. Are there any tricks I can use to improve the speed? The table has 144,306 rows in it.
Thanks.
The single largest component of the performance of a large UPDATE command like this is going to be the speed of your DB log.
For best performance:
Make sure the DB log (LDF file) is on a separate physical spindle from the DB data (MDF file)
Avoid parity RAID for the log volume, such as RAID-5; RAID-1 or RAID-10 are better
Make sure that the DB log file is pre-grown, and that it's physically contiguous on disk
Make sure your server has enough RAM -- ideally, at least enough to hold all of the DB pages containing the modified rows
Using SSDs for your data drive may also help, because the command will create a large number of dirty buffers, which will be flushed to disk later by the lazy writer; this can make other operations on the DB slow while it's happening.
If there's no constraint on it, and you really need to set all values of that column to NULL, then I would test dropping the column and re-adding it.
Not sure if that would be faster or not, but I'd investigate it.
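A hedged sketch of that idea; the int data type is an assumption, and since another answer notes the column is indexed, any index, default or constraint on it would have to be dropped first:
-- Dropping the column and re-adding it as NULL is largely a metadata change,
-- so every existing row ends up with NULL without updating 144k rows one by one.
ALTER TABLE someTable DROP COLUMN SomeField;
ALTER TABLE someTable ADD SomeField int NULL;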
Try disabling the index temporarily.
You could change the syntax of your query slightly, but I saw no difference in my testing when doing that (I was using STATISTICS IO and STATISTICS TIME).
You mention the column is indexed. You could disable the index and re-enable it as part of your transaction. The T-SQL for that is simple; see http://blog.sqlauthority.com/2007/05/17/sql-server-disable-index-enable-index-alter-index/
I've had to do that in the past for similar jobs and it has worked out well for me.
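Roughly like this, where IX_someTable_SomeField is a placeholder for the actual index name:
ALTER INDEX IX_someTable_SomeField ON someTable DISABLE;   -- index is no longer maintained

UPDATE s
SET s.SomeField = NULL
FROM someTable s;

ALTER INDEX IX_someTable_SomeField ON someTable REBUILD;   -- rebuild re-enables the index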
Try implementing it like this:
Disable the index
Drop the column
Create the column
Rebuild the index
My guess is that it will improve performance.

Minimally Logged Insert Into

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with the simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without
logging at all. SELECT INTO is the
best way to minimize logging in T-SQL,
using SSIS you can do the same sort of
light logging using Bulk Insert.
From your requirements, I would
probably use SSIS, drop all
constraints, especially unique and
primary key ones, load the data in,
add the constraints back. I load
about 100GB in just over an hour like
this, with fairly minimal overhead. I
am using BULK LOGGED recovery model,
which just logs the existence of new
extents during the logging, and then
you can remove them later.
The key is to start with barebones
tables, and it just screams. Building
the indexes once at the end leaves you with
no indexes to maintain during the load, just the one
index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done on INSERT INTO statements and thus should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
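For illustration, a minimal sketch under the SIMPLE (or BULK_LOGGED) recovery model: an INSERT ... SELECT into an empty heap with a TABLOCK hint is eligible for minimal logging on SQL Server 2008, assuming B has had its indexes and constraints dropped first:
-- B should be an empty heap (no clustered or nonclustered indexes) at this point
INSERT INTO B WITH (TABLOCK)
SELECT *
FROM A;
-- Re-create the indexes and constraints on B after the load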
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610, see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag
610, which controls minimally logged
inserts into indexed tables.
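If you want to experiment with it, the flag can be enabled instance-wide at runtime (or via the -T610 startup parameter); note it is a prerequisite for minimal logging into indexed tables, not a guarantee of it:
DBCC TRACEON (610, -1);    -- -1 applies the trace flag globally
-- ... run the bulk load ...
DBCC TRACEOFF (610, -1);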

Need to alter column types in production database (SQL Server 2005)

I need help writing a TSQL script to modify two columns' data type.
We are changing two columns:
uniqueidentifier -> varchar(36) (this column has a primary key constraint)
xml -> nvarchar(4000)
My main concern is production deployment of the script...
The table is actively used by a public website that gets thousands of hits per hour. Consequently, we need the script to run quickly, without affecting service on the front end. Also, we need to be able to automatically rollback the transaction if an error occurs.
Fortunately, the table only contains about 25 rows, so I am guessing the update will be quick.
This database is SQL Server 2005.
(FYI - the type changes are required because of a 3rd-party tool which is not compatible with SQL Server's xml and uniqueidentifier types. We've already tested the change in dev and there are no functional issues with the change.)
As David said, executing a script in a production database without taking a backup or stopping the site is not the best idea. That said, if you want to make changes to only one table with a small number of rows, you can prepare a script to:
Begin a transaction
Create a new table with the final structure you want
Copy the data from the original table to the new table
Rename the old table to, for example, original_name_old
Rename the new table to original_table_name
End the transaction
This leaves you with a table named like the original one but with the new structure you want; in addition, you keep the original table under a backup name, so if you need to roll back the change you can write a script that simply drops the new table and renames the original one back.
If the table has foreign keys the script will be a little more complicated, but it is still possible without much work.
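A rough sketch of such a script; the table, column and constraint names are placeholders, and the uniqueidentifier and xml columns are converted explicitly. Foreign keys, indexes, triggers and permissions would need the same copy-and-rename treatment:
BEGIN TRANSACTION;

CREATE TABLE dbo.MyTable_new
(
    Id   varchar(36)    NOT NULL CONSTRAINT PK_MyTable_new PRIMARY KEY,
    Data nvarchar(4000) NULL
);

INSERT INTO dbo.MyTable_new (Id, Data)
SELECT CONVERT(varchar(36), Id), CONVERT(nvarchar(4000), Data)
FROM dbo.MyTable;

EXEC sp_rename 'dbo.MyTable', 'MyTable_old';
EXEC sp_rename 'dbo.MyTable_new', 'MyTable';

COMMIT TRANSACTION;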
Consequently, we need the script to run quickly, without affecting service on the front end.
This is just an opinion, but it's based on experience: That's a bad idea. It's better to have a short, (pre-announced if possible) scheduled downtime than to take the risk.
The only exception is if you really don't care if the data in these tables gets corrupted, and you can be down for an extended period.
In this situation, based on the types of changes you're making and the testing you've already performed, it sounds like the risk is very minimal, since you've tested the changes and you SHOULD be able to do it safely, but nothing is guaranteed.
First, you need to have a fall-back plan in case something goes wrong. The short version of a MINIMAL reasonable plan would include:
Shut down the website
Make a backup of the database
Run your script
Test the DB for integrity
Bring the website back online
It would be very unwise to attempt such an update while the website is live. You run the risk of being down for an extended period if something goes wrong.
A GOOD plan would also have you testing this against a copy of the database and a copy of the website (a test/staging environment) first and then taking the steps outlined above for the live server update. You have already done this. Kudos to you!
There are even better methods for making such an update, but the trade-off of down time for safety is a no-brainer in most cases.
And if you absolutely need to do this live, then you might consider this:
1) Build an offline version of the table with the new datatypes and copied data.
2) Build all the required keys and indexes on the offline tables.
3) Swap the tables out in a transaction. Or you could rename the old table to something else as an emergency backup.
sp_help 'sp_rename'
But TEST all of this FIRST in a prod-like environment. And make sure your backups are up to date. AND do this when you are least busy.

SQL Server 2005 Shrink and Rebuild indexes

We have a weekly maintenance plan to shrink all user databases and rebuild their indexes. This was working fine until we created a read-only database; now each time the plan runs it fails when it starts processing this database, due to its read-only state.
As far as I can see we have two options. We could remove the read-only flag from the database; this is possible, but as the database is only updated once a quarter it makes sense from a performance point of view to make use of the read-only feature. Or we could manually select the databases the plan should run for, i.e. all the user databases apart from the read-only one, but that requires people to remember to add any new databases to the plan.
Does anyone have any suggestions of a better way of doing this?
Thanks
Neil
Why are you shrinking the database in the first place?
Also, there's no need to maintain read-only DBs like that.
I'd remove the read-only flag if you don't want to customise the maintenance plan.
Why are you shrinking DBs at all? If the database grows to a given size, then that is probably its natural current size.
Also remember that an index rebuild (as a rule of thumb) requires free space of about 120% of the target table size, e.g. a 500 MB table needs 600 MB of free space.
It's pointless to shrink then rebuild... and you'll have horrendous file fragmentation too
I suppose you could modify the maintenance plan to start with an 'Execute T-SQL Statement' step that removes the read-only flag (ALTER DATABASE database-name SET READ_WRITE), and add a final step to reset it:
ALTER DATABASE database-name SET READ_ONLY
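So the plan could be bracketed roughly like this, with the database name as a placeholder:
-- First 'Execute T-SQL Statement' task in the plan
ALTER DATABASE [QuarterlyReadOnlyDb] SET READ_WRITE WITH ROLLBACK IMMEDIATE;

-- ... existing index rebuild / maintenance tasks run here ...

-- Final task in the plan
ALTER DATABASE [QuarterlyReadOnlyDb] SET READ_ONLY WITH ROLLBACK IMMEDIATE;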