SQL Server sp_clean_db_free_space

I just want to ask whether I can use the stored procedure sp_clean_db_free_space as part of preventive maintenance?
What are the pros and cons of this built-in stored procedure? I would like to hear from someone who already uses it.
Thank you

You would typically not need to explicitly run this, unless you have specific reasons.
Quoting from SQL Server Books Online's page for sp_clean_db_free_space:
Delete operations from a table or update operations that cause a row to move can immediately free up space on a page by removing references to the row. However, under certain circumstances, the row can physically remain on the data page as a ghost record. Ghost records are periodically removed by a background process. This residual data is not returned by the Database Engine in response to queries. However, in environments in which the physical security of the data or backup files is at risk, you can use sp_clean_db_free_space to clean these ghost records.
Notice that SQL Server already has a background process to achieve the same result as sp_clean_db_free_space.
The only reason you might wish to explicitly run sp_clean_db_free_space is if there is a risk that the underlying database files or backups can be compromised and analysed. In such cases, any data that has not yet been swept up by the background process can be exposed. Of course, if your system has been compromised in such a way, you probably also have bigger problems on your hands!
Another reason might be that you have a time-bound requirement that deleted data should not be retained in any readable form. If you have only a general requirement that is not time-bound, then it would be acceptable to wait for the regular background process to perform this automatically.
The same page also mentions:
Because running sp_clean_db_free_space can significantly affect I/O activity, we recommend that you run this procedure outside usual operation hours.
Before you run sp_clean_db_free_space, we recommend that you create a full database backup.
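If you do decide you need it, the call itself is simple; a minimal sketch, with a placeholder database name:

-- Clean residual ghost-record data in one database (name is a placeholder)
EXEC sp_clean_db_free_space @dbname = N'YourDatabase';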
To summarize:
You'd use the sp_clean_db_free_space stored procedure only if you have a time-bound requirement that deleted data should not be retained in any readable form.
Running sp_clean_db_free_space is I/O-intensive.
Microsoft recommends a full database backup beforehand, which has its own I/O and space requirements.
Take a look at this related question on dba.stackexchange.com: https://dba.stackexchange.com/questions/11280/how-can-i-truly-delete-data-from-a-sql-server-table-still-shows-up-in-notepad/11281


Should I create a separate SQL Server database for each user?

I am working on an ASP.NET MVC web application; the back-end is SQL Server 2012.
This application will provide billing, accounting, and inventory management. Users will create an account by signing up, just like on http://www.quickbooks.in. Each user will create some master records and various transactions. There is no limit; a user can create unlimited records in the database.
I want database performance to remain stable under a heavy data load. I am maintaining proper indexing and primary keys, but there will be a heavy load on the database per user.
So, should I create a separate database for each user, or should I maintain one database and add a UserID column to each table, partitioning the data by UserID?
I am not an expert in SQL Server, so please provide suggestions with clear specifications.
Please let me know if any information is missing.
A DB per user is what you do when customers need to be able to pack up and leave, taking the actual database with them; think of a self-hosted WordPress website. It also makes sense if the risk of one user accidentally seeing another user's data is unacceptable, so that it is safer to rely on the server's security model than on remembering to add the UserId filter to all your queries. I can't imagine a scenario like that, but who knows -- maybe if the privacy laws allowed for jail time, I would rather have the data partitioned by security rules than rely on carefully written WHERE clauses.
If you did go database-per-user, creating a new user would be 10x more effort. While INSERT, UPDATE and so on stay the same from version to version, the syntax for database creation, user creation, permission granting and so on evolves enough with each SQL Server version upgrade to break those provisioning scripts.
Also, this multiplies your migration headaches by the number of users. Say you have 5,000 users and you need to add some new columns, change a column's data type, update a trigger, and so on. Instead of running that change script once, you need to run it 5,000 times.
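To make that concrete, here is a hedged sketch of what the per-database route forces on you: the same change has to be replayed against every tenant database, for example by iterating over sys.databases (the naming convention, table and column here are made up for illustration):

-- Apply one schema change to every tenant database (illustrative naming convention)
DECLARE @db sysname, @sql nvarchar(max);

DECLARE tenant_dbs CURSOR FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE N'Tenant[_]%';   -- hypothetical per-customer naming convention

OPEN tenant_dbs;
FETCH NEXT FROM tenant_dbs INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- The same ALTER has to run once per customer database
    SET @sql = N'ALTER TABLE ' + QUOTENAME(@db) + N'.dbo.Invoice ADD Notes nvarchar(500) NULL;';
    EXEC (@sql);
    FETCH NEXT FROM tenant_dbs INTO @db;
END

CLOSE tenant_dbs;
DEALLOCATE tenant_dbs;

With a single shared database, the same change is one ALTER TABLE statement run once.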
Per-user DBs also probably waste disk space. Each of those databases will have its own transaction log, sitting idle and taking up at least the minimum log space.
As for load, if your 5,000 users collectively do 1 billion inserts, updates and so on per day, my intuition tells me that it's going to be faster on one database, unless there is some sort of contention issue (everyone reading and writing the same pages of the same table at the same time). Each database consumes machine resources (probably threads and memory) for housekeeping, so those extra DBs aren't free.
Anyhow, the best thing to do is to build both architectures, use a random data generator to simulate load, and see how they perform.
It's not an easy answer to give.
First, there is logical design to be considered. Then you have integrity, security, management and performance (in this very order).
A database is a self-contained logical unit of data. Ideally, you should be able to take a database, move it to another instance, change the connection strings, and be running again.
All the constraints are database-level; no foreign key can reference an object outside the database.
So, try thinking in these terms first.
How would you reliably prevent one user from messing up another user's data? Keep in mind that it's only a matter of time before someone opens an Excel sheet and fires queries at the database, bypassing your application. Row-level security in SQL Server is something you don't want to deal with.
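To make the trade-off concrete, here is roughly what the shared-database discipline looks like; a minimal sketch with made-up table, column and parameter names, not a recommendation:

-- Shared multi-tenant schema: every row is stamped with its owner
CREATE TABLE dbo.Invoice
(
    InvoiceID int IDENTITY(1,1) PRIMARY KEY,
    UserID    int NOT NULL,                  -- tenant key, present on every table
    Amount    decimal(18,2) NOT NULL
);

-- Lead the index with UserID so per-tenant queries stay cheap as data grows
CREATE INDEX IX_Invoice_UserID ON dbo.Invoice (UserID) INCLUDE (Amount);

-- Every query must remember the tenant filter; forget it once and
-- one customer sees another customer's data
DECLARE @CurrentUserID int = 42;             -- supplied by the application
SELECT InvoiceID, Amount
FROM dbo.Invoice
WHERE UserID = @CurrentUserID;

Every access path has to apply that filter, which is exactly the discipline the per-database approach lets you avoid.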
Multiple databases mean that all management tasks have to be scripted out and executed on all databases. Yes, there is some overhead to it, but once you set it up it's just a matter of monitoring. If a database goes suspect, it's a single customer down, not all of them. You can even run different versions for different customers if each customer has its own database. Additionally, if you roll out an upgrade, you can do it per customer, so the impact will be much smaller.
Performance is the least relevant factor here. Of course, it really depends on how many customers and how much data, but proper indexing will solve these issues. Scale-out is much easier with multiple databases.
BTW, partitioning, as you mentioned it, is never a performance booster; it's simply a management feature, allowing for faster loading and eviction of data from a table.
I'd probably put each customer in a separate database, but ultimately the decision is yours. Hope I've helped some with this.

Is it enough to test a stored-procedure safely just by running it in a transaction?

I have a stored procedure called MoveSomeItems which takes some rows from tableA in the Foo DB and moves them to tableA in the Bar DB.
I want to test whether this SP really moves the items.
Is it enough to run this SP in a transaction and select the rows to see whether they were moved, or should I approach it in a different way?
This depends upon what the impact of it all going wrong would be. What would the impact of incorrect data in the destination table be: will it kill someone, simply annoy them, or is it unlikely anyone will notice? Will it be easy to fix?
There are risks associated with the approach you have given. For instance:
If the database is very busy, it is possible to cause excessive locking, or even a deadlock, with a transaction that may cause other transactions to fail. Setting the TRANSACTION ISOLATION LEVEL to READ UNCOMMITTED and the DEADLOCK_PRIORITY to LOW will help to minimise this, but not eliminate it entirely.
There is the possibility that other transactions may be running in READ UNCOMMITTED isolation mode, in which case they will temporarily see the results of the insert until the rollback is issued.
It is worth noting that if the procedure you are testing calls COMMIT TRANSACTION inside it you might not get the result you want when you call the ROLLBACK.
You might push the database or log to run out of disk space.
You might use up all the available CPU, Memory, Disk IO, Network or some other capacity limit.
Finally, I suspect this is not a complete list. The point I’m trying to make is that it could go wrong in strange ways.
If you have a personal development database that is fully backed up, then you wouldn't even need the transaction; simply do a restore after the event. The transaction may well save you some time, though. This is the safest solution.
If you are using a shared development database your approach might be acceptable enough, but I would still do a backup just in case, especially if you are already on bad terms with the team.
If you are using a live database it may still be acceptable if the system as a whole is not that critical and can sustain some downtime while you repair things. Again do a backup.
If the database you are looking at controls a safety-critical process or some other mission-critical function, don't even go there; you may lose the no-claims bonus on your liability insurance, or worse. In this instance it is best to restore a backup onto a test server and test there, thus creating my first scenario. But be warned: there are lots of issues to consider when doing this. For instance, it may be illegal to use personal information in a test system. There may also be dependencies on other systems that need to be mocked out to ensure you don't affect them; for example, don't connect a test system to a live email server.
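If you do go the transaction route on a database where the risks above are acceptable, the pattern described in the question looks roughly like this (a sketch; the schema names and the verification queries are assumptions):

-- Reduce the chance of this test taking out other people's work (see caveats above)
SET DEADLOCK_PRIORITY LOW;

BEGIN TRANSACTION;

    EXEC dbo.MoveSomeItems;          -- the procedure under test

    -- Inspect the results in both databases while still inside the transaction
    SELECT COUNT(*) AS RowsLeftInFoo FROM Foo.dbo.tableA;
    SELECT COUNT(*) AS RowsNowInBar  FROM Bar.dbo.tableA;

-- Undo everything (this will not help if the proc commits internally)
ROLLBACK TRANSACTION;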
If I have a complex stored proc that I want to be able to test and roll back, I add an input parameter (always as the last parameter), @debug, with a default value of 0 (so you don't need to specify it when running on prod).
Then I write code at the end that tests whether the parameter = 1 and, if so, runs whatever SELECT queries show me the data I want to see, then sends the program to the CATCH block using RAISERROR (never write multiple transactions without a TRY...CATCH block) and has it roll back.
This way you can easily check your results on dev and automatically roll back.
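A stripped-down sketch of that pattern, assuming SQL Server 2012 or later and with the actual business logic left as a placeholder:

CREATE PROCEDURE dbo.MoveSomeItems
    @debug bit = 0               -- last parameter; defaults to 0 so prod callers can ignore it
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- ... the real work goes here ...

        IF @debug = 1
        BEGIN
            -- Show whatever results you want to inspect
            SELECT COUNT(*) AS RowsNowInBar FROM Bar.dbo.tableA;
            -- Force the CATCH block so everything rolls back
            RAISERROR (N'Debug run - rolling back.', 16, 1);
        END

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        IF @debug = 0
            THROW;               -- re-raise real errors for normal runs
    END CATCH
END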

How to continuously deliver a SQL-based app?

I'm looking to apply continuous delivery concepts to a web app we are building, and wondering whether there is any solution for protecting the database from an accidental erroneous commit; for example, a bug that erases a whole table instead of a single record.
How can the impact of this issue be limited according to continuous delivery doctrine, where the application is deployed gradually over segments of the infrastructure?
Any ideas?
Well, first, you cannot tell just from looking at it whether a SQL statement is bad. You might have wanted to delete the entire contents of the table. Therefore it is not physically possible to have an automated tool that detects intent.
So to protect your database, first make sure you are in the full (not simple) recovery model, with full backups nightly and transaction log backups every 15 minutes or so. That way you cannot lose much information no matter how badly the process breaks. Your DBAs should be trained to be able to recover to a point in time. If you don't have any DBAs, I'd suggest the best thing you can do to protect your data is hire some. This is non-negotiable in any non-trivial database environment, and it is terribly risky not to have trained, experienced DBAs if your data is critical to the business.
Next, you need to treat SQL like any other code: it should be in source control, in scripts. If you are terribly concerned about accidental deletions, then write your delete scripts to copy all deleted rows to a staging table, and clear out the staging table once a week or so. Enforce this convention in code reviews. Or, better yet, set up an auditing process that runs through triggers. Once all records are audited, it is much easier to get back the 150 accidentally deleted rows without having to restore a database. I would never consider running an enterprise application without auditing.
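As a hedged sketch of the trigger-based variant for a single table (the table and columns are made up; a real setup would cover every audited table):

-- Keep a copy of every deleted row so accidental deletes can be recovered
-- without a database restore (illustrative table and column names)
CREATE TABLE dbo.Customer_DeleteAudit
(
    CustomerID int           NOT NULL,
    Name       nvarchar(200) NULL,
    DeletedAt  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    DeletedBy  sysname       NOT NULL DEFAULT SUSER_SNAME()
);
GO
CREATE TRIGGER trg_Customer_Delete
ON dbo.Customer
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Customer_DeleteAudit (CustomerID, Name)
    SELECT CustomerID, Name
    FROM deleted;        -- the rows removed by the triggering statement
END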
All SQL scripts, without exception, should be code-reviewed just like other code. All SQL scripts should be tested and passed on QA before moving to production. This will greatly reduce the possibility of error. No developer should have write rights to production; only DBAs should have that. Therefore each script should be written so that it can simply be run as a whole, not run one chunk at a time where you could accidentally forget to highlight the WHERE clause. Train your developers to use transactions correctly in the scripts as well.
Your concern is bad data getting into the database. The solution is to use full logging of all transactions so you can back out the transactions you want to. This would usually be used in the context of full backups, incremental backups, and full logging.
SQL Server, for instance, allows you to restore to a point in time (http://msdn.microsoft.com/en-us/library/ms190982(v=sql.105).aspx), assuming you have full logging.
If you are creating and dropping tables, this could be an expensive solution, in terms of space needed for the log. However, it might meet your needs for development.
You may find that full-logging is too expensive for such an application. In that case, you might want to make periodic backups (daily? hourly?) and just keep these around. For this purpose, I've found LightSpeed to be a good product for fast and efficient backups.
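For reference, the full-logging route in SQL Server comes down to the full recovery model plus log backups, which is what makes the point-in-time restore mentioned above possible; a sketch with placeholder database, logical file and path names:

-- Prerequisite: full recovery model plus regular log backups
ALTER DATABASE MyApp SET RECOVERY FULL;
BACKUP DATABASE MyApp TO DISK = N'D:\Backups\MyApp_full.bak';
BACKUP LOG MyApp TO DISK = N'D:\Backups\MyApp_log.trn';

-- Recover to just before the bad commit (the timestamp is illustrative)
RESTORE DATABASE MyApp_Recovered
    FROM DISK = N'D:\Backups\MyApp_full.bak'
    WITH MOVE N'MyApp'     TO N'D:\Data\MyApp_Recovered.mdf',
         MOVE N'MyApp_log' TO N'D:\Data\MyApp_Recovered.ldf',
         NORECOVERY;
RESTORE LOG MyApp_Recovered
    FROM DISK = N'D:\Backups\MyApp_log.trn'
    WITH STOPAT = '2014-06-01T14:55:00', RECOVERY;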
One commonly adopted strategy is to keep the incremental SQL statements rather than a collective schema generation, so you can control the changes at a much more granular level. For example:
change 1:
    UP:   add column
    DOWN: remove column
change 2:
    UP:   add trigger
    DOWN: remove trigger
Once the changes are captured incrementally like this, you can have a simple but efficient script to upgrade (UP) from any version to any version without having to worry about which individual changes are involved. When the change numbers are linked to builds, it becomes even more effective: when you deploy a build, the database is also automatically upgraded (UP) or downgraded (DOWN) to that specific build.
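In concrete terms, each numbered delta is just a pair of small scripts; a hedged example with made-up object names:

-- change 1, UP: add a column
ALTER TABLE dbo.Invoice ADD Notes nvarchar(500) NULL;
GO
-- change 1, DOWN: remove the column
ALTER TABLE dbo.Invoice DROP COLUMN Notes;
GO
-- change 2, UP: add a trigger (body kept trivial for the sketch)
CREATE TRIGGER trg_Invoice_Touched ON dbo.Invoice AFTER UPDATE AS
    SET NOCOUNT ON;
GO
-- change 2, DOWN: remove the trigger
DROP TRIGGER trg_Invoice_Touched;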
We have a pipeline app at CloudMunch which does that.

Recommended approach for modifying the schema of a production SQL database?

Say there is a database with 100+ tables and a major feature is added, which requires 20 existing tables to be modified and 30 more to be added. The changes were made over a long time (6 months) by multiple developers on the development database. Let's assume the changes do not make any existing production data invalid (e.g. added columns have defaults or allow nulls, and there are no new relations or constraints that could not be satisfied).
What is the easiest way to publish these changes in schema to the production database? Preferably, without shutting the database down for an extended amount of time.
Write a T-SQL script that performs the needed changes. Test it on a copy of your production database (restore from a recent backup to get the copy). Fix the inevitable mistakes that the test will discover. Repeat until the script works perfectly.
Then, when it's time for the actual migration: lock the DB so only admins can log in. Take a backup. Run the script. Verify the results. Put the DB back online.
The longest part will be the backup, but you'd be crazy not to do it. You should already know how long backups take; the overall process won't take much longer than that, so that's roughly how long your downtime will need to be. The middle of the night works well for most businesses.
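A sketch of that migration window in T-SQL (the database name, backup path, and the change script itself are placeholders):

-- 1. Kick everyone out; only dbo/sysadmin-level users can connect
ALTER DATABASE MyApp SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

-- 2. Safety net
BACKUP DATABASE MyApp TO DISK = N'D:\Backups\MyApp_pre_migration.bak' WITH CHECKSUM;

-- 3. Run the tested change script here (ALTER TABLE ..., CREATE TABLE ..., etc.)

-- 4. Verify the results, then open the database back up
ALTER DATABASE MyApp SET MULTI_USER;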
There is no generic answer on how to make 'changes' without downtime. The answer really depends on the case, based on exactly what the changes are. Some changes have no impact on downtime (e.g. adding new tables), some have minimal impact (e.g. adding columns to existing tables with no data size change, like a new nullable column that does not increase the null bitmap size), and other changes will wreak havoc on downtime (any operation that changes data size will force an index rebuild and lock the table for the duration). Some changes are impossible to apply without significant downtime. I know of cases where the changes were applied in parallel: a copy of the database is created, replication is set up to keep it current, then the copy is changed and kept in sync, and finally operations are moved to the modified copy, which becomes the master database. There is a presentation from PASS 2009 by Michelle Ufford that mentions how GoDaddy went through such a change that lasted weeks.
But at a smaller scale, you must apply the changes through a well-tested script and measure the impact during testing.
But the real question is: is this the last change you will ever make to the schema? Have you finally discovered the perfect schema for the application, so that the production database will never change again? Congratulations; once you pull this off, you can rest. But realistically, you will face the very same problem in six months. The real problem is your development process, with developers making changes from SSMS or from the VS Server Explorer straight into the database. Your development process must make a conscious effort to adopt a schema change strategy based on versioning and T-SQL scripts, like the one described in Version Control and your Database.
Use a tool to create a diff script and run it during a maintenance window. I use RedGate SQL Compare for this and have been very happy with it.
I've been using dbdeploy successfully for a couple of years now. It allows your devs to create small SQL change deltas which can then be applied against your database. The changes are tracked in a changelog table within the database, so that it knows what to apply.
http://dbdeploy.com/
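The idea behind the changelog table is simple; a minimal sketch of the concept (this is an illustration, not dbdeploy's exact schema):

-- Track which numbered deltas have already been applied to this database
CREATE TABLE dbo.ChangeLog
(
    ChangeNumber int           NOT NULL PRIMARY KEY,
    Description  nvarchar(500) NOT NULL,
    AppliedAt    datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    AppliedBy    sysname       NOT NULL DEFAULT SUSER_SNAME()
);

-- The deployment tool runs only the deltas whose number is not yet recorded here,
-- then inserts a row for each delta it applies.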

Access the SQL that fired the trigger from within the trigger (Sybase)

Is there a way to access the SQL that fired a trigger from within the trigger? I've managed to get it by joining to the master..monProcessSQLText MDA table, but this only works for users with the mon_role and I don't want to give that to everyone. Is there a global variable I've missed?
I'm trying to log all the updates run against a table so I can trace it back to an IP address and username.
This is with ASE 12.5.
If you are trying to
log all the updates run against a table so I can trace it back to an IP address and username
A trigger is definitely the wrong way to go about it: triggers were not designed for that, and there are other ASE facilities that were. It is not about the table; it is about security and monitoring in general.
Sybase Auditing.
It takes a bit of setting up, with much less overhead than the MDA tables; but most important, it was designed for auditing (MDA was not). And there are no coding requirements, as there are for MDA. It is highly configurable; the idea is to capture only what you need, and no more.
Monitoring.
I would not recommend MDA tables, but since you have them in place, have enabled monitoring, and have accepted the 22% overhead for capturing SQL text... The info is very transient. In order to use it for any relevant purpose, such as yours, you need to write a capture-and-store mechanism, archiving all required info to an archive database. This has to be done on an ongoing basis, completely independently of a trigger, etc. You can also filter on the fly to reduce the volume of data stored (warning: it is huge), purge data over 7 days old, and so on. It is a little project in itself, which is why such tools are commercially available from 3rd parties.
Once either of these facilities is in place, then, separately, whenever you wish to find out who updated a table, when, and from where, all you need to do is inspect the archive. Nothing to do with a trigger, or difficulties getting the info from a trigger, or giving admin privileges to ordinary users.
Also, it needs to be appreciated that you do not have normal security in place: the tables are being updated directly by users, so direct update permissions have been granted to either specific users or, worse, all users. The consequence is that there is no way of knowing who is updating the table and who is breaking the data or referential integrity.
The secure method is to place the entire transaction in a stored proc, thus eliminating the possibility of incomplete transactions (as well as improving execution speed); and to grant permissions on the procs, not the tables, thus eliminating direct updates. Over time, you may wish to implement security in the server, so that the consequences do not have to be chased down and closed one by one, a process with no finite end.
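A sketch of that arrangement in ASE, with made-up object and group names:

-- Wrap the update in a proc so the transaction is always complete
create procedure dbo.UpdateItemPrice
    @item_id   int,
    @new_price money
as
begin
    begin transaction
        update dbo.Item
        set    price = @new_price
        where  item_id = @item_id
    commit transaction
end
go
-- Users execute the proc; nobody updates the table directly
grant execute on dbo.UpdateItemPrice to app_users
revoke update on dbo.Item from app_users
go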
As far as auditing goes, if that security were in place, the auditing burden would also be substantially reduced: you would need to audit stored proc executions only. Otherwise, you need to audit all updates to all tables.