SQL Server 2005: Audit random record deletion - sql-server-2005

This may seem like a dumb question, but I'm in a head-> wall situation right now.
I work on a massive ERP application in which the SQL Server 2005 database is updated by multiple disparate applications. I'm trying to figure out where the deletes in a particular table are originating from.
I tried using the Profiler but I'm not able to filter the event types enough to be able to identify the errant SP because there are so many hits to the database every second from various quarters. Also the Profiler seems more directed to finding DDL changes or Object DROP type actions.
I'm simply trying to answer the question: What Stored Proc. or SQL query caused a record to be deleted from Table X?
What tool should I use? I was hoping to avoid something like Trigger based Auditing. Or is the Profiler the best tool for this sort of investigation? Or are third-party tools the only resort?
Please provide any helpful links you can because I'm relatively unfamiliar with this topic.

Finding the culprit with profiler could be like finding a needle in a haystack, especially on a busy system; if you can't find it with filters like edosoft suggests, try to minimize the noise by eliminating statments with writes=0, filter by application name, filter by textdata not like '%select%'; you should be able to get it narrowed down.
If you're really desperate, you could deny delete permission to all users on the table and wait for the phone to ring.
You could also run occassional SELECT COUNT(*) on the table into a work table with timestamps and try to correlate any drops in record counts to other activity.

You could use SQL Profiler for this, but you need to filter the results. To monitor DELETE statements select "RPC:Starting" and "SP:Starting" events and apply a filter on the TextData column: "TextData LIKE '%DELETE%FROM%'".
-Edoode

Related

How do I achieve this sort of replication?

I have a main SQL Server A that data is inserted into. The table of interest on A looks like this:
Name|Entry Time|Exit Time|Comments
From this main table, I want to construct a table on another server B that contains the same data from A but with some additional filters using a WHERE clause i.e. WHERE Name IN ('John', 'Adam', 'Jack').
I am not sure what this is called and if this is even supported by SQL Server natively (or I should setup a script to achieve this). Replication means replicating entire data but can someone tell me what is it that I am looking for and how to achieve this?
Transactional replication does support filters on articles, but I'll be honest - I've never set it up with articles with filters. This article may help as well as this topic in Books Online.
If it's only one table and/or you are uncomfortable diving into replication, you may want to populate the remote table with a trigger (this will obviously be easier if the data is only written to the table on insert and never updated). But you'll need to have logic set up to deal with situations where the remote server is down.
A third solution might be viable if you do not need server B to be continuously up to date - you can manually move data over every n minutes using a job - either using an outer join / merge or completely swapping out the set of data that matches the filter (I've used shadow schemas for this scenario to minimize the impact this has on readers of server B - see this dba.stackexchange answer for more details).
Transactional replication with SQL Server supports the ability to filter data. In fact, when you set up your replication, there is an Add Filter dialog box (assuming you're using SSMS) that allows you to create your filter (Where clause).
You can learn more about this here.

Change SQL Query update time

I'm reasonably new to using SQL, so I apologise if this is a newbish question!
I'm wanting to be able to view the results of a query in a graph on my website. The problem however is the query takes 2-3 seconds to process (due to a Distinct count over 200,000+ fields), and could likely be called multiple times a second.
If I put the query in a view instead, and have my graph access the view, is there anyway I can set the view to update periodically instead of everytime someone goes to the site? Or is there another way round this?
EDIT: DBMS is MySQL
You could populate the data using a sproc or even just a normal TSQL if you want, using a Scheduled Job. Or, as Gijo suggested, you could look at database replication, but I am not sure that you need that kind of overhead if you can get away with the first choice of periodically caching the data.
You may try replication. With replication, you can set to populate a certain table to get data from a stored procedure and then use this table to show the graph.

detect cartesian product or other non sensible queries

I'm working on a product which gives users a lot of "flexibility" to create sql, ie they can easily set up queries that can bring the system to it's knees with over inclusive where clauses.
I would like to be able to warn users when this is potentially the case and I'm wondering if there is any known strategy for intelligently analysing queries which can be employed to this end?
I feel your pain. I've been tasked with something similar in the past. It's a constant struggle between users demanding all of the features and functionality of SQL while also complaining that it's too complicated, doesn't help them, doesn't prevent them from doing stupid stuff.
Adding paging into the query won't stop bad queries from being executed, but it will reduce the damage. If you only show the first 50 records returned from SELECT * FROM UNIVERSE and provide the ability to page to the next 50 and so on and so forth, you can avoid out of memory issues and reduce the performance hit.
I don't know if it's appropriate for your data/business domain; but I forcefully add table joins when the user doesn't supply them. If the query contains TABLE A and TABLE B, A.ID needs to equal B.ID; I add it.
If you don't mind writing code that is specific to a database, I know you can get data about a query from the database (Explain Plan in Oracle - http://www.adp-gmbh.ch/ora/explainplan.html). You can execute the plan on their query first, and use the results of that to prompt or warn the user. But the details will vary depending on which DB you are working with.

Date ranges in views - is this normal?

I recently started working at a company with an enormous "enterprisey" application. At my last job, I designed the database, but here we have a whole Database Architecture department that I'm not part of.
One of the stranger things in their database is that they have a bunch of views which, instead of having the user provide the date ranges they want to see, join with a (global temporary) table "TMP_PARM_RANG" with a start and end date. Every time the main app starts processing a request, the first thing it does it "DELETE FROM TMP_PARM_RANG;" then an insert into it.
This seems like a bizarre way of doing things, and not very safe, but everybody else here seems ok with it. Is this normal, or is my uneasiness valid?
Update I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems. Also, there are literally dozens if not hundreds of views that all depend on TMP_PARM_RANG.
Do I understand this correctly?
There is a view like this:
SELECT * FROM some_table, tmp_parm_rang
WHERE some_table.date_column BETWEEN tmp_parm_rang.start_date AND tmp_parm_rang.end_date;
Then in some frontend a user inputs a date range, and the application does the following:
Deletes all existing rows from
TMP_PARM_RANG
Inserts a new row into
TMP_PARM_RANG with the user's values
Selects all rows from the view
I wonder if the changes to TMP_PARM_RANG are committed or rolled back, and if so when? Is it a temporary table or a normal table? Basically, depending on the answers to these questions, the process may not be safe for multiple users to execute in parallel. One hopes that if this were the case they would have already discovered that and addressed it, but who knows?
Even if it is done in a thread-safe way, making changes to the database for simple query operations doesn't make a lot of sense. These DELETEs and INSERTs are generating redo/undo (or whatever the equivalent is in a non-Oracle database) which is completely unnecessary.
A simple and more normal way of accomplishing the same goal would be to execute this query, binding the user's inputs to the query parameters:
SELECT * FROM some_table WHERE some_table.date_column BETWEEN ? AND ?;
If the database is oracle, it's possibly a global temporary table; every session sees its own version of the table and inserts/deletes won't affect other users.
There must be some business reason for this table. I've seen views with dates hardcoded that were actually a partioned view and they were using dates as the partioning field. I've also seen joining on a table like when dealing with daylights saving times imagine a view that returned all activity which occured during DST. And none of these things would ever delete and insert into the table...that's just odd
So either there is a deeper reason for this that needs to be dug out, or it's just something that at the time seemed like a good idea but why it was done that way has been lost as tribal knowledge.
Personally, I'm guessing that it would be a pretty strange occurance. And from what you are saying two methods calling the process at the same time could be very interesting.
Typically date ranges are done as filters on a view, and not driven by outside values stored in other tables.
The only justification I could see for this is if there was a multi-step process, that was only executed once at a time and the dates are needed for multiple operations, across multiple stored procedures.
I suppose it would let them support multiple ranges. For example, they can return all dates between 1/1/2008 and 1/1/2009 AND 1/1/2006 and 1/1/2007 to compare 2006 data to 2008 data. You couldn't do that with a single pair of bound parameters. Also, I don't know how Oracle does it's query plan caching for views, but perhaps it has something to do with that? With the date columns being checked as part of the view the server could cache a plan that always assumes the dates will be checked.
Just throwing out some guesses here :)
Also, you wrote:
I should mention that they use
transactions and per-client locks, so
it is guarded against most concurrency
problems.
While that may guard against data consistency problems due to concurrency, it hurts when it comes to performance problems due to concurrency.
Do they also add one -in the application- to generate the next unique value for the primary key?
It seems that the concept of shared state eludes these folks, or the reason for the shared state eludes us.
That sounds like a pretty weird algorithm to me. I wonder how it handles concurrency - is it wrapped in a transaction?
Sounds to me like someone just wasn't sure how to write their WHERE clause.
The views are probably used as temp tables. In SQL Server we can use a table variable or a temp table (# / ##) for this purpose. Although creating views are not recommended by experts, I have created lots of them for my SSRS projects because the tables I am working on do not reference one another (NO FK's, seriously!). I have to workaround deficiencies in the database design; that's why I am using views a lot.
With the global temporary table GTT approach that you comment is being used here, the method is certainly safe with regard to a multiuser system, so no problem there. If this is Oracle then I'd want to check that the system either is using an appropriate level of dynamic sampling so that the GTT is joined appropriately, or that a call to DBMS_STATS is made to supply statistics on the GTT.

Strategy for identifying unused tables in SQL Server 2000?

I'm working with a SQL Server 2000 database that likely has a few dozen tables that are no longer accessed. I'd like to clear out the data that we no longer need to be maintaining, but I'm not sure how to identify which tables to remove.
The database is shared by several different applications, so I can't be 100% confident that reviewing these will give me a complete list of the objects that are used.
What I'd like to do, if it's possible, is to get a list of tables that haven't been accessed at all for some period of time. No reads, no writes. How should I approach this?
MSSQL2000 won't give you that kind of information. But a way you can identify what tables ARE used (and then deduce which ones are not) is to use the SQL Profiler, to save all the queries that go to a certain database. Configure the profiler to record the results to a new table, and then check the queries saved there to find all the tables (and views, sps, etc) that are used by your applications.
Another way I think you might check if there's any "writes" is to add a new timestamp column to every table, and a trigger that updates that column every time there's an update or an insert. But keep in mind that if your apps do queries of the type
select * from ...
then they will receive a new column and that might cause you some problems.
Another suggestion for tracking tables that have been written to is to use Red Gate SQL Log Rescue (free). This tool dives into the log of the database and will show you all inserts, updates and deletes. The list is fully searchable, too.
It doesn't meet your criteria for researching reads into the database, but I think the SQL Profiler technique will get you a fair idea as far as that goes.
If you have lastupdate columns you can check for the writes, there is really no easy way to check for reads. You could run profiler, save the trace to a table and check in there
What I usually do is rename the table by prefixing it with an underscrore, when people start to scream I just rename it back
If by not used, you mean your application has no more references to the tables in question and you are using dynamic sql, you could do a search for the table names in your app, if they don't exist blow them away.
I've also outputted all sprocs, functions, etc. to a text file and done a search for the table names. If not found, or found in procedures that will need to be deleted too, blow them away.
It looks like using the Profiler is going to work. Once I've let it run for a while, I should have a good list of used tables. Anyone who doesn't use their tables every day can probably wait for them to be restored from backup. Thanks, folks.
Probably too late to help mogrify, but for anybody doing a search; I would search for all objects using this object in my code, then in SQL Server by running this :
select distinct '[' + object_name(id) + ']'
from syscomments
where text like '%MY_TABLE_NAME%'