How to trace row level dependency? - sql

In case I need to change the PK of a single row from 1 to 10, for example, is there any way to trace every proc, view and function that might reference the old value?
I mean, a simple select in a proc like: select * from table where FK = 1 would break, and I'd have to look for every reference to the old value in every proc and view and change it to 10 to get the system working.
Is there any automatic way of doing this? I use SQL Server.

I suspect that the only way to do this correctly involves querying the database metadata to identify all the places that use your PK as an FK, in a proc, or in a view. This is likely to be complex, fragile, and prone to error.
This is one of the (many) reasons to avoid having the PK be anything other than a system-derived, meaningless value that is not open to manipulation by (even) the creator/administrator. Also, under what circumstances would you have a PK hard-coded in a proc or function? That is again a potential source of fragility in your system.
If a PK is created that is incorrect (by whatever criteria) or that needs to be changed, create a new record and copy the existing values into it. While this does not answer your query directly, your routines to delete or modify values in the table need to know how and where the table is used, so a routine to copy a row should be able to access the same information.
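As a starting point for that metadata search, here is a rough sketch against SQL Server's sys.sql_modules catalog view; the LIKE pattern ('= 1') stands in for the hard-coded value and will also match unrelated literals, so the hits still need manual review:
-- find procs, views and functions whose definition contains the literal
SELECT o.name, o.type_desc
FROM sys.sql_modules AS m
JOIN sys.objects AS o ON o.object_id = m.object_id
WHERE o.type IN ('P', 'V', 'FN', 'IF', 'TF')   -- procs, views, scalar/inline/table functions
  AND m.definition LIKE '%= 1%';               -- hypothetical hard-coded PK value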

Are there any shortcuts for creating a table variable to match the columns in a view?

I frequently need to create table variables within procedures, to store the top 25 rows from a view for processing.
I need to store these small batches in variables temporarily, because I'm performing numerous operations that modify data within the underlying tables, and some of these operations cause the rows to no longer appear within the view itself based on the view criteria (this is by design).
I need to keep the data around for the entire processing session, and I can't rely on the view itself to remain consistent through the operation.
The problem is, since we're doing this in many places across multiple databases, if we ever change the columns in any of our views, the code becomes somewhat bug-prone: we also have to remember to modify the relevant table types, without making any typos or mistakes, or overlooking anything.
So my question is, can we just declare table variables (or table types, if necessary) by just stating "Match the current columns in this view?"
That would make things much easier, since it would automatically keep all relevant table variables in sync with the current layout of the views in question, and eliminate the headache that comes with trying to keep it all straight manually.
If no such shortcut exists, then I guess we'll just have to create custom table types matching our views as needed, to at least keep it as centralized as possible.
If the table variable can be replaced by a temporary table, something like:
SELECT * INTO #TempTable FROM myView
will do the job perfectly.
With SELECT INTO, the table is created with the columns and metadata available in your SELECT statement.
Hope this helps.
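If you want the batch loaded separately from the table creation, a variation on the same idea is sketched below; #Batch is a made-up name and the ORDER BY column is only a placeholder, since TOP without ORDER BY is non-deterministic:
-- create an empty temp table whose columns always match the current view
SELECT * INTO #Batch FROM myView WHERE 1 = 0;

-- then load the next batch of 25 rows for processing
INSERT INTO #Batch
SELECT TOP (25) * FROM myView
ORDER BY SomeQueueColumn;   -- placeholder ordering column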

Quick way to reset all column values to a default

I'm converting data from one schema to another. Each table in the source schema has a 'status' column (default NULL). When a record has been converted, I update the status column to 1. Afterwards, I can report on the # of records that are (not) converted.
While the conversion routines are still under development, I'd like to be able to quickly reset all values for status to NULL again.
An UPDATE statement on the tables is too slow (there are too many records). Does anyone know a fast alternative way to accomplish this?
The fastest way to reset a column would be to SET UNUSED the column, then add a column with the same name and datatype.
This will be the fastest way since neither operation touches the actual table data (they are dictionary-only updates).
As in Nivas' answer, the ordering of the columns will change (the reset column becomes the last column). If your code relies on the ordering of the columns (it should not!), you can create a view that presents the columns in the right order (rename the table, create a view with the same name as the old table, revoke grants from the base table, add grants to the view).
The SET UNUSED method will not reclaim the space used by the column (whereas dropping the column will free space in each block).
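A minimal sketch of that DDL, assuming an Oracle table called mytable and a numeric status column:
ALTER TABLE mytable SET UNUSED (status);   -- dictionary-only change, row data untouched
ALTER TABLE mytable ADD (status NUMBER);   -- new column is NULL for every existing row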
If the column is nullable (since default is NULL, I think this is the case), drop and add the column again?
While the conversion routines are still under development, I'd like to be able to quickly reset all values for status to NULL again.
If you are in development why do you need 70 million records? Why not develop against a subset of the data?
Have you tried using flashback table?
For example:
select current_scn from v$database;
-- 5607722
-- do a bunch of work
flashback table TABLE_NAME to scn 5607722;
What this does is ensure that the table you are working on is IDENTICAL each time you run your tests. Of course, you need to ensure you have sufficient UNDO to hold your changes.
Hm, maybe add an index to the status column.
Or alternatively, add a new table with only the primary key in it. Then insert into that table when the record is converted, and TRUNCATE that table to reset...
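A rough sketch of that tracking-table idea (Oracle-style syntax to match the rest of the thread; the table name and bind variable are made up):
CREATE TABLE converted_rows (id NUMBER PRIMARY KEY);

-- record a converted row instead of updating a status column
INSERT INTO converted_rows (id) VALUES (:converted_id);

-- reset between test runs; TRUNCATE deallocates rather than deletes, so it is fast
TRUNCATE TABLE converted_rows;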
I like some of the other answers, but I just read in a tuning book that for several reasons it's often quicker to recreate the table than to do massive updates on the table. In this case, it seems ideal, since you would be writing the CREATE TABLE X AS SELECT with hopefully very few columns.
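For reference, such a rebuild might look like the sketch below; the table and column names are invented, and indexes, constraints and grants would still have to be recreated afterwards:
CREATE TABLE mytable_reset AS
SELECT col1, col2, CAST(NULL AS NUMBER) AS status
FROM mytable;

-- swap the tables once the copy has been verified
RENAME mytable TO mytable_old;
RENAME mytable_reset TO mytable;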

Using Trigger to get ID on Insert - SQL 2005

I have a table (table_a) that, upon insert, needs to retrieve the next available id from the available_id field in another table (table_b) to use as the primary key in table_a, and then increment the available_id field in table_b by 1. While doing this via stored procedures is easy, I need to be able to have this occur on any insert into the table.
I know I need to use triggers, but I am unsure how to code this. Any advice?
Basically this is my dilemma:
I need to ensure two different tables have unique IDs throughout. What would be the best way to do this without using GUIDs? (Some of this code cannot be controlled on our end and requires ints as IDs.)
My advice is DON'T! Use an identity field instead.
In the first place, inserts can affect multiple records, so a trigger to do this properly would have to account for that, which makes it rather tricky to write. It would have to be an INSTEAD OF trigger, which is also tricky, as you wouldn't have one of the required values (I assume your ID field is required) in the initial insert. In the second place, two inserts going on at the same time could try to pick the same number, or one could lock the second connection for a good while if you are doing a large data import on one connection.
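If the real requirement is only that the two tables never share an ID and the IDs must stay ints, one well-known option is to give each table an identity with a different seed and an increment of 2, so one produces odd values and the other even values. A sketch with made-up table and column names:
CREATE TABLE orders   (id INT IDENTITY(1, 2) PRIMARY KEY, payload VARCHAR(50));  -- generates 1, 3, 5, ...
CREATE TABLE invoices (id INT IDENTITY(2, 2) PRIMARY KEY, payload VARCHAR(50));  -- generates 2, 4, 6, ...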
You could use an Oracle-style sequence, described here, calling it either via a trigger or from your application (providing the resulting value to your insert routine):
http://www.sqlteam.com/article/custom-auto-generated-sequences-with-sql-server
He mentions these issues to consider:
• What if two processes attempt to add a row to the table at the exact same time? Can you ensure that the same value is not generated for both processes?
• There can be overhead querying the existing data each time you'd like to insert new data.
• Unless this is implemented as a trigger, this means that all inserts to your data must always go through the same stored procedure that calculates these sequences. This means that bulk imports, or moving data from production to testing and so on, might not be possible or might be very inefficient.
• If it is implemented as a trigger, will it work for a set-based multi-row INSERT statement? If so, how efficient will it be? This function wouldn't work if called for each row in a single set-based INSERT -- each NextCustomerNumber() returned would be the same value.
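For completeness, the usual T-SQL pattern for the "number table" approach reads and increments the value in a single UPDATE, so two sessions cannot receive the same number. A rough sketch using the question's table_b, treating available_id as the last id handed out (the procedure name is made up, and it would still have to be called from an INSTEAD OF trigger or from every insert routine):
CREATE PROCEDURE dbo.GetNextAvailableId
    @next_id INT OUTPUT
AS
BEGIN
    -- bump the counter and capture the new value in one atomic statement
    UPDATE table_b
    SET @next_id = available_id = available_id + 1;
END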

When to Create, When to Modify a Table?

I wanted to know what I should consider when deciding whether to create a new table or modify an existing table in a SQL DB. I use both MySQL and SQLite.
-Edit- I always thought that if I can put a column into a table where it makes sense and can be used by every row, then I would always modify the existing table. However, at work, if it's for a different 'release' we put it in a different table.
You can modify existing tables, as long as:
1. you are keeping the database normalized
2. you are not breaking code that uses the table
You can create new tables even if 1. and 2. are true, for the following reasons:
• Performance reasons
• Clarity in your schema logic
Not sure if I'm understanding your question correctly, but one thing I always try to consider is the impact on existing data.
Taking the case of an application which relies on a database...
When you update the application (including database schema updates), it is important to ensure that any existing, in-use databases will be either backwards compatible with the application, or there is way to migrate and update the existing database.
Generally if the data is in a one-to-one relationship with the existing data in the table and if the table row size is not too large already and if there aren't too many records in the table, then I usually alter the table to accept the new column.
However, suppose I want to add a column with a default value to a table where it doesn't exist. Adding it to the table with 50 million records might not be so speedy a process and it might lock up the table on production when we move the change up. In this case, putting it into a separate table and adding the records to it may work out better. In general, I wouldn't do this unless my testing has shown that adding and populating the column will take an unacceptably long time. I would prefer to keep the record together where possible.
The same goes for the overall record size. SQL Server has a limit on the number of bytes that can be stored in a record; it will allow you to create a structure that is potentially larger than that, but it will not allow you to put more than the byte limit into a specific record. Further, narrower tables tend to be faster to access due to how they are stored. Frequently, people will create a second table in a one-to-one relationship (we call them extended tables in our structure) for additional columns that are not as frequently used. If the fields from both tables will be used frequently, they often still create two tables but add a view that picks out all the columns needed.
And of course, if the data is in a one-to-many relationship, you need a related table, not just a new column.
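A bare-bones sketch of that extended-table pattern; every table and column name here is invented for illustration:
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE customer_extended (
    customer_id INT PRIMARY KEY,          -- one-to-one with customer
    notes       VARCHAR(2000),
    preferences VARCHAR(2000),
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);

-- optional view so callers that need both sets of columns see one "table"
CREATE VIEW customer_full AS
SELECT c.customer_id, c.name, e.notes, e.preferences
FROM customer AS c
LEFT JOIN customer_extended AS e ON e.customer_id = c.customer_id;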
Incidentally, you should always do an ALTER TABLE through a script rather than the SSMS GUI, as it is more efficient and easier to move to prod.

MySQL SELECT statement using Regex to recognise existing data

My web application parses data from an uploaded file and inserts it into a database table. Due to the nature of the input data (bank transaction data), duplicate data can exist from one upload to another. At the moment I'm using hideously inefficient code to check for the existence of duplicates by loading all rows within the date range from the DB into memory, and iterating over them and comparing each with the uploaded file data.
Needless to say, this can become very slow as the data set size increases.
So, I'm looking to replace this with a SQL query (against a MySQL database) which checks for the existence of duplicate data, e.g.
SELECT count(*) FROM transactions WHERE desc = ? AND dated_on = ? AND amount = ?
This works fine, but my real-world case is a little bit more complicated. The description of a transaction in the input data can sometimes contain erroneous punctuation (e.g. "BANK 12323 DESCRIPTION" can often be represented as "BANK.12323.DESCRIPTION") so our existing (in memory) matching logic performs a little cleaning on this description before we do a comparison.
Whilst this works in memory, my question is can this cleaning be done in a SQL statement so I can move this matching logic to the database, something like:
SELECT count(*) FROM transactions WHERE CLEAN_ME(desc) = ? AND dated_on = ? AND amount = ?
Where CLEAN_ME is a proc which strips the field of the erroneous data.
Obviously the cleanest (no pun intended!) solution would be to store the already cleaned data in the database (either in the same column, or in a separate column), but before I resort to that I thought I'd try and find out whether there's a cleverer way around this.
Thanks a lot
can this cleaning be done in a SQL statement
Yes, you can write a stored procedure to do it in the database layer:
mysql> CREATE FUNCTION clean_me (s VARCHAR(255))
-> RETURNS VARCHAR(255) DETERMINISTIC
-> RETURN REPLACE(s, '.', ' ');
mysql> SELECT clean_me('BANK.12323.DESCRIPTION');
BANK 12323 DESCRIPTION
This will perform very poorly across a large table though.
Obviously the cleanest (no pun intended!) solution would be to store the already cleaned data in the database (either in the same column, or in a separate column), but before I resort to that I thought I'd try and find out whether there's a cleverer way around this.
No, as far as databases are concerned the cleanest way is always the cleverest way (as long as performance isn't awful).
Do that, and add indexes to the columns you're doing bulk compares on, to improve performance. If it's actually intrinsic to the type of data that desc/dated-on/amount are always unique, then express that in the schema by making it a UNIQUE index constraint.
The easiest way to do that is to add a unique index on the appropriate columns and to use ON DUPLICATE KEY UPDATE. I would further recommend transforming the file into a CSV and loading it into a temporary table to get the most out of MySQL's built-in functions, which are surely faster than anything you could write yourself, considering that you would otherwise have to pull the data into your own application while MySQL does everything in place.
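A rough sketch of that flow follows; the staging table, index name and file path are made up, desc is back-quoted because it is a reserved word in MySQL, and the unique index assumes desc is a reasonably short VARCHAR:
ALTER TABLE transactions
    ADD UNIQUE INDEX ux_transactions_dedupe (`desc`, dated_on, amount);

CREATE TEMPORARY TABLE transactions_staging LIKE transactions;

LOAD DATA LOCAL INFILE '/tmp/upload.csv'
IGNORE INTO TABLE transactions_staging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
(`desc`, dated_on, amount);

INSERT INTO transactions (`desc`, dated_on, amount)
SELECT `desc`, dated_on, amount FROM transactions_staging
ON DUPLICATE KEY UPDATE amount = amount;   -- duplicates are effectively ignored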
The cleanest way is indeed to make sure only correct data is in the database.
In this example the "BANK.12323.DESCRIPTION" would be returned by:
SELECT count(*) FROM transactions
WHERE desc LIKE 'BANK%12323%DESCRIPTION' AND dated_on = ? AND amount = ?
But this might impose performance issues when you have a lot of data in the table.
Another way that you could do it is as follows:
Clean the description before inserting.
Create a primary key for the table that is a combination of the columns that uniquely identify the entry. It sounds like that might be the cleaned description, date and amount.
Use either the 'replace' or 'on duplicate key' syntax, whichever is more appropriate. 'replace' actually replaces the existing row in the db with the updated one when a unique key conflict occurs, e.g.:
REPLACE INTO transactions (`desc`, dated_on, amount) VALUES (?,?,?)
'on duplicate key' allows you to specify which columns to update on a duplicate key error:
INSERT INTO transactions (`desc`, dated_on, amount) VALUES (?,?,?)
ON DUPLICATE KEY UPDATE amount = amount
By using the multi-column primary key, you will gain a lot of performance since primary key lookups are usually quite fast.
If you prefer to keep your existing primary key, you could also create a unique index on those three columns.
Whichever way you choose, I would recommend cleaning the description before going into the db, even if you also store the original description and just use the cleaned one for indexing.
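If you do go the route of storing the cleaned value, one possible shape is sketched below; the column and index names are invented, and the cleaning is assumed to happen in the application before the INSERT, as recommended above:
ALTER TABLE transactions
    ADD COLUMN desc_clean VARCHAR(255) NOT NULL;

ALTER TABLE transactions
    ADD UNIQUE INDEX ux_transactions_clean (desc_clean, dated_on, amount);

-- the application cleans the description, then inserts; duplicates collapse into a no-op
INSERT INTO transactions (`desc`, desc_clean, dated_on, amount)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE amount = amount;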