Is there an efficient way to delete every view/function/table/sp from a database? - db2-luw

In a DB2 federated database (based on remote servers and nicknames), I need to clean up the model and recreate it from another database. I need to delete every database object except those servers and nicknames.
I know how to retrieve the list of objects from the SYSCAT schema. Now I need to run the DROP statements on each. Obviously the dependencies will get in the way.
The brute force approach would be to run the DROPs in a loop until all have succeeded, but depending on the order (lucky or not), it could take a very long time.
Would you know a way to efficiently order the DROP statements so that the total deletion time is as short as possible?
A perfect solution is not expected. A reasonably clever solution is good enough.
Thank you

You might want to look at the references of each table (which you can do with syscat.references according to http://www.ibm.com/developerworks/data/library/techarticle/dm-0401melnyk/) and build a tree of the dependencies yourself (should be doable e.g. with temporary tables, if you are restricted to SQL only). Then you can drop from the bottom of that tree.
So, basically, my answer to your question would be that in order to do it quickly, just order the tables based on the references they have between themselves before deleting. Since there should not be any dependency cycles, you should always be able to pick a table which is not referenced. Drop it and repeat.
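If this were DB2, a rough (untested) sketch of "pick a table that nothing else references", using SYSCAT.REFERENCES, could look like this:
-- tables that no other table references through a foreign key; drop these first, then re-run
SELECT T.TABSCHEMA, T.TABNAME
FROM SYSCAT.TABLES T
WHERE T.TYPE = 'T'
  AND T.TABSCHEMA NOT LIKE 'SYS%'
  AND NOT EXISTS (SELECT 1
                  FROM SYSCAT.REFERENCES R
                  WHERE R.REFTABSCHEMA = T.TABSCHEMA
                    AND R.REFTABNAME = T.TABNAME
                    AND (R.TABSCHEMA <> T.TABSCHEMA OR R.TABNAME <> T.TABNAME))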
You might also wish to see this (similar?) question: DB2 cascade delete command? in case you want to delete the data first.
If I am wrong at some point, please correct. This answer is based on my experiences with other databases, therefore it might not be fully suitable for DB2. Although it should work ;)

This query is able to order the statements according to the total number of elements they depend on. The resulting order works almost without a glitch: the second pass of the "brute force" approach contains only a handful of objects (out of several thousand objects to delete).
The problem is that it is very slow...
EDIT: There was a typo in the query that made it return more or less correct data, but very, very slowly.
WITH FIRST_LEVEL_DEPENDENCIES (BSCHEMA, BNAME, DTYPE, DSCHEMA, DNAME) AS
(
    SELECT T1.TABSCHEMA AS BSCHEMA, T1.TABNAME AS BNAME, T1.BTYPE, T1.BSCHEMA, T1.BNAME
    FROM SYSCAT.TABDEP T1
    WHERE T1.TABSCHEMA NOT LIKE 'SYS%'
      AND T1.BTYPE <> 'N'
    UNION ALL
    SELECT T1.ROUTINESCHEMA AS BSCHEMA, T1.SPECIFICNAME AS BNAME, T1.BTYPE, T1.BSCHEMA, T1.BNAME
    FROM SYSCAT.ROUTINEDEP T1
    WHERE T1.ROUTINESCHEMA NOT LIKE 'SYS%'
      AND T1.BTYPE <> 'N'
    UNION ALL
    SELECT T1.TABSCHEMA AS BSCHEMA, T1.TABNAME AS BNAME, 'T', T1.REFTABSCHEMA, T1.REFTABNAME
    FROM SYSCAT.REFERENCES T1
    WHERE T1.TABSCHEMA NOT LIKE 'SYS%'
),
RECURSIVE_DEPENDENCIES (LEVEL, BSCHEMA, BNAME, DTYPE, DSCHEMA, DNAME) AS
(
    SELECT 1, U.BSCHEMA, U.BNAME, U.DTYPE, U.DSCHEMA, U.DNAME
    FROM FIRST_LEVEL_DEPENDENCIES AS U
    UNION ALL
    SELECT LEVEL + 1, REC.BSCHEMA, REC.BNAME, U.DTYPE, U.DSCHEMA, U.DNAME
    FROM RECURSIVE_DEPENDENCIES REC,
         FIRST_LEVEL_DEPENDENCIES U
    WHERE LEVEL < 6
      AND U.BSCHEMA = REC.DSCHEMA
      AND U.BNAME = REC.DNAME
)
SELECT BSCHEMA, BNAME, COUNT(*)
FROM RECURSIVE_DEPENDENCIES
GROUP BY BSCHEMA, BNAME
ORDER BY COUNT(*)
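To actually run the cleanup, the ordered list still has to be turned into DROP statements. A cruder, untested sketch that only counts direct dependencies from SYSCAT.TABDEP and covers tables and views (routines would need a similar pass over SYSCAT.ROUTINES):
-- generate DROP statements, objects with the fewest dependencies first;
-- nicknames (TYPE 'N') are left out so the federated objects are kept
SELECT 'DROP ' ||
       CASE T.TYPE WHEN 'V' THEN 'VIEW ' ELSE 'TABLE ' END ||
       RTRIM(T.TABSCHEMA) || '.' || RTRIM(T.TABNAME) AS DROP_STMT
FROM SYSCAT.TABLES T
LEFT OUTER JOIN (SELECT TABSCHEMA, TABNAME, COUNT(*) AS DEPS
                 FROM SYSCAT.TABDEP
                 GROUP BY TABSCHEMA, TABNAME) D
  ON D.TABSCHEMA = T.TABSCHEMA
 AND D.TABNAME = T.TABNAME
WHERE T.TABSCHEMA NOT LIKE 'SYS%'
  AND T.TYPE IN ('T', 'V')
ORDER BY COALESCE(D.DEPS, 0)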

I don't have a DIRECT solution for DB2, but I can suggest this:
A) In Microsoft SQL Server 2008, the problem of generating DELETE statements (not DROP) respecting foreign key order has been solved at this link:
Generate Delete Statement From Foreign Key Relationships in SQL 2008?
B) In Oracle PL/SQL, the problem of generating DELETE statements (not DROP) respecting foreign key order has been solved at this link:
How to generate DELETE statements in PL/SQL, based on the tables FK relations?
I think you can adapt one of these two scripts in order to obtain a solution for DB2.
Do you agree or not?
EDIT 1: At this link:
http://bytes.com/topic/db2/answers/183189-how-delete-tables-completely
I can read:
Robert,
why not simply
LOAD FROM /dev/null of del replace into tablename NONRECOVERABLE
- This truncates the table very quickly; not sure if
it reclaims space or updates stats by default?
This has the added advantage that you don't have to
perform the deletes in the correct RI order
(though you will have to do a SET INTEGRITY afterwards).
OK
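For reference, the truncate-then-check sequence from that post would look roughly like this from the DB2 command line (schema and table names are placeholders):
-- empty the table quickly, then take it out of set-integrity-pending state
LOAD FROM /dev/null OF DEL REPLACE INTO myschema.mytable NONRECOVERABLE
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED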
EDIT 2: Please see the following:
Dropping a schema and all of its contents in DB2 8.x

Related

Store results of SQL Server query for pagination

In my database I have a table with a rather large data set that users can perform searches on. So for the following table structure for the Person table that contains about 250,000 records:
firstName | lastName | age
----------|----------|----
John      | Doe      | 25
John      | Sams     | 15
the users would be able to perform a query that can return about 500 or so results. What I would like to do is allow the user to see his search results 50 at a time using pagination. I've figured out the client-side pagination stuff, but I need somewhere to store the query results so that the pagination uses the results from his unique query and not from a SELECT * statement.
Can anyone provide some guidance on the best way to achieve this? Thanks.
Side note: I've been trying to use temp tables to do this by using the SELECT INTO statements, but I think that might cause some problems if, say, User A performs a search and his results are stored in the temp table then User B performs a search shortly after and User A's search results are overwritten.
In SQL Server the ROW_NUMBER() function is great for pagination, and may be helpful depending on what parameters change between searches, for example if searches were just for different firstName values you could use:
;WITH search AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName
    FROM YourTable
)
SELECT *
FROM search
WHERE RN_firstName BETWEEN 51 AND 100
  AND firstName = 'John'
You could add additional ROW_NUMBER() lines, altering the PARTITION BY clause based on which fields are being searched.
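For example, to also support searches on lastName, a second numbering could be added like this (untested sketch; table and column names taken from the question):
;WITH search AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName,
           ROW_NUMBER() OVER (PARTITION BY lastName ORDER BY firstName) AS RN_lastName
    FROM Person
)
SELECT *
FROM search
WHERE lastName = 'Doe'
  AND RN_lastName BETWEEN 51 AND 100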
Historically, for us, the best way to manage this is to create a completely new table with a unique name. Then, when you're done, you can schedule the table for deletion.
The table, if practical, simply contains an index id (a simple sequence: 1, 2, 3, 4, 5) and the primary key to the table(s) that are part of the query - not the entire result set.
Your pagination logic then does something like:
SELECT p.* FROM temp_1234 t, primary_table p
WHERE t.pkey = p.primary_key
AND t.serial_id between 51 and 100
The serial id is your paging index.
So, you end up with something like (note, I'm not a SQL Server guy, so pardon):
CREATE TABLE temp_1234 (
    serial_id serial,
    pkey number
);
INSERT INTO temp_1234 (pkey)  -- let serial_id auto-populate
SELECT primary_key FROM primary_table WHERE <criteria> ORDER BY <sort>;
CREATE INDEX i_temp_1234 ON temp_1234(serial_id); -- I think sql already does this for you
If you can delay creating the index until after the insert, it's faster than creating it first, but the improvement is most likely marginal.
Also, create a tracking table where you insert the table name and the date. You can use this with a reaper process later (late at night) to DROP the day's tables (those more than, say, X hours old).
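A rough sketch of that tracking table and the reaper step (SQL Server flavour; all names invented for illustration):
CREATE TABLE temp_table_registry (
    table_name sysname NOT NULL,
    created_at datetime NOT NULL DEFAULT GETDATE()
);

-- when a search creates temp_1234:
INSERT INTO temp_table_registry (table_name) VALUES ('temp_1234');

-- nightly reaper: generate the DROP statements for stale tables, then execute them
SELECT 'DROP TABLE ' + table_name + ';'
FROM temp_table_registry
WHERE created_at < DATEADD(HOUR, -12, GETDATE());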
Full table operations (create and drop) are much cheaper than inserting and deleting rows in a single shared table:
INSERT INTO page_table SELECT 'temp_1234', <sequence>, primary_key...
DELETE FROM page_table WHERE page_id = 'temp_1234';
That's just awful.
First of all, make sure you really need to do this. You're adding significant complexity, so go and measure whether the queries and pagination really hurt, or whether you just "feel like you should". The pagination can be handled with ROW_NUMBER() quite easily.
Assuming you go ahead, once you've got your query you clearly need to build a cache, so first you need to identify what the key is. It will be the SQL statement or operation identifier (name of a stored procedure, perhaps) plus the criteria used. If you don't want to share between users, then the user name or some kind of session ID too.
Now when you run a query, you first look it up in this table with all the key data and then either:
a) You can't find it, so you run the query and add it to the cache, storing the criteria/keys and the data (or the PK of the data, depending on whether you want a snapshot or real time). Bear in mind that "real time" isn't really real time, because other users could be changing the data under you.
b) You find it, so retrieve the results (or join the PK to the underlying tables) and return them.
Of course now you need a background process to go and clean up the cache when it's been hanging around too long.
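A minimal sketch of such a cache table, keyed the way described above (names invented for illustration):
CREATE TABLE search_cache (
    cache_key  varchar(400) NOT NULL,  -- operation name + criteria + session id
    row_no     int          NOT NULL,  -- position of the row within the result
    person_id  int          NOT NULL,  -- PK back into the searched table
    created_at datetime     NOT NULL DEFAULT GETDATE(),
    PRIMARY KEY (cache_key, row_no)
);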
Like I said - you should really make sure you need to do this before you embark on it. In the example you give I don't think it's worth it.

TSQL: Best way to get data from temp/scratch table to normalized version?

I'm working with relatively large data sets; ~200GB. The data is coming from text files that are being imported into SQL via a script. They are being bulk-copied into a temp table, with the normalized tables waiting to receive the data.
My question comes from the fact that I'm mostly a scripter, so my approach would be to loop through each row and do individual checks per row to put the data where it needs to go, but I read a different post on SO saying that's really the wrong approach for SQL.
So my question is, if I have one temp table (31 columns) that is to be normalized between 5 others, what's the best way to go about this?
Table relationship is as follows:
System - Table that contains machine information (e.g. name, domain, etc.)
File - File information (e.g. name, size, directory, etc.)
SystemFile - The many-to-many system<->file relationship table.
Metadata - File metadata (language, etc.) - has foreign key relationship to file primary key
DigitalSignature - File digital signature status - has foreign key relationship to file primary key
Thanks
I don't have any links, and I don't have enough experience with things like SSIS etc. to give a balanced view, but when doing the task you are talking about my normal process would be (generic, simple version):
1. Look at the normalised data set and consider the least dependent components in the data being imported (e.g. order headers are created before order items).
2. Create queries that select out the data I will have. These often have this form:
select
t.x,t.y,t.z
from
temp_table as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
Here temp_table may have lots of columns, but these three represent whatever normalised nugget I want to add first; the left outer join and the WHERE ... IS NULL make sure I only get the new values (if merging, it's the same).
Verify that I am getting good information and that I am only getting the new rows I want. Often you have to use GROUP BYs or DISTINCTs on the temp data to get accurate data for inserting, something like:
select
t.x,t.y,t.z
from
(select
distinct x,y,z
from
temp_table ) as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
3. Wrap that select in an insert:
insert into
normalise_table (x,y,z)
select
t.x,t.y,t.z
from
(select
distinct x,y,z
from
temp_table ) as t
left outer join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
where
n.x is null
In this way you are inserting sets of data; the procedural part is doing this for each set to be inserted, but in general you are not iterating over rows.
BTW, T-SQL has a MERGE command for when you may or may not already have the data in the target table (and if you want to remove keys missing from the temp tables):
http://msdn.microsoft.com/en-us/library/bb510625.aspx
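A small, untested sketch of what that could look like for the x/y/z example above:
MERGE normalise_table AS n
USING (SELECT DISTINCT x, y, z FROM temp_table) AS t
    ON n.x = t.x AND n.y = t.y AND n.z = t.z
WHEN NOT MATCHED BY TARGET THEN
    INSERT (x, y, z) VALUES (t.x, t.y, t.z);
-- add WHEN NOT MATCHED BY SOURCE THEN DELETE if you also want to remove keys missing from the temp table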
Some comments on foreign keys - these tend to be more specific to the situation:
Can you identify the relationship without the primary key? This is the easiest situation to deal with.
Imagine I have inserted my xyz object into a normalised table, but it has 100 child rows (abc's) in another table (each child may have 100 children too; this would mean 10,000 rows in the de-normalised data for one xyz).
You would have to go through the same validation as before, but your final query may look something like:
insert into
normalise_table_2 (parentID,a,b,c)
select
n.id,t.a,t.b,t.c
from
(select
distinct x,y,z,a,b,c
from
temp_table ) as t
inner join normalise_table as n
on t.x=n.x
and t.y=n.y
and t.z=n.z
left outer join normalise_table_2 as n2
on n.id = n2.parentID
and t.a = n2.a
and t.b = n2.b
and t.c = n2.c
where
n2.a is null
or maybe a more readable way:
insert into normalise_table_2 (parentID,a,b,c)
select
*
from (
select distinct
n.id,t.a,t.b,t.c
from
normalise_table as n
inner join temp_table as t
on t.x = n.x
and t.y = n.y
and t.z = n.z
left outer join normalise_table_2 as n2
on t.a = n2.a
and t.b = n2.b
and t.c = n2.c
and n2.parentID = n.id
where
n2.id is null
) as x
If you are having trouble identifying the row without the id, here are some points to consider:
I often give a unique id to every row in the de-normalised/import data; this makes it easier to track what has and has not been done, not to mention paying off in other ways (e.g. when the source data has blanks that are meant to be the same as the row above). A sketch of this follows this list.
I have created temp tables to track relationships like this as I go along.
Sometimes (especially for less consistent data) these are not temp tables, as they can be used after the fact to analyse what did and didn't import (and where it went); sometimes I have a comments column that the update queries populate with any details about exceptions relating to the import of that row.
Sometimes you are lucky and there is some kind of source or oldId field in the target that can be used to link the de-normalised data and the normalised version (this is particularly true of system-migration-type tasks, as people often want to be able to look up items in the old system). Sometimes this can be weird and wonderful - e.g. using the updated-by or created-by field and looking for a special account that executes this particular process (though I would not particularly recommend that).
Sometimes it makes sense to update the source tables in some way, e.g. replacing identifiers there.
Sometimes you come up with ID ranges or similar that are used for import; you break the normal rules about where IDs are generated and your import process creates the ID.
This often means shutting down all other access to the target system while the import is executed. It may sound mad, but sometimes this is the best way for very complex uploads that require a lot of preparation.
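To the first point in this list, a rough T-SQL sketch of tagging every imported row and tracking what happened to it (names invented for illustration):
-- give every staged row its own id
ALTER TABLE temp_table ADD import_row_id int IDENTITY(1,1);

-- optional audit of where each row went and any exceptions
CREATE TABLE import_audit (
    import_row_id int NOT NULL,
    target_table  sysname NULL,
    target_id     int NULL,
    comments      varchar(500) NULL
);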
But often, when you think about it, there is a particular order you can add your data in to avoid this issue, as you will always be able to identify the correct data. I have used the above techniques to make my life easier, but I am not sure I have ever HAD to use them.
The only exception I can think of is generating IDs outside of the system, which I have had to use, but this was so that IDs would be consistent across multiple trial loads and the final production load. Also, data was coming from many sources with many people working on it; it made life easier that they could be in control of their own IDs - but it did bring other issues ;).
Generally I would try to leave the source data alone and ensure that if you re-run any of your scripts they won't have any effect. This makes the whole system much more robust and gives everyone more confidence, as you can re-import the same data or a file that has some of the same data and run everything again and nothing breaks.
Note: I have not tested any of these queries and just wrote them off the top of my head, so sorry if they are not totally accurate.

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
First off, I am not allowed to post images to the board yet since I'm a new user, so see the appropriate links below.
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an SQL query that will renumber the column I specify. I prefer not to drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
  on ItemTicket.item_ticket_id = ItemVoid.item_ticket_id
where ItemTicket.ID in (select ID from results)
Example Tables Before:
Example Tables After:
As you can see, 2 rows were deleted from both tables based on the ID column. So now I have to figure out how to renumber the item_ticket_id and the item_void_id columns so that the higher numbers decrease to fill the missing values, the next highest decreases in turn, and so on. Problem #2: if an item_ticket_id changes in order to stay sequential in ItemTicket, then that change also has to be applied to ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(Answering an old question as it's the first search result when I was looking this up.)
(MS T-SQL)
Resequencing an ID column (not an identity one) that has gaps can be performed using only a simple CTE with ROW_NUMBER() to generate a new sequence.
The UPDATE works via the CTE 'virtual table' without any extra problems, actually updating the underlying original table.
Don't worry about the ID values clashing during the update; if you wonder what happens when IDs are set that already exist, it doesn't suffer that problem - the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
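The question's "problem #2" (carrying the change into ItemVoid.item_ticket_id) is not covered by the CTE above. A hedged sketch, assuming the foreign key can be temporarily disabled while both tables are updated from the same old-to-new mapping:
-- build the old -> new mapping once
SELECT item_ticket_id AS OldID,
       ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS NewID
INTO #TicketMap
FROM ItemTicket;

-- with the FK disabled (e.g. ALTER TABLE ... NOCHECK CONSTRAINT), apply the mapping to both tables
UPDATE it SET it.item_ticket_id = m.NewID
FROM ItemTicket it JOIN #TicketMap m ON it.item_ticket_id = m.OldID;

UPDATE iv SET iv.item_ticket_id = m.NewID
FROM ItemVoid iv JOIN #TicketMap m ON iv.item_ticket_id = m.OldID;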
Since you are looking for advice on this, my advice is you need to redesign this as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining records, use a bit flag that marks the records as inactive. Then when you query the records, just include a WHERE clause to only include the records that are active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
create a new table
Insert your original data into your new table using the new numbers
drop your old table
rename your new table with the corrected numbers
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET InActive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function. In standard SQL (SQL Server, MySQL), the function is row_number(). You use it as follows:
select row_number() over (order by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you against using cursors, since they have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.
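A minimal sketch of that delete-then-renumber sequence in one transaction, using the table names from the question (assumes item_ticket_id is not an identity column; the FK from ItemVoid would still need the same renumbering applied, as in the mapping sketch in the earlier answer):
BEGIN TRANSACTION;

DELETE it
FROM ItemTicket it
WHERE it.ID IN (SELECT ID FROM results);

WITH Renumbered AS
(
    SELECT item_ticket_id,
           ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS NewID
    FROM ItemTicket
)
UPDATE Renumbered SET item_ticket_id = NewID;

COMMIT TRANSACTION;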

Performance: Subquery or Joining

I've got a little question about the performance of a subquery vs. joining another table.
INSERT
INTO Original.Person
(
PID, Name, Surname, SID
)
(
SELECT ma.PID_new , TBL.Name , ma.Surname, TBL.SID
FROM Copy.Person TBL , original.MATabelle MA
WHERE TBL.PID = p_PID_old
AND TBL.PID = MA.PID_old
);
This is my SQL, now this thing runs around 1 million times or more.
My question is what would be faster?
If I change TBL.SID to (Select new from helptable where old = tbl.sid)
OR
If I add the 'HelpTable' to the FROM and do the joining in the WHERE?
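For reference, a sketch of the second (join) variant, keeping the comma-join style of the original statement and assuming HelpTable has the columns old and new mentioned above:
INSERT
INTO Original.Person
(
    PID, Name, Surname, SID
)
(
    SELECT ma.PID_new, TBL.Name, ma.Surname, ht.new
    FROM Copy.Person TBL, original.MATabelle MA, HelpTable ht
    WHERE TBL.PID = p_PID_old
      AND TBL.PID = MA.PID_old
      AND ht.old = TBL.SID
);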
edit1
Well, this script runs only as many times as there are persons.
My program has 2 modules: one that populates MATabelle and one that transfers data. The program merges 2 databases together, and because of this, sometimes the same key is used.
Now I'm working on a solution so that no duplicate keys exist.
My solution is to make a 'HelpTable'. The owner of the key (SID) generates a new key and writes it into the 'HelpTable'. All other tables that use this key can read it from the 'HelpTable'.
edit2
Something just came to mind:
if a table has a key that can be null (a foreign key that is not linked),
then this won't work with the join in the FROM, will it?
Modern RDBMSs, including Oracle, optimize most joins and subqueries down to the same execution plan.
Therefore, I would go ahead and write your query in the way that is simplest for you and focus on ensuring that you've fully optimized your indexes.
If you provide your final query and your database schema, we might be able to offer detailed suggestions, including information regarding potential locking issues.
Edit
Here are some general tips that apply to your query:
For joins, ensure that you have an index on the columns that you are joining on. Be sure to apply an index to the joined columns in both tables. You might think you only need the index in one direction, but you should index both, since sometimes the database determines that it's better to join in the opposite direction (an applied sketch follows these tips).
For WHERE clauses, ensure that you have indexes on the columns mentioned in the WHERE.
For inserting many rows, it's best if you can insert them all in a single query.
For inserting on a table with a clustered index, it's best if you insert with incremental values for the clustered index so that the new rows are appended to the end of the data. This avoids rebuilding the index and often avoids locks on the existing records, which would slow down SELECT queries against existing rows. Basically, inserts become less painful to other users of the system.
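Applied to the statement in the question, the join-column tip might look like this untested sketch (index names invented):
CREATE INDEX ix_matabelle_pid_old ON original.MATabelle (PID_old);
CREATE INDEX ix_copy_person_pid ON Copy.Person (PID);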
Joining would be much faster than a subquery.
The main difference between a subquery and a join is:
a subquery is faster when we have to retrieve data from a large number of tables, because it becomes tedious to join more tables;
a join is faster at retrieving data from the database when we have a smaller number of tables.
Also, this joins vs subquery question can give you some more info.
Instead of focussing on whether to use a join or a subquery, I would focus on the necessity of doing 1,000,000 executions of that particular insert statement, especially as Oracle's optimizer - as Marcus Adams already pointed out - will optimize and rewrite your statements under the covers into their most optimal form.
Are you populating MaTabelle 1,000,000 times with only a few rows and issuing that statement each time? If yes, then the answer is to do it in one shot. Can you provide some more information on your process that is executing this statement so many times?
EDIT: You indicate that this insert statement is executed for every person. In that case the advice is to populate MATabelle first and then execute once:
INSERT
INTO Original.Person
(
PID, Name, Surname, SID
)
(
SELECT ma.PID_new , TBL.Name , ma.Surname, TBL.SID
FROM Copy.Person TBL , original.MATabelle MA
WHERE TBL.PID = MA.PID_old
);
Regards,
Rob.

MySQL - Selecting data from multiple tables all with same structure but different data

Ok, here is my dilemma: I have a database set up with about 5 tables, all with the exact same data structure. The data is separated in this manner for localization purposes and to split up a total of about 4.5 million records.
A majority of the time only one table is needed and all is well. However, sometimes data is needed from 2 or more of the tables and it needs to be sorted by a user defined column. This is where I am having problems.
data columns:
id, band_name, song_name, album_name, genre
MySQL statement:
SELECT * from us_music, de_music where `genre` = 'punk'
MySQL spits out this error:
#1052 - Column 'genre' in where clause is ambiguous
Obviously, I am doing this wrong. Anyone care to shed some light on this for me?
I think you're looking for the UNION clause, a la
(SELECT * from us_music where `genre` = 'punk')
UNION
(SELECT * from de_music where `genre` = 'punk')
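Since the question also mentions sorting by a user-defined column, note that the ORDER BY goes after the whole UNION (band_name is just an example sort column from the question's column list):
(SELECT * FROM us_music WHERE `genre` = 'punk')
UNION
(SELECT * FROM de_music WHERE `genre` = 'punk')
ORDER BY band_name;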
It sounds like you'd be happier with a single table. The five tables having the same schema, and sometimes needing to be presented as if they came from one table, point to putting it all in one table.
Add a new column which can be used to distinguish among the five languages (I'm assuming it's language that is different among the tables since you said it was for localization). Don't worry about having 4.5 million records. Any real database can handle that size no problem. Add the correct indexes, and you'll have no trouble dealing with them as a single table.
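A rough sketch of what that single table could look like (column types guessed; the locale values are only examples):
CREATE TABLE music (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    locale     CHAR(2) NOT NULL,       -- e.g. 'us', 'de'
    band_name  VARCHAR(255),
    song_name  VARCHAR(255),
    album_name VARCHAR(255),
    genre      VARCHAR(64),
    KEY idx_genre_locale (genre, locale)
);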
Any of the above answers are valid, or an alternative way is to expand the column reference to include the table name as well - e.g.:
SELECT * from us_music, de_music where us_music.genre = 'punk' AND de_music.genre = 'punk'
The column is ambiguous because it appears in both tables; you would need to specify the WHERE (or sort) field fully, such as us_music.genre or de_music.genre, but you'd usually only specify two tables if you were then going to join them together in some fashion. The structure you're dealing with is occasionally referred to as a partitioned table, although it's usually done to separate the dataset into distinct files as well, rather than to just split the dataset arbitrarily. If you're in charge of the database structure and there's no good reason to partition the data, then I'd build one big table with an extra "origin" field that contains a country code, but you're probably doing it for a legitimate performance reason.
Either use a UNION to combine the tables you're interested in (http://dev.mysql.com/doc/refman/5.0/en/union.html) or use the MERGE storage engine (http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html).
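A rough sketch of the MERGE storage engine route (the underlying tables must all be MyISAM with identical structure; column types guessed):
CREATE TABLE all_music (
    id         INT,
    band_name  VARCHAR(255),
    song_name  VARCHAR(255),
    album_name VARCHAR(255),
    genre      VARCHAR(64)
) ENGINE=MRG_MyISAM UNION=(us_music, de_music) INSERT_METHOD=LAST;

SELECT * FROM all_music WHERE genre = 'punk';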
Your original attempt to span both tables creates an implicit JOIN. This is frowned upon by most experienced SQL programmers because it separates the tables to be combined from the condition of how to combine them.
The UNION is a good solution for the tables as they are, but there should be no reason they can't be put into the one table with decent indexing. I've seen adding the correct index to a large table increase query speed by three orders of magnitude.
The UNION statement can take a great deal of time on huge data sets. It is good to perform the select in 2 steps:
select the ids first,
then select from the main table with them.