Finding table and package references in Oracle SQL

I have a list of tables and a list of packages. I need to come up with the two lists below:
Which packages use the given set of tables
Which tables are referenced by each of the given packages
The packages use dynamic SQL, so I may not be able to rely solely on the DBA_DEPENDENCIES view.
The other way I can think of is a LIKE clause against the DBA_SOURCE view. But I would have to write an OR condition for each of the tables I need (or, of course, a function or procedure that loops through each table).
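For illustration, that scan could be driven from a list table instead of hand-written OR conditions (my_table_list is a hypothetical table holding the table names to search for):

select distinct s.name, s.type, t.table_name
from dba_source s
join my_table_list t
  on upper(s.text) like '%' || upper(t.table_name) || '%'
where s.type = 'PACKAGE BODY';

Note that DBA_SOURCE stores one row per source line, so a table name split across lines would still be missed.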
Is there any better way of doing this?
Any help is greatly appreciated.
Edit: rephrasing the question -
I have a package which selects/inserts/updates several tables, using dynamic SQL. One example is provided below.
I want to identify all the tables referred to in this package. What is the best way to achieve this?
In the example below I want to capture both table1 and table2.
if flag = 'Y' then
    final_sql := 'insert into table1 (...)';
else
    final_sql := 'insert into table2 (...)';
end if;
execute immediate final_sql;

For systems using a lot of dynamic SQL I suggest two approaches.
The first is to apply strict coding standards, so you know what to look for and can then reliably parse the table names out of the rest of the code. That is, always have the table name string assigned to a known variable name, and search for that variable.
This is not always easy to do, especially if you have mountains of code that do not follow the standard. It only takes a couple of folks not adhering to the standard and it all falls down. It can be made to work, but it is probably never going to be 100% reliable.
Another approach is to write test scripts that exercise the whole code base and all its logic paths. Write them in such a way that they log the procedure name. Enable SQL Trace and capture the trace files from the tests. With clever scripting you should be able to tie the trace back to the procedure. This gives you the "raw" SQL, which you can then grep for matches against your list of tables. You might be able to get the same information by harvesting V$SQL tied back to V$SESSION.
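As a rough sketch of the V$SQL harvesting idea (the table names here are just the placeholders from the question):

select s.sql_text, s.module, s.action, s.parsing_schema_name
from v$sql s
where upper(s.sql_text) like '%TABLE1%'
   or upper(s.sql_text) like '%TABLE2%';

If the code sets MODULE/ACTION via DBMS_APPLICATION_INFO, those columns help tie a statement back to the procedure that issued it.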
This is an old-school way of doing it, but one that I have used, and it works.
On one of the largest systems I worked on I wrote a CRUD parser which tokenised the code and produced a CRUD matrix by source file and table access. For dynamic SQL we processed SQL Trace/tkprof files.
If the code already has a good amount of debug output that dumps the table names, you could again run the test scripts and harvest the debug logs.

Related

Netezza - reuse common SQL code across multiple queries in a DRY manner?

Is it possible with Netezza to include SQL files (which contain specific SQL code) in other queries, or is that not the right way to use it?
Here is an example.
I have some common SQL code (let's say common.sql) which creates a temp table and needs to be used across multiple other queries (let's say analysis1.sql, analysis2.sql, etc.). From a code-management perspective it is quite overwhelming to maintain if the code in common.sql is repeated across the many other queries. Is there a DRY way to do this - something like #include <common.sql> from the other queries to call the reused code in common.sql?
Including SQL files is not the right way to do it. If you wish to persist with this approach, you could use a preprocessor like cpp (or even php) to assemble the files for you, with a build process to generate the finished queries.
However, from a maintainability perspective you are better off creating views and functions for reusable content. Note that these can pose optimization barriers, so writing out the large query directly is often still the way to go.
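For example, a minimal sketch of the view-based approach (object names invented for illustration):

create view common_base as
select customer_id, order_date, amount
from orders
where order_date >= date '2020-01-01';

-- analysis1.sql and analysis2.sql then both build on the view:
select customer_id, sum(amount) as total
from common_base
group by customer_id;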
I agree: views, functions (table-valued if needed) or, more likely, stored procedures are the way to go.
We have had a lot of luck letting stored procedures generate complex but repeatable code patterns on the fly, based on input parameters and metadata on the tables being processed.
An example: all tables have a 'unique constraint' (which is not really unique, but that doesn't matter since it isn't enforced in Netezza) with a fixed name of UBK_[tablename].
UBK is used as a 'signal' to the stored procedure, identifying the columns of the business key for a classic Kimball-style type 2 dimension table.
The SP can then apply the 'incoming' rows to the target table just by being supplied with the name of the target table and a 'stage' table containing all the same column names and data types.
Another example could be an SP that takes a table name and three arguments, each a 'string,of,columns': it does an 'Excel-style pivot' that groups by the columns in the first argument, uses a 'select distinct' on the second argument to generate new column names for the pivoted columns, and sums the column in the third argument, writing the result into a target table whose name you supply.
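To illustrate, the SQL such a pivot SP might generate could look roughly like this (table and column names invented, with 'A' and 'B' standing in for the distinct values found by the 'select distinct' step):

select region,
       sum(case when product = 'A' then amount else 0 end) as product_a,
       sum(case when product = 'B' then amount else 0 end) as product_b
from sales
group by region;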
Can you follow me?
I think the nzsql command-line tool may be able to do an 'include', but a combination of strong 'building block' stored procedures and perl/python and/or an ETL tool will most likely prove a better choice.

Can data and schema be changed with DB2/z load/unload?

I'm trying to find an efficient way to migrate tables with DB2 on the mainframe using JCL. When we update our application such that the schema changes, we need to migrate the database to match.
What we've been doing in the past is basically creating a new table, selecting from the old table into that, deleting the original and renaming the new table to the original name.
Needless to say, that's not a very high-performance solution when the tables are big (and some of them are very big).
With later versions of DB2, I know you can do simple things like altering column types, but we have migration jobs which need to do more complicated things to the data.
Consider for example the case where we want to combine two columns into one (firstname + lastname -> fullname). Never mind that it's not necessarily a good idea to do that, just take it for granted that this is the sort of thing we need to do. There may be arbitrarily complicated transformations to the data, basically anything you can do with a select statement.
My question is this. The DB2 unload utility can be used to pull all of the data out of a table into a couple of data sets (the load JCL used for reloading the data, and the data itself). Is there an easy way (or any way) to massage this output of unload so that these arbitrary changes are made when reloading the data?
I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be.
Or, better yet, can the unload/load process itself do this without having to massage the members directly?
Does anyone have any experience of this, or have pointers to redbooks or redpapers (or any other sources) that describe how to do this?
Is there a different (better, obviously) way of doing this other than unload/load?
As you have noted, SELECTing from the old table into the new table will have very poor performance. The poor performance is generally due to the relatively high cost of insertion INTO the target table (index building and RI enforcement); the SELECT itself is generally not the issue. This is why the LOAD utility is generally preferred when large tables need to be populated from scratch: indices can be built more efficiently and RI can be deferred.
The UNLOAD utility allows unrestricted use of SELECT. If you can SELECT data using scalar and/or column functions to build a result set that is compatible with your new table's column definitions, then UNLOAD can do the data conversion for you. Specify a SELECT statement in SYSIN for the UNLOAD utility. Something like:
//SYSIN DD *
SELECT CONCAT(FIRST_NAME, LAST_NAME) AS "FULLNAME"
FROM OLD_TABLE
/*
The resulting SYSRECxx file will contain a single column that is the concatenation of the two identified columns (the result of the CONCAT function), and SYSPUNCH will contain a compatible column definition for FULLNAME - the converted column name for the new table. All you need to do is edit the new table name in SYSPUNCH (it will have defaulted to TBLNAME) and LOAD it. Try not to fiddle with the SYSRECxx data or the SYSPUNCH column definitions - a goof here could get ugly.
Use the REPLACE option when running the LOAD utility to populate the new table (I think the default is LOAD RESUME, which won't work here). It is often a good idea to leave RI off when running the LOAD; this improves performance and saves the headache of figuring out the order in which LOAD jobs need to be run. Once finished, you need to verify the RI and build the indices.
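As a sketch, the edited LOAD statement would look something along these lines (the schema, table name and field details are assumptions; your SYSPUNCH will contain the real field specifications):

//SYSIN DD *
  LOAD DATA REPLACE LOG NO
    INTO TABLE MYSCHEMA.NEW_TABLE
    (FULLNAME POSITION(*) VARCHAR)
/*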
The LOAD utility is documented here
I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be.
I believe you have provided the answer within your question. As to the question of "how easy that would be," it would depend on the nature of your modifications.
SORT utilities (DFSORT, SyncSort, etc.) now have very sophisticated data-manipulation functions. We use these to move data around, substitute one value for another, combine fields, split fields, etc., albeit in a different context from what you are describing.
You could do something similar with your load control statements, but it might not be worth the trouble; it depends on the extent of your changes. It may be worth your time to automate modification of the load control statements if the same modification recurs. If the modifications are all one-offs, a manual solution may be more expedient.
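For example, a DFSORT step that copies records while rearranging two fixed-width fields might look roughly like this (the positions and lengths are invented; check them against your actual record layout):

//SYSIN DD *
  SORT FIELDS=COPY
  OUTREC FIELDS=(1,20,25,20)
/*

This copies every record, building the output from bytes 1-20 followed by bytes 25-44 and dropping everything else.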

Keeping dynamic out of SQL while using specifications with stored procedures

A specification essentially is a text string representing a "where" clause created by an end user.
I have stored procedures that copy a set of related tables and records to other places. The operation is always the same, but dependent on some crazy user requirements like "products that are frozen and blue and on sale on Tuesday".
What if we fed the user specification (the string parameter) to a scalar function that returned true/false by executing the specification as dynamic SQL, or just EXEC (@variable)?
It could tell us whether those records exist. We could add the result of the function to the WHERE clause of our copy-products query.
It would keep us from recompiling the copy script each time our WHERE clauses changed. Plus, it would isolate the product selection in a single function.
Anyone ever do anything like this or have examples? What bad things could come of it?
EDIT:
This is the specification I simply added to the end of each insert/select statement:
and exists (
    select null as nothing
    from SameTableAsOutsideTable inside
    where inside.ID = outside.id        -- join to the outside table
      and inside.page in (6, 7)         -- criteria 1
      and inside.dept in (7, 6, 2, 4)   -- criteria 2
)
It would be great to feed a parameter into a function that produces records based on the user criteria, so all that above could be something like:
and dbo.UserCriteria(@page = '6,7', @dept = '7,6,2,4')
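One hypothetical shape for such a thing is an inline table-valued function that splits the parameter strings. This sketch uses STRING_SPLIT, which requires SQL Server 2016 or later, and all names are illustrative:

create function dbo.UserCriteria (@page varchar(100), @dept varchar(100))
returns table
as return
(
    select i.ID
    from SameTableAsOutsideTable as i
    where i.page in (select cast(value as int) from string_split(@page, ','))
      and i.dept in (select cast(value as int) from string_split(@dept, ','))
);

The exists block above could then become:

and exists (select 1 from dbo.UserCriteria('6,7', '7,6,2,4') f where f.ID = outside.id)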
Dynamic Search Conditions in T-SQL
When optimizing SQL, the important thing is optimizing the access path to the data (i.e. index usage). This trumps code reuse, maintainability, nice formatting and just about every other development perk you can think of, because a bad access path will cause the query to perform hundreds of times slower than it should. The linked article sums up all the options you have very well, and your envisioned function is nowhere on the radar. Your options will gravitate around dynamic SQL or very complicated static queries. I'm afraid there is no free lunch on this topic.
It doesn't sound like a very good idea to me. Even supposing that you had proper defensive coding to avoid SQL injection attacks it's not going to really buy you anything. The code still needs to be "compiled" each time.
Also, it's pretty much always a bad idea to let users create free-form WHERE clauses. Users are pretty good at finding new and innovative ways to bring a server to a grinding halt.
If you or your users or someone else in the business can't come up with some concrete search requirements then it's likely that someone isn't thinking about it hard enough and doesn't really know what they want. You can have pretty versatile search capabilities without letting the users completely loose on the system. Alternatively, look at some of the BI tools out there and consider creating a data mart where they can do these kinds of ad hoc searches.
How about this:
You create another stored procedure (instead of a function) and pass the criteria to it.
Based on that condition, it dumps the matching record ids to a temp table.
Your copy procedure then reads the ids from that table and does what it needs to (see the sketch below).
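A rough sketch of that handoff (all names invented; the caller creates #matching_ids so that both the dynamic SQL and the next procedure can see it):

create procedure dbo.SelectMatchingIds
    @criteria nvarchar(4000)
as
begin
    declare @sql nvarchar(4000)
    -- caution: concatenating user input like this is exactly the injection
    -- risk discussed in the other answers
    set @sql = N'insert into #matching_ids (ID)
                 select ID from dbo.Products where ' + @criteria
    exec (@sql)   -- #matching_ids was created by the caller, so it is visible here
end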
Or you could create a user-defined function that returns a table containing just the ids of the records matching your (dynamic) criteria.
If I am totally off the mark, please correct me.
Hope this helps.
If you are forced to use dynamic queries and you don't have any solid, predefined search requirements, it is strongly recommended to use sp_executesql instead of EXEC. It provides parameterized queries, which help prevent SQL injection attacks (to some extent), and it allows execution plans to be reused, which helps performance. (More info)
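A minimal sketch of the difference (table and parameter names assumed):

declare @sql nvarchar(4000)
set @sql = N'select * from dbo.Products where dept = @dept and page = @page'
exec sp_executesql @sql,
     N'@dept int, @page int',
     @dept = 7, @page = 6

Because @dept and @page travel as parameters rather than being concatenated into the string, the values cannot alter the query text and the plan can be reused across calls.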

Best way to compare contents of two tables in Teradata?

When you need to compare two tables to see what the differences are, are there any tools or shortcuts you use, or do you handcode the SQL to compare the two tables?
Basically the core features of a product like Red Gate SQL Data Compare (schemas for my tables typically always match).
Background: In my SQL Server environment, I created a stored procedure which inspects the metadata of the two tables/views, creates a query (as dynamic SQL) which joins the two tables on the specified key columns, and compares data in the comparison columns, reporting key differences and data differences. The query can either be printed out and modified/copied, or just executed as is. Unfortunately, we are not allowed to create stored procedures in our Teradata environment.
Sounds like a data profiling tool such as Talend's Open Profiler would make the most sense at that point.
You could write a BTEQ script that builds a query similar to your SQL Server stored procedure's and then exports the dynamically built SQL. You can then, in turn, run that inside your BTEQ. It might get cumbersome, but with enough determination you could probably mock something up.
I don't know if this is the kind of answer you are searching for.
sel * from database_name1.table_name1
minus
sel * from database_name2.table_name2;
You can do the same with specific columns. This will give you the rows that exist in table1 but are missing from table2.
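To catch differences in both directions, the same idea can be run twice and combined (a sketch; the parentheses group the set operations):

(sel * from database_name1.table_name1
 minus
 sel * from database_name2.table_name2)
union all
(sel * from database_name2.table_name2
 minus
 sel * from database_name1.table_name1);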
If you were not looking for this type of answer, please ignore this and continue.
You can also select like below:

select
    table1.keycol1,
    table2.keycol2,
    (table1.factcol1 - table2.factcol2) as diff
from table1
inner join table2
    on table1.keycol1 = table2.keycol1
    and table1.keycol2 = table2.keycol2
where diff <> 0
This was just an outline to give you an idea; please forgive any syntactic or programming errors.
Hope this helps.

Strategy for identifying unused tables in SQL Server 2000?

I'm working with a SQL Server 2000 database that likely has a few dozen tables that are no longer accessed. I'd like to clear out the data that we no longer need to be maintaining, but I'm not sure how to identify which tables to remove.
The database is shared by several different applications, so I can't be 100% confident that reviewing these will give me a complete list of the objects that are used.
What I'd like to do, if it's possible, is to get a list of tables that haven't been accessed at all for some period of time. No reads, no writes. How should I approach this?
SQL Server 2000 won't give you that kind of information directly. But one way to identify which tables ARE used (and then deduce which ones are not) is to use SQL Profiler to capture all the queries that go to the database. Configure Profiler to record the results to a table, then search the saved queries to find all the tables (and views, stored procedures, etc.) that your applications use.
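Once Profiler has written the trace to a table, the search might look like this (the trace table name is assumed; TextData is ntext, so cast it before using DISTINCT):

select distinct cast(TextData as nvarchar(4000)) as sql_text
from dbo.ProfilerTrace
where TextData like '%MY_TABLE_NAME%'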
Another way to check for writes is to add a new timestamp column to every table, along with a trigger that updates that column on every insert or update. But keep in mind that if your apps run queries of the form
select * from ...
then they will suddenly receive an extra column, and that might cause you some problems.
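A sketch of that trigger for one table (table and key names invented; this assumes the RECURSIVE_TRIGGERS database option is off, its default, so the trigger's own update does not re-fire it):

alter table dbo.SomeTable add last_write datetime null
go
create trigger trg_SomeTable_write on dbo.SomeTable
after insert, update
as
    update t
    set last_write = getdate()
    from dbo.SomeTable t
    join inserted i on i.id = t.id   -- join on the table's key column
go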
Another suggestion for tracking tables that have been written to is to use Red Gate SQL Log Rescue (free). This tool dives into the log of the database and will show you all inserts, updates and deletes. The list is fully searchable, too.
It doesn't meet your criteria for researching reads into the database, but I think the SQL Profiler technique will get you a fair idea as far as that goes.
If you have last-update columns you can check for writes, but there is really no easy way to check for reads. You could run Profiler, save the trace to a table, and check in there.
What I usually do is rename the table by prefixing it with an underscore; when people start to scream, I just rename it back.
If by "not used" you mean your application has no more references to the tables in question, and you are using dynamic SQL, you could do a search for the table names in your app; if they aren't found, blow the tables away.
I've also output all stored procedures, functions, etc. to a text file and searched it for the table names. If a table is not found, or is found only in procedures that will need to be deleted too, blow it away.
It looks like using the Profiler is going to work. Once I've let it run for a while, I should have a good list of used tables. Anyone who doesn't use their tables every day can probably wait for them to be restored from backup. Thanks, folks.
Probably too late to help mogrify, but for anybody arriving via search: I would look for all references to the table in my code, and then in SQL Server by running this:
select distinct '[' + object_name(id) + ']'
from syscomments
where text like '%MY_TABLE_NAME%'
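-- note: syscomments stores object text in chunks of up to 4000 characters,
-- so a name that happens to straddle a chunk boundary can be missed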