Best way to compare contents of two tables in Teradata? - sql

When you need to compare two tables to see what the differences are, are there any tools or shortcuts you use, or do you handcode the SQL to compare the two tables?
Basically the core features of a product like Red Gate SQL Data Compare (schemas for my tables typically always match).
Background: In my SQL Server environment, I created a stored procedure which inspects the metadata of the two tables/views, creates a query (as dynamic sql) which joins the two tables on the specified key columns, and compares data in the compare columns, reporting key differences and data differences. The query can either be printed and modified/copied or just excecuted as is. We are not allowed to create stored procedures in our Teradata environment, unfortunately.

Sounds like a data profiling tool such as Talend's Open Profiler would make the most sense at that point.
You could write a BTEQ statement that builds the query similar to your SQL Server stored procedure and then export the dynamically built SQL. You can then in turn run that inside of your BTEQ. It might get cumbersome, but with enough determination you could probably mock something up.

I dont know if this is the right answer you are searching for.
sel * from database_name1.table_name1
minus
sel * from database_name2.table_name2;
you can do the same by selecting specific columns. This will basically give the non existent rows from table2 which are in table1.
If you were not looking for this type of answer, please ignore this and continue.
Also you can select like below.
select
table1.keycol1,
table2.keycol2,
(table1.factcol1 - table2.factcol2) as diff
from table1
inner join
table2
on table1.keycol1 = table2.keycol1
and table1.keycol2 = table2.keycol2
where diff <> 0
This was just an analysis which can give an idea. Please ignore any syntactical and programmatical errors.
Hope this helps.

Related

How to do 2 select queries?

I am using Firebird and what I want to do is display 2 different select queries. Here is an example
select * from tblStates;
select * from tblTeachers;
This are two tables with 2 completely different columns. When I use the code above firebird will only display tblTeachers. What I want is to see both tblStates and tblTeachers as two different tables. I was told to use suspend but I don't know the syntax and when I just type suspend there is a unknown token error.
I am unfamiliar with the details of Firebird. However, in doing some research, I came across this post that might help.
What you're looking for is considered a batch separator statement. In SQL Server, it would be something like:
SELECT * from myTable1
GO
SELECT * from myTable2
GO
This would return two tables in a table or database studio viewer. I did not see something similar for Firebird other than what is linked above.
However, the next question is why are you wanting this functionality? Are you sure there is not a relationship between States and Teachers, as per your example? If there is not, then a common practice would be to run your unrelated SQL statements and save the returned tables in memory for use in your application.
Sometimes, if you cannot figure out a way to do what you want, its a good idea to look back at exactly what your goal is and wonder if there might be a better way :)
Hope this helps.

Finding table and package references in oracle

I have a list of tables and a list of packages. I need to come up with the below two lists
What are the packages that uses the given set of tables
List of tables that are referenced by each of the given package
The packages uses dynamic sql, hence I may not be able to depend only on dba_reference table.
The other way I could think of is using a LIKE clause against the dba_source table. But, I will have to write a OR condition for each of the tables that I need (or of course a function or procedure which can loop through each table)
Is there any better way of doing this?
Any help is greatly appreciated.
Edit: rephrasing the question -
I have a package which select/inserts/updates several tables. This has dynamic sql. One example is provided below.
I want to identify all the tables referred in this package. What is the best way to achieve this?
In the below example I want to capture both table1 and table2.
if flag = 'Y'
then final_sql := 'insert into table1 (...)';
else final_sql := 'insert into table2 (...)';
end if;
execute immediate final_sql;
For systems using a lot of dynamic SQL I suggest two approaches.
First one is to apply strict coding standards so you know what to look for and can then reliably parse out the table names from the rest of the code. I mean, always have the table same string written to a known variable name, and search for that variable.
This is not always easy to do, especially if you have mountains of code that do not follow the standard. It only takes a couple of folk to not adhere to the standard and it all falls down. However it can be made to work, but probably never going to be 100% reliable.
Another approach is to write test scripts that exercise the whole code base and logic paths. Write them in such a way that they log the procedure name. Enable SQL Trace and capture the trace files from the tests. With clever scripting you should be able to tie the trace to the procedure. This will give you the "raw" SQL, which you can then grep for matches with you list of tables. You might be able to get the same info by harvesting V$SQL tying to V$SESSION.
This is an old school way of doing this, but one that I have used and works.
On one of the largest systems I worked on I wrote a CRUD parser which tokenised the code and produced a CRUD matrix by source file and table access. For dynamic SQL we processed SQL Trace/tkprof files.
If the code has good amount of debugging which dumps out these the table names you could again run the test scripts and harvest the debug logs.

How to check if a SQL SELECT query is a subset of other query

I an trying to find a way to determine whether or not an SQL SELECT query A is prone to return a subset of the results returned by another query B. Furthermore, this needs to be acomplished from the queries alone, without having access to the respective result sets.
For example, the query SELECT * from employee WHERE salary >= 1000 will return a subset of the results of query SELECT * from employee. I need to find an automated way to perform this validation for any two queries A and B, without accessing the database that stores the data.
If it is unfeasable to achieve this without the aid of an RDBMS, we can assume that I have access to a local, but empty RDBMS, but with the data stored somewhere else. In addition, this check must be done in code, either using an algorithm or a library. The language I am using is Java, but other language will also do.
Many thanks in advance.
I don't know how deep you want to get into parsing queries, but basically you can say that there are two general ways of making a subset of a query (given that source table and projection(select) staying the same):
using where clause to add condition to row values
using having clause to add conditions to aggregated values
So you can say that if you have two objects that represent queries and say they look something close to this:
{
'select': { ... },
'from': {},
'where': {},
'orderby': {}
}
and they have select, from and orderby to be the same, but one have extra condition in the where clause , you have a subset.
One way you might be able to determine if a query is a subset of another is by examining their source tables. If you don't have access to the data itself, this can be tricky. This question references using Snowflake joins to generate database diagrams based on a query without having access to the data itself:
Generate table relationship diagram from existing schema (SQL Server)
If your query is 800 characters or less, the tool is free to use: https://snowflakejoins.com/index.html
I tested it out using the AdventureWorks database and these two queries:
SELECT * FROM HumanResources.Employee
SELECT * FROM HumanResources.Employee WHERE EmployeeID < 200
When I plugged both of them into the Snowflake Joins text editor, this is what was generated:
SnowflakeJoins DB Diagram example
Hope that helps.

DYnamic SQL examples

I have lately learned what is dynamic sql and one of the most interesting features of it to me is that we can use dynamic columns names and tables. But I cannot think about useful real life examples. The only one that came into my mind is statistical table.
Let`s say that we have table with name, type and created_data. Then we want to have a table that in columns are years from created_data column and in row type and number of names created in years. (sorry for my English)
What can be other useful real life examples of using dynamic sql with column and table as parameters? How do you use it?
Thanks for any suggestions and help :)
regards
Gabe
/edit
Thx for replies, I am particulary interested in examples that do not contain administrative things or database convertion or something like that, I am looking for examples where the code in example java is more complicated than using a dynamic sql in for example stored procedure.
An example of dynamic SQL is to fix a broken schema and make it more usable.
For example if you have hundreds of users and someone originally decided to create a new table for each user, you might want to redesign the database to have only one table. Then you'd need to migrate all the existing data to this new system.
You can query the information schema for table names with a certain naming pattern or containing certain columns then use dynamic SQL to select all the data from each of those tables then put it into a single table.
INSERT INTO users (name, col1, col2)
SELECT 'foo', col1, col2 FROM user_foo
UNION ALL
SELECT 'bar', col1, col2 FROM user_bar
UNION ALL
...
Then hopefully after doing this once you will never need to touch dynamic SQL again.
Long-long ago I have worked with appliaction where users uses their own tables in common database.
Imagine, each user can create their own table in database from UI. To get the access to data from these tables, developer needs to use the dynamic SQL.
I once had to write an Excel import where the excel sheet was not like a csv file but layed out like a matrix. So I had to deal with a unknown number of columns for 3 temporary tables (columns, rows, "infield"). The rows were also a short form of tree. Sounds weird, but was a fun to do.
In SQL Server there was no chance to handle this without dynamic SQL.
Another example from a situation I recently came up against. A MySQL database of about 250 tables, all in MyISAM engine and no database design schema, chart or other explanation at all - well, except the not so helpful table and column names.
To plan for conversion to InnoDB and find possible foreign keys, we either had to manually check all queries (and the conditions used in JOIN and WHERE clauses) created from the web frontend code or make a script that uses dynamic SQL and checks all combinations of columns with compatible datatype and compares the data stored in those columns combinations (and then manually accept or reject these possibilities).

Integration Service drops entires on joining an table

i have a special problem with SQL Integarion service 2005 (SSIS). During a stored procedure i fill a table with data. Afterwards i join this table over a varchar column with SSIS and another table, but i miss some of the entries. If i do the same using only SQL server (no SSIS) i get all entires. I know already SSIS has a different mechanism for comparing (on byte level) but i can find, why this entries are missing.
I already compared the length of the text of the entries,checked it by hand, tried differend collation.
Has anyone a idea, how i identify this entires (which missing on SSIS)?
Best Regards
SSIS is case-sensitive, so if you are joining on string columns, you may not get some matches. Given that you've checked for length and collation already, it sounds like this may be the issue. If this is the problem, you can make the columns Uppercase in the Data Flow to perform the join operation.
If you're using a Merge Join component to do the join in SSIS, make sure your source queries are ordering the results by the column you're using to join.
It's a common mistake to set the IsSorted value to True on the source without actually ordering the results in the query with an ORDER BY clause.