Dynamic column query for a test script (SQL)

I am testing tables (and the data therein) acquired into our data lake against the source application tables. We do not transform any of the data on acquisition, but we do not always acquire every column of a table, and the acquisition process adds several data-lake columns to the table (date acquired, etc.).
So I have to compare two tables where most of the columns are the same but some aren't. Obviously I could deal with this by manually specifying the columns for each SELECT statement, but I want a testing script that does it automatically: compare the common columns, then let me run further queries using that list of columns.
I already test common columns to ensure data type integrity between columns:
SELECT /*fixed*/
       b.column_name,
       a.data_type AS source_data_type,
       b.data_type AS acquired_data_type,
       CASE
           WHEN a.data_type = b.data_type THEN 'Pass'
           ELSE 'Fail'
       END AS data_type_test
FROM   all_tab_cols@&sourcelink a
       INNER JOIN all_tab_cols b ON a.column_name = b.column_name
WHERE  a.owner = '&sourceschema'
AND    b.owner = 'DATALAKE'
AND    a.table_name = '&tableName'
AND    b.table_name = '&tableName';
The above works as intended and gets only common columns. How can I save this list of common columns so that when I'm querying the tables directly I can use them in a further query, such as:
SELECT
    <my dynamic list of columns here>
FROM
    &sourceschema..&tablename@&sourcelink a
    INNER JOIN datalake.&tablename b ON a.id = b.id;
Is this possible with Oracle PL/SQL, or should I use Python instead?

LISTAGG can reduce this to a column list for you:

SQL> select listagg(column_name,',') within group (order by column_id)
  2  from user_tab_columns
  3  where table_name = 'EMP';

LISTAGG(COLUMN_NAME,',')WITHINGROUP(ORDERBYCOLUMN_ID)
--------------------------------------------------------------------------
EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
and then you could return a dynamic ref cursor to whatever client you want, e.g.

open my_ref_cur for
    'select ' || col_list || ' from ....';
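The same pattern can be sketched outside the database. Here is a hedged Python + SQLite analogue (the src/lake tables and their columns are invented): PRAGMA table_info plays the role of all_tab_cols, a list comprehension plays the role of LISTAGG, and the f-string splice plays the role of the ref cursor's dynamic SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src  (id INTEGER, name TEXT, secret TEXT);
    CREATE TABLE lake (id INTEGER, name TEXT, date_acquired TEXT);
    INSERT INTO src  VALUES (1, 'a', 'x'), (2, 'b', 'y');
    INSERT INTO lake VALUES (1, 'a', '2024-01-01'), (2, 'b', '2024-01-01');
""")

def columns(table):
    # PRAGMA table_info is SQLite's stand-in for all_tab_cols
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]

# the LISTAGG step: common columns, qualified with the left alias
common = [c for c in columns("src") if c in columns("lake")]
col_list = ", ".join("a." + c for c in common)

# the ref-cursor step: splice the list into a dynamic query
sql = f"SELECT {col_list} FROM src a JOIN lake b ON a.id = b.id ORDER BY a.id"
rows = con.execute(sql).fetchall()
```

The column list is built once from the catalog and reused for every data query, which is exactly what the dynamic ref cursor gives you inside PL/SQL.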

Related

Oracle SQL column alias

I have 3 tables A, B, C and I want to join those tables. They have common columns like id_no and order_no,
and I want to write a query that returns all columns from all 3 tables with a table-alias prefix (tabA., tabB., tabC....) without manually specifying every column name, so that I can differentiate the common columns among the tables:
select tabA.id_no, tabA.order_no, tabA....., tabB.id_no, tabB.order_no, tabB..., tabC.id_no, tabC.order_no, tabC..
from A tabA, B tabB, C tabC
where tabA.id_no = tabB.id_no
and tabB.id_no = tabC.id_no
Could you please let me know how to achieve this in Oracle SQL?
Oracle SQL Developer can do that.
Write your * query, put your mouse over the '*', and SQL Developer offers to expand it to the fully qualified column list; click the blue text.
Ta-da.
Don't forget your WHERE clause or an ANSI join in the FROM, or your DBA will explain to you what a Cartesian product is.
If your table has foreign keys, SQL Developer can generate the join as well.
You can do the following:
SELECT tabA.*, tabB.*, tabC.*
FROM a tabA INNER JOIN b tabB
ON tabA.id_no = tabB.id_no
INNER JOIN c tabC
ON tabB.id_no = tabC.id_no;
EDIT
If you want only a list of the columns associated with the three tables, and to see which column names are common among them, then you can try something like the following:
SELECT column_name, COUNT(*), LISTAGG(table_name, ',') WITHIN GROUP (ORDER BY table_name) AS tables
FROM all_tab_columns
WHERE owner = '<table_owner>'
AND table_name IN ('A','B','C')
GROUP BY column_name;
N.B. LISTAGG() assumes you're on Oracle 11g or later; before that you can use the undocumented function WM_CONCAT().
Hope this helps.
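The grouping idea is easy to try elsewhere. Below is a hypothetical Python + SQLite sketch (tables a, b, c and their columns are invented) in which group_concat stands in for LISTAGG:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id_no INT, order_no INT, a_only TEXT);
    CREATE TABLE b (id_no INT, order_no INT, b_only TEXT);
    CREATE TABLE c (id_no INT, order_no INT);
""")

# gather (table, column) pairs from the catalog into a scratch table
pairs = [(t, row[1])
         for t in ("a", "b", "c")
         for row in con.execute(f"PRAGMA table_info({t})")]
con.execute("CREATE TABLE cols (table_name TEXT, column_name TEXT)")
con.executemany("INSERT INTO cols VALUES (?, ?)", pairs)

# group by column name; group_concat stands in for LISTAGG
shared = con.execute("""
    SELECT column_name, COUNT(*) AS n, group_concat(table_name, ',')
    FROM cols
    GROUP BY column_name
    HAVING COUNT(*) = 3        -- present in all three tables
    ORDER BY column_name
""").fetchall()
```

Dropping the HAVING clause gives the full per-column report, just as in the Oracle query above.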

SQL multiple tables query for column names

I have multiple tables (more than 100) in a SQL database; each table may have a few hundred entries.
For every table, I want to retrieve simply the names of the columns that have at least one non-null entry.
How can I do this?
To return table/column name:
SELECT table_name, column_name
FROM information_schema.columns
That's pretty easy. Here's a query for finding completely empty tables, depending on your permissions:
select a.name as table_name
     , b.name as schema_name
     , sum(c.rows) as total_rows
from sys.tables a
join sys.schemas b on (a.schema_id = b.schema_id)
join sys.partitions c on (a.object_id = c.object_id)
where c.index_id in (0, 1)
group by a.name, b.name
having sum(c.rows) = 0;
Note: I did this in Vertica, and you have to have access to the partition metadata. Also, the catalog views go by different names in different databases (sys vs. information_schema), but the idea is the same.
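Catalog names vary by product, so as a portable illustration here is a small Python + SQLite sketch (table t1 is invented) that answers the question directly: keep only the columns with at least one non-null entry, using the fact that COUNT(col) ignores NULLs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (a INT, b INT, c INT);
    INSERT INTO t1 VALUES (1, NULL, NULL), (2, NULL, 3);
""")

def non_null_columns(table):
    """Names of columns in `table` with at least one non-null entry."""
    keep = []
    for row in con.execute(f"PRAGMA table_info({table})"):
        col = row[1]
        # COUNT(col) counts only the non-null values of col
        (n,) = con.execute(f"SELECT COUNT({col}) FROM {table}").fetchone()
        if n > 0:
            keep.append(col)
    return keep
```

Looping over a hundred tables is then just a loop over the table catalog; the per-column COUNT queries are the expensive part, so run them off-hours on big tables.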

Compare 2 different tables columns from 2 different databases

I have a requirement to compare different tables' columns from 2 different databases, in order to add columns to the master tables based on the requirement.
For example:
Assume in master database I have created one table like:
create table test(id int,name varchar(10))
Assume in test database I have created one table like
create table testings(id int,name varchar(20), sal int)
Now I have to compare the two tables' columns.
I don't want to use Red Gate tools.
Can anyone help me?
Is it just Red Gate tools you don't want to use, or any third-party tool? Why not? Even if you don't have the budget to buy one, you can still use a trial version to get the job done.
We've been using the Apex Diff tool, but there are many more out there.
With so many tools available, you could probably run them one by one in trial mode for months...
Knowing the system tables and how to do this natively is great, but it's just too time-consuming.
You can use the EXCEPT or INTERSECT set operators for this. Like so:
SELECT id, name FROM master.dbo.test
EXCEPT -- or INTERSECT
SELECT id, name FROM test.dbo.testings
This will give you:
EXCEPT returns any distinct values from the left query that are not also found in the right query.
INTERSECT returns any distinct values that are returned by both the left and the right query.
In your case, since you are selecting from two different databases, you have to use fully qualified table names, in the form database.schema.object_name.
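Since EXCEPT and INTERSECT are standard set operators, their behaviour is easy to demonstrate in any engine that supports them; a minimal Python + SQLite example with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test     (id INT, name TEXT);
    CREATE TABLE testings (id INT, name TEXT);
    INSERT INTO test     VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO testings VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

# rows in test that have no exact match in testings
only_in_left = con.execute("""
    SELECT id, name FROM test
    EXCEPT
    SELECT id, name FROM testings
    ORDER BY id
""").fetchall()

# rows present in both tables
in_both = con.execute("""
    SELECT id, name FROM test
    INTERSECT
    SELECT id, name FROM testings
    ORDER BY id
""").fetchall()
```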
Update: if you want to compare the two tables' column names rather than the data itself, you have to work with the metadata tables and compare the column names the same way with EXCEPT.
For instance, suppose you have two databases:
Test database contains the table:
create table test(id int, name varchar(10), dep varchar(50));
and another database:
anotherdatabase database contains the table:
create table testings(id int,name varchar(20), sal int);
And you want to compare the two tables' columns and get the columns that don't exist in the other table; in our example you need to get sal and dep.
Then you can do this:
SELECT ColumnName
FROM
(
SELECT c.name "ColumnName"
FROM test.sys.tables t
INNER JOIN test.sys.all_columns c
ON t.object_id = c.object_id
INNER JOIN test.sys.types ty
ON c.system_type_id = ty.system_type_id
WHERE t.name = 'test'
EXCEPT
SELECT c.name
FROM anotherdatabase.sys.tables t
INNER JOIN anotherdatabase.sys.all_columns c
ON t.object_id = c.object_id
INNER JOIN anotherdatabase.sys.types ty
ON c.system_type_id = ty.system_type_id
WHERE t.name = 'testings'
) t1
UNION ALL
SELECT ColumnName
FROM
(
SELECT c.name ColumnName
FROM anotherdatabase.sys.tables t
INNER JOIN anotherdatabase.sys.all_columns c
ON t.object_id = c.object_id
INNER JOIN anotherdatabase.sys.types ty
ON c.system_type_id = ty.system_type_id
WHERE t.name = 'testings'
EXCEPT
SELECT c.name
FROM test.sys.tables t
INNER JOIN test.sys.all_columns c
ON t.object_id = c.object_id
INNER JOIN test.sys.types ty
ON c.system_type_id = ty.system_type_id
WHERE t.name = 'test'
) t2;
This should give you the two columns dep and sal.
Note that I joined the tables
databasename.sys.tables and
databasename.sys.all_columns
with the table
databasename.sys.types
to get only those columns that have the same data type. Without that join, two columns with the same name but different data types would be treated as the same.
To compare columns, use the INFORMATION_SCHEMA.COLUMNS view in SQL Server.
For example:
select column_name from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME='your_table_name1'
except
select column_name from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME='your_table_name2'
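The same EXCEPT-in-both-directions pattern can be tried in SQLite, which exposes column metadata through the pragma_table_info table-valued function instead of INFORMATION_SCHEMA (the test/testings tables here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test     (id INT, name TEXT, dep TEXT);
    CREATE TABLE testings (id INT, name TEXT, sal INT);
""")

# each EXCEPT goes in its own subquery so the two directions
# can be combined with UNION ALL, as in the sys.all_columns version
diff = con.execute("""
    SELECT name FROM (
        SELECT name FROM pragma_table_info('test')
        EXCEPT
        SELECT name FROM pragma_table_info('testings')
    )
    UNION ALL
    SELECT name FROM (
        SELECT name FROM pragma_table_info('testings')
        EXCEPT
        SELECT name FROM pragma_table_info('test')
    )
""").fetchall()
```

The result is the symmetric difference of the two column lists: dep and sal, matching the worked example above.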
This is a GPL Java program I wrote for comparing data in any two tables, with a common key and common columns, across any two heterogeneous databases using JDBC: https://sourceforge.net/projects/metaqa/
It intelligently forgives numeric, string, and date data-type differences by reducing them to a common format. The output is a sparse tab-delimited file with an .xls extension for use in a spreadsheet.

Retrieving column and other metadata information in Teradata

I have a half dozen views in SQL Server that I need to replicate in Teradata, but I haven't been able to find the TD equivalent of the SQL metadata tables. I'd like to replicate the following functionality (which I assume is fairly self-explanatory):
select table_name, column_id ordinal_position, column_name,
       data_type, char_length char_max_length,
       data_precision numeric_precision, data_scale numeric_scale
from user_tab_columns;

select name as FUNCTION_NAME
from sys.objects
where type_desc = 'SQL_SCALAR_FUNCTION';

select TABLE_NAME as VIEW_NAME
from INFORMATION_SCHEMA.VIEWS;
I'd also like to know if there are any usable Teradata references online; everything I run across seems to be advertising rather than practical information.
All Teradata system tables are stored under the DBC schema.
For columns, it is dbc.columns:
select * from dbc.columns
For views, it is dbc.tables with a filter on the TableKind column = 'V' (where V stands for View):
select * from dbc.tables where TableKind = 'V'
I am not sure how to get all functions in Teradata. Whoever knows, please edit this answer.
In Teradata, DBC.Tables contains many of the objects that exist on the system (e.g. stored procedures, UDFs, triggers, macros, views, tables, hash indexes, join indexes, etc.). The column TableKind identifies the type of object.
SELECT *
FROM DBC.TABLES
WHERE TABLEKIND = '<see below>'
A = Aggregate Function
B = Combined Aggregate Function and ordered analytical function
D = JAR
E = External Stored Procedure
F = Standard Function
G = Trigger
H = Instance or Constructor Method
I = Join Index
J = Journal
M = Macro
N = Hash Index
O = No Primary Index (Table)
P = Stored Procedure
Q = Queue Table
R = Table Function
S = Ordered Analytical Function
T = Table
U = User-defined data type
V = View
X = Authorization
Y = GLOP Set

How can I identify unused/redundant columns given a list of tables?

[This is on an iSeries/DB2 database if that makes any difference]
I want to write a procedure to identify columns that are always blank or zero (given a list of tables).
Assuming I can pull out table and column definitions from the central system tables, how should I check this condition? My first guess is to generate a statement for each column dynamically, such as:
select count(*) from my_table where my_column != 0
and check whether the count is zero, but is there a better/faster/standard way to do this?
NB: this just needs to handle simple character and integer/decimal fields, nothing fancy!
To check for columns that contain only NULLs on DB2:
Execute RUNSTATS on your database (http://www.ibm.com/developerworks/data/library/techarticle/dm-0412pay/)
Check the database statistics by querying SYSSTAT.TABLES and SYSSTAT.COLUMNS. Comparing SYSSTAT.TABLES.CARD with SYSSTAT.COLUMNS.NUMNULLS will tell you what you need.
An example could be:
select t.tabschema, t.tabname, c.colname
from sysstat.tables t, sysstat.columns c
where ((t.tabschema = 'MYSCHEMA1' and t.tabname='MYTABLE1') or
(t.tabschema = 'MYSCHEMA2' and t.tabname='MYTABLE2') or
(...)) and
t.tabschema = c.tabschema and t.tabname = c.tabname and
t.card = c.numnulls
More on system stats e.g. here: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001070.htm and http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001073.htm
Similarly, you can use SYSSTAT.COLUMNS.AVGCOLLEN to check for empty columns (just it doesn't seem to work for LOBs).
EDIT: And, to check for columns that contain only zeros, try comparing HIGH2KEY and LOW2KEY in SYSSTAT.COLUMNS.
Yes, typically, I would do something like this in SQL Server:
SELECT
REPLACE(REPLACE(REPLACE(
'
SELECT COUNT(*) AS [COUNT NON-EMPTY IN {TABLE_NAME}.{COLUMN_NAME}]
FROM [{TABLE_SCHEMA}].[{TABLE_NAME}]
WHERE [{COLUMN_NAME}] IS NOT NULL
OR [{COLUMN_NAME}] <> 0
'
, '{TABLE_SCHEMA}', c.TABLE_SCHEMA)
, '{TABLE_NAME}', c.TABLE_NAME)
, '{COLUMN_NAME}', c.COLUMN_NAME) AS [SQL]
FROM INFORMATION_SCHEMA.COLUMNS c
INNER JOIN INFORMATION_SCHEMA.TABLES t
ON t.TABLE_TYPE = 'BASE TABLE'
AND c.TABLE_CATALOG = t.TABLE_CATALOG
AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
AND c.TABLE_NAME = t.TABLE_NAME
AND c.DATA_TYPE = 'int'
You can get a lot fancier by doing UNIONs of the entire query, checking IS_NULLABLE on each column, handling different data types differently, skipping identity columns, and so on.
I'm assuming you mean you want to know whether there are any values in any row of a given column. If your column can contain blanks, you'll probably need to extend the WHERE clause (e.g. with a <> '' check) to get the correct answer.
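The template trick is just string substitution driven by the catalog. A hedged Python + SQLite sketch of the same idea (table and data invented; note it uses AND rather than the OR above, so a value must be both non-null and non-zero to count as used):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE my_table (always_zero INT, used INT);
    INSERT INTO my_table VALUES (0, 5), (0, NULL), (NULL, 7);
""")

# a value counts as "used" only if it is non-null AND non-zero
TEMPLATE = ("SELECT COUNT(*) FROM {table} "
            "WHERE {column} IS NOT NULL AND {column} <> 0")

def unused_columns(table):
    """Columns of `table` in which every value is NULL or zero."""
    unused = []
    for row in con.execute(f"PRAGMA table_info({table})"):
        col = row[1]
        (n,) = con.execute(TEMPLATE.format(table=table, column=col)).fetchone()
        if n == 0:
            unused.append(col)
    return unused
```

The SQL Server version generates the same per-column statements as text and leaves running them to you; here they are executed directly and only the all-null-or-zero columns are reported.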