We are currently in a testing phase that requires us to check whether there is any data in each column of each table. The long, labour-intensive route is:
SELECT COUNT(Col1), COUNT(Col2)...FROM TABLE
Is there any easier way to do this? We could go down this route by concatenating each column name from our data lineage document with the COUNT() function, but we have a lot of tables and a lot of columns in each table, which makes that infeasible.
Essentially we just need a count of records in each column for each table, without having to write long COUNT(Col) queries.
Thanks
This query lists the columns that contain no data at all (every value is NULL). It will return accurate results if the table statistics were recently gathered with the default value for ESTIMATE_PERCENT:
SELECT utab.table_name
, tcol.column_name
, utab.num_rows
from user_tables utab,
user_tab_cols tcol
where utab.table_name = tcol.table_name
and utab.num_rows > 0
and utab.num_rows = tcol.num_nulls;
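If the statistics might be stale, they can be refreshed first. A minimal sketch using DBMS_STATS for the current schema (adjust the owner as needed):
-- Refresh optimizer statistics for the current schema;
-- ESTIMATE_PERCENT is left at its default (DBMS_STATS.AUTO_SAMPLE_SIZE).
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => USER);
END;
/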
You could generate the queries dynamically. The following produces one COUNT query per column:
SELECT 'SELECT COUNT(' || t.column_name || ' ) FROM ' || t.owner || '.' || t.table_name || ';' FROM dba_tab_columns t
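For a hypothetical table HR.EMPLOYEES, the generated rows would look like this; copy them out and run them as a script:
-- illustrative output for a hypothetical HR.EMPLOYEES table
SELECT COUNT(EMPLOYEE_ID ) FROM HR.EMPLOYEES;
SELECT COUNT(FIRST_NAME ) FROM HR.EMPLOYEES;
SELECT COUNT(HIRE_DATE ) FROM HR.EMPLOYEES;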
You can generate all the select statements like so:
SELECT CASE WHEN column_id = 1 AND column_id_desc != 1 THEN 'SELECT ''' || LOWER(owner) || '.' || LOWER(table_name) || ''' table_name, ' || CHR(10) || 'COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt,'
WHEN column_id = 1 AND column_id_desc = 1 THEN 'SELECT ''' || LOWER(owner) || '.' || LOWER(table_name) || ''' table_name, ' || CHR(10) || 'COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt FROM ' || LOWER(owner) || '.' || LOWER(table_name) || ';'
WHEN column_id_desc = 1 THEN ' COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt' || CHR(10) || 'FROM ' || LOWER(owner) || '.' || LOWER(table_name) || ';'
ELSE ' COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt,'
END sql_text
FROM (SELECT owner,
table_name,
column_name,
column_id,
row_number() OVER (PARTITION BY owner, table_name ORDER BY column_id DESC) column_id_desc
FROM all_tab_columns)
WHERE <predicates to filter on the tables you're interested in>
ORDER BY owner,
table_name,
column_id;
This goes through all the tables you're interested in plus their columns and outputs text that will, when taken together, form a select statement for each table.
The text that is output in the sql_text column depends on whether the column in the list is the first or last (or both!); this way you get the full statement which queries each table once, rather than one per table and column.
You can then copy and paste the results and run that as a script.
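For example, for a hypothetical three-column table hr.employees (employee_id, first_name, hire_date), the sql_text rows combine into a single statement roughly like:
-- illustrative output for a hypothetical hr.employees table
SELECT 'hr.employees' table_name,
COUNT(employee_id) employee_id_cnt,
 COUNT(first_name) first_name_cnt,
 COUNT(hire_date) hire_date_cnt
FROM hr.employees;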
This may also help you. Note that SAMPLE_SIZE and NUM_NULLS come from optimizer statistics, so the results are only as current as the last statistics gathering:
SELECT
a.table_name,
a.column_name
FROM
ALL_TAB_COLUMNS a
WHERE owner = '<your user>'
AND a.SAMPLE_SIZE = a.NUM_NULLS
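If approximate figures from optimizer statistics are acceptable, the same dictionary views can also give an estimated non-null count per column directly; a sketch (the numbers are only as fresh as the last statistics gathering):
SELECT utab.table_name
, tcol.column_name
, utab.num_rows - tcol.num_nulls AS approx_non_null_cnt -- estimated non-null rows per column
from user_tables utab,
user_tab_cols tcol
where utab.table_name = tcol.table_name
order by utab.table_name, tcol.column_name;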
Related
I would like to display the results of dynamic SQL based on another SQL query, but I get an error message. My code:
DECLARE
sql_qry VARCHAR2(1000) := NULL;
TYPE results IS
TABLE OF all_tab_columns%rowtype;
results_tbl results;
BEGIN
FOR i IN (
SELECT
*
FROM
all_tab_columns
WHERE
owner = 'OWNER_XYZ'
AND upper(column_name) LIKE '%COLUMN_XYZ%'
ORDER BY
table_name,
column_name
) LOOP
sql_qry := ' SELECT DISTINCT '
|| i.column_name
|| ' as column_name '
|| ' FROM '
|| i.owner
|| '.'
|| i.table_name
|| ' WHERE SUBSTR('
|| i.column_name
|| ',1,1) = ''Y''';
EXECUTE IMMEDIATE sql_qry BULK COLLECT
INTO results_tbl;
dbms_output.put_line(results_tbl);
END LOOP;
END;
I get the error message:
PLS-00306: wrong number or types of arguments in call to 'PUT_LINE'
In fact I need the results of all the queries, with a union between them, like this:
(screenshot: https://i.stack.imgur.com/llxzr.png)
Your dynamic query is selecting a single column, but you are bulk collecting into results_tbl, which is of type results, based on all_tab_columns%rowtype - which has lots of columns.
If you define your collection type as a single column of the data type you need, i.e. some length of string:
DECLARE
sql_qry VARCHAR2(1000) := NULL;
TYPE results IS
TABLE OF varchar2(4000);
results_tbl results;
...
You will then bulk collect a single column into that single-column collection. To display the results you need to loop over the collection:
FOR j IN 1..results_tbl.COUNT loop
dbms_output.put_line(results_tbl(j));
END LOOP;
so the whole block becomes:
DECLARE
sql_qry VARCHAR2(1000) := NULL;
TYPE results IS
TABLE OF varchar2(4000);
results_tbl results;
BEGIN
FOR i IN (
SELECT
*
FROM
all_tab_columns
WHERE
owner = 'OWNER_XYZ'
AND upper(column_name) LIKE '%COLUMN_XYZ%'
ORDER BY
table_name,
column_name
) LOOP
sql_qry := ' SELECT DISTINCT '
|| i.column_name
|| ' as column_name '
|| ' FROM '
|| i.owner
|| '.'
|| i.table_name
|| ' WHERE SUBSTR('
|| i.column_name
|| ',1,1) = ''Y''';
EXECUTE IMMEDIATE sql_qry BULK COLLECT
INTO results_tbl;
FOR j IN 1..results_tbl.COUNT loop
dbms_output.put_line(results_tbl(j));
END LOOP;
END LOOP;
END;
/
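Remember that DBMS_OUTPUT is only displayed when the client has server output enabled, e.g. in SQL*Plus or SQL Developer:
SET SERVEROUTPUT ON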
However, you can also do this without PL/SQL, using a variation on an XML trick. You can get the results of the dynamic query as an XML document using something like:
select dbms_xmlgen.getxmltype(
'select distinct "' || column_name || '" as value'
|| ' from "' || owner || '"."' || table_name || '"'
|| ' where substr("' || column_name || '", 1, 1) = ''Y'''
)
from all_tab_columns
where owner = 'OWNER_XYZ'
and upper(column_name) like '%COLUMN_XYZ%';
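Each call returns an XMLType document shaped roughly like this (the values shown are purely illustrative), which is why the next step walks /ROWSET/ROW and reads VALUE:
<!-- illustrative document; the actual values depend on your data -->
<ROWSET>
 <ROW>
  <VALUE>Yes</VALUE>
 </ROW>
 <ROW>
  <VALUE>Y-something-else</VALUE>
 </ROW>
</ROWSET>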
and then extract the value you want from that:
with cte (xml) as (
select dbms_xmlgen.getxmltype(
'select distinct "' || column_name || '" as value'
|| ' from "' || owner || '"."' || table_name || '"'
|| ' where substr("' || column_name || '", 1, 1) = ''Y'''
)
from all_tab_columns
where owner = 'OWNER_XYZ'
and upper(column_name) like '%COLUMN_XYZ%'
)
select x.value
from cte
cross apply xmltable(
'/ROWSET/ROW'
passing cte.xml
columns value varchar2(4000) path 'VALUE'
) x;
You can also easily include the table each value came from if you want that information (and the owner, and actual column name, etc.).
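A sketch of one way to carry the owner and table name through (an untested variation on the query above):
with cte (owner, table_name, xml) as (
  select owner, table_name,
         dbms_xmlgen.getxmltype(
           'select distinct "' || column_name || '" as value'
           || ' from "' || owner || '"."' || table_name || '"'
           || ' where substr("' || column_name || '", 1, 1) = ''Y'''
         )
  from all_tab_columns
  where owner = 'OWNER_XYZ'
  and upper(column_name) like '%COLUMN_XYZ%'
)
select cte.owner, cte.table_name, x.value
from cte
cross apply xmltable(
  '/ROWSET/ROW'
  passing cte.xml
  columns value varchar2(4000) path 'VALUE'
) x;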
db<>fiddle showing both approaches.
I have a Redshift database that is being updated with new tables, so I can't just manually list the tables I want. I want to get a count of the rows of all the tables from my query. So far I have:
select 'SELECT ''' || table_name || ''' as table_name, count(*) As con ' ||
'FROM ' || table_name ||
CASE WHEN lead(table_name) OVER (order by table_name ) IS NOT NULL
THEN ' UNION ALL ' END
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%results%'
but when I do this I get the error:
Specified types or functions (one per INFO message) not supported on Redshift tables.
I've searched a lot but I can't seem to find a solution for my problem. Any help would be greatly appreciated. Thanks!
EDIT:
I've changed my approach and decided to use a for loop in R to get the row counts of each table, but I'm running into the issue that row_count is only saving one number, not the count for each table like I want. Here is the code:
schema <- "x"
table_prefix <- "results"
geos <- ad_districts %>% filter(geo != "geo")
row_count <- list()
i = 1
for (geo in geos){
table_name <- paste0(schema, ".", table_prefix, geo)
row_count[[i]] <- dbGetQuery(con,
paste("SELECT COUNT(*) FROM", table_name))
i = i + 1
}
Your query scans every table, which will take a lot of time and resources. Instead, use a system table to get the same information:
select name, sum(rows) as rows
from stv_tbl_perm
where name like '%results%'
group by 1
[EDIT] I think this is the root cause: some SQL functions are only supported on the leader node. Try connecting to that node and re-running your SQL.
https://docs.aws.amazon.com/redshift/latest/dg/c_sql-functions-leader-node.html
Hope this helps.
select 'select count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' ;' as sql_text
from information_schema.tables
;
[EDIT - refined this a bit to generate a series of statements that can be run at once]
select rownum, case when rownum > 1 then sql_text else replace(sql_text, 'union all', '') end as sql_text
from
(
select rank() over (order by sql_text DESC) as rownum,
sql_text
from
(
select 'select ''' || table_schema || ' ' || table_name || ''' , count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' union all ' as sql_text
from information_schema.tables
where table_schema = 'public'
order by table_schema, table_name
)X
)Y
order by rownum desc ;
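For two hypothetical tables public.orders and public.users, the sql_text column of the result would form a script like this (the last generated statement has its union all stripped):
-- illustrative output for hypothetical tables public.orders and public.users
select 'public orders' , count(*) as "public.orders" from public.orders union all
select 'public users' , count(*) as "public.users" from public.users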
SELECT ' Select count(*) , '''+ tablename + ''' from '+'"' + tablename +'"' +' Union ALL '
FROM pg_table_def
GROUP BY tablename
The query above double-quotes each table name, so table names containing spaces are handled. Remove the UNION ALL from the last generated statement and the script is ready to be executed.
I'm going mental over this. I'm fairly new to dynamic SQL, so I may just not be asking Google the right question, but here's what I'm trying to do.
I have a query with dynamic SQL. When I run that query, it produces several rows (about 30) which together make up a single union query. I can copy all of those rows, paste them into a new query and run it - that works fine - but what I need is to run this all in a single query. I've looked up examples of using execute immediate and fetch, but I cannot seem to get them to actually spit out the data; they just end up saying something like "Executed Successfully" without producing any resulting rows.
The resulting column of the SQL below is qry_txt - instead of producing it at face value, I want to execute it as a query. Again, I may not be articulating this well, but I'm basically trying to turn two queries (with a manual copy/paste step in between) into a single query. Hope this makes sense.
Here's my SQL:
Select CASE when
lead(ROWNUM) over(order by ROWNUM) is null then
'SELECT '||''''||T.TABLE_NAME||''''||' as TABLE_NAME,'||''''||T.COLUMN_NAME||''''||' as COLUMN_NAME, cast('|| T.COLUMN_NAME ||' as varchar2(100)) as SAMPLE_DATA from rpt.'||T.TABLE_NAME ||' where '||T.COLUMN_NAME||' is not null and ROWNUM=1;'
else
'SELECT '||''''||T.TABLE_NAME||''''||' as TABLE_NAME,'||''''||T.COLUMN_NAME||''''||' as COLUMN_NAME, cast('|| T.COLUMN_NAME ||' as varchar2(100)) as SAMPLE_DATA from rpt.'||T.TABLE_NAME ||' where '||T.COLUMN_NAME||' is not null and ROWNUM=1 union ' end as qry_txt
from all_tab_columns t where T.OWNER='rpt' and T.DATA_TYPE != 'BLOB' and T.DATA_TYPE != 'LONG' and T.TABLE_NAME = 'NME_DMN'
ORDER BY ROWNUM asc;
You cannot run a dynamic query from plain SQL; you need a PL/SQL block to accomplish that. Please see below for how you can do it.
PS: Code is not tested.
declare
var1 <declaration matching the first column in the select list> ;
var2 <declaration matching the second column in the select list> ;
var3 <declaration matching the third column in the select list> ;
....
varn ;
begin
for i in ( SELECT LEAD (ROWNUM) OVER (ORDER BY ROWNUM) COL1,
T.TABLE_NAME,
T.COLUMN_NAME
FROM all_tab_columns t
WHERE T.OWNER = 'rpt'
AND T.DATA_TYPE != 'BLOB'
AND T.DATA_TYPE != 'LONG'
AND T.TABLE_NAME = 'NME_DMN'
ORDER BY ROWNUM ASC)
Loop
If i.col1 IS NULL Then
execute immediate 'SELECT '
|| ''''
|| i.table_name
|| ''''
|| ' as TABLE_NAME,'
|| ''''
|| i.column_name
|| ''''
|| ' as COLUMN_NAME, cast('
|| i.column_name
|| ' as varchar2(100)) as SAMPLE_DATA from rpt.'
|| i.table_name
|| ' where '
|| i.column_name
|| ' is not null and ROWNUM=1' into var1 , var2 ,var3 ....varn;
Else
execute immediate 'SELECT '
|| ''''
|| i.table_name
|| ''''
|| ' as TABLE_NAME,'
|| ''''
|| i.column_name
|| ''''
|| ' as COLUMN_NAME, cast('
|| i.column_name
|| ' as varchar2(100)) as SAMPLE_DATA from rpt.'
|| i.table_name
|| ' where '
|| i.column_name
|| ' is not null and ROWNUM=1' into var1 , var2 ,var3 ....varn;
end if;
End Loop;
exception
when others then
dbms_output.put_line(sqlcode ||'--'||sqlerrm);
End;
I am trying to write SQL code in SQL Server that dynamically selects all the tables in a specific database and then, for each column in each table, counts the number of missing (null) values and non-null values. I also want the result inserted into another table.
Is there any way I can do this without manually changing the table name and column selection for each query?
I have Teradata code for the same task, which I tried to convert to SQL Server, but I am unable to get the dynamic generation and insertion parts right.
insert into temp
values (select ''CAMP'',
rtrim(''' || tablename || '''),
rtrim(''' || columnname || '''),
rtrim(''' || columnformat || '''),
count(1),
count(rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end))),
(cast (count(rtrim(upper(case when ' || columnname || '='''' then NULL else ' || columnname || ' end))) as float) / (cast (count(1) as float))) * 100,
count(distinct rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end))),
min(rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end))),
max(rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end))),
min(len(rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end)))),
max(len(rtrim(upper(case when ' || columnname || '='''' then NULL else '|| columnname ||' end))))
from ' || tablename ||')
Any help on this front would be great!
Thanks!
Not sure if you need a UNION or a JOIN, but in either case you can just use a three-part name for the object in the other database if you are working across multiple databases:
USE database1; -- your database name
GO
CREATE VIEW dbo.MyView
AS
SELECT columns FROM dbo.Table1
UNION ALL
SELECT columns FROM database2.dbo.Table2; -- second database
GO
select * from dbo.MyView -- getting all data from the view
Hope that helps
Does something like this help you:
SELECT [Table] = t.[name]
, [Column] = c.[name]
FROM sys.tables t
INNER JOIN sys.columns c
ON c.[object_id] = t.[object_id]
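Building on the catalog query above, the per-column null and non-null counts from the question can be produced by generating and executing the statements dynamically. A rough sketch, assuming a results table dbo.ColumnCounts (table_name, column_name, non_null_cnt, null_cnt) already exists:
-- dbo.ColumnCounts is a hypothetical results table; create it (or swap in your own) first
DECLARE @sql nvarchar(max) = N'';
SELECT @sql = @sql +
    N'INSERT INTO dbo.ColumnCounts (table_name, column_name, non_null_cnt, null_cnt) ' +
    N'SELECT ''' + t.[name] + N''', ''' + c.[name] + N''', ' +
    N'COUNT(' + QUOTENAME(c.[name]) + N'), ' +    -- COUNT(col) ignores NULLs
    N'SUM(CASE WHEN ' + QUOTENAME(c.[name]) + N' IS NULL THEN 1 ELSE 0 END) ' +
    N'FROM ' + QUOTENAME(SCHEMA_NAME(t.[schema_id])) + N'.' + QUOTENAME(t.[name]) + N';' + CHAR(10)
FROM sys.tables t
INNER JOIN sys.columns c
    ON c.[object_id] = t.[object_id]
WHERE TYPE_NAME(c.user_type_id) NOT IN (N'text', N'ntext', N'image', N'xml'); -- types COUNT() cannot aggregate
EXEC sys.sp_executesql @sql;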
I have a table with mixed types of data (real, integer, character, ...) but I would like to retrieve only the columns that have real values.
I can construct this:
SELECT 'SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
which gives this:
"SELECT o.random,o.struct2d_pred2_num,o.pfam_num,o.transmb_num [...] FROM final_datas as o"
Then I would like to create a table with these columns. Of course, doing this doesn't work:
create table table2 as (
SELECT 'SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
)
Suggestions?
You need to generate the whole CREATE TABLE statement as dynamic SQL:
SELECT 'CREATE TABLE table2 AS SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
The result can then be executed, for example with EXECUTE inside a PL/pgSQL block, as sketched below.
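A minimal sketch of doing the whole thing in one step with an anonymous DO block (assumes the final_datas table from the question):
DO $$
DECLARE
  sqlstmt text;
BEGIN
  -- Build the CREATE TABLE ... AS SELECT statement for the real columns only
  SELECT 'CREATE TABLE table2 AS SELECT ' || array_to_string(ARRAY(
           select 'o' || '.' || c.column_name
           from information_schema.columns as c
           where table_name = 'final_datas'
           and c.data_type = 'real'), ',') || ' FROM final_datas as o'
  INTO sqlstmt;
  -- Run the generated statement
  EXECUTE sqlstmt;
END
$$;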