I'm trying to get an count of all the rows in a set of views in my Snowflake database.
The built-in row_count from information_schema.tables is not present in information_schema.views, unfortunately.
It seems I'd need to count all rows in each view, something like:
with view_name as (select table_name
from account_usage.views
where table_schema = 'ACCESS' and RIGHT(table_name,7) = 'CURRENT'
)
select count (*) from view_name;
But that returns only one results, instead of one for each line
If I change the select to include the view name, i.e.
select concat('Rows in ', view_name), count (*) from view_name;
…it returns the error "invalid identifier 'VIEW_NAME' (line 5)"
How can I show all results and include the view name?
You can create a query that looks at the information_schema to create a query that will go view by view getting its count:
select listagg(xx, ' union all ')
from (
select 'select count(*) c, \'' || x || '\' v from ' || x as xx
from (
select TABLE_CATALOG ||'.'|| TABLE_SCHEMA ||'."'||TABLE_NAME||'"' x
from KNOEMA_FORECAST_DATA_ATLAS.INFORMATION_SCHEMA.VIEWS
where table_schema='FORECAST'
)
)
See also How to find the number of rows for all views in a schema?
Related
I am querying TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED columns from Snowflake information schema. VIEWS. Next, I would like to MERGE that table with row count for the view. Below are my queries I am running in Snowflake my issue is I am not sure how to combine these two table in 1 table ?
Note: I am new to Snowflake. Please provide code with explanation.
Thanks in advance for help!
Query 1
SELECT TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED FROM DB.SCHEMA.VIEWS
WHERE TABLE_SCHEMA="MY_SHEMA" AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3')
Query 2
SELECT COUNT(*) FROM DB.SCHEMA.VIEW_TABLE1
UNION ALL SELECT COUNT(*) FROM DB.SCHEMA.VIEW_TABLE2
To get result of the COUNT(*) needs to be built dynamically and attached to the "driving query".
Sample data:
CREATE VIEW VIEW_TABLE1(c)
AS
SELECT 1;
CREATE VIEW VIEW_TABLE2(e)
AS
SELECT 2 UNION ALL SELECT 4;
CREATE VIEW VIEW_TABLE3(f)
AS
SELECT 3;
Full query:
DECLARE
QUERY STRING;
RES RESULTSET;
BEGIN
SELECT
LISTAGG(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
$$SELECT '<TABLE_SCHEMA>' AS TABLE_SCHEMA,
'<TABLE_NAME>' AS TABLE_NAME,
'<CREATED>' AS CREATED,
'<LAST_ALTERED>' AS LAST_ALTERED,
COUNT(*) AS cnt
FROM <tab_name>
$$,
'<TABLE_SCHEMA>', v.TABLE_SCHEMA),
'<TABLE_NAME>', v.TABLE_NAME),
'<CREATED>', v.CREATED),
'<LAST_ALTERED>', v.LAST_ALTERED),
'<tab_name>', CONCAT_WS('.', v.table_catalog, v.table_schema, v.table_name)),
' UNION ALL ') WITHIN GROUP (ORDER BY CONCAT_WS('.', v.table_catalog, v.table_schema, v.table_name))
INTO :QUERY
FROM INFORMATION_SCHEMA.VIEWS v
WHERE TABLE_SCHEMA='PUBLIC'
AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3');
RES := (EXECUTE IMMEDIATE :QUERY);
RETURN TABLE(RES);
END;
Output:
Rationale:
The ideal query would be(pseudocode):
SELECT TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED,
EVAL('SELECT COUNT(*) FROM ' || view_name) AS row_count
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_SCHEMA='MY_SHEMA'
AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3');
Such construct EVAL(dynamic query) at SELECT list does not exist as it would require building a query on the fly and execute per each row. Though for some RDBMSes are workaround like dbms_xmlgen.getxmltype
Include table/view names as string in your count(*) queries and then you can join.
Example below -
select * from
(SELECT TABLE_SCHEMA,TABLE_NAME,CREATED FROM information_schema.tables
WHERE TABLE_SCHEMA='PUBLIC' AND TABLE_NAME IN ('D1','D2')) t1
left join
(
SELECT 'D1' table_name, COUNT(*) FROM d1
UNION ALL SELECT 'D2',COUNT(*) FROM d2) t2
on t1.table_name = t2.table_name ;
TABLE_SCHEMA
TABLE_NAME
CREATED
TABLE_NAME
COUNT(*)
PUBLIC
D1
2022-04-06 14:24:56.224 -0700
D1
12
PUBLIC
D2
2022-04-06 14:25:27.276 -0700
D2
5
with
table1 as (
select 'joe' as name, 17 as age, 25 as speed
),
table2 as (
select 'nick' as name, 21 as speed, 23 as strength
)
select * from table1
union all
select * from table2
In Google BigQuery, this union all does not throw an error because both tables have the same number of columns (3 each). However I receive bad data output because the columns do not match. Rather than outputting a new table with 4 columns name, age, speed, strength with correct values + nulls for missing values (which would probably be preferred), the union all keeps the 3 columns from the top row.
Is there a good way to catch that the columns do not match, rather than the query silently returning bad data? Is there any way for this to return an error perhaps, as opposed to a successful table? I'm not sure how to check in SQL that the columns in the 2 tables match.
Edit: in this example it is clear to see that the columns do not match, however in our data we have 100+ columns and we want to avoid a situation where we make an error in a UNION ALL
Below is for BigQuery Standard SQL and using scripting feature of BQ
DECLARE statement STRING;
SET statement = (
WITH table1_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table1` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), table2_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table2` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), all_columns AS (
SELECT column FROM table1_columns UNION DISTINCT SELECT column FROM table2_columns
)
SELECT (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table1` UNION ALL '
FROM all_columns a LEFT JOIN table1_columns t USING(column)
) || (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table2`'
FROM all_columns a LEFT JOIN table2_columns t USING(column)
)
);
EXECUTE IMMEDIATE statement;
when applied to sample data from your question - output is
Row name age speed strength
1 joe 17 25 null
2 nick null 21 23
After saving table1 and table2 as 2 tables in a dataset in BigQuery, I then used the metadata using INFORMATION_SCHEMA to check that the columns matched.
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table1'
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table2'
INFORMATION_SCHEMA.COLUMNS returns information including the column names and their positioning. I can join these 2 tables together then to check that the names match...
I have written the select to get the distinct table name (I have to use dba_tab_cols table because it's the only one with table names I have permission:
select distinct(table_name)||'"'||':' as SQL_TXT from dba_tab_cols where table_name = 'SAMPLE_TABLE'
I would like to add " before the table_name in this select, however when I write the following:
select '"'||distinct(table_name) I got the error:
**00936. 00000 - "missing expression"
*Cause:
*Action:**
I could not find similar topic, that is why I am sending this question.
The distinct is part of the select. So:
select distinct ('"' || table_name || '"' || ':') as SQL_TXT
from dba_tab_cols
where table_name = 'SAMPLE_TABLE';
select distinct is a "single" keyword that applies to all expressions in the select. It is not a function that applies to a single column.
Is there a way, through the information_schema or otherwise, to calculate how many percent of each column of a table (or a set of tables, better yet) are NULLs?
Your query has a number of problems, most importantly you are not escaping identifiers (which could lead to exceptions at best or SQL injection attacks in the worst case) and you are not taking the schema into account.
Use instead:
SELECT 'SELECT ' || string_agg(concat('round(100 - 100 * count(', col
, ') / count(*)::numeric, 2) AS ', col_pct), E'\n , ')
|| E'\nFROM ' || tbl
FROM (
SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
, quote_ident(column_name) AS col
, quote_ident(column_name || '_pct') AS col_pct
FROM information_schema.columns
WHERE table_name = 'my_table_name'
ORDER BY ordinal_position
) sub
GROUP BY tbl;
Produces a query like:
SELECT round(100 - 100 * count(id) / count(*)::numeric, 2) AS id_pct
, round(100 - 100 * count(day) / count(*)::numeric, 2) AS day_pct
, round(100 - 100 * count("oDd X") / count(*)::numeric, 2) AS "oDd X_pct"
FROM public.my_table_name;
Closely related answer on dba.SE with a lot more details:
Check whether empty strings are present in character-type columns
In PostgreSQL, you can easily compute it using the statistics tables if your autovacuum setting is on (check it by SHOW ALL;). You can also set the vacuum interval to configure how fast your statistics tables should be updated. You can then compute the NULL percentage (aka, null fraction) simply using the query below:
select attname, null_frac from pg_stats where tablename = 'table_name'
Think there is not built-in features for this. You can make this self
Just walk thorough each column in table and calc count() for all rows and count() for rows where column is null.
There is possible and optimize this for one query for one table.
OK, I played around a little and made a query that returns a query--or queries if you use LIKE 'my_table%' instead of = 'my_table_name':
SELECT 'select '|| string_agg('(count(*)::real-count('||column_name||')::real)/count(*)::real as '||column_name||'_percentage ', ', ') || 'from ' || table_name
FROM information_schema.columns
WHERE table_name LIKE 'my_table_name'
GROUP BY table_name
It returns a ready-to-run SQL query, like:
"SELECT (count(*)::real-count(id)::real)/count(*)::real AS id_percentage , (count(*)::real-count(value)::real)/count(*)::real AS value_percentage FROM my_table_name"
id_percentage;value_percentage
0;0.0177515
(The caps didn't go exactly right for readability.)
Good day!
The database has a table that you want to archive. A copy of the table, with the addition of the prefix name "ARCH_".
e.g. Table: BALANCE. Archiving table: ARCH_BALANCE.
I need to write a query to check: that in the tables "ARCH_%" present all fields of the base tables. *also have a database table that are not archived.
I wrote the following query:
select distinct COLUMN_NAME, TABLE_NAME
from ALL_TAB_COLUMNS res
where TABLE_NAME in(
SELECT TABLE_NAME
FROM all_tables core_t
where TABLE_NAME not like 'ARCH_%' AND
EXISTS (
SELECT TABLE_NAME
FROM all_tables hist_t
WHERE hist_t.TABLE_NAME = concat('ARCH_', core_t.TABLE_NAME)
)
) and COLUMN_NAME NOT IN (
select COLUMN_NAME
from ALL_TAB_COLUMNS
where TABLE_NAME = concat('ARCH_', res.TABLE_NAME)
);
Parts of the code works, but generally runs indefinitely.
Perhaps there is somebody other ideas.
This query joins both tables and columns and displays which fields they have in common and if there is one missing, it shows it in the fourth column:
select orig.column_name
, arch.column_name
, case
when orig.column_name is null
then 'column doesn''t exist in orig'
when arch.column_name is null
then 'column doesn''t exist in arch'
else 'exists in both'
end
status
from ( select table_name
, column_name
from all_tab_columns
where table_name = 'X'
)
orig
full
outer
join ( select table_name
, column_name
from all_tab_columns
where table_name = 'ARCH_X'
)
arch
on orig.column_name = arch.column_name
The problem can be in that part of query:
SELECT TABLE_NAME
FROM all_tables core_t
where TABLE_NAME not like 'ARCH_%' AND
EXISTS (
SELECT TABLE_NAME
FROM all_tables hist_t
WHERE hist_t.TABLE_NAME = concat('ARCH_', core_t.TABLE_NAME)
)
You are trying to fetch records that match 2 condidtion in the same time:
1) TABLE_NAME not like 'ARCH_%'
2) TABLE_NAME = concat ('ARCH_',TABLE_NAME)
Those are two conditions that stays in opposite sides.
Also you could fix prefixes for column names (TABLE_NAME can be from ALL_TAB_COLUMNS table or ALL_TABLES).