Redshift: svv_tables and pg_tables contradict each other? - permissions

I am getting a "permission denied" error on a table and am trying to find out whether the user has permissions on it. Querying pg_tables and svv_tables yields different results. Why?
-- list permissions for user
SELECT
u.usename,
t.schemaname,
t.tablename,
has_schema_privilege(u.usename,t.schemaname,'create') AS user_has_create_permission,
has_schema_privilege(u.usename,t.schemaname,'usage') AS user_has_usage_permission
FROM
pg_user u,
pg_tables t
WHERE
u.usename = 'X'
AND tablename = 'Y'
ORDER BY
t.tablename
, t.schemaname
, usename
;
-- list permissions on table
SELECT DISTINCT
u.usename,
t.table_schema,
t.table_name,
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'select') AS user_has_select_permission,
has_table_privilege(u.usename,t.table_schema || '.' || t.table_name,'insert') AS user_has_insert_permission,
has_table_privilege(u.usename,t.table_schema || '.' || t.table_name,'update') AS user_has_update_permission,
has_table_privilege(u.usename,t.table_schema || '.' || t.table_name,'delete') AS user_has_delete_permission
, has_table_privilege(u.usename,t.table_schema || '.' || t.table_name,'references') AS user_has_references_permission
FROM
pg_user u
CROSS JOIN svv_tables t
WHERE
t.table_schema not like 'pg_temp%' and
t.table_schema not in ('pg_catalog', 'pg_internal') and
t.table_type != 'EXTERNAL TABLE'
and u.usename = 'X'
and t.table_name = 'Y'
order by
usename
, table_schema
, table_name
;

Related

How to find all columns in a database with all NULL values

I have a large Snowflake database with 70+ tables and 3000+ fields. Is there a query I can use across the entire database to find all columns with all NULLs? I have a command I can use to find all the columns
select * from prod_db.information_schema.columns
Is there a way to modify that command to identify which columns are all NULLs? If there is no way to do it across the entire database, is there a way to do it across a table? I do not want to type:
select column_name from prod_db.information_schema.table_name
3000+ times. Thanks!
This uses a SQL generator to generate a SQL statement that will locate columns matching two criteria:
The column is in a table with one or more rows
The column has all nulls.
To be efficient, rather than scanning each table entirely, it uses a UNION ALL block that looks for a single non-null row for each column. It uses TOP 1 to find a non-null row, so as soon as it finds one it returns that row and stops scanning, then moves on to the next probe.
This means that the large UNION ALL section lists the columns where it finds a non-null row, which is the opposite of what we want. To use this information, a CTE wrapped around the UNION ALL does an anti-join against the columns view in the information schema.
with COLS as
(
select 'select top 1 ''' || C.TABLE_CATALOG || ''' as TABLE_CATALOG, ''' || C.TABLE_SCHEMA ||
''' as TABLE_SCHEMA, ''' || C.TABLE_NAME || ''' as TABLE_NAME, ''' || C.COLUMN_NAME ||
''' as COLUMN_NAME from "' ||
C.TABLE_CATALOG || '"."' || C.TABLE_SCHEMA || '"."' || C.TABLE_NAME || '"' ||
' where "' || C.COLUMN_NAME || '" is not null'
as NULL_CHECK
from INFORMATION_SCHEMA.COLUMNS C
left join INFORMATION_SCHEMA.TABLES T on
C.TABLE_CATALOG = T.TABLE_CATALOG and
C.TABLE_SCHEMA = T.TABLE_SCHEMA and
C.TABLE_NAME = T.TABLE_NAME
where C.IS_NULLABLE = 'YES' and T.TABLE_TYPE = 'BASE TABLE'
and T.ROW_COUNT > 0
), UNIONED as
(
select listagg(NULL_CHECK, '\nunion all\n') as UNIONED from COLS
)
select replace($$
with NON_NULL_COLUMNS as (
!~UNIONED~!
)
select C.TABLE_CATALOG, C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS C
left join NON_NULL_COLUMNS NN
on C.TABLE_CATALOG = NN.TABLE_CATALOG
and C.TABLE_SCHEMA = NN.TABLE_SCHEMA
and C.TABLE_NAME = NN.TABLE_NAME
and C.COLUMN_NAME = NN.COLUMN_NAME
left join INFORMATION_SCHEMA.TABLES T
on C.TABLE_CATALOG = T.TABLE_CATALOG
and C.TABLE_SCHEMA = T.TABLE_SCHEMA
and C.TABLE_NAME = T.TABLE_NAME
where NN.COLUMN_NAME is null and T.ROW_COUNT > 0
;$$, '!~UNIONED~!', UNIONED) as SQL_TO_RUN from UNIONED
;
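The same probe-and-anti-join idea can be sketched outside Snowflake. The following is a minimal illustration only, using Python's built-in sqlite3 as a stand-in database; the function name all_null_columns and the sample tables are hypothetical, and LIMIT 1 plays the role of TOP 1.

```python
import sqlite3


def all_null_columns(conn, tables):
    """Return (table, column) pairs whose column holds only NULLs.

    Empty tables are skipped, mirroring the ROW_COUNT > 0 filter.
    """
    result = []
    for table in tables:
        # Skip tables that have no rows at all.
        if conn.execute(f'SELECT 1 FROM "{table}" LIMIT 1').fetchone() is None:
            continue
        # Column names are in position 1 of each table_info row.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info('{table}')")]
        for column in columns:
            # Probe for a single non-null value; the scan can stop at the first hit.
            probe = f'SELECT 1 FROM "{table}" WHERE "{column}" IS NOT NULL LIMIT 1'
            if conn.execute(probe).fetchone() is None:
                # No non-null row found: this is the anti-join result.
                result.append((table, column))
    return result


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, note TEXT, legacy_code TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, NULL)", [(1, "a"), (2, None)])
    conn.execute("CREATE TABLE empty_table (x INTEGER)")
    return all_null_columns(conn, ["orders", "empty_table"])


if __name__ == "__main__":
    print(demo())
```

This version issues one probe per column instead of a single generated statement, which is simpler to read but means more round trips; the generated UNION ALL above avoids that.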
You can produce a list of SELECT queries, one per column, as follows:
SELECT 'SELECT ''' || TABLE_NAME || ''', ''' || COLUMN_NAME || ''', COUNT(*) FROM ' || TABLE_NAME || ' WHERE ' || COLUMN_NAME || ' IS NULL OR LEN(' || COLUMN_NAME || ') = 0 UNION '
from information_schema.columns
The output of the query above can then be executed to get the result you need. (PS: Remove the trailing UNION from the last generated row before executing.)
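If the generated rows are assembled in a client language rather than by hand, the trailing UNION never appears in the first place. A minimal sketch, assuming the (table, column) pairs have already been fetched from information_schema.columns; the function name count_blank_sql is hypothetical, and UNION ALL is used here instead of UNION because every generated row is already distinct.

```python
def count_blank_sql(columns):
    """Build one query that counts NULL-or-empty values per (table, column)."""
    selects = [
        f"SELECT '{table}' AS table_name, '{column}' AS column_name, "
        f"COUNT(*) AS blank_rows FROM {table} "
        f"WHERE {column} IS NULL OR LEN({column}) = 0"
        for table, column in columns
    ]
    # join() places the separator only between items, so nothing trails the last SELECT.
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    print(count_blank_sql([("CUSTOMERS", "EMAIL"), ("ORDERS", "NOTE")]))
```

Identifiers are interpolated unquoted, so this sketch assumes simple table and column names without spaces or mixed case.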
Hope this helps.

Postgresql query that prints database level permissions

I'm trying to not only display the user, schema and tables, but also show the database that the schema and tables belong to. Any suggestions on what I should add to this query?
SELECT
u.usename as user, t.schemaname as schema, t.tablename as table,
has_table_privilege(u.usename,t.schemaname||'.'||t.tablename,'select') AS "Select",
has_table_privilege(u.usename,t.schemaname||'.'||t.tablename,'insert') AS "Insert",
has_table_privilege(u.usename,t.schemaname||'.'||t.tablename,'update') AS "Update",
has_table_privilege(u.usename,t.schemaname||'.'||t.tablename,'delete') AS "Delete",
has_table_privilege(u.usename,t.schemaname||'.'||t.tablename,'references') AS "Reference"
FROM pg_user u
CROSS JOIN pg_tables t
WHERE t.schemaname != 'information_schema' and t.schemaname != 'pg_internal' and t.schemaname != 'pg_catalog' and t.tablename not like '% %'
ORDER BY u.usename, t.schemaname, t.tablename;
I suggest looking into the information_schema.* views. They are defined by the SQL standard, and information_schema.tables should have what you want.
You can replace the use of pg_tables as such:
select
u.usename as user,
t.table_catalog as database,
t.table_schema as schema,
t.table_name as table,
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'select') as "Select",
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'insert') as "Insert",
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'update') as "Update",
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'delete') as "Delete",
has_table_privilege(u.usename, t.table_schema || '.' || t.table_name, 'references') as "Reference"
from
pg_user u
cross join information_schema.tables t
where
t.table_schema != 'information_schema' and t.table_schema != 'pg_internal' and t.table_schema != 'pg_catalog'
and t.table_name not like '% %'
order by
u.usename, t.table_schema, t.table_name;

How to quickly see what columns in a table have data?

We are currently undertaking a testing phase which requires us to see if there is any data in each column for each table. Now, the route that is long and labour-intensive is:
SELECT COUNT(Col1), COUNT(Col2)...FROM TABLE
Is there any easier way to do this? We can go down this route by concatenating each column name from our data lineage document with the COUNT() function, but we have a lot of tables and a lot of columns in each table, making this a bit unfeasible.
Essentially we just need a count of records in each column for each table, without having to write long COUNT(Col) queries.
Thanks
This query lists the columns that contain only NULLs (num_nulls equals num_rows). The results are accurate only if the table statistics were gathered recently with the default value for ESTIMATE_PERCENT:
SELECT utab.table_name
, tcol.column_name
, utab.num_rows
from user_tables utab,
user_tab_cols tcol
where utab.table_name = tcol.table_name
and utab.num_rows > 0
and utab.num_rows = tcol.num_nulls;
You could use a dynamic query to build the queries. This will generate all the queries.
SELECT 'SELECT COUNT(' || t.column_name || ' ) FROM ' || t.owner || '.' || t.table_name || ';' FROM dba_tab_columns t
You can generate all the select statements like so:
SELECT CASE WHEN column_id = 1 AND column_id_desc != 1 THEN 'SELECT ''' || LOWER(owner) || '.' || LOWER(table_name) || ''' table_name, ' || CHR(10) || 'COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt,'
WHEN column_id = 1 AND column_id_desc = 1 THEN 'SELECT ''' || LOWER(owner) || '.' || LOWER(table_name) || ''' table_name, ' || CHR(10) || 'COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt FROM ' || LOWER(owner) || '.' || LOWER(table_name) || ';'
WHEN column_id_desc = 1 THEN ' COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt' || CHR(10) || 'FROM ' || LOWER(owner) || '.' || LOWER(table_name) || ';'
ELSE ' COUNT(' || LOWER(column_name) || ') ' || SUBSTR(LOWER(column_name), 1, 26) || '_cnt,'
END sql_text
FROM (SELECT owner,
table_name,
column_name,
column_id,
row_number() OVER (PARTITION BY owner, table_name ORDER BY column_id DESC) column_id_desc
FROM all_tab_columns)
WHERE <predicates to filter on the tables you're interested in>
ORDER BY owner,
table_name,
column_id;
This goes through all the tables you're interested in plus their columns and outputs text that will, when taken together, form a select statement for each table.
The text that is output in the sql_text column depends on whether the column in the list is the first or last (or both!); this way you get the full statement which queries each table once, rather than one per table and column.
You can then copy and paste the results and run that as a script.
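The first/last-column bookkeeping can also be done in a client language by grouping the columns per table first. A rough sketch in Python, assuming the (owner, table_name, column_name) rows have already been read from all_tab_columns in column order; the function name count_queries and the sample rows are hypothetical.

```python
def count_queries(columns):
    """Build one SELECT per table with a COUNT(col) for every column.

    columns: iterable of (owner, table_name, column_name) tuples in column order.
    """
    by_table = {}
    for owner, table, column in columns:
        by_table.setdefault((owner, table), []).append(column)

    queries = []
    for (owner, table), cols in by_table.items():
        qualified = f"{owner}.{table}".lower()
        # Alias is truncated to 26 characters before "_cnt", as in the SQL above.
        counts = ",\n       ".join(
            f"COUNT({c.lower()}) {c.lower()[:26]}_cnt" for c in cols
        )
        queries.append(
            f"SELECT '{qualified}' table_name,\n       {counts}\nFROM {qualified};"
        )
    return queries


if __name__ == "__main__":
    rows = [("HR", "EMP", "ID"), ("HR", "EMP", "NAME"), ("HR", "DEPT", "ID")]
    print("\n\n".join(count_queries(rows)))
```

Because the comma separator is added by join(), there is no need to treat the first or last column specially.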
This may help:
SELECT
a.table_name,
a.column_name
FROM
ALL_TAB_COLUMNS a
WHERE owner = '<your user>'
AND a.SAMPLE_SIZE = a.NUM_NULLS

Get count of rows from multiple tables Redshift SQL?

I have a redshift database that is being updated with new tables so I can't just manually list the tables I want. I want to get a count of the rows of all the tables from my query. So far I have:
select 'SELECT ''' || table_name || ''' as table_name, count(*) As con ' ||
'FROM ' || table_name ||
CASE WHEN lead(table_name) OVER (order by table_name ) IS NOT NULL
THEN ' UNION ALL ' END
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%results%'
but when I do this I get the error:
Specified types or functions (one per INFO message) not supported on Redshift tables.
I've searched a lot but I can't seem to find a solution for my problem. Any help would be greatly appreciated. Thanks!
EDIT:
I've changed my approach and decided to use a for loop in R to get the row count of each table, but I'm running into the issue that 'row_count' is only saving one number, not the count for each table like I want. Here is the code:
schema <- "x"
table_prefix <- "results"
geos <- ad_districts %>% filter(geo != "geo")
row_count <- list()
i = 1
for (geo in geos){
table_name <- paste0(schema, ".", table_prefix, geo)
row_count[[i]] <- dbGetQuery(con,
paste("SELECT COUNT(*) FROM", table_name))
i = i + 1
}
Your query runs a count(*) against every table, which will take a lot of time and resources. Instead, use a system table to get the same information:
select name, sum(rows) as rows
from stv_tbl_perm
where name like '%results%'
group by 1
[EDIT] I think this is the root cause: some SQL functions are only supported on the leader node. Try connecting to that node and re-running your SQL.
https://docs.aws.amazon.com/redshift/latest/dg/c_sql-functions-leader-node.html
Hope this helps.
select 'select count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' ;' as sql_text
from information_schema.tables
;
[EDIT - refined this a bit to generate a series of statements that can be run at once]
select rownum, case when rownum > 1 then sql_text else replace(sql_text, 'union all', '') end as sql_text
from
(
select rank() over (order by sql_text DESC) as rownum,
sql_text
from
(
select 'select ''' || table_schema || ' ' || table_name || ''' , count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' union all ' as sql_text
from information_schema.tables
where table_schema = 'public'
order by table_schema, table_name
)X
)Y
order by rownum desc ;
SELECT ' Select count(*) , '''+ tablename + ''' from '+'"' + tablename +'"' +' Union ALL '
FROM pg_table_def
GROUP BY tablename
The query above double-quotes each table name, which avoids problems with table names that contain spaces. Remove the trailing UNION ALL from the last generated row and the query will be ready to execute.

POSTGRESQL: Create table as Selecting specific type of columns

I have a table with mixed types of data (real, integer, character ...) but I would like to retrieve only the columns that have real values.
I can construct this:
SELECT 'SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
that gives that:
"SELECT o.random,o.struct2d_pred2_num,o.pfam_num,o.transmb_num [...] FROM final_datas as o"
Then I would like to create a table with these columns. Of course, doing this doesn't work:
create table table2 as (
SELECT 'SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
)
Suggestions?
You need to generate the whole CREATE TABLE statement as dynamic SQL:
SELECT 'CREATE TABLE table2 AS SELECT ' || array_to_string(ARRAY(
select 'o' || '.' || c.column_name
from information_schema.columns as c
where table_name = 'final_datas'
and c.data_type = 'real'), ',') || ' FROM final_datas as o' As sqlstmt
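The result can then be run with EXECUTE sqlstmt; inside a PL/pgSQL block (for example a DO block or a function), because EXECUTE in plain SQL only runs prepared statements.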