PostgreSQL dynamic query: find most recent timestamp of several unrelated tables - sql

I have a number of tables, and many of them have a timestamp column. I can get a list of every table with a timestamp column:
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp';
--------------
table_name
--------------
apples
bananas
sharks
lemons
I can get the highest timestamp in the sharks table as follows:
SELECT MAX(timestamp) FROM sharks;
-------------------------
max
-------------------------
2021-11-24 00:00:00.000
I would like to get a table like
table_name | last_updated
-------------+-------------------------
apples | 2021-11-23 00:02:00.000
bananas | 2019-10-16 00:04:00.000
sharks | 2021-11-24 00:00:00.000
lemons | 1970-01-03 10:00:00.000
I'm suspecting this requires dynamic SQL, so I'm trying something like
SELECT (
EXECUTE 'SELECT MAX(timestamp) FROM my_schema.' || table_name
) FROM (
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp'
);
But it seems like EXECUTE doesn't work in subqueries.
Performance is not particularly a concern, just producing the desired results.

Dynamic queries (EXECUTE) cannot run outside of PL/pgSQL blocks, so you need to wrap your code in one.
I set up test tables similar to yours, with only the "timestamp" column shared between them:
drop table if exists public.sharks_70099803 cascade;
create table public.sharks_70099803
as select 1::int integer_column,
now()::timestamp as "timestamp";
drop table if exists public.bananas_70099803 cascade;
create table public.bananas_70099803
as select 'some text'::text text_column,
now()::timestamp as "timestamp";
Wrap the dynamic query in a PL/pgSQL function. Inside it I build one query pulling max(timestamp) from each table that has the column, aggregate those into a single statement with union all in between, and then execute it.
CREATE OR REPLACE FUNCTION public.test_70099803()
RETURNS SETOF RECORD
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
return query
execute (
select string_agg(select_per_table,' union all ')
from (
select 'select '''||
table_name||
''' as table_name, max(timestamp) from public.'||
table_name "select_per_table"
from information_schema.columns
where table_schema='public'
and column_name='timestamp'
) a
);
END
$BODY$;
select * from public.test_70099803() as t(table_name text, max_timestamp timestamp);
-- table_name | max_timestamp
--------------------+----------------------------
-- sharks_70099803 | 2021-11-24 17:12:03.24951
-- bananas_70099803 | 2021-11-24 17:12:03.253614
--(2 rows)
You can parametrise your function to make it applicable to more groups of tables, or give it a predefined output table structure so that you can just select * from test_70099803();
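For readers who want to experiment outside PostgreSQL, here is a minimal sketch of the same technique in Python with SQLite (the table names and data are invented for illustration): discover from the catalog which tables have a "timestamp" column, glue one SELECT MAX(...) per table together with UNION ALL, and run the combined statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apples (id INTEGER, timestamp TEXT);
    CREATE TABLE sharks (id INTEGER, timestamp TEXT);
    CREATE TABLE pebbles (id INTEGER);  -- no timestamp column
    INSERT INTO apples VALUES (1, '2021-11-23 00:02:00'), (2, '2021-11-20 00:00:00');
    INSERT INTO sharks VALUES (1, '2021-11-24 00:00:00');
""")

# Catalog lookup: every table that has a "timestamp" column.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    if any(col[1] == "timestamp" for col in conn.execute(f"PRAGMA table_info({name})"))
]

# One SELECT MAX(...) per table, combined with UNION ALL.
query = " UNION ALL ".join(
    f"SELECT '{t}' AS table_name, MAX(timestamp) AS last_updated FROM {t}"
    for t in tables
)
for table_name, last_updated in conn.execute(query):
    print(table_name, last_updated)
```

The same caveat as with the PL/pgSQL version applies: the query string is assembled from catalog metadata, so it only stays safe as long as the table names come from the catalog and not from user input.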

Related

Listing tables used in a SQL query

I have nearly 800 SQL scripts, ranging from medium to complex SQL.
The database is Oracle.
I need the list of tables used in each of these scripts. Each script contains numerous with clauses, joins and subqueries. Is there an easier way to achieve this?
First store all the queries in one table, scripts, with the columns script_name and script_text.
Then this could work:
select script_name, script_text, object_name
from scripts s
join dba_objects do
on 1 = 1
where 1 = 1
and instr(upper(s.script_text), upper(do.object_name)) > 0
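A runnable sketch of the same substring idea in plain Python (the script contents and object names are invented for illustration): cross every script against every known object name and keep the pairs where the name occurs in the text.

```python
# Scripts keyed by name, as in the "scripts" table above.
scripts = {
    "monthly_report.sql": "SELECT * FROM EMPLOYEES e JOIN DEPARTMENTS d ON ...",
    "cleanup.sql": "DELETE FROM AUDIT_LOG WHERE ...",
}
# Object names, as pulled from dba_objects.
object_names = ["EMPLOYEES", "DEPARTMENTS", "AUDIT_LOG", "SALARIES"]

# Keep every (script, object) pair where the object name occurs in the text,
# mirroring instr(upper(script_text), upper(object_name)) > 0.
usage = [
    (script, name)
    for script, text in scripts.items()
    for name in object_names
    if name in text.upper()
]
for script, name in sorted(usage):
    print(script, name)
```

Note that plain substring matching can over-report: an object named LOG would also match the text AUDIT_LOG, and names appearing in comments or string literals are counted too.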
You may try explain plan for each of the statements and then check the content of plan_table. But it:
doesn't count views (plan table shows base tables of the view) and scalar functions
should be adapted for dblinks
should be thoroughly tested with local PL/SQL declarations (with function) and functions in other schemas
has influence of query rewrite and baselines
Below is an example:
begin
for i in 1..10 loop
execute immediate
'create table t'
|| i || '( id int)';
end loop;
end;
/
create view v_test
as
select *
from t5
join t6
using(id)
create function f_tab
return sys.odcinumberlist
pipelined
as
begin
null;
end;
/
explain plan for
select *
from t1, t2, t3, t4,
/*View*/
v_test,
/*Pipelined function*/
table(f_tab())
select
object_owner,
object_name,
object_type
from plan_table
where
(object_owner, object_name) in (
select
f.owner,
f.object_name
from all_objects f
where f.object_type in (
'TABLE'
, 'VIEW'
)
) or (object_name, object_type) in (
select
f.object_name,
'PROCEDURE'
from user_objects f
where f.object_type in (
'FUNCTION'
, 'PROCEDURE'
)
)
order by 1
OBJECT_OWNER | OBJECT_NAME | OBJECT_TYPE
:-------------------------- | :---------- | :----------
FIDDLE_TQYMTNVUFUWHRWJEENKX | T1 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T2 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T3 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T4 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T5 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T6 | TABLE
null | F_TAB | PROCEDURE
db<>fiddle here
Anyway, I think this sort of task should be handled by an SQL parser rather than by string manipulation, because only the parser knows exactly which objects were used.

SQL: Delete rows in a table where one field's value is lesser than group average

For now, I am first running the following query:
select group_name, avg(numeric_field) as avg_value, count(group_name) as n from table_name group by group_name order by n desc;
Suppose I get output:
group_name | avg_value | n
----------------------------------------
nice_group_name| 1566.353 | 2034
other_group | 235.43 | 1390
.
.
.
I am then deleting records in each group one by one manually using the following query for each group:
delete from table_name where group_name = 'nice_group_name' and numeric_field < 1567;
Here 1567 is the approximate avg_value for nice_group_name.
How can I run the second query for all rows of the result of first query automatically?
You can use a correlated subquery:
delete from table_name
where numeric_field < (select avg(t2.numeric_field)
from table_name t2
where t2.group_name = table_name.group_name
);
For performance, you want an index on table_name(group_name, numeric_field).
If you have few groups, you might find this more efficient:
with a as (
select group_name, avg(numeric_field) as anf
from table_name
group by group_name
)
delete from table_name
where numeric_field < (select a.anf from a where a.group_name = table_name.group_name);
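The correlated-subquery DELETE is standard SQL and easy to try out. Here is a self-contained sketch in Python with SQLite (the table contents are invented): rows below their group's average disappear, rows at or above it survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (group_name TEXT, numeric_field REAL)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", 10), ("a", 20), ("a", 30),  # group "a" average: 20
     ("b", 5), ("b", 15)],             # group "b" average: 10
)

# Delete every row that is below its own group's average.
conn.execute("""
    DELETE FROM table_name
    WHERE numeric_field < (SELECT AVG(t2.numeric_field)
                           FROM table_name t2
                           WHERE t2.group_name = table_name.group_name)
""")

remaining = conn.execute(
    "SELECT group_name, numeric_field FROM table_name"
    " ORDER BY group_name, numeric_field"
).fetchall()
print(remaining)
```

Because the subquery is re-evaluated against the table's pre-delete state for each candidate row, a single statement replaces the manual per-group deletes from the question.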
If table_name has some primary key field (say id) then use the following:
alter table table_name rename to bak;
create temp table avg_val as
select group_name as g, avg(numeric_field) as a from bak
group by group_name;
create table table_name as
select * from bak where id in (
select bak.id from
avg_val join bak on bak.group_name = avg_val.g
where avg_val.a <= bak.numeric_field
);
Check table_name. If all has gone well, you can drop the backed-up old table:
drop table bak;
Briefly, the steps are:
Rename the original table
Create a temporary table of average value for each group
Create a new table with all rows from original table where numeric_field is not less than average for that group.
Drop the renamed original table.

Hive - How to read a column from a table which is of type list

I have a Hive table named customer, which has a column named cust_id of list type, with following values:
cust_id
[123,234,456,567]
[345,457,67]
[89,23,34]
Now I want to read only this specific column cust_id in my select query, returning all the list elements as separate values of this column cust_id:
cust_id
123
234
456
567
345
457
67
89
23
34
Basically I want to fetch all the values of cust_id from this table as one column, to use these values in the where exists or where in clause of my other query.
A solution for this would be highly appreciated.
AFAIK this is what you are looking for, from the Hive manual:
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row.
for example
SELECT mytab.cust_id
FROM mytable LATERAL VIEW explode(cust_id) mytab AS cust_id;
Full example :
drop table customer_tab;
create table customer_tab ( cust_id array<String>);
INSERT INTO table customer_tab select array('123','234','456','567');
INSERT INTO table customer_tab select array('345','457','67');
INSERT INTO table customer_tab select array('89','23','34');
select * from customer_tab;
-- customer_tab.cust_id
-- ["123","234","456","567"]
-- ["345","457","67"]
-- ["89","23","34"]
SELECT mytab.cust_id
FROM customer_tab LATERAL VIEW explode(cust_id) mytab AS cust_id;
mytab.cust_id
123
234
456
567
345
457
67
89
23
34
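For intuition, LATERAL VIEW explode() is just a flatten: one output row per element of each array. The equivalent operation in plain Python, using the example data above:

```python
# Each row holds an array of cust_id values, as in the Hive table above.
rows = [
    ["123", "234", "456", "567"],
    ["345", "457", "67"],
    ["89", "23", "34"],
]

# Flatten: one output value per array element, preserving order.
cust_ids = [cust_id for row in rows for cust_id in row]
print(cust_ids)
```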

How to select all possible values of columns from all tables?

SELECT POM.TABLE_NAME, POM.COLUMN_NAME
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%'
I want to see all possible values in the columns on the list (in one row if possible). How can I modify this select to do it?
I want something like this:
TABLE_NAME | COLUMN_NAME |VALUES
-----------| ----------- | -------
CAR | COLOR | RED,GREEN
You can use the query below for your requirement. It fetches the distinct values of each matching column for its table.
It can only be used for columns with a limited number of distinct values, since I have used the LISTAGG function.
SELECT POM.TABLE_NAME, POM.COLUMN_NAME,
XMLTYPE(DBMS_XMLGEN.GETXML('SELECT LISTAGG(COLUMN_NAME,'','') WITHIN GROUP (ORDER BY COLUMN_NAME) VAL
FROM (SELECT DISTINCT '|| POM.COLUMN_NAME ||' COLUMN_NAME
FROM '||POM.OWNER||'.'||POM.TABLE_NAME||')')
).EXTRACT('/ROWSET/ROW/VAL/text()').GETSTRINGVAL() VAL
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%';
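The core of the technique is: for each (table, column) pair taken from the catalog, build a string-aggregation query dynamically and run it. A sketch of that loop in Python with SQLite, whose group_concat() plays the role of Oracle's LISTAGG (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (color TEXT, status TEXT)")
conn.executemany("INSERT INTO car VALUES (?, ?)",
                 [("RED", "NEW"), ("GREEN", "NEW"), ("RED", "USED")])

# (table, column) pairs, as pulled from the catalog in the Oracle query.
target_columns = [("car", "color"), ("car", "status")]

results = {}
for table, column in target_columns:
    # Build and run one aggregation query per column; group_concat(DISTINCT ...)
    # collapses the distinct values into a single comma-separated string.
    (values,) = conn.execute(
        f"SELECT group_concat(DISTINCT {column}) FROM {table}"
    ).fetchone()
    results[(table, column)] = values
    print(table, column, values)
```

As with LISTAGG, the aggregated string has an implementation-defined element order and a length limit, so this only suits columns with few distinct values.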

How to construct dynamic query over multiple tables

I'm not sure how to formulate this query. I think I need a subquery? Here's basically what I'm trying to do in a single query.
This query gives me the list of tables I need:
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name like 'fact_%';
For each table in that list, I then want to do a count (each table_name has the same columns I need to query):
SELECT table_name,
count (domain_key) key_count,
domain_key,
form_created_datetime
FROM (List of tables above)
GROUP BY domain_key,
form_created_datetime;
Can I iterate through each table listed by the first query to do my count?
Can this be done in a single query?
So expected out would be similar to this:
table_name | key_count | domain_key | form_created_datetime
--------------------------------------------------------------
fact_1 1241 5 2015-09-22 01:47:36.136789
fact_2 32 9 2015-09-22 01:47:36.136789
Example data:
abc_dev_12345=> SELECT *
FROM information_schema.tables
where table_schema='abc_dev_own_12345'
and table_name='fact_1';
table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | use
r_defined_type_name | is_insertable_into | is_typed | commit_action
---------------+-------------------+--------------------+------------+------------------------------+----------------------+---------------------------+--------------------------+----
--------------------+--------------------+----------+---------------
abc_dev_12345 | abc_dev_own_12345 | fact_1 | BASE TABLE | | | | |
| YES | NO |
(1 row)
abc_dev_12345=> SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'abc_dev_own_12345'
AND table_name = 'fact_1';
column_name
------------------------
email_date_key
email_time_key
customer_key
form_created_datetime
client_key
domain_key
Like Eelke and Craig Ringer noted, you need a dynamic query in a plpgsql function. The basic statement you want to apply to each table is:
SELECT <table_name>, count(domain_key) AS key_count, domain_key, form_created_datetime
FROM <table_name> GROUP BY 3, 4
and you want to UNION the lot together.
The most efficient way to do this is to first build a query as a text object from the information in information_schema.tables and then EXECUTE that query. There are many ways to build that query, but I particularly like the below dirty trick with string_agg():
CREATE FUNCTION table_domains()
RETURNS TABLE (table_name varchar, key_count bigint, domain_key integer, form_created_datetime timestamp)
AS $$
DECLARE
qry text;
BEGIN
-- format() builds query for individual table
-- string_agg() UNIONs queries from all tables into a single statement
SELECT string_agg(
format('SELECT %1$L::varchar, count(domain_key), domain_key, form_created_datetime
FROM %1$I GROUP BY 3, 4', table_name),
' UNION ') INTO qry
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name LIKE 'fact_%';
-- Now EXECUTE the query
RETURN QUERY EXECUTE qry;
END;
$$ LANGUAGE plpgsql;
No need for loops or cursors, so it is pretty efficient.
Use like you would any other table:
SELECT * FROM table_domains();