I have a number of tables, and many of them have a timestamp column. I can get a list of every table with a timestamp column:
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp';
--------------
table_name
--------------
apples
bananas
sharks
lemons
I can get the highest timestamp in the sharks table as follows:
SELECT MAX(timestamp) FROM sharks;
-------------------------
max
-------------------------
2021-11-24 00:00:00.000
I would like to get a table like
table_name | last_updated
-------------+-------------------------
apples | 2021-11-23 00:02:00.000
bananas | 2019-10-16 00:04:00.000
sharks | 2021-11-24 00:00:00.000
lemons | 1970-01-03 10:00:00.000
I'm suspecting this requires dynamic SQL, so I'm trying something like
SELECT (
EXECUTE 'SELECT MAX(timestamp) FROM my_schema.' || table_name
) FROM (
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp'
);
But it seems like EXECUTE doesn't work in subqueries.
Performance is not particularly a concern, just producing the desired results.
Dynamic queries (EXECUTE) cannot run outside of PL/pgSQL blocks, so you need to wrap your code in one.
I set up test tables similar to yours, with only the "timestamp" column shared between them:
drop table if exists public.sharks_70099803 cascade;
create table public.sharks_70099803
as select 1::int integer_column,
now()::timestamp as "timestamp";
drop table if exists public.bananas_70099803 cascade;
create table public.bananas_70099803
as select 'some text'::text text_column,
now()::timestamp as "timestamp";
Wrap the dynamic query in a PL/pgSQL function. Inside it, I build a query pulling max(timestamp) from each table that has the column, aggregate those queries into a single statement with union all in between, and then execute it.
CREATE OR REPLACE FUNCTION public.test_70099803()
RETURNS SETOF RECORD
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
return query
execute (
select string_agg(select_per_table,' union all ')
from (
select 'select '''||
table_name||
''' as table_name, max(timestamp) from public.'||
table_name "select_per_table"
from information_schema.columns
where table_schema='public'
and column_name='timestamp'
) a
);
END
$BODY$;
select * from public.test_70099803() as t(table_name text, max_timestamp timestamp);
-- table_name | max_timestamp
-- -----------------+----------------------------
-- sharks_70099803 | 2021-11-24 17:12:03.24951
-- bananas_70099803 | 2021-11-24 17:12:03.253614
--(2 rows)
You can parametrise your function to apply to other groups of tables, or give it a predefined output table structure so you can just run select * from test_70099803();
Related
I have nearly 800 SQL scripts, ranging from medium to complex SQL.
The database is Oracle.
I need the list of tables used in each of these scripts. Each script contains numerous WITH clauses, joins and subqueries. Is there an easier way to achieve this?
First store all the queries in one table, scripts, with the columns script_name and script_text.
Then this could work:
select script_name, script_text, object_name
from scripts s
join dba_objects do
on 1 = 1
where 1 = 1
and instr(upper(s.script_text), upper(do.object_name)) > 0
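The instr()-based join above is plain substring matching, so it can be mimicked (and its main weakness shown) in a few lines of Python. The script texts and object names below are made-up examples; in Oracle the object list would come from dba_objects:

```python
# Naive substring matching, mimicking the instr()-based join above.
# Hypothetical sample data standing in for the scripts table and dba_objects.
scripts = {
    "load_sales": "insert into sales select * from staging_sales",
    "report": "with t as (select * from orders) select * from t join customers using (id)",
}
objects = ["SALES", "STAGING_SALES", "ORDERS", "CUSTOMERS", "INVENTORY"]

# For each script, keep every object whose name occurs anywhere in the text.
matches = {
    name: [obj for obj in objects if obj in text.upper()]
    for name, text in scripts.items()
}
print(matches)
```

Note the false positive: SALES matches inside STAGING_SALES. That fuzziness is inherent to pure string matching.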
You may try explain plan for each of the statements and then check the contents of plan_table. But it:
doesn't count views (the plan table shows the base tables of a view) or scalar functions
should be adapted for dblinks
should be thoroughly tested with local PL/SQL declarations (with function) and functions in other schemas
is affected by query rewrite and baselines
Below is an example:
begin
for i in 1..10 loop
execute immediate
'create table t'
|| i || '( id int)';
end loop;
end;
/
create view v_test
as
select *
from t5
join t6
using(id);
create function f_tab
return sys.odcinumberlist
pipelined
as
begin
null;
end;
/
explain plan for
select *
from t1, t2, t3, t4,
/*View*/
v_test,
/*Pipelined function*/
table(f_tab());
select
object_owner,
object_name,
object_type
from plan_table
where
(object_owner, object_name) in (
select
f.owner,
f.object_name
from all_objects f
where f.object_type in (
'TABLE'
, 'VIEW'
)
) or (object_name, object_type) in (
select
f.object_name,
'PROCEDURE'
from user_objects f
where f.object_type in (
'FUNCTION'
, 'PROCEDURE'
)
)
order by 1
OBJECT_OWNER | OBJECT_NAME | OBJECT_TYPE
:-------------------------- | :---------- | :----------
FIDDLE_TQYMTNVUFUWHRWJEENKX | T1 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T2 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T3 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T4 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T5 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T6 | TABLE
null | F_TAB | PROCEDURE
db<>fiddle here
Anyway, I think this sort of task should be handled by a SQL parser rather than by string manipulation, because only the parser knows exactly which objects were used.
For now, I am first running the following query:
select group_name, avg(numeric_field) as avg_value, count(group_name) as n
from table_name
group by group_name
order by n desc;
Suppose I get output:
group_name | avg_value | n
----------------------------------------
nice_group_name| 1566.353 | 2034
other_group | 235.43 | 1390
.
.
.
I am then deleting records in each group one by one manually using the following query for each group:
delete from table_name where group_name = 'nice_group_name' and numeric_field < 1567;
Here 1567 is the approximate avg_value for nice_group_name.
How can I run the second query for all rows of the result of first query automatically?
You can use a correlated subquery:
delete from table_name
where numeric_field < (select avg(t2.numeric_field)
from table_name t2
where t2.group_name = table_name.group_name
);
For performance, you want an index on tablename(group_name, numeric_field).
If you have few groups, you might find this more efficient:
with a as (
select group_name, avg(numeric_field) as anf
from table_name
group by group_name
)
delete from table_name
where numeric_field < (select a.anf from a where a.group_name = table_name.group_name);
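As a sanity check, the correlated-subquery DELETE can be exercised end-to-end with SQLite from Python (the table and column names are from the question; the data is made up):

```python
import sqlite3

# Demonstrate the correlated-subquery DELETE on a small in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (group_name text, numeric_field real)")
conn.executemany(
    "insert into table_name values (?, ?)",
    [("nice_group_name", 1000), ("nice_group_name", 2000), ("nice_group_name", 2000),
     ("other_group", 100), ("other_group", 400), ("other_group", 400)],
)

# Delete every row whose value is below its own group's average.
conn.execute("""
    delete from table_name
    where numeric_field < (select avg(t2.numeric_field)
                           from table_name t2
                           where t2.group_name = table_name.group_name)
""")

rows = conn.execute(
    "select group_name, numeric_field from table_name order by 1, 2").fetchall()
print(rows)
```

Every row below its group's average is gone, which is exactly what the manual per-group deletes were doing.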
If table_name has some primary key field (say id) then use the following:
alter table table_name rename to bak;
create temp table avg_val as
select group_name as g, avg(numeric_field) as a from bak
group by group_name;
create table table_name as
select * from bak where id in (
select bak.id from
avg_val join bak on bak.group_name = avg_val.g
where avg_val.a <= bak.numeric_field
);
Check table_name. If all has gone well, you can drop the backed-up old table:
drop table bak;
Briefly, the steps are:
Rename the original table
Create a temporary table of the average value for each group
Create a new table with all rows from the original table where numeric_field is not less than the average for its group
Drop the renamed original table
I have a Hive table named customer, which has a column named cust_id of list type, with following values:
cust_id
[123,234,456,567]
[345,457,67]
[89,23,34]
Now I want to read only this specific column cust_id in my select query, so that all the list values come back as separate values of the column cust_id:
cust_id
123
234
456
567
345
457
67
89
23
34
Basically I want to fetch all the values of cust_id from this table as one column, to use these values in the where exists or where in clause of my other query.
A solution for this would be highly appreciated.
AFAIK this is what you are looking for, from the Hive manual:
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row.
For example:
SELECT mytab.cust_id
FROM mytable LATERAL VIEW explode(cust_id) mytab AS cust_id;
Full example :
drop table customer_tab;
create table customer_tab ( cust_id array<String>);
INSERT INTO table customer_tab select array('123','234','456','567');
INSERT INTO table customer_tab select array('345','457','67');
INSERT INTO table customer_tab select array('89','23','34');
select * from customer_tab;
-- customer_tab.cust_id
-- ["123","234","456","567"]
-- ["345","457","67"]
-- ["89","23","34"]
SELECT mytab.cust_id
FROM customer_tab LATERAL VIEW explode(cust_id) mytab AS cust_id;
mytab.cust_id
123
234
456
567
345
457
67
89
23
34
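Since the goal is to feed the exploded cust_id values into a where in clause of another query, here is the same flatten-then-filter idea sketched in Python with SQLite (the orders table is hypothetical; the array data is from the answer above):

```python
import sqlite3

# The arrays from customer_tab, flattened the way explode() does.
customer_tab = [["123", "234", "456", "567"], ["345", "457", "67"], ["89", "23", "34"]]
cust_ids = [c for row in customer_tab for c in row]  # the "exploded" column

# A hypothetical orders table to filter with WHERE ... IN.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (cust_id text, amount int)")
conn.executemany("insert into orders values (?, ?)",
                 [("123", 10), ("999", 20), ("34", 30)])

# Build one placeholder per exploded value and filter on them.
placeholders = ",".join("?" * len(cust_ids))
rows = conn.execute(
    f"select cust_id, amount from orders where cust_id in ({placeholders})",
    cust_ids).fetchall()
print(rows)
```

Only the orders whose cust_id appears in the exploded list survive the filter.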
SELECT POM.TABLE_NAME, POM.COLUMN_NAME
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%'
I want to see all possible values of the columns on the list (in one row if possible). How can I modify this select to do it?
I want something like this:
TABLE_NAME | COLUMN_NAME |VALUES
-----------| ----------- | -------
CAR | COLOR | RED,GREEN
You can use the query below for your requirement. It fetches the distinct column values for each table.
It can be used only for tables having a limited number of distinct values, as I have used the LISTAGG function.
SELECT POM.TABLE_NAME, POM.COLUMN_NAME,
XMLTYPE(DBMS_XMLGEN.GETXML('SELECT LISTAGG(COLUMN_NAME,'','') WITHIN GROUP (ORDER BY COLUMN_NAME) VAL
FROM (SELECT DISTINCT '|| POM.COLUMN_NAME ||' COLUMN_NAME
FROM '||POM.OWNER||'.'||POM.TABLE_NAME||')')
).EXTRACT('/ROWSET/ROW/VAL/text()').GETSTRINGVAL() VAL
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%';
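For readers without Oracle at hand, the LISTAGG idea (collapsing a column's distinct values into one comma-separated row) can be approximated with SQLite's group_concat, using the CAR/COLOR example from the question:

```python
import sqlite3

# Collapse the distinct values of one column into a single comma-separated row,
# the way LISTAGG does in the Oracle answer.
conn = sqlite3.connect(":memory:")
conn.execute("create table car (color text)")
conn.executemany("insert into car values (?)",
                 [("RED",), ("GREEN",), ("RED",)])

(val,) = conn.execute(
    "select group_concat(color, ',') "
    "from (select distinct color from car order by color)"
).fetchone()
print(val)
```

The Oracle version additionally loops over ALL_TAB_COLUMNS via DBMS_XMLGEN, because the column and table names themselves are dynamic there.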
I'm not sure how to formulate this query. I think I need a subquery? Here's basically what I'm trying to do in a single query.
This query gives me the list of tables I need:
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name like 'fact_%';
For each table in the list above, I then want to do a count (each table has the same columns I need to query):
SELECT table_name,
count (domain_key) key_count,
domain_key,
form_created_datetime
FROM (List of tables above)
GROUP BY domain_key,
form_created_datetime;
Can I iterate through each table listed in the first query to do my count?
Can this be done in a single query?
So expected out would be similar to this:
table_name | key_count | domain_key | form_created_datetime
--------------------------------------------------------------
fact_1 1241 5 2015-09-22 01:47:36.136789
fact_2 32 9 2015-09-22 01:47:36.136789
Example data:
abc_dev_12345=> SELECT *
FROM information_schema.tables
where table_schema='abc_dev_own_12345'
and table_name='fact_1';
table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | use
r_defined_type_name | is_insertable_into | is_typed | commit_action
---------------+-------------------+--------------------+------------+------------------------------+----------------------+---------------------------+--------------------------+----
--------------------+--------------------+----------+---------------
abc_dev_12345 | abc_dev_own_12345 | fact_1 | BASE TABLE | | | | |
| YES | NO |
(1 row)
abc_dev_12345=> SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'abc_dev_own_12345'
AND table_name = 'fact_1';
column_name
------------------------
email_date_key
email_time_key
customer_key
form_created_datetime
client_key
domain_key
Like Eelke and Craig Ringer noted, you need a dynamic query in a plpgsql function. The basic statement you want to apply to each table is:
SELECT <table_name>, count(domain_key) AS key_count, domain_key, form_created_datetime
FROM <table_name> GROUP BY 3, 4
and you want to UNION the lot together.
The most efficient way to do this is to first build a query as a text object from the information in information_schema.tables and then EXECUTE that query. There are many ways to build that query, but I particularly like the below dirty trick with string_agg():
CREATE FUNCTION table_domains()
RETURNS TABLE (table_name varchar, key_count bigint, domain_key integer, form_created_datetime timestamp)
AS $$
DECLARE
qry text;
BEGIN
-- format() builds query for individual table
-- string_agg() UNIONs queries from all tables into a single statement
SELECT string_agg(
format('SELECT %1$I, count(domain_key), domain_key, form_created_datetime
FROM %1$I GROUP BY 3, 4', table_name),
' UNION ') INTO qry
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name LIKE 'fact_%';
-- Now EXECUTE the query
RETURN QUERY EXECUTE qry;
END;
$$ LANGUAGE plpgsql;
No need for loops or cursors, so it's pretty efficient.
Use like you would any other table:
SELECT * FROM table_domains();
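The string-building step is the heart of the function: string_agg() glues one SELECT per table into a single UNION statement. The same step sketched in Python, with example table names standing in for the information_schema lookup:

```python
# Build one SELECT per table, then join them with UNION,
# mirroring what format() + string_agg() do inside the function.
tables = ["fact_1", "fact_2"]  # example names; really from information_schema.tables
parts = [
    f"SELECT '{t}', count(domain_key), domain_key, form_created_datetime "
    f"FROM {t} GROUP BY 3, 4"
    for t in tables
]
qry = " UNION ".join(parts)
print(qry)
```

The resulting string is what RETURN QUERY EXECUTE then runs as a single statement. (In real PL/pgSQL, format() with %I also quotes identifiers safely, which this plain f-string sketch does not.)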