Greenplum PSQL Format for Dynamic Query

Firstly, thank you in advance for any help with my relatively simple issue below. It's honestly driving me insane!
Simply put, I'm trying to select some metrics on all tables in a schema. This specifically includes partitioned tables in Greenplum (which, for those who don't know it, have a single parent table named X and then child tables named X_1_prt_3, X_1_prt_4, etc.).
As a result, my query to get the total table size for the single partitioned table X is as follows:
-- Part 1
select cast(sum(sotaidtablesize) as bigint) / 1024 / 1024 as "Table Size (MB)"
from gp_toolkit.gp_size_of_table_and_indexes_disk
where sotaidschemaname = 'Y'
and sotaidtablename like 'X%'
;
This sums up the table size for any table named X or similar thereafter, which is effectively what I want. But this is just part of a bigger query. I don't want to actually specify the schema and table; I want it to be:
-- Part 2
where sotaidschemaname = t4.nspname
and sotaidtablename like 't4.relname%'
but that sadly doesn't just work (what a world that would be!). I've tried the following, which I think is close, but I cannot get it to return any value other than NULL:
-- Part 3
and sotaidtablename like quote_literal(format( '%I', tablename )::regclass)
where tablename is a column from another part of the query (I already use this column in another format() call which works correctly, so I know this bit in particular isn't the issue).
Thank you in advance to anyone for any help!
Regards,
Vinny

I find it easier to join on gp_size_of_table_and_indexes_disk.sotaidoid rather than on (sotaidschemaname, sotaidtablename).
For example:
SELECT pg_namespace.nspname AS schema,
       pg_class.relname AS relation,
       pg_size_pretty(sotd.sotdsize::BIGINT) AS tablesize,
       pg_size_pretty(sotd.sotdtoastsize::BIGINT) AS toastsize,
       pg_size_pretty(sotd.sotdadditionalsize::BIGINT) AS othersize,
       pg_size_pretty(sotaid.sotaidtablesize::BIGINT) AS tabledisksize,
       pg_size_pretty(sotaid.sotaididxsize::BIGINT) AS indexsize
FROM pg_class
LEFT JOIN pg_stat_user_tables
       ON pg_stat_user_tables.relid = pg_class.oid
LEFT JOIN gp_toolkit.gp_size_of_table_disk sotd
       ON sotd.sotdoid = pg_class.oid
LEFT JOIN gp_toolkit.gp_size_of_table_and_indexes_disk sotaid
       ON sotaid.sotaidoid = pg_class.oid
LEFT JOIN pg_namespace
       ON pg_namespace.oid = pg_class.relnamespace
WHERE pg_class.relkind = 'r'
  AND relstorage != 'x'
  AND pg_namespace.nspname NOT IN ('information_schema', 'madlib', 'pg_catalog', 'gptext')
  AND pg_class.relname NOT IN ('spatial_ref_sys');
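As for the asker's Part 2: the pattern has to be built by string concatenation, not by quoting the column name (inside quotes, t4.relname is just a literal). A minimal sketch, assuming a driving table aliased t4 with nspname and relname columns as in the question:

```sql
-- Build the LIKE pattern from the column value, not a quoted literal
-- (t4 is the alias assumed to exist elsewhere in the bigger query):
where sotaidschemaname = t4.nspname
  and sotaidtablename like t4.relname || '%'
```

Alternatively, Greenplum's pg_partitions view maps each child partition back to its parent, which avoids LIKE patterns that could accidentally match similarly named tables. A sketch under that assumption (column names taken from the Greenplum catalog documentation):

```sql
-- Sum the disk size of all child partitions per parent table.
select p.schemaname, p.tablename,
       cast(sum(s.sotaidtablesize) as bigint) / 1024 / 1024 as "Table Size (MB)"
from gp_toolkit.gp_size_of_table_and_indexes_disk s
join pg_partitions p
  on s.sotaidschemaname = p.partitionschemaname
 and s.sotaidtablename  = p.partitiontablename
group by p.schemaname, p.tablename;
```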

Related

Determine datatypes of columns - SQL selection

Is it possible to determine the data type of each column after a SQL selection, based on the received results? I know it is possible through information_schema.columns, but the data I receive comes from multiple tables, is joined together, and the columns are renamed. Besides that, I'm not able to see or use the source query or execute other queries myself.
My job is to store this received data in another table, but without knowing beforehand what I will receive. I'm obviously able to check, for example, whether a certain column contains numbers or text, but not whether it was originally stored as a TINYINT(1) or a BIGINT(128). How do I approach this? To clarify, it is alright if the data types of the source and destination columns aren't exactly the same, but I don't want to reserve too much space beforehand (or too little, for that matter).
As I'm typing, I realize I'm formulating the question wrong. What would be the best approach to handle the described situation? I thought about altering tables on the fly (e.g. increasing column sizes as needed), but that seems a bit, well, wrong and not the proper way.
Thanks
Can you issue the following queries against your new table after you create it?
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID;

SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults';
Is the query too big to run before knowing how big the results will be? Get an idea of how many rows it may return; the trick with join queries is to group on the columns you are joining on, to help your estimate return more quickly. Here's an example that just returns a row count for the query that would have created the JoinedQueryResults table above.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
FROM TableA
GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
FROM TableB
GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID
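The two ideas above combine naturally: the TOP 0 form creates an empty table carrying the joined result's inferred column definitions, which can then be inspected. A minimal sketch, assuming the same hypothetical TableA/TableB and JoinedQueryResults names:

```sql
-- Create a zero-row table whose columns carry the result's types...
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID;

-- ...then read the inferred types and sizes to plan the destination table.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults'
ORDER BY ORDINAL_POSITION;
```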

Filter data based on a condition in Redshift

I came across one more issue while resolving the previous problem:
So, I have this data:
For each route, I want to get only those rows where ob exists in rb. Hence, this output:
I know this also needs to be worked through a temp table. Earlier I was doing this, as suggested by #smb:
select * from table_name as a
inner join
(select load, rb from table_name
group by load, rb) as b
on a.load = b.load
and
a.ob = b.rb
but this solution will give me:
And this is incorrect as it doesn’t take into account the route.
It’d be great if you guys could help :)
Thanks
updated to add in route -
The answer is a nested join. The concept is:
1. Get a list of distinct (route, ob, rb) combinations.
2. Join back to the original data where ob = ob, lane = rb, and route = route.
Code as follows:
select * from table_name as a
inner join
(select route, ob, rb from table_name
group by route, ob, rb) as b
on a.ob = b.ob
and
a.lane = b.rb
and
a.route = b.route
I have done an example using a temp table here so you can see it in action.
Note that if your data is large you should consider making sure your dist key is in the join. This ensures Redshift knows that no rows need to be joined across different compute nodes, so it can execute multiple local joins and therefore be more efficient.
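The dist-key advice above can be sketched as DDL. This is a hypothetical definition of table_name (the column types are assumptions; only the column names appear in the question):

```sql
-- Distributing on the join column keeps the self-join node-local in
-- Redshift; SORTKEY on the same column additionally helps range scans.
CREATE TABLE table_name (
    route  VARCHAR(32),
    "load" VARCHAR(32),
    lane   VARCHAR(32),
    ob     VARCHAR(32),
    rb     VARCHAR(32)
)
DISTKEY (route)
SORTKEY (route);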
A few ways (an IN subquery is simple but often slower on larger sets):
select *
from table_name
where lane in (select rb from table_name)
or (I find EXISTS faster on larger sets, but try both; it also lets you correlate on route):
select *
from table_name t
where exists (select 'x' from table_name t_inner
              where t_inner.rb = t.lane
                and t_inner.route = t.route)
Either way, note that Redshift does not support conventional indexes; including rb in the sort key serves the same speed purpose.

Get overlapped data from two tables with the same structure, giving preference to the other: Oracle

I am completely lost thinking about how to solve this data-retrieval challenge.
I have these two tables in my Oracle database: MY_DATA and MY_DATA_CHANGE.
I want to select data something like this:
SELECT ALL COLUMNS
FROM MY_DATA
WHERE ID IN (1,2,4,5) FROM MY_DATA
BUT IF ANY ID IS PRESENT IN (1,2,4,5) IN MY_DATA_CHANGE
THEN USE ROW FROM MY_DATA_CHANGE
So my overall result must look like:
I can only use SQL, not a stored procedure, as this query is going to be part of another very big query (legacy code written long ago, used in the Crystal Reports tool to create a report).
So guys, please help. My column data contains CLOBs, and the usual UNION logic does not work on them.
How do I do it?
The way I would choose to do that is via a LEFT JOIN between the two tables and then use COALESCE():
SELECT
    m.Id
    ,COALESCE(c.CLOB1, m.CLOB1) AS CLOB1
    ,COALESCE(c.CLOB2, m.CLOB2) AS CLOB2
FROM
    MY_DATA m
    LEFT JOIN MY_DATA_CHANGE c
        ON m.Id = c.Id
WHERE
    m.ID IN (1,2,4,5)
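For completeness: the "UNION does not work" limitation applies specifically to UNION, whose duplicate elimination must compare CLOB values; UNION ALL accepts CLOB columns. So an alternative sketch, using the same assumed columns (Id, CLOB1, CLOB2), is:

```sql
-- Take each row from MY_DATA_CHANGE when present, otherwise from MY_DATA;
-- UNION ALL avoids the CLOB comparison that plain UNION would require.
SELECT c.Id, c.CLOB1, c.CLOB2
FROM MY_DATA_CHANGE c
WHERE c.Id IN (1, 2, 4, 5)
UNION ALL
SELECT m.Id, m.CLOB1, m.CLOB2
FROM MY_DATA m
WHERE m.Id IN (1, 2, 4, 5)
  AND NOT EXISTS (SELECT 1 FROM MY_DATA_CHANGE c WHERE c.Id = m.Id);
```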

Translating query from Firebird to PostgreSQL

I have a Firebird query which I should rewrite into PostgreSQL code.
SELECT TRIM(RL.RDB$RELATION_NAME), TRIM(FR.RDB$FIELD_NAME), FS.RDB$FIELD_TYPE
FROM RDB$RELATIONS RL
LEFT OUTER JOIN RDB$RELATION_FIELDS FR ON FR.RDB$RELATION_NAME = RL.RDB$RELATION_NAME
LEFT OUTER JOIN RDB$FIELDS FS ON FS.RDB$FIELD_NAME = FR.RDB$FIELD_SOURCE
WHERE (RL.RDB$VIEW_BLR IS NULL)
ORDER BY RL.RDB$RELATION_NAME, FR.RDB$FIELD_NAME
I understand SQL, but I have no idea how to work with these system tables like RDB$RELATIONS etc. It would be really great if someone could help me with this, but even some links explaining these tables would be OK.
This piece of query is in C++ code, and when I try to do this:
pqxx::connection conn(serverAddress.str());
pqxx::work trans(conn);
pqxx::result res(trans.exec(/* the SQL query above goes here */)); // this line fails
it reports that:
RDB$RELATIONS doesn't exist.
Postgres stores information about its own content differently, in what it calls system catalogs.
In Firebird, your query basically returns a row for every column of every table in every schema, with an additional integer column that maps to a field datatype.
In Postgres, something similar can be achieved using the system tables in the pg_catalog schema with this query:
SELECT
TRIM(c.relname) AS table_name, TRIM(a.attname) AS column_name, a.atttypid AS field_type
FROM pg_class c
LEFT JOIN pg_attribute a ON
c.oid = a.attrelid
AND a.attnum > 0 -- only ordinary columns, without system ones
WHERE c.relkind = 'r' -- only tables
ORDER BY 1,2
The above query returns system catalogs as well. If you'd like to exclude them, you need to add another JOIN to pg_namespace and a WHERE clause with pg_namespace.nspname <> 'pg_catalog', because that is the schema where system catalogs are stored.
If you'd also like to see datatype names instead of their representative numbers add a JOIN to pg_type.
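Both suggestions can be sketched together. Instead of an explicit join to pg_type, the built-in format_type() function can render the type name; the NOT IN list and the dropped-column filter below are additions beyond the answer's text:

```sql
-- Same query, filtered to user schemas, with human-readable type names.
SELECT TRIM(c.relname) AS table_name,
       TRIM(a.attname) AS column_name,
       format_type(a.atttypid, a.atttypmod) AS field_type
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_attribute a ON c.oid = a.attrelid
       AND a.attnum > 0        -- only ordinary columns, without system ones
       AND NOT a.attisdropped  -- skip dropped columns
WHERE c.relkind = 'r'          -- only tables
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;
```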
The information schema consists of a collection of views. In most cases you don't need the entire SQL query that stands behind a view, so using the system tables directly will give you better performance. You can inspect a view's definition, though, just to get started on the tables and conditions used to form the output.
I think you are looking for the information_schema.
The tables are listed here: https://www.postgresql.org/docs/current/static/information-schema.html
So for example you can use:
select * from information_schema.tables;
select * from information_schema.columns;

Getting tables with no rows without counting

I have a huge PostgreSQL database with lots of tables. I want to find all empty tables without counting each table, for performance reasons (some of the tables have several million rows).
This query will give you an approximate result without counting table rows:
SELECT relname
FROM pg_class
JOIN pg_namespace ON (pg_class.relnamespace = pg_namespace.oid)
WHERE relpages = 0
  AND pg_namespace.nspname = 'public';
This will work best after a VACUUM ANALYZE.
As per http://wiki.postgresql.org/wiki/Slow_Counting, one solution is to first find the tables with a small reltuples value via
select relname from pg_class where reltuples < X
and then test only those for emptiness.
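The candidates from either approach can then be confirmed cheaply, since an existence check stops at the first row found rather than scanning the whole table. A minimal sketch (my_table is a placeholder substituted per candidate):

```sql
-- True if my_table has at least one row; reads at most one row.
SELECT EXISTS (SELECT 1 FROM my_table LIMIT 1) AS has_rows;
```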
If you want to see the table structure, try pgAdmin.
You can open a table and see its whole structure, e.g. data types, indexes, functions, etc.