How to get postgresql to interpret column names from sub-select? - sql

I have the following query in PostgreSQL 9.4.1 on x86_64-apple-darwin14.1.0, compiled by Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn), 64-bit.
SELECT (
  SELECT string_agg(trim(cols::text, '()'), ', ')
  FROM (
    SELECT 'dog.' || column_name
    FROM information_schema.columns
    WHERE table_name = 'dog'
  ) AS cols
)
FROM dog;
This produces the following output, once for every row in the dog table:
dog.id, dog.created_date, dog.updated_date, dog.author_id, dog.mother_id, dog.start_date, dog.end_date, dog.session_type_id, dog.note, dog.cancelled, dog.life_phase
But the problem is that the outer SELECT has not interpreted the aggregated string as a list of column names; it has simply labeled the output column string_agg and emitted the list of column names once for every row in the dog table.
How do I get PostgreSQL to treat the column names generated by the inner select as actual columns in the outer select?

You are trying to execute dynamic SQL. You need a PL/pgSQL function to do that. Example:
create or replace function select_dogs()
  returns setof dog language plpgsql as $$
declare
  list_of_columns text;
begin
  SELECT string_agg(trim(cols::text, '()'), ', ')
  INTO list_of_columns
  FROM (
    SELECT 'dog.' || column_name
    FROM information_schema.columns
    WHERE table_name = 'dog'
  ) AS cols;

  return query execute format(
    'select %s from dog', list_of_columns);
end $$;

select * from select_dogs();
Read more in the documentation: Executing Dynamic Commands.

Related

Calculate Avg in for loop for columns in a table in PostgreSQL

I come from the Python world, where many things are colorful and easy. Now I'm trying to make my way into SQL because, well, I want to challenge myself outside of pandas and gain important experience in SQL.
That said, I have the following question.
I have the following snippet:
do
$do$
declare
  i varchar(50);
  average int;
begin
  for i in (
    select column_name
    from information_schema.columns
    where table_schema = 'public'
      and table_name = 'example_table'
      and column_name like '%suffix'
  ) loop
    --raise notice 'Value: %', i;
    select AVG(i) as average from example_table;
    raise notice 'Value: %', i;
  end loop;
end;
$do$
As I learned from the documentation, for loops are only possible inside a do block, and certain variables have to be declared. I did this for the i variable, which holds the name of the column I want to iterate over. But I want to get the average of each column and add it as a row in a table with two columns: one for the feature (the i variable) and one for the average of that column. I thought that would be possible with my code snippet above, but I receive an error message saying that function avg(character varying) does not exist.
When I use AVG outside of a for loop on a single numeric column, it does retrieve the average value, but when I do it in a for loop, it says that this aggregate function does not exist.
Could someone help me out with this please?
UPDATE:
I took a step back to make the story shorter:
select column_name
from information_schema.columns
where table_schema = 'public'
  and table_name = 'my_table'
  and column_name like '%wildcard';
This snippet yields a table with a single column called column_name, listing all the columns that fulfill the constraints in the where clause. I just want to add a column with the average value of each of those columns.
If you only need it for a single table, you can use:
select x.col, avg(x.value::numeric)
from example_table t
  cross join lateral (
    select col, value
    from jsonb_each(to_jsonb(t)) as e(col, value)
    where jsonb_typeof(e.value) = 'number'
  ) x
group by x.col;
The "magic" is in converting each row from the table into a JSON value. This is what to_jsonb(t) does (t is the alias given to the table in the main query). So we get something like {"name": "Bla", "value": 3.14, "length": 10, "some_date": "2022-03-02"}. So each column name is a key in the JSON value.
This json is then turned into one row per column (=key) using the jsonb_each() function but only rows (=columns) that have a number value are retained. So the derived table returns one row per column and row in the table. The outer query then simply aggregates this per column. The drawback is, you need to write one query for each table.
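To see the jsonb_each() step in isolation, here is a minimal self-contained sketch (the sample JSON is borrowed from the example above):
select key, value, jsonb_typeof(value)
from jsonb_each('{"name": "Bla", "value": 3.14, "length": 10}'::jsonb);
-- key    | value | jsonb_typeof
-- name   | "Bla" | string
-- value  | 3.14  | number
-- length | 10    | number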
If you need some kind of report for all tables in a schema, you can use a variation of this answer:
with all_selects as (
  select table_schema, table_name,
         'select ' || string_agg(format('avg(%I) as %I', column_name, column_name), ', ')
                   || format(' from %I.%I', table_schema, table_name) as query
  from information_schema.columns
  where table_schema = 'public'
    and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
  group by table_schema, table_name
), all_aggregates as (
  select table_schema, table_name,
         query_to_xml(query, true, true, '') as result
  from all_selects
)
select ag.table_schema, ag.table_name, r.column_name, nullif(r.average, '')::numeric as average
from all_aggregates ag
  cross join xmltable('/row/*' passing result
                      columns column_name text path 'local-name()',
                              average text path '.') as r;
This is a bit more tricky. The first part, all_selects, builds a query for each table in the schema public that applies the avg() aggregate to every column that can contain a number (the data_type in (...) condition).
So, for example, this returns a string like select avg(value) as value, avg(length) as length from example_table.
The next step is running each of these queries through query_to_xml() (sadly there is no built-in query_to_jsonb()).
query_to_xml() would return something like:
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <value>12.345</value>
  <length>42</length>
</row>
So there is one tag for each column, containing the result of the avg(..) call.
The final select then uses xmltable() to turn each tag from the XML result into a row, returning the column name and value.
Online example
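To see the query_to_xml() / xmltable() mechanics in isolation, here is a minimal self-contained sketch (requires Postgres 10+ for xmltable(); the query and the column names a and b are made up):
select r.column_name, r.average
from query_to_xml('select 1.5 as a, 2.5 as b', true, true, '') as result,
     xmltable('/row/*' passing result
              columns column_name text path 'local-name()',
                      average text path '.') as r;
-- column_name | average
-- a           | 1.5
-- b           | 2.5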
Of course you can do this in PL/pgSQL as well:
do
$do$
declare
  l_rec record;
  l_sql text;
  l_average numeric;
begin
  for l_rec in
    select table_schema, table_name, column_name
    from information_schema.columns
    where table_schema = 'public'
      and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
  loop
    l_sql := format('select avg(%I) from %I.%I', l_rec.column_name, l_rec.table_schema, l_rec.table_name);
    execute l_sql
    into l_average;
    raise notice 'Average for %.% is: %', l_rec.table_name, l_rec.column_name, l_average;
  end loop;
end;
$do$
Note the condition on the column data_type, which restricts processing to columns that can be averaged. This approach is more costly, however, as it runs one query per column rather than one per table.
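Since the asker wanted the averages as rows of (column name, average) rather than notices, one possible variation of the block above stores the results in a table (the column_averages table is invented for this sketch):
create table if not exists column_averages (
  table_name  text,
  column_name text,
  average     numeric
);

do
$do$
declare
  l_rec record;
  l_average numeric;
begin
  for l_rec in
    select table_schema, table_name, column_name
    from information_schema.columns
    where table_schema = 'public'
      and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
  loop
    -- compute the average for this column ...
    execute format('select avg(%I) from %I.%I',
                   l_rec.column_name, l_rec.table_schema, l_rec.table_name)
    into l_average;
    -- ... and store it instead of printing it
    insert into column_averages values (l_rec.table_name, l_rec.column_name, l_average);
  end loop;
end;
$do$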

Oracle function with select all from tables

SELECT DISTINCT L.* FROM LABALES L, MATCHES M
WHERE M.LIST LIKE '%ENG'
ORDER BY L.ID
I need to create a function with this select. I tried this, but it doesn't work:
CREATE OR REPLACE FUNCTION getSoccerLists
  RETURN varchar2 IS
  list varchar2(2000);
BEGIN
  SELECT DISTINCT L.* FROM LABALES L, MATCHES M
  WHERE M.LIST LIKE '%ENG'
  ORDER BY L.ID
  return list;
END;
How can I create a function that returns everything from table L?
Thanks.
You may return an implicit result using DBMS_SQL.RETURN_RESULT (Oracle 12c and above) in a procedure that opens a cursor over your query.
CREATE OR REPLACE PROCEDURE getSoccerLists
AS
  x SYS_REFCURSOR;
BEGIN
  OPEN x FOR SELECT DISTINCT L.* FROM LABALES L
    JOIN MATCHES M ON ( 1=1 ) -- join condition
    WHERE M.LIST LIKE '%ENG'
    ORDER BY L.ID;
  DBMS_SQL.RETURN_RESULT(x);
END;
/
Then simply call the procedure:
EXEC getSoccerLists;
For lower versions (Oracle 11g), you may use the PRINT command to display the cursor's output, passing the ref cursor as an OUT parameter.
CREATE OR REPLACE PROCEDURE getSoccerLists (x OUT SYS_REFCURSOR)
AS
BEGIN
  OPEN x FOR SELECT DISTINCT L.* FROM LABALES L
    JOIN MATCHES M ON ( 1=1 ) -- join condition
    WHERE M.LIST LIKE '%ENG'
    ORDER BY L.ID;
END;
/
Then, in SQL*Plus, or running as a script in SQL Developer or Toad, you can get the results like this:
VARIABLE r REFCURSOR;
EXEC getSoccerLists (:r);
PRINT r;
Another option is to use a TABLE function, by defining a collection of the record type of the result within a package.
Refer to Create an Oracle function that returns a table.
I guess this question is a repetition of your previously asked question, where you wanted to get all the columns of the tables, each in a separate column. As I already answered there, you cannot do this if you call your function via a SELECT statement. If you call your function in an anonymous block, you can display it in separate columns.
Here: Oracle function returning all columns from tables
Alternatively, you can get the results separated by a comma (,) or pipe (|) as below:
CREATE OR REPLACE FUNCTION getSoccerLists
  RETURN VARCHAR2
IS
  list VARCHAR2(2000);
BEGIN
  SELECT col1
         || ','
         || col2
         || ','
         || col3
  INTO list
  FROM SOCCER_PREMATCH_LISTS L,
       SOCCER_PREMATCH_MATCHES M
  WHERE M.LIST LIKE '%' || (L.SUB_LIST) || '%'
    AND (TO_TIMESTAMP((M.M_DATE || ' ' || M.M_TIME), 'DD.MM.YYYY HH24:MI') >
         (SELECT SYSTIMESTAMP AT TIME ZONE 'CET' FROM DUAL))
  ORDER BY L.ID;
  RETURN list;
END;
Note that if the result exceeds 2000 characters, you will again lose data.
Edit:
From your comments
I want it to return a table set of results.
You then need to create a table type of varchar and return it from the function. See below:
CREATE TYPE var IS TABLE OF VARCHAR2(2000);
/

CREATE OR REPLACE FUNCTION getSoccerLists
  RETURN var
IS
  -- Initialization
  list var := var();
BEGIN
  SELECT NSO || ',' || NAME BULK COLLECT INTO list FROM TEST;
  RETURN list;
END;
Execution:
select * from table(getSoccerLists);
Note: In the function I have used a table called test and its columns. Replace them with your table and column names.
Edit 2:
-- Create an object type with columns matching your select statement
CREATE TYPE v_var IS OBJECT
(
  col1 NUMBER,
  col2 VARCHAR2(10)
);
/

-- Create a table of your object type
CREATE OR REPLACE TYPE var IS TABLE OF v_var;
/

CREATE OR REPLACE FUNCTION getSoccerLists
  RETURN var
IS
  -- Initialization
  list var := var();
BEGIN
  -- The object type above must have the same columns, with the same data types, as you select here
  SELECT v_var(NSO, NAME) BULK COLLECT INTO list FROM TEST;
  RETURN list;
END;
Execution:
select * from table(getSoccerLists);
This is not an answer on how to build a function for this; I'd recommend making this a view instead:
CREATE OR REPLACE VIEW view_soccer_list AS
SELECT *
FROM soccer_prematch_lists l
WHERE EXISTS
(
  SELECT *
  FROM soccer_prematch_matches m
  WHERE m.list LIKE '%' || (l.sub_list) || '%'
    AND TO_TIMESTAMP((m.m_date || ' ' || m.m_time), 'DD.MM.YYYY HH24:MI') >
        (SELECT SYSTIMESTAMP AT TIME ZONE 'CET' FROM DUAL)
);
Then call it in a query:
SELECT * FROM view_soccer_list ORDER BY id;
(It makes no sense to put an ORDER BY clause in a view, because you access the view like a table, and table data is considered unordered, so you could not rely on that order. The same is true for a pipelined function you'd access with FROM TABLE (getSoccerLists). Always put the ORDER BY clause in your final queries instead.)

Select a row and insert it with different IDs n times

I am trying to come up with a script in Postgres that will select the first row in a table and insert that row x number of times back into the same table.
Here is what I have:
INSERT INTO campaign (select column_name from campaign)
SELECT x.id from generate_series(50, 500) as x(id);
The above obviously doesn't work.
Just get the syntax for the INSERT statement right:
INSERT INTO campaign (id, column_name)
SELECT g.g, t.column_name
FROM  (SELECT column_name FROM campaign LIMIT 1) t  -- picking an arbitrary row
    , generate_series(50, 500) g(g);                -- 451 times
The CROSS JOIN to generate_series() multiplies each selected row.
Selecting one arbitrary row, since the question didn't define "first". There is no natural order in a table. To pick a certain row, add ORDER BY and/or WHERE.
There is no syntactical shortcut to select all columns except the one named "id". You have to use the complete row or provide a list of selected columns.
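To illustrate the multiplication effect in isolation, here is a tiny self-contained example (the values are made up):
select t.val, g.g
from (values ('a'), ('b')) t(val)
   , generate_series(1, 3) g(g);
-- 2 rows * 3 series values = 6 result rows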
Automation with dynamic SQL
To get around this, build the query string from the catalog tables (or the information schema) and use EXECUTE in a PL/pgSQL function (or some other procedural language). The function below uses only pg_attribute.
format() requires Postgres 9.1 or later.
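As a quick refresher on format() placeholders before reading the function (the identifier and literal below are made up): %s inserts a string as-is, %I quotes an identifier, and %L quotes a literal.
select format('select %s from %I where name = %L',
              'count(*)', 'my table', 'O''Brien');
-- result: select count(*) from "my table" where name = 'O''Brien'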
CREATE OR REPLACE FUNCTION f_multiply_row(_tbl regclass
                                        , _idname text
                                        , _minid int
                                        , _maxid int)
  RETURNS void AS
$func$
BEGIN
  EXECUTE (
    SELECT format('INSERT INTO %1$s (%2$I, %3$s)
                   SELECT g.g, %3$s
                   FROM  (SELECT * FROM %1$s LIMIT 1) t
                       , generate_series($1, $2) g(g)'
                , _tbl
                , _idname
                , string_agg(quote_ident(attname), ', ')
           )
    FROM   pg_attribute
    WHERE  attrelid = _tbl
    AND    attname <> _idname  -- exclude id column
    AND    NOT attisdropped    -- no dropped (dead) columns
    AND    attnum > 0          -- no system columns
  )
  USING _minid, _maxid;
END
$func$ LANGUAGE plpgsql;
Call in your case:
SELECT f_multiply_row('campaign', 'id', 50, 500);
SQL Fiddle.
Major points
Properly escape identifiers to avoid SQL injection, using format() and regclass for the table name. Details:
Table name as a PostgreSQL function parameter
_idname is the column name to exclude ('id' in your case). Case sensitive!
Pass values in the USING clause. $1 and $2 in generate_series($1, $2) reference those parameters (not the function parameters).
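Here is a minimal sketch of the EXECUTE ... USING mechanism in isolation, runnable as-is:
do $$
declare
  n bigint;
begin
  -- $1 and $2 are bound from the USING list; the values are never
  -- interpolated into the query string itself
  execute 'select count(*) from generate_series($1, $2)'
  into n
  using 10, 20;
  raise notice 'count: %', n;  -- prints 11
end $$;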
More explanation in related answers. Try a search:
https://stackoverflow.com/search?q=[plpgsql]+[dynamic-sql]+format+pg_attribute

nzsql - Converting a subquery into columns for another select

Goal: Use a given subquery's results (a single column with many rows of names) to act as the outer select's selection field.
Currently, my subquery is the following:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'test_table' AND column_name not in ('colRemove');
What I am doing in this subquery is grabbing all the column names from a table (i.e. test_table) and outputting all except for the column name specified (i.e. colRemove). As stated in the "goal", I want to use this subquery as such:
SELECT (*enter subquery from above here*)
FROM actual_table
WHERE (*enter specific conditions*)
I am working on a Netezza SQL server that is version 7.0.4.4. Ideally, I would like to make the entire query executable in one line, but for now, a working solution would be much appreciated. Thanks!
Note: I do not believe that the SQL extensions have been installed (i.e. arrays), but I will need to double-check this.
A year too late, here's the best I can come up with; as you already noticed, it requires a stored procedure to do the dynamic SQL. The stored proc creates a view with all the columns from the source table minus the one you want to exclude.
-- Create test data.
CREATE TABLE test (firstcol INTEGER, secondcol INTEGER, thirdcol INTEGER);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (1, 2, 3);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (4, 5, 6);

-- Install stored procedure.
CREATE OR REPLACE PROCEDURE CreateLimitedView (varchar(ANY), varchar(ANY)) RETURNS BOOLEAN
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
  tableName ALIAS FOR $1;
  columnToExclude ALIAS FOR $2;
  colRec RECORD;
  cols VARCHAR(2000); -- Adjust as needed.
  isfirstcol BOOLEAN;
BEGIN
  isfirstcol := true;
  FOR colRec IN EXECUTE
    'SELECT ATTNAME AS NAME FROM _V_RELATION_COLUMN
     WHERE NAME = UPPER(' || quote_literal(tableName) || ')
       AND ATTNAME <> UPPER(' || quote_literal(columnToExclude) || ')
     ORDER BY ATTNUM'
  LOOP
    IF isfirstcol THEN
      cols := colRec.NAME;
    ELSE
      cols := cols || ', ' || colRec.NAME;
    END IF;
    isfirstcol := false;
  END LOOP;
  -- Should really check if 'LimitedView' already exists as a view, table or synonym.
  EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW LimitedView AS SELECT ' || cols || ' FROM ' || quote_ident(tableName);
  RETURN true;
END;
END_PROC
;

-- Run the stored proc to create the view.
CALL CreateLimitedView('test', 'secondcol');

-- Select results from the view.
SELECT * FROM limitedView WHERE firstcol = 4;
 FIRSTCOL | THIRDCOL
----------+----------
        4 |        6
You could have the stored proc return a resultset directly but then you wouldn't be able to filter results with a WHERE clause.

Update multiple columns that start with a specific string

I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the affect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
            , string_agg(quote_ident(attname), ', ')
            , string_agg('NULL', ', ')
             )
FROM   pg_attribute
WHERE  attrelid = 'tbl'::regclass
AND    NOT attisdropped
AND    attnum > 0
AND    attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of the information schema because the latter, while standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL injection; proper quoting is necessary for identifiers, too.
string_agg() requires 9.0+.
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
                                       , OUT row_ct int, OUT col_ct bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
  _sql text;
BEGIN
  SELECT INTO _sql, col_ct
         format('UPDATE %s SET (%s) = (%s)'
              , _tbl
              , string_agg(quote_ident(attname), ', ')
              , string_agg('NULL', ', ')
               )
       , count(*)
  FROM   pg_attribute
  WHERE  attrelid = _tbl
  AND    NOT attisdropped            -- no dropped columns
  AND    attnum > 0                  -- no system columns
  AND    attname LIKE _col_pattern;  -- only columns matching pattern

  -- RAISE NOTICE '%', _sql;  -- output SQL for debugging
  EXECUTE _sql;
  GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise doing so if you (can) have multiple schemas in your db! See:
Table name as a PostgreSQL function parameter
db<>fiddle here
Old sqlfiddle
There's no handy shortcut, sorry. If you have to do this kind of thing a lot, you could create a function that dynamically executes SQL to achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$
BEGIN
  EXECUTE (
    select 'UPDATE table SET '
           || array_to_string(array(
                select column_name::text
                from information_schema.columns
                where table_name = 'table'
                  and column_name::text like 'contact_info_address_%'
              ), ' = NULL,')
           || ' = NULL'
  );
  RETURN true;
END;
$$ LANGUAGE plpgsql;

-- run the function
SELECT reset_cols();
It's not very nice though. A better function would be one that accepts the table name and column prefix as arguments, which I'll leave as an exercise for the readers :)
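For reference, a minimal sketch of what such a parameterized variant might look like, in the spirit of f_update_cols above (assumes Postgres 9.1+ for format(); the function and table names are illustrative):
CREATE OR REPLACE FUNCTION reset_cols(_tbl regclass, _prefix text)
  RETURNS boolean AS $$
BEGIN
  EXECUTE (
    SELECT format('UPDATE %s SET %s'
                , _tbl
                , string_agg(quote_ident(attname) || ' = NULL', ', '))
    FROM   pg_attribute
    WHERE  attrelid = _tbl
    AND    NOT attisdropped             -- skip dropped columns
    AND    attnum > 0                   -- skip system columns
    AND    attname LIKE _prefix || '%'  -- only columns with the given prefix
  );
  -- note: EXECUTE raises an error if no column matches the prefix
  RETURN true;
END;
$$ LANGUAGE plpgsql;

-- run the function
SELECT reset_cols('tbl', 'contact_info_address_');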