SQL: omit the FROM clause

When I want to test the behavior of some PostgreSQL function FOO() I'd find it useful to execute a query like SELECT FOO(bar), bar being some data I use as a direct input without having to SELECT from a real table.
I read that we can omit the FROM clause in a statement like SELECT 1, but I don't know the correct syntax for multiple inputs. I tried SELECT AVG(1, 2), for instance, and it does not work.
How can I do that?

With PostgreSQL you can use a VALUES expression to generate an inlined table:
VALUES computes a row value or set of row values specified by value expressions. It is most commonly used to generate a "constant table" within a larger command, but it can be used on its own.
Emphasis mine. Then you can apply your aggregate function to that "constant table":
select avg(x)
from (
values (1.0), (2.0)
) as t(x)
Or just select expr if expr is not an aggregate function:
select sin(1);
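For a function that takes several inputs, the same idea extends to a multi-column VALUES list, where each row is one set of test arguments. A minimal sketch (the column names a and b, and the choice of greatest() and power() as the functions under test, are only illustrative):

```sql
-- Each row of the VALUES list is one test case; t(a, b) names the columns.
SELECT a, b,
       greatest(a, b) AS larger,   -- any scalar function can be tried here
       power(a, b)    AS a_to_b
FROM (VALUES (1, 2), (3, 4), (5, 0)) AS t(a, b);
```

This keeps the test data next to the call, so no real table is needed.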
You could also define your own avg function that operates on an array and hide your FROM inside the function:
create function avg(double precision[]) returns double precision as $$
select avg(x) from unnest($1) as t(x);
$$ language 'sql';
And then:
=> select avg(array[1.0, 2.0, 3.0, 4.0]);
avg
-----
2.5
But that's just getting silly unless you're doing this quite often.
Also, if you're using PostgreSQL 8.4 or later, you can write variadic functions and do away with the array. The internals are the same as the array version; you just add variadic to the argument list:
create function avg(variadic double precision[]) returns double precision as $$
select avg(x) from unnest($1) as t(x);
$$ language 'sql';
And then call it without the array stuff:
=> select avg(1.0, 1.2, 2.18, 11, 3.1415927);
avg
------------
3.70431854
(1 row)
Thanks to depesz for the round-about-through-google pointer to variadic function support in PostgreSQL.

To express a set in most varieties of SQL, you need to actually express a table.
SELECT
AVG(inlineTable.val)
FROM
(
SELECT 1 AS val
UNION ALL
SELECT 2 AS val
)
AS inlineTable

Related

Issue While Creating Product of All Values Of Column (UDF in Snowflake)

I was trying to create a Snowflake SQL UDF that computes the product of all values in a column and returns the result to the user.
So first, I tried the following approach:
-- The UDF that returns the result.
CREATE OR REPLACE FUNCTION PRODUCT_OF_COL_VAL()
RETURNS FLOAT
LANGUAGE SQL
AS
$$
SELECT EXP(SUM(LN(COL))) AS RESULT FROM SCHEMA.SAMPLE_TABLE
$$
The above code executes perfectly fine.
As you can see, though, I have hardcoded the table name and the column name, which is not what I actually want.
So I tried the following approach, passing the column name dynamically:
create or replace function PRODUCT_OF_COL_VAL(COL VARCHAR)
RETURNS FLOAT
LANGUAGE SQL
AS
$$
SELECT EXP(SUM(LN(COL))) AS RESULT from SCHEMA.SAMPLE_TABLE
$$
But it throws the following error:
Numeric Value 'Col' is not recognized
To elaborate, the data type of the column that I am passing is NUMBER(38,6),
and in the background it is doing the following work:
EXP(SUM(LN(TO_DOUBLE(COL))))
Does anyone have any idea why this runs fine in scenario 1 but not in scenario 2?
Hopefully we will have this kind of UDF one day. In the meantime, consider this answer using ARRAY_AGG() and a Python UDF:
Sample usage:
select count(*) how_many, multimy(array_agg(score)) multiplied, tags[0] tag
from stack_questions
where score > 0
group by tag
limit 100
The UDF in Python - which also protects against numbers beyond float's limits:
create or replace function multimy (x array)
returns float
language python
handler = 'x'
runtime_version = '3.8'
as
$$
import math
def x(x):
    res = math.prod(x)
    return res if math.log10(res)<308 else 'NaN'
$$
;
The parameter you define in a SQL UDF is evaluated as a literal value, not as a column name.
When you call the function like PRODUCT_OF_COL_VAL('Col'), the SQL statement you execute becomes:
SELECT EXP(SUM(LN('Col'))) AS RESULT from SCHEMA.SAMPLE_TABLE
What you want is to generate a new SQL statement based on parameters, and that is only possible using stored procedures. Check this one:
Dynamic SQL in a Snowflake SQL Stored Procedure
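As a rough illustration of that idea, a Snowflake Scripting procedure can build the statement text from its arguments and run it with EXECUTE IMMEDIATE. This is only a sketch: the procedure name product_of_col and its parameter names are hypothetical, and it returns the result as a one-row table rather than a scalar.

```sql
-- Hypothetical procedure; the statement text is assembled at run time.
CREATE OR REPLACE PROCEDURE product_of_col(table_name VARCHAR, col_name VARCHAR)
RETURNS TABLE()
LANGUAGE SQL
AS
$$
DECLARE
  query VARCHAR;
  res RESULTSET;
BEGIN
  -- Concatenating identifiers is open to SQL injection; only pass trusted names.
  query := 'SELECT EXP(SUM(LN(' || col_name || '))) AS result FROM ' || table_name;
  res := (EXECUTE IMMEDIATE :query);
  RETURN TABLE(res);
END;
$$;

-- Usage with the table and column from the question:
CALL product_of_col('SCHEMA.SAMPLE_TABLE', 'COL');
```

Unlike the UDF, the column name here becomes part of the SQL text before it is compiled, so it is treated as an identifier rather than as the string 'COL'.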

Pass multiple values in single parameter

I want to call a function by passing multiple values on single parameter, like this:
SELECT * FROM jobTitle('270,378');
Here is my function.
CREATE OR REPLACE FUNCTION test(int)
RETURNS TABLE (job_id int, job_reference int, job_job_title text
, job_status text) AS
$$
BEGIN
RETURN QUERY
select jobs.id,jobs.reference, jobs.job_title,
ltrim(substring(jobs.status,3,char_length(jobs.status))) as status
FROM jobs ,company c
WHERE jobs."DeleteFlag" = '0'
and c.id= jobs.id and c.DeleteFlag = '0' and c.active = '1'
and (jobs.id = $1 or -1 = $1)
order by jobs.job_title;
END;
$$ LANGUAGE plpgsql;
Can someone help with the syntax? Or even provide sample code?
VARIADIC
Like @mu provided, VARIADIC is your friend. One more important detail:
You can also call a function using a VARIADIC parameter with an array type directly. Add the key word VARIADIC in the function call:
SELECT * FROM f_test(VARIADIC '{1, 2, 3}'::int[]);
is equivalent to:
SELECT * FROM f_test(1, 2, 3);
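To try both call styles without the jobs tables, a tiny stand-in function is enough. The sketch below uses a hypothetical sum_all() function; referring to the parameter by name inside an SQL function assumes Postgres 9.2 or later.

```sql
-- Hypothetical helper: sums any number of integer arguments.
CREATE OR REPLACE FUNCTION sum_all(VARIADIC nums int[])
RETURNS bigint
LANGUAGE sql AS
$func$
SELECT sum(n) FROM unnest(nums) AS n;
$func$;

-- Individual arguments are collected into the array automatically:
SELECT sum_all(1, 2, 3);

-- An existing array must be marked with the VARIADIC key word:
SELECT sum_all(VARIADIC '{1, 2, 3}'::int[]);
SELECT sum_all(VARIADIC ARRAY[1, 2, 3]);
```

All three calls pass the same three-element array to the function body.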
Other advice
In Postgres 9.1 or later right() with a negative length is faster and simpler to trim leading characters from a string:
right(j.status, -2)
is equivalent to:
substring(j.status, 3, char_length(j.status))
You have j."DeleteFlag" as well as j.DeleteFlag (without double quotes) in your query. This is probably incorrect. See:
PostgreSQL Error: Relation already exists
"DeleteFlag" = '0' indicates another problem. Unlike other RDBMS, Postgres properly supports the boolean data type. If the flag holds boolean data (true / false / NULL) use the boolean type. A character type like text would be inappropriate / inefficient.
Proper function
You don't need PL/pgSQL here. You can use a simpler SQL function:
CREATE OR REPLACE FUNCTION f_test(VARIADIC int[])
RETURNS TABLE (id int, reference int, job_title text, status text)
LANGUAGE sql AS
$func$
SELECT j.id, j.reference, j.job_title
, ltrim(right(j.status, -2)) AS status
FROM company c
JOIN job j USING (id)
WHERE c.active
AND NOT c.delete_flag
AND NOT j.delete_flag
AND (j.id = ANY($1) OR '{-1}'::int[] = $1)
ORDER BY j.job_title
$func$;
Don't do strange and horrible things like converting a list of integers to a CSV string. This:
jobTitle('270,378')
is not what you want. You want to say things like this:
jobTitle(270, 378)
jobTitle(array[270, 378])
If you're going to be calling jobTitle by hand then a variadic function would probably be easiest to work with:
create or replace function jobTitle(variadic int[])
returns table (...) as $$
-- $1 will be an array of integers in here so UNNEST, IN, ANY, ... as needed
Then you can jobTitle(6), jobTitle(6, 11), jobTitle(6, 11, 23, 42), ... as needed.
If you're going to be building the jobTitle arguments in SQL then the explicit-array version would probably be easier to work with:
create or replace function jobTitle(int[])
returns table (...) as $$
-- $1 will be an array of integers in here so UNNEST, IN, ANY, ... as needed
Then you could jobTitle(array[6]), jobTitle(array[6, 11]), ... as needed and you could use all the usual array operators and functions to build argument lists for jobTitle.
I'll leave the function's internals as an exercise for the reader.

Convert numeric to string inside a user-defined function

I am trying to call/convert a numeric variable into string inside a user-defined function. I was thinking about using to_char, but it didn't pass.
My function is like this:
create or replace function ntile_loop(x numeric)
returns setof numeric as
$$
select
max("billed") as _____(to_char($1,'99')||"%"???) from
(select "billed", "id","cm",ntile(100)
over (partition by "id","cm" order by "billed")
as "percentile" from "table_all") where "percentile"=$1
group by "id","cm","percentile";
$$
language sql;
My purpose is to define a new variable "x%" as its name, with x varying as the function input. In context, x is numeric and will be called again later in the function as a numeric (this part of code wasn't included in the sample above).
What I want to return:
I simply want to return a block of code so that every time I change the percentile number, I don't have to run this block of code again and again. I'd like to calculate 5, 10, 20, 30, ....90th percentile and display all of them in the same table for each id+cm group.
That's why I was thinking about macro or function, but didn't find any solutions I like.
Thank you for your answers. Yes, I will definitely read basics while I am learning. Today's my second day to use SQL, but have to generate some results immediately.
Converting numeric to text is the least of your problems.
My purpose is to define a new variable "x%" as its name, with x
varying as the function input.
First of all: there are no variables in an SQL function. SQL functions are just wrappers for valid SQL statements. Input and output parameters can be named, but names are static, not dynamic.
You may be thinking of a PL/pgSQL function, where you have procedural elements including variables. Parameter names are still static, though. There are no dynamic variable names in plpgsql. You can execute dynamic SQL with EXECUTE but that's something different entirely.
While it is possible to declare a static variable with a name like "123%" it is really exceptionally uncommon to do so. Maybe for deliberately obfuscating code? Other than that: Don't. Use proper, simple, legal, lower case variable names without the need to double-quote and without the potential to do something unexpected after a typo.
Since the window function ntile() returns integer and you run an equality check on the result, the input parameter should be integer, not numeric.
To assign a variable in plpgsql you can use the assignment operator := for a single variable or SELECT INTO for any number of variables. Either way, you want the query to return a single row or you have to loop.
If you want the maximum billed from the chosen percentile, you don't GROUP BY x, y. That might return multiple rows and does not do what you seem to want. Use plain max(billed) without GROUP BY to get a single row.
You don't need to double quote perfectly legal column names.
A valid function might look like this. It's not exactly what you were trying to do, which cannot be done. But it may get you closer to what you actually need.
CREATE OR REPLACE FUNCTION ntile_loop(x integer)
RETURNS SETOF numeric as
$func$
DECLARE
myvar text;
BEGIN
SELECT INTO myvar max(billed)
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE sub.tile = $1;
-- do something with myvar, depending on the value of $1 ...
END
$func$ LANGUAGE plpgsql;
Long story short, you need to study the basics before you try to create sophisticated functions.
Plain SQL
After Q update:
I'd like to calculate 5, 10, 20, 30, ....90th percentile and display
all of them in the same table for each id+cm group.
This simple query should do it all:
SELECT id, cm, tile, max(billed) AS max_billed
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE (tile%10 = 0 OR tile = 5)
AND tile <= 90
GROUP BY 1,2,3
ORDER BY 1,2,3;
% is the modulo operator.
GROUP BY 1,2,3 uses positional references to the first three items of the SELECT list.
It looks like you're looking for return query execute, returning the result from a dynamic SQL statement:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html
http://www.postgresql.org/docs/current/static/plpgsql-statements.html
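A minimal sketch of RETURN QUERY EXECUTE, assuming the table_all table from the question and a column whose values can be cast to numeric. The function name max_of_column and its parameter _col are hypothetical:

```sql
-- Hypothetical function: returns the maximum of a column chosen at call time.
CREATE OR REPLACE FUNCTION max_of_column(_col text)
RETURNS SETOF numeric
LANGUAGE plpgsql AS
$func$
BEGIN
   -- format() with %I quotes _col as an identifier before the statement is run.
   RETURN QUERY EXECUTE format('SELECT max(%I)::numeric FROM table_all', _col);
END
$func$;

-- Usage:
SELECT * FROM max_of_column('billed');
```

The statement text is built at run time, which is what allows the column name to vary. Values, as opposed to identifiers, are better passed with a USING clause than concatenated into the string.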

Is there something like a zip() function in PostgreSQL that combines two arrays?

I have two array values of the same length in PostgreSQL:
{a,b,c} and {d,e,f}
and I'd like to combine them into
{{a,d},{b,e},{c,f}}
Is there a way to do that?
Postgres 9.5 or later
has array_agg(array expression):
array_agg ( anyarray ) → anyarray
Concatenates all the input arrays into an array of one higher
dimension. (The inputs must all have the same dimensionality, and
cannot be empty or null.)
This is a drop-in replacement for my custom aggregate function array_agg_mult() demonstrated below. It's implemented in C and considerably faster. Use it.
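Combined with the multi-argument unnest() described in the next section, this gives a compact zip on Postgres 9.5 or later. A sketch, where the aliases a, b and ord are arbitrary names:

```sql
-- Unnest both arrays in parallel, then aggregate the pairs into a 2-D array.
-- WITH ORDINALITY adds a row number so the aggregate order is explicit.
SELECT array_agg(ARRAY[a, b] ORDER BY ord) AS zipped
FROM unnest('{a,b,c}'::text[], '{d,e,f}'::text[])
     WITH ORDINALITY AS x(a, b, ord);
```

For the sample arrays this should yield {{a,d},{b,e},{c,f}}. If the arrays differ in length, the shorter one is padded with NULL.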
Postgres 9.4
Use the ROWS FROM construct or the updated unnest() which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):
[...] the number of result rows in this case is that of the largest function
result, with smaller results padded with null values to match.
Use this cleaner and simpler variant:
SELECT ARRAY[a,b] AS ab
FROM unnest('{a,b,c}'::text[]
, '{d,e,f}'::text[]) x(a,b);
Postgres 9.3 or older
Simple zip()
Consider the following demo for Postgres 9.3 or earlier:
SELECT ARRAY[a,b] AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x;
Result:
ab
-------
{a,d}
{b,e}
{c,f}
Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.
You can wrap this into a function, if you want to:
CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
Same result.
zip() to multi-dimensional array:
Now, if you want to aggregate that new set of arrays into one 2-dimensional array, it gets more complicated.
SELECT ARRAY (SELECT ...)
or:
SELECT array_agg(ARRAY[a,b]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
,unnest('{d,e,f}'::text[]) AS b
) x
or:
SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM ...
will all result in the same error message (tested with pg 9.1.5):
ERROR: could not find array type for data type text[]
But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
And use it like this:
SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x
Result:
{{a,d},{b,e},{c,f}}
Note the additional ARRAY[] layer! Without it and just:
SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...
You get:
{a,d,b,e,c,f}
Which may be useful for other purposes.
Roll another function:
CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type
Result:
{{a,d},{b,e},{c,f}}
Here's another approach that's safe for arrays of differing lengths, using the array multi-aggregation mentioned by Erwin:
CREATE OR REPLACE FUNCTION zip(array1 anyarray, array2 anyarray) RETURNS text[]
AS $$
SELECT array_agg_mult(ARRAY[ARRAY[array1[i],array2[i]]])
FROM generate_subscripts(
CASE WHEN array_length(array1,1) >= array_length(array2,1) THEN array1 ELSE array2 END,
1
) AS subscripts(i)
$$ LANGUAGE sql;
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
zip
---------------------
{{a,d},{b,e},{c,f}}
(1 row)
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f,g}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{NULL,g}}
(1 row)
regress=> SELECT zip('{a,b,c,z}'::text[],'{d,e,f}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{z,NULL}}
(1 row)
If you want to chop off the excess rather than null-padding, just change the >= length test to <= instead.
This function does not handle the rather bizarre PostgreSQL feature that arrays may have a starting element other than 1, but in practice nobody actually uses that feature. E.g. with a zero-indexed 3-element array:
regress=> SELECT zip('{a,b,c}'::text[], array_fill('z'::text, ARRAY[3], ARRAY[0]));
zip
------------------------
{{a,z},{b,z},{c,NULL}}
(1 row)
whereas Erwin's code does work with such arrays, and even with multi-dimensional arrays (by flattening them), but does not work with arrays of differing length.
Arrays are a bit special in PostgreSQL: they're a little too flexible, with multi-dimensional arrays, configurable origin index, etc.
In 9.4 you'll be able to write:
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM unnest(array1) WITH ORDINALITY AS x(a, o)
NATURAL FULL OUTER JOIN
unnest(array2) WITH ORDINALITY AS y(b, o);
which will be a lot nicer, especially if an optimisation to scan the functions together rather than doing a sort and join goes in.

What is the correct way to define a Postgres SQL-lang function that returns multiple columns?

I have the following function, based on the SQL Functions Returning Sets section of the PG docs, which accepts two arrays of equal length, and unpacks them into a set of rows with two columns.
CREATE OR REPLACE FUNCTION unpack_test(
in_int INTEGER[],
in_double DOUBLE PRECISION[],
OUT out_int INTEGER,
OUT out_double DOUBLE PRECISION
) RETURNS SETOF RECORD AS $$
SELECT $1[rowx] AS out_int, $2[rowx] AS out_double
FROM generate_series(1, array_upper($1, 1)) AS rowx;
$$ LANGUAGE SQL STABLE;
I execute the function in PGAdmin3, like this:
SELECT unpack_test(int_col, double_col) FROM test_data
It basically works, but the output looks like this:
|unpack_test|
|record |
|-----------|
|(1, 1) |
|-----------|
|(2, 2) |
|-----------|
...
In other words, the result is a single record, as opposed to two columns. I found this question that seems to provide an answer, but it deals with a function that selects from a table directly, whereas mine accepts the columns as arguments, since it needs to generate the series used to iterate over them. I therefore can't call it using SELECT * FROM function, as suggested in that answer.
First, you'll need to create a type for the return value of your function. Something like this could work:
CREATE TYPE unpack_test_type AS (out_int int, out_double double precision);
Then change your function to return this type instead of record.
Then you can use it like this:
SELECT (unpack_test).out_int, (unpack_test).out_double FROM
(SELECT unpack_test(int_col, double_col) FROM test_data) as test
It doesn't seem possible to take a function returning a generic record type and use it in this manner.
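On Postgres 9.3 or later, a LATERAL call in the FROM clause may be a simpler alternative, since the OUT parameters already name the output columns. A sketch, assuming the unpack_test() function and test_data table from the question (the aliases t and u are arbitrary):

```sql
-- Call the set-returning function once per row of test_data;
-- its OUT parameters become ordinary columns of u.
SELECT u.out_int, u.out_double
FROM test_data AS t
CROSS JOIN LATERAL unpack_test(t.int_col, t.double_col) AS u;
```

Because the function appears in FROM rather than in the SELECT list, each result row is expanded into separate columns instead of a single record value.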