Using WITH... AS (CTE) within a Postgres Function

In all the descriptions below, I have changed the names of variables, functions and datasets to make the code easier to follow. What I want to do is create a new function, myFx, which takes an argument var_a and outputs a table with three columns: var_a, var_b and var_c.
The code is shown below; the bulk of it is a Common Table Expression (CTE) that uses UNION ALL to combine the rows from three SELECT statements into an interim result, t_union. The final chunk of code produces the output I want by self-joining t_union.
While the code below works WITHOUT the function wrapper (CREATE OR REPLACE FUNCTION... and RETURNS TABLE...), once I add the function wrapper in, I get an error indicating that I cannot use WITH (as in WITH t_union). Could anyone please advise me what the problem is here? Thanks so much!
CREATE OR REPLACE FUNCTION myFx (var_a varchar(2))
RETURNS TABLE (var_a varchar(2),
               var_b character,
               var_c character)
WITH t_union
AS (SELECT x, y, z,
           RANK() OVER (PARTITION BY x ORDER BY (y COLLATE "C") ASC) AS stp_pos
    FROM DATASET1
    UNION ALL
    SELECT x, y, z,
           RANK() OVER (PARTITION BY x ORDER BY (y COLLATE "C") ASC) AS stp_pos
    FROM DATASET2
    UNION ALL
    SELECT x, y, z,
           RANK() OVER (PARTITION BY x ORDER BY (y COLLATE "C") ASC) AS stp_pos
    FROM DATASET3)
SELECT BS.atoc_code, L1.train_uid, L1.stp_indicator, L1.location AS loc1, L2.location AS loc2
FROM t_union L1 JOIN t_union L2
  ON L1.x = L2.x AND L1.y < L2.y
JOIN DATASET3 BS ON L1.x = BS.x;
This is the error I get:
ERROR: syntax error at or near "WITH"
LINE 9: WITH t_union
^
SQL state: 42601
Character: 197
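For what it's worth, the usual shape of the fix is that the query has to live inside a function body delimited by AS $$ ... $$ with a LANGUAGE clause; also, the parameter var_a clashes with the RETURNS TABLE column of the same name. A minimal sketch, with simplified column types and only two of the source tables (all assumptions, not the asker's actual schema):

```sql
-- Sketch only: the parameter is renamed p_a because it must not
-- clash with the RETURNS TABLE column var_a, and the CTE sits
-- inside the dollar-quoted function body.
CREATE OR REPLACE FUNCTION myFx(p_a varchar(2))
RETURNS TABLE (var_a varchar(2), var_b character, var_c character)
LANGUAGE sql AS $$
    WITH t_union AS (
        SELECT x, y, z FROM DATASET1
        UNION ALL
        SELECT x, y, z FROM DATASET2
    )
    SELECT L1.x, L1.y, L2.y
    FROM t_union L1
    JOIN t_union L2 ON L1.x = L2.x AND L1.y < L2.y
    WHERE L1.x = p_a;
$$;
```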

Related

Concat two strings and put the smaller string first in SQL Server

For concatenating two varchars from columns A and B, like "1923X" and "11459", with a hashtag separator, while always putting the smaller string first, what should I do in a SQL Server query?
inputs:
Two Columns
A="1923X"
B="11459"
procedure:
Comparing the two inputs character by character from left to right, in this example the second character in B (1) is smaller than the second character in A (9), so B is smaller.
result: new column C
"11459#1923X"
Original answer:
If you need to order the input strings, not only by the second character, STRING_AGG() is also an option:
DECLARE @a varchar(5) = '1923X'
DECLARE @b varchar(5) = '11459'
SELECT STRING_AGG(v.String, '#') WITHIN GROUP (ORDER BY v.String) AS Result
FROM (VALUES (@a), (@b)) v (String)
Output:
Result
11459#1923X
Update:
You changed the requirements (now the strings are stored in two columns), so you need a different statement:
SELECT
    A,
    B,
    C = (
        SELECT STRING_AGG(v.String, '#') WITHIN GROUP (ORDER BY v.String)
        FROM (VALUES (A), (B)) v (String)
    )
FROM (VALUES ('1923X', '11459')) t (A, B)

Suggestion for the use of to_number in Oracle

The query below gives an error.
select sob.set_of_books_id, orginfo.org_information1
from
gl_sets_of_books sob,
hr_organization_information orginfo
where
sob.set_of_books_id = to_number(orginfo.org_information1);
The reason is that set_of_books_id is a number column while org_information1 is a varchar column containing both non-numeric and numeric strings, so there is a type mismatch. We have to pick only those rows where org_information1 is a numeric string.
To overcome this we used REGEXP_LIKE, which picks only the records that are numeric.
select sob.set_of_books_id, orginfo.org_information1
from hr_organization_information orginfo, gl_sets_of_books sob
where sob.set_of_books_id = to_number(orginfo.org_information1)
and REGEXP_LIKE(orginfo.org_information1, '^[[:digit:]]+$');
We just added the line AND REGEXP_LIKE(orginfo.org_information1, '^[[:digit:]]+$') to the previous query, and it works properly.
My question is: even in the last query we are using the same join condition in the WHERE clause that was failing in the first query, and WHERE will definitely run before AND. So why is it not failing? Will it fail for some records, or is the query correct?
Is there any better way to write the second query?
If I do not include the existing WHERE condition, it gives an error; I don't know what the issue is with the query without it.
We cannot use TO_CHAR on set_of_books_id because that column has an index, and we can't create a function-based index. We have to use the index in our query.
where will definitely run before and
No!
The optimizer is free to rearrange the conditions in your query. There's no guarantee it processes them top to bottom. This can lead to surprising effects if the plan changes (e.g. because you added or removed indexes).
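One way to pin the evaluation order down regardless of the plan (a sketch added here, not part of the original answer) is to wrap the conversion in a CASE expression, which Oracle does guarantee to evaluate lazily:

```sql
select sob.set_of_books_id, orginfo.org_information1
from hr_organization_information orginfo
join gl_sets_of_books sob
  on sob.set_of_books_id =
     case
       -- the WHEN condition is always evaluated before THEN,
       -- so TO_NUMBER only ever sees digit-only strings
       when regexp_like(orginfo.org_information1, '^[[:digit:]]+$')
       then to_number(orginfo.org_information1)
     end;
```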
Assuming you're on 12.2 or higher, instead of a regex you can use the ON CONVERSION ERROR clause to map all the non-numeric values to NULL:
with rws as (
  select level x,
         case mod(level, 3)
           when 0 then chr(level + 64)
           else to_char(level)
         end y
  from dual
  connect by level <= 10
)
select y from rws
where x = to_number(y default null on conversion error);
Y
1
2
4
5
7
8
10
Use TO_CHAR on the number column rather than TO_NUMBER on the string column:
select sob.set_of_books_id,
orginfo.org_information1
from hr_organization_information orginfo
INNER JOIN gl_sets_of_books sob
ON ( TO_CHAR(sob.set_of_books_id) = orginfo.org_information1 );
Or, from Oracle 12.2, you can use TO_NUMBER(value DEFAULT NULL ON CONVERSION ERROR):
select sob.set_of_books_id,
orginfo.org_information1
from hr_organization_information orginfo
INNER JOIN gl_sets_of_books sob
ON ( sob.set_of_books_id
= TO_NUMBER(orginfo.org_information1 DEFAULT NULL ON CONVERSION ERROR)
);

How to aggregate integers into an array in PostgreSQL?

I have a query that gives list of IDs:
ID
2
3
4
5
6
25
ID is integer.
I want to get that result as an ARRAY of integers:
ID
2,3,4,5,6,25
I wrote this query:
select string_agg(ID::text,',')
from A
where .....
I have to convert the IDs to text, otherwise it won't work; string_agg expects (text, text).
This works fine; the thing is that the result will later be used in many places that expect an ARRAY of integers.
I tried :
select ('{' || string_agg(ID::text,',') || '}')::integer[]
from A
WHERE ...
which gives: {2,3,4,5,6,25} in type int4 integer[]
but this isn't the correct type... I need the same type as ARRAY.
For example, SELECT ARRAY[4,5] gives integer[].
In simple words, I want the result of my query to work with (for example):
select *
from b
where b.ID = ANY (FIRST QUERY RESULT) -- i.e. = ANY (ARRAY[2,3,4,5,6,25])
This fails, as ANY expects an array and it doesn't work with a regular integer[]; I get an error:
ERROR: operator does not exist: integer = integer[]
Note: the result of the query is part of a function and will be saved in a variable for later use. Please don't take this somewhere that bypasses the problem and offers a solution which won't give the ARRAY of integers.
EDIT: why does
select *
from b
where b.ID = ANY (array [4,5])
work, but
select *
from b
where b.ID = ANY(select array_agg(ID) from A where ..... )
doesn't work?
select *
from b
where b.ID = ANY(select array_agg(4))
doesn't work either
the error is still:
ERROR: operator does not exist: integer = integer[]
The expression select array_agg(4) returns a set of rows (actually a set with exactly one row). Hence the query
select *
from b
where b.id = any (select array_agg(4)) -- ERROR
tries to compare an integer (b.id) to a row value (which has one column of type integer[]). It raises an error.
To fix it you should use a subquery which returns integers (not arrays of integers):
select *
from b
where b.id = any (select unnest(array_agg(4)))
Alternatively, you can supply the array column produced by select array_agg(4) as the argument of any, e.g.:
select *
from b
cross join (select array_agg(4)) agg(arr)
where b.id = any (arr)
or
with agg as (
select array_agg(4) as arr)
select *
from b
cross join agg
where b.id = any (arr)
More formally, the first two queries use ANY of the form:
expression operator ANY (subquery)
and the other two use
expression operator ANY (array expression)
like it is described in the documentation: 9.22.4. ANY/SOME
and 9.23.3. ANY/SOME (array).
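A further variant worth knowing (my addition, reusing the asker's table names): the ARRAY(...) constructor collects the rows of a subquery into a single integer[] value, which is exactly the array-expression form that ANY accepts:

```sql
-- ARRAY(subquery) turns the set of IDs into one integer[] value,
-- so this uses the "ANY (array expression)" form directly
select *
from b
where b.ID = any (array(select ID from A where ..... ));
```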
How about this query? Does this give you the expected result?
SELECT *
FROM b b_out
WHERE EXISTS (SELECT 1
              FROM b b_in
              WHERE b_out.id = b_in.id
                AND b_in.id IN (SELECT <<first query that returns 2,3,4,...>>))
What I've tried to do is break down the logic of ANY into two separate logical checks that achieve the same result.
Hence, ANY is equivalent to the combination of EXISTS (at least one matching row) with IN (your list of values returned by the first SELECT).

Oracle "Select Level from dual" does not work as expected with to_number result

Why does
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select to_number(trim(alphanumeric_column)) as nr from my_table
where NOT regexp_like (trim(alphanumeric_column),'[^[:digit:]]')) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
throws an ORA-01722 invalid number error (the culprit being the last line, n.VAL >= 100), whereas the similar version with a numeric column in my_table works fine:
select *
from (
SELECT LEVEL as VAL
FROM DUAL
CONNECT BY LEVEL <= 1000
ORDER BY LEVEL
) n
left outer join (select numeric_column as nr from my_table) d
on n.VAL = d.nr
where d.nr is null
and n.VAL >= 100
given that numeric_column is of type NUMBER and alphanumeric_column of type NVARCHAR2. Note that the upper example works fine without the numerical comparison (n.VAL >= 100).
Does anybody know why?
This problem was driving me crazy. I narrowed it down to a simpler query:
SELECT *
FROM (SELECT TO_NUMBER(TRIM (alphanumeric_column)) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
With alphanumeric_column values of ('100', '200', 'XXXX'), running the above statement gave the "invalid number" error. I then made a slight change to the query to use the CAST function instead of TO_NUMBER:
SELECT *
FROM (SELECT CAST (TRIM (alphanumeric_column) AS NUMBER) AS nr
FROM my_table
WHERE NOT REGEXP_LIKE (TRIM (alphanumeric_column), '[^[:digit:]]')) d
WHERE d.nr > 1
And this correctly returned 100 and 200. I would think that those functions would be similar in behavior. It almost appears as though Oracle is trying to evaluate the d.nr > 1 constraint before the view is constructed, which makes no sense. If anyone can shed light on why this is happening, I would be grateful. See the SQLFiddle example.
UPDATE: I did some more digging, because I don't like not knowing why something just works. I ran EXPLAIN PLAN on both queries and got some interesting results.
For the query that failed, the predicate information looks like this:
1 - filter(TO_NUMBER(TRIM("ALPHANUMERIC_COLUMN"))>1 AND NOT
REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]]'))
You will notice that the TO_NUMBER function is called first in the AND condition, then the regexp that excludes alpha values. I am thinking Oracle may do a short-circuit evaluation of the AND condition, and since it executes TO_NUMBER first, it fails.
However, when we use the CAST function, the evaluation order is swapped and the regexp exclusion is evaluated first. Since that is false for the alpha values, the second part of the AND clause is not evaluated, and the query works.
1 - filter( NOT REGEXP_LIKE (TRIM("ALPHANUMERIC_COLUMN"),'[^[:digit:]
]') AND CAST(TRIM("ALPHANUMERIC_COLUMN") AS NUMBER)>1)
Oracle can be strange sometimes.
I believe when it comes to the predicate (WHERE) clause, Oracle can and will reorder the entire plan as it sees fit. So with regard to the predicate, it will short-circuit (as OldProgrammer noted) the evaluation however it wants, and you won't be able to guarantee the exact order in which it occurs.
In your current SQL, you are depending on the predicate to remove non-numbers. One option would be to avoid "WHERE NOT regexp_like ..." and instead use REGEXP_SUBSTR with COALESCE. For example:
create table t_tab2
(
col varchar2(10)
);
create index t_tab2_idx on t_tab2(col);
insert into t_tab2
select level from dual
connect by level <= 100;
insert into t_tab2 values ('123ABC456');
commit;
-- select values > 95 (96->100 exclude non numbers)
select d.* from
(
select COALESCE(TO_NUMBER(REGEXP_SUBSTR(trim(col), '^\d+$')), 0) as nr
from t_tab2
) d
where d.nr > 95;
This should run without throwing an invalid number error. Note that COALESCE will return the number 0 for any non-number coming from the data; you may want to change that based on your needs and data.

Casting NULL type when updating multiple rows

I have a problem when I try to update many rows at the same time.
Here is the table and query I use (simplified for better reading):
table
CREATE TABLE foo
(
pkid integer,
x integer,
y integer
)
query
UPDATE foo SET x=t.x, y=t.y FROM
(VALUES (50, 50, 1),
(100, 120, 2))
AS t(x, y, pkid) WHERE foo.pkid=t.pkid
This query works perfectly, but when I try to execute a query where all x or y values are null, I get an error:
query with nulls
UPDATE foo SET x=t.x, y=t.y FROM
(VALUES (null, 20, 1),
(null, 50, 2))
AS t(x, y, pkid) WHERE foo.pkid=t.pkid
error
ERROR: column "x" is of type integer but expression is of type text
LINE 1: UPDATE foo SET x=t.x FROM
The only way to fix that is to change at least one of the values, e.g. (null, 20, 1) to (null::int, 20, 1), but I can't do that, since I have a function which generates these "update multiple rows" queries and it doesn't know anything about the column types.
What's the best solution here? Is there any better update query for multiple rows? Is there any function or syntax like AS t(x:gettype(foo.x), y:gettype(foo.y), pkid:gettype(foo.pkid))?
With a standalone VALUES expression, PostgreSQL has no idea what the data types should be. With simple numeric literals the system is happy to assume matching types, but with other input (like NULL) you need to cast explicitly - as you have already found out.
You can query pg_catalog (fast, but PostgreSQL-specific) or the information_schema (slow, but standard SQL) to find out and prepare your statement with appropriate types.
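For example (my sketch, not part of the original answer), the pg_catalog route could look like this:

```sql
-- Fetch column names and formatted types for table foo
-- from the system catalog (fast, PostgreSQL-specific)
SELECT a.attname,
       format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_attribute a
WHERE a.attrelid = 'foo'::regclass
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped    -- skip dropped columns
ORDER BY a.attnum;
```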
Or you can use one of these simple "tricks" (I saved the best for last):
0. Select row with LIMIT 0, append rows with UNION ALL VALUES
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
UNION ALL
VALUES
(1, 20, NULL) -- no type casts here
, (2, 50, NULL)
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;
The first sub-select of the subquery:
(SELECT pkid, x, y FROM foo LIMIT 0)
gets names and types for the columns, but LIMIT 0 prevents it from adding an actual row. Subsequent rows are coerced to the now well-defined row type and are immediately checked for whether they match it. This should be a subtle additional improvement over your original form.
While providing values for all columns of the table this short syntax can be used for the first row:
(TABLE foo LIMIT 0)
Major limitation: Postgres casts the input literals of the free-standing VALUES expression to a "best-effort" type immediately. When it later tries to cast to the given types of the first SELECT, it may already be too late for some types if there is no registered assignment cast between the assumed type and the target type. Examples: text -> timestamp or text -> json.
Pro:
Minimum overhead.
Readable, simple and fast.
You only need to know relevant column names of the table.
Con:
Type resolution can fail for some types.
1. Select row with LIMIT 0, append rows with UNION ALL SELECT
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
UNION ALL SELECT 1, 20, NULL
UNION ALL SELECT 2, 50, NULL
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;
Pro:
Like 0., but avoids failing type resolution.
Con:
UNION ALL SELECT is slower than VALUES expression for long lists of rows, as you found in your test.
Verbose syntax per row.
2. VALUES expression with per-column type
...
FROM (
VALUES
((SELECT pkid FROM foo LIMIT 0)
, (SELECT x FROM foo LIMIT 0)
, (SELECT y FROM foo LIMIT 0)) -- get type for each col individually
, (1, 20, NULL)
, (2, 50, NULL)
) t (pkid, x, y) -- columns names not defined yet, only types.
...
Contrary to 0. this avoids premature type resolution.
The first row in the VALUES expression is a row of NULL values which defines the type for all subsequent rows. This leading noise row is filtered by WHERE f.pkid = t.pkid later, so it never sees the light of day. For other purposes you can eliminate the added first row with OFFSET 1 in a subquery.
Pro:
Typically faster than 1. (or even 0.)
Short syntax for tables with many columns and only few are relevant.
You only need to know relevant column names of the table.
Con:
Verbose syntax for only few rows
Less readable (IMO).
3. VALUES expression with row type
UPDATE foo f
SET x = (t.r).x -- parenthesis needed to make syntax unambiguous
, y = (t.r).y
FROM (
VALUES
('(1,20,)'::foo) -- columns need to be in default order of table
,('(2,50,)') -- nothing after the last comma for NULL
) t (r) -- column name for row type
WHERE f.pkid = (t.r).pkid;
You obviously know the table name. If you also know the number of columns and their order you can work with this.
For every table in PostgreSQL a row type is registered automatically. If you match the number of columns in your expression, you can cast to the row type of the table ('(1,50,)'::foo) thereby assigning column types implicitly. Put nothing behind a comma to enter a NULL value. Add a comma for every irrelevant trailing column.
In the next step you can access individual columns with the demonstrated syntax. More about Field Selection in the manual.
Or you could add a row of NULL values and use uniform syntax for actual data:
...
VALUES
((NULL::foo)) -- row of NULL values
, ('(1,20,)') -- uniform ROW value syntax for all
, ('(2,50,)')
...
Pro:
Fastest (at least in my tests with few rows and columns).
Shortest syntax for few rows or tables where you need all columns.
You don't have to spell out columns of the table - all columns automatically have the matching name.
Con:
Not so well known syntax for field selection from record / row / composite type.
You need to know number and position of relevant columns in default order.
4. VALUES expression with decomposed row type
Like 3., but with decomposed rows in standard syntax:
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
VALUES
(('(1,20,)'::foo).*) -- decomposed row of values
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid; -- eliminates 1st row with NULL values
Or, with a leading row of NULL values again:
...
VALUES
((NULL::foo).*) -- row of NULL values
, (1, 20, NULL) -- uniform syntax for all
, (2, 50, NULL)
...
Pros and cons like 3., but with more commonly known syntax.
And you need to spell out column names (if you need them).
5. VALUES expression with types fetched from row type
As Unril commented, we can combine the virtues of 2. and 4. to provide only a subset of columns:
UPDATE foo f
SET ( x, y)
= (t.x, t.y) -- short notation, see below
FROM (
VALUES
((NULL::foo).pkid, (NULL::foo).x, (NULL::foo).y) -- subset of columns
, (1, 20, NULL)
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid;
Pros and cons like 4., but we can work with any subset of columns and don't have to know the full list.
Also displaying short syntax for the UPDATE itself that's convenient for cases with many columns. Related:
Bulk update of all columns
4. and 5. are my favorites.
db<>fiddle here - demonstrating all
If you have a script generating the query, you could extract and cache the data type of each column and create the type casts accordingly. E.g.:
SELECT column_name,data_type,udt_name
FROM information_schema.columns
WHERE table_name = 'foo';
From udt_name you'll get the necessary cast, as you explained in the last paragraph. Additionally, you could do this:
UPDATE foo
SET x = t.x
FROM (VALUES(null::int4,756),(null::int4,6300))
AS t(x,pkid)
WHERE foo.pkid = t.pkid;
Your script can create a temporary table from foo; it will have the same data types as foo. Use an impossible condition so that it starts out empty:
select x, y, pkid
into temp t
from foo
where pkid = -1
Then make your script insert into it:
insert into t (x, y, pkid) values
(null, 20, 1),
(null, 50, 2)
Now update from it:
update foo
set x=t.x, y=t.y
from t
where foo.pkid=t.pkid
Finally drop it:
drop table t