How to backreference a calculated column value in another column during an INSERT query on Postgres? (query-runtime temporary variable assignment) - sql

In MySQL there's some helpful syntax for doing things like SELECT #calc:=3,#calc, but I can't find the way to solve this on PostgreSQL
The idea would be something like:
SELECT (SET) autogen := UUID_GENERATE_v4() AS id, :autogen AS duplicated_id;
returning a row with 2 columns with same value
EDIT: Not interested in conventional \set, I need to do this for hundreds of rows

You can use a subquery:
select id, id as duplicated_id
from (select UUID_GENERATE_v4() AS id
) x
Postgres does not confuse the select statement by allowing variable assignment. Even if it did, nothing guarantees the order of evaluation of expressions in a select, so you still would not be sure that it worked.

Related

How to reuse a computed value multiple times?

Basically I just want a simple way of finding the most recent date in a table, saving it as a variable, and reusing that variable in the same query.
Right now this is how I'm doing it:
with recent_date as (
select max(date)
from mytable
)
select *
from mytable
where date = (select * from recent_date)
(For this simple example, a variable is overkill, but in my real-world use-case I reuse the recent date multiple times in the same query.)
But that feels cumbersome. It would be a lot cleaner to save the recent date to a variable rather than a table and having to select from it.
In pseudo-code, something like this would be nice:
$recent_date = (select max(date) from mytable)
select *
from mytable
where date = $recent_date
Is there something like that in Postgres?
Better for the simple case
For the scope of a single query, CTEs are a good tool. In my hands the query would look like this:
WITH recent(date) AS (SELECT max(date) FROM mytable)
SELECT m.*
FROM recent r
JOIN mytable m USING (date)
Except that the actual example query would burn down to this in my hands:
SELECT *
FROM mytable
ORDER BY date DESC NULLS LAST
FETCH FIRST 1 ROWS WITH TIES;
NULLS LAST only if there can be NULL values. See:
Sort by column ASC, but NULL values first?
WITH TIES only if date isn't UNIQUE NOT NULL. See:
Get top row(s) with highest value, with ties
In combination with an index on mytable (date) (or more specific), this produces the best possible query plan. Look no further.
No, I need variables!
If you positively need variables scoped for the same command, transaction, session or more, there are various options.
The closest thing to "variables" in SQL in Postgres are "customized options". See:
User defined variables in PostgreSQL
You can only store text, any other type has to be cast (and cast back on retrieval).
To set and retrieve a value from within a query, use the Configuration Settings Functions set_config() and current_setting():
SELECT set_config('foo.recent', max(date)::text, false) FROM mytable;
SELECT *
FROM mytable
WHERE date = current_setting('foo.recent')::date;
Typically, there are more efficient ways.
If you need that "recent date" a lot, consider a simple function as "global variable", usable by all transactions in all sessions (but each new command sees its own current state):
CREATE FUNCTION f_recent_date()
RETURNS date
LANGUAGE sql STABLE PARALLEL SAFE AS
'SELECT max(date) FROM mytable';
STABLE is a valid volatility setting as the function returns the same result within the same query. Be sure to actually make it STABLE, so Postgres does not evaluate repeatedly. In Postgres 9.6 or later, also make it PARALLEL SAFE. Then your query becomes:
SELECT * FROM mytable WHERE date = f_recent_date();
More options:
Is there a way to define a named constant in a PostgreSQL query?
Passing user id to PostgreSQL triggers
Typically, if I need variables in Postgres, I use a PL/pgSQL code block in a function, a procedure, or a DO statement for ad-hoc use without the need to return rows:
DO
$do$
DECLARE
_recent_date date := (SELECT max(date) FROM mytable);
BEGIN
PERFORM * FROM mytable WHERE date = _recent_date;
-- more queries using _recent_date ...
END
$do$;
PL/pgSQL may be what you should be using to begin with. Further reading:
When to use stored procedure / user-defined function?
Keep in mind that in SQL you cannot directly declare a variable. Basically a CTE is creating variable (or a set of) and in SQL to use a variable you select it. However, if you want to avoid that structure you can just get the variable directl from a subset directly.
select *
from mytable
where date = (select max(date) from mytable);

How to subscript a Postgres column

I have a Postgres query:
SELECT main
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
Which returns a single column of values in the format (value_a, value_b) in each row. I want the outer query to format those values so that all the value_a's and value_b's are in their own separate columns.
Is there an easy way to do this?
Output screenshot:
http://example.com/path-to-ghosts.jpg
You can abuse row_to_json to do this, but it is probably best to avoid anonymous record types in the first place.
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t
To give a concrete example (after running pgbench -i):
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
SELECT
CASE WHEN 1=1 THEN (aid, bid)
END as main
FROM pgbench_accounts
LIMIT 100) inner_t;
But it only works in v10 and up.
This is more of an explanation than an actual answer. But it won't fit into a comment.
The thing is, SQL is a strictly typed language. Postgres demands to know the number and data types in the SELECT list at call time. The *-expansion in SELECT * FROM .. is based on registered types. Postgres knows the columns of a table because the structure is saved in the catalog tables.
The expression nested in your construct (col_a, col_b) is short for ROW(col_a, col_b) and a ROW constructor creates an anonymous record. The manual:
By default, the value created by a ROW expression is of an anonymous
record type. If necessary, it can be cast to a named composite type —
either the row type of a table, or a composite type created with
CREATE TYPE AS.
Postgres does not know how to expand an anonymous record. *-expansion does not work.
You could cast like the manual says. But that's only an option if the type is stable, i.e. you always put in the same number of columns with the same data types. And that still would not preserve column names.
So, for the best solution, first define:
Obviously you want to preserve column vales.
Do you also want to preserve column names?
Do you also want to preserve column types?
And:
Is the number of columns in the expression always the same?
Are data types always the same?
The CASE condition is stable or based on other columns?
If the true aim of the game is to fit multiple values in a single CASE expression, you only care about values, create a text array instead:
SELECT main[1] AS col_a, main[2] AS col_b
FROM (
SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
FROM table1
LIMIT 100
) inner_t;
You lose name and type. You can cast and add aliases if you know name & type.
Else you have to describe your use case more closely - in the question.
Try Below query
SELECT split_part(main,',',1) as Val1,split_part(main,',',2) as Val2
FROM (
SELECT
CASE WHEN 1=1 THEN (col_a, col_b)
END as main
FROM "table1"
LIMIT 100) inner_t

Making a new column and setting default values equal to the COUNT of a certain condition in SQL?

I have 2 tables. (table1, table2)
Now, I would like to do something along the lines of:
ALTER TABLE table1
ADD counter INT DEFAULT (Select COUNT(table2.car_type)
FROM table2
WHERE (table1.car_type = table2.car_type));
That's the closest syntax I can come up with, but just by looking at it I know it's wrong. Please tell me how to do it in ONE SQL statement (if possible).
The goal table (with counter implemented) is something like this: (separated columns by -)
table1:
pid-car_type-counter:
1-Honda-2
2-Toyota-3
3-Suzuki-1
4-Ferrari-0
5-Porsche-1
table2:
pid-car_type:
1-Honda
2-Toyota
3-Porsche
4-Honda
5-Suzuki
6-Toyota
7-Toyota
I don't think you can do this, at least through 11g. FTFM:
Restriction on Default Column Values: A DEFAULT expression cannot
contain references to PL/SQL functions or to other columns, the
pseudocolumns CURRVAL, NEXTVAL, LEVEL, PRIOR, and ROWNUM, or date
constants that are not fully specified.
I'm fairly certain you can't use a SELECT statement in a DEFAULT clause, either. There might be a way with virtual columns. Another alternative is to create a view.

Write consistency with nested subquery in Oracle

I've read many of the gory details of write consistency and I understand how it works in the simple cases. What I'm not clear on is what this means for nested sub-queries.
Here's a concrete example:
A table with PK id, and other columns state, temp and date.
UPDATE table SET state = DECODE(state, 'rainy', 'snowy', 'sunny', 'frosty') WHERE id IN (
SELECT id FROM (
SELECT id,state,temp from table WHERE date > 50
) WHERE (state='rainy' OR state='sunny') AND temp < 0
)
The real thing was more convoluted (in the innermost query), but this captures the essence.
If we assume the state column is not nullable, can this update ever fail due to concurrent modification (i.e., the DECODE function doesn't find a match, a value of 'rainy' or 'sunny', and so tries to insert null into a non-nullable column)?
Oracle supports "statement level read and write consistency" (as all other serious DBMS)
This means that the statement as a whole will not see any changes to the database that occurred after the statement started.
As your UPDATE is one single statement there shouldn't be a case where the decode returns null.
Btw: the statement can be simplified, you don't need the outer SELECT in the sub-query:
UPDATE table SET state = DECODE(state, 'rainy', 'snowy', 'sunny', 'frosty')
WHERE id IN (
SELECT id
FROM table
WHERE date > 50
AND (state='rainy' OR state='sunny')
AND temp < 0
)
I don't see any reason to be concerned. The subquery explicitly retrieves only IDs of rows with state 'rainy' or 'sunny' and that's what outer DECODE is going to get. Thole thing is one statement, and is going to be executed within transaction boundaries.
Answering my own question: turns out there is a bug in Oracle which can cause this query to fail. Details confirmed by Tom Kyte, in the discussion starting here.

COUNT() Function in conjunction with NOT IN clause not working properly with varchar field (T-SQL)

I came across a weird situation when trying to count the number of rows that DO NOT have varchar values specified by a select statement. Ok, that sounds confusing even to me, so let me give you an example:
Let's say I have a field "MyField" in "SomeTable" and I want to count in how many rows MyField values do not belong to a domain defined by the values of "MyOtherField" in "SomeOtherTable".
In other words, suppose that I have MyOtherField = {1, 2, 3}, I wanna count in how many rows MyField value are not 1, 2 or 3. For that, I'd use the following query:
SELECT COUNT(*) FROM SomeTable
WHERE ([MyField] NOT IN (SELECT MyOtherField FROM SomeOtherTable))
And it works like a charm. Notice however that MyField and MyOtherField are int typed. If I try to run the exact same query, except for varchar typed fields, its returning value is 0 even though I know that there are wrong values, I put them there! And if I, however, try to count the opposite (how many rows ARE in the domain opposed to what I want that is how many rows are not) simply by supressing the "NOT" clause in the query above... Well, THAT works! ¬¬
Yeah, there must be tons of workarounds to this but I'd like to know why it doesn't work the way it should. Furthermore, I can't simply alter the whole query as most of it is built inside a C# code and basically the only part I have freedom to change that won't have an impact in any other part of the software is the select statement that corresponds to the domain (whatever comes in the NOT IN clause). I hope I made myself clear and someone out there could help me out.
Thanks in advance.
For NOT IN, it is always false if the subquery returns a NULL value. The accepted answer to this question elegantly describes why.
The NULLability of a column value is independent of the datatype used too: most likely your varchar columns has NULL values
Do deal with this, use NOT EXISTS. For non-null values, it works the same as NOT IN so is compatible
SELECT COUNT(*) FROM SomeTable S1
WHERE NOT EXISTS (SELECT * FROm SomeOtherTable S2 WHERE S1.[MyField] = S2.MyOtherField)
gbn has a more complete answer, but I can't be bothered to remember all that. Instead I have the religious habit of filtering nulls out of my IN clauses:
SELECT COUNT(*)
FROM SomeTable
WHERE [MyField] NOT IN (
SELECT MyOtherField FROM SomeOtherTable
WHERE MyOtherField is not null
)