How to subscript a Postgres column - sql

I have a Postgres query:
SELECT main
FROM (
   SELECT
      CASE WHEN 1=1 THEN (col_a, col_b)
      END AS main
   FROM "table1"
   LIMIT 100) inner_t
This returns a single column whose value in each row has the format (value_a, value_b). I want the outer query to split those values so that all the value_a's and all the value_b's end up in their own separate columns.
Is there an easy way to do this?
Output screenshot:
http://example.com/path-to-ghosts.jpg

You can abuse row_to_json to do this, but it is probably best to avoid anonymous record types in the first place.
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
   SELECT
      CASE WHEN 1=1 THEN (col_a, col_b)
      END AS main
   FROM "table1"
   LIMIT 100) inner_t
To give a concrete example (after running pgbench -i):
SELECT row_to_json(main)->>'f1', row_to_json(main)->>'f2'
FROM (
   SELECT
      CASE WHEN 1=1 THEN (aid, bid)
      END AS main
   FROM pgbench_accounts
   LIMIT 100) inner_t;
But it only works in v10 and up.
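One way to follow that advice is a separate CASE expression per column, which never builds a record and so keeps column names and types intact. A minimal sketch reusing the pgbench_accounts example:
SELECT CASE WHEN 1=1 THEN aid END AS aid,
       CASE WHEN 1=1 THEN bid END AS bid
FROM   pgbench_accounts
LIMIT  100;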

This is more of an explanation than an actual answer. But it won't fit into a comment.
The thing is, SQL is a strictly typed language. Postgres demands to know the number of columns and their data types in the SELECT list at call time. The *-expansion in SELECT * FROM ... is based on registered types. Postgres knows the columns of a table because the structure is saved in the catalog tables.
The expression nested in your construct (col_a, col_b) is short for ROW(col_a, col_b) and a ROW constructor creates an anonymous record. The manual:
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be cast to a named composite type — either the row type of a table, or a composite type created with CREATE TYPE AS.
Postgres does not know how to expand an anonymous record. *-expansion does not work.
You could cast like the manual says. But that's only an option if the type is stable, i.e. you always put in the same number of columns with the same data types. And that still would not preserve column names.
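For illustration, a sketch of that cast, assuming you create a composite type for the purpose (ab_type is a made-up name; adjust the column types to match your actual columns):
CREATE TYPE ab_type AS (col_a integer, col_b text);

SELECT (main).col_a, (main).col_b
FROM (
   SELECT CASE WHEN 1=1 THEN (col_a, col_b)::ab_type END AS main
   FROM   table1
   LIMIT  100
   ) inner_t;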
So, for the best solution, first define:
Obviously you want to preserve column values.
Do you also want to preserve column names?
Do you also want to preserve column types?
And:
Is the number of columns in the expression always the same?
Are data types always the same?
Is the CASE condition stable, or does it depend on other columns?
If the true aim of the game is to fit multiple values into a single CASE expression and you only care about values, create a text array instead:
SELECT main[1] AS col_a, main[2] AS col_b
FROM (
   SELECT CASE WHEN true THEN ARRAY[col_a::text, col_b::text] END AS main
   FROM   table1
   LIMIT  100
   ) inner_t;
You lose names and types. You can cast and add aliases if you know them.
Else you have to describe your use case more closely - in the question.

Try the query below. The record has to be cast to text and its enclosing parentheses trimmed before split_part() can be applied (note that this simple split breaks if the values themselves contain commas or quotes):
SELECT split_part(btrim(main::text, '()'), ',', 1) AS val1,
       split_part(btrim(main::text, '()'), ',', 2) AS val2
FROM (
   SELECT
      CASE WHEN 1=1 THEN (col_a, col_b)
      END AS main
   FROM "table1"
   LIMIT 100) inner_t

Related

How to know which column has changed on UPDATE?

In a statement like this:
update tab1 set (col1,col2)=(val1,val2) returning "?"
I send the whole row for update with new values; RETURNING * gives back the whole row, but is there a way to check exactly which column has changed while the others remained the same?
I understand that UPDATE rewrites the values, but maybe there is some built-in function for such a comparison?
Basically, you need the pre-UPDATE values of updated rows to compare. That's kind of hard, as RETURNING only returns the post-UPDATE state, but it can be worked around. See:
Return pre-UPDATE column values using SQL only
So this does the basic trick:
WITH input(col1, col2) AS (
   SELECT 1, text 'post_up'  -- "whole row"
   )
, pre_upd AS (
   UPDATE tab1 x
   SET    (col1, col2) = (i.col1, i.col2)
   FROM   input i
   JOIN   tab1 y USING (col1)
   WHERE  x.col1 = y.col1
   RETURNING y.*
   )
TABLE pre_upd
UNION ALL
TABLE input;
This is assuming that col1 in your example is the PRIMARY KEY. We need some way to identify rows unambiguously.
Note that this is not safe against race conditions between concurrent writes. You need to do more to be safe. See related answer above.
The explicit cast to text I added in the CTE above is redundant as text is the default type for string literals anyway. (Like integer is the default for simple numeric literals.) For other data types, explicit casting may be necessary. See:
Casting NULL type when updating multiple rows
Also be aware that all updates write a new row version, even if nothing changes at all. Typically, you'd want to suppress such costly empty updates with appropriate WHERE clauses. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
While "passing whole rows", you'll have to check on all columns that might change, to achieve that.

How to backreference a calculated column value in another column during an INSERT query on Postgres? (query-runtime temporary variable assignment)

In MySQL there's some helpful syntax for doing things like SELECT @calc:=3, @calc, but I can't find a way to do this in PostgreSQL.
The idea would be something like:
SELECT (SET) autogen := UUID_GENERATE_v4() AS id, :autogen AS duplicated_id;
returning a row with 2 columns with the same value
EDIT: Not interested in conventional \set, I need to do this for hundreds of rows
You can use a subquery:
select id, id as duplicated_id
from (select UUID_GENERATE_v4() AS id
) x
Postgres does not confuse the select statement by allowing variable assignment. Even if it did, nothing guarantees the order of evaluation of expressions in a select, so you still would not be sure that it worked.
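Since the question mentions hundreds of rows, here is a hedged sketch of the same idea applied once per row; generate_series stands in for your real row source, and gen_random_uuid() (built in since Postgres 13, otherwise available via pgcrypto) is used in place of UUID_GENERATE_v4() from uuid-ossp:
SELECT id, id AS duplicated_id
FROM (
   SELECT gen_random_uuid() AS id
   FROM   generate_series(1, 100)  -- one UUID per generated row
   ) x;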

How to get unique values from each column based on a condition?

I have been trying to find an optimal solution to select the unique values from each column. My problem is that I don't know the column names in advance, since different tables have different numbers of columns. So first I have to find the column names, and I can use the query below to do that:
select column_name from information_schema.columns
where table_name='m0301010000_ds' and column_name like 'c%'
Sample output for column names:
c1, c2a, c2b, c2c, c2d, c2e, c2f, c2g, c2h, c2i, c2j, c2k, ...
Then I would use the returned column names to get the unique/distinct values in each column, and not just distinct rows.
I know the simplest and lousiest way is to write select distinct column_name from table where column_name = 'something' for every single column (around 20-50 times), and it's very time-consuming too. Since I can't use more than one DISTINCT per column_name, I am stuck with this old-school solution.
I am sure there must be a faster and more elegant way to achieve this; I just can't figure out how. I would really appreciate any help on this.
You can't just return rows, since distinct values don't go together any more.
You could return arrays, which can be had more simply than you might expect:
SELECT array_agg(DISTINCT c1)  AS c1_arr
     , array_agg(DISTINCT c2a) AS c2a_arr
     , array_agg(DISTINCT c2b) AS c2b_arr
     , ...
FROM   m0301010000_ds;
This returns distinct values per column. One array (possibly big) for each column. All connections between values in columns (what used to be in the same row) are lost in the output.
Build SQL automatically
CREATE OR REPLACE FUNCTION f_build_sql_for_dist_vals(_tbl regclass)
  RETURNS text AS
$func$
SELECT 'SELECT ' || string_agg(format('array_agg(DISTINCT %1$I) AS %1$I_arr'
                                     , attname)
                              , E'\n ,' ORDER BY attnum)
       || E'\nFROM ' || _tbl
FROM   pg_attribute
WHERE  attrelid = _tbl     -- valid, visible table name
AND    attnum >= 1         -- exclude tableoid & friends
AND    NOT attisdropped    -- exclude dropped columns
$func$ LANGUAGE sql;
Call:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds');
Returns an SQL string as displayed above.
I use the system catalog pg_attribute instead of the information schema. And the object identifier type regclass for the table name. More explanation in this related answer:
PLpgSQL function to find columns with only NULL values in a given table
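As an aside, if you work in psql you can build and execute in one step with the \gexec meta-command (psql 9.6 or later), which runs each result row of the preceding query as a statement:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds')\gexec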
If you need this in "real time", you won't be able to achieve it using SQL that needs to do a full table scan.
I would advise you to create a separate table containing the distinct values for each column (initialized with the SQL from @Erwin Brandstetter ;) and maintain it using a trigger on the original table.
Your new table will have one column per field. The number of rows will be equal to the maximum number of distinct values for any one field.
On insert: for each field to maintain, check whether that value is already there. If not, add it.
On update: for each field to maintain whose old value differs from the new value, check whether the new value is already there. If not, add it. As for the old value, check whether any other row still has it, and if not, remove it from the list (set the field to null).
On delete: for each field to maintain, check whether any other row still has that value, and if not, remove it from the list (set the value to null).
This way the load is mainly moved to the trigger, and the SQL on the value-list table will be super fast.
P.S.: Make sure to run all the SQL in the trigger through EXPLAIN to confirm it uses the best index and execution plan possible. For update/deletion, just check whether the old value still exists (LIMIT 1).
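For illustration only, here is a simplified sketch of the insert case, assuming a hypothetical source table src(c1 integer) and a one-column value-list table distinct_c1(c1 integer UNIQUE); it does not implement the exact multi-column layout described above, and ON CONFLICT requires Postgres 9.5+:
CREATE OR REPLACE FUNCTION maintain_distinct_c1()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO distinct_c1 (c1)
   VALUES (NEW.c1)
   ON CONFLICT (c1) DO NOTHING;  -- add the value only if it is not in the list yet
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER src_maintain_distinct_c1
AFTER INSERT ON src
FOR EACH ROW EXECUTE PROCEDURE maintain_distinct_c1();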

SQL: What does NULL as ColumnName imply

I understand that AS is used to create an alias. Therefore, it makes sense to alias a long name as a shorter one. However, I am seeing NULL as ColumnName in a SQL query.
What does this imply?
SELECT *, NULL as aColumn
Aliasing can be used in a number of ways, not just to shorten a long column name.
In this case, your example means you're returning a column that always contains NULL, and its alias/column name is aColumn.
Aliasing can also be used when you're using computed values, such as Column1 + Column2 AS Column3.
When unioning or joining datasets, NULL AS [ColumnA] is a quick way to create a complete result set that can be updated later, without having to add a new column to any of the source tables.
In the statement result we have a column that has all NULL values. We can refer to that column using the alias.
In your case the query selects all records from the table, and each result record has an additional column containing only NULL values. If we want to refer to this result set and to the additional column somewhere else later, we should use the alias.
It means that "aColumn" has only Null values. This column could be updated with actual values later but it's an empty one when selected.
---I'm not sure if you know about SSIS, but this mechanism is useful with SSIS to add variable value to the "empty" column.
When using SELECT you can pass a value to the column directly.
So something like :
SELECT ID, Name, 'None' AS Hobbies, 0 AS NumberOfPets, NULL AS Picture, '' AS Adress
Is valid.
It can be used to nicely format query output when using UNION/UNION ALL.
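For example, a small sketch of that UNION ALL use (the table and column names here are made up): padding one branch with NULL keeps the column lists of the two branches aligned.
SELECT id, name, NULL AS phone FROM customers_legacy
UNION ALL
SELECT id, name, phone         FROM customers_new;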
Query result can have a new column that has all NULL values. In SQL Server we can do it like this
SELECT *, CAST(NULL AS <data-type>) AS aColumn
e.g.
SELECT *, CAST(NULL AS BIGINT) AS aColumn
How about without using the AS:
SELECT ID
, Name
, 'None' AS Hobbies
, 0 AS NumberOfPets
, NULL Picture
Usually, adding NULL AS [Column] at the end of a SELECT * is used when you want to insert into another table a calculated column based on the table you have just selected.
UPDATE #TempTable SET aColumn = Column1 + Column2 WHERE ...
Then exporting or saving the results to another table.

Can SQL determine which values from a set of possible column values do not exist?

I have a unique column. I also have a known set of elements that are possible values for the column. I need to know which of the possible values are not already in the table, and as such, are suitable for insertion.
Is this possible with SQL or is post processing required?
Currently, I am using the "in" operator to select all rows where the column value equals an element in my set. Then I remove all matched elements from my set via post processing.
Stick the allowed values in a temporary table allowed, then use a subquery using NOT IN:
SELECT *
FROM   allowed
WHERE  allowed.val NOT IN (
   SELECT maintable.val
   FROM   maintable
);
Some DBs will allow you to build up a table "in-place", instead of having to create a separate table. E.g. in PostgreSQL (any version):
SELECT *
FROM (
   SELECT 'foo' AS val
   UNION ALL SELECT 'bar'
   UNION ALL SELECT 'baz'  -- etc.
) inplace_allowed
WHERE inplace_allowed.val NOT IN (
   SELECT maintable.val
   FROM   maintable
);
More modern versions of PostgreSQL (and perhaps other DBs) will let you use the slightly nicer VALUES syntax to do the same thing.
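A hedged sketch of that VALUES form (PostgreSQL 8.2 and later), reusing the same hypothetical maintable and val names:
SELECT *
FROM  (VALUES ('foo'), ('bar'), ('baz')) AS inplace_allowed(val)
WHERE inplace_allowed.val NOT IN (
   SELECT maintable.val
   FROM   maintable
);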
To do this entirely in SQL you will need to create a separate table with one column. Each row holds one value from the known set of elements. Assuming the table is called ElementList and the other table is called Existing:
SELECT * FROM ElementList WHERE Element NOT IN
(SELECT DISTINCT Element FROM Existing)
Depending on what database engine you're using you may be able to use a temporary table to create and hold the list without saving it permanently in the database. However, storing the list of allowed elements is valuable for constraining the Element column in the Existing table (and for presenting the user with allowed Elements in the user interface).