Normalize array subscripts so they start with 1

Normalize array subscripts so they start with 1 - sql

PostgreSQL can work with array subscripts starting anywhere.
Consider this example that creates an array with 3 elements with subscripts from 5 to 7:
SELECT '[5:7]={1,2,3}'::int[];
Returns:
[5:7]={1,2,3}
We get the first element at subscript 5:
SELECT ('[5:7]={1,2,3}'::int[])[5];
I want to normalize 1-dimensional arrays to start with array subscript 1.
The best I could come up with:
SELECT ('[5:7]={1,2,3}'::int[])[array_lower('[5:7]={1,2,3}'::int[], 1):array_upper('[5:7]={1,2,3}'::int[], 1)]
The same, easier the read:
WITH cte(a) AS (SELECT '[5:7]={1,2,3}'::int[])
SELECT a[array_lower(a, 1):array_upper(a, 1)]
FROM cte;
Do you know a simpler / faster or at least more elegant way?
Benchmark with old solutions on Postgres 9.5
db<>fiddle here
Benchmark including new solution on Postgres 14
db<>fiddle here

Eventually, something more elegant popped up with Postgres 9.6. The manual:
It is possible to omit the lower-bound and/or
upper-bound of a slice specifier; the missing bound is replaced by the lower or upper limit of the array's subscripts. For example:
So it's simple now:
SELECT my_arr[:];
With my example array literal you need enclosing parentheses to make the syntax unambiguous:
SELECT ('[5:7]={1,2,3}'::int[])[:];
About the same performance as Daniel's solution with hard-coded max array subscripts - which is still the way to go with Postgres 9.5 or earlier.

There is a simpler method that is ugly, but I believe technically correct: extract the largest possible slice out of the array, as opposed to the exact slice with computed bounds.
It avoids the two function calls.
Example:
select ('[5:7]={1,2,3}'::int[])[-2147483648:2147483647];
results in:
int4
---------
{1,2,3}

Not sure if this is already covered, but:
SELECT array_agg(v) FROM unnest('[5:7]={1,2,3}'::int[]) AS a(v);
To test performance I had to add id column on the test table. Slow.

Related

How to iterate over an array in a Postgresql query?

I have constructed a naive query that works for my application, which returns multiple values from a table that contains an array of reals.
SELECT
"timestamp" AS "time",
type as metric,
id,
values[80],
values[81],
values[82],
values[83],
values[84],
values[85],
values[86],
values[87],
values[88],
values[89],
values[91],
values[92],
values[93],
values[94],
values[95],
values[96],
values[97],
values[98],
values[99],
values[100],
values[101]
FROM test_table
WHERE
"timestamp" BETWEEN '2021-04-03T15:05:56.928Z' AND '2021-04-03T15:10:56.928Z'
ORDER BY 1
However, for obvious reasons, I want to avoid long lists of values[n] in my query, especially as in fact I need to get up to 1000 values.
I know that it is possible to iterate over an array but everything I try to replace the list of array subscripts along the lines of
FOR i IN ARRAY values
LOOP
values [i]
END LOOP;
errors out.
As an obvious SQL noob, can someone suggest how this might query might be done more elegantly?

Clickhouse, column values to array

I want to make a query then turn the values for each of it's columns into arrays, I've tried finding a way to do this, but until now it has alluded me.
The query is a simple select:
SELECT a,b,c FROM X
Instead of the usual result of say (in the default format):
val_a_1, val_b_1, val_c_1
----------------
val_a_2, val_b_2, val_c_2
-----------------
val_a_3, val_b_3, val_c_3
I want to get an array for each columns, namely:
[val_a_1, val_a_2, val_a_3], [val_b_1, val_b_2, val_b_3] , [val_c_1, val_c_2, val_c_3]
Is this at all possible ?

Nevermind, this can easily be done with groupArray as:
SELECT groupArray(a), groupArray(b), groupArray(c) FROM X
It completely slipped my mind and google didn't help...
leaving this here in case there's a better option or anyone stumbles upon it when searching.

BigQuery Standard SQL: how to return the first value of array?

Small working example
SELECT SPLIT("hello::hej::hallo::hoi", "::")
returns an array [hello, hej, hallo, hoi] where I want to select the first element i.e. hello. BG Standard provides no FIRST, instead FIRST_VALUE(..) OVER() which I cannot get working for this example above, so
How can I select the first value of array with BigQuery Standard SQL?

I think the documentation in BigQuery is pretty good. You can read about arrays here.
You can use either OFFSET() or ORDINAL(). The method would be:
select array[offset(0)]
or
select array[ordinal(1)]

Check if value exists in Postgres array

Using Postgres 9.0, I need a way to test if a value exists in a given array. So far I came up with something like this:
select '{1,2,3}'::int[] #> (ARRAY[]::int[] || value_variable::int)
But I keep thinking there should be a simpler way to this, I just can't see it. This seems better:
select '{1,2,3}'::int[] #> ARRAY[value_variable::int]
I believe it will suffice. But if you have other ways to do it, please share!

Simpler with the ANY construct:
SELECT value_variable = ANY ('{1,2,3}'::int[])
The right operand of ANY (between parentheses) can either be a set (result of a subquery, for instance) or an array. There are several ways to use it:
SQLAlchemy: how to filter on PgArray column types?
IN vs ANY operator in PostgreSQL
Important difference: Array operators (<#, #>, && et al.) expect array types as operands and support GIN or GiST indices in the standard distribution of PostgreSQL, while the ANY construct expects an element type as left operand and can be supported with a plain B-tree index (with the indexed expression to the left of the operator, not the other way round like it seems to be in your example). Example:
Index for finding an element in a JSON array
None of this works for NULL elements. To test for NULL:
Check if NULL exists in Postgres array

Watch out for the trap I got into: When checking if certain value is not present in an array, you shouldn't do:
SELECT value_variable != ANY('{1,2,3}'::int[])
but use
SELECT value_variable != ALL('{1,2,3}'::int[])
instead.

but if you have other ways to do it please share.
You can compare two arrays. If any of the values in the left array overlap the values in the right array, then it returns true. It's kind of hackish, but it works.
SELECT '{1}' && '{1,2,3}'::int[]; -- true
SELECT '{1,4}' && '{1,2,3}'::int[]; -- true
SELECT '{4}' && '{1,2,3}'::int[]; -- false
In the first and second query, value 1 is in the right array
Notice that the second query is true, even though the value 4 is not contained in the right array
For the third query, no values in the left array (i.e., 4) are in the right array, so it returns false

unnest can be used as well.
It expands array to a set of rows and then simply checking a value exists or not is as simple as using IN or NOT IN.
e.g.
id => uuid
exception_list_ids => uuid[]
select * from table where id NOT IN (select unnest(exception_list_ids) from table2)

Hi that one works fine for me, maybe useful for someone
select * from your_table where array_column ::text ilike ANY (ARRAY['%text_to_search%'::text]);

"Any" works well. Just make sure that the any keyword is on the right side of the equal to sign i.e. is present after the equal to sign.
Below statement will throw error: ERROR: syntax error at or near "any"
select 1 where any('{hello}'::text[]) = 'hello';
Whereas below example works fine
select 1 where 'hello' = any('{hello}'::text[]);

When looking for the existence of a element in an array, proper casting is required to pass the SQL parser of postgres. Here is one example query using array contains operator in the join clause:
For simplicity I only list the relevant part:
table1 other_name text[]; -- is an array of text
The join part of SQL shown
from table1 t1 join table2 t2 on t1.other_name::text[] #> ARRAY[t2.panel::text]
The following also works
on t2.panel = ANY(t1.other_name)
I am just guessing that the extra casting is required because the parse does not have to fetch the table definition to figure the exact type of the column. Others please comment on this.

PostgreSQL ORDER BY issue - natural sort

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip alphabets from em_code usiging reg exp and store in ec_alpha
Cast the remating part to integer ec_num
Increment by one ec_num++
Pad with sufficient zeors and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.
Any idea how to select EM1000?

Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en#colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14

One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.

This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.

you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"

I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).

I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)

I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort

The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Normalize array subscripts so they start with 1 - sql

Not sure if this is already covered, but: SELECT array_agg(v) FROM unnest('[5:7]={1,2,3}'::int[]) AS a(v); To test performance I had to add id column on the test table. Slow.

Related

How to iterate over an array in a Postgresql query?

Clickhouse, column values to array

BigQuery Standard SQL: how to return the first value of array?

Check if value exists in Postgres array

PostgreSQL ORDER BY issue - natural sort

Categories

Resources