PostgreSQL unnest() with element number - sql

When I have a column with separated values, I can use the unnest() function:
myTable
 id | elements
----+--------------
  1 | ab,cd,efg,hi
  2 | jk,lm,no,pq
  3 | rstuv,wxyz
select id, unnest(string_to_array(elements, ',')) AS elem
from myTable
id | elem
---+-----
1 | ab
1 | cd
1 | efg
1 | hi
2 | jk
...
How can I include element numbers? I.e.:
id | elem | nr
---+------+---
1 | ab | 1
1 | cd | 2
1 | efg | 3
1 | hi | 4
2 | jk | 1
...
I want the original position of each element in the source string. I've tried window functions (row_number(), rank(), etc.), but I always get 1, maybe because the elements come from the same row of the source table?
I know it's a bad table design. It's not mine, I'm just trying to fix it.

Postgres 9.4 or later
Use WITH ORDINALITY for set-returning functions:
When a function in the FROM clause is suffixed by WITH ORDINALITY, a
bigint column is appended to the output which starts from 1 and
increments by 1 for each row of the function's output. This is most
useful in the case of set returning functions such as unnest().
In combination with the LATERAL feature in pg 9.3+, and according to this thread on pgsql-hackers, the above query can now be written as:
SELECT t.id, a.elem, a.nr
FROM tbl AS t
LEFT JOIN LATERAL unnest(string_to_array(t.elements, ','))
WITH ORDINALITY AS a(elem, nr) ON true;
LEFT JOIN ... ON true preserves all rows in the left table, even if the table expression to the right returns no rows. If that's of no concern you can use this otherwise equivalent, less verbose form with an implicit CROSS JOIN LATERAL:
SELECT t.id, a.elem, a.nr
FROM tbl t, unnest(string_to_array(t.elements, ',')) WITH ORDINALITY a(elem, nr);
Or simpler if based off an actual array (arr being an array column):
SELECT t.id, a.elem, a.nr
FROM tbl t, unnest(t.arr) WITH ORDINALITY a(elem, nr);
Or even, with minimal syntax:
SELECT id, a, ordinality
FROM tbl, unnest(arr) WITH ORDINALITY a;
Here a serves as both the table and the column alias automatically. The default name of the added ordinality column is ordinality, but it's better (safer, cleaner) to add explicit column aliases and table-qualify columns.
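For a quick self-contained check, here is the same query run against the question's sample data, plus an extra row with NULL elements (id 4, added here only for illustration) to show what LEFT JOIN ... ON true preserves:
CREATE TEMP TABLE myTable (id int, elements text);
INSERT INTO myTable VALUES
  (1, 'ab,cd,efg,hi')
, (2, 'jk,lm,no,pq')
, (3, 'rstuv,wxyz')
, (4, NULL);          -- no elements at all

SELECT t.id, a.elem, a.nr
FROM   myTable t
LEFT   JOIN LATERAL unnest(string_to_array(t.elements, ','))
                    WITH ORDINALITY AS a(elem, nr) ON true
ORDER  BY t.id, a.nr;
-- id 4 comes back as a single row with NULL elem and NULL nr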
Postgres 8.4 - 9.3
With row_number() OVER (PARTITION BY id ORDER BY elem) you get numbers according to the sort order, not the original ordinal position of the element in the string.
You can simply omit ORDER BY:
SELECT *, row_number() OVER (PARTITION by id) AS nr
FROM (SELECT id, regexp_split_to_table(elements, ',') AS elem FROM tbl) t;
While this normally works and I have never seen it fail in simple queries, PostgreSQL asserts nothing concerning the order of rows without ORDER BY. It happens to work due to an implementation detail.
To guarantee ordinal positions of elements in the comma-separated string:
SELECT id, arr[nr] AS elem, nr
FROM  (
   SELECT *, generate_subscripts(arr, 1) AS nr
   FROM  (SELECT id, string_to_array(elements, ',') AS arr FROM tbl) t
   ) sub;
Or simpler if based off an actual array:
SELECT id, arr[nr] AS elem, nr
FROM (SELECT *, generate_subscripts(arr, 1) AS nr FROM tbl) t;
Related answer on dba.SE:
How to preserve the original order of elements in an unnested array?
Postgres 8.1 - 8.4
None of these features are available yet: RETURNS TABLE, generate_subscripts(), unnest(), array_length(). But this works:
CREATE FUNCTION f_unnest_ord(anyarray, OUT val anyelement, OUT ordinality integer)
RETURNS SETOF record
LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1
FROM generate_series(array_lower($1,1), array_upper($1,1)) i';
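A call could look like this, reusing the question's comma-separated example (the commented output is what the function body above should produce):
SELECT * FROM f_unnest_ord(string_to_array('ab,cd,efg,hi', ','));
--  val | ordinality
-- -----+------------
--  ab  |          1
--  cd  |          2
--  efg |          3
--  hi  |          4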
Note, in particular, that the array index can differ from the ordinal position of an element. Consider this demo with an extended function:
CREATE FUNCTION f_unnest_ord_idx(anyarray, OUT val anyelement, OUT ordinality int, OUT idx int)
RETURNS SETOF record
LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1, i
FROM generate_series(array_lower($1,1), array_upper($1,1)) i';
SELECT id, arr, (rec).*
FROM  (
   SELECT *, f_unnest_ord_idx(arr) AS rec
   FROM  (
      VALUES
        (1, '{a,b,c}'::text[])  -- short for: '[1:3]={a,b,c}'
      , (2, '[5:7]={a,b,c}')
      , (3, '[-9:-7]={a,b,c}')
      ) t(id, arr)
   ) sub;
id | arr | val | ordinality | idx
----+-----------------+-----+------------+-----
1 | {a,b,c} | a | 1 | 1
1 | {a,b,c} | b | 2 | 2
1 | {a,b,c} | c | 3 | 3
2 | [5:7]={a,b,c} | a | 1 | 5
2 | [5:7]={a,b,c} | b | 2 | 6
2 | [5:7]={a,b,c} | c | 3 | 7
3 | [-9:-7]={a,b,c} | a | 1 | -9
3 | [-9:-7]={a,b,c} | b | 2 | -8
3 | [-9:-7]={a,b,c} | c | 3 | -7
Compare:
Normalize array subscripts so they start with 1

Try:
select v.*, row_number() over (partition by id order by elem) rn
from (
    select id,
           unnest(string_to_array(elements, ',')) AS elem
    from myTable
) v

Use Subscript Generating Functions.
http://www.postgresql.org/docs/current/static/functions-srf.html#FUNCTIONS-SRF-SUBSCRIPTS
For example:
SELECT id
     , elements[i] AS elem
     , i AS nr
FROM (
    SELECT id
         , elements
         , generate_subscripts(elements, 1) AS i
    FROM (
        SELECT id
             , string_to_array(elements, ',') AS elements
        FROM myTable
    ) AS foo
) bar;
More simply:
SELECT id
     , unnest(elements) AS elem
     , generate_subscripts(elements, 1) AS nr
FROM (
    SELECT id
         , string_to_array(elements, ',') AS elements
    FROM myTable
) AS foo;
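Side note: this works because both set-returning functions expand the same array, so they produce the same number of rows and stay aligned element-for-element. A minimal sketch with a literal value:
SELECT id
     , unnest(arr)                 AS elem
     , generate_subscripts(arr, 1) AS nr
FROM  (SELECT 1 AS id, string_to_array('ab,cd,efg,hi', ',') AS arr) s;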

If the order of elements is not important, you can use:
select id, elem, row_number() over (partition by id) as nr
from (
    select id,
           unnest(string_to_array(elements, ',')) AS elem
    from myTable
) a

I think this is related: it uses a correlated subquery to assign arbitrary ranked/ordinal values to the final set. It's more of a practical application, using PG array handling to de-pivot a dataset (works with PG 9.4).
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm
, ax.anm
, ax.val
, ax.num
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id
)
,( SELECT ARRAY[ 'first name', 'last name' ]::text[] AS fanm )
,( SELECT ARRAY[ '9','8','7'] AS val)
,( SELECT ARRAY[ 1,2,3,4,5 ] AS num)
) ax ( anm, fanm, val, num )
;
DE-PIVOTED RESULT SET:
+--+----------+-----+----+---+
|id|fanm |anm |val |num|
+--+----------+-----+----+---+
|2 |first name|john |9 |1 |
|2 |last name |doe |8 |2 |
|2 |NULL |NULL |7 |3 |
|2 |NULL |NULL |NULL|4 |
|2 |NULL |NULL |NULL|5 |
|1 |first name|jane |9 |1 |
|1 |last name |doe |8 |2 |
|1 |NULL |NULL |7 |3 |
|1 |NULL |NULL |NULL|4 |
|1 |NULL |NULL |NULL|5 |
|4 |first name|jodi |9 |1 |
|4 |last name |roe |8 |2 |
|4 |NULL |NULL |7 |3 |
|4 |NULL |NULL |NULL|4 |
|4 |NULL |NULL |NULL|5 |
|3 |first name|jerry|9 |1 |
|3 |last name |roe |8 |2 |
|3 |NULL |NULL |7 |3 |
|3 |NULL |NULL |NULL|4 |
|3 |NULL |NULL |NULL|5 |
+--+----------+-----+----+---+
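The NULLs in the result come from multi-argument UNNEST itself (Postgres 9.4+): arrays of different lengths are padded with NULLs up to the length of the longest. A minimal sketch:
SELECT * FROM unnest(ARRAY['a','b'], ARRAY[1,2,3,4]) AS t(letter, num);
--  letter | num
-- --------+-----
--  a      |   1
--  b      |   2
--         |   3
--         |   4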

unnest2() as exercise
Versions older than PostgreSQL 8.4 need a user-defined unnest(). We can adapt this old function to return elements with an index:
CREATE FUNCTION unnest2(anyarray)
  RETURNS SETOF record AS
$BODY$
  SELECT $1[i], i
  FROM   generate_series(array_lower($1,1),
                         array_upper($1,1)) i;
$BODY$ LANGUAGE sql IMMUTABLE;
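Since unnest2() is declared RETURNS SETOF record without OUT parameters, a call needs a column definition list. A hypothetical example:
SELECT * FROM unnest2(ARRAY['ab','cd','efg']) AS t(elem text, nr int);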

Related

How to disregard a row in a returned MS Access query when all fields bar one are distinct

I'm trying to create a query in Access 2010 which only produces a single row per patient. A really small number of patients (each represented by a unique nhs_number in the table n) are listed as having 2 practices in the table pp, so two rows are generated for them. Is there a way I can arbitrarily select one of the practices and ignore the other?
This is the query:
SELECT DISTINCT
n.nhs_number,
IIF(ch.care_home_date>#2/1/1900#, "TRUE", "FALSE") AS care_home,
pp.practice
FROM (nhs_no_tbl AS n
LEFT JOIN patient_practice_tbl AS pp ON n.nhs_number = pp.nhs_number)
LEFT JOIN patient_care_home_tbl AS ch ON n.nhs_number = ch.nhs_number;
The tables the query uses contain data along these lines:
nhs_no_tbl:
|nhs_number|
| -------- |
|1 |
|2 |
|3 |
|4 |
patient_practice_tbl:
|nhs_number|practice|
| -------- | ------ |
|1 |GP_A |
|2 |GP_A |
|3 |GP_B |
|4 |GP_A |
|4 |GP_B |
patient_care_home_tbl:
|nhs_number|care_home_date|
| -------- | ------------ |
|1 |1/5/2000 |
|1 |1/10/2010 |
|4 |26/10/2017 |
At the end, I'd like the query to return the following:
|nhs_number|Care_home|practice|
| -------- | ------- | ------ |
|1 |TRUE |GP_A |
|2 |FALSE | |
|3 |FALSE | |
|4 |TRUE |GP_A [or GP_B] |
I've updated the query with a CTE:
;WITH cte1 AS ---select all results
(
SELECT DISTINCT nnt.nhs_number,
CASE WHEN pcht.care_home_date IS NULL
THEN 'FALSE'
ELSE 'TRUE'
END AS CareHome,
ppt.practice,
rank()OVER(PARTITION BY nnt.nhs_number ORDER BY ppt.practice) AS R
FROM #nhs_no_tbl nnt
LEFT JOIN #patient_practice_tbl ppt ON nnt.nhs_number = ppt.nhs_number
LEFT JOIN #patient_care_home_tbl pcht ON nnt.nhs_number = pcht.nhs_number
),
CTE2 AS ---choose who may have multiple practices
(
SELECT nhs_number
FROM CTE1
WHERE R = 2
),
CTE3 AS --- combine GP_A and GP_B
(
SELECT t.nhs_number,STRING_AGG(val,',') AS Practices
FROM
(
SELECT DISTINCT val = cte1.practice, CTE1.nhs_number
FROM #nhs_no_tbl nnt
INNER JOIN CTE2 ON nnt.nhs_number = CTE2.nhs_number
INNER JOIN cte1 ON nnt.nhs_number = CTE1.nhs_number
) t
--RIGHT JOIN #nhs_no_tbl nnt ON t.nhs_number = nnt.nhs_number
GROUP BY t.nhs_number
)
SELECT cte1.nhs_number,cte1.carehome,cte1.practice, cte3.Practices
FROM CTE1
LEFT JOIN cte3 ON cte1.nhs_number = CTE3.nhs_number
The result would be the rows from CTE1 with the combined Practices column appended where it exists. Then you could store the result into a temp table and update the temp table where Practices is not null.

reset index in dense_rank or row_number after variable partitioning over changes

I'm using DB2 SQL. I have the following:
select * from mytable order by Var,Varseq
ID Var Varseq
-- --- ------
1 A 1
1 A 2
1 B 1
1 A 3
2 A 1
2 C 1
but would like to get:
ID Var Varseq NewSeq
-- --- ------ ------
1 A 1 1
1 A 2 2
1 B 1 1
1 A 3 1
2 A 1 1
2 C 1 1
However, dense_rank() produces the same as the original result. I hope you can see the difference in the desired output: in the 4th line, when ID=1 returns to Var=A, I want the index reset to 1 instead of carrying on as 3. That is, I would like the index to be reset every time Var changes for a given ID.
For reference, here is my query:
SELECT *, DENSE_RANK() OVER (PARTITION BY ID, VAR ORDER BY VARSEQ) FROM MYTABLE
This is an example of a gaps-and-islands problem. However, SQL tables represent unordered sets. Without a column that specifies the overall ordering, your question does not make sense.
In this case, the difference of row numbers will do what you want. But you need an overall ordering column:
select t.*,
       row_number() over (partition by id, var, seqnum - seqnum2 order by <ordering col>) as newseq
from (select t.*,
             row_number() over (partition by id order by <ordering col>) as seqnum,
             row_number() over (partition by id, var order by <ordering col>) as seqnum2
      from mytable t
     ) t
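To make the difference-of-row-numbers idea concrete, here is a sketch on the sample data, with a hypothetical ORD column standing in for the missing overall ordering column:
WITH tab (id, var, varseq, ord) AS
(
    VALUES
      (1, 'A', 1, 1)
    , (1, 'A', 2, 2)
    , (1, 'B', 1, 3)
    , (1, 'A', 3, 4)
    , (2, 'A', 1, 5)
    , (2, 'C', 1, 6)
)
SELECT id, var, varseq,
       row_number() OVER (PARTITION BY id, var, seqnum - seqnum2
                          ORDER BY ord) AS newseq
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY id      ORDER BY ord) AS seqnum,
           row_number() OVER (PARTITION BY id, var ORDER BY ord) AS seqnum2
    FROM tab t
) t
ORDER BY id, ord;
-- for id 1 this yields newseq 1, 2, 1, 1: the second run of 'A' restarts at 1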
Not an answer yet, but just to have better formatting.
WITH TAB (ID, Var, Varseq) AS
(
VALUES
(1, 'A', 1)
, (1, 'A', 2)
, (1, 'A', 3)
, (1, 'B', 1)
, (2, 'A', 1)
, (2, 'C', 1)
)
SELECT *
FROM TAB
ORDER BY ID, <order keys>;
You specified Var, Varseq as <order keys> in the query above.
The result is:
|ID |VAR|VARSEQ |
|-----------|---|-----------|
|1 |A |1 |
|1 |A |2 |
|1 |A |3 |
|1 |B |1 |
|2 |A |1 |
|2 |C |1 |
But you need the following according to your question:
|ID |VAR|VARSEQ |
|-----------|---|-----------|
|1 |A |1 |
|1 |A |2 |
|1 |B |1 |
|1 |A |3 |
|2 |A |1 |
|2 |C |1 |
So please edit your question and specify the <order keys> clause that produces the result you need. And please run your query with that ordering on your own system before posting it here...

How can I count occurrences of grouped values in a table?

I have the table below in my Postgres DB. I would like to know how many times the values (name1, name2, name3) occur in the table where trial is 1.
In the case below the expected output:
name1, 4
name2, 3
name3, 2
+----+-------+-------+
| id | name  | trial |
+----+-------+-------+
| 1  | name1 | 1     |
| 2  | name1 | 1     |
| 3  | name1 | 1     |
| 4  | name1 | 1     |
| 5  | name2 | 1     |
| 6  | name2 | 1     |
| 7  | name2 | 1     |
| 8  | name3 | 1     |
| 9  | name3 | 1     |
+----+-------+-------+
What I tried so far:
SELECT count(C.NAME)
FROM FIRST AS C
WHERE NAME = (
SELECT CS.NAME
FROM FIRST AS CS
WHERE TRIAL = 1
GROUP BY CS.NAME
)
This query returns 9, which is the total number of rows.
You're missing the GROUP BY clause. Also, the query can be simplified; try this:
SELECT count(1), Name
FROM FIRST
WHERE TRIAL = 1
GROUP BY Name
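Essentially the same query, with the count aliased and the output ordered so it lines up with the expected name/count pairs (table and column names as in the question):
SELECT name, count(*) AS occurrences
FROM   first
WHERE  trial = 1
GROUP  BY name
ORDER  BY occurrences DESC;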

Max value from joined table

I have two tables:
Operations (op_id,super,name,last)
Orders (or_id,number)
Operations:
+-------+-------+-------------+------+
| op_id | super | name        | last |
+-------+-------+-------------+------+
| 1     | 1     | OperationXX | 1    |
| 2     | 1     | OperationXY | 2    |
| 3     | 1     | OperationXC | 4    |
| 4     | 1     | OperationXZ | 3    |
| 5     | 2     | OperationXX | 1    |
| 6     | 3     | OperationXY | 2    |
| 7     | 4     | OperationXC | 1    |
| 8     | 4     | OperationXZ | 2    |
+-------+-------+-------------+------+
Orders:
+-------+--------+
| or_id | number |
+-------+--------+
| 1     | 2UY    |
| 2     | 23X    |
| 3     | xx2    |
| 4     | 121    |
+-------+--------+
I need a query to get this table:
+-------+--------+-----------+-------------+
| or_id | number | max(last) | name        |
+-------+--------+-----------+-------------+
| 1     | 2UY    | 4         | OperationXC |
| 2     | 23X    | 1         | OperationXX |
| 3     | xx2    | 2         | OperationXY |
| 4     | 121    | 2         | OperationXZ |
+-------+--------+-----------+-------------+
Use a correlated subquery and a join:
select o.*, a.last, a.name
from (
    select super, name, last
    from operations t
    where last = (select max(last) from operations t2 where t2.super = t.super)
) a
join orders o on a.super = o.or_id
You can use row_number() as well:
with cte as (
    select * from (
        select *, row_number() over (partition by super order by last desc) rn
        from operations
    ) tt
    where rn = 1
)
select o.*, cte.last, cte.name
from Orders o
join cte on o.or_id = cte.super
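For reference, a self-contained check of the row_number() variant against the sample data (Postgres-style VALUES lists assumed; adjust the literals for your DBMS):
with operations (op_id, super, name, last) as (
    values (1,1,'OperationXX',1), (2,1,'OperationXY',2),
           (3,1,'OperationXC',4), (4,1,'OperationXZ',3),
           (5,2,'OperationXX',1), (6,3,'OperationXY',2),
           (7,4,'OperationXC',1), (8,4,'OperationXZ',2)
),
orders (or_id, number) as (
    values (1,'2UY'), (2,'23X'), (3,'xx2'), (4,'121')
),
cte as (
    select * from (
        select *, row_number() over (partition by super order by last desc) rn
        from operations
    ) tt
    where rn = 1
)
select o.or_id, o.number, cte.last as max_last, cte.name
from orders o
join cte on o.or_id = cte.super;
-- expected: (1, 2UY, 4, OperationXC), (2, 23X, 1, OperationXX),
--           (3, xx2, 2, OperationXY), (4, 121, 2, OperationXZ)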
SELECT Orders.or_id, Orders.number, Operations.name, MAX(Operations.last) AS max
FROM Orders
INNER JOIN Operations ON Operations.super = Orders.or_id
GROUP BY Orders.or_id, Orders.number, Operations.name;
I don't have a way of testing this right now, but I think this is it.
Also, you didn't specify the foreign key, so the join might be wrong.

SQL Insert Query For Multiple Max IDs

Table w:
|ID |Comment |SeqID|
|1  |bajg    | 1   |
|1  |2423    | 2   |
|2  |ref     | 1   |
|2  |comment | 2   |
|2  |juk     | 3   |
|3  |efef    | 1   |
|4  |hy      | 1   |
|4  |6u      | 2   |
How do I insert a standard new comment for each ID with a new SeqID (the current maximum SeqID increased by 1)?
The query below returns the row with the overall highest SeqID:
Select *
From w
Where SEQID =
(select max(seqid)
from w)
Result:
|2  |juk     | 3   |
Expected Result
Table w:
|ID |Comment |SeqID|
|1  |sqc     | 3   |
|2  |sqc     | 4   |
|3  |sqc     | 2   |
|4  |sqc     | 3   |
Will I have to go through and insert all the values (new comment as sqc) I want into the table using the below, or is there a faster way?
INSERT INTO table_name
VALUES (value1,value2,value3,...);
Try this:
INSERT INTO mytable (ID, Comment, SeqID)
SELECT ID, 'sqc', MAX(SeqID) + 1
FROM mytable
GROUP BY ID
Demo here
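Running that grouped INSERT against the sample rows (table name w as in the question) should add exactly the rows shown in the expected result:
INSERT INTO w (ID, Comment, SeqID)
SELECT ID, 'sqc', MAX(SeqID) + 1
FROM w
GROUP BY ID;
-- new rows: (1, 'sqc', 3), (2, 'sqc', 4), (3, 'sqc', 2), (4, 'sqc', 3)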
You are probably better off just calculating the value when you query. Define an identity column on the table, say CommentId, and run a query like:
select id, comment,
row_number() over (partition by comment order by CommentId) as SeqId
from t;
What is nice about this approach is that the ids are always sequential, there is no opportunity for duplicates, the table does not have to be locked when inserting, and the sequential ids work even for updates and deletes.