Wrong order of elements after GROUP BY using ST_MakeLine - sql

I have a table (well, it's a CTE) containing paths, each stored as an array of node IDs, and a table of nodes with their geometries. I am trying to SELECT the paths with their start and end nodes and their geometries, like this:
SELECT *
FROM (
SELECT t.path_id, t.segment_num, t.start_node, t.end_node, ST_MakeLine(n.geom) AS geom
FROM (SELECT path_id, segment_num, nodes[1] AS start_node, nodes[array_upper(nodes,1)] AS end_node, unnest(nodes) AS node_id
FROM paths
) t
JOIN nodes n ON n.id = t.node_id
GROUP BY path_id, segment_num, start_node, end_node
) rs
This seems to work fine when I try it on individual path samples, but when I run it on a large dataset, a small number of the resulting geometries are bad: ST_MakeLine clearly received the points in the wrong order. I suspect parallel aggregation is producing the wrong order, but maybe I am missing something else here?
How can I ensure the correct order of points going into ST_MakeLine?
If I am correct about the parallel aggregation: the Postgres docs say that scans of common table expressions (CTEs) are always parallel restricted. Does that mean I have to make a CTE with the unnested array and mark it AS MATERIALIZED so that it does not get optimized back into the query?

Thanks for reminding me of the ST_MakeLine(geom ORDER BY something) possibility; ST_MakeLine is an aggregate function after all. I don't have an explicit ordering column available (the order is the position in the nodes array, but one node can be present multiple times). Fortunately, unnest can be used in the FROM clause with WITH ORDINALITY, which creates an ordering column for me. Working solution:
SELECT *
FROM (SELECT t.path_id, t.segment_num, t.start_node, t.end_node, ST_MakeLine(n.geom ORDER BY node_order) AS geom
FROM (SELECT path_id, segment_num, nodes[1] AS start_node, nodes[array_upper(nodes,1)] AS end_node, a.elem AS node_id, a.nr AS node_order
FROM paths, unnest(nodes) WITH ORDINALITY a(elem, nr)
) t
JOIN nodes n ON n.id = t.node_id
GROUP BY path_id, segment_num, start_node, end_node
) rs
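
To see why the ordinality column is needed rather than the node ID itself, the effect of unnest(...) WITH ORDINALITY can be modelled in plain Python. This is only an illustrative sketch, not PostgreSQL code: NODES, unnest_with_ordinality and make_line are hypothetical names, and the shuffle stands in for whatever row order the join and aggregation happen to produce.

```python
import random

# Hypothetical sample data: node id -> (x, y) coordinate.
NODES = {10: (0.0, 0.0), 20: (1.0, 0.0), 30: (1.0, 1.0)}


def unnest_with_ordinality(node_ids):
    """Mimic unnest(nodes) WITH ORDINALITY: (elem, nr) pairs, nr starting at 1."""
    return [(elem, nr) for nr, elem in enumerate(node_ids, start=1)]


def make_line(rows, ordered):
    """Mimic ST_MakeLine(geom) or ST_MakeLine(geom ORDER BY node_order)."""
    if ordered:
        rows = sorted(rows, key=lambda row: row[1])  # sort by ordinality only
    return [NODES[elem] for elem, _nr in rows]


path = [10, 20, 30, 20, 10]          # nodes 10 and 20 each appear twice
rows = unnest_with_ordinality(path)
random.Random(42).shuffle(rows)      # stand-in for an arbitrary row order

expected = [NODES[node_id] for node_id in path]
line = make_line(rows, ordered=True)  # same point sequence as the original path
```

Sorting by the node ID would not work here because the path revisits nodes; the ordinality value is unique per array position, so it restores the original sequence regardless of how the rows arrive.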

In order for ST_MakeLine to create a LineString in the right order, you must state the order explicitly with an ORDER BY. The following examples show how the order of points makes a huge difference in the output:
Without ordering:
WITH j (id,geom) AS (
VALUES
(3,'SRID=4326;POINT(1 2)'::geometry),
(1,'SRID=4326;POINT(3 4)'::geometry),
(0,'SRID=4326;POINT(1 9)'::geometry),
(2,'SRID=4326;POINT(8 3)'::geometry)
)
SELECT ST_MakeLine(geom) FROM j;
Ordering by id:
WITH j (id,geom) AS (
VALUES
(3,'SRID=4326;POINT(1 2)'::geometry),
(1,'SRID=4326;POINT(3 4)'::geometry),
(0,'SRID=4326;POINT(1 9)'::geometry),
(2,'SRID=4326;POINT(8 3)'::geometry)
)
SELECT ST_MakeLine(geom ORDER BY id) FROM j;
Demo: db<>fiddle

Related

Postgresql - Map array aggregates into a single array in a particular order

I have a PostgreSQL table containing a column of 1 dimensional array data. I wish to perform an aggregate query on this column, obtaining min/max/mean for each element of the array as well as the group count, returning the result as a 1 dimensional array. The array lengths in the table may vary, but I can be certain that in any grouping I perform, all arrays will be of the same length.
In a simple form, say my arrays are of length 2 and have readings for x and y, I want to return the result as
{Min(x), Max(x), Mean(x), Min(y), Max(y), Mean(y), Count()}
I am able to get a result in the form {Min(x), Min(y), Max(x), Max(y), Mean(x), Mean(y), Count()}, but I can't get from there to my desired result.
Here's an example showing where I am so far (this time with arrays of length 3, but without the mean aggregation, as there isn't one for arrays built into PostgreSQL):
(SQLFiddle here)
CREATE TABLE my_test(some_key numeric, event_data bigint[]);
INSERT INTO my_test(some_key, event_data) VALUES
(1, '{11,12,13}'),
(1, '{5,6,7}'),
(1, '{-11,-12,-13}');
SELECT MIN(event_data) || MAX(event_data) || COUNT(event_data) FROM my_test GROUP BY some_key;
The above gives me
{11,12,13,-11,-12,-13,3}
However, I don't know how to transform a result like the above into what I want, which is:
{11,-11,12,-12,13,-13,3}
What function should I use to transform the above?
Note that the aggregation functions above don't exactly match with those I am using to get min, max - I'm using the aggs_for_vecs extension to give me min, max and mean.
I would recommend using array operations and aggregation:
select x.some_key,
array_agg(u.val order by x.n, u.nn)
from (select t.some_key, ed.n, min(val) as minval, max(val) as maxval
from my_test t cross join lateral
unnest(t.event_data) with ordinality as ed(val, n)
group by t.some_key, ed.n
) x cross join lateral
unnest(array[x.minval, x.maxval]) with ordinality u(val, nn)
group by x.some_key;
Personally, I would prefer an array with three elements and the min/max as a record:
select x.some_key, array_agg((x.minval, x.maxval) order by x.n)
from (select t.some_key, ed.n, min(val) as minval, max(val) as maxval
from my_test t cross join lateral
unnest(t.event_data) with ordinality as ed(val, n)
group by t.some_key, ed.n
) x
group by x.some_key;
Here is a db<>fiddle.
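
The element-wise grouping used in the queries above can be cross-checked with a small plain-Python model. This is a sketch under assumptions, not part of the original answer: per_element_min_max is a hypothetical helper, and it appends the group count at the end as in the layout the question asks for, which the SQL above does not do.

```python
def per_element_min_max(arrays):
    """Return [min(x), max(x), min(y), max(y), ..., count] for equal-length arrays."""
    if not arrays:
        return []
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays in a group must have the same length")
    result = []
    # zip(*arrays) yields one tuple per array position, like grouping by ed.n.
    for column in zip(*arrays):
        result.extend([min(column), max(column)])
    result.append(len(arrays))  # group count goes last
    return result


event_data = [[11, 12, 13], [5, 6, 7], [-11, -12, -13]]
interleaved = per_element_min_max(event_data)
# interleaved == [-11, 11, -12, 12, -13, 13, 3]
```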

Using Multiple aggregate functions in the where clause

We have a SELECT statement in production that takes quite a lot of time.
The current query uses the ROW_NUMBER window function.
I am trying to rewrite the query and test it. My assumption is that, since this is an ORC table, fetching aggregate values instead of using ROW_NUMBER may reduce the execution time.
Is something like this possible? Let me know if I am missing anything.
Sorry, I am still learning, so please bear with my mistakes, if any.
I tried to rewrite the query as shown below.
Original query:
SELECT
Q.id,
Q.crt_ts,
Q.upd_ts,
Q.exp_ts,
Q.biz_effdt
FROM (
SELECT u.id, u.crt_ts, u.upd_ts, u.exp_ts, u.biz_effdt, ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY u.crt_ts DESC) AS ROW_N
FROM ( SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt FROM MSTR_CORE.cust_prd
WHERE biz_effdt IN ( SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd )
) U
)Q WHERE Q.row_n = 1
My attempt:
SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt FROM MSTR_CORE.cust_prd
WHERE biz_effdt IN ( SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd )
having cust_prd.crt_ts = max (cust_prd.crt_ts)

Average interval between timestamps in an array

In a PostgreSQL 9.x database, I have a column which is an array of type timestamp. Each array has between 1..n timestamps.
I'm trying to extract the average interval between all elements in each array.
I understand using a window function on the source table might be the ideal way to tackle this but in this case I am trying to do it as an operation on the array.
I've looked at several other questions that are trying to calculate the moving average of another column etc or the avg (median date of a list of timestamps).
For example the average interval I'm looking for on an array with 3 elements like this:
'{"2012-10-09 17:04:05.710887"
,"2013-10-18 22:30:08.973749"
,"2014-10-22 22:18:18.885973"}'::timestamp[]
Would be:
-368d
Wondering if I need to unpack the array through a function?
One way of many possible: unnest, join, avg in a lateral subquery:
SELECT *
FROM tbl t
LEFT JOIN LATERAL (
SELECT avg(a2.ts - a1.ts) AS avg_intv
FROM unnest(t.arr) WITH ORDINALITY a1(ts, ord)
JOIN unnest(t.arr) WITH ORDINALITY a2(ts, ord) ON (a2.ord = a1.ord + 1)
) avg ON true;
db<>fiddle here
The [INNER] JOIN in the subquery produces exactly the set of combinations relevant for intervals between elements.
I get 371 days 14:37:06.587543, not '-368d', btw.
Related, with more explanation:
PostgreSQL unnest() with element number
You can also only unnest once and use the window functions lead() or lag(), but you were trying to avoid window functions. And you need to make sure of the original order of elements in any case ...
(There is no array function you could use directly to get what you need - in case you were hoping for that.)
Alternative with CTE
Might be appealing to still unnest only once (even while avoiding window functions):
SELECT *
FROM tbl t
LEFT JOIN LATERAL (
WITH a AS (SELECT * FROM unnest(t.arr) WITH ORDINALITY a1(ts, ord))
SELECT avg(a2.ts - a1.ts) AS avg_intv
FROM a a1
JOIN a a2 ON (a2.ord = a1.ord +1)
) avg ON true;
But I expect the added CTE overhead to cost more than unnesting twice. Mostly just demonstrating a WITH clause in a subquery.
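
The figure quoted above (371 days 14:37:06.587543) can be checked independently of the database. The following is a plain-Python sketch of the same arithmetic, pairing each element with its successor as the join on a2.ord = a1.ord + 1 does; average_interval is a hypothetical helper name, not a PostgreSQL function.

```python
from datetime import datetime, timedelta


def average_interval(timestamps):
    """Average gap between consecutive elements in array order.

    Returns None when there are fewer than two elements, in the same way
    avg() over zero rows yields NULL.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        return None
    return sum(gaps, timedelta()) / len(gaps)


arr = [
    datetime.fromisoformat("2012-10-09 17:04:05.710887"),
    datetime.fromisoformat("2013-10-18 22:30:08.973749"),
    datetime.fromisoformat("2014-10-22 22:18:18.885973"),
]
avg_intv = average_interval(arr)
# avg_intv == timedelta(days=371, hours=14, minutes=37, seconds=6, microseconds=587543)
```

Because the gaps telescope, the average is simply (last - first) / (n - 1) when the array is in chronological order.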

ERROR: plan should not reference subplan's variable, how to solve it? postgreSQL

I am trying to run this SELECT in PostgreSQL and it gives me this error:
ERROR: plan should not reference subplan's variable
SQL state: XX000
I don't know how to solve this. I checked, and everything in my table looks correct.
select distinct concat(concat(ID,'_'), ID_DEV), EXTERNAL_URL, LAST_UPDATED
from NOISE
where concat(concat(ID,'_'), ID_DEV) not in (select distinct concat(concat(ID,'_'), ID_DEV)
from NOISE
where upload_time < (select max(UP_TIME) from NOISE order by max(UP_TIME) desc fetch first row only) )
Here's what I would try first:
select distinct concat(concat(ID,'_'), ID_DEV),
EXTERNAL_URL, LAST_UPDATED
from NOISE n
where (ID, ID_DEV) not in
(select n1.ID, n1.ID_DEV
from NOISE n1
where upload_time <
(select max(n2.UP_TIME)
from NOISE n2))
There is no point in ordering by max, since you aren't grouping and the aggregate returns just one row anyway. It's still a bit of a hairy query.

How to compare the current row with next and previous row in PostgreSQL?

I want to know how to retrieve results in a SQL query doing some logic comparison with the next or previous rows. I'm using PostgreSQL.
Example
Supposing I have a table in my database with two attributes (ordered position and random numbers), I want to retrieve the odd numbers that are between even numbers. How can I do this?
The real usage
I want to find words that are between two other words which have the category NAME (and the word itself is not a name). The ordering is provided by sentence and position.
Edit
I want to know whether PostgreSQL window functions are a better solution for this kind of problem than plain queries. I have heard about them, but never used them.
This is my solution using window functions. I used the lag and lead functions. Both return a value from a column of a row at a given offset from the current row: lag looks back and lead looks ahead by that offset.
SELECT tokcat.text
FROM (
SELECT text, category, chartype, lag(category,1) OVER w as previousCategory, lead(category,1) OVER w as nextCategory
FROM token t, textBlockHasToken tb
WHERE tb.tokenId = t.id
WINDOW w AS (
PARTITION BY textBlockId, sentence
ORDER BY textBlockId, sentence, position
)
) tokcat
WHERE 'NAME' = ANY(previousCategory)
AND 'NAME' = ANY(nextCategory)
AND 'NAME' <> ANY(category)
Simplified version:
SELECT text
FROM (
SELECT text
,category
,lag(category) OVER w as previous_cat
,lead(category) OVER w as next_cat
FROM token t
JOIN textblockhastoken tb ON tb.tokenid = t.id
WINDOW w AS (PARTITION BY textblockid, sentence ORDER BY position)
) tokcat
WHERE category <> 'NAME'
AND previous_cat = 'NAME'
AND next_cat = 'NAME';
Major points
= ANY() is not needed, the window function returns a single value
some redundant fields in the subquery
no need to order by columns, that you PARTITION BY - the ORDER BY applies within partitions
Don't use mixed case identifiers without quoting, it only leads to confusion. (Better yet: don't use mixed case identifiers in PostgreSQL ever)
You can find the best solution at this address:
http://blog.sqlauthority.com/2013/09/25/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement-part-4/
Query 1 for SQL Server 2012 and later versions:
SELECT
LAG(p.FirstName) OVER(ORDER BY p.BusinessEntityID) PreviousValue,
p.FirstName,
LEAD(p.FirstName) OVER(ORDER BY p.BusinessEntityID) NextValue
FROM Person.Person p
GO
Query 2 for SQL Server 2005 and later versions:
WITH CTE AS(
SELECT rownum = ROW_NUMBER() OVER(ORDER BY p.BusinessEntityID),
p.FirstName FROM Person.Person p
)
SELECT
prev.FirstName PreviousValue,
CTE.FirstName,
nex.FirstName NextValue
FROM CTE
LEFT JOIN CTE prev ON prev.rownum = CTE.rownum - 1
LEFT JOIN CTE nex ON nex.rownum = CTE.rownum + 1
GO
This should work:
SELECT w1.word AS word_before, w.word, w2.word AS word_after
FROM word w
JOIN word w1 USING (sentence)
JOIN word w2 USING (sentence)
WHERE w.category <> 'name'
AND w1.pos = (w.pos - 1)
AND w1.category = 'name'
AND w2.pos = (w.pos + 1)
AND w2.category = 'name'
Use two self-joins
All words must be in the same sentence (?) and in order.
Word before and word after have to be of category 'name'. Word itself not 'name'
This assumes that category IS NOT NULL
To answer your additional question: no, a window function would not be particularly useful in this case, self-join is the magic word here.
Edit:
I stand corrected. Renato demonstrates a cool solution with the window functions lag() and lead().
Note the subtle differences:
The self-joins operate on absolute values: if the row with pos - 1 is missing, then the row with pos does not qualify.
Renato's version with lag() and lead() operates on the relative position of rows created by ORDER BY.
In many cases (like probably in the one at hand?) both versions lead to identical results. With gaps in the id space there will be different results.
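
The difference with gaps can be reproduced on a tiny data set. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 module instead of PostgreSQL (SQLite 3.25 or later is needed for window functions), the sample rows are invented, and the self-join is written with ON rather than USING, which should be equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (sentence INTEGER, pos INTEGER, word TEXT, category TEXT);
    INSERT INTO word VALUES
        (1, 1, 'Anna',  'name'),
        (1, 2, 'and',   'other'),
        (1, 3, 'Bob',   'name'),
        (2, 1, 'Carl',  'name'),
        (2, 3, 'meets', 'other'),   -- pos 2 is missing in sentence 2: a gap
        (2, 4, 'Dora',  'name');
""")

# Self-join: neighbours are found by absolute position (pos - 1, pos + 1).
self_join = conn.execute("""
    SELECT w.word
    FROM word w
    JOIN word w1 ON w1.sentence = w.sentence AND w1.pos = w.pos - 1
    JOIN word w2 ON w2.sentence = w.sentence AND w2.pos = w.pos + 1
    WHERE w.category <> 'name'
      AND w1.category = 'name'
      AND w2.category = 'name'
    ORDER BY w.sentence, w.pos
""").fetchall()

# Window functions: neighbours are the adjacent rows in ORDER BY pos.
window = conn.execute("""
    SELECT word
    FROM (
        SELECT word, category, sentence, pos,
               lag(category)  OVER (PARTITION BY sentence ORDER BY pos) AS previous_cat,
               lead(category) OVER (PARTITION BY sentence ORDER BY pos) AS next_cat
        FROM word
    ) AS t
    WHERE category <> 'name'
      AND previous_cat = 'name'
      AND next_cat = 'name'
    ORDER BY sentence, pos
""").fetchall()

conn.close()
# self_join == [('and',)]              'meets' is dropped because pos 2 is missing
# window    == [('and',), ('meets',)]  the gap is ignored
```

Which behaviour is right depends on whether a missing position means "no neighbour" or is merely a hole in the numbering.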