Calculate distance between different entries in one column - SQL

I have this kind of column in my table:
Table A:
geom_etrs(geometry)
"0101000020E8640000FE2EAF0B3C981C414E499E34DFE65441"
"0101000020E864..."
"0101000020E875..."
"0101000020E867..."
How can I calculate the distances between each of the entries (they are already defined as POINT)?
I want to create a new column where the distances between 1 and 2, then between 2 and 3, then between 3 and 4 and so on, are displayed.

select st_distance(point, lead(point, 1) over (partition by rn))
from (select point, row_number() over (partition by id) as rn
      from table_1) t;
gis=# \d users
user_id | bigint |
geog | geography |
select st_astext(geog), st_astext(lead(geog,1) over (partition by rn)) from ( select geog, row_number() over (partition by user_id) as rn from users limit 10)t;
st_astext | st_astext
-------------------------------------------+-------------------------------------------
POINT(-70.0777937636872 41.6670617084209) | POINT(-70.0783833464664 41.6675384387944)
POINT(-70.0783833464664 41.6675384387944) | POINT(-70.0793901822679 41.667476122803)
POINT(-70.0793901822679 41.667476122803) | POINT(-70.0787530494335 41.6671461707966)
POINT(-70.0787530494335 41.6671461707966) | POINT(-70.07908017161 41.6663672501228)
POINT(-70.07908017161 41.6663672501228) | POINT(-70.0795407352778 41.6669886861798)
POINT(-70.0795407352778 41.6669886861798) | POINT(-70.0798881265976 41.6663775240468)
POINT(-70.0798881265976 41.6663775240468) | POINT(-70.0781470955597 41.6667824284963)
POINT(-70.0781470955597 41.6667824284963) | POINT(-70.0790447962989 41.6675773546665)
POINT(-70.0790447962989 41.6675773546665) | POINT(-70.0778760883834 41.6675017901701)
POINT(-70.0778760883834 41.6675017901701) |
gis=# select st_distance(geog, lead(geog,1) over (partition by rn)) from ( select geog, row_number() over (partition by user_id) as rn from users limit 10)t;
st_distance
--------------
72.21147623
84.13511302
64.48606246
90.70040367
78.96272466
73.78817244
151.81026032
115.69092832
97.69189128
This should work for you

Using the window function LEAD gives you the next row's value as a new column next to the current value: demo: db<>fiddle (only with the text type, because the fiddle does not support geometry, but the principle is the same)
SELECT
    point_column,
    LEAD(point_column) OVER ()
FROM
    table
Now you are able to calculate the distance with PostGIS' st_distance:
SELECT
    st_distance(
        point_column,
        LEAD(point_column) OVER ()
    )
FROM
    table
(In practice you should put an ORDER BY inside the OVER () clause so that "next row" is well defined.)
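The LEAD-then-measure pattern above can be sketched outside PostGIS as well. Below is a minimal, hypothetical example using Python's sqlite3 (SQLite ≥ 3.25 for window functions), with plain x/y coordinates and Euclidean distance standing in for geometry columns and st_distance; the table and column names are made up:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL);
INSERT INTO points (x, y) VALUES (0, 0), (3, 4), (3, 4), (6, 8);
""")

# Pair each row with the next one via LEAD, ordered explicitly by id
# (an explicit ORDER BY makes "next" well defined).
rows = conn.execute("""
    SELECT x, y,
           LEAD(x) OVER (ORDER BY id) AS nx,
           LEAD(y) OVER (ORDER BY id) AS ny
    FROM points
    ORDER BY id
""").fetchall()

# Euclidean distance stands in for st_distance; the last row has no
# successor, so its LEAD values are NULL and it is skipped.
distances = [math.hypot(nx - x, ny - y)
             for x, y, nx, ny in rows if nx is not None]
print(distances)  # [5.0, 0.0, 5.0]
```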


Concat a column's value with other column's lead() value in impala

I have a table like the below:
+---------------------+----------------+---------------------+
| prompt              | answer         | step_timestamp      |
+---------------------+----------------+---------------------+
| hi Lary             |                | 2022-04-04 10:00:00 |
| how are you?        |                | 2022-04-04 10:02:00 |
| how is your pet?    | I am fine      | 2022-04-04 10:05:00 |
| what is your hobby? | my pet is good | 2022-04-04 10:15:00 |
| ok thanks           | football       | 2022-04-04 10:25:00 |
+---------------------+----------------+---------------------+
The answer has to match with the prompt of the previous row.
Expected result :
hi Lary, how are you?I am fine. how is your pet?my pet is good. what is your hobby? football. ok thanks
For this I have done this
WITH super AS (
    SELECT call_id, group_concat(tall, '\t') AS dialog_text
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY tall, call_id
                                  ORDER BY step_timestamp ASC) AS rn,
               call_id, tall
        FROM (
            SELECT call_id, step_timestamp,
                   concat(prompt, ':',
                          lead(answer) OVER (PARTITION BY call_id, step_timestamp
                                             ORDER BY step_timestamp ASC)) AS tall
            FROM db.table
            ORDER BY step_timestamp ASC
            LIMIT 100000000
        ) AS inq
        ORDER BY step_timestamp ASC
        LIMIT 100000000
    ) b
    WHERE rn = 1
    GROUP BY call_id, call_ani
)
SELECT DISTINCT call_id, dialog_text
FROM super;
But it does not work as expected. For example, sometimes I get something like this:
hi lary, how are you?I am fine. how is your pet?my pet is good. how is your pet?I am fine. what is your hobby? football. ok thanks
You probably know the reason already: group_concat() in Impala doesn't maintain ORDER BY. Even if you put LIMIT 100000000, it may not put all rows onto the same node to ensure an ordered concatenation.
Use Hive's collect_list().
I couldn't find the relevance of your row_number(), so I removed it to keep the solution simple. Please test the code below with your original data and then add row_number() back if needed.
select
    id call_id,
    concat(concat_ws(',', min(g))) dialog_text
from
(
    select
        s.id,
        -- use collect_list to concat all dialogues in order of timestamp
        collect_list(s.tall) over (partition by s.id
                                   order by s.step_timestamp desc
                                   rows between unbounded preceding
                                            and unbounded following) g
    from
    (
        SELECT call_id id, step_timestamp,
               concat(prompt, ':',
                      lead(answer) over (PARTITION BY call_id, step_timestamp
                                         order by step_timestamp asc)) tall
        FROM db.table -- this is the main data
    ) s
) gs
-- need to group by 'id' since we have duplicate collect_list values
group by id
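The fix can also be sketched in miniature with Python's sqlite3 (SQLite ≥ 3.25), since SQLite's GROUP_CONCAT has the same no-guaranteed-order caveat as Impala's group_concat(): pair each prompt with the next row's answer via LEAD, then concatenate in timestamp order on the client side. The table and data mirror the question; everything else is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dialog (call_id INTEGER, prompt TEXT, answer TEXT, step_timestamp TEXT);
INSERT INTO dialog VALUES
  (1, 'hi Lary',             '',               '2022-04-04 10:00:00'),
  (1, 'how are you?',        '',               '2022-04-04 10:02:00'),
  (1, 'how is your pet?',    'I am fine',      '2022-04-04 10:05:00'),
  (1, 'what is your hobby?', 'my pet is good', '2022-04-04 10:15:00'),
  (1, 'ok thanks',           'football',       '2022-04-04 10:25:00');
""")

# Glue each prompt to the NEXT row's answer, in timestamp order.
rows = conn.execute("""
    SELECT prompt || COALESCE(
               LEAD(answer) OVER (PARTITION BY call_id ORDER BY step_timestamp),
               '') AS tall
    FROM dialog
    ORDER BY step_timestamp
""").fetchall()

# Concatenating on the client side sidesteps the unordered aggregate.
dialog_text = ' '.join(t for (t,) in rows)
print(dialog_text)
```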

How to call value stored in last row and add into a new column? BigQuery - SQL

Please, can you help me with this?
I've these columns in my table:
date(yyyy/mm/dd) | productcode_str | productname_str | daysales_int
and need to write a query that must output:
productcode_X_str | productname_X_str | isoweek_date | isoweek_of_year | weeksales_int | week_sales_last_week | week_difference_to_last_week |
I've been trying this so far:
SELECT productcode_str ,
productname_str ,
DATE_TRUNC(date, ISOWEEK) AS isoweek_date,
EXTRACT(ISOWEEK FROM date) AS isoweek_of_year,
SUM(daysales_int) AS weeksales_int,
LAG(SUM(daysales_int))
OVER (PARTITION BY (DATE_TRUNC(date, ISOWEEK)) ORDER BY date)
AS week_sales_last_week
FROM my_table
WHERE productcode_str = 'X'
GROUP BY 1, 2, 3, 4
ORDER BY 3
that returns perfectly:
productcode_X_str | productname_X_str | isoweek_date | isoweek_of_year | weeksales_int
But in LAG query I got this error: "Partition by expression references column date which is neither group nor aggregated"
So, is missing week_sales_last_week | week_difference_to_last_week
Does somebody knows how to query these two missing?
Done!
I did it by just:
I) pulling out the PARTITION BY clause, and
II) wrapping date in an aggregate.
So, the final query is:
SELECT productcode_str ,
productname_str ,
DATE_TRUNC(date, ISOWEEK) AS isoweek_date,
EXTRACT(ISOWEEK FROM date) AS isoweek_of_year,
SUM(daysales_int) AS weeksales_int,
LAG(SUM(daysales_int))
OVER (ORDER BY MAX(date))
AS week_sales_last_week,
(SUM(daysales_int) - LAG(SUM(daysales_int))
OVER (ORDER BY MAX(date))) AS week_difference_to_last_week
FROM my_table
WHERE productcode_str = 'X'
GROUP BY 1, 2, 3, 4
ORDER BY 3
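The same aggregate-first-then-LAG shape can be sketched with Python's sqlite3 (SQLite ≥ 3.25). Here a subquery does the weekly GROUP BY, and LAG runs over the already-aggregated rows, which is what wrapping date in MAX() achieves in BigQuery; the table, the strftime('%W') week bucketing, and the sample figures are all illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (d TEXT, code TEXT, daysales INTEGER);
INSERT INTO sales VALUES
  ('2023-01-02', 'X', 10), ('2023-01-03', 'X', 5),
  ('2023-01-09', 'X', 20), ('2023-01-10', 'X', 1),
  ('2023-01-16', 'X', 7);
""")

# Aggregate per week first, then let LAG look one aggregated row back.
rows = conn.execute("""
    SELECT week,
           weeksales,
           LAG(weeksales) OVER (ORDER BY week) AS week_sales_last_week,
           weeksales - LAG(weeksales) OVER (ORDER BY week)
               AS week_difference_to_last_week
    FROM (
        SELECT strftime('%W', d) AS week, SUM(daysales) AS weeksales
        FROM sales
        WHERE code = 'X'
        GROUP BY week
    )
    ORDER BY week
""").fetchall()
for r in rows:
    print(r)
# ('01', 15, None, None)
# ('02', 21, 15, 6)
# ('03', 7, 21, -14)
```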

Oracle distinct on single column returning row

I have an API endpoint that accepts distinct arguments for filtering on specific columns. For this reason I'm trying to build a base query that arbitrary filters can easily be added to. For some reason if I use:
SELECT "MY_VIEW".*
FROM "MY_VIEW"
-- Distinct on ID filter
WHERE ID IN (SELECT Max(ID)
FROM "MY_VIEW"
GROUP BY ID)
-- Other arbitrary filters...
ORDER BY "MY_VIEW"."NAME" DESC
I get terrible performance so I started using this query:
SELECT * FROM "MY_VIEW"
-- Distinct on ID filter
LEFT JOIN (
    SELECT DISTINCT
           FIRST_VALUE("MY_VIEW"."ID")
               OVER (PARTITION BY "MY_VIEW"."UNIQUE_ID") AS DISTINCT_ID
    FROM "MY_VIEW"
) d ON d.DISTINCT_ID = "MY_VIEW"."ID"
-- Other arbitrary filters...
ORDER BY "MY_VIEW"."NAME" DESC
However when I left join it discards the distinct filter.
Also I can't use rowid because it is a view.
The view is a versioned table.
Index Info
UNIQUENESS | STATUS | INDEX_TYPE | TEMPORARY | PARTITIONED | JOIN_INDEX | COLUMNS
NONUNIQUE | VALID | NORMAL | N | NO | NO | ID
UNIQUE | VALID | NORMAL | N | NO | NO | UNIQUE_ID
NONUNIQUE | VALID | DOMAIN | N | NO | NO | NAME
I don't have enough reputation to leave a "comment" so I will post this as an "answer." Your first example is:
SELECT "MY_VIEW".*
FROM "MY_VIEW"
-- Distinct on ID filter
WHERE ID IN (SELECT Max(ID)
FROM "MY_VIEW"
GROUP BY ID)
-- Other arbitrary filters...
ORDER BY "MY_VIEW"."NAME" DESC
But do you realize that the "GROUP BY ID" clause negates the effect of the MAX() function on ID? In other words, you will get all the rows, and MAX will be computed over each single-row group, returning... that row's ID. Perhaps try:
SELECT "MY_VIEW".*
FROM "MY_VIEW"
-- Distinct on ID filter
WHERE ID IN (SELECT Max(ID)
FROM "MY_VIEW")
-- Other arbitrary filters...
ORDER BY "MY_VIEW"."NAME" DESC
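The point about GROUP BY ID can be demonstrated with any SQL engine; here is a tiny, hypothetical check using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER);
INSERT INTO t VALUES (1), (2), (3);
""")

# With GROUP BY id, every group holds a single distinct id, so MAX(id)
# per group is just that id: the IN filter ends up matching every row.
grouped = conn.execute("SELECT MAX(id) FROM t GROUP BY id ORDER BY id").fetchall()
print(grouped)   # [(1,), (2,), (3,)]

# Without the GROUP BY, MAX(id) collapses to a single row.
overall = conn.execute("SELECT MAX(id) FROM t").fetchall()
print(overall)   # [(3,)]
```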

Aggregating multiple rows more than once

I've got a set of data which has a type column and a created_at time column. I've already got a query which pulls the relevant data from the database, and this is the data that is returned.
type | created_at | row_num
-----------------------------------------------------
"ordersPage" | "2015-07-21 11:32:40.568+12" | 1
"getQuote" | "2015-07-21 15:49:47.072+12" | 2
"completeBrief" | "2015-07-23 01:00:15.341+12" | 3
"sendBrief" | "2015-07-24 08:59:42.41+12" | 4
"sendQuote" | "2015-07-24 18:43:15.967+12" | 5
"acceptQuote" | "2015-08-03 04:40:20.573+12" | 6
The row number is returned from the standard row number function in postgres
ROW_NUMBER() OVER (ORDER BY created_at ASC) AS row_num
What I want to do is somehow aggregate this data so get a time distance between every event, so the output data might look something like this
type_1 | type_2 | time_distance
--------------------------------------------------------
"ordersPage" | "getQuote" | 123423.3423
"getQuote" | "completeBrief" | 123423.3423
"completeBrief" | "sendBrief" | 123423.3423
"sendBrief" | "sendQuote" | 123423.3423
"sendQuote" | "acceptQuote" | 123423.3423
The time distance would be a float in milliseconds, in other queries I've been using something like this to get time differences.
EXTRACT(EPOCH FROM (MAX(events.created_at) - MIN(events.created_at)))
But this time i need it for every pair of events in the sequential order of the row_num so I need the aggregate for (1,2), (2,3), (3,4)...
Any ideas if this is possible? Also doesn't have to be exact, I can deal with duplicates, and with type_1 and type_2 columns returning an existing row in a different order. I just need a way to at least get those values above.
What about a self join? It would look like this:
SELECT
    t1.type
  , t2.type
  , EXTRACT(EPOCH FROM (t1.created_at - t2.created_at)) AS time_diff
FROM your_table t1
INNER JOIN your_table t2
    ON t1.row_num = t2.row_num + 1
(EXTRACT(EPOCH FROM ...) is used here because Postgres has no abs() for intervals; with this join condition t1 is always the later row, so the difference is already non-negative.)
You can use the LAG window function to compare the current value with the previous:
with
t(type, created_at) as (
    values
    ('ordersPage',    '2015-07-21 11:32:40.568+12'::timestamptz),
    ('getQuote',      '2015-07-21 15:49:47.072+12'),
    ('completeBrief', '2015-07-23 01:00:15.341+12'),
    ('sendBrief',     '2015-07-24 08:59:42.41+12'),
    ('sendQuote',     '2015-07-24 18:43:15.967+12'),
    ('acceptQuote',   '2015-08-03 04:40:20.573+12'))
select *, EXTRACT(EPOCH FROM created_at - lag(created_at) over (order by created_at))
from t
order by created_at
select type_1,
       type_2,
       created_at_2 - created_at_1 as time_distance
from (
    select type type_1,
           lead(type, 1) over (order by row_num) type_2,
           created_at created_at_1,
           lead(created_at, 1) over (order by row_num) created_at_2
    from table_name
) temp
where type_2 is not null
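The LAG approach can be sketched end to end with Python's sqlite3 (SQLite ≥ 3.25), using strftime('%s') unix-epoch arithmetic in place of Postgres' EXTRACT(EPOCH FROM ...); the fractional seconds are dropped and only the first three events are used, so the numbers are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (type TEXT, created_at TEXT);
INSERT INTO events VALUES
  ('ordersPage',    '2015-07-21 11:32:40'),
  ('getQuote',      '2015-07-21 15:49:47'),
  ('completeBrief', '2015-07-23 01:00:15');
""")

# LAG pairs each event with its predecessor; subtracting the unix
# timestamps yields the gap in whole seconds.
rows = conn.execute("""
    SELECT LAG(type) OVER (ORDER BY created_at) AS type_1,
           type AS type_2,
           strftime('%s', created_at)
             - strftime('%s', LAG(created_at) OVER (ORDER BY created_at))
               AS seconds_between
    FROM events
    ORDER BY created_at
""").fetchall()
for r in rows:
    print(r)
# (None, 'ordersPage', None)
# ('ordersPage', 'getQuote', 15427)
# ('getQuote', 'completeBrief', 119428)
```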

Select first & last date in window

I'm trying to select first & last date in window based on month & year of date supplied.
Here is example data:
F.rates
| id | c_id | date | rate |
---------------------------------
| 1 | 1 | 01-01-1991 | 1 |
| 1 | 1 | 15-01-1991 | 0.5 |
| 1 | 1 | 30-01-1991 | 2 |
.................................
| 1 | 1 | 01-11-2014 | 1 |
| 1 | 1 | 15-11-2014 | 0.5 |
| 1 | 1 | 30-11-2014 | 2 |
Here is the PostgreSQL SELECT I came up with:
SELECT c_id, first_value(date) OVER w, last_value(date) OVER w FROM F.rates
WINDOW w AS (PARTITION BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), c_id
ORDER BY date ASC)
Which gives me a result pretty close to what I want:
| c_id | first_date | last_date |
----------------------------------
| 1 | 01-01-1991 | 15-01-1991 |
| 1 | 01-01-1991 | 30-01-1991 |
.................................
Should be:
| c_id | first_date | last_date |
----------------------------------
| 1 | 01-01-1991 | 30-01-1991 |
.................................
For some reason last_value(date) returns every record in the window, which makes me think I'm misunderstanding how windows in SQL work. It's as if SQL forms a new window for each row it iterates through, rather than forming multiple windows for the entire table based on YEAR and MONTH.
So could anyone be kind enough to explain whether I'm wrong, and how I can achieve the result I want?
There is a reason why I'm not using MAX/MIN over a GROUP BY clause. My next step would be to retrieve the associated rates for the dates I selected, like:
| c_id | first_date | last_date | first_rate | last_rate | avg rate |
-----------------------------------------------------------------------
| 1 | 01-01-1991 | 30-01-1991 | 1 | 2 | 1.1 |
.......................................................................
If you want your output to become grouped into a single (or just fewer) row(s), you should use simple aggregation (i.e. GROUP BY), if avg_rate is enough:
SELECT c_id, min(date), max(date), avg(rate)
FROM F.rates
GROUP BY c_id, date_trunc('month', date)
More about window functions in PostgreSQL's documentation:
But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
...
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.
...
There are options to define the window frame in other ways ... See Section 4.2.8 for details.
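The default-frame behaviour quoted above is easy to see in a quick experiment; here is a hypothetical sketch using Python's sqlite3 (SQLite ≥ 3.25 follows the same standard frame rules):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rates (c_id INTEGER, d TEXT);
INSERT INTO rates VALUES (1, '1991-01-01'), (1, '1991-01-15'), (1, '1991-01-30');
""")

# Default frame with ORDER BY ends at the current row (plus peers),
# so LAST_VALUE just echoes each row's own date.
default_frame = [r[0] for r in conn.execute("""
    SELECT LAST_VALUE(d) OVER (PARTITION BY c_id ORDER BY d)
    FROM rates ORDER BY d
""")]
print(default_frame)  # ['1991-01-01', '1991-01-15', '1991-01-30']

# Widening the frame to the whole partition fixes it.
full_frame = [r[0] for r in conn.execute("""
    SELECT LAST_VALUE(d) OVER (
        PARTITION BY c_id ORDER BY d
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM rates ORDER BY d
""")]
print(full_frame)     # ['1991-01-30', '1991-01-30', '1991-01-30']
```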
EDIT:
If you want to collapse (min/max aggregation) your data and want to collect more columns than those listed in the GROUP BY, you have 2 choices:
The SQL way
Select the min/max value(s) in a sub-query, then join their original rows back (but this way you have to deal with the fact that the min/max-ed column(s) are usually not unique):
SELECT agg.c_id,
       agg.min first_date,
       agg.max last_date,
       first.rate first_rate,
       last.rate last_rate,
       agg.avg avg_rate
FROM (SELECT c_id, min(date), max(date), avg(rate)
      FROM F.rates
      GROUP BY c_id, date_trunc('month', date)) agg
JOIN F.rates first ON agg.c_id = first.c_id AND agg.min = first.date
JOIN F.rates last ON agg.c_id = last.c_id AND agg.max = last.date
PostgreSQL's DISTINCT ON
DISTINCT ON is typically meant for this task, but it relies heavily on ordering (only one extremum can be searched for this way at a time):
SELECT DISTINCT ON (c_id, date_trunc('month', date))
c_id,
date first_date,
rate first_rate
FROM F.rates
ORDER BY c_id, date
You can join this query with other aggregated sub-queries of F.rates, but at this point (if you really need both minimum & maximum, and in your case even an average) the SQL-compliant way is more suitable.
Windowing functions aren't appropriate for this. Use aggregate functions instead.
select
c_id, date_trunc('month', date)::date,
min(date) first_date, max(date) last_date
from rates
group by c_id, date_trunc('month', date)::date;
c_id | date_trunc | first_date | last_date
------+------------+------------+------------
1 | 2014-11-01 | 2014-11-01 | 2014-11-30
1 | 1991-01-01 | 1991-01-01 | 1991-01-30
create table rates (
id integer not null,
c_id integer not null,
date date not null,
rate numeric(2, 1),
primary key (id, c_id, date)
);
insert into rates values
(1, 1, '1991-01-01', 1),
(1, 1, '1991-01-15', 0.5),
(1, 1, '1991-01-30', 2),
(1, 1, '2014-11-01', 1),
(1, 1, '2014-11-15', 0.5),
(1, 1, '2014-11-30', 2);