PostgreSQL: show trips within a bounding box

I have a trips table containing users' trip information, like so:
select * from trips limit 10;
trip_id | daily_user_id | session_ids | seconds_start | lat_start | lon_start | seconds_end | lat_end | lon_end | distance
---------+---------------+-------------+---------------+------------+------------+-------------+------------+------------+------------------
594221 | 16772 | {170487} | 1561324555 | 41.1175475 | -8.6298934 | 1561325119 | 41.1554091 | -8.6283493 | 5875.39697884959
563097 | 7682 | {128618} | 1495295471 | 41.1782829 | -8.5950303 | 1495299137 | 41.1783908 | -8.5948965 | 5364.81067787512
596303 | 17264 | {172851} | 1578011699 | 41.5195598 | -8.6393526 | 1578012513 | 41.4614024 | -8.717709 | 11187.7956426909
595648 | 17124 | {172119} | 1575620857 | 41.1553116 | -8.6439528 | 1575621885 | 41.1621821 | -8.6383042 | 1774.83365424607
566061 | 8720 | {133624} | 1509005051 | 41.1241975 | -8.5958988 | 1509006310 | 41.1424158 | -8.6101461 | 3066.40306678979
566753 | 8947 | {134662} | 1511127813 | 41.1887996 | -8.5844238 | 1511129839 | 41.2107519 | -8.5511712 | 5264.64026582458
561179 | 7198 | {125861} | 1493311197 | 41.1776935 | -8.5947254 | 1493311859 | 41.1773815 | -8.5947254 | 771.437257541019
541328 | 2119 | {46950} | 1461103381 | 41.1779 | -8.5949738 | 1461103613 | 41.1779129 | -8.5950202 | 177.610819150637
535519 | 908 | {6016} | 1460140650 | 41.1644658 | -8.6422775 | 1460141201 | 41.1642646 | -8.6423309 | 1484.61552373019
548460 | 3525 | {102026} | 1462289206 | 41.177689 | -8.594679 | 1462289843 | 41.1734476 | -8.5916326 | 1108.05119077308
(10 rows)
The task is to filter trips that start and end within the bounding box defined by upper left: 41.24895, -8.68494 and lower right: 41.11591, -8.47569.

If I understand correctly, you can just compare the starting and ending coordinates against the limits of the box:
select t.*
from trips t
where lat_start >= 41.11591 and lat_start <= 41.24895 and
lat_end >= 41.11591 and lat_end <= 41.24895 and
lon_start >= -8.68494 and lon_start <= -8.47569 and
lon_end >= -8.68494 and lon_end <= -8.47569
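As a quick sanity check, here is a minimal sketch that runs the same range comparison against three of the sample rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption, made only so the snippet is self-contained). BETWEEN is inclusive on both ends, so it is equivalent to the >= / <= pairs in the query above.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL trips table
# (assumption: only the columns needed for the filter are recreated).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_id INTEGER, lat_start REAL, lon_start REAL, "
    "lat_end REAL, lon_end REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (594221, 41.1175475, -8.6298934, 41.1554091, -8.6283493),
        (563097, 41.1782829, -8.5950303, 41.1783908, -8.5948965),
        (596303, 41.5195598, -8.6393526, 41.4614024, -8.717709),
    ],
)

# Bounding box from the question: upper left 41.24895, -8.68494 and
# lower right 41.11591, -8.47569.
box = {
    "lat_min": 41.11591,
    "lat_max": 41.24895,
    "lon_min": -8.68494,
    "lon_max": -8.47569,
}

query = """
SELECT trip_id
FROM trips
WHERE lat_start BETWEEN :lat_min AND :lat_max
  AND lat_end   BETWEEN :lat_min AND :lat_max
  AND lon_start BETWEEN :lon_min AND :lon_max
  AND lon_end   BETWEEN :lon_min AND :lon_max
ORDER BY trip_id
"""
inside_ids = [row[0] for row in conn.execute(query, box)]
print(inside_ids)  # [563097, 594221]
```

Trip 596303 is dropped because its starting latitude (41.5195598) lies north of the box.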

Since your coordinates are stored in x,y columns, you have to use ST_MakePoint to create a proper geometry. After that, you can create a BBOX using the function ST_MakeEnvelope and check if start and end coordinates are inside the BBOX using ST_Contains, e.g.
WITH bbox(geom) AS (
VALUES (ST_MakeEnvelope(-8.68494,41.11591,-8.47569,41.24895,4326))
)
SELECT * FROM trips,bbox
WHERE
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_start,lat_start),4326)) AND
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_end,lat_end),4326));
Note: the CTE isn't really necessary and is in the query just for illustration purposes. You can repeat the ST_MakeEnvelope function on both conditions in the WHERE clause instead of bbox.geom. This query also assumes the SRS WGS84 (4326).

Related

PostgreSQL: create table with unique timestamp for all rows

I have a record of users' trips with begin/end positions and time in a table like this:
CREATE TABLE trips(id integer, start_timestamp timestamp with time zone,
session_id integer, start_lat double precision,
start_lon double precision, end_lat double precision,
end_lon double precision, mode integer);
INSERT INTO trips (id, start_timestamp, session_id, start_lat,start_lon,end_lat,end_lon,mode)
VALUES (563097015,'2017-05-20 17:47:12+01', 128618, 41.1783308,-8.5949878, 41.1784478, -8.5948463, 0),
(563097013, '2017-05-20 17:45:29+01', 128618, 41.1781344, -8.5951169, 41.1782919, -8.5950689, 0),
(563097011, '2017-05-20 17:43:41+01', 128618, 41.1781196, -8.5954075, 41.1782139, -8.5950689, 0),
(563097009, '2017-05-20 17:41:48+01', 128618, 41.1782497, -8.595197, 41.1781101, -8.5954124, 0),
(563097003, '2017-05-20 17:10:29+01', 128618, 41.1832512, -8.6081606, 41.1782561, -8.5950259, 0)
The second table holds the raw GPS traces for all the trips, similar to:
CREATE TABLE gps_traces (session_id integer, seconds integer, lat double precision,
lon double precision, speed double precision);
INSERT INTO gps_traces (session_id, seconds , lat , lon , speed )
VALUES (128618,1495296443,41.1844471,-8.6065158,1.35148),
(128618,1495296444,41.1844482,-8.6065303,1.28004),
(128618,1495296445,41.1844572,-8.6065503,1.46086),
(128618,1495296446,41.1844541,-8.6065691,1.23),
(128618,1495296446,41.1844589,-8.6065861, 1.22919),
(128618,1495296447,41.1844587, -8.6066043, 1.30188),
(128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
(128618, 1495296449, 41.184471, -8.6066412, 1.55003),
(128618,1495296450, 41.1844715, -8.6066572, 1.29062),
(128618,1495296450, 41.1844707, -8.6066736, 1.3618)
From this I want to create a new table mytable containing the GPS points of each trip, by joining these tables on session_id, like so:
CREATE TABLE mytable AS SELECT id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
However, in the new table, I want to ensure that for rows recorded twice at the same Unix timestamp in a trip, only one is selected into my new table. For example, in this case:
SELECT * FROM mytable WHERE id = 563097003;
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296446 | 41.1844589 | -8.6065861 | 1.22919 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 563097003 | 1495296450 | 41.1844707 | -8.6066736 | 1.3618 | 0 |
+-----------+------------+------------+------------+---------+------+
(10 rows)
Column seconds is the Unix timestamp. As shown, the timestamps 1495296446 and 1495296450 each appear twice. I would like to ensure that, for each trip, records are selected into the new table with unique timestamps (so in the case above, only one record per duplicated timestamp should be selected). I illustrate that in this db<>fiddle.
EDIT
Expected output:
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
+-----------+------------+------------+------------+---------+------+
(8 rows)
Use DISTINCT ON:
CREATE TABLE mytable AS
SELECT DISTINCT ON (t.session_id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t JOIN
gps_traces g
ON t.session_id = g.session_id
ORDER BY t.session_id, seconds;
Note: I would expect you to include session_id in the new table as well.
Thanks to @Abelisto, it turns out that the following modification to this answer works as intended:
CREATE TABLE mytable AS SELECT DISTINCT ON (id, seconds) id,
seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
ORDER BY id, seconds
Here is a db<>fiddle.
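For anyone who wants to try the idea outside PostgreSQL, here is a rough sketch of the same deduplication. DISTINCT ON is PostgreSQL-specific, so this version uses Python's built-in sqlite3 module and a correlated subquery instead (both are assumptions of the sketch, not what the answers use). It keeps the first-inserted GPS row per (session_id, seconds) by relying on SQLite's implicit rowid. Note that the DISTINCT ON queries above do not say which of the duplicate rows is kept, since the ORDER BY has no tie-breaker beyond id and seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER, session_id INTEGER, mode INTEGER);
CREATE TABLE gps_traces (session_id INTEGER, seconds INTEGER,
                         lat REAL, lon REAL, speed REAL);
-- Two of the sample trips; both belong to session 128618.
INSERT INTO trips VALUES (563097003, 128618, 0), (563097015, 128618, 0);
INSERT INTO gps_traces VALUES
  (128618, 1495296443, 41.1844471, -8.6065158, 1.35148),
  (128618, 1495296444, 41.1844482, -8.6065303, 1.28004),
  (128618, 1495296445, 41.1844572, -8.6065503, 1.46086),
  (128618, 1495296446, 41.1844541, -8.6065691, 1.23),
  (128618, 1495296446, 41.1844589, -8.6065861, 1.22919),
  (128618, 1495296447, 41.1844587, -8.6066043, 1.30188),
  (128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
  (128618, 1495296449, 41.184471,  -8.6066412, 1.55003),
  (128618, 1495296450, 41.1844715, -8.6066572, 1.29062),
  (128618, 1495296450, 41.1844707, -8.6066736, 1.3618);
""")

# Keep only the GPS row with the lowest rowid for each (session_id, seconds),
# then join to trips so every trip gets one row per timestamp.
query = """
SELECT t.id, g.seconds, g.lat, g.lon, g.speed, t.mode
FROM trips t
JOIN gps_traces g ON t.session_id = g.session_id
WHERE g.rowid = (
    SELECT MIN(g2.rowid)
    FROM gps_traces g2
    WHERE g2.session_id = g.session_id AND g2.seconds = g.seconds
)
ORDER BY t.id, g.seconds
"""
rows = conn.execute(query).fetchall()

trip_rows = [r for r in rows if r[0] == 563097003]
print(len(rows), len(trip_rows))  # 16 8
```

Each of the two trips ends up with 8 rows, one per distinct timestamp, which matches the expected output in the question.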

Alphanumeric output from ST_MakeLine

I'm trying to convert lat/lon to linestring. Basically, grouping the columns lat and lon, making a point, and creating a linestring.
Table:
+------------+----------+-----------+------------+---------+--------+
| link_id | seq_num | lat | lon | z_coord | zlevel |
+------------+----------+-----------+------------+---------+--------+
| "16777220" | "0" | "4129098" | "-7192948" | | 0 |
| "16777220" | "999999" | "4129134" | "-7192950" | | 0 |
| "16777222" | "0" | "4128989" | "-7193030" | | 0 |
| "16777222" | "1" | "4128975" | "-7193016" | | 0 |
| "16777222" | "2" | "4128940" | "-7193001" | | 0 |
| "16777222" | "3" | "4128917" | "-7192998" | | 0 |
| "16777222" | "4" | "4128911" | "-7193002" | | 0 |
+------------+----------+-----------+------------+---------+--------+
My code:
select link_id, ST_SetSRID(ST_MakeLine(ST_MakePoint((lon::double precision / 100000), (lat::double precision / 100000))),4326) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
geometry output column example:
"0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440"
^^ What is this? how did it get formatted in such a way? I expected a linestring, something like
geometry
7.123 50.123,7.321 50.321
7.321 50.321,7.321 50.321
The data type of link_id is bigint, and for geometry it says geometry.
SOLUTION:
select link_id, ST_AsText(ST_SetSRID(ST_MakeLine(ST_MakePoint(
(lon::double precision / 100000), (lat::double precision / 100000))),4326)) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
The output is a geometry (shown in its hex-encoded binary form), which you can display as text using ST_AsText:
select st_asText('0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440');
st_astext
--------------------------------------------------
LINESTRING(-71.92948 41.29098,-71.9295 41.29134)
That being said, should you have more than 2 points, you could order them to create a meaningful line:
select st_makeline(geom ORDER BY seqID) from tbl;
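To see what the long hex string actually contains, here is a small sketch that decodes it by hand with Python's struct module. It assumes the value is PostGIS hex-encoded EWKB for a 2D LineString (byte-order flag, type word with an SRID flag, SRID, point count, then x/y doubles) and handles nothing else. The function name decode_ewkb_linestring is made up for this example. In practice ST_AsText does all of this for you.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB: an SRID follows the type word
Z_FLAG = 0x80000000     # EWKB: points carry a Z coordinate
M_FLAG = 0x40000000     # EWKB: points carry an M coordinate
LINESTRING = 2


def decode_ewkb_linestring(hex_ewkb):
    """Decode a hex EWKB 2D LineString into (srid, [(x, y), ...])."""
    data = bytes.fromhex(hex_ewkb)
    order = "<" if data[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (type_word,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    if type_word & (Z_FLAG | M_FLAG):
        raise ValueError("this sketch only handles 2D geometries")
    srid = None
    if type_word & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    if type_word & 0xFF != LINESTRING:
        raise ValueError("this sketch only handles LineString")
    (num_points,) = struct.unpack_from(order + "I", data, offset)
    offset += 4
    points = []
    for _ in range(num_points):
        x, y = struct.unpack_from(order + "dd", data, offset)
        points.append((x, y))
        offset += 16
    return srid, points


hex_value = (
    "0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440"
    "736891ED7CFB51C021020EA14AA54440"
)
srid, points = decode_ewkb_linestring(hex_value)
print(srid, points)
```

The decoded SRID is 4326 and the two points are the same lon/lat pairs that ST_AsText prints above.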

Get similar employees based on their attribute values

Consider the following sample table("Customer") with these records
Customer
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| customer-id | att-a    | att-b    | att-c    | att-d    | att-e    | att-f    | att-g    | att-h    | att-i    | att-j    |
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| customer-1  | att-a-7  | att-b-3  | att-c-10 | att-d-10 | att-e-15 | att-f-11 | att-g-2  | att-h-7  | att-i-5  | att-j-14 |
| customer-2  | att-a-9  | att-b-7  | att-c-12 | att-d-4  | att-e-10 | att-f-4  | att-g-13 | att-h-4  | att-i-1  | att-j-13 |
| customer-3  | att-a-10 | att-b-6  | att-c-1  | att-d-1  | att-e-13 | att-f-12 | att-g-9  | att-h-6  | att-i-7  | att-j-4  |
| customer-19 | att-a-7  | att-b-9  | att-c-13 | att-d-5  | att-e-8  | att-f-5  | att-g-12 | att-h-14 | att-i-13 | att-j-15 |
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
I have these records, and many more, stored in a SQL database, and I want to find the top 10 most similar customers based on their attribute values. For example, customer-1 and customer-19 have at least one matching column value, i.e. "att-a-7", so the output should give me those two customer-ids, customer-1 and customer-19, as the most similar customers.
P.S - there can be one or more columns similar across rows.
I'm using a window function to find the top 10 similar customers, and I'm not sure whether it is correct.
This is the approach I used in my query:
row_number() over (partition by att-a, att-b,..,att-j order by customer-id) as customers
Is this correct?
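One observation on the snippet above: PARTITION BY over all ten attributes only puts two customers in the same partition when every column matches, so customer-1 and customer-19 (which share only att-a) would land in different partitions. A rough sketch of one alternative is below, in plain Python rather than SQL. It assumes "similar" means "number of columns with an identical value", which is my reading of the question and not something it states precisely; the helper name shared_attributes is made up.

```python
from itertools import combinations

# Sample rows from the question, attributes in column order att-a .. att-j.
customers = {
    "customer-1": ["att-a-7", "att-b-3", "att-c-10", "att-d-10", "att-e-15",
                   "att-f-11", "att-g-2", "att-h-7", "att-i-5", "att-j-14"],
    "customer-2": ["att-a-9", "att-b-7", "att-c-12", "att-d-4", "att-e-10",
                   "att-f-4", "att-g-13", "att-h-4", "att-i-1", "att-j-13"],
    "customer-3": ["att-a-10", "att-b-6", "att-c-1", "att-d-1", "att-e-13",
                   "att-f-12", "att-g-9", "att-h-6", "att-i-7", "att-j-4"],
    "customer-19": ["att-a-7", "att-b-9", "att-c-13", "att-d-5", "att-e-8",
                    "att-f-5", "att-g-12", "att-h-14", "att-i-13", "att-j-15"],
}


def shared_attributes(row_a, row_b):
    """Count the columns in which two customers hold the same value."""
    return sum(a == b for a, b in zip(row_a, row_b))


# Score every pair once, drop pairs with nothing in common,
# and keep the 10 pairs with the most matching columns.
scored = [
    (shared_attributes(customers[a], customers[b]), a, b)
    for a, b in combinations(customers, 2)
]
top_pairs = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])[:10]
print(top_pairs)  # [(1, 'customer-1', 'customer-19')]
```

In SQL the same idea would be a self-join of the table with one comparison per column summed into a score, but the exact syntax depends on the database.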

PostgreSQL: Get elapsed amount between integers

I'm trying to get the difference between the first start_time and the last stop_time in the table. But I can't seem to get this done in one query. Can somebody help me? This is some sample data:
start_time | stop_time
-------------------+-------------------
1398871312.769668 | 1398871312.769676
1398871312.771368 | 1398871312.771429
1398871312.771471 | 1398871312.771476
1398871312.771494 | 1398871312.771543
1398871312.781109 | 1398871312.781115
1398871312.781150 | 1398871312.781154
1398871312.781233 | 1398871312.781282
1398871312.992759 | 1398871312.992765
1398871312.992795 | 1398871312.992798
1398871312.992832 | 1398871312.992881
1398871313.3387 | 1398871313.3399
1398871313.3435 | 1398871313.3440
1398871313.3703 | 1398871313.3745
1398871313.203462 | 1398871313.203469
1398871313.203497 | 1398871313.203501
1398871313.203560 | 1398871313.203600
1398871313.214120 | 1398871313.214127
1398871313.214153 | 1398871313.214158
1398871313.214177 | 1398871313.214192
1398871313.214208 | 1398871313.214248
1398871313.415027 | 1398871313.415035
1398871313.415136 | 1398871313.415140
1398871313.415218 | 1398871313.415226
1398871313.415252 | 1398871313.415265
1398871313.415290 | 1398871313.415298
1398871313.415332 | 1398871313.415339
1398871313.415350 | 1398871313.415362
1398871314.144867 | 1398871314.144886
1398871314.144896 | 1398871314.144901
1398871314.144906 | 1398871314.144912
1398871314.144918 | 1398871314.144923
1398871314.144927 | 1398871314.144931
1398871314.144935 | 1398871314.144939
1398871314.144965 | 1398871314.144974
1398871314.145055 | 1398871314.145060
1398871314.145138 | 1398871314.145146
1398871314.145152 | 1398871314.145158
1398871314.145166 | 1398871314.145173
1398871314.145211 | 1398871314.145215
1398871314.145235 | 1398871314.145243
1398871314.145247 | 1398871314.145252
1398871314.145262 | 1398871314.145267
1398871314.145307 | 1398871314.145314
1398871314.145547 | 1398871314.145551
1398871314.145563 | 1398871314.145571
1398871314.145576 | 1398871314.145581
1398871314.145586 | 1398871314.145590
1398871314.145600 | 1398871314.145606
1398871314.145611 | 1398871314.145618
1398871314.145623 | 1398871314.145627
1398871314.145634 | 1398871314.145641
1398871314.145999 | 1398871314.146003
1398871314.146014 | 1398871314.146022
1398871314.146026 | 1398871314.146033
1398871314.146043 | 1398871314.146050
1398871314.146140 | 1398871314.146145
1398871314.146160 | 1398871314.146168
1398871314.146178 | 1398871314.146185
So I want the difference between 1398871312 and 1398871314, in one query. Is this possible? Can anyone help me? Thanks!
Something like this?
select max(stop_time) - min(start_time)
from Table1
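To make the arithmetic concrete, here is a tiny sketch of what that aggregate computes, using three rows from the sample (the first, one from the middle, and the last). It is plain Python with Decimal, which is my choice to avoid floating-point noise in the printed result; how the columns are actually typed in the table is not stated in the question.

```python
from decimal import Decimal

# (start_time, stop_time) pairs taken from the sample data.
rows = [
    ("1398871312.769668", "1398871312.769676"),
    ("1398871313.415027", "1398871313.415035"),
    ("1398871314.146178", "1398871314.146185"),
]

start_times = [Decimal(start) for start, _ in rows]
stop_times = [Decimal(stop) for _, stop in rows]

# Same as: select max(stop_time) - min(start_time) from Table1
elapsed = max(stop_times) - min(start_times)
print(elapsed)  # 1.376517
```

Because min and max are taken over the whole column, the row order does not matter.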

casting a REAL as INT and comparing

I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
But I am still getting results where the two are equal. For example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
Here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
What am I doing wrong? How do I return only records that are NOT equal?
I don't see why there is an issue.
The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
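Here is a small sketch of the difference between the two filters. It uses Python's built-in sqlite3 module instead of SQL Server, with simplified table and column names (a, b, sumpaid) and made-up payment rows, all of which are assumptions for the demo. Account A133312 has two payments of 50.00 and 48.39 that add up to the stored total of 98.39; account X000001 is hypothetical and genuinely disagrees (100.00 stored, 90.00 paid).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (accn TEXT, sumpaid REAL);   -- pre-aggregated totals
CREATE TABLE b (accn TEXT, paid_amt REAL);  -- individual payments
INSERT INTO a VALUES ('A133312', 98.39), ('X000001', 100.00);
INSERT INTO b VALUES ('A133312', 50.00), ('A133312', 48.39),
                     ('X000001', 60.00), ('X000001', 30.00);
""")

# WHERE runs before grouping, so it compares the total with each single payment.
where_query = """
SELECT a.accn
FROM a JOIN b ON a.accn = b.accn
WHERE CAST(a.sumpaid AS INTEGER) != CAST(b.paid_amt AS INTEGER)
GROUP BY a.accn, a.sumpaid
ORDER BY a.accn
"""

# HAVING runs after grouping, so it compares the total with the summed payments.
having_query = """
SELECT a.accn
FROM a JOIN b ON a.accn = b.accn
GROUP BY a.accn, a.sumpaid
HAVING CAST(a.sumpaid AS INTEGER) != CAST(SUM(b.paid_amt) AS INTEGER)
ORDER BY a.accn
"""

where_accns = [row[0] for row in conn.execute(where_query)]
having_accns = [row[0] for row in conn.execute(having_query)]
print(where_accns)   # ['A133312', 'X000001']
print(having_accns)  # ['X000001']
```

The WHERE version reports A133312 even though its totals agree, which is the symptom described in the question; the HAVING version only reports the account whose totals really differ.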
You can also round both values and cast them to the same type before comparing:
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
Sql Fiddle showing how it would work http://sqlfiddle.com/#!3/4eb79/1
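As background on why rounding before comparing can matter when one side is a REAL and the other a FLOAT, here is a short Python sketch. It assumes REAL is a 4-byte IEEE 754 value and FLOAT an 8-byte one (the SQL Server defaults) and mimics the 4-byte storage with the struct module; the helper name as_real is made up.

```python
import struct


def as_real(value):
    """Round-trip a float through 4-byte IEEE 754 storage, like a REAL column."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


sum_paid = as_real(98.39)  # what a 4-byte REAL would hold
b_paid = 98.39             # what an 8-byte FLOAT would hold

raw_equal = sum_paid == b_paid
rounded_equal = round(sum_paid, 2) == round(b_paid, 2)
print(raw_equal, rounded_equal)  # False True
```

The two stored values differ slightly in their trailing digits, so a direct comparison fails, while rounding both to two decimals makes them compare equal.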