PostgreSQL: create table with unique timestamp for all rows - sql

I have a record of users' trips with begin/end positions and time in a table like this:
CREATE TABLE trips(id integer, start_timestamp timestamp with time zone,
session_id integer, start_lat double precision,
start_lon double precision, end_lat double precision,
end_lon double precision, mode integer);
INSERT INTO trips (id, start_timestamp, session_id, start_lat,start_lon,end_lat,end_lon,mode)
VALUES (563097015,'2017-05-20 17:47:12+01', 128618, 41.1783308,-8.5949878, 41.1784478, -8.5948463, 0),
(563097013, '2017-05-20 17:45:29+01', 128618, 41.1781344, -8.5951169, 41.1782919, -8.5950689, 0),
(563097011, '2017-05-20 17:43:41+01', 128618, 41.1781196, -8.5954075, 41.1782139, -8.5950689, 0),
(563097009, '2017-05-20 17:41:48+01', 128618, 41.1782497, -8.595197, 41.1781101, -8.5954124, 0),
(563097003, '2017-05-20 17:10:29+01', 128618, 41.1832512, -8.6081606, 41.1782561, -8.5950259, 0)
And in the second table is the records of raw gps traces for all the trips similar to:
CREATE TABLE gps_traces (session_id integer, seconds integer, lat double precision,
lon double precision, speed double precision);
INSERT INTO gps_traces (session_id, seconds , lat , lon , speed )
VALUES (128618,1495296443,41.1844471,-8.6065158,1.35148),
(128618,1495296444,41.1844482,-8.6065303,1.28004),
(128618,1495296445,41.1844572,-8.6065503,1.46086),
(128618,1495296446,41.1844541,-8.6065691,1.23),
(128618,1495296446,41.1844589,-8.6065861, 1.22919),
(128618,1495296447,41.1844587, -8.6066043, 1.30188),
(128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
(128618, 1495296449, 41.184471, -8.6066412, 1.55003),
(128618,1495296450, 41.1844715, -8.6066572, 1.29062),
(128618,1495296450, 41.1844707, -8.6066736, 1.3618)
From this I want to create a new table mytable containing GPS joining these tables on session_id, like so:
CREATE TABLE mytable AS SELECT id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
However, in the new table, I want to ensure that for rows recorded twice at same unix timestamp in a trip, only only is selected into my new table. For example in this case:
SELECT * FROM mytable WHERE id = 563097003;
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296446 | 41.1844589 | -8.6065861 | 1.22919 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 563097003 | 1495296450 | 41.1844707 | -8.6066736 | 1.3618 | 0 |
| 10 rows | | | | | |
+-----------+------------+------------+------------+---------+------+
Column seconds is the Unix timestamp. As shown, we can see rows having more than 1 unique timestamp count at 1495296446 and 1495296450. I would like to ensure that for each trip, records are selected into the new table with unique timestamp (so in the case above, only one recorded should selected into the new table). I illustrate that in this db<>fiddle.
EDIT
Expected output:
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 8 rows | | | | | |
+-----------+------------+------------+------------+---------+------+

Use DISTINCT ON:
CREATE TABLE mytable AS
SELECT DISTINCT ON (t.session_id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t JOIN
gps_traces g
ON t.session_id = g.session_id
ORDER BY t.session_id, seconds;
Note: I would expect you to include session_id in the new table as well.
Thanks to #Abelisto, it turns out that the following modification to this answer works as intended.
CREATE TABLE mytable AS SELECT DISTINCT ON (id, seconds)id,
seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
ORDER BY id, seconds
Here is a db<>fiddle.

Related

postgres: statictics of column of user type "set"

I am accessing a table table that I recognised one column was defined like type list (if you come from python like I do). I retrieved its create statement via pg_dump
CREATE TABLE sensemyfeup.trips (
trip_id integer NOT NULL,
daily_user_id integer,
session_ids integer[],
seconds_start integer,
lat_start double precision,
lon_start double precision,
seconds_end integer,
lat_end double precision,
lon_end double precision,
distance double precision
);
I am referring to column session_ids. It contents look like:
SELECT * FROM trips LIMIT 5;
trip_id | daily_user_id | session_ids | seconds_start | lat_start | lon_start | seconds_end | lat_end | lon_end | distance
---------+---------------+---------------+---------------+------------+------------+----- --------+------------+------------+------------------
540797 | 2169 | {43350} | 1461056108 | 41.1250659 | -8.5993936 | 1461056424 | 41.1221733 | -8.6004883 | 412.658565594423
546128 | 3096 | {84659,84663} | 1461847953 | 41.1787939 | -8.6078294 | 1461849730 | 41.1840573 | -8.6033242 | 3469.92906971906
536069 | 1080 | {9837} | 1460293763 | 41.1836186 | -8.6001802 | 1460294099 | 41.1836725 | -8.6001787 | 47.7817179218928
537711 | 1373 | {17641,17689} | 1460590761 | 41.1477454 | -8.611109 | 1460593908 | 41.1477451 | -8.6111093 | 1081.61337507529
542407 | 2254 | {53112} | 1461173383 | 40.9853811 | -8.5205261 | 1461173677 | 40.9873266 | -8.5003848 | 2224.13368208515
As we can see in the session_ids column, some records have 1 value, some multiple.
How I get a summary statistics of rows with 1 session_ids value, with 2, etc..?
We can use cardinality to count the number of elements in session_ids and group by with the results.
select cardinality(session_ids) as number_of_session_id_values
,count(*)
from t
group by cardinality(session_ids)
number_of_session_id_values
count
2
2
1
3
Fiddle

PostgreSQL: show trips within a bounding box

I have a trips table containing user's trip information, like so:
select * from trips limit 10;
trip_id | daily_user_id | session_ids | seconds_start | lat_start | lon_start | seconds_end | lat_end | lon_end | distance
---------+---------------+-------------+---------------+------------+------------+-------------+------------+------------+------------------
594221 | 16772 | {170487} | 1561324555 | 41.1175475 | -8.6298934 | 1561325119 | 41.1554091 | -8.6283493 | 5875.39697884959
563097 | 7682 | {128618} | 1495295471 | 41.1782829 | -8.5950303 | 1495299137 | 41.1783908 | -8.5948965 | 5364.81067787512
596303 | 17264 | {172851} | 1578011699 | 41.5195598 | -8.6393526 | 1578012513 | 41.4614024 | -8.717709 | 11187.7956426909
595648 | 17124 | {172119} | 1575620857 | 41.1553116 | -8.6439528 | 1575621885 | 41.1621821 | -8.6383042 | 1774.83365424607
566061 | 8720 | {133624} | 1509005051 | 41.1241975 | -8.5958988 | 1509006310 | 41.1424158 | -8.6101461 | 3066.40306678979
566753 | 8947 | {134662} | 1511127813 | 41.1887996 | -8.5844238 | 1511129839 | 41.2107519 | -8.5511712 | 5264.64026582458
561179 | 7198 | {125861} | 1493311197 | 41.1776935 | -8.5947254 | 1493311859 | 41.1773815 | -8.5947254 | 771.437257541019
541328 | 2119 | {46950} | 1461103381 | 41.1779 | -8.5949738 | 1461103613 | 41.1779129 | -8.5950202 | 177.610819150637
535519 | 908 | {6016} | 1460140650 | 41.1644658 | -8.6422775 | 1460141201 | 41.1642646 | -8.6423309 | 1484.61552373019
548460 | 3525 | {102026} | 1462289206 | 41.177689 | -8.594679 | 1462289843 | 41.1734476 | -8.5916326 | 1108.05119077308
(10 rows)
The task is to filter trips that start and end within the bounding box defined by upper left: 41.24895, -8.68494 and lower right: 41.11591, -8.47569.
If I understand correctly, you can just compare that starting and ending coordinates:
select t.*
from trips t
where lat_start >= 41.11591 and lat_start <= 41.24895 and
lat_end >= 41.11591 and lat_end <= 41.24895 and
long_start >= -8.68494 and long_start <= -8.47569 and
long_end >= -8.68494 and long_end <= -8.47569
Since your coordinates are stored in x,y columns, you have to use ST_MakePoint to create a proper geometry. After that, you can create a BBOX using the function ST_MakeEnvelope and check if start and end coordinates are inside the BBOX using ST_Contains, e.g.
WITH bbox(geom) AS (
VALUES (ST_MakeEnvelope(-8.68494,41.24895,-8.47569,41.11591,4326))
)
SELECT * FROM trips,bbox
WHERE
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_start,lat_start),4326)) AND
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_end,lat_end),4326));
Note: the CTE isn't really necessary and is in the query just for illustration purposes. You can repeat the ST_MakeEnvelope function on both conditions in the WHERE clause instead of bbox.geom. This query also assumes the SRS WGS84 (4326).

Alphanumberic output from ST_MakeLine

I'm trying to convert lat/lon to linestring. Basically, grouping the columns lat and lon, making a point, and creating a linestring.
Table:
+------------+----------+-----------+------------+---------+--------+
| link_id | seq_num | lat | lon | z_coord | zlevel |
+------------+----------+-----------+------------+---------+--------+
| "16777220" | "0" | "4129098" | "-7192948" | | 0 |
| "16777220" | "999999" | "4129134" | "-7192950" | | 0 |
| "16777222" | "0" | "4128989" | "-7193030" | | 0 |
| "16777222" | "1" | "4128975" | "-7193016" | | 0 |
| "16777222" | "2" | "4128940" | "-7193001" | | 0 |
| "16777222" | "3" | "4128917" | "-7192998" | | 0 |
| "16777222" | "4" | "4128911" | "-7193002" | | 0 |
+------------+----------+-----------+------------+---------+--------+
My code:
select link_id, ST_SetSRID(ST_MakeLine(ST_MakePoint((lon::double precision / 100000), (lat::double precision / 100000))),4326) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
geometry output column example:
"0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440"
^^ What is this? how did it get formatted in such a way? I expected a linestring, something like
geometry
7.123 50.123,7.321 50.321
7.321 50.321,7.321 50.321
Data format for link_id is bingint, and for geometry it says geometry
SOLUTION:
select link_id, ST_AsText(ST_SetSRID(ST_MakeLine(ST_MakePoint(
(lon::double precision / 100000), (lat::double precision / 100000))),4326)) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
The output is a geometry, which you can display as text using st_asText
select st_asText('0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440');
st_astext
--------------------------------------------------
LINESTRING(-71.92948 41.29098,-71.9295 41.29134)
That being said, should you have more than 2 points, you could order them to create a meaningful line:
select st_makeline(geom ORDER BY seqID) from tbl;

PostgresQL ERROR: operator does not exist: integer = integer[]

I am doing an SQL JOIN...ON in which the column of the other table to join on is an array to a set of rows, and therefore encounter this error. Specifically, I'm doing the JOIN on the tables.
TABLE: location
+------------+------------+---------+----------+
| session_id | gpstime | lat | lon |
+------------+------------+---------+----------+
| 49 | 1458203595 | 39.7449 | -8.8052 |
| 59 | 1458203601 | 39.7438 | -8.8057 |
| 95 | 1458203602 | 39.7438 | -8.8056 |
| 49 | 1458203602 | 39.7438 | -8.8057 |
+------------+------------+---------+----------+
TABLE: trips
+-------------+-----------+---------+-----------+---------+-------------+
| session_ids | lat_start | lat_end | lon_start | lon_end | travel_mode |
+-------------+-----------+---------+-----------+---------+-------------+
| {49} | 39.7449 | 41.1782 | -8.8053 | -8.5946 | car |
| {59,60} | 41.1551 | 41.1542 | -8.6294 | -8.6247 | foot |
| {94,95} | 41.1545 | 40.7636 | -8.6273 | -8.1729 | bike |
+-------------+-----------+---------+-----------+---------+-------------+
Here's the query I used:
SELECT gpstime, lat, lon, travel_mode
FROM location
INNER JOIN trips
ON session_id = session_ids
WHERE (lat BETWEEN SYMMETRIC lat_start AND lat_end)
AND (lon BETWEEN SYMMETRIC lon_start AND lon_end);
Error:
ERROR: operator does not exist: integer = integer[]
LINE 4: ON session_id = session_ids
How do I fix the issue?
The = comparator can only compare two values of the same type. But here you are trying to compare an integer value with an array. So the value 1 cannot equal a value that look like [1,2].
You can use the = ANY(...) comparator which checks if the left value is part of the right array:
demo:db<>fiddle
ON session_id = ANY(session_ids)
S-Man is correct, although you can also use the ANY function as described here.
For more information on the differences between using IN and ANY/ALL, read this question.

casting a REAL as INT and comparing

I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
but i am still getting results where the two are equal. for example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
what am i doing wrong? how do i return only records that are NOT equal?
I don't see why there is an issue.
The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
You can do a round and a cast statement.
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
Sql Fiddle showing how it would work http://sqlfiddle.com/#!3/4eb79/1