PostgreSQL: create table with unique timestamp for all rows - sql
I have a record of users' trips with begin/end positions and time in a table like this:
CREATE TABLE trips(id integer, start_timestamp timestamp with time zone,
session_id integer, start_lat double precision,
start_lon double precision, end_lat double precision,
end_lon double precision, mode integer);
INSERT INTO trips (id, start_timestamp, session_id, start_lat,start_lon,end_lat,end_lon,mode)
VALUES (563097015,'2017-05-20 17:47:12+01', 128618, 41.1783308,-8.5949878, 41.1784478, -8.5948463, 0),
(563097013, '2017-05-20 17:45:29+01', 128618, 41.1781344, -8.5951169, 41.1782919, -8.5950689, 0),
(563097011, '2017-05-20 17:43:41+01', 128618, 41.1781196, -8.5954075, 41.1782139, -8.5950689, 0),
(563097009, '2017-05-20 17:41:48+01', 128618, 41.1782497, -8.595197, 41.1781101, -8.5954124, 0),
(563097003, '2017-05-20 17:10:29+01', 128618, 41.1832512, -8.6081606, 41.1782561, -8.5950259, 0)
A second table holds the raw GPS traces for all the trips, for example:
CREATE TABLE gps_traces (session_id integer, seconds integer, lat double precision,
lon double precision, speed double precision);
INSERT INTO gps_traces (session_id, seconds , lat , lon , speed )
VALUES (128618,1495296443,41.1844471,-8.6065158,1.35148),
(128618,1495296444,41.1844482,-8.6065303,1.28004),
(128618,1495296445,41.1844572,-8.6065503,1.46086),
(128618,1495296446,41.1844541,-8.6065691,1.23),
(128618,1495296446,41.1844589,-8.6065861, 1.22919),
(128618,1495296447,41.1844587, -8.6066043, 1.30188),
(128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
(128618, 1495296449, 41.184471, -8.6066412, 1.55003),
(128618,1495296450, 41.1844715, -8.6066572, 1.29062),
(128618,1495296450, 41.1844707, -8.6066736, 1.3618)
From these I want to create a new table, mytable, containing the GPS points, by joining the two tables on session_id, like so:
CREATE TABLE mytable AS SELECT id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
However, in the new table, I want to ensure that when a trip has rows recorded twice at the same Unix timestamp, only one of them is selected. For example, in this case:
SELECT * FROM mytable WHERE id = 563097003;
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296446 | 41.1844589 | -8.6065861 | 1.22919 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 563097003 | 1495296450 | 41.1844707 | -8.6066736 | 1.3618 | 0 |
+-----------+------------+------------+------------+---------+------+
(10 rows)
Column seconds is the Unix timestamp. As shown, the timestamps 1495296446 and 1495296450 each appear in more than one row. I would like each trip's records to be selected into the new table with unique timestamps (so in the case above, only one record per duplicated timestamp should be selected). I illustrate that in this db<>fiddle.
EDIT
Expected output:
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
+-----------+------------+------------+------------+---------+------+
(8 rows)
Use DISTINCT ON:
CREATE TABLE mytable AS
SELECT DISTINCT ON (t.session_id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t JOIN
gps_traces g
ON t.session_id = g.session_id
ORDER BY t.session_id, seconds;
Note: I would expect you to include session_id in the new table as well.
Thanks to @Abelisto, it turns out that the following modification to this answer works as intended:
CREATE TABLE mytable AS
SELECT DISTINCT ON (id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id = g.session_id
ORDER BY id, seconds;
Here is a db<>fiddle.
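The difference between the two DISTINCT ON variants above can be checked with a small runnable sketch. It is not PostgreSQL: it uses Python's built-in sqlite3 module, and because SQLite has no DISTINCT ON, it swaps in ROW_NUMBER() with "keep row number 1" as a stand-in (this needs SQLite 3.25 or newer). The helper name dedup and the use of g.rowid as a tie-breaker are assumptions made for this illustration; the data is copied from the question, with the trips table reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER, session_id INTEGER, mode INTEGER);
CREATE TABLE gps_traces (session_id INTEGER, seconds INTEGER,
                         lat REAL, lon REAL, speed REAL);
INSERT INTO trips VALUES
  (563097015, 128618, 0), (563097013, 128618, 0), (563097011, 128618, 0),
  (563097009, 128618, 0), (563097003, 128618, 0);
INSERT INTO gps_traces VALUES
  (128618, 1495296443, 41.1844471, -8.6065158, 1.35148),
  (128618, 1495296444, 41.1844482, -8.6065303, 1.28004),
  (128618, 1495296445, 41.1844572, -8.6065503, 1.46086),
  (128618, 1495296446, 41.1844541, -8.6065691, 1.23),
  (128618, 1495296446, 41.1844589, -8.6065861, 1.22919),
  (128618, 1495296447, 41.1844587, -8.6066043, 1.30188),
  (128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
  (128618, 1495296449, 41.184471,  -8.6066412, 1.55003),
  (128618, 1495296450, 41.1844715, -8.6066572, 1.29062),
  (128618, 1495296450, 41.1844707, -8.6066736, 1.3618);
""")


def dedup(partition_by):
    """Keep one joined row per partition (hypothetical helper for this sketch)."""
    sql = f"""
    SELECT id, seconds, lat, lon, speed, mode
    FROM (
        SELECT t.id, g.seconds, g.lat, g.lon, g.speed, t.mode,
               -- g.rowid is SQLite-specific; it makes "first inserted wins"
               ROW_NUMBER() OVER (PARTITION BY {partition_by}
                                  ORDER BY g.rowid, t.id) AS rn
        FROM trips t
        JOIN gps_traces g ON t.session_id = g.session_id
    ) AS ranked
    WHERE rn = 1
    ORDER BY id, seconds
    """
    return conn.execute(sql).fetchall()


per_trip = dedup("t.id, g.seconds")             # like DISTINCT ON (id, seconds)
per_session = dedup("t.session_id, g.seconds")  # like DISTINCT ON (t.session_id, seconds)

print(len(per_trip), len(per_session))
for row in per_trip:
    if row[0] == 563097003:
        print(row)
```

Because all five sample trips share session_id 128618, deduplicating on (session_id, seconds) leaves one row per timestamp for the whole session, while deduplicating on (id, seconds) leaves one row per timestamp for each trip, which appears to be why the modified query was needed.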
Related
postgres: statistics of column of user type "set"
I am accessing a table in which, as I noticed, one column is defined with something like a list type (if you come from Python, as I do). I retrieved its CREATE statement via pg_dump:
CREATE TABLE sensemyfeup.trips (
    trip_id integer NOT NULL,
    daily_user_id integer,
    session_ids integer[],
    seconds_start integer,
    lat_start double precision,
    lon_start double precision,
    seconds_end integer,
    lat_end double precision,
    lon_end double precision,
    distance double precision
);
I am referring to the column session_ids. Its contents look like this:
SELECT * FROM trips LIMIT 5;
trip_id | daily_user_id | session_ids   | seconds_start | lat_start  | lon_start  | seconds_end | lat_end    | lon_end    | distance
--------+---------------+---------------+---------------+------------+------------+-------------+------------+------------+-----------------
540797  | 2169          | {43350}       | 1461056108    | 41.1250659 | -8.5993936 | 1461056424  | 41.1221733 | -8.6004883 | 412.658565594423
546128  | 3096          | {84659,84663} | 1461847953    | 41.1787939 | -8.6078294 | 1461849730  | 41.1840573 | -8.6033242 | 3469.92906971906
536069  | 1080          | {9837}        | 1460293763    | 41.1836186 | -8.6001802 | 1460294099  | 41.1836725 | -8.6001787 | 47.7817179218928
537711  | 1373          | {17641,17689} | 1460590761    | 41.1477454 | -8.611109  | 1460593908  | 41.1477451 | -8.6111093 | 1081.61337507529
542407  | 2254          | {53112}       | 1461173383    | 40.9853811 | -8.5205261 | 1461173677  | 40.9873266 | -8.5003848 | 2224.13368208515
As we can see in the session_ids column, some records have one value and some have several. How do I get summary statistics of the number of rows with one session_ids value, with two, and so on?
We can use cardinality to count the number of elements in session_ids and group by the result.
select cardinality(session_ids) as number_of_session_id_values
,count(*)
from t
group by cardinality(session_ids)
number_of_session_id_values | count
----------------------------+------
2                           | 2
1                           | 3
Fiddle
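For readers who, like the asker, think of the array column as a Python list, the grouping above can be mirrored outside the database. This is only an illustration of the logic, not a replacement for the query: len() stands in for cardinality() and collections.Counter stands in for GROUP BY with count(*). The list literal is copied from the five sample rows in the question.

```python
from collections import Counter

# session_ids values from the five sample rows shown in the question
session_ids = [[43350], [84659, 84663], [9837], [17641, 17689], [53112]]

# Count how many rows have 1 element, 2 elements, and so on
counts = Counter(len(ids) for ids in session_ids)

for number_of_values, row_count in sorted(counts.items()):
    print(number_of_values, row_count)
```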
PostgreSQL: show trips within a bounding box
I have a trips table containing users' trip information, like so:
select * from trips limit 10;
trip_id | daily_user_id | session_ids | seconds_start | lat_start  | lon_start  | seconds_end | lat_end    | lon_end    | distance
--------+---------------+-------------+---------------+------------+------------+-------------+------------+------------+-----------------
594221  | 16772         | {170487}    | 1561324555    | 41.1175475 | -8.6298934 | 1561325119  | 41.1554091 | -8.6283493 | 5875.39697884959
563097  | 7682          | {128618}    | 1495295471    | 41.1782829 | -8.5950303 | 1495299137  | 41.1783908 | -8.5948965 | 5364.81067787512
596303  | 17264         | {172851}    | 1578011699    | 41.5195598 | -8.6393526 | 1578012513  | 41.4614024 | -8.717709  | 11187.7956426909
595648  | 17124         | {172119}    | 1575620857    | 41.1553116 | -8.6439528 | 1575621885  | 41.1621821 | -8.6383042 | 1774.83365424607
566061  | 8720          | {133624}    | 1509005051    | 41.1241975 | -8.5958988 | 1509006310  | 41.1424158 | -8.6101461 | 3066.40306678979
566753  | 8947          | {134662}    | 1511127813    | 41.1887996 | -8.5844238 | 1511129839  | 41.2107519 | -8.5511712 | 5264.64026582458
561179  | 7198          | {125861}    | 1493311197    | 41.1776935 | -8.5947254 | 1493311859  | 41.1773815 | -8.5947254 | 771.437257541019
541328  | 2119          | {46950}     | 1461103381    | 41.1779    | -8.5949738 | 1461103613  | 41.1779129 | -8.5950202 | 177.610819150637
535519  | 908           | {6016}      | 1460140650    | 41.1644658 | -8.6422775 | 1460141201  | 41.1642646 | -8.6423309 | 1484.61552373019
548460  | 3525          | {102026}    | 1462289206    | 41.177689  | -8.594679  | 1462289843  | 41.1734476 | -8.5916326 | 1108.05119077308
(10 rows)
The task is to filter trips that start and end within the bounding box defined by upper left: 41.24895, -8.68494 and lower right: 41.11591, -8.47569.
If I understand correctly, you can just compare the starting and ending coordinates:
select t.*
from trips t
where lat_start >= 41.11591 and lat_start <= 41.24895 and
      lat_end >= 41.11591 and lat_end <= 41.24895 and
      lon_start >= -8.68494 and lon_start <= -8.47569 and
      lon_end >= -8.68494 and lon_end <= -8.47569
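The plain coordinate comparison can be tried without PostGIS. The sketch below is an assumption-laden stand-in rather than the asker's setup: it uses Python's sqlite3 module instead of PostgreSQL, keeps only the columns the filter needs, and loads three of the sample rows from the question. BETWEEN is used as shorthand for the paired >= and <= tests, and the parameter names (lat_min and so on) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_id INTEGER, lat_start REAL, lon_start REAL,"
    " lat_end REAL, lon_end REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (594221, 41.1175475, -8.6298934, 41.1554091, -8.6283493),
        (563097, 41.1782829, -8.5950303, 41.1783908, -8.5948965),
        (596303, 41.5195598, -8.6393526, 41.4614024, -8.717709),
    ],
)

# Bounding box from the question: upper left (41.24895, -8.68494),
# lower right (41.11591, -8.47569)
bbox = {"lat_min": 41.11591, "lat_max": 41.24895,
        "lon_min": -8.68494, "lon_max": -8.47569}

inside = [
    row[0]
    for row in conn.execute(
        """
        SELECT trip_id
        FROM trips
        WHERE lat_start BETWEEN :lat_min AND :lat_max
          AND lat_end   BETWEEN :lat_min AND :lat_max
          AND lon_start BETWEEN :lon_min AND :lon_max
          AND lon_end   BETWEEN :lon_min AND :lon_max
        ORDER BY trip_id
        """,
        bbox,
    )
]
print(inside)
```

Trip 596303 starts north of the box, so it is the only one of the three filtered out.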
Since your coordinates are stored in x,y columns, you have to use ST_MakePoint to create a proper geometry. After that, you can create a BBOX using the function ST_MakeEnvelope and check whether the start and end coordinates are inside the BBOX using ST_Contains, e.g.
WITH bbox(geom) AS (
  VALUES (ST_MakeEnvelope(-8.68494,41.24895,-8.47569,41.11591,4326))
)
SELECT * FROM trips,bbox
WHERE
  ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_start,lat_start),4326)) AND
  ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_end,lat_end),4326));
Note: the CTE isn't really necessary and is in the query just for illustration purposes. You can repeat the ST_MakeEnvelope function in both conditions of the WHERE clause instead of using bbox.geom. This query also assumes the SRS WGS84 (4326).
Alphanumberic output from ST_MakeLine
I'm trying to convert lat/lon to a linestring. Basically: group by link_id, make a point from each lat/lon pair, and create a linestring.
Table:
+------------+----------+-----------+------------+---------+--------+
| link_id    | seq_num  | lat       | lon        | z_coord | zlevel |
+------------+----------+-----------+------------+---------+--------+
| "16777220" | "0"      | "4129098" | "-7192948" |         | 0      |
| "16777220" | "999999" | "4129134" | "-7192950" |         | 0      |
| "16777222" | "0"      | "4128989" | "-7193030" |         | 0      |
| "16777222" | "1"      | "4128975" | "-7193016" |         | 0      |
| "16777222" | "2"      | "4128940" | "-7193001" |         | 0      |
| "16777222" | "3"      | "4128917" | "-7192998" |         | 0      |
| "16777222" | "4"      | "4128911" | "-7193002" |         | 0      |
+------------+----------+-----------+------------+---------+--------+
My code:
select link_id, ST_SetSRID(ST_MakeLine(ST_MakePoint((lon::double precision / 100000), (lat::double precision / 100000))),4326) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
geometry output column example:
"0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440"
What is this? How did it get formatted in such a way? I expected a linestring, something like:
geometry
7.123 50.123,7.321 50.321
7.321 50.321,7.321 50.321
The data type of link_id is bigint, and for geometry it says geometry.
SOLUTION:
select link_id, ST_AsText(ST_SetSRID(ST_MakeLine(ST_MakePoint((lon::double precision / 100000), (lat::double precision / 100000))),4326)) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
The output is a geometry, which you can display as text using ST_AsText:
select st_asText('0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440');
st_astext
--------------------------------------------------
LINESTRING(-71.92948 41.29098,-71.9295 41.29134)
That being said, if you have more than two points, you can order them to create a meaningful line:
select st_makeline(geom ORDER BY seqID) from tbl;
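To make the "what is this?" part concrete: the hex string is the geometry's binary form (PostGIS extended WKB) printed as hexadecimal. The sketch below decodes it by hand with Python's struct module. It rests on my understanding of the EWKB layout (one byte-order byte, a 4-byte type word whose 0x20000000 bit signals that an SRID follows, a point count, then x/y doubles) and only handles a 2D LINESTRING; the function name is made up for the example. In practice ST_AsText does this for you.

```python
import struct

SRID_FLAG = 0x20000000  # set in the type word when an SRID is embedded


def ewkb_linestring_to_wkt(hex_string):
    """Decode a 2D LINESTRING from hex EWKB (illustrative helper only)."""
    data = bytes.fromhex(hex_string)
    order = "<" if data[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (raw_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5

    srid = None
    if raw_type & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4

    if raw_type & ~SRID_FLAG != 2:  # 2 = LINESTRING without Z/M flags
        raise ValueError("this sketch only handles 2D LINESTRING")

    (n_points,) = struct.unpack_from(order + "I", data, offset)
    offset += 4
    coords = struct.unpack_from(f"{order}{2 * n_points}d", data, offset)
    points = list(zip(coords[0::2], coords[1::2]))

    body = ",".join(f"{x:.15g} {y:.15g}" for x, y in points)
    return srid, points, f"LINESTRING({body})"


srid, points, wkt = ewkb_linestring_to_wkt(
    "0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440"
    "736891ED7CFB51C021020EA14AA54440"
)
print(srid)
print(wkt)
```

The decoded SRID is the 4326 passed to ST_SetSRID, and the two points are the lon/lat pairs of link_id 16777220 divided by 100000.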
PostgreSQL ERROR: operator does not exist: integer = integer[]
I am doing an SQL JOIN...ON in which the column of the other table to join on is an array, and I therefore encounter this error. Specifically, I'm doing the JOIN on these tables.
TABLE: location
+------------+------------+---------+----------+
| session_id | gpstime    | lat     | lon      |
+------------+------------+---------+----------+
| 49         | 1458203595 | 39.7449 | -8.8052  |
| 59         | 1458203601 | 39.7438 | -8.8057  |
| 95         | 1458203602 | 39.7438 | -8.8056  |
| 49         | 1458203602 | 39.7438 | -8.8057  |
+------------+------------+---------+----------+
TABLE: trips
+-------------+-----------+---------+-----------+---------+-------------+
| session_ids | lat_start | lat_end | lon_start | lon_end | travel_mode |
+-------------+-----------+---------+-----------+---------+-------------+
| {49}        | 39.7449   | 41.1782 | -8.8053   | -8.5946 | car         |
| {59,60}     | 41.1551   | 41.1542 | -8.6294   | -8.6247 | foot        |
| {94,95}     | 41.1545   | 40.7636 | -8.6273   | -8.1729 | bike        |
+-------------+-----------+---------+-----------+---------+-------------+
Here's the query I used:
SELECT gpstime, lat, lon, travel_mode
FROM location
INNER JOIN trips
ON session_id = session_ids
WHERE (lat BETWEEN SYMMETRIC lat_start AND lat_end)
AND (lon BETWEEN SYMMETRIC lon_start AND lon_end);
Error:
ERROR: operator does not exist: integer = integer[]
LINE 4: ON session_id = session_ids
How do I fix the issue?
The = comparator can only compare two values of the same type, but here you are trying to compare an integer value with an array. The value 1 cannot equal a value that looks like [1,2].
You can use the = ANY(...) comparator, which checks whether the left value is an element of the right array:
demo:db<>fiddle
ON session_id = ANY(session_ids)
S-Man is correct, although you can also use the ANY function as described here. For more information on the differences between using IN and ANY/ALL, read this question.
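The join condition session_id = ANY(session_ids) is a membership test, which may be easier to see outside SQL. The sketch below mirrors only the join (not the BETWEEN SYMMETRIC filter) in plain Python, with the rows copied from the two tables in the question and the trips reduced to the array and the travel mode; Python's in operator plays the role of = ANY(...).

```python
# (session_id, gpstime, lat, lon) rows from the location table
locations = [
    (49, 1458203595, 39.7449, -8.8052),
    (59, 1458203601, 39.7438, -8.8057),
    (95, 1458203602, 39.7438, -8.8056),
    (49, 1458203602, 39.7438, -8.8057),
]

# (session_ids, travel_mode) from the trips table
trips = [
    ([49], "car"),
    ([59, 60], "foot"),
    ([94, 95], "bike"),
]

joined = [
    (gpstime, lat, lon, travel_mode)
    for session_id, gpstime, lat, lon in locations
    for session_ids, travel_mode in trips
    if session_id in session_ids  # counterpart of session_id = ANY(session_ids)
]

for row in joined:
    print(row)
```

Comparing session_id == session_ids here would always be False, the Python analogue of the type mismatch PostgreSQL reports as an error.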
casting a REAL as INT and comparing
I am casting a real to an int and a float to an int and comparing the two like this:
where cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
but I am still getting results where the two are equal. For example:
+-----------+-----------+------------+------------+----------+
| accn      | load_dt   | pmtdt      | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312   | 6/7/2011  | 11/28/2011 | 98.39      | 98.39    |
| A445070   | 6/2/2011  | 9/22/2011  | 204.93     | 204.93   |
| A465606   | 5/19/2011 | 10/19/2011 | 560.79     | 560.79   |
| A508742   | 7/12/2011 | 10/19/2011 | 279.65     | 279.65   |
| A567730   | 5/27/2011 | 10/24/2011 | 212.76     | 212.76   |
| A617277   | 7/12/2011 | 10/12/2011 | 322.02     | 322.02   |
| A626384   | 6/16/2011 | 10/21/2011 | 415.84     | 415.84   |
| AA0000044 | 5/12/2011 | 5/23/2011  | 197.38     | 197.38   |
+-----------+-----------+------------+------------+----------+
Here is the full query:
select a.accn, a.load_dt, a.pmtdt, a.[SUM(PAID_AMT)] sumpaidamt, sum(b.paid_amt) Bpaidamt
from [MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join F_PAYOR_PAYMENTS_DAILY b
on a.accn=b.ACCESSION_ID
and a.final_rpt_dt=b.FINAL_REPORT_DATE
and a.load_dt=b.LOAD_DATE
and a.pmtdt=b.PAYMENT_DATE
where cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by a.accn, a.load_dt, a.pmtdt, a.[SUM(PAID_AMT)]
What am I doing wrong? How do I return only records that are NOT equal?
I don't see why there is an issue. The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment. Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
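The WHERE versus HAVING point can be reproduced on a toy dataset. Everything in the sketch below is hypothetical: the table names a and b, the column total (standing in for a.[SUM(PAID_AMT)]), the account codes, and the amounts are invented, and it runs on SQLite through Python's sqlite3 module rather than on SQL Server. It only shows that WHERE filters individual payment rows before grouping, while HAVING compares the grouped sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (accn TEXT, total REAL);     -- one pre-aggregated total per account
CREATE TABLE b (accn TEXT, paid_amt REAL);  -- individual payments
INSERT INTO a VALUES ('X1', 100.0), ('X2', 50.0);
INSERT INTO b VALUES ('X1', 60.0), ('X1', 40.0),   -- sums to 100, matches a
                     ('X2', 30.0), ('X2', 25.0);   -- sums to 55, differs from a
""")

# WHERE compares each single payment with the total, so every row passes
where_rows = conn.execute("""
    SELECT a.accn, a.total, SUM(b.paid_amt)
    FROM a JOIN b ON a.accn = b.accn
    WHERE CAST(a.total AS INTEGER) != CAST(b.paid_amt AS INTEGER)
    GROUP BY a.accn, a.total
    ORDER BY a.accn
""").fetchall()

# HAVING compares the summed payments with the total, after grouping
having_rows = conn.execute("""
    SELECT a.accn, a.total, SUM(b.paid_amt)
    FROM a JOIN b ON a.accn = b.accn
    GROUP BY a.accn, a.total
    HAVING CAST(a.total AS INTEGER) != CAST(SUM(b.paid_amt) AS INTEGER)
    ORDER BY a.accn
""").fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, account X1 still appears even though its sum equals the stored total, which is the symptom described in the question; with HAVING only the mismatching account X2 remains.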
You can do a round and a cast:
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
SQL Fiddle showing how it would work: http://sqlfiddle.com/#!3/4eb79/1