I have a record of users' trips with begin/end positions and time in a table like this:
CREATE TABLE trips(id integer, start_timestamp timestamp with time zone,
session_id integer, start_lat double precision,
start_lon double precision, end_lat double precision,
end_lon double precision, mode integer);
INSERT INTO trips (id, start_timestamp, session_id, start_lat,start_lon,end_lat,end_lon,mode)
VALUES (563097015,'2017-05-20 17:47:12+01', 128618, 41.1783308,-8.5949878, 41.1784478, -8.5948463, 0),
(563097013, '2017-05-20 17:45:29+01', 128618, 41.1781344, -8.5951169, 41.1782919, -8.5950689, 0),
(563097011, '2017-05-20 17:43:41+01', 128618, 41.1781196, -8.5954075, 41.1782139, -8.5950689, 0),
(563097009, '2017-05-20 17:41:48+01', 128618, 41.1782497, -8.595197, 41.1781101, -8.5954124, 0),
(563097003, '2017-05-20 17:10:29+01', 128618, 41.1832512, -8.6081606, 41.1782561, -8.5950259, 0)
A second table holds the raw GPS traces for all the trips, similar to:
CREATE TABLE gps_traces (session_id integer, seconds integer, lat double precision,
lon double precision, speed double precision);
INSERT INTO gps_traces (session_id, seconds , lat , lon , speed )
VALUES (128618,1495296443,41.1844471,-8.6065158,1.35148),
(128618,1495296444,41.1844482,-8.6065303,1.28004),
(128618,1495296445,41.1844572,-8.6065503,1.46086),
(128618,1495296446,41.1844541,-8.6065691,1.23),
(128618,1495296446,41.1844589,-8.6065861, 1.22919),
(128618,1495296447,41.1844587, -8.6066043, 1.30188),
(128618, 1495296448, 41.1844604, -8.6066261, 1.43126),
(128618, 1495296449, 41.184471, -8.6066412, 1.55003),
(128618,1495296450, 41.1844715, -8.6066572, 1.29062),
(128618,1495296450, 41.1844707, -8.6066736, 1.3618)
From this I want to create a new table mytable containing the GPS traces for each trip, by joining these tables on session_id, like so:
CREATE TABLE mytable AS SELECT id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id=g.session_id
However, in the new table, I want to ensure that when rows are recorded twice at the same Unix timestamp within a trip, only one of them is selected into my new table. For example, in this case:
SELECT * FROM mytable WHERE id = 563097003;
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296446 | 41.1844589 | -8.6065861 | 1.22919 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 563097003 | 1495296450 | 41.1844707 | -8.6066736 | 1.3618 | 0 |
| 10 rows | | | | | |
+-----------+------------+------------+------------+---------+------+
Column seconds is the Unix timestamp. As shown, the timestamps 1495296446 and 1495296450 each appear in more than one row. I would like to ensure that, for each trip, only rows with a unique timestamp are selected into the new table (so in the case above, only one record per duplicated timestamp should be selected). I illustrate that in this db<>fiddle.
EDIT
Expected output:
+-----------+------------+------------+------------+---------+------+
| id | seconds | lat | lon | speed | mode |
+-----------+------------+------------+------------+---------+------+
| 563097003 | 1495296443 | 41.1844471 | -8.6065158 | 1.35148 | 0 |
| 563097003 | 1495296444 | 41.1844482 | -8.6065303 | 1.28004 | 0 |
| 563097003 | 1495296445 | 41.1844572 | -8.6065503 | 1.46086 | 0 |
| 563097003 | 1495296446 | 41.1844541 | -8.6065691 | 1.23 | 0 |
| 563097003 | 1495296447 | 41.1844587 | -8.6066043 | 1.30188 | 0 |
| 563097003 | 1495296448 | 41.1844604 | -8.6066261 | 1.43126 | 0 |
| 563097003 | 1495296449 | 41.184471 | -8.6066412 | 1.55003 | 0 |
| 563097003 | 1495296450 | 41.1844715 | -8.6066572 | 1.29062 | 0 |
| 8 rows | | | | | |
+-----------+------------+------------+------------+---------+------+
Use DISTINCT ON:
CREATE TABLE mytable AS
SELECT DISTINCT ON (t.session_id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t JOIN
gps_traces g
ON t.session_id = g.session_id
ORDER BY t.session_id, seconds;
Note: I would expect you to include session_id in the new table as well.
Thanks to @Abelisto, it turns out that the following modification to this answer works as intended:
CREATE TABLE mytable AS
SELECT DISTINCT ON (id, seconds) id, seconds, lat, lon, speed, mode
FROM trips t
JOIN gps_traces g
ON t.session_id = g.session_id
ORDER BY id, seconds;
Here is a db<>fiddle.
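To check the de-duplication outside PostgreSQL, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DISTINCT ON, so this swaps in a different technique: it keeps the first-inserted trace (lowest rowid) for each (session_id, seconds) pair and then joins. The trimmed table definitions and the reduced sample rows are assumptions made for brevity. Note that in PostgreSQL, DISTINCT ON with ORDER BY id, seconds does not guarantee which of the duplicate rows is kept; adding a tiebreaker column to the ORDER BY would make the choice deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables in the question
# (columns trimmed to the ones the join actually needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER, session_id INTEGER, mode INTEGER);
CREATE TABLE gps_traces (session_id INTEGER, seconds INTEGER,
                         lat REAL, lon REAL, speed REAL);
INSERT INTO trips VALUES (563097003, 128618, 0), (563097009, 128618, 0);
INSERT INTO gps_traces VALUES
  (128618, 1495296445, 41.1844572, -8.6065503, 1.46086),
  (128618, 1495296446, 41.1844541, -8.6065691, 1.23),
  (128618, 1495296446, 41.1844589, -8.6065861, 1.22919),
  (128618, 1495296447, 41.1844587, -8.6066043, 1.30188);
""")

# Keep only the first-inserted trace per (session_id, seconds), then join
# each surviving trace to every trip of the same session.
dedup_sql = """
SELECT t.id, g.seconds, g.lat, g.lon, g.speed, t.mode
FROM trips t
JOIN gps_traces g ON t.session_id = g.session_id
WHERE g.rowid IN (SELECT MIN(rowid) FROM gps_traces
                  GROUP BY session_id, seconds)
ORDER BY t.id, g.seconds
"""
rows = conn.execute(dedup_sql).fetchall()
for row in rows:
    print(row)
```

Each trip ends up with one row per distinct timestamp, which matches the expected output in the question.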
I'm trying to convert lat/lon to linestring. Basically, grouping the columns lat and lon, making a point, and creating a linestring.
Table:
+------------+----------+-----------+------------+---------+--------+
| link_id | seq_num | lat | lon | z_coord | zlevel |
+------------+----------+-----------+------------+---------+--------+
| "16777220" | "0" | "4129098" | "-7192948" | | 0 |
| "16777220" | "999999" | "4129134" | "-7192950" | | 0 |
| "16777222" | "0" | "4128989" | "-7193030" | | 0 |
| "16777222" | "1" | "4128975" | "-7193016" | | 0 |
| "16777222" | "2" | "4128940" | "-7193001" | | 0 |
| "16777222" | "3" | "4128917" | "-7192998" | | 0 |
| "16777222" | "4" | "4128911" | "-7193002" | | 0 |
+------------+----------+-----------+------------+---------+--------+
My code:
select link_id, ST_SetSRID(ST_MakeLine(ST_MakePoint((lon::double precision / 100000), (lat::double precision / 100000))),4326) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
geometry output column example:
"0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440"
^^ What is this? How did it get formatted this way? I expected a linestring, something like:
geometry
7.123 50.123,7.321 50.321
7.321 50.321,7.321 50.321
The data type of link_id is bigint, and for geometry it says geometry.
SOLUTION:
select link_id, ST_AsText(ST_SetSRID(ST_MakeLine(ST_MakePoint(
(lon::double precision / 100000), (lat::double precision / 100000))),4326)) as geometry
from public.rdf_link_geometry
group by link_id
limit 50
The output is a geometry, which you can display as text using st_asText
select st_asText('0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440736891ED7CFB51C021020EA14AA54440');
st_astext
--------------------------------------------------
LINESTRING(-71.92948 41.29098,-71.9295 41.29134)
That being said, should you have more than 2 points, you could order them to create a meaningful line:
select st_makeline(geom ORDER BY seqID) from tbl;
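For anyone wondering what the hex string in the question actually contains: it is the geometry in PostGIS's extended well-known binary (EWKB) form, printed as hex. The following is a rough sketch in Python that unpacks it by hand with the struct module. The function name decode_ewkb_linestring is made up for this illustration, and the sketch only handles a 2D LineString, with or without an SRID; in practice ST_AsText does this for you.

```python
import struct

def decode_ewkb_linestring(hex_string):
    """Decode an EWKB hex string holding a 2D LineString (hypothetical helper)."""
    data = bytes.fromhex(hex_string)
    order = "<" if data[0] == 1 else ">"        # first byte: 1 = little-endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    if geom_type & 0xC0000000:                  # Z or M flag set: not handled here
        raise ValueError("only 2D geometries are supported in this sketch")
    if geom_type & 0xFF != 2:                   # base type 2 = LineString
        raise ValueError("not a LineString")
    offset = 5
    srid = None
    if geom_type & 0x20000000:                  # EWKB flag: an SRID follows
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    (n_points,) = struct.unpack_from(order + "I", data, offset)
    offset += 4
    points = []
    for _ in range(n_points):                   # each point is two 8-byte doubles
        x, y = struct.unpack_from(order + "dd", data, offset)
        points.append((x, y))
        offset += 16
    return srid, points

srid, points = decode_ewkb_linestring(
    "0102000020E6100000020000004F92AE997CFB51C021E527D53EA54440"
    "736891ED7CFB51C021020EA14AA54440"
)
print(srid, points)
```

The decoded SRID and coordinates correspond to the ST_SetSRID(..., 4326) call and to the LINESTRING text shown above.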
Consider the following sample table ("Customer") with these records:
=========
Customer
=========
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| customer-id | att-a    | att-b    | att-c    | att-d    | att-e    | att-f    | att-g    | att-h    | att-i    | att-j    |
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| customer-1  | att-a-7  | att-b-3  | att-c-10 | att-d-10 | att-e-15 | att-f-11 | att-g-2  | att-h-7  | att-i-5  | att-j-14 |
| customer-2  | att-a-9  | att-b-7  | att-c-12 | att-d-4  | att-e-10 | att-f-4  | att-g-13 | att-h-4  | att-i-1  | att-j-13 |
| customer-3  | att-a-10 | att-b-6  | att-c-1  | att-d-1  | att-e-13 | att-f-12 | att-g-9  | att-h-6  | att-i-7  | att-j-4  |
| customer-19 | att-a-7  | att-b-9  | att-c-13 | att-d-5  | att-e-8  | att-f-5  | att-g-12 | att-h-14 | att-i-13 | att-j-15 |
+-------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
I have these records, and many more, loaded into a SQL database, and I want to find the top 10 most similar customers based on their attribute values. For example, customer-1 and customer-19 have at least one matching column value, i.e. "att-a-7", so the output should give me those two customer IDs as the most similar customers: customer-1 and customer-19.
P.S - there can be one or more columns similar across rows.
I'm using a window function to find the top 10 similar customers, and I'm not sure whether it is correct.
This is the approach I used in my query:
row_number() over (partition by att-a, att-b,..,att-j order by customer-id) as customers
Is this correct?
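One thing worth noting about the window approach: partitioning by all ten attributes puts two customers in the same partition only when every attribute matches, so it cannot find customers that share just one or a few values. As a rough sketch of one possible reading of "similar" (the number of columns on which two customers agree), the pairwise comparison could look like this in Python. The scoring rule, the function names, and the in-memory dictionary are all assumptions for illustration, not something the question specifies.

```python
from itertools import combinations

COLUMNS = "abcdefghij"

# Sample rows from the question, stored as the numeric suffix per column.
raw = {
    "customer-1":  (7, 3, 10, 10, 15, 11, 2, 7, 5, 14),
    "customer-2":  (9, 7, 12, 4, 10, 4, 13, 4, 1, 13),
    "customer-3":  (10, 6, 1, 1, 13, 12, 9, 6, 7, 4),
    "customer-19": (7, 9, 13, 5, 8, 5, 12, 14, 13, 15),
}
# Rebuild the full attribute strings, e.g. "att-a-7".
customers = {
    cid: [f"att-{col}-{num}" for col, num in zip(COLUMNS, nums)]
    for cid, nums in raw.items()
}

def shared_attributes(a, b):
    """Count the columns on which two customers hold the same value."""
    return sum(x == y for x, y in zip(a, b))

def similar_pairs(customers, top=10):
    """Return up to `top` customer pairs sharing at least one value, best first."""
    scored = [
        (shared_attributes(customers[a], customers[b]), a, b)
        for a, b in combinations(sorted(customers), 2)
    ]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    return scored[:top]

pairs = similar_pairs(customers)
print(pairs)
```

In SQL, the same idea would be a self-join on the customer table with a per-column match count, which scales quadratically with the number of customers.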
I'm trying to get the difference between the first start_time and the last stop_time in the table. But I can't seem to get this done in one query. Can somebody help me? This is some sample data:
start_time | stop_time
-------------------+-------------------
1398871312.769668 | 1398871312.769676
1398871312.771368 | 1398871312.771429
1398871312.771471 | 1398871312.771476
1398871312.771494 | 1398871312.771543
1398871312.781109 | 1398871312.781115
1398871312.781150 | 1398871312.781154
1398871312.781233 | 1398871312.781282
1398871312.992759 | 1398871312.992765
1398871312.992795 | 1398871312.992798
1398871312.992832 | 1398871312.992881
1398871313.3387 | 1398871313.3399
1398871313.3435 | 1398871313.3440
1398871313.3703 | 1398871313.3745
1398871313.203462 | 1398871313.203469
1398871313.203497 | 1398871313.203501
1398871313.203560 | 1398871313.203600
1398871313.214120 | 1398871313.214127
1398871313.214153 | 1398871313.214158
1398871313.214177 | 1398871313.214192
1398871313.214208 | 1398871313.214248
1398871313.415027 | 1398871313.415035
1398871313.415136 | 1398871313.415140
1398871313.415218 | 1398871313.415226
1398871313.415252 | 1398871313.415265
1398871313.415290 | 1398871313.415298
1398871313.415332 | 1398871313.415339
1398871313.415350 | 1398871313.415362
1398871314.144867 | 1398871314.144886
1398871314.144896 | 1398871314.144901
1398871314.144906 | 1398871314.144912
1398871314.144918 | 1398871314.144923
1398871314.144927 | 1398871314.144931
1398871314.144935 | 1398871314.144939
1398871314.144965 | 1398871314.144974
1398871314.145055 | 1398871314.145060
1398871314.145138 | 1398871314.145146
1398871314.145152 | 1398871314.145158
1398871314.145166 | 1398871314.145173
1398871314.145211 | 1398871314.145215
1398871314.145235 | 1398871314.145243
1398871314.145247 | 1398871314.145252
1398871314.145262 | 1398871314.145267
1398871314.145307 | 1398871314.145314
1398871314.145547 | 1398871314.145551
1398871314.145563 | 1398871314.145571
1398871314.145576 | 1398871314.145581
1398871314.145586 | 1398871314.145590
1398871314.145600 | 1398871314.145606
1398871314.145611 | 1398871314.145618
1398871314.145623 | 1398871314.145627
1398871314.145634 | 1398871314.145641
1398871314.145999 | 1398871314.146003
1398871314.146014 | 1398871314.146022
1398871314.146026 | 1398871314.146033
1398871314.146043 | 1398871314.146050
1398871314.146140 | 1398871314.146145
1398871314.146160 | 1398871314.146168
1398871314.146178 | 1398871314.146185
So I want the difference between 1398871312 and 1398871314, in one query. Is this possible? Can anyone help me? Thanks!
Something like this?
select max(stop_time) - min(start_time)
from Table1
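A quick way to try this is with Python's built-in sqlite3 module. This is only a sketch: it loads three of the sample rows into an in-memory table and assumes the columns are stored as floating-point numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (start_time REAL, stop_time REAL)")
# Three rows taken from the sample data in the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [
        (1398871312.769668, 1398871312.769676),
        (1398871313.203462, 1398871313.203469),
        (1398871314.146178, 1398871314.146185),
    ],
)

# Latest stop minus earliest start, computed in a single aggregate query.
(elapsed,) = conn.execute(
    "SELECT max(stop_time) - min(start_time) FROM Table1"
).fetchone()
print(elapsed)
```

Because both aggregates run over the whole table, the result does not depend on the order in which the rows are stored.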
I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
But I am still getting results where the two are equal. For example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
Here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
What am I doing wrong? How do I return only records that are NOT equal?
I don't see why there is an issue.
The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
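The difference between the two filters can be reproduced on a toy dataset. The sketch below uses Python's sqlite3 module with made-up tables a and b; the column sum_paid stands in for a.[SUM(PAID_AMT)], and the third account and the split payment amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (accn TEXT, sum_paid REAL);   -- one pre-summed row per account
CREATE TABLE b (accn TEXT, paid_amt REAL);   -- one row per individual payment
INSERT INTO a VALUES ('A133312', 98.39), ('A445070', 204.93), ('A999', 100.00);
INSERT INTO b VALUES ('A133312', 50.00), ('A133312', 48.39),
                     ('A445070', 204.93),
                     ('A999', 60.00), ('A999', 30.00);
""")

# WHERE compares the stored total against each individual payment, so an
# account paid in two instalments passes the filter even if the totals agree.
where_rows = conn.execute("""
SELECT a.accn, a.sum_paid, SUM(b.paid_amt)
FROM a JOIN b ON a.accn = b.accn
WHERE CAST(a.sum_paid AS INTEGER) != CAST(b.paid_amt AS INTEGER)
GROUP BY a.accn, a.sum_paid
ORDER BY a.accn
""").fetchall()

# HAVING compares the stored total against the aggregated total instead.
having_rows = conn.execute("""
SELECT a.accn, a.sum_paid, SUM(b.paid_amt)
FROM a JOIN b ON a.accn = b.accn
GROUP BY a.accn, a.sum_paid
HAVING CAST(a.sum_paid AS INTEGER) != CAST(SUM(b.paid_amt) AS INTEGER)
ORDER BY a.accn
""").fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, account A133312 shows up although its two payments add up to the stored total; with HAVING, only the genuinely mismatched account remains.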
You can combine a ROUND with a CAST:
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
SQL Fiddle showing how it would work: http://sqlfiddle.com/#!3/4eb79/1