Optimizing results for query with WHERE EXISTS clause - sql

I have this table in postgres:
id | id_datetime | longitude | latitude
--------+---------------------+---------------------+--------------------
639438 | 2018-02-20 18:00:00 | -122.3880011217841 | 37.75538988423265
639439 | 2018-02-20 20:30:00 | -122.38756878451498 | 37.760550220844614
639440 | 2018-02-20 20:05:00 | -122.39640513677658 | 37.76130039041195
639441 | 2018-02-24 10:00:00 | -122.45819139221014 | 37.724317534370066
639442 | 2018-02-10 09:00:00 | -122.44693382058489 | 37.77000760474354
I want an output with all the distinct IDs that have at least one other row within the preceding 15 minutes and within 1000 meters (geographic distance).
My table has more than 100K rows, so I'm currently trying the following query, which works but takes too long. Are there any steps I can take to optimize this?
SELECT DISTINCT x.id
FROM table x
WHERE EXISTS (
    SELECT 1
    FROM table t
    WHERE t.id <> x.id
      AND t.id_datetime BETWEEN x.id_datetime - interval '15 minutes' AND x.id_datetime
      AND ST_Distance(geography(ST_MakePoint(x.longitude, x.latitude)),
                      geography(ST_MakePoint(t.longitude, t.latitude))) <= 1000
)
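A sketch of one likely optimization, assuming PostGIS and that you can add a column and indexes (mytable stands in for your real table name, since table is a reserved word): store the geography once, index it, and test the distance with ST_DWithin, which can use a spatial index, instead of ST_Distance(...) <= 1000, which cannot.
-- Precompute the geography and index it, so the planner can use a GiST
-- index lookup instead of computing the distance for every pair of rows.
ALTER TABLE mytable ADD COLUMN geog geography(Point, 4326);
UPDATE mytable SET geog = ST_MakePoint(longitude, latitude)::geography;
CREATE INDEX ON mytable USING gist (geog);
CREATE INDEX ON mytable (id_datetime);

SELECT DISTINCT x.id
FROM mytable x
WHERE EXISTS (
    SELECT 1
    FROM mytable t
    WHERE t.id <> x.id
      AND t.id_datetime BETWEEN x.id_datetime - interval '15 minutes' AND x.id_datetime
      AND ST_DWithin(x.geog, t.geog, 1000)  -- index-aware, unlike ST_Distance
)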


Insert a row for each month in the range [duplicate]

I want to transform this table in Oracle:
+----+------------+------------+
| N | Start | End |
+----+------------+------------+
| 1 | 2018-01-01 | 2018-05-31 |
| 1 | 2018-01-01 | 2018-06-31 |
+----+------------+------------+
Into the following: as silly as it looks, I need to insert one row for each month in the range, for each row in the first table.
+----+------------+
| N  | month      |
+----+------------+
| 1 | 2018-01-01 |
| 1 | 2018-01-01 |
| 1 | 2018-02-01 |
| 1 | 2018-02-01 |
| 1 | 2018-03-01 |
| 1 | 2018-03-01 |
| 1 | 2018-04-01 |
| 1 | 2018-04-01 |
| 1 | 2018-05-01 |
| 1 | 2018-05-01 |
| 1 | 2018-06-01 |
+----+------------+
I've been trying to follow SQL: Generate Record Per Month In Date Range, but I haven't had any luck figuring out the result I want.
Thanks for helping.
My best guess is that you want to show every beginning of a month that falls in the interval from start to end in your table.
create table t1 as
select date'2018-01-01' start_d, date'2018-05-31' end_d from dual union all
select date'2018-01-01' start_d, date'2018-06-30' end_d from dual;

with cal as (
  select add_months(date'2018-01-01', rownum-1) month_d
  from dual
  connect by level <= 12
)
select cal.month_d
from cal
join t1 on cal.month_d between t1.start_d and t1.end_d
order by 1;
MONTH_D
-------------------
01.01.2018 00:00:00
01.01.2018 00:00:00
01.02.2018 00:00:00
01.02.2018 00:00:00
01.03.2018 00:00:00
01.03.2018 00:00:00
01.04.2018 00:00:00
01.04.2018 00:00:00
01.05.2018 00:00:00
01.05.2018 00:00:00
01.06.2018 00:00:00
So probably there is a cut & paste error in your expectation for January.
Some other points:
do not use a reserved word such as START for column names
use the DATE type to store dates, to avoid invalid entries such as 2018-06-31
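As a hypothetical illustration of those two points (the table and column names here are made up):
-- hypothetical DDL applying both points above
create table month_ranges (
  n       number,
  start_d date,  -- START is a reserved word, so a suffixed name is used
  end_d   date   -- the DATE type rejects invalid values such as 2018-06-31
);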
You can use a recursive CTE. For example:
with n (s, e, cur) as (
  select s, e, s from t
  union all
  select s, e, add_months(cur, 1)
  from n
  where add_months(cur, 1) < e
)
select cur from n;
Result:
CUR
---------
01-JAN-18
01-JAN-18
01-FEB-18
01-FEB-18
01-MAR-18
01-MAR-18
01-APR-18
01-APR-18
01-MAY-18
01-MAY-18
01-JUN-18
See running example at db<>fiddle.

SQL: usage time of item between dates combining two tables

I'm trying to create a query that will give me the usage time of each car part between the dates when that part is used. E.g. say part id 1 is installed on 2018-03-01, runs for 50 min on 2018-04-01, and then runs for 30 min on 2018-05-10; the total usage of this part should be 1:20:00 in the result.
These are examples of my tables.
Table1
| id | part_id | car_id | part_date |
|----|-------- |--------|------------|
| 1 | 1 | 3 | 2018-03-01 |
| 2 | 1 | 1 | 2018-03-28 |
| 3 | 1 | 3 | 2018-05-10 |
Table2
| id | car_id | run_date | puton_time | putoff_time |
|----|--------|------------|---------------------|---------------------|
| 1 | 3 | 2018-04-01 | 2018-04-01 12:00:00 | 2018-04-01 12:50:00 |
| 2 | 2 | 2018-04-10 | 2018-04-10 15:10:00 | 2018-04-10 15:20:00 |
| 3 | 3 | 2018-05-10 | 2018-05-10 10:00:00 | 2018-05-10 10:30:00 |
| 4 | 1 | 2018-05-11 | 2018-05-11 12:00:00 | 2018-04-01 12:50:00 |
Table1 contains the dates when each part is installed; table2 contains the usage time of each part, and they are joined on car_id. I have tried to write a query, but it does not work well. If somebody can figure out my mistake in this query, that would be helpful.
My SQL query
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t1.puton_time, t1.putoff_time)))) AS total_time
FROM table2 t1
LEFT JOIN table1 t2 ON t1.car_id = t2.car_id
WHERE t2.id = 1
  AND t1.run_date BETWEEN t2.part_date AND
    (SELECT COALESCE(MIN(part_date), '2100-01-01') AS NextDate
     FROM table1
     WHERE id = 1 AND t2.part_date > part_date);
Expected result
| part_id | total_time |
|---------|------------|
| 1 | 1:20:00 |
I hope this problem makes sense; in my search I found nothing like this, so I need help.
Solution, thanks to Kota Mori
SELECT t1.id, SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t2.putoff_time, t2.puton_time)))) AS total_time
FROM table1 t1
LEFT JOIN table2 t2 ON t1.car_id = t2.car_id
  AND t1.part_date <= t2.run_date
GROUP BY t1.id
You first need to join the two tables on car_id, with the additional condition that part_date is no greater than run_date. Then compute the total minutes for each part separately.
The following is a query example for SQLite (the only SQL engine I have access to right now).
Since SQLite does not have a datetime type, I convert the strings into unix timestamps with the strftime function. This part should be changed in accordance with the SQL engine you are using. Apart from that, this is fairly standard SQL and mostly valid for other SQL dialects.
SELECT
t1.id,
sum(
cast(strftime('%s', t2.putoff_time) as integer) -
cast(strftime('%s', t2.puton_time) as integer)
) / 60 AS total_minutes
FROM
table1 t1
LEFT JOIN
table2 t2
ON
t1.car_id = t2.car_id
AND t1.part_date <= t2.run_date
GROUP BY
t1.id
The result is something like the below. Note that ID 1 gets 80 minutes (1:20) as expected.
id total_minutes
0 1 80
1 2 80
2 3 30

Filling NULL values with preceding Non-NULL values using FIRST_VALUE

I am joining two tables.
In the first table, I have some items starting at a specific time. In the second table, I have values and timestamps for each minute in between the start and end time of each item.
First table
UniqueID Items start_time
123 one 10:00 AM
456 two 11:00 AM
789 three 11:30 AM
Second table
UniqueID Items time_hit value
123 one 10:00 AM x
123 one 10:05 AM x
123 one 10:10 AM x
123 one 10:30 AM x
456 two 11:00 AM x
456 two 11:15 AM x
789 three 11:30 AM x
So when joining the two tables I have this:
UniqueID Items start_time time_hit value
123 one 10:00 AM 10:00 AM x
123 null null 10:05 AM x
123 null null 10:10 AM x
123 null null 10:30 AM x
456 two 11:00 AM 11:00 AM x
456 null null 11:15 AM x
789 three 11:30 AM 11:30 AM x
I'd like to replace these null values with the values from the preceding non-null row...
So the expected result is
UniqueID Items start_time time_hit value
123 one 10:00 AM 10:00 AM x
123 one 10:00 AM 10:05 AM x
123 one 10:00 AM 10:10 AM x
123 one 10:00 AM 10:30 AM x
456 two 11:00 AM 11:00 AM x
456 two 11:00 AM 11:15 AM x
789 three 11:30 AM 11:30 AM x
I tried to build my join using the following function without success:
FIRST_VALUE(Items IGNORE NULLS) OVER (
PARTITION BY time_hit ORDER BY time_hit
ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING) AS test
My question was a bit off. I found out that the UniqueID values were inconsistent, which is why I had these null values in my output. So the validated answer is a good option for filling null values when joining two tables where one of the tables has more unique rows than the other.
You could use first_value (but last_value would also work in this scenario). The important part is to specify rows between unbounded preceding and current row to set the boundaries of the window.
Answer updated to reflect the updated question and the preference for first_value.
select
first_value(t1.UniqueId ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as UniqueId,
first_value(t1.items ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as Items,
first_value(t1.start_time ignore nulls) over (partition by t2.UniqueId
order by t2.time_hit
rows between unbounded preceding and current row) as start_time,
t2.time_hit,
t2.item_value
from table2 t2
left join table1 t1 on t1.start_time = t2.time_hit
order by t2.time_hit;
Result
| UNIQUEID | ITEMS | START_TIME | TIME_HIT | ITEM_VALUE |
|----------|-------|------------|----------|------------|
| 123 | one | 10:00:00 | 10:00:00 | x |
| 123 | one | 10:00:00 | 10:05:00 | x |
| 123 | one | 10:00:00 | 10:10:00 | x |
| 123 | one | 10:00:00 | 10:30:00 | x |
| 456 | two | 11:00:00 | 11:00:00 | x |
| 456 | two | 11:00:00 | 11:15:00 | x |
| 789 | three | 11:30:00 | 11:30:00 | x |
SQL Fiddle Example
Note: I had to use Oracle in SQL Fiddle (so I had to change the data types and a column name). But it should work for your database.
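For completeness, since last_value would also work here, below is a minimal sketch of the Items column rewritten that way (same hypothetical Oracle setup as above). With IGNORE NULLS and the same frame, last_value carries the most recent non-null value forward, which coincides with the first_value result in this data because each group contains exactly one non-null row.
select
  -- carries the latest non-null items value forward within each group
  last_value(t1.items ignore nulls) over (partition by t2.UniqueId
      order by t2.time_hit
      rows between unbounded preceding and current row) as Items,
  t2.time_hit
from table2 t2
left join table1 t1 on t1.start_time = t2.time_hit
order by t2.time_hit;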
One alternative solution would be to use a NOT EXISTS clause as a JOIN condition, with a correlated subquery that ensures we are relating to the relevant record.
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
ON t1.items = t2.items
AND t1.start_time <= t2.time_hit
AND NOT EXISTS (
SELECT 1 FROM table1 t10
WHERE
t10.items = t2.items
AND t10.start_time <= t2.time_hit
AND t10.start_time > t1.start_time
)
Demo on DB Fiddle:
| items | start_time | time_hit | value |
| ----- | ---------- | -------- | ----- |
| one | 10:00:00 | 10:00:00 | x |
| one | 10:00:00 | 10:05:00 | x |
| one | 10:00:00 | 10:10:00 | x |
| one | 10:00:00 | 10:30:00 | x |
| two | 11:00:00 | 11:00:00 | x |
| two | 11:00:00 | 11:15:00 | x |
| three | 11:30:00 | 11:30:00 | x |
Alternative solution to avoid using EXISTS in a JOIN condition (not allowed in BigQuery): just move that condition to the WHERE clause.
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
ON t1.items = t2.items
AND t1.start_time <= t2.time_hit
WHERE NOT EXISTS (
SELECT 1 FROM table1 t10
WHERE
t10.items = t2.items
AND t10.start_time <= t2.time_hit
AND t10.start_time > t1.start_time
)
DB Fiddle
I guess you are expecting this output by simply using an INNER JOIN, but I'm not sure why you used FIRST_VALUE.
SELECT I.Item, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
ON I.Items = ID.Items
Please explain if you have any specific reason for not using this approach.

How to group date by week in PostgreSQL?

I have a pretty simple table with 2 columns: the first one shows a time (timestamp), the second one shows the speed of a car at that time (float8).
| DATE_TIME | SPEED |
|---------------------|-------|
| 2018-11-09 00:00:00 | 256 |
| 2018-11-09 01:00:00 | 659 |
| 2018-11-09 02:00:00 | 256 |
| other dates | xxx |
| 2018-11-21 21:00:00 | 651 |
| 2018-11-21 22:00:00 | 515 |
| 2018-11-21 23:00:00 | 849 |
Let's say we have the period from 9 November to 21 November. How do I group that period by week? In fact I want this result:
| DATE_TIME | AVG_SPEED |
|---------------------|-----------|
| 9-11 November | XXX |
| 12-18 November | YYY |
| 19-21 November | ZZZ |
I use PostgreSQL 10.4.
I use this SQL statement to get the week number of a certain date:
SELECT EXTRACT(WEEK FROM TIMESTAMP '2018-11-09 00:00:00');
EDIT:
@tim-biegeleisen when I set the period from '2018-11-01' to '2018-11-13' your SQL statement returns 2 results.
In fact I need this result:
2018-11-01 00:00:00 | 2018-11-04 23:00:00
2018-11-05 00:00:00 | 2018-11-11 23:00:00
2018-11-12 00:00:00 | 2018-11-13 05:00:00
As you can see in the calendar, there are 3 weeks in that period.
We can do this using a calendar table. This answer assumes that a week begins with the first date in your data set; you could also assume some other convention, e.g. standard ISO weeks.
WITH dates AS (
SELECT date_trunc('day', dd)::date AS dt
FROM generate_series
( '2018-11-09'::timestamp
, '2018-11-21'::timestamp
, '1 day'::interval) dd
),
cte AS (
SELECT t1.dt, t2.DATE_TIME, t2.SPEED,
EXTRACT(week from t1.dt) week
FROM dates t1
LEFT JOIN yourTable t2
ON t1.dt = t2.DATE_TIME::date
)
SELECT
MIN(dt)::text || '-' || MAX(dt) AS DATE_TIME,
AVG(SPEED) AS AVG_SPEED
FROM cte
GROUP BY
week
ORDER BY
MIN(dt);
Demo
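If standard ISO weeks (Monday start) are acceptable, a simpler sketch is to group by date_trunc (speed_log stands in for your real table name):
-- group by the Monday of each ISO week and average the speed
SELECT
  date_trunc('week', date_time)::date AS week_start,
  AVG(speed)                          AS avg_speed
FROM speed_log
GROUP BY 1
ORDER BY 1;
Note that this labels each group with the Monday of its week rather than with the first and last dates actually present in the period.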

BigQuery - how many entries per partition?

I have big partitioned tables and am trying to figure out how many entries are in each day partition.
So far I have used a for loop in a script, but there must be a simpler way of doing it.
Google did not help me. Does anyone know the right query?
Thanks
You can run the following query to count how many entries you have in each partition:
#standardSQL
SELECT
_PARTITIONTIME AS pt,
COUNT(1)
FROM
`dataset.table`
GROUP BY
1
ORDER BY
1 DESC
and
#legacySQL
SELECT
_PARTITIONTIME AS pt,
COUNT(1)
FROM
[dataset:table]
GROUP BY
1
ORDER BY
1 DESC
It returns a table like the one below. Please note that the NULL entries are still in the streaming buffer. Hint: to obtain the records that are in the streaming buffer, use a query that filters on NULL.
+-------------------------+-----+
| 2017-02-14 00:00:00 UTC | 252 |
| 2017-02-13 00:00:00 UTC | 257 |
| 2017-02-12 00:00:00 UTC | 188 |
| 2017-02-11 00:00:00 UTC | 234 |
| 2017-02-10 00:00:00 UTC | 107 |
| null                    | 13  |
+-------------------------+-----+
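Following the hint above, a minimal sketch for pulling only the rows still in the streaming buffer of an ingestion-time partitioned table (dataset.table as in the query above):
#standardSQL
-- rows in the streaming buffer have no partition assigned yet,
-- so their _PARTITIONTIME is NULL
SELECT *
FROM `dataset.table`
WHERE _PARTITIONTIME IS NULL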