SQL: usage time of item between dates combining two tables - sql

I am trying to write a query that gives the usage time of each car part between the dates when that part is in use. For example, say part id 1 is installed on 2018-03-01, runs for 50 minutes on 2018-04-01, and then runs for 30 minutes on 2018-05-10; the total usage of this part should be 1:20 (1 hour 20 minutes).
These are examples of my tables.
Table1
| id | part_id | car_id | part_date |
|----|-------- |--------|------------|
| 1 | 1 | 3 | 2018-03-01 |
| 2 | 1 | 1 | 2018-03-28 |
| 3 | 1 | 3 | 2018-05-10 |
Table2
| id | car_id | run_date | puton_time | putoff_time |
|----|--------|------------|---------------------|---------------------|
| 1 | 3 | 2018-04-01 | 2018-04-01 12:00:00 | 2018-04-01 12:50:00 |
| 2 | 2 | 2018-04-10 | 2018-04-10 15:10:00 | 2018-04-10 15:20:00 |
| 3 | 3 | 2018-05-10 | 2018-05-10 10:00:00 | 2018-05-10 10:30:00 |
| 4 | 1 | 2018-05-11 | 2018-05-11 12:00:00 | 2018-04-01 12:50:00 |
Table1 contains the dates when each part was installed, table2 contains the usage times, and the two are joined on car_id. I have tried to write a query, but it does not work well. If somebody can spot the mistake in my query, that would be helpful.
My SQL query
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t1.puton_time, t1.putoff_time)))) AS total_time
FROM table2 t1
LEFT JOIN table1 t2 ON t1.car_id=t2.car_id
WHERE t2.id=1 AND t1.run_date BETWEEN t2.datum AND
(SELECT COALESCE(MIN(datum), '2100-01-01') AS NextDate FROM table1 WHERE
id=1 AND t2.part_date > part_date);
Expected result
| part_id | total_time |
|---------|------------|
| 1 | 1:20:00 |
I hope this problem makes sense; I found nothing like it in my search, so I need help.
Solution, thanks to Kota Mori
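-- TIMEDIFF(a, b) returns a - b, so putoff_time comes first to get a positive duration
SELECT t1.id, SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t2.putoff_time, t2.puton_time)))) AS total_time
FROM table1 t1
LEFT JOIN table2 t2 ON t1.car_id = t2.car_id
AND t1.part_date <= t2.run_date -- only runs on or after the install date
GROUP BY t1.id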

First, join the two tables on car_id, with the additional condition that part_date is no later than run_date.
Then compute the total minutes for each table1 row (each installation) separately.
The following is a query example for SQLite (the only SQL engine that I have access to right now).
Since SQLite does not have a datetime type, I convert the strings into Unix timestamps with the strftime function. This part should be changed to suit the SQL engine you are using. Apart from that, this is fairly standard SQL and should be mostly valid in other dialects.
SELECT
t1.id,
sum(
cast(strftime('%s', t2.putoff_time) as integer) -
cast(strftime('%s', t2.puton_time) as integer)
) / 60 AS total_minutes
FROM
table1 t1
LEFT JOIN
table2 t2
ON
t1.car_id = t2.car_id
AND t1.part_date <= t2.run_date
GROUP BY
t1.id
The result is something like the below. Note that ID 1 gets 80 minutes (1:20) as expected.
id total_minutes
0 1 80
1 2 80
2 3 30
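To check the join condition against the question's data, here is a minimal sketch using Python's built-in sqlite3 module. It loads the sample rows and runs the same SQLite query. Row 4 of table2 is left out because its putoff_time precedes its puton_time in the question. The in-memory database and the variable names are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
CREATE TABLE table1 (id INTEGER, part_id INTEGER, car_id INTEGER, part_date TEXT);
CREATE TABLE table2 (id INTEGER, car_id INTEGER, run_date TEXT,
                     puton_time TEXT, putoff_time TEXT);
INSERT INTO table1 VALUES
    (1, 1, 3, '2018-03-01'),
    (2, 1, 1, '2018-03-28'),
    (3, 1, 3, '2018-05-10');
-- Rows 1-3 of table2 from the question; row 4 is omitted (see the text above).
INSERT INTO table2 VALUES
    (1, 3, '2018-04-01', '2018-04-01 12:00:00', '2018-04-01 12:50:00'),
    (2, 2, '2018-04-10', '2018-04-10 15:10:00', '2018-04-10 15:20:00'),
    (3, 3, '2018-05-10', '2018-05-10 10:00:00', '2018-05-10 10:30:00');
""")

rows = conn.execute("""
SELECT t1.id,
       SUM(CAST(strftime('%s', t2.putoff_time) AS INTEGER)
         - CAST(strftime('%s', t2.puton_time) AS INTEGER)) / 60 AS total_minutes
FROM table1 t1
LEFT JOIN table2 t2
       ON t1.car_id = t2.car_id
      AND t1.part_date <= t2.run_date   -- only runs on or after the install date
GROUP BY t1.id
ORDER BY t1.id
""").fetchall()

# Map each table1 id to its total minutes; None means no matching run was found.
total_minutes = dict(rows)
print(total_minutes)
```

With these rows, id 1 picks up both runs of car 3, id 3 picks up only the run on its own install date, and id 2 has no matching run, so the LEFT JOIN leaves its total as NULL.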

Related

Optimizing results for query with WHERE EXISTS clause

I have this table in postgres:
id | id_datetime | longitude | latitude
--------+---------------------+---------------------+--------------------
639438 | 2018-02-20 18:00:00 | -122.3880011217841 | 37.75538988423265
639439 | 2018-02-20 20:30:00 | -122.38756878451498 | 37.760550220844614
639440 | 2018-02-20 20:05:00 | -122.39640513677658 | 37.76130039041195
639441 | 2018-02-24 10:00:00 | -122.45819139221014 | 37.724317534370066
639442 | 2018-02-10 09:00:00 | -122.44693382058489 | 37.77000760474354
I want an output with all the distinct IDs that have at least one other ID within the preceding 15 minutes and within 1000 meters (geographic distance).
My table has more than 100K rows. I'm currently using the following query, which works but takes too long. Are there any steps I can take to optimize this?
SELECT DISTINCT
x.id
FROM table x
WHERE EXISTS(
SELECT
1
FROM table t
WHERE t.id <> x.id
AND (t.id_datetime between x.id_datetime - interval '15 minutes' AND x.id_datetime)
AND (ST_Distance((geography(ST_MakePoint(x.longitude, x.latitude))),
geography(ST_MakePoint(t.longitude, t.latitude)) ) <= 1000)
)

SQL interpolating missing values for a specific date range - with some conditions

There are some similar questions on the site, but I believe mine warrants a new post because there are specific conditions that need to be incorporated.
I have a table with monthly intervals, structured like this:
+----+--------+--------------+--------------+
| ID | amount | interval_beg | interval_end |
+----+--------+--------------+--------------+
| 1 | 10 | 12/17/2017 | 1/17/2018 |
| 1 | 10 | 1/18/2018 | 2/18/2018 |
| 1 | 10 | 2/19/2018 | 3/19/2018 |
| 1 | 10 | 3/20/2018 | 4/20/2018 |
| 1 | 10 | 4/21/2018 | 5/21/2018 |
+----+--------+--------------+--------------+
I've found that sometimes there is a month of data missing around the end/beginning of the year where I know it should exist, like this:
+----+--------+--------------+--------------+
| ID | amount | interval_beg | interval_end |
+----+--------+--------------+--------------+
| 2 | 10 | 10/14/2018 | 11/14/2018 |
| 2 | 10 | 11/15/2018 | 12/15/2018 |
| 2 | 10 | 1/17/2019 | 2/17/2019 |
| 2 | 10 | 2/18/2019 | 3/18/2019 |
| 2 | 10 | 3/19/2019 | 4/19/2019 |
+----+--------+--------------+--------------+
What I need is a statement that will:
1. Identify where this year-end period is missing (but not find missing months that aren't at the beginning/end of the year).
2. Create this interval by using the length of an existing interval for that ID (maybe using the mean interval length for the ID to do it?). I could create the interval from the "gap" between the previous and next interval, except that won't work if I'm missing an interval at the beginning or end of the ID's record (i.e. if the record starts at, say, 1/16/2015, I need the amount for 12/15/2014-1/15/2015).
3. Interpolate an 'amount' for this interval using the mean daily 'amount' from the closest existing interval.
The end result for the sample above should look like:
+----+--------+--------------+--------------+
| ID | amount | interval_beg | interval_end |
+----+--------+--------------+--------------+
| 2 | 10 | 10/14/2018 | 11/14/2018 |
| 2 | 10 | 11/15/2018 | 12/15/2018 |
| 2 | 10 | 12/16/2018 | 1/16/2019 |
| 2 | 10 | 1/17/2019 | 2/17/2019 |
| 2 | 10 | 2/18/2019 | 3/18/2019 |
+----+--------+--------------+--------------+
A 'nice to have' would be a flag indicating that this value is interpolated.
Is there a way to do this efficiently in SQL? I have written a solution in SAS, but have a need to move it to SQL, and my SAS solution is very inefficient (optimization isn't a goal, so any statement that does what I need is fantastic).
EDIT: I've made an SQLFiddle with my example table here:
http://sqlfiddle.com/#!18/8b16d
You can use a sequence of CTEs to build up the data for the missing periods. In this query, the first CTE (EOYS) generates all the end-of-year dates (YYYY-12-31) relevant to the table; the second (INTERVALS) computes the average interval length for each ID; and the third (MISSING) attempts to find the start (from t2) and end (from t3) dates of the adjoining intervals for any missing end-of-year interval (indicated by t1.ID IS NULL). The output of this CTE is then used in an INSERT ... SELECT query to add the missing interval records to the table, generating missing dates by adding/subtracting the interval length to the end/start date of the adjacent interval as necessary.
First though we add the interp column to indicate if a row was interpolated:
ALTER TABLE Table1 ADD interp TINYINT NOT NULL DEFAULT 0;
This sets interp to 0 for all existing rows. Then we can do the INSERT, setting interp for all those rows to 1:
WITH EOYS AS (
SELECT DISTINCT DATEFROMPARTS(DATEPART(YEAR, interval_beg), 12, 31) AS eoy
FROM Table1
),
INTERVALS AS (
SELECT ID, AVG(DATEDIFF(DAY, interval_beg, interval_end)) AS interval_len
FROM Table1
GROUP BY ID
),
MISSING AS (
SELECT e.eoy,
ids.ID,
i.interval_len,
COALESCE(t2.amount, t3.amount) AS amount,
DATEADD(DAY, 1, t2.interval_end) AS interval_beg,
DATEADD(DAY, -1, t3.interval_beg) AS interval_end
FROM EOYS e
CROSS JOIN (SELECT DISTINCT ID FROM Table1) ids
JOIN INTERVALS i ON i.ID = ids.ID
LEFT JOIN Table1 t1 ON ids.ID = t1.ID
AND e.eoy BETWEEN t1.interval_beg AND t1.interval_end
LEFT JOIN Table1 t2 ON ids.ID = t2.ID
AND DATEADD(MONTH, -1, e.eoy) BETWEEN t2.interval_beg AND t2.interval_end
LEFT JOIN Table1 t3 ON ids.ID = t3.ID
AND DATEADD(MONTH, 1, e.eoy) BETWEEN t3.interval_beg AND t3.interval_end
WHERE t1.ID IS NULL
)
INSERT INTO Table1 (ID, amount, interval_beg, interval_end, interp)
SELECT ID,
amount,
COALESCE(interval_beg, DATEADD(DAY, -interval_len, interval_end)) AS interval_beg,
COALESCE(interval_end, DATEADD(DAY, interval_len, interval_beg)) AS interval_end,
1 AS interp
FROM MISSING
This adds the following rows to the table:
ID amount interval_beg interval_end interp
2 10 2017-12-05 2018-01-04 1
2 10 2018-12-16 2019-01-16 1
2 10 2019-12-28 2020-01-27 1
Demo on SQLFiddle
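The same gap-filling logic can be traced outside the database. The following is a rough Python sketch, not a translation of the answer's query: it handles only the ID 2 sample from the question, and it skips any year end that has no adjacent interval at all (a simplification assumed here). The names covering, mean_len and filled are hypothetical.

```python
from datetime import date, timedelta

# Intervals for ID 2 from the question, as (interval_beg, interval_end).
intervals = [
    (date(2018, 10, 14), date(2018, 11, 14)),
    (date(2018, 11, 15), date(2018, 12, 15)),
    (date(2019, 1, 17), date(2019, 2, 17)),
    (date(2019, 2, 18), date(2019, 3, 18)),
    (date(2019, 3, 19), date(2019, 4, 19)),
]


def covering(day):
    """Return the interval that contains `day`, or None if there is none."""
    for beg, end in intervals:
        if beg <= day <= end:
            return (beg, end)
    return None


# Mean interval length in whole days (integer average, as in the INTERVALS CTE).
mean_len = sum((end - beg).days for beg, end in intervals) // len(intervals)

filled = []
for year in sorted({beg.year for beg, _ in intervals}):
    if covering(date(year, 12, 31)) is not None:
        continue  # the year-end interval already exists
    prev = covering(date(year, 11, 30))    # interval one month before year end
    nxt = covering(date(year + 1, 1, 31))  # interval one month after year end
    if prev is None and nxt is None:
        continue  # assumed simplification: nothing adjacent to extend from
    beg = prev[1] + timedelta(days=1) if prev else None
    end = nxt[0] - timedelta(days=1) if nxt else None
    # Fall back to the mean length when only one neighbour exists.
    if beg is None:
        beg = end - timedelta(days=mean_len)
    if end is None:
        end = beg + timedelta(days=mean_len)
    filled.append((beg, end))

print(filled)
```

For the sample data, both neighbours of the 2018 year end exist, so the new interval is simply the gap between them: 2018-12-16 to 2019-01-16.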

SQL: Find the longest date gap from multiple table

I need some help.
I have two tables like this.
Table Person
p_id | name | registration date
-----------------------------
1 | ABC | 2018-01-01
2 | DEF | 2018-02-02
3 | GHI | 2018-03-01
4 | JKL | 2018-01-02
5 | MNO | 2018-02-01
6 | PQR | 2018-03-02
Table Order
Order_id| p_id | order_date
----------------------------
123 | 1 | 2018-01-05
345 | 2 | 2018-02-06
678 | 3 | 2018-03-07
910 | 4 | 2018-01-08
012 | 3 | 2018-03-04
234 | 4 | 2018-01-05
567 | 5 | 2018-02-08
890 | 6 | 2018-03-09
I need to find out how many days the longest period is during which these two tables aren't updated.
What's the easiest query to get the result in SQL?
Thank you
UPDATE:
The result should show the longest date gap between order_date and registration_date. The longest gap is between 2018-01-08 and 2018-02-01, so the query should return '24'.
Try this:
SELECT MAX(DATE_PART('day', now() - '2018-02-15'::TIMESTAMP)) FROM person p
JOIN order o
USING (p_id)
Assuming current PostgreSQL and lots of orders per person on avg., this should be among the fastest options:
SELECT o.order_date - p.registration_date AS days
FROM person p
CROSS JOIN LATERAL (
SELECT order_date
FROM "order" -- order is a reserved word!
WHERE p_id = p.p_id
ORDER BY 1 DESC -- assuming NOT NULL
LIMIT 1
) o
ORDER BY 1 DESC
LIMIT 1;
Needs an index on "order"(p_id, order_date).
Detailed explanation:
Optimize GROUP BY query to retrieve latest record per user
Select first row in each GROUP BY group?
You seem to want:
select max(o.order_date - p.registration_date)
from person p join
orders o
on p.p_id = o.p_id;
select max((date_part('day',age(order_date, "registration date")))) + 1 as dif
from (
select "p_id" ,max(order_date) order_date
from "Order"
group by "p_id"
) T1
left join Person T2 on T1.p_id = T2.p_id
| maxday |
|--------|
| 8 |
[SQL Fiddle DEMO LINK]
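One reading of the question's UPDATE is that the wanted value is the largest gap between consecutive dates once registration and order dates are pooled together. Under that assumption, a minimal sketch with Python's sqlite3 module follows. The column name registration_date (without the space), the CTE name all_dates and the in-memory database are illustrative choices, and SQLite's julianday stands in for PostgreSQL date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
CREATE TABLE person (p_id INTEGER, name TEXT, registration_date TEXT);
CREATE TABLE "order" (order_id TEXT, p_id INTEGER, order_date TEXT);
INSERT INTO person VALUES
    (1, 'ABC', '2018-01-01'), (2, 'DEF', '2018-02-02'), (3, 'GHI', '2018-03-01'),
    (4, 'JKL', '2018-01-02'), (5, 'MNO', '2018-02-01'), (6, 'PQR', '2018-03-02');
INSERT INTO "order" VALUES
    ('123', 1, '2018-01-05'), ('345', 2, '2018-02-06'), ('678', 3, '2018-03-07'),
    ('910', 4, '2018-01-08'), ('012', 3, '2018-03-04'), ('234', 4, '2018-01-05'),
    ('567', 5, '2018-02-08'), ('890', 6, '2018-03-09');
""")

longest_gap = conn.execute("""
WITH all_dates(d) AS (
    -- Pool every date on which either table was updated.
    SELECT registration_date FROM person
    UNION
    SELECT order_date FROM "order"
)
SELECT MAX(CAST(
           julianday((SELECT MIN(n.d) FROM all_dates n WHERE n.d > a.d))  -- next date
         - julianday(a.d) AS INTEGER))
FROM all_dates a
""").fetchone()[0]

print(longest_gap)  # 24 for the sample data (2018-01-08 to 2018-02-01)
```

The last date has no successor, so its gap is NULL and MAX ignores it.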

What is the most efficient way to group by day for overlapping date ranges (SQL Server 2008)?

Consider a simplified table T1 as follows:
CREATE TABLE dbo.T1 (
id INTEGER NOT NULL
,measure NUMERIC(15,2) NOT NULL
,begin_dt DATE NOT NULL
,end_dt DATE NOT NULL
);
Assume that constraints / business logic ensure that while each id can have multiple records, there are no overlapping date ranges for a single id and no date range gaps for a single id. e.g.,
id | measure | begin_dt | end_dt
-----------------------------------------
1 | 100.00 | 2012-05-07 | 2012-05-30
1 | 200.00 | 2012-05-31 | 2013-10-11
1 | 50.00 | 2013-10-12 | 2013-10-13
1 | 0.00 | 2013-10-14 | 9999-12-31
2 | 1234.56 | 2002-02-25 | 9999-12-31
3 | 9.87 | 2014-01-31 | 2014-02-15
3 | 50.00 | 2014-02-16 | 2015-01-04
3 | 0.00 | 2015-01-05 | 9999-12-31
...
Now, my goal is to produce a resultset that shows one record for every unique begin_dt in T1, along with the count of id's with positive measure value and the sum of the measure field across all id's for which that date falls between the begin_dt and end_dt. So, something like the following:
dt | count_of_ids | sum_of_measure
-------------------------------------------
2002-02-25 | 1 | 1234.56
2012-05-07 | 2 | 1334.56
2012-05-31 | 2 | 1434.56
2013-10-12 | 2 | 1284.56
2013-10-14 | 1 | 1234.56
2014-01-31 | 2 | 1244.43
2014-02-16 | 2 | 1284.56
2015-01-05 | 1 | 1234.56
...
My current solution is essentially the following:
SELECT *
FROM (
SELECT DISTINCT t1.begin_dt AS dt
FROM dbo.T1 AS t1
) AS dt_s
CROSS APPLY (
SELECT COUNT(t1.id) AS count_of_ids
,SUM(t1.measure) AS sum_of_measure
FROM dbo.T1 AS t1
WHERE t1.measure > 0
AND dt_s.dt BETWEEN t1.begin_dt AND t1.end_dt
) AS t1_x
ORDER BY dt_s.dt DESC;
This takes roughly 3.5 minutes to execute (on the actual dataset with ~10MM records, ~2,500 unique dates and many more fields, measures, & aggregations to deal with) - I'm hoping there's a way to get that < 10 seconds or so.
I've attempted other approaches (using UDFs / CTEs / etc.), but they all seem to follow the same execution plan. I don't have much experience with the optimization side of things yet, so I'm eager to hear from others what the best general approach to this would be. Thanks in advance!
Try the code below:
-- One row per distinct begin_dt, joined to every positive-measure range covering that date
SELECT d.dt, COUNT(t2.id) AS count_of_ids, SUM(t2.measure) AS sum_of_measure
FROM (SELECT DISTINCT begin_dt AS dt FROM dbo.T1) AS d
JOIN dbo.T1 AS t2 ON d.dt BETWEEN t2.begin_dt AND t2.end_dt
AND t2.measure > 0
GROUP BY d.dt;
Performance can likely be improved with an index on begin_dt, end_dt that covers the id and measure fields.
Hope this helps!
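As a sanity check of the self-join approach, here is a small sketch that runs it against the question's sample rows in SQLite through Python's sqlite3 module. It says nothing about SQL Server performance. The query is written with a DISTINCT list of dates and a measure > 0 filter so that it matches the expected output in the question; the in-memory database and the result dictionary are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
CREATE TABLE T1 (id INTEGER NOT NULL, measure NUMERIC NOT NULL,
                 begin_dt TEXT NOT NULL, end_dt TEXT NOT NULL);
INSERT INTO T1 VALUES
    (1,  100.00, '2012-05-07', '2012-05-30'),
    (1,  200.00, '2012-05-31', '2013-10-11'),
    (1,   50.00, '2013-10-12', '2013-10-13'),
    (1,    0.00, '2013-10-14', '9999-12-31'),
    (2, 1234.56, '2002-02-25', '9999-12-31'),
    (3,    9.87, '2014-01-31', '2014-02-15'),
    (3,   50.00, '2014-02-16', '2015-01-04'),
    (3,    0.00, '2015-01-05', '9999-12-31');
""")

rows = conn.execute("""
SELECT d.dt, COUNT(t.id) AS count_of_ids, SUM(t.measure) AS sum_of_measure
FROM (SELECT DISTINCT begin_dt AS dt FROM T1) AS d
JOIN T1 AS t
  ON d.dt BETWEEN t.begin_dt AND t.end_dt   -- ISO date strings compare correctly
 AND t.measure > 0
GROUP BY d.dt
ORDER BY d.dt
""").fetchall()

# dt -> (count_of_ids, sum_of_measure rounded to cents)
result = {dt: (n, round(total, 2)) for dt, n, total in rows}
for dt, pair in result.items():
    print(dt, pair)
```

On the sample rows this reproduces the eight lines of the expected result set in the question.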

SQL Combine two tables with two parameters

I searched the forum for an hour and didn't find anything similar.
I have this problem: I want to compare two columns, ID and DATE. If they are the same in both tables, I want to put the number from table 2 next to the row. If they are not the same, I want to fill in the yearly quota that applies on that date. I am working in Access.
table1
id|date|state_on_date
1|30.12.2013|23
1|31.12.2013|25
1|1.1.2014|35
1|2.1.2014|12
2|30.12.2013|34
2|31.12.2013|65
2|1.1.2014|43
table2
id|date|year_quantity
1|31.12.2013|100
1|31.12.2014|150
2|31.12.2013|200
2|31.12.2014|300
I want to get:
table 3
id|date|state_on_date|year_quantity
1|30.12.2013|23|100
1|31.12.2013|25|100
1|1.1.2014|35|150
1|2.1.2014|12|150
2|30.12.2013|34|200
2|31.12.2013|65|200
2|1.1.2014|43|300
I tried joins and read forums but didn't find a solution.
Are you looking for this?
SELECT id, date, state_on_date,
(
SELECT TOP 1 year_quantity
FROM table2
WHERE id = t.id
AND date >= t.date
ORDER BY date
) AS year_quantity
FROM table1 t
Output:
| ID | DATE | STATE_ON_DATE | YEAR_QUANTITY |
|----|------------|---------------|---------------|
| 1 | 2013-12-30 | 23 | 100 |
| 1 | 2013-12-31 | 25 | 100 |
| 1 | 2014-01-01 | 35 | 150 |
| 1 | 2014-01-02 | 12 | 150 |
| 2 | 2013-12-30 | 34 | 200 |
| 2 | 2013-12-31 | 65 | 200 |
| 2 | 2014-01-01 | 43 | 300 |
Here is a SQLFiddle demo. It's for SQL Server but should work just fine in MS Access.
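The correlated subquery can also be tried locally. Below is a minimal sketch using Python's sqlite3 module, with the assumptions that dates are stored as ISO strings (so that string comparison matches date order) and that SQLite's LIMIT 1 replaces TOP 1, which SQLite does not support. The in-memory database is for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
CREATE TABLE table1 (id INTEGER, date TEXT, state_on_date INTEGER);
CREATE TABLE table2 (id INTEGER, date TEXT, year_quantity INTEGER);
INSERT INTO table1 VALUES
    (1, '2013-12-30', 23), (1, '2013-12-31', 25),
    (1, '2014-01-01', 35), (1, '2014-01-02', 12),
    (2, '2013-12-30', 34), (2, '2013-12-31', 65),
    (2, '2014-01-01', 43);
INSERT INTO table2 VALUES
    (1, '2013-12-31', 100), (1, '2014-12-31', 150),
    (2, '2013-12-31', 200), (2, '2014-12-31', 300);
""")

rows = conn.execute("""
SELECT t.id, t.date, t.state_on_date,
       (SELECT t2.year_quantity
        FROM table2 t2
        WHERE t2.id = t.id
          AND t2.date >= t.date   -- first quota date on or after this row's date
        ORDER BY t2.date
        LIMIT 1) AS year_quantity
FROM table1 t
ORDER BY t.id, t.date
""").fetchall()

for row in rows:
    print(row)
```

Each table1 row picks the quota from the earliest table2 date that is not before it, which is why the two December 2013 rows get the 2013 quota and the January 2014 rows get the 2014 quota.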