Minimum difference between dates in the same column in Redshift - sql

I have data like this:
person_id date1
1 2016-08-03
1 2016-08-04
1 2016-08-07
What I want as a result is the minimum difference between any two dates for each person_id; in this case the minimum difference is 1 day (between 8/3 and 8/4).
Is there a way to query for this, grouped by person_id, in Redshift?
Thanks!

I assume you want this for each person. If so, use lag() or lead() and aggregation:
select person_id, min(next_date1 - date1)
from (select t.*,
lead(date1) over (partition by person_id order by date1) as next_date1
from t
) t
group by person_id;
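To make the logic behind the lead() approach concrete, here is a minimal Python sketch (not Redshift code). It relies on the fact that, once a person's dates are sorted, the smallest difference is always between two adjacent dates. The function name min_gap_per_person and the in-memory rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def min_gap_per_person(rows):
    """Return {person_id: smallest gap in days between any two of that person's dates}.

    Persons with fewer than two dates map to None, mirroring the NULL that
    min() over no differences yields in SQL.
    """
    dates_by_person = defaultdict(list)
    for person_id, d in rows:
        dates_by_person[person_id].append(d)

    result = {}
    for person_id, dates in dates_by_person.items():
        dates.sort()
        # Pair each date with its successor, like
        # lead(date1) over (partition by person_id order by date1).
        gaps = [(nxt - cur).days for cur, nxt in zip(dates, dates[1:])]
        result[person_id] = min(gaps) if gaps else None
    return result


# Sample data from the question.
rows = [
    (1, date(2016, 8, 3)),
    (1, date(2016, 8, 4)),
    (1, date(2016, 8, 7)),
]
print(min_gap_per_person(rows))  # {1: 1}
```

Because only adjacent dates are compared, this does far less work than comparing every pair of dates, which is the same reason the window-function query tends to scale better than a self join.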

A self join should work for you. Try it this way:
SELECT a.date1 - b.date1
FROM table1 a
JOIN table1 b
ON a.person_id = b.person_id
AND a.date1 <> b.date1
Where a.date1 - b.date1 > 0
ORDER BY a.date1 - b.date1 ASC
LIMIT 1

This one uses a self join to compare each date:
SELECT t1.person_id, MIN(datediff(t1.date1, t2.date1)) AS difference
FROM t t1
INNER JOIN t t2
ON t1.person_id = t2.person_id
AND t1.date1 > t2.date1
GROUP by t1.person_id
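Tested here: http://sqlfiddle.com/#!9/1638f/1
Note that the two-argument datediff() is the MySQL form; Redshift's DATEDIFF takes the date part as its first argument, e.g. datediff(day, t2.date1, t1.date1).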

Related

SQL Optimization: multiplication of two calculated field generated by window functions

I have two time-series tables, tbl1(time, b_value) and tbl2(time, u_value):
https://www.db-fiddle.com/f/4qkFJZLkZ3BK2tgN4ycCsj/1
Suppose we want, for each day, the last u_value of that day, the cumulative sum of b_value up to that day, and their product, i.e. daily_u_value * b_value_cum_sum.
The following query calculates the desired output:
WITH cte AS (
SELECT
t1.time,
t1.b_value,
t2.u_value * t1.b_value AS bu_value,
last_value(t2.u_value)
OVER
(PARTITION BY DATE_TRUNC('DAY', t1.time) ORDER BY DATE_TRUNC('DAY', t2.time) ) AS daily_u_value
FROM stackoverflow.tbl1 t1
LEFT JOIN stackoverflow.tbl2 t2
ON
t1.time = t2.time
)
SELECT
DATE_TRUNC('DAY', c.time) AS time,
AVG(c.daily_u_value) AS daily_u_value,
SUM( SUM(c.b_value)) OVER (ORDER BY DATE_TRUNC('DAY', c.time) ) as b_value_cum_sum,
AVG(c.daily_u_value) * SUM( SUM(c.b_value) ) OVER (ORDER BY DATE_TRUNC('DAY', c.time) ) as daily_u_value_mul_b_value
FROM cte c
GROUP BY 1
ORDER BY 1 DESC
What can I do to optimize this query? Is there an alternative solution that produces the same result?
Execution time goes from 250.666 ms (your query) to 205.103 ms (my query), so there is some progress.
The gain comes mainly from avoiding casts: your query casts from timestamptz to timestamp many times. Why not store the date in a separate column instead?
I executed my query first and then yours, so the comparison is fair to your query, since a second execution is generally faster than the first.
alter table tbl1 add column t1_date date;
alter table tbl2 add column t2_date date;
update tbl1 set t1_date = time::date;
update tbl2 set t2_date = time::date;
WITH cte AS (
SELECT
t1.t1_date,
t1.b_value,
t2.u_value * t1.b_value AS bu_value,
last_value(t2.u_value)
OVER
(PARTITION BY t1_date ORDER BY t2_date ) AS daily_u_value
FROM stackoverflow.tbl1 t1
LEFT JOIN stackoverflow.tbl2 t2
ON
t1.time = t2.time
)
SELECT
t1_date,
AVG(c.daily_u_value) AS daily_u_value,
SUM( SUM(c.b_value)) OVER (ORDER BY t1_date ) as b_value_cum_sum,
AVG(c.daily_u_value) * SUM( SUM(c.b_value) ) OVER
(ORDER BY t1_date ) as daily_u_value_mul_b_value
FROM cte c
GROUP BY 1
ORDER BY 1 DESC

Calculating difference in rows for many columns in SQL (Access)

Hi everyone. I have another question about using SQL for analysis. I have a table built like this:
ID Date Value
1 31.01.2019 10
1 30.01.2019 5
2 31.01.2019 20
2 30.01.2019 10
3 31.01.2019 30
3 30.01.2019 20
There are many different IDs and many different dates. What I would like as output is an additional column that gives the difference to the previous date for each ID, so that I can then analyze the change in values between days for each category (ID). To do that, I need to prevent the query from computing a difference across IDs, e.g. last day WHERE ID = 1 minus first day WHERE ID = 2.
Desired Output:
ID Date Difference to previous Days
1 31.01.2019 5
2 31.01.2019 10
3 31.01.2019 10
In the end I want to find outliers, i.e. days where the difference in value between two days is very large. Does anyone have a solution? If it is not possible with Access, I am open to solutions with Excel, but Access should be the first choice as it is more scalable.
Greetings and thanks in advance!!
With a self join:
select t1.ID, t1.[Date],
t1.[Value] - t2.[Value] as [Difference to previous Day]
from tablename t1 inner join tablename t2
on t2.[ID] = t1.[ID] and t2.[Date] = t1.[Date] - 1
Results:
ID Date Difference to previous Day
1 31/1/2019 5
2 31/1/2019 10
3 31/1/2019 10
Edit.
For the case that there are gaps between your dates:
select
t1.ID, t1.[Date], t1.[Value] - t2.[Value] as [Difference to previous Day]
from (
select t.ID, t.[Date], t.[Value],
(select max(tt.[Date]) from tablename as tt where ID = t.ID and tt.[Date] < t.[Date]) as prevdate
from tablename as t
) as t1 inner join tablename as t2
on t2.ID = t1.ID and t2.[Date] = t1.prevdate
In your example data, each id has rows for the same two dates and the values are increasing. If this is generally true, then you can simply use aggregation:
select id, max(date), max(value) - min(value)
from t
group by id;
If the values might not be increasing, but the dates are the same, then you can use conditional aggregation:
select id,
max(date),
(max(iif(date = "31.01.2019", value, null)) -
max(iif(date = "30.01.2019", value, null))
) as diff
from t
group by id;
Note: Your date looks like it is using a bespoke format, so I am just doing the comparison as a string.
If the previous date is exactly one day before, you can use a join:
select t.*,
(t.value - tprev.value) as diff
from t left join
t as tprev
on t.id = tprev.id and t.date = dateadd("d", 1, tprev.date);
If the previous date is simply the latest earlier date in the table, then you can use a correlated subquery:
select t.*,
(t.value -
(select top 1 tprev.value
from t as tprev
where tprev.id = t.id and tprev.date < t.date
order by tprev.date desc
)
) as diff
from t;
You can use a self join with an additional condition that uses a subquery to determine the previous date:
SELECT t.ID, t.Date, t.Value - prev.Value AS Diff
FROM
dtvalues AS t
INNER JOIN dtvalues AS prev
ON t.ID = prev.ID
WHERE
prev.[Date] = (SELECT MAX(x.[Date]) FROM dtvalues x WHERE x.ID=t.ID AND x.[Date]<t.[Date])
ORDER BY t.ID, t.[Date];
You could also move the WHERE condition into the join condition, but then the query designer would no longer be able to handle the query. Written like this, you can still edit the query in the query designer.
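For anyone who wants to check the expected output outside of Access, the "difference to the latest earlier date per ID" logic from the answers above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name diff_to_previous and the tuple layout (id, date, value) are made up, and rows without an earlier date for the same ID are dropped, as with the inner joins above.

```python
from datetime import date


def diff_to_previous(rows):
    """rows: iterable of (id, date, value) tuples.

    Returns a list of (id, date, diff) for every row that has an earlier
    row with the same id; diff is value minus the value on the latest
    earlier date, so gaps between dates are handled.
    """
    out = []
    last_value = {}  # id -> value on the most recent earlier date
    for row_id, d, value in sorted(rows, key=lambda r: (r[0], r[1])):
        if row_id in last_value:
            out.append((row_id, d, value - last_value[row_id]))
        last_value[row_id] = value
    return out


# Sample data from the question.
rows = [
    (1, date(2019, 1, 31), 10),
    (1, date(2019, 1, 30), 5),
    (2, date(2019, 1, 31), 20),
    (2, date(2019, 1, 30), 10),
    (3, date(2019, 1, 31), 30),
    (3, date(2019, 1, 30), 20),
]
for row in diff_to_previous(rows):
    print(row)
```

Sorting by (id, date) first means a difference is never computed across two different IDs, which is exactly what the question asks to avoid.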

Count overlapping intervals by ID BigQuery

I want to count how many overlapping intervals I have per ID:
WITH table AS (
SELECT 1001 as id, 1 AS start_time, 10 AS end_time UNION ALL
SELECT 1001, 2, 5 UNION ALL
SELECT 1002, 3, 4 UNION ALL
SELECT 1003, 5, 8 UNION ALL
SELECT 1003, 6, 8 UNION ALL
SELECT 1001, 6, 20
)
In this case the desired result should be:
2 overlapping for ID=1001
1 overlapping for ID=1003
0 overlapping for ID=1002
TOT OVERLAPPING = 3
Whenever there is an overlap (even a partial one), I need to count it as such.
How can I achieve this in BigQuery?
The query below is for BigQuery Standard SQL. It is a simple, straightforward self join that checks for and counts overlaps:
#standardSQL
SELECT a.id,
COUNTIF(
a.start_time BETWEEN b.start_time AND b.end_time
OR a.end_time BETWEEN b.start_time AND b.end_time
OR b.start_time BETWEEN a.start_time AND a.end_time
OR b.end_time BETWEEN a.start_time AND a.end_time
) overlaps
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.id = b.id AND TO_JSON_STRING(a) < TO_JSON_STRING(b)
GROUP BY id
Applied to the sample data in your question, it returns:
Row id overlaps
1 1001 2
2 1002 0
3 1003 1
Another option (avoiding the self join in favor of analytic functions):
#standardSQL
SELECT id,
SUM((SELECT COUNT(1) FROM y.arr x
WHERE y.start_time BETWEEN x.start_time AND x.end_time
OR y.end_time BETWEEN x.start_time AND x.end_time
OR x.start_time BETWEEN y.start_time AND y.end_time
OR x.end_time BETWEEN y.start_time AND y.end_time
)) overlaps
FROM (
SELECT id, start_time, end_time,
ARRAY_AGG(STRUCT(start_time, end_time))
OVER(PARTITION BY id ORDER BY TO_JSON_STRING(t)
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) arr
FROM `project.dataset.table` t
) y
GROUP BY id
This obviously gives the same result as the previous version.
The logic for all overlaps compares the start and end times:
SELECT t1.id,
COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM `project.dataset.table` t1 LEFT JOIN
`project.dataset.table` t2
ON t1.id = t2.id
GROUP BY t1.id;
That is not exactly what you want, because this compares every interval to every other interval, including itself. Removing the "same" one basically requires a unique identifier. We can get this using row_number().
Further, you don't seem to want to count overlaps twice. So:
with t as (
select t.*, row_number() over (partition by id order by start_time) as seqnum
from `project.dataset.table` t
)
SELECT t1.id,
COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM t t1 LEFT JOIN
t t2
ON t1.id = t2.id AND t1.seqnum < t2.seqnum
GROUP BY t1.id;
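As a cross-check on the expected counts, here is a small Python sketch of the same idea: within each id, look at every unordered pair of intervals once and count the pairs that overlap. It is not BigQuery code. The function name count_overlaps is made up, and I am assuming inclusive endpoints (as with BETWEEN in the first answer), so two intervals that merely touch count as overlapping.

```python
from collections import defaultdict
from itertools import combinations


def count_overlaps(intervals):
    """intervals: iterable of (id, start_time, end_time) tuples.

    Returns {id: number of unordered interval pairs that overlap}.
    Endpoints are treated as inclusive (an assumption, see above).
    """
    by_id = defaultdict(list)
    for row_id, start, end in intervals:
        by_id[row_id].append((start, end))

    counts = {}
    for row_id, spans in by_id.items():
        # combinations() yields each pair exactly once, so no interval is
        # compared with itself and no overlap is counted twice.
        counts[row_id] = sum(
            1
            for (s1, e1), (s2, e2) in combinations(spans, 2)
            if s1 <= e2 and s2 <= e1
        )
    return counts


# Sample data from the question.
table = [
    (1001, 1, 10),
    (1001, 2, 5),
    (1002, 3, 4),
    (1003, 5, 8),
    (1003, 6, 8),
    (1001, 6, 20),
]
result = count_overlaps(table)
print(result)                # {1001: 2, 1002: 0, 1003: 1}
print(sum(result.values()))  # 3
```

The overlap test needs both comparisons (s1 <= e2 and s2 <= e1); checking only one of them would also count intervals that lie entirely before or after each other.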

Teradata Correlated subquery

I have been struggling with this query for two days:
select distinct a.id,
a.amount as amount1,
(select max (a.date) from t1 a where a.id=t.id and a.cesitc='0' and a.date<t.date) as date1,
t.id, t.amount as amount2, t.date as date2
from t1 a
inner join t1 t on t.id = a.id and a.cevexp in ('0', '1' )
and exists (select t.id from t1 t
where t.id= a.id and t.amount <> a.amount and t.date > a.date)
and t.cesitc='1' and t.dafms='2015-07-31' and t.date >='2015-04-30' and '2015-07-31' >= t.daefga
and '2015-07-31' <= t.daecga and t.cevexp='1' and t.amount >'1'
Some details: the goal is to compare the difference in valuation of assets (id). The second column (a.amount / amount1) is the one that needs to be corrected.
I would like a.amount / amount1 to be correlated with my subquery date1, which is currently not the case. The same criteria have to be applied to find the correct amount1.
The outcomes of this query are currently displaying like this :
Id Amount1 Date1 id amount2 date2
1 100 04/03/2014 1 150 30/06/2015
1 102 04/03/2014 1 150 30/06/2015
1 170 04/03/2014 1 150 30/06/2015
Amount1 is matched against every Date1 < date2 instead of only max(date1) < date2, which is why I get several amount1 values.
Thanks in advance for helping hand :)
have a good day !
You can access the previous row's data using a windowed aggregate function. There's no LEAD/LAG in Teradata, but it's easy to rewrite.
This will return the correct data for your example:
SELECT t.*,
MIN(amount) -- previous amount
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(date_valuation) -- previous date
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 AS t
QUALIFY cesitc = '1' -- return only the current row
If it doesn't work as expected you need to add more details of the applied logic.
Btw, if a column is a DECIMAL you shouldn't add quotes, 150 instead of '150'. And there's only one recommended way to write a date, using a date literal, e.g. DATE '2015-07-31'
The final query :
SELECT a.id, a.mtvbie, a.date_valuation, t.id,
MIN(t.amount) -- previous amount
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(t.date_valuation) -- previous date
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 t
inner join test5 a on a.id=t.id
where t.amount <> a.amount and a.cesitc='1' and a.date_valuation > t.date_valuation and a.dafms ='2015-07-31' and another criteria....
QUALIFY row_number () over (partition by a.id order by a.cogarc)=1

sql to select first n unique lines on sorted result

I have a query that returns one column of strings, for example:
NAME:
-----
SOF
OTP
OTP
OTP
SOF
VIL
OTP
SOF
GGG
I want to get SOF, OTP, VIL, i.e. the first 3 unique values from the top.
I tried using DISTINCT and GROUP BY, but that does not work: the sort order is lost.
The query building this result is :
SELECT DISTINCT d.adst
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
I have "flights" table with details, and I need to get the 3 (=n) cheapest destinations.
Thanks
This can easily be done using window functions:
select *
from (
SELECT a.date as adate,
b.date as bdate,
a.price + b.price as total,
dense_rank() over (order by a.price + b.price) as rnk,
b.date - a.date as days,
a.dst as adst
FROM flights a
JOIN flights b ON a.dst = b.dst
) t
where rnk <= 3
order by rnk;
More details on window functions can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html
I found a way to do it.
I am selecting the DST and the PRICE, grouping by DST with the MIN function on price, and limiting to 3.
Is there a better way to do it?
SELECT d.adst , min(d.total) mttl
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
group by adst order by mttl limit 3;
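The grouping approach above (minimum total per destination, then sort, then keep the first n) can be sketched in Python to make the steps explicit. This is only an illustration: the function name cheapest_destinations and the prices in the sample data are hypothetical, since the question does not show any totals.

```python
def cheapest_destinations(pairs, n=3):
    """pairs: iterable of (dst, total) tuples, one per flight combination.

    Returns up to n (dst, min_total) tuples, cheapest first.
    """
    best = {}  # dst -> lowest total seen so far (GROUP BY dst with MIN(total))
    for dst, total in pairs:
        if dst not in best or total < best[dst]:
            best[dst] = total
    # ORDER BY min total, then LIMIT n.
    return sorted(best.items(), key=lambda item: item[1])[:n]


# Hypothetical totals, arranged so that the destinations appear in the
# same order as in the question (SOF, OTP, OTP, ..., GGG).
pairs = [
    ("SOF", 100), ("OTP", 120), ("OTP", 130), ("OTP", 135),
    ("SOF", 150), ("VIL", 160), ("OTP", 170), ("SOF", 180),
    ("GGG", 300),
]
print(cheapest_destinations(pairs))
# [('SOF', 100), ('OTP', 120), ('VIL', 160)]
```

Aggregating before sorting is what keeps the ordering intact: each destination is reduced to a single price first, so the sort no longer competes with de-duplication the way it does with DISTINCT.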
select
name
from
testname
where
name in (
select distinct(name) from testname)
group by name order by min(ctid) limit 3
You can tweak your query to return the correct result by adding where days > 0 and limit 3 to the outer query, like this:
select *
from
(
select
a.date adate,
b.date bdate,
(a.price + b.price) total,
(b.date - a.date) days ,
a.dst adst
from flights a
join flights b on a.dst = b.dst
order by total
) d
where days > 0
limit 3;
This assumes that the second entry is the return flight, with a date later than the first entry, so that the difference in days is positive.
Note that without days > 0 your query joins the table with itself on dst, so for each pair of flights you get four rows: two where a flight is joined with itself (days = 0), one with negative days, and one with positive days. I used days > 0 to keep only that last row.
I recommend adding a new column Flight_Id as a primary key, plus a foreign key such as From_Flight_Id. The outbound flight would have a null From_Flight_Id, and the return flight would have a From_Flight_Id equal to the Flight_Id of the outbound flight; that way you can join them properly.
SELECT DISTINCT(`EnteredOn`) FROM `rm_pr_patients` Group By `EnteredOn`
SELECT DISTINCT ON (name) name FROM table_name ORDER BY name LIMIT 3;