Discrete Derivative in SQL - sql

I've got sensor data in a table in the form:
Time Value
10 100
20 200
36 330
46 440
I'd like to pull the change in values for each time period. Ideally, I'd like to get:
Starttime Endtime Change
10 20 100
20 36 130
36 46 110
My SQL skills are pretty rudimentary, so my inclination is to pull all the data out to a script that processes it and then push it back to the new table, but I thought I'd ask if there was a slick way to do this all in the database.

Select a.Time as StartTime
, b.time as EndTime
, b.time-a.time as TimeChange
, b.value-a.value as ValueChange
FROM YourTable a
Left outer Join YourTable b ON b.time>a.time
Left outer Join YourTable c ON c.time<b.time AND c.time > a.time
Where c.time is null
Order By a.time

Select a.Time as StartTime, b.time as EndTime, b.time-a.time as TimeChange, b.value-a.value as ValueChange
FROM YourTable a, YourTable b
WHERE b.time = (Select MIN(c.time) FROM YourTable c WHERE c.time>a.time)

You could use a SQL window function. Below is an example based on BigQuery syntax.
SELECT
LAG(time, 1) OVER (ORDER BY time) AS start_time,
time AS end_time,
-- divide the difference by value if you want the relative change instead
value - LAG(value, 1) OVER (ORDER BY time) AS change
FROM data
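To make the LAG() idea concrete, here is a minimal sketch that runs the same kind of query against the sample readings from the question. It uses Python's built-in sqlite3 module rather than BigQuery, assumes SQLite 3.25 or newer (for window functions), and the table name sensor_data is a placeholder.

```python
import sqlite3

# In-memory database holding the sample readings from the question.
# "sensor_data" is a placeholder table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (time INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?)",
    [(10, 100), (20, 200), (36, 330), (46, 440)],
)

# LAG() looks one row back in time order. The first row has no predecessor,
# so its start_time is NULL and the outer query filters it out.
lag_rows = conn.execute(
    """
    SELECT start_time, end_time, change
    FROM (
        SELECT LAG(time) OVER (ORDER BY time)          AS start_time,
               time                                    AS end_time,
               value - LAG(value) OVER (ORDER BY time) AS change
        FROM sensor_data
    )
    WHERE start_time IS NOT NULL
    ORDER BY start_time
    """
).fetchall()

for row in lag_rows:
    print(row)
```

Dividing change by (end_time - start_time) in the same SELECT would give a rate of change per time unit, if that is what you are after.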

First off, I would add an id column to the table so that you have something that predictably increases from row to row.
Then, I would try the following query:
SELECT t1.Time AS 'Starttime', t2.Time AS 'Endtime',
(t2.Value - t1.Value) AS 'Change'
FROM SensorData t1
INNER JOIN SensorData t2 ON (t2.id - 1) = t1.id
ORDER BY t1.Time ASC
I'm going to create a test table to try this for myself so I don't know if it works yet but it's worth a shot!
Update
Fixed with one minor issue (CHANGE is a protected word and had to be quoted) but tested it and it works! It produces exactly the results defined above.

Does this work?
WITH T AS
(
SELECT [Time]
, Value
, RN1 = ROW_NUMBER() OVER (ORDER BY [Time])
, RN2 = ROW_NUMBER() OVER (ORDER BY [Time]) - 1
FROM SensorData
)
SELECT
StartTime = ISNULL(t1.[time], t2.[time])
, EndTime = ISNULL(t2.[time], 0)
, Change = t2.value - t1.value
FROM T t1
LEFT OUTER JOIN
T t2
ON t1.RN1 = t2.RN2

Related

SQL Optimization: multiplication of two calculated field generated by window functions

Given two time-series tables tbl1(time, b_value) and tbl2(time, u_value).
https://www.db-fiddle.com/f/4qkFJZLkZ3BK2tgN4ycCsj/1
Suppose we want to find the last value of u_value in each day, the daily cumulative sum of b_value on that day, as well as their multiplication, i.e. daily_u_value * b_value_cum_sum.
The following query calculates the desired output:
WITH cte AS (
SELECT
t1.time,
t1.b_value,
t2.u_value * t1.b_value AS bu_value,
last_value(t2.u_value)
OVER
(PARTITION BY DATE_TRUNC('DAY', t1.time) ORDER BY DATE_TRUNC('DAY', t2.time) ) AS daily_u_value
FROM stackoverflow.tbl1 t1
LEFT JOIN stackoverflow.tbl2 t2
ON
t1.time = t2.time
)
SELECT
DATE_TRUNC('DAY', c.time) AS time,
AVG(c.daily_u_value) AS daily_u_value,
SUM( SUM(c.b_value)) OVER (ORDER BY DATE_TRUNC('DAY', c.time) ) as b_value_cum_sum,
AVG(c.daily_u_value) * SUM( SUM(c.b_value) ) OVER (ORDER BY DATE_TRUNC('DAY', c.time) ) as daily_u_value_mul_b_value
FROM cte c
GROUP BY 1
ORDER BY 1 DESC
I was wondering what I can do to optimize this query? Is there any alternative solution that generates the same result?
db fiddle demo
Your query: Execution Time: 250.666 ms. My query: Execution Time: 205.103 ms.
So there is some improvement. It mainly comes from reducing the time spent on casts, since your query casts from timestamptz to timestamp many times. I wonder why you don't just add a separate date column.
I executed my query first and then yours, so the comparison is fair to your query: a second execution is generally faster than the first.
alter table tbl1 add column t1_date date;
alter table tbl2 add column t2_date date;
update tbl1 set t1_date = time::date;
update tbl2 set t2_date = time::date;
WITH cte AS (
SELECT
t1.t1_date,
t1.b_value,
t2.u_value * t1.b_value AS bu_value,
last_value(t2.u_value)
OVER
(PARTITION BY t1_date ORDER BY t2_date ) AS daily_u_value
FROM stackoverflow.tbl1 t1
LEFT JOIN stackoverflow.tbl2 t2
ON
t1.time = t2.time
)
SELECT
t1_date,
AVG(c.daily_u_value) AS daily_u_value,
SUM( SUM(c.b_value)) OVER (ORDER BY t1_date ) as b_value_cum_sum,
AVG(c.daily_u_value) * SUM( SUM(c.b_value) ) OVER
(ORDER BY t1_date ) as daily_u_value_mul_b_value
FROM cte c
GROUP BY 1
ORDER BY 1 DESC
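The SUM(SUM(b_value)) OVER (...) expression in these queries can be read as two steps: the inner SUM is the per-day total produced by GROUP BY, and the outer SUM ... OVER accumulates those totals in date order. Below is a rough sketch of that reading, split into an explicit CTE. It uses Python's sqlite3 instead of PostgreSQL, the rows are made up, and it only covers the b_value part of the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (time TEXT, b_value INTEGER)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?)",
    [   # made-up rows, two or fewer readings per day
        ("2021-01-01 09:00:00", 1),
        ("2021-01-01 17:00:00", 2),
        ("2021-01-02 09:00:00", 3),
        ("2021-01-03 09:00:00", 4),
        ("2021-01-03 17:00:00", 5),
    ],
)

running = conn.execute(
    """
    WITH daily AS (
        -- step 1: one row per day with that day's total
        SELECT date(time) AS day, SUM(b_value) AS b_value_sum
        FROM tbl1
        GROUP BY date(time)
    )
    -- step 2: running total of the daily totals
    SELECT day,
           b_value_sum,
           SUM(b_value_sum) OVER (ORDER BY day) AS b_value_cum_sum
    FROM daily
    ORDER BY day
    """
).fetchall()

for row in running:
    print(row)
```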

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirements.
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over time:
DECLARE @Year INT = '2000';
DECLARE @YearCnt INT = 50 ;
DECLARE @StartDate DATE = DATEFROMPARTS(@Year, '01','01')
DECLARE @EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, @YearCnt, @StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, @StartDate, @EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, @StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [D]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a row number to your data; it restarts every time the date changes and counts up in order of time. We make this into a CTE (a subquery would also work, but a CTE is cleaner), then simply left join it to the calendar table. This means that for any date where you have no data, you still get the calendar date. For any day where you do have data, making rn = 1 a condition of the join means that only the first datetime from each day is present in the results.
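The same pattern (number the rows per day, then left join onto a calendar) can be tried end to end with a small sketch. This one uses Python's sqlite3 rather than SQL Server, generates a four-day calendar with a recursive CTE instead of a permanent table, and the emp_events table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the EMPLOYES data in the question.
conn.execute("CREATE TABLE emp_events (emp_id TEXT, emp_date TEXT)")
conn.executemany(
    "INSERT INTO emp_events VALUES (?, ?)",
    [
        ("E1", "2021-01-01 08:00:00"),
        ("E1", "2021-01-01 12:30:00"),  # same day, later: should be dropped
        ("E1", "2021-01-03 09:15:00"),
    ],
)

calendar_rows = conn.execute(
    """
    WITH RECURSIVE cal(d) AS (
        -- tiny calendar: 2021-01-01 through 2021-01-04
        SELECT '2021-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < '2021-01-04'
    ),
    numbered AS (
        -- rn restarts for each calendar day and counts up by time
        SELECT emp_id,
               emp_date,
               ROW_NUMBER() OVER (PARTITION BY date(emp_date)
                                  ORDER BY emp_date) AS rn
        FROM emp_events
    )
    SELECT cal.d, numbered.emp_date
    FROM cal
    LEFT JOIN numbered
           ON cal.d = date(numbered.emp_date)
          AND numbered.rn = 1
    ORDER BY cal.d
    """
).fetchall()

for row in calendar_rows:
    print(row)
```

Days without data come back with None in the second column, and the 12:30 row is excluded because its rn is 2.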
Note: something is wonky about your question. You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number. This can't be right, so I've guessed at fixing it, but I suspect you have translated your query into something for SO, perhaps to hide real column names. Also, your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone comes back and says "hi your query doesn't work" and it then turns out that they damaged it while translating it back to their own db, because they hid the real column names in the question.
Finally, do not use functions on table data in a where clause; it generally kills indexing. Always try to find a way of leaving table data alone. Want all of January? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc. SQL Server at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run.
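As a small illustration of the range-filter advice, the sketch below selects "all of January 2021" by comparing the raw column against two constants instead of wrapping it in year()/month(). It uses Python's sqlite3 with made-up rows and a hypothetical events table. A half-open range (>= first day, < first day of the next month) is used here as one common variant, since it also catches timestamps late on the last day of the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events" and its rows are hypothetical; dt holds ISO-8601 text timestamps.
conn.execute("CREATE TABLE events (dt TEXT)")
conn.execute("CREATE INDEX idx_events_dt ON events (dt)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2020-12-31 23:59:59",),
        ("2021-01-01 00:00:00",),
        ("2021-01-31 18:00:00",),
        ("2021-02-01 00:00:00",),
    ],
)

# The column is left untouched; only the constants describe the month.
january = conn.execute(
    """
    SELECT dt
    FROM events
    WHERE dt >= '2021-01-01' AND dt < '2021-02-01'
    ORDER BY dt
    """
).fetchall()

for row in january:
    print(row)
```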
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try an aggregate function: MIN keeps the oldest row per day, MAX (as in the example below) keeps the newest.
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
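Since the question asks to keep the oldest record per date, here is a sketch of the same join with MIN in place of MAX. It reuses the four sample rows above but runs on SQLite through Python's sqlite3, and the table name readings is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dt TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2021-01-01 10:30:35", 1),
        ("2021-01-02 10:30:35", 2),
        ("2021-01-02 11:30:35", 3),  # later row on the same day
        ("2021-01-03 10:35:35", 4),
    ],
)

# The derived table holds the earliest timestamp of each day;
# joining back on it keeps exactly those rows.
oldest_per_day = conn.execute(
    """
    SELECT r.dt, r.val
    FROM readings AS r
    JOIN (
        SELECT MIN(dt) AS first_dt
        FROM readings
        GROUP BY date(dt)
    ) AS firsts ON r.dt = firsts.first_dt
    ORDER BY r.dt
    """
).fetchall()

for row in oldest_per_day:
    print(row)
```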

calculate time difference of consecutive row dates in SQL

Hello, I am trying to calculate the time difference between 2 consecutive rows for Date (either in hours or days), as shown in the attached image.
Highlighted in yellow is the result I want, which is basically the difference between the date in that row and the date in the row above.
How can we achieve this in SQL? Below is my complex code, which has the rest of the fields in it:
with cte
as
(
select m.voucher_no, CONVERT(VARCHAR(30),CONVERT(datetime, f.action_Date, 109),100) as action_date,f.col1_Value,f.col3_value,f.col4_value,f.comments,f.distr_user,f.wf_status,f.action_code,f.wf_user_id
from attdetailmap m
LEFT JOIN awftaskfin f ON f.oid = m.oid and f.client ='PC'
where f.action_Date !='' and action_date between '$?datef' and '$?datet'
),
-- select *, ROW_NUMBER() OVER(PARTITION BY action_Date,distr_user,wf_Status,wf_user_id order by action_Date,distr_user,wf_Status,wf_user_id ) as row_no_1 from cte
cte2 as
(
select *, ROW_NUMBER() OVER(PARTITION BY voucher_no,action_Date,distr_user,wf_Status,wf_user_id order by voucher_no ) as row_no_1 from cte
)
select distinct(v.dim_value) as resid,c.voucher_no,CONVERT(datetime, c.action_Date, 109) as action_Date,c.col4_value,c.comments,c.distr_user,v.description,c.wf_status,c.action_code, c.wf_user_id,v1.description as name,r.rel_value as pay_office,r1.rel_value as site
from cte2 c
LEFT OUTER JOIN aagviuserdetail v ON v.user_id = c.distr_user
LEFT OUTER JOIN aagviuserdetail v1 ON v1.user_id = c.wf_user_id
LEFT OUTER JOIN ahsrelvalue r ON r.resource_id = v.dim_Value and r.rel_Attr_id = 'P1' and r.period_to = '209912'
LEFT OUTER JOIN ahsrelvalue r1 ON r1.resource_id = v.dim_Value and r1.rel_Attr_id = 'Z1' and r1.period_to = '209912'
where c.row_no_1 = '1' and r.rel_value like '$?site1' and voucher_no like '$?trans'
order by voucher_no,action_Date
The key idea is lag(). However, date/time functions vary among databases. So, the idea is:
select t.*,
(date - lag(date) over (partition by transaction_no order by date)) as diff
from t;
I should note that this exact syntax might not work in your database -- because - may not even be defined on date/time values. However, lag() is a standard function and should be available.
For instance, in SQL Server, this would look like:
select t.*,
datediff(second, lag(date) over (partition by transaction_no order by date), date) / (24.0 * 60 * 60) as diff_days
from t;
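As the answer says, the date arithmetic differs per database. For comparison, here is a rough SQLite version run through Python's sqlite3, where julianday() turns a timestamp into a fractional day count so that plain subtraction works. The actions table, its columns, and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (transaction_no INTEGER, action_date TEXT)")
conn.executemany(
    "INSERT INTO actions VALUES (?, ?)",
    [
        (1, "2021-03-01 00:00:00"),
        (1, "2021-03-02 12:00:00"),
        (1, "2021-03-05 12:00:00"),
        (2, "2021-03-10 00:00:00"),
        (2, "2021-03-10 06:00:00"),
    ],
)

# LAG() restarts for each transaction_no, so the first row of every
# transaction has no previous date and its difference is NULL.
diffs = conn.execute(
    """
    SELECT transaction_no,
           action_date,
           julianday(action_date)
             - julianday(LAG(action_date) OVER (PARTITION BY transaction_no
                                                ORDER BY action_date)) AS diff_days
    FROM actions
    ORDER BY transaction_no, action_date
    """
).fetchall()

for row in diffs:
    print(row)
```

Multiplying diff_days by 24 would give the difference in hours.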

Calculating difference in rows for many columns in SQL (Access)

What's up guys. I have another question about using SQL for analysis. I have a table built like this.
ID Date Value
1 31.01.2019 10
1 30.01.2019 5
2 31.01.2019 20
2 30.01.2019 10
3 31.01.2019 30
3 30.01.2019 20
With many different IDs and many different Dates. What I would like to have as an output is an additional column, that gives me the difference to the previous date for each ID. So that I can then analyze the change of values between days for each Category (ID). To do that I would need to avoid that the command computes the difference of Last Day WHERE ID = 1 - First Day WHERE ID = 2.
Desired Output:
ID Date Difference to previous Days
1 31.01.2019 5
2 31.01.2019 10
3 31.01.2019 10
In the end I want to find outliers, i.e. days where the difference in value between two days is very large. Does anyone have a solution? If it is not possible with Access, I am open to solutions with Excel, but Access should be the first choice as it is more scalable.
Greetings and thanks in advance!!
With a self join:
select t1.ID, t1.[Date],
t1.[Value] - t2.[Value] as [Difference to previous Day]
from tablename t1 inner join tablename t2
on t2.[ID] = t1.[ID] and t2.[Date] = t1.[Date] - 1
Results:
ID Date Difference to previous Day
1 31/1/2019 5
2 31/1/2019 10
3 31/1/2019 10
Edit.
For the case that there are gaps between your dates:
select
t1.ID, t1.[Date], t1.[Value] - t2.[Value] as [Difference to previous Day]
from (
select t.ID, t.[Date], t.[Value],
(select max(tt.[Date]) from tablename as tt where ID = t.ID and tt.[Date] < t.[Date]) as prevdate
from tablename as t
) as t1 inner join tablename as t2
on t2.ID = t1.ID and t2.[Date] = t1.prevdate
In your example data, each id has the same two rows and the values are increasing. If this is generally true, then you can simply use aggregation:
select id, max(date), max(value) - min(value)
from t
group by id;
If the values might not be increasing, but the dates are the same, then you can use conditional aggregation:
select id,
max(date),
(max(iif(date = "31.01.2019", value, null)) -
max(iif(date = "30.01.2019", value, null))
) as diff
from t
group by id;
Note: Your date looks like it is using a bespoke format, so I am just doing the comparison as a string.
If previous date is exactly one day before, you can use a join:
select t.*,
(t.value - tprev.value) as diff
from t left join
t as tprev
on t.id = tprev.id and t.date = dateadd("d", 1, tprev.date);
If date is arbitrarily the previous date in the table, then you can use a correlated subquery
select t.*,
(t.value -
(select top (1) tprev.value
from t as tprev
where tprev.id = t.id and tprev.date < t.date
order by tprev.date desc
)
) as diff
from t;
You can use a self join with an additional condition using a sub-query to determine the previous date
SELECT t.ID, t.Date, t.Value - prev.Value AS Diff
FROM
dtvalues AS t
INNER JOIN dtvalues AS prev
ON t.ID = prev.ID
WHERE
prev.[Date] = (SELECT MAX(x.[Date]) FROM dtvalues x WHERE x.ID=t.ID AND x.[Date]<t.[Date])
ORDER BY t.ID, t.[Date];
You could also include the where condition into the join condition, but the query designer would not be able to handle the query anymore. Like this, you can still edit the query in the query designer.
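The "previous date per ID" lookup can be checked on a small data set. The sketch below runs the same self join with a MAX() subquery on SQLite through Python's sqlite3 rather than Access. The dates are stored as ISO text so that they sort correctly, and the earlier date for ID 3 is moved to 28 January (a made-up change) to show that gaps are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dtvalues (id INTEGER, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO dtvalues VALUES (?, ?, ?)",
    [
        (1, "2019-01-31", 10), (1, "2019-01-30", 5),
        (2, "2019-01-31", 20), (2, "2019-01-30", 10),
        (3, "2019-01-31", 30), (3, "2019-01-28", 20),  # gap of three days
    ],
)

# For each row t, the subquery finds the latest earlier date with the same id;
# rows without an earlier date drop out of the inner join.
per_id_diffs = conn.execute(
    """
    SELECT t.id, t.date, t.value - prev.value AS diff
    FROM dtvalues AS t
    JOIN dtvalues AS prev ON prev.id = t.id
    WHERE prev.date = (SELECT MAX(x.date)
                       FROM dtvalues AS x
                       WHERE x.id = t.id AND x.date < t.date)
    ORDER BY t.id, t.date
    """
).fetchall()

for row in per_id_diffs:
    print(row)
```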

sql calculate delta between the column value for one day and the previous day

I have a query like this:
;WITH A AS (SELECT * FROM T1 where T1.targetDate=@inputdate),
B AS (SELECT A.*, T2.SId, T2.Type, T2.Value
FROM A
INNER JOIN T2 ON A.SId = T2.SId )
SELECT A.*, B.Type, B.Value
FROM B
My question is, instead of getting the Value for @inputdate, how to get the delta of Value between @inputdate and the previous day (DATEADD(day, -1, @inputdate ))?
Edited:
Sorry for not being clear, the 'Value' is of type int. For example, if @inputdate = '20130708', the Value for '20130708' is 30, and the 'Value' for previous day '20130707' is 20, so it should return (30 - 20) which is 10.
Something like this, assuming that Value is a DATE:
;WITH A AS (SELECT * FROM T1 where T1.targetDate=@inputdate),
B AS (SELECT A.*, T2.SId, T2.Type, T2.Value
FROM A
INNER JOIN T2 on A.SId = T2.SId )
SELECT B.*, DATEDIFF(DAY, B.Value, DATEADD(DAY, -1, @InputDate)) AS Delta
FROM B
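For the integer case described in the question's edit (30 on the input date, 20 the day before, expected delta 10), a self join on "same SId, one day earlier" is one possible shape. The sketch below uses Python's sqlite3 and a single hypothetical table daily_values(sid, target_date, value) in place of the T1/T2 pair, whose real columns are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table stand-in for the T1/T2 join in the question.
conn.execute(
    "CREATE TABLE daily_values (sid INTEGER, target_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_values VALUES (?, ?, ?)",
    [
        (1, "2013-07-07", 20), (1, "2013-07-08", 30),
        (2, "2013-07-07", 5),  (2, "2013-07-08", 3),
    ],
)

input_date = "2013-07-08"  # plays the role of @inputdate

# cur is the row for the input date, prev is the same sid one day earlier.
deltas = conn.execute(
    """
    SELECT cur.sid, cur.value - prev.value AS delta
    FROM daily_values AS cur
    JOIN daily_values AS prev
      ON prev.sid = cur.sid
     AND prev.target_date = date(cur.target_date, '-1 day')
    WHERE cur.target_date = ?
    ORDER BY cur.sid
    """,
    (input_date,),
).fetchall()

for row in deltas:
    print(row)
```

With an inner join, an SId that has no row for the previous day is left out; a LEFT JOIN would keep it with a NULL delta.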
Let's say you have a stock prices table with the symbol, date, closing price, etc. You could use something like this:
select symb, ret_dt, close, (close-(lead(close,1) over (partition by symb order by ret_dt desc,close)))as difference, (lead(close,1) over
(partition by symb order by ret_dt desc,close)) as lead
from stocks.nyse2010;
Note: here ret_dt is the date, close is the closing price and I have added an additional lead column for representational purposes.