SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improve its water-network administration.
There are about 150 households, and every month a person checks each meter and charges the household according to the water consumed (this month's reading minus last month's reading). Today everything is done on paper, and I would like to digitize the administration to avoid calculation errors.
I have an MS Access table in mind, e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the water consumed (the difference between one household's meter readings in two consecutive months):
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?

This query returns every date together with the previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID  Data        PrevData    Meter  PrevMeter  Diff
----------------------------------------------------------
0           01/02/2013  01/01/2013  120    100        20
1           01/02/2013  01/01/2013  140    130        10
The query above returns every delta, for every household, for every month (or for every interval). If you are only interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could filter on the current month only:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
This way, if you miss a check for a particular household, it won't show.
Or you might be interested in showing the last month present in the table (which can be different from the current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(Here I am taking into consideration the fact that checks might happen on different days.)

I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
The query looks like this:
select cur.*, tprev.date, tprev.meter
from (select t.*,
(select top 1 t2.date from t t2
where t2.HousholdID = t.HousholdID and t2.date < t.date
order by t2.date desc
) as prevDate
from t
) cur inner join
t tprev
on tprev.HousholdID = cur.HousholdID and tprev.date = cur.prevDate
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
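To illustrate why not assuming a monthly rhythm matters, here is a minimal sketch of the same correlated-subquery idea, run from Python against an in-memory SQLite database. The table name readings and the columns household_id, read_date and meter are assumptions for illustration, the sample rows are invented, and MAX() stands in for TOP 1 because SQLite has no TOP.

```python
import sqlite3

# Hypothetical schema: readings(household_id, read_date, meter).
PREVIOUS_READING_SQL = """
SELECT cur.household_id,
       cur.read_date,
       prev.read_date          AS prev_date,
       cur.meter - prev.meter  AS consumption
FROM readings AS cur
JOIN readings AS prev
  ON prev.household_id = cur.household_id
 AND prev.read_date = (
        -- latest reading of the same household strictly before this one
        SELECT MAX(p.read_date)
        FROM readings AS p
        WHERE p.household_id = cur.household_id
          AND p.read_date < cur.read_date
     )
ORDER BY cur.household_id, cur.read_date
"""


def deltas_since_previous(rows):
    """Return (household, date, previous date, consumption) per reading."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE readings (household_id INTEGER, read_date TEXT, meter INTEGER)"
    )
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    out = con.execute(PREVIOUS_READING_SQL).fetchall()
    con.close()
    return out


# Invented data: household 0 skipped February, household 1 was read twice in January.
rows = [
    (0, "2013-01-01", 100),
    (0, "2013-03-04", 150),
    (1, "2013-01-01", 130),
    (1, "2013-01-20", 135),
    (1, "2013-02-02", 140),
]
irregular = deltas_since_previous(rows)
```

Each reading is paired with whatever reading came before it for the same household, so a skipped month or an extra visit does not drop or duplicate a delta. The first reading of each household has no predecessor and is left out by the inner join.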

Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to the records for last month (the DateSerial comparison using Month(Date()) - 1).
Please do not use Date as a field name; it is a reserved word in Access.
It returns the following result:
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10

Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
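This works in standard SQL; I'm not sure about Access. Note that it relies on the meter reading never decreasing, since it uses max(meter) to pick out both the latest and the previous reading.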

You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
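As a rough sketch of what the LAG() version could look like, the snippet below runs it from Python against an in-memory SQLite database (window functions need SQLite 3.25 or later; as far as I know MS Access has no LAG). The table name readings and the columns household_id, read_date and meter are assumptions for illustration, and the rows are the question's sample readings rewritten as ISO dates, assuming they mean 1 January and 1 February.

```python
import sqlite3

LAG_SQL = """
SELECT household_id, read_date, consumption
FROM (
    SELECT household_id,
           read_date,
           -- previous reading of the same household, ordered by date
           meter - LAG(meter) OVER (
               PARTITION BY household_id ORDER BY read_date
           ) AS consumption
    FROM readings
)
WHERE consumption IS NOT NULL      -- the first reading has no predecessor
ORDER BY household_id, read_date
"""


def consumption_with_lag(rows):
    """Return (household_id, read_date, consumption) using LAG()."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE readings (household_id INTEGER, read_date TEXT, meter INTEGER)"
    )
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute(LAG_SQL).fetchall()
    con.close()
    return result


sample = [
    (0, "2013-01-01", 100),
    (1, "2013-01-01", 130),
    (0, "2013-02-01", 120),
    (1, "2013-02-01", 140),
]
deltas = consumption_with_lag(sample)
```

For the sample rows this yields a consumption of 20 for household 0 and 10 for household 1, matching the output asked for in the question.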

Related

Joining Tables on Time, IF NULL edit time by 1 minute

I have two tables.
Table 1 = My Trades
Table 2 = Market Trades
I want to query the market trade 1 minute prior to my trade. If there is no market trade in Table 2 that is 1 minute apart from mine, then I want to look back 2 minutes, and so on until I have a match.
Right now my query gets me the trade 1 minute apart, but I can't figure out how to get 2 minutes apart if NULL, or 3 minutes apart if NULL (up to 30 minutes). I think it would be best to use a variable, but I'm not sure of the best way to approach this.
Select
A.Ticker
,a.date_time
,CONVERT(CHAR(16),a.date_time - '00:01',120) AS '1MINCHANGE'
,A.Price
,B.Date_time
,B.Price
FROM
Trade..MyTrade as A
LEFT JOIN Trade..Market as B
on (a.ticker = b.ticker)
and (CONVERT(CHAR(16),a.date_time - '00:01',120) = b.Date_time)
There is no great way to do this in MySQL. But, because your code looks like SQL Server, I'll show that solution here, using APPLY:
select t.Ticker,
convert(CHAR(16), t.date_time - '00:01', 120) AS '1MINCHANGE',
t.Price,
m.Date_time,
m.Price
from Trade..MyTrade as t outer apply
(select top 1 m.*
from Trade..Market m
where m.ticker = t.ticker and
m.Date_time <= convert(CHAR(16), t.date_time - '00:01', 120)
order by m.Date_time desc
) m;
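For readers without SQL Server at hand, the same "latest market trade at least one minute earlier" lookup can be sketched with a correlated subquery. The version below runs from Python against an in-memory SQLite database; the table names my_trade and market, the column names, and all sample rows are assumptions for illustration. It also adds the 30-minute look-back limit from the question and compares full timestamps rather than minute-truncated strings.

```python
import sqlite3

NEAREST_PRIOR_SQL = """
SELECT t.ticker,
       t.date_time,
       m.date_time AS market_time,
       m.price     AS market_price
FROM my_trade AS t
LEFT JOIN market AS m
  ON m.ticker = t.ticker
 AND m.date_time = (
        -- latest market trade between 30 minutes and 1 minute before my trade
        SELECT MAX(x.date_time)
        FROM market AS x
        WHERE x.ticker = t.ticker
          AND x.date_time <= datetime(t.date_time, '-1 minute')
          AND x.date_time >= datetime(t.date_time, '-30 minutes')
     )
ORDER BY t.ticker, t.date_time
"""


def nearest_prior_market_trades(my_trades, market_trades):
    """Pair each of my trades with the closest earlier market trade, if any."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_trade (ticker TEXT, date_time TEXT, price REAL)")
    con.execute("CREATE TABLE market (ticker TEXT, date_time TEXT, price REAL)")
    con.executemany("INSERT INTO my_trade VALUES (?, ?, ?)", my_trades)
    con.executemany("INSERT INTO market VALUES (?, ?, ?)", market_trades)
    rows = con.execute(NEAREST_PRIOR_SQL).fetchall()
    con.close()
    return rows


mine = [
    ("ABC", "2020-01-02 10:05:00", 10.0),
    ("XYZ", "2020-01-02 10:05:00", 5.0),
]
market = [
    ("ABC", "2020-01-02 10:00:00", 9.7),
    ("ABC", "2020-01-02 10:02:00", 9.8),  # nearest trade at least 1 minute earlier
    ("ABC", "2020-01-02 10:04:30", 9.9),  # less than 1 minute earlier: skipped
    ("XYZ", "2020-01-02 09:00:00", 4.9),  # more than 30 minutes earlier: skipped
]
matches = nearest_prior_market_trades(mine, market)
```

The sketch assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, so that string comparison matches chronological order, and that the market table has at most one row per ticker and timestamp. A trade with no market match in the window comes back with NULLs, like the outer apply.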

SQL - Value difference between specific rows

My query is as follows
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
SUM(Value) Value
FROM
f_Trans_GL
WHERE
Account = 228
GROUP BY
TimePeriod
And it returns
Period Value
---------------
201412 80
201501 20
201502 30
201506 50
201509 100
201509 100
I'd like to know the Value difference between rows where the period is 1 month apart. The calculation being [value period] - [value period-1].
The desired output is:
Period Value Calculated
-----------------------------------
201412 80 80 - null = 80
201501 20 20 - 80 = -60
201502 30 30 - 20 = 10
201506 50 50 - null = 50
201509 100 (100 + 100) - null = 200
This illustrates a second challenge, as the period needs to be evaluated if the year changes (the difference between 201501 and 201412 is one month).
And the third challenge being a duplicate Period (201509), in which case the sum of that period needs to be evaluated.
Any indicators on where to begin, if this is possible, would be great!
Thanks in advance
===============================
After I accepted the answer, I tailored this a little to suit my needs, the end result is:
WITH cte
AS (SELECT
ISNULL(CAST(TransactionID AS nvarchar), '_nullTransactionId_') + ISNULL(Description, '_nullDescription_') + CAST(Account AS nvarchar) + Category + Currency + Entity + Scenario AS UID,
LEFT(TimePeriod, 6) Period,
SUM(Value1) Value1,
CAST(LEFT(TimePeriod, 6) + '01' AS date) ord_date
FROM MyTestTable
GROUP BY LEFT(TimePeriod, 6),
TransactionID,
Description,
Account,
Category,
Currency,
Entity,
Scenario,
TimePeriod)
SELECT
a.UID,
a.Period,
--a.Value1,
ISNULL(a.Value1, 0) - ISNULL(b.Value1, 0) Periodic
FROM cte a
LEFT JOIN cte b
ON a.ord_date = DATEADD(MONTH, 1, b.ord_date)
ORDER BY a.UID
I have to get the new value (Periodic) for each UID. This UID must be determined as done here because the PK on the table won't work.
But the issue is that this will return many more rows than I actually have to begin with in my table. If I don't add a GROUP BY and ORDER by UID (as done above), I can tell that the first result for each combination of UID and Period is actually correct, the subsequent rows for that combination, are not.
I'm not sure where to look for a solution, my guess is that the UID is the issue here, and that it will somehow iterate over the field... any direction appreciated.
As pointed out by others, the first mistake is in the GROUP BY: you need to group by Left(timeperiod, 6) instead of timeperiod.
For the remaining calculation, try something like this:
;WITH cte
AS (SELECT LEFT(timeperiod, 6) Period,
Sum(value) Value,
Cast(LEFT(timeperiod, 6) + '01' AS DATE) ord_date
FROM f_trans_gl
WHERE account = 228
GROUP BY LEFT(timeperiod, 6))
SELECT a.period,
a.value,
a.value - Isnull(b.value, 0)
FROM cte a
LEFT JOIN cte b
ON a.ord_date = Dateadd(month, 1, b.ord_date)
If you are using SQL Server 2012 or later, this can also be done with the LAG analytic function.
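A LAG-based version needs one extra check, because LAG returns the previous row, not necessarily the previous month. Here is a minimal sketch, run from Python against SQLite 3.25+ rather than SQL Server; the table name gl, its columns, and the day parts of the sample dates are assumptions for illustration, while the monthly totals follow the question.

```python
import sqlite3

LAG_SQL = """
WITH monthly AS (
    SELECT substr(time_period, 1, 6) AS period,
           SUM(value)                AS value
    FROM gl
    GROUP BY substr(time_period, 1, 6)
),
numbered AS (
    -- month_no counts months, so consecutive periods differ by exactly 1
    SELECT period,
           value,
           CAST(substr(period, 1, 4) AS INTEGER) * 12
             + CAST(substr(period, 5, 2) AS INTEGER) AS month_no
    FROM monthly
)
SELECT period,
       value,
       value - CASE
                 WHEN LAG(month_no) OVER (ORDER BY month_no) = month_no - 1
                 THEN LAG(value) OVER (ORDER BY month_no)
                 ELSE 0          -- no row for the preceding month
               END AS calculated
FROM numbered
ORDER BY period
"""


def periodic_differences(rows):
    """rows are (time_period 'YYYYMMDD', value) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE gl (time_period TEXT, value INTEGER)")
    con.executemany("INSERT INTO gl VALUES (?, ?)", rows)
    out = con.execute(LAG_SQL).fetchall()
    con.close()
    return out


gl_rows = [
    ("20141215", 80),
    ("20150110", 20),
    ("20150203", 30),
    ("20150620", 50),
    ("20150905", 100),
    ("20150918", 100),  # second row in the same period: summed first
]
differences = periodic_differences(gl_rows)
```

The year change from 201412 to 201501 is handled by the month counter, and the two 201509 rows are summed to 200 before the difference is taken.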
Using a derived table, you can join the data to itself to find rows that are in the preceding period. I have converted your Period to a Date value so you can use SQL Server's dateadd function to check for rows in the previous month:
;WITH cte AS
(
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
CAST(LEFT(TimePeriod,6) + '01' AS DATE) PeriodDate,
SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
)
SELECT c1.Period,
c1.Value,
c1.Value - ISNULL(c2.Value,0) AS Calculation
FROM cte c1
LEFT JOIN cte c2
ON c1.PeriodDate = DATEADD(m,1,c2.PeriodDate)
Without cte, you can also try something like this
SELECT A.Period,A.Value,A.Value-ISNULL(B.Value,0) Calculated
FROM
(
SELECT LEFT(TimePeriod,6) Period,
DATEADD(M,-1,(CONVERT(date,LEFT(TimePeriod,6)+'01'))) PeriodDatePrev,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS A
LEFT OUTER JOIN
(
SELECT LEFT(TimePeriod,6) Period,
(CONVERT(date,LEFT(TimePeriod,6)+'01')) PeriodDate,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS B
ON (A.PeriodDatePrev = B.PeriodDate)
ORDER BY 1

Count how many first and last entries in given period of time are equal

Given a table structured like this:
id | news_id(fkey)| status | date
1 10 PUBLISHED 2016-01-10
2 20 UNPUBLISHED 2016-01-10
3 10 UNPUBLISHED 2016-01-12
4 10 PUBLISHED 2016-01-15
5 10 UNPUBLISHED 2016-01-16
6 20 PUBLISHED 2016-01-18
7 10 PUBLISHED 2016-01-18
8 20 UNPUBLISHED 2016-01-20
9 30 PUBLISHED 2016-01-20
10 30 UNPUBLISHED 2016-01-21
I'd like to count the distinct news items whose first and last status within a given time period are equal (and also equal to the status given in the query).
So, for this table, a query from 2016-01-01 to 2016-02-01 would return:
1 (with WHERE status = 'PUBLISHED'), because news_id 10 had PUBLISHED in both its first row (2016-01-10) and its last row (2016-01-18)
1 (with WHERE status = 'UNPUBLISHED'), because news_id 20 had UNPUBLISHED in both its first and last row
Notice that news_id = 30 does not appear in the results, because its first and last statuses differ.
I have done that using following query:
SELECT count(*) FROM
(
SELECT DISTINCT ON (news_id)
news_id, status as first_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date
) first
JOIN (
SELECT DISTINCT ON (news_id)
news_id, status as last_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date DESC
) last
using (news_id)
where first_status = last_status
and first_status = 'PUBLISHED'
Now I have to translate the query into the SQL dialect of our internal Java framework. Unfortunately, it does not support subqueries except in EXISTS or NOT EXISTS. I was told to rewrite the query using an EXISTS clause (if that is possible) or to find another solution. I am, however, clueless. Could anyone help me do that?
Edit: As I am being told right now, the problem lies not with our framework but with Hibernate. If I understood correctly, "you cannot join an inner select in HQL" (?)
Not sure if this addresses your problem correctly, since it is more of a workaround. But consider the following:
News needs to be published before it can be "unpublished". So if you add 1 for each "published" and subtract 1 for each "unpublished", the balance will be positive (1, to be exact) if the first and last status are both "published". It will be 0 if there are as many unpublished as published entries, and negative if there are more unpublished than published entries (which logically cannot be the case but can still arise, since the date threshold in the query may cut off an earlier "published" entry).
You might use this query to find out:
SELECT news_id, SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS publishbalance
FROM news_events
WHERE date >= '2015-11-12 15:01:56.195'
GROUP BY news_id
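To see what the balance looks like on the question's sample rows, here is a small sketch run from Python against an in-memory SQLite database. The column name event_date and the upper date bound are assumptions for illustration; the rows are copied from the question's table.

```python
import sqlite3

EVENTS = [
    (1, 10, "PUBLISHED", "2016-01-10"),
    (2, 20, "UNPUBLISHED", "2016-01-10"),
    (3, 10, "UNPUBLISHED", "2016-01-12"),
    (4, 10, "PUBLISHED", "2016-01-15"),
    (5, 10, "UNPUBLISHED", "2016-01-16"),
    (6, 20, "PUBLISHED", "2016-01-18"),
    (7, 10, "PUBLISHED", "2016-01-18"),
    (8, 20, "UNPUBLISHED", "2016-01-20"),
    (9, 30, "PUBLISHED", "2016-01-20"),
    (10, 30, "UNPUBLISHED", "2016-01-21"),
]

BALANCE_SQL = """
SELECT news_id,
       SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS publishbalance
FROM news_events
WHERE event_date >= ? AND event_date < ?
GROUP BY news_id
ORDER BY news_id
"""


def publish_balances(date_from, date_to):
    """Return (news_id, balance) for events in [date_from, date_to)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE news_events "
        "(id INTEGER, news_id INTEGER, status TEXT, event_date TEXT)"
    )
    con.executemany("INSERT INTO news_events VALUES (?, ?, ?, ?)", EVENTS)
    rows = con.execute(BALANCE_SQL, (date_from, date_to)).fetchall()
    con.close()
    return rows


balances = publish_balances("2016-01-01", "2016-02-01")
```

A balance of 1 marks an item whose first and last status are both PUBLISHED, -1 marks UNPUBLISHED at both ends, and 0 means the two ends differ. This reading only holds if the statuses of an item strictly alternate, which is the case in the sample data but is an assumption in general.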
First of all, subqueries are a substantial part of SQL. A framework forbidding their use is a bad framework.
However, "first" and "last" can be expressed with NOT EXISTS: where not exists an earlier or later entry for the same news_id and date range.
select count(*)
from mytable first
join mytable last on last.news_id = first.news_id
where first.date between #from and #to
and last.date between #from and #to
and not exists
(
select *
from mytable before_first
where before_first.news_id = first.news_id
and before_first.date < first.date
and before_first.date >= #from
)
and not exists
(
select *
from mytable after_last
where after_last.news_id = last.news_id
and after_last.date > last.date
and after_last.date <= #to
)
and first.status = #status
and last.status = #status;
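The NOT EXISTS formulation can be checked against the question's sample table. The sketch below does that from Python with an in-memory SQLite database; the aliases first_ev and last_ev, the column name event_date, and the parameter names are assumptions for illustration, and it assumes a news item never has two events on the same date.

```python
import sqlite3

EVENTS = [
    (1, 10, "PUBLISHED", "2016-01-10"),
    (2, 20, "UNPUBLISHED", "2016-01-10"),
    (3, 10, "UNPUBLISHED", "2016-01-12"),
    (4, 10, "PUBLISHED", "2016-01-15"),
    (5, 10, "UNPUBLISHED", "2016-01-16"),
    (6, 20, "PUBLISHED", "2016-01-18"),
    (7, 10, "PUBLISHED", "2016-01-18"),
    (8, 20, "UNPUBLISHED", "2016-01-20"),
    (9, 30, "PUBLISHED", "2016-01-20"),
    (10, 30, "UNPUBLISHED", "2016-01-21"),
]

COUNT_SQL = """
SELECT COUNT(*)
FROM news_events AS first_ev
JOIN news_events AS last_ev ON last_ev.news_id = first_ev.news_id
WHERE first_ev.event_date BETWEEN :date_from AND :date_to
  AND last_ev.event_date BETWEEN :date_from AND :date_to
  AND first_ev.status = :status
  AND last_ev.status = :status
  AND NOT EXISTS (          -- nothing earlier in the range: first_ev is first
        SELECT 1 FROM news_events AS b
        WHERE b.news_id = first_ev.news_id
          AND b.event_date < first_ev.event_date
          AND b.event_date >= :date_from)
  AND NOT EXISTS (          -- nothing later in the range: last_ev is last
        SELECT 1 FROM news_events AS a
        WHERE a.news_id = last_ev.news_id
          AND a.event_date > last_ev.event_date
          AND a.event_date <= :date_to)
"""


def count_same_first_last(status, date_from, date_to):
    """Count news items whose first and last status in the range equal status."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE news_events "
        "(id INTEGER, news_id INTEGER, status TEXT, event_date TEXT)"
    )
    con.executemany("INSERT INTO news_events VALUES (?, ?, ?, ?)", EVENTS)
    params = {"status": status, "date_from": date_from, "date_to": date_to}
    (count,) = con.execute(COUNT_SQL, params).fetchone()
    con.close()
    return count


published = count_same_first_last("PUBLISHED", "2016-01-01", "2016-02-01")
unpublished = count_same_first_last("UNPUBLISHED", "2016-01-01", "2016-02-01")
```

Both counts come out as 1 for the full January range, as the question expects: news_id 10 for PUBLISHED and news_id 20 for UNPUBLISHED.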
NOT EXISTS to the rescue:
SELECT ff.id ,ff.news_id ,ff.status , ff.zdate AS startdate
, ll.zdate AS enddate
FROM newsflash ff
JOIN newsflash ll
ON ff.news_id = ll.news_id
AND ff.status = ll.status
AND ff.zdate < ll.zdate
AND NOT EXISTS (
SELECT * FROM newsflash nx
WHERE nx.news_id = ff.news_id
AND nx.zdate >= '2016-01-01' AND nx.zdate < '2016-02-01'
AND (nx.zdate < ff.zdate OR nx.zdate > ll.zdate)
)
ORDER BY ff.id
;

Count records with a criteria like "within days"

I have a table as below on sql.
OrderID Account OrderMethod OrderDate DispatchDate DispatchMethod
2145 qaz 14 20/3/2011 23/3/2011 2
4156 aby 12 15/6/2011 25/6/2011 1
I want to count all records that were reordered within 30 days of the dispatch date, where DispatchMethod is '2', OrderMethod is '12', and the reorder comes from the same Account.
Can all of this be achieved with one query, or do I need to create different tables and do it in stages, as I think I will have to do now? Can someone please help with a query?
Many thanks
T
Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By linking the two tables (which are essentially the same table with itself using aliases) you make sure only orders under the same account are counted.
The results from the join are further filtered based on the criteria you mentioned requiring only orders that have been placed within 30 days of the dispatch date of a previous order.
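A small runnable sketch of this self-join, using Python with an in-memory SQLite database, may make the behaviour easier to check. The table name orders, the column names, and all rows except the two adapted from the question are assumptions for illustration; julianday() differences stand in for DATEDIFF, and COUNT(DISTINCT ...) is a deliberate change so that a reorder matching several earlier orders is counted once.

```python
import sqlite3

REORDER_SQL = """
SELECT COUNT(DISTINCT re.order_id)
FROM orders AS re
JOIN orders AS orig
  ON orig.account = re.account
 AND orig.order_date < re.order_date
 -- reorder placed 0 to 30 days after the earlier order was dispatched
 AND julianday(re.order_date) - julianday(orig.dispatch_date) BETWEEN 0 AND 30
WHERE re.dispatch_method = 2
  AND re.order_method = 12
"""


def count_reorders(rows):
    """rows: (order_id, account, order_method, order_date, dispatch_date, dispatch_method)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER, account TEXT, order_method INTEGER, "
        "order_date TEXT, dispatch_date TEXT, dispatch_method INTEGER)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", rows)
    (count,) = con.execute(REORDER_SQL).fetchone()
    con.close()
    return count


orders = [
    (2145, "qaz", 14, "2011-03-20", "2011-03-23", 2),
    (2146, "qaz", 12, "2011-04-10", "2011-04-12", 2),  # 18 days after dispatch
    (2147, "qaz", 12, "2011-04-20", "2011-04-22", 2),  # matches two earlier orders
    (4156, "aby", 12, "2011-06-15", "2011-06-25", 1),
    (4157, "aby", 12, "2011-09-01", "2011-09-03", 2),  # 68 days after dispatch
    (5001, "zzz", 12, "2011-05-01", "2011-05-02", 2),  # no earlier order
]
reorder_count = count_reorders(orders)
```

With these rows the result is 2. A plain COUNT(*) over the same join would give 3, because order 2147 falls within 30 days of both earlier qaz orders.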
Totally possible with one query, though my SQL is a little stale..
select count(*) from table
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)
One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are within 30 days of the
-- original order's dispatch date (and not before it)
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) BETWEEN 0 AND 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12
You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate
Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?

SQL query to identify seasonal sales items

I need a SQL query that will identify seasonal sales items.
My table has the following structure -
ProdId WeekEnd Sales
234 23/04/09 543.23
234 30/04/09 12.43
432 23/04/09 0.00
etc
I need a SQL query that will return all ProdIds that have 26 consecutive weeks of zero sales. I am running SQL Server 2005. Many thanks!
Update: A colleague has suggested a solution using rank() - I'm looking at it now...
Here's my version:
DECLARE @NumWeeks int
SET @NumWeeks = 26
SELECT s1.ProdID, s1.WeekEnd, COUNT(*) AS ZeroCount
FROM Sales s1
INNER JOIN Sales s2
ON s2.ProdID = s1.ProdID
AND s2.WeekEnd >= s1.WeekEnd
AND s2.WeekEnd <= DATEADD(WEEK, @NumWeeks, s1.WeekEnd)
AND s2.Sales = 0 -- count only the zero-sales weeks in the window
WHERE s1.Sales > 0
GROUP BY s1.ProdID, s1.WeekEnd
HAVING COUNT(*) >= @NumWeeks
Now, this makes two critical assumptions: that there are no duplicate entries (only one row per product per week) and that new data is actually entered every week. With these assumptions taken into account, if we look at the 26 weeks after a non-zero sales week and find 26 weeks with zero sales, then we can deduce logically that they had to be 26 consecutive weeks.
Note that this will ignore products that had zero sales from the start; there has to be a non-zero week to anchor it. If you want to include products that had no sales since the beginning, then add the following line after 'WHERE s1.Sales > 0':
OR s1.WeekEnd = (SELECT MIN(WeekEnd) FROM Sales WHERE ProdID = s1.ProdID)
This will slow the query down a lot but guarantees that the first week of "recorded" sales will always be taken into account.
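As a rough cross-check of the window-counting idea, here is a variant that anchors on every zero-sales week instead of on a non-zero week, run from Python against an in-memory SQLite database. It is a sketch under the same assumptions (exactly one row per product per week, no missing weeks); the table and column names and the sample rows are invented, and the run length is shortened to 3 weeks so the data stays readable.

```python
import sqlite3

ZERO_RUN_SQL = """
SELECT DISTINCT s1.prod_id
FROM sales AS s1
JOIN sales AS s2
  ON s2.prod_id = s1.prod_id
 AND s2.week_end >= s1.week_end
 AND s2.week_end <= date(s1.week_end, :span)   -- window of :n weekly rows
 AND s2.sales = 0
WHERE s1.sales = 0
GROUP BY s1.prod_id, s1.week_end
HAVING COUNT(*) >= :n      -- every week in the window had zero sales
ORDER BY s1.prod_id
"""


def products_with_zero_run(rows, n_weeks):
    """Return product ids having at least n_weeks consecutive zero-sales weeks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (prod_id INTEGER, week_end TEXT, sales REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    span = "+{} days".format((n_weeks - 1) * 7)
    found = con.execute(ZERO_RUN_SQL, {"span": span, "n": n_weeks}).fetchall()
    con.close()
    return [prod_id for (prod_id,) in found]


weeks = ["2009-04-23", "2009-04-30", "2009-05-07", "2009-05-14", "2009-05-21"]
sales_234 = [5.0, 0.0, 0.0, 0.0, 7.0]   # three zero weeks in a row
sales_432 = [0.0, 0.0, 4.0, 0.0, 0.0]   # never more than two in a row
rows = [(234, w, s) for w, s in zip(weeks, sales_234)]
rows += [(432, w, s) for w, s in zip(weeks, sales_432)]

seasonal = products_with_zero_run(rows, 3)
```

Because the anchor week itself has zero sales, products with no sales from the very first recorded week are picked up without an extra OR clause. For the original requirement, n_weeks would be 26.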
SELECT DISTINCT
s1.ProdId
FROM (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s1
INNER JOIN (
SELECT
ProdId,
ROW_NUMBER() OVER (PARTITION BY ProdId ORDER BY WeekEnd) AS rownum,
WeekEnd
FROM Sales
WHERE Sales <> 0
) s2
ON s1.ProdId = s2.ProdId
AND s1.rownum + 1 = s2.rownum
AND DateAdd(WEEK, 26, s1.WeekEnd) < s2.WeekEnd; -- at least 26 zero-sales weeks in between