Aggregation Calculation for All Items with Data rolling from Some Previous Data Points - sql

How do I do aggregation calculations on data rolling from previous data points?
For example, I have a table of property value evaluations. Each data point covers a single property.
"Property Value Evaluation"
Date       Property  Value
1/5/2017   A         10
2/3/2017   B         8
2/20/2017  B         12
3/1/2017   A         9
4/10/2017  B         15
Assume that a property's value stays the same as at its last evaluation until it is evaluated again. For example, the value of Property A was 10 on 1/5, and it stayed 10 until 3/1, when it was re-evaluated as 9.
How can I build a report that shows the trend of the values of all the properties? That is, every data point of the report must include all the properties, such as:
"Value Trend of All Properties"
Date       Total  Average
1/5/2017   10     10
2/3/2017   18     9
2/20/2017  22     11
3/1/2017   21     10.5
4/10/2017  24     12
(Where Total is the sum of the values of Property A and Property B, and Average is the average value of these two.)
The problem is that the subquery that takes the Date and the Property as parameters and returns the last Value raises the following error, even though it is "select LAST_VALUE([Value]) ...", which I expected to return only 1 value:
"Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."

You can use window functions:
select date, sum(val),
avg(sum(val) * 1.0) over (order by date) as average
from t
group by date
order by date;
You can use window functions in conjunction with aggregation functions.

try this:
SELECT [Date],
Value+ISNULL(LAG(Value) OVER(order by (Select null) ),0) Total ,
(Value+ISNULL(LAG(Value) OVER(order by (Select null) ),0))*1.0/2 As Average
from Property_Evaluation

With the hints from the answers posted here by the others, I built the complete SQL statement as below.
SELECT sq1.Date
, sum(sq1.Value) Total
, avg(sq1.Value) Average
from (
select sq2.Date
, sq2.Property
, ISNULL(sq6.Value, (
select sq3.Value
From (
select sq6.Date
, sq6.Property
, isnull(sq6.Value, LAG(sq6.Value) OVER (partition by sq6.Property order by sq6.Date)) Value
from (
select isnull(sq4.Date, sq5.Date) Date
, isnull(sq4.Property, sq5.Property) Property
, sq4.Value Value
from (
select distinct sq2.Date Date
, sq2.Property Property
from [PropertyEvaluation]
) sq5
full outer join (
SELECT [Date] Date
, [Property] Property
, [Value] Value
FROM [PropertyEvaluation]
where [Property] = sq2.Property) sq4
on sq5.Date = sq4.Date
) sq6
) sq3
where sq3.Date = sq2.Date
and sq3.Property = sq2.Property
)
) Value
from (
select sq7.Date Date
, sq8.Property Property
from (
select distinct [Date] Date
from [PropertyEvaluation]) sq7
, (select distinct [Property] Property
from [PropertyEvaluation]) sq8
) sq2
left join (
SELECT [Date] Date
, [Property] Property
, [Value] Value
FROM [PropertyEvaluation]
) sq6
on sq2.Date = sq6.Date and sq2.Property = sq6.Property
) sq1
group by sq1.Date
order by 1
It returns the desired results:
Date       Total  Average
1/5/2017   10     10
2/3/2017   18     9
2/20/2017  22     11
3/1/2017   21     10.5
4/10/2017  24     12
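The same carry-forward logic can be sketched more compactly. This is only a sketch under assumptions that are not in the original post: SQL Server (for CROSS APPLY and TOP), a table variable named @PropertyEvaluation standing in for the real table, and ISO-formatted date literals. For every report date and every property it looks up the latest evaluation on or before that date, then aggregates per date.

```sql
-- Sample data from the question, loaded into a table variable (assumed name)
-- so that the sketch is self-contained.
DECLARE @PropertyEvaluation TABLE ([Date] date, [Property] varchar(10), [Value] int);
INSERT INTO @PropertyEvaluation VALUES
 ('2017-01-05', 'A', 10),
 ('2017-02-03', 'B', 8),
 ('2017-02-20', 'B', 12),
 ('2017-03-01', 'A', 9),
 ('2017-04-10', 'B', 15);

SELECT d.[Date],
       SUM(lv.[Value])       AS Total,
       AVG(lv.[Value] * 1.0) AS Average   -- * 1.0 avoids an integer average
FROM (SELECT DISTINCT [Date] FROM @PropertyEvaluation) AS d
CROSS JOIN (SELECT DISTINCT [Property] FROM @PropertyEvaluation) AS p
CROSS APPLY (
    -- Latest evaluation of this property on or before the report date.
    SELECT TOP (1) e.[Value]
    FROM @PropertyEvaluation AS e
    WHERE e.[Property] = p.[Property]
      AND e.[Date] <= d.[Date]
    ORDER BY e.[Date] DESC
) AS lv
GROUP BY d.[Date]
ORDER BY d.[Date];
```

On the sample data this should give the same totals and averages as listed above, with the average shown as a decimal. A property that has no evaluation yet on a given date, such as B on 1/5/2017, is left out of that date's total and average because CROSS APPLY returns no row for it.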

Related

Calculating a value in SQL using previous row's values and current row value

I am trying to recreate the following in SQL. The column 'Value at date of transaction' needs to be calculated; the values of the other columns can be queried directly. For the first row of each type, it is the current value plus the transaction. For each subsequent row of that type, it is the previous row's 'Value at date of transaction' plus the current row's transaction. This process starts over for each type. Is this possible to recreate in SQL Server?
Type  Current Value  Transaction  Date of transaction  Value at date of transaction
A     5              2            12/31/2001           7
A     5              -3           12/30/2001           4
A     5              -1           12/29/2001           3
A     5              6            12/28/2001           9
B     100            20           12/31/2001           120
B     100            -50          12/30/2001           70
B     100            -10          12/29/2001           60
B     100            30           12/28/2001           90
C     20             7            12/31/2001           27
C     20             -3           12/30/2001           24
The structure seems odd to me.
But you can use the window function sum() over()
Declare @YourTable Table ([Type] varchar(50),[Current Value] int,[Transaction] int,[Date of transaction] date)
Insert Into @YourTable Values
('A',5,2,'12/31/2001')
,('A',5,-3,'12/30/2001')
,('A',5,-1,'12/29/2001')
,('A',5,6,'12/28/2001')
,('B',100,20,'12/31/2001')
,('B',100,-50,'12/30/2001')
,('B',100,-10,'12/29/2001')
,('B',100,30,'12/28/2001')
,('C',20,7,'12/31/2001')
,('C',20,-3,'12/30/2001')
Select *
,[Value at date] = [Current Value]
+ sum([Transaction]) over (partition by [Type] order by [Date of transaction] desc)
from @YourTable
;with cte1
as (SELECT *,
/* Conditional ROW_NUMBER: for the latest row of each Type, add the Current Value (read from the preceding row via LAG) to the Transaction; all other rows contribute only the Transaction */
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Date_of_transaction DESC) = 1 then
LAG(Current_Value) OVER (PARTITION BY Type ORDER BY Date_of_transaction) + [Transaction]
else
[Transaction]
end as Base
FROM [Global_capturis_owner].[Book2]
)
select Type,
Current_Value,
[Transaction],
Date_of_transaction,
/* Windowed function to compute the Running Total*/
SUM(Base) OVER (PARTITION BY Type ORDER BY Date_of_transaction DESC) as RunningTotal
from cte1
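One detail worth knowing about running totals like the ones above: when a window has an ORDER BY but no explicit frame, SQL Server uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ordering values as peers. The sketch below uses a hypothetical table variable @Tx, in which two transactions share a date, to compare the default frame with an explicit ROWS frame.

```sql
-- Hypothetical sample: two transactions for type 'A' fall on the same date.
DECLARE @Tx TABLE ([Type] varchar(10), [Current Value] int,
                   [Transaction] int, [Date of transaction] date);
INSERT INTO @Tx VALUES
 ('A', 5,  2, '2001-12-31'),
 ('A', 5, -3, '2001-12-30'),
 ('A', 5, -1, '2001-12-30');

SELECT *,
       -- Default frame (RANGE): rows with the same date are peers and get the same sum.
       [Current Value] + SUM([Transaction]) OVER (
           PARTITION BY [Type]
           ORDER BY [Date of transaction] DESC) AS RangeTotal,
       -- Explicit ROWS frame: the sum advances one physical row at a time.
       [Current Value] + SUM([Transaction]) OVER (
           PARTITION BY [Type]
           ORDER BY [Date of transaction] DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsTotal
FROM @Tx;
```

With the default frame, both 12/30 rows get a RangeTotal of 3. With the ROWS frame each row gets its own running value, but the order among the tied rows is not deterministic unless the ORDER BY includes a tie-breaker column.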

See the distribution of secondary requests grouped by time interval in sql

I have the following table:
RequestId  Type  Date        ParentRequestId
1          1     2020-10-15  null
2          2     2020-10-19  1
3          1     2020-10-20  null
4          2     2020-11-15  3
For this example I am interested only in request types 1 and 2, to keep things simple. My task is to query a big database and see the distribution of the secondary requests by the number of days between each one and its parent request. So the result would look like:
Interval,Percentage
0-7 days,50 %
8-15 days,0 %
16-50 days, 50 %
So the first line of the expected result contains the request with id 2, and the third line contains the request with id 4, because their date differences fall into those intervals.
How to achieve this?
I'm using sql server 2014.
We like to see your attempts first, but it looks like you need to treat this table as two tables (parents and children), join them, and do a basic GROUP BY, made a bit fancier by grouping on a CASE expression.
WITH dateDiffs as (
/* perform our date calculations first, to get that out of the way */
SELECT
DATEDIFF(Day, parent.[Date], child.[Date]) as daysDiff,
1 as rowsFound
FROM (SELECT RequestID, [Date] FROM myTable WHERE Type = 1) parent
INNER JOIN (SELECT ParentRequestID, [Date] FROM myTable WHERE Type = 2) child
ON parent.requestID = child.parentRequestID
)
/* Now group and aggregate and enjoy your maths! */
SELECT
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end as myInterval,
sum(rowsFound) as totalFound,
(select sum(rowsFound) from dateDiffs) as totalRows,
1.0 * sum(rowsFound) / (select sum(rowsFound) from dateDiffs) * 100.00 as percentFound
FROM dateDiffs
GROUP BY
case when daysDiff between 0 and 7 then '0-7'
when daysDiff between 8 and 15 then '8-15'
when daysDiff between 16 and 50 THEN '16-50'
else '50+'
end;
This seems like basically a join and group by query:
with dates as (
select 0 as lo, 7 as hi, '0-7 days' as grp union all
select 8 as lo, 15 as hi, '8-15 days' union all
select 16 as lo, 50 as hi, '16-50 days'
)
select d.grp,
count(t.RequestId) as cnt, -- counts only matched rows, so empty groups show 0
count(t.RequestId) * 1.0 / sum(count(t.RequestId)) over () as ratio
from dates d left join
(t join
t tp
on tp.RequestId = t.ParentRequestId
)
on datediff(day, tp.date, t.date) between d.lo and d.hi
group by d.grp, d.lo
order by d.lo;
The only trick is generating all the date groups, so you have rows with zero values.

SQL - Value difference between specific rows

My query is as follows
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
SUM(Value) Value
FROM
f_Trans_GL
WHERE
Account = 228
GROUP BY
TimePeriod
And it returns
Period Value
---------------
201412 80
201501 20
201502 30
201506 50
201509 100
201509 100
I'd like to know the Value difference between rows where the period is 1 month apart. The calculation is [value period] - [value period-1].
The desired output is:
Period Value Calculated
-----------------------------------
201412 80 80 - null = 80
201501 20 20 - 80 = -60
201502 30 30 - 20 = 10
201506 50 50 - null = 50
201509 100 (100 + 100) - null = 200
This illustrates a second challenge, as the period needs to be evaluated if the year changes (the difference between 201501 and 201412 is one month).
And the third challenge being a duplicate Period (201509), in which case the sum of that period needs to be evaluated.
Any indicators on where to begin, if this is possible, would be great!
Thanks in advance
===============================
After I accepted the answer, I tailored this a little to suit my needs, the end result is:
WITH cte
AS (SELECT
ISNULL(CAST(TransactionID AS nvarchar), '_nullTransactionId_') + ISNULL(Description, '_nullDescription_') + CAST(Account AS nvarchar) + Category + Currency + Entity + Scenario AS UID,
LEFT(TimePeriod, 6) Period,
SUM(Value1) Value1,
CAST(LEFT(TimePeriod, 6) + '01' AS date) ord_date
FROM MyTestTable
GROUP BY LEFT(TimePeriod, 6),
TransactionID,
Description,
Account,
Category,
Currency,
Entity,
Scenario,
TimePeriod)
SELECT
a.UID,
a.Period,
--a.Value1,
ISNULL(a.Value1, 0) - ISNULL(b.Value1, 0) Periodic
FROM cte a
LEFT JOIN cte b
ON a.ord_date = DATEADD(MONTH, 1, b.ord_date)
ORDER BY a.UID
I have to get the new value (Periodic) for each UID. This UID must be determined as done here because the PK on the table won't work.
But the issue is that this returns many more rows than my table actually contains. If I don't add a GROUP BY and ORDER BY on UID (as done above), I can tell that the first result for each combination of UID and Period is correct, but the subsequent rows for that combination are not.
I'm not sure where to look for a solution. My guess is that the UID is the issue here, and that it will somehow iterate over the field. Any direction is appreciated.
As others pointed out, the first mistake is in the GROUP BY: you need to group by LEFT(TimePeriod, 6) instead of TimePeriod.
For the remaining calculation, try something like this:
;WITH cte
AS (SELECT LEFT(timeperiod, 6) Period,
Sum(value) Value,
Cast(LEFT(timeperiod, 6) + '01' AS DATE) ord_date
FROM f_trans_gl
WHERE account = 228
GROUP BY LEFT(timeperiod, 6))
SELECT a.period,
a.value,
a.value - Isnull(b.value, 0)
FROM cte a
LEFT JOIN cte b
ON a.ord_date = Dateadd(month, 1, b.ord_date)
If you are using SQL Server 2012 or later, this can also be done with the LAG analytic function.
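A minimal sketch of the LAG variant, assuming SQL Server 2012 or later. The table variable @f_Trans_GL and the day parts of the sample TimePeriod values are made up for illustration; only the monthly totals come from the question. Because LAG simply returns the previous row, the CASE checks that the previous row really is exactly one month earlier before subtracting it.

```sql
-- Hypothetical sample rows whose monthly totals match the question's output.
DECLARE @f_Trans_GL TABLE (TimePeriod char(8), Account int, Value int);
INSERT INTO @f_Trans_GL VALUES
 ('20141215', 228, 80), ('20150110', 228, 20), ('20150205', 228, 30),
 ('20150620', 228, 50), ('20150901', 228, 100), ('20150915', 228, 100);

WITH cte AS (
    SELECT LEFT(TimePeriod, 6) AS Period,
           CAST(LEFT(TimePeriod, 6) + '01' AS date) AS ord_date,
           SUM(Value) AS Value
    FROM @f_Trans_GL
    WHERE Account = 228
    GROUP BY LEFT(TimePeriod, 6)
)
SELECT Period,
       Value,
       Value - CASE
                   -- Subtract only when the previous row is exactly one month earlier.
                   WHEN LAG(ord_date) OVER (ORDER BY ord_date) = DATEADD(MONTH, -1, ord_date)
                   THEN LAG(Value) OVER (ORDER BY ord_date)
                   ELSE 0
               END AS Calculated
FROM cte
ORDER BY ord_date;
```

On these rows the Calculated column should come out as 80, -60, 10, 50 and 200, matching the desired output in the question. Compared with the self-join, this reads the grouped data once, at the cost of the extra CASE for gaps between periods.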
Using a derived table, you can join the data to itself to find rows that are in the preceding period. I have converted your Period to a Date value so you can use SQL Server's dateadd function to check for rows in the previous month:
;WITH cte AS
(
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
CAST(LEFT(TimePeriod,6) + '01' AS DATE) PeriodDate, -- first day of the period's month
SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
)
SELECT c1.Period,
c1.Value,
c1.Value - ISNULL(c2.Value,0) AS Calculation
FROM cte c1
LEFT JOIN cte c2
ON c1.PeriodDate = DATEADD(m,1,c2.PeriodDate)
Without cte, you can also try something like this
SELECT A.Period,A.Value,A.Value-ISNULL(B.Value,0) Calculated
FROM
(
SELECT LEFT(TimePeriod,6) Period,
DATEADD(M,-1,(CONVERT(date,LEFT(TimePeriod,6)+'01'))) PeriodDatePrev,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS A
LEFT OUTER JOIN
(
SELECT LEFT(TimePeriod,6) Period,
(CONVERT(date,LEFT(TimePeriod,6)+'01')) PeriodDate,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS B
ON (A.PeriodDatePrev = B.PeriodDate)
ORDER BY 1

Count how many first and last entries in given period of time are equal

Given a table structured like this:
id | news_id (fkey) | status      | date
1  | 10             | PUBLISHED   | 2016-01-10
2  | 20             | UNPUBLISHED | 2016-01-10
3  | 10             | UNPUBLISHED | 2016-01-12
4  | 10             | PUBLISHED   | 2016-01-15
5  | 10             | UNPUBLISHED | 2016-01-16
6  | 20             | PUBLISHED   | 2016-01-18
7  | 10             | PUBLISHED   | 2016-01-18
8  | 20             | UNPUBLISHED | 2016-01-20
9  | 30             | PUBLISHED   | 2016-01-20
10 | 30             | UNPUBLISHED | 2016-01-21
I'd like to count the distinct news items whose first and last status within a given period are equal (and also equal to the status given in the query).
So, for this table, a query from 2016-01-01 to 2016-02-01 would return:
1 (with WHERE status = 'PUBLISHED'), because news_id 10 had PUBLISHED in both its first row (2016-01-10) and its last row (2016-01-18)
1 (with WHERE status = 'UNPUBLISHED'), because news_id 20 had UNPUBLISHED in both its first and last rows
Notice that news_id = 30 does not appear in the results, because its first and last statuses differ.
I have done that using following query:
SELECT count(*) FROM
(
SELECT DISTINCT ON (news_id)
news_id, status as first_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date
) first
JOIN (
SELECT DISTINCT ON (news_id)
news_id, status as last_status
FROM news_events
where date >= '2015-11-12 15:01:56.195'
ORDER BY news_id, date DESC
) last
using (news_id)
where first_status = last_status
and first_status = 'PUBLISHED'
Now I have to translate the query for our internal Java framework, which unfortunately does not support subqueries except with EXISTS or NOT EXISTS. I was told to rewrite the query using an EXISTS clause (if that is possible) or to find another solution. I am, however, clueless. Could anyone help me do that?
Edit: As I am being told right now, the problem lies not with our framework but with Hibernate. If I understood correctly, "you cannot join an inner select in HQL" (?)
Not sure if this addresses your problem correctly, since it is more of a workaround. But consider the following:
News items need to be published before they can be "unpublished". So if you add 1 for each "published" and subtract 1 for each "unpublished", the balance will be positive (1, to be exact) if both the first and the last status are "published". It will be 0 if there are as many unpublished as published events, and negative if there are more unpublished than published (which logically cannot happen but may still arise, since the date threshold in the query can cut off an earlier "published" event).
You might use this query to find out:
SELECT SUM(CASE status WHEN 'PUBLISHED' THEN 1 ELSE -1 END) AS publishbalance
FROM news_events
WHERE date >= '2015-11-12 15:01:56.195'
GROUP BY news_id
First of all, subqueries are a substantial part of SQL. A framework forbidding their use is a bad framework.
However, "first" and "last" can be expressed with NOT EXISTS: where not exists an earlier or later entry for the same news_id and date range.
select count(*)
from mytable first
join mytable last on last.news_id = first.news_id
where first.date between @from and @to
and last.date between @from and @to
and not exists
(
select *
from mytable before_first
where before_first.news_id = first.news_id
and before_first.date < first.date
and before_first.date >= @from
)
and not exists
(
select *
from mytable after_last
where after_last.news_id = last.news_id
and after_last.date > last.date
and after_last.date <= @to
)
and first.status = @status
and last.status = @status;
NOT EXISTS to the rescue:
SELECT ff.id ,ff.news_id ,ff.status , ff.zdate AS startdate
, ll.zdate AS enddate
FROM newsflash ff
JOIN newsflash ll
ON ff.news_id = ll.news_id
AND ff.status = ll.status
AND ff.zdate < ll.zdate
AND NOT EXISTS (
SELECT * FROM newsflash nx
WHERE nx.news_id = ff.news_id
AND nx.zdate >= '2016-01-01' AND nx.zdate < '2016-02-01'
AND (nx.zdate < ff.zdate OR nx.zdate > ll.zdate)
)
ORDER BY ff.id
;

Dividing 2 numbers returns 0 [duplicate]

I'm trying to divide 2 counts in order to return a percentage.
The following query is returning 0:
select (
(select COUNT(*) from saxref..AuthCycle
where endOfUse is null and addDate >= '1/1/2014') /
(select COUNT(*) from saxref..AuthCycle
where addDate >= '1/1/2014')
) as Percentage
Should I be applying a cast?
It can be done more succinctly by moving the common condition to the where clause:
select sum(case when endOfUse is null then 1 end) * 100.0 / count(*) percentage
from saxref..AuthCycle
where addDate >= '1/1/2014'
Note that you don't need an ELSE 0 branch either, since SUM() ignores NULLs.
The issue is that you are dividing two int values. SQL Server derives the data type of the result from the data types used in the calculation, so int / int yields an int. So if you do this:
select 50/100 as result
the result 0.5 is truncated to 0, because integer division discards the fractional part.
If, however, you specify decimals:
select 50.0/100.0 as result
You would get 0.5 as a decimal, which you could multiply by 100 to get 50%.
So multiplying the counts by 1.0, which turns them into decimals, gives you the correct result:
select (
(select COUNT(*) from saxref..AuthCycle where endOfUse is null and addDate >= '1/1/2014')*1.0 /
(select COUNT(*) from saxref..AuthCycle where addDate >= '1/1/2014')*1.0
) as Percentage
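The effect can be checked in isolation. The sketch below assumes SQL Server and uses two made-up variables, @matching and @total, in place of the two COUNT(*) subqueries. It also shows NULLIF as one possible guard against a zero denominator, which the original query does not handle.

```sql
-- Hypothetical counts standing in for the two COUNT(*) subqueries.
DECLARE @matching int = 50, @total int = 100;

SELECT @matching / @total                               AS IntDivision,  -- int / int: the fraction is discarded
       @matching * 100.0 / @total                       AS Pct,          -- 100.0 forces decimal arithmetic
       CAST(@matching AS decimal(10, 2)) / @total * 100 AS PctWithCast,  -- explicit cast instead of * 1.0
       @matching * 100.0 / NULLIF(@total, 0)            AS PctSafe;      -- NULL instead of a divide-by-zero error
```

IntDivision comes back as 0, while the other three columns return 50 as a decimal value. Whether to multiply by a decimal literal or to cast explicitly is mostly a matter of taste; the cast makes the intended precision visible.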
I would do it differently, using two sums:
select sum
( case
when endOfUse is null and addDate >= '1/1/2014'
then 1
else 0
end
)
* 100.0 -- if you want the usual 0..100 range for percentages
/
sum
( case
when addDate >= '1/1/2014'
then 1
else 0
end
)
percentage
from saxref..AuthCycle