SQL Server Sum data between two dates group by date - sql

I have following data in my table:
eb |anz
05.03.2020 | 2
06.03.2020 | 3
07.03.2020 | 1
08.03.2020 | 9
09.03.2020 | 10
10.03.2020 | 2
11.03.2020 | 20
12.03.2020 | 25
Now I need to sum the values in specific range for each date.
For example, for "12.03.2020" I want to sum the values of the 12th, 11th, 10th and 9th of March. Additionally, I want to sum the four values before the 9th of March and divide the first sum by the second in the select.
So my calculation would be: (25+20+2+10)/(9+1+3+2) = 3.8
I would like to output the date and the calculated value for each date in table.
I tried to sum the first group for each date (in example 9th to 12th march) but the output is the same as the data in the table.
select
eb,
sum(anz)
from (select eb, count(*) as anz from myTable where eb != '' group by eb) tmp
where
convert(date, eb, 104) >= dateadd(day,-3,convert(datetime, eb, 104))
and convert(date, eb, 104) <= convert(date, eb, 104)
group by eb
order by convert(date, eb, 104)
It looks like the condition is being ignored. Do you have any advice for me?
Thanks a lot

Assuming the data is stored correctly as a date, you can use window functions:
select t.*,
(sum(anz) over (order by eb rows between 3 preceding and current row) /
sum(anz) over (order by eb rows between 7 preceding and 4 preceding)
)
from t;
Note that if anz is an integer, use * 1.0 / instead of / to avoid integer division.
Also, this assumes that you have data on each date.
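The two window frames can be checked against the sample data from the question. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; it needs SQLite 3.25 or newer for window functions), stores eb as ISO date text, and uses the frame 7 preceding to 4 preceding for the four older rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# eb is stored as ISO text so that ORDER BY sorts chronologically
conn.execute("CREATE TABLE t (eb TEXT, anz INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("2020-03-05", 2), ("2020-03-06", 3), ("2020-03-07", 1), ("2020-03-08", 9),
    ("2020-03-09", 10), ("2020-03-10", 2), ("2020-03-11", 20), ("2020-03-12", 25),
])

rows = conn.execute("""
    SELECT eb,
           SUM(anz) OVER (ORDER BY eb ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) * 1.0
           / SUM(anz) OVER (ORDER BY eb ROWS BETWEEN 7 PRECEDING AND 4 PRECEDING) AS ratio
    FROM t
    ORDER BY eb
""").fetchall()

ratios = dict(rows)
# (25+20+2+10) / (9+1+3+2) for the 12th; rows without four older rows get NULL
print(ratios["2020-03-12"], ratios["2020-03-05"])
```

Dates that have no rows in the older frame come back as NULL (None in Python), because SUM over an empty frame is NULL.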

Related

Apply SUM( where date between date1 and date2)

My table is currently looking like this:
+---------+---------------+------------+------------------+
| Segment | Product | Pre_Date | ON_Prepaid |
+---------+---------------+------------+------------------+
| RB | 01. Auto Loan | 2020-01-01 | 10645976180.0000 |
| RB | 01. Auto Loan | 2020-01-02 | 4489547174.0000 |
| RB | 01. Auto Loan | 2020-01-03 | 1853117000.0000 |
| RB | 01. Auto Loan | 2020-01-04 | 9350258448.0000 |
+---------+---------------+------------+------------------+
I'm trying to sum values of 'ON_Prepaid' over the course of 7 days, let's say from '2020-01-01' to '2020-01-07'.
Here is what I've tried
drop table if exists ##Prepay_summary_cash
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 1 following and 7 following),
[2W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 8 following and 14 following),
[3W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 15 following and 21 following),
[1M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 22 following and 30 following),
[1.5M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 31 following and 45 following),
[2M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 46 following and 60 following),
[3M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 61 following and 90 following),
[6M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 91 following and 181 following)
into ##Prepay_summary_cash
from ##Prepay1
Things should be fine if the dates are continuous; however, there are some missing days in 'Pre_Date' (you know banks don't work on Sundays, etc.).
So I'm trying to work on something like
[1W] = SUM(ON_Prepaid) over (where Pre_date between dateadd(d,1,Pre_date) and dateadd(d,7,Pre_date))
something like that. So if per se there's no record on 2020-01-05, the result should only sum the dates on the 1,2,3,4,6,7 of Jan 2020, instead of 1,2,3,4,6,7,8 (8 because of "rows 7 following"). Or for example I have missing records over the span of 30 days or something, then all those 30 should be summed as 0s. So 45 days should return only the value of 15 days.
I've tried looking up all over the forum and the answers did not suffice. Can you guys please help me out? Or link me to a thread which the problem had already been solved.
Thank you so much.
Things should be fine if the dates are continuous
Then make them continuous. Left join your real data (grouped up so it is one row per day) onto your calendar table (make one, or use a recursive cte to generate you a list of 360 dates from X hence) and your query will work out
WITH d as
(
SELECT *
FROM
(
SELECT *
FROM cal
CROSS JOIN
(SELECT DISTINCT segment s, product p FROM ##Prepay1) x
) c
LEFT JOIN ##Prepay1 p
ON
c.d = p.pre_date AND
c.s = p.segment AND
c.p = p.product
WHERE
c.d BETWEEN '2020-01-01' AND '2021-01-01' -- date range on c.d not c.pre_date
)
--use d.d/s/p not d.pre_date/segment/product in your query (sometimes the latter are null)
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by s, p order by d.d rows between 1 following and 7 following),
...
CAL is just a table with a single column of dates, one per day, no time, extending for n thousand days into the past/future
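Here is a minimal sketch of the calendar idea, run through Python's sqlite3 module rather than SQL Server (an assumption, as are the table name prepay, the small made-up amounts and the recursive CTE used as the calendar). Segment and product are left out to keep it short. 2020-01-05 is deliberately missing from the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prepay (pre_date TEXT, on_prepaid REAL)")
conn.executemany("INSERT INTO prepay VALUES (?, ?)", [
    ("2020-01-01", 10.0), ("2020-01-02", 20.0), ("2020-01-03", 30.0),
    ("2020-01-04", 40.0), ("2020-01-06", 60.0), ("2020-01-07", 70.0),
    ("2020-01-08", 80.0), ("2020-01-09", 90.0),
])  # no row for 2020-01-05

rows = conn.execute("""
    WITH RECURSIVE cal(d) AS (            -- one row per calendar day
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < '2020-01-09'
    )
    SELECT cal.d,
           SUM(p.on_prepaid) OVER (
               ORDER BY cal.d ROWS BETWEEN 1 FOLLOWING AND 7 FOLLOWING
           ) AS next_7_days
    FROM cal
    LEFT JOIN prepay p ON p.pre_date = cal.d   -- missing days become NULL rows
    ORDER BY cal.d
""").fetchall()

next_7 = dict(rows)
print(next_7["2020-01-01"])  # sums Jan 2 to Jan 8; the missing Jan 5 adds nothing
```

Because every day now has a row, "7 following" really means the next seven calendar days, and the NULL from the missing day is simply ignored by SUM.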
Note that months have a variable number of days, so 6M is a bit of a misnomer; it might be better to call the month ones 180D, 90D, etc.
Also note that your query performs a per-row division of your data into groups. If you want to sum up to 180 days after the date of the row, you need to pull a year's worth of data so that on row 180 (June) you have the December data available to sum (December being 6 months from June).
If you then want to restrict your query to only showing up to June (but including data summed from 6 months after June) you need to wrap it all again in a sub query. You cannot "where between jan and jun" in the query that does the sum over because where clauses are done before window clauses (doing so will remove the dec data before it is summed)
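The effect of filtering before versus after the window calculation can be seen on a tiny made-up table. This sketch uses Python's sqlite3 module (an assumption; the table daily and its values are hypothetical) and a two-row look-ahead instead of 180 days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (d TEXT, amount INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [
    ("2020-01-01", 1), ("2020-01-02", 2), ("2020-01-03", 3),
    ("2020-01-04", 4), ("2020-01-05", 5),
])

window = "SUM(amount) OVER (ORDER BY d ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)"

# WHERE in the same query: later rows are removed before the window is evaluated
filtered_first = conn.execute(
    f"SELECT d, {window} FROM daily WHERE d <= '2020-01-03' ORDER BY d"
).fetchall()

# WHERE in an outer query: the window sees all rows, then the result is trimmed
filtered_last = conn.execute(
    f"""SELECT d, next_2
        FROM (SELECT d, {window} AS next_2 FROM daily) AS s
        WHERE d <= '2020-01-03'
        ORDER BY d"""
).fetchall()

print(filtered_first)
print(filtered_last)
```

In the first result the sum for 2020-01-03 is NULL because the following rows were filtered away; in the second it still includes the 4th and 5th.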
Some other databases make this easier; Oracle and Postgres spring to mind. They can sum over a range where the other rows' values are within some distance of the current row's value. SQL Server only usefully supports distancing based on a row's index rather than its values (the distancing-based-on-values support is limited to "rows that have the same value", rather than "rows that have values n higher or lower than the current row"). I suppose the requirement could be met with a cross apply, or a correlated subquery in the select, though I'd be careful to check the performance:
SELECT *,
(SELECT SUM(tt.a) FROM x tt WHERE t.x = tt.x AND tt.y = t.y AND tt.z BETWEEN DATEADD(d, 1, t.z) AND DATEADD(d, 7, t.z)) AS [1W]
FROM
x t
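The correlated-subquery variant can be tried on a small data set as well. This is a sketch under assumptions: Python's sqlite3 module instead of SQL Server, SQLite's date() in place of DATEADD, a hypothetical table prepay with made-up amounts, and no segment or product columns. No calendar table is needed because the range is expressed on the date values themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prepay (pre_date TEXT, on_prepaid REAL)")
conn.executemany("INSERT INTO prepay VALUES (?, ?)", [
    ("2020-01-01", 10.0), ("2020-01-02", 20.0), ("2020-01-03", 30.0),
    ("2020-01-04", 40.0), ("2020-01-06", 60.0), ("2020-01-07", 70.0),
    ("2020-01-08", 80.0), ("2020-01-09", 90.0),
])  # no row for 2020-01-05

rows = conn.execute("""
    SELECT t.pre_date,
           (SELECT SUM(tt.on_prepaid)
            FROM prepay tt
            WHERE tt.pre_date BETWEEN date(t.pre_date, '+1 day')
                                  AND date(t.pre_date, '+7 days')) AS next_7_days
    FROM prepay t
    ORDER BY t.pre_date
""").fetchall()

next_7 = dict(rows)
print(next_7["2020-01-01"])  # Jan 2 to Jan 8 by date value, whatever rows exist
```

The last date has no later rows at all, so its sum is NULL rather than 0; wrap it in COALESCE if a zero is preferred.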

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference does not exceed 8 hours compared to its prev or next record (depending if the date is less or greater than filter date).
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is how my raw data looks like:
Id
Desc
EntryDate
1
Event type 1
2021-03-12 21:55:00.000
2
Event type 1
2021-03-12 01:10:00.000
3
Event type 1
2021-03-11 20:17:00.000
4
Event type 1
2021-03-11 05:04:00.000
5
Event type 1
2021-03-10 23:58:00.000
6
Event type 1
2021-03-10 11:01:00.000
7
Event type 1
2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note how the record with Id 7 is not included because the record with Id 6 does not comply:
Id
EntryDate
2
2021-03-12 01:10:00.000
3
2021-03-11 20:17:00.000
4
2021-03-11 05:04:00.000
5
2021-03-10 23:58:00.000
Need advice how to write this complex query
This is a variant of gaps-and-islands. Define the difference . . . and then groups based on the differences:
with e as (
select t.*
from (select t.*,
sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
from (select t.*,
lag(entrydate) over (order by entrydate) as prev_entrydate
from t
) t
) t
)
select e.*
from e
where e.grp in (select e2.grp
from e e2
where convert(date, e2.entrydate) = @filterdate
);
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
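To see the grouping at work on the sample rows from the question, here is a sketch using Python's sqlite3 module (an assumption; SQLite has no DATEADD, so the 8-hour test is written with julianday differences instead). A new group starts whenever the gap to the previous record is 8 hours or more, and every group that touches the filter date is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, entrydate TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2021-03-12 21:55:00"), (2, "2021-03-12 01:10:00"),
    (3, "2021-03-11 20:17:00"), (4, "2021-03-11 05:04:00"),
    (5, "2021-03-10 23:58:00"), (6, "2021-03-10 11:01:00"),
    (7, "2021-03-10 10:00:00"),
])

rows = conn.execute("""
    WITH lagged AS (
        SELECT id, entrydate,
               LAG(entrydate) OVER (ORDER BY entrydate) AS prev_entrydate
        FROM t
    ),
    e AS (
        -- running count of "gap of 8 hours or more" markers = group number
        SELECT id, entrydate,
               SUM(CASE WHEN (julianday(entrydate) - julianday(prev_entrydate)) * 24 < 8
                        THEN 0 ELSE 1 END) OVER (ORDER BY entrydate) AS grp
        FROM lagged
    )
    SELECT id, entrydate
    FROM e
    WHERE grp IN (SELECT grp FROM e WHERE date(entrydate) = ?)
    ORDER BY entrydate DESC
""", ("2021-03-11",)).fetchall()

ids = [row[0] for row in rows]
print(ids)
```

On this data the groups are {7, 6}, {5, 4}, {3, 2} and {1}; the filter date touches the second and third, which gives the four rows expected in the question.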
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)

Output number of occurrences of id in a table

PK Date ID
=== =========== ===
1 07/04/2017 22
2 07/05/2017 22
3 07/07/2017 03
4 07/08/2017 04
5 07/09/2017 22
6 07/09/2017 22
7 07/10/2017 05
8 07/11/2017 03
9 07/11/2017 03
10 07/11/2017 03
I want to count the number of dates on which each ID occurred in a given week/month, something like this.
ID Count
22 3 --> counted only once for 07/09/2017, even though it occurred twice on that date
03 2 --> same as above: increment only once regardless of how many times it occurred on the same date
04 1
05 1
I'm trying to implement this in a perl file, to output/print it in a csv file, I have no idea on what query will I execute.
Seems like a simple case of count distinct and group by:
SELECT Id, COUNT(DISTINCT [Date]) As [Count]
FROM TableName
WHERE [Date] >= @StartDate
AND [Date] <= @EndDate
GROUP BY Id
ORDER BY [Count] DESC
You can use COUNT with DISTINCT e.g.:
SELECT ID, COUNT(DISTINCT Date)
FROM table
GROUP BY ID;
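A quick check of COUNT(DISTINCT ...) against the sample rows, sketched with Python's sqlite3 module (an assumption; the table name visits and column name visit_date are hypothetical, and the dates are rewritten in ISO format so the range filter compares correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is kept as TEXT so the leading zeros of "03", "04", "05" survive
conn.execute("CREATE TABLE visits (pk INTEGER, visit_date TEXT, id TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    (1, "2017-07-04", "22"), (2, "2017-07-05", "22"), (3, "2017-07-07", "03"),
    (4, "2017-07-08", "04"), (5, "2017-07-09", "22"), (6, "2017-07-09", "22"),
    (7, "2017-07-10", "05"), (8, "2017-07-11", "03"), (9, "2017-07-11", "03"),
    (10, "2017-07-11", "03"),
])

counts = conn.execute("""
    SELECT id, COUNT(DISTINCT visit_date) AS cnt
    FROM visits
    WHERE visit_date BETWEEN ? AND ?
    GROUP BY id
    ORDER BY cnt DESC, id
""", ("2017-07-01", "2017-07-31")).fetchall()

print(counts)
```

The duplicate rows for ID 22 on 07/09 and for ID 03 on 07/11 each collapse to a single date, which matches the counts asked for in the question.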
You can read more about how to get the month from a date in "get month from a date" (it also works for year).
Your query will be :
select DATEPART(mm, [Date]) AS [month], COUNT(ID) AS [count] from table group by DATEPART(mm, [Date])
Hope that helped you.

How to identify and aggregate sequence from start and end dates

I'm trying to identify a consecutive sequence in dates, per person, as well as sum amount for that sequence. My records table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1400
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date and end_date, but I don't have a way to aggregate the amount. I'm assuming the answer will involve some sort of ROW_NUMBER() window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an OVER clause, but not ordinary functions. There are nevertheless ways to apply window function to this task. Here's a way to identify the sequences in your data (as presented):
SELECT
person
,MAX(sequence_start_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "sequence_start_date"
,MIN(sequence_end_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS "sequence_end_date"
,amount
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' >= start_date
THEN date '0001-01-01'
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' <= end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
order by person, start_date
) sq_part
ORDER BY person, sequence_start_date
That relies on MAX() and MIN() instead of COALESCE(), and it applies window framing to get the appropriate scope for each of those within each partition. Results:
person sequence_start_date sequence_end_date amount
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 500
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 100
1 October, 05 2015 00:00:00 October, 07 2015 00:00:00 2000
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 300
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 1000
3 April, 23 2015 00:00:00 April, 23 2015 00:00:00 900
Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (person, start_date) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.
And now you have a way to identify the sequences: they are characterized by the triple person, sequence_start_date, sequence_end_date. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:
SELECT
person,
sequence_start_date,
sequence_end_date,
SUM(amount) AS "amount"
FROM ( <above query> ) sq
GROUP BY person, sequence_start_date, sequence_end_date
Of course you need both dates as grouping columns if you're going to select them.
Why not:
select a1.person, a1.sequence_start_date, a1.sequence_end_date,
sum(rx.amount)
as amount
from (EXISTING_QUERY) a1
left join records rx
on rx.person = a1.person
and rx.start_date >= a1.sequence_start_date
and rx.end_date <= a1.sequence_end_date
group by a1.person, a1.sequence_start_date, a1.sequence_end_date

Open Ticket Count Per Day

I have a table that looks like this
id | Submit_Date | Close_Date
------------------------------
1 | 2015-02-01 | 2015-02-05
2 | 2015-02-02 | 2015-02-04
3 | 2015-02-03 | 2015-02-05
4 | 2015-02-04 | 2015-02-06
5 | 2015-02-05 | 2015-02-07
6 | 2015-02-06 | 2015-02-07
7 | 2015-02-07 | 2015-02-08
I can get a count of how many ticket were open on a particular day with this:
Select count(*) from tickets where '2015-02-05' BETWEEN Submit_Date and Close_Date
This gives me 4, but I need this count for each day of a month. I don't want to have to write 30 queries to handle this. Is there a way to capture broken down by multiple days?
I created a solution a while back using a mix of @Heinzi's solution and the trick from "Generate a resultset of incrementing dates in TSQL":
declare @dt datetime, @dtEnd datetime
set @dt = getdate()
set @dtEnd = dateadd(day, 100, @dt)
SELECT dates.myDate,
(SELECT COUNT(*)
FROM tickets
WHERE myDate BETWEEN Submit_Date and Close_Date
)
FROM
(select dateadd(day, number, @dt) mydate
from
(select distinct number from master.dbo.spt_values
where name is null
) n
where dateadd(day, number, @dt) < @dtEnd) dates
The code is reconstructed from memory; I don't have it in front of me, so there may be some typos.
First, you'll need a table that contains each date you want to check. You can use a temporary table for that. Let's assume that this table is called Dates_To_Check and has a field myDate:
SELECT myDate,
(SELECT COUNT(*)
FROM tickets
WHERE myDate BETWEEN Submit_Date and Close_Date)
FROM Dates_To_Check
Alternatively, you can create a huge table containing every possible date and use a WHERE clause to restrict the dates to those you are interested in.
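As a small illustration of the dates-table approach on the tickets from the question, here is a sketch using Python's sqlite3 module (an assumption; the table dates_to_check is filled from Python for the first eight days of February 2015, and column names are lower-cased):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, submit_date TEXT, close_date TEXT)")
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    (1, "2015-02-01", "2015-02-05"), (2, "2015-02-02", "2015-02-04"),
    (3, "2015-02-03", "2015-02-05"), (4, "2015-02-04", "2015-02-06"),
    (5, "2015-02-05", "2015-02-07"), (6, "2015-02-06", "2015-02-07"),
    (7, "2015-02-07", "2015-02-08"),
])

# One row per date that should be checked
conn.execute("CREATE TABLE dates_to_check (my_date TEXT)")
conn.executemany("INSERT INTO dates_to_check VALUES (?)",
                 [(f"2015-02-{day:02d}",) for day in range(1, 9)])

open_per_day = dict(conn.execute("""
    SELECT my_date,
           (SELECT COUNT(*)
            FROM tickets
            WHERE my_date BETWEEN submit_date AND close_date) AS open_tickets
    FROM dates_to_check
    ORDER BY my_date
""").fetchall())

print(open_per_day["2015-02-05"])
```

The count for 2015-02-05 agrees with the single-day query from the question.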
If you're on SQL Server 2012 or newer, you can do this with window functions and a small trick: add +1 on each submit date and -1 on the day after each close date, then take a running total of these amounts:
select distinct date, sum(opencnt) over (order by date) from (
select
Submit_Date as date,
1 as opencnt
from
tickets
union all
select
dateadd(day, 1, Close_Date),
-1
from
tickets
) TMP
The dateadd of 1 day is there so that a ticket still counts as open on its close date.
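The +1/-1 running total can be verified on the seven tickets from the question. This sketch uses Python's sqlite3 module in place of SQL Server (an assumption; date(close_date, '+1 day') stands in for dateadd, and the column names d and delta are made up for the example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, submit_date TEXT, close_date TEXT)")
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    (1, "2015-02-01", "2015-02-05"), (2, "2015-02-02", "2015-02-04"),
    (3, "2015-02-03", "2015-02-05"), (4, "2015-02-04", "2015-02-06"),
    (5, "2015-02-05", "2015-02-07"), (6, "2015-02-06", "2015-02-07"),
    (7, "2015-02-07", "2015-02-08"),
])

rows = conn.execute("""
    SELECT DISTINCT d, SUM(delta) OVER (ORDER BY d) AS open_count
    FROM (
        SELECT submit_date AS d, 1 AS delta FROM tickets         -- ticket opens
        UNION ALL
        SELECT date(close_date, '+1 day'), -1 FROM tickets       -- day after it closes
    ) AS events
    ORDER BY d
""").fetchall()

open_count = dict(rows)
print(open_count["2015-02-05"])
```

Only dates on which something changes appear in the result, so a day with no opening or closing event would need to be filled in from a dates table if every day must be listed.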
You could generate the list of dates and then retrieve the count for each date in your dateset.
The cte part generates the date list since the beginning of the year (an assumption) and the next part calculates the count from your data set.
with cte as
(select cast('2015-01-01' as date) dt -- you should change this part to the correct start date
union all
select dateadd(DD,1,dt) dt from cte
where dt<getdate()
)
select cte.dt, count(*)
from tickets
inner join cte
on cte.dt between Submit_Date and Close_Date
group by cte.dt
option (maxrecursion 0) -- the default limit of 100 recursion levels is too low for a long date range