SUM function and between two max dates - sql

I am trying to find the total amount between two dates, using the MAX function on the dates and grouping by ID.
The column 365date is the latest sessiondate minus 364 days.
Currently this query gives me the total amount, but I want the amount that falls between these two date columns (i.e. within 365 days).
This is my query:
SELECT
DATEADD(day, -364, (MAX(sessiondate))) AS 365date,
MAX(sessiondate)) AS lastdate,
SUM(Amount) AS amount,
ID
FROM
tablename
WHERE
date BETWEEN 365date AND lastdate
GROUP BY
MemberID
date       | LastDate   | Amount | ID | output amount (only last 365 days) | Total amount (all year)
29/07/2020 | 28/07/2021 | 100    | 1  | 1500                               | 63000
29/08/2020 | 28/07/2021 | 500    | 1  |
02/05/2020 | 28/07/2021 | 600    | 1  |
15/01/2020 | 28/07/2021 | 300    | 1  |
10/10/2000 | 28/07/2021 | 50000  | 1  |
10/10/1989 | 28/07/2021 | 10000  | 1  |
So I need to take MAX(lastdate) for this ID, which is 28/07/2021, subtract 365 days from it, then sum the amounts for all dates that fall within that window (29/07/2020, 29/08/2020, 02/05/2020, 15/01/2020) and show the result in the last-365-days column.
The Total amount (all year) column should sum every amount, regardless of the 365-day window.
Logic:
calculate date column: (MAX(date))-364
calculate lastdate column: Max(lastdate)
calculate last365 amount column: Sum (amount) Between (MAX(date))-364 and Max(lastdate)
calculate Total amount(all year): sum(amount)
I need only 1 row, the one that is highlighted. I am not sure what is wrong with the query.
Can someone please help with this?

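You can use a CTE. Column aliases such as 365date cannot be referenced in the WHERE clause of the same query, so I have converted your query to compute the per-ID date bounds in a CTE and join them back to the table.
-- Assumes the date column is sessiondate and the grouping key is ID
-- (the question uses both ID and MemberID, and both date and sessiondate).
WITH cte AS (
    -- per-ID window: latest session date and the date 364 days before it
    SELECT
        ID,
        DATEADD(day, -364, MAX(sessiondate)) AS [365date],
        MAX(sessiondate) AS lastdate
    FROM
        tablename
    GROUP BY
        ID
)
SELECT
    cte.ID,
    cte.[365date],
    cte.lastdate,
    -- only the rows inside the window count towards the 365-day amount
    SUM(CASE WHEN t.sessiondate BETWEEN cte.[365date] AND cte.lastdate
             THEN t.Amount ELSE 0 END) AS last365amount,
    SUM(t.Amount) AS totalamount
FROM
    tablename t
    JOIN cte ON cte.ID = t.ID
GROUP BY
    cte.ID, cte.[365date], cte.lastdate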
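To make the "last 365 days versus all time" logic concrete, here is a minimal sketch that avoids the join by using a window function instead. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; SQLite's date(x, '-364 days') stands in for DATEADD, the function name last_365_and_total is hypothetical, and the sample rows are made up for illustration (they are not the asker's data).

```python
import sqlite3

def last_365_and_total(rows):
    """rows: iterable of (ID, sessiondate as 'YYYY-MM-DD', Amount)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (ID INTEGER, sessiondate TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT ID,
               date(lastdate, '-364 days') AS date365,
               lastdate,
               -- conditional sum: only rows inside the 365-day window
               SUM(CASE WHEN sessiondate >= date(lastdate, '-364 days')
                        THEN Amount ELSE 0 END) AS last365amount,
               SUM(Amount) AS totalamount
        FROM (
            -- attach each ID's latest session date to every one of its rows
            SELECT t.*, MAX(sessiondate) OVER (PARTITION BY ID) AS lastdate
            FROM tablename AS t
        ) AS w
        GROUP BY ID, lastdate
        ORDER BY ID
        """
    ).fetchall()
    con.close()
    return result

# Hypothetical sample rows, chosen to exercise the window boundary.
SAMPLE = [
    (1, "2021-07-28", 100),
    (1, "2020-07-29", 200),   # exactly 364 days before the latest date: inside
    (1, "2020-07-28", 400),   # one day older: outside the window
    (1, "2000-10-10", 5000),
    (2, "2021-01-10", 7),
]

if __name__ == "__main__":
    for row in last_365_and_total(SAMPLE):
        print(row)
```

Because the filter lives inside a CASE expression rather than in WHERE, the same pass over the table yields both the windowed sum and the all-time total.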

Related

Find maximum number of days between consecutive events

I am trying to find the maximum number of days between any consecutive events for each company. I have a table events with fields company,eventid,date.
|eventid|company |date|
|1 | Company1 |2020-10-15|
|2 | Company2 |2018-03-22|
|3 | Company2 |2019-12-02|
|4 | Company3 |2021-01-02|
|5 | Company3 |2019-06-20|
|6 | Company1 |2018-07-21|
|7 | Company2 |2016-10-18|
|8 | Company2 |2017-04-12|
|9 | Company1 |2020-05-07|
|10| Company3 |2021-11-03|
I have managed to get a column of amount of days between each consecutive event:
select e1.company, e1.date, (e1.date - min(e2.date)) as daysbetween
from events e1 join events e2 on (e1.company=e2.company and e2.date > e1.date)
group by e1.company,e1.date;
This returns 10 results, but I only need the maximum from the daysbetween column for each company which would give 3 results. However since I used min() to get the daysbetween column I cannot use max() again on that column to find the maximum for each company.
I have been stuck on this for a few days now and cannot work out how I can find the maximum number of days between consecutive events for each company.
You can use the LEAD function.
I am giving a PostgreSQL example, since I noticed you tagged the question PostgreSQL.
Here is a CTE broken into steps to make it clear how it works:
with cte1 as (
    -- pair each event with the next event date for the same company
    select
        company, eventDate,
        LEAD(eventDate, 1) OVER (PARTITION BY company ORDER BY eventDate) as nextEventDate
    from tbl
),
cte2 as (
    -- gap in days between consecutive events (NULL for a company's last event)
    select
        company, nextEventDate - eventDate as daysBetweenEvents
    from cte1
),
cte3 as (
    -- largest gap per company
    select company, max(daysBetweenEvents) as maxDaysBetweenEvents
    from cte2
    group by company
)
select *
from cte3;
You can find the number of days between events for each company using lag function through a subquery, and then find the maximum number of days for each company in the main query.
Select Company, Max(daysbetween)
From
(Select Company, date - Lag(date) Over (Partition by Company Order by date) As daysbetween
From events) As T
Group by Company
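As a quick check of the LAG approach, the sketch below runs the same idea on the sample events from the question. It uses SQLite through Python's sqlite3 module so that it is self-contained; this is an assumption for illustration, since the question targets PostgreSQL. SQLite has no date subtraction operator, so julianday() differences stand in for date - Lag(date), and the function name max_gap_per_company is hypothetical.

```python
import sqlite3

def max_gap_per_company(events):
    """events: iterable of (eventid, company, date as 'YYYY-MM-DD')."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (eventid INTEGER, company TEXT, date TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT company, MAX(daysbetween) AS maxdays
        FROM (
            -- days since the previous event of the same company
            -- (NULL for each company's first event, which MAX ignores)
            SELECT company,
                   CAST(julianday(date)
                        - julianday(LAG(date) OVER (PARTITION BY company ORDER BY date))
                        AS INTEGER) AS daysbetween
            FROM events
        ) AS t
        GROUP BY company
        ORDER BY company
        """
    ).fetchall()
    con.close()
    return dict(rows)

# The sample rows from the question.
EVENTS = [
    (1, "Company1", "2020-10-15"),
    (2, "Company2", "2018-03-22"),
    (3, "Company2", "2019-12-02"),
    (4, "Company3", "2021-01-02"),
    (5, "Company3", "2019-06-20"),
    (6, "Company1", "2018-07-21"),
    (7, "Company2", "2016-10-18"),
    (8, "Company2", "2017-04-12"),
    (9, "Company1", "2020-05-07"),
    (10, "Company3", "2021-11-03"),
]

if __name__ == "__main__":
    print(max_gap_per_company(EVENTS))
```

The inner query does the per-row work and the outer query aggregates, which sidesteps the problem of nesting max() around min() in a single GROUP BY.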

group by value but only for continue value

OK, the title is far from obvious, I could not explain it better.
Let's consider the table with columns (date, xvalue, some other columns), what I need is to group them by xvalue but only when they are not interrupted considering time (column date), so for example, for:
Date |xvalue |yvalue|
1 Mar |10 |1 |
2 Mar |10 |2 |
3 Mar |20 |6 |
4 Mar |20 |1 |
5 Mar |10 |4 |
6 Mar |10 |2 |
From the above data, I would like to get three rows, for the first xvalue==10, for xvalue==20 and again for xvalue==10 and for each group aggregate of the other values, for example for sum:
1 Mar, 10, 3
3 Mar, 20, 7
5 Mar, 10, 6
It's like query:
select min(date), xvalue, sum(yvalue) from t group by xvalue
Except that the query above merges 1, 2, 5 and 6 March, and I want them kept as separate groups.
This is an example of a gaps-and-islands problem. But you need an ordering column. With such a column, you can use the difference of row numbers:
select min(date), xvalue, sum(yvalue)
from (select t.*,
row_number() over (partition by xvalue order by date) as seqnum_d,
row_number() over (order by date) as seqnum
from t
) t
group by xvalue, (seqnum - seqnum_d)
order by min(date)
Data in a database is logically stored in mathematical sets, inside which there is absolutely no order and no way to have a default ordering. They are comparable to bags in which objects can move around while in use.
So there is no way to answer your query until you add a specific column that gives the sort order the user needs.
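The difference-of-row-numbers technique described above can be tried on the question's sample data with the following sketch. It runs on SQLite via Python's sqlite3 module only to stay self-contained. The year 2021 is an assumption (the question gives only day and month), the date column is taken to be the ordering column, and the function name islands is hypothetical.

```python
import sqlite3

def islands(rows):
    """rows: iterable of (date as 'YYYY-MM-DD', xvalue, yvalue)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (date TEXT, xvalue INTEGER, yvalue INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    out = con.execute(
        """
        SELECT MIN(date) AS start_date, xvalue, SUM(yvalue) AS total
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY xvalue ORDER BY date) AS seqnum_d,
                   ROW_NUMBER() OVER (ORDER BY date) AS seqnum
            FROM t
        ) AS s
        -- within one uninterrupted run of xvalue, both counters advance
        -- together, so their difference is constant for that run
        GROUP BY xvalue, seqnum - seqnum_d
        ORDER BY start_date
        """
    ).fetchall()
    con.close()
    return out

# Sample data from the question, with an assumed year.
ROWS = [
    ("2021-03-01", 10, 1),
    ("2021-03-02", 10, 2),
    ("2021-03-03", 20, 6),
    ("2021-03-04", 20, 1),
    ("2021-03-05", 10, 4),
    ("2021-03-06", 10, 2),
]

if __name__ == "__main__":
    for row in islands(ROWS):
        print(row)
```

For xvalue 10 the differences are 0, 0, 2, 2, which is what splits 1-2 March from 5-6 March even though they share the same xvalue.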

PostgreSQL: How to write a query for this scenario

I have this below table.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 345 | 144.9 | 2015-09|
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
I need to write a query to show only the bill amount that has most recent transaction date (Date) like below.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
Please help me with how to do this.
MAX(Date) can be used if you want to display only the playid and the most recent date.
However, the issue with what you are trying to do is that you want to display all the columns, and this is where the ranking functions come into play. In this case you can use the row_number function like this:
SELECT PlayId, billid, amount, date
FROM
(
    SELECT
        PlayId, billid, amount, date,
        -- number the rows of each playid, newest date first
        row_number() over(partition by playid order by date desc) as rn
    FROM tablename
) t
where rn = 1
The row_number() over(partition by playid order by date desc) expression numbers the rows within each playid group, so the row numbered 1 is the one with the most recent date. Then you just need to filter on the row number being equal to 1.
Postgres offers distinct on. This is simpler to write and often has the best performance:
select distinct on (playid) t.*
from t
order by playid, date desc;
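The row_number approach can be checked on the question's two rows with the sketch below. It runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made to keep the example self-contained; SQLite has no DISTINCT ON, so only the row_number variant is shown). The function name latest_bill_per_player is hypothetical.

```python
import sqlite3

def latest_bill_per_player(rows):
    """rows: iterable of (playid, billid, amount, date as 'YYYY-MM')."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tablename (playid INTEGER, billid INTEGER, amount REAL, date TEXT)"
    )
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    out = con.execute(
        """
        SELECT playid, billid, amount, date
        FROM (
            SELECT playid, billid, amount, date,
                   -- rn = 1 marks the most recent row of each playid
                   ROW_NUMBER() OVER (PARTITION BY playid ORDER BY date DESC) AS rn
            FROM tablename
        ) AS t
        WHERE rn = 1
        ORDER BY playid
        """
    ).fetchall()
    con.close()
    return out

# The two rows from the question.
BILLS = [
    (123, 345, 144.9, "2015-09"),
    (123, 456, 200.0, "2015-10"),
]

if __name__ == "__main__":
    print(latest_bill_per_player(BILLS))
```

The 'YYYY-MM' strings sort correctly as text, which is why ORDER BY date DESC works here without a real date type.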

Data Summarization in Apache Pig/Apache Hive For Given Date Range

I have a requirement where I need to summarize data over a date range provided as input. To be more specific, if my data looks like:
Input:
Id|amount|date
1 |10 |2016-01-01
2 |20 |2016-01-02
3 |20 |2016-01-03
4 |20 |2016-09-25
5 |20 |2016-09-26
6 |20 |2016-09-28
And if I want the summarization for the month of September, then I need to calculate the count of records over 4 ranges:
1. Current Date, which is each day in September.
2. Week Start Date (the Sunday of the current date's week) to Current Date. E.g. if Current Date is 2016-09-28, then the week start date is 2016-09-25 and the count covers records between 2016-09-25 and 2016-09-28.
3. Month Start Date to Current Date, which is from 2016-09-01 to Current Date.
4. Year Start Date to Current Date, which is the record count from 2016-01-01 to Current Date.
So my output should have one record with 4 count columns for each day of the month (in this case, September), something like:
Output:
Current_Date|Current_date_count|Week_To_Date_Count|Month_to_date_Count|Year_to_date_count
2016-09-25 |1 |1 |1 |4
2016-09-26 |1 |2 |3 |5
2016-09-28 |1 |3 |3 |6
Important: I can pass only 2 variables, the range start date and the range end date. The rest of the calculation needs to be dynamic.
Thanks in advance.
You can join on year, then test each condition separately (using sum(if())):
select a.date,
       sum(if(a.date = b.date, 1, 0)) as current_date_count,
       sum(if(month(a.date) = month(b.date) and weekofyear(a.date) = weekofyear(b.date), 1, 0)) as week_to_date_count,
       sum(if(month(a.date) = month(b.date), 1, 0)) as month_to_date_count,
       count(*) as year_to_date_count
from (select * from input_table where date >= ${hiveconf:start} and date < ${hiveconf:end}) a,
     (select * from input_table where date < ${hiveconf:end}) b
where year(a.date) = year(b.date) and b.date <= a.date
group by a.date;
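To make the four ranges concrete, here is a minimal plain-Python sketch that computes the same counts directly from a list of record dates. It is not Hive or Pig code; it only illustrates the range definitions from the question, with the week starting on Sunday as the question requires. The function names and the half-open [start, end) convention are assumptions for illustration.

```python
from datetime import date, timedelta

def _count(record_dates, lo, hi):
    """Number of records whose date lies in the closed range [lo, hi]."""
    return sum(1 for d in record_dates if lo <= d <= hi)

def to_date_counts(record_dates, start, end):
    """For each day in [start, end) that has records, return
    (day, day_count, week_to_date, month_to_date, year_to_date)."""
    out = []
    for day in sorted({d for d in record_dates if start <= d < end}):
        # date.weekday(): Monday=0 ... Sunday=6, so this steps back to Sunday
        week_start = day - timedelta(days=(day.weekday() + 1) % 7)
        month_start = day.replace(day=1)
        year_start = day.replace(month=1, day=1)
        out.append((
            day.isoformat(),
            _count(record_dates, day, day),
            _count(record_dates, week_start, day),
            _count(record_dates, month_start, day),
            _count(record_dates, year_start, day),
        ))
    return out

# Record dates from the question's input table.
DATES = [
    date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 3),
    date(2016, 9, 25), date(2016, 9, 26), date(2016, 9, 28),
]

if __name__ == "__main__":
    for row in to_date_counts(DATES, date(2016, 9, 1), date(2016, 10, 1)):
        print(row)
```

Only the range start and end are passed in; the week, month and year boundaries are derived from each day, which matches the "only 2 variables" constraint.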

Grouping 6-hour Data for Individual Days with SQL Server

Using SQL Server 2000,
I have data that contains a datetime and a value, say DateData and OtherData. Data is collected over time in five minute intervals, so for example, DateData is an ordered DateTime from 2/1/2016 to 2/24/2015, with a 5 minute gap between each new data point.
I am currently trying to average the data so that I can grab the average OtherData value every 6 hours, for each day individually. So far, I have come up with the following SQL that groups all the data into 6-hour intervals, so I end up with averages over all days, rather than individual.
SELECT
DATEPART(hour, DateData)/6,
AVG(OtherData) AS AvgData
FROM DATASITE
GROUP BY DATEPART(hour, DateData)/6
--Result:
|Column1 | AvgData|
|0 | 11|
|1 | 12|
|2 | 13|
|3 | 14|
How would I change the above query to give averages for individual days, rather than all days combined?
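Add the date part of DateData to the GROUP BY (and to the SELECT list) so that the rows are grouped separately for each day:
-- Note: the date type requires SQL Server 2008 or later. On SQL Server 2000,
-- DATEADD(day, DATEDIFF(day, 0, DateData), 0) truncates to the day instead.
SELECT
cast(DateData as date) AS DataDay,
DATEPART(hour, DateData)/6 AS SixHourBlock,
AVG(OtherData) AS AvgData
FROM DATASITE
GROUP BY cast(DateData as date),
DATEPART(hour, DateData)/6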
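The effect of grouping by both the day and the 6-hour block can be seen in the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption made to keep the example runnable anywhere): date(DateData) replaces the cast to date and strftime('%H', ...) replaces DATEPART(hour, ...). The function name and the sample readings are hypothetical.

```python
import sqlite3

def six_hour_averages(rows):
    """rows: iterable of (DateData as 'YYYY-MM-DD HH:MM:SS', OtherData)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE DATASITE (DateData TEXT, OtherData REAL)")
    con.executemany("INSERT INTO DATASITE VALUES (?, ?)", rows)
    out = con.execute(
        """
        SELECT date(DateData) AS DataDay,
               -- integer division maps hours 0-5, 6-11, 12-17, 18-23 to 0..3
               CAST(strftime('%H', DateData) AS INTEGER) / 6 AS SixHourBlock,
               AVG(OtherData) AS AvgData
        FROM DATASITE
        GROUP BY date(DateData), CAST(strftime('%H', DateData) AS INTEGER) / 6
        ORDER BY DataDay, SixHourBlock
        """
    ).fetchall()
    con.close()
    return out

# Hypothetical readings spanning two days.
READINGS = [
    ("2016-02-01 00:00:00", 10.0),
    ("2016-02-01 05:55:00", 20.0),
    ("2016-02-01 06:00:00", 30.0),
    ("2016-02-02 01:00:00", 50.0),
]

if __name__ == "__main__":
    for row in six_hour_averages(READINGS):
        print(row)
```

Without the day in the GROUP BY, the block-0 readings from both days would be averaged together, which is the behaviour the question is trying to avoid.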