Cumulative Column that starts at Zero every Year - sql

I have a table Ratio with dates from January 1st, 2008 to today in the first column, ordered by date, and values in the second column.
I was able to build a cumulative third column with the following code, but I don't know how to make it restart every January 1st so that I get a cumulative total per year.
SELECT
t3.Date,
SUM(cumul) AS cumul
FROM (
SELECT
t1.Date,
t1.nb,
SUM(t2.nb) AS cumul
FROM (
SELECT
Ratio.Date,
SUM(DailyValue) AS Nb
FROM Ratio
GROUP BY Ratio.Date
)t1
INNER JOIN (
SELECT
Ratio.Date,
SUM(DailyValue) AS nb
FROM Ratio
GROUP BY Ratio.Date
) t2
ON t1.Date >= t2.Date
GROUP BY t1.Date, t1.nb
)t3
GROUP BY t3.Date, t3.nb
ORDER BY t3.Date

There is a better way: use SUM as a window function and partition by year, so that the running total restarts every January 1st:
select Date,
sum(sum(DailyValue)) over (
partition by year(date) order by date
) as cumul
from Ratio
group by Date
order by Date;
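To check that the partitioned running total really restarts at the year boundary, here is a minimal sketch that runs the same query shape on a few made-up rows. It assumes SQLite (through Python's sqlite3) as a stand-in engine, so strftime('%Y', ...) replaces year(); the sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratio (Date TEXT, DailyValue INTEGER)")
conn.executemany(
    "INSERT INTO Ratio VALUES (?, ?)",
    [
        ("2008-12-30", 1),
        ("2008-12-31", 2),
        ("2008-12-31", 3),   # two rows on the same day are summed first
        ("2009-01-01", 10),  # new year: the running total starts over here
        ("2009-01-02", 5),
    ],
)

# Same shape as the window query; strftime('%Y', ...) is SQLite's YEAR().
yearly_cumul = conn.execute(
    """
    SELECT Date,
           SUM(SUM(DailyValue)) OVER (
               PARTITION BY strftime('%Y', Date) ORDER BY Date
           ) AS cumul
    FROM Ratio
    GROUP BY Date
    ORDER BY Date
    """
).fetchall()

for row in yearly_cumul:
    print(row)
```

The inner SUM collapses each day to one value (the GROUP BY), and the outer SUM ... OVER accumulates those daily totals within each year.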

Related

Hive - max (rather than last) date in quarter

I'm querying a table and only want to select the end-of-quarter dates. I've done so like this:
select
yyyy_mm_dd,
id
from
t1
where
yyyy_mm_dd = cast(date_add(trunc(add_months(yyyy_mm_dd,3-pmod(month(yyyy_mm_dd)-1,3)),'MM'),-1) as date) --last day of q
With daily rows, from 2020-01-01 until 2020-12-31, the above works fine. However, 2021 rows end up being omitted as the quarter is incomplete. How could I modify the where clause so I select the last day of each quarter and the max date in the current quarter?
You can number the rows within each quarter in descending order of date and keep only the rows whose row number is 1 (the last date present in each quarter):
select yyyy_mm_dd, id
from
(select
yyyy_mm_dd,
id,
row_number() over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd) order by yyyy_mm_dd desc) as rn
from
t1
) t2
where rn = 1
It is not clear if you have multiple rows on the end-of-quarter dates. It might be safer to take the max and use that:
select t1.*
from (select t1.*,
max(yyyy_mm_dd) over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd)) as max_yyyy_mm_dd
from t1
) t1
where yyyy_mm_dd = max_yyyy_mm_dd;
Note that this uses t1.* for the select. If you only wanted the maximum date, you can aggregate:
select id, max(yyyy_mm_dd)
from t1
group by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd);
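As a quick check that the row-number approach also keeps the latest date of an incomplete quarter, here is a small sketch on invented rows. It assumes SQLite via Python's sqlite3 rather than Hive, so quarter() is emulated with integer division on the month number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        ("2020-12-30", 1),
        ("2020-12-31", 1),  # complete quarter: the calendar quarter end
        ("2021-01-15", 1),
        ("2021-02-03", 1),  # incomplete quarter: latest date loaded so far
    ],
)

# (month - 1) / 3 is integer division in SQLite and yields 0..3,
# which plays the role of Hive's quarter() as a partition key.
last_per_quarter = conn.execute(
    """
    SELECT yyyy_mm_dd, id
    FROM (
        SELECT yyyy_mm_dd, id,
               ROW_NUMBER() OVER (
                   PARTITION BY id,
                                strftime('%Y', yyyy_mm_dd),
                                (CAST(strftime('%m', yyyy_mm_dd) AS INTEGER) - 1) / 3
                   ORDER BY yyyy_mm_dd DESC
               ) AS rn
        FROM t1
    ) AS t2
    WHERE rn = 1
    ORDER BY yyyy_mm_dd
    """
).fetchall()

print(last_per_quarter)
```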

Filter SQL Server Records by Latest Date on Every Year

How would I filter this SQL Server table so that only the green records are left, i.e. the last recorded date in every year for each Customer ID?
If you want to get the rows, not only the date values, using ROW_NUMBER() is an option (you only need to use the appropriate PARTITION BY and ORDER BY clauses):
SELECT *
FROM (
SELECT
CustomerId,
[Date],
ROW_NUMBER() OVER (PARTITION BY CustomerId, YEAR([Date]) ORDER BY [Date] DESC) AS Rn
FROM YourTable
) t
WHERE Rn = 1
To find the maximum date in each year, you can also select the rows for which no other row exists with a later date for the same customer in the same year, as follows:
SELECT *
FROM yourtable t1
WHERE NOT EXISTS
(SELECT 1
FROM yourtable t2
WHERE t1.customerID = t2.customerID
AND t2.date > t1.date
AND DATEPART(YEAR, t1.date) = DATEPART(YEAR, t2.date))
If you have only two columns, then you can just use aggregation:
select customer_id, max(date)
from t
group by customer_id, year(date);
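The NOT EXISTS filter and the plain aggregation should agree on which dates survive. A minimal sketch on invented rows, assuming SQLite via Python's sqlite3 instead of SQL Server (so strftime('%Y', ...) stands in for YEAR() and DATEPART):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (CustomerId INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        (1, "2019-03-01"), (1, "2019-11-20"), (1, "2020-06-05"),
        (2, "2019-07-07"), (2, "2020-01-02"), (2, "2020-09-09"),
    ],
)

# Keep a row only if no later row exists for the same customer and year.
via_not_exists = conn.execute(
    """
    SELECT t1.CustomerId, t1.Date
    FROM YourTable AS t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM YourTable AS t2
        WHERE t2.CustomerId = t1.CustomerId
          AND strftime('%Y', t2.Date) = strftime('%Y', t1.Date)
          AND t2.Date > t1.Date
    )
    ORDER BY t1.CustomerId, t1.Date
    """
).fetchall()

# Aggregation gives the same dates when only these two columns are needed.
via_group_by = conn.execute(
    """
    SELECT CustomerId, MAX(Date)
    FROM YourTable
    GROUP BY CustomerId, strftime('%Y', Date)
    ORDER BY 1, 2
    """
).fetchall()

print(via_not_exists)
```

The NOT EXISTS form returns whole rows, so it is the one to use when the table has more columns than the two shown here.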

Selecting max date of each month

I have a table with a lot of cumulative columns; these columns reset to 0 at the end of each month. If I sum this data, I'll end up double counting. Instead, with Hive, I'm trying to select the max date of each month.
I've tried this:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable
WHERE
yyyy_mm_dd = last_day(yyyy_mm_dd)
mytable has daily data from the start of the year. In the output of the above, I only see the last date for January but not February. How can I select the last day of each month?
February is not over yet. Perhaps a window function does what you want:
SELECT yyyy_mm_dd, id, name, cumulative_metric1, cumulative_metric2
FROM (SELECT t.*,
MAX(yyyy_mm_dd) OVER (PARTITION BY last_day(yyyy_mm_dd)) as last_yyyy_mm_dd
FROM mytable t
) t
WHERE yyyy_mm_dd = last_yyyy_mm_dd;
This selects the last day present in the data for each month, even when the month is not over yet.
Alternatively, use a correlated subquery and the month() function in Hive:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable t1
WHERE
yyyy_mm_dd = (select max(t2.yyyy_mm_dd) from mytable t2 where
month(t1.yyyy_mm_dd) = month(t2.yyyy_mm_dd))
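To see the difference between "calendar month end" and "last date present in the data", here is a small sketch on invented rows where February is still in progress. It assumes SQLite via Python's sqlite3 rather than Hive, so last_day() is emulated with date modifiers and the partition key is the year-month string; the table is reduced to three columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (yyyy_mm_dd TEXT, id INTEGER, cumulative_metric1 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2021-01-30", 1, 290),
        ("2021-01-31", 1, 300),
        ("2021-02-01", 1, 5),
        ("2021-02-10", 1, 60),  # February has no month-end row yet
    ],
)

# Equivalent of "yyyy_mm_dd = last_day(yyyy_mm_dd)": calendar month ends only.
calendar_month_end = conn.execute(
    """
    SELECT yyyy_mm_dd FROM mytable
    WHERE yyyy_mm_dd = date(yyyy_mm_dd, 'start of month', '+1 month', '-1 day')
    ORDER BY yyyy_mm_dd
    """
).fetchall()

# Window MAX per month: the latest date that actually exists in the data.
last_loaded_per_month = conn.execute(
    """
    SELECT yyyy_mm_dd, cumulative_metric1
    FROM (
        SELECT t.*,
               MAX(yyyy_mm_dd) OVER (
                   PARTITION BY strftime('%Y-%m', yyyy_mm_dd)
               ) AS last_yyyy_mm_dd
        FROM mytable AS t
    ) AS t
    WHERE yyyy_mm_dd = last_yyyy_mm_dd
    ORDER BY yyyy_mm_dd
    """
).fetchall()

print(calendar_month_end)
print(last_loaded_per_month)
```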

Calculating an AVG of COUNT by MONTH

I'm trying to display the average number of counts/records for each month in 2016. The following code does not display each month, rather displays only the monthly average for 2016:
SELECT AVG(DISTINCT DayCnt) AS AvgCnt
FROM
(
SELECT COUNT(*) As DayCnt
FROM table
WHERE YEAR(Insert_Date) = '2016'
GROUP BY MONTH(Insert_Date)
)
AS AvgCnt
First group the rows by month and day and count the records inserted on each day; then average those daily counts per month:
SELECT monthGroup, AVG(DayCnt) AS AvgCnt
FROM
(
SELECT MONTH(Insert_Date) monthGroup, DAY(Insert_Date) dayGroup, COUNT(*) As DayCnt
FROM table
WHERE YEAR(Insert_Date) = '2016'
GROUP BY MONTH(Insert_Date), DAY(Insert_Date)
)
AS AvgCnt
GROUP BY monthGroup
I'm not sure, since you didn't post the data, but if I understand correctly this should work for you:
SELECT AVG(DayCnt) AS AvgCnt, mth
FROM
(
SELECT COUNT(*) As DayCnt, MONTH(Insert_Date) as mth
FROM table
WHERE YEAR(Insert_Date) = '2016'
GROUP BY MONTH(Insert_Date), DAY(Insert_Date) )
AS AvgCnt
GROUP BY mth
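A minimal sketch of the two-level grouping on invented rows, assuming SQLite via Python's sqlite3 (strftime replaces MONTH(), DAY() and YEAR()) and a hypothetical table name inserts, since table itself is a reserved word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inserts (Insert_Date TEXT)")  # hypothetical table name
rows = (
    [("2016-01-01",)] * 2    # 2 records on Jan 1
    + [("2016-01-02",)] * 4  # 4 records on Jan 2
    + [("2016-02-10",)] * 1  # 1 record on Feb 10
    + [("2015-01-01",)] * 5  # other year, filtered out
)
conn.executemany("INSERT INTO inserts VALUES (?)", rows)

# Inner query: one count per (month, day). Outer query: average per month.
avg_daily_count = conn.execute(
    """
    SELECT mth, AVG(DayCnt) AS AvgCnt
    FROM (
        SELECT strftime('%m', Insert_Date) AS mth, COUNT(*) AS DayCnt
        FROM inserts
        WHERE strftime('%Y', Insert_Date) = '2016'
        GROUP BY strftime('%m', Insert_Date), strftime('%d', Insert_Date)
    ) AS daily
    GROUP BY mth
    ORDER BY mth
    """
).fetchall()

print(avg_daily_count)
```

Note that this averages only over days that have at least one record; days with no inserts do not appear in the inner result and so do not pull the average down.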

Referring to a field outside of a subselect's scope

I'm working on a piece of SQL at the moment and I need to retrieve every row of a dataset together with a median and an average computed over its month.
Example
I have the following set:
ID;month;value
and I would like to retrieve something like:
ID;month;value;average for this month;median for this month
without having to group my result.
So it would be something like:
SELECT ID,month,value,
(SELECT AVG(value) FROM myTable) as "myAVG"
FROM myTable
But I would need that average to be the average for that month specifically, so rows where month = "January" would have the average and median for January, and so on.
The issue is that I did not find a way to refer to the value of month in my subquery:
(SELECT AVG(value) FROM myTable)
Does someone have a clue?
P.S.: It's a Redshift database I'm working on.
You would need to select all rows from the table, and do a left join with a select statement that does group by month. This way, you would get every row, and the group by results with them for that month.
Something like this:
SELECT * FROM myTable a
LEFT JOIN
(
SELECT Month, SUM(value) AS mySum
FROM myTable
GROUP BY Month
) b
ON a.Month = b.Month
Helpful?
with myavg as
(SELECT month, AVG(value) as avgval FROM myTable group by month)
, mymed as
(select month, median(value) as medval from myTable group by month)
select m.ID, m.month, m.value, ma.avgval, mm.medval
from myTable m left join myavg ma
on m.month = ma.month
left join mymed mm
on m.month = mm.month
You can use a CTE to do this. However, you need a GROUP BY on month inside it, as you are calculating an aggregate value.
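In Redshift you can also use a window function, which needs neither a join nor a GROUP BY. Partitioning by month without a frame clause gives every row the average of its whole month:
select ID, month, value,
avg(value) over
(PARTITION BY month) as month_avg
from myTable
order by 1;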
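On the original question of how to refer to the outer month inside the subquery: giving the two uses of the table different aliases is enough. Here is a minimal sketch on invented rows, assuming SQLite via Python's sqlite3 as a stand-in for Redshift; it compares the correlated subquery with a window AVG. The median is left out because SQLite has no built-in median function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, month TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, "January", 10), (2, "January", 20), (3, "February", 7)],
)

# Correlated subquery: alias b is the inner scan, alias a is the outer row.
via_subquery = conn.execute(
    """
    SELECT a.ID, a.month, a.value,
           (SELECT AVG(b.value) FROM myTable AS b WHERE b.month = a.month) AS myAVG
    FROM myTable AS a
    ORDER BY a.ID
    """
).fetchall()

# Window function: same result, one pass, no subquery.
via_window = conn.execute(
    """
    SELECT ID, month, value,
           AVG(value) OVER (PARTITION BY month) AS myAVG
    FROM myTable
    ORDER BY ID
    """
).fetchall()

print(via_window)
```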