Selecting max date of each month - sql

I have a table with a lot of cumulative columns; these columns reset to 0 at the end of each month. If I sum this data, I'll end up double counting. Instead, with Hive, I'm trying to select the max date of each month.
I've tried this:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable
WHERE
yyyy_mm_dd = last_day(yyyy_mm_dd)
mytable has daily data from the start of the year. In the output of the above, I only see the last date for January but not February. How can I select the last day of each month?

February is not over yet. Perhaps a window function does what you want:
SELECT yyyy_mm_dd, id, name, cumulative_metric1, cumulative_metric2
FROM (SELECT t.*,
MAX(yyyy_mm_dd) OVER (PARTITION BY last_day(yyyy_mm_dd)) as last_yyyy_mm_dd
FROM mytable t
) t
WHERE yyyy_mm_dd = last_yyyy_mm_dd;
This returns the rows for the latest date present in the data for each month, so an incomplete month such as February is still included.
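If you want to try the idea without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module. It is only an approximation: the sample rows are made up, the table is trimmed to three columns, and SQLite has no last_day(), so strftime('%Y-%m', ...) is used as the per-month partition key instead. It also assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: January is complete, February stops on the 17th.
rows = [
    ("2020-01-30", 1, 90),
    ("2020-01-31", 1, 100),
    ("2020-02-01", 1, 5),
    ("2020-02-17", 1, 40),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (yyyy_mm_dd TEXT, id INTEGER, cumulative_metric1 INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# strftime('%Y-%m', ...) stands in for Hive's last_day() as the month key.
query = """
SELECT yyyy_mm_dd, id, cumulative_metric1
FROM (SELECT t.*,
             MAX(yyyy_mm_dd) OVER (
                 PARTITION BY strftime('%Y-%m', yyyy_mm_dd)
             ) AS last_yyyy_mm_dd
      FROM mytable t) AS sub
WHERE yyyy_mm_dd = last_yyyy_mm_dd
ORDER BY yyyy_mm_dd
"""
last_rows = conn.execute(query).fetchall()
for row in last_rows:
    print(row)
```

With this sample, the query keeps 2020-01-31 for January and 2020-02-17 for the unfinished February.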

Use a correlated subquery with Hive's date functions:
SELECT
yyyy_mm_dd,
id,
name,
cumulative_metric1,
cumulative_metric2
FROM
mytable t1
WHERE
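-- the subquery must be parenthesized; comparing the year as well keeps months from different years apart
yyyy_mm_dd = (select max(t2.yyyy_mm_dd) from mytable t2 where
year(t1.yyyy_mm_dd) = year(t2.yyyy_mm_dd) and
month(t1.yyyy_mm_dd) = month(t2.yyyy_mm_dd))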

Related

Hive - max (rather than last) date in quarter

I'm querying a table and only want to select the end-of-quarter dates. I've done so like this:
select
yyyy_mm_dd,
id
from
t1
where
yyyy_mm_dd = cast(date_add(trunc(add_months(yyyy_mm_dd,3-pmod(month(yyyy_mm_dd)-1,3)),'MM'),-1) as date) --last day of q
With daily rows, from 2020-01-01 until 2020-12-31, the above works fine. However, 2021 rows end up being omitted as the quarter is incomplete. How could I modify the where clause so I select the last day of each quarter and the max date in the current quarter?
You can assign a row number within each quarter in descending order of date, and keep only the rows whose row number equals 1 (the last date in each quarter):
select yyyy_mm_dd, id
from
(select
yyyy_mm_dd,
id,
row_number() over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd) order by yyyy_mm_dd desc) as rn
from
t1
) t2
where rn = 1
It is not clear if you have multiple rows on the end-of-quarter dates. It might be safer to take the max and use that:
select t1.*
from (select t1.*,
max(yyyy_mm_dd) over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd)) as max_yyyy_mm_dd
from t1
) t1
where yyyy_mm_dd = max_yyyy_mm_dd;
Note that this uses t1.* for the select. If you only wanted the maximum date, you can aggregate:
select id, max(yyyy_mm_dd)
from t1
group by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd);
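To see how the aggregate version behaves with an incomplete quarter, here is a small sketch in Python with the built-in sqlite3 module. The data is invented, and because SQLite has no year() or quarter() functions, the quarter is derived from the month number with integer division; treat that expression as a stand-in for the Hive functions, not as Hive syntax.

```python
import sqlite3

# Hypothetical sample data: Q4 2020 is complete, Q1 2021 is still in progress.
rows = [
    ("2020-12-30", 1),
    ("2020-12-31", 1),
    ("2021-01-15", 1),
    ("2021-02-03", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, id INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", rows)

# (month + 2) / 3 with integer division maps months 1-3 to 1, 4-6 to 2, and so on.
query = """
SELECT id, MAX(yyyy_mm_dd) AS last_date
FROM t1
GROUP BY id,
         CAST(strftime('%Y', yyyy_mm_dd) AS INTEGER),
         (CAST(strftime('%m', yyyy_mm_dd) AS INTEGER) + 2) / 3
ORDER BY last_date
"""
quarter_ends = conn.execute(query).fetchall()
for row in quarter_ends:
    print(row)
```

The completed quarter yields its true last day (2020-12-31), while the current quarter yields the latest date loaded so far (2021-02-03).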

Oracle SQL Accumulated value for the date

I have a table with 3 columns: id, date and amount, but I would like to get accumulated SUM for each date (Last column).
Do you have an easy solution how to add this column?
I am trying with this:
SELECT date, sum(amount) as accumulated
FROM table group by date
WHERE max(date);
Should I use OVER() for this?
Use a window function to get the total for each day:
SELECT date,
amount,
sum(amount) over (partition by date) as accumulated
FROM the_table;
However, this will only work if all your dates have the same time part (in Oracle, a DATE column also contains a time). To ignore the time part, use trunc(), which normalizes every time part to 00:00:00:
SELECT date,
amount,
sum(amount) over (partition by trunc(date)) as accumulated
FROM the_table;
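The effect of truncating the time part can be checked with a quick sketch in Python's sqlite3 module. This is not Oracle: the rows are made up, the date column is called ts here to avoid the reserved word, and SQLite's date() function plays the role of Oracle's trunc(). It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: two entries on 1 March at different times, one on 2 March.
rows = [
    (1, "2020-03-01 08:00:00", 10),
    (2, "2020-03-01 17:30:00", 5),
    (3, "2020-03-02 09:15:00", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, ts TEXT, amount INTEGER)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)

# date(ts) drops the time part, so both 1 March rows land in the same partition.
query = """
SELECT id, ts, amount,
       SUM(amount) OVER (PARTITION BY date(ts)) AS accumulated
FROM the_table
ORDER BY id
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Partitioning by the raw ts value instead would put each row in its own partition, and accumulated would simply repeat amount.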
Use this:
SELECT T.ID, T.DATE, T.AMOUNT, (SELECT SUM(S.AMOUNT) FROM TABLE S WHERE S.DATE=T.DATE) ACCUMULATED
from
table T
This will give you the records from the table with a sum for all records for the date.

Referring to a field outside a subselect's scope

I'm working on a piece of SQL at the moment and I need to retrieve every row of a dataset with a median and an average aggregated in it.
Example
I have the following set:
ID;month;value
and I would like to retrieve something like:
ID;month;value;average for this month;median for this month
without having to group my result.
So it would be something like:
SELECT ID,month,value,
(SELECT AVG(value) FROM myTable) as "myAVG"
FROM myTable
But I would need that average to be the average for that month specifically. So rows where month = "January" will have the average and median for "January", etc.
The issue here is that I did not find a way to refer to the value of month in my subquery
(SELECT AVG(value) FROM myTable)
Does someone have a clue?
P.S.: It's a Redshift database I'm working on.
You would need to select all rows from the table, and do a left join with a select statement that does group by month. This way, you would get every row, and the group by results with them for that month.
Something like this:
SELECT * FROM myTable a
LEFT JOIN
(
SELECT Month, SUM(value) AS mySum
FROM myTable
GROUP BY Month
) b
ON a.Month = b.Month
Helpful?
with myavg as
(SELECT month, AVG(value) as avgval FROM myTable group by month)
, mymed as
(select month, median(value) as medval from myTable group by month)
select ID, month, value, ma.avgval, mm.medval
from mytable m left join myavg ma
on m.month = ma.month
left join mymed mm
on m.month = mm.month
You can use a cte to do this. However, you need a group by on month, as you are calculating an aggregate value.
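In Redshift you can use window functions. Partition by month without a frame clause, so every row gets the aggregate over its whole month:
select ID, month, value,
avg(value) over (partition by month) as avg_for_month,
median(value) over (partition by month) as median_for_month
from myTable
order by month;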
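For anyone who wants to check the per-month window idea locally, here is a rough sketch using Python's sqlite3 module. It is not Redshift: the sample values are invented, it assumes SQLite 3.25 or later for window functions, and since SQLite has no median aggregate the median is computed in Python with statistics.median and attached afterwards.

```python
import sqlite3
from collections import defaultdict
from statistics import median

# Hypothetical sample data in the ID;month;value layout from the question.
rows = [
    (1, "January", 10.0),
    (2, "January", 20.0),
    (3, "January", 60.0),
    (4, "February", 4.0),
    (5, "February", 8.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, month TEXT, value REAL)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

# Every row keeps its own value and also gets the average of its month.
with_avg = conn.execute("""
    SELECT ID, month, value,
           AVG(value) OVER (PARTITION BY month) AS month_avg
    FROM myTable
    ORDER BY ID
""").fetchall()

# SQLite lacks MEDIAN, so compute it per month in Python (a workaround, not Redshift behavior).
values_by_month = defaultdict(list)
for _id, month, value in rows:
    values_by_month[month].append(value)
month_median = {m: median(v) for m, v in values_by_month.items()}

result = [(i, m, v, avg, month_median[m]) for i, m, v, avg in with_avg]
for row in result:
    print(row)
```

No GROUP BY is applied to the outer result, so the row count stays the same as in the source table.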

SQL query for last entries in a period

I'm trying to write a SQL query for Oracle SQL in order to retrieve the last records for a certain period frequency. For example, say the frequency is Quarterly, (I'd also like monthly and annually to work), I can provide the start dates and end dates for the quarters if necessary, but I need to retrieve the last entry within each quarter. How can I do this? I've had limited luck so far without writing lots of subqueries.
Try something like:
select * from
(select t.*,
row_number() over (partition by trunc(date_field, 'MON')
order by date_field desc) rn
from my_table t)
where rn = 1
for months. (Use 'Q' or 'Y' instead of 'MON' in the trunc clause for quarters or years.)
SELECT col1, col2, col3...colN
FROM TableA
WHERE colX = (SELECT MAX(Date) AS LastDate
FROM TableA
WHERE QuarterDate BETWEEN BeginningDate AND EndingDate)
colX is the date column that you need to check against the quarter.
I think that should work.

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement, I am trying to get the last day with data per month for the last year.
Example, I am easily able to get the last day of each month and join that to my data table, but the problem is, if the last day of the month does not have data, then there is no returned data. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd -- Oracle does not accept AS before a table alias
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
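Here is a quick way to sanity-check the NOT EXISTS variant, sketched with Python's sqlite3 module. The rows are hypothetical, and the datediff(month, ...) test is replaced by comparing strftime('%Y-%m', ...) keys because SQLite has no datediff; date() stands in for cast(... as date).

```python
import sqlite3

# Hypothetical sample data: January has nothing after the 29th, and two rows on that day.
rows = [
    (1, "srv-a", "2020-01-28 10:00:00"),
    (1, "srv-b", "2020-01-29 09:00:00"),
    (1, "srv-a", "2020-01-29 18:00:00"),
    (1, "srv-a", "2020-02-27 12:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)"
)
conn.executemany("INSERT INTO super_table VALUES (?, ?, ?)", rows)

# Keep a row only if no other row exists on a later day within the same month.
query = """
SELECT cust_id, server_name, coll_date
FROM super_table AS st1
WHERE NOT EXISTS (
    SELECT 1
    FROM super_table AS st2
    WHERE strftime('%Y-%m', st2.coll_date) = strftime('%Y-%m', st1.coll_date)
      AND date(st2.coll_date) > date(st1.coll_date)
)
ORDER BY coll_date, server_name
"""
last_active = conn.execute(query).fetchall()
for row in last_active:
    print(row)
```

Both rows from 29 January survive because the comparison is on whole days, which matches the rank() behavior described above rather than row_number().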
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)