PostgreSQL array_agg but with stop condition

I have a table of children's records, and I want comma-separated results in descending order by month, but with a stop condition based on each child's status in each month. If the status is 0, push the month into the array; if the status is 1, don't push it, stop there, and don't check that child's earlier months.
Table
Desired Output:
I have tried the following, which gives me all the months, but I don't know how to stop at the status = 1 condition for each child:
SELECT name, ARRAY_AGG(month ORDER BY month DESC)
FROM children
GROUP BY name

I think of this as:
SELECT name, ARRAY_AGG(month ORDER BY month DESC)
FROM (SELECT c.*,
MAX(c.month) FILTER (WHERE c.status = 1) OVER (PARTITION BY c.name) AS last_1_month
FROM children c
) c
WHERE last_1_month IS NULL OR month > last_1_month
GROUP BY name;
This logic simply gets the last month where status = 1 and then chooses all later months (or all months, if the child never has status = 1).
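A minimal sketch of this approach that runs without a PostgreSQL server: the query is ported to SQLite and driven from Python's built-in sqlite3 module. Assumptions: `month` is a plain integer, the sample rows are invented, `MAX(CASE WHEN ... END)` stands in for `MAX(...) FILTER (WHERE ...)`, and the months are collected in Python instead of with ARRAY_AGG (SQLite has no array type). The `last_1_month IS NULL` branch keeps children who never have status = 1.

```python
import sqlite3


def months_after_last_stop(rows):
    """Return {name: [months newer than the latest status = 1 month, newest first]}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE children (name TEXT, month INTEGER, status INTEGER)")
    con.executemany("INSERT INTO children VALUES (?, ?, ?)", rows)
    # MAX(CASE ...) OVER (...) emulates MAX(month) FILTER (WHERE status = 1) OVER (...)
    query = """
        SELECT name, month
        FROM (SELECT c.*,
                     MAX(CASE WHEN c.status = 1 THEN c.month END)
                         OVER (PARTITION BY c.name) AS last_1_month
              FROM children c) c
        WHERE last_1_month IS NULL OR month > last_1_month
        ORDER BY name, month DESC
    """
    result = {}
    for name, month in con.execute(query):
        result.setdefault(name, []).append(month)
    con.close()
    return result


# Hypothetical sample data: (name, month, status)
sample = [
    ("alice", 1, 0), ("alice", 2, 1), ("alice", 3, 0), ("alice", 4, 0),
    ("bob", 1, 0), ("bob", 2, 0),
]
print(months_after_last_stop(sample))  # {'alice': [4, 3], 'bob': [2, 1]}
```

A child whose most recent month has status = 1 has no rows left after the WHERE clause, so it is missing from the result entirely; the PostgreSQL query behaves the same way.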
If month is actually sequential with no gaps, then you can do:
SELECT name,
(ARRAY_AGG(month ORDER BY month DESC))[1:MAX(month) - COALESCE(MAX(month) FILTER (WHERE c.status = 1), MIN(month) - 1)]
FROM children c
GROUP BY name;

I'd use a not exists condition to filter out the records you don't want:
SELECT name, ARRAY_AGG(month ORDER BY month DESC)
FROM children a
WHERE NOT EXISTS (SELECT *
FROM children b
WHERE a.name = b.name AND b.status = 1 and a.month <= b.month)
GROUP BY name
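One way to check the NOT EXISTS version against the rule stated in the question is to compare it with a plain loop that walks each child's months from newest to oldest and stops at the first status = 1. The sketch below does that with SQLite (an assumption standing in for PostgreSQL) and invented sample rows; `via_not_exists` and `via_loop` are hypothetical helper names.

```python
import sqlite3
from collections import defaultdict

NOT_EXISTS_QUERY = """
    SELECT name, month
    FROM children a
    WHERE NOT EXISTS (SELECT *
                      FROM children b
                      WHERE a.name = b.name AND b.status = 1 AND a.month <= b.month)
    ORDER BY name, month DESC
"""


def via_not_exists(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE children (name TEXT, month INTEGER, status INTEGER)")
    con.executemany("INSERT INTO children VALUES (?, ?, ?)", rows)
    result = defaultdict(list)
    for name, month in con.execute(NOT_EXISTS_QUERY):
        result[name].append(month)
    con.close()
    return dict(result)


def via_loop(rows):
    """Walk each child's months newest-first and stop at the first status = 1."""
    by_name = defaultdict(list)
    for name, month, status in rows:
        by_name[name].append((month, status))
    result = {}
    for name, items in by_name.items():
        kept = []
        for month, status in sorted(items, reverse=True):
            if status == 1:
                break
            kept.append(month)
        if kept:  # like the SQL, children with nothing kept are absent
            result[name] = kept
    return result


sample = [
    ("alice", 1, 0), ("alice", 2, 1), ("alice", 3, 0), ("alice", 4, 0),
    ("bob", 1, 0), ("bob", 2, 0),
]
print(via_not_exists(sample) == via_loop(sample))  # True
```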

Related

Get last record by month/year and id

I need to get the last record of each month/year for each id.
My table captures a cumulative order value daily for each id, so in the end I only need the last record of each month for each id.
I believe it is something simple, but I could not adapt the examples I found to my case.
Here is an example of my input data and the expected result: db_fiddle.
My attempt doesn't include grouping by month and year:
select ar.id, ar.value, ar.aquisition_date
from table_views ar
inner join (
select id, max(aquisition_date) as last_aquisition_date_month
from table_views
group by id
)ld
on ar.id = ld.id and ar.aquisition_date = ld.last_aquisition_date_month
You could do this:
with tn as (
select
*,
row_number() over (partition by id, date_trunc('month', aquisition_date) order by aquisition_date desc) as rn
from table_views
)
select * from tn where rn = 1
The tn CTE adds a row number that counts up in descending order of date within each (id, month) pair. Then you keep only the rows with rn = 1, which is the last aquisition_date of each month for each id.
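A runnable sketch of the same CTE, ported to SQLite through Python's sqlite3 module. SQLite has no `date_trunc`, so `strftime('%Y-%m', aquisition_date)` serves as the month key here (a porting assumption), and the cumulative values are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_views (id INTEGER, value INTEGER, aquisition_date TEXT)")
con.executemany(
    "INSERT INTO table_views VALUES (?, ?, ?)",
    [
        (1, 10, "2021-01-15"), (1, 25, "2021-01-31"),  # id 1, January: keep the 31st
        (1, 30, "2021-02-03"),                         # id 1, February: only one row
        (2, 5, "2021-01-02"), (2, 7, "2021-01-20"),    # id 2, January: keep the 20th
    ],
)

# ROW_NUMBER() restarts for every (id, month) pair; rn = 1 is the latest day in that month.
query = """
    WITH tn AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY id, strftime('%Y-%m', aquisition_date)
                   ORDER BY aquisition_date DESC
               ) AS rn
        FROM table_views
    )
    SELECT id, value, aquisition_date FROM tn WHERE rn = 1
    ORDER BY id, aquisition_date
"""
last_per_month = con.execute(query).fetchall()
print(last_per_month)
```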

DB2 Get latest modified and previous value from audit table

I have an audit table, and I am trying to get the current and previous values of a column (rank), together with the audit timestamp information. I would like to get the timestamp when the value was changed. E.g.:
For id = 1, rank was last changed from 3 to 5 on 13-05-2021 14:10 by userid = 2.
I have written the query below. It gives the current and previous modified values, but it returns the latest date and userid (17-05-2021 20:00 and 2), because row_number is ordered by timestamp.
with v_rank as (
select * from (
select
id,
a.rank as current_rank,
b.rank as previous_rank,
a.log_timestamp,
a.log_username,
row_number() over(partition by a.id order by a.log_timestamp) as rnum
from
user a
inner join user b on a.id = b.id and a.log_timestamp > b.timestamp
where
a.rank != b.rank
order by a.log_timestamp, b.timestamp
) where rnum = 1
)
select * from v_rank
Any suggestions on how I can get the correct timestamp (13-05-2021 14:10) and userid (2)?
Edit:
Rank can also be NULL; in that case I need a blank in the query result.
Expected output:
You seem to want lag() with filtering:
select u.*
from (select u.*,
lag(rank) over (partition by id order by log_timestamp) as prev_rank
from user u
) u
where rank <> prev_rank;
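A sketch of the `lag()` approach in SQLite via Python, on invented audit rows; the table is called `user_audit` here, which is an assumed name. Two details go beyond the answer above: `IS NOT` is SQLite's NULL-safe inequality (standard SQL spells it `IS DISTINCT FROM`), so a change to or from NULL also counts, and a second `ROW_NUMBER()` keeps only the most recent change per id, which is what the question asks for.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE user_audit (id INTEGER, rank INTEGER, log_timestamp TEXT, log_username INTEGER)"
)
con.executemany(
    "INSERT INTO user_audit VALUES (?, ?, ?, ?)",
    [
        (1, 3, "2021-05-10 09:00", 1),
        (1, 5, "2021-05-13 14:10", 2),     # rank changed 3 -> 5
        (1, 5, "2021-05-17 20:00", 2),     # audited again, rank unchanged
        (2, 4, "2021-05-11 08:00", 3),
        (2, None, "2021-05-12 08:00", 3),  # rank changed 4 -> NULL
    ],
)

query = """
    SELECT id, prev_rank, rank, log_timestamp, log_username
    FROM (SELECT u.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY log_timestamp DESC) AS newest_first
          FROM (SELECT a.*,
                       LAG(rank) OVER (PARTITION BY id ORDER BY log_timestamp) AS prev_rank,
                       LAG(log_timestamp) OVER (PARTITION BY id ORDER BY log_timestamp) AS prev_ts
                FROM user_audit a) u
          WHERE u.prev_ts IS NOT NULL          -- skip each id's first audit row
            AND u.rank IS NOT u.prev_rank      -- NULL-safe "value changed"
         ) v
    WHERE newest_first = 1
    ORDER BY id
"""
changes = con.execute(query).fetchall()
for row in changes:
    print(row)
```

Because the WHERE clause in the middle query runs before its window function, `newest_first` only numbers rows that are real changes, so the unchanged 17-05-2021 row cannot hide the 13-05-2021 change.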

Fill NULL rows based on some mathematical operations

I have a table A which contains id, report_day and other columns. I also have a table B which contains id, report_day and subscribers. I want to create a VIEW with the columns id, report_day and subscribers, so it's a simple join:
select a.id, a.report_day, b.subscribers from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day
Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats for the subscribers column, and it's set to NULL. subscribers_increment is just subscribers(current_day) - subscribers(prev_day).
After reading some articles, I added the following expression:
case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment
Now I get the following result:
NULL is still NULL.
For example, the increment for 2021-04-07 is incorrect: it is the increment for 2 days. Can I divide the value for 2021-04-08 by the number of days (here, 2) and write the same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to every day where subscribers is NULL?
So I need to follow these rules:
If I see a NULL value in the subscribers column, I should go to the next (future) non-NULL day and take its value. From this (future) value, subtract the last non-NULL value before it (ordering by date, so looking back). Divide the result of the subtraction by the number of days and fill these rows of the subscribers_increment column with it.
Is it possible?
UPDATE:
For my data it should look like this:
UPDATE v2
After applying the script:
UPDATE v3
The case column (our increment) is still NULL for 25.03-27.03.
The basic idea is:
1. Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
2. Use a cumulative count in reverse to assign a grouping, so that each run of NULLs is combined with the next non-NULL value in one group.
3. As a result of (2), the number of rows in a group (the NULL days plus the next non-NULL day) is the denominator.
4. As a result of (1), the difference between subscribers and prev_subscribers is the numerator.
The actual calculation requires more window functions and case logic.
So the idea is:
with t as (
select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
from first_table a left join
(select b.*,
lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
from second_table b
) b
on a.id = b.id and a.report_day = b.report_day
)
select t.*,
(case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
then t.subscribers - t.prev_subscribers
when t.subscribers is not null
then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
when t.subscribers is null
then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
) / count(*) over (partition by id, grp)
end) as subscribers_increment
from t;
Here is a db<>fiddle.
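Below is a sketch of the same query ported to SQLite and run from Python, on invented data with a one-day gap on 2021-04-07 like the one in the question. Porting assumptions: `date(report_day, '-1 day')` replaces `report_day - interval '1 day'`, the final `when t.subscribers is null` branch is written as `ELSE`, and each division is multiplied by `1.0` so it is not truncated (integer division truncates in PostgreSQL as well, so a cast to numeric may be needed there too).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE first_table (id INTEGER, report_day TEXT)")
con.execute("CREATE TABLE second_table (id INTEGER, report_day TEXT, subscribers INTEGER)")
con.executemany(
    "INSERT INTO first_table VALUES (?, ?)",
    [(1, f"2021-04-0{d}") for d in range(5, 10)],  # 2021-04-05 .. 2021-04-09
)
con.executemany(
    "INSERT INTO second_table VALUES (?, ?, ?)",
    [(1, "2021-04-05", 100), (1, "2021-04-06", 110),
     (1, "2021-04-08", 131), (1, "2021-04-09", 132)],  # no stats for 2021-04-07
)

QUERY = """
WITH t AS (
    SELECT a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
           -- reverse running count of non-NULL values: a NULL day shares the
           -- group number of the next non-NULL day
           COUNT(b.subscribers) OVER (PARTITION BY a.id ORDER BY a.report_day DESC) AS grp
    FROM first_table a
    LEFT JOIN (SELECT b.*,
                      LAG(b.report_day) OVER (PARTITION BY b.id ORDER BY b.report_day) AS prev_report_day,
                      LAG(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day) AS prev_subscribers
               FROM second_table b) b
      ON a.id = b.id AND a.report_day = b.report_day
)
SELECT report_day,
       CASE
           WHEN subscribers IS NOT NULL AND prev_report_day = date(report_day, '-1 day')
               THEN 1.0 * (subscribers - prev_subscribers)
           WHEN subscribers IS NOT NULL
               THEN 1.0 * (subscribers - prev_subscribers) / COUNT(*) OVER (PARTITION BY id, grp)
           ELSE 1.0 * (MAX(subscribers) OVER (PARTITION BY id, grp)
                       - MAX(prev_subscribers) OVER (PARTITION BY id, grp))
                / COUNT(*) OVER (PARTITION BY id, grp)
       END AS subscribers_increment
FROM t
ORDER BY report_day
"""
result = con.execute(QUERY).fetchall()
for day, increment in result:
    print(day, increment)
```

The NULL row for 2021-04-07 and the 2021-04-08 row fall into the same group, so both get (131 - 110) / 2 = 10.5, while 2021-04-05 stays NULL because there is no earlier value to subtract.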

Get one record over period with condition

I have a big query that creates a history of object changes. In short, its result looks like this:
id changedOn recordtype
1 2019-12-5 history
1 2020-01-1 history
1 2020-01-7 actual
2 2018-10-9 history
The result I want:
id changedOn recordtype
1 2019-12-5 history
1 2020-01-7 actual
2 2018-10-9 history
If an id has 2 records in the same month, I want to omit the history records for that month.
I would like to avoid a cursor if possible, but I'm stuck.
If you want one record per month with a preference for "actual", then use row_number():
select t.*
from (select t.*,
row_number() over (partition by id, year(changedOn), month(changedOn) order by recordtype) as seqnum
from t
) t
where seqnum = 1;
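A sketch of the row_number() query in SQLite via Python, using the sample rows from the question. Assumptions: `strftime('%Y-%m', changedOn)` replaces `year(changedOn), month(changedOn)`, and the dates are zero-padded (`2019-12-05`) because SQLite's date functions expect `YYYY-MM-DD`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, changedOn TEXT, recordtype TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2019-12-05", "history"), (1, "2020-01-01", "history"),
     (1, "2020-01-07", "actual"), (2, "2018-10-09", "history")],
)

# One row per (id, month); ORDER BY recordtype prefers 'actual'
# because 'actual' sorts before 'history'.
query = """
    SELECT id, changedOn, recordtype
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id, strftime('%Y-%m', changedOn)
                                    ORDER BY recordtype) AS seqnum
          FROM t) t
    WHERE seqnum = 1
    ORDER BY id, changedOn
"""
one_per_month = con.execute(query).fetchall()
print(one_per_month)
```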
If you want all "actual" records for a month (and, if there are none, all the history records), I would recommend logic like this:
select t.*
from t
where t.recordtype = 'actual' or
(t.recordtype = 'history' and
not exists (select 1
from t t2
where t2.id = t.id and
t2.recordtype = 'actual' and
year(t2.changedon) = year(t.changedon) and
month(t2.changedon) = month(t.changedon)
)
);
These two approaches are subtly different. But you will only notice the differences if you have multiple "actual"s or "history"s in a single month for a single id.
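To see that difference, the sketch below runs both queries (ported to SQLite, with `strftime('%Y-%m', ...)` in place of `year()`/`month()`) on invented data where one month has two history rows and no actual, and another month has two actual rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, changedOn TEXT, recordtype TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2020-03-02", "history"), (1, "2020-03-20", "history"),  # two histories, no actual
     (1, "2020-04-01", "actual"), (1, "2020-04-15", "actual")],    # two actuals
)

ROW_NUMBER_QUERY = """
    SELECT id, changedOn, recordtype
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id, strftime('%Y-%m', changedOn)
                                    ORDER BY recordtype) AS seqnum
          FROM t) t
    WHERE seqnum = 1
    ORDER BY changedOn
"""
NOT_EXISTS_QUERY = """
    SELECT id, changedOn, recordtype
    FROM t
    WHERE t.recordtype = 'actual' OR
          (t.recordtype = 'history' AND
           NOT EXISTS (SELECT 1 FROM t t2
                       WHERE t2.id = t.id AND t2.recordtype = 'actual' AND
                             strftime('%Y-%m', t2.changedOn) = strftime('%Y-%m', t.changedOn)))
    ORDER BY changedOn
"""
one_per_month = con.execute(ROW_NUMBER_QUERY).fetchall()
all_preferred = con.execute(NOT_EXISTS_QUERY).fetchall()
print(len(one_per_month), len(all_preferred))  # 2 4
```

Which of the two tied rows row_number() keeps in each month is not defined, because `ORDER BY recordtype` does not break the tie; adding `changedOn DESC` as a second sort key would make it deterministic.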
Just remove the records whose changedOn is not the most recent one for the same id and recordtype:
select * from tbl a
where not exists
(select 1 from tbl b where a.id = b.id and a.recordtype = b.recordtype and a.changedOn < b.changedOn )

SQL retrieve recent record

I want to retrieve Topic 1 Scores with the most recent score (excluding NULL, sorted by date) for each detailsID. There are only detailsID 2 and 3 here, so only two rows should be returned.
What about removing `Topic 1 Scores` from GROUP BY detailsID, `Topic 1 Scores`?
Use a subquery to get the max and then join to it.
SELECT a.detailsID,`Topic 1 Scores`, a.Date
FROM Information.scores AS a
JOIN (SELECT detailsID, MAX(Date) "MaxDate"
FROM Information.scores
WHERE `Topic 1 Scores` IS NOT NULL
GROUP BY detailsID) Maxes
ON a.detailsID = Maxes.detailsID
AND a.Date = Maxes.MaxDate
WHERE `Topic 1 Scores` IS NOT NULL
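A sketch of the join-to-MAX query in SQLite via Python, with invented scores. Assumptions: the table is called `scores` without the `Information` schema, and the column name with spaces is quoted with double quotes (SQLite also accepts MySQL-style backticks).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE scores (detailsID INTEGER, "Topic 1 Scores" INTEGER, Date TEXT)')
con.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(2, 70, "2021-01-01"), (2, 85, "2021-02-01"), (2, None, "2021-03-01"),
     (3, 60, "2021-01-15"), (3, None, "2021-02-15")],
)

# The subquery finds the latest date with a non-NULL score per detailsID;
# the join then fetches the score recorded on that date.
query = """
    SELECT a.detailsID, a."Topic 1 Scores", a.Date
    FROM scores AS a
    JOIN (SELECT detailsID, MAX(Date) AS MaxDate
          FROM scores
          WHERE "Topic 1 Scores" IS NOT NULL
          GROUP BY detailsID) AS Maxes
      ON a.detailsID = Maxes.detailsID
     AND a.Date = Maxes.MaxDate
    WHERE a."Topic 1 Scores" IS NOT NULL
    ORDER BY a.detailsID
"""
latest = con.execute(query).fetchall()
print(latest)
```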
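Assuming SQL Server:
SELECT detailsID, Date, [Topic 1 Scores]
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY detailsID ORDER BY Date DESC) AS RowNumber,
        detailsID, Date, [Topic 1 Scores]
    FROM Information.scores
    WHERE [Topic 1 Scores] IS NOT NULL
) s
WHERE RowNumber = 1;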
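Try doing
SELECT detailsID, MAX(Date) AS "Date" FROM Information.scores
WHERE `Topic 1 Scores` IS NOT NULL GROUP BY detailsID
This returns the latest date with a non-NULL score for each detailsID; join it back to the table to get the matching score.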