Window Functions - sql

I want to add a window function.
Take the min date where Visit = 'Y' and return the AssocId that belongs to that date.
TableA
ID Date AssocId Visit
1 1/1/17 10101 Y
1 1/2/17 10102 Y
Expected result:
ID Date AssocId
1 1/1/17 10101
My SQL so far. This gives me the min date, but I need the AssocId associated with that date:
SELECT MIN(CASE WHEN A.VISIT = 'Y'
THEN A.DATE END) OVER (PARTITION BY ID)
AS MIN_DT,

You can use FIRST_VALUE():
SELECT MIN(CASE WHEN A.VISIT = 'Y' THEN A.DATE END) OVER (PARTITION BY ID) AS MIN_DT,
FIRST_VALUE(CASE WHEN A.VISIT = 'Y' THEN A.ASSOCID END) OVER (PARTITION BY ID ORDER BY A.VISIT DESC, A.DATE ASC),
Note that this is a little tricky with conditional operations. I would be more inclined to use a subquery to nest the query operations. The outer expression would be:
SELECT MAX(CASE WHEN Date = MIN_DT THEN ASSOCID END) OVER (PARTITION BY ID)
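For illustration, here is how the nested version could look end to end. This is only a sketch, run against SQLite through Python's sqlite3 module (my assumption, it needs SQLite 3.25+ for window functions). The dates are rewritten in ISO format so they compare correctly as text, and the third row is a hypothetical non-visit row that is not in the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER, Date TEXT, AssocId INTEGER, Visit TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "2017-01-01", 10101, "Y"),
        (1, "2017-01-02", 10102, "Y"),
        (1, "2016-12-31", 10100, "N"),  # hypothetical: earlier date, but not a visit
    ],
)

# Inner query: minimum visit date per ID.
# Outer query: pick the AssocId on the row whose Date equals that minimum.
query = """
SELECT DISTINCT ID, MIN_DT,
       MAX(CASE WHEN Date = MIN_DT THEN AssocId END) OVER (PARTITION BY ID) AS MIN_ASSOCID
FROM (SELECT A.*,
             MIN(CASE WHEN A.Visit = 'Y' THEN A.Date END) OVER (PARTITION BY A.ID) AS MIN_DT
      FROM TableA A) s
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The DISTINCT is only there to collapse the repeated window results to one row per ID.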
If you wanted one row per ID, I would suggest (using Oracle's KEEP syntax):
select id, min(date),
min(associd) keep (dense_rank first order by date) as associd
from t
where visit = 'Y'
group by id;
That is, use aggregation functions.

You seem to want:
select t.*
from table t
where t.visit = 'Y' and
t.date = (select min(t1.date) from table t1 where t1.id = t.id and t1.visit = 'Y');

Related

SQL - Set marker for special data-constellations

I need some SQL advice here...
I've got a table with an object (called "entityid") , an updated timestamp and a status of that object.
I now want to track how often that object was set to "inactive" by the user. But it should count at most one "inactive" per day. If the previous status was also inactive, it should not count!
So here's a little example I prepared in Excel to show where the marker should appear and where not:
Do you have any advice on how I can solve this using SQL? (We're currently working with Redshift -> PostgreSQL).
If I understand correctly, you can use window functions. This returns the first "inactive" on each day:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at::date, content_status) = 1
) as needed_marker
from t;
To make "first" deterministic, order the rows within each partition:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at::date, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
Note: I'm not sure if updated_at is just the date. If it is, then the logic is more like:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
EDIT:
If you want the first time that the status changes from active to inactive, then:
select t.*,
(content_status = 'inactive' and
num_actives = 1 and
prev_status = 'active'
) as needed_marker
from (select t.*,
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp) as num_actives,
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) as prev_status
from t
) t;
Actually, the subquery is not needed:
select t.*,
(content_status = 'inactive' and
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp) = 1 and
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) = 'active'
) as needed_marker
from t;
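To see which rows the running-sum-plus-lag condition actually marks, here is a small sketch. It assumes SQLite 3.25+ via Python's sqlite3 instead of Redshift, that updated_at holds just the date, and the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (entityid INTEGER, updated_at TEXT, "
    "lastmodifiedtimestamp TEXT, content_status TEXT)"
)
# Hypothetical sample data: one entity, two days.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "2020-01-01 08:00", "active"),
        (1, "2020-01-01", "2020-01-01 09:00", "inactive"),  # first active -> inactive of the day
        (1, "2020-01-01", "2020-01-01 10:00", "inactive"),  # previous was inactive
        (1, "2020-01-01", "2020-01-01 11:00", "active"),
        (1, "2020-01-01", "2020-01-01 12:00", "inactive"),  # second switch on the same day
        (1, "2020-01-02", "2020-01-02 08:00", "inactive"),  # no earlier row on this day
        (1, "2020-01-02", "2020-01-02 09:00", "active"),
        (1, "2020-01-02", "2020-01-02 10:00", "inactive"),  # first active -> inactive of the day
    ],
)

query = """
SELECT lastmodifiedtimestamp,
       (content_status = 'inactive' AND
        SUM(CASE WHEN content_status = 'active' THEN 1 ELSE 0 END)
            OVER (PARTITION BY entityid, updated_at ORDER BY lastmodifiedtimestamp) = 1 AND
        LAG(content_status)
            OVER (PARTITION BY entityid, updated_at ORDER BY lastmodifiedtimestamp) = 'active'
       ) AS needed_marker
FROM t
ORDER BY entityid, lastmodifiedtimestamp
"""
markers = [marker for _, marker in conn.execute(query)]
print(markers)
```

Note that because both windows are partitioned by day, an "inactive" that is the very first row of a day is not marked, even if the entity was active the day before. Whether that is wanted depends on the requirement.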

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM table
GROUP BY id;
However, some people have tested this and found that pre-aggregating can improve performance.
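As a quick check that the pre-aggregated and the direct cross tab return the same numbers, here is a sketch using SQLite through Python's sqlite3 rather than SQL Server. The table name amounts and its rows are hypothetical, and it says nothing about performance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amounts (date TEXT, id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?)",
    [
        ("20190901", 1, 10),
        ("20190901", 1, 5),
        ("20191001", 1, 7),
        ("20190901", 2, 3),
        ("20191101", 2, 9),  # a date outside the two pivoted ones
    ],
)

# The pivot columns are the same in both variants; only the source differs.
cross_tab = """
SELECT id,
       SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AS AmountFromSept01,
       SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AS AmountFromOct01
FROM {source}
GROUP BY id
ORDER BY id
"""
pre_aggregated = cross_tab.format(
    source="(SELECT date, id, SUM(amount) AS amount FROM amounts GROUP BY id, date) AS subquery"
)
direct = cross_tab.format(source="amounts")

pre_rows = conn.execute(pre_aggregated).fetchall()
direct_rows = conn.execute(direct).fetchall()
print(pre_rows, direct_rows)
```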
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id

Hive rolling sum of data over date

I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is as shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help me with this if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
What this is doing is creating a derived table with rows for all accounts and dates. This has the status on certain days, but not all days.
The cumulative max for last_status_timestamp calculates the most recent timestamp that has a valid status. This is then joined back to the table to get the status on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().
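Here is a scaled-down sketch of the same cross join, cumulative max and join-back, so the carried-forward status can be checked by hand. It runs on SQLite 3.25+ through Python's sqlite3 rather than Hive, and the table name events, the column name ts and the rows are all assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (account TEXT, event_dt TEXT, ts TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("a1", "2018-05-02", "2018-05-02 09:00", "Registered"),
        ("a1", "2018-05-03", "2018-05-03 10:00", "active_acct"),
        ("a2", "2018-05-02", "2018-05-02 11:00", "Registered"),
        ("a2", "2018-05-04", "2018-05-04 08:00", "suspended"),
        ("a3", "2018-05-03", "2018-05-03 12:00", "Registered"),
    ],
)

query = """
SELECT g.event_dt,
       SUM(CASE WHEN e.status = 'Registered' THEN 1 ELSE 0 END) AS registered,
       SUM(CASE WHEN e.status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
       SUM(CASE WHEN e.status = 'suspended' THEN 1 ELSE 0 END) AS suspended
FROM (SELECT d.event_dt, ac.account,
             -- cumulative max: latest timestamp seen for the account up to this date
             MAX(l.ts) OVER (PARTITION BY ac.account ORDER BY d.event_dt) AS last_status_ts
      FROM (SELECT DISTINCT event_dt FROM events) d
      CROSS JOIN (SELECT DISTINCT account FROM events) ac
      LEFT JOIN (SELECT x.*,
                        ROW_NUMBER() OVER (PARTITION BY account, event_dt ORDER BY ts DESC) AS seqnum
                 FROM events x) l
        ON l.event_dt = d.event_dt AND l.account = ac.account AND l.seqnum = 1
     ) g
LEFT JOIN events e
  ON e.ts = g.last_status_ts AND e.account = g.account  -- join back to fetch the status
GROUP BY g.event_dt
ORDER BY g.event_dt
"""
rolling = conn.execute(query).fetchall()
print(rolling)
```

Account a2 has no row on 2018-05-03, yet it is still counted as Registered on that date because the cumulative max carries its earlier timestamp forward.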

Aggregation disables window function capability

I am trying to re-write the query where I am joining the query on itself:
select count(distinct case when cancelled_client_id is null and year(RUM.first_date) = year(date) and RUM.first_date <= RUL.date then user_id
when cancelled_client_id is null and year(coalesce(RUM.first_date,RUR.first_date)) = year(date)
and coalesce(RUM.first_date,RUR.first_date) <= RUL.date then user_id end) as
from RUL
left join
(
select enrolled_client_id, min(date) as first_date
from RUL
where enrolled_client_id is not null
group by enrolled_client_id
) RUR on RUR.enrolled_client_id=RUL.enrolled_client_id
left join
(
select managed_client_id, min(date) as first_date
from RUL
where managed_client_id is not null
group by managed_client_id
) RUM on RUM.managed_client_id=RUL.managed_client_id
Using window functions:
count(distinct case when cancelled_client_id is null
and year(min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id)) = year(date)
and min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id) <= date
then user_id
when cancelled_client_id_rev is null
and year(coalesce(
min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id),
min(case when managed_client_id is not null then date end) over(partition by managed_client_id))) = year(date)
and coalesce(
min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id),
min(case when managed_client_id is not null then date end) over(partition by managed_client_id)) <= date
then user_id end)
from RUL
However I am getting an error that "Windowed functions cannot be used in the context of another windowed function or aggregate" due to the count(distinct min). Any work-arounds?
I have no idea what the count(distinct) is supposed to be doing, but you can simplify the code to:
select count(distinct case when cancelled_client_id is null and
year(rum_first_date) = year(date) and
rum_first_date <= rul.date
then user_id
when cancelled_client_id is null and
year(coalesce(RUM_first_date, RUR_first_date)) = year(rul.date) and
coalesce(rum_first_date, rur_first_date) <= RUL.date
then user_id
end) as . . .
from (select RUL.*,
min(date) over (partition by enrolled_client_id) as rur_first_date,
min(date) over (partition by managed_client_id) as rum_first_date
from RUL
) RUL
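The general pattern is: compute the window values in a subquery, then aggregate over them in the outer query. A minimal sketch of that pattern follows. It assumes SQLite 3.25+ via Python's sqlite3 instead of SQL Server, so strftime('%Y', ...) stands in for year(). The rows are invented, and the null guard inside min() is borrowed from the question's own attempt so that rows without a client id get no first date, as with the original left joins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RUL (user_id INTEGER, date TEXT, enrolled_client_id TEXT, "
    "managed_client_id TEXT, cancelled_client_id TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO RUL VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2019-03-01", "E1", None, None),
        (1, "2019-05-01", "E1", None, None),
        (2, "2018-12-01", None, "M1", None),
        (2, "2019-02-01", None, "M1", None),  # first managed date is in the previous year
        (3, "2019-04-01", "E2", None, "C1"),  # cancelled, never counted
    ],
)

query = """
SELECT COUNT(DISTINCT CASE
         WHEN cancelled_client_id IS NULL
              AND strftime('%Y', rum_first_date) = strftime('%Y', date)
              AND rum_first_date <= date THEN user_id
         WHEN cancelled_client_id IS NULL
              AND strftime('%Y', COALESCE(rum_first_date, rur_first_date)) = strftime('%Y', date)
              AND COALESCE(rum_first_date, rur_first_date) <= date THEN user_id
       END) AS n_users
FROM (SELECT RUL.*,
             -- window functions live here, outside of any aggregate
             MIN(CASE WHEN enrolled_client_id IS NOT NULL THEN date END)
                 OVER (PARTITION BY enrolled_client_id) AS rur_first_date,
             MIN(CASE WHEN managed_client_id IS NOT NULL THEN date END)
                 OVER (PARTITION BY managed_client_id) AS rum_first_date
      FROM RUL) r
"""
n_users = conn.execute(query).fetchone()[0]
print(n_users)
```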

SQL query min and max group by flag

I have a table as below :
How can I craft a SQL select statement so that MIN AND MAX EVENT DATE groups results by FLAG (0,1)?
So the result would be:
Use conditional aggregation together with a window function:
SELECT card_no, descr_reader,
max(CASE WHEN flag = 0 THEN event_date END) date_in,
max(CASE WHEN flag = 1 THEN event_date END) date_out
FROM
(
SELECT *,
COUNT(flag) OVER (PARTITION BY flag ORDER BY id) Seq
FROM table t
)t
GROUP BY card_no, descr_reader, Seq
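The running COUNT gives the n-th flag = 0 row and the n-th flag = 1 row the same Seq, which is what pairs each "in" with its "out". A small sketch to show that, assuming SQLite 3.25+ via Python's sqlite3, a hypothetical table name events, made-up rows for a single card, and strictly alternating in/out events:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, card_no TEXT, descr_reader TEXT, "
    "event_date TEXT, flag INTEGER)"
)
# Hypothetical sample data: flag 0 = in, flag 1 = out.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "Door", "2020-01-01 08:00", 0),
        (2, "C1", "Door", "2020-01-01 12:00", 1),
        (3, "C1", "Door", "2020-01-01 13:00", 0),
        (4, "C1", "Door", "2020-01-01 17:00", 1),
    ],
)

query = """
SELECT card_no, descr_reader,
       MAX(CASE WHEN flag = 0 THEN event_date END) AS date_in,
       MAX(CASE WHEN flag = 1 THEN event_date END) AS date_out
FROM (SELECT t.*,
             COUNT(flag) OVER (PARTITION BY flag ORDER BY id) AS seq  -- n-th event per flag
      FROM events t) t
GROUP BY card_no, descr_reader, seq
ORDER BY date_in
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

With more than one card in the table, card_no would probably need to be part of the PARTITION BY as well, otherwise the sequence numbers run across cards.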
An alternative if window functions are not available:
SELECT
t1.card_no, t1.descr_reader,
t1.event_date date_in,
(select top 1 event_date from test t2
where t2.card_no = t1.card_no and
t2.reader_no = t1.reader_no and
t2.descr_reader = t1.descr_reader and
t2.event_date > t1.event_date and
t2.flag = 1
order by t2.event_date ) as date_out
FROM test t1
WHERE t1.flag = 0