I need some SQL advice here...
I've got a table with an object (called "entityid"), an update timestamp and the status of that object.
I now want to track how often that object was set to "inactive" by the user. But it should count at most one "inactive" per day, and if the previous status was also inactive, it should not count!
So here's a little example I prepared in Excel to show where the marker should appear and where it should not:
Do you have any advice on how I can solve this using SQL? (We're currently working with Redshift, which is PostgreSQL-based.)
If I understand correctly, you can use window functions. This returns the first "inactive" on each day:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at::date, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
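To see what the ROW_NUMBER() marker does, here is a minimal sketch that runs the updated_at variant of the query through Python's built-in sqlite3 module, with SQLite (3.25 or later, which has window functions) standing in for Redshift. The table name t, the sample rows, and storing updated_at as a plain date string are assumptions for illustration. SQLite returns the boolean marker as 0/1.

```python
import sqlite3

# Hypothetical sample rows: (entityid, updated_at, lastmodifiedtimestamp, content_status)
ROWS = [
    (1, "2020-01-01", "2020-01-01 08:00", "active"),
    (1, "2020-01-01", "2020-01-01 09:00", "inactive"),
    (1, "2020-01-01", "2020-01-01 10:00", "active"),
    (1, "2020-01-01", "2020-01-01 11:00", "inactive"),
    (1, "2020-01-02", "2020-01-02 08:00", "inactive"),
]

# The marker is true only for the first 'inactive' row within each
# (entityid, day, status) partition.
QUERY = """
select lastmodifiedtimestamp,
       (content_status = 'inactive' and
        row_number() over (partition by entityid, updated_at, content_status
                           order by lastmodifiedtimestamp) = 1
       ) as needed_marker
from t
order by entityid, lastmodifiedtimestamp
"""


def first_inactive_markers(rows=ROWS):
    """Return (lastmodifiedtimestamp, needed_marker) pairs from an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (entityid integer, updated_at text, "
            "lastmodifiedtimestamp text, content_status text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for ts, marker in first_inactive_markers():
        print(ts, marker)
```

On this data only the 09:00 row of the first day and the 08:00 row of the second day get a 1; the 11:00 row is the second "inactive" of its day. This version does not look at the previous status yet.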
Note: I'm not sure whether updated_at is just a date. If it is, then the logic is more like:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
EDIT:
If you want the first time each day that the status changes from active to inactive, then:
select t.*,
(content_status = 'inactive' and
num_actives = 1 and
prev_status = 'active'
) as needed_marker
from (select t.*,
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp rows between unbounded preceding and current row) as num_actives,
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) as prev_status
from t
) t;
Actually, the subquery is not needed:
select t.*,
(content_status = 'inactive' and
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp rows between unbounded preceding and current row) = 1 and
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) = 'active'
) as needed_marker
from t;
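The active-to-inactive marker can be tried the same way. This sketch runs it in SQLite via Python's sqlite3 with hypothetical sample rows; it uses content_status throughout, gives lag() an ORDER BY, and spells out the running-sum frame (Redshift requires a frame clause when an aggregate window function has an ORDER BY).

```python
import sqlite3

# Hypothetical sample rows: (entityid, updated_at, lastmodifiedtimestamp, content_status)
ROWS = [
    (1, "2020-01-01", "2020-01-01 08:00", "active"),
    (1, "2020-01-01", "2020-01-01 09:00", "inactive"),
    (1, "2020-01-01", "2020-01-01 10:00", "active"),
    (1, "2020-01-01", "2020-01-01 11:00", "inactive"),
    (1, "2020-01-02", "2020-01-02 08:00", "inactive"),
    (1, "2020-01-02", "2020-01-02 09:00", "active"),
    (1, "2020-01-02", "2020-01-02 10:00", "inactive"),
]

# Marker: row is 'inactive', exactly one 'active' has been seen so far that day,
# and the immediately preceding row (same entity, same day) was 'active'.
QUERY = """
select lastmodifiedtimestamp,
       (content_status = 'inactive' and
        sum(case when content_status = 'active' then 1 else 0 end)
            over (partition by entityid, updated_at order by lastmodifiedtimestamp
                  rows between unbounded preceding and current row) = 1 and
        lag(content_status) over (partition by entityid, updated_at
                                  order by lastmodifiedtimestamp) = 'active'
       ) as needed_marker
from t
order by entityid, lastmodifiedtimestamp
"""


def transition_markers(rows=ROWS):
    """Return (lastmodifiedtimestamp, needed_marker) pairs from an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (entityid integer, updated_at text, "
            "lastmodifiedtimestamp text, content_status text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

The second day starts with an inactive row that has no active before it, so only its 10:00 row is marked; on the first day the second active-to-inactive flip (11:00) is not counted.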
I want to select the first non-null row with the minimum date. I'd like to use a CASE WHEN so that when that condition is met the result is 1, ELSE 0.
So more like CASE WHEN row IS NOT NULL and DATE is the minimum DATE THEN 1 ELSE 0. I just need to select ONLY one row.
Another option (for BigQuery Standard SQL)
#standardSQL
SELECT *, 0 AS marker FROM `project.dataset.table` WHERE item_count IS NULL
UNION ALL
SELECT *, IF(1 = ROW_NUMBER() OVER(PARTITION BY user ORDER BY date), 1, 0)
FROM `project.dataset.table` WHERE NOT item_count IS NULL
ORDER BY user, date
Consider:
select
t.*,
case when date = min(case when itemcount is not null then date end) over(partition by user order by date)
then 1
else 0
end as marker
from mytable t
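As a quick check of the running-MIN idea, this sketch runs the query above in SQLite through Python's sqlite3 rather than BigQuery. The table mytable and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (user, date, itemcount)
ROWS = [
    ("a", "2020-01-01", None),
    ("a", "2020-01-02", 5),
    ("a", "2020-01-03", 7),
    ("b", "2020-01-01", 3),
    ("b", "2020-01-02", None),
]

# MIN ignores the NULLs produced by the CASE, so only the earliest row with a
# non-null itemcount has a date equal to it (ties on the same date aside).
QUERY = """
select t.*,
       case when date = min(case when itemcount is not null then date end)
                            over (partition by user order by date)
            then 1
            else 0
       end as marker
from mytable t
order by user, date
"""


def first_non_null_markers(rows=ROWS):
    """Return (user, date, itemcount, marker) rows from an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table mytable (user text, date text, itemcount integer)")
        con.executemany("insert into mytable values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Rows with a NULL running minimum compare as unknown and fall into the ELSE branch, so each user gets exactly one 1 here.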
I am unsure whether BigQuery supports minif() as a window function:
select
t.*,
case when date = minif(date, itemcount is not null) over(partition by user order by date)
then 1
else 0
end as marker
from mytable t
I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is shown below:
And the output I am expecting is shown below:
I tried using the following query, but it does not return the rolling count:
select event_dt, status, count(distinct account)
from (select *, row_number() over (partition by account order by event_dt desc) as rnum
from table.A
where event_dt between '2018-05-02' and '2018-05-04') x
where rnum = 1
group by event_dt, status;
Please help me with this if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
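Here is that conditional aggregation run in SQLite via Python's sqlite3, with a hypothetical events table standing in for table.A. It counts the events recorded on each day; it does not carry an account's status forward to days without events, which is what the EDIT addresses.

```python
import sqlite3

# Hypothetical status events: (account, event_dt, status)
ROWS = [
    ("acc1", "2018-05-02", "Registered"),
    ("acc2", "2018-05-02", "Registered"),
    ("acc1", "2018-05-03", "active_acct"),
    ("acc2", "2018-05-04", "suspended"),
    ("acc1", "2018-05-04", "reactive"),
]

# One SUM(CASE ...) per status pivots the rows into per-day columns.
QUERY = """
select event_dt,
       sum(case when status = 'Registered' then 1 else 0 end) as registered,
       sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
       sum(case when status = 'suspended' then 1 else 0 end) as suspended,
       sum(case when status = 'reactive' then 1 else 0 end) as reactive
from events
group by event_dt
order by event_dt
"""


def daily_status_counts(rows=ROWS):
    """Return (event_dt, registered, active_acct, suspended, reactive) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table events (account text, event_dt text, status text)")
        con.executemany("insert into events values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```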
EDIT:
This is a tricky problem. The solution I've come up with does a cross product of dates and accounts and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
What this does is create a derived table with one row for every account and date. Only some of those rows have a status (the days on which the account actually had an event).
The cumulative max for last_status_timestamp gives the most recent timestamp, as of each date, that has a valid status. This is then joined back to the table to get the status in effect on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join are a workaround because Hive does not (yet?) support the ignore nulls option in lag().
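The cumulative-max workaround can be tried out in SQLite through Python's sqlite3. This is a sketch, assuming a hypothetical events table in place of table.A and made-up rows; its GROUP BY and ORDER BY use a.event_dt, because d is only in scope inside the derived table.

```python
import sqlite3

# Hypothetical events: (account, event_dt, timestamp, status)
ROWS = [
    ("acc1", "2018-05-02", "2018-05-02 10:00", "Registered"),
    ("acc1", "2018-05-04", "2018-05-04 09:00", "active_acct"),
    ("acc2", "2018-05-03", "2018-05-03 11:00", "Registered"),
    ("acc2", "2018-05-03", "2018-05-03 15:00", "active_acct"),
]

QUERY = """
select a.event_dt,
       sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
       sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
       sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
       sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account,
             -- running max: latest timestamp with a status up to this date
             max(case when a.status is not null then a.timestamp end)
                 over (partition by ac.account order by d.event_dt) as last_status_timestamp
      from (select distinct event_dt from events) d
      cross join (select distinct account from events) ac
      left join (select e.*,
                        row_number() over (partition by account, event_dt
                                           order by timestamp desc) as seqnum
                 from events e) a
        on a.event_dt = d.event_dt
       and a.account = ac.account
       and a.seqnum = 1  -- last event of the day
     ) a
left join events aa
  on aa.timestamp = a.last_status_timestamp
 and aa.account = a.account
group by a.event_dt
order by a.event_dt
"""


def rolling_status_counts(rows=ROWS):
    """Return per-date counts of each account's latest known status."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table events (account text, event_dt text, timestamp text, status text)"
        )
        con.executemany("insert into events values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Here acc1 stays "Registered" on 2018-05-03 even though it has no event that day, and acc2 contributes nothing on 2018-05-02 because it has no status yet.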
I want to add a window function.
Take the min date where Visit = 'Y' and return the AssocId for that date.
TableA
ID Date AssocId Visit
1 1/1/17 10101 Y
1 1/2/17 10102 Y
End result:
ID Date AssocId
1 1/1/17 10101
SQL: this gives me the min date, but I need the AssocId associated with that date:
SELECT MIN(CASE WHEN A.VISIT = 'Y'
THEN A.DATE END) OVER (PARTITION BY ID)
AS MIN_DT
FROM TableA A;
You can use FIRST_VALUE():
SELECT MIN(CASE WHEN A.VISIT = 'Y' THEN A.DATE END) OVER (PARTITION BY ID) AS MIN_DT,
FIRST_VALUE(CASE WHEN A.VISIT = 'Y' THEN A.ASSOCID END) OVER (PARTITION BY ID ORDER BY CASE WHEN A.VISIT = 'Y' THEN 0 ELSE 1 END, A.DATE ASC) AS ASSOCID
FROM TableA A;
Note that conditional logic inside window functions gets a little tricky. I would be more inclined to use a subquery: compute MIN_DT in the inner query and use it in the outer one. The outer expression would be:
SELECT MAX(CASE WHEN VISIT = 'Y' AND Date = MIN_DT THEN ASSOCID END) OVER (PARTITION BY ID)
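The nested version can be sketched end to end in SQLite via Python's sqlite3. The table name tablea, the ISO-formatted dates (so they sort correctly), the extra hypothetical ID 2 whose earliest row is an 'N' visit, and the DISTINCT that collapses the result to one row per ID are all assumptions of this sketch.

```python
import sqlite3

# Hypothetical rows: (id, date, associd, visit)
ROWS = [
    (1, "2017-01-01", 10101, "Y"),
    (1, "2017-01-02", 10102, "Y"),
    (2, "2017-01-01", 20201, "N"),
    (2, "2017-01-03", 20202, "Y"),
]

# Inner query: MIN_DT per id over 'Y' visits only.
# Outer query: pick the AssocId on that date; DISTINCT keeps one row per id.
QUERY = """
select distinct id, min_dt,
       max(case when visit = 'Y' and date = min_dt then associd end)
           over (partition by id) as first_associd
from (select a.*,
             min(case when a.visit = 'Y' then a.date end)
                 over (partition by a.id) as min_dt
      from tablea a) a
order by id
"""


def first_visit_assoc(rows=ROWS):
    """Return (id, min_dt, first_associd) rows from an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table tablea (id integer, date text, associd integer, visit text)")
        con.executemany("insert into tablea values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```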
If you wanted this per ID, I would suggest:
select id, min(date) as min_date,
min(associd) keep (dense_rank first order by date) as associd
from t
where visit = 'Y'
group by id;
That is, use aggregation functions. (MIN ... KEEP (DENSE_RANK FIRST ORDER BY ...) is Oracle syntax.)
You seem to want:
select t.*
from TableA t
where visit = 'Y' and
date = (select min(t1.date) from TableA t1 where t1.id = t.id and t1.visit = 'Y');
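The correlated subquery can be checked in SQLite through Python's sqlite3 as well. The table name tablea, ISO dates, and the hypothetical ID 2 are assumptions; note the subquery also filters on t1.visit = 'Y', since otherwise ID 2 (whose earliest row is an 'N' visit) would drop out of the result.

```python
import sqlite3

# Hypothetical rows: (id, date, associd, visit)
ROWS = [
    (1, "2017-01-01", 10101, "Y"),
    (1, "2017-01-02", 10102, "Y"),
    (2, "2017-01-01", 20201, "N"),
    (2, "2017-01-03", 20202, "Y"),
]

# Keep the 'Y' row whose date is the earliest 'Y' date for the same id.
QUERY = """
select t.*
from tablea t
where visit = 'Y' and
      date = (select min(t1.date) from tablea t1
              where t1.id = t.id and t1.visit = 'Y')
order by id
"""


def first_visit_rows(rows=ROWS):
    """Return the first 'Y' visit row per id from an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table tablea (id integer, date text, associd integer, visit text)")
        con.executemany("insert into tablea values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```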
I have a table like the one below:
How can I craft a SQL SELECT statement that returns the MIN and MAX EVENT_DATE grouped by FLAG (0, 1)?
So the result would be:
Just do conditional aggregation, using a window function to pair each flag = 0 row with its flag = 1 row:
SELECT card_no, descr_reader,
max(CASE WHEN flag = 0 THEN event_date END) date_in,
max(CASE WHEN flag = 1 THEN event_date END) date_out
FROM
(
SELECT *,
COUNT(flag) OVER (PARTITION BY flag ORDER BY id) Seq
FROM test t
)t
GROUP BY card_no, descr_reader, Seq
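To see how the running COUNT pairs rows, this sketch runs the query in SQLite via Python's sqlite3. The table name test (as in the alternative answer) and the rows for a single card are hypothetical. The pairing assumes flag 0 and flag 1 rows alternate in id order; with several cards you would probably need card_no in the PARTITION BY too, which is an assumption not in the answer.

```python
import sqlite3

# Hypothetical rows: (id, card_no, reader_no, descr_reader, event_date, flag)
ROWS = [
    (1, "C1", 1, "Door A", "2020-01-01 08:00", 0),
    (2, "C1", 1, "Door A", "2020-01-01 12:00", 1),
    (3, "C1", 1, "Door A", "2020-01-01 13:00", 0),
    (4, "C1", 1, "Door A", "2020-01-01 17:00", 1),
]

# seq numbers the flag-0 rows 1, 2, ... and the flag-1 rows 1, 2, ...,
# so grouping by seq puts the nth "in" next to the nth "out".
QUERY = """
select card_no, descr_reader,
       max(case when flag = 0 then event_date end) as date_in,
       max(case when flag = 1 then event_date end) as date_out
from (select *,
             count(flag) over (partition by flag order by id) as seq
      from test) t
group by card_no, descr_reader, seq
order by date_in
"""


def pair_in_out_window(rows=ROWS):
    """Return (card_no, descr_reader, date_in, date_out) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table test (id integer, card_no text, reader_no integer, "
            "descr_reader text, event_date text, flag integer)"
        )
        con.executemany("insert into test values (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```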
An alternative if window functions are not available (TOP 1 is SQL Server syntax):
SELECT
t1.card_no, t1.descr_reader,
t1.event_date date_in,
(select top 1 event_date from test t2
where t2.card_no = t1.card_no and
t2.reader_no = t1.reader_no and
t2.descr_reader = t1.descr_reader and
t2.event_date > t1.event_date and
t2.flag = 1
order by t2.event_date ) as date_out
FROM test t1
WHERE t1.flag = 0
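SQLite has no TOP, so this sketch of the correlated-subquery version uses ORDER BY ... LIMIT 1 in its place, run through Python's sqlite3. The rows are the same hypothetical single-card data as before.

```python
import sqlite3

# Hypothetical rows: (id, card_no, reader_no, descr_reader, event_date, flag)
ROWS = [
    (1, "C1", 1, "Door A", "2020-01-01 08:00", 0),
    (2, "C1", 1, "Door A", "2020-01-01 12:00", 1),
    (3, "C1", 1, "Door A", "2020-01-01 13:00", 0),
    (4, "C1", 1, "Door A", "2020-01-01 17:00", 1),
]

# For each flag-0 ("in") row, take the earliest later flag-1 ("out") row on the
# same card and reader. LIMIT 1 plays the role of SQL Server's TOP 1.
QUERY = """
select t1.card_no, t1.descr_reader,
       t1.event_date as date_in,
       (select t2.event_date
          from test t2
         where t2.card_no = t1.card_no
           and t2.reader_no = t1.reader_no
           and t2.descr_reader = t1.descr_reader
           and t2.event_date > t1.event_date
           and t2.flag = 1
         order by t2.event_date
         limit 1) as date_out
from test t1
where t1.flag = 0
order by t1.event_date
"""


def pair_in_out_subquery(rows=ROWS):
    """Return (card_no, descr_reader, date_in, date_out) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table test (id integer, card_no text, reader_no integer, "
            "descr_reader text, event_date text, flag integer)"
        )
        con.executemany("insert into test values (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Unlike the window version, this one does not rely on in/out rows alternating by id; it simply looks for the next "out" after each "in".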
My table structure is as follows:
Sessionid Pageurl timestamp
abc1 /testpage1 1465374987308
abc1 /testpage2 1465375020477
abc2 /testpage2 1465374987308
I wish to create a report of entry page count, exit page count and bounce count per page.
For any session, the first page is the entry page and the last page is the exit page.
A bounce occurs when the user leaves after viewing only the first page (the session has a single entry).
The final report would be as below:
pageurl EntrypageCount ExitPagecount BounceCount
/testpage1 1 0 0
/testpage2 1 2 1
I have been able to get bounces, but only on a per-day basis.
For bounces, the base select is:
SELECT sessionid, min(timestamp), CASE WHEN count(*) = 1 THEN 1 ELSE 0 END AS bounces
FROM auditdata GROUP BY sessionid;
But I cannot figure out how to get them by pageurl.
All help is sincerely appreciated.
Thanks
The following is one way:
SELECT Pageurl,
COUNT(CASE WHEN timestamp = First THEN 1 END) AS EntrypageCount,
COUNT(CASE WHEN timestamp = Last THEN 1 END) AS ExitPagecount,
COUNT(CASE WHEN Count = 1 THEN 1 END) AS BounceCount
FROM (SELECT Pageurl,
timestamp,
MIN(timestamp) OVER (PARTITION BY Sessionid) AS First,
MAX(timestamp) OVER (PARTITION BY Sessionid) AS Last,
COUNT(*) OVER (PARTITION BY Sessionid) AS Count
FROM auditdata) T
GROUP BY Pageurl;
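Running the window-function version on the question's three sample rows in SQLite (via Python's sqlite3) should reproduce the expected report. The aliases are renamed to first_ts, last_ts and n because FIRST and LAST are keywords in SQLite's grammar, and an ORDER BY is added so the output order is stable; both are changes made for this sketch only.

```python
import sqlite3

# Sample rows from the question: (Sessionid, Pageurl, timestamp)
ROWS = [
    ("abc1", "/testpage1", 1465374987308),
    ("abc1", "/testpage2", 1465375020477),
    ("abc2", "/testpage2", 1465374987308),
]

# Window aggregates attach each session's first/last timestamp and page count
# to every row; the outer COUNT(CASE ...) then tallies per page.
QUERY = """
select Pageurl,
       count(case when timestamp = first_ts then 1 end) as EntrypageCount,
       count(case when timestamp = last_ts then 1 end) as ExitPagecount,
       count(case when n = 1 then 1 end) as BounceCount
from (select Pageurl,
             timestamp,
             min(timestamp) over (partition by Sessionid) as first_ts,
             max(timestamp) over (partition by Sessionid) as last_ts,
             count(*) over (partition by Sessionid) as n
      from auditdata) T
group by Pageurl
order by Pageurl
"""


def page_report_window(rows=ROWS):
    """Return (Pageurl, EntrypageCount, ExitPagecount, BounceCount) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table auditdata (Sessionid text, Pageurl text, timestamp integer)")
        con.executemany("insert into auditdata values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```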
The above uses window functions, which most modern RDBMSs support. A version without them would be:
SELECT Pageurl,
COUNT(CASE WHEN timestamp = First THEN 1 END) AS EntrypageCount,
COUNT(CASE WHEN timestamp = Last THEN 1 END) AS ExitPagecount,
COUNT(CASE WHEN Count = 1 THEN 1 END) AS BounceCount
FROM auditdata a
JOIN (SELECT Sessionid,
MIN(timestamp) AS First,
MAX(timestamp) AS Last,
COUNT(*) AS Count
FROM auditdata
GROUP BY Sessionid) g
ON a.Sessionid = g.Sessionid
GROUP BY Pageurl;
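The join-based version can be checked the same way in SQLite via Python's sqlite3; on the question's sample rows it should give the same report as the window version. As before, the renamed aliases (first_ts, last_ts, n) and the added ORDER BY belong to this sketch, not to the answer.

```python
import sqlite3

# Sample rows from the question: (Sessionid, Pageurl, timestamp)
ROWS = [
    ("abc1", "/testpage1", 1465374987308),
    ("abc1", "/testpage2", 1465375020477),
    ("abc2", "/testpage2", 1465374987308),
]

# Per-session first/last timestamp and page count come from a GROUP BY
# subquery, which is then joined back to every page view of the session.
QUERY = """
select Pageurl,
       count(case when timestamp = first_ts then 1 end) as EntrypageCount,
       count(case when timestamp = last_ts then 1 end) as ExitPagecount,
       count(case when n = 1 then 1 end) as BounceCount
from auditdata a
join (select Sessionid,
             min(timestamp) as first_ts,
             max(timestamp) as last_ts,
             count(*) as n
      from auditdata
      group by Sessionid) g
  on a.Sessionid = g.Sessionid
group by Pageurl
order by Pageurl
"""


def page_report_join(rows=ROWS):
    """Return (Pageurl, EntrypageCount, ExitPagecount, BounceCount) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table auditdata (Sessionid text, Pageurl text, timestamp integer)")
        con.executemany("insert into auditdata values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```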