What is the difference in syntax between the following queries?

I have a huge table of policies and I need to find all policies with invalid movements. For example, if the movement from inforce / premium_paid to terminated / premium_paid is invalid, then I would need to find all policies with this movement.
My query was initially as follows:
SELECT *,
       LEAD(STAT)  OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_STAT,
       LEAD(EVENT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_EVENT
FROM TABLE
WHERE STAT = 'inforce'
  AND EVENT = 'premium_paid'
  AND NEXT_STAT = 'terminated'
  AND NEXT_EVENT = 'premium_paid'
ORDER BY STAT, EVENT, NEXT_STAT, NEXT_EVENT
However, when I ran it, the compiler said that my column names 'NEXT_STAT' and 'NEXT_EVENT' were invalid. Then, when I tweaked it to the following, it worked:
SELECT *
FROM (
    SELECT *,
           LEAD(STAT)  OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_STAT,
           LEAD(EVENT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_EVENT
    FROM TABLE
) AS a
WHERE a.STAT = 'inforce'
  AND a.EVENT = 'premium_paid'
  AND a.NEXT_STAT = 'terminated'
  AND a.NEXT_EVENT = 'premium_paid'
ORDER BY a.STAT, a.EVENT, a.NEXT_STAT, a.NEXT_EVENT
Thus, I am just curious to know why my initial query did not work.
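The short answer is SQL's logical evaluation order: the WHERE clause is processed before the SELECT list, so aliases defined there, and the window functions behind them, do not exist yet when the filter runs. Wrapping the windowed query in a derived table, as above, or in a CTE makes NEXT_STAT and NEXT_EVENT ordinary columns. A minimal sketch of the equivalent CTE form, assuming the same table and column names:
WITH flagged AS (
    SELECT *,
           LEAD(STAT)  OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_STAT,
           LEAD(EVENT) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS NEXT_EVENT
    FROM TABLE
)
SELECT *
FROM flagged
WHERE STAT = 'inforce'
  AND EVENT = 'premium_paid'
  AND NEXT_STAT = 'terminated'
  AND NEXT_EVENT = 'premium_paid';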

Related

PostgreSQL/SQL Query

I want to get, in a column "first_in_between", the activity_id of the first "email" activity that happened in between each pair of "completed_order" activities.
I wrote this query:
select activity_id, customer, activity, ts,
       (case when activity = 'completed_order' and
                  lead(activity) over (partition by customer order by ts) = 'email'
             then lead(activity_id) over (partition by customer order by ts)
        end) as First_in_between
from activity_stream
where customer = 'Lehmanns Marktstand'
order by ts
With the above query, I am getting this result:
My desired result should be:
You can readily get the timestamp of the email using:
select activity_id, customer, activity, ts,
       (case when activity = 'completed_order' and
                  (min(ts) filter (where activity = 'email') over (partition by customer order by ts desc) <
                   min(ts) filter (where activity = 'completed_order') over (partition by customer order by ts desc)
                  )
             then min(ts) filter (where activity = 'email') over (partition by customer order by ts desc)
        end) as First_in_between
from activity_stream
where customer = 'Lehmanns Marktstand'
order by ts;
You can then join back to the table or use another level of window functions to get the corresponding activity_id for the timestamp.
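A minimal sketch of that join-back, assuming ts values are unique per customer (the alias f and the column first_in_between_id are illustrative names):
select f.activity_id, f.customer, f.activity, f.ts,
       e.activity_id as first_in_between_id
from (<above query>) f
left join activity_stream e
       on e.customer = f.customer
      and e.ts = f.First_in_between
      and e.activity = 'email'
order by f.ts;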
Actually, I think I prefer another method, which is just to keep a running count of completed orders, so that the count value groups each completed_order with the rows that follow it, and then take the minimum email ts within each group:
select a.*,
min(ts) filter (where activity = 'email') over (partition by grp) as email_ts
from (select a.*,
count(*) filter (where activity = 'completed_order') over (partition by customer order by ts) as grp
from activity_stream a
where customer = 'Lehmanns Marktstand'
) a;
This should also allow you to use a twist to get the activity id without an additional subquery:
select a.*,
(array_agg(activity_id order by ts) filter (where activity = 'email') over (partition by grp))[1] as email_activity_id
from (select a.*,
count(*) filter (where activity = 'completed_order') over (partition by customer order by ts) as grp
from activity_stream a
where customer = 'Lehmanns Marktstand'
) a
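The array_agg(activity_id order by ts) filter (where activity = 'email') expression collects the email ids within each grp island in time order, so taking element [1] yields the first one; this works because PostgreSQL allows FILTER on any aggregate, including an aggregate used as a window function.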

What should be done to order by multiple columns?

I want to sort by chart_num and DATE. However, the following results are printed out when sorted:
This is my code:
SELECT *
FROM (
SELECT id, chart_num, chart_name, MIN(DATE) AS DATE, amount, (COUNT(*) = 2) AS result, card_check
FROM (
(
SELECT id, hpd.chart_num AS chart_num, hpd.chart_name AS chart_name, hpd.visit AS DATE, card_amount_received AS amount, card_check_modify AS card_check
,row_number() over (PARTITION BY card_amount_received ORDER BY id) AS seqnum
FROM hospital_payment_data hpd
WHERE store_mbrno = '135790' AND card_amount_received > 0
)
UNION ALL (
SELECT id, ncd.chart_num AS chart_num, ncd.chart_name AS chart_name, DATE_FORMAT(ncd.tranDate,'%Y-%m-%d') AS DATE, amount, card_check_result AS card_check
,row_number() over (PARTITION BY amount ORDER BY id) AS seqnum
FROM noti_card_data ncd
WHERE (mbrNo = '135790' OR mbrNo = '135791') AND cmd = '승인' -- '승인' means 'approved'
)
) X
GROUP BY amount, seqnum
ORDER BY result DESC
) a
ORDER BY a.DATE DESC
The result I want is for rows with a NULL chart_num to go after the latest DATE, and when there is a chart_num, I want to sort in order of chart_num and then DATE.
It feels like I'm missing something else with this question, but you can separate columns in the ORDER BY with a comma. It's not clear from your text whether you want dates grouped within the same chart_num or charts grouped within the same date, but if I guessed wrong you can just swap it.
Also, the ORDER BY result DESC is completely extra. An ORDER BY inside a derived table adds nothing to the final results, and by removing it we can get rid of a whole level of nesting.
SELECT id, chart_num, chart_name, MIN(DATE) AS DATE, amount, (COUNT(*) = 2) AS result, card_check
FROM (
(
SELECT id, hpd.chart_num AS chart_num, hpd.chart_name AS chart_name, hpd.visit AS DATE, card_amount_received AS amount, card_check_modify AS card_check
,row_number() over (PARTITION BY card_amount_received ORDER BY id) AS seqnum
FROM hospital_payment_data hpd
WHERE store_mbrno = '135790' AND card_amount_received > 0
)
UNION ALL (
SELECT id, ncd.chart_num, ncd.chart_name, DATE_FORMAT(ncd.tranDate,'%Y-%m-%d'), amount, card_check_result
,row_number() over (PARTITION BY amount ORDER BY id) AS seqnum
FROM noti_card_data ncd
WHERE mbrNo IN ('135790', '135791') AND cmd ='승인'
)
) X
GROUP BY amount, seqnum
ORDER BY MIN(DATE), coalesce(chart_num,-1), result DESC
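If the intent is instead for rows with a NULL chart_num to sort last, a common idiom (assuming MySQL, given the DATE_FORMAT call) is to order on the IS NULL flag first, since it evaluates to 0 for non-NULL and 1 for NULL:
ORDER BY (chart_num IS NULL), chart_num, MIN(DATE) DESC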
Don't order by result in the inner UNION ALL query.
Sort by chart_num and DATE in place of result.
So in place of:
ORDER BY result DESC
use this:
ORDER BY chart_num DESC, DATE DESC
Or, in the outer main query, in place of:
ORDER BY a.DATE DESC
use:
ORDER BY a.chart_num DESC, a.DATE DESC
Hope it helps!

How to find the max time, in PostgreSQL, that a value stayed in the same state?

I am working on a university project and this came up:
I have a table like this:
And I want to get the max duration that an actuator stayed in the same state. For example, cool0 was in state False for 18 minutes.
The result table should look like this:
NAME  | STATE | DURATION
COOL0 | False | 18
This is a gaps and islands problem: the difference between a row number over all of an actuator's rows and a row number within each state stays constant across one unbroken run of the same state, so it can serve as a group key for each island. Your data is a bit hard to follow, but I think:
select actuator, state, min(actuator_time), max(actuator_time)
from (select t.*,
row_number() over (partition by actuator order by actuator_time) as seqnum,
row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
from t
) t
group by actuator, state, (seqnum - seqnum_s)
For the maximum per actuator, use distinct on:
select distinct on (actuator) actuator, state, min(actuator_time), max(actuator_time)
from (select t.*,
row_number() over (partition by actuator order by actuator_time) as seqnum,
row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
from t
) t
group by actuator, state, (seqnum - seqnum_s)
order by actuator, max(actuator_time) - min(actuator_time) desc;
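A variant of the same query that also returns the duration itself, assuming actuator_time is a timestamp so that the max/min difference is an interval (the duration alias is illustrative):
select distinct on (actuator)
       actuator, state,
       min(actuator_time) as state_start,
       max(actuator_time) as state_end,
       max(actuator_time) - min(actuator_time) as duration
from (select t.*,
             row_number() over (partition by actuator order by actuator_time) as seqnum,
             row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
      from t
     ) t
group by actuator, state, (seqnum - seqnum_s)
order by actuator, max(actuator_time) - min(actuator_time) desc;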

SQL: how can I get the first and last date when they are in two columns on different rows (islands problem)?

I think this problem is called islands, and I've been looking on the net but not getting it.
I have a table where I need to get the start date and end date (different columns) in a range.
The table has 100,000 rows and I want to group it down so the result will be:
I have created a fiddle: http://sqlfiddle.com/#!18/f4800/1
From the internet I think I need to create row numbers, so I have this now:
But I'm stuck thinking over what my next step will be.
You need row_number() instead of dense_rank(), and then you can use the difference-of-sequences trick:
select [CodeID], min([DATE_START]) as DATE_START,
max(DATE_FINISH) as DATE_FINISH, state
from (select [CodeID],[DATE_START],[DATE_FINISH],[STATE],
row_number() over(partition by [CodeID] order by [DATE_START]) as seq1,
row_number() over(partition by [CodeID],[STATE] order by [DATE_START]) as seq2
from Row_State
--where codeid = 'code1'
) t
group by [CodeID], state, (seq1-seq2)
order by CodeID, DATE_START;
Here is db fiddle.
If you know that the final result will be tiled in time with no gaps, then you can also use lag() and lead() like this:
select code_id, state, date_start,
       dateadd(day, -1, lead(date_start) over (partition by code_id order by date_start)) as day_end
from (select rs.*,
lag(state) over (partition by code_id order by date_start) as prev_state
from row_state rs
) rs
where prev_state is null or prev_state <> state;
The only issue with this version is that lead() returns NULL on each code's last row, so it does not correctly calculate the final end date. But for that:
select code_id, state, date_start,
coalesce(dateadd(day, -1, lead(date_start) over (partition by code_id order by date_start)),
max_date_end
) as day_end
from (select rs.*,
lag(state) over (partition by code_id order by date_start) as prev_state,
max(date_end) over (partition by code_id) as max_date_end
from row_state rs
) rs
where prev_state is null or prev_state <> state;
This could be faster than an approach that uses aggregation.

Optimizing a sum() over (order by ...) clause that throws a 'resources exceeded' error

I'm computing a sessions table from event data from our website in BigQuery. The events table has around 12 million events (pretty small). After I add in the logic to create sessions, I want to sum the new-session flags over all sessions and assign a global_session_id. I'm doing that using a sum() over (order by ...) clause, which is throwing a 'resources exceeded' error. I know that the order by clause causes all the data to be processed on a single node, which is what exceeds the compute resources, but I'm not sure what changes I can make to my code to achieve the same result. Any workarounds, advice, or explanations are greatly appreciated.
with sessions_1 as ( /* Tie a visitor's last event and last campaign to current event. */
select visitor_id as session_user_id,
sent_at,
context_campaign_name,
event,
id,
LAG(sent_at,1) OVER (PARTITION BY visitor_id ORDER BY sent_at) as last_event,
LAG(context_campaign_name,1) OVER (PARTITION BY visitor_id ORDER BY sent_at) as last_event_campaign_name
from tracks_2
),
sessions_2 as ( /* Flag events that begin a new session. */
select *,
case
when context_campaign_name != last_event_campaign_name
or context_campaign_name is null and last_event_campaign_name is not null
or context_campaign_name is not null and last_event_campaign_name is null
then 1
when unix_seconds(sent_at)
- unix_seconds(last_event) >= (60 * 30)
or last_event is null
then 1
else 0
end as is_new_session
from sessions_1
),
sessions_3 as ( /* Assign events sessions numbers for total sessions and total user sessions. */
select id as event_id,
sum(is_new_session) over (order by session_user_id, sent_at) as global_session_id
#sum(is_new_session) over (partition by session_user_id order by sent_at) as user_session_id
from materialized_result_of_sessions_2_query
)
select * from sessions_3
It might help if you defined a CTE with just the sessions, rather than at the event level. If this works:
select session_user_id, sent_at,
       row_number() over (order by session_user_id, sent_at) as global_session_id
from materialized_result_of_sessions_2_query
where is_new_session = 1
group by session_user_id, sent_at;
If that works, you can join it back to the original event-level data and then use a max() window function to assign the session id to all events. Something like:
select e.*,
       max(s.global_session_id) over (partition by e.session_user_id order by e.sent_at) as global_session_id
from events e left join
     (<above query>) s
     on s.session_user_id = e.session_user_id and s.sent_at = e.sent_at;
If not, you can construct the global id yourself:
select us.*, us.user_session_id + s.offset as global_session_id
from (select session_user_id, sent_at,
row_number() over (partition by session_user_id order by sent_at) as user_session_id
from materialized_result_of_sessions_2_query
where is_new_session = 1
) us join
(select session_user_id, count(*) as cnt,
sum(count(*)) over (order by session_user_id) - count(*) as offset
from materialized_result_of_sessions_2_query
where is_new_session = 1
group by session_user_id
) s
on us.session_user_id = s.session_user_id;
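For intuition on the offset arithmetic: if users A, B, and C have 3, 2, and 4 sessions respectively, the running sum(count(*)) over (order by session_user_id) gives 3, 5, and 9; subtracting each user's own count yields offsets 0, 3, and 5, so the per-user session numbers 1..n map onto the disjoint global ranges 1-3, 4-5, and 6-9.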
This might still fail if almost all users are unique and the sessions are short.