SQL Lag and LEAD query


I need the data shown in the Output column. When the status in the first column is P, we need the value from the filled date. But once the status is anything other than P, we need the filled date from the most recent P status. Please let me know if I haven't explained this clearly. Thanks in advance.

In standard SQL, you can use:
select (case when status = 'P'
then filled_dt
else lag(case when status = 'P' then filled_dt end) ignore nulls over (partition by mbr_id order by filled_dt)
end) as imputed_filled_dt
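This is standard SQL; however, not all databases support ignore nulls. This probably does what you want, because within each mbr_id the most recent P date is also the largest P date seen so far, so a running MAX gives the same result: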
select (case when status = 'P'
then filled_dt
else max(case when status = 'P' then filled_dt end) over (partition by mbr_id order by filled_dt)
end) as imputed_filled_dt
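To see the running-MAX version in action, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions); SQLite has no IGNORE NULLS, so only the MAX variant is shown. The table name `fills` and the sample rows are hypothetical, and dates are stored as ISO-8601 strings so that string order matches date order.

```python
import sqlite3


def imputed_filled_dates(rows):
    """Return (mbr_id, status, filled_dt, imputed_filled_dt) tuples.

    rows: iterable of (mbr_id, status, filled_dt) with ISO-8601 date strings.
    The table name `fills` is a hypothetical stand-in for the real table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fills (mbr_id INTEGER, status TEXT, filled_dt TEXT)")
    conn.executemany("INSERT INTO fills VALUES (?, ?, ?)", rows)
    query = """
        SELECT mbr_id, status, filled_dt,
               CASE WHEN status = 'P'
                    THEN filled_dt
                    -- running max of P dates = most recent P date so far
                    ELSE MAX(CASE WHEN status = 'P' THEN filled_dt END)
                         OVER (PARTITION BY mbr_id ORDER BY filled_dt)
               END AS imputed_filled_dt
        FROM fills
        ORDER BY mbr_id, filled_dt
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    (1, "P", "2021-01-01"),
    (1, "R", "2021-01-05"),
    (1, "R", "2021-01-09"),
    (1, "P", "2021-02-01"),
    (1, "X", "2021-02-03"),
    (2, "R", "2021-01-02"),  # no earlier P row, so the imputed date is NULL
    (2, "P", "2021-01-04"),
]
for row in imputed_filled_dates(sample):
    print(row)
```

Note that a non-P row with no earlier P row for the same member gets NULL, which matches what lag ... ignore nulls would return.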

Related

When else with partition by isn't working in redshift queries

I would like to exclude the tag categories sub_tag1, sub_tag2, and sub_tag3 from TAG_SALES_by_month, while everything else matched by the WHERE condition should still be included in the count. I couldn't get the desired result. Can anyone help me achieve this? It would be very much appreciated.
select o.tag,
o.SOME, o.THING, o.ILIKE, o.date, c.THE, c.MOST,
date_part(month, o.date) as Month,
date_part(day, o.date) as day,
count(o.id) over (partition by day, CUST_Id) as SALE_NO,
count(o.id) over (partition by Month, CUST_Id) as SALE_NO_by_month,
count(case when (tag <> 'sub_tag1' AND tag <> 'sub_tag2' AND tag <> 'sub_tag3') then o.id else 0 END) over (partition by Month, CUST_Id) as TAG_SALES_by_month,
c.id as CUST_Id
from order_info o
left join config c on o.SOME = c.SOME
where date >= '05/01/2021' AND tag in ('sub_tag1', 'sub_tag2', 'sub_tag3', 'sub_tag4', 'sub_tag5',
'sub_tag6') AND ILIKE = 'JACK'
group by o.tag, o.SOME, o.THING, o.ILIKE, o.date, c.THE, c.MOST, CUST_Id, o.id
order by date

Per the comments, the issue here is that COUNT counts every non-NULL value: it checks whether a value exists, not what the value is.
So COUNT(CASE WHEN ... ELSE 0 ...) still counts the rows that fall into the ELSE branch, since 0 is a non-NULL value.
The solution is to use ELSE NULL, or to omit the ELSE clause (which defaults to NULL), because COUNT ignores NULLs.
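A quick way to see the difference is to run both forms against a tiny table. This sketch uses Python's built-in sqlite3 module; the `orders` table and its rows are hypothetical, and it uses a plain COUNT rather than the windowed COUNT(...) OVER (...) from the question, since the NULL-skipping behavior is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per order with its tag.
conn.execute("CREATE TABLE orders (id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "sub_tag1"), (2, "sub_tag4"), (3, "sub_tag2"), (4, "sub_tag5")],
)

# Condition for the rows we want to keep, written as in the question.
kept = "tag <> 'sub_tag1' AND tag <> 'sub_tag2' AND tag <> 'sub_tag3'"

# ELSE 0: 0 is non-NULL, so COUNT still counts the excluded rows.
with_else_zero = conn.execute(
    f"SELECT COUNT(CASE WHEN {kept} THEN id ELSE 0 END) FROM orders"
).fetchone()[0]

# No ELSE: the CASE yields NULL for excluded rows, and COUNT skips NULLs.
without_else = conn.execute(
    f"SELECT COUNT(CASE WHEN {kept} THEN id END) FROM orders"
).fetchone()[0]

print(with_else_zero, without_else)  # 4 2
```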

How to present a particular SQL queried row as columns in output

I need to present the output shown in PIC1 as the result shown in PIC2. The query used to generate the PIC1 output in SQL Developer:
select subs_nm, as_of_date, run_status, (select max (tp.pr_vl)
from ual_mng.tqueue tq, ual_mng.tparams tp, ual_mng.tstatus ts
WHERE tq.tid = tp.tid AND tq.tid = ts.tid and tq.run_id = pcm.run_id and tp.pr_nm in ('TOT_RECORD_CNT')) as RECORD_COUNT
from UAL_MNG.PCM_SUBS_RUN_DTL_VW pcm where SUBS_NM='S_TS2_AQUA_A1_RLAP_DL' and AS_OF_DATE in ('2021-09-01','2021-09-02') order by run_start_dtm desc;
I'd appreciate any help.

If you don't need it to be dynamic (i.e., there will only ever be two columns and you know which two dates they are), you can do:
select subs_nm,
max(case when as_of_date = '2021-09-01' then RECORD_COUNT else 0 end) as SEP1,
max(case when as_of_date = '2021-09-02' then RECORD_COUNT else 0 end) as SEP2
from (
-- Your query
)
group by subs_nm
You can work out the percentage difference using the same expressions.
NB: I would always use an explicit date format mask, because an implicit string-to-date conversion depends on session settings and might not work on a different machine or client. So use to_date('2021-09-01', 'yyyy-mm-dd').
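The pivot plus the percentage difference can be checked with a small sketch using Python's built-in sqlite3 module. The `runs` table is a hypothetical stand-in for the output of the original query, and the record counts are made up. Multiplying by 100.0 before dividing avoids SQLite's integer division; Oracle's NUMBER type does not truncate this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the rows returned by the original query.
conn.execute("CREATE TABLE runs (subs_nm TEXT, as_of_date TEXT, record_count INTEGER)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("S_TS2_AQUA_A1_RLAP_DL", "2021-09-01", 200),
        ("S_TS2_AQUA_A1_RLAP_DL", "2021-09-02", 150),
    ],
)
query = """
    SELECT subs_nm, sep1, sep2,
           ROUND((sep1 - sep2) * 100.0 / sep1, 2) AS diff_per
    FROM (
        -- Conditional aggregation turns one row per date into one column per date.
        SELECT subs_nm,
               MAX(CASE WHEN as_of_date = '2021-09-01' THEN record_count ELSE 0 END) AS sep1,
               MAX(CASE WHEN as_of_date = '2021-09-02' THEN record_count ELSE 0 END) AS sep2
        FROM runs
        GROUP BY subs_nm
    ) AS pivoted
"""
result = conn.execute(query).fetchall()
print(result)  # [('S_TS2_AQUA_A1_RLAP_DL', 200, 150, 25.0)]
```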

Posting the query that worked in my script:
select subs_nm, SEP1, SEP2, round(((SEP1 - SEP2) / SEP1) * 100, 2) as DIFF_PER from (select subs_nm,
max(case when as_of_date = '2021-09-01' then RECORD_COUNT else '0' end) as SEP1,
max(case when as_of_date = '2021-09-02' then RECORD_COUNT else '0' end) as SEP2 from (
-- *Main Query*
) group by subs_nm);

HAVING gives me "column...does not exist" but I see the column

This is a practice question from StrataScratch, and I'm stuck on the final HAVING clause.
Problem statement:
Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain 3 columns: date, non-paying downloads, paying downloads.
There are three tables:
ms_user_dimension (user_id, acc_id)
ms_acc_dimension (acc_id, paying_customer)
ms_download_facts (date, user_id, downloads)
This is my code so far:
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING no > yes;
If I remove the HAVING no > yes at the end, the code runs and I can see three columns: date, yes, and no. However, if I add the HAVING clause, I get the error "column "no" does not exist...LINE 13: HAVING no > yes".
I can't for the life of me figure out what's going on here. Please let me know if anyone figures it out. TIA!

You don't need a subquery for this:
SELECT d.date,
SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS no,
SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS yes
FROM ms_download_facts d LEFT JOIN
ms_user_dimension u
ON d.user_id = u.user_id LEFT JOIN
ms_acc_dimension a
ON u.acc_id = a.acc_id
GROUP BY d.date
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) > SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END);
You can simplify the HAVING clause to:
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads ELSE -d.downloads END) > 0
This version assumes that paying_customer only takes on the values 'yes' and 'no' (and is never NULL).
You may be able to simplify the query further, depending on the database you are using.
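The portable pattern (repeat the aggregate in HAVING rather than referring to a SELECT alias) can be tried with a sketch using Python's built-in sqlite3 module. The table and column names come from the question, but the rows are hypothetical, and the output aliases are renamed to non_paying and paying for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ms_user_dimension (user_id INTEGER, acc_id INTEGER);
    CREATE TABLE ms_acc_dimension (acc_id INTEGER, paying_customer TEXT);
    CREATE TABLE ms_download_facts (date TEXT, user_id INTEGER, downloads INTEGER);
    -- Hypothetical sample rows: user 1 pays, user 2 does not.
    INSERT INTO ms_user_dimension VALUES (1, 10), (2, 20);
    INSERT INTO ms_acc_dimension VALUES (10, 'yes'), (20, 'no');
    INSERT INTO ms_download_facts VALUES
        ('2020-08-24', 1, 5), ('2020-08-24', 2, 9),
        ('2020-08-25', 1, 7), ('2020-08-25', 2, 3);
""")
query = """
    SELECT d.date,
           SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS non_paying,
           SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS paying
    FROM ms_download_facts d
    LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
    LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
    GROUP BY d.date
    -- Signed sum: positive only when non-paying downloads exceed paying ones.
    HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads ELSE -d.downloads END) > 0
    ORDER BY d.date
"""
result = conn.execute(query).fetchall()
print(result)  # [('2020-08-24', 9, 5)]
```

Only 2020-08-24 survives, because on 2020-08-25 paying users downloaded more (7 versus 3).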

It doesn't like aliases in the HAVING clause: SELECT-list aliases aren't visible there, because HAVING is evaluated before the SELECT list. Replace no with:
SUM(CASE WHEN paying_customer = 'no' THEN cnt END)
and do the same for yes:
SELECT date,
SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
SELECT date, paying_customer, SUM(downloads) AS cnt
FROM ms_download_facts d
LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
GROUP BY 1, 2
ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING SUM(CASE WHEN paying_customer = 'no' THEN cnt END) > SUM(CASE WHEN paying_customer = 'yes' THEN cnt END);

Hive rolling sum of data over date

I am working in Hive and am facing an issue with rolling counts. The sample data I am working with is shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help if someone has solved a similar issue.

You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
This creates a derived table with one row for every account and date. The status is filled in on some days but not on others.
The cumulative max for last_status_timestamp finds, for each account and date, the most recent timestamp so far that has a non-NULL status. Joining back to the table on that timestamp retrieves the corresponding status. Voila! That is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().
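The same carry-forward idea can be tried outside Hive with a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). It is a simplified translation of the answer's query, not Hive code: the `events` table, its `ts` column (standing in for `timestamp`), and the rows are hypothetical, and only three statuses are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (account TEXT, event_dt TEXT, ts TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("A", "2018-05-02", "2018-05-02 09:00", "Registered"),
        ("A", "2018-05-03", "2018-05-03 10:00", "active_acct"),
        ("B", "2018-05-02", "2018-05-02 11:00", "Registered"),
        ("B", "2018-05-04", "2018-05-04 08:00", "active_acct"),
        ("B", "2018-05-04", "2018-05-04 17:00", "suspended"),
    ],
)
query = """
    WITH last_per_day AS (
        -- Last event per account per date.
        SELECT account, event_dt, ts, status,
               ROW_NUMBER() OVER (PARTITION BY account, event_dt ORDER BY ts DESC) AS seqnum
        FROM events
    ),
    grid AS (
        -- Every account x every date, with the latest status timestamp so far.
        SELECT d.event_dt, ac.account,
               MAX(CASE WHEN l.status IS NOT NULL THEN l.ts END)
                   OVER (PARTITION BY ac.account ORDER BY d.event_dt) AS last_ts
        FROM (SELECT DISTINCT event_dt FROM events) d
        CROSS JOIN (SELECT DISTINCT account FROM events) ac
        LEFT JOIN last_per_day l
          ON l.event_dt = d.event_dt AND l.account = ac.account AND l.seqnum = 1
    )
    SELECT g.event_dt,
           SUM(CASE WHEN e.status = 'Registered' THEN 1 ELSE 0 END) AS registered,
           SUM(CASE WHEN e.status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
           SUM(CASE WHEN e.status = 'suspended' THEN 1 ELSE 0 END) AS suspended
    FROM grid g
    -- Join back on the carried-forward timestamp to recover the status.
    LEFT JOIN events e ON e.account = g.account AND e.ts = g.last_ts
    GROUP BY g.event_dt
    ORDER BY g.event_dt
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On 2018-05-03 account B has no event, so its 2018-05-02 status (Registered) is carried forward; on 2018-05-04 account A keeps active_acct the same way.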

Merging data SQL Query

I have a query where I need to show each customer's activity for each website, but with only one row per customer instead of the same customer appearing multiple times, once per activity.
The following query is what I tried, but it returns far more rows than expected. Please help me avoid the duplicates and show only one row per customer, with a column for each activity.
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null

Sounds like you need to group by customer_id and perform aggregations for the other columns you are selecting. For example:
sum(case when s.subscription_type = '5' then 1 else 0 end) as pb_subs_count

You could try one of two things:
1. Use a GROUP BY clause to combine all records with the same id (any other non-aggregated column you select must then also be listed in the GROUP BY), e.g.,
...
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id
2. Use DISTINCT in your SELECT, e.g.,
SELECT DISTINCT i.customer_id, i.SEGMENT, ...

You could aggregate per customer_id (for example with SUM), but what do you expect to happen to the other fields? For example, if the same customer has SUBSCRIPTION_TYPE 5 and 13 (two rows), which value do you want?

Perhaps you are looking for something like this:
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
MAX(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
MAX(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
MAX(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id, i.SEGMENT
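MAX works here because 'Y' sorts after 'N', so each flag is 'Y' if any of the customer's subscriptions matches. I can't be sure, though, without knowing more about the tables involved.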
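The MAX-of-flags pivot can be checked with a sketch using Python's built-in sqlite3 module. The single `subs` table is a hypothetical, simplified stand-in for the join of IDEN_WITH_MAIL_ID and CUSTOMER_SUBSCRIPTION_FCT, and the segment values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the joined tables.
conn.execute("CREATE TABLE subs (customer_id INTEGER, segment TEXT, subscription_type TEXT)")
conn.executemany(
    "INSERT INTO subs VALUES (?, ?, ?)",
    [(1, "gold", "5"), (1, "gold", "8"), (2, "silver", "12")],
)
query = """
    SELECT customer_id, segment,
           -- 'Y' > 'N', so MAX returns 'Y' if any row for the customer matches.
           MAX(CASE WHEN subscription_type = '5' THEN 'Y' ELSE 'N' END) AS pb_subs,
           MAX(CASE WHEN subscription_type = '12' THEN 'Y' ELSE 'N' END) AS daily_test,
           MAX(CASE WHEN subscription_type = '8' THEN 'Y' ELSE 'N' END) AS cook_4_2
    FROM subs
    GROUP BY customer_id, segment
    ORDER BY customer_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 'gold', 'Y', 'N', 'Y'), (2, 'silver', 'N', 'Y', 'N')]
```

Customer 1 has two subscription rows but comes back as a single row with both matching flags set.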
