filtering with statement without using from - sql

I want to count products showed in events between two dates. I have to fill 9 columns, each with other product type.
I would like to ask you if there are possibility to short this statement.
Below sql is first working but not effective attempt.
with events(event_id, customer_id) as (
select * from event
where start_date >= :stare_date
and end_date <= :end_date
),
select
(select count(*) from event_product where event_id in (select event_id from events where customer_id = customer.customer_id) and product_type = 'YLW') customer_ylw_products -- it works but its ugly and non effective
-------
-- repeat seven times for other type of products
-------
(select count(*) from event_product where event_id in (select event_id from events where customer_id = customer.customer_id) and product_type = 'RTL') customer_rtl_products
from customer
;
Notice that line
(select event_id from events where customer_id = customer.customer_id)
repeats about 9 times.
I've been trying to short this one by add following:
with events(event_id, customer_id) as (
select * from event
where start_date >= :stare_date
and end_date <= :end_date
),
**customer_events (event_id, customer_id) as (select * from events)**
select
(select count(*) from event_product where event_id in (select event_id from customer_events) and product_type = 'RTL') customer_rtl_products
from customers
where customer_events.customer_id = customer.customer_id -- doesnt works
having customer_events.customer_id = customer.customer_id -- doesnt works

Why don't you use case expressions?
WITH
events (event_id, customer_id)
AS (
SELECT
*
FROM event
WHERE start_date >= :stare_date
AND end_date <= :end_date
)
SELECT
*
FROM customer
LEFT JOIN (
SELECT
event_product.customer_id
, COUNT(CASE
WHEN event_product.product_type = 'YLW' THEN 1 END) AS count_YLW
, COUNT(CASE
WHEN event_product.product_type = 'RTL' THEN 1 END) AS count_RTL
FROM event_product
INNER JOIN events
ON event_product.event_id = events.event_id
GROUP BY
event_product.customer_id
) ev_counts
ON customer.customer_id = ev_counts.customer_id
;
You could do this without the CTE too if you prefer, just use what you currently have in the CTE as a derived table where events is now placed in the inner join.
footnote select * is a convenience only I don't know what fields are to be used, but they should be specified.

#Used_By_Already thanks for inspire me with inner joins between event_product and event and that Event_product doesnt have column customer_id so I simply added it!
That's my solution
with events(event_id, customer_id) as (
select * from event
where start_date >= :stare_date
and end_date <= :end_date
),
product_events (customer_id, product_type) as (
select event.customer_id, event_product.product_type
from events,event_product
where event_product.event_id = event.event_id and event_product.product_type in (''product_types'')
)
select
(select count(*) from product_events where customer_id = customer.customer_id and product_type = 'RTL') customer_rtl_products
from customers;
Performance for 50 rows in search increased from 45 seconds to only 5!
Thank you so much!

Related

SUM with left outer join gets inflated result

The following query gives me the MRR (monthly recurring revenue) for my customer:
with dims as (
select distinct subscription_id, country_name, product_name from revenue
where site_id = '18XLsHIVSJg' and subscription_id is not null
)
select to_date('2022-07-01') as occurred_date,
count(distinct srm.subscription_id) as subscriptions,
count(distinct srm.receiver_contact) as subscribers,
sum(srm.baseline_mrr) as mrr_srm
from subscription_revenue_mart srm
join dims d on d.subscription_id = srm.subscription_id
where srm.site_id = '18XLsHIVSJg'
-- MRR as of the day before ie June 30th
and to_date(srm.creation_date) < '2022-07-01'
-- Counting the subscriptions active after July 1st
and ((srm.subscription_status = 'SUBL.A') or
-- Counting the subscriptions canceled/deactivated after July 1st
(srm.subscription_status = 'SUBL.C' and (srm.deactivation_date >= '2022-07-01') or (srm.canceled_date >= '2022-07-01')) ) group by 1;
I get a total of $5922.15 but I need to add data from another table to capture upgrades/downgrades a customer makes on a product subscription. Using the same approach as above, I can query my "change" table thusly:
select subscription_id, sum(mrr_change_amount) mrr_change_amount,max(subscription_event_date) subscription_event_date from subscription_revenue_mart_change srmc
where site_id = '18XLsHIVSJg'
and to_date(srmc.creation_date) < '2022-07-01'
and ((srmc.subscription_status = 'SUBL.A')
or (srmc.subscription_status = 'SUBL.C' and (srmc.deactivation_date >= '2022-07-01') or (srmc.canceled_date >= '2022-07-01')))
group by 1;
I get a total of $3635.47
When I combine both queries into one, I get an inflated result:
with dims as (
select distinct subscription_id, country_name, product_name from revenue
where site_id = '18XLsHIVSJg' and subscription_id is not null
),
change as (
select subscription_id, sum(mrr_change_amount) mrr_change_amount,
-- there can be multiple changes per subscription
max(subscription_event_date) subscription_event_date from subscription_revenue_mart_change srmc
where site_id = '18XLsHIVSJg'
and to_date(srmc.creation_date) < '2022-07-01'
and ((srmc.subscription_status = 'SUBL.A')
or (srmc.subscription_status = 'SUBL.C' and (srmc.deactivation_date >= '2022-07-01') or (srmc.canceled_date >= '2022-07-01')))
group by 1
)
select to_date('2022-07-01') as occurred_date,
count(distinct srm.subscription_id) as subscriptions,
count(distinct srm.receiver_contact) as subscribers,
-- See comment RE: LEFT OUTER join
sum(coalesce(c.mrr_change_amount,srm.baseline_mrr)) as mrr
from subscription_revenue_mart srm
join dims d
on d.subscription_id = srm.subscription_id
-- LEFT OUTER join required for customers that never made a change
left outer join change c
on srm.subscription_id = c.subscription_id
where srm.site_id = '18XLsHIVSJg'
and to_date(srm.creation_date) < '2022-07-01'
and ((srm.subscription_status = 'SUBL.A')
or (srm.subscription_status = 'SUBL.C' and (srm.deactivation_date >= '2022-07-01') or (srm.canceled_date >= '2022-07-01'))) group by 1;
It should be $9557.62 ie (5922.15 + $3635.47) but the query outputs $16116.91, which is wrong.
I think the explode-implode syndrome may cause this.
I had designed my "change" CTE to prevent this by aggregating all the relevant fields but it's not working.
Can someone provide pointers on the best way to work around this issue?
It would help if you gave us sample data too, but I see a problem here:
sum(coalesce(c.mrr_change_amount,srm.baseline_mrr)) as mrr
Why COALESCE? That will give you one of the 2 numbers, but I guess what you want is:
sum(ifnull(c.mrr_change_amount, 0) + srm.baseline_mrr) as mrr
That's the best I can offer with what you've given us.

How to sum from multiple columns and segregate into separate column if result is positive and negative

I am using postgresql and need to write a query to sum values from separate columns of two different tables and then segregate into separate columns if positive or negative.
For Example,
Below is the source table
Below is the resultant table which need to be created also used while populating it
I have written below query to aggregate sum and able to populate TOT_CREDIT and TOT_DEBIT column. Is there any optimized query to achieve that ?
select t.account_id,
t.transaction_date,
SUM(t.transaction_amt) filter (where t.transaction_amt >= 0) as tot_debit,
SUM(t.transaction_amt) filter (where t.transaction_amt < 0) as tot_credit,
case
when
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
) < 0
then
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
)
end as credit_balance,
case
when
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
) > 0
then
(
SUM(t.transaction_amt) +
SUM(COALESCE(b.credit_balance,0)) +
SUM(COALESCE(b.debit_balance,0))
)
end as debit_balance,
from
transaction t
LEFT OUTER JOIN balance b ON (t.account_id = b.account_id
and t.transaction_date = b.transaction_date
and b.transaction_date=t.transaction_date- INTERVAL '1 DAYS')
group by
t.account_id,
t.transaction_date
Please provide some pointer.
EDIT 1: This query is not working in expected manner.
One way is to break your logic into smal queries and join them in the end!
select tw.account_id, tw.t_date,tw.t_c,th.T_D,fo.C_B,fi.d_B from
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as t_C from TransactionTABLE
where Transaction_AMT<0 group by account_id, Transaction_date ) as tw inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as t_d from TransactionTABLE
where Transaction_AMT>0 group by account_id, Transaction_date ) as th on tw.account_id=th.account_id and tw.t_date=th.t_date inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as C_B from TransactionTABLE
where sum(Transaction_AMT)<0 group by account_id, Transaction_date ) as fo on th.account_id=fo.account_id and th.t_date=fo.t_date inner join
(select account_id, Transaction_date as t_date, sum(Transaction_AMT) as d_B from TransactionTABLE
where sum(Transaction_AMT)>0 group by account_id, Transaction_date ) as fi on fi.account_id=fo.account_id and fi.t_date=fo.t_date;
Or else
You could try something as follows which calculates the running count of d_B over the Transaction_date and account_id
select account_id,
transaction_date,
SUM(transaction_amt) filter (where transaction_amt >= 0) as tot_debit,
SUM(transaction_amt) filter (where transaction_amt < 0) as tot_credit,
sum(transaction_amt) over (partition by account_id where sum(transaction_amt)<0) as credit_balance,
sum(transaction_amt) over (partition by account_id where sum(transaction_amt)>=0) as debit_balance
from TransactionTABLE group by account_id, Transaction_date order by 1,2;

how to filter data in sql based on percentile

I have 2 tables, the first one is contain customer information such as id,age, and name . the second table is contain their id, information of product they purchase, and the purchase_date (the date is from 2016 to 2018)
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
my desired result is to generate the table that contain customer_name and product who made purchase in 2017 and older than 75% of customer that make purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date = 2016)
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75% oldest value in 2016, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c join
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
)
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

SQL Top and Join

Business Rule: We can only bill for followup events every 90 days. Any events that occur less than 90 days after the previous one cannot be billed, but they need to be recorded.
User requirement: They want to see the date the last event bill was submitted on the tab where they would submit the bill for the current event, to have a visual cue as to whether submitting a bill is worth doing.
Events have an event_id and an event_date in table event. Event_id is a foreign key in table event_bill, which has the submitted_date for the bill.
Events have a foreign key customer_id, for each customer, so a customer can have multiple events in any time period.
Given the current event_id and the customer_id, I'm trying to get the submitted_date for the most recent previous event.
Here's what I've tried:
SELECT TOP 1 (event_id) as prev_event_id
INTO #tmp
FROM event
WHERE customer_id = #custID
AND event_type = 'Followup'
AND event_id < #eventID
ORDER BY event_date DESC
SELECT eb.submitted_date
FROM event_bill eb
JOIN #tmp
ON eb.event_id = #tmp.prev_event_id
DROP TABLE #tmp
This would be all well and good, but my application's database permissions don't allow for the creation of the temp table.
In an attempt without the temp table, I got errors that I can't use the ORDER BY in a derived table, but I need that to make sure I get the last event before the current one for this customer:
SELECT eb.submitted_date
FROM event_bill eb
JOIN
(
SELECT TOP 1 (event_id) as prev_event_id
FROM event
WHERE customer_id = #custID
AND event_type = 'Followup'
AND event_id < #eventID
ORDER BY event_date DESC
) x
ON eb.event_id = x.prev_event_id
Could anyone give me a better way to approach this?
Maybe it will help you
SELECT eb.submitted_date
FROM event_bill eb
JOIN
(
SELECT event_id as prev_event_id
FROM event
WHERE customer_id = #custID
AND event_type = 'Followup'
AND event_id < #eventID
and event_date =
(
select max(event_date)
FROM event
WHERE customer_id = #custID
AND event_type = 'Followup'
AND event_id < #eventID
)
) x
ON eb.event_id = x.prev_event_id
Try this :)
SELECT eb.submitted_date
FROM event_bill eb
WHERE
eb.event_id IN (
SELECT event.event_id
FROM event
WHERE
event.customer_id = #custID
AND event.event_type = 'Followup'
AND event.event_id < #eventID
ORDER BY event_date DESC
LIMIT 1
)
I would think that the ORDER BY would be accepted with the top. If not, you can do this:
SELECT eb.submitted_date
FROM event_bill eb JOIN
event e
on eb.event_id = e.event_id join
(SELECT customer_id, MAX(eventdate) as maxdate
FROM event
WHERE customer_id = #custID AND
event_type = 'Followup' AND
event_id < #eventID
group by customer_id
) md
ON e.customer_id = md.customer_id and
eb.event_date = md.maxdate
This calculates the maxdate and then uses this for the join.

writing a sql query in MySQL with subquery on the same table

I have a table svn1:
id | date | startdate
23 2002-12-04 2000-11-11
23 2004-08-19 2005-09-10
23 2002-09-09 2004-08-23
select id,startdate from svn1 where startdate>=(select max(date) from svn1 where id=svn1.id);
Now the problem is how do I let know the subquery to match id with the id in the outer query. Obviously id=svn1.id wont work. Thanks!
If you have the time to read more:
This really is a simplified version of asking what I really am trying to do here. my actual query is something like this
select
id, count(distinct archdetails.compname)
from
svn1,svn3,archdetails
where
svn1.name='ant'
and svn3.name='ant'
and archdetails.name='ant'
and type='Bug'
and svn1.revno=svn3.revno
and svn3.compname=archdetails.compname
and
(
(startdate>=sdate and startdate<=edate)
or
(
sdate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
and
edate>=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
or
(
sdate>=startdate
and
edate<=(select max(date) from svn1 where type='Bug' and id=svn1.id)
)
)
group by id LIMIT 0,40;
As you notice select max(date) from svn1 where type='Bug' and id=svn1.id has to be calculated many times.
Can I just calculate this once and store it using AS and then use that variable later. Main problem is to correct id=svn1.id so as to correctly equate it to the id in the outer table.
I'm not sure you can eliminate the repetition of the subquery, but the subquery can reference the main query if you use a table alias, as in the following:
select id,
count(distinct archdetails.compname)
from svn1 s1,
svn3 s3,
archdetails a
where s1.name='ant' and
s3.name='ant' and
a.name='ant' and
type='Bug' and
s1.revno=s3.revno and
s3.compname = a.compname and
( (startdate >= sdate and startdate<=edate) or
(sdate <= (select max(date)
from svn1
where type='Bug' and
id=s1.id and
edate>=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) or
(sdate >= startdate and edate<=(select max(date)
from svn1
where type='Bug' and
id=s1.id)) )
group by id LIMIT 0,40;
Share and enjoy.
You should be able to left join to a sub-select so you only run the query once. Then you can do a join condition to pull out the maximum for the ID on each record as shown below:
SELECT id,
COUNT(DISTINCT archdetails.compname)
FROM svn1,
svn3,
archdetails
LEFT JOIN (
SELECT id, MAX(date) AS MaximumDate
FROM svn1
WHERE TYPE = 'Bug'
GROUP BY id
) AS MaxDate ON MaxDate.id = svn1.id
WHERE svn1.name = 'ant'
AND svn3.name = 'ant'
AND archdetails.name = 'ant'
AND TYPE = 'Bug'
AND svn1.revno = svn3.revno
AND svn3.compname = archdetails.compname
AND (
(startdate >= sdate AND startdate <= edate)
OR (
sdate <= MaxDate.MaximumDate
AND edate >= MaxDate.MaximumDate
)
OR (
sdate >= startdate
AND edate <= MaxDate.MaximumDate
)
)
GROUP BY
id LIMIT 0,
40;
Try using alias, something like this should work:
select s.id,s.startdate from svn1.s where s.startdate>=(select max(date) from svn1.s2 where s.id=s2.id);