How to merge 2 group by select statements - sql

I have two tables, one for free users and one for paid users, and both track clicks. For each affiliate link I need to show the referrers (domains). A user may be a free user some of the time and a paid user at other times, so I need to merge the stats from both tables.
This query doesn't work:
SELECT ref,COUNT(ref) AS clicks
FROM click_analytics_free
WHERE link_id = '$link_id'
GROUP BY ref
UNION ALL
SELECT ref,COUNT(ref) AS clicks
FROM click_analytics_paid
WHERE link_id = '$link_id '
GROUP BY ref

You can do a union all and then aggregate again:
SELECT ref, SUM(free_clicks), SUM(paid_clicks),
SUM(free_clicks + paid_clicks)
FROM ((SELECT ref, COUNT(ref) AS free_clicks, 0 as paid_clicks
FROM click_analytics_free
WHERE link_id = ?
GROUP BY ref
) UNION ALL
(SELECT ref, 0, COUNT(ref) AS paid_clicks
FROM click_analytics_paid
WHERE link_id = ?
GROUP BY ref
)
) c
GROUP BY ref;
The ? is a parameter placeholder. Your code should be using parameters rather than munging query strings.
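As a minimal sketch of what binding parameters looks like in practice, the snippet below runs the query above from Python's built-in sqlite3 module. The sample rows and the clicks_by_referrer helper are hypothetical, and the parentheses around the UNION ALL branches are dropped only because SQLite does not accept them; other drivers use the same idea with their own placeholder syntax.

```python
import sqlite3

# Hypothetical miniature versions of the two click tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE click_analytics_free (link_id INTEGER, ref TEXT);
    CREATE TABLE click_analytics_paid (link_id INTEGER, ref TEXT);
    INSERT INTO click_analytics_free VALUES
        (1, 'a.com'), (1, 'a.com'), (1, 'b.com'), (2, 'a.com');
    INSERT INTO click_analytics_paid VALUES
        (1, 'a.com'), (1, 'c.com');
""")

QUERY = """
    SELECT ref,
           SUM(free_clicks) AS free_clicks,
           SUM(paid_clicks) AS paid_clicks,
           SUM(free_clicks + paid_clicks) AS total_clicks
    FROM (SELECT ref, COUNT(ref) AS free_clicks, 0 AS paid_clicks
          FROM click_analytics_free
          WHERE link_id = ?
          GROUP BY ref
          UNION ALL
          SELECT ref, 0, COUNT(ref)
          FROM click_analytics_paid
          WHERE link_id = ?
          GROUP BY ref) c
    GROUP BY ref
    ORDER BY ref
"""

def clicks_by_referrer(conn, link_id):
    # The link id is bound once per placeholder; it is never pasted into the SQL text.
    return conn.execute(QUERY, (link_id, link_id)).fetchall()

rows = clicks_by_referrer(conn, 1)
for row in rows:
    print(row)
```

Each referrer now appears on a single row with its free, paid, and combined click counts.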

You can use conditional aggregation:
SELECT ref,
SUM(CASE WHEN flag = 'free' THEN 1 ELSE 0 END) AS free_click,
SUM(CASE WHEN flag = 'paid' THEN 1 ELSE 0 END) AS paid_click
FROM (SELECT ref, 'free' as flag
FROM click_analytics_free
WHERE link_id = '$link_id'
UNION ALL
SELECT ref, 'paid' as flag
FROM click_analytics_paid
WHERE link_id = '$link_id'
) t
GROUP BY ref;

Related

Improving a SQL teradata query

I have a table like below and I want 'Y' in front of Ref 345 and 789 in the result-set on basis of count(Ref) = 1 where the amount is less than 0. I am using this query to get the desired output. My question is, is there any other (and more efficient) way to do it in Teradata?
SELECT T.Ref,T.AMOUNT, R.Refund_IND as Refund_IND
FROM Table1 t
LEFT JOIN (select 'Y' as Refund_IND, Ref from Table1 where Ref in
(select Ref from Table1 where amount < 0)
group by Ref having count(Ref) = 1) R on t.Ref = R.Ref
You can use window functions to test these conditions:
SELECT
Ref,
Amount,
CASE WHEN COUNT(*) OVER (PARTITION BY REF) = 1 AND Amount < 0 THEN 'Y' ELSE '' END AS Refund_Ind
FROM Table1
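The window-function version can be tried out with a small sketch like the one below. It uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) rather than Teradata, and the sample rows are invented for illustration since the question's table is not shown; only Ref 345 and 789 are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Ref INTEGER, Amount INTEGER);
    INSERT INTO Table1 VALUES
        (123, 100), (123, -100),  -- two rows for the Ref: not flagged
        (345, -50),               -- single negative row: flagged
        (567, 75),                -- single positive row: not flagged
        (789, -20);               -- single negative row: flagged
""")

rows = conn.execute("""
    SELECT Ref, Amount,
           CASE WHEN COUNT(*) OVER (PARTITION BY Ref) = 1 AND Amount < 0
                THEN 'Y' ELSE '' END AS Refund_Ind
    FROM Table1
    ORDER BY Ref, Amount
""").fetchall()

# Collect the Refs that received the 'Y' indicator.
flagged = [ref for ref, _amount, ind in rows if ind == 'Y']
print(flagged)
```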

Optimizing code with multiple conditions on multiple tables?

I want to check whether these customers have a LEAD action or a SELL action, both of which are stored in other tables. However, the query takes forever to finish.
create table ct_nguyendang.visitor
as
select user_id, updated_at::date,
case
when user_id in (select distinct d_visitor_id from xiti.lead_detail) then 'lead'
else 'None'
end as lead_action,
case
when user_id in (select distinct account_id from ct_nguyendang.daily_listor) then 'sell'
else 'None'
end as sell_action
I think you can use union all and aggregation:
select user_id, max(is_lead) as has_lead, max(is_sale) as has_sale
from ((select d_visitor_id as user_id, 1 as is_lead, 0 as is_sale
from xiti.lead_detail
) union all
(select account_id, 0, 1
from ct_nguyendang.daily_listor
)
) ls
group by user_id;
If you have a table of users, then you can use correlated subqueries:
select u.*,
(case when exists (select 1
from xiti.lead_detail l
where u.user_id = l.d_visitor_id
)
then 1 else 0
end) as has_lead,
(case when exists (select 1
from ct_nguyendang.daily_listor s
where u.user_id = s.account_id
)
then 1 else 0
end) as has_sale
from users u;
Note that I prefer using 1 for "true" and 0 for "false". Of course, you can use string values if you prefer.
To optimize this query, you want indexes on xiti.lead_detail(d_visitor_id) and ct_nguyendang.daily_listor(account_id).
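Here is a rough sketch of the EXISTS version together with the two indexes, run through Python's sqlite3 module. The schema prefixes are dropped, and the users table, index names, and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE lead_detail (d_visitor_id INTEGER);
    CREATE TABLE daily_listor (account_id INTEGER);
    -- With these indexes, each EXISTS probe can be an index lookup.
    CREATE INDEX idx_lead_visitor ON lead_detail (d_visitor_id);
    CREATE INDEX idx_listor_account ON daily_listor (account_id);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO lead_detail VALUES (1), (1), (3);
    INSERT INTO daily_listor VALUES (3);
""")

rows = conn.execute("""
    SELECT u.user_id,
           CASE WHEN EXISTS (SELECT 1 FROM lead_detail l
                             WHERE l.d_visitor_id = u.user_id)
                THEN 1 ELSE 0 END AS has_lead,
           CASE WHEN EXISTS (SELECT 1 FROM daily_listor s
                             WHERE s.account_id = u.user_id)
                THEN 1 ELSE 0 END AS has_sale
    FROM users u
    ORDER BY u.user_id
""").fetchall()

for row in rows:
    print(row)
```

Duplicate lead rows for a user do not multiply the output, because EXISTS only asks whether at least one match is present.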

Find where two conditions are present in group

I have a table:
ref | name
===========
123abc | received
123abc | pending
134b | pending
156c | received
I want to be able to identify instances where a ref only has a pending and not a received. Note there could be multiple receives and pendings for the same ref.
How can I output the ref's that only have a pending and not a received?
So in my example, it would return:
134b | pending
I think it's something like:
SELECT ref, name FROM my_table
WHERE ref IS NOT NULL
GROUP BY ref, name
HAVING ref = 'pending' AND ref = 'received'
;
I would use aggregation:
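select ref
from my_table
where name in ('pending', 'received')
group by ref
having min(name) = 'pending' and min(name) = max(name);
The first condition checks that the ref has a pending row, and the second checks that it has no other status. On its own, max(name) = 'pending' would be enough, but only because 'pending' sorts before 'received'; comparing min and max eliminates the dependence on the alphabetical ordering of the values.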
You can use not exists for what you need (btw, from your data, column "name" contains values like pending and received):
select distinct ref, name
from my_table t1
where t1.name = 'pending' and not exists (select * from my_table t2 where t1.ref=t2.ref and t2.name='received')
PS. You can validate here with your sample data and my query:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=6fd633fe52129ff3246d8dba55e5fc17
Another way of doing it is with a WITH clause (a common table expression). This way, there is no need for nested subqueries.
WITH ref_recieved_pending AS (
SELECT
ref,
sum(CASE WHEN name = 'received'
THEN 1
ELSE 0 END) as recieved_count,
sum(CASE WHEN name = 'pending'
THEN 1
ELSE 0 END) as pending_count
FROM test_table_2
GROUP BY ref
)
SELECT DISTINCT
ref,
'pending' as pending
FROM ref_recieved_pending
WHERE pending_count > 0 AND recieved_count = 0;
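To compare the approaches side by side, the sketch below loads the question's sample rows into an in-memory SQLite database from Python and runs an aggregation, a NOT EXISTS, and a CTE variant. It assumes the status values live in the name column, as the sample table shows, and uses my_table for all three; the dictionary labels and the counts CTE name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (ref TEXT, name TEXT);
    INSERT INTO my_table VALUES
        ('123abc', 'received'), ('123abc', 'pending'),
        ('134b', 'pending'), ('156c', 'received');
""")

QUERIES = {
    "aggregation": """
        SELECT ref FROM my_table
        WHERE name IN ('pending', 'received')
        GROUP BY ref
        HAVING MIN(name) = 'pending' AND MIN(name) = MAX(name)
    """,
    "not_exists": """
        SELECT DISTINCT t1.ref FROM my_table t1
        WHERE t1.name = 'pending'
          AND NOT EXISTS (SELECT 1 FROM my_table t2
                          WHERE t2.ref = t1.ref AND t2.name = 'received')
    """,
    "cte": """
        WITH counts AS (
            SELECT ref,
                   SUM(CASE WHEN name = 'received' THEN 1 ELSE 0 END) AS received_count,
                   SUM(CASE WHEN name = 'pending' THEN 1 ELSE 0 END) AS pending_count
            FROM my_table
            GROUP BY ref
        )
        SELECT ref FROM counts
        WHERE pending_count > 0 AND received_count = 0
    """,
}

# Run every variant and keep only the ref column of each result.
results = {label: [r[0] for r in conn.execute(sql)] for label, sql in QUERIES.items()}
print(results)
```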

How to add where condition if result count is greater than one

I want to build a SQL query that returns a unique id.
My problem is that I need to add another condition to the query if I get more than one result.
select u.id
from users u
where u.id in ('1','2','3')
and u.active = 'Y'
If I get more than one result, I need to add:
and u.active_contact = 'Y'
I tried to build this query
select * from (
select u.id, count(u.id) as results
from users u
where u.id in ('1','2','3')
and u.active = 'Y'
group by u.id
) tab
If(tab.results > 1) then
where tab.u.active_contact = 'Y'
end
Thanks in advance.
I hope I explained myself well enough.
Here's a different approach:
SELECT id
FROM (SELECT id,
(CASE WHEN active = 'Y' THEN 1 ELSE 0 END) + (CASE WHEN active_contact = 'Y' THEN 1 ELSE 0 END) AS actv
FROM users
WHERE id IN ('1','2','3')) t
WHERE actv > 0
ORDER BY actv DESC
LIMIT 1
The subquery adds a column, actv, that counts how many of active and active_contact are 'Y'. The outer query then requires at least one of the two flags and keeps the best-scoring row. I believe this provides the intended result.
Among the possible ways to solve this, here are two.
1) Use the active_contact id. If there is none use another id.
select coalesce( max(case when active_contact = 'Y' then id end), max(id) ) as id
from users
where id in ('1','2','3')
and active = 'Y';
2) Sort with active_contact coming first. Then get the first record.
select id
from
(
select id
from users
where id in ('1','2','3')
and active = 'Y'
order by case when active_contact = 'Y' then 1 else 2 end
) where rownum = 1;
A method using analytic functions:
SELECT id
FROM (SELECT u.id
, u.active_contact
, count(*) OVER () actives
FROM users u
WHERE u.id IN ('1','2','3')
AND u.active = 'Y')
WHERE ( actives = 1
OR ( actives > 1
AND active_contact = 'Y'))
If there is more than one record where active = 'Y' AND active_contact = 'Y' it will return them all. If only one of these is required you will need to identify the criteria for choosing that one.
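The behaviour of the analytic-function query can be checked with a small sketch in Python's sqlite3 module (SQLite 3.25 or newer for window functions). The pick_ids helper, the derived-table alias t, and the sample rows are hypothetical; the query itself follows the one above.

```python
import sqlite3

QUERY = """
    SELECT id
    FROM (SELECT u.id, u.active_contact, COUNT(*) OVER () AS actives
          FROM users u
          WHERE u.id IN ('1', '2', '3')
            AND u.active = 'Y') t
    WHERE actives = 1
       OR (actives > 1 AND active_contact = 'Y')
"""

def pick_ids(rows):
    # Build a throwaway users table from (id, active, active_contact) tuples.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT, active TEXT, active_contact TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return [r[0] for r in conn.execute(QUERY)]

# Two active users: the extra active_contact condition applies.
several_active = pick_ids([('1', 'Y', 'N'), ('2', 'Y', 'Y'), ('3', 'N', 'Y')])
# One active user: it is returned even though active_contact is 'N'.
single_active = pick_ids([('1', 'Y', 'N'), ('2', 'N', 'Y')])
print(several_active, single_active)
```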

How to optimize this query in SQL Server 2008

How can I optimize query in SQL Server 2008?
Here is my Query.
SELECT DISTINCT
ListName ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
AND UserName IN (
SELECT UserName
FROM tbl_Panelist
WHERE tbl_Panelist.Subscribe = '1'
AND tbl_Panelist.Pending = '0'
AND tbl_Panelist.UserName IN (
SELECT UserName
FROM tbl_PanelistActivity
WHERE tbl_PanelistActivity.ActivityDate > ( GETDATE()
- 180 ) ) )
) AS Active ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
AND UserName IN (
SELECT UserName
FROM tbl_Panelist
WHERE tbl_Panelist.Subscribe = '1'
AND tbl_Panelist.Pending = '1' )
) AS Pending ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
AND UserName IN (
SELECT UserName
FROM tbl_Panelist
WHERE tbl_Panelist.Subscribe = '0'
AND tbl_Panelist.Pending = '0' )
) AS UnSubscribe ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
AND UserName IN (
SELECT UserName
FROM tbl_Panelist
WHERE tbl_Panelist.Subscribe = '1'
AND tbl_Panelist.Pending = '0'
AND tbl_Panelist.UserName NOT IN (
SELECT UserName
FROM tbl_PanelistActivity
WHERE tbl_PanelistActivity.ActivityDate > ( GETDATE()
- 180 ) ) )
) AS Inactive ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
) AS Total ,
( SELECT COUNT(id)
FROM tbl_SurveyAssign
WHERE ListName = a.ListName
AND UserName NOT IN ( SELECT UserName
FROM tbl_Panelist )
) AS NotMember
FROM tbl_SurveyAssign a
Without knowing the data and indexes, or looking at the execution plan, I would say there are two ways to make the query easier for SQL Server to process.
Simple solution
If you put subqueries inside the SELECT list, the server is likely to run each of them once for every row in tbl_SurveyAssign.
If you instead write each of them as a query grouped by list name (select listname, count(*) from xxxxx group by listname) and join the results in, the server only needs to run one query per column. Whether that works also depends on whether tbl_SurveyAssign always contains all the list rows...
More advanced
At a quick glance, it looks like you should be able to build this with just one or two queries using joins. For example, if you use tbl_SurveyAssign as the main table, left join tbl_Panelist on username, and group by list name, a count on the tbl_Panelist column gives you the number of members. From that you can calculate NotMember by subtracting from count(*), which avoids a NOT IN that is quite heavy.
Something like this:
select a.listname, count(*) Total, count(*) - count(b.username) NonMembers,
sum(case when b.Subscribe = '0' and b.Pending = '0' then 1 else 0 end) Unsubscribed,
sum(case when b.Subscribe = '1' and b.Pending = '1' then 1 else 0 end) Pending
from tbl_SurveyAssign a
left outer join tbl_Panelist b on a.username = b.username
group by a.listname
You should also be able to do a grouping query that calculates pending, active, and unsubscribed just once.
If you group the activity by username and take the max of the activity date, you should be able to calculate the activity in the same query as well.
select a.listname, count(*) Total, count(*) - count(b.username) NonMembers,
sum(case when b.Subscribe = '0' and b.Pending = '0' then 1 else 0 end) Unsubscribed,
sum(case when b.Subscribe = '1' and b.Pending = '1' then 1 else 0 end) Pending,
sum(case when b.Subscribe = '1' and b.Pending = '0' and act.DaysSinceLastActivity < 180 then 1 else 0 end) Active,
sum(case when b.Subscribe = '1' and b.Pending = '0' and (act.DaysSinceLastActivity is null or act.DaysSinceLastActivity >= 180) then 1 else 0 end) Inactive
from tbl_SurveyAssign a
left outer join tbl_Panelist b on a.username = b.username
left outer join
(
select UserName, DateDiff(day, max(ActivityDate), GetDate()) DaysSinceLastActivity from tbl_PanelistActivity group by UserName
) act on a.username = act.username
group by a.listname
Something like that. Keep in mind that it's not tested in any way, and I am quite sure I have misspelled a keyword or two. But it gives you much less code, and I think it is more readable as well. And if there is still a performance issue, you will have a much smaller execution plan to dig into.
I hope that helps you!
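The left-join idea can be sanity-checked on a toy data set. The sketch below uses Python's sqlite3 module instead of SQL Server and covers only the Total, NotMember, UnSubscribe, and Pending columns; the sample rows are invented, and it assumes UserName is unique in tbl_Panelist (otherwise the join would duplicate assignment rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_SurveyAssign (id INTEGER PRIMARY KEY, ListName TEXT, UserName TEXT);
    CREATE TABLE tbl_Panelist (UserName TEXT PRIMARY KEY, Subscribe TEXT, Pending TEXT);
    INSERT INTO tbl_Panelist VALUES
        ('ann', '1', '1'), ('bob', '0', '0'), ('cy', '1', '0');
    INSERT INTO tbl_SurveyAssign (ListName, UserName) VALUES
        ('A', 'ann'), ('A', 'bob'), ('A', 'zed'), ('B', 'cy');
""")

rows = conn.execute("""
    SELECT a.ListName,
           COUNT(*) AS Total,
           -- COUNT(b.UserName) skips NULLs, i.e. assignments with no panelist row.
           COUNT(*) - COUNT(b.UserName) AS NotMember,
           SUM(CASE WHEN b.Subscribe = '0' AND b.Pending = '0' THEN 1 ELSE 0 END) AS UnSubscribe,
           SUM(CASE WHEN b.Subscribe = '1' AND b.Pending = '1' THEN 1 ELSE 0 END) AS Pending
    FROM tbl_SurveyAssign a
    LEFT JOIN tbl_Panelist b ON a.UserName = b.UserName
    GROUP BY a.ListName
    ORDER BY a.ListName
""").fetchall()

for row in rows:
    print(row)
```

One pass over the joined rows produces every column, instead of one correlated subquery per column per row.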