BigQuery: group counters by month after self-join

I have a table that looks like this:
I'm trying to build a query that shows counts for specific partnerId values, grouped by keywordName and month.
To solve the first part (without grouping by month), I built this query:
SELECT keywordName, COUNT(keywordName) as total, IFNULL(b.ebay_count, 0) as ebay, IFNULL(c.amazon_count, 0) as amazon,
FROM LogFilesv2_Dataset.FR_Clickstats_v2 a
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='eBay' THEN 1 ELSE 0 END) as ebay_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'eBay' GROUP BY kw) b
ON keywordName = b.kw
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='AmazonApi' THEN 1 ELSE 0 END) as amazon_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'AmazonApi' GROUP BY kw) c
ON keywordName = c.kw
WHERE keywordName = 'flipper' -- just to filter out single kw.
GROUP BY keywordName, ebay, amazon
It works quite well and returns the following output:
Now I'm trying to add a grouping by month, but all my attempts return incorrect results.
The final output should look similar to this:

You can do this with conditional aggregation, in a single pass and without the self-joins:
select
date_trunc(dt, month) dt,
keywordName,
count(*) total,
sum(case when partnerId = 'eBay' then 1 else 0 end) ebay,
sum(case when partnerId = 'AmazonApi' then 1 else 0 end) amazon
from LogFilesv2_Dataset.FR_Clickstats_v2
group by date_trunc(dt, month), keywordName
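The same conditional aggregation can be tried locally. This is a minimal sketch in SQLite. It assumes a hypothetical `clickstats` table whose date column is called `dt`, as the answer does, since the question never shows the real column name. SQLite has no `DATE_TRUNC`, so `strftime('%Y-%m', dt)` stands in for BigQuery's `DATE_TRUNC(dt, MONTH)`.

```sql
-- Hypothetical sample of the click log: one row per click.
CREATE TABLE clickstats (
  dt          TEXT,  -- ISO-8601 date, e.g. '2021-01-15'
  keywordName TEXT,
  partnerId   TEXT
);

INSERT INTO clickstats (dt, keywordName, partnerId) VALUES
  ('2021-01-03', 'flipper', 'eBay'),
  ('2021-01-17', 'flipper', 'AmazonApi'),
  ('2021-01-20', 'flipper', 'eBay'),
  ('2021-02-02', 'flipper', 'AmazonApi'),
  ('2021-02-09', 'flipper', 'OtherPartner');

-- One pass over the table, no self-join: each CASE maps a row to 1 or 0,
-- so each SUM counts only that partner's rows within a (month, keyword) group.
CREATE VIEW monthly_counts AS
SELECT
  strftime('%Y-%m', dt) AS month,  -- stands in for DATE_TRUNC(dt, MONTH)
  keywordName,
  COUNT(*) AS total,
  SUM(CASE WHEN partnerId = 'eBay'      THEN 1 ELSE 0 END) AS ebay,
  SUM(CASE WHEN partnerId = 'AmazonApi' THEN 1 ELSE 0 END) AS amazon
FROM clickstats
GROUP BY strftime('%Y-%m', dt), keywordName;

SELECT * FROM monthly_counts ORDER BY month, keywordName;
-- 2021-01 | flipper | 3 | 2 | 1
-- 2021-02 | flipper | 2 | 0 | 1
```

Rows for other partners still count toward `total` but toward neither partner column. That matches the `COUNT(keywordName)` total in the original query.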

Related

How to exclude 0 from COUNT() in SQL?

I have the code below, where I want to count the number of first purchases in a given period. My sales table has an is_first_purchase column that is 0 when the buyer is not a first-time buyer.
For example:
buyer_id = 456391 is an existing buyer who made purchases on 2 different dates.
Hence the is_first_purchase column shows 0, as below.
If I do a COUNT() on is_first_purchase for buyer_id = 456391, it should return 0 instead of 2.
My query is as follows:
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
count(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
It returned the output below, which is not what I intended.
I'd appreciate it if someone could explain how to exclude is_first_purchase = 0 from the count. Thanks.
Because COUNT counts every non-NULL value (including 0), the CASE expression has to return NULL for the rows you don't want counted.
There are two ways to get the count you expect: use SUM, or keep COUNT but remove the ELSE 0 part:
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
COUNT(case when first_purchase = 'Yes' then 1 end) as no_of_first_purchases
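This minimal sketch puts the three expressions side by side. It uses SQLite syntax and hypothetical sample rows. `COUNT` over `CASE ... ELSE 0` counts every row. `COUNT` without the `ELSE` and `SUM` both return the intended number.

```sql
-- Hypothetical rows: buyer 456391 made two repeat purchases,
-- buyer 100001 made one first purchase in the period.
CREATE TABLE sales (
  buyer_id          INTEGER,
  date_id           TEXT,
  is_first_purchase INTEGER  -- 1 = first purchase, 0 = repeat purchase
);

INSERT INTO sales (buyer_id, date_id, is_first_purchase) VALUES
  (456391, '2021-02-05', 0),
  (456391, '2021-02-20', 0),
  (100001, '2021-02-10', 1);

CREATE VIEW first_purchase_counts AS
SELECT
  buyer_id,
  COUNT(CASE WHEN is_first_purchase = 1 THEN 1 ELSE 0 END) AS count_with_else_0,  -- 0 is non-NULL, so it is counted
  COUNT(CASE WHEN is_first_purchase = 1 THEN 1 END)        AS count_without_else, -- non-matches become NULL and are skipped
  SUM(CASE WHEN is_first_purchase = 1 THEN 1 ELSE 0 END)   AS sum_of_flags        -- adds up the 1s
FROM sales
WHERE date_id BETWEEN '2021-02-01' AND '2021-03-01'
GROUP BY buyer_id;

SELECT * FROM first_purchase_counts ORDER BY buyer_id;
-- 100001 | 1 | 1 | 1
-- 456391 | 2 | 0 | 0
```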
Based on your question, I would also fold the CTE into the main query:
select
COUNT(case when is_first_purchase = 1 then 1 end) as no_of_first_purchases
from sales
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
I think that you are using COUNT() when you want SUM().
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
You could simplify your query as:
SELECT COUNT(*) AS no_of_first_purchases
FROM sales
WHERE is_first_purchase = 1
AND buyer_id = 456391
AND date_id BETWEEN '2021-02-01' AND '2021-03-01'
ORDER BY 1 DESC;
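When the condition can go in the WHERE clause, it is better to filter there than to wrap IF or CASE around the aggregate, because non-matching rows are discarded before COUNT runs. This only works when you need that single count. If the query must return several conditional counts, you still need CASE or an aggregate filter.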
The simplest approach for Trino (f.k.a. Presto SQL) is to use an aggregate with a filter:
count(*) FILTER (WHERE first_purchase = 'Yes') AS no_of_first_purchases
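`FILTER` on aggregates is standard SQL. Besides Trino, PostgreSQL and SQLite 3.30 or later also accept it, so the WHERE-based count and the FILTER-based count can be compared locally. This minimal sketch assumes a hypothetical `sales` table that stores `is_first_purchase` directly, skipping the question's `first_purchase` CTE column.

```sql
CREATE TABLE sales (
  buyer_id          INTEGER,
  date_id           TEXT,
  is_first_purchase INTEGER  -- 1 = first purchase, 0 = repeat purchase
);

INSERT INTO sales (buyer_id, date_id, is_first_purchase) VALUES
  (456391, '2021-02-05', 0),
  (456391, '2021-02-20', 0),
  (100001, '2021-02-10', 1),
  (100002, '2021-02-11', 1);

-- Filter in WHERE: repeat purchases are discarded before COUNT(*) runs.
SELECT COUNT(*) AS no_of_first_purchases
FROM sales
WHERE is_first_purchase = 1
  AND date_id BETWEEN '2021-02-01' AND '2021-03-01';
-- 2

-- Aggregate FILTER: each aggregate applies its own condition,
-- so several conditional counts come out of one scan.
SELECT
  COUNT(*) FILTER (WHERE is_first_purchase = 1) AS no_of_first_purchases,
  COUNT(*) FILTER (WHERE is_first_purchase = 0) AS no_of_repeat_purchases
FROM sales
WHERE date_id BETWEEN '2021-02-01' AND '2021-03-01';
-- 2 | 2
```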

SQL Year over Year Performance with Criteria

I am trying to return year-over-year results based on date criteria. I would also like to include additional information in the query, i.e. the first date of activity and the first date of activity with a spot name like '%6%'. My current query multiplies the correct amounts by 6, and I can't figure out how to fix it. When I remove the first "where" clause, I get the correct amounts. Any help would be appreciated.
Select
V.IGB_License,
DBA,
V.Sci_Games_Name,
convert(date,v2.Activity_date) as First6thMachineDate,
convert(date,V3.Activity_date) as GoLiveDate,
sum(case when (v.Activity_date between '1/23/2019' and DATEADD(YEAR,-2,getdate()-1)) then v.Funds_in else 0 end) as FundsIn2019,
sum(case when (v.Activity_date between '1/23/2020' and DATEADD(YEAR,-1,getdate()-1)) then v.Funds_in else 0 end) as FundsIn2020,
sum(case when (v.Activity_date between '1/23/2021' and getdate()) then v.Funds_in else 0 end) as FundsIn2021,
sum(case when (v.Activity_date between '1/23/2019' and DATEADD(YEAR,-2,getdate()-1)) then v.Net_funds else 0 end) as NetFunds2019,
sum(case when (v.Activity_date between '1/23/2020' and DATEADD(YEAR,-1,getdate()-1)) then v.Net_funds else 0 end) as NetFunds2020,
sum(case when (v.Activity_date between '1/23/2021' and getdate()) then v.Net_funds else 0 end) as NetFunds2021
From VGT_activity V
Left Join Locations on v.IGB_License = Locations.IGB_License
left join VGT_activity V2 on v.IGB_License = v2.IGB_License
Left join VGT_activity V3 on v.IGB_License = v3.IGB_License
Where v2.Activity_date = (
Select Min(V1.Activity_date)
From VGT_activity V1
Where v1.IGB_License = V2.IGB_License
and Spot_name like '%6%'
)
and v3.Activity_date = (
Select Min(V1.Activity_date)
From VGT_activity V1
Where v1.IGB_License = V3.IGB_License
)
group by V.IGB_License, dba, V.Sci_Games_Name, v2.Activity_date, v3.Activity_date
order by 4
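One plausible cause of the factor of 6, offered as an assumption rather than a confirmed diagnosis, is join fan-out. Joining `V2` and `V3` back on `IGB_License` keeps every row that falls on the minimum date. If the rows sharing those two dates multiply out to 6, every `V` row is repeated 6 times before `SUM` runs. A common fix is to reduce the "first date" lookups to one row per license in a derived table before joining.

This minimal sketch shows both the effect and that fix. It uses SQLite, hypothetical sample data, and a single `Funds_in` sum in place of the six year columns. It is not a tested T-SQL rewrite of the query above.

```sql
-- Hypothetical activity rows for one license.
CREATE TABLE VGT_activity (
  IGB_License   TEXT,
  Spot_name     TEXT,
  Activity_date TEXT,
  Funds_in      INTEGER
);

INSERT INTO VGT_activity VALUES
  ('L1', 'Spot1',  '2021-01-01', 100),
  ('L1', 'Spot6',  '2021-01-02',  50),
  ('L1', 'Spot16', '2021-01-02',  25);  -- two '%6%' rows on the same first date

-- Pattern from the question: V2 and V3 keep every row on their minimum date,
-- so each V row is repeated (2 V2 rows x 1 V3 row) times before SUM.
CREATE VIEW fanned_out AS
SELECT v.IGB_License, SUM(v.Funds_in) AS funds_in
FROM VGT_activity v
LEFT JOIN VGT_activity v2 ON v.IGB_License = v2.IGB_License
LEFT JOIN VGT_activity v3 ON v.IGB_License = v3.IGB_License
WHERE v2.Activity_date = (SELECT MIN(v1.Activity_date) FROM VGT_activity v1
                          WHERE v1.IGB_License = v2.IGB_License
                            AND v1.Spot_name LIKE '%6%')
  AND v3.Activity_date = (SELECT MIN(v1.Activity_date) FROM VGT_activity v1
                          WHERE v1.IGB_License = v3.IGB_License)
GROUP BY v.IGB_License;

-- Fix: compute both first dates as one row per license, then join once.
CREATE VIEW pre_aggregated AS
SELECT v.IGB_License,
       f.first_6th_machine_date,
       f.go_live_date,
       SUM(v.Funds_in) AS funds_in
FROM VGT_activity v
JOIN (
  SELECT IGB_License,
         MIN(CASE WHEN Spot_name LIKE '%6%' THEN Activity_date END) AS first_6th_machine_date,
         MIN(Activity_date) AS go_live_date
  FROM VGT_activity
  GROUP BY IGB_License
) f ON f.IGB_License = v.IGB_License
GROUP BY v.IGB_License, f.first_6th_machine_date, f.go_live_date;

SELECT * FROM fanned_out;      -- L1 | 350  (every row counted twice)
SELECT * FROM pre_aggregated;  -- L1 | 2021-01-02 | 2021-01-01 | 175
```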

SQL - Dividing aggregated fields, very new to SQL

I have a list of line items from invoices, with a field that indicates whether each line was delivered or picked up. I need to find the percentage of delivered lines out of the total number of lines.
Desired output columns: SALES_NBR | Total | Deliveryrate
In the FULFILLMENT field, 1 = delivered and 0 = picked up.
SELECT SALES_NBR,
COUNT (ITEMS) as Total,
SUM (case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
(SELECT delivered/total) as Deliveryrate
FROM Invoice_table
WHERE STORE IN '0123'
And SALE_DATE >='2020-02-01'
And SALE_DATE <='2020-02-07'
Group By SALES_NBR, Deliveryrate;
My query executes but never finishes for some reason. Is there an easier way to do this? The FULFILLMENT field does not contain any NULL values.
Any help would be appreciated.
I need to find a percentage of delivered items from the total number of lines.
The simplest method is to use avg():
select SALES_NBR,
avg(case when fulfillment = '1' then 1.0 else 0 end) as delivered_ratio -- 1.0 keeps the average fractional
from Invoice_table
where STORE = '0123' and
SALE_DATE >='2020-02-01' and
SALE_DATE <='2020-02-07'
group by SALES_NBR;
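The AVG trick works because the average of a 1/0 flag is (number of 1s) / (number of rows). This minimal sketch runs in SQLite against a hypothetical `Invoice_table` sample. It assumes FULFILLMENT is stored as the text '1' or '0', as the question's `FULFILLMENT = '1'` comparison suggests. The `1.0` keeps the result fractional on engines such as SQL Server, where AVG over an integer expression returns an integer.

```sql
-- Hypothetical invoice lines: FULFILLMENT '1' = delivered, '0' = picked up.
CREATE TABLE Invoice_table (
  SALES_NBR   INTEGER,
  STORE       TEXT,
  SALE_DATE   TEXT,
  ITEMS       TEXT,
  FULFILLMENT TEXT
);

INSERT INTO Invoice_table VALUES
  (1, '0123', '2020-02-03', 'chair', '1'),
  (1, '0123', '2020-02-03', 'table', '1'),
  (1, '0123', '2020-02-03', 'lamp',  '0'),
  (1, '0123', '2020-02-03', 'rug',   '1'),
  (2, '0123', '2020-02-05', 'sofa',  '0'),
  (2, '0123', '2020-02-05', 'desk',  '1');

CREATE VIEW delivery_ratio AS
SELECT SALES_NBR,
       COUNT(*) AS total,
       -- AVG of a 1.0/0 flag = delivered lines / all lines
       AVG(CASE WHEN FULFILLMENT = '1' THEN 1.0 ELSE 0 END) AS delivered_ratio
FROM Invoice_table
WHERE STORE = '0123'
  AND SALE_DATE >= '2020-02-01'
  AND SALE_DATE <= '2020-02-07'
GROUP BY SALES_NBR;

SELECT * FROM delivery_ratio ORDER BY SALES_NBR;
-- 1 | 4 | 0.75
-- 2 | 2 | 0.5
```

Multiply by 100 if a percentage rather than a ratio is wanted.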
I'm not sure if the group by sales_nbr is needed.
If you want to get a "nice" query, you can use subqueries like this:
select
qry.*,
1.0 * qry.delivered / qry.total as Deliveryrate -- 1.0 avoids integer division
from (
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered
from Invoice_table
where STORE = '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR
) qry;
This version is uglier, but I think it could perform faster:
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
1.0 * sum(case when FULFILLMENT = '1' then 1 else 0 end) / count(ITEMS) as Deliveryrate -- 1.0 avoids integer division
from Invoice_table
where STORE = '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR
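Whether `delivered/total` produces a fraction depends on the engine. SQL Server, PostgreSQL and SQLite truncate when both operands are integers, while MySQL's `/` returns a decimal. On the truncating engines, a rate computed from two counts comes out as 0, which is why the queries above multiply by `1.0` first. This is a minimal check in SQLite with hypothetical per-order counts.

```sql
-- Hypothetical per-order counts.
CREATE TABLE order_counts (SALES_NBR INTEGER, total INTEGER, delivered INTEGER);
INSERT INTO order_counts VALUES (1, 4, 3), (2, 2, 1);

SELECT SALES_NBR,
       delivered / total       AS int_rate,     -- integer / integer truncates: 0 for both orders
       1.0 * delivered / total AS decimal_rate  -- promoted to REAL first: 0.75 and 0.5
FROM order_counts
ORDER BY SALES_NBR;
```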

Hive rolling sum of data over date

I am working in Hive and am facing an issue with rolling counts. The sample data I am working with is shown below:
The output I am expecting is shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
This creates a derived table with a row for every account and date. Those rows carry a status on the days when the account's status changed, but not on other days.
The cumulative max for last_status_timestamp finds, as of each date, the most recent timestamp that has a non-NULL status. Joining that back to the table retrieves the status in effect on that date. Voila! That is the status used in the conditional aggregation.
The cumulative max plus join is a workaround because Hive does not (yet?) support the IGNORE NULLS option in lag().
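The carry-forward idea can be sketched in portable window-function SQL, run here against SQLite 3.25+ rather than Hive. It has three steps:

* Cross join days with accounts.
* Take a running MAX of the last change timestamp.
* Join back to recover the status.

This minimal sketch uses a hypothetical `events` table. Its column `ts` stands in for the answer's `timestamp` column, and the sample rows are invented for illustration.

```sql
-- Hypothetical event log: one row per status change; ts orders the changes.
CREATE TABLE events (
  account  TEXT,
  event_dt TEXT,  -- day of the change
  ts       TEXT,  -- full timestamp of the change
  status   TEXT
);

INSERT INTO events (account, event_dt, ts, status) VALUES
  ('a1', '2018-05-02', '2018-05-02 08:00', 'Registered'),
  ('a1', '2018-05-02', '2018-05-02 09:00', 'active_acct'),
  ('a1', '2018-05-04', '2018-05-04 10:00', 'suspended'),
  ('a2', '2018-05-03', '2018-05-03 12:00', 'Registered');

CREATE VIEW status_by_day AS
WITH last_per_day AS (
  -- each account's last change on each day
  SELECT account, event_dt, ts,
         ROW_NUMBER() OVER (PARTITION BY account, event_dt ORDER BY ts DESC) AS seqnum
  FROM events
),
grid AS (
  -- every (day, account) pair; the running MAX carries the latest change
  -- timestamp forward over days without a change (NULL before the first one)
  SELECT d.event_dt, ac.account,
         MAX(l.ts) OVER (PARTITION BY ac.account ORDER BY d.event_dt) AS last_status_ts
  FROM (SELECT DISTINCT event_dt FROM events) d
  CROSS JOIN (SELECT DISTINCT account FROM events) ac
  LEFT JOIN last_per_day l
    ON l.event_dt = d.event_dt
   AND l.account = ac.account
   AND l.seqnum = 1
)
-- join back on the carried-forward timestamp to get the status in effect,
-- then count statuses per day with conditional aggregation
SELECT g.event_dt,
       SUM(CASE WHEN e.status = 'Registered'  THEN 1 ELSE 0 END) AS registered,
       SUM(CASE WHEN e.status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
       SUM(CASE WHEN e.status = 'suspended'   THEN 1 ELSE 0 END) AS suspended,
       SUM(CASE WHEN e.status = 'reactive'    THEN 1 ELSE 0 END) AS reactive
FROM grid g
LEFT JOIN events e
  ON e.ts = g.last_status_ts
 AND e.account = g.account
GROUP BY g.event_dt;

SELECT * FROM status_by_day ORDER BY event_dt;
-- 2018-05-02 | 0 | 1 | 0 | 0
-- 2018-05-03 | 1 | 1 | 0 | 0
-- 2018-05-04 | 1 | 0 | 1 | 0
```

On 2018-05-03, account a1 has no event but is still counted as active_acct, because its 2018-05-02 status is carried forward.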

SQL select grouping and subtract

I have a source table with data like this:
I want a query that subtracts the "minus" rows from the "plus" rows, grouped by product name, so the result looks like this:
How can I do that in a SQL query? Thanks!
Group by the product, then use a conditional SUM():
select product,
sum(case when status = 'plus' then total else 0 end) -
sum(case when status = 'minus' then total else 0 end) as total,
sum(case when status = 'plus' then amount else 0 end) -
sum(case when status = 'minus' then amount else 0 end) as amount
from your_table
group by product
There is another method using join, which works for the particular data you have provided (which has one "plus" and one "minus" row per product):
select tplus.product, (tplus.total - tminus.total) as total,
(tplus.amount - tminus.amount) as amount
from t tplus join
t tminus
on tplus.product = tminus.product and
tplus.status = 'plus' and
tminus.status = 'minus';
Both this and the aggregation query work well for the data you have provided. In other words, there are multiple ways to solve this problem (each has its strengths).
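The caveat about "the particular data you have provided" matters. The inner self-join only returns products that have both a "plus" and a "minus" row. It would also return one output row per plus/minus pair if a product had several of either. This minimal sketch in SQLite uses a hypothetical table `t` in which one product has no "minus" row.

```sql
CREATE TABLE t (product TEXT, status TEXT, total INTEGER, amount INTEGER);
INSERT INTO t VALUES
  ('apple',  'plus',  10, 100),
  ('apple',  'minus',  3,  30),
  ('cherry', 'plus',   4,  40);  -- no 'minus' row

-- Self-join: needs exactly one 'plus' and one 'minus' row per product.
CREATE VIEW via_join AS
SELECT tplus.product,
       tplus.total  - tminus.total  AS total,
       tplus.amount - tminus.amount AS amount
FROM t tplus
JOIN t tminus
  ON tplus.product = tminus.product
 AND tplus.status  = 'plus'
 AND tminus.status = 'minus';

-- Conditional aggregation: works for any number of rows per product.
CREATE VIEW via_aggregation AS
SELECT product,
       SUM(CASE WHEN status = 'plus'  THEN total  ELSE 0 END)
     - SUM(CASE WHEN status = 'minus' THEN total  ELSE 0 END) AS total,
       SUM(CASE WHEN status = 'plus'  THEN amount ELSE 0 END)
     - SUM(CASE WHEN status = 'minus' THEN amount ELSE 0 END) AS amount
FROM t
GROUP BY product;

SELECT * FROM via_join ORDER BY product;         -- apple | 7 | 70   (cherry is missing)
SELECT * FROM via_aggregation ORDER BY product;  -- apple | 7 | 70 and cherry | 4 | 40
```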
You can also query it as below:
select product , sum (case when [status] = 'minus' then -Total else Total end) as Total
, sum (case when [status] = 'minus' then -Amount else Amount end) as SumAmount
from yourproduct
group by product