How to get all months in a query that has no matches? - google-bigquery

My current output only shows the months that have data, because I filtered on a single SKU in my query.
I want the result to list every month of the year, including the months with no matches.
So I tweaked my query a little, but the months without data now come back with NULL in the SKU column.
What should I do to display the SKU on every row as well? I don't want NULL values in the SKU column.
This is the query I have written so far:
month_agg as
(select sku_number, extract(month from date_) as month, sum(v) as viewed, sum(a) as add_to_cart, sum(p) as purchased
from merge_and_pivot
where sku_number = '10671924'
group by sku_number, 2 order by 1,2)
, month_generate as
(SELECT extract(month from date_) as month
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-12-01', INTERVAL 1 MONTH)) AS date_)
select a.month, b.sku_number, coalesce(b.viewed, 0) , coalesce(b.add_to_cart, 0), coalesce(b.purchased, 0)
from month_generate a
left JOIN month_agg b on a.month = b.month

Consider the query below:
SELECT month,
MAX(sku_number) OVER() sku_number, -- assuming all sku_number values are the same
COALESCE(viewed, 0) viewed,
COALESCE(add_to_cart, 0) add_to_cart,
COALESCE(purchased, 0) purchased,
FROM UNNEST(GENERATE_ARRAY(1, 12)) month LEFT JOIN sample USING(month)
ORDER BY month;
The output contains a row for every month 1 through 12, with the SKU filled in on every row and zeros where there is no data.
It was tested with this sample table:
CREATE TEMP TABLE sample AS
SELECT '10671924' sku_number, 1 month, 9 viewed, 6 add_to_cart, 0 purchased UNION ALL
SELECT '10671924', 10, 32, 8, 0 UNION ALL
SELECT '10671924', 11, 948, 688, 163 UNION ALL
SELECT '10671924', 12, 630, 299, 83;
Updated query:
SELECT month,
sku.sku_number,
COALESCE(viewed, 0) viewed,
COALESCE(add_to_cart, 0) add_to_cart,
COALESCE(purchased, 0) purchased,
FROM (SELECT DISTINCT sku_number FROM sample) sku, UNNEST(GENERATE_ARRAY(1, 12)) month
LEFT JOIN sample USING(sku_number, month)
ORDER BY sku_number, month;
The output now has a row for every month of every SKU, again with zeros where there is no data.
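Mapped back onto the CTEs from the question, the same pattern looks roughly like this (a sketch only: merge_and_pivot and its columns date_, v, a and p are assumed to be defined earlier in the WITH clause, as in the question):
with month_agg as (
  select sku_number, extract(month from date_) as month,
         sum(v) as viewed, sum(a) as add_to_cart, sum(p) as purchased
  from merge_and_pivot
  group by sku_number, month
),
sku_months as (
  -- every distinct SKU paired with months 1..12
  select sku_number, month
  from (select distinct sku_number from merge_and_pivot),
       unnest(generate_array(1, 12)) as month
)
select m.month,
       m.sku_number,
       coalesce(a.viewed, 0) as viewed,
       coalesce(a.add_to_cart, 0) as add_to_cart,
       coalesce(a.purchased, 0) as purchased
from sku_months m
left join month_agg a
  on a.sku_number = m.sku_number and a.month = m.month
order by m.sku_number, m.month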

Related

How to calculate value based on average of previous month and average of same month last year in SQL

I would like to calculate targets for open rates and click-to-open rates based on the actuals of the previous month and of the same month last year.
My table is aggregated at the daily level, and I have grouped it by month and year to get the monthly averages. I then created a self-join to join each month on the results of the previous month. This works fine for all months except January, because SQL can't know that it's supposed to join month 1 to month 12 of the previous year. Is there a way to specify this in my join clause?
Essentially, the results for January 2021 shouldn't be null because I have December 2020 data.
This is my data and my query:
CREATE TABLE exasol_last_year_avg(
date_col date,
country text,
brand text,
category text,
delivered integer,
opened integer,
clicked integer
);
INSERT INTO exasol_last_year_avg
(date_col,country,brand,category,delivered,opened,clicked) VALUES
('2021-01-01','AT','brand1','cat1',100,60,23),
('2021-01-01','AT','brand1','cat2',200,50,45),
('2021-01-01','AT','brand2','cat1',300,49,35),
('2021-01-01','AT','brand2','cat2',400,79,57),
('2021-02-02','AT','brand1','cat1',130,78,30),
('2021-02-02','AT','brand1','cat2',260,65,59),
('2021-02-02','AT','brand2','cat1',390,64,46),
('2021-02-02','AT','brand2','cat2',520,103,74),
('2020-12-02','AT','brand1','cat1',130,78,30),
('2020-12-02','AT','brand1','cat2',260,65,59),
('2020-12-02','AT','brand2','cat1',390,64,46),
('2020-12-02','AT','brand2','cat2',520,103,74),
('2020-02-02','AT','brand1','cat2',236,59,53),
('2020-02-02','AT','brand2','cat1',355,58,41),
('2020-02-02','AT','brand2','cat2',473,93,67),
('2020-02-02','AT','brand1','cat1',118,71,27);
This is written in PostgreSQL because I think it's more accessible to most people, but my production database is Exasol!
select *
from
(Select month_col,
year_col,
t_campaign_cmcategory,
t_country,
t_brand,
(t2_clicktoopenrate + t3_clicktoopenrate)/2 as target_clicktoopenrate,
(t2_openrate + t3_openrate)/2 as target_openrate
from (
with CTE as (
select extract(month from date_col) as month_col,
extract(year from date_col) as year_col,
category as t_campaign_cmcategory,
country as t_country,
brand as t_brand,
round(sum(opened)/nullif(sum(delivered),0),3) as OpenRate,
round(sum(clicked)/nullif(sum(opened),0),3) as ClickToOpenRate
from public.exasol_last_year_avg
group by 1, 2, 3, 4, 5)
select t1.month_col,
t1.year_col,
t2.month_col as t2_month_col,
t2.year_col as t2_year_col,
t3.month_col as t3_month_col,
t3.year_col as t3_year_col,
t1.t_campaign_cmcategory,
t1.t_country,
t1.t_brand,
t1.OpenRate,
t1.ClickToOpenRate,
t2.OpenRate as t2_OpenRate,
t2.ClickToOpenRate as t2_ClickToOpenRate,
t3.OpenRate as t3_OpenRate,
t3.ClickToOpenRate as t3_ClickToOpenRate
from CTE t1
left join CTE t2
on t1.month_col = t2.month_col + 1
and t1.year_col = t2.year_col
and t1.t_campaign_cmcategory = t2.t_campaign_cmcategory
and t1.t_country = t2.t_country
and t1.t_brand = t2.t_brand
left join CTE t3
on t1.month_col = t3.month_col
and t1.year_col = t3.year_col + 1
and t1.t_campaign_cmcategory = t3.t_campaign_cmcategory
and t1.t_country = t3.t_country
and t1.t_brand = t3.t_brand) as target_base) as final_tbl
Start with an aggregation query:
select date_trunc('month', date_col), country, brand,
sum(opened) * 1.0 / nullif(sum(delivered), 0) as OpenRate,
sum(clicked) * 1.0 / nullif(sum(opened), 0) as ClickToOpenRate
from exasol_last_year_avg
group by 1, 2, 3;
Then, use window functions. Assuming you have a value for every month (with no gaps), you can just use lag(). I'm not sure what your final calculation is, but this brings in the data:
with mcb as (
select date_trunc('month', date_col) as yyyymm, country, brand,
sum(opened) * 1.0 / nullif(sum(delivered), 0) as OpenRate,
sum(clicked) * 1.0 / nullif(sum(opened), 0) as ClickToOpenRate
from exasol_last_year_avg
group by 1, 2, 3
)
select mcb.*,
lag(openrate, 1) over (partition by country, brand order by yyyymm) as prev_month_openrate,
lag(ClickToOpenRate, 1) over (partition by country, brand order by yyyymm) as prev_month_ClickToOpenRate,
lag(openrate, 12) over (partition by country, brand order by yyyymm) as prev_year_openrate,
lag(ClickToOpenRate, 12) over (partition by country, brand order by yyyymm) as prev_year_ClickToOpenRate
from mcb;
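If there can be gaps in the monthly series, a self-join on the truncated month is a safer variant of the same idea. A sketch that reuses the mcb aggregation from above (written against the PostgreSQL setup; the final target formula is left out):
with mcb as (
  select date_trunc('month', date_col) as yyyymm, country, brand,
         sum(opened) * 1.0 / nullif(sum(delivered), 0) as OpenRate,
         sum(clicked) * 1.0 / nullif(sum(opened), 0) as ClickToOpenRate
  from exasol_last_year_avg
  group by 1, 2, 3
)
select cur.*,
       pm.OpenRate        as prev_month_openrate,
       pm.ClickToOpenRate as prev_month_clicktoopenrate,
       py.OpenRate        as prev_year_openrate,
       py.ClickToOpenRate as prev_year_clicktoopenrate
from mcb cur
left join mcb pm
  on pm.country = cur.country and pm.brand = cur.brand
 and pm.yyyymm = cur.yyyymm - interval '1 month'  -- January correctly picks up December of the prior year
left join mcb py
  on py.country = cur.country and py.brand = cur.brand
 and py.yyyymm = cur.yyyymm - interval '1 year';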
This works with a different join condition:
select *
from
(Select month_col,
year_col,
t_campaign_cmcategory,
t_country,
t_brand,
(t2_clicktoopenrate + t3_clicktoopenrate)/2 as target_clicktoopenrate,
(t2_openrate + t3_openrate)/2 as target_openrate
from (
with CTE as (
select extract(month from date_col) as month_col,
extract(year from date_col) as year_col,
category as t_campaign_cmcategory,
country as t_country,
brand as t_brand,
round(sum(opened)/nullif(sum(delivered),0),3) as OpenRate,
round(sum(clicked)/nullif(sum(opened),0),3) as ClickToOpenRate
from public.exasol_last_year_avg
group by 1, 2, 3, 4, 5)
select t1.month_col,
t1.year_col,
t2.month_col as t2_month_col,
t2.year_col as t2_year_col,
t3.month_col as t3_month_col,
t3.year_col as t3_year_col,
t1.t_campaign_cmcategory,
t1.t_country,
t1.t_brand,
t1.OpenRate,
t1.ClickToOpenRate,
t2.OpenRate as t2_OpenRate,
t2.ClickToOpenRate as t2_ClickToOpenRate,
t3.OpenRate as t3_OpenRate,
t3.ClickToOpenRate as t3_ClickToOpenRate
from CTE t1
left join CTE t2
-- adjusted join condition
on ((t1.month_col = (CASE WHEN t1.month_col = 1 then t2.month_col - 11 END) and t1.year_col = t2.year_col + 1)
or (t1.month_col = (CASE WHEN t1.month_col != 1 then t2.month_col + 1 END) and t1.year_col = t2.year_col))
and t1.t_campaign_cmcategory = t2.t_campaign_cmcategory
and t1.t_country = t2.t_country
and t1.t_brand = t2.t_brand
left join CTE t3
on t1.month_col = t3.month_col
and t1.year_col = t3.year_col + 1
and t1.t_campaign_cmcategory = t3.t_campaign_cmcategory
and t1.t_country = t3.t_country
and t1.t_brand = t3.t_brand) as target_base) as final_tbl
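An equivalent and somewhat simpler join condition is to compare a running month number (year * 12 + month), which rolls over from January to the previous December by itself. A sketch of just the changed join:
left join CTE t2
on (t1.year_col * 12 + t1.month_col) = (t2.year_col * 12 + t2.month_col) + 1
and t1.t_campaign_cmcategory = t2.t_campaign_cmcategory
and t1.t_country = t2.t_country
and t1.t_brand = t2.t_brand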

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group and count them by the attr column row-wise, and also create additional columns that show the counts and percentages per day, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by, but I can't work out how to separate the days into multiple columns. I tried to generate the day-1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this is also not giving me the correct answer: I'm getting all zeroes for the percentage and a count of 1. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail down the logic before worrying about presentation:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount/t2.daytotal as percentofday -- 1.0 * avoids integer division
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day-by-day layout if you feel the need.
This builds on @johnHC's query above. If you need 7 days, add those days to the CASE WHEN expressions:
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining CTEs to calculate the values, I am using window functions, which is a bit shorter and, I think, more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counts the rows per day and the rows per day and attr.
B: for readability the date is converted into a number: the difference between the maximum date in the table and the row's date, which gives a counter from 0 (most recent day) up to n - 1 (oldest day).
C: calculates the percentage and rounds it.
D: pivots by filtering on the day numbers. COALESCE turns the NULL values into 0. To add more days, duplicate these columns with the next day number, as in the sketch below.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
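For example, a hypothetical third-day pair of columns would just repeat the pattern with the next day number:
COALESCE(max(count) FILTER (WHERE day_number = 2), 0) as day3_count,
COALESCE(max(percent) FILTER (WHERE day_number = 2), 0) as day3_percent,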
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
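If your dialect does not support the FILTER clause on aggregates, the same pivot can be written with CASE expressions. An equivalent sketch of the query above:
SELECT attr,
       COUNT(CASE WHEN day_number = 1 THEN 1 END) as day1_count,
       COUNT(CASE WHEN day_number = 1 THEN 1 END) / cnt as day1_percent,
       COUNT(CASE WHEN day_number = 2 THEN 1 END) as day2_count,
       COUNT(CASE WHEN day_number = 2 THEN 1 END) / cnt as day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
             1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
      FROM test_table
     ) s
GROUP BY attr, cnt
ORDER BY attr;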

Grouping multiple selects within a SQL query

I have a table StockBreakdown with columns Supplier, TotalStock and Date. I'm trying to write a single query that will give me stock totals by week / month / year for a list of suppliers.
So the results will look like this:
SUPPLIER WEEK MONTH YEAR
SupplierA 50 100 2000
SupplierB 60 150 2500
SupplierC 15 25 200
So far I've been playing around with multiple selects but I can't get any further than this:
SELECT Supplier,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) AS StockThisWeek,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) AS StockThisMonth,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-1-1'
GROUP BY Supplier
) AS StockThisYear
This query throws an error because each individual subquery returns multiple rows. I feel I'm close to the solution but can't work out where to go next.
You don't have to use subqueries to achieve what you want:
SELECT Supplier
, SUM(CASE WHEN Date >= CAST('2014-05-12' as DATE) THEN TotalStock END) AS StockThisWeek
, SUM(CASE WHEN Date >= CAST('2014-05-01' as DATE) THEN TotalStock END) AS StockThisMonth
, SUM(CASE WHEN Date >= CAST('2014-01-01' as DATE) THEN TotalStock END) AS StockThisYear
FROM StockBreakdown
GROUP BY Supplier
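If the cut-off dates should follow the current date instead of being hard-coded, the CASE expressions can compute them. A sketch in SQL Server syntax (an assumption, since the question does not name the DBMS; DATEFROMPARTS needs SQL Server 2012+ and the week start depends on the DATEFIRST setting):
SELECT Supplier
     , SUM(CASE WHEN Date >= DATEADD(DAY, 1 - DATEPART(WEEKDAY, CAST(GETDATE() AS date)), CAST(GETDATE() AS date))
                THEN TotalStock END) AS StockThisWeek   -- since the start of the current week
     , SUM(CASE WHEN Date >= DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1)
                THEN TotalStock END) AS StockThisMonth  -- since the first of the current month
     , SUM(CASE WHEN Date >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1)
                THEN TotalStock END) AS StockThisYear   -- since January 1st of the current year
FROM StockBreakdown
GROUP BY Supplier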
You may need to make the subqueries for the columns return only a single result. You could try this (not tested):
SELECT Supplier,
(
SELECT TOP 1 StockThisWeek FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisWeek
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) tmp1
WHERE tmp1.Supplier = Supplier
) AS StockThisWeek,
(
SELECT TOP 1 StockThisMonth FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisMonth
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) tmp2
WHERE tmp2.Supplier = Supplier
) AS StockThisMonth,
...
This selects the supplier and then builds the StockThisWeek and StockThisMonth columns by taking the first entry from the grouped subquery you already wrote. Because of the GROUP BY there should be only one entry per supplier, so you don't lose any data.
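One caveat: for the correlation WHERE tmp1.Supplier = Supplier to work, the outer query needs its own table alias; otherwise the unqualified Supplier resolves to tmp1 and the filter does nothing. A minimal sketch of the corrected shape, assuming the outer query reads the supplier list from StockBreakdown as well:
SELECT sb.Supplier,
(
    SELECT TOP 1 StockThisWeek FROM
    (
        SELECT Supplier, Sum(TotalStock) AS StockThisWeek
        FROM StockBreakdown
        WHERE Date >= '2014-5-12'
        GROUP BY Supplier
    ) tmp1
    WHERE tmp1.Supplier = sb.Supplier
) AS StockThisWeek
FROM (SELECT DISTINCT Supplier FROM StockBreakdown) sb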

How to get the real totals (without the TOP) when I do a ms access report with the SELECT TOP 10 ....?

When I build the report I use the expression
=sum([sell])
and the result is the sum of only the TOP 10 rows. How do I show the sum over all the rows, as if the TOP 10 didn't exist?
SQL example:
Select top 10 name, cust, sell from sales
In practice the query is monstrous, big and dirty:
SELECT TOP 125 COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI,
       SUM(VENDA1) AS VENDAS1,
       SUM(VENDA2) AS VENDAS2,
       ROUND(IIF(SUM(VENDA1)=0, 9999, ((SUM(VENDA2)-SUM(VENDA1)))/abs(SUM(VENDA1))*100), 2) AS PER_DIFF
FROM (
    SELECT quarter, month, COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI,
           VENDA AS VENDA1, 0 AS VENDA2
    FROM STKQRY_VENDAS07_FAM_MONTH_VND_CLI_F1
    WHERE year = '2012' AND Month BETWEEN '00' AND '05'
    UNION ALL
    SELECT quarter, month, COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI,
           0 AS VENDA1, VENDA AS VENDA2
    FROM STKQRY_VENDAS07_FAM_MONTH_VND_CLI_F1
    WHERE year = '2013' AND Month BETWEEN '00' AND '05'
)
GROUP BY COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI
HAVING (SUM(VENDA1) > 1000 OR SUM(VENDA2) > 1000)
ORDER BY vendas2 DESC
You need to combine two queries with a UNION ALL, like this:
SELECT TOP 10 Company, SUM(Sales) from MyTable Group By Company --Query to get data for TOP 10
Union All
SELECT 'Grand Total', SUM(Sales) from MyTable --Query to get the Grand total
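Alternatively, inside the Access report itself a domain aggregate ignores the report's TOP clause because it queries the table directly. A sketch using the simplified sales table and sell field from the example above, placed in a text box's Control Source:
=DSum("[sell]", "sales")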

Output two columns for 1 field for different date ranges?

I have a SQL table "ITM_SLS" with the following fields:
ITEM
DESCRIPTION
TRANSACTION #
DATE
QTY SOLD
I want to output QTY SOLD as both a one-month value and a year-to-date value, so that the output would look like this:
ITEM, DESCRIPTION, QTY SOLD MONTH, QTY SOLD YEAR TO DATE
Is this possible?
You could calculate the total quantity sold using group by in a subquery. For example:
select a.Item, a.Description, b.MonthQty, c.YearQty
from (
select distinct Item, Description from TheTable
) a
left join (
select Item, sum(Qty) as MonthQty
from TheTable
where datediff(m,Date,getdate()) <= 1
group by Item
) b on a.Item = b.Item
left join (
select Item, sum(Qty) as YearQty
from TheTable
where datediff(y,Date,getdate()) <= 1
group by Item
) c on a.Item = c.Item
The method to limit the subquery to a particular date range differs per DBMS, this example uses the SQL Server datediff function.
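For instance, roughly equivalent current-month and year-to-date filters in PostgreSQL would be (an illustrative sketch only, since the asker's DBMS isn't stated):
where Date >= date_trunc('month', current_date)  -- current month
where Date >= date_trunc('year', current_date)   -- year to date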
Assuming the "one month" is last month...
select item
, description
, sum (case when trunc(transaction_date, 'MM')
= trunc(add_months(sysdate, -1), 'MM')
then qty_sold
else 0
end) as sold_month
, sum(qty_sold) as sold_ytd
from itm_sls
where transaction_date >= trunc(sysdate, 'yyyy')
group by item, description
/
This will give you an idea of what you can do:
select
    ITEM,
    DESCRIPTION,
    [QTY SOLD] as QTY_SOLD_MONTH,
    ( select sum([QTY SOLD])
      from ITM_SLS
      where ITEM = I.ITEM
        and year([DATE]) = year(I.[DATE])   -- same calendar year as the outer row
    ) as QTY_SOLD_YEAR_TO_DATE
from ITM_SLS I
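A single-pass alternative is conditional aggregation, as used in earlier answers on this page. A sketch in SQL Server syntax (an assumption, since the question does not name the DBMS), with the question's column names bracketed because they contain spaces or reserved words:
SELECT ITEM,
       DESCRIPTION,
       SUM(CASE WHEN [DATE] >= DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1)
                THEN [QTY SOLD] ELSE 0 END) AS QTY_SOLD_MONTH,
       SUM([QTY SOLD]) AS QTY_SOLD_YEAR_TO_DATE
FROM ITM_SLS
WHERE [DATE] >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1)  -- restrict to the current year
GROUP BY ITEM, DESCRIPTION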