Showing all channel groups even without a record in the database - SQL

I have advertiser and channel_group columns. My code is below, and the output as well. I want my output to contain ALL channel groups (for instance A-J, even if there is no value for a group). How can I accomplish that? Any tips? I don't have any idea.
WITH sos AS (
    SELECT
        advertiser,
        channel_group,
        ROUND(SUM(cost)) AS cost
    FROM student_37.data_table
    JOIN student_37.dict
        ON student_37.data_table.audiocode = student_37.dict.audiocode
    JOIN student_37.channel_group
        ON student_37.data_table.medium = student_37.channel_group.channel
    GROUP BY advertiser, channel_group
    ORDER BY advertiser
)
SELECT
    advertiser,
    channel_group,
    cost,
    ROUND(cost::decimal / SUM(cost) OVER (PARTITION BY advertiser), 2) AS sos_adv,
    ROUND(cost::decimal / SUM(cost) OVER (PARTITION BY channel_group), 2) AS sos_channel_group,
    ROUND(cost / (SELECT SUM(cost) FROM sos), 2) AS sos
FROM sos
ORDER BY advertiser,
    array_position(ARRAY['TVP1','TVP2','TVP tem','TVN','TVN tem','Polsat','Polsat tem','unknown'], channel_group);
"company1";"B";"TV";16537
"company1";"C";"TV";20406
"company1";"D";"TV";33380
"company1";"E";"TV";193633
"company1";"F";"TV";14957
"company1";"G";"TV";5338
"company2";"A";"TV";46580
"company2";"B";"TV";56223
"company2";"G";"TV";80735
"company2";"H";"TV";80874
"company2";"J";"TV";38511
I want to get something like this; I don't have these records, so I need to somehow generate them:
"company1";"A";"TV";
"company1";"B";"TV";16537
"company1";"C";"TV";20406
"company1";"D";"TV";33380
"company1";"E";"TV";193633
"company1";"F";"TV";14957
"company1";"G";"TV";5338
"company1";"I";"TV";
"company1";"J";"TV";
"company2";"A";"TV";46580
"company2";"B";"TV";56223
"company2";"C";"TV";
"company2";"D";"TV";
"company2";"E";"TV";
"company2";"F";"TV";
"company2";"G";"TV";56223
"company2";"H";"TV";80874
"company2";"I";"TV";
"company2";"J";"TV";38511

If you want all channel groups -- even those with no data -- then you want outer joins.
You don't provide sample data, but I am guessing that you want:
SELECT advertiser, channel_group,
       ROUND(SUM(cost)) AS cost
FROM student_37.channel_group cg LEFT JOIN
     student_37.data_table dt
     ON dt.medium = cg.channel LEFT JOIN
     student_37.dict d
     ON dt.audiocode = d.audiocode
GROUP BY advertiser, channel_group
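Note that this returns one row per channel group overall, while the desired output repeats every channel group for every advertiser. A sketch of that variant, reusing the aggregation from the question (the schema details are still guesses): build the full advertiser x channel_group grid with a cross join, then attach the costs with a LEFT JOIN so missing combinations come back as NULL.
WITH sos AS (
    -- the aggregation from the question
    SELECT advertiser, channel_group, ROUND(SUM(cost)) AS cost
    FROM student_37.data_table
    JOIN student_37.dict
        ON student_37.data_table.audiocode = student_37.dict.audiocode
    JOIN student_37.channel_group
        ON student_37.data_table.medium = student_37.channel_group.channel
    GROUP BY advertiser, channel_group
)
SELECT grid.advertiser, grid.channel_group, sos.cost
FROM (
    -- every advertiser paired with every channel group
    SELECT DISTINCT s.advertiser, cg.channel_group
    FROM sos s
    CROSS JOIN student_37.channel_group cg
) grid
LEFT JOIN sos
    ON sos.advertiser = grid.advertiser
   AND sos.channel_group = grid.channel_group
ORDER BY grid.advertiser, grid.channel_group;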

Related

Trying to join multiple tables without all pairs of common columns, hence values from the last table are repeating. Need help solving this

I have the following query, and the result is added in the image link. When the results of `adjust` are joined, there is no platform or date column in it, hence the records are repeated. Is there a way to avoid this? This will cause issues in visualizations at campaign level when the repeated items get summed.
with
sent as (
    select campaign_name, date(date) as date, platform, count(id) as sent
    from send
    group by 1, 2, 3
),
bounce as (
    select campaign_name, platform, count(id) as bounce
    from bounce
    group by 1, 2
),
open as (
    select campaign_name, platform, count(id) as clicks
    from open
    group by 1, 2
),
adjust as (
    select campaign, sum(purchase_events) as transactions, count(distinct adjust_id) as sessions, sum(sessions) as s2, sum(clicks) as ad_clicks
    from adjust
    group by 1
)
select
    s.campaign_name,
    s.date,
    s.platform,
    s.sent,
    (s.sent - b.bounce) as delivered,
    b.bounce,
    o.clicks,
    a.ad_clicks,
    a.sessions,
    a.s2,
    a.transactions
from sent s
join bounce b on s.campaign_name = b.campaign_name and s.platform = b.platform
join open o on s.campaign_name = o.campaign_name and s.platform = o.platform
left join adjust a on s.campaign_name = a.campaign
See the result here
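No answer was posted here, but a common fix for this kind of fan-out (a sketch, not from the original thread) is to allocate each campaign-level adjust metric across the campaign's rows, so that summing the column back up at campaign level reproduces the original total instead of multiplying it by the row count:
-- Sketch only, assuming the same CTEs as the query above.
-- Each campaign-level metric is divided by the number of rows the
-- campaign fans out to; cast to a float first if your engine does
-- integer division.
select
    s.campaign_name,
    s.date,
    s.platform,
    s.sent,
    a.transactions / count(*) over (partition by s.campaign_name) as transactions_alloc,
    a.ad_clicks / count(*) over (partition by s.campaign_name) as ad_clicks_alloc
from sent s
left join adjust a on s.campaign_name = a.campaign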

For a given product, for each store, sum daily sales at all nearby stores

Given a daily_summary table containing columns {order_date, store_code, product_id, sales} and a stores table containing columns {store_code, latitude, longitude}, how can I:
For a given product_id (e.g. "1234"), for each store_code, get the daily SUM(sales) for the same product at nearby stores (within a 10 km radius)? Output is a table with columns {store_code, order_date, sales_at_nearby_stores}, and I'm asking specifically about BigQuery.
My current query works, but is too slow. I'm sure there's a faster way to do it. Here's what I have so far:
WITH store_distances AS (
    SELECT
        t1.store_code store1,
        t2.store_code store2,
        ST_DISTANCE(
            ST_GEOGPOINT(t1.longitude, t1.latitude),
            ST_GEOGPOINT(t2.longitude, t2.latitude)
        ) AS distance_meters
    FROM stores t1
    CROSS JOIN stores t2
    WHERE t1.store_code != t2.store_code
), nearby_stores_table AS (
    SELECT
        t1.store1 AS store_code,
        STRING_AGG(DISTINCT t2.store2) AS nearby_stores
    FROM store_distances t1
    LEFT JOIN store_distances t2 USING (store1)
    WHERE t2.distance_meters < 10000
    GROUP BY t1.store1
    ORDER BY t1.store1
), ds_with_nearby_stores AS (
    SELECT
        order_date, store_code, nearby_stores, sales
    FROM daily_summary
    LEFT JOIN nearby_stores_table USING (store_code)
    WHERE product_id = "1234"
)
SELECT DISTINCT
    store_code, order_date,
    (
        SELECT SUM(sales)
        FROM ds_with_nearby_stores t2
        WHERE t2.store_code IN UNNEST(SPLIT(t1.nearby_stores)) AND t1.order_date = t2.order_date
    ) AS sales_at_nearby_stores
FROM ds_with_nearby_stores t1
ORDER BY store_code, order_date
The first part of the query generates a table with {store1, store2, distance_meters between the two}. The second part generates a table with {store_code, nearby_stores}, where nearby_stores is a comma-separated string of nearby stores. The third part joins the second table with daily_summary (filtered on product_id), which gives us a table with {order_date, store_code, nearby_stores, sales}. Finally, the last part unpacks the string of nearby_stores and adds up the sales from those stores, giving us {store_code, order_date, sales_at_nearby_stores}.
It is hard to say what exactly is slow here without the data, and without the query explanation that is displayed after the query finishes (if it finishes at all) - please add the query explanation.
One of the reasons it might be slow is that it computes all the pair-wise distances between all stores, creating a large join and computing tons of distances. BigQuery has an optimized spatial join that can do this much faster using the ST_DWITHIN predicate, which filters by a given distance. The first two CTEs can be rewritten as:
WITH stores_with_loc AS (
    SELECT
        store_code AS store,
        ST_GEOGPOINT(longitude, latitude) AS loc
    FROM stores
), nearby_stores_table AS (
    SELECT
        t1.store AS store_code,
        ARRAY_AGG(DISTINCT IF(t2.store <> t1.store, t2.store, NULL) IGNORE NULLS) AS nearby_stores
    FROM stores_with_loc t1
    JOIN stores_with_loc t2
        ON ST_DWITHIN(t1.loc, t2.loc, 10000)
    GROUP BY t1.store
)
select * from nearby_stores_table
Other tweaks:
I used ARRAY_AGG, which should be faster than converting to strings.
I used a regular JOIN rather than a LEFT JOIN - BigQuery only optimizes the inner spatial join right now. A store always joins itself, so this is OK; we later drop the self-reference inside the ARRAY_AGG expression.
Don't use ORDER BY in sub-queries; it doesn't change anything anyway.
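The last step of the original query could then consume the array directly with IN UNNEST instead of splitting a string (a sketch, assuming ds_with_nearby_stores now carries the ARRAY<STRING> column built above):
SELECT
    t1.store_code,
    t1.order_date,
    SUM(t2.sales) AS sales_at_nearby_stores
FROM ds_with_nearby_stores t1
JOIN ds_with_nearby_stores t2
    ON t2.order_date = t1.order_date
    AND t2.store_code IN UNNEST(t1.nearby_stores)
GROUP BY t1.store_code, t1.order_date
ORDER BY t1.store_code, t1.order_date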

Calculating cohort data in Firebase -> BigQuery, but I want to separate them by tracking source. Grouping won't work?

I'm trying to calculate the quality of users with cohort data in BigQuery.
My current query is:
WITH analytics_data AS (
    SELECT
        user_pseudo_id, event_timestamp, event_name, app_info.id,
        geo.country AS country, platform, app_info.id AS bundle_id,
        UNIX_MICROS(TIMESTAMP("2019-12-05 00:00:00")) AS start_day,
        3600 * 1000 * 1000 * 24 AS one_day_micros
    FROM `table.events_*`
    WHERE _table_suffix BETWEEN "20191205" AND "20191218"
)
SELECT day_7_cohort / day_0_cohort AS seven_day_conversion FROM (
    WITH day_7_users AS (
        SELECT DISTINCT user_pseudo_id
        FROM analytics_data
        WHERE event_name = 'watched_20_ads'
          AND event_timestamp BETWEEN start_day AND start_day + (12 * one_day_micros)
    ), day_0_users AS (
        SELECT DISTINCT user_pseudo_id
        FROM analytics_data
        WHERE event_name = "first_open"
          AND bundle_id = "com.bundle.id"
          AND country = "United States"
          AND platform = "ANDROID"
          AND event_timestamp BETWEEN start_day AND start_day + (1 * one_day_micros)
    )
    SELECT
        (SELECT COUNT(*) FROM day_0_users) AS day_0_cohort,
        (SELECT COUNT(*) FROM day_7_users JOIN day_0_users USING (user_pseudo_id)) AS day_7_cohort
)
The problem is that I'm unable to separate the users by tracking source. I want to separate the users by tracking source and country.
What I'm currently getting:
What I would like to see:
What would be perfect:
I'm not sure if it's possible to write a query that would return the data in a single table, without involving more queries and data storage elsewhere.
So your question is missing some data/fields, but I will provide a 'general' solution.
with data as (
    -- Select the fields you need to define criteria and cohorts
),
cohort_info as (
    -- Cohort logic (might be more complicated than this)
    select user_id, source, country -- , etc...
    from data
    group by 1, 2, 3
),
day_0_users as (
    -- Logic to determine who you are measuring for your calculation
),
day_7_users as (
    -- Logic to determine who qualifies as a 7-day user for your calculation
),
joined as (
    -- Join your CTEs together
    select
        cohort_info.source,
        cohort_info.country,
        count(distinct day_0_users.user_id) as day_0_count,
        count(distinct day_7_users.user_id) as day_7_count
    from day_0_users
    left join day_7_users using (user_id)
    inner join cohort_info using (user_id)
    group by 1, 2
)
select *, day_7_count / day_0_count as seven_day_conversion
from joined
I think using several CTEs in this manner will make your code more readable and will enable you to track your logic a bit better. Nested subqueries tend to get ugly.
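A hedged sketch of that skeleton filled in with the fields from the question; traffic_source.source is an assumption about the Firebase export schema (adjust it to whatever field holds your tracking source), and the event names, dates, and filters are copied from the original query:
WITH data AS (
    SELECT
        user_pseudo_id AS user_id,
        traffic_source.source AS source,  -- assumed field for tracking source
        geo.country AS country,
        event_name,
        event_timestamp,
        UNIX_MICROS(TIMESTAMP("2019-12-05 00:00:00")) AS start_day,
        3600 * 1000 * 1000 * 24 AS one_day_micros
    FROM `table.events_*`
    WHERE _table_suffix BETWEEN "20191205" AND "20191218"
),
cohort_info AS (
    SELECT user_id, source, country
    FROM data
    GROUP BY 1, 2, 3
),
day_0_users AS (
    SELECT DISTINCT user_id
    FROM data
    WHERE event_name = "first_open"
      AND event_timestamp BETWEEN start_day AND start_day + (1 * one_day_micros)
),
day_7_users AS (
    SELECT DISTINCT user_id
    FROM data
    WHERE event_name = "watched_20_ads"
      AND event_timestamp BETWEEN start_day AND start_day + (12 * one_day_micros)
),
joined AS (
    SELECT
        cohort_info.source,
        cohort_info.country,
        COUNT(DISTINCT day_0_users.user_id) AS day_0_count,
        COUNT(DISTINCT day_7_users.user_id) AS day_7_count
    FROM day_0_users
    LEFT JOIN day_7_users USING (user_id)
    INNER JOIN cohort_info USING (user_id)
    GROUP BY 1, 2
)
SELECT *, day_7_count / day_0_count AS seven_day_conversion
FROM joined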

How to get a percentile rank based on a computation

There are four tables:
T_SALES has columns like
CUST_KEY,
ITEM_KEY,
SALE_DATE,
SALES_DLR_SALES_QTY,
ORDER_QTY.
T_CUST has columns like
CUST_KEY,
CUST_NUM,
PEER_GRP_ID
T_PEER_GRP has columns like
PEER_GRP_ID,
PEER_GRP_DESC,
PRNT_PEER_GRP_ID
T_PRNT_PEER has columns like
PRNT_PEER_GRP_ID,
PRNT_PEER_DESC
Now, for the above tables, I need to generate a percentile rank of the customer based on the computation fillrate = SALES_DLR_SALES_QTY / ORDER_QTY * 100, by peer group within a parent peer group.
Could someone please help with this?
You can use the analytic function PERCENT_RANK() to calculate the percentile rank, as below:
SELECT
    t_s.cust_key,
    t_c.cust_num,
    PERCENT_RANK() OVER (ORDER BY (t_s.SALES_DLR_SALES_QTY / t_s.ORDER_QTY) DESC) AS pr
FROM t_sales t_s
INNER JOIN t_cust t_c ON t_s.cust_key = t_c.cust_key
ORDER BY pr;
Reference:
PERCENT_RANK on Oracle® Database SQL Reference
If by "percentile rank" you mean "percent rank" (documented here), then the harder part is the joins. I think this is the basic data that you want for the percentile rank:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
       sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY) * 100 as fillrate
from t_sales s join
     t_cust c
     on s.CUST_KEY = c.cust_key join
     t_peer_grp t
     on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
You can then calculate the percent rank (0 to 1) within each parent peer group as:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
       sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY) * 100 as fillrate,
       percent_rank() over (partition by t.PRNT_PEER_GRP_ID
                            order by sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY)
                           ) as pr
from t_sales s join
     t_cust c
     on s.CUST_KEY = c.cust_key join
     t_peer_grp t
     on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
Note that this mixes analytic functions with aggregation functions, which can look awkward when you first learn about it.
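If the rank is needed per customer (as the question's "percentile rank of the customer" suggests), the same idea can partition by peer group and order by each customer's fill rate; a sketch using the question's table names:
select c.CUST_KEY, c.CUST_NUM, p.PEER_GRP_ID, p.PRNT_PEER_GRP_ID,
       sum(s.SALES_DLR_SALES_QTY) / sum(s.ORDER_QTY) * 100 as fillrate,
       percent_rank() over (partition by p.PEER_GRP_ID
                            order by sum(s.SALES_DLR_SALES_QTY) / sum(s.ORDER_QTY)
                           ) as pr
from t_sales s join
     t_cust c
     on s.CUST_KEY = c.CUST_KEY join
     t_peer_grp p
     on p.PEER_GRP_ID = c.PEER_GRP_ID
group by c.CUST_KEY, c.CUST_NUM, p.PEER_GRP_ID, p.PRNT_PEER_GRP_ID;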

SQL: Using UNION

Here is the question and database info.
Use the UNION command to prepare a full statement for customer 'C001' - it should be laid out as follows. (Note that the values shown below are not correct.) You may be able to use '' or NULL for blank values - if necessary use 0.
Here is a link to the webpage with the database info. http://sqlzoo.net/5_0.htm or see the image below.
Here is what I have tried:
SELECT sdate AS LineDate, "delivery" AS LEGEND, price * quantity AS Total, "" AS Amount
FROM shipped
JOIN product ON (shipped.product = product.id)
WHERE badguy = 'C001'
UNION
SELECT rdate, notes, "", receipt.amount
FROM receipt
WHERE badguy = 'C001'
Here is what I get back:
Wrong Answer. The correct answer has 5 row(s).
The amounts in the Amount column don't seem right, and I can't figure out how to order the data by date, since it uses two different date columns (sdate and rdate, which are UNIONed).
It looks like the data in the example is being aggregated by date and charge type using GROUP BY; that's why you are getting too many rows.
Also, you can sort by the column alias (LineDate), and the ORDER BY clause will apply to all rows of the union.
SELECT sdate AS LineDate, "delivery" AS LEGEND, SUM(price * quantity) AS Total, "" AS Amount
FROM shipped
JOIN product ON (shipped.product = product.id)
WHERE badguy = 'C001'
GROUP BY sdate
UNION
SELECT rdate, notes, "", receipt.amount
FROM receipt
WHERE badguy = 'C001'
ORDER BY LineDate
It's usually easiest to develop each part of the union separately. Pay attention to the use of "null" to separate the monetary columns. The first select gets to name the columns.
select s.sdate as tr_date, 'Delivery' as type, sum((s.quantity * p.price)) as extended_price, null as amount
from shipped s
inner join product p on p.id = s.product
where badguy = 'C001'
group by s.sdate
union all
select rdate, notes, null, sum(amount)
from receipt
where badguy = 'C001'
group by rdate, notes
order by tr_date