How to use multiple custom dimensions in Google Big Query - google-bigquery

Is there a way to use multiple custom dimensions in GBQ without using the MAX function? My problem with MAX is that it only keeps the maximum pax_num, but I would like the visitor count for every combination of (Date, product.v2ProductCategory, eCommerceAction.action_type, product.v2ProductName). Note that pax_num is the number of pax on that ticket. I need every combination of dest + pax_num, not dest + max(pax_num).
SELECT
Date
,count(distinct( concat(FULLVISITORID,cast(visitID as string)))) as visitor
, product.v2ProductCategory as product_category
,max(if(customDimensions.index=2, customDimensions.value,null)) as dest
,max((if(customDimensions.index=21, customDimensions.value,null)) ) as pax_num
,eCommerceAction.action_type as Action_type
,product.v2ProductName as product_name
FROM `table` as t
CROSS JOIN UNNEST(hits) AS hit
CROSS JOIN UNNEST(hit.customDimensions) AS customDimensions
CROSS JOIN UNNEST(hit.product) AS product
GROUP BY
Date
,product.v2ProductCategory
,eCommerceAction.action_type
,product.v2ProductName

Not sure if this is what you are looking for, but if you include the field pax_num in the group by you might already find what you need, like so:
select
date,
count(distinct concat(fullVisitorId, cast(visitId as string))) as sessions,
product.v2ProductCategory as category,
-- scalar subqueries read both dimensions from the same hit; with a cross join on
-- customDimensions, dest would be NULL on every row where index = 21
(select value from unnest(hit.customDimensions) where index = 2) as dest,
(select value from unnest(hit.customDimensions) where index = 21) as pax_num,
hit.eCommerceAction.action_type as act_type,
product.v2ProductName as product_name
from `table` as t,
unnest(hits) as hit,
unnest(hit.product) as product
group by
date,
category,
dest,
act_type,
pax_num,
product_name
having pax_num is not null
You gave the pax_num values "paxnum_5" and "paxnum_6" as an example. If you include pax_num in the GROUP BY, the count is aggregated at the level of pax_num, which preserves the individual values (instead of collapsing everything into the max value as before).
Also, notice that if you count the distinct combinations of fullVisitorId and visitId, you are actually computing the number of sessions and not visitors (the two are not the same thing).
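The difference between grouping by pax_num and taking its max, and between sessions and visitors, can be checked on a toy data set. The following is only a sketch: it runs on an in-memory SQLite table rather than BigQuery, and the table `flat`, its columns and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical, already-flattened rows: (full_visitor_id, visit_id, pax_num).
ROWS = [
    ("A", 1, "paxnum_5"),
    ("A", 2, "paxnum_5"),  # same visitor, second session
    ("A", 3, "paxnum_6"),
    ("B", 1, "paxnum_6"),
]


def counts_by_pax_num(rows):
    """Return {pax_num: (sessions, visitors)} with pax_num in the GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE flat (full_visitor_id TEXT, visit_id INTEGER, pax_num TEXT)"
    )
    con.executemany("INSERT INTO flat VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT pax_num,
               -- visitor id + visit id identifies a session
               COUNT(DISTINCT full_visitor_id || '-' || visit_id) AS sessions,
               -- visitor id alone identifies a visitor
               COUNT(DISTINCT full_visitor_id) AS visitors
        FROM flat
        GROUP BY pax_num
        ORDER BY pax_num
        """
    ).fetchall()
    con.close()
    return {pax: (sessions, visitors) for pax, sessions, visitors in result}


print(counts_by_pax_num(ROWS))
```

Every pax_num value keeps its own row, and visitor A's two sessions with paxnum_5 count as two sessions but one visitor.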

Adding the fullVisitorID (as visitorID) to the SELECT and GROUP BY solved the problem:
SELECT
Date
,concat(fullVisitorID,cast(visitID as string)) as visitorID
,count(distinct( concat(FULLVISITORID,cast(visitID as string)))) as visitor
, product.v2ProductCategory as product_category
,max(if(customDimensions.index=2, customDimensions.value,null)) as dest
,max((if(customDimensions.index=21, customDimensions.value,null)) ) as pax_num
,eCommerceAction.action_type as Action_type
,product.v2ProductName as product_name
FROM `table` as t
CROSS JOIN UNNEST(hits) AS hit
CROSS JOIN UNNEST(hit.customDimensions) AS customDimensions
CROSS JOIN UNNEST(hit.product) AS product
GROUP BY
Date
,product.v2ProductCategory
,eCommerceAction.action_type
,product.v2ProductName
,visitorID

Related

How to return first record and totalizers

How can I return first purchase date, first salesman and first store by customer along with his total expenses?
select
bi.biifclie as customer,
aux.salesman as first_salesman,
aux.store as first_store,
aux.date_ as first_date,
CAST(SUM(bi.biifpliq) as float64) as total_bought,
CAST((SUM(bi.biifptab)-SUM(bi.biifpliq)) as float64) as discount,
CAST(SUM(bi.biifpliq)-SUM(bi.biifcrep)-SUM(bi.biifvari + bi.biifcomb + bi.biifcomc + bi.biificmc)-SUM(bi.biiffixo) as float64) as rentability,
COUNT(DISTINCT bi.biifcodi) AS orders
MAX(bi.biifdata) AS last_purchase_date,
MIN(bi.biifdata) AS first_purchase_date,
DATE_DIFF(MAX(bi.biifdata),current_date(),month)*-1 as inactivity_time,
FROM yyyyyyy.gix.bi_biif bi
LEFT JOIN
(
SELECT
aux0.biifclie as customer,
aux0.biifvend as salesman,
aux0.biifempe as store,
aux0.biifdata as date_
FROM yyyyyyy.gix.bi_biif aux0 ORDER BY date_ ASC LIMIT 1
) AS aux ON aux.cliente = bi.biifclie
GROUP BY customer,first_salesman,first_store,first_date
I tried to do that using a left join on a subquery ordered by date (so that I can return the first date), but those fields (aux.salesman as first_salesman, aux.store as first_store, aux.date_ as first_date) all returned null.
Am I doing something wrong, or is the logic not correct?
Thanks!
Consider below
select biifclie as customer,
array_agg(struct(biifvend as salesman, biifempe as store, biifdata as date) order by biifdata limit 1)[offset(0)].*,
cast(sum(biifpliq) as float64) as total_bought
from `yyyyyyy.gix.bi_biif` t
group by customer
The solution above 1) groups by customer, 2) for each customer takes all of that customer's rows and keeps only the first one ordered by date, and 3) then converts the result from an array into separate columns.
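The same three steps can be traced outside SQL. This is a plain-Python sketch of the logic only, not of BigQuery's ARRAY_AGG itself; the sample rows and the names `SALES` and `first_purchase_and_total` are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows: (customer, salesman, store, date, amount).
SALES = [
    ("c1", "ann", "north", "2021-03-01", 50.0),
    ("c1", "bob", "south", "2021-01-15", 20.0),
    ("c2", "cid", "east", "2021-02-10", 70.0),
]


def first_purchase_and_total(rows):
    """Return per-customer first salesman/store/date plus the total bought."""
    # 1) group the rows by customer
    groups = defaultdict(list)
    for customer, salesman, store, date, amount in rows:
        groups[customer].append((date, salesman, store, amount))

    result = {}
    for customer, items in groups.items():
        # 2) order by date and keep only the first row
        #    (ISO date strings sort chronologically)
        date, salesman, store, _ = min(items, key=lambda item: item[0])
        # 3) spread that row into separate columns, next to the aggregate
        result[customer] = {
            "salesman": salesman,
            "store": store,
            "date": date,
            "total_bought": sum(item[3] for item in items),
        }
    return result


print(first_purchase_and_total(SALES))
```

The total is still computed over all of a customer's rows, while the salesman, store and date come only from the earliest one.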

Which part of my query is wrong? UNNEST function

I can't figure out which part of my query is wrong.
I used the UNNEST function, but I still get the error
'Cannot access field productSKU on a value with type ARRAY>' in Google Bigquery.
My query is below:
SELECT
hits.product.productSKU AS product_SKU,
hits.product.v2ProductName AS Product_Name,
SUM(totals.transactionRevenue) AS Total_Revenue,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits.product) AS hits
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731' AND totals.transactions >= 1
Group by
hits.product.productSKU
Order by
v2ProductName DESC
Assuming the overall logic of your query reflects what you want to achieve, below is a corrected version that fixes the unnesting and adds the missing field to the GROUP BY. I hope you can see what was corrected.
#standardSQL
SELECT
product.productSKU AS product_SKU,
product.v2ProductName AS Product_Name,
SUM(totals.transactionRevenue) AS Total_Revenue,
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hit,
UNNEST(hit.product) AS product
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731' AND totals.transactions >= 1
GROUP BY product_SKU, Product_Name
ORDER BY Product_Name DESC
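Why the original query fails becomes clearer if one session row is pictured as nested data: hits is an array, and each hit carries its own product array, so a field of product cannot be read directly from hits. The sketch below models this with Python lists and dicts; the sample session is made up and only mirrors the field names used in the query.

```python
# Hypothetical session row: hits is a list, and each hit has a product list.
SESSION = {
    "hits": [
        {"product": [{"productSKU": "SKU1", "v2ProductName": "Mug"}]},
        {"product": []},  # a hit without products
        {"product": [
            {"productSKU": "SKU2", "v2ProductName": "Pen"},
            {"productSKU": "SKU1", "v2ProductName": "Mug"},
        ]},
    ],
}

# Reading a field straight off the array fails, much like hits.product.productSKU.
try:
    SESSION["hits"]["product"]
    naive_access_failed = False
except TypeError:
    naive_access_failed = True


def flatten_products(session):
    """Return one (sku, name) row per product, like the two UNNEST steps."""
    rows = []
    for hit in session["hits"]:          # UNNEST(hits) AS hit
        for product in hit["product"]:   # UNNEST(hit.product) AS product
            rows.append((product["productSKU"], product["v2ProductName"]))
    return rows


print(naive_access_failed)
print(flatten_products(SESSION))
```

Each level of nesting needs its own loop here, just as each array needs its own UNNEST in the corrected query.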

Converting Legacy SQL to Standard SQL - Enhanced Ecommerce

I am in no way a coder, so I have tried this but keep falling over.
I want to convert this query from Google's Google Analytics BigQuery Cookbook,
Products purchased by customers who purchased product A (Enhanced Ecommerce),
into Standard SQL. I have pasted the code below.
I have made a few attempts but keep falling over.
Thank you in advance
John
SELECT hits.product.productSKU AS other_purchased_products,
COUNT(hits.product.productSKU) AS quantity
FROM (
SELECT fullVisitorId, hits.product.productSKU, hits.eCommerceAction.action_type
FROM TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-04-01'), TIMESTAMP('2017-04-20'))
)
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-04-01'), TIMESTAMP('2017-04-20'))
WHERE hits.product.productSKU CONTAINS 'GGOEYOCR077799'
AND hits.eCommerceAction.action_type = '6'
GROUP BY fullVisitorId
)
AND hits.product.productSKU IS NOT NULL
AND hits.product.productSKU !='GGOEYOCR077799'
AND hits.eCommerceAction.action_type = '6'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
Below is a pure equivalent in BigQuery Standard SQL (no optimizations or improvements, just a direct translation from legacy to standard):
SELECT productSKU AS other_purchased_products, COUNT(productSKU) AS quantity
FROM (
SELECT fullVisitorId, prod.productSKU, hit.eCommerceAction.action_type
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) hit, UNNEST(hit.product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170420'
)
WHERE fullVisitorId IN (
SELECT fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) hit, UNNEST(hit.product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170420'
AND prod.productSKU LIKE '%GGOEYOCR077799%'
AND hit.eCommerceAction.action_type = '6'
GROUP BY fullVisitorId
)
AND productSKU IS NOT NULL
AND productSKU !='GGOEYOCR077799'
AND action_type = '6'
GROUP BY other_purchased_products
ORDER BY quantity DESC
This produces exactly the same result as the legacy version.

Google Big Query: Get New Visitor Count using Custom Dimension

select PARSE_DATE('%Y%m%d', t.date) as Date
,count(distinct(fullvisitorid)) as User
,SUM( totals.newVisits ) AS New_Visitors
,(if(customDimensions.index=1, customDimensions.value,null)) as Orig
FROM `table` as t
CROSS JOIN UNNEST(hits) AS hit
CROSS JOIN UNNEST(hit.customDimensions ) AS customDimensions
group by Date, orig
Is there a way to get the new visitor count and use the customDimension at the same time? SUM(totals.newVisits) doesn't work.
Thanks
Below is for BigQuery Standard SQL
SELECT DATE
,COUNT(DISTINCT(fullvisitorid)) AS User
,SUM( newVisits ) AS New_Visitors
,Orig
FROM (
SELECT PARSE_DATE('%Y%m%d', t.date) AS DATE
,fullvisitorid
,totals.newVisits AS newVisits
,(IF(customDimensions.index=1, customDimensions.value,NULL)) AS Orig
FROM `table` AS t
CROSS JOIN UNNEST(hits) AS hit
CROSS JOIN UNNEST(hit.customDimensions ) AS customDimensions
GROUP BY DATE, orig, fullvisitorid, newVisits
)
GROUP BY DATE, Orig
The best way in your case is to remove the cross-joins and use sub-selects instead:
SELECT
PARSE_DATE('%Y%m%d', t.date) AS Date
,(SELECT value FROM UNNEST(customDimensions) WHERE index=1) Orig
,COUNT(DISTINCT(fullvisitorid)) AS User
,SUM( totals.newVisits ) AS New_Visitors
FROM
`table` t
GROUP BY Orig, Date
If you have a hit-scoped dimension and really need to flatten the table, you need to build a session ID that you can count distinct. That is because the cross-join repeats all session-scoped fields on every hit:
SELECT
PARSE_DATE('%Y%m%d', t.date) AS Date
,(SELECT value FROM h.customDimensions WHERE index=2) justAHitCd
,h.page.pagePathLevel1
,COUNT(DISTINCT(fullvisitorid)) AS User
-- create session id and count distinct
,COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS STRING)) ) AS all_sessions
-- only count distinct session id of sessions where totals.newVisits = 1
,COUNT(DISTINCT
IF(totals.newVisits=1,
CONCAT(fullvisitorid, CAST(visitstarttime AS STRING)),
NULL )
) AS New_Visitors
FROM
-- flatten table to hit scope (comma means cross-join in stnd sql)
`table` t, t.hits h
GROUP BY 1,2,3
So for new visitors I only provide a session ID if totals.newVisits=1; otherwise the IF expression yields NULL, which COUNT(DISTINCT ...) ignores.
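The over-counting caused by the cross-join, and how the distinct session ID avoids it, can be reproduced with a few made-up sessions. This is a plain-Python sketch of the counting logic, not BigQuery code; the `SESSIONS` data and the function names are assumptions for illustration.

```python
# Hypothetical sessions; newVisits is 1 for a first visit, otherwise None.
SESSIONS = [
    {"fullVisitorId": "A", "visitStartTime": 100, "newVisits": 1, "hits": ["h1", "h2", "h3"]},
    {"fullVisitorId": "A", "visitStartTime": 200, "newVisits": None, "hits": ["h1", "h2"]},
    {"fullVisitorId": "B", "visitStartTime": 150, "newVisits": 1, "hits": ["h1"]},
]


def flatten(sessions):
    """Cross-join sessions with their hits: session fields repeat on every hit."""
    return [
        (s["fullVisitorId"], s["visitStartTime"], s["newVisits"], hit)
        for s in sessions
        for hit in s["hits"]
    ]


def naive_new_visitors(rows):
    """SUM(totals.newVisits) on the flattened rows: counts once per hit."""
    return sum(new_visits or 0 for _, _, new_visits, _ in rows)


def distinct_new_visitors(rows):
    """COUNT(DISTINCT IF(newVisits = 1, session_id, NULL)): once per session."""
    return len({
        f"{visitor}-{start}"
        for visitor, start, new_visits, _ in rows
        if new_visits == 1
    })


rows = flatten(SESSIONS)
print(len(rows), naive_new_visitors(rows), distinct_new_visitors(rows))
```

The first session has three hits, so the naive sum counts it three times, while the distinct session ID counts it once.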
If you have something similar at product scope, you need to create an ID that takes both session and hit into account.
E.g. counting pageviews for productSku:
SELECT
PARSE_DATE('%Y%m%d', t.date) AS Date
,(SELECT value FROM h.customDimensions WHERE index=2) justAHitCd
,p.productSku
,COUNT(DISTINCT fullvisitorid) AS users
,COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS STRING))) AS sessions
,COUNT(DISTINCT
IF(h.type='PAGE',
CONCAT(fullvisitorid, cast(visitstarttime AS STRING),CAST(hitNumber AS STRING)),
NULL)
) as pageviews
,COUNT(1) AS products
FROM
`table` t, t.hits h LEFT JOIN h.product p
GROUP BY 1,2,3
Note that I'm left joining the product array. Since it is sometimes empty, a cross-join would destroy all information for those hits: a cross-join with an empty table results in an empty table.
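The effect of an empty product array on the two join types can be shown with two made-up hits. Again this is a plain-Python sketch of the join semantics rather than BigQuery code, and the `HITS` data is hypothetical.

```python
# Hypothetical hits: the second one has an empty product array.
HITS = [
    {"hitNumber": 1, "product": ["sku1", "sku2"]},
    {"hitNumber": 2, "product": []},
]


def cross_join(hits):
    """Cross-join each hit with its products: hits without products vanish."""
    return [(h["hitNumber"], p) for h in hits for p in h["product"]]


def left_join(hits):
    """Left-join each hit with its products: empty arrays yield one NULL row."""
    rows = []
    for h in hits:
        products = h["product"] or [None]
        rows.extend((h["hitNumber"], p) for p in products)
    return rows


print(cross_join(HITS))
print(left_join(HITS))
```

Hit 2 disappears from the cross-join result but survives the left join with a NULL (None) product, so its page information can still be counted.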
Hope that helps!

Build SQL query with JOIN and limits

Please help me build a PostgreSQL query.
There are 2 tables: products(id, title) and prices(id, product_id, price_type, moment, value)
moment - timestamp, can be in the past or future
Assume that price_type has only two options: retail or purchase
But one product may have many retail prices with different moments.
I need to select all products with their current retail and purchase prices, where moment is less than now.
This is what I have so far:
SELECT
products.id,
products.title_translations AS title,
retail_prices.moment AS ret_moment,
pur_prices.value AS purchase,
retail_prices.value AS retail
FROM products
LEFT OUTER JOIN prices AS pur_prices ON products.id=pur_prices.product_id AND pur_prices.price_type='purchase' AND pur_prices.moment<current_timestamp
LEFT OUTER JOIN prices AS retail_prices ON products.id=retail_prices.product_id AND retail_prices.price_type='retail' AND retail_prices.moment<current_timestamp
ORDER BY products.id;
It works, but it returns each product with all of its prices, and I need only the latest prices (by moment).
Just use ROW_NUMBER to find what is the last price before current time
with last_prices as (
SELECT
products.id,
products.title_translations AS title,
prices.moment,
prices.value,
prices.price_type,
ROW_NUMBER() OVER (PARTITION BY products.id, prices.price_type
ORDER BY prices.moment DESC) as rn
FROM products
LEFT JOIN prices
ON products.id = prices.product_id
AND prices.moment < now() -- filter in ON so products without prices are kept
)
SELECT id, title,
MAX(CASE WHEN price_type = 'retail'
THEN moment
END) as retail_moment,
MAX(CASE WHEN price_type = 'retail'
THEN value
END) as retail_price,
MAX(CASE WHEN price_type = 'purchase'
THEN moment
END) as purchase_moment,
MAX(CASE WHEN price_type = 'purchase'
THEN value
END) as purchase_price
FROM last_prices
WHERE rn = 1
GROUP BY id, title
ORDER BY id
To keep things organized and straight in my mind, I'd use CTEs to generate two subsets of price data, one for purchase and one for retail, and assign each row a row number so that 1 marks the most recent moment earlier than the current timestamp. Then, when we join to these CTEs, we only return the rows numbered 1.
With Pur_prices as (SELECT P.*, row_Number() over (partition by product_ID order by moment desc) RN
FROM prices P
WHERE price_Type = 'purchase'
and p.moment < current_timestamp)
, Retail_prices as (SELECT P.*, row_Number() over (partition by product_ID order by moment desc) RN
FROM prices P
WHERE price_Type = 'retail'
and p.moment < current_timestamp)
SELECT
p.id,
p.title_translations AS title,
rp.moment AS ret_moment,
rp.value AS retail,
pp.moment AS Pur_moment,
pp.value AS purchase
FROM products p
LEFT JOIN pur_prices pp
ON p.id=pp.product_id
AND pp.RN = 1 --Only show the most recent price less than current time
LEFT JOIN retail_prices rp
ON p.id=rp.product_id
AND RP.RN = 1 --Only show the most recent price less than current time
ORDER BY p.id;
The end result should be all products, regardless of whether they have a retail or purchase price; where they do, it shows the retail/purchase pricing for the most recent moment before now. My only concern is that this assumes every price has a starting moment (no null values allowed!).
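The ROW_NUMBER approach can be tried end to end on a small data set. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL (it needs SQLite 3.25 or newer for window functions), a single ranked CTE instead of two, and a fixed `now` parameter so the result is reproducible; the sample products and prices are made up.

```python
import sqlite3


def latest_prices(now):
    """Return (id, title, retail, purchase) with the latest price before now."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE prices (
            id INTEGER PRIMARY KEY, product_id INTEGER,
            price_type TEXT, moment TEXT, value REAL
        );
        INSERT INTO products VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO prices (product_id, price_type, moment, value) VALUES
            (1, 'retail',   '2020-01-01 00:00:00', 10),
            (1, 'retail',   '2021-01-01 00:00:00', 12),
            (1, 'retail',   '2999-01-01 00:00:00', 99),  -- future price
            (1, 'purchase', '2020-06-01 00:00:00', 5),
            (2, 'retail',   '2020-03-01 00:00:00', 20);
        """
    )
    rows = con.execute(
        """
        WITH ranked AS (
            SELECT p.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id, price_type
                       ORDER BY moment DESC
                   ) AS rn
            FROM prices p
            WHERE p.moment < :now          -- ignore future prices
        )
        SELECT pr.id, pr.title, rp.value AS retail, pp.value AS purchase
        FROM products pr
        LEFT JOIN ranked rp
               ON rp.product_id = pr.id AND rp.price_type = 'retail' AND rp.rn = 1
        LEFT JOIN ranked pp
               ON pp.product_id = pr.id AND pp.price_type = 'purchase' AND pp.rn = 1
        ORDER BY pr.id
        """,
        {"now": now},
    ).fetchall()
    con.close()
    return rows


print(latest_prices("2022-01-01 00:00:00"))
```

Because the `rn = 1` condition sits in the ON clause rather than in a WHERE clause, product C, which has no prices at all, still appears with NULL (None) prices.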
You may want the results ordered by moment in descending order.
Change
ORDER BY products.id;
to
ORDER BY products.id ASC, retail_prices.moment DESC;