I have this query that helps me extract individual keywords from strings (very useful with utm_campaign and utm_content):
SELECT
utm_campaign,
splits[SAFE_OFFSET(0)] AS country,
splits[SAFE_OFFSET(1)] AS product,
splits[SAFE_OFFSET(2)] AS budget,
splits[SAFE_OFFSET(3)] AS source,
splits[SAFE_OFFSET(4)] AS campaign,
splits[SAFE_OFFSET(5)] AS audience
FROM (
SELECT
utm_campaign,
SPLIT(REGEXP_REPLACE(
utm_campaign,
r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
r'\1|\2|\3|\4|\5|\6|\7'),
'|') AS splits
FROM funnel_campaign)
For example, if I have a utm_campaign like this:
us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv
The query above splits the string at each _ into separate fragments, so I get a result like this:
utm_campaign | country | product | budget | source | campaign | audience
us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv | us | latam | mkt | google | black-friday | audiencie-custom
What I want from the query above is, in this case, just the audience column. I tried adding it as a subquery inside the REVENUE CTE of the query below, because that table has no audience column but does have the utm_campaign column, and the sixth fragment of the utm_campaign string is the audience. With this query I get the error "Scalar subquery produced more than one element":
WITH COST AS (
SELECT
POS AS POS,
DATE AS DATE,
EXTRACT(WEEK FROM DATE) AS WEEK,
SOURCE AS SOURCE,
MEDIUM AS MEDIUM,
CAMPAIGN AS CAMPAIGN,
AD_CONTENT,
FORMAT AS FORMAT,
"" AS BU_OD,
SUM(CLICKS)/1000 AS CLICKS,
SUM(IMPRESSIONS)/1000 AS IMPRESSIONS,
SUM(COST)/1000 AS COST,
sum(0) as SESSIONS,
SUM(0) AS TRANSACTIONS,
SUM(0) AS search_flight_pv,
SUM(0) AS search_flight_upv,
SUM(0) AS PAX,
SUM(0) AS REVENUE,
FROM MSR_funnel_campaign_table
WHERE DATE >= DATE '2019-01-01'
AND MEDIUM NOT LIKE 'DISPLAY_CORP'
GROUP BY 1,2,3,4,5,6,7,8,9
),
REVENUE AS(
SELECT
POS AS POS,
date AS DATE,
EXTRACT(WEEK FROM DATE) AS WEEK,
SOURCE_CAT AS SOURCE,
medium_group_2 AS MEDIUM,
CAMPAIGN AS CAMPAIGN,
AD_CONTENT,
CASE
WHEN SOURCE_CAT = 'FACEBOOK' THEN
(
SELECT
splits[SAFE_OFFSET(5)] AS FORMAT,
FROM (
SELECT
ad_content,
SPLIT(REGEXP_REPLACE(
ad_content,
r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
r'\1|\2|\3|\4|\5|\6|\7'),
'|') AS splits
FROM ga_digital_marketing)) END AS FORMAT,
BU_OD AS BU_OD,
SUM(0) AS CLICKS,
SUM(0) AS IMPRESSIONS,
SUM(0) AS COST,
sum(sessions)/1000 as SESSIONS,
SUM(TRANSACTIONS)/1000 AS TRANSACTIONS,
SUM(search_flight_pv)/1000 AS search_flight_pv,
SUM(search_flight_upv)/1000 AS search_flight_upv,
SUM(PAX)/1000 AS PAX,
SUM(REVENUE)/1000 AS REVENUE,
FROM ga_digital_marketing
WHERE PAX_TYPE = 'PAID'
AND DATE >= DATE '2019-01-01'
AND MEDIUM NOT LIKE 'DISPLAY_CORP'
GROUP BY 1,2,3,4,5,6,7,8,9
),
COST_REVENUE AS (
SELECT *
FROM COST
UNION ALL
SELECT *
FROM REVENUE
)
SELECT
DATE,
WEEK,
POS,
SOURCE,
MEDIUM,
CAMPAIGN,
AD_CONTENT,
FORMAT,
BU_OD,
CLICKS,
IMPRESSIONS,
SESSIONS,
TRANSACTIONS,
search_flight_pv,
search_flight_upv,
COST,
PAX,
REVENUE,
FROM COST_REVENUE
WHERE
1=1
AND DATE >= '2019-01-01'
What am I doing wrong here?
What I would like to see is a match between the format dimension from COST and the format dimension from REVENUE (which doesn't exist as a column there, but is contained within the campaign column).
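You don't really need the inner SELECT statements, because your campaign data is already in the same row of the table. As written, the subquery is not tied to the outer row, so it returns one value for every row of ga_digital_marketing, which is why you get "Scalar subquery produced more than one element".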
Change this:
CASE
WHEN SOURCE_CAT = 'FACEBOOK' THEN
(
SELECT
splits[SAFE_OFFSET(5)] AS FORMAT,
FROM (
SELECT
ad_content,
SPLIT(REGEXP_REPLACE(
ad_content,
r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
r'\1|\2|\3|\4|\5|\6|\7'),
'|') AS splits
FROM ga_digital_marketing)) END AS FORMAT,
to something like this:
-- also replacing case with if for only 1 case
IF(SOURCE_CAT = 'FACEBOOK',
SPLIT(REGEXP_REPLACE(
ad_content,
r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
r'\1|\2|\3|\4|\5|\6|\7'),
'|')[SAFE_OFFSET(5)], NULL) AS FORMAT,
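To sanity-check what that expression returns for a single row, the same logic can be mimicked outside BigQuery. This is only a rough Python sketch, not BigQuery itself: the helper name `sixth_fragment` is made up for illustration, and Python's `re.sub`, `str.split` and a length check stand in for REGEXP_REPLACE, SPLIT and SAFE_OFFSET.

```python
import re

# Same pattern and replacement as in the query:
# six '_'-delimited fragments followed by the remainder of the string.
PATTERN = r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)'
REPLACEMENT = r'\1|\2|\3|\4|\5|\6|\7'


def sixth_fragment(value):
    """Hypothetical helper: mimic SPLIT(REGEXP_REPLACE(...), '|')[SAFE_OFFSET(5)] for one row."""
    splits = re.sub(PATTERN, REPLACEMENT, value).split('|')
    # SAFE_OFFSET yields NULL instead of failing when the index is out of range.
    return splits[5] if len(splits) > 5 else None


campaign = 'us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv'
print(sixth_fragment(campaign))     # sixth fragment of the example string
print(sixth_fragment('too_short'))  # fewer than seven fragments: no match, so None
```

Because the value is computed from the current row only, there is nothing left that could return more than one element.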
Related
I am trying to calculate the percentage of sessions per month based on some categories the users have visited on the website, using Google BigQuery.
The query is the following one. It looks correct to me, but when I run it, it says: 'Scalar subquery produced more than one element'.
Do you know why this is happening? Thanks!
WITH cte AS (
SELECT
partition_date as session_date,
EXTRACT(MONTH FROM CAST(partition_date AS date)) AS month,
COUNT(CONCAT(CAST(fullVisitorId as string),
CAST(visitId as string)
)) AS sessions,
h.page.pagePath AS page_path,
CASE WHEN h.page.pagePath LIKE '%/account/profile/%' THEN 'My Profile'
WHEN h.page.pagePath LIKE '%/myaccount/orders%' THEN 'My Orders'
WHEN h.page.pagePath LIKE '%/myaccount/wishlist' THEN 'My Wishlist'
ELSE NULL END as categories,
(SELECT count(CONCAT(CAST(fullVisitorId as string),
CAST(visitId as string)
)) over (partition by EXTRACT(MONTH FROM CAST(partition_date AS date)))
FROM `*.BO_*.ga_sessions`,
UNNEST(hits) AS h
WHERE partition_date BETWEEN '2022-07-31' AND '2022-08-01') as total_sessions
FROM
`*.BO_*.ga_sessions`, UNNEST(hits) AS h
WHERE partition_date BETWEEN '2022-07-31' AND '2022-08-01'
GROUP BY 1, 5
)
SELECT
month,
categories,
total_sessions,
sessions/total_sessions,
FROM cte
WHERE categories IS NOT NULL
GROUP BY 1, 2, 3, 4
I am writing two separate SQL queries to get data for two different dates like so:
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day = '2019-08-09'
GROUP BY number
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day = '2018-08-10'
GROUP BY number
I tried fusing them like so to get the results for the same number in one row from two different dates:
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin, 0 as sales_n1, 0 as discount_n1, 0 as margin_n1
FROM table_a
WHERE day = '2019-08-09'
GROUP BY number
UNION
SELECT number, 0 as sales, 0 as discount, 0 as margin, sum(sales_n1) as sales_n1, sum(discount_n1) as discount_n1, sum(margin_n1) as margin_n1
FROM table_a
WHERE day = '2018-08-10'
GROUP BY number
But it didn't work: I get the rows of the first query with zeroes in the columns defined as zero, followed by the rows of the second query in the same fashion.
How can I correct this to get the desired output?
Use conditional aggregation:
SELECT number,
sum(case when day = '2019-08-09' then sales end) as sales_20190809,
sum(case when day = '2019-08-09' then discount end) as discount_20190809,
sum(case when day = '2019-08-09' then margin end) as margin_20190809,
sum(case when day = '2019-08-10' then sales end) as sales_20190810,
sum(case when day = '2019-08-10' then discount end) as discount_20190810,
sum(case when day = '2019-08-10' then margin end) as margin_20190810
FROM table_a
WHERE day IN ('2019-08-09', '2019-08-10')
GROUP BY number;
If you want the numbers in different rows (which you don't seem to), then aggregate by day as well:
SELECT day, number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day IN ('2019-08-09', '2019-08-10')
GROUP BY day, number
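The conditional aggregation above can be tried on a few rows to see how both dates land in one row per number. The sketch below is only an illustration under assumptions: it uses Python's built-in `sqlite3` instead of the asker's database, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (number INTEGER, day TEXT, sales INTEGER, discount INTEGER, margin INTEGER)"
)
# Invented sample data: number 1 has rows on both days, number 2 only on the second day.
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2019-08-09", 10, 1, 4),
        (1, "2019-08-09", 20, 2, 6),
        (1, "2019-08-10", 5, 0, 2),
        (2, "2019-08-10", 7, 1, 3),
    ],
)

# Each SUM only sees the rows of "its" day; the CASE yields NULL for all other rows.
rows = conn.execute(
    """
    SELECT number,
           SUM(CASE WHEN day = '2019-08-09' THEN sales END) AS sales_20190809,
           SUM(CASE WHEN day = '2019-08-10' THEN sales END) AS sales_20190810
    FROM table_a
    WHERE day IN ('2019-08-09', '2019-08-10')
    GROUP BY number
    ORDER BY number
    """
).fetchall()
print(rows)
```

A number with no rows on one of the days gets NULL (Python `None`) in that column; adding `ELSE 0` inside the CASE would turn that into 0 if preferred.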
I want to combine the dimensions date, country and source with sessions and unique events for the event category "Downloads". Based on this data, I want to calculate the download conversion rate in Data Studio later on. To be honest, I'm a noob in SQL, but I hope I'm at least thinking along the right lines.
Trying the query below I get the following error:
Unrecognized name: Downloads at [40:3]
WITH
ga_tables AS (
SELECT
date,
trafficSource.source AS Source,
geoNetwork.country AS Country,
COUNT ( trafficSource.source ) AS Sessions
FROM
`xxxxxx.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20190301'
AND '20190301'
GROUP BY
date,
Source,
Country
UNION ALL
SELECT
date,
trafficSource.source AS Source,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS string),'-',CAST(visitId AS string),'-',CAST(date AS string),'-',ifnull(hits.eventInfo.eventLabel,
'null'))) AS Downloads
FROM
`xxxxxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN '20190301'
AND '20190301'
AND hits.type = 'EVENT'
AND hits.eventInfo.eventCategory = 'Downloads'
GROUP BY
date,
Source,
Country )
SELECT
date,
Country,
Source,
Downloads,
Sessions
FROM
ga_tables
ORDER BY
Sessions ASC
In your WITH statement, the fourth column of the first SELECT is named Sessions, while the fourth column of the SELECT it is unioned with is named Downloads. UNION ALL takes its output column names from the first SELECT, so the final column is called Sessions, and Downloads does not exist when you query it. If you want Sessions and Downloads to be separate columns, make the query look something like this:
WITH
ga_tables AS (
SELECT
date,
trafficSource.source AS Source,
geoNetwork.country AS Country,
COUNT ( trafficSource.source ) AS Sessions,
NULL AS Downloads
FROM
`xxxxxx.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20190301'
AND '20190301'
GROUP BY
date,
Source,
Country
UNION ALL
SELECT
date,
trafficSource.source AS Source,
geoNetwork.country AS Country,
NULL AS Sessions,
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS string),'-',CAST(visitId AS string),'-',CAST(date AS string),'-',ifnull(hits.eventInfo.eventLabel,
'null'))) AS Downloads
FROM
`xxxxxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN '20190301'
AND '20190301'
AND hits.type = 'EVENT'
AND hits.eventInfo.eventCategory = 'Downloads'
GROUP BY
date,
Source,
Country )
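The column-naming behaviour of UNION ALL is easy to reproduce on a toy example. The sketch below uses Python's built-in `sqlite3` rather than BigQuery (an assumption made purely so it can run locally), with invented values. The final aggregation is one possible follow-up, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The result columns of a UNION ALL take their names from the first SELECT.
cur = conn.execute("SELECT 1 AS Sessions UNION ALL SELECT 2 AS Downloads")
column_names = [d[0] for d in cur.description]

# Referring to the second SELECT's alias therefore fails.
try:
    conn.execute(
        "WITH t AS (SELECT 1 AS Sessions UNION ALL SELECT 2 AS Downloads) "
        "SELECT Downloads FROM t"
    )
    downloads_visible = True
except sqlite3.OperationalError:
    downloads_visible = False

# Padding each branch with NULL exposes both columns; aggregating afterwards
# collapses the two padded rows per group into one.
padded = conn.execute(
    """
    WITH t AS (
        SELECT 'google' AS Source, 3 AS Sessions, NULL AS Downloads
        UNION ALL
        SELECT 'google' AS Source, NULL AS Sessions, 2 AS Downloads
    )
    SELECT Source, SUM(Sessions) AS Sessions, SUM(Downloads) AS Downloads
    FROM t
    GROUP BY Source
    """
).fetchall()
print(column_names, downloads_visible, padded)
```

Without the final GROUP BY, the padded union keeps two rows per date/source/country combination, one carrying Sessions and one carrying Downloads.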
Edit: Given what it looks like you want to do with the table though, you might want to rewrite ga_tables to be something like this instead:
WITH
ga_tables AS (SELECT
date,
trafficSource.source AS Source,
geoNetwork.country AS Country,
MAX(Sessions) AS Sessions,
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS string),'-',CAST(visitId AS string),'-',CAST(date AS string),'-',ifnull(hits.eventInfo.eventLabel,
'null'))) AS Downloads
FROM (
SELECT
*,
COUNT(trafficSource.source) OVER (PARTITION BY date, trafficSource.source, geoNetwork.country) AS Sessions
FROM
`xxxxxx.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20190301'
AND '20190301'),
UNNEST(hits) AS hits
WHERE
hits.type = 'EVENT'
AND hits.eventInfo.eventCategory = 'Downloads'
GROUP BY
date,
Source,
Country)
I'm trying to create a master view of a group of properties that are being imported into BigQuery, but it seems that using UNNEST(hits) duplicates the data, leading to inaccurate values for revenue etc.
I have tried to understand why the UNNEST causes this, but I can't figure it out.
SELECT Date, hostname, channelGrouping, sum(transactionRevenue) as Revenue, sum(Shipping) as Shipping, sum(visits) as Sessions, sum(bounces) as Bounces, sum(transactions) as Transactions
FROM
(SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `102674002.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509')
Group By Date, hostname, channelGrouping
Order by Date
This might do the trick:
SELECT
date,
channelGrouping,
SUM(Revenue) Revenue,
SUM(Shipping) Shipping,
SUM(bounces) bounces,
SUM(transactions) transactions,
hostname,
COUNT(date) sessions
FROM(
SELECT
date,
channelGrouping,
totals.totaltransactionrevenue / 1e6 Revenue,
ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL)) hostnames,
(SELECT SUM(hits.transaction.transactionshipping) / 1e6 FROM UNNEST(hits) hits) Shipping,
totals.bounces bounces,
totals.transactions transactions
FROM `project_id.dataset_id.ga_sessions_*`
WHERE 1 = 1
AND ARRAY_LENGTH(ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL))) > 0
AND _TABLE_SUFFIX BETWEEN '20170601' AND '20170609'
UNION ALL
(...)
),
UNNEST(hostnames) hostname
GROUP BY
date, channelGrouping, hostname
Notice that in this query I avoided applying the UNNEST operation in the hits field and I do so only inside subselects.
To understand why this is the case, you have to understand how GA data is stored in BigQuery. Notice that we basically have two types of data: session-level data and hit-level data. Each visit to your website by a client ends up generating a row in BigQuery, like so:
{fullvisitorid: 1, visitid: 1, date: '20170601', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}], totals: {totalTransactionRevenue: 0, bounces: 0}}
If the same customer comes back a day later it generates another row into BQ, something like:
{fullvisitorid: 1, visitid: 2, date: '20170602', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}], totals: {totalTransactionRevenue: 50000000, bounces: 2}}
As you can see, the fields outside the hits key belong to the session level, while each hit (i.e., each interaction the customer has on your website) adds another entry to the hits array. When you apply UNNEST, you are basically applying a cross join between all the values inside the array and the outer fields.
And this is where duplication happens!
Given the past example, if we apply UNNEST to the hits field, you end up with something like:
fullvisitorid  visitid  totals.totalTransactionRevenue  hits.hitNumber
1              1        0                               1
1              1        0                               2
1              2        50000000                        1
1              2        50000000                        2
Notice that each hit inside the hits field causes the outer fields, such as totals.totalTransactionRevenue, to be repeated once per hitNumber in the hits ARRAY.
So if, later on, you apply an operation like SUM(totals.totalTransactionRevenue), you end up summing this field multiplied by the number of hits the customer had in that visitid.
What I tend to do is avoid applying UNNEST to the hits field in the main FROM clause (it is costly, depending on the data volume) and apply it only in subqueries, where the unnesting happens at the row level and does not duplicate data.
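The duplication can be reproduced with the two example sessions from above. This is only a minimal Python sketch of the idea, with the rows reduced to the fields that matter here. The nested list comprehension plays the role of the cross join that `FROM table, UNNEST(hits)` performs.

```python
# Two sessions shaped like the example rows above (only the fields needed here).
sessions = [
    {"visitid": 1, "totals": {"totalTransactionRevenue": 0},
     "hits": [{"hitNumber": 1}, {"hitNumber": 2}]},
    {"visitid": 2, "totals": {"totalTransactionRevenue": 50000000},
     "hits": [{"hitNumber": 1}, {"hitNumber": 2}]},
]

# Equivalent of FROM table, UNNEST(hits): one output row per (session, hit) pair,
# so every session-level field is repeated once per hit.
flattened = [
    (s["visitid"], s["totals"]["totalTransactionRevenue"], h["hitNumber"])
    for s in sessions
    for h in s["hits"]
]

# Summing over the flattened rows counts the session revenue once per hit.
inflated_revenue = sum(revenue for _, revenue, _ in flattened)

# Summing at the session level counts each session exactly once.
session_revenue = sum(s["totals"]["totalTransactionRevenue"] for s in sessions)

print(len(flattened), inflated_revenue, session_revenue)
```

With two hits per session, the flattened sum comes out at twice the session-level revenue, which is the inflation described above.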
I use SQL Server Express.
Following is the query that produces the attached result.
SELECT ReceiptId, Date, Amount, Fine, [Transaction]
FROM (
SELECT ReceiptId, Date, Amount, 'DR' AS [Transaction]
FROM ReceiptCRDR
WHERE (Amount > 0)
UNION ALL
SELECT ReceiptId, Date, Amount, 'CR' AS [Transaction]
FROM ReceiptCR
WHERE (Amount > 0)
UNION ALL
SELECT strInvoiceNo AS ReceiptId, CONVERT(datetime, dtInvoiceDt, 103) AS Date, floatTotal AS Amount, 'DR' AS [Transaction]
FROM tblSellDetails
) AS t
ORDER BY Date
Result
I want a new column that shows the balance amount.
For example, the 1st row should show -2500, the 2nd -3900, the 3rd -700, and so on.
Basically, it requires the previous row's Account column data and a calculation based on the transaction type.
Sample Result
Well, that looks like SQL Server. If you are using 2012+, then use SUM() OVER():
SELECT t.*,
SUM(CASE WHEN t.[Transaction] = 'DR'
THEN t.amount*-1
ELSE t.amount END)
OVER(PARTITION BY t.date ORDER BY t.receiptId,t.[Transaction] DESC) as Cumulative_Col
FROM (YourQuery Here) t
This will SUM the value when it's CR and the value*-1 when it's DR.
Right now I partition by date, meaning this column restarts each day. If you want a running total over all time, replace the OVER() with this:
OVER(ORDER BY t.date,t.receiptId,t.[Transaction] DESC) as Cumulative_Col
Also, I didn't understand why, on the same date and for the same ReceiptId, DR is calculated before CR. I've added that to the ORDER BY, but if that's not what you want then explain the logic better.
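To check the expected balances by hand, the running total can be sketched outside the database. The Python below is only an illustration under assumptions: the rows are invented to match the balances mentioned in the question (-2500, -3900, -700), and the ordering is simplified to date and receipt id, without the tie-breaker on transaction type.

```python
from itertools import accumulate

# Made-up rows (date, receipt_id, amount, transaction), deliberately out of order.
rows = [
    ("2016-01-02", "R2", 1400, "DR"),
    ("2016-01-01", "R1", 2500, "DR"),
    ("2016-01-03", "R3", 3200, "CR"),
]


def signed(amount, transaction):
    """DR reduces the balance, anything else (CR) increases it."""
    return -amount if transaction == "DR" else amount


# Same role as ORDER BY date, receiptId in the OVER() clause.
ordered = sorted(rows, key=lambda r: (r[0], r[1]))

# Running total over the signed amounts, like SUM(...) OVER (ORDER BY ...).
balances = list(accumulate(signed(r[2], r[3]) for r in ordered))
print(balances)
```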