I'm trying to export data from Google Analytics using BigQuery, and the query fails.
I use these functions because I need data from hits:
FLATTEN
TABLE_DATE_RANGE
Can anyone help me with this error?
Error:
The project hits has not enabled BigQuery
Now the error is different: Field CampaignGrouping not found:
SELECT
a.hits.contentGroup.contentGroup2 AS CampaignGrouping,
a.customDimensions.value AS member_PK,
'Web' AS Canal,
'ES' AS country_id,
count(a.hits.contentGroup.contentGroupUniqueViews2) AS VistasUnicas
FROM FLATTEN(FLATTEN(
(SELECT
hits.contentGroup.contentGroupUniqueViews2,
hits.contentGroup.contentGroup2,
customDimensions.value
FROM TABLE_DATE_RANGE([###.ga_sessions_], TIMESTAMP('2017-04-01'), TIMESTAMP('2017-04-30'))),
hits.contentGroup.contentGroupUniqueViews2), customDimensions.value
)a
WHERE hits.contentGroup.contentGroup2<>'(not set)' AND customDimensions.value<>'null' AND hits.contentGroup.contentGroupUniqueViews2 IS NOT NULL
GROUP BY 1,2,3,4
ORDER BY 5 ASC
Solving your problem in Standard SQL is much easier than in Legacy SQL.
This query might help you compute it:
SELECT
hits.contentgroup.contentgroup2 CampaignGrouping,
custd.value member_PK,
'Web' Canal,
'ES' AS country_id,
SUM(hits.contentGroup.contentGroupUniqueViews2) VistasUnicas
FROM
`project_id.dataset_id.ga_sessions_*`,
UNNEST(customdimensions) custd,
UNNEST(hits) AS hits
WHERE
1 = 1
-- With the ga_sessions_* wildcard, _TABLE_SUFFIX is just the date part (e.g. '20170501'), so compare it directly
AND _TABLE_SUFFIX BETWEEN '20170501' AND '20170506'
AND hits.contentGroup.contentGroup2<>'(not set)'
AND custd.value<>'null'
AND hits.contentGroup.contentGroupUniqueViews2 IS NOT NULL
GROUP BY
1, 2
ORDER BY 5 ASC
You just need to enable Standard SQL and the query is ready to run.
Since you said you are learning SQL, I highly recommend starting with the Standard version instead of the Legacy one: it's more stable and offers several additional techniques to help with your analyses.
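A side note on the date filter: with a `ga_sessions_*` wildcard, the table suffix is the date part of the table name, and `YYYYMMDD` strings sort in date order, so a plain string comparison is enough to select a range. Here is a minimal Python sketch of that comparison. The helper name `tables_in_range` and the sample suffixes are made up for illustration; this is not BigQuery code.

```python
from datetime import date


def tables_in_range(suffixes, start, end):
    """Keep the table suffixes that fall between start and end (inclusive).

    Hypothetical helper: it mimics a filter such as
    _TABLE_SUFFIX BETWEEN '20170501' AND '20170506' using string comparison.
    """
    lo = start.strftime("%Y%m%d")
    hi = end.strftime("%Y%m%d")
    # YYYYMMDD strings compare lexicographically in the same order as dates.
    return [s for s in suffixes if lo <= s <= hi]


if __name__ == "__main__":
    sample = ["20170430", "20170501", "20170506", "20170507", "intraday_20170502"]
    print(tables_in_range(sample, date(2017, 5, 1), date(2017, 5, 6)))
```

Note that a suffix such as `intraday_20170502` falls outside the string range, so intraday tables are skipped by this kind of filter.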
When I run the following query, my Netezza NPS reboots. Would someone please let me know what is causing this behaviour?
select avg ( bse.WEEKS_BETWEEN_RESPONSES_HR ) as g_AVG
, sqlext.median( bse.WEEKS_BETWEEN_RESPONSES_HR ) as g_med
from (
select WEEKS_BETWEEN_RESPONSES_HR
FROM (
select distinct LOYALTY_ACCOUNT_CARD_ID
, BONUS_END_DATE
, LAG(BONUS_END_DATE,1) OVER (partition by LOYALTY_ACCOUNT_CARD_ID order by BONUS_END_DATE) as PRIOR_BONUS_END_DATE
,(( BONUS_END_DATE - PRIOR_BONUS_END_DATE)/7) as WEEKS_BETWEEN_RESPONSES_HR
from JO_ACT_PTD_STEP_1 bse
where upper ( bonus_desc ) like '%SPEND%'
and redemption = 1
) BSE
where WEEKS_BETWEEN_RESPONSES_HR is not null and WEEKS_BETWEEN_RESPONSES_HR > 0
) bse limit 500
You need to call the support people at IBM.
There is probably a stack trace or a dump file somewhere that will tell them what happened.
If I were experiencing your problem, I would remove the function calls one by one, making the SQL simpler and simpler until the error disappeared.
But of course you will need to do that in the middle of the night, or at a time when nobody else is bothered by the constant reboots.
I am working on a project to build queries on Google Analytics data in BigQuery that replicate some reports for one particular KPI. I have a table with a list of sites that need to be excluded from the Google Analytics data in order to get the correct metric.
My list might have something such as:
sitename.com
However I need to match this to the eventLabel column in BigQuery where the URL could come back as:
http://sitename.com/subpage/extra-subpage
I can't use NOT IN, as it requires an exact match. I tried a LIKE condition, but I get the following error:
Scalar subquery produced more than one element
I'm not sure how else to proceed. Do I need a query that tests whether the string matches? I can get it to work with an inner join: I keep the eventLabel from the join, then use that new table for the exclusions and do my NOT IN based on that.
SELECT Distinct
h.eventinfo.eventAction eventAction,
h.eventinfo.eventlabel eventlabel
FROM `projectName.ga_sessions_*`, unnest(Hits) h
WHere
_TABLE_SUFFIX BETWEEN "20190101" AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
and type = 'EVENT'
and h.eventInfo.eventCategory = 'EventName'
and Replace(Replace(Replace(h.eventInfo.eventLabel,'http://',''),'https://',''),'www.','')
Not like (select concat(ThirdPartyURL,'%') from `projectName.datasetName.ExclusionList`)
I hope the above makes sense.
TIA.
After reproducing your problem, the solution is to use NOT IN instead of NOT LIKE, as follows:
WITH `projectName.datasetName.ExclusionList` AS
(SELECT 'label1' AS ThirdPartyURL UNION ALL
SELECT 'label2')
SELECT DISTINCT h.eventinfo.eventAction eventAction,
h.eventinfo.eventlabel eventlabel
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(Hits) h
WHERE _TABLE_SUFFIX BETWEEN "20170801" AND "20170802"
AND TYPE = 'EVENT'
AND h.eventInfo.eventCategory = 'EventName'
AND Replace(Replace(Replace(h.eventInfo.eventLabel, 'http://', ''), 'https://', ''), 'www.', '')
NOT IN
(SELECT ThirdPartyURL FROM `projectName.datasetName.ExclusionList`)
The BigQuery SQL documentation covers this in more detail.
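One caveat: NOT IN only removes labels that equal an entry in the list exactly. If the labels are full URLs and the list holds bare domains, as in the question, a correlated NOT EXISTS with a prefix comparison may be closer to what is needed. Below is a minimal sketch of that idea, run through Python's `sqlite3` module as a stand-in for BigQuery. The table names `events` and `exclusion_list` are hypothetical, and in BigQuery Standard SQL the `substr(...) = ...` comparison would probably be written with `STARTS_WITH`.

```python
import sqlite3


def non_excluded_labels(labels, excluded_prefixes):
    """Return the labels whose normalized form starts with no excluded prefix."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for the GA hits and the ExclusionList table.
    conn.execute("CREATE TABLE events (eventLabel TEXT)")
    conn.execute("CREATE TABLE exclusion_list (ThirdPartyURL TEXT)")
    conn.executemany("INSERT INTO events VALUES (?)", [(l,) for l in labels])
    conn.executemany(
        "INSERT INTO exclusion_list VALUES (?)", [(p,) for p in excluded_prefixes]
    )
    rows = conn.execute(
        """
        SELECT DISTINCT e.eventLabel
        FROM events e
        WHERE NOT EXISTS (
            SELECT 1
            FROM exclusion_list x
            -- Prefix test: compare the first length(prefix) characters.
            WHERE substr(
                      REPLACE(REPLACE(REPLACE(e.eventLabel,
                          'http://', ''), 'https://', ''), 'www.', ''),
                      1, length(x.ThirdPartyURL)
                  ) = x.ThirdPartyURL
        )
        ORDER BY e.eventLabel
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(non_excluded_labels(
        ["http://sitename.com/subpage/extra-subpage", "https://www.other.org/page"],
        ["sitename.com"],
    ))
```

Because the subquery is correlated and only tested for existence, it can return any number of rows without triggering the "Scalar subquery produced more than one element" error.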
I am working with Google Analytics data in BigQuery.
I want to output two columns: specific event actions (hit scope) and a custom dimension (session scope), all using Standard SQL. I cannot figure out how to do it correctly, and the documentation does not help either. Please help. This is what I am trying:
SELECT
(SELECT MAX(IF(index=80, value, NULL)) FROM UNNEST(customDimensions)) AS is_app,
(SELECT hits.eventInfo.eventAction) AS ea
FROM
`table-big-query.105229861.ga_sessions_201711*`, UNNEST(hits) hits
WHERE
totals.visits = 1
AND _TABLE_SUFFIX BETWEEN '21' and '21'
AND EXISTS(SELECT 1 FROM UNNEST(hits) hits
WHERE hits.eventInfo.eventCategory = 'SomeEventCategory'
)
Try to give your tables and sub-tables aliases that are not already names in the original table schema, and always state which table you are referring to. When cross joining, you are basically adding new columns (here h.*, flattened), but the old ones (hits.*, nested) still exist.
I aliased ga_sessions_* as t and use it to refer to the cross join and also to customDimensions.
Also: you no longer need the Legacy SQL trick of using MAX() for customDimensions. It's a simple subquery now :)
Try:
SELECT
(SELECT value FROM t.customDimensions where index=80) AS is_app, -- use h.customDimensions if it is hit-scope
h.eventInfo.eventAction AS ea
FROM
`projectid.dataset.ga_sessions_201711*` t, t.hits h
WHERE
totals.visits = 1
AND _TABLE_SUFFIX BETWEEN '21' and '21'
AND h.eventInfo.eventCategory is not null
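To make the nested-versus-flattened distinction concrete, here is a rough Python sketch of what the cross join does conceptually: the session-scope custom dimension is looked up once per session, and each hit then becomes its own output row. The dictionaries only imitate the shape of the GA export, and the function name `flatten_hits` is made up for illustration.

```python
def flatten_hits(sessions, dimension_index=80):
    """Return one (is_app, event_action) row per hit that has an event category."""
    rows = []
    for session in sessions:
        # Session scope: similar in spirit to
        # (SELECT value FROM t.customDimensions WHERE index = 80).
        # Unlike a scalar subquery, next() silently takes the first match.
        is_app = next(
            (cd["value"] for cd in session.get("customDimensions", [])
             if cd["index"] == dimension_index),
            None,
        )
        # Hit scope: one output row per element of the nested hits array,
        # like the cross join `t, t.hits h`.
        for hit in session.get("hits", []):
            event_info = hit.get("eventInfo") or {}
            if event_info.get("eventCategory") is not None:
                rows.append((is_app, event_info.get("eventAction")))
    return rows


if __name__ == "__main__":
    sample = [{
        "customDimensions": [{"index": 80, "value": "yes"}],
        "hits": [{"eventInfo": {"eventCategory": "c", "eventAction": "click"}}],
    }]
    print(flatten_hits(sample))
```

The session-level value is repeated on every hit row, which is why the nested `hits` array does not need to be referenced again once the flattened alias exists.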
Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and through bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
COUNT(UNIQUE(msisdn_token)) AS users,
(SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
SELECT
request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
WHERE SUBSTR(http_status_code,1,1) IN ('1',
'2',
'3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
a.request_domain = d.request_domain
WHERE
DATE(timestamp) >= '2016-01-01'
AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
1
Is there something I'm doing wrong?
The problem seems to come from UNIQUE(): it returns a repeated field with too many elements in it. The error message could be improved, but a workaround is to use an explicit GROUP BY and then run COUNT on top of it.
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.
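For reference, the GROUP BY workaround has roughly the following shape: group on (category, user) first, then count the groups per category. This is a minimal sketch run through Python's `sqlite3` module, not BigQuery Legacy SQL. The `weblog` table and its two columns are simplified, hypothetical stand-ins for the joined data in the question.

```python
import sqlite3


def exact_users_per_category(rows):
    """rows: list of (category, msisdn_token). Returns {category: distinct users}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblog (category TEXT, msisdn_token TEXT)")
    conn.executemany("INSERT INTO weblog VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT category, COUNT(msisdn_token) AS users
        FROM (
            -- Step 1: collapse to one row per (category, user).
            SELECT category, msisdn_token
            FROM weblog
            GROUP BY category, msisdn_token
        )
        -- Step 2: count the collapsed rows per category.
        GROUP BY category
        ORDER BY category
        """
    ).fetchall()
    conn.close()
    return dict(result)


if __name__ == "__main__":
    print(exact_users_per_category(
        [("news", "a"), ("news", "a"), ("news", "b"), ("video", "a")]
    ))
```

The inner query does the de-duplication, so the outer COUNT is an ordinary exact count and no repeated field has to be built per group.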
I searched online and found that many people had the same problem, but none of the solutions worked for me.
I'm really hoping you can help me.
I have this Oracle SQL query that works fine in PL/SQL:
select a.bzq_terminate_provider, a.callsnum, a.at_call_dur_sec, sum_charge
From (select * from usage_cycle_sum
where ban='80072922' and ben='1'
and subscriber_no='036585305'
and start_cycle_code ='20150207'
and feature_code_rank='1') a, (select bzq_terminate_provider,sum(charge_amount) as sum_charge from usage_cycle_sum
where ban='80072922' and ben='1'
and subscriber_no='036585305'
and start_cycle_code ='20150207' group by bzq_terminate_provider) b
where a.bzq_terminate_provider=b.bzq_terminate_provider
I also tried this other version that works fine as well:
select PROVIDER,sum(CALLS),sum(CHARGE),sum(DUR)
from (
select bzq_terminate_provider PROVIDER,callsnum CALLS,charge_amount CHARGE,at_call_dur_sec DUR
from usage_cycle_sum
where ban='80072922' and ben='1'
and subscriber_no='036585305'
and start_cycle_code ='20150207'
and feature_code_rank='1'
union
select bzq_terminate_provider PROVIDER,0 CALLS,charge_amount CHARGE,0 DUR
from usage_cycle_sum
where ban='80072922' and ben='1'
and subscriber_no='036585305'
and start_cycle_code ='20150207'
and feature_code_rank='2'
)
group by PROVIDER
My problem is that when I create a data grid in a Visual Studio web application, I get an error: syntax error: expecting identifier or quoted identifier
The connection is OK. I checked simple select queries, as well as the whole union part of the second query I attached, and they work!
But when I use either of those two versions, I get this error.
What could be the problem? Is there another way to solve this?
Thanks.
EDIT 21/06/2015
It seems that Visual Studio doesn't work well with complex queries. I'm still looking for a solution, since my next queries are more complex...
Your second query is so much nicer to write as:
select bzq_terminate_provider as PROVIDER, sum(callsnum) as CALLS,
sum(charge_amount) as CHARGE, sum(at_call_dur_sec) as DUR
from usage_cycle_sum
where ban = '80072922' and ben = '1' and
subscriber_no = '036585305' and
start_cycle_code ='20150207' and
feature_code_rank in ('1', '2')
group by bzq_terminate_provider ;
Or, perhaps the select needs to be:
select bzq_terminate_provider as PROVIDER,
sum(case when feature_code_rank = '1' then callsnum else 0 end) as CALLS,
sum(charge_amount) as CHARGE,
sum(case when feature_code_rank = '1' then at_call_dur_sec else 0 end) as DUR
(The first version assumed that the fields were zeroed out in the second subquery because they are NULL in the data, but that might not be true.)
However, application software is not yet smart enough to identify such awkwardly written queries, so that is not the actual problem you are facing. If the query works in the database, but not in the application, then typical problems are:
The application is not connected to the right database.
The application does not have permissions on the database or table.
The application query is different from the query run in the database, typically due to some substitution problem.
The results from running the query in the application are not being interpreted correctly.
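Going back to the conditional-aggregation variant of the select: here is a small runnable sketch of how it behaves, using Python's `sqlite3` module as a stand-in for Oracle. The sample rows are invented, and the table is trimmed to the columns the query touches.

```python
import sqlite3


def provider_totals(rows):
    """rows: (provider, feature_code_rank, callsnum, charge_amount, at_call_dur_sec)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_cycle_sum ("
        "bzq_terminate_provider TEXT, feature_code_rank TEXT, "
        "callsnum INTEGER, charge_amount INTEGER, at_call_dur_sec INTEGER)"
    )
    conn.executemany("INSERT INTO usage_cycle_sum VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT bzq_terminate_provider AS provider,
               -- Calls and duration count only for rank '1' rows.
               SUM(CASE WHEN feature_code_rank = '1' THEN callsnum ELSE 0 END) AS calls,
               -- Charges are summed across both ranks.
               SUM(charge_amount) AS charge,
               SUM(CASE WHEN feature_code_rank = '1' THEN at_call_dur_sec ELSE 0 END) AS dur
        FROM usage_cycle_sum
        WHERE feature_code_rank IN ('1', '2')
        GROUP BY bzq_terminate_provider
        ORDER BY provider
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(provider_totals([("A", "1", 3, 15, 60), ("A", "2", 7, 25, 90)]))
```

One difference from the original to keep in mind: `union` (as opposed to `union all`) removes duplicate rows before the sum, so the results can differ if the table contains identical rows.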