I am using Google BigQuery to query the daily Google Analytics reports for my website. I am running queries on 7 tables (the 7 daily reports) at a time, because I want to use weekly results.
I would like to run a query that shows "Users with >= x sessions and with >= y page views". I am having difficulties framing this query.
The resulting table should show the fullVisitorId, totals.visits (the number of sessions), and totals.pageviews (the total number of pageviews within the session). Should I use a subquery, or is there some other method?
Please use the following link if you'd like to have a look at the complete schema: https://support.google.com/analytics/answer/3437719?hl=en
A basic query would look like:
SELECT
fullVisitorId,
SUM(totals.visits) as visits,
SUM(totals.pageviews) as pageviews,
FROM
TABLE_DATE_RANGE([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_],
TIMESTAMP('2013-09-10'),
TIMESTAMP('2013-09-17'))
GROUP BY
fullVisitorId
HAVING visits>0 and pageviews>0
To run this query on a sample database visit: https://support.google.com/analytics/answer/3416091?hl=en
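A subquery should not be necessary here, because HAVING can filter on the aggregated values directly. A minimal sketch in legacy SQL, reusing the query above; the thresholds 2 and 10 are placeholder values for x and y, not values from the question:

```sql
-- Legacy SQL. Users with at least x sessions and at least y pageviews
-- over the date range. 2 and 10 are placeholders for x and y.
SELECT
  fullVisitorId,
  SUM(totals.visits) AS visits,
  SUM(totals.pageviews) AS pageviews
FROM
  TABLE_DATE_RANGE([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_],
    TIMESTAMP('2013-09-10'),
    TIMESTAMP('2013-09-17'))
GROUP BY
  fullVisitorId
HAVING
  visits >= 2 AND pageviews >= 10
```

Because the sums are computed per fullVisitorId across all tables in the range, the thresholds apply to the weekly totals rather than to any single session.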
I'm trying to display the user growth per day using a Grafana Time Series panel with SQL Server. However, I found the documentation unhelpful and my queries are incorrect.
The following returns a constant value of 1 for every day. What do I need to change to display the number of new users created per day?
Thank you very much in advance.
SELECT
$__timeGroup([created_at],'1d') as time,
COUNT(id) as value,
'users' as metric
FROM [db].[user]
WHERE $__timeFilter([created_at])
GROUP BY [created_at]
ORDER BY 1
This works for me:
SELECT
$__timeGroup(created_at, '1d') AS time,
COUNT(id) as 'New Users'
FROM [db].[user]
GROUP BY $__timeGroup(created_at, '1d')
ORDER BY 1
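The likely reason the original query returned 1 for every row is that it grouped by the raw [created_at] value, so each distinct timestamp formed its own group. Grouping by the same $__timeGroup expression used in the SELECT buckets the rows per day. As a sketch, the working query can also keep the dashboard time range by restoring the $__timeFilter condition from the question:

```sql
-- Grafana macro SQL for the SQL Server data source.
-- $__timeGroup buckets rows into 1-day intervals;
-- $__timeFilter restricts rows to the dashboard's selected time range.
SELECT
  $__timeGroup(created_at, '1d') AS time,
  COUNT(id) AS value,
  'users' AS metric
FROM [db].[user]
WHERE $__timeFilter(created_at)
GROUP BY $__timeGroup(created_at, '1d')
ORDER BY 1
```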
I'm trying to extract Firebase Analytics DAU and MAU using BigQuery. The query I'm using for daily users is below -
SELECT
event_date AS day,
COUNT(DISTINCT user_id) AS daily_visitors
FROM `XXXXXXX.analytics_153729556.events_20190825`
WHERE
app_info.id = 'XXXXXXX'
AND
event_name = 'user_engagement'
GROUP BY day;
I have a few questions I would love some help with.
There is a significant (2000+) difference between the value from the query result and the value the Firebase dashboard shows for the same date(s). Is there a specific reason for this, or is my query just plain wrong?
There are instances where I see dates other than that of the table selected. For example, I see 20190502 in the results when 20190501 should be the only row (based on the table name). Is this possibly because the events being dumped into the table are for an app in a different timezone? If not, what else could be the reason behind this?
I also want to extract historical MAU and DAU data, and store it on MongoDB for any future requirements that may arise. Is there a specific way in which I can extract them - after overcoming the problem I'm facing, of course?
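For the historical extract, one option is a wildcard table with a _TABLE_SUFFIX filter, so a single query covers many daily tables. The sketch below is not a confirmed fix for the dashboard discrepancy. It counts user_pseudo_id instead of user_id, on the assumption that user_id is only populated when the app sets it explicitly. The date range is a placeholder.

```sql
#standardSQL
-- Daily active users across a range of daily export tables.
-- Assumption: user_pseudo_id (the app-instance ID) is the better
-- identifier here, because user_id is NULL unless the app sets it.
-- The suffix bounds '20190801' and '20190831' are placeholder dates.
SELECT
  event_date AS day,
  COUNT(DISTINCT user_pseudo_id) AS daily_users
FROM `XXXXXXX.analytics_153729556.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20190801' AND '20190831'
  AND event_name = 'user_engagement'
GROUP BY day
ORDER BY day;
```

For calendar-month active users, the same query can group by SUBSTR(event_date, 1, 6) instead of event_date. The result is small enough to export and load into MongoDB with whatever tooling you already use.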
I'm building a dashboard in Grafana. One of my graphs is based on this query:
SELECT
users.name as Name,
users.last_name as Last,
users.email as Mail,
sum(orders.total_amount) as Sales
FROM orders
INNER JOIN users ON orders.user_id=users.id
WHERE orders.status=4
GROUP BY 1,2,3;
So I'm looking for the top clients based on how much they spent. What I need now is for the total sales to change with the time range of the dashboard. Right now I get the same result with every Quick Range I use (it returns the total sales across the whole database, not within the time range I request). Any help?
Check the documentation for the Grafana data source you are using (MySQL or PostgreSQL). It lists the available macros, which you can use in the time condition of your SQL WHERE clause.
http://docs.grafana.org/features/datasources/postgres/#macros
http://docs.grafana.org/features/datasources/mysql/#macros
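Applied to the query from the question, that could look like the sketch below. The column orders.created_at is an assumption, since the question does not show which timestamp column the orders table has; substitute the real one.

```sql
-- Grafana macro SQL (MySQL or PostgreSQL data source).
-- $__timeFilter expands to a range condition on the given column
-- that matches the dashboard's selected time range.
SELECT
  users.name AS Name,
  users.last_name AS Last,
  users.email AS Mail,
  SUM(orders.total_amount) AS Sales
FROM orders
INNER JOIN users ON orders.user_id = users.id
WHERE orders.status = 4
  AND $__timeFilter(orders.created_at)  -- created_at is an assumed column name
GROUP BY 1, 2, 3
ORDER BY Sales DESC;
```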
I know the title is probably pretty stupid but I have a hard time phrasing it differently.
I have to use BigQuery at work atm for some report. BigQuery is connected to a Google Analytics view of ours. This gives us a dataset with 1 table for each day. The rows of the tables are user-sessions on our site, while columns have some information about the sessions.
The problem I have is the following:
I want to select sessions with transactions, but only if the user was referred to our site by a certain referrer in the last x days before the transaction happened. I'm only familiar with basic SQL and not with any advanced concepts. It's really frustrating to me because this would be a no-brainer with any proper programming language given a .csv of the data, but I'm lacking knowledge of the relevant concepts in SQL.
#standardSQL
SELECT
COUNT(*)
FROM
`dataset.ga_sessions_2017*`
WHERE
totals.transactions > 0 AND
fullVisitorId IN (SELECT
fullVisitorId
FROM
`dataset.ga_sessions_2017*`
WHERE
trafficSource.source = "xyz.com"
) AND
< date difference thing>
I could filter for the date difference like I did with the trafficSource (referrer). The problem for me is that while "xyz.com" is a static thing, I'd need to reference the date value of the current row I'm in. So the date by which I'd filter the 2nd SELECT would be dynamically changing from row to row. Can anyone guide me on how this is usually done? This seems like a thing that would come up often.
I'm not familiar with the GA tables specifically, but having written some wildcard queries in BigQuery before, I think what you're looking for can be done using the _TABLE_SUFFIX pseudo column:
CAST(_TABLE_SUFFIX AS INT64) >= 1217
Where 1217 is today's date in MMDD format minus 3 days, assuming the table names are _20171217, _20171218, etc. Otherwise you can just use REPLACE to remove underscores before casting to an int. There are also functions that will generate today's date for you if you needed this query to run automatically.
Also, I think the fullVisitorId business could be replaced with a simple WHERE trafficSource.source = "xyz.com" but it's hard to say for sure without being able to run the query myself.
So the full query would look something like this:
#standardSQL
SELECT
COUNT(*)
FROM
`dataset.ga_sessions_2017*`
WHERE
totals.transactions > 0 AND
trafficSource.source = "xyz.com" AND
CAST(_TABLE_SUFFIX AS INT64) >= 1217
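Note that the query above uses a fixed date cutoff and requires the transaction session itself to come from the referrer. If the window has to move with each transaction row, as the question describes, one common pattern is a self-join on fullVisitorId with a date-difference condition. This is only a sketch: it assumes the standard GA export columns (date as a YYYYMMDD string, visitId as an integer), and 7 is a placeholder for x.

```sql
#standardSQL
-- t = sessions with a transaction, r = sessions referred by xyz.com.
-- A transaction session counts if the same visitor has a referred
-- session on the same day or up to 7 days earlier (7 stands in for x).
SELECT
  COUNT(DISTINCT CONCAT(t.fullVisitorId, '-', CAST(t.visitId AS STRING)))
    AS sessions_with_transactions
FROM
  `dataset.ga_sessions_2017*` AS t
JOIN
  `dataset.ga_sessions_2017*` AS r
ON
  r.fullVisitorId = t.fullVisitorId
WHERE
  t.totals.transactions > 0
  AND r.trafficSource.source = 'xyz.com'
  AND DATE_DIFF(PARSE_DATE('%Y%m%d', t.date),
                PARSE_DATE('%Y%m%d', r.date), DAY) BETWEEN 0 AND 7
```

The COUNT(DISTINCT ...) is there because one transaction session can match several referred sessions. Since the wildcard only covers 2017 tables, transactions in the first days of January would not see referrals from late 2016.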
Our Google Analytics 'User Count' is not matching our BigQuery 'User Count'.
Am I calculating it correctly?
Typically, GA and BQ align very closely, albeit not exactly.
Recently, User Counts in GA vs. BQ have been incongruous.
Our number of 'Sessions per User' typically has a very normal distribution.
In the last 4 weeks, 'Sessions per User' (in GA) has been several standard deviations from the norm.
I cannot replicate this deviation when cross-checking data from the same time period in BQ.
The difference lies in the User Counts.
What I'm hoping someone can answer is:
Am I at least using the correct SQL syntax to get to the answer in BQ?
This is the query I’m running in BQ:
SELECT
WEEK(Week) AS Week,
Week AS Date_Week,
Total_Sessions,
Total_Users,
Total_Pageviews,
( Total_Time_on_Site / Total_Sessions ) AS Avg_Session_Duration,
( Total_Sessions / Total_Users ) AS Sessions_Per_User,
( Total_Pageviews / Total_Sessions ) AS Pageviews_Per_Session
FROM
(
SELECT
FORMAT_UTC_USEC(UTC_USEC_TO_WEEK (date,1)) AS Week,
COUNT(DISTINCT CONCAT(STRING(fullVisitorId), STRING(VisitID)), 1000000) AS Total_Sessions,
COUNT (DISTINCT(fullVisitorId), 1000000) AS Total_Users,
SUM(totals.pageviews) As Total_Pageviews,
SUM(totals.timeOnSite) AS Total_Time_on_Site,
FROM
(
TABLE_DATE_RANGE([zzzzzzzzz.ga_sessions_],
TIMESTAMP('2015-02-09'),
TIMESTAMP('2015-04-12'))
)
GROUP BY Week
)
GROUP BY Week, Date_Week, Total_Sessions, Total_Users, Total_Pageviews, Avg_Session_Duration, Sessions_Per_User, Pageviews_Per_Session
ORDER BY Week ASC
We have well under 1,000,000 users/sessions/etc a week.
Throwing that 1,000,000 into the Count Distinct clause should be preventing any sampling on BQ’s part.
Am I doing this correctly?
If so, any suggestion on how/why GA would be reporting differently is welcome.
Cheers.
*(Statistically) significant discrepancies begin in Week 11
Update:
We have Premium Analytics, as @Pentium10 suggested. So, I reached out to their paid support.
Now when I pull the exact same data from GA, I get this:
Looks to me like GA has now fixed the issue.
Without actually admitting there ever was one.
::shrug::
I had this problem before. The way I fixed it was by using COUNT(DISTINCT fullVisitorId) for Total_Users.
In standard SQL use COUNT(DISTINCT fullVisitorId)
Google Analytics shows an approximation for users, while BigQuery is exact. You can test this with unsampled reports in Google Analytics: the numbers will match.
Also: GA uses all available data to count users, even where totals.visits is NULL!
In contrast GA counts sessions only where totals.visits = 1!
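Putting those points together, a standard SQL version of the weekly query might look like the sketch below. It is a rewrite under assumptions, not the original poster's query: weeks are taken to start on Monday (matching UTC_USEC_TO_WEEK(date, 1)), and sessions are counted as SUM(totals.visits) so that only rows with totals.visits = 1 contribute.

```sql
#standardSQL
-- Weekly users and sessions from the GA export.
-- COUNT(DISTINCT ...) is exact in standard SQL, so no threshold
-- argument is needed. Users are counted over all rows; sessions
-- only where totals.visits = 1 (NULLs are ignored by SUM).
SELECT
  DATE_TRUNC(PARSE_DATE('%Y%m%d', date), WEEK(MONDAY)) AS week_start,
  COUNT(DISTINCT fullVisitorId) AS total_users,
  SUM(totals.visits) AS total_sessions,
  SUM(totals.pageviews) AS total_pageviews
FROM
  `zzzzzzzzz.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20150209' AND '20150412'
GROUP BY
  week_start
ORDER BY
  week_start
```

Sessions per user and pageviews per session can then be derived from these columns in an outer query, as in the original.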