A beginner question.
I want to query my database for the pageviews of a given page. I wrote a query that returns the page and its number of pageviews per day. How should I change the query to get the same statistics monthly instead of daily?
So instead of:
page pv date
/mysite 10 2017-01-01
I would get:
page pv date
/mysite 500 2017-01
My query:
select
date,
hits.page.pagePath as pagePath,
count(totals.pageviews) as pageViews
from Table_DATE_RANGE ([818251235.ga_sessions_] , Timestamp('2016-01-01'), Timestamp('2017-11-01'))
group by 1,2
It's not clear what you are trying to count in your original query, but here is a query that uses standard SQL and performs the grouping on a monthly basis:
#standardSQL
SELECT
DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS month,
hit.page.pagePath,
COUNT(*)
FROM `818251235.ga_sessions_*`,
UNNEST (hits) AS hit
WHERE _TABLE_SUFFIX BETWEEN
'20160101' AND '20171101'
GROUP BY 1, 2;
Edit: fixed to use DATE_TRUNC instead of EXTRACT(MONTH FROM ...) since both the year and month are relevant.
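To illustrate the point of that edit outside SQL, here is a minimal Python sketch of the difference between truncating a date to its month and extracting only the month number. The helper month_start and the sample dates are made up for illustration; they are not part of the query above.

```python
from collections import Counter
from datetime import date


def month_start(d: date) -> date:
    """Truncate a date to the first day of its month (the year is kept)."""
    return d.replace(day=1)


# Hypothetical hit dates, one entry per hit.
hit_dates = [
    date(2016, 1, 5),
    date(2016, 1, 20),
    date(2017, 1, 3),
    date(2017, 2, 1),
]

# Grouping on the truncated date keeps January 2016 and January 2017 apart.
by_truncated_month = Counter(month_start(d) for d in hit_dates)

# Grouping on the month number alone merges both Januaries into one bucket.
by_month_number = Counter(d.month for d in hit_dates)
```

With a date range that spans more than one year, only the first grouping gives one row per calendar month.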
In legacy SQL you can use date functions such as UTC_USEC_TO_MONTH, UTC_USEC_TO_WEEK, and UTC_USEC_TO_DAY to normalize timestamps to the first day of the month, the first day of the week, or the start of the day.
select
date(UTC_USEC_TO_MONTH(date)) as monthly,
.....
Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the running total of the balance column for every day up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The differences between the result sets of the queries were caused by the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. Since you aggregate with a single condition, the FILTER clause can be replaced by a plain WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
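The same two-step idea (aggregate per day, then accumulate over the sorted days) can be sketched in Python to check results by hand. This is only an illustration under assumptions: the rows below are invented sample data, and running_daily_totals is a hypothetical helper, not something from the question.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal
from itertools import accumulate

# Hypothetical rows: (time, balance, is_spent_column).
rows = [
    (datetime(2009, 1, 14, 9, 0), Decimal("10.5"), None),
    (datetime(2009, 1, 14, 17, 30), Decimal("4.5"), "spent"),
    (datetime(2009, 1, 15, 8, 0), Decimal("2.0"), None),
    (datetime(2009, 1, 17, 12, 0), Decimal("3.0"), None),
]


def running_daily_totals(rows):
    """Return (day, running total) pairs for rows whose is_spent_column is None."""
    # Step 1: aggregate to day level (the inner query).
    daily = defaultdict(Decimal)
    for ts, balance, is_spent in rows:
        if is_spent is None:
            daily[ts.date()] += balance
    # Step 2: cumulative sum over the days in ascending order (the window function).
    days = sorted(daily)
    return list(zip(days, accumulate(daily[d] for d in days)))


totals = running_daily_totals(rows)
```

Days without any rows (2009-01-16 in the sample) do not appear in the output, which matches the behaviour of the GROUP BY in the query above.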
How can I see the data above in BigQuery? The tables have been there for a year.
What code should I use to get the above result?
User subscription status is a session-based dimension for users who have made transactions.
I have enabled the data export to BigQuery, but how do I get exactly the same results in BigQuery?
Try code below. Change table name and date interval according to your request.
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY date
ORDER BY date ASC
These documents could be useful for you before posting questions:
https://support.google.com/analytics/answer/4419694?hl=tr
https://support.google.com/analytics/answer/3437719?hl=tr
For custom dimensions on session scope, write a subquery that runs on the unnested array.
#standardSQL
SELECT
date,
-- select one value from unnested array
(SELECT value FROM UNNEST(customDimensions) WHERE index=4) AS cd4,
SUM(totals.transactions) AS transactions,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20160802'
GROUP BY
date, cd4
ORDER BY
date ASC
You need to change the condition in the subquery to your custom dimension index.
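As a rough Python analogy of what the subquery on the unnested array does, the sketch below picks one custom dimension value per session by index and then sums transactions per (date, value). The session dictionaries, the value "subscriber", and both function names are assumptions made for illustration only.

```python
from collections import defaultdict


def custom_dimension(session, index):
    """Return the value of the custom dimension with this index, or None if absent."""
    return next(
        (cd["value"] for cd in session["customDimensions"] if cd["index"] == index),
        None,
    )


def transactions_by_date_and_dimension(sessions, index):
    """Sum transactions per (date, custom dimension value)."""
    totals = defaultdict(int)
    for s in sessions:
        key = (s["date"], custom_dimension(s, index))
        # A missing transaction count is treated as 0 here; note that SQL SUM
        # would return NULL for a group that contains only NULLs.
        totals[key] += s["transactions"] or 0
    return dict(totals)


# Hypothetical sessions shaped loosely like the GA export.
sessions = [
    {"date": "20160801", "customDimensions": [{"index": 4, "value": "subscriber"}], "transactions": 2},
    {"date": "20160801", "customDimensions": [{"index": 4, "value": "subscriber"}], "transactions": None},
    {"date": "20160801", "customDimensions": [{"index": 1, "value": "x"}], "transactions": 1},
]

result = transactions_by_date_and_dimension(sessions, 4)
```

Sessions that have no custom dimension at the requested index end up in a group keyed by None, just as the SQL subquery yields NULL for them.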
I am working with Google Analytics data in BigQuery and want to aggregate the dates of the first and last visit up to UserID level. However, because I am using MAX(), my code currently returns the latest visit date for any user who has purchased within the selected date range.
If I remove MAX() I have to GROUP by DATE, which I don't want as this then returns multiple rows per UserID.
Here is my code which returns a series of dates per user - last_visit_date is currently working, as it's the only date that can simply look at the last date of user activity. Any advice on how I can get last_ord_date to select the date on which the order actually occurred?
SELECT
customDimension.value AS UserID,
# Last order date
IF(COUNT(DISTINCT hits.transaction.transactionId) > 0,
(MAX(DATE)),
"unknown") AS last_ord_date,
# first visit date
IF(SUM(totals.newvisits) IS NOT NULL,
(MAX(DATE)),
"unknown") AS first_visit_date,
# last visit date
MAX(DATE) AS last_visit_date,
# first order date
IF(COUNT(DISTINCT hits.transaction.transactionId) > 0,
(MIN(DATE)),
"unknown") AS first_ord_date
FROM
`XXX.XXX.ga_sessions_20*` AS t
CROSS JOIN
UNNEST (hits) AS hits
CROSS JOIN
UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
UNNEST(hits.product) AS hits_product
WHERE
parse_DATE('%y%m%d',
_table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 day)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND customDimension.index = 2
AND customDimension.value NOT LIKE "true"
AND customDimension.value NOT LIKE "false"
AND customDimension.value NOT LIKE "undefined"
AND customDimension.value IS NOT NULL
GROUP BY
UserID
The most efficient, clear, and portable way to do this is to have one simple table/view with two columns, userid and last_purchase, and another with the two columns userid and first_visit.
Then you inner join them with the original raw table on userid and hit timestamp to get, say, the session IDs you're interested in. That is three steps, but it is simple, readable, and easy to maintain.
It's very easy for a query that relies on first or last purchase/action to become so complex (just look at the unnest operations you have there) that it is unusable, and you'll spend way too much time trying to figure out the meaning of the output.
Also keep in mind that a wildcard query is limited to 1000 tables, so your first and last visits fall within a rolling window of 1000 days.
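The idea of keeping visit dates and order dates in separate per-user lookups can be sketched in Python as follows. This is not the BigQuery solution itself, only an illustration of the logic: the field names user_id, date, and has_order are hypothetical, and dates are YYYYMMDD strings so that min and max compare them correctly.

```python
def per_user_dates(sessions):
    """Return first/last visit and first/last order date for each user."""
    visits = {}  # user_id -> dates of all sessions
    orders = {}  # user_id -> dates of sessions that contain an order
    for s in sessions:
        visits.setdefault(s["user_id"], []).append(s["date"])
        if s["has_order"]:
            orders.setdefault(s["user_id"], []).append(s["date"])
    return {
        user: {
            "first_visit": min(dates),
            "last_visit": max(dates),
            # Order dates come only from sessions with an order, so a later
            # visit without a purchase cannot leak into last_order.
            "first_order": min(orders[user]) if user in orders else "unknown",
            "last_order": max(orders[user]) if user in orders else "unknown",
        }
        for user, dates in visits.items()
    }


# Hypothetical sessions.
sessions = [
    {"user_id": "u1", "date": "20170101", "has_order": True},
    {"user_id": "u1", "date": "20170105", "has_order": False},
    {"user_id": "u1", "date": "20170110", "has_order": False},
    {"user_id": "u2", "date": "20170103", "has_order": False},
]

summary = per_user_dates(sessions)
```

For u1 the last visit is 20170110 while the last order stays at 20170101, which is the distinction that a single MAX(date) over all sessions loses.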
I have climate data (Temperature, Humidity, CO2, Save_Timestamp) stored in a table in real time.
How can I write a SQL query that selects the average of the data for every hour of the day?
When I do a full select and render it in HTML5 with Chart.js, it blows up!
Try something like this.
For the average by hour for the current date (current date is DB2 syntax; most other databases spell it CURRENT_DATE):
select hour(Save_Timestamp) HourSave,
avg(Temperature) avgTemperature, avg(Humidity) avgHumidity, avg(CO2) avgCO2
from yourtable
where date(Save_Timestamp)=current date
group by hour(Save_Timestamp)
For the average by hour for every date:
select date(Save_Timestamp) DateSave, hour(Save_Timestamp) HourSave,
avg(Temperature) avgTemperature, avg(Humidity) avgHumidity, avg(CO2) avgCO2
from yourtable
group by date(Save_Timestamp) , hour(Save_Timestamp)
SELECT CONVERT(VARCHAR(10),Save_Timestamp,112) AS Date,
DATEPART(hh,Save_Timestamp) AS Hour,
SUM(TEMPERATURE)/COUNT(*) AS AvgTemp
FROM CLIMATE_TABLE
GROUP BY
CONVERT(VARCHAR(10),Save_Timestamp,112),
DATEPART(hh,Save_Timestamp)
This might get you what you are looking for.
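For checking the shape of the result before charting it, the grouping by date and hour can be sketched in Python. The readings below are invented, only the temperature column is used, and hourly_average is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical readings: (Save_Timestamp, Temperature).
readings = [
    (datetime(2017, 5, 1, 10, 5), 20.0),
    (datetime(2017, 5, 1, 10, 35), 22.0),
    (datetime(2017, 5, 1, 11, 0), 25.0),
    (datetime(2017, 5, 2, 10, 15), 18.0),
]


def hourly_average(readings):
    """Average the temperature per (date, hour) bucket."""
    buckets = defaultdict(list)
    for ts, temperature in readings:
        buckets[(ts.date(), ts.hour)].append(temperature)
    # One data point per date and hour instead of one per raw reading.
    return {key: mean(values) for key, values in sorted(buckets.items())}


averages = hourly_average(readings)
```

This reduces the data sent to the chart to at most 24 points per day, however many raw readings were stored.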
I am working on a sql view that should get the average number of hits by hour of the day, regardless of what day/date it is for traffic monitoring (12:00:00.000 - 12:59:59.999). Any ideas?
EDIT
Now that I have the total, how do I get the average? Wrapping the query below in SELECT AVG(...) does not work.
SELECT COUNT(*) AS total, DATEPART(hh, LogDate) AS HourOfDay
FROM dbo.Log
GROUP BY DATEPART(hh, LogDate)
Use DATEPART(hh, ...) to extract the hour.
Example: SELECT DATEPART(hh,GETDATE())
Since you are on SQL Server 2008, you can also use the time data type; just convert to time.
Example:
SELECT CONVERT(TIME,GETDATE())
Then you can filter on that as well.
Since I am not sure what your output is supposed to look like, I am showing you both, but if all you need is to group by hour, then just use DATEPART(hh, ...).
The query below may be good enough for you. It divides the count by the number of days between the minimum date in the LogDate column and today's date.
SELECT DATEPART(hh,LogDate) as Hour
,CAST(COUNT(*)as decimal)/DATEDIFF(d,(SELECT MIN(LogDate) from log)
,CURRENT_TIMESTAMP) as AverageHits
, COUNT(*) as Count
FROM log
GROUP BY DATEPART(hh,LogDate)
ORDER by DATEPART(hh,LogDate) asc
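The "count per hour divided by number of days" idea can also be sketched in Python. Note one deliberate difference, stated as an assumption: this sketch divides by the number of distinct days that actually appear in the log, whereas the query above divides by the days elapsed between the earliest log entry and now. The function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def average_hits_per_hour(log_times):
    """Average number of hits for each hour of the day across all logged days."""
    # Assumption: a day counts only if it has at least one log entry.
    days = {t.date() for t in log_times}
    hits = Counter(t.hour for t in log_times)
    return {hour: count / len(days) for hour, count in sorted(hits.items())}


# Hypothetical log timestamps spread over two days.
log_times = [
    datetime(2010, 3, 1, 12, 0),
    datetime(2010, 3, 1, 12, 30),
    datetime(2010, 3, 2, 12, 15),
    datetime(2010, 3, 2, 13, 0),
]

hourly = average_hits_per_hour(log_times)
```

Which denominator is right depends on whether days without any traffic should pull the average down; the query above counts them, this sketch does not.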