How to use GROUP BY properly in Google BigQuery?

I have a few problems fetching only specific data. First, I don't know how to write a SQL query that returns the data for every user in one result, one row per user with one column per event type (with my current query I can only grab one user).
Second, I want to fetch one year of data up to the current date. Below is my SQL query so far (I have to run it manually, one user at a time).
SELECT type, COUNT(*) FROM (
TABLE_DATE_RANGE([githubarchive:day.events_],
TIMESTAMP('2013-1-01'),
TIMESTAMP('2015-08-28')
)) AS events
WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
"ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
"PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
"WatchEvent") AND actor.login = "datomnurdin"
GROUP BY type;
Reference:
https://www.githubarchive.org/
https://github.com/igrigorik/githubarchive.org

Here is how to pivot the data properly, with one row per user and one column per event type, over the last year up to the current date:
SELECT actor.login,
ifnull(sum(if(type='CommitCommentEvent',1,null)),0) as CommitCommentEvent,
ifnull(sum(if(type='CreateEvent',1,null)),0) as CreateEvent,
ifnull(sum(if(type='DeleteEvent',1,null)),0) as DeleteEvent,
ifnull(sum(if(type='DeploymentEvent',1,null)),0) as DeploymentEvent,
ifnull(sum(if(type='DeploymentStatusEvent',1,null)),0) as DeploymentStatusEvent,
ifnull(sum(if(type='DownloadEvent',1,null)),0) as DownloadEvent,
ifnull(sum(if(type='FollowEvent',1,null)),0) as FollowEvent,
ifnull(sum(if(type='ForkEvent',1,null)),0) as ForkEvent,
ifnull(sum(if(type='ForkApplyEvent',1,null)),0) as ForkApplyEvent,
ifnull(sum(if(type='GistEvent',1,null)),0) as GistEvent,
ifnull(sum(if(type='GollumEvent',1,null)),0) as GollumEvent,
ifnull(sum(if(type='IssueCommentEvent',1,null)),0) as IssueCommentEvent,
ifnull(sum(if(type='IssuesEvent',1,null)),0) as IssuesEvent,
ifnull(sum(if(type='MemberEvent',1,null)),0) as MemberEvent,
ifnull(sum(if(type='MembershipEvent',1,null)),0) as MembershipEvent,
ifnull(sum(if(type='PageBuildEvent',1,null)),0) as PageBuildEvent,
ifnull(sum(if(type='PublicEvent',1,null)),0) as PublicEvent,
ifnull(sum(if(type='PullRequestEvent',1,null)),0) as PullRequestEvent,
ifnull(sum(if(type='PullRequestReviewCommentEvent',1,null)),0) as PullRequestReviewCommentEvent,
ifnull(sum(if(type='PushEvent',1,null)),0) as PushEvent,
ifnull(sum(if(type='ReleaseEvent',1,null)),0) as ReleaseEvent,
ifnull(sum(if(type='RepositoryEvent',1,null)),0) as RepositoryEvent,
ifnull(sum(if(type='StatusEvent',1,null)),0) as StatusEvent,
ifnull(sum(if(type='TeamAddEvent',1,null)),0) as TeamAddEvent,
ifnull(sum(if(type='WatchEvent',1,null)),0) as WatchEvent
FROM (
TABLE_DATE_RANGE([githubarchive:day.events_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR"),
CURRENT_TIMESTAMP()
)) AS events
WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
"ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
"PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
"WatchEvent")
GROUP BY 1
limit 100
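To make the conditional-aggregation idea easy to try outside BigQuery, here is a minimal sketch in Python using the standard-library sqlite3 module. The `events` table, its `login`/`type`/`created_at` columns, the sample rows and the fixed as-of date are all hypothetical stand-ins for the GitHub Archive data, and only three event types are pivoted. `SUM(CASE ... THEN 1 ELSE 0 END)` plays the role of `ifnull(sum(if(...,1,null)),0)`, and `date(?, '-1 year')` plays the role of `DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR")`.

```python
import sqlite3

# Hypothetical toy data standing in for the GitHub Archive events table.
EVENTS = [
    ("alice", "PushEvent", "2015-08-01"),
    ("alice", "PushEvent", "2015-08-02"),
    ("alice", "WatchEvent", "2015-03-10"),
    ("bob", "ForkEvent", "2015-06-15"),
    ("bob", "PushEvent", "2013-01-05"),  # older than one year: filtered out
]

# Only a few event types are pivoted here to keep the sketch short.
EVENT_TYPES = ["ForkEvent", "PushEvent", "WatchEvent"]


def pivot_counts(as_of: str):
    """Return one row per login with one count column per event type,
    counting only events in the year up to `as_of` (YYYY-MM-DD)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (login TEXT, type TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
    # Conditional aggregation: each CASE yields 1 for a matching row and 0
    # otherwise, so SUM counts that event type per login (0 when there are none).
    # The type names come from the fixed EVENT_TYPES list, never from user input.
    columns = ",\n          ".join(
        f"SUM(CASE WHEN type = '{t}' THEN 1 ELSE 0 END) AS {t}" for t in EVENT_TYPES
    )
    sql = f"""
        SELECT login,
          {columns}
        FROM events
        WHERE created_at BETWEEN date(?, '-1 year') AND ?
        GROUP BY login
        ORDER BY login
    """
    rows = conn.execute(sql, (as_of, as_of)).fetchall()
    conn.close()
    return rows


for row in pivot_counts("2015-08-28"):
    print(row)
```

Adding an event type to the output is a matter of adding it to `EVENT_TYPES`. A user whose only events fall outside the date window does not appear at all, which matches what the GROUP BY in the answer does.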

Related

SQL BigQuery - Error that variable is not grouped by even though it is

SQL Code:
SELECT community_table.community_name,
community_table.id,
DATE(timestamp) as date,
ifnull(COUNT(distinct app_opened.user_id), 0) as num_opened_DAU,
lag(COUNT(distinct app_opened.user_id)) OVER
(ORDER BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
FROM *** app_opened
LEFT JOIN (
SELECT DISTINCT id, community_id_2, context_traits_first_name, context_traits_last_name
FROM (
SELECT *
FROM ***,
UNNEST (JSON_EXTRACT_ARRAY(context_traits_community_ids, "$")) as community_id_2
)
GROUP by community_id_2, id, context_traits_first_name, context_traits_last_name) as community_id_table
ON community_id_table.id = app_opened.user_id
LEFT JOIN (
SELECT DISTINCT id, name as community_name
FROM ***) as community_table
ON TO_JSON_STRING(community_table.id) = community_id_table.community_id_2
WHERE app_opened.user_id is not null AND
EXTRACT(DAYOFWEEK FROM DATE(timestamp)) = 2 AND
community_table.community_name is not null
GROUP BY community_table.community_name, community_table.id, DATE(timestamp)
Error Message:
I am quite confused about what could be going wrong here, as the error says that timestamp is not grouped, even though I have grouped by it at the bottom. I tried grouping by just timestamp rather than DATE(timestamp), but that ruins the table I am trying to build, where I count the number of users on a single day. Does anyone have any other ideas? My goal is, for each row, to get the previous row's data, but because I am grouping by specific columns, I need the rows to be ordered by them as well. Thank you so much!
I think you simply need to modify the OVER part as:
OVER (PARTITION BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
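UPDATE: It seems the problem was caused by using the DATE() function within OVER, so it can be solved by computing DATE(timestamp) inside a subquery and passing its alias to OVER. Note that if you partition by all three grouping columns, every partition holds exactly one row and LAG always returns NULL; to get the previous day's value, partition by the community columns and order by the date alias instead.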
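A minimal sketch of the UPDATE's approach, run against SQLite from Python so it can be tried locally: the inner query computes the day once and groups on it, and the outer query applies LAG to the already-aggregated rows. The table, columns and data are hypothetical, and partitioning by community (so each row sees the previous day of the same community) is an assumption about what "previous row" should mean here.

```python
import sqlite3

# Hypothetical app-open log: (community, user_id, timestamp).
# 2021-03-01 and 2021-03-08 are both Mondays, like the question's DAYOFWEEK = 2 filter.
APP_OPENED = [
    ("chess", 1, "2021-03-01 09:00:00"),
    ("chess", 2, "2021-03-01 18:30:00"),
    ("chess", 1, "2021-03-08 10:00:00"),
    ("chess", 1, "2021-03-08 11:00:00"),  # same user, same day: counted once
    ("go", 3, "2021-03-01 12:00:00"),
    ("go", 3, "2021-03-08 12:00:00"),
    ("go", 4, "2021-03-08 13:00:00"),
]


def daily_users_with_previous():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_opened (community TEXT, user_id INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO app_opened VALUES (?, ?, ?)", APP_OPENED)
    # Inner query: compute the day and aggregate on it.
    # Outer query: the window only refers to plain columns of that result.
    sql = """
        SELECT community, day, num_opened_dau,
               LAG(num_opened_dau) OVER (
                   PARTITION BY community ORDER BY day
               ) AS pre_value
        FROM (
            SELECT community, DATE(ts) AS day,
                   COUNT(DISTINCT user_id) AS num_opened_dau
            FROM app_opened
            GROUP BY community, DATE(ts)
        ) AS daily
        ORDER BY community, day
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in daily_users_with_previous():
    print(row)
```

The first day of each community has no predecessor, so its `pre_value` is NULL (`None` in Python).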

Google Analytics BigQuery: get the time difference between two different pages by user

I'm trying to get, per user, the time difference between the first checkout step and the final purchase. This is my query:
SELECT transactionId1,
       MAX(t1.hit_moment1 - t2.hit_moment2) AS diff_hits,
       MAX(t2.checkout_step_2) AS day
FROM (
  SELECT clientId AS client1_id,
         hits_1.page.pagePath AS page_event1,
         hits_1.eventInfo.eventAction AS action_event1,
         hits_1.transaction.transactionId AS transactionId1,
         TIMESTAMP_SECONDS(visitStartTime) AS checkout_step_1,
         hits_1.hour AS hour1, hits_1.minute AS minute1,
         (hits_1.hour * 60 + hits_1.minute) AS hit_moment1
  FROM `616180.ga_sessions_*`, UNNEST(hits) AS hits_1
  WHERE hits_1.page.pagePath LIKE '%/buy1/suscription%'
    AND hits_1.eventInfo.eventAction = "Transaction"
    AND hits_1.transaction.transactionId IS NOT NULL
) t1
INNER JOIN (
  SELECT clientId AS client2_id,
         hits_2.page.pagePath AS page_event2,
         hits_2.eventInfo.eventAction AS action_event2,
         TIMESTAMP_SECONDS(visitStartTime) AS checkout_step_2,
         hits_2.hour AS hour2, hits_2.minute AS minute2,
         (hits_2.hour * 60 + hits_2.minute) AS hit_moment2
  FROM `616180.ga_sessions_*`, UNNEST(hits) AS hits_2
  WHERE hits_2.page.pagePath LIKE '%/buy4/suscription%'
    AND hits_2.eventInfo.eventAction = "Checkout"
) t2
  ON t1.client1_id = t2.client2_id
WHERE (t1.hit_moment1 - t2.hit_moment2) > 0
  AND (t1.hit_moment1 - t2.hit_moment2) < 180
GROUP BY transactionId1
ORDER BY transactionId1
A pagePath containing /buy1/suscription represents the transaction event, and a pagePath containing /buy4/suscription represents the first checkout step. I get results, but many of them are extremely long periods of time. Have I made a mistake?
Thank you.
I don't fully follow what the sample data looks like or exactly what format you want for the result set.
That said, you can use aggregation to do the calculation you want. The following assumes that the checkout is after the transaction, but it gives the basic idea:
select s.fullVisitorId,
       s.visitId,
       -- transactionId lives on the hit, not on the session row, so pick it up here
       max(hit.transaction.transactionId) as transaction_id,
       max(hit.hour * 60 + hit.minute) - min(hit.hour * 60 + hit.minute) as diff_minute
from `616180.ga_sessions_*` s cross join
     unnest(s.hits) as hit
where (hit.page.pagePath like '%/buy1/suscription%' and
       hit.eventInfo.eventAction = 'Transaction'
      ) or
      (hit.page.pagePath like '%/buy4/suscription%' and
       hit.eventInfo.eventAction = 'Checkout'
      )
group by s.fullVisitorId, s.visitId;
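If the checkout and the transaction must stay distinguishable (so that the sign of the difference means something), conditional aggregation can compute each step's time separately instead of taking max minus min over both. The sketch below uses Python's sqlite3 with a hypothetical, already-flattened `hits` table keyed by a made-up `session_id`. It keeps the `hour * 60 + minute` measure from the question, which gives wrong answers when the two hits fall on either side of midnight; in the real GA export, the per-hit `time` field (milliseconds since the session started) may be a sturdier measure, so check it against your schema.

```python
import sqlite3

# Hypothetical flattened hits: (session_id, page_path, event_action, hour, minute).
HITS = [
    ("s1", "/buy4/suscription", "Checkout", 10, 5),
    ("s1", "/buy1/suscription", "Transaction", 10, 17),
    ("s2", "/buy4/suscription", "Checkout", 23, 50),
    ("s2", "/home", "Pageview", 23, 55),  # ignored: neither step
    ("s3", "/buy4/suscription", "Checkout", 9, 0),  # abandoned checkout
]


def checkout_to_purchase_minutes():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (session_id TEXT, page_path TEXT, "
        "event_action TEXT, hour INTEGER, minute INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", HITS)
    sql = """
        SELECT session_id, diff_minutes
        FROM (
            SELECT session_id,
                   -- purchase minute minus checkout minute; NULL if either step is missing
                   MIN(CASE WHEN page_path LIKE '%/buy1/suscription%'
                             AND event_action = 'Transaction'
                            THEN hour * 60 + minute END)
                 - MIN(CASE WHEN page_path LIKE '%/buy4/suscription%'
                             AND event_action = 'Checkout'
                            THEN hour * 60 + minute END) AS diff_minutes
            FROM hits
            GROUP BY session_id
        ) AS per_session
        WHERE diff_minutes IS NOT NULL
        ORDER BY session_id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(checkout_to_purchase_minutes())
```

Sessions that never reach the purchase step produce NULL and are dropped, rather than contributing a misleading span.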

Getting a query result taken from the same data but with temporary var

I have a simple thing to do.
Well, maybe not, but surely someone somewhere can help me out :P
I have a simple data structure that contains:
expedition date
delivery date
transaction type
I need to create a query that orders the rows by a date that depends on the transaction type (i.e. using the expedition date for transactions of type "selling", and the delivery date for transactions of type "purchasing").
I was wondering if there is a more efficient way to do this than fetching the same data twice with different WHERE clauses (while adding a column, dateTemp, used to order them) and then wrapping these two queries in another SELECT, to which I add the ORDER BY on dateTemp.
The initial fetch that I would run twice works on many tables (many, many, many joins).
Basically, my current solution is:
Select * from
(
Select ...,
date_exp as dateTemp
from ...
where conditions* And dateRelatedCondition
UNION
Select ...,
date_livraison as dateTemp
from ...
Where conditions* And NOT(dateRelatedCondition)
) as comboSelect
Order By MIN(comboSelect.dateTemp)
OVER(PARTITION BY(REF_product)),
(REF_product),
comboSelect.dateTemp asc;
* Those conditions are the same in both inner SELECT queries.
Thank you for your time.
Without the UNION:
remove dateRelatedCondition from the WHERE clause and move it into the SELECT list, like:
CASE WHEN dateRelatedCondition THEN date_exp ELSE date_livraison END as dateTemp
Without the subquery:
in ORDER BY, repeat the same expression inside the window function:
Order By MIN(CASE WHEN dateRelatedCondition THEN date_exp ELSE date_livraison END)
OVER(PARTITION BY(REF_product)),
(REF_product),
dateTemp asc
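Here is a runnable sketch of the single-pass version, using Python's sqlite3 (SQLite allows window functions in ORDER BY, as this query requires). The `tx` table and its rows are hypothetical, and `dateRelatedCondition` is assumed to be `transaction_type = 'selling'`, following the example in the question.

```python
import sqlite3

# Hypothetical transactions: (ref_product, transaction_type, date_exp, date_livraison).
ROWS = [
    ("P1", "selling", "2020-05-10", "2020-05-20"),
    ("P1", "purchasing", "2020-05-01", "2020-05-03"),
    ("P2", "selling", "2020-04-01", "2020-04-09"),
    ("P2", "purchasing", "2020-06-01", "2020-06-05"),
]


def ordered_transactions():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tx (ref_product TEXT, transaction_type TEXT, "
        "date_exp TEXT, date_livraison TEXT)"
    )
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?, ?)", ROWS)
    # One pass over the data: CASE picks the relevant date per row instead of
    # a UNION of two filtered copies; the window MIN orders the products by
    # their earliest relevant date, then rows within each product by date.
    sql = """
        SELECT ref_product, transaction_type,
               CASE WHEN transaction_type = 'selling'
                    THEN date_exp ELSE date_livraison END AS date_temp
        FROM tx
        ORDER BY MIN(CASE WHEN transaction_type = 'selling'
                          THEN date_exp ELSE date_livraison END)
                     OVER (PARTITION BY ref_product),
                 ref_product,
                 date_temp
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in ordered_transactions():
    print(row)
```

Because the many joins only run once, this avoids the duplicated work the question describes; it also sidesteps the duplicate removal that UNION (as opposed to UNION ALL) performs.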
You mean like this?
ORDER BY CASE
WHEN TransactionType = 'selling' THEN ExpeditionDate
WHEN TransactionType = 'purchasing' THEN DeliveryDate
END

IBM DB2: Using MINUS to exclude information in the subselect statement

I am having trouble getting the correct data back from the query below. I am trying to return rows while excluding the rows selected by the subselect after the MINUS keyword.
SELECT DISTINCT ORDER.OWNER, ORDER.PO_ID
FROM ORDER ORDER
WHERE ORDER.TYPE != 'X'
AND ORDER.STATUS = '10'
AND ORDER.CLOSE_DATE IS NULL
MINUS
(
SELECT DISTINCT ORDER.OWNER, ORDER.PO_ID
FROM ORDER ORDER
INNER JOIN COST COST ON COST.PO_ID = ORDER.PO_ID
AND COST.CODE IN
(
'LGSF', 'DFCDC', 'BOF', 'TFR', 'RFR', 'TFLHC', 'BF',
'CBF', 'CHAP', 'DYPH', 'OFFP', 'PTWT', 'DTEN', 'OTHR',
'DMSG', 'STOR', 'TOF', 'ANTCV', 'ANTIP', 'CVD', 'TRAN'
)
WHERE ORDER.TYPE != 'OTR'
AND ORDER.STATUS = '10'
AND (COST.E_AMT > 0 AND COST.A_AMT IS NULL)
)
FOR READ ONLY WITH UR
The data coming back includes the rows returned by the subquery instead of excluding them from the result set, and I cannot figure out why. Does anyone have any idea why, after MINUS, this data is not excluded, and why rows come back where COST.E_AMT is greater than 0 and COST.A_AMT is populated for each CODE listed in the subquery? Any help would be appreciated, thanks.
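One thing worth checking: MINUS (as far as I know a synonym for the standard EXCEPT in DB2; verify this for your version) removes a row only when an identical (OWNER, PO_ID) pair comes out of the second query. Two consequences follow. A PO whose costs all have A_AMT populated never appears in the second query, so there is nothing to remove for it. And because the first branch filters `TYPE != 'X'` while the second filters `TYPE != 'OTR'`, orders of type 'OTR' can never be removed. The sketch below reproduces both effects with SQLite's EXCEPT from Python; the tables, the shortened code list and the rows are hypothetical.

```python
import sqlite3

ORDERS = [  # (owner, po_id, type, status)
    ("ann", 1, "A", "10"),
    ("ann", 2, "A", "10"),
    ("bob", 3, "OTR", "10"),
]
COSTS = [  # (po_id, code, e_amt, a_amt)
    (1, "TFR", 50.0, None),  # estimated, no actual yet: removed by EXCEPT
    (2, "TFR", 50.0, 45.0),  # actual amount present: never in branch 2, so kept
    (3, "TFR", 50.0, None),  # same as PO 1, but branch 2 skips type 'OTR'
]


def open_orders_minus_pending_costs():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (owner TEXT, po_id INTEGER, type TEXT, status TEXT)")
    conn.execute("CREATE TABLE cost (po_id INTEGER, code TEXT, e_amt REAL, a_amt REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
    conn.executemany("INSERT INTO cost VALUES (?, ?, ?, ?)", COSTS)
    # EXCEPT keeps rows of the first SELECT that do not appear, column for
    # column, in the second SELECT. The TYPE filters are deliberately
    # mismatched, copied from the question.
    sql = """
        SELECT DISTINCT o.owner, o.po_id
        FROM orders o
        WHERE o.type <> 'X' AND o.status = '10'
        EXCEPT
        SELECT DISTINCT o.owner, o.po_id
        FROM orders o
        JOIN cost c ON c.po_id = o.po_id AND c.code IN ('TFR', 'BF')
        WHERE o.type <> 'OTR' AND o.status = '10'
          AND c.e_amt > 0 AND c.a_amt IS NULL
        ORDER BY 2
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(open_orders_minus_pending_costs())
```

Here PO 3 survives only because of the mismatched TYPE filter. Aligning the two WHERE clauses, or expressing the exclusion as a NOT EXISTS against COST, removes that mismatch.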

Need to convert SQL Query to LINQ

I have the following SQL query that I need to convert into LINQ with VB.NET:
SELECT *
FROM (SELECT Id
,LocationCode
,LocationName
,ContactName
,ContactEmail
,Comments
,SBUName
,CreatedBy
,CreatedDtm
,ModifiedBy
,ModifiedDtm
,ROW_NUMBER() OVER (PARTITION BY LocationCode ORDER BY ID) AS RowNumber
FROM testDB ) as rows
WHERE ROWNUMBER = 1
There are many duplicate location codes, so I only want to display one record for each, and the user will be able to edit the information. Once they edit it, I will save the info to all records with that specific location code.
I can't use DISTINCT here: it would still bring back all of the data, since the CreatedBy/ModifiedBy values differ.
Using the following LINQ query, which selects all of the data, is there a way to get one distinct record per LocationCode out of it?
queryLocMaint = From MR In objcontextGSC.TestDB
Select MR.Id,
MR.LocationCode,
MR.LocationName,
MR.SBUName,
MR.ContactName,
MR.ContactEmail,
MR.Comments,
MR.CreatedBy,
MR.CreatedDtm,
MR.ModifiedBy,
MR.ModifiedDtm
ROW_NUMBER is not supported in LINQ, so maybe you can use this GROUP BY approach instead:
Dim q = From mr In objcontextGSC.TestDB
Group mr By mr.LocationCode Into LocationCodeGroup = Group
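Select LocationCodeGroup.OrderBy(Function(mr) mr.Id).FirstOrDefault()
This takes the first row of each LocationCode group, ordered by Id. FirstOrDefault is used rather than First because LINQ to Entities in Entity Framework 6 only accepts First as the final operation of a query, not inside a projection.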
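For comparison, a minimal Python sketch (standard-library sqlite3 and itertools) that runs the question's ROW_NUMBER query and the group-then-take-first idea from the answer on the same rows, showing that both pick the same record per location code. The table name, columns and data are hypothetical.

```python
import sqlite3
from itertools import groupby

ROWS = [  # (id, location_code, location_name, created_by)
    (3, "NYC", "New York", "carol"),
    (1, "NYC", "New York", "alice"),
    (2, "BOS", "Boston", "bob"),
    (4, "BOS", "Boston", "dave"),
]


def first_per_location_sql(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_db (id INTEGER, location_code TEXT, "
        "location_name TEXT, created_by TEXT)"
    )
    conn.executemany("INSERT INTO test_db VALUES (?, ?, ?, ?)", rows)
    # Same shape as the question's query: number the rows within each
    # location code by id and keep the first one.
    sql = """
        SELECT id, location_code, location_name, created_by
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY location_code ORDER BY id) AS rn
            FROM test_db
        ) AS numbered
        WHERE rn = 1
        ORDER BY location_code
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


def first_per_location_grouped(rows):
    # Mirrors the LINQ answer: group by location code, then take the row
    # with the smallest id in each group.
    by_code = sorted(rows, key=lambda r: r[1])
    return [min(group, key=lambda r: r[0])
            for _, group in groupby(by_code, key=lambda r: r[1])]


print(first_per_location_sql(ROWS))
print(first_per_location_grouped(ROWS))
```

Both functions return the lowest-Id row for BOS and for NYC, which is the record the user would then edit.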