MySQL counts in a GROUP BY - sql

Say I have a voting table, where users can vote on values up, down, or flat.
Say a user gets a point each time a correct projection is made.
At the end of each week I want to display some statistics.
Something like:
SELECT user_id, sum( user_points ) as sum_points FROM voting_results
WHERE voting_date > ('2009-09-18' - INTERVAL 1 WEEK)
GROUP BY user_id
ORDER BY sum_points DESC
Fine. This will get me a nice list where the "best guessing" user comes up first.
Here's my question:
How do I, in the same query, obtain how many times each user has voted during the given time period?
Put another way: I want a count per row that contains the number of rows found for that user_id within the query above.
Any suggestions?
Thanks.

Just add COUNT(*):
SELECT user_id,
       SUM(user_points) AS sum_points,
       COUNT(*) AS num_votes
FROM voting_results
WHERE voting_date > ('2009-09-18' - INTERVAL 1 WEEK)
GROUP BY
       user_id
ORDER BY
       sum_points DESC
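To try this out without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and because SQLite has no INTERVAL syntax, date('2009-09-18', '-7 days') stands in for the MySQL expression; the table and column names follow the question.

```python
import sqlite3

# In-memory database with hypothetical sample votes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voting_results (user_id INTEGER, user_points INTEGER, voting_date TEXT)"
)
conn.executemany(
    "INSERT INTO voting_results VALUES (?, ?, ?)",
    [
        (1, 1, "2009-09-14"),
        (1, 0, "2009-09-15"),
        (1, 1, "2009-09-16"),
        (2, 1, "2009-09-15"),
        (2, 0, "2009-09-01"),  # older than one week, so it is filtered out
    ],
)

# SUM and COUNT(*) are both computed per user_id group.
# date('2009-09-18', '-7 days') is the SQLite stand-in (an assumption)
# for MySQL's ('2009-09-18' - INTERVAL 1 WEEK).
rows = conn.execute(
    """
    SELECT user_id,
           SUM(user_points) AS sum_points,
           COUNT(*)         AS num_votes
    FROM voting_results
    WHERE voting_date > date('2009-09-18', '-7 days')
    GROUP BY user_id
    ORDER BY sum_points DESC
    """
).fetchall()

for user_id, sum_points, num_votes in rows:
    print(user_id, sum_points, num_votes)
```

COUNT(*) counts only the rows that survive the WHERE clause, so the vote count covers the same period as the point total.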

Related

Sql query to get bounce rate based on session id and datetime

We have a table with 3 columns: URL of Page Visited, User Session ID and Datetime.
Based on this information we have to generate a result with 2 columns: Date (unique) and Bounce Rate.
It is very clear that we need to look for single occurrences of a session id: if there are 2 entries for the same session id, the user went on to another page and didn't bounce, but a single entry means the user bounced.
I cannot write a SQL query for this. I tried grouping the data by session id and date but couldn't get the result in the required format.
Can anyone do this?
If you want the share of sessions with only one page, per day (taking the day from each session's first entry), you can use aggregation:
select dte,
       avg( (num_pages = 1)::int ) as bounce_rate
from (select sessionid, min(datetime)::date as dte, count(*) as num_pages
      from t
      group by sessionid
     ) t
group by dte;
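The same two-step aggregation can be checked on a toy table. This is a minimal sketch with Python's sqlite3 module, not the answer's exact dialect: the ::int and ::date casts are replaced by SQLite equivalents, and the column name visited_at and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# visited_at is a hypothetical name for the question's Datetime column.
conn.execute("CREATE TABLE t (url TEXT, sessionid TEXT, visited_at TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("/home", "s1", "2020-01-01 10:00:00"),
        ("/about", "s1", "2020-01-01 10:05:00"),  # s1 saw two pages: no bounce
        ("/home", "s2", "2020-01-01 11:00:00"),   # s2 saw one page: bounce
        ("/home", "s3", "2020-01-02 09:00:00"),   # s3 saw one page: bounce
    ],
)

# Inner query: one row per session with its start day and page count.
# Outer query: the average of the 0/1 flag (num_pages = 1) per day.
rows = conn.execute(
    """
    SELECT dte, AVG(num_pages = 1) AS bounce_rate
    FROM (SELECT sessionid,
                 date(MIN(visited_at)) AS dte,
                 COUNT(*)              AS num_pages
          FROM t
          GROUP BY sessionid) AS s
    GROUP BY dte
    ORDER BY dte
    """
).fetchall()

for dte, bounce_rate in rows:
    print(dte, bounce_rate)
```

Averaging a 0/1 flag gives the fraction of sessions that bounced, which is why no separate count of bounced sessions is needed.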

SQL question: count of occurrence greater than N in any given hour

I'm looking through login logs (in Netezza) and trying to find users who have greater than a certain number of logins in any 1 hour time period (any consecutive 60 minute period, as opposed to strictly a clock hour) since December 1st. I've viewed the following posts, but most seem to address searching within a specific time range, not ANY given time period. Thanks.
https://dba.stackexchange.com/questions/137660/counting-number-of-occurences-in-a-time-period
https://dba.stackexchange.com/questions/67881/calculating-the-maximum-seen-so-far-for-each-point-in-time
Count records per hour within a time span
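You could use the analytic function lag to look back in the sorted sequence of timestamps and check whether the record that came 19 entries earlier is less than an hour older. If it is, the user logged in 20 times within an hour (20 is just an example threshold; for N logins use lag(login_time, N - 1)):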
with cte as (
    select user_id,
           login_time,
           lag(login_time, 19) over (partition by user_id order by login_time) as lag_time
    from userlog
    order by user_id,
             login_time
)
select user_id,
       min(login_time) as login_time
from cte
where extract(epoch from (login_time - lag_time)) < 3600
group by user_id
The output lists each matching user together with the first login that was their twentieth within an hour.
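To see why looking back a fixed number of entries is enough, here is a minimal sketch of the same check in plain Python for a single user. The function name first_burst and its parameters are hypothetical; it mirrors the lag comparison rather than reproducing the SQL.

```python
from datetime import datetime, timedelta


def first_burst(login_times, n=20, window=timedelta(hours=1)):
    """Return the first login that is the n-th login within `window`, or None."""
    times = sorted(login_times)
    for i in range(n - 1, len(times)):
        # times[i - (n - 1)] plays the role of lag(login_time, n - 1).
        if times[i] - times[i - (n - 1)] < window:
            return times[i]
    return None


# Hypothetical data: one login every 10 minutes starting at 08:00.
base = datetime(2018, 12, 1, 8, 0)
logins = [base + timedelta(minutes=10 * k) for k in range(10)]

print(first_burst(logins, n=3))  # third login within an hour
print(first_burst(logins, n=7))  # seven logins never fit in under an hour
```

Because the timestamps are sorted, any n logins that fall inside one window are consecutive in the list, so comparing each entry with the one n - 1 positions earlier covers every rolling 60-minute period.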
I think you might do something like this (I'll use a login table with just user and datetime columns, for the sake of simplicity):
with connections as (
    select ua.user
         , ua.datetime
    from user_logons ua
    where ua.datetime >= timestamp'2018-12-01 00:00:00'
)
select ua.user
     , ua.datetime
     , (select count(*)
        from connections ut
        where ut.user = ua.user
        and ut.datetime between ua.datetime and (ua.datetime + 1 hour)
       ) as consecutive_logons
from connections ua
It is up to you to substitute your own columns for user and datetime.
It is also up to you to find the right date-add facility (ua.datetime + 1 hour won't work as written); this depends on the DB implementation. In MySQL, for example, it is DATE_ADD (https://www.w3schools.com/SQl/func_mysql_date_add.asp).
Because of the subquery (select count(*) ...), the whole query will not be the fastest: it is a correlated subquery, so it needs to be reevaluated for each row.
The with clause simply computes a subset of user_logons to minimize that cost. It might not be strictly necessary, but it keeps the query simpler.
You might get better performance using a stored function or a function written in an application language (e.g. Java, PHP, ...).
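As an illustration of filling in the date arithmetic for one particular engine, here is a minimal sketch that runs the correlated-subquery approach on SQLite through Python's sqlite3 module. SQLite's datetime(x, '+1 hour') is used as the date-add facility; the column names user_name and logged_at and the sample rows are assumptions, and other databases will need their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# user_name / logged_at are hypothetical stand-ins for user / datetime.
conn.execute("CREATE TABLE user_logons (user_name TEXT, logged_at TEXT)")
conn.executemany(
    "INSERT INTO user_logons VALUES (?, ?)",
    [
        ("ann", "2018-12-01 08:00:00"),
        ("ann", "2018-12-01 08:20:00"),
        ("ann", "2018-12-01 08:50:00"),
        ("ann", "2018-12-01 10:00:00"),
        ("bob", "2018-12-01 08:10:00"),
        ("bob", "2018-11-30 23:00:00"),  # before the cutoff, excluded by the CTE
    ],
)

# For every logon, count that user's logons in the hour starting at it.
rows = conn.execute(
    """
    WITH connections AS (
        SELECT user_name, logged_at
        FROM user_logons
        WHERE logged_at >= '2018-12-01 00:00:00'
    )
    SELECT ua.user_name,
           ua.logged_at,
           (SELECT COUNT(*)
            FROM connections ut
            WHERE ut.user_name = ua.user_name
              AND ut.logged_at BETWEEN ua.logged_at
                                   AND datetime(ua.logged_at, '+1 hour')
           ) AS consecutive_logons
    FROM connections ua
    ORDER BY ua.user_name, ua.logged_at
    """
).fetchall()

for row in rows:
    print(row)
```

Wrapping this in an outer query with a filter such as consecutive_logons > N would then keep only the users and start times that exceed the threshold.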

Find most recent date of purchase in user day table

I'm trying to put together a query that will fetch the date, purchase amount, and number of transactions for the last time each user made a purchase. I am pulling from a user day table that contains a row for each time a user does anything in the app, purchase or not. Basically, all I am trying to get is the most recent date on which the number of transactions field was greater than zero. The query below returns every day on which a particular user made a purchase, when all I'm looking for is the last purchase, so just the first row shown in the attached screenshot.
screen shot of query and result set
select tuid, max(event_day),
purchases_day_rev as last_dop_rev,
purchases_day_num as last_dop_quantity,
purchases_day_rev/nullif(purchases_day_num,0) as last_dop_spend_pp
from
(select tuid, event_day,purchases_day_rev,purchases_day_num
from
app.user_day
where purchases_day_num > 0
and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
group by 1,2,3,4) a
group by 1,3,4,5
I'm not going to comment on the logic of your query... if all you want is the first row of your result set, you can try:
<your query here> ORDER BY 2 DESC LIMIT 1 ;
Where ORDER BY 2 DESC orders the result set on max(event_day) and LIMIT 1 extracts only the first row.
I don't know all of the ins and outs of your data, but I don't understand why you are grouping within the subquery without any aggregate function (sum, average, min, max, etc). With that said, I would try something like this:
select a.tuid
      ,a.event_day
      ,a.purchases_day_rev as last_dop_rev
      ,a.purchases_day_num as last_dop_quantity
      ,a.purchases_day_rev/nullif(a.purchases_day_num,0) as last_day_spend_pp
from app.user_day a
inner join
(
    -- latest day with at least one purchase, per user
    select tuid
          ,max(event_day) as max_day
    from app.user_day
    where purchases_day_num > 0
    and tuid='122d665e-1d71-4319-bb0d-05c7f37a28b0'
    group by 1
) b
on a.tuid = b.tuid
and a.event_day = b.max_day;
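A quick way to check the join-on-max approach is to run it on a few made-up rows. This is a minimal sketch with Python's sqlite3 module; the table is called user_day without the app schema, the tuid filter is dropped so several users are returned, and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_day ("
    "tuid TEXT, event_day TEXT, purchases_day_rev REAL, purchases_day_num INTEGER)"
)
conn.executemany(
    "INSERT INTO user_day VALUES (?, ?, ?, ?)",
    [
        ("u1", "2019-01-01", 10.0, 2),
        ("u1", "2019-01-05", 30.0, 3),  # u1's most recent purchase day
        ("u1", "2019-01-07", 0.0, 0),   # active, but no purchase
        ("u2", "2019-01-03", 8.0, 1),
    ],
)

# Subquery b finds each user's latest purchase day; the join pulls
# the full row for exactly that day.
rows = conn.execute(
    """
    SELECT a.tuid,
           a.event_day,
           a.purchases_day_rev AS last_dop_rev,
           a.purchases_day_num AS last_dop_quantity,
           a.purchases_day_rev / NULLIF(a.purchases_day_num, 0) AS last_dop_spend_pp
    FROM user_day a
    JOIN (SELECT tuid, MAX(event_day) AS max_day
          FROM user_day
          WHERE purchases_day_num > 0
          GROUP BY tuid) b
      ON a.tuid = b.tuid
     AND a.event_day = b.max_day
    ORDER BY a.tuid
    """
).fetchall()

for row in rows:
    print(row)
```

Qualifying every selected column with the alias a avoids an ambiguous-column error, since tuid exists on both sides of the join.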

Selecting the first and last event per user, per day

I have a Google Analytics event that fires on my website when certain interactions are made. It may not fire at all for a user in a session, or it can fire many times.
I'd like to return results showing the userID and the value of the first and last event label, per day. I have tried to do this with MAX(hits.eventInfo.eventLabel), but when I check my results, it is not returning the last value for that user in the day as I was expecting.
SELECT Date,
customDimension.value AS UserID,
MAX(hits.eventInfo.eventLabel) AS last_value
FROM `project.dataset.ga_sessions_20*` AS t
CROSS JOIN UNNEST(hits) AS hits
CROSS JOIN UNNEST(t.customdimensions) AS customDimension
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
AND hits.eventInfo.eventAction = "Value"
AND customDimension.index = 2
GROUP BY Date, UserID
For example, the query above returns results where user X has the following MAX() value:
20180806 User_x 69.96
But when I look at the details of that user's interactions on the day I see:
Based on this, I would expect to see 79.95 as my MAX() result, as it has the highest hit number. Instead I seem to have selected a value from somewhere in the middle of the session. How can I adjust my query to ensure I select the last event value?
When you are looking for the maximum value of column colA while doing GROUP BY, MAX(colA) will obviously work.
But when you are looking for the value in column colA at the maximum value of column colB, you should use STRING_AGG(colA ORDER BY colB DESC LIMIT 1), or something similar with ARRAY_AGG().
So, in your case, I think it will be something like the line below (you should tune it further):
STRING_AGG(hits.eventInfo.eventLabel ORDER BY hits.hitNumber DESC LIMIT 1) AS last_value
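The difference between "the largest label" and "the label at the largest hit number" is easy to see on a toy table. Below is a minimal sketch with Python's sqlite3 module. SQLite has no STRING_AGG(... ORDER BY ... LIMIT 1), so a correlated subquery with ORDER BY ... LIMIT 1 stands in for it; the hits table, its columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user_id TEXT, hit_number INTEGER, event_label TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("user_x", 1, "10.00"),
        ("user_x", 2, "99.00"),  # largest label, but not the last hit
        ("user_x", 3, "25.50"),  # last hit
        ("user_y", 1, "5.00"),
    ],
)

# first_label / last_label: label at the lowest / highest hit_number.
# max_label: plain MAX over the labels, shown for comparison.
rows = conn.execute(
    """
    SELECT u.user_id,
           (SELECT h.event_label FROM hits h
            WHERE h.user_id = u.user_id
            ORDER BY h.hit_number ASC LIMIT 1)  AS first_label,
           (SELECT h.event_label FROM hits h
            WHERE h.user_id = u.user_id
            ORDER BY h.hit_number DESC LIMIT 1) AS last_label,
           (SELECT MAX(h.event_label) FROM hits h
            WHERE h.user_id = u.user_id)        AS max_label
    FROM (SELECT DISTINCT user_id FROM hits) u
    ORDER BY u.user_id
    """
).fetchall()

for row in rows:
    print(row)
```

For user_x the last label and the maximum label differ, which is the situation described in the question.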
In your case one should work with subqueries on the hits array. This allows full control over what you want to have. I used the example ga data from Google, so labels are different. But I wrote it in a way you can easily modify to fit your needs:
SELECT
  date,
  fullvisitorid,
  visitstarttime,
  (SELECT value FROM t.customDimensions WHERE index=2) userId,
  (SELECT
    --STRUCT(hour, minute, hitNumber, eventinfo.eventlabel) -- for testing, comment out next line
    eventInfo.eventLabel
  FROM t.hits
  WHERE type='EVENT' AND eventInfo.eventAction <> '' -- modify to fit your condition
  ORDER BY hitNumber ASC LIMIT 1
  ) AS firstEventLabel,
  (SELECT
    --STRUCT(hour, minute, hitNumber, eventinfo.eventlabel) -- for testing, comment out next line
    eventInfo.eventLabel
  FROM t.hits
  WHERE type='EVENT' AND eventInfo.eventAction <> '' -- modify to fit your condition
  ORDER BY hitNumber DESC LIMIT 1
  ) AS lastEventLabel
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t
LIMIT 1000 -- for testing
Basically, I'm querying the events, ordering them by hitNumber ascending or descending, and limiting to one so that there is only one result per row. The line with userId also shows how to properly get a custom dimension value.
If you are very new to this concept of working with arrays you can learn all about it here: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
MAX() should work. The one case where it returns an unexpected value is when it operates on a string rather than a number, because strings are compared character by character.
Does this fix the problem?
MAX(CAST(hits.eventInfo.eventLabel AS FLOAT64)) AS last_value
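The string-versus-number effect can be reproduced in a few lines. This is a minimal sketch with Python's sqlite3 module, where CAST(... AS REAL) plays the role of the numeric cast; the labels table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Labels are stored as text, as event labels typically are.
conn.execute("CREATE TABLE labels (event_label TEXT)")
conn.executemany(
    "INSERT INTO labels VALUES (?)",
    [("9.50",), ("79.95",), ("100.00",)],
)

# MAX over text compares character by character, so '9...' sorts last.
text_max = conn.execute("SELECT MAX(event_label) FROM labels").fetchone()[0]

# Casting first makes MAX compare the numeric values instead.
numeric_max = conn.execute(
    "SELECT MAX(CAST(event_label AS REAL)) FROM labels"
).fetchone()[0]

print(text_max, numeric_max)
```

Note that even the numeric maximum is only the largest value, which is not necessarily the last one in the session.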

SQL: filter by date

I have a table SIGNUPS, where I register all signups to a specific event. Now, I would like to get all people who signed up to an event, with an extra column STATUS telling whether the user is actually accepted (STATUS = "OK") or is on a waiting list (STATUS = "WL"). I tried something like this:
SELECT *, IDUSER IN (SELECT IDUSER FROM SIGNUPS ORDER BY DATE ASC LIMIT 10)
as STATUS from SIGNUPS WHERE IDEVENT = 1
This should return STATUS 1 for the first 10 users who signed up, and 0 for all the others. Unfortunately, I get a MySQL error telling me that LIMIT in subqueries is not yet supported.
Could you please suggest another way to get the same information?
Thanks
Something like the following should get what you need, although I haven't tested it against sample tables. The subqueries find the date of the tenth signup for the event, which is then compared to the date of the current row. Signups that share that cutoff date all count as accepted.
select
    s.*,
    s.DATE <= d.max_date_10 AS STATUS
from SIGNUPS s
join (
    select MAX(DATE) AS max_date_10 from (
        select DATE from SIGNUPS where IDEVENT = 1 order by DATE asc LIMIT 10
    ) a
) d
WHERE s.IDEVENT = 1
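Here is a minimal sketch of the cutoff-date idea, run on SQLite through Python's sqlite3 module. In this variant the cutoff is the latest date among the first N signups for the event, and a row counts as accepted when its date is on or before that cutoff. The limit is 2 instead of 10 to keep the sample small, and the column name signup_date and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# signup_date is a hypothetical name for the question's DATE column.
conn.execute("CREATE TABLE signups (iduser INTEGER, idevent INTEGER, signup_date TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-01"),
        (2, 1, "2020-01-02"),
        (3, 1, "2020-01-03"),
        (4, 1, "2020-01-04"),
        (9, 2, "2019-12-31"),  # other event; must not influence the cutoff
    ],
)

# d holds one row: the date of the 2nd signup for event 1.
# status is 1 for signups on or before that date, else 0.
rows = conn.execute(
    """
    SELECT s.iduser,
           s.signup_date <= d.cutoff AS status
    FROM signups s
    CROSS JOIN (SELECT MAX(signup_date) AS cutoff
                FROM (SELECT signup_date
                      FROM signups
                      WHERE idevent = 1
                      ORDER BY signup_date ASC
                      LIMIT 2) a) d
    WHERE s.idevent = 1
    ORDER BY s.signup_date
    """
).fetchall()

for iduser, status in rows:
    print(iduser, "OK" if status else "WL")
```

Filtering by the event inside the innermost subquery matters: without it, signups for other events would shift the cutoff. If several signups share the cutoff timestamp, all of them are marked as accepted, so a tie-breaking column may be needed in practice.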