Count of unique identifiers based on 2 columns (SQL)

I have a simple question! I need to calculate how many user_ids attended multiple events (confirmed or registered). There are 24 distinct event IDs in the real dataset. Here is my query, which did not give me the results I was looking for. Where did I go wrong? This is done in BigQuery, if that info is needed.
SELECT COUNT(user_id) as volunteer_count,
status,
COUNT(DISTINCT event_id) as event_count
FROM `nice-incline.events_participations.events_participations`
WHERE status = 'CONFIRMED' OR status = 'REGISTERED'
GROUP BY status
HAVING event_count > 1;
Below is a sample table.
event_id user_id status
378398 1783965 confirmed
418729 4518485 registered
378398 4518485 registered
418729 4432831 canceled
The expected result would just be a count of the user_ids who attended multiple (>1) event_ids. In this case, it would be 1, since user '4518485' attended 2 events and is registered for both.

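Use the query below:
select count(*) from (
select user_id
from your_table
where status in ('confirmed', 'registered')
group by user_id
-- count distinct events, so duplicate rows for the same event are not counted twice
having count(distinct event_id) > 1
)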
Or, with the same output:
select count(*) from (
select distinct user_id
from your_table
where status in ('confirmed', 'registered')
qualify count(distinct event_id) over(partition by user_id) > 1
)
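To sanity-check the GROUP BY version against the sample rows, here is a minimal sketch that runs it in SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in for BigQuery here (an assumption; it has no QUALIFY, so only the first variant is shown), and the table name events_participations is just the sample table loaded into memory. The query uses count(distinct event_id) so that repeated rows for the same event are not double counted.

```python
import sqlite3

# Sample rows from the question: (event_id, user_id, status).
ROWS = [
    (378398, 1783965, "confirmed"),
    (418729, 4518485, "registered"),
    (378398, 4518485, "registered"),
    (418729, 4432831, "canceled"),
]

# Users with more than one distinct confirmed/registered event, then counted.
QUERY = """
select count(*) from (
    select user_id
    from events_participations
    where status in ('confirmed', 'registered')
    group by user_id
    having count(distinct event_id) > 1
)
"""


def count_multi_event_users(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events_participations "
        "(event_id integer, user_id integer, status text)"
    )
    conn.executemany("insert into events_participations values (?, ?, ?)", rows)
    (result,) = conn.execute(QUERY).fetchone()
    conn.close()
    return result


print(count_multi_event_users(ROWS))  # 1
```

Adding a second 'registered' row for user 1783965 on event 378398 still returns 1, because that user has only one distinct event; a plain count(*) would have counted them.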

Related

BigQuery / Count the number of rows until a specific row is reached

I have data in BigQuery.
I want to count the number of 'pending' events before their 'approved' event per ID.
Note: if a group of 'pending' events for an ID is not followed by an 'approved' event, that group should not be counted (see the last two rows).
How would I get the value for every individual ID?
Table events
id event
1 pending
1 pending
1 pending
1 approved
2 pending
1 pending
1 pending
1 approved
2 approved
1 pending
1 pending
In this example, the expected result is:
id count_events
1 3
1 2
2 1
As pointed out by @Schwern, if you don't have a column that defines the order of the events, you cannot get the result you expect.
That being said, here is a solution if you have an event_date or event_timestamp column:
WITH temp AS(
SELECT
id,
event,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY event_date) AS rownum
FROM
sample )
SELECT
id,
event,
rownum-COALESCE(LAG(rownum) OVER(PARTITION BY id ORDER BY rownum), 0)-1 AS count_events
FROM
temp
WHERE
event = 'approved'
With the data you provided (plus an ordering column), it returns the desired output.
The logic behind the query is that the number of 'pending' events before an 'approved' event equals the position of that 'approved' event (its row number) minus the position of the previous 'approved' event, minus 1.
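Here is a minimal sketch that checks this row-number arithmetic on the sample data using SQLite via Python's sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which window functions require). The seq column is a hypothetical stand-in for event_date, numbering the rows in the order the question lists them.

```python
import sqlite3

# Events from the question, in listed order: (seq, id, event).
# `seq` is a hypothetical ordering column standing in for event_date.
EVENTS = [
    (1, 1, "pending"), (2, 1, "pending"), (3, 1, "pending"), (4, 1, "approved"),
    (5, 2, "pending"), (6, 1, "pending"), (7, 1, "pending"), (8, 1, "approved"),
    (9, 2, "approved"), (10, 1, "pending"), (11, 1, "pending"),
]

QUERY = """
with temp as (
    select id, event,
           row_number() over (partition by id order by seq) as rownum
    from events
)
select id,
       -- pendings before this approval = gap to the previous approval, minus 1
       rownum - coalesce(lag(rownum) over (partition by id order by rownum), 0) - 1
           as count_events
from temp
where event = 'approved'
order by id, rownum
"""


def pending_counts(rows):
    """Return (id, count_events) for each 'approved' event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (seq integer, id integer, event text)")
    conn.executemany("insert into events values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(pending_counts(EVENTS))  # [(1, 3), (1, 2), (2, 1)]
```

The lag() runs after the WHERE filter, so it only sees 'approved' rows, and the trailing 'pending' rows of id 1 never produce an output row.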
There really isn't enough info for me to write an exact query, but you could use GROUP BY and COUNT:
SELECT * FROM (
SELECT id, event, COUNT(event) AS event_count
FROM events
GROUP BY id, event) AS t
WHERE event <> 'approved'

Oracle sum 1 to count by partition

Based on the first 6 columns, I'd like to calculate the desired count. For each partition of user_id, session_id and orig_id, ordered by rank_agse ascending, I'd like to start at 1 and add one every time the lag_agse column equals 'ACCESSED'. The table below is populated to illustrate what I want.
It seems to me that you are looking for something like this:
select user_id, session_id, orig_id, type, lag_agse, rank_agse,
count(case when type = 'ACCESSED' then 1 end)
over (partition by user_id, session_id, orig_id
order by rank_agse) as desired_count
from your_table
order by user_id, session_id, orig_id, rank_agse desc
;
See my comment under your question regarding ascending vs. descending order by RANK_AGSE.
Note that count() does the same job as summing 1 when type is 'ACCESSED' and 0 otherwise, but in a simpler way.
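The equivalence between the running count() and the running sum of 1s and 0s can be checked with a small sketch in SQLite via Python's sqlite3 (SQLite stands in for Oracle; this assumes SQLite 3.25 or newer for window functions). The rows, the table name t and the 'VIEWED' type are hypothetical, covering a single (user_id, session_id, orig_id) partition.

```python
import sqlite3

# Hypothetical rows: (user_id, session_id, orig_id, type, rank_agse).
ROWS = [
    (1, 10, 100, "ACCESSED", 1),
    (1, 10, 100, "VIEWED", 2),
    (1, 10, 100, "ACCESSED", 3),
    (1, 10, 100, "ACCESSED", 4),
    (1, 10, 100, "VIEWED", 5),
]

QUERY = """
select rank_agse,
       -- count() skips the NULLs produced when type <> 'ACCESSED'
       count(case when type = 'ACCESSED' then 1 end)
           over (partition by user_id, session_id, orig_id order by rank_agse) as via_count,
       -- the more verbose running sum of 1s and 0s
       sum(case when type = 'ACCESSED' then 1 else 0 end)
           over (partition by user_id, session_id, orig_id order by rank_agse) as via_sum
from t
order by rank_agse
"""


def running_counts(rows):
    """Return (rank_agse, via_count, via_sum) for each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, session_id integer, orig_id integer, "
        "type text, rank_agse integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in running_counts(ROWS):
    print(row)  # via_count and via_sum match on every row
```

With an ORDER BY in the window, the default frame runs from the start of the partition to the current row, which is what turns both aggregates into running totals.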

SQL - timeline based queries

I have a table of events which has:
user_id
event_name
event_time
The event names are: meeting_started, meeting_ended, email_sent.
I want to create a query that counts the number of times an email has been sent during a meeting.
UPDATE: I'm using Google BigQuery.
Example query:
SELECT
event_name,
count(distinct user_id) users
FROM
events_table
WHERE event_name IN ('meeting_started', 'meeting_ended')
group by 1
How can I achieve that?
Thanks!
You can do this in BigQuery using last_value():
Presumably, an email is sent during a meeting if the most recent "meeting" event is 'meeting_started'. So, you can solve this by getting the most recent meeting event for each event and then filtering:
select et.*
from (select et.*,
last_value(case when event_name in ('meeting_started', 'meeting_ended') then event_name end ignore nulls) over
(partition by user_id order by event_time) as last_meeting_event
from events_table et
) et
where event_name = 'email_sent' and last_meeting_event = 'meeting_started'
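The carry-forward idea behind last_value(... ignore nulls) can be sketched in plain Python. The events below are hypothetical; the function walks each user's events in time order, remembers the most recent meeting event, and keeps the emails sent while that value is 'meeting_started'. It mirrors the query's logic but is not how BigQuery evaluates it, and ties on event_time are simply broken by event name here.

```python
from collections import defaultdict

# Hypothetical events: (user_id, event_name, event_time).
EVENTS = [
    (1, "meeting_started", 1),
    (1, "email_sent", 2),
    (1, "meeting_ended", 3),
    (1, "email_sent", 4),
    (2, "email_sent", 1),
    (2, "meeting_started", 2),
    (2, "email_sent", 3),
    (2, "email_sent", 4),
    (2, "meeting_ended", 5),
]

MEETING_EVENTS = {"meeting_started", "meeting_ended"}


def emails_during_meetings(events):
    """Return (user_id, event_time) of emails whose last meeting event was a start."""
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))
    hits = []
    for user_id, rows in by_user.items():
        last_meeting_event = None  # like NULL before the user's first meeting event
        for ts, name in sorted(rows):
            if name in MEETING_EVENTS:
                last_meeting_event = name  # carry the latest non-null value forward
            if name == "email_sent" and last_meeting_event == "meeting_started":
                hits.append((user_id, ts))
    return hits


print(emails_during_meetings(EVENTS))  # [(1, 2), (2, 3), (2, 4)]
```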
This reads like a gaps-and-islands problem, where an island is a meeting, and you want the emails that belong to islands.
How do we define an island? Assuming that meeting starts and ends properly interleave, we can just compare the count of starts and ends on a per-user basis. If there are more starts than ends, then a meeting is in progress. Using this logic, you can get all emails that were sent during a meeting like so:
select *
from (
select e.*,
countif(event_name = 'meeting_started') over(partition by user_id order by event_time) as cnt_started,
countif(event_name = 'meeting_ended' ) over(partition by user_id order by event_time) as cnt_ended
from events_table e
) e
where event_name = 'email_sent' and cnt_started > cnt_ended
It is unclear where you want to go from here. If you want the count of such emails, just use select count(*) instead of select * in the outer query.
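Here is a minimal sketch of the start/end counting approach, run in SQLite via Python's sqlite3 with the count(*) outer query (assuming SQLite 3.25 or newer for window functions). SQLite has no countif, so sum(case ... then 1 else 0 end) stands in for it; the events are hypothetical.

```python
import sqlite3

# Hypothetical events: (user_id, event_name, event_time).
EVENTS = [
    (1, "meeting_started", 1), (1, "email_sent", 2),
    (1, "meeting_ended", 3), (1, "email_sent", 4),
    (2, "email_sent", 1), (2, "meeting_started", 2),
    (2, "email_sent", 3), (2, "email_sent", 4), (2, "meeting_ended", 5),
]

# Running counts of starts and ends per user; more starts than ends
# means a meeting is in progress at that point in time.
QUERY = """
select count(*) from (
    select e.*,
           sum(case when event_name = 'meeting_started' then 1 else 0 end)
               over (partition by user_id order by event_time) as cnt_started,
           sum(case when event_name = 'meeting_ended' then 1 else 0 end)
               over (partition by user_id order by event_time) as cnt_ended
    from events_table e
) e
where event_name = 'email_sent' and cnt_started > cnt_ended
"""


def count_emails_in_meetings(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events_table (user_id integer, event_name text, event_time integer)"
    )
    conn.executemany("insert into events_table values (?, ?, ?)", rows)
    (result,) = conn.execute(QUERY).fetchone()
    conn.close()
    return result


print(count_emails_in_meetings(EVENTS))  # 3
```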

Postgresql count results for returned user field

I am running the following postgres SQL query:
SELECT user_id FROM user_log WHERE date>='2016-08-09' ORDER by user_id ASC
It returns the results ordered by user_id, so I can end up with multiple rows for the same user_id, like in the example below:
user_id
1001
1001
1001
1008
1008
Instead of listing each user_id, I want to just count how many rows there are for each user_id. For the example above, I would like to know that 1001 appears 3 times and 1008 appears 2 times.
Is there any way to do this directly with a SQL query?
You can try doing a simple GROUP BY query, with user_id determining the group for which you want the count:
SELECT user_id, COUNT(*) AS userCount
FROM user_log
WHERE date>='2016-08-09'
GROUP BY user_id
ORDER by user_id ASC
If you want to restrict, for example, to users only having a count of at least 3, then you can add a HAVING clause:
SELECT user_id, COUNT(*) AS userCount
FROM user_log
WHERE date>='2016-08-09'
GROUP BY user_id
HAVING COUNT(*) >= 3
ORDER by user_id ASC
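Both queries can be tried against the example data with a short sketch using SQLite via Python's sqlite3 (SQLite stands in for PostgreSQL). The dates are hypothetical, stored as ISO strings so that the >= comparison works lexicographically, and one extra row from before 2016-08-09 is included to show the WHERE filter.

```python
import sqlite3

# Hypothetical user_log rows: (user_id, date). The 2016-08-01 row is filtered out.
ROWS = [
    (1001, "2016-08-09"), (1001, "2016-08-10"), (1001, "2016-08-11"),
    (1008, "2016-08-09"), (1008, "2016-08-12"),
    (1008, "2016-08-01"),
]

COUNTS = """
SELECT user_id, COUNT(*) AS userCount
FROM user_log
WHERE date >= '2016-08-09'
GROUP BY user_id
ORDER BY user_id ASC
"""

# Same query, keeping only users with at least 3 matching rows.
AT_LEAST_3 = """
SELECT user_id, COUNT(*) AS userCount
FROM user_log
WHERE date >= '2016-08-09'
GROUP BY user_id
HAVING COUNT(*) >= 3
ORDER BY user_id ASC
"""


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table user_log (user_id integer, date text)")
    conn.executemany("insert into user_log values (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(COUNTS, ROWS))      # [(1001, 3), (1008, 2)]
print(run(AT_LEAST_3, ROWS))  # [(1001, 3)]
```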

How can I return top 4 results by user using Postgres sql

Here is my problem: I have a table that stores the results of surveys taken by users. A single user can take multiple surveys. The key columns are: user_id, the user identifier, and survey_id, the unique identifier of the survey. survey_id is incremented each time a survey is taken, so if I query a specific user_id, order by survey_id descending and limit to the top 4 rows, I get the last 4 surveys for that user. My problem is that I would like to query the last 4 surveys for all users in the table. I'm stumped on how to do this, but this is what I have so far:
SELECT *
FROM
(
SELECT user_id
FROM
(
SELECT
user_id, count(all_survey_res_id) as numsurveys
FROM
all_survey_res
GROUP BY user_id
ORDER BY count(all_survey_res_id) DESC
) AS T1
WHERE numsurveys >= 4
)
ORDER BY user_id, all_survey_res_id
This gives me all of the records for each user that has at least 4 surveys, but I then want to limit the result to just the top 4 surveys. I could solve this with code back in the application, but I would rather see if I can get the query to do this.
I think you can do this with window functions:
select
*
from (
select
user_id,
survey_id,
row_number() over (partition by user_id order by survey_id desc) rn
from
all_survey_res
) x
where
x.rn <= 4
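As a quick check of the row_number() approach, here is a minimal sketch that runs it in SQLite via Python's sqlite3 (SQLite stands in for PostgreSQL and needs version 3.25 or newer for window functions). The survey rows are hypothetical, and an outer order by is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical rows: (user_id, survey_id). User 1 has 6 surveys, user 2 has 2.
ROWS = [(1, 1), (1, 2), (1, 3), (2, 4), (1, 5), (2, 6), (1, 8), (1, 9)]

QUERY = """
select *
from (
    select user_id,
           survey_id,
           -- 1 = newest survey for each user
           row_number() over (partition by user_id order by survey_id desc) rn
    from all_survey_res
) x
where x.rn <= 4
order by x.user_id, x.rn
"""


def last_four_per_user(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table all_survey_res (user_id integer, survey_id integer)")
    conn.executemany("insert into all_survey_res values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in last_four_per_user(ROWS):
    print(row)  # (user_id, survey_id, rn)
```

Users with fewer than 4 surveys simply return all of their rows; to keep only users with at least 4 surveys, the asker's count-based filter could be combined with this one.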