SQL database – track user's activity

I have a simple database structure, with models and relations:
Models:
User
Group
Activity
Relations:
User/Group –> User belongs to Group, Group has many Users
User/Activity –> User has many Activities, Activity belongs to User
Group/Activity –> Activity belongs to Group, Group has many Activities
My problem: I want to track the number of activities a user performs in a group within a given period (probably per week, but possibly per day), and I don't know the best or most performant way to achieve this.
In theory, I can just run a query that counts those activities based on the created_at date attribute, but I assume this is not the most performant way (am I wrong?).
Does anyone know how to properly structure something like this?

Given the relationships you describe, your activity table has foreign keys user_id and group_id, so you can count a user's activities in a group for a given day:
SELECT a.user_id, a.group_id, count(a.user_id)
FROM activity a
WHERE a.user_id = '123'
AND a.group_id = '1'
AND a.activity_time >= '2019-08-31'
AND a.activity_time < '2019-08-31' + INTERVAL 1 DAY
Create a composite index on (user_id, group_id, activity_time) for faster retrieval as the table grows.
Note that this query uses MySQL syntax.
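As a rough illustration of the counting query and composite index described above, here is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL (an assumption, made only so the example is self-contained). The table layout, the index name, the helper count_activities, and the sample rows are all hypothetical. Timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id            INTEGER PRIMARY KEY,
        user_id       INTEGER NOT NULL,
        group_id      INTEGER NOT NULL,
        activity_time TEXT    NOT NULL   -- ISO-8601 text, e.g. '2019-08-31 09:15:00'
    );
    -- Composite index: equality columns first, the range column last.
    CREATE INDEX idx_activity_user_group_time
        ON activity (user_id, group_id, activity_time);
""")

conn.executemany(
    "INSERT INTO activity (user_id, group_id, activity_time) VALUES (?, ?, ?)",
    [
        (123, 1, "2019-08-26 08:00:00"),
        (123, 1, "2019-08-31 09:15:00"),
        (123, 1, "2019-08-31 17:40:00"),
        (123, 2, "2019-08-31 10:00:00"),  # same user, different group
        (456, 1, "2019-08-31 11:00:00"),  # different user, same group
        (123, 1, "2019-09-02 12:00:00"),  # falls outside the week queried below
    ],
)


def count_activities(conn, user_id, group_id, start, end):
    """Count activities in the half-open interval [start, end)."""
    (n,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM activity
        WHERE user_id = ?
          AND group_id = ?
          AND activity_time >= ?
          AND activity_time < ?
        """,
        (user_id, group_id, start, end),
    ).fetchone()
    return n


per_day = count_activities(conn, 123, 1, "2019-08-31", "2019-09-01")
per_week = count_activities(conn, 123, 1, "2019-08-26", "2019-09-02")
print(per_day, per_week)
```

The same helper covers the per-day and per-week cases from the question; only the interval bounds change.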


Select the last user_property in a dataset in bigQuery per user

I have the query below. Its goal is to get the user_properties of all the events stored in the dataset. The result is 300k+ rows per day, which is too big, and I only care about one user_property per user, since it will have the keys that I want.
To explain further: we record the events a user performs in the mobile/web app in the dataset, so every button they click and every screen they search for is recorded for later analysis by clients. A single user may have anywhere from 0 to 100 or more events per day, and usually the last recorded event contains all the updated keys I want.
SELECT
user_pseudo_id AS user_id,
user_properties AS user_properties
FROM
`TABLENAME`
order by user_pseudo_id, event_timestamp
I tried grouping the user_properties by user_pseudo_id, but that obviously didn't work because the properties are not the same.
My solution was to fetch all the results of the query above, loop over them, and store them in a Map<String, List<FieldValue>>. This does the trick, but userPropertiesResult.iterateAll() is too expensive and takes a lot of time.
So I came up with a better query that reduces the number of rows considerably, following this answer: https://stackoverflow.com/a/43863450/7298897
SELECT
a.user_pseudo_id AS user_id,
a.user_properties AS user_properties
FROM
`TABLENAME` AS a
JOIN (
SELECT
user_pseudo_id,
MAX(event_timestamp) AS event_timestamp
FROM
`TABLENAME`
GROUP BY
user_pseudo_id) AS b
ON
a.user_pseudo_id = b.user_pseudo_id
AND a.event_timestamp = b.event_timestamp
But the problem is that the data returned is not as accurate as it was before.
So my question is: how can I get only the last user_property per user?
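One property of the join-on-MAX query above can be checked on a toy table: if a user has two events with the same latest event_timestamp, the join returns both rows, whereas a ROW_NUMBER() filter returns exactly one row per user. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than BigQuery, the events table and its rows are hypothetical, and user_properties is a plain text column standing in for the real structure. Whether ties explain the inaccuracy described above cannot be determined from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_pseudo_id TEXT, event_timestamp INTEGER, user_properties TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", 1, "a"),
        ("u1", 2, "b"),  # latest event for u1
        ("u2", 5, "x"),  # u2 has two events sharing the same timestamp
        ("u2", 5, "y"),
    ],
)

# Join on MAX(event_timestamp): tied timestamps yield several rows per user.
join_rows = conn.execute("""
    SELECT a.user_pseudo_id, a.user_properties
    FROM events AS a
    JOIN (SELECT user_pseudo_id, MAX(event_timestamp) AS event_timestamp
          FROM events
          GROUP BY user_pseudo_id) AS b
      ON a.user_pseudo_id = b.user_pseudo_id
     AND a.event_timestamp = b.event_timestamp
""").fetchall()

# ROW_NUMBER(): exactly one row per user. Ties are broken arbitrarily
# unless another column is added to the ORDER BY.
latest_rows = conn.execute("""
    SELECT user_pseudo_id, user_properties
    FROM (SELECT user_pseudo_id,
                 user_properties,
                 ROW_NUMBER() OVER (PARTITION BY user_pseudo_id
                                    ORDER BY event_timestamp DESC) AS rn
          FROM events)
    WHERE rn = 1
""").fetchall()

print(len(join_rows), len(latest_rows))
```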

Subscription History

I have a table with the columns user_id, subscription_tier, and activity_date (one row for each date the user was active). How would I construct a table that shows, for each user, their movement between tiers and the dates of each move?
Using Min(activity_date) and Max(activity_date) would only work if the user never returned to a tier they had previously been at, but in reality people upgrade and downgrade all the time.
What I want to create is a table with the columns user_id, subscription_tier, tier_start_date, tier_end_date.
Do you just want lead()?
select user_id, subscription_tier, activity_date as tier_start_date,
lead(activity_date) over (partition by user_id order by activity_date) as tier_end_date
from t;
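To see what the lead() query returns, here is a small sketch on SQLite via Python's sqlite3 module (the database choice and the sample rows are assumptions). It yields one row per activity date, with tier_end_date set to the user's next activity date. The second query is not part of the answer above: it is a common "gaps and islands" variation that collapses consecutive rows in the same tier into a single period, shown here only as one possible way to get one row per tier stint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, subscription_tier TEXT, activity_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "free", "2020-01-01"),
        (1, "free", "2020-01-02"),
        (1, "pro", "2020-01-05"),
        (1, "pro", "2020-01-06"),
        (1, "free", "2020-01-09"),  # the user returns to an earlier tier
    ],
)

# The lead() query from the answer: one output row per activity date.
lead_rows = conn.execute("""
    SELECT user_id, subscription_tier, activity_date AS tier_start_date,
           LEAD(activity_date) OVER (PARTITION BY user_id
                                     ORDER BY activity_date) AS tier_end_date
    FROM t
    ORDER BY user_id, activity_date
""").fetchall()

# Variation: flag each row where the tier changes, number the runs with a
# running sum, then take the first and last date of each run.
tier_runs = conn.execute("""
    WITH flagged AS (
        SELECT user_id, subscription_tier, activity_date,
               CASE WHEN subscription_tier =
                         LAG(subscription_tier) OVER (PARTITION BY user_id
                                                      ORDER BY activity_date)
                    THEN 0 ELSE 1 END AS is_new_run
        FROM t
    ),
    runs AS (
        SELECT user_id, subscription_tier, activity_date,
               SUM(is_new_run) OVER (PARTITION BY user_id
                                     ORDER BY activity_date) AS run_id
        FROM flagged
    )
    SELECT user_id, subscription_tier,
           MIN(activity_date) AS tier_start_date,
           MAX(activity_date) AS tier_end_date
    FROM runs
    GROUP BY user_id, run_id, subscription_tier
    ORDER BY user_id, tier_start_date
""").fetchall()

for row in tier_runs:
    print(row)
```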

SQL schema Site Leader Board

I am trying to set up a site that has challenges, with a leaderboard for each challenge and an all-time leaderboard.
I have a challenges table that looks like this:
Challenge ID | Challenge Name | Challenge Date | Sport | Prize Pool
I then need each challenge to have its own leaderboard of, say, 50 people, linked by the challenge ID (so Challenge ID = Leaderboard ID).
The leaderboard of 50 people for a challenge would look something like this:
Challenge ID | User | Place | Prize Won
My question has two parts:
How can I make a table get created automatically when a new challenge is added to the challenges table?
How can I get a site-wide leaderboard across every challenge that shows the following:
Rank | User | Prize Money Won (total across every challenge placed in)
with the rank ordered by how much money was won?
I know this is a lot of questions wrapped into one, covering schema design and logic.
Any insights are greatly appreciated.
A better approach than one table per challenge is one table for all of them. That way you can compute grand totals and individual challenge rankings all with the same table. You'd also want to not record the place directly but compute it on the fly with the appropriate window function depending on how you want to handle ties (rank(), dense_rank(), and row_number() will have different results in those cases); that way you don't have to keep adjusting it as you add new records.
A table something like (You didn't specify a SQL database, so I'm going to assume SQLite. Adjust as needed.):
CREATE TABLE challenge_scores(user_id INTEGER REFERENCES users(id),
challenge_id INTEGER REFERENCES challenges(id),
prize_amount NUMERIC,
PRIMARY KEY(user_id, challenge_id));
will let you do things like
SELECT *
FROM (SELECT user_id,
sum(prize_amount) AS total,
rank() OVER (ORDER BY sum(prize_amount) DESC) AS place
FROM challenge_scores
GROUP BY user_id)
WHERE place <= 50
ORDER BY place;
for the global leaderboard, or the similar:
SELECT *
FROM (SELECT user_id,
prize_amount,
rank() OVER (ORDER BY prize_amount DESC) AS place
FROM challenge_scores
WHERE challenge_id = :some_challenge_id)
WHERE place <= 50
ORDER BY place;
for a specific challenge's leaderboard.
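The single-table design above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the helper names global_leaderboard and challenge_leaderboard and the sample scores are hypothetical, and the foreign key references are left out so the example needs no other tables. It also shows how rank() treats ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE challenge_scores (
        user_id      INTEGER,  -- would reference users(id) in a full schema
        challenge_id INTEGER,  -- would reference challenges(id) in a full schema
        prize_amount NUMERIC,
        PRIMARY KEY (user_id, challenge_id)
    )
""")
conn.executemany(
    "INSERT INTO challenge_scores VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 50), (3, 10, 50), (1, 11, 0), (2, 11, 200)],
)


def global_leaderboard(conn, limit=50):
    """Total prize money per user across all challenges, ranked."""
    return conn.execute("""
        SELECT user_id, total, place
        FROM (SELECT user_id,
                     sum(prize_amount) AS total,
                     rank() OVER (ORDER BY sum(prize_amount) DESC) AS place
              FROM challenge_scores
              GROUP BY user_id)
        WHERE place <= ?
        ORDER BY place, user_id
    """, (limit,)).fetchall()


def challenge_leaderboard(conn, challenge_id, limit=50):
    """Ranking within a single challenge; tied users share a place."""
    return conn.execute("""
        SELECT user_id, prize_amount, place
        FROM (SELECT user_id,
                     prize_amount,
                     rank() OVER (ORDER BY prize_amount DESC) AS place
              FROM challenge_scores
              WHERE challenge_id = ?)
        WHERE place <= ?
        ORDER BY place, user_id
    """, (challenge_id, limit)).fetchall()


print(global_leaderboard(conn))
print(challenge_leaderboard(conn, 10))
```

Adding a new challenge then only means inserting rows with a new challenge_id; no table has to be created per challenge.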

Database Design for user defined groups

I'm trying to figure out the best way to design a database to support private user-defined groups. Pretty much identical to how Google Circles are. These are to be for JUST the user, much like circles are - that's why creating a user group design like I found here: https://stackoverflow.com/a/9805712/2580503 would be undesirable.
So far the only solution I can come up with is to have a table like this:
USER_ID | GROUP_ID | ARRAY(USER_ID)
Where the PKEY would actually be a compound key of (USER_ID, GROUP_ID). This way a user could have multiple groups.
Would greatly appreciate any feedback on this proposed solution and would love to hear if there is a better way to do it.
Thanks!
Edit: Just to clarify, GROUP_ID would not reference a separate table, it would just indicate the number group for that user. Also there would be a name etc. for the group as well - just wasn't necessary to include as part of the question.
This must involve at least three (3) tables if you want a normalized design. USERS, USER_GROUPS, and USER_GROUPS_MEMBERS. You are correct that the PK of USER_GROUPS would be a dyad (USER, GROUP). The PK of USER_GROUPS_MEMBERS would be a triad (USER, GROUP, USER).
What about?
Groups (GROUP_ID, USER_ID, GROUP_NAME)
Members (MEMBER_ID, GROUP_ID, USER_ID)
Although Groups might appear backwards, it records the USER_ID that owns each GROUP_ID, while Members assigns a MEMBER_ID to each USER_ID that belongs to a given GROUP_ID, so that other rows relating to that membership can reference it.
How about
**Users**
id, name
**Groups**
id, name
**User_Groups**
id, user_id, group_id
**Group_users**
id, user_group_id, user_id
I have separated groups and user_groups on the assumption that you may want a few default groups for every user. If this is not the case, you can move the group name directly into user_groups and drop the groups table.
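As one possible reading of the three-table design suggested in the first answer (USERS, USER_GROUPS with a two-column key, USER_GROUPS_MEMBERS with a three-column key), here is a minimal sketch using Python's sqlite3 module. The column names owner_id, group_no, and member_id, the helper group_members, and the sample data are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per private group; group_no is just a per-owner counter.
    CREATE TABLE user_groups (
        owner_id INTEGER NOT NULL REFERENCES users(id),
        group_no INTEGER NOT NULL,
        name     TEXT    NOT NULL,
        PRIMARY KEY (owner_id, group_no)
    );
    -- One row per member, replacing the ARRAY(USER_ID) column.
    CREATE TABLE user_group_members (
        owner_id  INTEGER NOT NULL,
        group_no  INTEGER NOT NULL,
        member_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (owner_id, group_no, member_id),
        FOREIGN KEY (owner_id, group_no)
            REFERENCES user_groups (owner_id, group_no)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO user_groups VALUES (1, 1, 'Friends'), (1, 2, 'Work'), (2, 1, 'Family');
    INSERT INTO user_group_members VALUES (1, 1, 2), (1, 1, 3), (1, 2, 2), (2, 1, 1);
""")


def group_members(conn, owner_id, group_no):
    """Names of the users that owner_id has placed in their group group_no."""
    rows = conn.execute("""
        SELECT u.name
        FROM user_group_members AS m
        JOIN users AS u ON u.id = m.member_id
        WHERE m.owner_id = ? AND m.group_no = ?
        ORDER BY u.name
    """, (owner_id, group_no)).fetchall()
    return [name for (name,) in rows]


print(group_members(conn, 1, 1))
```

Compared with an array column, one row per member lets the database enforce that every member is a real user and makes "which groups contain user X" an ordinary indexed lookup.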

Compute Users average weight

I have two tables, User and DoctorsVisit
User
- UserID
- Name
DoctorsVisit
- UserID
- Weight
- Date
The DoctorsVisit table contains all the visits a particular user made to the doctor.
The user's weight is recorded per visit.
Query: Sum up all the Users weight, using the last doctor's visit's numbers. (then divide by number of users to get the average weight)
Note: some users may not have visited the doctor at all, while others may have visited many times.
I need the average weight of all users, but using the latest weight.
Update
I want the average weight across all users.
If I understand your question correctly, you should be able to get the average weight of all users based on their last visit from the following SQL statement. We use a subquery to get the last visit as a filter.
SELECT avg(latest.weight) FROM (SELECT uv.weight FROM uservisit uv INNER JOIN
(SELECT userid, MAX(dateVisited) DateVisited FROM uservisit GROUP BY userid) us
ON us.UserID = uv.UserId and us.DateVisited = uv.DateVisited) latest
I should point out that this does assume that there is a unique UserID that can be used to determine uniqueness. Also, if the DateVisited doesn't include a time but just a date, one patient who visits twice on the same day could skew the data.
This should get you the average weight per user if they have visited:
select user.name, temp.AvgWeight
from user left outer join (select userid, avg(weight) as AvgWeight
from doctorsvisit
group by userid) temp
on user.userid = temp.userid
Write a query to select the most recent weight for each user (QueryA), and use that query as an inner select of a query to select the average (QueryB), e.g.,
SELECT AVG(weight) FROM (QueryA)
I think there's a mistake in your specs.
If you divide by all the users, your average will be too low. Each user that has no doctor visits will tend to drag the average towards zero. I don't believe that's what you want.
I'm too lazy to come up with an actual query, but it's going to be one of these things where you use a self join between the base table and a query with a group by that pulls out all the relevant Id, Visit Date pairs from the base table. The only thing you need the User table for is the Name.
We had a sample of the same problem in here a couple of weeks ago, I think. By the "same problem", I mean the problem where we want an attribute of the representative of a group, but where the attribute we want isn't included in the group by clause.
I think this will work, though I could be wrong:
Use an inner select to make sure you have the most recent visit, then use AVG. Your User table in this example is superfluous: since you have no weight data there and you don't care about user names, it doesn't do you any good to examine it.
SELECT AVG(dv.Weight)
FROM DoctorsVisit dv
WHERE dv.Date = (
SELECT MAX(Date)
FROM DoctorsVisit innerdv
WHERE innerdv.UserID = dv.UserID
)
If you're using SQL Server 2005 or later, you don't need the subquery with the GROUP BY.
You can use the ROW_NUMBER and PARTITION BY functionality.
SELECT AVG(a.weight) FROM
(select
ROW_NUMBER() OVER(PARTITION BY dv.UserId ORDER BY Date desc) as ID,
dv.weight
from
DoctorsVisit dv) a
WHERE a.Id = 1
As someone else has mentioned though, this is the average weight across all the users who have VISITED the doctor. If you want the average weight across ALL of the users then anyone not visiting the doctor will give a misleading average.
Here's my stab at the solution:
select
avg(a.Weight) as AverageWeight
from
DoctorsVisit as a
inner join
(select
UserID,
max (Date) as LatestDate
from
DoctorsVisit
group by
UserID) as b
on a.UserID = b.UserID and a.Date = b.LatestDate;
Note that the User table isn't used at all.
This average entirely omits users who have no doctor's visits at all, or whose weight is recorded as NULL in their latest doctor's visit. The average is skewed if a user has more than one visit on the same date and that date is the user's latest, because the user is then weighed more than once in the result.
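The approaches discussed in this thread can be compared on a small data set. The sketch below uses Python's sqlite3 module; the table name Users, the column name VisitDate (in place of Date), and the sample rows are assumptions made for the example. It computes the average of each user's latest weight with both the join on MAX and ROW_NUMBER(), and also shows how dividing by the number of all users, including one with no visits, pulls the figure down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DoctorsVisit (UserID INTEGER, Weight REAL, VisitDate TEXT);
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');  -- Cy never visits
    INSERT INTO DoctorsVisit VALUES
        (1, 80.0, '2020-01-01'),
        (1, 70.0, '2020-02-01'),  -- Ann's latest visit
        (2, 90.0, '2020-01-15');  -- Ben's only visit
""")

# Join each visit against the per-user latest date, then average.
(avg_join,) = conn.execute("""
    SELECT AVG(a.Weight)
    FROM DoctorsVisit AS a
    INNER JOIN (SELECT UserID, MAX(VisitDate) AS LatestDate
                FROM DoctorsVisit
                GROUP BY UserID) AS b
      ON a.UserID = b.UserID AND a.VisitDate = b.LatestDate
""").fetchone()

# Number each user's visits from newest to oldest and keep the first.
(avg_row_number,) = conn.execute("""
    SELECT AVG(Weight)
    FROM (SELECT Weight,
                 ROW_NUMBER() OVER (PARTITION BY UserID
                                    ORDER BY VisitDate DESC) AS rn
          FROM DoctorsVisit)
    WHERE rn = 1
""").fetchone()

# Dividing by ALL users counts users without visits as if they weighed zero.
(avg_over_all_users,) = conn.execute("""
    SELECT SUM(Weight) / (SELECT COUNT(*) FROM Users)
    FROM (SELECT Weight,
                 ROW_NUMBER() OVER (PARTITION BY UserID
                                    ORDER BY VisitDate DESC) AS rn
          FROM DoctorsVisit)
    WHERE rn = 1
""").fetchone()

print(avg_join, avg_row_number, avg_over_all_users)
```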