What exactly does PCU refer to?
I have data like this; can anyone tell me what the peak concurrent users would be for it?
How can I calculate PCU with a SQL query from this data?
PCU would mean the maximum number of users logged in at the same time during a given time period.
In your data you only have two users, so it can never be more than 2. We can see that at times, both users are logged in at the same time, so PCU=2 for this data.
Assuming you have access to Window Functions / Analytic Functions...
SELECT
    MAX(concurrent_users) AS peak_concurrent_users
FROM
(
    SELECT
        SUM(CASE WHEN event = 'logout' THEN -1 ELSE 1 END)
            OVER (ORDER BY __time, user_id, event) AS concurrent_users
    FROM yourTable
    WHERE event IN ('login', 'logout')
) AS running_total
This just processes the logins and logouts in time order and keeps a running total: when someone logs in, the count goes up; when someone logs out, it goes down.
Where two events happen at exactly the same time, it assumes the lowest user_id went first. And if a single user has a login and a logout at exactly the same time, it assumes the login went first (since 'login' sorts before 'logout').
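The running-total idea behind the query can be sketched in plain Python (hypothetical sample events; the sort key mirrors the ORDER BY __time, user_id, event tie-breaking):

```python
# Hypothetical sample data: (time, user_id, event) tuples.
events = [
    ("2020-01-01 09:00", 1, "login"),
    ("2020-01-01 09:05", 2, "login"),
    ("2020-01-01 09:30", 1, "logout"),
    ("2020-01-01 09:45", 2, "logout"),
]

def peak_concurrent_users(events):
    # Sorting the tuples orders by time, then user_id, then event name,
    # which matches the query's tie-breaking ('login' sorts before 'logout').
    running = peak = 0
    for _time, _user, event in sorted(events):
        running += -1 if event == "logout" else 1
        peak = max(peak, running)
    return peak

print(peak_concurrent_users(events))  # both users overlap, so PCU = 2
```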
I am building an event reminder page where people can set a reminder for certain events. There is an option for the user to set the amount of time before they need to be notified. It is stored in notification_time and notification_unit: notification_time keeps track of the time before they want to be notified, and notification_unit keeps track of the PHP date format character in which they selected the time, e.g. i for minutes, H for hours.
E.g. notification_time = 2 and notification_unit = H means they need to be notified 2 hours before the event.
I have cron jobs running in the background to handle the notifications. This function is hit once every minute.
Reminder::where(function ($query) {
    $query->where('event_time', '>=', now()
        ->subMinutes(Carbon::createFromFormat('i', 60)->diffInMinutes() - 1)
        ->format('H:i:s'));
    $query->where('event_time', '<=', now()
        ->subMinutes(Carbon::createFromFormat('i', 60)->diffInMinutes())
        ->format('H:i:s'));
})
In this function, I am hard-coding the 'i', 60, while those values should be fetched from the notification_unit and notification_time columns. event_time is also part of the same table.
The table looks something like this -
id event_time ... notification_unit notification_time created_at updated_at
Is there any way to solve this issue? Is it possible to do the same logic with SQL instead?
A direct answer to this question is not possible. I found 2 ways to resolve my issue.
First solution
MySQL has TIMESTAMPDIFF and DATE_SUB to get the difference between timestamps and to subtract intervals from a timestamp. To use them, I would have to refactor my database to store the notification offset in seconds and then do the calculation in the query. I chose not to go this way because both operations are a bit heavy on the server side when the function runs every minute.
Second Solution
This is the solution I actually used: I do the calculation when storing the reminder rather than when querying. I created a new table notification_settings which is linked to the reminder (one-to-one relation). The table looks like this
id, unit, time, notify_at, repeating, created_at, updated_at
The unit and time columns are only used when displaying the reminder. The notify_at column stores the pre-calculated time at which the user should be notified, so the scheduler (which runs every minute) only needs to look for reminders whose notify_at is now. The repeating column keeps track of whether the reminder repeats; if it does, notify_at is re-calculated at scheduling time. Once the user has been notified, notify_at is set to null.
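The notify_at pre-calculation can be sketched like this (illustrative Python, not the actual Laravel code; the unit-to-minutes mapping is an assumption based on the i/H codes in the question):

```python
from datetime import datetime, timedelta

# Assumed mapping from the question's PHP date-format codes to minutes.
UNIT_TO_MINUTES = {"i": 1, "H": 60}

def compute_notify_at(event_time, notification_time, notification_unit):
    # Subtract the requested offset from the event time once, at save time,
    # so the scheduler only has to compare notify_at against "now".
    offset = timedelta(minutes=notification_time * UNIT_TO_MINUTES[notification_unit])
    return event_time - offset

event = datetime(2024, 5, 1, 18, 0)
print(compute_notify_at(event, 2, "H"))  # 2 hours before: 2024-05-01 16:00
```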
I am trying to count how many 'Uses' occur in my data set using HIVE.
I have columns for individual user IDs, timestamps in unix epoch time, event names, and length of events in seconds in my data.
A 'Use' is counted any time a user triggers an event. The problem is that if a user triggers an event and then triggers another within five minutes, I am to count it as the same 'Use'.
I'm having a difficult time mentally figuring out how to account for the five-minute window when counting. I don't seem to be able to make a bunch of intermediate 'create table' steps in Hive like I would messily do in SQL to avoid too many subqueries, as I get lost easily in those.
This seems like it would be a standard problem, is there a smart or obvious solution to handling items like these?
Thank You
In Hive, you can use lag() to see if there is another record five minutes before a given record. If there is not, then set a flag to 1 and count that:
select count(*)
from (select t.*,
lag(timestamp) over (partition by user order by timestamp) as prev_timestamp
from t
) t
where prev_timestamp is null or
(timestamp - prev_timestamp) > 5*60;
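The lag() logic can be illustrated in plain Python (a sketch with made-up sample rows; a new 'Use' starts whenever there is no earlier event from the same user within 300 seconds):

```python
from collections import defaultdict

def count_uses(rows, gap=300):
    """rows: iterable of (user_id, epoch_seconds) event tuples."""
    by_user = defaultdict(list)
    for user, ts in rows:
        by_user[user].append(ts)
    uses = 0
    for timestamps in by_user.values():
        timestamps.sort()
        prev = None  # plays the role of lag(timestamp) over (partition by user)
        for ts in timestamps:
            if prev is None or ts - prev > gap:
                uses += 1  # no event within the window: a new 'Use' begins
            prev = ts
    return uses

rows = [("a", 0), ("a", 120), ("a", 1000), ("b", 50)]
print(count_uses(rows))  # a: 2 uses (at 0 s and 1000 s), b: 1 use -> 3
```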
I'm making a website that has logos that need to be guessed. Currently this is my db setup:
Users(user_id, etc)
Logos(logo_id, img, company, level)
Guess(guess_id, user_id, logo_id, guess, guess_count, guessed, time)
When a user makes a guess, it's done with an ajax request. In this request, two queries are done: one to retrieve the company data (from Logos), one to insert/update the new guess in the db (Guess).
Now, on every page load I need to know the total amount of guesses, and how many logos there are per level. This requires two queries: one that checks Logos, and one that gets the number of guessed (guessed = 1) rows from Guess per level.
Now I want to implement some kind of badge system, like here on SO. Reading through some other questions, I saw that it might be better to have a separate table containing the total amount of guesses and such, so that it takes the same resources whether a user has 10 guesses or 10000. I didn't do this for several reasons:
requires an extra query in my ajax-call, which I'd like to keep as short as possible
page-reloading shouldn't happen that frequently, so that shouldn't take too long
I wouldn't know how to count the total amounts of guesses per level, unless the table would look like: AmountOfGuesses(id, user_id, level, counter) but then it'd take more resources depending on the amount of levels you've unlocked.
As for the badge system, I know the conditions should be checked e.g. when a user submits an answer. Of course this requires yet another query every time an answer is submitted, namely to check the total amount of answers the user has. Depending on that amount, the badge should be assigned. For the badges, I was thinking about a table structure like so:
Badges( badge_id, name, description, etc)
BadgeAssigned( user_id, badge_id, time )
Does this structure seem good for badges?
Is the current structure of the rest of my database good, or is it better if it is adjusted?
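As a sketch of the badge check described above, assuming a hypothetical threshold column on Badges (SQLite in-memory example; table and column names otherwise follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Guess (guess_id INTEGER PRIMARY KEY, user_id INT, guessed INT);
CREATE TABLE Badges (badge_id INTEGER PRIMARY KEY, name TEXT, threshold INT);
CREATE TABLE BadgeAssigned (user_id INT, badge_id INT, time TEXT);
INSERT INTO Guess (user_id, guessed) VALUES (1, 1), (1, 1), (1, 1);
INSERT INTO Badges (name, threshold) VALUES ('Novice', 3), ('Expert', 100);
""")

user_id = 1
# After an answer is submitted: count the user's correct guesses ...
(total,) = conn.execute(
    "SELECT COUNT(*) FROM Guess WHERE user_id = ? AND guessed = 1", (user_id,)
).fetchone()
# ... and award any badge whose threshold is reached and not yet assigned.
conn.execute("""
    INSERT INTO BadgeAssigned (user_id, badge_id, time)
    SELECT ?, badge_id, datetime('now') FROM Badges
    WHERE threshold <= ?
      AND badge_id NOT IN (SELECT badge_id FROM BadgeAssigned WHERE user_id = ?)
""", (user_id, total, user_id))
earned = [r[0] for r in conn.execute(
    "SELECT name FROM Badges JOIN BadgeAssigned USING (badge_id) WHERE user_id = ?",
    (user_id,))]
print(earned)  # ['Novice']
```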
I was told Redis was born for analytics, and I came across some bitmap use cases. They are useful for yes/no (0/1) counting, but I can't find an efficient way to count the number of users who logged in at least 4 times during the last 10 days. Since Redis runs in memory, I tried using a bitmap to keep track of each user's login flag, and BITCOUNT to filter; on my laptop it took a minute to return the count over about 4 million users' login activity.
Is there any way to solve this problem? I guess the round trips between my Node Redis client and the Redis server may be the issue; I'll try batched commands or a Lua script to see if that helps.
I think you need to use a Sorted Set with the user id as the member and a timestamp as the score.
When a user logs in, the score (timestamp) for that user is updated to the current time. Then you can get either the N most recently logged-in users (ZREVRANGE), or the users who logged in within some datetime range (ZRANGEBYSCORE).
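As a sketch of the idea, here is a plain-Python stand-in for the sorted set (a real implementation would issue the corresponding ZADD / ZREVRANGE / ZRANGEBYSCORE commands through a Redis client):

```python
logins = {}  # member (user id) -> score (last-login timestamp), like a ZSET

def zadd(user_id, ts):
    logins[user_id] = ts  # ZADD: update the user's score to the new login time

def zrevrange(n):
    # ZREVRANGE 0 n-1: the n most recently logged-in users
    return sorted(logins, key=logins.get, reverse=True)[:n]

def zrangebyscore(lo, hi):
    # ZRANGEBYSCORE lo hi: users whose last login falls in [lo, hi]
    return sorted(u for u, ts in logins.items() if lo <= ts <= hi)

zadd("alice", 100); zadd("bob", 200); zadd("alice", 300)
print(zrevrange(1))          # ['alice'] -- alice logged in most recently
print(zrangebyscore(0, 250)) # ['bob']  -- alice's score was updated to 300
```

Note that a sorted set only keeps one score per member, so on its own it answers "who logged in recently", not "who logged in at least 4 times".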
I have a MySQL table LOGIN_LOG with fields ID, PLAYER, TIMESTAMP and ACTION. ACTION can be either 'login' or 'logout'. Only around 20% of the logins have an accompanying logout row. For those that do, I want to calculate the average duration.
I'm thinking of something like
select avg(LL2.TIMESTAMP - LL1.TIMESTAMP)
from LOGIN_LOG LL1
inner join LOGIN_LOG LL2
    on LL1.PLAYER = LL2.PLAYER
    and LL2.TIMESTAMP > LL1.TIMESTAMP
left join LOGIN_LOG LL3
    on LL3.PLAYER = LL1.PLAYER
    and LL3.TIMESTAMP between LL1.TIMESTAMP + 1 and LL2.TIMESTAMP - 1
    and LL3.ACTION = 'login'
where LL1.ACTION = 'login'
    and LL2.ACTION = 'logout'
    and isnull(LL3.ID)
is this the best way to do it, or is there one more efficient?
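To sanity-check the pairing logic, here is a small SQLite reproduction with made-up sample rows (a sketch; SQLite has no ISNULL(expr) function, so LL3.ID IS NULL is used instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LOGIN_LOG (ID INTEGER PRIMARY KEY, PLAYER TEXT, TIMESTAMP INT, ACTION TEXT);
INSERT INTO LOGIN_LOG (PLAYER, TIMESTAMP, ACTION) VALUES
  ('p1', 100, 'login'), ('p1', 160, 'logout'),   -- 60 s session
  ('p1', 200, 'login'),                          -- login with no logout
  ('p2', 300, 'login'), ('p2', 420, 'logout');   -- 120 s session
""")
# Same shape as the query in the question: pair each login with a later
# logout by the same player, rejecting pairs with an intervening login.
(avg_duration,) = conn.execute("""
    SELECT AVG(LL2.TIMESTAMP - LL1.TIMESTAMP)
    FROM LOGIN_LOG LL1
    JOIN LOGIN_LOG LL2
      ON LL1.PLAYER = LL2.PLAYER AND LL2.TIMESTAMP > LL1.TIMESTAMP
    LEFT JOIN LOGIN_LOG LL3
      ON LL3.PLAYER = LL1.PLAYER
     AND LL3.TIMESTAMP BETWEEN LL1.TIMESTAMP + 1 AND LL2.TIMESTAMP - 1
     AND LL3.ACTION = 'login'
    WHERE LL1.ACTION = 'login' AND LL2.ACTION = 'logout' AND LL3.ID IS NULL
""").fetchone()
print(avg_duration)  # (60 + 120) / 2 = 90.0; the unmatched login is ignored
```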
Given the data you have, there probably isn't anything much faster you can do because you have to look at a LOGIN and a LOGOUT record, and ensure there is no other LOGIN (or LOGOUT?) record for the same user between the two.
Alternatively, find a way to ensure that a disconnect records a logout, so that the data is complete (instead of 20% complete). However, the query probably still has to ensure that the criteria are all met, so it won't help the query all that much.
If you can get the data into a format where the LOGIN and corresponding LOGOUT times are both in the same record, then you can simplify the query immensely. I'm not clear if the SessionManager does that for you.
Do you have a SessionManager type object that can timeout sessions? Because a timeout could be logged there, and you could get the last activity time from that and the timeout period.
Or you log all activity on the website/service, and thus you can query website/service visit duration directly, and see what activities they performed. For a website, Apache log analysers can probably generate the required stats.
I agree with JeeBee, but another advantage to a SessionManager type object is that you can handle the sessionEnd event and write a logout row with the active time in it. This way you would likely go from 20% accompanying logout rows to 100% accompanying logout rows. Querying for the activity time would then be trivial and consistent for all sessions.
If only 20% of your users actually log out, this query will not give you a very accurate duration for each session. A better way to gauge how long an average session lasts would be to take the average time between actions, or the average time per page, and multiply it by the average number of pages/actions per visit.
Additionally, you can determine the average time for each page, and then estimate session end time = session time up to the last page + average time spent on that last page. This gives a much more fine-grained (and accurate) measure of time spent per session.
Regarding the given SQL: it seems more complicated than you really need. This sort of statistical operation is often better handled, and more maintainable, in code outside the database, where you have the full power of whichever language you choose rather than SQL's rather convoluted facilities for statistical calculations.