Rails: Statistic

Hi, I am new to programming and Rails, and I am trying to create an admin interface in my app that shows stats. I have a Job model that has many Responses, and I need to collect the average response time for each day. To get the response time for the first job, I would do the following:
job = Job.first
response = job.responses.first
response_time = response.created_at - job.created_at
This is very simple, but I am having trouble collecting this information for all the jobs of a given day. I'm trying to come up with a solution that will give me an array of data pairs, for example [[June 17, 51s], [June 18, 60s], [June 19, 38s], ...].
I can't seem to figure out the correct Rails Active Record call that will give me what I need.

I don't think you are going to find a pure Active Record solution, but you have what you need; you just need to add a little Ruby.
There are probably a hundred ways to do it. Here is one that builds a hash whose keys are the number of whole days since the job was created and whose values are the number of responses on that day:
job = Job.first
start_date = job.created_at
response_dates = job.responses.pluck(:created_at) #creates an array of created_at datetimes
day_stats = response_dates.each_with_object(Hash.new(0)) { |dt, h| h[((dt - start_date)/1.day).round(0)] += 1 }
This iterates through the datetime array, subtracts the job's creation time from each response time, divides the result by one day, and rounds it to the nearest whole day.
Output would be something like:
=> {0=>1, 5=>2, 6=>1, 7=>2, 9=>1, 31=>1, 37=>6, 40=>1, 42=>3, 44=>1, 59=>32, 60=>59, 61=>2, 64=>1, 65=>2, 78=>168, 97=>39, 93=>2, 110=>1, 214=>1}
If you want the date, you could add key * 1.day to start_date.
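For the per-day average the question asks about, one option is to group jobs by their creation date in plain Ruby. This is only a minimal sketch: it assumes the response time is measured to each job's earliest response, and the Job and Response structs and the sample dates are hypothetical stand-ins for the Active Record models.

```ruby
require 'date'

# Hypothetical stand-ins for the Active Record models in the question.
Response = Struct.new(:created_at)
Job = Struct.new(:created_at, :responses)

# Returns { Date => average seconds to first response } for jobs created that day.
# Jobs without any response are skipped.
def average_response_times(jobs)
  answered = jobs.reject { |job| job.responses.empty? }
  answered.group_by { |job| job.created_at.to_date }.transform_values do |day_jobs|
    times = day_jobs.map { |job| job.responses.map(&:created_at).min - job.created_at }
    times.sum / times.size
  end
end

jobs = [
  Job.new(Time.utc(2013, 6, 17, 9, 0, 0), [Response.new(Time.utc(2013, 6, 17, 9, 0, 40))]),
  Job.new(Time.utc(2013, 6, 17, 10, 0, 0), [Response.new(Time.utc(2013, 6, 17, 10, 1, 0))]),
  Job.new(Time.utc(2013, 6, 18, 8, 0, 0), [Response.new(Time.utc(2013, 6, 18, 8, 0, 30))]),
  Job.new(Time.utc(2013, 6, 18, 9, 0, 0), []) # no response yet, skipped
]

average_response_times(jobs).each { |day, secs| puts "#{day}: #{secs.round}s" }
```

With real models the records would come from the database instead of the sample array; for large tables a grouped SQL query would likely be faster than loading every job.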

Related

Using IF THEN in Access 2010 Query

I'm not very knowledgeable in coding of Access queries, so I hope someone can help with this issue.
I have a query (using the query builder) that has a field named RetrainInterval from table tblProcedures (this returns a number like 1, 3, 6, 12, etc.: the interval in months at which the particular document has to be retrained) and another field named Training/Qualification Date from table tblTrainingRecords.
I want the query to look at the RetrainInterval for a given record (record field is ClassID in tblProcedures) and then look at the Training/Qualification Date and calculate if that record should be in the query.
In a module I would do this:
IF RetrainInterval = 1 Then
DateAdd("m",1,[Training/Qualification Date]) <add to query if <=today()+30>
ElseIf RetrainInterval = 3 Then
DateAdd("m",3,[Training/Qualification Date]) <add to query if <=today()+30>
ElseIF......
How can I translate this into something that would work in a query? My end goal is to generate a report that shows which document class numbers are due within a specified time interval (say I enter 30 in the form textbox to represent any required training coming up within 30 days of the query), but all of the calculations to determine this are based on when the last training date was (stored in the training records table). I also want to make sure that I do not get multiple rows for the same class number, since there will be multiple training entries for each class; I just want the minimum last training date. I hope I explained it well enough. It's hard to put into words what I am trying to do without posting the entire database.
UPDATE
I think I have simplified this a bit after getting some rest. Here are two images: one is the current query, and one is what comes up in the report. I have been able to refine this a bit, but now my problem is that I only want each Class to show once on the report, not twice, even though I have multiple retrain due dates (because everything is looking at the table that holds the employee training data, which has multiple trainings for each Class number). I would like to show only one date, the oldest. Hope that makes sense.
Query - http://postimg.org/image/cpcn998zx/
Report - http://postimg.org/image/krl5945l9/

When RetrainInterval = 1, you add 1 month to [Training/Qualification Date].
When RetrainInterval = 3, you add 3 months to [Training/Qualification Date].
And so on.
The pattern appears to be that RetrainInterval is the number of months to add. If that is true, use RetrainInterval directly in your DateAdd() expression and don't bother about IF THEN.
DateAdd("m", RetrainInterval, [Training/Qualification Date])
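To illustrate the date arithmetic outside Access, here is a minimal Ruby sketch of the same rule: add RetrainInterval months to each training date, keep the earliest due date per class, and list the classes due within a cutoff. The TrainingRecord struct, its field names, and the sample rows are hypothetical; Ruby's Date#>> plays the role of DateAdd("m", ...).

```ruby
require 'date'

# Hypothetical flattened row: one training entry joined with its procedure's interval.
TrainingRecord = Struct.new(:class_id, :retrain_interval, :trained_on)

# Returns { class_id => earliest due date } for classes due on or before today + within_days.
def due_classes(records, today, within_days)
  cutoff = today + within_days
  records
    .group_by(&:class_id)
    .transform_values { |rows| rows.map { |r| r.trained_on >> r.retrain_interval }.min }
    .select { |_class_id, due_on| due_on <= cutoff }
end

records = [
  TrainingRecord.new('C-100', 3, Date.new(2014, 1, 10)),
  TrainingRecord.new('C-100', 3, Date.new(2014, 2, 20)),
  TrainingRecord.new('C-200', 12, Date.new(2014, 3, 1))
]

due_classes(records, Date.new(2014, 4, 1), 30).each do |class_id, due_on|
  puts "#{class_id} due #{due_on}"
end
```

Grouping by class before taking the minimum is what keeps each class to a single row, which is the part the report in the question was missing.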

You cannot do that directly in a query. Been there, cursed that!
You can use IIf(2 > x, 1, 0) (with some regional settings the argument separator is ; instead of ,).
If the first argument is true, 1 is returned; if it is false, 0 is returned.
You cannot return a criterion from it, as in IIf(2 > x, Cell > 2, Cell > 0). I think it will just return 0 if you try, and it will not always give an error.
You have to use criteria instead!
I would do something like this picture:
I hope you follow; if not, let me know.

Django method to limit grouped queries

I want to return 3 results for every date and have the results ordered by the date and by a separate 'rating' column for each query
For example, my query would return something like this:
Event on Dec 1st rated 36
Event on Dec 1st rated 29
Event on Dec 1st rated 12
Event on Dec 2nd rated 45
Event on Dec 2nd rated 12
Event on Dec 2nd rated 9
Event on Dec 3rd rated 118
Event on Dec 3rd rated 15
Event on Dec 3rd rated 13
I know this should be possible using raw sql with something like this: SQL group - limit
But I am wondering whether there is a way to do this within the Django ORM in a single query or at least a way to make it as painless as possible if I do need to convert to a raw SQL query.
Edit:
Models are simple. Relevant fields are:
class Event(models.Model):
    title = models.CharField(max_length=120)
    day = models.DateField()
    score = models.SmallIntegerField()

I tried to assemble a union of querysets, but django complained with:
AssertionError
Cannot combine queries once a slice has been taken.
This was the view code:
def home2(request):
    dates_qs = Event.objects.values('day').order_by('day').distinct()
    ev_qss = []
    for date in dates_qs:
        my_qs = Event.objects.filter(day=date['day']).order_by('score')[:3]
        ev_qss.append(my_qs)
    answer_qs = ev_qss[0]
    for qs in ev_qss[1:]:
        answer_qs |= qs
    return render_to_response('home2.html',
                              {'dates_qs': dates_qs,
                               'answer_qs': answer_qs},
                              RequestContext(request))
The error was issued for the line answer_qs |= qs, ie, wanting to take the union of answer_qs and qs. qs being the queryset of the scores for a date, limited to 3 results.
So I guess you're stuck with raw SQL. The raw SQL example you pointed to has data in several tables, and you have all your data in one table, so your SQL is a bit simpler:
SELECT sE.* FROM so_event AS sE
WHERE 3>(
SELECT COUNT(*)
FROM so_event iE
WHERE iE.day = sE.day AND
sE.score - iE.score < 0
)
ORDER BY sE.day ASC, sE.score DESC;
Now that I know this is the query we are aiming for, I searched for Django ORM subqueries and came across this SO question and answer:
How to django ORM with a subquery?
Which says a bunch of stuff, and hints that you might be able to do what you want with a different ORM (SQLAlchemy). I've heard good things about SQLAlchemy.
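If fetching the rows and limiting them in application code is acceptable, the top-3-per-day selection itself is short. The following is a minimal sketch in plain Ruby rather than Django, with a hypothetical Event struct mirroring the model's fields and made-up sample data; it illustrates the grouping logic only and is not an ORM query.

```ruby
require 'date'

# Hypothetical stand-in mirroring the fields of the Event model.
Event = Struct.new(:title, :day, :score)

# Returns at most `limit` events per day, ordered by day and then by descending score.
def top_per_day(events, limit = 3)
  events
    .group_by(&:day)
    .sort_by { |day, _day_events| day }
    .flat_map { |_day, day_events| day_events.max_by(limit, &:score) }
end

dec1 = Date.new(2012, 12, 1)
dec2 = Date.new(2012, 12, 2)
events = [
  Event.new('e', dec2, 9), Event.new('f', dec2, 45), Event.new('g', dec2, 12),
  Event.new('a', dec1, 36), Event.new('b', dec1, 12),
  Event.new('c', dec1, 29), Event.new('d', dec1, 5)
]

top_per_day(events).each { |e| puts "Event on #{e.day} rated #{e.score}" }
```

This loads every candidate row first, so it only makes sense when the date range being displayed is small.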

How should I model this in Redis?

FYI: Redis n00b.
I need to store search terms in my web app.
Each term will have two attributes: "search_count" (integer) and "last_searched_at" (time)
Example I've tried:
Redis.hset("search_terms", term, {count: 1, last_searched_at: Time.now})
I can think of a few different ways to store them, but no good ways to query on the data. The report I need to generate is a "top search terms in last 30 days". In SQL this would be a where clause and an order by.
How would I do that in Redis? Should I be using a different data type?
Thanks in advance!

I would consider two sorted sets.
When a search term is submitted, get the current timestamp and:
zadd timestamps timestamp term
zincrby counts 1 term
The above two operations should be executed atomically, for example inside a MULTI/EXEC transaction.
Then to find all terms in the given time interval timestamp_from, timestamp_to:
zrangebyscore timestamps timestamp_from timestamp_to
after you get these, loop over them and get the counts from counts.
Alternatively, I am curious whether you can use zunionstore. Here is my test in Ruby:
require 'redis'
KEYS = %w(counts timestamps results)
TERMS = %w(test0 keyword1 test0 test1 keyword1 test0 keyword0 keyword1 test0)
def redis
  @redis ||= Redis.new
end
def timestamp
(Time.now.to_f * 1000).to_i
end
redis.del KEYS
TERMS.each {|term|
redis.multi {|r|
r.zadd 'timestamps', timestamp, term
r.zincrby 'counts', 1, term
}
sleep rand
}
redis.zunionstore 'results', ['timestamps', 'counts'], weights: [1, 1e15]
KEYS.each {|key|
p [key, redis.zrange(key, 0, -1, withscores: true)]
}
# top 2 terms
p redis.zrevrangebyscore 'results', '+inf', '-inf', limit: [0, 2]
EDIT: at some point you would need to clear the counts set. Something similar to what @Eli proposed (https://stackoverflow.com/a/16618932/410102).

Depends on what you want to optimize for. Assuming you want to be able to run that query very quickly and don't mind expending some memory, I'd do this as follows.
Keep a key for every second you see some search (you can go more or less granular if you like). The key should point to a hash of $search_term -> $count where $count is the number of times $search_term was seen in that second.
Keep another key for every time interval (we'll call this $time_int_key) over which you want data (in your case, this is just one key where your interval is the last 30 days). This should point to a sorted set where the items in the set are all of your search terms seen over the last 30 days, and the score they're sorted by is the number of times they were seen in the last 30 days.
Have a background worker that every second grabs the key for the second that occurred exactly 30 days ago and loops through the hash attached to it. For every $search_term in that key, it should subtract the $count from the score associated with that $search_term in $time_int_key
This way, you can just use ZREVRANGE $time_int_key 0 $m-1 to grab the m top searches (add WITHSCORES if you want the number of times they were searched) in O(log(N)+m) time. That's more than cheap enough to be able to run as frequently as you want in Redis for just about any reasonable m and to always have that data updated in real time.
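The bookkeeping described above can be sketched without a Redis server. In this minimal in-memory version, plain Ruby hashes stand in for the per-second Redis hashes and for the sorted set; the SearchWindow class and its method names are hypothetical, and the sketch assumes every search increments both the per-second bucket and the running total.

```ruby
# In-memory sketch: plain hashes stand in for the Redis hashes and the sorted set.
class SearchWindow
  def initialize(window_seconds)
    @window = window_seconds
    # second => { term => count }, like one Redis hash per second
    @buckets = Hash.new { |h, second| h[second] = Hash.new(0) }
    # term => count within the window, like the sorted set's scores
    @totals = Hash.new(0)
  end

  # Record one search for `term` at the given integer second.
  def record(term, second)
    @buckets[second][term] += 1
    @totals[term] += 1
  end

  # What the background worker would do once per second: subtract the counts
  # of the bucket that just fell out of the window.
  def expire(now)
    expired = @buckets.delete(now - @window) || {}
    expired.each do |term, count|
      @totals[term] -= count
      @totals.delete(term) if @totals[term] <= 0
    end
  end

  # The m most-searched terms with their counts, highest first.
  def top(m)
    @totals.sort_by { |_term, count| -count }.first(m)
  end
end

window = SearchWindow.new(10) # a 10-second window instead of 30 days, for the demo
window.record('rails', 0)
window.record('rails', 0)
window.record('redis', 5)
window.record('rails', 6)
window.record('redis', 7)
p window.top(2)
window.expire(10) # the bucket for second 0 leaves the window
p window.top(1)
```

In Redis the same steps would map to HINCRBY on the per-second hash and ZINCRBY on the sorted set, with the worker issuing negative ZINCRBY calls for expired buckets.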

Grouping, totaling in Rails and Active Record

I'm trying to group a series of records in Active Record so I can do some calculations to normalize the quantity attribute of each record. For example:
A user enters a date and a quantity. Dates are not unique, so I may have 10 - 20 quantities for each date. I need to work with only the totals for each day, not every individual record. Because then, after determining the highest and lowest value, I convert each one by basically dividing by n which is usually 10.
This is what I'm doing right now:
def heat_map(project, word_count, n_div)
return "freezing" if word_count == 0
words = project.words
counts = words.map(&:quantity)
max = counts.max
min = counts.min
return "max" if word_count == max
return "min" if word_count == min
break_point = (max - min).to_f/n_div.to_f
heat_index = (((word_count - min).to_f)/break_point).to_i
end
This works great if I display a table of all the word counts, but I'm trying to apply the heat map to a calendar that displays running totals for each day. This obviously doesn't total the days, so I end up with numbers that are out of the normal scale.
I can't figure out a way to group the word counts and total them by day before I do the normalization. I tried doing a group_by and then adding the map call, but I got an undefined method error. Any ideas? I'm also open to better / cleaner ways of normalizing the word counts, too.

Hard to answer without knowing a bit more about your models. So I'm going to assume that the date you're interested in is just the created_at date in the words table. I'm assuming that you have a field in your words table called word where you store the actual word.
I'm also assuming that you might have multiple entries for the same word (possibly with different quantities) in the one day.
So, this will give you an ordered hash of summed quantities, keyed by [date, word]:
project.words.group('DATE(created_at)').group('word').sum('quantity')
If you only need one total per day, drop the .group('word') call. If those guesses make no sense, then perhaps you can give a bit more detail about the structure of your models.
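If the records are already loaded, the same totalling can be done in plain Ruby before normalizing. This is a minimal sketch under assumptions: the WordEntry struct and its entered_on field are hypothetical stand-ins for the words table, and the heat index is simplified to always return an integer between 0 and n_div rather than the "max" and "min" strings used in the question.

```ruby
require 'date'

# Hypothetical stand-in for a row of the words table.
WordEntry = Struct.new(:entered_on, :quantity)

# Sums the quantities of all entries that share a date.
def daily_totals(entries)
  entries.group_by(&:entered_on).transform_values { |day| day.sum(&:quantity) }
end

# Maps each daily total onto a 0..n_div scale between the lowest and highest total.
def heat_indexes(totals, n_div = 10)
  min, max = totals.values.minmax
  return totals.transform_values { 0 } if max == min
  step = (max - min).to_f / n_div
  totals.transform_values { |total| ((total - min) / step).to_i }
end

entries = [
  WordEntry.new(Date.new(2012, 5, 1), 100),
  WordEntry.new(Date.new(2012, 5, 1), 200),
  WordEntry.new(Date.new(2012, 5, 2), 50),
  WordEntry.new(Date.new(2012, 5, 3), 400),
  WordEntry.new(Date.new(2012, 5, 3), 150)
]

totals = daily_totals(entries)
heat_indexes(totals).each { |day, index| puts "#{day}: total #{totals[day]}, heat #{index}" }
```

Because the minimum and maximum are taken from the daily totals rather than from individual records, the calendar values stay within the expected scale.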

Using index_by to add the attributes of duplicate keys

With some help, I was able to take a method that displays a word count for each day in a project and reduce it to pretty much one database query. Here's the method:
def project_range(project, start, finish, &blk)
words = project.words.where(:wrote_on => start..finish)
word_map = words.index_by(&:wrote_on)
for day in start..finish
word_count = word_map[day] ? word_map[day].quantity : 0
blk.call(day, word_count)
end
end
The original question is here:
Reducing database hits in Rails
Now, though, we want to change the way users enter word counts: we want to let them enter multiple counts for a day instead of just one. This method, however, only returns the last word count entered for each day, not the total of all the word counts for that day.
I tried changing index_by to group_by, but I got an undefined method error for quantity. So my question is: how can I handle multiple entries for the same day, where there are two records with the same wrote_on date, say 5/4/2012?

group_by worked with one more modification. The lines it replaces are left in as comments.
def project_range(project, start, finish, &blk)
words = project.words.where(:wrote_on => start..finish)
#word_map = words.index_by(&:wrote_on)
word_map = words.group_by(&:wrote_on)
for day in start..finish
#word_count = word_map[day] ? word_map[day].quantity : 0
word_count = word_map[day] ? word_map[day].sum(&:quantity) : 0
blk.call(day, word_count)
end
end
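The difference between the two approaches can be seen without Rails. In this minimal sketch, a hypothetical Word struct stands in for the model, and Array#to_h with a block is used as a plain-Ruby approximation of index_by (one record per key, with later entries overwriting earlier ones).

```ruby
require 'date'

# Hypothetical stand-in for the Word model.
Word = Struct.new(:wrote_on, :quantity)

# Approximates index_by: keeps a single record per date, the last one seen.
def last_entry_per_day(words)
  words.to_h { |word| [word.wrote_on, word] }
end

# group_by keeps every record for a date in an array, so the quantities can be summed.
def total_per_day(words)
  words.group_by(&:wrote_on).transform_values { |day| day.sum(&:quantity) }
end

may4 = Date.new(2012, 5, 4)
words = [Word.new(may4, 300), Word.new(may4, 200), Word.new(Date.new(2012, 5, 5), 150)]

puts last_entry_per_day(words)[may4].quantity # only the last entry for the day
puts total_per_day(words)[may4]               # both entries added together
```

This also explains the undefined method error: after group_by, each hash value is an array of records, so quantity has to be summed over the array rather than called on it directly.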
