Django method to limit grouped queries - sql

I want to return 3 results for every date and have the results ordered by the date and by a separate 'rating' column for each query
For example, my query would return something like this:
Event on Dec 1st rated 36
Event on Dec 1st rated 29
Event on Dec 1st rated 12
Event on Dec 2nd rated 45
Event on Dec 2nd rated 12
Event on Dec 2nd rated 9
Event on Dec 3rd rated 118
Event on Dec 3rd rated 15
Event on Dec 3rd rated 13
I know this should be possible using raw sql with something like this: SQL group - limit
But I am wondering whether there is a way to do this within the Django ORM in a single query or at least a way to make it as painless as possible if I do need to convert to a raw SQL query.
Edit:
Models are simple. Relevant fields are:
class Event(models.Model):
    title = models.CharField(max_length=120)
    day = models.DateField()
    score = models.SmallIntegerField()

I tried to assemble a union of querysets, but django complained with:
AssertionError
Cannot combine queries once a slice has been taken.
This was the view code:
def home2(request):
    dates_qs = Event.objects.values('day').order_by('day').distinct()
    ev_qss = []
    for date in dates_qs:
        my_qs = Event.objects.filter(day=date['day']).order_by('score')[:3]
        ev_qss.append(my_qs)
    answer_qs = ev_qss[0]
    for qs in ev_qss[1:]:
        answer_qs |= qs
    return render_to_response('home2.html',
                              {'dates_qs': dates_qs,
                               'answer_qs': answer_qs},
                              RequestContext(request))
The error was raised on the line answer_qs |= qs, i.e., when taking the union of answer_qs and qs, where qs is the queryset of events for one date, limited to 3 results.
So I guess you're stuck with raw SQL. The raw SQL example you pointed to has data in several tables, and you have all your data in one table, so your SQL is a bit simpler:
SELECT sE.* FROM so_event AS sE
WHERE 3 > (
    SELECT COUNT(*)
    FROM so_event iE
    WHERE iE.day = sE.day AND
          sE.score - iE.score < 0
)
ORDER BY sE.day ASC, sE.score DESC;
Now that I know this is the query we are aiming for, I searched for Django ORM subqueries and came across this SO question and answer:
How to django ORM with a subquery?
Which says a bunch of stuff, and hints that you might be able to do what you want with a different ORM (SQLAlchemy). I've heard good things about SQLAlchemy.
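If raw SQL turns out to be the route, the correlated-subquery query above can be tried out with Python's built-in sqlite3 module before wiring it into Django. This is only a minimal sketch, not an ORM solution: the so_event table name comes from the SQL above, while the sample rows, the top_n_per_day helper and the in-memory database are made up for illustration.

```python
import sqlite3

# Correlated subquery: keep a row only if fewer than n rows
# on the same day have a strictly higher score.
TOP_N_PER_DAY_SQL = """
SELECT sE.title, sE.day, sE.score
FROM so_event AS sE
WHERE ? > (
    SELECT COUNT(*)
    FROM so_event AS iE
    WHERE iE.day = sE.day AND iE.score > sE.score
)
ORDER BY sE.day ASC, sE.score DESC
"""


def top_n_per_day(conn, n=3):
    """Return (title, day, score) rows; ties at the cutoff can yield more than n."""
    return conn.execute(TOP_N_PER_DAY_SQL, (n,)).fetchall()


# Hypothetical sample data: four events per day, so one per day is dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so_event (title TEXT, day TEXT, score INTEGER)")
sample_rows = [
    ("e1", "2000-12-01", 36), ("e2", "2000-12-01", 29),
    ("e3", "2000-12-01", 12), ("e4", "2000-12-01", 7),
    ("e5", "2000-12-02", 45), ("e6", "2000-12-02", 12),
    ("e7", "2000-12-02", 9), ("e8", "2000-12-02", 2),
]
conn.executemany("INSERT INTO so_event VALUES (?, ?, ?)", sample_rows)

top_rows = top_n_per_day(conn)
for title, day, score in top_rows:
    print(day, title, score)
```

In Django, a query of this shape could be passed to Event.objects.raw(), which requires the primary key column to be part of the SELECT list.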

Related

SQL Time Series Homework

Imagine you have these two tables.
a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on Twitch. The columns of the table are:
username: Channel username
timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
game: Name of the game that the user was playing at that time
viewers: Number of concurrent viewers that the user had at that time
followers: Number of total followers that the channel had at that time
b) games_metadata: it contains information on all the games that have ever been broadcast on Twitch.
The columns of the table are:
game: Name of the game
release_date: Timestamp, in seconds, corresponding to the date when the game was released
publisher: Publisher of the game
genre: Genre of the game
Now I want the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.
The problem is that I don't have the real database, so I created one and entered some values by hand.
I thought of this query, but I'm not sure it is what I want. It may be right (I don't feel like it is), but I'd like a second opinion.
SELECT publisher,
(cast(strftime('%m', "timestamp") as integer) + 2) / 3 as quarter,
COUNT((strftime('%M',`timestamp`)/(60*1.0)) * viewers) as total_hours_watch
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE quarter = 3
GROUP BY publisher,quarter
ORDER BY total_hours_watch DESC
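The overall shape (join, group by publisher, order descending) looks right, but a few details need fixing. The first quarter is quarter = 1, not 3, and nothing restricts the year to 2019. COUNT only counts rows; since each row is one minute of data, hours watched is SUM(viewers) / 60.0. If this is SQLite, as the strftime calls suggest, strftime also needs the 'unixepoch' modifier to interpret an epoch timestamp. You don't need to include quarter in the GROUP BY since the WHERE clause limits you to only one quarter. You can modify the query to get only the top 10 publishers in a couple of ways, depending on the database engine you're using.
For SQL Server / MS Access modify your select statement: SELECT TOP 10 publisher, ...
For MySQL or SQLite add a limit clause at the end of your query: ... LIMIT 10;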
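One way to check a query like this without the real data is a throwaway in-memory SQLite database. The sketch below is an assumption-laden illustration rather than the definitive answer: it assumes timestamps are UTC epoch seconds, that each row represents exactly one minute of viewing, and the sample rows and publisher names are invented.

```python
import sqlite3
from datetime import datetime, timezone

# Q1 2019 as a half-open epoch range [start, end), assuming UTC timestamps.
Q1_START = int(datetime(2019, 1, 1, tzinfo=timezone.utc).timestamp())
Q1_END = int(datetime(2019, 4, 1, tzinfo=timezone.utc).timestamp())

# Each row is one minute of data, so viewer-minutes / 60 = hours watched.
TOP_PUBLISHERS_SQL = """
SELECT g.publisher, SUM(s.viewers) / 60.0 AS hours_watched
FROM streamers AS s
INNER JOIN games_metadata AS g ON s.game = g.game
WHERE s.timestamp >= ? AND s.timestamp < ?
GROUP BY g.publisher
ORDER BY hours_watched DESC
LIMIT 10
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE streamers "
    "(username TEXT, timestamp INTEGER, game TEXT, viewers INTEGER, followers INTEGER)"
)
conn.execute(
    "CREATE TABLE games_metadata "
    "(game TEXT, release_date INTEGER, publisher TEXT, genre TEXT)"
)

# Hypothetical sample rows; the last streamers row falls outside Q1 2019.
conn.executemany("INSERT INTO streamers VALUES (?, ?, ?, ?, ?)", [
    ("alice", Q1_START, "GameA", 120, 10),
    ("alice", Q1_START + 60, "GameA", 60, 10),
    ("bob", Q1_START, "GameB", 60, 5),
    ("bob", Q1_END + 60, "GameB", 6000, 5),
])
conn.executemany("INSERT INTO games_metadata VALUES (?, ?, ?, ?)", [
    ("GameA", 0, "PubA", "rpg"),
    ("GameB", 0, "PubB", "fps"),
])

top_publishers = conn.execute(TOP_PUBLISHERS_SQL, (Q1_START, Q1_END)).fetchall()
print(top_publishers)
```

Filtering on the raw epoch range avoids converting every timestamp with strftime and pins the result to 2019 rather than to the first quarter of any year.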

Access query, grouped sum of 2 columns where either column contains values

Another team has an Access database that they use to track call logs. It's very basic, really just a table with a few lookups, and they enter data directly in the datasheet view. They've asked me to assist with writing a report to sum up their calls by week and reason and I'm a bit stumped on this problem because I'm not an Access guy by any stretch.
The database consists of two core tables, one holding the call log entries (Calls) and one holding the lookup list of call reasons (ReasonLookup). Relevant table structures are:
Calls
-----
ID (autonumber, PK)
DateLogged (datetime)
Reason (int, FK to ReasonLookup.ID)
Reason 2 (int, FK to ReasonLookup.ID)
ReasonLookup
------------
ID (autonumber PK)
Reason (text)
What they want is a report that looks like this:
WeekNum  Reason                Total
-------  --------------------  -----
10       Eligibility Request   24
10       Extension Request     43
10       Information Question  97
11       Eligibility Request   35
11       Information Question  154
...      ... etc ...
My problem is that there are TWO columns in the Calls table, because they wanted to log a primary and secondary reason for receiving the call, i.e. someone calls for reason A and while on the phone also requests something under reason B. Every call will have a primary reason column value (Calls.Reason not null) but not necessarily a secondary reason column value (Calls.[Reason 2] is often null).
What they want is, for each WeekNum, a single (distinct) entry for each possible Reason, and a Total of how many times that Reason was used in either the Calls.Reason or Calls.[Reason 2] column for that week. So in the example above, they want to see one entry for Eligibility Request for the week, counting every record in Calls for that week that has Calls.Reason = Eligibility Request OR Calls.[Reason 2] = Eligibility Request.
What is the best way to approach a query that will display as shown above? Ideally this is a straight query, no VBA required. They are non-technical so the simpler and easier to maintain the better if possible.
Thanks in advance, any help much appreciated.
The "normal" approach would be to use a UNION ALL query as a subquery to create a set of weeks and reasons. Access doesn't support this, but what should work is to first define a saved query that makes the union and then use that query as the source for the "main" query.
So the first query would be
SELECT datepart("ww",datelogged) as week, Reason from calls
UNION ALL
SELECT datepart("ww",datelogged), [Reason 2] from calls;
Save this as UnionQuery and make another query mainQuery:
SELECT uq.week, rl.reason, Count(*) AS Total
FROM UnionQuery AS uq
INNER JOIN reasonlookup AS rl ON uq.reason = rl.id
GROUP BY uq.week, rl.reason;
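The two-step idea (stack both reason columns with UNION ALL, then join and count) can be checked outside Access. The following is a rough sketch using Python's sqlite3 module, not Access SQL: SQLite allows the union as an inline subquery, strftime('%W', ...) stands in for DatePart("ww", ...) and may number weeks differently, and the sample calls are invented.

```python
import sqlite3

# Stack primary and secondary reasons into one column, then join and count.
# Rows with a NULL secondary reason drop out at the INNER JOIN.
REPORT_SQL = """
SELECT uq.week AS WeekNum, rl.Reason, COUNT(*) AS Total
FROM (
    SELECT CAST(strftime('%W', DateLogged) AS INTEGER) AS week, Reason AS reason_id
    FROM Calls
    UNION ALL
    SELECT CAST(strftime('%W', DateLogged) AS INTEGER), [Reason 2]
    FROM Calls
) AS uq
INNER JOIN ReasonLookup AS rl ON uq.reason_id = rl.ID
GROUP BY uq.week, rl.Reason
ORDER BY uq.week, rl.Reason
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Calls "
    "(ID INTEGER PRIMARY KEY, DateLogged TEXT, Reason INTEGER, [Reason 2] INTEGER)"
)
conn.execute("CREATE TABLE ReasonLookup (ID INTEGER PRIMARY KEY, Reason TEXT)")
conn.executemany("INSERT INTO ReasonLookup VALUES (?, ?)", [
    (1, "Eligibility Request"),
    (2, "Extension Request"),
    (3, "Information Question"),
])
# Hypothetical calls: three in one week, two in the following week.
conn.executemany(
    "INSERT INTO Calls (DateLogged, Reason, [Reason 2]) VALUES (?, ?, ?)",
    [
        ("2021-03-08", 1, None),
        ("2021-03-09", 1, 2),
        ("2021-03-10", 2, 1),
        ("2021-03-16", 2, None),
        ("2021-03-17", 3, 2),
    ],
)

report = conn.execute(REPORT_SQL).fetchall()
for week_num, reason, total in report:
    print(week_num, reason, total)
```

Because the counting happens after the two columns are stacked, a reason that appears as primary on one call and secondary on another ends up in a single row per week.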
Alternatively, you can use a union query to combine individual GROUP BY aggregate queries for Reason and [Reason 2]:
SELECT DatePart("ww", Calls.DateLogged) AS WeekNum, ReasonLookup.Reason,
Count(Calls.ID) AS [Total]
FROM Calls
INNER JOIN ReasonLookup ON Calls.Reason = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged), ReasonLookup.Reason
UNION ALL
SELECT DatePart("ww", Calls.DateLogged) AS WeekNum, ReasonLookup.Reason,
Count(Calls.ID) AS [Total]
FROM Calls
INNER JOIN ReasonLookup ON Calls.[Reason 2] = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged), ReasonLookup.Reason;
DatePart("ww", ...) returns the date's week number in the calendar year, and Count (rather than Sum) of Calls.ID gives the number of calls. Note that this returns separate rows for primary and secondary reasons, so a reason used in both columns during the same week appears twice and the two totals still have to be added together. UNION ALL is used because plain UNION would silently drop one of those rows whenever the two totals happen to be equal.

sql statement to calculate the average for a selected set of values

I am new to access and SQL statements. I have two tables, Site_ID and SE_WaterQuality_Data. For each site, several water quality parameters were collected over 5 weeks in summer and 5 weeks in winter. I want to be able to run a query that will return a table that shows the average of a particular parameter (eg Temp) grouped by the Site_ID and the sample period (eg summer 2013). I am close but my output table only shows the average value and not the site ID or sample period. The query also prompts the user to enter a particular Site_ID and I want it to run the query for all sites.
My SQL statement at the moment is
SELECT Avg(SE_WaterQuality_Data.[TEMP (C)]) AS [AvgOfTEMP (C)]
FROM SE_WaterQuality_Data
WHERE (((SE_WaterQuality_Data.EMS_ID)=[Site_ID].[EMS_ID]))
GROUP BY SE_WaterQuality_Data.EMS_ID, SE_WaterQuality_Data.SummaryPeriod;
And my output is
AveOFTEMP(C)
14.7
5.2
How can I change the SQL statement to 1) run the query for all sites and 2) return a table such as the one below:
Desired Output
Site_ID*  SamplePeriod*  AveTemp
1         Sum2013        14.2
1         Win2013        5.6
5         Sum2013        18.5
Help please......
If you want to run for all sites, take out the WHERE clause. And if you want to show other columns, include them in your SELECT clause.
SELECT [EMS_ID] AS [Site_ID],
[SummaryPeriod] AS [Sample_Period],
Avg(SE_WaterQuality_Data.[TEMP (C)]) AS [AvgOfTEMP (C)]
FROM SE_WaterQuality_Data
GROUP BY SE_WaterQuality_Data.EMS_ID, SE_WaterQuality_Data.SummaryPeriod;
I hope I got the syntax details right. I don't use SQL Server, I use MySQL. But the basic ideas are the same in all SQL dialects.
SELECT Site.Site_Id, WQ.SummaryPeriod, Avg(WQ.[TEMP (C)]) AS AveTemp
FROM SE_WaterQuality_Data AS WQ, Site
WHERE WQ.EMS_ID = Site.EMS_ID
GROUP BY Site.Site_Id, WQ.SummaryPeriod;
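The rule both answers rely on (every non-aggregated column in the SELECT list also goes in the GROUP BY, and no WHERE clause means all sites) can be seen on a tiny data set. This is a sketch in SQLite via Python rather than Access, with invented readings; only the table and column names are taken from the question.

```python
import sqlite3

# One output row per (site, period); the grouped columns are selected too.
AVG_TEMP_SQL = """
SELECT EMS_ID AS Site_ID, SummaryPeriod AS SamplePeriod, AVG([TEMP (C)]) AS AveTemp
FROM SE_WaterQuality_Data
GROUP BY EMS_ID, SummaryPeriod
ORDER BY EMS_ID, SummaryPeriod
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SE_WaterQuality_Data "
    "(EMS_ID INTEGER, SummaryPeriod TEXT, [TEMP (C)] REAL)"
)
# Hypothetical weekly readings for two sites.
conn.executemany("INSERT INTO SE_WaterQuality_Data VALUES (?, ?, ?)", [
    (1, "Sum2013", 14.0),
    (1, "Sum2013", 14.5),
    (1, "Win2013", 5.0),
    (1, "Win2013", 6.0),
    (5, "Sum2013", 18.5),
])

averages = conn.execute(AVG_TEMP_SQL).fetchall()
for site_id, period, avg_temp in averages:
    print(site_id, period, avg_temp)
```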

Rails: Statistic

Hi I am new to programming and rails, and I am trying to create an admin interface in my app that shows stats. I have a Job model that has many Responses, and I need to collect the average response time for each day. In order to collect the response time for the first job I would do the following
job = Job.first
response = job.responses.first
response_time = response.created_at - job.created_at
This is very simple, but I am having trouble collecting this information for all the jobs of a given day. I'm trying to come up with a solution that will give me an array of data pairs, for example [[June 17, 51s], [June 18, 60s], [June 19, 38s], ... etc].
I can't seem to figure out the correct Rails Active Record call that will give me what I need.
Don't think you are going to find an active record solution, but you have what you need, just need to add a little ruby.
There are probably a hundred ways to do it. Here is one that builds a hash with the number of days between the job's creation and each response (rounded to the nearest whole day) as the key and the number of responses as the value:
job = Job.first
start_date = job.created_at
response_dates = job.responses.pluck(:created_at) #creates an array of created_at datetimes
day_stats = response_dates.each_with_object(Hash.new(0)) { |dt, h| h[((dt - start_date)/1.day).round(0)] += 1 }
This iterates through the datetime array, subtracts the job's creation time from each response time, divides the result by 1 day and rounds it to a whole number of days.
Output would be something like:
=> {0=>1, 5=>2, 6=>1, 7=>2, 9=>1, 31=>1, 37=>6, 40=>1, 42=>3, 44=>1, 59=>32, 60=>59, 61=>2, 64=>1, 65=>2, 78=>168, 97=>39, 93=>2, 110=>1, 214=>1}
If you want the date, you could add the key*1.day to the start_date
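For the average response time per day that the question asks about, the grouping logic itself is small. Here it is sketched in Python rather than Ruby, purely to illustrate the idea: it assumes you have already loaded pairs of (job created_at, first response created_at), and the function name and sample timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_response_seconds_by_day(pairs):
    """Map each job's creation date to the mean first-response delay in seconds."""
    delays = defaultdict(list)
    for job_created_at, first_response_at in pairs:
        delay = (first_response_at - job_created_at).total_seconds()
        delays[job_created_at.date()].append(delay)
    # Sort by date so the result reads chronologically.
    return {day: sum(values) / len(values) for day, values in sorted(delays.items())}


# Hypothetical data: two jobs on June 17 (40 s and 62 s), one on June 18 (60 s).
sample_pairs = [
    (datetime(2013, 6, 17, 9, 0, 0), datetime(2013, 6, 17, 9, 0, 40)),
    (datetime(2013, 6, 17, 14, 0, 0), datetime(2013, 6, 17, 14, 1, 2)),
    (datetime(2013, 6, 18, 10, 0, 0), datetime(2013, 6, 18, 10, 1, 0)),
]

daily_averages = average_response_seconds_by_day(sample_pairs)
for day, seconds in daily_averages.items():
    print(day.isoformat(), f"{seconds:.0f}s")
```

Jobs are grouped by the day the job was created; grouping by the response's day instead would only require changing the dictionary key.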

SQL - Finding open spaces in a schedule

I am working with a SQLite database and I have three tables describing buildings, rooms and scheduled events.
The tables look like this:
Buildings(ID,Name)
Rooms(ID,BuildingID,Number)
Events(ID,BuildingID,RoomID,Days,s_time,e_time)
So every event is associated with a building and a room. The column Days contains an integer which is a product of prime numbers corresponding to days of the week (a value of 21 means the event occurs on Tuesday = 3 and Thursday = 7).
I am hoping to find a way to generate a report of rooms in a specific building that will be open in the next few hours, along with how long they will be open for.
Here is what I have so far:
SELECT Rooms.Number
FROM Rooms
INNER JOIN Buildings on ( Rooms.BuildingID = Buildings.ID )
WHERE
    Buildings.Name = "BuildingName"
EXCEPT
SELECT Events.RoomID
FROM Events
INNER JOIN Buildings on ( Events.BuildingID = Buildings.ID )
WHERE
    Buildings.Name = "BuildingName" AND
    Events.days & 11 = 0 AND
    time("now", "localtime") BETWEEN events.s_time AND events.e_time;
Here I find all rooms for a specific building and then I remove rooms which currently have a scheduled event in progress.
I am looking forward to all helpful tips/comments.
If you're storing days as the product of primes, the modulo (%) operator is what you need to test for a given day:
SELECT * FROM Events
INNER JOIN Buildings on (Events.BuildingID = Buildings.ID)
WHERE
    (Events.Days % 2 = 0 OR Events.Days % 5 = 0)
This would select events happening on either a Monday (2) or a Wednesday (5).
I do have to point out, though, that storing the product of primes is expensive in both computation and storage. It is much easier to store the sum of powers of two (Mon = 1, Tues = 2, Wed = 4, Thurs = 8, Fri = 16, Sat = 32, Sun = 64).
The largest possible value for your current implementation is 510,510. The smallest data type to store such a number is int (32 bits per row) and retrieving the encoded data requires up to 7 modulo (%) operations.
The largest possible value for a 2^n summation method is 127 which can be stored in a tinyint (8 bits per row) and retrieving the encoded data would use bitwise and (&) which is somewhat cheaper (and therefore faster).
Probably not an issue for what you're working with, but it's a good habit to choose whatever method gives you the best space and performance efficiency lest you hit serious problems should your solution be implemented at larger scales.
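The two encodings discussed above can be compared side by side. The sketch below is in Python for illustration only; the prime assignment Mon = 2 through Sun = 17 is an assumption extrapolated from the Tuesday = 3 and Thursday = 7 example, and all helper names are made up.

```python
from functools import reduce

# Prime encoding from the question (assumed: Mon=2, Tue=3, ..., Sun=17).
DAY_PRIMES = {"Mon": 2, "Tue": 3, "Wed": 5, "Thu": 7, "Fri": 11, "Sat": 13, "Sun": 17}
# Bit-flag encoding from the answer: Mon=1, Tue=2, Wed=4, ..., Sun=64.
DAY_BITS = {day: 1 << i for i, day in enumerate(DAY_PRIMES)}


def encode_primes(days):
    """Multiply the primes of the given days together."""
    return reduce(lambda acc, day: acc * DAY_PRIMES[day], days, 1)


def encode_bits(days):
    """OR the bit flags of the given days together."""
    return reduce(lambda acc, day: acc | DAY_BITS[day], days, 0)


def on_day_primes(value, day):
    """A day is set when its prime divides the stored product."""
    return value % DAY_PRIMES[day] == 0


def on_day_bits(value, day):
    """A day is set when its bit is present in the stored mask."""
    return (value & DAY_BITS[day]) != 0


tue_thu_primes = encode_primes(["Tue", "Thu"])
tue_thu_bits = encode_bits(["Tue", "Thu"])
print(tue_thu_primes, on_day_primes(tue_thu_primes, "Thu"))
print(tue_thu_bits, on_day_bits(tue_thu_bits, "Mon"))
print(encode_primes(DAY_PRIMES), encode_bits(DAY_BITS))
```

With the bit-flag encoding, the equivalent SQL test for "Monday or Wednesday" is a single mask check such as (Days & 5) != 0.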