Django - Article "trend" query - sql

I'm trying to obtain a list of articles and order them by their popularity over time. For example, older articles should rank lower even if they have a higher number of views.
In order to do this each article has a view count and a posted date. I'm guessing the simplest way would be to divide the article view count by the date posted... something like:
(view_count+comment_count) / date_posted = trend_score
I'm trying to understand whether this is possible with the Django ORM, even if it takes raw SQL. Would appreciate any help.

I guess the simplest and most effective way is to add a trend_score field to your model and update it whenever the model is saved (you need to save the model anyway if it carries a view count or comment count). Then you can easily filter or order by this field. You can certainly do it in SQL somehow, but since you already have to update those counts on the model, you might as well calculate the score on save too.
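As a rough illustration of the score itself, here is a minimal Python sketch. Dividing by a raw date is not directly meaningful, so this assumes the divisor is the article's age in hours (clamped to at least one hour). The helper name trend_score and the sample numbers are hypothetical; in a Django model the result could be assigned to the trend_score field inside save().

```python
from datetime import datetime, timedelta, timezone

def trend_score(view_count, comment_count, posted_at, now=None):
    """Hypothetical helper: total activity divided by age in hours (min. 1 hour)."""
    now = now or datetime.now(timezone.utc)
    age_hours = max((now - posted_at).total_seconds() / 3600.0, 1.0)
    return (view_count + comment_count) / age_hours

# Invented sample data: an old, heavily viewed article vs. a fresh one.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
old_article = trend_score(1000, 50, now - timedelta(days=100), now)
new_article = trend_score(100, 5, now - timedelta(days=1), now)
```

With these numbers the newer article scores higher despite having far fewer views, which is the ranking behaviour the question asks for. Note that a score stored on save only changes when the row is saved, so it would need periodic recalculation to keep ageing articles sinking.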

Related

Insert zeros instead of interpolating in ARIMA_PLUS (BigQuery)

I want to do ARIMA_PLUS forecasting on a series of sale records. The problem is that the sale records only contain sales. When doing the forecast we need to insert the "non-sales" for every product, which are essentially rows with the import column set to zero for every day the product has not been sold. We have two options here:
Fill the database with those zero-rows (uses a lot of space)
When doing the forecasting with ARIMA_PLUS in BigQuery, tell the model to fill with zeros instead of interpolating (interpolation is the default and seemingly the only option).
I want to follow the second option, but I don't see how. Here is a screenshot of the documentation: Google info about interpolation
The first option would be carried out with a merge, but I would prefer to discard it since it increases the size of the sales table.
I have scanned the documentation and haven't found a solution.
You need to provide an input dataset covering the missing values with the right method for your use case.
In other words, the SQL query must solve the interpolation so that the input for the model already contains the expected data.
You can, for example, write a query that adds a linear interpolation solution for your use case.
So the first approach you mentioned can be solved in that input SQL (rather than by adding the data to the source table), and the second approach is not valid in BigQuery, as far as I know.
Here you have an example: https://justrocketscience.com/post/interpolation_sql/
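To make the "solve it in the input query" idea concrete for the zero-fill case, here is a minimal Python sketch of the logic: build every (product, day) pair and default missing amounts to 0. The function name zero_fill and the sample data are hypothetical. In BigQuery the same shape would presumably be produced inside the SELECT that feeds the model, for example by cross-joining the product list with a calendar (such as one built with GENERATE_DATE_ARRAY) and left-joining the sales, so the source table stays sparse.

```python
from datetime import date, timedelta

def zero_fill(sales, start, end):
    """sales maps (product, day) -> amount; returns one row per product per day."""
    products = sorted({product for product, _ in sales})
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Missing (product, day) pairs become explicit zero rows.
    return [(p, d, sales.get((p, d), 0)) for p in products for d in days]

# Invented sample: product A sold on Jan 1 and Jan 3, nothing on Jan 2.
sales = {("A", date(2024, 1, 1)): 5, ("A", date(2024, 1, 3)): 2}
rows = zero_fill(sales, date(2024, 1, 1), date(2024, 1, 3))
```

Here rows contains three entries for product A, with an explicit 0 for January 2.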

SQL complicated query with joins

I have a problem with one query.
Let me explain what I want:
For the sake of brevity, let's say that I have three tables:
-Offers
-Ratings
-Users
Now what I want to do is to create SQL query:
I want Offers to be listed with all their fields plus an additional temporary column, called AverageUserScore, that IS NOT stored anywhere.
AverageUserScore is the result of taking all offers belonging to a particular user, then taking all ratings belonging to those offers, and then averaging those ratings. That average is the AverageUserScore.
To explain further, I need this query for a Ruby on Rails application. In the browser, inside the application, you can see all offers of other users, with AverageUserScore at the very end as the last column.
Associations:
Offer has many ratings
Offer belongs to user
Rating belongs to offer
User has many offers
Assumptions made:
You actually have a numeric column (of any type that SQL's AVG is fine with) in your Rating model. I'm using a column ratings.rating in my examples.
AverageUserScore is unconventional, so average_user_score is better.
You don't mind not getting users that have no offers: average rating is not clearly defined for them anyway.
You don't deviate from Rails' conventions far enough to have a primary key other than id.
Displaying offers for each user is a straightforward task: in a loop of @users.each do |user|, you can do user.offers.each do |offer| and be set. The only problem here is that it will execute a separate query for every user. Not good.
The "fetching offers" part is a standard N+1 countermeasure seen even in the guides.
@users = User.includes(:offers).all
The interesting part here is only getting the averages.
For that I'm going to use Arel. It's already part of Rails, ActiveRecord is built on top of it, so you don't need to install anything extra.
You should be able to do a join like this:
User.joins(offers: :ratings)
And this won't get you anything interesting (apart from filtering users that have no offers). Inside though, you'll get a huge set of every rating joined with its corresponding offer and that offer's user. Since we're taking averages per-user we need to group by users.id, effectively making one entry per one users.id value. That is, one per user. A list of users, yes!
Let's stop for a second and make some assignments to make Arel-related code prettier. In fact, we only need two:
users = User.arel_table
ratings = Rating.arel_table
Okay. So. We need to get a list of users (all fields), and for each user fetch an average value seen on his offers' ratings' rating field. So let's compose these SQL expressions:
# users.*
user_fields = users[Arel.star] # Arel.star is a portable SQL "wildcard"
# AVG(ratings.rating) AS average_user_score
average_user_score = ratings[:rating].average.as('average_user_score')
All set. Ready for the final query:
User.includes(:offers) # N+1 counteraction
.joins(offers: :ratings) # dat join
.select(user_fields, average_user_score) # fields we need
.group(users[:id]) # grouping to only get one row per user
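For reference, the SQL this query is aiming for can be tried in isolation. The following Python sketch runs roughly equivalent SQL against an in-memory SQLite database. It is an approximation, not the exact statement Rails generates. The foreign keys offers.user_id and ratings.offer_id follow Rails conventions and are assumptions, as is the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE offers  (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE ratings (id INTEGER PRIMARY KEY, offer_id INTEGER, rating REAL);
INSERT INTO users   VALUES (1, 'ann'), (2, 'bob');
INSERT INTO offers  VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO ratings VALUES (1, 10, 4), (2, 11, 2), (3, 20, 5);
""")

# One row per user, with the average over all ratings of all of that user's offers.
scores = conn.execute("""
SELECT users.id, users.name, AVG(ratings.rating) AS average_user_score
FROM users
JOIN offers  ON offers.user_id = users.id
JOIN ratings ON ratings.offer_id = offers.id
GROUP BY users.id
ORDER BY users.id
""").fetchall()
```

Because these are inner joins, users without offers (or whose offers have no ratings) drop out of the result, matching the assumption stated earlier.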

SSRS: Dynamic number of subreports in a report

It might be that I'm searching for the wrong keywords, but so far I couldn't find anything useful.
My problem is quite simple: at the moment I get a list of individual IDs through a report parameter, pass them to a procedure and show the results.
The new request is this: instead of showing one list for all individuals at once, there should be a list for each individual ID.
Since I'm quite a beginner in SSRS, I thought the easiest approach would be the best: create a subreport, copy the shown list into it, and create one subreport per individual ID.
The number of these IDs is dynamic, so I have to create a dynamic number of subreports.
Funnily enough, this doesn't seem to be possible. This URL, http://forums.asp.net/t/1397645.aspx, doesn't show exactly the same problem, but it shows the limits of subreports.
I even went through all the MSDN pages starting at http://technet.microsoft.com/en-us/library/dd220581.aspx but I couldn't find anything there.
So is there a possibility, to create a loop like:
For each Individual ID in Individual IDs, create a subreport and pass ONE ID to this?
Or is there another approach I should use to make this work?
I tried to create a 'Fake'-Dataset with no sql query but just for iterating the id list, but it seems the dataset needs a data-source...
As usual, thanks so far for all answers!
Matthias Müller
"Or is there another approach I should use to make this work?"
You didn't provide much detail about what sort of information needs to be included in the subreport, but assuming it's a small amount of data (say, a personnel record) and not a huge amount (such as a person's sales for the last year), a List might be the way to go.
"I tried to create a 'Fake'-Dataset with no sql query but just for iterating the id list, but it seems the dataset needs a data-source..."
All datasets require a data source, though if you're merely hard-coding some fake return data, any data source will do, even a local SQL instance with nothing in it.

Selecting specific joined record from findAll() with a hasMany() include

(I tried posting this to the CFWheels Google Group (twice), but for some reason my message never appears. Is that list moderated?)
Here's my problem: I'm working on a social networking app in CF on Wheels, not too dissimilar from the one we're all familiar with in Chris Peters's awesome tutorials. In mine, though, I'm required to display the most recent status message in the user directory. I've got a User model with hasMany("statuses") and a Status model with belongsTo("user"). So here's the code I started with:
users = model("user").findAll(include="userprofile, statuses");
This of course returns one record for every status message in the statuses table. Massive overkill. So next I try:
users = model("user").findAll(include="userprofile, statuses", group="users.id");
Getting closer, but now we're getting the first status record for each user (the lowest status.id), when I want the most recent status. I think in straight SQL I would use a subquery to reorder the statuses first, but that's not available to me in the Wheels ORM. So is there another clean way to achieve this, or will I have to pull a huge query result or object containing all the statuses into my CFML and then filter them out while I loop?
You can grab the most recent status using a calculated property:
// models/User.cfc
function init() {
property(
name="mostRecentStatusMessage",
sql="SELECT message FROM statuses WHERE userid = users.id ORDER BY createdat DESC LIMIT 1"
);
}
Of course, the syntax of the SELECT statement will depend on your RDBMS, but that should get you started.
The downside is that you'll need to create a calculated property for each column that you need available in your query.
The other option is to create a method in your model and write custom SQL in <cfquery> tags. That way is perfectly valid as well.
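The subquery behind such a calculated property can be checked on its own. This Python sketch runs a correlated subquery against an in-memory SQLite database, using LIMIT 1 so that only the newest row per user is returned. The table and column names mirror the ones used above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE statuses (id INTEGER PRIMARY KEY, userid INTEGER,
                       message TEXT, createdat TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO statuses VALUES
  (1, 1, 'first',  '2024-01-01'),
  (2, 1, 'latest', '2024-01-05'),
  (3, 2, 'hello',  '2024-01-02');
""")

# One row per user; the subquery picks that user's newest status message.
latest = conn.execute("""
SELECT users.name,
       (SELECT message FROM statuses
        WHERE userid = users.id
        ORDER BY createdat DESC LIMIT 1) AS mostRecentStatusMessage
FROM users
ORDER BY users.id
""").fetchall()
```

Each user appears exactly once, so no grouping is needed and the result does not grow with the number of statuses.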
I don't know your exact DB schema, but shouldn't your findAll() look more like this:
statuses = model("status").findAll(include="userprofile(user)", where="userid = users.id");
That should get all statuses from a specific user...or is it that you need it for all users? I'm finding your question a little tricky to work out. What is it you're exactly trying to get returned?

Want an efficient approach to retrieving records from a database when the retrieval is weighted and balanced

I'm working on something incredibly unique..... a property listings website. ;)
It displays a list of properties. For each property a teaser image and some caption data are displayed. If the teaser image and caption catch a site visitor's interest, they can click on it and get a full property profile. All very standard.
The customer wants to allow property owners to add multiple teaser images and to track which teaser images get the most click-throughs. No worries there.
But they also want to allow the property owner to weight each teaser image to control how often it is shown. So for 3 images with weightings of 2, 6, 2, the 2nd image would be shown 6 out of 10 times. This needs to be balanced: if the 2nd image is shown the first 6 times, it can't be shown again until the 1st and 3rd images have been shown twice each.
So I need both to increment how often an image has been retrieved and to retrieve images in a balanced way. Forget about actual image handling, I'm really just talking about URLs.
Note that incrementing how often an image has been retrieved is a different animal from incrementing how often it has captured a click-through.
I can think of a few different ways to approach the problem using database triggers or maybe some LINQ2SQL, etc., but it strikes me that someone out there will know of a solution that could be orders of magnitude faster than what I might come up with.
My first rough idea is to have a schema like so:
TeaseImage(PropId, ImageId, ImageUrl, Weighting, RetrievedCount, PropTotalRetrievedCount)
and then
select ImageRanks.*
from (select t.PropId,
t.ImageId,
t.ImageUrl,
row_number() over (partition by t.PropId order by t.RetrievedCount) as IMG_Rank
from TeaseImage t
where t.RetrievedCount < t.Weighting) ImageRanks
where ImageRanks.IMG_Rank <= 1
And then
1. for each ImageId in the result set increment RetrievedCount by 1 and then
2. for each PropId in ResultSet increment PropTotalRetrievedCount by 1 and then
3. for each PropId in ResultSet check if PropTotalRetrievedCount ==10 and if so reset it to PropTotalRetrievedCount = 0 and RetrievedCount=0 for each associated ImageId
Which frankly sounds awful :(
So any ideas?
Note: if I have to step out of the datalayer I'd be using C# / .Net. Thanks.
If you want to do this entirely in your database, you could split your table in two:
Image(ImageId, ImageUrl)
TeaseImage(TeaseImageId, PropId, ImageId, DateLastAccessed)
The TeaseImage table manages weightings by storing additional (redundant) copies of each property-image pair. So an image with a weight of six would get six records.
Then the following query gives you the least-recently used record.
select top 1 ti.TeaseImageId, i.ImageUrl
from TeaseImage ti
join Image i
on i.ImageId = ti.ImageId
where ti.PropId = @PropId
order by ti.DateLastAccessed
Following the select, just update the record's DateLastAccessed. (Or even update it as part of the select procedure, depending on how fault-tolerant you need to be.)
Using this technique would give you fine-grained control over the order of image delivery (by seeding the DateLastAccessed values appropriately), and you could easily modify the ratios if need be.
Of course, as the table grows, the additional records would degrade query performance earlier than other approaches, but depending on the cost of the query relative to everything else that's going on that may not be significant.
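The rotation can be simulated in a few lines of Python to check that it stays balanced. This is only an in-memory sketch of the idea, not database code: each slot stands for one TeaseImage row, an integer counter stands in for DateLastAccessed, and the names build_slots and next_image are hypothetical.

```python
from collections import Counter
from itertools import count

def build_slots(weights):
    """One slot per unit of weight: [image_id, last_accessed]."""
    return [[image_id, 0] for image_id, w in weights.items() for _ in range(w)]

def next_image(slots, clock):
    """Pick the least-recently used slot, stamp it, and return its image."""
    slot = min(slots, key=lambda s: s[1])
    slot[1] = next(clock)  # stands in for updating DateLastAccessed
    return slot[0]

clock = count(1)
slots = build_slots({"img1": 2, "img2": 6, "img3": 2})
shown = Counter(next_image(slots, clock) for _ in range(10))
```

Over any full cycle of 10 retrievals every slot is used exactly once, so the images are shown 2, 6 and 2 times. With all slots seeded to the same value the images come out in runs (img1 twice, then img2 six times, then img3 twice); seeding the initial values differently would interleave them.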