Specific complex SQL query and Django ORM? - sql

I have a set of tables that contain content that is created and voted on by users.
Table content_a
id /* the id of the content */
user_id /* the user that contributed the content */
content /* the content */
Table content_b
id
user_id
content
Table content_c
id
user_id
content
Table voting
user_id /* the user that made the vote */
content_id /* the content the vote was made on */
content_type_id /* the content type the vote was made on */
vote /* the value of the vote, either +1 or -1 */
I want to be able to select a set of users and order them by the sum of the votes on the content they have produced. For example,
SELECT * FROM users ORDER BY <sum of votes on all content associated with user>
Is there a specific way this can be achieved using Django's ORM, or do I have to use a raw SQL query? And what would the most efficient way be to achieve this in raw SQL?

Update
Assuming the models are
# Old Django API (before 1.9); newer versions import GenericForeignKey from
# django.contrib.contenttypes.fields and require on_delete on ForeignKey.
from django.contrib.auth.models import User
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models
class ContentA(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()
class ContentB(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()
class ContentC(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()
class GenericVote(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()
    user = models.ForeignKey(User)
    vote = models.IntegerField(default=1)
Option A. Using GenericVote
GenericVote.objects.extra(select={'uid': """
    CASE
    WHEN content_type_id = {ct_a} THEN (SELECT user_id FROM {ContentA._meta.db_table} WHERE id = object_id)
    WHEN content_type_id = {ct_b} THEN (SELECT user_id FROM {ContentB._meta.db_table} WHERE id = object_id)
    WHEN content_type_id = {ct_c} THEN (SELECT user_id FROM {ContentC._meta.db_table} WHERE id = object_id)
    END""".format(
        ct_a=ContentType.objects.get_for_model(ContentA).pk,
        ct_b=ContentType.objects.get_for_model(ContentB).pk,
        ct_c=ContentType.objects.get_for_model(ContentC).pk,
        ContentA=ContentA,
        ContentB=ContentB,
        ContentC=ContentC
    )}).values('uid').annotate(vc=models.Sum('vote')).order_by('-vc')
The above ValuesQuerySet (or use values_list()) gives you a sequence of User IDs in descending order of vote count. You could then use it to fetch the top users.
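The CASE expression in Option A maps each vote to the owner of the voted-on content, and the aggregate then groups by that owner. A minimal sketch of the same SQL, run with Python's built-in sqlite3 module against an in-memory copy of the question's sample data; the table names and the content-type ids 1, 2 and 3 are assumptions standing in for the ContentType primary keys that Option A looks up at runtime:

```python
import sqlite3

# In-memory stand-in for the real tables; content types 1, 2, 3 are assumed
# to correspond to ContentA, ContentB and ContentC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_a (id INTEGER, user_id INTEGER);
CREATE TABLE content_b (id INTEGER, user_id INTEGER);
CREATE TABLE content_c (id INTEGER, user_id INTEGER);
CREATE TABLE voting (user_id INTEGER, content_id INTEGER,
                     content_type_id INTEGER, vote INTEGER);
INSERT INTO content_a VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO content_b VALUES (1, 2), (2, 2), (3, 2);
INSERT INTO content_c VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO voting VALUES
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1);
""")

# The CASE picks the owning table for each vote; each correlated subquery
# returns the user_id of the content row the vote points at.
rows = conn.execute("""
    SELECT CASE content_type_id
             WHEN 1 THEN (SELECT user_id FROM content_a WHERE id = voting.content_id)
             WHEN 2 THEN (SELECT user_id FROM content_b WHERE id = voting.content_id)
             WHEN 3 THEN (SELECT user_id FROM content_c WHERE id = voting.content_id)
           END AS uid,
           SUM(vote) AS vc
    FROM voting
    GROUP BY uid
    ORDER BY vc DESC
""").fetchall()
print(rows)  # [(2, 3), (1, 2), (3, -1)]
```

Each pair is (owner user id, net votes received), highest first, which is the shape of the ValuesQuerySet described above.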
Option B. Using User.objects.raw
Using User.objects.raw, I got almost the same query as the answer given by forsvarir:
User.objects.raw("""
    SELECT "{user_tbl}".*, SUM("gv"."vc") AS vote_count FROM {user_tbl},
    (SELECT id, user_id, {ct_a} AS ct FROM {ContentA._meta.db_table} UNION
     SELECT id, user_id, {ct_b} AS ct FROM {ContentB._meta.db_table} UNION
     SELECT id, user_id, {ct_c} AS ct FROM {ContentC._meta.db_table}
    ) AS c,
    (SELECT content_type_id, object_id, SUM("vote") AS vc FROM {GenericVote._meta.db_table} GROUP BY content_type_id, object_id) AS gv
    WHERE {user_tbl}.id = c.user_id
    AND gv.content_type_id = c.ct
    AND gv.object_id = c.id
    GROUP BY {user_tbl}.id
    ORDER BY vote_count DESC""".format(
    user_tbl=User._meta.db_table, ContentA=ContentA, ContentB=ContentB,
    ContentC=ContentC, GenericVote=GenericVote,
    ct_a=ContentType.objects.get_for_model(ContentA).pk,
    ct_b=ContentType.objects.get_for_model(ContentB).pk,
    ct_c=ContentType.objects.get_for_model(ContentC).pk
))
Option C. Other possible ways
Denormalize vote_count onto the User model, a profile model (for example, UserProfile) or another related model, as suggested by Michael Dunn. This performs much better if you read vote_count on the fly frequently.
Build a database view that does the UNIONs for you, then map a model to it; this could make constructing the query easier.
Sort in Python; this is usually the best way to work with large-scale data, because of the dozens of toolkits and ways to extend it.
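For the database-view option, a minimal sketch in SQLite via Python's sqlite3 module: the view (named content_owner here, a hypothetical name) does the UNION once, so the ranking query becomes a plain join. In Django, a model whose Meta sets `managed = False` and `db_table` to the view name could then be mapped onto it; that mapping is not shown, and the raw table names are the question's, not the Django models':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_a (id INTEGER, user_id INTEGER);
CREATE TABLE content_b (id INTEGER, user_id INTEGER);
CREATE TABLE content_c (id INTEGER, user_id INTEGER);
CREATE TABLE voting (user_id INTEGER, content_id INTEGER,
                     content_type_id INTEGER, vote INTEGER);
-- The view hides the UNION: one row per piece of content with its owner.
CREATE VIEW content_owner AS
    SELECT id, 1 AS content_type_id, user_id FROM content_a
    UNION ALL SELECT id, 2, user_id FROM content_b
    UNION ALL SELECT id, 3, user_id FROM content_c;
INSERT INTO content_a VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO content_b VALUES (1, 2), (2, 2), (3, 2);
INSERT INTO content_c VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO voting VALUES
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1);
""")

# With the view in place, ranking owners is a single join plus GROUP BY.
rows = conn.execute("""
    SELECT o.user_id, SUM(v.vote) AS vote_count
    FROM voting AS v
    JOIN content_owner AS o
      ON o.id = v.content_id AND o.content_type_id = v.content_type_id
    GROUP BY o.user_id
    ORDER BY vote_count DESC
""").fetchall()
print(rows)  # [(2, 3), (1, 2), (3, -1)]
```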
You need Django models mapping those tables before you can query them with the Django ORM. Assuming there are User and Voting models matching the users and voting tables, you could then
User.objects.annotate(v=models.Sum('voting__vote')).order_by('v')

For a raw SQL solution, I've created a rough replication of your problem on ideone.
Data setup:
create table content_a(id int, user_id int, content varchar(20));
create table content_b(id int, user_id int, content varchar(20));
create table content_c(id int, user_id int, content varchar(20));
create table voting(user_id int, content_id int, content_type_id int, vote int);
create table users(id int, name varchar(20));
insert into content_a values(1,1,'aaaa');
insert into content_a values(2,1,'bbbb');
insert into content_a values(3,1,'cccc');
insert into content_b values(1,2,'dddd');
insert into content_b values(2,2,'eeee');
insert into content_b values(3,2,'ffff');
insert into content_c values(1,1,'gggg');
insert into content_c values(2,2,'hhhh');
insert into content_c values(3,3,'iiii');
insert into users values(1, 'first');
insert into users values(2, 'second');
insert into users values(3, 'third');
insert into users values(4, 'voteonly');
-- user 1 net votes (2)
insert into voting values (1, 1, 1, 1);
insert into voting values (2, 3, 1, -1);
insert into voting values (3, 1, 1, 1);
insert into voting values (4, 2, 1, 1);
-- user 2 net votes (3)
insert into voting values (1, 2, 2, 1);
insert into voting values (1, 1, 2, 1);
insert into voting values (2, 3, 2, -1);
insert into voting values (4, 2, 2, 1);
insert into voting values (4, 2, 3, 1);
-- user 3 net votes (-1)
insert into voting values (2, 3, 3, -1);
I've basically assumed that content_a has a type of 1, content_b a type of 2 and content_c a type of 3. Using raw SQL, there seem to be two obvious approaches. The first is to union all of the content together, then join it with the users and voting tables. I've tested this approach below.
select users.*, sum(voting.vote)
from users,
voting, (
SELECT id, 1 AS content_type_id, user_id
FROM content_a
UNION
SELECT id, 2 AS content_type_id, user_id
FROM content_b
UNION
SELECT id, 3 AS content_type_id, user_id
FROM content_c) contents
where contents.user_id = users.id
and voting.content_id = contents.id
and voting.content_type_id = contents.content_type_id
group by users.id
order by sum(voting.vote) desc;
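The union approach above can be checked quickly with Python's sqlite3 module and the same sample data. This sketch swaps in UNION ALL (the literal content-type column already keeps rows from different tables apart, so UNION's duplicate elimination is only extra work) and explicit JOIN syntax; the expected totals are the net votes noted in the data setup:

```python
import sqlite3

# In-memory copy of the sample data (the content text column is left out
# because the query never reads it).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE content_a (id INTEGER, user_id INTEGER);
CREATE TABLE content_b (id INTEGER, user_id INTEGER);
CREATE TABLE content_c (id INTEGER, user_id INTEGER);
CREATE TABLE voting (user_id INTEGER, content_id INTEGER,
                     content_type_id INTEGER, vote INTEGER);
INSERT INTO users VALUES (1, 'first'), (2, 'second'), (3, 'third'), (4, 'voteonly');
INSERT INTO content_a VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO content_b VALUES (1, 2), (2, 2), (3, 2);
INSERT INTO content_c VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO voting VALUES
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1);
""")

rows = conn.execute("""
    SELECT users.id, users.name, SUM(voting.vote) AS total
    FROM users
    JOIN (SELECT id, 1 AS content_type_id, user_id FROM content_a
          UNION ALL
          SELECT id, 2, user_id FROM content_b
          UNION ALL
          SELECT id, 3, user_id FROM content_c) AS contents
      ON contents.user_id = users.id
    JOIN voting
      ON voting.content_id = contents.id
     AND voting.content_type_id = contents.content_type_id
    GROUP BY users.id, users.name
    ORDER BY total DESC
""").fetchall()
print(rows)  # [(2, 'second', 3), (1, 'first', 2), (3, 'third', -1)]
```

User 4 ('voteonly') does not appear because the inner joins drop users with no content; LEFT JOINs plus COALESCE(total, 0) would be needed to list such users with a zero.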
The alternative would seem to be to outer join the content tables to the voting table, without the union step. This may be more performant, but I haven't been able to test it because Visual Studio keeps rewriting my SQL for me. I'd expect the SQL to look something like this (untested, using Oracle's (+) outer-join notation):
select users.*, sum(voting.vote)
from users, voting, content_a, content_b, content_c
where users.id = content_a.user_id (+)
and users.id = content_b.user_id (+)
and users.id = content_c.user_id (+)
and ((content_a.id = voting.content_id and voting.content_type_id = 1) OR
(content_b.id = voting.content_id and voting.content_type_id = 2) OR
(content_c.id = voting.content_id and voting.content_type_id = 3))
group by users.id
order by sum(voting.vote) desc;
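Joining all three content tables to users at once, as in the untested query above, multiplies rows: a vote on one content row gets counted once for every combination of that user's rows in the other two tables, inflating the sums. A sketch of a different no-UNION variant (not the query above) that starts from voting instead, so each vote matches at most one content row; it uses standard LEFT JOIN syntax rather than (+) and runs here in SQLite via Python's sqlite3 on the same sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE content_a (id INTEGER, user_id INTEGER);
CREATE TABLE content_b (id INTEGER, user_id INTEGER);
CREATE TABLE content_c (id INTEGER, user_id INTEGER);
CREATE TABLE voting (user_id INTEGER, content_id INTEGER,
                     content_type_id INTEGER, vote INTEGER);
INSERT INTO users VALUES (1, 'first'), (2, 'second'), (3, 'third'), (4, 'voteonly');
INSERT INTO content_a VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO content_b VALUES (1, 2), (2, 2), (3, 2);
INSERT INTO content_c VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO voting VALUES
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1);
""")

# Each LEFT JOIN can only match when the vote's type selects that table,
# so at most one of a/b/c is non-NULL per vote; COALESCE picks its owner.
rows = conn.execute("""
    SELECT u.id, SUM(v.vote) AS total
    FROM voting AS v
    LEFT JOIN content_a AS a ON v.content_type_id = 1 AND a.id = v.content_id
    LEFT JOIN content_b AS b ON v.content_type_id = 2 AND b.id = v.content_id
    LEFT JOIN content_c AS c ON v.content_type_id = 3 AND c.id = v.content_id
    JOIN users AS u ON u.id = COALESCE(a.user_id, b.user_id, c.user_id)
    GROUP BY u.id
    ORDER BY total DESC
""").fetchall()
print(rows)  # [(2, 3), (1, 2), (3, -1)]
```

Whether this beats the UNION version depends on the engine and indexes (an index on each content table's id helps both); measuring on real data is the only reliable check.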

I would do this using precalculated values. First make a separate table to store the votes that each user has received:
from django.contrib.auth.models import User
from django.db import models
class VotesReceived(models.Model):
    user = models.OneToOneField(User, primary_key=True)
    count = models.IntegerField(default=0, editable=False)
then use a post_save signal to update the count every time a vote is made:
def update_votes_received(sender, instance, **kwargs):
    # `instance` is a Voting object
    # assuming here that `instance.content.user` is the creator of the content
    vr, _ = VotesReceived.objects.get_or_create(user=instance.content.user)
    # you should recount the votes here rather than just incrementing the count
    # (post_save also fires when an existing vote is edited)
    vr.count += instance.vote  # add the vote's value (+1 or -1), not always 1
    vr.save()
models.signals.post_save.connect(update_votes_received, sender=Voting)
Usage:
user = User.objects.get(id=1)
print(user.votesreceived.count)
If you already have data in your database you'd have to update the vote counts manually the first time of course.
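The same precalculated-count idea can also live in the database instead of a Django signal. A minimal sketch with a SQLite trigger, run through Python's sqlite3; this swaps the signal for a trigger, and the votes_received table, content_owner view and trigger name are hypothetical. Only INSERTs are handled; updated or deleted votes would need their own triggers, or a recount as the comment in the signal suggests:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_a (id INTEGER, user_id INTEGER);
CREATE TABLE content_b (id INTEGER, user_id INTEGER);
CREATE TABLE content_c (id INTEGER, user_id INTEGER);
CREATE TABLE voting (user_id INTEGER, content_id INTEGER,
                     content_type_id INTEGER, vote INTEGER);
-- Which user owns each (content_type_id, id) pair.
CREATE VIEW content_owner AS
    SELECT id, 1 AS content_type_id, user_id FROM content_a
    UNION ALL SELECT id, 2, user_id FROM content_b
    UNION ALL SELECT id, 3, user_id FROM content_c;
-- Precalculated totals, one row per content owner.
CREATE TABLE votes_received (user_id INTEGER PRIMARY KEY,
                             vote_count INTEGER NOT NULL DEFAULT 0);
CREATE TRIGGER voting_after_insert AFTER INSERT ON voting
BEGIN
    -- Create the owner's row on the first vote, then add the vote's value.
    INSERT OR IGNORE INTO votes_received (user_id, vote_count)
        SELECT user_id, 0 FROM content_owner
        WHERE id = NEW.content_id AND content_type_id = NEW.content_type_id;
    UPDATE votes_received
        SET vote_count = vote_count + NEW.vote
        WHERE user_id = (SELECT user_id FROM content_owner
                         WHERE id = NEW.content_id
                           AND content_type_id = NEW.content_type_id);
END;
INSERT INTO content_a VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO content_b VALUES (1, 2), (2, 2), (3, 2);
INSERT INTO content_c VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO voting VALUES
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1);
""")

totals = conn.execute(
    "SELECT user_id, vote_count FROM votes_received ORDER BY vote_count DESC"
).fetchall()
print(totals)  # [(2, 3), (1, 2), (3, -1)]

# A new +1 vote on content_c id 3 (owned by user 3) updates the total at once.
conn.execute("INSERT INTO voting VALUES (4, 3, 3, 1)")
(user3_total,) = conn.execute(
    "SELECT vote_count FROM votes_received WHERE user_id = 3").fetchone()
print(user3_total)  # 0
```

Reading the ranking is then a primary-key-sized scan of votes_received, at the cost of extra work on every vote insert.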

Related

Query database for distinct values and aggregate data based on condition

I am trying to extract distinct items from a Postgres database pairing a column from a table with a column from another table based on a condition. Simplified version looks like this:
CREATE TABLE users
(
id SERIAL PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE photos
(
id INT PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
flag VARCHAR(255)
);
INSERT INTO users VALUES (1, 'Bob');
INSERT INTO users VALUES (2, 'Alice');
INSERT INTO users VALUES (3, 'John');
INSERT INTO photos VALUES (1001, 1, 'a');
INSERT INTO photos VALUES (1002, 1, 'b');
INSERT INTO photos VALUES (1003, 1, 'c');
INSERT INTO photos VALUES (1004, 2, 'a');
INSERT INTO photos VALUES (1005, 2, 'x');
What I need is to extract each user name, only once, and a flag value for each of them. The flag value should prioritize a specific one, let's say b. So, the result should look like:
Bob b
Alice a
Here Bob owns a photo with the b flag, Alice does not, and John has no photos. For Alice the output flag value is not important (a or x would be just as good), as long as she owns no photo flagged b.
The closest thing I found were some self-join queries where the flag value would be aggregated using min() or max(), but I am looking for a particular value, which is neither the first nor the last. I also found out that you can define your own aggregate functions, but I wonder if there is an easier way to condition the query to obtain the required data.
Thank you!
Here is a method with aggregation:
select u.name,
coalesce(max(flag) filter (where flag = 'b'),
min(flag)
) as flag
from users u left join
photos p
on u.id = p.user_id
group by u.id, u.name;
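FILTER on an aggregate is PostgreSQL syntax (9.4 and later). On engines without it, a CASE inside the aggregate does the same job. A minimal sketch with Python's sqlite3, using the question's data with the second Alice photo given id 1005 so the primary key stays unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY,
                     user_id INTEGER REFERENCES users(id), flag TEXT);
INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'John');
INSERT INTO photos VALUES
    (1001, 1, 'a'), (1002, 1, 'b'), (1003, 1, 'c'),
    (1004, 2, 'a'), (1005, 2, 'x');
""")

# MAX(CASE ...) is non-NULL only when the user has a 'b' photo;
# otherwise COALESCE falls back to any flag (here the smallest).
rows = conn.execute("""
    SELECT u.name,
           COALESCE(MAX(CASE WHEN p.flag = 'b' THEN p.flag END),
                    MIN(p.flag)) AS flag
    FROM users AS u
    LEFT JOIN photos AS p ON u.id = p.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()
print(rows)  # [('Bob', 'b'), ('Alice', 'a'), ('John', None)]
```

John comes back with None because of the LEFT JOIN; an inner join would drop him, matching the expected output in the question.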
That said, a more typical method would be a prioritization query. Perhaps:
select distinct on (u.id) u.name, p.flag
from users u left join
photos p
on u.id = p.user_id
order by u.id, (p.flag = 'b') desc;
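DISTINCT ON is also PostgreSQL-specific. Where window functions are available, ROW_NUMBER() gives the same prioritization: number each user's photos with b first and keep row 1. A sketch in SQLite (window functions need SQLite 3.25 or later) via Python's sqlite3; the secondary sort on flag is an added assumption that makes the pick for Alice deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY,
                     user_id INTEGER REFERENCES users(id), flag TEXT);
INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'John');
INSERT INTO photos VALUES
    (1001, 1, 'a'), (1002, 1, 'b'), (1003, 1, 'c'),
    (1004, 2, 'a'), (1005, 2, 'x');
""")

# Within each user, 'b' sorts first (0), everything else after (1);
# rn = 1 then keeps one row per user, like DISTINCT ON (u.id).
rows = conn.execute("""
    SELECT name, flag
    FROM (SELECT u.id, u.name, p.flag,
                 ROW_NUMBER() OVER (
                     PARTITION BY u.id
                     ORDER BY CASE WHEN p.flag = 'b' THEN 0 ELSE 1 END, p.flag
                 ) AS rn
          FROM users AS u
          LEFT JOIN photos AS p ON u.id = p.user_id) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)  # [('Bob', 'b'), ('Alice', 'a'), ('John', None)]
```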

SQL Limit number of references to another table without locking

Is there a technique to avoid locking a row but still be able to limit the number of rows in another table that reference it?
For example:
create table accounts (
id integer,
name varchar,
max_users integer
);
create table users (
id integer,
account_id integer,
email varchar
);
Suppose I want to limit the number of users in an account to the max_users value in accounts. Is there a way to ensure that concurrent calls won't create more users than permitted without locking the account row?
Something like this doesn't work, because two concurrent transactions can both see the select count(*)... check as true when the count is one below the limit, and both then insert:
begin;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
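Why the first version can overshoot: both transactions evaluate the count before either insert is visible to the other. A toy simulation in plain Python, assuming read-committed-style visibility (each transaction sees only committed rows) and a hypothetical account with max_users = 2 that already has one user:

```python
MAX_USERS = 2
committed = ["alice@abc.com"]  # hypothetical existing user of account 1


def check_then_insert(visible_rows, email, pending):
    """The INSERT ... SELECT ... WHERE count(*) < max_users pattern: the
    count only sees rows visible to this transaction."""
    if len(visible_rows) < MAX_USERS:
        pending.append(email)


# Interleaving: both transactions read the table before either commits.
t1_view, t2_view = list(committed), list(committed)
t1_pending, t2_pending = [], []
check_then_insert(t1_view, "john@abc.com", t1_pending)  # sees 1 < 2
check_then_insert(t2_view, "jane@abc.com", t2_pending)  # also sees 1 < 2
committed += t1_pending + t2_pending  # both commit
print(len(committed), "users, limit", MAX_USERS)  # 3 users, limit 2
```

Neither check is wrong on its own; the problem is that nothing serializes the read with the other transaction's write, which is what the row lock in the next version provides.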
And the following works, but I'm having performance issues that are mostly due to transactions waiting for locks:
begin;
select id from accounts where id = 1 for update;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
EDIT: Bonus question: what if the value is not stored in the database, but is something you can set dynamically?

SQL select join the row with max (arithmetic(value1, value2))

I am trying to make a trade system where people can make offers on the items they want. There are two currencies in the system, gold and silver: 100 silver = 1 gold. Note that people can make offers at the same price as others, so there could be duplicate highest offers.
Table structure looks roughly like this
Trade table
ID
TradeOffer table
ID
UserID
TradeID references Trade(ID)
GoldOffer
SilverOffer
I want to display to the user a list of trades sorted by the highest offer price whenever they do a search with constraint.
The ideal output would be similar to this:
Trade.ID TradeOffer.ID HighestGoldOffer HighestSilverOffer UserID
where HighestGoldOffer and HighestSilverOffer are the value of GoldOffer and SilverOffer column of the Offer with highest (GoldOffer * 100 + SilverOffer) and UserID is the user who made the offer
I know I can run 2 separate queries, one to retrieve all the Trades that satisfies all the constraint and extract all the ID to run another query to get the highest offer, but I am a perfectionist so I would prefer to do it with one sql instead of two.
I could just select all offers where (GoldOffer * 100 + SilverOffer) = MAX(GoldOffer * 100 + SilverOffer), but this could return duplicate Trades if multiple people offered the same price. Also, nobody may have made an offer on a Trade yet, in which case GoldOffer and SilverOffer will be empty; I would still like to show such a Trade as having no offer.
Hope I made myself clear and thanks for any help
Model and test data
CREATE TABLE Trade (ID INT)
CREATE TABLE TradeOffer
(
ID INT,
UserID INT,
TradeID INT,
GoldOffer INT,
SilverOffer INT
)
INSERT Trade VALUES (1), (2), (3)
INSERT TradeOffer VALUES
(1, 1, 1, 10, 15),
(2, 2, 1, 11, 15),
(3, 1, 2, 10, 16),
(4, 2, 2, 10, 16)
Query
SELECT
[TradeID],
[TradeOfferID],
[HighestGoldOffer],
[HighestSilverOffer],
[UserID]
FROM (
SELECT
t.ID AS [TradeID],
tOffer.ID AS [TradeOfferID],
tOffer.GoldOffer AS [HighestGoldOffer],
tOffer.SilverOffer AS [HighestSilverOffer],
tOffer.[UserID],
ROW_NUMBER() OVER (
PARTITION BY t.ID
ORDER BY (([GoldOffer] * 100) + [SilverOffer]) DESC, tOffer.ID
) AS [Rank]
FROM Trade t
LEFT JOIN TradeOffer tOffer
ON tOffer.TradeID = t.ID
) x
WHERE [Rank] = 1
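Tied offers share a rank under RANK(), so trade 2, which has two offers of 10 gold 16 silver, comes back twice; ROW_NUMBER() with a tiebreaker (the offer ID, an assumption about which tie should win) returns exactly one row per trade. A sketch comparing the two on the test data, in SQLite via Python's sqlite3 (window functions need SQLite 3.25+). The LEFT JOIN keeps trade 3, which has no offers, as a row of NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trade (ID INTEGER);
CREATE TABLE TradeOffer (ID INTEGER, UserID INTEGER, TradeID INTEGER,
                         GoldOffer INTEGER, SilverOffer INTEGER);
INSERT INTO Trade VALUES (1), (2), (3);
INSERT INTO TradeOffer VALUES
    (1, 1, 1, 10, 15), (2, 2, 1, 11, 15), (3, 1, 2, 10, 16), (4, 2, 2, 10, 16);
""")

# Offers are ranked per trade by total value in silver (100 silver = 1 gold).
# Only trusted, fixed fragments are formatted into the SQL text.
TEMPLATE = """
    SELECT TradeID, TradeOfferID, GoldOffer, SilverOffer, UserID FROM (
        SELECT t.ID AS TradeID, o.ID AS TradeOfferID,
               o.GoldOffer, o.SilverOffer, o.UserID,
               {window} OVER (PARTITION BY t.ID
                              ORDER BY o.GoldOffer * 100 + o.SilverOffer DESC{tiebreak}) AS pos
        FROM Trade AS t
        LEFT JOIN TradeOffer AS o ON o.TradeID = t.ID
    ) AS x
    WHERE pos = 1
    ORDER BY TradeID, TradeOfferID
"""
ranked = conn.execute(TEMPLATE.format(window="RANK()", tiebreak="")).fetchall()
numbered = conn.execute(
    TEMPLATE.format(window="ROW_NUMBER()", tiebreak=", o.ID")).fetchall()
print(len(ranked), len(numbered))  # 4 3
print(numbered)
# [(1, 2, 11, 15, 2), (2, 3, 10, 16, 1), (3, None, None, None, None)]
```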

Join Query in Sql server

I am having trouble with a join in sql.
I have 3 tables.
1: Lists the user details
2: Lists the permissions the user group has
3: Lists the page that that group can access
Table1 users :
****************************************
username | group
****************************************
admin | administrator
Table2 groups :
*********************************************
user_group | create | view | system_admin
*********************************************
administrator | 1 | 0 | 1
Table3 urls:
*********************************************
create | view | system_admin
*********************************************
create.php | view.php | system.php
(apologies for my table drawing)
Via PHP, I grab the user_group the user belongs to.
I then need to check whether they have access to the page they have just hit, or redirect them back.
Can I accomplish this with a join using the current table layout, or should I redesign these tables, since they are not intuitive for this kind of thing?
I actually might redesign the tables to make them easier to query:
create table users
(
id int,
username varchar(10),
groupid int
);
insert into users values (1, 'admin', 1);
create table groups
(
groupid int,
groupname varchar(20)
);
insert into groups values (1, 'administrator');
create table permissions
(
permissionid int,
permissionname varchar(20)
);
insert into permissions values (1, 'create');
insert into permissions values (2, 'view');
insert into permissions values (3, 'system_admin');
create table urls
(
urlid int,
name varchar(10)
);
insert into urls values(1, 'create.php');
insert into urls values(2, 'view.php');
insert into urls values(3, 'system.php');
create table group_permission_urls
(
groupid int,
permissionid int,
urlid int
);
insert into group_permission_urls values(1, 1, 1);
insert into group_permission_urls values(1, 0, 2);
insert into group_permission_urls values(1, 3, 3);
Then your query would be similar to this:
select *
from users us
left join groups g
on us.groupid = g.groupid
left join group_permission_urls gpu
on us.groupid = gpu.groupid
left join permissions p
on gpu.permissionid = p.permissionid
left join urls u
on gpu.urlid = u.urlid
see SQL Fiddle with Demo
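One way to turn the redesigned schema into a yes/no access check, sketched with Python's sqlite3 and bound parameters instead of interpolating the page name into the SQL. The can_access helper is hypothetical, and the groups table is left out because the check never needs the group's name. Inner joins mean a grant row only counts if it names an existing permission, so the (1, 0, 2) row, whose permissionid 0 matches nothing, grants no access to view.php:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, username TEXT, groupid INTEGER);
CREATE TABLE permissions (permissionid INTEGER, permissionname TEXT);
CREATE TABLE urls (urlid INTEGER, name TEXT);
CREATE TABLE group_permission_urls (groupid INTEGER, permissionid INTEGER,
                                    urlid INTEGER);
INSERT INTO users VALUES (1, 'admin', 1);
INSERT INTO permissions VALUES (1, 'create'), (2, 'view'), (3, 'system_admin');
INSERT INTO urls VALUES (1, 'create.php'), (2, 'view.php'), (3, 'system.php');
INSERT INTO group_permission_urls VALUES (1, 1, 1), (1, 0, 2), (1, 3, 3);
""")


def can_access(username, page):
    """Hypothetical helper: True if the user's group holds a real permission
    on the given page. The ? placeholders keep page names out of the SQL."""
    (allowed,) = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM users AS us
            JOIN group_permission_urls AS gpu ON gpu.groupid = us.groupid
            JOIN permissions AS p ON p.permissionid = gpu.permissionid
            JOIN urls AS u ON u.urlid = gpu.urlid
            WHERE us.username = ? AND u.name = ?)
    """, (username, page)).fetchone()
    return bool(allowed)


print(can_access("admin", "create.php"), can_access("admin", "view.php"))
# True False
```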
By comparing $current_page with the results of an IN() subquery, you can do this in one query. If the page matches any value listed in a column the user has permission for, the query returns a row; if there is no match in an allowed column, it returns no rows.
SELECT
groups.[create],
groups.[view],
groups.system_admin,
1 AS can_access
FROM
users
/* [brackets] quote column names that are reserved words in SQL Server */
JOIN groups ON users.[group] = groups.user_group
WHERE
users.username = '$some_username'
AND (
/* Substitute the current page. Better, use a prepared statement placeholder if your API supports it */
(groups.[create] = 1 AND '$current_page' IN (SELECT DISTINCT [create] FROM urls))
OR
(groups.[view] = 1 AND '$current_page' IN (SELECT DISTINCT [view] FROM urls))
OR
(groups.system_admin = 1 AND '$current_page' IN (SELECT DISTINCT system_admin FROM urls))
)
This works by comparing the $current_page to the distinct set of possible values from each of your 3 columns. If it matches a column and also the user's group has permission on that type, a row is returned.
select case when count(1) > 0 then 'come in' else 'go away' end
from users, groups, urls
where
users.username = '$username' and
users.[group] = groups.user_group and
((urls.[create] = '$url' and groups.[create] = 1) or
(urls.[view] = '$url' and groups.[view] = 1) or
(urls.system_admin = '$url' and groups.system_admin = 1))
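Both queries above target the original layout. A sketch of the same check in SQLite via Python's sqlite3, with named parameters in place of the interpolated '$username' and '$url' strings. The column names create, view and group are SQL keywords, so they are quoted; SQLite accepts double quotes, while SQL Server uses [brackets] (or double quotes with QUOTED_IDENTIFIER on). The can_access_page helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (username TEXT, "group" TEXT);
CREATE TABLE "groups" (user_group TEXT, "create" INTEGER, "view" INTEGER,
                       system_admin INTEGER);
CREATE TABLE urls ("create" TEXT, "view" TEXT, system_admin TEXT);
INSERT INTO users VALUES ('admin', 'administrator');
INSERT INTO "groups" VALUES ('administrator', 1, 0, 1);
INSERT INTO urls VALUES ('create.php', 'view.php', 'system.php');
""")


def can_access_page(username, page):
    # A page is allowed if it sits in a urls column whose matching
    # permission flag is 1 for the user's group.
    (allowed,) = conn.execute("""
        SELECT COUNT(*) > 0
        FROM users
        JOIN "groups" AS g ON g.user_group = users."group"
        CROSS JOIN urls
        WHERE users.username = :username
          AND ((g."create" = 1 AND urls."create" = :page)
            OR (g."view" = 1 AND urls."view" = :page)
            OR (g.system_admin = 1 AND urls.system_admin = :page))
    """, {"username": username, "page": page}).fetchone()
    return bool(allowed)


print(can_access_page("admin", "create.php"), can_access_page("admin", "view.php"))
# True False
```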

SQL - how to efficiently select distinct records

I've got a very performance-sensitive SQL Server DB. I need to write an efficient select for the following problem:
I've got a simple table with 4 fields:
ID [int, PK]
UserID [int, FK]
Active [bit]
GroupID [int, FK]
Each UserID can appear several times with a GroupID (and in several groupIDs) with Active='false' but only once with Active='true'.
Such as:
(id,userid,active,groupid)
1,2,false,10
2,2,false,10
3,2,false,10
4,2,true,10
I need to select all the distinct users in a certain group, along with each user's last active state. If the user has an active row, the query shouldn't return an inactive row for that user, even if the user was inactive at some point in time.
The naive solution would be a double select: one to select all the active users, and then one to select all the inactive users who don't appear in the first select (because each user could have had an inactive state at some point in time). But this would run the first select (with the active users) twice, which I want to avoid.
Is there any smart way to make only one select to get the needed query? Ideas?
Many thanks in advance!
What about a view such as this:
create view ACTIVE as select * from USERS where Active = 1
Then just one select from that view will be sufficient:
select user from ACTIVE where ID ....
Try this:
Select
ug.GroupId,
ug.UserId,
max(cast(ug.Active as int)) LastState
from
UserGroup ug
group by
ug.GroupId,
ug.UserId
If the Active field is set to 1 on any row for a user/group combination, you will get 1 as the last state; otherwise you will get 0.
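A quick run of the MAX() grouping in SQLite via Python's sqlite3, on the question's sample rows plus a hypothetical user 3 who was never active. SQLite has no bit type, so Active is stored as 0/1 integers here; SQL Server rejects MAX() on a bit column, so there the column needs a cast to int first. MAX() reports whether the user has any active row, which matches the question's rule that an active row wins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserGroup (ID INTEGER PRIMARY KEY, UserID INTEGER,
                        Active INTEGER, GroupID INTEGER);
INSERT INTO UserGroup VALUES
    (1, 2, 0, 10), (2, 2, 0, 10), (3, 2, 0, 10), (4, 2, 1, 10),
    (5, 3, 0, 10);
""")

# One row per (group, user); 1 if any row for that pair is active.
rows = conn.execute("""
    SELECT GroupID, UserID, MAX(Active) AS LastState
    FROM UserGroup
    GROUP BY GroupID, UserID
    ORDER BY GroupID, UserID
""").fetchall()
print(rows)  # [(10, 2, 1), (10, 3, 0)]
```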
I'm not a big fan of the use of an "isActive" column the way you're doing it. This requires two UPDATEs to change an active status and has the effect of storing the information about the active status several times in the different records.
Instead, I would remove the active field and do one of the following two things:
If you already have a table somewhere in which (userid, groupid) is (or could be) a PRIMARY KEY or UNIQUE INDEX then add the active column to that table. When a user becomes active or inactive with respect to a particular group, update only that single record with true or false.
If such a table does not already exist, then create one with (userid, groupid) as the PRIMARY KEY and the field active, and then treat the table as above.
In either case, you only need to query this table (without aggregation) to determine the users' status with respect to the particular group. Equally importantly, you store the true or false value only once and need to UPDATE only a single value to change the status. Finally, this table acts as the place to store other information specific to that user's membership in that group that applies once per membership, not once per change in status.
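A minimal sketch of that one-row-per-membership design in SQLite via Python's sqlite3; the user_group_membership table name is hypothetical. The composite primary key enforces a single row per (userid, groupid), and changing status is one UPDATE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group_membership (
    userid  INTEGER NOT NULL,
    groupid INTEGER NOT NULL,
    active  INTEGER NOT NULL DEFAULT 0 CHECK (active IN (0, 1)),
    PRIMARY KEY (userid, groupid)
);
INSERT INTO user_group_membership (userid, groupid, active)
VALUES (2, 10, 0), (3, 10, 0);
""")

# Activating user 2 in group 10 touches exactly one row.
conn.execute(
    "UPDATE user_group_membership SET active = 1 WHERE userid = ? AND groupid = ?",
    (2, 10),
)

# A second membership row for the same pair is rejected by the primary key.
try:
    conn.execute(
        "INSERT INTO user_group_membership (userid, groupid) VALUES (2, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

status = conn.execute(
    "SELECT userid, active FROM user_group_membership WHERE groupid = ? ORDER BY userid",
    (10,),
).fetchall()
print(status, duplicate_rejected)  # [(2, 1), (3, 0)] True
```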
Try this:
SELECT t.* FROM tbl t
INNER JOIN (
SELECT MAX(id) id
FROM tbl
GROUP BY userid
) m
ON t.id = m.id
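The query above groups only by userid, so with several groups it picks each user's latest row across all of them. A variant that also groups by groupid and filters to one group, sketched in SQLite via Python's sqlite3 with the same sample rows as the last answer. Unlike MAX(Active), this reports the latest state, so user 2 (active at one point, inactive in the last row) comes back as 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER PRIMARY KEY, userid INTEGER,
                  active INTEGER, groupid INTEGER);
INSERT INTO tbl (userid, active, groupid) VALUES
    (1, 0, 10), (1, 0, 10), (1, 0, 10), (1, 1, 10),
    (2, 0, 10), (2, 1, 10), (2, 0, 10),
    (3, 1, 10);
""")

# Latest row (highest id) per user within each group, then keep one group.
rows = conn.execute("""
    SELECT t.id, t.userid, t.active, t.groupid
    FROM tbl AS t
    JOIN (SELECT MAX(id) AS id FROM tbl GROUP BY userid, groupid) AS m
      ON t.id = m.id
    WHERE t.groupid = ?
    ORDER BY t.userid
""", (10,)).fetchall()
print(rows)  # [(4, 1, 1, 10), (7, 2, 0, 10), (8, 3, 1, 10)]
```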
Not sure that I understand what you want your query to return, but anyway: this query gives you the users in a group whose last entry is active. It uses row_number(), so you need at least SQL Server 2005 (the sample script's inline DECLARE initialization and multi-row VALUES insert need SQL Server 2008).
Table definition:
create table YourTable
(
ID int identity primary key,
UserID int,
Active bit,
GroupID int
)
Index to support the query:
create index IX_YourTable_GroupID on YourTable(GroupID) include(UserID, Active)
Sample data:
insert into YourTable values
(1, 0, 10),
(1, 0, 10),
(1, 0, 10),
(1, 1, 10),
(2, 0, 10),
(2, 1, 10),
(2, 0, 10),
(3, 1, 10)
Query:
declare @GroupID int = 10
;with C as
(
select UserID,
Active,
row_number() over(partition by UserID order by ID desc) as rn
from YourTable as T
where T.GroupID = @GroupID
)
select UserID
from C
where rn = 1 and
Active = 1
Result:
UserID
-----------
1
3