Element with the most votes for each combination of attributes - sql

In my schema, a user can vote for different monsters that have different powers (eg lighting, fire) and different bodies.
Body is a polymorphic association, as it can be from different types of animals.
Here's the relevant pieces of the schema:
votes:
monster_id
power_id
body_id #polymorphic association
body_type #polymorphic association
For every combination of power and body with representation on the votes table, I want to find out the monsters that got the most votes.
Eg of a specific example:
--------------------------------------------------
| votes |
--------------------------------------------------
| monster_id| power_id | body_id | body_type |
--------------------------------------------------
| 1 | 1 | 1 | Body::Mammal |
| 2 | 1 | 1 | Body::Mammal |
| 2 | 1 | 1 | Body::Mammal |
| 11 | 2 | 11 | Body::Reptile |
| 11 | 2 | 11 | Body::Reptile |
| 22 | 2 | 11 | Body::Reptile |
--------------------------------------------------
Results I would like:
- ["For the combination (power_id: 1, body_id: 1, body_type: Body::Mammal), the monster with most votes is monster_id: 2",
"For the combination (power_id: 2, body_id: 11, body_type: Body::Reptile), the monster with most votes is monster_id: 11",
...]
I am using Rails 6 and postgres so I have the option to use ActiveRecord, for which I have a slight preference, but I realize this likely needs raw sql.
I understand the answer is very likely an extension of the one given in this question, where they do a similar thing with less attributes, but I can't seem to add the extra complexity needed to accommodate increased number of columns in play.
sql: select most voted items from each user

If I follow you correctly, you can use distinct on and aggregation:
select distinct on (body_id, power_id, body_type)
body_id, power_id, body_type, monster_id, count(*) cnt_votes
from votes
group by body_id, power_id, body_type, monster_id
order by body_id, power_id, body_type, count(*) desc

Related

How to count the unique rows after aggregating to array

Trying to solve the problem in a read-only manner.
My table (answers) looks like the one below:
| user_id | value |
+----------------+-------------+
| 6 | pizza |
| 6 | tosti |
| 9 | fries |
| 9 | tosti |
| 10 | pizza |
| 10 | tosti |
| 12 | pizza |
| 12 | tosti |
| 13 | sushi | -> did not finish the quiz.
NOTE: the actual table has 15+ different possible values. (Answers to questions).
I've been able to make create the table below:
| value arr | count | user_id |
+----------------+--------------+-----------+
| pizza, tosti | 2 | 6 |
| fries, tosti | 2 | 9 |
| pizza, tosti | 2 | 10 |*
| pizza, tosti | 2 | 12 |*
| sushi | 1 | 13 |
I'm not sure if the * rows show up in my current query (DB has 30k rows and 15+ value options). The problem here is that "count" is counting the number of answers and not the number of unique outcomes.
Current query looks a bit like:
select string_agg(DISTINCT value, ',' order by value) AS value, user_id,
COUNT(DISTINCT value)
FROM answers
GROUP BY user_id;
Looking for the unique answer combinations like the table shown below:
| value arr | count unique |
+----------------+--------------+
| pizza, tosti | 3 |
| fries, tosti | 1 |
| sushi | 1 | --> Hidden in perfect situation.
Tried a bunch of queries, both written and generated by tools. From super simplified to quite complex, I keep ending up with the answers being count instead of the unique combination accros users.
If this is a duplicate question, please re-direct me to it. Learned a lot these last few days, but haven't been able to find the answer yet.
Any help would be highly appreciated.
Here's what you need. Your almost there.
select t1.value, count(1) From (
select string_agg(DISTINCT value, ',' order by value) AS value, user_id
FROM answers
GROUP BY user_id) t1
group by t1.value;
You can try (this is for SQL Server):
select count(*), string_agg(value, ",")
within group (order by value) as count_unique
from answers
group by string_agg(value, ",")

Correct Database Design / Relationship

Below I have shown a basic example of my proposed database tables.
I have two questions:
Categories "Engineering", "Client" and "Vendor" will have exactly the same "Disciplines", "DocType1" and "DocType2", does this mean I have to enter these 3 times over in the "Classification" table, or is there a better way? Bear in mind there is the "Vendor" category that is also covered in the classification table.
In the "Documents" table I have shown "category_id" and "classification_id", I'm not sure if the will depend on the answer to the first question, but is "category_id" necessary, or should I just be using a JOIN to allow me to filter the category based on the classification_id?
Thank you in advance.
Table: Category
id | name
---|-------------
1 | Engineering
2 | Client
3 | Vendor
4 | Commercial
Table: Discipline
id | name
---|-------------
1 | Electrical
2 | Instrumentation
3 | Proposals
Table: DocType1
id | name
---|-------------
1 | Specifications
2 | Drawings
3 | Lists
4 | Tendering
Table: Classification
id | category_id | discipline_id | doctype1_id | doctype2
---|-------------|---------------|-------------|----------
1 | 1 | 1 | 2 | 00
2 | 1 | 1 | 2 | 01
3 | 2 | 1 | 2 | 00
4 | 4 | 3 | 4 | 00
Table: Documents
id | title | doc_number | category_id | classification_id
---|-----------------|------------|-------------|-------------------
1 | Electrical Spec | 0001 | 1 | 1
2 | Electrical Spec | 0002 | 2 | 3
3 | Quotation | 0003 | 3 | 4
From what you've provided, it looks like we have three simple lookup tables: category, discipline, and doctype1. The part that's not intuitively obvious to me and may also be causing confusion on your end, is that the last two tables are both serving as cross-references of the lookup tables. The classification table in particular seems like it might be out of place. If there are only certain combinations of category, discipline, and doctype that would ever be valid, then the classification table makes sense and the right thing to do would be to look up that valid combination by way of the classification ID from the document table. If this is not the case, then you would probably just want to reference the category, discipline, and document type directly from the document table.
In your example, the need to make this distinction is illuminated by the fact that the document table has a referenc to the classification table and a references to the category table. However the row that is looked up in the classification table also references a category ID. This is not only redundant but also opens the door to the possibility of having conflicting category IDs.
I hope this helps.

Relative incremental ID by reference field

I have a table to store reservations for certain events; relevant part of it is:
class Reservation(models.Model):
# django creates an auto-increment field "id" by default
event = models.ForeignKey(Event)
# Some other reservation-specific fields..
first_name = models.CharField(max_length=255)
Now, I wish to retrieve the sequential ID of a given reservation relative to reservations for the same event.
Disclaimer: Of course, we assume reservations are never deleted, or their relative position might change.
Example:
+----+-------+------------+--------+
| ID | Event | First name | Rel.ID |
+----+-------+------------+--------+
| 1 | 1 | AAA | 1 |
| 2 | 1 | BBB | 2 |
| 3 | 2 | CCC | 1 |
| 4 | 2 | DDD | 2 |
| 5 | 1 | EEE | 3 |
| 6 | 3 | FFF | 1 |
| 7 | 1 | GGG | 4 |
| 8 | 1 | HHH | 5 |
+----+-------+------------+--------+
The last column is the "Relative ID", that is, a sequential number, with no gaps, for all reservations of the same event.
Now, what's the best way to accomplish this, without having to manually calculate relative id for each import (I don't like that)? I'm using postgresql as underlying database, but I'd prefer to stick with django abstraction layer in order to keep this portable (i.e. no database-specific solutions, such as triggers etc.).
Filtering using Reservation.objects.filter(event_id = some_event_id) should suffice. This will give you a QuerySet that should have the same ordering each time. Or am I missing something in your question?
I hate always being the one that responds its own questions, but I solved using this:
class Reservation(models.Model):
# ...
def relative_id(self):
return self.id - Reservation.objects.filter(id__lt=self.id).filter(~Q(event=self.event)).all().count()
Assuming records from reservations are never deleted, we can safely assume the "relative id" is the incremental id - (count of reservations before this one not belonging to same event).
I'm thinking of any drawbacks, but I didn't find any.

How to get top 3 frequencies in MySQL?

In MySQL I have a table called "meanings" with three columns:
"person" (int),
"word" (byte, 16 possible values)
"meaning" (byte, 26 possible values).
A person assigns one or more meanings to each word:
person word meaning
-------------------
1 1 4
1 2 19
1 2 7 <-- Note: second meaning for word 2
1 3 5
...
1 16 2
Then another person, and so on. There will be thousands of persons.
I need to find for each of the 16 words the top three meanings (with their frequencies). Something like:
+--------+-----------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+-----------------+------------------+-----------------+
| 1 | meaning 5 (35%) | meaning 19 (22%) | meaning 2 (13%) |
| 2 | meaning 8 (57%) | meaning 1 (18%) | meaning 22 (7%) |
+--------+-----------------+------------------+-----------------+
...
Is it possible to solve this with a single MySQL query?
Well, if you group by word and meaning, you can easily get the % of people who use each word/meaning combination out of the dataset.
In order to limit the number of meanings for each word returned, you will need create some sort of filter per word/meaning combination.
Seems like you just want the answer to your homework, so I wont post more than this, but this should be enough to get you on the right track.
Of course you can do
SELECT * FROM words WHERE word = 2 ORDER BY meaning DESC LIMIT 3
But this is cheating since you need to create a loop.
Im working on a better solution
I believe the problem I had a while ago looks similar. I ended up with the #counter thing.
Note about the problem
Let's suppose there is only one person, who says:
+--------+----------------+
| Person | Word | Meaning |
+--------+----------------+
| 1 | 1 | 7 |
| 1 | 1 | 3 |
| 1 | 2 | 8 |
+--------+----------------+
The report should read:
+--------+------------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1 | meaning 7 (100%) | meaning 3 (100%) | NULL |
| 2 | meaning 8 (100%) | NULL | NULL |
+--------+------------------+------------------+-----------------+
The following is not OK (50% frequency is absurd in a population of one person):
+--------+------------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1 | meaning 7 (50%) | meaning 3 (50%) | NULL |
| 2 | meaning 8 (100%) | NULL | NULL |
+--------+------------------+------------------+-----------------+
The intended meaning of the frequencies is "How many people think this meaning corresponds to that word"?
So it's not merely about counting "cases", but about counting persons in the table.

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by the persons needs. The people who only have one need are sorted correctly, but the people who have multiple are sorted exactly as the string passed to the database (i.e. housing, rent, food, water) --> which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, and as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, needs.name
FROM users
INNER JOIN needs ON (needs.user_id = users.user_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(needs.need) as number_of_needs
FROM users
LEFT JOIN needs ON (needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .