How to get top 3 frequencies in MySQL?


In MySQL I have a table called "meanings" with three columns:
"person" (int),
"word" (byte, 16 possible values)
"meaning" (byte, 26 possible values).
A person assigns one or more meanings to each word:
person word meaning
-------------------
1 1 4
1 2 19
1 2 7 <-- Note: second meaning for word 2
1 3 5
...
1 16 2
Then another person, and so on. There will be thousands of persons.
I need to find for each of the 16 words the top three meanings (with their frequencies). Something like:
+------+-----------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked  | 3rd Most Ranked |
+------+-----------------+------------------+-----------------+
| 1    | meaning 5 (35%) | meaning 19 (22%) | meaning 2 (13%) |
| 2    | meaning 8 (57%) | meaning 1 (18%)  | meaning 22 (7%) |
+------+-----------------+------------------+-----------------+
...
Is it possible to solve this with a single MySQL query?

Well, if you group by word and meaning, you can easily get the percentage of people who use each word/meaning combination out of the dataset.
To limit the number of meanings returned for each word, you will need to create some sort of filter per word/meaning combination.
It seems like you just want the answer to your homework, so I won't post more than this, but it should be enough to get you on the right track.

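Of course, you can get the top three meanings for a single word with:
SELECT meaning, COUNT(DISTINCT person) AS persons FROM meanings WHERE word = 2 GROUP BY meaning ORDER BY persons DESC LIMIT 3
But this is cheating, since you need to loop over all 16 words.
I'm working on a better solution.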

I believe the problem I had a while ago looks similar. I ended up with the #counter thing.

Note about the problem
Let's suppose there is only one person, who says:
+--------+------+---------+
| Person | Word | Meaning |
+--------+------+---------+
| 1      | 1    | 7       |
| 1      | 1    | 3       |
| 1      | 2    | 8       |
+--------+------+---------+
The report should read:
+------+------------------+------------------+-----------------+
| Word | 1st Most Ranked  | 2nd Most Ranked  | 3rd Most Ranked |
+------+------------------+------------------+-----------------+
| 1    | meaning 7 (100%) | meaning 3 (100%) | NULL            |
| 2    | meaning 8 (100%) | NULL             | NULL            |
+------+------------------+------------------+-----------------+
The following is not OK (50% frequency is absurd in a population of one person):
+------+------------------+-----------------+-----------------+
| Word | 1st Most Ranked  | 2nd Most Ranked | 3rd Most Ranked |
+------+------------------+-----------------+-----------------+
| 1    | meaning 7 (50%)  | meaning 3 (50%) | NULL            |
| 2    | meaning 8 (100%) | NULL            | NULL            |
+------+------------------+-----------------+-----------------+
The intended meaning of the frequencies is "How many people think this meaning corresponds to that word?"
So it's not merely about counting "cases", but about counting persons in the table.
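A minimal sketch of one way to get this report in a single query, run here through Python's sqlite3 module as a stand-in for MySQL (ROW_NUMBER() needs MySQL 8.0+ or SQLite 3.25+). Two things are assumptions and not part of the question: the percentage is taken over all distinct persons in the table, and ties are broken by the lower meaning number. The data is the one-person example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meanings (person INTEGER, word INTEGER, meaning INTEGER)")
conn.executemany(
    "INSERT INTO meanings VALUES (?, ?, ?)",
    [(1, 1, 7), (1, 1, 3), (1, 2, 8)],  # the one-person example
)

TOP3_SQL = """
WITH freq AS (
    -- count persons, not rows, per word/meaning pair
    SELECT word, meaning, COUNT(DISTINCT person) AS persons
    FROM meanings
    GROUP BY word, meaning
),
ranked AS (
    SELECT word, meaning,
           -- assumption: the population is every person in the table
           100.0 * persons / (SELECT COUNT(DISTINCT person) FROM meanings) AS pct,
           -- assumption: ties go to the lower meaning number
           ROW_NUMBER() OVER (PARTITION BY word
                              ORDER BY persons DESC, meaning) AS rn
    FROM freq
)
-- pivot the three best-ranked rows of each word into columns
SELECT word,
       MAX(CASE WHEN rn = 1 THEN meaning END) AS meaning_1,
       MAX(CASE WHEN rn = 1 THEN pct END)     AS pct_1,
       MAX(CASE WHEN rn = 2 THEN meaning END) AS meaning_2,
       MAX(CASE WHEN rn = 2 THEN pct END)     AS pct_2,
       MAX(CASE WHEN rn = 3 THEN meaning END) AS meaning_3,
       MAX(CASE WHEN rn = 3 THEN pct END)     AS pct_3
FROM ranked
WHERE rn <= 3
GROUP BY word
ORDER BY word
"""

rows = conn.execute(TOP3_SQL).fetchall()
for row in rows:
    print(row)
```

Both meanings of word 1 come out at 100% because the numerator counts distinct persons and not rows, which is the distinction the note above is making.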

Related

Best database design to find relationships between two persons

I want to find relationships between two persons using a database. For example, I have a database like this:
Person:
Id| Name
1 | Edvard
2 | Ivan
3 | Molly
4 | Julian
5 | Emily
6 | Katarina
Relationship:
Id| Type
1 | Parent
2 | Husband\Wife
3 | ex-Husband\ex-Wife
Relationships:
Id| Person_1_Id | Person_2_Id | Relation_Id
1 | 1 | 3 | 2
2 | 3 | 4 | 3
3 | 3 | 2 | 1
4 | 4 | 2 | 1
5 | 1 | 6 | 3
6 | 1 | 5 | 1
7 | 6 | 5 | 1
What is the best way to find the relationship between Person-2 and Person-5? This example is not large, but what if there were 5 families, or 10000? I think that if there are too many families, it becomes necessary to introduce the concept of depth. Would it be better to change the database design? Is it possible to model it as trees or graphs? Are there other ideas on how to solve this problem differently?

As soon as you get above a handful of nodes and a few relationships between them, this becomes a very complex problem: there are whole branches of maths based around this type of challenge and how long it takes to compute a result.
For any non-trivial set of nodes/relationships you are going to need to look at deploying a graph database e.g. Neo4j
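For a table as small as the one in the question, the graph view can still be illustrated without a graph database. The following is only a sketch: it loads the Relationships rows into memory, treats every relationship as an undirected edge (a simplification that ignores who is the parent of whom), and runs a breadth-first search. The function name find_path is made up for this example.

```python
from collections import deque

# (Person_1_Id, Person_2_Id, Relation_Id) rows from the Relationships table
RELATIONSHIPS = [
    (1, 3, 2), (3, 4, 3), (3, 2, 1), (4, 2, 1),
    (1, 6, 3), (1, 5, 1), (6, 5, 1),
]


def find_path(start, goal, rows):
    """Return the shortest chain of (from, to, relation_id) hops, or None."""
    # Build an undirected adjacency list (simplification: direction is dropped).
    graph = {}
    for a, b, rel in rows:
        graph.setdefault(a, []).append((b, rel))
        graph.setdefault(b, []).append((a, rel))

    previous = {start: None}  # node -> (node we came from, relation used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, rel in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, rel)
                queue.append(neighbour)

    if goal not in previous:
        return None

    # Walk back from the goal to the start, then reverse.
    hops = []
    node = goal
    while previous[node] is not None:
        parent, rel = previous[node]
        hops.append((parent, node, rel))
        node = parent
    return hops[::-1]


path = find_path(2, 5, RELATIONSHIPS)
print(path)
```

For Person-2 and Person-5 this finds the chain 2, 3, 1, 5: Person-3 is a parent of Person-2, Person-1 is married to Person-3, and Person-1 is a parent of Person-5. With thousands of families the same search still works, but every lookup has to load or join the whole edge table, which is where a graph database starts to pay off.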

Element with the most votes for each combination of attributes

In my schema, a user can vote for different monsters that have different powers (e.g. lightning, fire) and different bodies.
Body is a polymorphic association, as it can come from different types of animals.
Here are the relevant pieces of the schema:
votes:
monster_id
power_id
body_id #polymorphic association
body_type #polymorphic association
For every combination of power and body with representation on the votes table, I want to find out the monsters that got the most votes.
A specific example:
--------------------------------------------------
| votes |
--------------------------------------------------
| monster_id| power_id | body_id | body_type |
--------------------------------------------------
| 1 | 1 | 1 | Body::Mammal |
| 2 | 1 | 1 | Body::Mammal |
| 2 | 1 | 1 | Body::Mammal |
| 11 | 2 | 11 | Body::Reptile |
| 11 | 2 | 11 | Body::Reptile |
| 22 | 2 | 11 | Body::Reptile |
--------------------------------------------------
Results I would like:
- ["For the combination (power_id: 1, body_id: 1, body_type: Body::Mammal), the monster with most votes is monster_id: 2",
"For the combination (power_id: 2, body_id: 11, body_type: Body::Reptile), the monster with most votes is monster_id: 11",
...]
I am using Rails 6 and postgres so I have the option to use ActiveRecord, for which I have a slight preference, but I realize this likely needs raw sql.
I understand the answer is very likely an extension of the one given in this question, where they do a similar thing with fewer attributes, but I can't seem to add the extra complexity needed to accommodate the increased number of columns in play.
sql: select most voted items from each user

If I follow you correctly, you can use distinct on and aggregation:
select distinct on (body_id, power_id, body_type)
body_id, power_id, body_type, monster_id, count(*) cnt_votes
from votes
group by body_id, power_id, body_type, monster_id
order by body_id, power_id, body_type, count(*) desc
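To check the idea against the sample votes without a PostgreSQL server, here is a sketch using Python's sqlite3 module. SQLite has no DISTINCT ON, so ROW_NUMBER() is swapped in for it; the tie-break on the lower monster_id is an assumption of this example, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (monster_id INTEGER, power_id INTEGER, "
    "body_id INTEGER, body_type TEXT)"
)
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "Body::Mammal"),
        (2, 1, 1, "Body::Mammal"),
        (2, 1, 1, "Body::Mammal"),
        (11, 2, 11, "Body::Reptile"),
        (11, 2, 11, "Body::Reptile"),
        (22, 2, 11, "Body::Reptile"),
    ],
)

TOP_MONSTER_SQL = """
WITH counts AS (
    -- votes per monster within each power/body combination
    SELECT power_id, body_id, body_type, monster_id, COUNT(*) AS cnt_votes
    FROM votes
    GROUP BY power_id, body_id, body_type, monster_id
),
ranked AS (
    SELECT counts.*,
           -- assumption: ties go to the lower monster_id
           ROW_NUMBER() OVER (PARTITION BY power_id, body_id, body_type
                              ORDER BY cnt_votes DESC, monster_id) AS rn
    FROM counts
)
SELECT power_id, body_id, body_type, monster_id, cnt_votes
FROM ranked
WHERE rn = 1
ORDER BY power_id, body_id, body_type
"""

winners = conn.execute(TOP_MONSTER_SQL).fetchall()
for power_id, body_id, body_type, monster_id, cnt in winners:
    print(f"({power_id}, {body_id}, {body_type}): monster {monster_id}, {cnt} votes")
```

In Rails the PostgreSQL query above could be passed to find_by_sql; the ROW_NUMBER() form is only needed on databases that lack DISTINCT ON.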

How to count the unique rows after aggregating to array

Trying to solve the problem in a read-only manner.
My table (answers) looks like the one below:
| user_id | value |
+----------------+-------------+
| 6 | pizza |
| 6 | tosti |
| 9 | fries |
| 9 | tosti |
| 10 | pizza |
| 10 | tosti |
| 12 | pizza |
| 12 | tosti |
| 13 | sushi | -> did not finish the quiz.
NOTE: the actual table has 15+ different possible values. (Answers to questions).
I've been able to create the table below:
| value arr | count | user_id |
+----------------+--------------+-----------+
| pizza, tosti | 2 | 6 |
| fries, tosti | 2 | 9 |
| pizza, tosti | 2 | 10 |*
| pizza, tosti | 2 | 12 |*
| sushi | 1 | 13 |
I'm not sure if the * rows show up in my current query (DB has 30k rows and 15+ value options). The problem here is that "count" is counting the number of answers and not the number of unique outcomes.
Current query looks a bit like:
select string_agg(DISTINCT value, ',' order by value) AS value, user_id,
COUNT(DISTINCT value)
FROM answers
GROUP BY user_id;
Looking for the unique answer combinations like the table shown below:
| value arr | count unique |
+----------------+--------------+
| pizza, tosti | 3 |
| fries, tosti | 1 |
| sushi | 1 | --> Hidden in perfect situation.
I've tried a bunch of queries, both hand-written and generated by tools. From super simplified to quite complex, I keep ending up with the count of answers instead of the count of unique combinations across users.
If this is a duplicate question, please re-direct me to it. Learned a lot these last few days, but haven't been able to find the answer yet.
Any help would be highly appreciated.

Here's what you need. You're almost there.
select t1.value, count(1) From (
select string_agg(DISTINCT value, ',' order by value) AS value, user_id
FROM answers
GROUP BY user_id) t1
group by t1.value;
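The two steps in that query (build one combination string per user, then count identical strings) can be cross-checked outside the database. A small sketch in plain Python, using the rows from the question:

```python
from collections import Counter, defaultdict

# (user_id, value) rows from the answers table in the question
ANSWERS = [
    (6, "pizza"), (6, "tosti"),
    (9, "fries"), (9, "tosti"),
    (10, "pizza"), (10, "tosti"),
    (12, "pizza"), (12, "tosti"),
    (13, "sushi"),
]

# Step 1: collect the distinct values per user (the inner GROUP BY user_id).
values_per_user = defaultdict(set)
for user_id, value in ANSWERS:
    values_per_user[user_id].add(value)

# Step 2: count identical, alphabetically ordered combinations
# (the outer GROUP BY on the aggregated string).
combos = Counter(",".join(sorted(values)) for values in values_per_user.values())

for combo, count in combos.most_common():
    print(combo, count)
```

Sorting the values before joining them plays the same role as the order by inside string_agg: without it, "pizza,tosti" and "tosti,pizza" would be counted as different combinations.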

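You can try this (for SQL Server 2017 or later). Note that string literals take single quotes, and that an aggregate cannot appear in GROUP BY, so the per-user aggregation goes in a derived table:
select value_arr, count(*) as count_unique
from (
select user_id, string_agg(value, ',') within group (order by value) as value_arr
from answers
group by user_id
) t
group by value_arr;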

Efficient Classification of records by common letters in impala

I have a table in Impala (TBL1) that contains names sharing varying numbers of leading letters. The table contains about 3M records. I would like to add a new attribute to the table, where each group of names with the same first letters gets its own class. This is how DENSE_RANK works, but with a dynamic number of first letters. The number of shared first letters should not be less than p=3 (p is a parameter).
Here is an example for the table and the required results:
| ID | Attr1 | New_Attr1 | Some more attribute...
+-------+--------------+-------------+-----------------------
| 1 | ZXA-12 | 1 |
| 2 | YL3300 | 2 |
| 3 | ZXA-123 | 1 |
| 4 | YL3400 | 2 |
| 5 | YL3-aaa | 2 |
| 6 | TSA 789 | 3 |
...

Does this do what you want?
select t.*,
dense_rank() over (order by strleft(attr1, 3)) as newcol
from . . .;
The "3" is your parameter.
As a note: In your example, you seem to have assigned the new value in reverse alphabetic order. Hence, you would want desc for the order by.
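A quick way to check this against the sample rows, sketched with Python's sqlite3 module in place of Impala. SQLite has no strleft, so substr(attr1, 1, p) is used instead, and the table and column names are lowercased versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (id INTEGER, attr1 TEXT)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?)",
    [
        (1, "ZXA-12"), (2, "YL3300"), (3, "ZXA-123"),
        (4, "YL3400"), (5, "YL3-aaa"), (6, "TSA 789"),
    ],
)

P = 3  # number of leading letters that define a class

# DESC reproduces the reverse-alphabetical numbering of the example.
classified = conn.execute(
    """
    SELECT id, attr1,
           DENSE_RANK() OVER (ORDER BY substr(attr1, 1, ?) DESC) AS new_attr1
    FROM tbl1
    ORDER BY id
    """,
    (P,),
).fetchall()

for row in classified:
    print(row)
```

On the sample data this assigns class 1 to the ZXA rows, 2 to the YL3 rows and 3 to the TSA row, matching the New_Attr1 column in the question.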

Most efficient way to query a word & synonym table

I have a WORDTB table with words and their synonyms: ID, WORD1, WORD2, WORD3, WORD4, WORD5. These words are arranged according to their frequency. When any word is given I want to query and retrieve the most frequent synonym of that particular word which is the word in WORD1 column.
This is the query I tried and it works fine, but I think this is inefficient.
SELECT WORD1
FROM WORDTB
WHERE WORD1='xxxx'
OR WORD2='xxxx'
OR WORD3='xxxx'
OR WORD4='xxxx'
OR WORD5='xxxx'
Can anyone suggest a more efficient way of doing this?

A more scalable solution would be to use a single row for each word.
synonym_words(word_id, synonym_id, word, popularity)
Fields:
word_id: The primary key for a word.
synonym_id: The word_id of the first synonym word.
word: The synonym text.
popularity: The sort order for the list of synonyms, 1 being the most popular.
Sample table data:
word_id | synonym_id | word | popularity
==============================================
1 | 1 | start | 1
2 | 1 | begin | 2
3 | 1 | originate | 3
4 | 1 | initiate | 4
5 | 1 | commence | 5
6 | 1 | create | 6
7 | 1 | startle | 7
8 | 1 | leave | 8
9 | 9 | end | 1
10 | 9 | ending | 2
11 | 9 | last | 3
12 | 9 | goal | 4
13 | 9 | death | 5
14 | 9 | conclusion | 6
15 | 9 | close | 7
16 | 9 | closing | 8
Assuming that the words will not change but their popularity may over time, the query should not break if you were to change the popularity order of the words so that the most popular synonym for a word was changed. You want your query to return the most popular word (popularity = 1) which shares the same synonym_id as the word used in the search.
SQL query:
SELECT word FROM synonym_words
WHERE synonym_id = (SELECT synonym_id FROM synonym_words WHERE word = 'conclusion')
AND popularity = 1
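Here is a small runnable sketch of that design, using Python's sqlite3 module and a subset of the sample rows. The index on word and the helper name most_popular are additions for this example and are not part of the answer above; the index is what lets the lookup avoid scanning five columns as the original OR query does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE synonym_words ("
    "word_id INTEGER PRIMARY KEY, synonym_id INTEGER, "
    "word TEXT, popularity INTEGER)"
)
conn.executemany(
    "INSERT INTO synonym_words VALUES (?, ?, ?, ?)",
    [
        (1, 1, "start", 1), (2, 1, "begin", 2), (3, 1, "originate", 3),
        (9, 9, "end", 1), (10, 9, "ending", 2), (14, 9, "conclusion", 6),
    ],
)
# Assumption: an index on word, so the inner lookup is a single index probe.
conn.execute("CREATE INDEX idx_synonym_words_word ON synonym_words (word)")


def most_popular(word):
    """Return the popularity-1 synonym of word, or None if word is unknown."""
    row = conn.execute(
        """
        SELECT word FROM synonym_words
        WHERE synonym_id = (SELECT synonym_id FROM synonym_words WHERE word = ?)
          AND popularity = 1
        """,
        (word,),
    ).fetchone()
    return row[0] if row else None


print(most_popular("conclusion"))
print(most_popular("begin"))
```

An unknown word makes the inner query return NULL, so the outer query matches nothing and the helper returns None.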
