SQL: Find minimal occurrence of a specific value - sql

I have trouble exactly explaining what my problem is. Let me start with what I am NOT asking: I am NOT asking for the minimal value of a column.
Assume the following table, which in one column lists names and in the other column lists guesses estimating the age of the person on the left. Multiple people are guessing so there are different guesses:
Name      AgeGuess
Max       34
Jennifer  21
Jordan    88
Max       29
Jennifer  22
Jordan    22
Jordan    36
...       ...
and so on. My question is: what SQL command would give me a table of all names who were guessed to be a specific value (for example 36) the LEAST number of times? I'd also like to know how often they were guessed to be 36. If nobody guessed them to be 36, I'd like to know that too.
In this example only Jordan was guessed to be 36. All the others were never guessed to be 36. I would expect an output like this:
Name      GuessedToBe36Count
Max       0
Jennifer  0
The table above is the result of me asking which people were guessed to be 36 the least amount of times.
My attempt was to group them by how often they were guessed 36. However if they were never rated 36, they also do not appear in the table at all, meaning I cannot just compute the minimum of the column.

If you want the count of each guessed age per name, group by name and age:
SELECT name, COUNT(ageGuess) AS total, ageGuess
FROM guesses
GROUP BY name, ageGuess
ORDER BY name, total ASC
You will get then something similar to this:
name    total  ageGuess
Jordan  1      33
Jordan  3      65
Max     1      34
Note that this does not return ages that were never guessed for a name. You can fill those in when processing the result in the back end.
To get the output you want, use a subquery:
SELECT name,
       (SELECT COUNT(ageGuess)
        FROM guesses g2
        WHERE g2.name = g1.name AND g2.ageGuess = 36) AS GuessedToBe36Count
FROM guesses g1
GROUP BY name
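A minimal sketch of how the zero counts can be kept and then narrowed to the least-guessed names, run through Python's sqlite3 module so it can be tried without a server. The table name `guesses` and the sample rows come from the question and the query above. The conditional aggregation (`SUM(CASE ...)`) and the `counts` CTE are a swapped-in alternative, not the answerer's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guesses (name TEXT, ageGuess INTEGER)")
conn.executemany(
    "INSERT INTO guesses VALUES (?, ?)",
    [("Max", 34), ("Jennifer", 21), ("Jordan", 88), ("Max", 29),
     ("Jennifer", 22), ("Jordan", 22), ("Jordan", 36)],
)

query = """
WITH counts AS (
    -- one row per name, including names that were never guessed to be 36
    SELECT name,
           SUM(CASE WHEN ageGuess = 36 THEN 1 ELSE 0 END) AS GuessedToBe36Count
    FROM guesses
    GROUP BY name
)
SELECT name, GuessedToBe36Count
FROM counts
WHERE GuessedToBe36Count = (SELECT MIN(GuessedToBe36Count) FROM counts)
ORDER BY name
"""
least_guessed = conn.execute(query).fetchall()
print(least_guessed)
```

Because every name survives the GROUP BY, the minimum can be taken over the counts directly, which is what the grouping attempt in the question could not do.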

Related

How does SQL count(distinct) work in this case?

I'm trying to find the match number (match_no) of the match in which Germany played against Poland. This is from https://www.w3resource.com/sql-exercises/soccer-database-exercise/sql-subqueries-exercise-soccer-database-4.php. There are two tables: match_details and soccer_country. I don't understand how the COUNT(DISTINCT) works in this case. Can someone please clarify? Thanks!
SELECT match_no
FROM match_details
WHERE team_id = (
        SELECT country_id
        FROM soccer_country
        WHERE country_name = 'Germany')
   OR team_id = (
        SELECT country_id
        FROM soccer_country
        WHERE country_name = 'Poland')
GROUP BY match_no
HAVING COUNT(DISTINCT team_id) = 2;
As Lamak mentioned, this is an awkward way to write the query, but there are many ways to approach it.
As mentioned, COUNT(DISTINCT team_id) = 2 makes sure the match has exactly 2 unique teams among the filtered rows. If there is ever a Cartesian result, you could get repeated rows showing more than one instance of the same team, and counting distinct team_id values eliminates that.
That said, other "team" data structures I have seen have a single record per match with a column for EACH TEAM playing it. That is easier by a long shot, but this is still a relatively easy query.
Break the query down a little, and consider a large-scale set of data (not that this, or even a professional league, would have record counts large enough to slow down a SQL engine).
Your first criterion is games with Germany, so let's start with that.
SELECT
    md1.match_no
FROM
    match_details md1
    JOIN soccer_country sc1
        ON md1.team_id = sc1.country_id
        AND sc1.country_name = 'Germany'
So, why even look at any other match if Germany is not part of it on either side? This query by itself returns 6 matches from the sample data of 51 matches. Now all you need to do is join to the match details table a second time, for only those matches, and require that the second team is Poland:
SELECT
    md1.match_no
FROM
    match_details md1
    JOIN soccer_country sc1
        ON md1.team_id = sc1.country_id
        AND sc1.country_name = 'Germany'
    -- joining again for the same match Germany was already qualified
    JOIN match_details md2
        ON md1.match_no = md2.match_no
        -- but we want the OTHER team record since Germany was first team
        AND md1.team_id != md2.team_id
    -- and on to the second country table based on the SECOND team ID
    JOIN soccer_country sc2
        ON md2.team_id = sc2.country_id
        -- and the second team was Poland
        AND sc2.country_name = 'Poland'
Yes, it may be a longer query, but by eliminating the 45 other matches up front (again, thinking of a LARGE database), you avoid scanning tons of data and work on a small set, then finish with only the Germany / Poland matches. No aggregates, counts, or distincts, just direct joins.
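The self-join can be tried on a tiny data set. This is a sketch under assumptions: it runs in SQLite through Python's sqlite3 module, and the three countries and three matches are made-up rows, not the exercise's 51-match data. Only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soccer_country (country_id INTEGER, country_name TEXT);
CREATE TABLE match_details (match_no INTEGER, team_id INTEGER);
-- hypothetical sample rows: one row per team per match
INSERT INTO soccer_country VALUES (1, 'Germany'), (2, 'Poland'), (3, 'France');
INSERT INTO match_details VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3);
""")

self_join_matches = conn.execute("""
    SELECT md1.match_no
    FROM match_details md1
    JOIN soccer_country sc1
        ON md1.team_id = sc1.country_id
        AND sc1.country_name = 'Germany'
    JOIN match_details md2
        ON md1.match_no = md2.match_no
        AND md1.team_id != md2.team_id
    JOIN soccer_country sc2
        ON md2.team_id = sc2.country_id
        AND sc2.country_name = 'Poland'
""").fetchall()
print(self_join_matches)
```

In this sample, match 1 is Germany vs France, match 2 is Germany vs Poland, and match 3 is Poland vs France, so only match 2 satisfies both joins.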
FEEDBACK
Let's take a look at some BAD sample data... which, as all programmers know, never exists (NOT). Anyhow, let's take a look at these few matches.
Match  Team ID   (team names are shown in place of IDs for simplicity)
52     Poland
52     Poland
53     Germany
53     Germany
If you were to run the query with COUNT(team_id) instead of COUNT(DISTINCT team_id), both match 52 and 53 would show up, as Poland appears 2 times for match 52 and Germany 2 times for match 53. With DISTINCT, each of these matches counts only 1 team and is therefore excluded. Does that help? Again, no such thing as bad data :)
And yet another sample, where a match has more than 2 teams or only 1:
Match  Team ID
54     France
54     Poland
54     England
55     Hungary
56     Austria
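None of these matches would be returned. Match 54 has 3 distinct teams, but only the Poland row passes the Germany/Poland filter, so its distinct count is 1. Matches 55 and 56 have only a single entry each, with no opponent, and neither team is Germany or Poland.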
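The difference between the two HAVING clauses can be checked against the bad rows listed above. This is a sketch in SQLite via Python's sqlite3. The country IDs are invented for illustration, match 51 is an added clean Germany vs Poland match, and `IN` is used in place of the two `=` subqueries joined by OR, which selects the same rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soccer_country (country_id INTEGER, country_name TEXT);
CREATE TABLE match_details (match_no INTEGER, team_id INTEGER);
-- hypothetical IDs for the team names used in the sample tables
INSERT INTO soccer_country VALUES
    (1, 'Germany'), (2, 'Poland'), (3, 'France'),
    (4, 'England'), (5, 'Hungary'), (6, 'Austria');
INSERT INTO match_details VALUES
    (51, 1), (51, 2),            -- clean Germany vs Poland match (added)
    (52, 2), (52, 2),            -- Poland listed twice
    (53, 1), (53, 1),            -- Germany listed twice
    (54, 3), (54, 2), (54, 4),   -- three teams in one match
    (55, 5), (56, 6);            -- single-entry matches
""")

def matches(having_clause):
    """Return match numbers that pass the given HAVING condition."""
    sql = f"""
        SELECT match_no
        FROM match_details
        WHERE team_id IN (SELECT country_id
                          FROM soccer_country
                          WHERE country_name IN ('Germany', 'Poland'))
        GROUP BY match_no
        HAVING {having_clause}
        ORDER BY match_no
    """
    return [row[0] for row in conn.execute(sql)]

plain = matches("COUNT(team_id) = 2")
distinct = matches("COUNT(DISTINCT team_id) = 2")
print(plain, distinct)
```

The plain count lets the duplicated matches 52 and 53 through, while the distinct count keeps only match 51.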
2nd FEEDBACK
To clarify the query: if you look at the short query for just Germany, the aliased instance "md1" is already sitting on a record for a Germany match. For the second join to "md2" I only care about the same match, so I join on the same match_no. In the "md2" join, "!=" means NOT EQUAL ("!" is logical NOT). So the join says: from md1, join to md2 on the same match number, but only where the teams are NOT the same. The first instance holds Germany's team ID (already qualified), so the second gives me the other team's ID. I can then use the md2 team ID to join to the country table and confirm it is Poland.
Does this now clarify things for you?

Merge SQL Rows in Subquery

I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
SELECT name
FROM `table2`
WHERE acc IN (SELECT acc
              FROM `table1`
              WHERE source = 'World')
Instead of getting something like this:
Acc1   Acc2  Acc3
Jeff   Jeff  Ted
Chris  Ted   Blake
Rob    Jack  Jack
I get something more like this:
row  name
1    Jeff
2    Chris
3    Rob
4    Jack
5    Jeff
6    Jack
7    Ted
8    Blake
Ultimately, I am hoping to download the data and somehow use python or something to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings which exist with each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction for this? Or, if that is my end goal, is the way I am going about this sensible?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
SELECT t2.acc, ARRAY_AGG(t2.name ORDER BY t2.name) AS names
FROM `table2` t2
WHERE t2.acc IN (SELECT t1.acc
                 FROM `table1` t1
                 WHERE t1.source = 'World'
                )
GROUP BY t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
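Once the names are grouped per accession, the pair counting described in the question becomes a short loop. This is a minimal Python sketch, assuming the query result has been loaded into a dict from accession ID to list of names. The `grouped` dict below is hypothetical data shaped like the desired table in the question, not real query output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical result of the grouped query: accession ID -> names
grouped = {
    "Acc1": ["Chris", "Jeff", "Rob"],
    "Acc2": ["Jack", "Jeff", "Ted"],
    "Acc3": ["Blake", "Jack", "Ted"],
}

pair_counts = Counter()
for names in grouped.values():
    # sorted(set(...)) counts each pair once per accession, in a stable order
    for pair in combinations(sorted(set(names)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Keeping one row per accession preserves the groupings, so co-occurrence is just "which pairs appear inside the same list".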

Second highest column

I have seen a similar question asked, "How to get second highest value among multiple columns in SQL", but the solution won't work for Microsoft Access (ROW_NUMBER / OVER PARTITION isn't valid in Access).
My Access query includes dozens of fields. I would like to create a new field/column that would return the second highest value of 10 specific columns that are included in the query, I will call this field "Cover". Something like this:
Product  Bid1  Bid2  Bid3  Bid4  Cover
Watch    104   120   115   108   115
Shoe     65    78    79    76    78
Hat      20    22    19    20    20
I can do a really long SWITCH formula such as the following equivalent Excel formula:
IF( AND(Bid1> Bid2, Bid1 > Bid3, Bid1 > Bid4), Bid1,
AND(Bid2> Bid1, Bid2 > Bid3, Bid2 > Bid4), Bid2,
.....
But there must be a more efficient solution. A MAXIF equivalent would work perfectly if MS-Access Query had such a function.
Any ideas? Thank you in advance.
This would be easier if the data were laid out in a more normalized way. The clue is the numbered field names.
Your data is currently organized as a pivot (known in Access as a crosstab), but it can easily be unpivoted.
In normalized form, the data would look like this:
Product  Bid  Amount
-------  ---  ------
Watch    1    104
Watch    2    120
Watch    3    115
Watch    4    108
Shoe     1    65
Shoe     2    78
Shoe     3    79
Shoe     4    76
Hat      1    20
Hat      2    22
Hat      3    19
Hat      4    20
This way querying becomes simpler.
For example, the maximum bid per Product becomes a simple aggregate:
select Product, max(amount) as maxAmount
from myTable
group by product
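The question asks for the second highest bid rather than the maximum, and the normalized layout handles that with one more step. This is a sketch under assumptions: it runs in SQLite through Python's sqlite3, not in Access, and the names `bids_wide` and `bids` are invented here. The UNION ALL unpivot and the correlated MAX subquery use only plain SQL, with no window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bids_wide (Product TEXT, Bid1 INTEGER, Bid2 INTEGER,
                        Bid3 INTEGER, Bid4 INTEGER);
INSERT INTO bids_wide VALUES ('Watch', 104, 120, 115, 108),
                             ('Shoe', 65, 78, 79, 76),
                             ('Hat', 20, 22, 19, 20);
-- unpivot: one row per (Product, Bid number)
CREATE VIEW bids AS
    SELECT Product, 1 AS Bid, Bid1 AS Amount FROM bids_wide
    UNION ALL SELECT Product, 2, Bid2 FROM bids_wide
    UNION ALL SELECT Product, 3, Bid3 FROM bids_wide
    UNION ALL SELECT Product, 4, Bid4 FROM bids_wide;
""")

# highest amount that is strictly below the product's maximum
cover = conn.execute("""
    SELECT b.Product, MAX(b.Amount) AS Cover
    FROM bids AS b
    WHERE b.Amount < (SELECT MAX(b2.Amount)
                      FROM bids AS b2
                      WHERE b2.Product = b.Product)
    GROUP BY b.Product
    ORDER BY b.Product
""").fetchall()
print(cover)
```

One design caveat: if the top amount is tied (say two bids of 120), this returns the next distinct value below it, which may or may not be the intended "Cover".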
Really, we shouldn't be repeating text values at all, so Product should be an ID number, with the associated product names stored once in a separate table instead of several times in this one, like:
ProdID ProdName
-------- ----------
1 Watch
2 Shoe
3 Hat
... but that's another lesson.
Generally speaking, repetition of anything should be avoided (that's pretty much the purpose of a database), but the links below will explain it better than I can. :)
Quackit : Microsoft Access Tutorial
YouTube : DB Planning
Microsoft : Database Design Basics
Microsoft : Database Normalization Basics
Wikipedia : Database Normalization

Get the row with the max date value with criteria - access 2007/2010

My main table, from which I take all the data, is "RequestTable" (I reduced it to make it easier), in which I have:
ID_student
ID_professor
Date (and the three altogether are primary keys)
changeprofessor-note - if student wants to change the professor
then he/she should write in that field a sentence
why he/she wants to do the change
professor-reject-note - if the professor is not happy about the work of
the student, then he can choose not to mentor that
student anymore, leaving him without a mentor and the
student should choose another mentor later.
ID-seminar- after choosing a mentor the students
can choose the seminar they want to work on
changeofSeminar-note - if the student wants to change the seminar
then they need to write the reason why in here
(then the ID of the new seminar should be written in
the ID seminar field also)
IDapprove-reject - all approving or rejecting is going through this field
My initial theory was that the students could choose the mentor and the seminar in one row, but it seems too complicated now because I have no idea how to make everything work after changing mentors, declined mentoring, changing seminars and so on.
I settled on a simpler approach in which all students must choose the mentor first, so that I can get the mentoring data more easily when needed. I set "Is Null" in the query under "ID_seminar" and "changeofSeminar-note", because changes to the seminar alone can't affect the rows where the students have chosen their mentors/professors and got approved.
I implemented your code and got this:
SELECT [requesttable].ID_Student, Max([requesttable].Datum) AS MaxOfDatum, First([requesttable].ID_Profesor) AS ID_Profesor, [requesttable].ID_status_odobrenja
FROM [requesttable]
WHERE ((([requesttable].ID_Student) Not In (SELECT [ID_Student]
FROM [requesttable]
WHERE [IDapprove-reject] IS NOT NULL )))
GROUP BY [requesttable].ID_Student, [requesttable].IDapprove-reject, [requesttable].changeseminar-note, [requesttable].ID_seminar
HAVING ((([requesttable].IDapprovereject)=1) AND (([requesttable].changeseminar-note) Is Null) AND (([requesttable].Id_seminar) Is Null))
ORDER BY [requesttable].ID_Student, Max([requesttable].Datum), First([requesttable].ID_Profesor), [requesttable].IDapproved-reject;
And I get:
3 12 1
15 11 1
55 5 1
And I need:
3 6 1
15 6 1
52 5 1 - after being rejected by mentor 10,
the student choose another mentor (id 5) and got approved.
55 5 1
Old info below:
I got my query to this point and two other data are set to show only rows with null values to get this:
ID student  Id professor  date        professor-reject-note  ID accept/reject
3           12            12.11.2012  null                   1
3           6             13.11.2012  null                   1
52          10            12.11.2012  null                   1
52          10            15.11.2012  NOT null               1
55          5             12.11.2012  null                   1
I want my results to be
3 6 12.10.2013 null 1
15 6 7.1.2013 null 1
55 5 12.11.2012 null 1
StudentID 52 should be excluded entirely because of the professor-reject-note, which means the professor doesn't want to mentor the student anymore. I also have a doubt about the ID accept/reject number in that case; maybe I could set it to 2 instead of 1 to make it easier. 1 means accepted, 2 would mean rejected, but if I set it to 2 and exclude that row I still can't get rid of the other ID 52 row. I'm a bit confused about it and have no clue how to make it work.
If I set date to max date and Id professor to group by FIRST, I almost get what I want: all the data is right, except that Student ID 52 is still there, with both rows.
You could use:
SELECT t.[id student],
       t.[id professor],
       t.DATE,
       t.[professor-reject-note],
       t.[id accept/reject]
FROM atable t
WHERE t.[id student] NOT IN
      (SELECT [id student]
       FROM atable
       WHERE [professor-reject-note] IS NOT NULL)
Your field / column names could do with some work.
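The NOT IN filter removes student 52 but still returns every remaining row, including both rows for student 3. Below is a sketch of one way to also keep only the latest row per student. It is written for SQLite via Python's sqlite3 rather than Access, and the table name `requests`, the snake_case column names, and the ISO date strings are assumptions made for the example. The rows mirror the "old info" sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE requests (
        id_student INTEGER,
        id_professor INTEGER,
        request_date TEXT,          -- ISO dates sort correctly as text
        professor_reject_note TEXT,
        id_accept_reject INTEGER
    )
""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?, ?)",
    [(3, 12, "2012-11-12", None, 1),
     (3, 6, "2012-11-13", None, 1),
     (52, 10, "2012-11-12", None, 1),
     (52, 10, "2012-11-15", "does not want to mentor", 1),
     (55, 5, "2012-11-12", None, 1)],
)

latest_rows = conn.execute("""
    SELECT t.id_student, t.id_professor, t.request_date
    FROM requests AS t
    JOIN (SELECT id_student, MAX(request_date) AS max_date
          FROM requests
          GROUP BY id_student) AS latest
        ON latest.id_student = t.id_student
        AND latest.max_date = t.request_date
    WHERE t.id_student NOT IN (SELECT id_student
                               FROM requests
                               WHERE professor_reject_note IS NOT NULL)
    ORDER BY t.id_student
""").fetchall()
print(latest_rows)
```

Joining back to the per-student maximum date returns the whole latest row, which avoids pairing Max(date) with First(professor) from unrelated rows. The sketch does not cover the later case where a rejected student picks a new mentor and is approved again.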

Table structure of a student

I want a table structure which can store the details of a student in the format below.
If the student is in
10th standard -> I need his aggregate % from 1st standard to 9th standard.
5th standard -> I need his aggregate % from 1st standard to 4th standard.
1st standard -> No aggregate % has to be displayed.
And the most important thing is that we need to use only one table. Please suggest a table structure with no redundant values.
Any ideas will be greatly appreciated.
No, friends, this is not homework. This was asked in an Oracle interview conducted in Hyderabad the day before yesterday, 24th July 2010. The interviewer asked me for the table structure.
He did not even ask me for the query. He asked me how I would design the table. Please advise me.
id | name | grade | aggregate
This would do the trick: id is your primary key, name is the student's first and last name, grade is the grade he is in, and aggregate is the aggregate % based on the grade.
For example, some rows might be:
10 | Bill Cosby | 10 | 90
11 | Jerry Seinfeld | 4 | 60
Bill Cosby would have an aggregate percent of 90 for grades 1-9, and Jerry would have 60 for grades 1-3. In this case it is one table, and it boils down to you managing the aggregation rule for this table, since it has to be one table.
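A sketch of the single-table layout, with the "no aggregate in 1st standard" rule pushed into the schema. It runs in SQLite via Python's sqlite3, not Oracle. The table name `student`, the CHECK constraints, and the extra rows for IDs 12 and 13 are assumptions added for illustration. The answer itself only lists the four columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        grade     INTEGER NOT NULL CHECK (grade BETWEEN 1 AND 10),
        aggregate REAL,   -- aggregate % over all grades below the current one
        -- a 1st-standard student has no earlier grades, so no aggregate
        CHECK (grade > 1 OR aggregate IS NULL)
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(10, "Bill Cosby", 10, 90),
     (11, "Jerry Seinfeld", 4, 60),
     (12, "New Pupil", 1, None)],   # hypothetical 1st-standard row
)

rows = conn.execute(
    "SELECT name, grade, aggregate FROM student ORDER BY id"
).fetchall()

# A 1st-standard row carrying an aggregate violates the CHECK constraint.
try:
    conn.execute("INSERT INTO student VALUES (13, 'Bad Row', 1, 75)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rows, rejected)
```

The constraint only guards the 1st-standard case. Keeping the stored aggregate up to date as a student moves up a grade is still the application's job, as the answer notes.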
If this is an interview question, it looks like they would like to check your knowledge on Nested Tables. Essentially you would have one column as roll number, and other column which is a nested table as Class and Percentage.