Finding 1 MAX value in query - sql

I need to find the longest movie and only print the title of that movie. However, when I try to do it, it just prints the title of every movie and all their lengths. So i'd like to know what I am doing wrong.
SELECT m.movie_title, MAX(m.movie_len)
FROM movie m
GROUP BY m.movie_title;

One method uses order by and limit:
select m.*
from movie m
order by length desc
limit 1;
MAX() is a function that operates on one column. It has no effect on other columns.

you must have clause where to limit rows, or use your query as sub query or if your db engine support use "limit 1 like Gordon Linoff writ, or select top 1 like in sql-serwer, or first over like in oracle... you didn't write db engine name...

Related

SQL Select n random groups and return all records

I can't seem to find a solution to this exact question, without chaining together 2 or more queries together via pandas manipulation. (I had previously been attempting a random sampling in postgresql in the vein of cur.execute("select distinct group from data where random() < {0}".format(rand_coef)), but I was unable to combine the resulting array into a single query, nor specify the exact n value.)
A hypothetical dataset and query is as follows:
Say I want n = 3 random groups from the following data.
id, group, value
1,a,23
1,a,3
1,b,2
1,a,432
1,b,123
1,d,23
1,d,11
1,c,23
1,c,234
1,a,223
1,c,32
An example result of a query would be n=3 random groups (i.e. b,c,d):
id, group, value
1,b,2
1,b,123
1,d,23
1,d,11
1,c,23
1,c,234
1,c,32
How might this work?
One method would be:
select t.*
from t join
(select group
from t
group by group
order by random()
limit 3
) g
on t.group = g.group;
Note that group is a really bad name for a column, because it is a SQL keyword.

What do OrientDB's functions do when applied to the results of another function?

I am getting very strange behavior on 2.0-M2. Consider the following against the GratefulDeadConcerts database:
Query 1
SELECT name, in('written_by') AS wrote FROM V WHERE type='artist'
This query returns a list of artists and the songs each has written; a majority of the rows have at least one song.
Query 2
Now try:
SELECT name, count(in('written_by')) AS num_wrote FROM V WHERE type='artist'
On my system (OSX Yosemite; Orient 2.0-M2), I see just one row:
name num_wrote
---------------------------
Willie_Cobb 224
This seems wrong. But I tried to better understand. Perhaps the count() causes the in() to look at all written_by edges...
Query 3
SELECT name, in('written_by') FROM V WHERE type='artist' GROUP BY name
Produces results similar to the first query.
Query 4
Now try count()
SELECT name, count(in('written_by')) FROM V WHERE type='artist' GROUP BY name
Wrong path -- So try LET variables...
Query 5
SELECT name, $wblist, $wbcount FROM V
LET $wblist = in('written_by'),
$wbcount = count($wblist)
WHERE type='artist'
Produces seemingly meaningless results:
You can see that the $wblist and $wbcount columns are inconsistent with one another, and the $wbcount values don't show any obvious progression like a cumulative result.
Note that the strange behavior is not limited to count(). For example, first() does similarly odd things.
count(), like in RDBMS, computes the sum of all the records in only one value. For your purpose .size()seems the right method to call:
in('written_by').size()

Finding most popular and most unique records using SQL

My mom wanted a baby name game for my brother's baby shower. Wanting to learn python, I volunteered to do it. I pretty much have the python bit, it's the SQL that is throwing me.
The way the game is supposed to work is everyone at the shower writes down names on paper, I manually enter them into Excel (normalizing spellings as much as possible) and export to MS Access. Then I run my python program to find the player with the most popular names and the player with the most unique names. The database, called "babynames", is just four columns.
ID | BabyFirstName | BabyMiddleName | PlayerName
---|---------------|----------------|-----------
My mom has changed things every so often, but as they stand right now, I have to figure out :
a) The most popular name (or names if there is a tie) out of all first and middle names
b) The most unique name (or names if there is a tie) out of all the first and middle names
c) The player that has the most number of popular names (wins a prize)
d) The player that has the most number of unique names (wins a prize)
I've been working on this for about a week now and can't even get a SQL query for a) and b) to work, much less c) and d). I'm more than just a bit frustrated.
BTW, I'm just looking at spellings of the names, not phonetics. As I manually enter names, I will change names like "Kris" to "Chris" and "Xtina" to "Christina" etc.
Editing to add a couple of the most recent queries I tried for a)
SELECT [BabyFirstName],
COUNT ([BabyFirstName]) AS 'FirstNameOccurrence'
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY 'FirstNameOccurrence' DESC
LIMIT 1
and
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC
LIMIT 1)
These both lead to syntax errors.
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error in ORDER BY clause. (-3508) (SQLExecDirectW)')
I've tried using [FirstNameOccurrence] and just FirstNameOccurrence as well with the same error. Not sure why it's not recognizing it by that column name to order by.
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error. in query expression 'COUNT(*) = (SELECT COUNT(*) FROM [babynames] GROUP BY [BabyFirstName] ORDER BY COUNT(*) DESC LIMIT 1)'. (-3100) (SQLExecDirectW)")
I'll admit that I'm not really grokking all of the COUNT(*) commands here, but this was a solution for a similar issue here in stackoverflow that I figured I'd try when my other idea didn't pan out.
For A and B, use a group by clause in your SQL, and then count, and order by the count. Use descending order for A and ascending order for B, and just take the first result for each.
For C and D, essentially use the same strategy but now just add the PlayerName (e.g. group by babyname,playername) and then use the ascending order/descending order question.
Here's Microsoft's write-up for a group by clause in MS Access: https://office.microsoft.com/en-us/access-help/group-by-clause-HA001231482.aspx
Here's an even better write-up demonstrating how to do both group by and order by at the same time: http://rogersaccessblog.blogspot.com/2009/06/select-queries-part-3-sorting-and.html
For the first query you tried, change it to:
SELECT TOP 1 [BabyFirstName],
COUNT ([BabyFirstName]) AS 'FirstNameOccurrence'
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY 'FirstNameOccurrence' DESC
For the second, change it to:
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT TOP 1 COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC)
Limiting the number of records returned by a SQL Statement in Access is achieved by adding a TOP statement directly after SELECT, not with ORDER BY... LIMIT
Also, Access TOP statement will return all instances of the top n (or n percent) unique records, so if there are two or more identical records in the query output (before TOP), and TOP 1 is specified, you'll see them all.

Optimize annotated query in Django

I'm trying to convert this simple SQL query into something Django can handle:
SELECT *
FROM location AS a
WHERE a.travel_distance = (
SELECT MAX(travel_distance)
FROM location AS b
WHERE b.person_id = a.person_id
)
ORDER BY a.travel_distance DESC
What this basically does is fetching all traveled locations and select only the rows that contain the maximum travel distance.
This is what i got so far:
travels = Location.objects.filter(pk__in=Location.objects.order_by().values('person_id').annotate(max_id=Max('id')).values('max_id')).order_by('travel_distance')[::-1]
Although the results match each other. It takes a whole lot longer for the second method to return results.
Is there anyway I can rewrite this query, so it becomes faster?
If I understand correctly you want the maximum distance travelled for each person. Assuming there is a Person model, perhaps ask from the other direction. Something like:
Person.objects.values('id').annotate(max_distance=Max('location__travel_distance'))
I haven't tested this since I don't have an equivalent data schema handy, but does this work for you?
Isn't this works? Something like:
select max(id), sum(travel_distance) from table group by person_id;

changing sorting criteria after the first result

I am selecting from a database of news articles, and I'd prefer to do it all in one query if possible. In the results, I need a sorting criteria that applies ONLY to the first result.
In my case, the first result must have an image, but the others should be sorted without caring about their image status.
Is this something I can do with some sort of conditionals or user variables in a MySQL query?
Even if you manage to find a query that looks like one query, it is going to be logicaly two queries. Have a look at MySQL UNION if you really must make it one query (but it will still be 2 logical queries). You can union the image in the first with a limit of 1 and the rest in the second.
Something like this ensures an article with an image on the top.
SELECT
id,
title,
newsdate,
article
FROM
news
ORDER BY
CASE WHEN HasImage = 'Y' THEN 0 ELSE 1 END,
newsdate DESC
Unless you define "the first result" closer, of course. This query prefers articles with images, articles without will appear at the end.
Another variant (thanks to le dorfier, who deleted his answer for some reason) would be this:
SELECT
id,
title,
newsdate,
article
FROM
news
ORDER BY
CASE WHEN id = (
SELECT MIN(id) FROM news WHERE HasImage = 'Y'
) THEN 0 ELSE 1 END,
newsdate DESC
This sorts the earliest (assuming MIN(id) means "earliest") article with an image to the top.
I don't think it's possible, as it's effectively 2 queries (the first query the table has to get sorted for, and the second unordered), so you might as well use 2 queries with a LIMIT 1 in the first.