Why does this query need order by to work?


Why is the wrong query wrong?
-- Correct
SELECT first_name, count(*)
FROM customer
GROUP BY first_name
ORDER BY count(*) DESC
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
source:
https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/

I couldn't understand your explanation in the comments, so I'm explaining here:
In the first query, you select the column First_Name and aggregate the rows into virtual buckets (group by First_Name). The second column is an aggregate function that counts how many rows fall into each bucket. So for example:
First_Name, Last_Name
John, Doe
John, Carpenter
Frank, Sinatra
Frank, Doe
Frank, Short
If you group by First_Name (and select it in the select list), you get only these rows:
First_Name
John
Frank
Adding an aggregate function (count in your case) gives the following (the * in count(*) means count rows, not any particular column):
First_Name, Count(*)
John, 2
Frank, 3
Now if you consider second query:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
For the result set:
First_Name, Last_Name, Count(*)
John, ????, 2
Frank, ????, 3
Nothing tells the database which row in each bucket the value for Last_Name should come from (so it would be a bug to include it).
If you wrote it as:
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
It would be ok. Now your problem would be getting more rows than you need:
First_Name, Last_Name, Count(*)
John, Doe, 1
John, Carpenter, 1
Frank, Sinatra, 1
Frank, Doe, 1
Frank, Short, 1
That would at least reveal any duplicated full names (their count would be greater than 1), which can be useful in some cases.
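To check the buckets described above, here is a minimal sketch that loads the five sample rows into an in-memory SQLite database from Python and runs both groupings. SQLite and the Python wrapper are my assumptions for the sake of a runnable example; the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("John", "Doe"), ("John", "Carpenter"),
     ("Frank", "Sinatra"), ("Frank", "Doe"), ("Frank", "Short")],
)

# One bucket per first_name; count(*) counts the rows in each bucket.
by_first = conn.execute(
    "SELECT first_name, count(*) FROM customer "
    "GROUP BY first_name ORDER BY count(*) DESC"
).fetchall()

# One bucket per (first_name, last_name) pair.
by_full = conn.execute(
    "SELECT first_name, last_name, count(*) FROM customer "
    "GROUP BY first_name, last_name ORDER BY first_name, last_name"
).fetchall()

print(by_first)
print(by_full)
```

With this sample data every full name is unique, so each pair ends up in its own bucket.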

Issue
Given:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
What error did the SQL parser return?
Presumably similar to:
ERROR: column "customer.last_name" must appear in the GROUP BY clause or be used in an aggregate function Position: 20
(Run the SQL and see this error output from PostgreSQL in SQLfiddle demo)
So just make sure that every column in your SELECT list that is not wrapped in an aggregate function like count or sum is also present in the GROUP BY list. Those columns form the groups over which the aggregates are computed.
Working GROUP BY
-- Correct
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
See it working in SQLfiddle demo resulting in counting duplicates for the names.
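The error message also names a second way out: using the column in an aggregate function. A minimal sketch of that option, assuming you are happy with one representative last name per first name (min(last_name) here is my choice for illustration, not something from the question). I run it through Python's sqlite3 module only to make it executable. Note that SQLite is more lenient than PostgreSQL: it would accept the "wrong" query and pick a last_name from an arbitrary row, so the error itself is not reproduced here.

```python
import sqlite3

# In-memory SQLite database with the sample rows used earlier on this page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("John", "Doe"), ("John", "Carpenter"),
     ("Frank", "Sinatra"), ("Frank", "Doe"), ("Frank", "Short")],
)

# last_name is not in GROUP BY, so it is wrapped in an aggregate (min)
# that states explicitly which value to report for each bucket.
by_first_with_min = conn.execute(
    "SELECT first_name, min(last_name), count(*) FROM customer "
    "GROUP BY first_name ORDER BY first_name"
).fetchall()

print(by_first_with_min)
```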

Related

Finding Duplicate Rows in a Table

I am trying to find out how many duplicate records I have in a table. I can use count, but I'm not sure how best to eliminate records where the count is only 1.
select first_name, last_name, start_date, count(1)
from employee
group by first_name, last_name, start_date;
I can try to order by the count, but I am still not eliminating those with a count of one.

You can add a HAVING clause, HAVING Count(*) > 1, after the GROUP BY, like this:
select
first_name,
last_name,
start_date,
Count(*) AS Count
from
employee
group by
first_name,
last_name,
start_date
having
Count(*) > 1
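A small sketch of the HAVING filter in action, run against an in-memory SQLite database from Python. The employee rows are made up for illustration, and the alias cnt is my own naming; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first_name TEXT, last_name TEXT, start_date TEXT)"
)
# Hypothetical sample data: Ann appears twice, Cid three times, Bob once.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Lee", "2020-01-01"), ("Ann", "Lee", "2020-01-01"),
     ("Bob", "Ray", "2021-05-05"),
     ("Cid", "Poe", "2019-03-03"), ("Cid", "Poe", "2019-03-03"),
     ("Cid", "Poe", "2019-03-03")],
)

# HAVING filters whole groups after aggregation, so groups of size 1 vanish.
duplicates = conn.execute(
    "SELECT first_name, last_name, start_date, count(*) AS cnt "
    "FROM employee "
    "GROUP BY first_name, last_name, start_date "
    "HAVING count(*) > 1 "
    "ORDER BY first_name"
).fetchall()

print(duplicates)
```

WHERE cannot do this job because it is evaluated before the groups exist; HAVING is evaluated after them.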

How to combine SELECT DISTINCT and ROWNUM in Oracle Query

I need to combine the two MySQL statements below into a single ORACLE query if possible.
The initial query is
SELECT DISTINCT FIRST_NAME FROM PEOPLE WHERE LAST_NAME IN ("Smith","Jones","Gupta")
then based on each FIRST_NAME returned I query
SELECT *
FROM PEOPLE
WHERE FIRST_NAME = {FIRST_NAME}
AND LAST_NAME IN ("Smith","Jones","Gupta")
ORDER BY FIELD(LAST_NAME, "Smith","Jones","Gupta") DESC
LIMIT 1
The "List of last names" serves as a "default / override" indicator, so I only have one person for each first name, and where multiple rows for the same first name exist, only the Last match from the list of "Last Names" is used.
I need a SQL query that returns the last row from the "in" clause based on the order of the values in the IN(a,b,c). Here is a sample table, and the results I need from the query.
For the Table PEOPLE, with values
LAST_NAME FIRST_NAME
.....
Smith Mike
Smith Betty
Smith Jane
Jones Mike
Jones Sally
....
I need a query based on DISTINCT FIRST_NAME and LAST_NAME IN ('Smith','Jones') that returns
Betty Smith
Jane Smith
Mike Jones
Sally Jones

You can do it like this:
select first_name, last_name
from (
select p.first_name,
p.last_name,
row_number() over (partition by p.first_name
order by case p.last_name
when 'Smith' then 1
when 'Jones' then 2
when 'Gupta' then 3
end desc) as rn
from people p
where p.last_name in ('Smith','Jones','Gupta')
)
where rn = 1;
Demo: SQL Fiddle
EDIT
It's not hard to get more columns. I'm sure you could have figured it out with a bit more effort:
select *
from (
select p.*,
row_number() over (partition by p.first_name
order by case p.last_name
when 'Smith' then 1
when 'Jones' then 2
when 'Gupta' then 3
end desc) as rn
from people p
where p.last_name in ('Smith','Jones','Gupta')
)
where rn = 1;
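To sanity-check the row_number() approach against the sample table from the question, here is a sketch that runs it on SQLite through Python. This assumes an SQLite build with window function support (3.25 or later) rather than Oracle; the derived-table alias ranked and the final ORDER BY are my additions to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Smith", "Mike"), ("Smith", "Betty"), ("Smith", "Jane"),
     ("Jones", "Mike"), ("Jones", "Sally")],
)

# Rank the rows of each first_name by the position of last_name in the
# list, highest position first, then keep only the top-ranked row.
query = """
SELECT first_name, last_name
FROM (
    SELECT p.first_name,
           p.last_name,
           row_number() OVER (
               PARTITION BY p.first_name
               ORDER BY CASE p.last_name
                            WHEN 'Smith' THEN 1
                            WHEN 'Jones' THEN 2
                            WHEN 'Gupta' THEN 3
                        END DESC
           ) AS rn
    FROM people p
    WHERE p.last_name IN ('Smith', 'Jones', 'Gupta')
) ranked
WHERE rn = 1
ORDER BY first_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Mike exists as both Smith and Jones; Jones is later in the list, so the Jones row wins.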

Or like this:
select first_name,
max(last_name)
keep (dense_rank first order by decode(last_name,
'Smith', 1,
'Jones', 2,
'Gupta', 3) desc)
from people
where last_name in ('Smith','Jones','Gupta')
group by first_name
Oracle's "FIRST"/"LAST" functions let you get values from other columns of the row that has the maximum/minimum value of the ordering expression (for example, get the last_name of the employee with the maximum salary, or, as in this case, get the last_name from the row with the maximum rank).
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions056.htm

SQL: select from same table and same column, just different counts

I have a table called names, and I want to select 2 names from the distinct names (after grouping with count(*), as uniq), and then another 2 names from the entire sample pool.
firstname
John
John
Jessica
Mary
Jessica
John
David
Walter
So the first 2 names would select from a pool of John, Jessica, and Mary etc giving them equal chances of being selected, while the second 2 names will select from the entire pool, so obvious bias will be given to John and Jessica with multiple rows.
I'm sure there's a way to do this but I just can't figure it out. I want to do something like
SELECT uniq.firstname
FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
limit 2
AND
SELECT firstname
FROM (SELECT firstname from names) limit 2
Is this possible? Appreciate any pointers.

I think you are close but you need some randomness for the sampling:
(SELECT uniq.firstname
FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
ORDER BY rand()
limit 2
)
UNION ALL
(SELECT firstname
FROM names
ORDER BY rand()
limit 2
)
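Here is a runnable sketch of the two-pool sampling idea, using SQLite through Python rather than MySQL. Two things are my own adaptations: SQLite spells the function random() and does not accept parenthesized SELECTs with their own ORDER BY/LIMIT inside a UNION, so each sample is wrapped in a derived table; and the extra pool column is added only so the two halves of the result can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (firstname TEXT)")
all_names = ["John", "John", "Jessica", "Mary",
             "Jessica", "John", "David", "Walter"]
conn.executemany("INSERT INTO names VALUES (?)", [(n,) for n in all_names])

# First half: 2 names drawn from the distinct names (equal chances).
# Second half: 2 rows drawn from the whole table (duplicates weigh more).
query = """
SELECT 'unique' AS pool, firstname
FROM (SELECT firstname FROM names
      GROUP BY firstname ORDER BY random() LIMIT 2)
UNION ALL
SELECT 'all' AS pool, firstname
FROM (SELECT firstname FROM names ORDER BY random() LIMIT 2)
"""
picks = conn.execute(query).fetchall()

unique_picks = [name for pool, name in picks if pool == "unique"]
all_picks = [name for pool, name in picks if pool == "all"]
print(unique_picks, all_picks)
```

The output changes from run to run. The first pair never contains the same name twice, while the second pair can (for example John, John).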

As mentioned here you can use RAND or similar functions to achieve it depending on the database.
MySQL:
SELECT firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY RAND()
LIMIT 2
PostgreSQL:
SELECT firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY RANDOM()
LIMIT 2
Microsoft SQL Server:
SELECT TOP 2 firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY NEWID()
IBM DB2:
SELECT firstname , RAND() as IDX
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname)
ORDER BY IDX FETCH FIRST 2 ROWS ONLY
Oracle:
SELECT firstname
FROM(SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname ORDER BY dbms_random.value )
WHERE rownum in (1,2)
Follow a similar approach for selecting from the entire pool: query the names table directly instead of the grouped subquery.

group by while selecting many more columns

I have this query :
select first_name, last_name, MAX(date)
from person p inner join
address a on
a.person_id = p.id
group by first_name, last_name
with person(sid, last_name, first_name), address(data, cp, sid, city)
My question is how I can write a query that selects first_name, last_name, MAX(date), city, cp
without adding city and cp to the GROUP BY.
I mean I want to have all 5 columns, but only for the rows grouped by first_name, last_name and date.
Many Thanks

This is not possible. Say you have three John Smiths in your database, each of them having one or two addresses. When you group by name now, which city do you want to get? The city of which John Smith, and of which of his addresses? As there is no implicit answer to this question, there is no way to write a select statement without explicitly saying which city is to be selected.
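If the intended answer is "the city and cp of the row with the latest date", one way to say that explicitly is to rank the joined rows per name and keep the newest one. The sketch below is only an illustration under assumptions: the schema person(id, first_name, last_name) and address(person_id, date, city, cp) is guessed from the join in the question, the rows are invented, and it runs on SQLite (3.25 or later, for window functions) through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, inferred from the join condition in the question.
conn.execute("CREATE TABLE person (id INTEGER, first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE address (person_id INTEGER, date TEXT, city TEXT, cp TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "John", "Smith"), (2, "John", "Smith"), (3, "Mary", "Jones"),
])
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?)", [
    (1, "2020-01-01", "Paris", "75001"),
    (1, "2022-06-01", "Lyon", "69001"),
    (2, "2021-03-15", "Nice", "06000"),
    (3, "2019-09-09", "Lille", "59000"),
])

# rn = 1 marks the newest address row within each (first_name, last_name).
query = """
SELECT first_name, last_name, date, city, cp
FROM (
    SELECT p.first_name, p.last_name, a.date, a.city, a.cp,
           row_number() OVER (
               PARTITION BY p.first_name, p.last_name
               ORDER BY a.date DESC
           ) AS rn
    FROM person p
    INNER JOIN address a ON a.person_id = p.id
) latest
WHERE rn = 1
ORDER BY last_name, first_name
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)
```

Both John Smiths collapse into one group here, exactly as in the original GROUP BY first_name, last_name; partition by the person id instead if each person should keep a separate row.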

Understanding ROLLUP in SQL

I've figured out CUBE as just generating all the permutations, but I am having trouble with ROLLUP. There don't seem to be any good resources online or in the book I'm reading for explaining SQL for people like me who struggle with it.
My book says that ROLLUP is a special case of the CUBE operator that excludes all cases that don't follow a hierarchy within the results.
I'm not entirely sure what it means, but running it on a table I made kinda produces some useful results.
I made a table from another page on google like this:
Type Store Number
Dog Miami 12
Cat Miami 18
Turtle Tampa 4
Dog Tampa 14
Cat Naples 9
Dog Naples 5
Turtle Naples 1
Then here is the query I made:
select store,[type], SUM(number) as Number from pets
group by store, [type]
with rollup
This shows me the number of each type of pet in each store, and total pets in each store, which is kinda cool. If I want to see the query based on pets, I found I have to switch the group by order around so type comes first.
So is rollup based on the first group by clause?
The other question is, I read you use ROLLUP instead of CUBE when you have a year and month column to stop it aggregating the same month across multiple years. I think I understand what this means, but could anyone clarify it? And how do you set it up like this?
Can you use ROLLUP to exclude other combinations of columns as well? My table above is quite simple and the query shows you "pets by store", but if there were other columns, could you include/exclude them from the results?

Best explained through an example. Suppose you group by A, B, C. You then get the following groupings with rollup:
(A, B, C)
(A, B)
(A)
()
So you see that the order is important, as you already found out. If you group by A, C, B, you get the following groupings instead:
(A, C, B)
(A, C)
(A)
()
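The pattern above (ROLLUP keeps only the prefixes of the column list, while CUBE keeps every subset) can be sketched in a few lines of Python. This is an illustration of the grouping sets only, not of how any database implements them; the function names are mine, and the pets rows are the ones from the question.

```python
from itertools import combinations

def rollup_groupings(columns):
    """Prefixes of the column list, longest first, ending with ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

def cube_groupings(columns):
    """Every subset of the columns, keeping the original column order."""
    return [combo
            for n in range(len(columns), -1, -1)
            for combo in combinations(columns, n)]

pets = [
    {"type": "Dog", "store": "Miami", "number": 12},
    {"type": "Cat", "store": "Miami", "number": 18},
    {"type": "Turtle", "store": "Tampa", "number": 4},
    {"type": "Dog", "store": "Tampa", "number": 14},
    {"type": "Cat", "store": "Naples", "number": 9},
    {"type": "Dog", "store": "Naples", "number": 5},
    {"type": "Turtle", "store": "Naples", "number": 1},
]

def rollup_totals(rows, columns, measure):
    """Sum `measure` for every grouping that ROLLUP(columns) would produce."""
    totals = {}
    for grouping in rollup_groupings(columns):
        for row in rows:
            key = (grouping, tuple(row[c] for c in grouping))
            totals[key] = totals.get(key, 0) + row[measure]
    return totals

totals = rollup_totals(pets, ["store", "type"], "number")

print(rollup_groupings(["A", "B", "C"]))
print(cube_groupings(["A", "B", "C"]))
print(totals[(("store",), ("Miami",))])  # subtotal for one store
print(totals[((), ())])                  # grand total
```

With ["store", "type"] there is a subtotal per store but none per type, because ("type",) is not a prefix of the column list. That is why swapping the GROUP BY order changes which subtotals you get.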

SELECT departments.department_name
FROM departments
LEFT OUTER JOIN job_history
ON departments.department_id = job_history.department_id
WHERE job_history.employee_id IS NULL;
ex.3
SELECT department_id, manager_id, COUNT(*) as Numar
FROM Employees
GROUP BY ROLLUP(department_id, manager_id)
UNION
SELECT department_id, manager_id, COUNT(*) as Numar
FROM Employees
GROUP BY ROLLUP(manager_id, department_id);
SELECT department_id, manager_id, COUNT(employee_id) as employee_count
FROM employee
GROUP BY ROLLUP (department_id, manager_id);
SELECT employee_id, first_name, manager_id
FROM employee
START WITH employee_id = :employee_id
CONNECT BY PRIOR employee_id = manager_id;

