group by while selecting many more columns - sql

I have this query :
select first_name, last_name, MAX(date)
from person p inner join
address a on
a.person_id = p.id
group by first_name, last_name
The tables are person(sid, last_name, first_name) and address(data, cp, sid, city).
My question is how I can write a query that selects first_name, last_name, MAX(date), city, cp
without adding city and cp to the GROUP BY.
I mean I want all 5 columns, but only for the rows grouped by first_name, last_name and date.
Many thanks.

This is not possible as asked. Say you have three John Smiths in your database, each of them having one or two addresses. When you group by name, which city do you want to get? The city of which John Smith, and of which of his addresses? As there is no implicit answer to this question, there is no way to write a select statement without explicitly saying which city is to be selected.
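One way to say explicitly which city is meant, assuming the wanted city and cp are those of the address row that carries the latest date for each name, is to compute the maximum per group first and then join back to the matching row. The sketch below uses Python's sqlite3 module with invented sample rows. The join keys (p.id, a.person_id) and the date column follow the query in the question; the data and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE address (person_id INTEGER, date TEXT, city TEXT, cp TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO person VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe');
INSERT INTO address VALUES
  (1, '2020-01-01', 'Paris', '75001'),
  (1, '2022-06-15', 'Lyon',  '69001'),
  (2, '2021-03-10', 'Nice',  '06000');
""")

# Step 1 (subquery m): latest date per (first_name, last_name).
# Step 2: join back to the address rows to fetch city and cp of that date.
QUERY = """
SELECT p.first_name, p.last_name, a.date, a.city, a.cp
FROM person p
JOIN address a ON a.person_id = p.id
JOIN (
    SELECT p2.first_name, p2.last_name, MAX(a2.date) AS max_date
    FROM person p2
    JOIN address a2 ON a2.person_id = p2.id
    GROUP BY p2.first_name, p2.last_name
) m ON m.first_name = p.first_name
   AND m.last_name  = p.last_name
   AND m.max_date   = a.date
ORDER BY p.last_name, p.first_name
"""
latest_rows = conn.execute(QUERY).fetchall()
for row in latest_rows:
    print(row)
```

If two addresses share the latest date for one name, both rows come back, and two different people called John Smith still fall into one group. That is the same ambiguity described above, only made visible.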

Related

How to get values of one column without the aggregate column?

I have this table:
first_name  last_name  age  country
John        Doe        31   USA
Robert      Luna       22   USA
David       Robinson   22   UK
John        Reinhardt  25   UK
Betty       Doe        28   UAE
How can I get only the names of the oldest per country?
When I do this query
SELECT first_name,last_name, MAX(age)
FROM Customers
GROUP BY country
I get this result:
first_name  last_name  MAX(age)
Betty       Doe        31
John        Reinhardt  22
John        Doe        31
But I want to get only first name and last name without the aggregate function.
If window functions are an option, you can use ROW_NUMBER for this task.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY country ORDER BY age DESC) AS rn
FROM Customers
)
SELECT first_name, last_name, age, country
FROM cte
WHERE rn = 1
It sounds like you want to get the oldest age per country first,
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
With that, you want to match that back to the original table (aka a join) to see which names they match up to.
So, something like this perhaps:
SELECT Customers.*
FROM Customers
INNER JOIN
(
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
) AS max_per_country_query
ON Customers.Country = max_per_country_query.Country
AND Customers.Age = max_per_country_query.MAX_AGE_IN_COUNTRY
If your database supports it, I prefer using the CTE style of handling these subqueries because it's easier to read and debug.
WITH cte_max_per_country AS (
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
)
SELECT C.*
FROM Customers C
INNER JOIN cte_max_per_country
ON C.Country = cte_max_per_country.Country
AND C.Age = cte_max_per_country.MAX_AGE_IN_COUNTRY
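As a quick check, the join-back approach can be run against the sample table from the question. This is a minimal sketch using Python's sqlite3 module; the column types are assumptions, and only first_name and last_name are selected, so the aggregate column does not appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (first_name TEXT, last_name TEXT, age INTEGER, country TEXT);
INSERT INTO Customers VALUES
  ('John',   'Doe',       31, 'USA'),
  ('Robert', 'Luna',      22, 'USA'),
  ('David',  'Robinson',  22, 'UK'),
  ('John',   'Reinhardt', 25, 'UK'),
  ('Betty',  'Doe',       28, 'UAE');
""")

# The subquery finds the maximum age per country; the join keeps only
# the customers whose age equals that maximum.
oldest = conn.execute("""
SELECT c.first_name, c.last_name
FROM Customers c
JOIN (
    SELECT country, MAX(age) AS max_age
    FROM Customers
    GROUP BY country
) m ON m.country = c.country AND m.max_age = c.age
ORDER BY c.country
""").fetchall()
print(oldest)
```

Note that if two customers in one country share the maximum age, the join returns both of them, whereas the ROW_NUMBER variant keeps exactly one row per country.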

Why does this query need order by to work?

Why is the wrong query wrong?
-- Correct
SELECT first_name, count(*)
FROM customer
GROUP BY first_name
ORDER BY count(*) DESC
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
source:
https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/
I couldn't follow your explanation in the comments, so I am explaining here:
In the first query, you select the column First_Name and aggregate the rows into virtual buckets (group by First_Name). The second column is an aggregate function that counts how many rows fall into each bucket. So for example:
First_Name, Last_Name
John, Doe
John, Carpenter
Frank, Sinatra
Frank, Doe
Frank, Short
If you group by First_Name (and select it in the select list), you get only these rows:
First_Name
John
Frank
Adding an aggregate function (count in your case) gives the following (the * in count means count rows, not any particular column):
First_Name, Count(*)
John, 2
Frank, 3
Now if you consider second query:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
For the result set:
First_Name, Last_Name, Count(*)
John, ????, 2
Frank, ????, 3
Nothing tells the database where the content for Last_Name should come from (thus it would be a bug to include it).
If you wrote it as:
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
It would be OK. Now your problem would be getting more rows than you need:
First_Name, Last_Name, Count(*)
John, Doe, 1
John, Carpenter, 1
Frank, Sinatra, 1
Frank, Doe, 1
Frank, Short, 1
That would at least reveal which names are duplicated, which can be useful in some cases.
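The two valid groupings above can be reproduced with the five example rows. This is a minimal sketch using Python's sqlite3 module. The "wrong" query is deliberately not run here, because SQLite accepts a bare column in an aggregate query and returns a last_name from an arbitrary row of each group instead of raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (first_name TEXT, last_name TEXT);
INSERT INTO customer VALUES
  ('John', 'Doe'), ('John', 'Carpenter'),
  ('Frank', 'Sinatra'), ('Frank', 'Doe'), ('Frank', 'Short');
""")

# One bucket per first name.
by_first = conn.execute("""
SELECT first_name, COUNT(*)
FROM customer
GROUP BY first_name
ORDER BY COUNT(*) DESC
""").fetchall()

# One bucket per (first name, last name) pair.
by_full = conn.execute("""
SELECT first_name, last_name, COUNT(*)
FROM customer
GROUP BY first_name, last_name
""").fetchall()

print(by_first)
print(len(by_full), "groups when grouping by both columns")
```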
Issue
Given:
-- Wrong
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name
What error did the SQL parser report?
Presumably similar to:
ERROR: column "customer.last_name" must appear in the GROUP BY clause or be used in an aggregate function Position: 20
(Run the SQL and see this error output from PostgreSQL in SQLfiddle demo)
So make sure that every column in your SELECT list that is not wrapped in an aggregate function such as count or sum also appears in the GROUP BY list. Those columns form the groups over which the aggregates are computed.
Working GROUP BY
-- Correct
SELECT first_name, last_name, count(*)
FROM customer
GROUP BY first_name, last_name
See it working in SQLfiddle demo resulting in counting duplicates for the names.

Get order count of employees, which have different count than in one city

My SQL studies are moving forward at a good pace. Now I have a practice exercise where I need some help.
I would like to get the names of the employees who have a different order count than the employees in New York.
Tables:
ORDERS, which includes EMPLOYEE_ID, CITY, ORDER_ID
EMPLOYEES, which includes LAST_NAME, FIRST_NAME, EMPLOYEE_ID, CITY
This is where I am stuck:
SELECT ROW_NUMBER() OVER ( ORDER BY COUNT(T.ORDER_ID) DESC) ROW,
(H.LAST_NAME + ', ' + H.FIRST_NAME) 'Employee name',
COUNT(T.ORDER_ID) 'Sold orders', H.CITY 'City'
FROM ORDERS T JOIN EMPLOYEE H ON T.EMPLOYEE_ID = H.EMPLOYEE_ID
GROUP BY H.EMPLOYEE_ID, H.LAST_NAME, H.FIRST_NAME, H.CITY
With that I can get the number of orders each employee has sold. Unfortunately it does not show employees with 0 orders.
How can I also show employees with 0 orders? And how can I show employees who have a different order count than the employees in one city? For example, employees who have a different order count than the employees in New York.
I hope you understand what I mean. It is a complicated problem, and there is a bit of a language barrier.
Example data:
First_name Last_name Sold_orders City
John Doe 2 New York
Jane Doe 5 Los Angeles
Peter Pan 5 Miami
I would like to get the employees who do not have the same order count as the employees in Miami.
First_name Last_name Sold_orders City
John Doe 2 New York
So Jane Doe and Peter Pan are gone, because they have the same order count as the employees in Miami (Miami itself included).
I think that you just want a left join. For this you need to start from the employee table, then bring in the orders.
select
row_number() over ( order by count(o.order_id) desc) rn,
e.last_name + ', ' + e.first_name employee_name,
count(o.order_id) sold_orders,
e.city
from employee e
left join orders o on o.employee_id = e.employee_id
group by e.employee_id, e.last_name, e.first_name, e.city
Note that I also changed the column aliases: you should avoid using single quotes. Although some databases allow this, single quotes are usually meant for string literals rather than identifiers. Databases use different symbols to quote identifiers (Oracle and Postgres have double quotes, MySQL has backticks, SQL Server has square brackets). I changed the query so it uses identifiers that do not require quotes.
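For the second part of the question (employees whose order count differs from that of the employees in a given city), one possible sketch is to wrap the left join in a CTE and filter with NOT IN. The example uses Python's sqlite3 module with the three employees from the question plus one hypothetical employee who has no orders. The orders table is reduced to the columns the query needs, and the CTE name counts is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, city TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, employee_id INTEGER);
INSERT INTO employee VALUES
  (1, 'John',  'Doe',     'New York'),
  (2, 'Jane',  'Doe',     'Los Angeles'),
  (3, 'Peter', 'Pan',     'Miami'),
  (4, 'Wendy', 'Darling', 'Boston');  -- hypothetical employee with no orders
""")
# 2 orders for John, 5 each for Jane and Peter, none for Wendy.
order_owners = [1] * 2 + [2] * 5 + [3] * 5
conn.executemany("INSERT INTO orders (employee_id) VALUES (?)",
                 [(e,) for e in order_owners])

# counts: one row per employee, including those with zero orders.
# The outer query drops every employee whose count matches a count
# found among the employees of the chosen city.
QUERY = """
WITH counts AS (
    SELECT e.employee_id, e.first_name, e.last_name, e.city,
           COUNT(o.order_id) AS sold_orders
    FROM employee e
    LEFT JOIN orders o ON o.employee_id = e.employee_id
    GROUP BY e.employee_id, e.first_name, e.last_name, e.city
)
SELECT first_name, last_name, sold_orders, city
FROM counts
WHERE sold_orders NOT IN (SELECT sold_orders FROM counts WHERE city = ?)
ORDER BY sold_orders DESC
"""
different = conn.execute(QUERY, ("Miami",)).fetchall()
print(different)
```

COUNT(o.order_id) counts only non-null order ids, which is why an employee without orders gets 0 rather than 1 after the left join.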

BigQuery Nested Challenge Involving Joins and Having (or Where) Clauses

I have been given a challenge that is a bit out of my scope, so I'm just going to jump right in.
I have a sample dataset in BigQuery you can find here for testing purposes: https://bigquery.cloud.google.com/table/robotic-charmer-726:bl_test_data.complex_problem
I need to figure out the SQL code to query my table and do the following:
By aggregating using the following rules (I'll start with just one email address, and add the other in at the end):
As a general note up front, everything is to be made lowercase such that Ben=ben when aggregating.
Email is the broadest aggregation, and is aggregated by the lowercase version.
The amounts for all of those lowercase emails are summed, as is pictured below in blue.
First and last names are considered next, and they are selected based on the sum amount of the lowercase of the first AND last name.
Note, first or last names are NOT considered separately. See below where Ben has a sum amount of 160 and Kathleen only has a sum amount of 150, but Kathleen is still selected because her full name has a sum amount higher than any other full name.
Next the lowercase full address of the SELECTED NAME is chosen based on the highest sum amount.
Similar to the names, the full address considers all columns together.
Now I'll add in another email address, and we'll do the same thing.
Each lowercase email address is considered separately. I'm now realizing that I should have made that more clear with my pictures, but I don't want to do it all again... too much work. So I hope I have made it clear enough.
I hope you find this to be a very fun challenge!
There are probably cleaner ways of doing this, but this will give you the answer you need:
select email, first_name, last_name, address, city, state, zip, total_amount amount
from (
select d.email email, d.first_name first_name, d.last_name last_name, d.amount amount, d.total_amount total_amount, e.address address, e.city city, e.state state, e.zip zip, row_number() over (partition by e.email order by e.amount desc) ord
from (
select a.email email, a.first_name first_name, a.last_name last_name, b.amount amount, c.amount total_amount
from (
SELECT
lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name, last_name)) as name_group, lower(address) address, lower(city) city, lower(state) state, lower(concat(address,city,state)) as location_group, zip, sum(amount) amount
FROM [robotic-charmer-726:bl_test_data.complex_problem]
group by 1,2,3,4,5,6,7,8,9
) a
inner join (
select email, first_name, last_name, name_group, amount
from (
select email, first_name, last_name, name_group, amount, row_number() over (partition by email order by amount desc) as ord
from (
select lower(email) email , lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name,last_name)) as name_group, sum(amount) amount,
from [robotic-charmer-726:bl_test_data.complex_problem]
group by 1, 2, 3, 4
)
)
where ord = 1
) b
on a.name_group = b.name_group
inner join (
select lower(email) email, sum(amount) amount
from [robotic-charmer-726:bl_test_data.complex_problem]
group by 1
) c
on a.email = c.email
group by 1,2,3,4,5
) d
inner join (
select lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(address) address, lower(city) city, lower(state) state, zip,lower(concat(lower(address),lower(city), lower(state), zip)) as location_group, sum(amount) amount
from [robotic-charmer-726:bl_test_data.complex_problem]
group by 1,2,3,4,5,6,7,8
) e
on d.email = e.email and d.first_name = e.first_name and d.last_name = e.last_name
)
where ord = 1
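To check the selection rules from the question on a small scale, here is a sketch in plain Python rather than BigQuery SQL. All rows are invented, and the function name summarize is hypothetical. The amounts are chosen so that the first name Ben sums to 160 while the full name Kathleen Doe sums to 150, mirroring the situation described in the question.

```python
from collections import defaultdict

# Invented rows: (email, first_name, last_name, address, city, state, zip, amount)
rows = [
    ("A@x.com", "Ben",      "Smith", "1 Main St", "Austin", "TX", "73301", 90),
    ("a@x.com", "ben",      "Jones", "2 Oak Ave", "Austin", "TX", "73301", 70),
    ("a@X.com", "Kathleen", "Doe",   "3 Elm Rd",  "Dallas", "TX", "75001", 100),
    ("a@x.com", "kathleen", "doe",   "4 Pine Ln", "Dallas", "TX", "75001", 50),
    ("b@y.com", "Ann",      "Lee",   "5 Fir Ct",  "Boise",  "ID", "83701", 20),
]

def summarize(rows):
    email_total = defaultdict(int)
    name_total = defaultdict(int)  # key: (email, first, last)
    addr_total = defaultdict(int)  # key: (email, first, last, address, city, state, zip)
    for row in rows:
        *fields, amount = row
        key = tuple(f.lower() for f in fields)  # everything is compared in lowercase
        email_total[key[0]] += amount
        name_total[key[:3]] += amount
        addr_total[key] += amount

    result = {}
    for email, total in email_total.items():
        # Full name with the highest summed amount within this email.
        names = {k: v for k, v in name_total.items() if k[0] == email}
        best_name = max(names, key=names.get)
        # Full address with the highest summed amount for that selected name.
        addrs = {k: v for k, v in addr_total.items() if k[:3] == best_name}
        best_addr = max(addrs, key=addrs.get)
        result[email] = (best_name[1], best_name[2], *best_addr[3:], total)
    return result

result = summarize(rows)
for email, picked in result.items():
    print(email, picked)
```

On ties, max keeps the first candidate it encounters. The question does not say how ties should be broken, so that behaviour is an assumption.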

SQL-Oracle: Difficulties in resolving basic problems, part2

Problem:
The HR department needs a query that prompts the user for an employee
last name. The query then displays the last name and hire date of any employee
in the same department as the employee whose name they supply(excluding that employee).
For example, if the user enters Zlotkey, find all employees who work with
Zlotkey (excluding Zlotkey).
I managed to get this far, but I have no clue how to finish it. For now it shows me all employees except the one whose last_name I enter. Any suggestions on how to continue?
SELECT last_name, TO_CHAR(hire_date,'DD-MON-YYYY') AS "HIRE_DATE"
FROM employees
WHERE last_name <>ALL (SELECT '&last_name'
FROM employees)
AND department_id IN (SELECT department_id
???....
P.S.: This problem is from the Oracle tutorials (Oracle Database 11g: SQL Fundamentals 1 of 1 (year 2009), Practice 7, exercise 1), for people who have already done this :)
Something like this?
SELECT last_name, TO_CHAR(hire_date,'DD-MON-YYYY') AS "HIRE_DATE"
FROM employees a
JOIN (Select department_id from employees where last_name = :surname) b on a.department_id = b.department_id
and last_name <> :surname
EDIT
The only problem with this type of solution arises if there are two people with the same surname in different departments, so it might be useful to filter on something like an employee number instead of the surname.
SELECT LAST_NAME, HIRE_DATE
FROM EMPLOYEES
WHERE DEPARTMENT_ID= (SELECT DEPARTMENT_ID
FROM EMPLOYEES
WHERE LAST_NAME LIKE '&NAME')
AND LAST_NAME <> '&NAME';
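The duplicate-surname caveat can be seen in a small sketch. It uses Python's sqlite3 module instead of Oracle, a named bind parameter in place of the & substitution variable, and invented rows in which two employees named King work in different departments. The subquery is compared with IN rather than =, since in Oracle a scalar comparison against a subquery that returns more than one row fails with ORA-01427.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT,
                        hire_date TEXT, department_id INTEGER);
-- Invented rows; two employees named King sit in different departments.
INSERT INTO employees VALUES
  (1, 'Zlotkey', '2008-01-29', 80),
  (2, 'Abel',    '2004-05-11', 80),
  (3, 'Taylor',  '2006-03-24', 80),
  (4, 'King',    '2003-06-17', 90),
  (5, 'King',    '2007-01-30', 80),
  (6, 'Kochhar', '2005-09-21', 90);
""")

QUERY = """
SELECT last_name, hire_date
FROM employees
WHERE department_id IN (SELECT department_id
                        FROM employees
                        WHERE last_name = :surname)
  AND last_name <> :surname
ORDER BY last_name
"""

def coworkers(surname):
    """Return (last_name, hire_date) of everyone sharing a department with surname."""
    return conn.execute(QUERY, {"surname": surname}).fetchall()

print(coworkers("Zlotkey"))
print(coworkers("King"))
```

For King the result mixes colleagues from both departments and drops both Kings, which is the reason an employee number would be a safer filter than a surname.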