What is the difference between GROUP BY and ORDER BY in SQL?

When do you use which, in general? Examples are highly encouraged!
I am referring to MySQL, but I can't imagine the concept being different in another DBMS.

ORDER BY alters the order in which items are returned.
GROUP BY aggregates records by the specified columns, which allows you to apply aggregate functions (such as SUM, COUNT, AVG, etc.) to the non-grouped columns.
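Table MyTable:
+----+-------+
| ID | NAME  |
+----+-------+
| 1  | Peter |
| 2  | John  |
| 3  | Greg  |
| 4  | Peter |
+----+-------+
SELECT *
FROM MyTable
ORDER BY NAME;
Result:
+----+-------+
| ID | NAME  |
+----+-------+
| 3  | Greg  |
| 2  | John  |
| 1  | Peter |
| 4  | Peter |
+----+-------+
SELECT COUNT(ID), NAME
FROM MyTable
GROUP BY NAME;
Result:
+-----------+-------+
| COUNT(ID) | NAME  |
+-----------+-------+
| 1         | Greg  |
| 1         | John  |
| 2         | Peter |
+-----------+-------+
SELECT NAME
FROM MyTable
GROUP BY NAME
HAVING COUNT(ID) > 1;
Result:
+-------+
| NAME  |
+-------+
| Peter |
+-------+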
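A minimal sketch that runs the three queries above through Python's built-in sqlite3 module, with an in-memory SQLite database standing in for MySQL (an assumption; details such as collation can differ). The table is called MyTable here because TABLE itself is a reserved word. The ID tie-breaker in the first ORDER BY and the ORDER BY NAME on the grouped query are additions that make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO MyTable (ID, NAME) VALUES (?, ?)",
    [(1, "Peter"), (2, "John"), (3, "Greg"), (4, "Peter")],
)

# ORDER BY: every row comes back, only the order changes
ordered = conn.execute("SELECT ID, NAME FROM MyTable ORDER BY NAME, ID").fetchall()

# GROUP BY: one row per distinct NAME, with COUNT computed inside each group
grouped = conn.execute(
    "SELECT COUNT(ID), NAME FROM MyTable GROUP BY NAME ORDER BY NAME"
).fetchall()

# HAVING: filters the groups themselves, using the aggregate
repeated = conn.execute(
    "SELECT NAME FROM MyTable GROUP BY NAME HAVING COUNT(ID) > 1"
).fetchall()

print(ordered)   # [(3, 'Greg'), (2, 'John'), (1, 'Peter'), (4, 'Peter')]
print(grouped)   # [(1, 'Greg'), (1, 'John'), (2, 'Peter')]
print(repeated)  # [('Peter',)]
```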

ORDER BY: sort the data in ascending or descending order.
Consider the CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following example sorts the result in ascending order by NAME:
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME;
This would produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
GROUP BY: arrange identical data into groups.
Now suppose the CUSTOMERS table has the following records, with duplicate names:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to group identical names into a single row, the GROUP BY query would be as follows:
SQL> SELECT * FROM CUSTOMERS
GROUP BY NAME;
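This might produce a result like the following. Note that the query only runs when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default from MySQL 5.7.5). For duplicate names, the ID, AGE, ADDRESS and SALARY values come from an arbitrary row of each group, not reliably the last one, and the sorted order shown is only guaranteed if you add ORDER BY NAME: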
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
+----+----------+-----+-----------+----------+
As you can see, this is of little use without aggregate functions such as SUM, AVG, etc.
So go through this definition to understand the proper use of GROUP BY:
A GROUP BY clause works on the rows returned by a query by summarizing identical rows into a single, distinct group and returning a single row with the summary for each group, using an appropriate aggregate function in the SELECT list, such as COUNT(), SUM(), MIN(), MAX(), AVG(), etc.
Now, if you want to know the total salary for each customer name, the GROUP BY query would be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result (the salaries of rows with identical names are summed; the rows appear sorted by NAME here, but that order is only guaranteed with an explicit ORDER BY NAME):
+---------+-------------+
| NAME | SUM(SALARY) |
+---------+-------------+
| Hardik | 8500.00 |
| kaushik | 8500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 3500.00 |
+---------+-------------+
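The same aggregation can be reproduced with Python's built-in sqlite3 module, using SQLite as a stand-in for MySQL (an assumption). The explicit ORDER BY is an addition, since without it the group order is not guaranteed. SQLite compares text case-sensitively by default, so COLLATE NOCASE is used to mimic the case-insensitive ordering of MySQL's default collations that the result above shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER,"
    " ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Ramesh", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "kaushik", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# One row per NAME with the summed SALARY, sorted case-insensitively
totals = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME"
    " ORDER BY NAME COLLATE NOCASE"
).fetchall()

# With SQLite's default (case-sensitive) collation, lowercase 'kaushik' sorts last
binary_order = [row[0] for row in conn.execute(
    "SELECT NAME FROM CUSTOMERS GROUP BY NAME ORDER BY NAME"
)]

print(totals)
# [('Hardik', 8500.0), ('kaushik', 8500.0), ('Komal', 4500.0), ('Muffy', 10000.0), ('Ramesh', 3500.0)]
print(binary_order)  # ['Hardik', 'Komal', 'Muffy', 'Ramesh', 'kaushik']
```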


The difference is exactly what the name implies: a group by performs a grouping operation, and an order by sorts.
If you do SELECT * FROM Customers ORDER BY Name, you get the result list sorted by customer name.
If you do SELECT IsActive, COUNT(*) FROM Customers GROUP BY IsActive you get a count of active and inactive customers. The group by aggregated the results based on the field you specified.

They have completely different meanings and aren't really related at all.
ORDER BY allows you to sort the result set according to several criteria, for example first by name from A to Z, then by price from highest to lowest
(ORDER BY name, price DESC).
GROUP BY allows you to take your result set, group it into logical groups and then run aggregate queries on those groups. You could for instance select all employees, group them by their workplace location and calculate the average salary of all employees of each workplace location.
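A minimal sketch of that workplace example, run through Python's built-in sqlite3 module with an in-memory SQLite database; the employees table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per employee
conn.execute("CREATE TABLE employees (name TEXT, location TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Berlin", 5000), ("Bob", "Berlin", 3000), ("Cid", "Paris", 4500)],
)

# Group the employees by location, then average the salary within each group
avg_by_location = conn.execute(
    "SELECT location, AVG(salary) FROM employees"
    " GROUP BY location ORDER BY location"
).fetchall()

print(avg_by_location)  # [('Berlin', 4000.0), ('Paris', 4500.0)]
```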

Simply put, ORDER BY orders the data and GROUP BY groups, or combines, the data.
ORDER BY orders the result set by the specified field, in ascending order by default.
Suppose you run a query with ORDER BY student_roll_number: it will return the rows in ascending order of the students' roll numbers. Here, the same student_roll_number may occur more than once.
With GROUP BY, we typically use aggregate functions: the rows are grouped by the GROUP BY columns, and the aggregate function is computed for each group. Here, if our query has SUM(marks) along with GROUP BY student_first_name, it will show the sum of marks of the students in each group (where all members of a group have the same first name).
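A sketch of the roll-number example using Python's built-in sqlite3 module; the results table and its rows are hypothetical. ORDER BY keeps the repeated roll number, while GROUP BY student_first_name puts roll numbers 1 and 3 (both "Asha") into the same group before SUM(marks) is computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: roll number 1 appears twice (two exams)
conn.execute(
    "CREATE TABLE results (student_roll_number INTEGER,"
    " student_first_name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, "Asha", 80), (1, "Asha", 70), (2, "Ravi", 60), (3, "Asha", 90)],
)

# ORDER BY: all four rows, sorted by roll number (marks as a tie-breaker)
by_roll = conn.execute(
    "SELECT student_roll_number, marks FROM results"
    " ORDER BY student_roll_number, marks"
).fetchall()

# GROUP BY: groups come from the GROUP BY column, SUM runs once per group
by_name = conn.execute(
    "SELECT student_first_name, SUM(marks) FROM results"
    " GROUP BY student_first_name ORDER BY student_first_name"
).fetchall()

print(by_roll)  # [(1, 70), (1, 80), (2, 60), (3, 90)]
print(by_name)  # [('Asha', 240), ('Ravi', 60)]
```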

GROUP BY is used to group rows in a select, usually when aggregating rows (e.g. calculating totals, averages, etc. for a set of rows with the same values for some fields).
ORDER BY is used to order the rows resulting from a select statement.

ORDER BY returns the rows sorted on a field in ascending or descending order, while GROUP BY returns rows that share the same field value (the same name, ID, etc.) as a single output row.

GROUP BY will aggregate records by the specified column which allows you to perform aggregation functions on non-grouped columns (such as SUM, COUNT, AVG, etc.). ORDER BY alters the order in which items are returned.
If you do
SELECT IsActive, COUNT(*) FROM Customers GROUP BY IsActive
you get a count of active and inactive customers. The group by aggregated the results based on the field you specified. If you do
SELECT * FROM Customers ORDER BY Name
then you get the result list sorted by the customer’s name.
If you GROUP, the results are not necessarily sorted; although in many cases they may come out in an intuitive order, that's not guaranteed by the GROUP BY clause. If you want your groups sorted, always use an explicit ORDER BY after the GROUP BY.
Grouped results cannot be filtered with WHERE: WHERE filters individual rows before they are grouped, so conditions on aggregates (for example COUNT(*) > 1) go in HAVING instead. With ORDER BY, WHERE simply filters the rows before they are sorted.
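A sketch of where each clause acts, using Python's built-in sqlite3 module and a hypothetical sales table: WHERE drops single rows before grouping, HAVING drops whole groups afterwards, and ORDER BY fixes the order of the groups. The second query shows SQLite rejecting an aggregate inside WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of individual sales
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 50), ("east", 80), ("east", 5),
     ("west", 30), ("west", 40), ("north", 200)],
)

# WHERE removes the 5 before grouping; HAVING then removes west (total 70)
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales"
    " WHERE amount > 10"
    " GROUP BY region"
    " HAVING SUM(amount) > 100"
    " ORDER BY region"
).fetchall()

# An aggregate is not allowed in WHERE, so SQLite refuses to prepare this query
try:
    conn.execute("SELECT region FROM sales WHERE SUM(amount) > 100 GROUP BY region")
    aggregate_in_where_ok = True
except sqlite3.Error:
    aggregate_in_where_ok = False

print(rows)                   # [('east', 130), ('north', 200)]
print(aggregate_in_where_ok)  # False
```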

It should be noted that GROUP BY is not needed just to sort on several columns: ORDER BY accepts a list of columns (in PostgreSQL, MySQL and standard SQL generally), and you can still use ASC or DESC per column:
SELECT name_first, name_last, dob
FROM those_guys
ORDER BY name_last ASC, name_first ASC, dob DESC;
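A sketch of that multi-column ORDER BY, run through Python's built-in sqlite3 module; the rows are hypothetical, and dob is stored as ISO-8601 text (an assumption) so that text order matches date order. Later columns only break ties left by earlier ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; ISO-8601 dates sort correctly as text
conn.execute("CREATE TABLE those_guys (name_first TEXT, name_last TEXT, dob TEXT)")
conn.executemany(
    "INSERT INTO those_guys VALUES (?, ?, ?)",
    [("Ann", "Lee", "1990-01-01"), ("Bob", "Lee", "1985-05-05"),
     ("Ann", "Lee", "1995-03-03"), ("Zed", "Adams", "1970-07-07")],
)

# Each ORDER BY column has its own direction; no GROUP BY, so no rows collapse
rows = conn.execute(
    "SELECT name_first, name_last, dob FROM those_guys"
    " ORDER BY name_last ASC, name_first ASC, dob DESC"
).fetchall()

for row in rows:
    print(row)
```

The two "Ann Lee" rows both survive; the younger one comes first because of dob DESC.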

The GROUP BY clause groups records, whereas the ORDER BY clause sorts them in ASC or DESC order.
1) To understand the GROUP BY clause, imagine that you have a table with a lot of records, and you want to compute an aggregate (AVG, SUM, COUNT, MIN, MAX, etc.) for each group of records that share the same value in some column. What would you do? Without GROUP BY, you would have to write one query per value, such as:
select avg(sal)
from emp
where job='MANAGER';
select avg(sal)
from emp
where job='DIRECTOR';
select avg(sal)
from emp
where job='USER';
Each of the queries above returns avg(sal) for a single job. SQL has a better way: GROUP BY groups together the records that have the same value, so instead of writing a query per job like the ones above, you can write a single query and get a result row for each group:
select avg(sal), job
from emp
group by job;
This returns one row per job, containing the average salary for that job.
Note that GROUP BY is mainly useful together with aggregate functions.
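A sketch comparing the one-query-per-job approach with the single GROUP BY query, using Python's built-in sqlite3 module; the emp rows are hypothetical. Both approaches produce the same averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical emp rows
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("A", "MANAGER", 3000), ("B", "MANAGER", 5000), ("C", "DIRECTOR", 9000),
     ("D", "USER", 1000), ("E", "USER", 2000)],
)

# Without GROUP BY: one query per job value
per_query = {
    job: conn.execute("SELECT AVG(sal) FROM emp WHERE job = ?", (job,)).fetchone()[0]
    for job in ("MANAGER", "DIRECTOR", "USER")
}

# With GROUP BY: a single query returns one (job, average) row per job
grouped = dict(conn.execute("SELECT job, AVG(sal) FROM emp GROUP BY job").fetchall())

print(grouped == per_query)  # True
print(grouped["MANAGER"])    # 4000.0
```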

Related

SQL Group By Query With Specific First Row

I'm using this query to pull information about companies and their scores from an MS SQL Server database:
SELECT company, avg(score) AS Value FROM Responses where id=12 group by company
This is the result:
+--------------+-------+
| COMPANY      | VALUE |
+--------------+-------+
| Competitor A | 6.09  |
| Competitor B | 5.70  |
| Other Brand  | 5.29  |
| Your Brand   | 6.29  |
+--------------+-------+
What I need is a query that will put one company that I will specify in the first position (in this case, the company is Your Brand) and then order the rest by the company like this.
+--------------+-------+
| COMPANY      | VALUE |
+--------------+-------+
| Your Brand   | 6.29  |
| Competitor A | 6.09  |
| Competitor B | 5.70  |
| Other Brand  | 5.29  |
+--------------+-------+
As @jarlh has suggested, use a CASE expression in the ORDER BY:
SELECT company, AVG(score) AS Value
FROM Responses
WHERE id = 12
GROUP BY company
ORDER BY CASE company WHEN 'Your Brand' THEN 0 ELSE 1 END,
AVG(score) DESC;
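A sketch of the CASE-based ordering, run through Python's built-in sqlite3 module rather than SQL Server (an assumption); the Responses rows are hypothetical and chosen so that "Your Brand" would not come first on its own. The CASE key is 0 for the chosen company and 1 for every other one, so it wins the first sort key; AVG(score) DESC then orders the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores; the row with id = 13 is excluded by the WHERE clause
conn.execute("CREATE TABLE Responses (id INTEGER, company TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO Responses VALUES (?, ?, ?)",
    [(12, "Competitor A", 5), (12, "Competitor A", 7),
     (12, "Competitor B", 4), (12, "Competitor B", 6),
     (12, "Other Brand", 7), (12, "Other Brand", 7),
     (12, "Your Brand", 3), (12, "Your Brand", 5),
     (13, "Your Brand", 10)],
)

# First key: 0 for the pinned company, 1 for the rest; second key: average, highest first
rows = conn.execute(
    "SELECT company, AVG(score) AS Value FROM Responses"
    " WHERE id = ?"
    " GROUP BY company"
    " ORDER BY CASE company WHEN ? THEN 0 ELSE 1 END, AVG(score) DESC",
    (12, "Your Brand"),
).fetchall()

print(rows)
# [('Your Brand', 4.0), ('Other Brand', 7.0), ('Competitor A', 6.0), ('Competitor B', 5.0)]
```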

result of selecting field names from table with group by

Say you're given the following table called Customers:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Hardik | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Ramesh | 25 | Ahmedabad | 6500.00 |
| 5 | Hardik | 27 | Delhi | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Ramesh | 24 | Ahmedabad | 10000.00 |
+----+----------+-----+-----------+----------+
A lot of resources explaining group by statements would use an example like:
SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME; where the thing being 'selected' other than the field being 'grouped by' is a function like count or sum. But what happens if you did something like SELECT NAME, ADDRESS FROM CUSTOMERS GROUP BY NAME; - how exactly would the addresses be grouped together in a single record with the name. I know I can run this and find out the answer, but I want to understand the general logic - if anyone could assist that would be very much appreciated.
EDIT, another question:
In the new table above, if I did SELECT NAME, ADDRESS, group_concat(salary) FROM CUSTOMERS GROUP BY NAME; would this be ok, seeing as how the addresses are the same for each name?
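If you say SELECT NAME, ADDRESS FROM CUSTOMERS GROUP BY NAME, most databases (including MySQL 5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default) give you an error about the non-aggregated ADDRESS column, which you fix by putting an aggregate function around it; with that mode disabled, older MySQL instead returns the ADDRESS of an arbitrary row from each group. For instance, you could write
SELECT NAME, MAX(ADDRESS) FROM CUSTOMERS GROUP BY NAME
As for the edit: the server does not know that the addresses happen to be the same for each name, so the same error applies there; either aggregate ADDRESS as above or add it to the grouping with GROUP BY NAME, ADDRESS.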
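A sketch of the two portable fixes, using Python's built-in sqlite3 module on the table above. Note that SQLite, unlike MySQL with ONLY_FULL_GROUP_BY, accepts a bare ADDRESS column without an error, so the sketch only shows the aggregate form and the extra grouping column. The ORDER BY NAME clauses are additions for a deterministic order (case-sensitive in SQLite, so 'kaushik' sorts last).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER,"
    " ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Hardik", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Ramesh", 25, "Ahmedabad", 6500.00),
        (5, "Hardik", 27, "Delhi", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Ramesh", 24, "Ahmedabad", 10000.00),
    ],
)

# Fix 1: wrap the non-grouped column in an aggregate
with_max = conn.execute(
    "SELECT NAME, MAX(ADDRESS) FROM CUSTOMERS GROUP BY NAME ORDER BY NAME"
).fetchall()

# Fix 2: add it to GROUP BY; since ADDRESS is the same for every row of a NAME,
# this still yields one row per NAME
with_both = conn.execute(
    "SELECT NAME, ADDRESS, COUNT(*) FROM CUSTOMERS"
    " GROUP BY NAME, ADDRESS ORDER BY NAME"
).fetchall()

print(with_max)
# [('Hardik', 'Delhi'), ('Komal', 'MP'), ('Ramesh', 'Ahmedabad'), ('kaushik', 'Kota')]
print(with_both)
# [('Hardik', 'Delhi', 2), ('Komal', 'MP', 1), ('Ramesh', 'Ahmedabad', 3), ('kaushik', 'Kota', 1)]
```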

LEFT JOINing the max/top

I have two tables from which I'm trying to run a query to return the maximum (or top) transaction for each person. I should note that I cannot change the table structure. Rather, I can only pull data.
People
+-----------+
| id | name |
+-----------+
| 42 | Bob |
| 65 | Ted |
| 99 | Stu |
+-----------+
Transactions (there is no primary key)
+---------------------------------+
| person | amount | date |
+---------------------------------+
| 42 | 3 | 9/14/2030 |
| 42 | 4 | 7/02/2015 |
| 42 | *NULL* | 2/04/2020 |
| 65 | 7 | 1/03/2010 |
| 65 | 7 | 5/20/2020 |
+---------------------------------+
Ultimately, for each person I want to return the highest amount. If there is a tie on amount, I'd like to look at the date and return the most recent one.
So, I'd like my query to return:
+----------------------------------------+
| person_id | name | amount | date |
+----------------------------------------+
| 42 | Bob | 4 | 7/02/2015 | (<- highest amount)
| 65 | Ted | 7 | 5/20/2020 | (<- most recent date)
| 99 | Stu | *NULL* | *NULL* | (<- no records in Transactions table)
+----------------------------------------+
SELECT People.id, name, amount, date
FROM People
LEFT JOIN (
SELECT TOP 1 person_id
FROM Transactions
WHERE person_id = People.id
ORDER BY amount DESC, date ASC
)
ON People.id = person_id
I can't figure out what I am doing wrong, but I know it's wrong. Any help would be much appreciated.
You are almost there, but since the same person ID appears several times in the Transactions table, you need to keep only one row per person, which you can do with the ROW_NUMBER() function.
Try this:
WITH cte AS
(SELECT person, amount, date,
ROW_NUMBER() OVER (PARTITION BY person
ORDER BY amount DESC, date DESC) AS row_num
FROM Transactions)
SELECT a.id AS person_id, a.name, b.amount, b.date
FROM People AS a
LEFT JOIN cte AS b
ON a.id = b.person
AND b.row_num = 1;
Edit: ROW_NUMBER(), from MSDN:
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
PARTITION BY divides the result set into groups, and the OVER clause determines the partitioning and ordering of the rowset before the associated window function is applied.
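A sketch of the ROW_NUMBER() approach, run through Python's built-in sqlite3 module instead of SQL Server (an assumption; window functions need SQLite 3.25 or later). The dates are rewritten as ISO-8601 text, also an assumption, so that text order matches date order, and ORDER BY a.id is added for a deterministic result. NULL amounts sort last under DESC here, so Bob's row with no amount never wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Transactions (person INTEGER, amount INTEGER, date TEXT)")
conn.executemany("INSERT INTO People VALUES (?, ?)",
                 [(42, "Bob"), (65, "Ted"), (99, "Stu")])
# Dates as ISO-8601 text (assumption) so that text order equals date order
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [(42, 3, "2030-09-14"), (42, 4, "2015-07-02"), (42, None, "2020-02-04"),
     (65, 7, "2010-01-03"), (65, 7, "2020-05-20")],
)

# Number each person's transactions: highest amount first, latest date breaks ties;
# the LEFT JOIN keeps people without transactions (their columns come back NULL)
rows = conn.execute("""
    WITH cte AS (
        SELECT person, amount, date,
               ROW_NUMBER() OVER (PARTITION BY person
                                  ORDER BY amount DESC, date DESC) AS row_num
        FROM Transactions
    )
    SELECT a.id AS person_id, a.name, b.amount, b.date
    FROM People AS a
    LEFT JOIN cte AS b
        ON a.id = b.person
       AND b.row_num = 1
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Putting b.row_num = 1 in the ON clause rather than in WHERE is what keeps Stu in the result.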

Maintain sql order in mysql after group

I have the following SQL:
select site_id,count from tags where match (tag) against ('statistici' in boolean mode) ORDER BY count DESC;
+---------+-------+
| site_id | count |
+---------+-------+
| 9 | 1300 |
| 13 | 1200 |
| 9 | 1100 |
| 13 | 1000 |
| 9 | 900 |
| 13 | 800 |
| 13 | 800 |
+---------+-------+
What I need is to get distinct site_id values.
But when I use a GROUP BY statement, the ordering by count is not kept:
select site_id,count from tags where match (tag) against ('statistici' in boolean mode) GROUP by site_id ORDER BY count DESC;
+---------+-------+
| site_id | count |
+---------+-------+
| 13 | 1000 |
| 9 | 900 |
+---------+-------+
What should I do?
You're ordering by count DESC, so the result is ordered with the highest count first.
Change it to ORDER BY count ASC if you prefer the lowest count first.
EDIT: Based on your comment, perhaps this is more like what you're trying to achieve:
select site_id
, max(count) as TopCount
from tags
where match (tag) against ('statistici' in boolean mode)
group by
site_id
order by
TopCount DESC
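This selects the highest count per site and places the highest count first. In your GROUP BY query, count is not aggregated, so MySQL returns the count from an arbitrary row of each group (1000 and 900 here), and those are the values that get sorted.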
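Alternatively: GROUP BY site_id ORDER BY COUNT(*); this orders the sites by the number of matching rows each one has, rather than by the values in the count column.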
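A sketch contrasting the two answers, using Python's built-in sqlite3 module. The MySQL MATCH ... AGAINST full-text filter is left out (an assumption); the rows inserted are the ones it returned in the question. MAX(count) gives each group a well-defined value to sort on, while COUNT(*) sorts by how many rows each site has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Full-text filter omitted; these are the rows it returned in the question
conn.execute("CREATE TABLE tags (site_id INTEGER, count INTEGER)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(9, 1300), (13, 1200), (9, 1100), (13, 1000), (9, 900), (13, 800), (13, 800)],
)

# Aggregate the column you sort on, so every group has a well-defined value
top = conn.execute(
    "SELECT site_id, MAX(count) AS TopCount FROM tags"
    " GROUP BY site_id ORDER BY TopCount DESC"
).fetchall()

# COUNT(*) instead orders the sites by how many matching rows each one has
by_rows = conn.execute(
    "SELECT site_id, COUNT(*) FROM tags GROUP BY site_id ORDER BY COUNT(*)"
).fetchall()

print(top)      # [(9, 1300), (13, 1200)]
print(by_rows)  # [(9, 3), (13, 4)]
```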

MySQL: How to select and display ALL rows from one table, and calculate the sum of a where clause on another table?

I'm trying to display all rows from one table and also SUM/AVG the results in one column, which is the result of a where clause. That probably doesn't make much sense, so let me explain.
I need to display a report of all employees...
SELECT Employees.Name, Employees.Extension
FROM Employees;
--------------
| Name | Ext |
--------------
| Joe | 123 |
| Jane | 124 |
| John | 125 |
--------------
...and join some information from the PhoneCalls table...
--------------------------------------------------------------
| PhoneCalls Table |
--------------------------------------------------------------
| Ext | StartTime | EndTime | Duration |
--------------------------------------------------------------
| 123 | 2010-09-05 10:54:22 | 2010-09-05 10:58:22 | 240 |
--------------------------------------------------------------
SELECT Employees.Name,
Employees.Extension,
Count(PhoneCalls.*) AS CallCount,
AVG(PhoneCalls.Duration) AS AverageCallTime,
SUM(PhoneCalls.Duration) AS TotalCallTime
FROM Employees
LEFT JOIN PhoneCalls ON Employees.Extension = PhoneCalls.Extension
GROUP BY Employees.Extension;
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 10 | 200 | 2000 |
| Jane | 124 | 20 | 250 | 5000 |
| John | 125 | 3 | 100 | 300 |
------------------------------------------------------------
Now I want to filter out some of the rows that are included in the SUM and AVG calculations...
WHERE PhoneCalls.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
...which will ideally result in a table looking something like this:
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 5 | 200 | 1000 |
| Jane | 124 | 10 | 250 | 2500 |
| John | 125 | 0 | 0 | 0 |
------------------------------------------------------------
Note that John has not made any calls in this date range, so his total CallCount is zero, but he is still in the list of results. I can't seem to figure out how to keep records like John's in the list. When I add the WHERE clause, those records are filtered out.
How can I create a select statement that displays all of the Employees and only SUMs/AVGs the values returned from the WHERE clause?
Use:
SELECT e.Name,
e.Extension,
COUNT(pc.Extension) AS CallCount,
COALESCE(AVG(pc.Duration), 0) AS AverageCallTime,
COALESCE(SUM(pc.Duration), 0) AS TotalCallTime
FROM Employees e
LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension
AND pc.StartTime BETWEEN '2010-09-12 09:30:00' AND NOW()
GROUP BY e.Name, e.Extension
The issue is that, with an OUTER JOIN, criteria in the ON clause are applied before the join takes place, like a derived table or inline view. The WHERE clause is applied after the OUTER JOIN, so a WHERE condition on the LEFT JOINed table removes the very rows you still wanted to see: employees whose calls all fall outside the date range, or who have no calls at all (their PhoneCalls columns are NULL). COUNT(pc.Extension) counts only matched calls, so John gets 0 (COUNT(pc.*) is not valid MySQL syntax), and COALESCE turns his NULL average and total into 0.
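A sketch of the ON-versus-WHERE difference, run through Python's built-in sqlite3 module; the calls are hypothetical (John's only call is before the cutoff), and a fixed lower bound replaces BETWEEN ... AND NOW() (an assumption, since NOW() is MySQL-specific).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Extension INTEGER)")
conn.execute(
    "CREATE TABLE PhoneCalls (Extension INTEGER, StartTime TEXT, Duration INTEGER)"
)
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [("Joe", 123), ("Jane", 124), ("John", 125)])
# Hypothetical calls; John's only call is before the cutoff
conn.executemany(
    "INSERT INTO PhoneCalls VALUES (?, ?, ?)",
    [(123, "2010-09-05 10:54:22", 240), (123, "2010-09-13 10:00:00", 100),
     (124, "2010-09-14 11:00:00", 300), (125, "2010-09-01 09:00:00", 60)],
)
cutoff = "2010-09-12 09:30:00"  # fixed bound instead of BETWEEN ... AND NOW()

# Filter in WHERE: runs after the join, so John's only joined row fails it and he vanishes
in_where = conn.execute(
    "SELECT e.Name, COUNT(pc.Extension), COALESCE(SUM(pc.Duration), 0)"
    " FROM Employees e LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension"
    " WHERE pc.StartTime >= ?"
    " GROUP BY e.Name, e.Extension ORDER BY e.Extension",
    (cutoff,),
).fetchall()

# Filter in ON: runs while joining, so John is kept with NULL call columns
in_on = conn.execute(
    "SELECT e.Name, COUNT(pc.Extension), COALESCE(SUM(pc.Duration), 0)"
    " FROM Employees e LEFT JOIN PhoneCalls pc ON pc.Extension = e.Extension"
    " AND pc.StartTime >= ?"
    " GROUP BY e.Name, e.Extension ORDER BY e.Extension",
    (cutoff,),
).fetchall()

print(in_where)  # [('Joe', 1, 100), ('Jane', 1, 300)]
print(in_on)     # [('Joe', 1, 100), ('Jane', 1, 300), ('John', 0, 0)]
```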