result of selecting field names from table with group by - sql

Say you're given the following table called Customers:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Hardik | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Ramesh | 25 | Ahmedabad | 6500.00 |
| 5 | Hardik | 27 | Delhi | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Ramesh | 24 | Ahmedabad | 10000.00 |
+----+----------+-----+-----------+----------+
A lot of resources explaining GROUP BY statements use an example like:
SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME;
where the thing being selected, other than the field being grouped by, is a function like COUNT or SUM. But what happens if you did something like:
SELECT NAME, ADDRESS FROM CUSTOMERS GROUP BY NAME;
How exactly would the addresses be grouped together in a single record with the name? I know I can run this and find out the answer, but I want to understand the general logic. If anyone could assist, that would be very much appreciated.
EDIT ANOTHER QUESTION:
In the new table above, if I did SELECT NAME, ADDRESS, group_concat(salary) FROM CUSTOMERS GROUP BY NAME; would this be ok, seeing as how the addresses are the same for each name?

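If you say SELECT NAME, ADDRESS FROM CUSTOMERS GROUP BY NAME, most databases (including MySQL with the ONLY_FULL_GROUP_BY mode enabled) return an error, because ADDRESS is neither listed in the GROUP BY clause nor wrapped in an aggregate function. To fix it, put an aggregate function around the ADDRESS column. For instance, you could write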
SELECT NAME, MAX(ADDRESS) FROM CUSTOMERS GROUP BY NAME
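As a rough illustration of the aggregated form, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The in-memory table and the extra SUM(salary) column are assumptions made for the example. Note that SQLite itself tolerates bare columns in a grouped query, so this sketch does not reproduce the error; it only shows that once every non-grouped column is aggregated, each name yields exactly one well-defined row.

```python
import sqlite3

# Hypothetical in-memory copy of the Customers table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER, address TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Hardik", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Ramesh", 25, "Ahmedabad", 6500.00),
        (5, "Hardik", 27, "Delhi", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Ramesh", 24, "Ahmedabad", 10000.00),
    ],
)

# Every selected column is either the grouping column or an aggregate,
# so each group collapses to a single unambiguous row.
rows = conn.execute(
    "SELECT name, MAX(address), SUM(salary) FROM customers GROUP BY name"
).fetchall()

for row in rows:
    print(row)
```

Because every address is the same within a name in this data, MAX(address) simply returns that shared address; if the addresses differed, it would return the alphabetically greatest one.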

Related

SQL Group By Query With Specific First Row

I'm using this query to pull information about companies and their scores from a ms sql database.
SELECT company, avg(score) AS Value FROM Responses where id=12 group by company
This is the result
| COMPANY | VALUE |
|: ------------ | ------:|
| Competitor A | 6.09 |
| Competitor B | 5.70 |
| Other Brand | 5.29 |
| Your Brand | 6.29 |
What I need is a query that will put one company that I will specify in the first position (in this case, the company is Your Brand) and then order the rest by the company like this.
| COMPANY | VALUE |
|: ------------ | -----:|
| Your Brand | 6.29 |
| Competitor A | 6.09 |
| Competitor B | 5.70 |
| Other Brand | 5.29 |
As @jarlh has suggested, use a CASE expression to order:
SELECT company, AVG(score) AS Value
FROM Responses
WHERE id = 12
GROUP BY company
ORDER BY CASE company WHEN 'Your Brand' THEN 0 ELSE 1 END,
AVG(score) DESC;
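To see the CASE ordering in action, the following is a small sketch run against SQLite through Python's sqlite3 module rather than MS SQL. The individual scores are made up for the example (they are not the asker's data), and 'Your Brand' is deliberately given a lower average than 'Competitor A' to show that the CASE expression, not the average, puts it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER, company TEXT, score REAL)")
# Invented scores; only rows with id = 12 take part in the report.
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [
        (12, "Competitor A", 6.0),
        (12, "Competitor A", 7.0),
        (12, "Competitor B", 5.0),
        (12, "Other Brand", 4.0),
        (12, "Your Brand", 6.0),
        (99, "Your Brand", 1.0),
    ],
)

query = """
SELECT company, AVG(score) AS value
FROM responses
WHERE id = 12
GROUP BY company
ORDER BY CASE company WHEN 'Your Brand' THEN 0 ELSE 1 END,  -- pinned row first
         AVG(score) DESC                                      -- then by average
"""
ordered = [company for company, _ in conn.execute(query)]
print(ordered)
```

The CASE expression maps the pinned company to 0 and everything else to 1, so it acts as the primary sort key; the second key only decides the order among the remaining companies.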

FIRST & LAST values in Oracle SQL

I am having trouble querying some data. The table I am trying to pull the data from is a LOG table, where I would like to see changes in the values next to each other (example below)
Table:
+-----------+----+-------------+----------+------------+
| UNIQUE_ID | ID | NAME | CITY | DATE |
+-----------+----+-------------+----------+------------+
| xa220 | 1 | John Smith | Berlin | 2020.05.01 |
| xa195 | 1 | John Smith | Berlin | 2020.03.01 |
| xa111 | 1 | John Smith | München | 2020.01.01 |
| xa106 | 2 | James Brown | Atlanta | 2018.04.04 |
| xa100 | 2 | James Brown | Boston | 2017.12.10 |
| xa76 | 3 | Emily Wolf | Shanghai | 2016.11.03 |
| xa20 | 3 | Emily Wolf | Shanghai | 2016.07.03 |
| xa15 | 3 | Emily Wolf | Tokyo | 2014.02.22 |
| xa12 | 3 | Emily Wolf | null | 2014.02.22 |
+-----------+----+-------------+----------+------------+
Desired outcome:
+----+-------------+----------+---------------+
| ID | NAME | CITY | PREVIOUS_CITY |
+----+-------------+----------+---------------+
| 1 | John Smith | Berlin | München |
| 2 | James Brown | Atlanta | Boston |
| 3 | Emily Wolf | Shanghai | Tokyo |
| 3 | Emily Wolf | Tokyo | null |
+----+-------------+----------+---------------+
I have been trying to use FIRST and LAST values, however, cannot get the desired outcome.
select distinct id,
name,
city,
first_value(city) over (partition by id order by city) as previous_city
from test
Any help is appreciated!
Thank you!
Use the LAG function to get the city for previous date and display only the rows where current city and the result of lag are different:
WITH cte AS (
SELECT t.*, LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") LAG_CITY
FROM yourTable t
)
SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY
FROM cte
WHERE
CITY <> LAG_CITY OR
CITY IS NULL AND LAG_CITY IS NOT NULL OR
CITY IS NOT NULL AND LAG_CITY IS NULL
ORDER BY
ID, "DATE" DESC;
Some comments on how LAG is being used and how its values are checked are warranted. We use the three-parameter version of LAG here. The second parameter is the number of records to look back, which in this case is 1 (the default). The third parameter is the default value to use when a given record is the first in its ID partition. In this case, we use the same CITY value as the default. This means that the first record of each partition never appears in the result set.
For the WHERE clause above, a matching record is one for which the city and lag city are different, or for which one of the two is NULL and the other is not NULL. This is the logic needed to treat a NULL city and some non-NULL city value as being different.
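The LAG logic described above can be tried out with a short sketch. It uses Python's sqlite3 module instead of Oracle (window functions need SQLite 3.25 or newer), the table name city_log is an assumption, and only the rows for IDs 1 and 2 from the question are loaded so that no two rows of one ID share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE city_log (unique_id TEXT, id INTEGER, name TEXT, city TEXT, "DATE" TEXT)'
)
conn.executemany(
    "INSERT INTO city_log VALUES (?, ?, ?, ?, ?)",
    [
        ("xa220", 1, "John Smith", "Berlin", "2020.05.01"),
        ("xa195", 1, "John Smith", "Berlin", "2020.03.01"),
        ("xa111", 1, "John Smith", "München", "2020.01.01"),
        ("xa106", 2, "James Brown", "Atlanta", "2018.04.04"),
        ("xa100", 2, "James Brown", "Boston", "2017.12.10"),
    ],
)

query = """
WITH cte AS (
    -- Previous city per ID; the first row of each ID falls back to its own city.
    SELECT t.*, LAG(city, 1, city) OVER (PARTITION BY id ORDER BY "DATE") AS lag_city
    FROM city_log t
)
SELECT id, name, city, lag_city AS previous_city
FROM cte
WHERE city <> lag_city
   OR (city IS NULL AND lag_city IS NOT NULL)
   OR (city IS NOT NULL AND lag_city IS NULL)
ORDER BY id, "DATE" DESC
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

Only the rows where the city actually changed survive the filter: the move from München to Berlin for ID 1 and the move from Boston to Atlanta for ID 2.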

SQL - joining 3 tables and choosing newest logged entry per id

I have a rather complicated riddle to solve, and so far I've been unlucky.
I have 3 tables which I need to join to get the result.
Most importantly, I need the highest h_id per p_id. h_id is a unique entry in the log history, and I need the newest one for a given point (p_id -> num).
Apart from that, I need ext and name as well.
history
+----------------+---------+--------+
| h_id | p_id | str_id |
+----------------+---------+--------+
| 1 | 1 | 11 |
| 2 | 5 | 15 |
| 3 | 5 | 23 |
| 4 | 1 | 62 |
+----------------+---------+--------+
point
+----------------+---------+
| p_id | num |
+----------------+---------+
| 1 | 4564 |
| 5 | 3453 |
+----------------+---------+
street
+----------------+---------+-------------+
| str_id | ext | name |
+----------------+---------+-------------+
| 15 | | Mein st. 33 | - bad name
| 11 | | eck st. 42 | - bad name
| 62 | abc | Main st. 33 |
| 23 | efg | Back st. 42 |
+----------------+---------+-------------+
EXPECTED RESULT
+----------------+---------+-------------+-----+
| num | ext | name |h_id |
+----------------+---------+-------------+-----+
| 3453 | efg | Back st. 42 | 3 |
| 4564 | abc | Main st. 33 | 4 |
+----------------+---------+-------------+-----+
I'm using Oracle SQL. I tried the query below, but the result is not correct.
SELECT num, max(name), max(ext), MAX(h_id) maxm FROM history
INNER JOIN street on street.str_id = history.str_id
INNER JOIN point on point.p_id = history.p_id
GROUP BY point.num
In Oracle, you can use keep:
SELECT p.num,
MAX(h.h_id) as maxm,
MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as name,
MAX(s.ext) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as ext
FROM history h INNER JOIN
street s
ON s.str_id = h.str_id INNER JOIN
point p
ON p.p_id = h.p_id
GROUP BY p.num;
The keep syntax allows you to do "first()" and "last()" for aggregations.
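KEEP (DENSE_RANK FIRST ...) is Oracle-specific, so it cannot be run outside Oracle. As a portable comparison, here is a sketch of a different technique that gives the same "newest entry per point" result: ranking the rows with ROW_NUMBER() and keeping rank 1. It runs against SQLite via Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions) and uses the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (h_id INTEGER, p_id INTEGER, str_id INTEGER);
CREATE TABLE point (p_id INTEGER, num INTEGER);
CREATE TABLE street (str_id INTEGER, ext TEXT, name TEXT);
INSERT INTO history VALUES (1, 1, 11), (2, 5, 15), (3, 5, 23), (4, 1, 62);
INSERT INTO point VALUES (1, 4564), (5, 3453);
INSERT INTO street VALUES
    (15, NULL, 'Mein st. 33'),
    (11, NULL, 'eck st. 42'),
    (62, 'abc', 'Main st. 33'),
    (23, 'efg', 'Back st. 42');
""")

query = """
SELECT num, ext, name, h_id
FROM (
    SELECT p.num, s.ext, s.name, h.h_id,
           -- rn = 1 marks the newest history entry for each point
           ROW_NUMBER() OVER (PARTITION BY p.num ORDER BY h.h_id DESC) AS rn
    FROM history h
    JOIN street s ON s.str_id = h.str_id
    JOIN point p ON p.p_id = h.p_id
) AS ranked
WHERE rn = 1
ORDER BY num
"""
newest = conn.execute(query).fetchall()
for row in newest:
    print(row)
```

Unlike taking MAX(name) and MAX(ext) independently, both approaches take ext and name from the same row as the highest h_id, which is why the "bad name" rows do not leak into the result.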

MySQL: How to select and display ALL rows from one table, and calculate the sum of a where clause on another table?

I'm trying to display all rows from one table and also SUM/AVG the results in one column, which is the result of a where clause. That probably doesn't make much sense, so let me explain.
I need to display a report of all employees...
SELECT Employees.Name, Employees.Extension
FROM Employees;
--------------
| Name | Ext |
--------------
| Joe | 123 |
| Jane | 124 |
| John | 125 |
--------------
...and join some information from the PhoneCalls table...
--------------------------------------------------------------
| PhoneCalls Table |
--------------------------------------------------------------
| Ext | StartTime | EndTime | Duration |
--------------------------------------------------------------
| 123 | 2010-09-05 10:54:22 | 2010-09-05 10:58:22 | 240 |
--------------------------------------------------------------
SELECT Employees.Name,
Employees.Extension,
Count(PhoneCalls.*) AS CallCount,
AVG(PhoneCalls.Duration) AS AverageCallTime,
SUM(PhoneCalls.Duration) AS TotalCallTime
FROM Employees
LEFT JOIN PhoneCalls ON Employees.Extension = PhoneCalls.Extension
GROUP BY Employees.Extension;
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 10 | 200 | 2000 |
| Jane | 124 | 20 | 250 | 5000 |
| John | 125 | 3 | 100 | 300 |
------------------------------------------------------------
Now I want to filter out some of the rows that are included in the SUM and AVG calculations...
WHERE PhoneCalls.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
...which will ideally result in a table looking something like this:
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 5 | 200 | 1000 |
| Jane | 124 | 10 | 250 | 2500 |
| John | 125 | 0 | 0 | 0 |
------------------------------------------------------------
Note that John has not made any calls in this date range, so his total CallCount is zero, but he is still in the list of results. I can't seem to figure out how to keep records like John's in the list. When I add the WHERE clause, those records are filtered out.
How can I create a select statement that displays all of the Employees and only SUMs/AVGs the values returned from the WHERE clause?
Use:
SELECT e.Name,
e.Extension,
COUNT(pc.extension) AS CallCount,
AVG(pc.Duration) AS AverageCallTime,
SUM(pc.Duration) AS TotalCallTime
FROM Employees e
LEFT JOIN PhoneCalls pc ON pc.extension = e.extension
AND pc.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
GROUP BY e.Name, e.Extension
The issue is that, when using an OUTER JOIN, criteria specified in the ON clause restrict which PhoneCalls rows can match, much as if you had joined to a derived table or inline view that was filtered first, so employees without a matching call still appear with NULL values. The WHERE clause is applied after the OUTER JOIN; a condition there on a column of the LEFT OUTER JOINed table is not satisfied by those NULL rows, which is why the rows you still wanted to see were being filtered out.
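The difference between filtering in the ON clause and filtering in WHERE can be checked with a small sketch. It uses Python's sqlite3 module rather than MySQL, snake_case table and column names, invented call rows, and a fixed cutoff with >= in place of BETWEEN ... AND NOW(); all of these are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, extension INTEGER);
CREATE TABLE phone_calls (extension INTEGER, start_time TEXT, duration INTEGER);
INSERT INTO employees VALUES ('Joe', 123), ('Jane', 124), ('John', 125);
INSERT INTO phone_calls VALUES
    (123, '2010-09-05 10:54:22', 240),
    (123, '2010-09-13 09:00:00', 100),
    (124, '2010-09-14 12:00:00', 300),
    (125, '2010-09-01 08:00:00', 60);
""")
cutoff = "2010-09-12 09:30:00"

# Date filter inside ON: unmatched employees are kept with NULL call columns.
filter_in_on = conn.execute("""
    SELECT e.name, COUNT(pc.extension), SUM(pc.duration)
    FROM employees e
    LEFT JOIN phone_calls pc
           ON pc.extension = e.extension AND pc.start_time >= ?
    GROUP BY e.name, e.extension
    ORDER BY e.extension
""", (cutoff,)).fetchall()

# Date filter in WHERE: applied after the join, so NULL-extended rows are dropped.
filter_in_where = conn.execute("""
    SELECT e.name, COUNT(pc.extension), SUM(pc.duration)
    FROM employees e
    LEFT JOIN phone_calls pc ON pc.extension = e.extension
    WHERE pc.start_time >= ?
    GROUP BY e.name, e.extension
    ORDER BY e.extension
""", (cutoff,)).fetchall()

print(filter_in_on)
print(filter_in_where)
```

In the first result John is listed with a call count of 0, while his SUM is NULL (None in Python) rather than 0; wrapping the aggregates in COALESCE(..., 0) would be one way to show zeros instead.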

what is the difference between GROUP BY and ORDER BY in sql

When do you use which in general? Examples are highly encouraged!
I am referring to MySQL, but I can't imagine the concept being different on another DBMS.
ORDER BY alters the order in which items are returned.
GROUP BY will aggregate records by the specified columns, which allows you to apply aggregate functions (such as SUM, COUNT, AVG, etc.) to the non-grouped columns.
TABLE:
ID NAME
1 Peter
2 John
3 Greg
4 Peter
SELECT *
FROM TABLE
ORDER BY NAME
=
3 Greg
2 John
1 Peter
4 Peter
SELECT Count(ID), NAME
FROM TABLE
GROUP BY NAME
=
1 Greg
1 John
2 Peter
SELECT NAME
FROM TABLE
GROUP BY NAME
HAVING Count(ID) > 1
=
Peter
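The three queries above can be reproduced with a short sketch using Python's sqlite3 module. The table is named people here because TABLE is a reserved word, and id is added as a second sort key so that the two Peter rows come back in a fixed order; both choices are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Peter"), (2, "John"), (3, "Greg"), (4, "Peter")],
)

# ORDER BY: same rows, different order.
ordered = conn.execute("SELECT id, name FROM people ORDER BY name, id").fetchall()

# GROUP BY: one row per distinct name, with an aggregate over each group.
counts = conn.execute(
    "SELECT COUNT(id), name FROM people GROUP BY name ORDER BY name"
).fetchall()

# HAVING: keep only the groups whose aggregate passes the condition.
repeated = conn.execute(
    "SELECT name FROM people GROUP BY name HAVING COUNT(id) > 1"
).fetchall()

print(ordered)
print(counts)
print(repeated)
```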
ORDER BY: sort the data in ascending or descending order.
Consider the CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would sort the result in ascending order by NAME:
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME;
This would produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
GROUP BY: arrange identical data into groups.
Now, CUSTOMERS table has the following records with duplicate names:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to collapse identical names into a single row, then the GROUP BY query would be as follows:
SQL> SELECT * FROM CUSTOMERS
GROUP BY NAME;
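This would produce the following result in MySQL when the ONLY_FULL_GROUP_BY mode is disabled; most other databases reject the query because the non-grouped columns are not aggregated.
(For identical names it happened to pick the last row here and to return the names in ascending order, but neither the row chosen nor the ordering is guaranteed.)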
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
+----+----------+-----+-----------+----------+
As you may have inferred, GROUP BY is of little use without aggregate functions like SUM, AVG, etc.
So go through this definition to understand the proper use of GROUP BY:
A GROUP BY clause works on the rows returned by a query by summarizing
identical rows into a single/distinct group and returns a single row
with the summary for each group, by using appropriate Aggregate
function in the SELECT list, like COUNT(), SUM(), MIN(), MAX(), AVG(),
etc.
Now, if you want to know the total salary for each customer name, then the GROUP BY query would be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result (the salaries of identical names are summed and each name appears once; the sorted order shown is not guaranteed unless you add an ORDER BY):
+---------+-------------+
| NAME | SUM(SALARY) |
+---------+-------------+
| Hardik | 8500.00 |
| kaushik | 8500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 3500.00 |
+---------+-------------+
The difference is exactly what the name implies: a group by performs a grouping operation, and an order by sorts.
If you do SELECT * FROM Customers ORDER BY Name then you get the result list sorted by the customers name.
If you do SELECT IsActive, COUNT(*) FROM Customers GROUP BY IsActive you get a count of active and inactive customers. The group by aggregated the results based on the field you specified.
They have totally different meaning and aren't really related at all.
ORDER BY allows you to sort the result set according to different criteria, such as first sort by name from a-z, then sort by the price highest to lowest.
(ORDER BY name, price DESC)
GROUP BY allows you to take your result set, group it into logical groups and then run aggregate queries on those groups. You could for instance select all employees, group them by their workplace location and calculate the average salary of all employees of each workplace location.
Simple, ORDER BY orders the data and GROUP BY groups, or combines the data.
ORDER BY orders the result set as per the mentioned field, by default in ascending order.
Suppose you are firing a query as ORDER BY (student_roll_number), it will show you result in ascending order of student's roll numbers. Here, student_roll_number entry might occur more than once.
In the GROUP BY case, we use it together with aggregate functions: it groups the rows by the columns listed in GROUP BY and applies the aggregate function to each group. Here, if our query has SUM(marks) along with GROUP BY student_first_name, it will show the sum of marks of the students belonging to each group (where all members of a group have the same first name).
GROUP BY is used to group rows in a select, usually when aggregating rows (e.g. calculating totals, averages, etc. for a set of rows with the same values for some fields).
ORDER BY is used to order the rows resulted from a select statement.
ORDER BY returns the rows sorted by a field in ascending or descending order, while GROUP BY collapses rows that share the same field values (names, IDs, etc.) into a single output row.
GROUP BY will aggregate records by the specified column which allows you to perform aggregation functions on non-grouped columns (such as SUM, COUNT, AVG, etc.). ORDER BY alters the order in which items are returned.
If you do
SELECT IsActive, COUNT(*) FROM Customers GROUP BY IsActive
you get a count of active and inactive customers. The group by aggregated the results based on the field you specified. If you do
SELECT * FROM Customers ORDER BY Name
then you get the result list sorted by the customer’s name.
If you GROUP, the results are not necessarily sorted; although in many cases they may come out in an intuitive order, that's not guaranteed by the GROUP BY clause. If you want your groups sorted, always use an explicit ORDER BY after the GROUP BY.
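Grouped data cannot be filtered by the WHERE clause, because WHERE is applied to individual rows before grouping; use HAVING to filter groups. Ordered data can be filtered by the WHERE clause.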
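As a minimal sketch of how rows and groups are filtered at different stages, the example below uses Python's sqlite3 module with the duplicate-name CUSTOMERS data reduced to two columns. The thresholds 2000 and 5000 are arbitrary values chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ramesh", 2000),
        ("Ramesh", 1500),
        ("kaushik", 2000),
        ("kaushik", 6500),
        ("Hardik", 8500),
        ("Komal", 4500),
        ("Muffy", 10000),
    ],
)

query = """
SELECT name, SUM(salary)
FROM customers
WHERE salary >= 2000        -- row filter: runs before the groups are formed
GROUP BY name
HAVING SUM(salary) > 5000   -- group filter: runs on the aggregated result
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

Ramesh's 1500 row is removed by WHERE before grouping, leaving a group total of 2000, and that group is then removed by HAVING together with Komal's.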
It should be noted GROUP BY is not always necessary as (at least in PostgreSQL, and likely in other SQL variants) you can use ORDER BY with a list and you can still use ASC or DESC per column...
SELECT name_first, name_last, dob
FROM those_guys
ORDER BY name_last ASC, name_first ASC, dob DESC;
The GROUP BY clause groups records, whereas the ORDER BY clause sorts them in ASC or DESC order.
1) To understand the GROUP BY clause, imagine that you have a table with a lot of records and you want to compute an aggregate (AVG, SUM, COUNT, MIN, MAX, etc.) over each set of records that share the same value in some column. What would you do in this situation? Before GROUP BY, you would write queries such as:
select avg(sal), job
from emp
where job='MANAGER';
select avg(sal), job
from emp
where job='DIRECTOR';
select avg(sal), job
from emp
where job='USER';
Each of the queries above returns avg(sal) for the people with one particular job. However, SQL has a better way to group records that share the same value, so that instead of writing several queries like those above, you can write a single query and see the result for each group:
select avg(sal), job
from emp
group by job;
This returns avg(sal) once for each distinct job value.
Note that GROUP BY is mainly useful when used together with aggregate functions.
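To check that the single grouped query gives the same numbers as the three separate ones, here is a small sketch using Python's sqlite3 module. The emp rows and salary values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (sal INTEGER, job TEXT)")
# Invented salaries for three job titles.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [
        (3000, "MANAGER"),
        (5000, "MANAGER"),
        (9000, "DIRECTOR"),
        (1000, "USER"),
        (2000, "USER"),
        (3000, "USER"),
    ],
)

# One query per job, as in the "before GROUP BY" approach.
one_by_one = {}
for job in ("MANAGER", "DIRECTOR", "USER"):
    avg_sal = conn.execute(
        "SELECT AVG(sal) FROM emp WHERE job = ?", (job,)
    ).fetchone()[0]
    one_by_one[job] = avg_sal

# A single grouped query that covers every job at once.
grouped = {
    job: avg_sal
    for avg_sal, job in conn.execute("SELECT AVG(sal), job FROM emp GROUP BY job")
}

print(one_by_one)
print(grouped)
```

The grouped query also picks up any job title that appears in the table later, whereas the one-query-per-job approach needs a new query for each new title.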