Get the min of one column but select multiple columns - sql

I have a table as following:
ID NAME AMOUNT
______________________
1 A 3
1 B 4
2 C 18
4 I 2
4 P 9
And I want the min(Amount) for each ID but I still want to display its Name. So I want this:
ID NAME min(AMOUNT)
______________________
1 A 3
2 C 18
4 I 2
ID's can occur multiple times, Names too. I tried this:
SELECT ID, NAME, min(AMOUNT) FROM TABLE
GROUP BY ID
But of course it's an error, because I have to
GROUP BY ID, NAME
But then I get
ID NAME AMOUNT
______________________
1 A 3
1 B 4
2 C 18
4 I 2
4 P 9
And I understand why: it looks for the min(AMOUNT) for each combination of ID + NAME. So my question is basically, how can I select multiple columns (ID, NAME, AMOUNT) and get the minimum for only one column, while still displaying the others?
I'm new to SQL, but I can't seem to find an answer.

If you are using PostgreSQL, SQL Server, MySQL 8.0, or Oracle, try the following with the window function row_number().
If one id can have several rows with the same minimum amount and you want all of them, use dense_rank() instead of row_number().
select
id,
name,
amount
from
(
select
t.*, -- qualified with the alias; Oracle rejects a bare * followed by other columns
row_number() over (partition by id order by amount) as rnk
from yourTable t
) val
where rnk = 1
Output:
| id | name | amount |
| --- | ---- | ------ |
| 1 | A | 3 |
| 2 | C | 18 |
| 4 | I | 2 |
Second option, without a window function (note that it returns every row tied for the minimum):
select
val.id,
t.name,
val.amount
from myTable t
join
(
select
id,
min(amount) as amount
from myTable
group by
id
) val
on t.id = val.id
and t.amount = val.amount
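To make the row_number() versus dense_rank() difference concrete, here is a minimal sketch run through Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later); the table name t and the extra tied row (1, 'W', 3) are made up for illustration and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    # (1, 'W', 3) is an illustrative tie with (1, 'A', 3)
    [(1, "A", 3), (1, "B", 4), (1, "W", 3), (2, "C", 18), (4, "I", 2), (4, "P", 9)],
)

def lowest_per_id(ranking_function):
    """Return the rows ranked first per id by the given window function."""
    # The function name is interpolated only because this is a fixed, trusted demo value.
    sql = f"""
        SELECT id, name, amount
        FROM (
            SELECT t.*,
                   {ranking_function}() OVER (PARTITION BY id ORDER BY amount) AS rnk
            FROM t
        ) AS val
        WHERE rnk = 1
        ORDER BY id, name
    """
    return conn.execute(sql).fetchall()

one_per_id = lowest_per_id("row_number")  # exactly one row per id; the tie is broken arbitrarily
all_ties = lowest_per_id("dense_rank")    # every row that shares the minimum amount
print(one_per_id)
print(all_ties)
```

With row_number() you get one row for id 1, but which of the tied names you get is not defined unless you add a tie-breaker to the ORDER BY. With dense_rank() both tied rows come back.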

You did not specify your database vendor. If it happens to be Postgres, the problem can also be solved without a nested subquery, using the proprietary distinct on clause:
with t(id,name,amount) as (values
(1, 'A', 3),
(1, 'B', 4),
(1, 'W', 3),
(2, 'C', 18),
(4, 'I', 2),
(4, 'P', 9)
)
select distinct on (id, name_of_min) id
, first_value(name) over (partition by id order by amount) as name_of_min
, amount
from t
order by id, name_of_min, amount -- amount in the sort makes distinct on keep the minimum row
This is just to widen your knowledge; I don't recommend relying on proprietary features. first_value is a standard function, but on its own it is not enough to solve the problem in a simple query. @zealous' answer is perfect.

In many databases, the most efficient method uses a correlated subquery:
select t.*
from t
where t.amount = (select min(t2.amount) from t t2 where t2.id = t.id);
In particular, this can take advantage of an index on (id, amount).
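As a rough sketch of the correlated-subquery approach, the following runs it against the question's sample data using Python's sqlite3 module. The table name t and the index name idx_t_id_amount are assumptions for the demo, and whether the optimizer actually uses the index depends on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 3), (1, "B", 4), (2, "C", 18), (4, "I", 2), (4, "P", 9)],
)
# Composite index so the per-id MIN(amount) lookup can be answered from the index
conn.execute("CREATE INDEX idx_t_id_amount ON t (id, amount)")

rows = conn.execute(
    """
    SELECT t.id, t.name, t.amount
    FROM t
    WHERE t.amount = (SELECT MIN(t2.amount) FROM t AS t2 WHERE t2.id = t.id)
    ORDER BY t.id
    """
).fetchall()
print(rows)
```

Like the join version, this keeps every row tied for the minimum within an id.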

Related

SQL query for fetching a single column with multiple values

Consider the below table:
Table1
id | status
------------
1 | A
2 | B
3 | C
1 | B
4 | B
5 | C
4 | A
The desired output is 1 and 4, as they have both status 'A' and status 'B'.
Can we write a query for this? I tried conditions like 'AND', 'UNION' and 'OR', but none of them returned the desired result.
If you want the ids with more than one status:
select id
from tablename
group by id
having count(distinct status) > 1
You can use aggregation:
select id
from t
where status in ('A', 'B')
group by id
having count(*) = 2;
If the table allows duplicates, then use count(distinct status) = 2.
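Here is a small sketch of why duplicates matter, using Python's sqlite3 module. The table name table1 and the extra duplicate row (2, 'B') are assumptions added for illustration; the rest is the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    # the trailing (2, 'B') is an illustrative duplicate row
    [(1, "A"), (2, "B"), (3, "C"), (1, "B"), (4, "B"), (5, "C"), (4, "A"), (2, "B")],
)

def ids_having(condition):
    """Return ids with status A or B whose group satisfies the HAVING condition."""
    # The condition is interpolated only because it is a fixed, trusted demo value.
    sql = f"""
        SELECT id
        FROM table1
        WHERE status IN ('A', 'B')
        GROUP BY id
        HAVING {condition}
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

with_count_star = ids_having("COUNT(*) = 2")               # id 2 slips in: it has 'B' twice
with_distinct = ids_having("COUNT(DISTINCT status) = 2")   # only ids that have both statuses
print(with_count_star, with_distinct)
```

So count(*) = 2 is only safe when (id, status) pairs are unique.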
Try this one; you can also do it without HAVING by filtering on the count in an outer query:
select
id
from
(
select
id,
count(distinct status) as cnt
from yourTable
group by
id
) val
where cnt > 1

How to select most frequent value in a column per each id group?

I have a table in SQL that looks like this:
user_id | data1
0 | 6
0 | 6
0 | 6
0 | 1
0 | 1
0 | 2
1 | 5
1 | 5
1 | 3
1 | 3
1 | 3
1 | 7
I want to write a query that returns two columns: a column for the user id, and a column for what the most frequently occurring value per id is. In my example, for user_id 0, the most frequent value is 6, and for user_id 1, the most frequent value is 3. I would want it to look like below:
user_id | most_frequent_value
0 | 6
1 | 3
I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What would I need to add to my query to get it to return the most frequent value for each id? I am thinking I need to use a subquery, but am unsure of how to structure it.
SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1
You can use a window function to rank each user's data1 values by how often they occur, then keep the top-ranked value per user_id.
WITH cte AS (
SELECT
user_id
, data1
, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(data1) DESC) rn
FROM dbo.YourTable
GROUP BY
user_id,
data1)
SELECT
user_id,
data1
FROM cte WHERE rn = 1
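The same idea can be checked against the question's data with Python's sqlite3 module. This is only a sketch: it assumes SQLite 3.25 or later for window functions, splits the counting and the ranking into two CTEs for readability, and adds data1 as a tie-breaker, which is my own assumption about how ties should be resolved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, data1 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
     (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)],
)

most_frequent = conn.execute(
    """
    WITH counts AS (
        -- how often each value occurs for each user
        SELECT user_id, data1, COUNT(*) AS cnt
        FROM my_table
        GROUP BY user_id, data1
    ),
    ranked AS (
        -- rank values per user, most frequent first; data1 breaks ties (assumption)
        SELECT user_id, data1,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY cnt DESC, data1) AS rn
        FROM counts
    )
    SELECT user_id, data1 AS most_frequent_value
    FROM ranked
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()
print(most_frequent)
```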
With a suitable ORDER BY, distinct on (user_id) does the same job, because it keeps the first row of each user_id group in sort order. DISTINCT ON is specific to PostgreSQL.
select distinct on (user_id) user_id, most_frequent_value from (
SELECT user_id, data1 AS most_frequent_value, count(*) as _count
FROM my_table
GROUP BY user_id, data1) a
ORDER BY user_id, _count DESC
With Postgres 9.4 or later you can use the mode() ordered-set aggregate directly:
SELECT
user_id, MODE() WITHIN GROUP (ORDER BY value)
FROM
(VALUES (0,6), (0,6), (0, 6), (0,1),(0,1), (1,5), (1,5), (1,3), (1,3), (1,7))
users (user_id, value)
GROUP BY user_id

Postgres: select all row with count of a field greater than 1

I have a table storing product price information. The table looks similar to this (no is the primary key):
no name price date
1 paper 1.99 3-23
2 paper 2.99 5-25
3 paper 1.99 5-29
4 orange 4.56 4-23
5 apple 3.43 3-11
Right now I want to select all the rows whose "name" value appears more than once in the table. Basically, I want my query to return the first three rows.
I tried:
SELECT * FROM product_price_info GROUP BY name HAVING COUNT(*) > 1
but i get an error saying:
column "product_price_info.no" must appear in the GROUP BY clause or be used in an aggregate function
SELECT *
FROM product_price_info
WHERE name IN (SELECT name
FROM product_price_info
GROUP BY name HAVING COUNT(*) > 1)
Try this:
SELECT no, name, price, "date"
FROM (
SELECT no, name, price, "date",
COUNT(*) OVER (PARTITION BY name) AS cnt
FROM product_price_info ) AS t
WHERE t.cnt > 1
You can use the window version of COUNT to get the population of each name partition. Then, in an outer query, filter out name partitions having a population that is less than 2.
Window Functions are really nice for this.
SELECT p.*, count(*) OVER (PARTITION BY name) FROM product p;
For a full example:
CREATE TABLE product (no SERIAL, name text, price NUMERIC(8,2), date DATE);
INSERT INTO product(name, price, date) values
('paper', 1.99, '2017-03-23'),
('paper', 2.99, '2017-05-25'),
('paper', 1.99, '2017-05-29'),
('orange', 4.56, '2017-04-23'),
('apple', 3.43, '2017-03-11')
;
WITH report AS (
SELECT p.*, count(*) OVER (PARTITION BY name) as count FROM product p
)
SELECT * FROM report WHERE count > 1;
Gives:
no | name | price | date | count
----+--------+-------+------------+-------
1 | paper | 1.99 | 2017-03-23 | 3
2 | paper | 2.99 | 2017-05-25 | 3
3 | paper | 1.99 | 2017-05-29 | 3
(3 rows)
Self-join version: use a subquery that returns the names that appear more than once.
select t1.*
from tablename t1
join (select name from tablename group by name having count(*) > 1) t2
on t1.name = t2.name
Basically the same as the IN/EXISTS versions, but possibly a bit faster, depending on the optimizer.
SELECT name, count(name)
FROM product_price_info
GROUP BY name
HAVING COUNT(name) > 1
LIMIT 3

How to order an already ordered subquery

Creating this table:
CREATE TABLE #Test (id int, name char(10), list int, priority int)
INSERT INTO #Test VALUES (1, 'One', 1, 1)
INSERT INTO #Test VALUES (2, 'Two', 2, 1)
INSERT INTO #Test VALUES (3, 'Three', 3, 2)
INSERT INTO #Test VALUES (4, 'Four', 4, 1)
INSERT INTO #Test VALUES (5, 'THREE', 3, 1)
and ordering it by list and priority:
SELECT * FROM #Test ORDER BY list, priority
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
However I want to step through rows one by one selecting the top one for each list ordered by priority, and start over when I get to the end.
For example with this simpler table:
1 | One | 1 | 1
2 | Two | 2 | 1
3 | Three | 3 | 1
4 | Four | 4 | 1
and this query:
SELECT TOP 1 * FROM #Test ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
If @PreviousList is the list for the previous row I got, then this will select the next row and gracefully jump to the top when I have selected the last row.
But there are rows that will have the same list only ordered by priority - like my first example:
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
Here id=3 should be skipped because id=5 has the same list value and a better priority. The only way I can think of doing this is to first order the entire table by list and priority, and then run the ORDER BY that goes through the rows one by one, like this:
SELECT TOP 1 * FROM (
SELECT * FROM #Test ORDER BY list, priority
) ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
But of course I cannot order by an already ordered subquery and get the error:
The ORDER BY clause is invalid in views, inline functions, derived tables,
subqueries, and common table expressions, unless TOP or FOR XML is also
specified.
Is there any way to get past this problem, or to get this down to a single query with a single ORDER BY?
Another possible solution is to use a subquery to select the min priority grouped by list and join it back to the table for the rest of the details
SELECT T2.*
FROM (SELECT MIN(priority) as priority, list
FROM #Test
GROUP BY list) AS T1
INNER JOIN #Test T2 ON T1.list = T2.list AND T1.priority = T2.priority
ORDER BY T1.list, T1.priority
I want to step through rows one by one selecting the top one for each
list ordered by priority, and start over when I get to the end.
You can use the built in ROW_NUMBER function that is designed for these scenarios with OVER(PARTITION BY name ORDER BY priority) to do this directly:
WITH CTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) AS RN
FROM #Test
)
SELECT *
FROM CTE
WHERE RN = 1;
The ranking number rn generated by ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) numbers the rows within each group that shares the same name, ordered by priority. Filtering with WHERE rn = 1 then removes the duplicates with the same name and leaves only the row with the first priority.
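One caveat worth checking: partitioning by name only groups 'Three' with 'THREE' when the comparison is case-insensitive, while the question groups rows by list. The sketch below, run through Python's sqlite3 module, compares the two partitions. It assumes SQLite 3.25 or later with its default case-sensitive text comparison, and uses a regular table named test in place of the temp table #Test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, list INTEGER, priority INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(1, "One", 1, 1), (2, "Two", 2, 1), (3, "Three", 3, 2),
     (4, "Four", 4, 1), (5, "THREE", 3, 1)],
)

def top_per(partition_column):
    """Return the best-priority row of each group formed by partition_column."""
    # The column name is interpolated only because it is a fixed, trusted demo value.
    sql = f"""
        WITH cte AS (
            SELECT test.*,
                   ROW_NUMBER() OVER (PARTITION BY {partition_column}
                                      ORDER BY priority) AS rn
            FROM test
        )
        SELECT id, name, list, priority
        FROM cte
        WHERE rn = 1
        ORDER BY list, priority
    """
    return conn.execute(sql).fetchall()

by_list = top_per("list")  # one row per list value; id=3 is skipped in favour of id=5
by_name = top_per("name")  # 'Three' and 'THREE' are separate groups here, so nothing is dropped
print(by_list)
print(by_name)
```

Partitioning by list gives the result the question asks for regardless of collation.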
SELECT TOP 1 * FROM (
SELECT * FROM #Test
) AS t ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
Try this. ORDER BY is not allowed inside a derived table or CTE unless TOP is also specified, so the inner ORDER BY is dropped.
Perhaps I am missing the requirement that makes this harder than I realize, but what about a nice simple join to select highest priority for the list. To scale, performance would require an index on list.
select t.*
, ttop.id as firstid
from #test t
JOIN #test ttop on ttop.id = (SELECT TOP 1 ID
FROM #TEST tbest
WHERE t.list = tbest.list order by priority)
and ttop.id = t.id -- this does the trick!

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A). ie,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. Particularly, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.
You can try the following query:
SELECT *
FROM table
WHERE criteria = 1
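-- the subquery needs the same filter; otherwise the smallest group overall may have no rows with criteria = 1
AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)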
ORDER BY value;
If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
select *
from table
where criteria = 1
),
-- only the first CTE takes the WITH keyword; later ones are separated by commas
min_group as (
select min(group) as group from main_query
)
select *
from main_query
where group in (select group from min_group)
order by group, value;
-- this where clause should be fast since there will only be 1 record in min_group
Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
[group] NVARCHAR(50),
value INT,
criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )
;WITH cte AS
(
SELECT i.* ,
DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
FROM @yourTbl AS i
WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
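For anyone who wants to try the DENSE_RANK() approach outside SQL Server, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later. The table name your_tbl, the column name grp (used instead of the reserved word group), and the extra row ('0', 9, 0) are all made up for the demo; the extra row is there to show that the criteria filter is applied before the ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_tbl (grp TEXT, value INTEGER, criteria INTEGER)")
conn.executemany(
    "INSERT INTO your_tbl VALUES (?, ?, ?)",
    # ('0', 9, 0) sorts before 'A' but fails the criteria filter (illustrative row)
    [("0", 9, 0), ("A", 0, 1), ("A", 1, 1), ("B", 2, 1), ("B", 3, 1)],
)

first_group = conn.execute(
    """
    WITH cte AS (
        -- rows in the same group share a rank; the WHERE runs before the ranking
        SELECT i.*, DENSE_RANK() OVER (ORDER BY i.grp) AS gn
        FROM your_tbl AS i
        WHERE i.criteria = 1
    )
    SELECT grp, value, criteria
    FROM cte
    WHERE gn = 1
    ORDER BY value
    """
).fetchall()
print(first_group)
```

The criteria condition appears only once, which was the complaint about the subquery version.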