Postgres: select all rows with count of a field greater than 1 - sql

I have a table storing product price information. It looks similar to this (no is the primary key):
no | name   | price | date
1  | paper  | 1.99  | 3-23
2  | paper  | 2.99  | 5-25
3  | paper  | 1.99  | 5-29
4  | orange | 4.56  | 4-23
5  | apple  | 3.43  | 3-11
Right now I want to select all the rows whose "name" value appears more than once in the table. Basically, I want my query to return the first three rows.
I tried:
SELECT * FROM product_price_info GROUP BY name HAVING COUNT(*) > 1
but I get an error saying:
column "product_price_info.no" must appear in the GROUP BY clause or be used in an aggregate function

SELECT *
FROM product_price_info
WHERE name IN (SELECT name
FROM product_price_info
GROUP BY name HAVING COUNT(*) > 1)

Try this:
SELECT no, name, price, "date"
FROM (
SELECT no, name, price, "date",
COUNT(*) OVER (PARTITION BY name) AS cnt
FROM product_price_info ) AS t
WHERE t.cnt > 1
You can use the window version of COUNT to get the population of each name partition. Then, in an outer query, filter out name partitions having a population that is less than 2.

Window Functions are really nice for this.
SELECT p.*, count(*) OVER (PARTITION BY name) FROM product p;
For a full example:
CREATE TABLE product (no SERIAL, name text, price NUMERIC(8,2), date DATE);
INSERT INTO product(name, price, date) values
('paper', 1.99, '2017-03-23'),
('paper', 2.99, '2017-05-25'),
('paper', 1.99, '2017-05-29'),
('orange', 4.56, '2017-04-23'),
('apple', 3.43, '2017-03-11')
;
WITH report AS (
SELECT p.*, count(*) OVER (PARTITION BY name) as count FROM product p
)
SELECT * FROM report WHERE count > 1;
Gives:
no | name | price | date | count
----+--------+-------+------------+-------
1 | paper | 1.99 | 2017-03-23 | 3
2 | paper | 2.99 | 2017-05-25 | 3
3 | paper | 1.99 | 2017-05-29 | 3
(3 rows)

Self-join version: use a subquery that returns the names that appear more than once.
select t1.*
from tablename t1
join (select name from tablename group by name having count(*) > 1) t2
on t1.name = t2.name
Basically the same as the IN/EXISTS versions. It may be a bit faster, but that depends on the planner, so check the query plan.
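A quick way to convince yourself that the IN version and the join version return the same rows is to run both against the sample data. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres (that substitution is an assumption of the example, as are the variable names); the SQL itself is the same as in the answers above.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE product_price_info ('
    '"no" INTEGER PRIMARY KEY, name TEXT, price REAL, "date" TEXT)'
)
conn.executemany(
    "INSERT INTO product_price_info VALUES (?, ?, ?, ?)",
    [
        (1, "paper", 1.99, "3-23"),
        (2, "paper", 2.99, "5-25"),
        (3, "paper", 1.99, "5-29"),
        (4, "orange", 4.56, "4-23"),
        (5, "apple", 3.43, "3-11"),
    ],
)

# IN version: keep rows whose name belongs to a group with more than one row.
in_version = """
SELECT "no"
FROM product_price_info
WHERE name IN (SELECT name
               FROM product_price_info
               GROUP BY name HAVING COUNT(*) > 1)
ORDER BY "no"
"""

# Join version: join each row to the list of repeated names.
join_version = """
SELECT t1."no"
FROM product_price_info t1
JOIN (SELECT name
      FROM product_price_info
      GROUP BY name HAVING COUNT(*) > 1) t2
  ON t1.name = t2.name
ORDER BY t1."no"
"""

in_rows = [row[0] for row in conn.execute(in_version)]
join_rows = [row[0] for row in conn.execute(join_version)]
print(in_rows, join_rows)
```

Both queries should list the three paper rows. Which one is faster on a real table is something only EXPLAIN on your own data can tell you.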

SELECT name, count(name)
FROM product_price_info
GROUP BY name
HAVING COUNT(name) > 1
LIMIT 3

Related

Select first rows where condition [duplicate]

Here's what I'm trying to do. Let's say I have this table t:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
2 | 18 | 2012-05-19 | y
3 | 18 | 2012-08-09 | z
4 | 19 | 2009-06-01 | a
5 | 19 | 2011-04-03 | b
6 | 19 | 2011-10-25 | c
7 | 19 | 2012-08-09 | d
For each id, I want to select the row containing the minimum record_date. So I'd get:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
The only solutions I've seen to this problem assume that all record_date entries are distinct, but that is not the case in my data. Using a subquery and an inner join with two conditions would give me duplicate rows for some ids, which I don't want:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
5 | 19 | 2011-04-03 | b
4 | 19 | 2009-06-01 | a
How about something like:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then gets the values based on those values. The only time you would have duplicates is if there are duplicate minimum record_dates for the same ID.
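To see the duplicate case in action, here is a small sketch using Python's built-in sqlite3 module (an in-memory SQLite database standing in for the real one, which is an assumption). Row 8 is a hypothetical extra row that ties with row 4 on the minimum date for id 19. The second query adds one possible tie-break, MIN(key_id) per id, which is my choice for the example and not something the answer above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (key_id INTEGER PRIMARY KEY, id INTEGER, "
    "record_date TEXT, other_cols TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 18, "2011-04-03", "x"),
        (2, 18, "2012-05-19", "y"),
        (4, 19, "2009-06-01", "a"),
        (5, 19, "2011-04-03", "b"),
        (8, 19, "2009-06-01", "e"),  # hypothetical row tying with key_id 4
    ],
)

# Join on the per-id minimum date only: ties produce more than one row per id.
join_only = """
SELECT mt.key_id
FROM t mt
JOIN (SELECT id, MIN(record_date) AS min_date FROM t GROUP BY id) m
  ON mt.id = m.id AND mt.record_date = m.min_date
ORDER BY mt.key_id
"""

# Same join, then keep the smallest key_id per id to break ties.
with_tiebreak = """
SELECT MIN(mt.key_id) AS first_key
FROM t mt
JOIN (SELECT id, MIN(record_date) AS min_date FROM t GROUP BY id) m
  ON mt.id = m.id AND mt.record_date = m.min_date
GROUP BY mt.id
ORDER BY first_key
"""

dup_keys = [row[0] for row in conn.execute(join_only)]
unique_keys = [row[0] for row in conn.execute(with_tiebreak)]
print(dup_keys, unique_keys)
```

The first list contains both tied rows for id 19; the second keeps one key per id, which can then be joined back to the table to fetch the remaining columns.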
I could get your expected result in MySQL with just this:
SELECT id, min(record_date), other_cols
FROM mytable
GROUP BY id
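Does this work for you? Note that other_cols is not in the GROUP BY, so this only runs when MySQL's ONLY_FULL_GROUP_BY mode is off, and the other_cols value returned is not guaranteed to come from the row with the minimum date.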
To get the cheapest product in each category, you use the MIN() function in a correlated subquery as follows:
SELECT categoryid,
productid,
productName,
unitprice
FROM products a WHERE unitprice = (
SELECT MIN(unitprice)
FROM products b
WHERE b.categoryid = a.categoryid)
The outer query scans all rows in the products table and returns the products whose unit price matches the lowest price in their category, as returned by the correlated subquery.
I would like to add to some of the other answers here: if you don't need the first item but, say, the second one, you can use ROW_NUMBER() in a subquery and base your result set off of that.
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY record_date, other_cols) AS rownum,
P.*
FROM products P
) ranked
WHERE rownum = 2
This also allows you to order by multiple columns in the subquery, which may help if two record_dates have identical values. You can also partition by multiple columns if needed by separating them with commas.
This does it simply:
select t2.id, t2.record_date, t2.other_cols
from (select ROW_NUMBER() over (partition by id order by record_date) as rownum,
id, record_date, other_cols
from MyTable) t2
where t2.rownum = 1
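As a sketch of how ROW_NUMBER() behaves when two rows tie on the minimum date, the example below runs a variant of the query above with key_id added to the ORDER BY as a tie-break (that extra sort column is my assumption, as is the hypothetical row 8). It uses Python's built-in sqlite3 module with an in-memory database in place of the real server, and needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (key_id INTEGER PRIMARY KEY, id INTEGER, "
    "record_date TEXT, other_cols TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 18, "2011-04-03", "x"),
        (2, 18, "2012-05-19", "y"),
        (3, 18, "2012-08-09", "z"),
        (4, 19, "2009-06-01", "a"),
        (5, 19, "2011-04-03", "b"),
        (8, 19, "2009-06-01", "e"),  # hypothetical row tying with key_id 4
    ],
)

# Number the rows inside each id by date, then by key_id so ties are
# resolved deterministically, and keep only the first row of each group.
query = """
SELECT key_id, id, record_date, other_cols
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY record_date, key_id) AS rn
    FROM t
) ranked
WHERE rn = 1
ORDER BY id
"""

first_rows = conn.execute(query).fetchall()
print(first_rows)
```

Without the second sort column the database is free to pick either tied row, so the result for id 19 would not be stable between runs.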
If record_date has no duplicates within a group:
Think of it as filtering. Simply keep (WHERE) the one row with MIN(record_date) in the current group:
SELECT * FROM t t1 WHERE record_date = (
select MIN(record_date)
from t t2 where t2.group_id = t1.group_id)
If there could be two or more rows with the minimum record_date within a group:
first filter out the non-minimum rows (see above),
then (AND) pick only one of the tied minimum rows within the given group_id, e.g. the one with the smallest unique key:
AND key_id = (select MIN(key_id)
from t t3 where t3.record_date = t1.record_date
and t3.group_id = t1.group_id)
So with this data:
key_id | group_id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
8 | 19 | 2009-06-01 | e
the query selects key_ids 1 and 4.
SELECT p.* FROM tbl p
INNER JOIN(
SELECT t.id, MIN(record_date) AS MinDate
FROM tbl t
GROUP BY t.id
) t ON p.id = t.id AND p.record_date = t.MinDate
GROUP BY p.id
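This code eliminates duplicate rows when an id has several rows with the same minimum record_date. Note that selecting p.* while grouping only by p.id relies on MySQL's permissive GROUP BY (ONLY_FULL_GROUP_BY off); Postgres and SQL Server reject it.
If you want the duplicates, remove the last line, GROUP BY p.id.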
This is an old question, but this may be useful to someone.
In my case I can't use a subquery because I have a big query and I need to apply min() to its result; with a subquery the database would have to re-execute the big query. I'm using MySQL.
select t.*
from (select m.*, @g := 0
from MyTable m -- here I have a big query
order by id, record_date) t
where (1 = case when @g = 0 or @g <> id then 1 else 0 end)
and (@g := id) IS NOT NULL
Basically, I order the result and then use a variable to keep only the first record in each group.
The query below takes the first date for each work order (from a table showing all status changes):
SELECT
WORKORDERNUM,
MIN(DATE)
FROM
WORKORDERS
WHERE
DATE >= to_date('2015-01-01','YYYY-MM-DD')
GROUP BY
WORKORDERNUM
select
department,
min_salary,
(select s1.last_name from staff s1 where s1.department = s3.department and s1.salary = s3.min_salary) lastname
from
(select department, min(salary) min_salary from staff s2 group by s2.department) s3

SELECT only rows when count=1 - without additional SELECT and/or HAVING

I wonder if there is a way to build a query without joins and/or a HAVING clause that would return the same result as the query below. I already found a similar question (select and count rows) but didn't find the answer.
SELECT ID, CATEGORY, PRODUCT, DESC
FROM SALES s
JOIN (SELECT ID, COUNT(CATEGORY)
FROM SALES
GROUP by ID
HAVING count(CATEGORY)=1) S2 ON S.ID=S2.ID;
So the table looks like
ID | Country | Product | DESC
1 | USA | Cream | Super cream
1 | Canada | Toothpaste| Great Toothpaste
2 | Germany | Beer | Tasty Beer
and the result I would like to get is
ID | Country | Product | DESC
2 | Germany | Beer | Tasty Beer
because id=1 has 2 different countries assigned.
I'm using SQL Server.
In general I'm interested in the 'fastest' solution. The table is huge and I just wonder if there is a smarter way to do it.
You may want to consider this query:
select t2.id, t2.category, t2.product, t2.[desc] from (
select id, category, product,
(select count(1) from sales where id = t1.id) as ct,
[desc]
from sales t1) t2 where t2.ct = 1
You can try this query:
SELECT ID, CATEGORY, PRODUCT, DESC
FROM SALES s
WHERE 1 = (
SELECT COUNT(*)
FROM SALES x
WHERE x.ID = s.ID
);
One method uses window functions:
SELECT ID, CATEGORY, PRODUCT, DESC
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY ID) as cnt
FROM SALES s
) s
WHERE cnt = 1;
However, the fastest solution is likely one that relies on a unique key and an index. That would be:
select s.*
from sales s
where not exists (select 1
from sales s2
where s2.id = s.id and
s2.<unique key> <> s.<unique key>
);
This can take advantage of an index on (id, <unique key>).
Note: This particular formulation assumes that category is never null.
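A minimal sketch of the NOT EXISTS formulation, run through Python's built-in sqlite3 module against an in-memory database standing in for SQL Server (an assumption). The column row_id plays the role of the unique key placeholder and is hypothetical, as are the index name and the descr column, which replaces the reserved word DESC. The correlated COUNT(*) query from the earlier answer is run alongside for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id is a hypothetical unique key; descr stands in for the DESC column.
conn.execute(
    "CREATE TABLE sales (row_id INTEGER PRIMARY KEY, id INTEGER, "
    "country TEXT, product TEXT, descr TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "USA", "Cream", "Super cream"),
        (2, 1, "Canada", "Toothpaste", "Great Toothpaste"),
        (3, 2, "Germany", "Beer", "Tasty Beer"),
    ],
)
# Composite index the NOT EXISTS probe can use.
conn.execute("CREATE INDEX sales_id_row ON sales (id, row_id)")

# A row qualifies when no other row shares its id.
not_exists_query = """
SELECT s.id, s.country, s.product
FROM sales s
WHERE NOT EXISTS (SELECT 1
                  FROM sales s2
                  WHERE s2.id = s.id
                    AND s2.row_id <> s.row_id)
"""

# Correlated COUNT(*) version for comparison.
count_query = """
SELECT s.id, s.country, s.product
FROM sales s
WHERE 1 = (SELECT COUNT(*) FROM sales x WHERE x.id = s.id)
"""

not_exists_rows = conn.execute(not_exists_query).fetchall()
count_rows = conn.execute(count_query).fetchall()
print(not_exists_rows, count_rows)
```

NOT EXISTS can stop at the first other row it finds for an id, whereas COUNT(*) has to count every row for that id, which is the usual argument for preferring it on large tables.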

SQL fetch rows with column value equal to max value

I got the following table with the query select count(category), name from product natural join supplier group by name;:
count | name
-------+-----------
1 | CandyCorp
1 | Nike
1 | DrinksInc
7 | Mutante
1 | Colt
7 | Mazzetti
Now I want to fetch only the rows with count equal to the max value on the count column (in this case 7), getting:
count | name
-------+-----------
7 | Mutante
7 | Mazzetti
EDIT: I got it working by creating a temporary table:
create table auxtable as (select count(categoria),name from product natural join supplier group by name);
select name from auxtable a1 join (select max(count) as countMax from auxtable) a2 on a1.count=a2.countMax;
drop table auxtable;
Is there a way to do this in a single query?
You can use a CTE instead of a temp table:
with auxtable as (select count(categoria),name from product natural join supplier group by name)
select name from auxtable a1
join (select max(count) as countMax from auxtable) a2 on a1.count=a2.countMax;
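Here is a small runnable sketch of the CTE approach, using Python's built-in sqlite3 module and an in-memory database in place of Postgres (an assumption). To keep it short, a single hypothetical table named supply replaces product natural join supplier, the count column is aliased cnt, and the data is reduced to three suppliers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for "product natural join supplier".
conn.execute("CREATE TABLE supply (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO supply VALUES (?, ?)",
    [
        ("Nike", "shoes"),
        ("Mutante", "comics"),
        ("Mutante", "toys"),
        ("Mazzetti", "pasta"),
        ("Mazzetti", "sauce"),
    ],
)

# The CTE computes the per-name counts once; the join keeps only the
# names whose count equals the overall maximum.
query = """
WITH auxtable AS (
    SELECT COUNT(category) AS cnt, name
    FROM supply
    GROUP BY name
)
SELECT a1.name, a1.cnt
FROM auxtable a1
JOIN (SELECT MAX(cnt) AS count_max FROM auxtable) a2
  ON a1.cnt = a2.count_max
ORDER BY a1.name
"""

top_rows = conn.execute(query).fetchall()
print(top_rows)
```

Every name tied for the maximum count is returned, which matches the two-row result asked for in the question.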
You can use rank() over (order by count(*) desc) to rank by count and then just keep the #1 ranked items:
select * from (
select
rank() over (order by count(category) desc) rn,
name, count(category)
from product natural join supplier
group by name
) t where rn = 1
http://sqlfiddle.com/#!15/26b68/1

SQL group by with a count

I have a table (simplified below)
|company|name |age|
| 1 | a | 3 |
| 1 | a | 3 |
| 1 | a | 2 |
| 2 | b | 8 |
| 3 | c | 1 |
| 3 | c | 1 |
For various reasons, the age column should be the same for each company. Another process updates this table and sometimes puts an incorrect age in. For company 1 the age should always be 3.
I want to find out which companies have a mismatch of age.
I've done this:
select company, name, age from table group by company, name, age
but I don't know how to get the rows where the age is different. The real table is a lot wider and has loads of columns, so I cannot really eyeball it.
Can anyone help?
Thanks
You should not be including age in the group by clause.
SELECT company
FROM tableName
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
SQLFiddle Demo
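The HAVING COUNT(DISTINCT age) <> 1 filter can be checked against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module with an in-memory database (an assumption, standing in for the real server); the table name company_age is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the rows are the sample data from the question.
conn.execute("CREATE TABLE company_age (company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO company_age VALUES (?, ?, ?)",
    [
        (1, "a", 3),
        (1, "a", 3),
        (1, "a", 2),
        (2, "b", 8),
        (3, "c", 1),
        (3, "c", 1),
    ],
)

# Group without age, then keep groups that contain more than one distinct age.
query = """
SELECT company
FROM company_age
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
"""

mismatched = [row[0] for row in conn.execute(query)]
print(mismatched)
```

Only company 1 is reported, because it is the only group with two different ages (3 and 2). Note that COUNT(DISTINCT age) ignores NULLs, so a group whose ages are all NULL would also be reported, with a distinct count of 0.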
If you want to find the row(s) whose age differs from the most common age of each company/name group:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
select * from cte
where age <> maxAge
Demonstration
If you want to update the incorrect ages with the correct ones, you just need to replace the final SELECT with an UPDATE:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
UPDATE cte SET AGE = maxAge
WHERE age <> maxAge
Demonstration
Since you mentioned "how to get the rows where the age is different" and not just the companies:
Add a unique row id (a primary key) if there isn't one already. Let's call it id.
Then run:
select id from table
where company in
(select company from table
group by company
having count(distinct age)>1)
