SQL remove certain duplicate values - sql

I have a result table like this (after a query has been run):
id | time | region
12x-4nm-334 | 16:00 | Utah
12x-4nm-334 | 17:00 | California
12x-4nm-334 | 19:00 | Missouri
12x-4nm-334 | 22:00 | California
983-n2n-aq2 | 8:00 | New York
983-n2n-aq2 | 9:00 | New York
There are a few other columns in this table, but the important thing is that I want to remove the ids that are only registered to one region from the result. So ids like "983-n2n-aq2" which only show up in a single region (regardless of time) should not be in the resulting table.
Hope this question is clear enough.

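If you use MySQL:
DELETE FROM table
WHERE id IN ( SELECT x.id
              FROM ( SELECT t.id
                     FROM table t
                     GROUP BY t.id
                     HAVING COUNT(DISTINCT t.region) = 1
                   ) AS x
            )
The extra derived table x is there because MySQL does not let a DELETE select directly from its own target table in a subquery. I don't know whether this works in Vertica. Hope it helps.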

Try this:
SELECT id, count(DISTINCT region) as RegionCount
FROM table
GROUP BY
id
HAVING count(DISTINCT region) > 1
If your DBMS doesn't support count(DISTINCT ...), then this should do instead. The inner query reduces the data to one row per (id, region) pair, so a plain count(*) in the outer query counts distinct regions:
SELECT id, count(*) as RegionCount
FROM (
    SELECT id, region FROM table GROUP BY id, region
) as t
GROUP BY
    id
HAVING count(*) > 1
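As a hedged illustration, the de-duplicate-then-count idea can be tried against the question's sample rows with Python's built-in sqlite3 module. The table name visits and the helper function are made up for this sketch. SQLite does support COUNT(DISTINCT ...), so it serves here only as a convenient engine.

```python
import sqlite3

# Sample rows from the question; the table name "visits" is hypothetical.
ROWS = [
    ("12x-4nm-334", "16:00", "Utah"),
    ("12x-4nm-334", "17:00", "California"),
    ("12x-4nm-334", "19:00", "Missouri"),
    ("12x-4nm-334", "22:00", "California"),
    ("983-n2n-aq2", "8:00", "New York"),
    ("983-n2n-aq2", "9:00", "New York"),
]


def region_counts_without_distinct(rows):
    """Count regions per id by first collapsing duplicate (id, region) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE visits (id TEXT, "time" TEXT, region TEXT)')
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, COUNT(*) AS region_count
        FROM (SELECT id, region FROM visits GROUP BY id, region) AS pairs
        GROUP BY id
        HAVING COUNT(*) > 1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


multi_region = region_counts_without_distinct(ROWS)
print(multi_region)
```

Only the id that appears in more than one region survives the HAVING filter.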

Try this (borrowing from Mr Geerkens):
SELECT a.*
FROM table a INNER JOIN
(
SELECT id
FROM table
GROUP BY id
HAVING count(DISTINCT region) > 1
) b ON a.id = b.id

First get the ids that have more than one distinct region (as shown in the subquery), and then use that in a WHERE clause to filter.
SELECT id, time, region
FROM mytable
WHERE id IN
(
SELECT id
FROM mytable
GROUP BY id
HAVING count(DISTINCT region) > 1
)
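A minimal sketch of this filter, run with Python's sqlite3 module on the sample rows from the question (assuming the table is called mytable as in the query above). It is meant only to show the shape of the result, not to stand in for Vertica or MySQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id TEXT, "time" TEXT, region TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("12x-4nm-334", "16:00", "Utah"),
        ("12x-4nm-334", "17:00", "California"),
        ("12x-4nm-334", "19:00", "Missouri"),
        ("12x-4nm-334", "22:00", "California"),
        ("983-n2n-aq2", "8:00", "New York"),
        ("983-n2n-aq2", "9:00", "New York"),
    ],
)

# Keep every row whose id is registered to more than one distinct region.
kept = conn.execute(
    """
    SELECT id, "time", region
    FROM mytable
    WHERE id IN (
        SELECT id
        FROM mytable
        GROUP BY id
        HAVING COUNT(DISTINCT region) > 1
    )
    ORDER BY rowid
    """
).fetchall()

for row in kept:
    print(row)
```

The two "983-n2n-aq2" rows are dropped because that id only ever appears in New York.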

Related

SELECT only rows when count=1 - without additional SELECT or/ and having

I wonder if there is a way to build a query without joins and/or a HAVING clause that would return the same result as the query below. I already found a similar question (select and count rows) but didn't find the answer.
SELECT ID, CATEGORY, PRODUCT, DESC
FROM SALES s
JOIN (SELECT ID, COUNT(CATEGORY)
FROM SALES
GROUP by ID
HAVING count(CATEGORY)=1) S2 ON S.ID=S2.ID;
So the table looks like
ID | Country | Product | DESC
1 | USA | Cream | Super cream
1 | Canada | Toothpaste| Great Toothpaste
2 | Germany | Beer | Tasty Beer
and the result I would like to get is
ID | Country | Product | DESC
2 | Germany | Beer | Tasty Beer
because id=1 has 2 different countries assigned
I'm using SQL Server
In general I'm interested in the 'fastest' solution. The table is huge and I just wonder if there is a way to do it smarter.
You may want to consider this query:
select t2.id, t2.category, t2.product, t2.[desc]
from (
    select id, category, product,
           -- number of rows that share this id
           (select count(1) from sales where id = t1.id) as ct,
           [desc] -- DESC is a reserved word in SQL Server, so it is bracketed
    from sales t1
) t2
where t2.ct = 1
You can try this Query:
SELECT ID, CATEGORY, PRODUCT, DESC
FROM SALES s
WHERE 1 = (
SELECT COUNT(*)
FROM SALES x
WHERE x.ID = s.ID
);
One method uses window functions:
SELECT ID, CATEGORY, PRODUCT, DESC
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY ID) as cnt
FROM SALES s
) s
WHERE cnt = 1;
However, the fastest solution would require a unique id and an index. That would be:
select s.*
from sales s
where not exists (select 1
from sales s2
where s2.id = s.id and
s2.<unique key> <> s.<unique key>
);
This can take advantage of an index on (id, <unique key>).
Note: This particular formulation assumes that category is never null.
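The NOT EXISTS formulation can be sketched with Python's sqlite3 module. This is an illustration under assumptions: SQLite's implicit rowid stands in for the `<unique key>` placeholder, the DESC column is renamed to description to avoid the reserved word, and the index name is made up. Performance on SQL Server would need to be measured there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "description" stands in for the DESC column, which is a reserved word.
conn.execute(
    "CREATE TABLE sales (id INTEGER, country TEXT, product TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "USA", "Cream", "Super cream"),
        (1, "Canada", "Toothpaste", "Great Toothpaste"),
        (2, "Germany", "Beer", "Tasty Beer"),
    ],
)
# Index on id so the NOT EXISTS probe can seek rather than scan.
conn.execute("CREATE INDEX idx_sales_id ON sales (id)")

single_rows = conn.execute(
    """
    SELECT s.id, s.country, s.product, s.description
    FROM sales AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM sales AS s2
        WHERE s2.id = s.id
          AND s2.rowid <> s.rowid  -- rowid plays the role of <unique key>
    )
    """
).fetchall()
print(single_rows)
```

A row is kept only when no other row with a different key shares its id, so id 1 is excluded and id 2 remains.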

SQL select rows, where column value is unique (only appears once)

Given the table
| id | Name |
| 01 | Bob |
| 02 | Chad |
| 03 | Bob |
| 04 | Tim |
| 05 | Bob |
I want to select the name and ID, from rows where the name is unique (only appears once)
This is essentially the same as How to select unique values of a column from table?, but notice that the author doesn't need the id, so that problem can be solved by a GROUP BY name HAVING COUNT(name) = 1
However, I need to extract the entire row (could be tens or hundreds of columns) including the id, where COUNT(name) = 1, but I cannot GROUP BY id, name as every combination of those are unique.
EDIT:
Am using Google BigQuery.
Expected results:
| id | Name |
| 02 | Chad |
| 04 | Tim |
Simply do a GROUP BY. Use HAVING to make sure a name is only there once. Use MIN() to pick the only id for the name.
select min(id), name
from tablename
group by name
having count(*) = 1
Reading the table only once will increase performance! (And don't forget to create an index on (name, id).)
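A small sketch of the GROUP BY / HAVING / MIN() approach on the sample table, using Python's sqlite3 module rather than BigQuery (an assumption made so the example is runnable locally; the table name people is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("01", "Bob"), ("02", "Chad"), ("03", "Bob"), ("04", "Tim"), ("05", "Bob")],
)

# A group with exactly one row has exactly one id, so MIN(id) is that id.
unique_names = conn.execute(
    """
    SELECT MIN(id) AS id, name
    FROM people
    GROUP BY name
    HAVING COUNT(*) = 1
    ORDER BY id
    """
).fetchall()
print(unique_names)
```

With many columns, each extra column would need its own MIN() (or similar aggregate), which is the main cost of this approach.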
Use correlated subquery
DEMO
select * from tablename a
where not exists (select 1 from tablename b where a.name=b.name having count(*)>1)
OUTPUT:
id name
2 Chad
4 Tim
You can use NOT EXISTS :
SELECT t.*
FROM table t
WHERE NOT EXISTS (SELECT 1 FROM table t1 WHERE t1.name = t.Name AND t1.id <> t.id);
This would need an index on table(name, id) to produce the result set faster, since the subquery looks rows up by name.
How about a simple aggregation?
select any_value(id), name
from t
group by name
having count(*) = 1;
BigQuery works quite well with aggregations so this might be quite efficient as well.
Use EXISTS and check that the name is unique:
select id,name
from table t1
where exists ( select 1 from table t2 where t1.name=t2.name
having count(*)=1
)
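Please try this.
SELECT
DISTINCT id,NAME
FROM
tableName
Note that because every id is different, DISTINCT id, NAME returns all five rows of the sample table, so on its own this does not filter out the repeated names.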
You can use multiple subqueries to extract what you need.
SELECT * FROM tableName
WHERE name IN (SELECT name FROM (SELECT name, COUNT(name) FROM tableName
GROUP BY name
HAVING COUNT(name) = 1) AS subQuery)
The query below is for BigQuery Standard SQL. It works for any number of columns without listing them explicitly and does not require any joins or sub-selects.
#standardSQL
SELECT t.*
FROM (
SELECT ANY_VALUE(t) t
FROM `project.dataset.table` t
GROUP BY name
HAVING COUNT(1) = 1
)

Selecting compared pairs from table

I don't really know how to describe it. I have a table:
ID | Name | Date
-------------------------
1 | Mike | 01.01.2016
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
2 | Sam | 05.03.2015
3 | Tony | 02.04.2012
I want to select pairs of IDs and Names with latest dates in each pair. The result here should be:
ID | Name | Date
-------------------------
1 | Michael | 02.03.2016
2 | Samuel | 23.12.2015
3 | Tony | 02.04.2012
How do I achieve this?
Oracle Database 11g
You can do it using the ROW_NUMBER() analytic function:
SELECT id, name, "date"
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY "date" DESC ) rn
FROM table_name t
)
WHERE rn = 1
This requires only a single table scan (it does not have a self-join or correlated sub-query - i.e. IN (...) or EXISTS(...)).
Have a sub-select that returns each id and its max date:
select * from table
where (id, date) in (select id, max(date) from table group by id)
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s.date > t.date)
Possibly the most efficient method is:
select t.*
from table t
where t.date = (select max(date) from table t2 where t2.id = t.id);
along with an index on table(id, date).
This version should scan the table and look up the correct value in the index.
Or, if there are only three columns, you can use keep:
select id, max(date) as date,
max(name) keep (dense_rank first order by date desc) as name
from table
group by id;
I have found that this version works very well in Oracle.
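The correlated MAX() lookup suggested above can be tried outside Oracle with Python's sqlite3 module. This is a sketch under assumptions: the date column is renamed dt and stored as ISO-8601 text (so that string comparison orders dates correctly), and the index name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are rewritten from DD.MM.YYYY to ISO format so MAX() compares correctly.
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "Mike", "2016-01-01"),
        (1, "Michael", "2016-03-02"),
        (2, "Samuel", "2015-12-23"),
        (2, "Sam", "2015-03-05"),
        (3, "Tony", "2012-04-02"),
    ],
)
# Index matching the (id, date) recommendation in the answer.
conn.execute("CREATE INDEX idx_id_dt ON table_name (id, dt)")

latest = conn.execute(
    """
    SELECT t.id, t.name, t.dt
    FROM table_name AS t
    WHERE t.dt = (SELECT MAX(t2.dt) FROM table_name AS t2 WHERE t2.id = t.id)
    ORDER BY t.id
    """
).fetchall()
print(latest)
```

If two rows of the same id share the latest date, this form returns both of them, whereas ROW_NUMBER() keeps only one.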

Postgres: select all row with count of a field greater than 1

I have a table storing product price information. The table looks similar to this (no is the primary key):
no name price date
1 paper 1.99 3-23
2 paper 2.99 5-25
3 paper 1.99 5-29
4 orange 4.56 4-23
5 apple 3.43 3-11
Right now I want to select all the rows where the "name" field appears more than once in the table. Basically, I want my query to return the first three rows.
I tried:
SELECT * FROM product_price_info GROUP BY name HAVING COUNT(*) > 1
but i get an error saying:
column "product_price_info.no" must appear in the GROUP BY clause or be used in an aggregate function
SELECT *
FROM product_price_info
WHERE name IN (SELECT name
FROM product_price_info
GROUP BY name HAVING COUNT(*) > 1)
Try this:
SELECT no, name, price, "date"
FROM (
SELECT no, name, price, "date",
COUNT(*) OVER (PARTITION BY name) AS cnt
FROM product_price_info ) AS t
WHERE t.cnt > 1
You can use the window version of COUNT to get the population of each name partition. Then, in an outer query, filter out name partitions having a population that is less than 2.
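A runnable sketch of the windowed COUNT, using Python's sqlite3 module instead of Postgres. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The identifiers "no" and "date" are quoted because they can clash with keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE product_price_info ("no" INTEGER PRIMARY KEY, name TEXT, '
    'price REAL, "date" TEXT)'
)
conn.executemany(
    "INSERT INTO product_price_info VALUES (?, ?, ?, ?)",
    [
        (1, "paper", 1.99, "3-23"),
        (2, "paper", 2.99, "5-25"),
        (3, "paper", 1.99, "5-29"),
        (4, "orange", 4.56, "4-23"),
        (5, "apple", 3.43, "3-11"),
    ],
)

# cnt is the number of rows sharing the same name; keep names seen more than once.
repeated = conn.execute(
    """
    SELECT "no", name, price, "date"
    FROM (
        SELECT "no", name, price, "date",
               COUNT(*) OVER (PARTITION BY name) AS cnt
        FROM product_price_info
    ) AS t
    WHERE t.cnt > 1
    ORDER BY "no"
    """
).fetchall()
print(repeated)
```

The filter has to live in the outer query because a window function's result cannot be referenced in the WHERE clause of the same SELECT.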
Window Functions are really nice for this.
SELECT p.*, count(*) OVER (PARTITION BY name) FROM product p;
For a full example:
CREATE TABLE product (no SERIAL, name text, price NUMERIC(8,2), date DATE);
INSERT INTO product(name, price, date) values
('paper', 1.99, '2017-03-23'),
('paper', 2.99, '2017-05-25'),
('paper', 1.99, '2017-05-29'),
('orange', 4.56, '2017-04-23'),
('apple', 3.43, '2017-03-11')
;
WITH report AS (
SELECT p.*, count(*) OVER (PARTITION BY name) as count FROM product p
)
SELECT * FROM report WHERE count > 1;
Gives:
no | name | price | date | count
----+--------+-------+------------+-------
1 | paper | 1.99 | 2017-03-23 | 3
2 | paper | 2.99 | 2017-05-25 | 3
3 | paper | 1.99 | 2017-05-29 | 3
(3 rows)
Self join version: use a sub-query that returns the names that appear more than once.
select t1.*
from tablename t1
join (select name from tablename group by name having count(*) > 1) t2
on t1.name = t2.name
Basically the same as IN/EXISTS versions, but probably a bit faster.
SELECT name, count(name)
FROM product_price_info
GROUP BY name
HAVING COUNT(name) > 1
LIMIT 3

SQL group by with a count

I have a table (simplified below)
|company|name |age|
| 1 | a | 3 |
| 1 | a | 3 |
| 1 | a | 2 |
| 2 | b | 8 |
| 3 | c | 1 |
| 3 | c | 1 |
For various reasons the age column should be the same for each company. I have another process that updates this table and sometimes it puts an incorrect age in. For company 1 the age should always be 3.
I want to find out which companies have a mismatch of age.
I've done this:
select company, name, age from table group by company, name, age
but I don't know how to get the rows where the age is different. This table is a lot wider and has loads of columns, so I cannot really eyeball it.
Can anyone help?
Thanks
You should not be including age in the group by clause.
SELECT company
FROM tableName
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
SQLFiddle Demo
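For a quick check of the HAVING COUNT(DISTINCT age) idea, here is a sketch on the question's sample data using Python's sqlite3 module. The table name staff is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "a", 3), (1, "a", 3), (1, "a", 2), (2, "b", 8), (3, "c", 1), (3, "c", 1)],
)

# Grouping by company and name only lets COUNT(DISTINCT age) see every age in the group.
mismatched = [
    row[0]
    for row in conn.execute(
        """
        SELECT company
        FROM staff
        GROUP BY company, name
        HAVING COUNT(DISTINCT age) <> 1
        """
    )
]
print(mismatched)
```

Had age been part of the GROUP BY, every group would contain a single age and the HAVING filter would never match.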
If you want to find the row(s) with a different age than the max-count age of each company/name group:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
select * from cte
where age <> maxAge
Demonstration
If you want to update the incorrect ages with the correct ones, you just need to replace the final SELECT with an UPDATE:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
UPDATE cte SET AGE = maxAge
WHERE age <> maxAge
Demonstration
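The "most common age per group" comparison can be sketched outside SQL Server as well. The version below uses Python's sqlite3 module, with LIMIT 1 in place of TOP 1 and a hypothetical table name staff. It only covers the SELECT step, and ties between equally frequent ages are resolved arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (company INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "a", 3), (1, "a", 3), (1, "a", 2), (2, "b", 8), (3, "c", 1), (3, "c", 1)],
)

# For each row, look up the most frequent age of its (company, name) group
# and keep the row only if its own age differs from that value.
outliers = conn.execute(
    """
    SELECT t1.company, t1.name, t1.age
    FROM staff AS t1
    WHERE t1.age <> (
        SELECT t2.age
        FROM staff AS t2
        WHERE t2.company = t1.company
          AND t2.name = t1.name
        GROUP BY t2.age
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )
    """
).fetchall()
print(outliers)
```

This treats the majority age as the correct one, which matches the question's example (age 3 for company 1) but is still an assumption about the data.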
Since you mentioned "how to get the rows where the age is different" and not just the companies:
Add a unique row id (a primary key) if there isn't already one. Let's call it id.
Then, do:
select id from table
where company in
(select company from table
group by company
having count(distinct age)>1)