SQL - Select first group in group by - sql

I have this table in DB2:
+----+-----+----------+
| id | name| key |
+----+-----+----------+
| 1 | foo |111000 |
| 2 | bar |111000 |
| 3 | foo |000111 |
+----+-----+----------+
When I group by name I can extract the table grouped by name, but how can I automatically extract only the first group, to get this result:
+----+-----+----------+
| id | name| key |
+----+-----+----------+
| 1 | foo |111000 |
| 3 | foo |000111 |
+----+-----+----------+
How can I solve this?

The MIN function will identify which row is the first one by id, then you can use that to filter the result to show only that row.
SELECT id,name,key
FROM Table1
WHERE id IN (SELECT MIN(ID) FROM Table1 GROUP BY name,key)

You could use an inner join on a subselect aggregated by min id:
select * from mytable
inner join (
select min(id) my_id
from mytable
group by name, key
) t on t.my_id = mytable.id

It looks like you want all rows whose name is the same as the name of the row with min(id). If this is correct then the query below should work; otherwise, please explain what you mean by "first group" and how it is defined.
select * from table
inner join (
select name, min(id)
from table
group by name
) t on t.name = table.name

In theory, given the way the question is asked you could also just do a simple select on the name you want.
SELECT id,name,key
From Table1
Where name = 'foo'
It really depends what you mean by 'first group'. If you grouped by name and ordered ascending by name then 'bar' would actually be the 'first group', not 'foo'. Maybe if you clarify that we can give you better answers?
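One possible reading of "first group" is "every row that shares its name with the lowest-id row". The following is a minimal sketch of that reading only, run against SQLite through Python's built-in sqlite3 module rather than DB2. The table name Table1 comes from the answers above; the helper name first_group_rows is hypothetical.

```python
import sqlite3

def first_group_rows(conn):
    """Return every row whose name equals the name of the lowest-id row."""
    query = """
        SELECT id, name, "key"
        FROM Table1
        WHERE name = (SELECT name
                      FROM Table1
                      WHERE id = (SELECT MIN(id) FROM Table1))
        ORDER BY id
    """
    return conn.execute(query).fetchall()

# Rebuild the sample table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
# "key" is quoted because it is a keyword in several SQL dialects.
conn.execute('CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT, "key" TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "foo", "111000"), (2, "bar", "111000"), (3, "foo", "000111")],
)

rows = first_group_rows(conn)
print(rows)
```

Under this assumption the two foo rows (ids 1 and 3) come back, which matches the expected result in the question. If "first group" means something else, such as the alphabetically first name, the inner subquery is the part to change.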

Related

SQL select rows, where column value is unique (only appears once)

Given the table
| id | Name |
| 01 | Bob |
| 02 | Chad |
| 03 | Bob |
| 04 | Tim |
| 05 | Bob |
I want to select the name and ID, from rows where the name is unique (only appears once)
This is essentially the same as How to select unique values of a column from table?, but notice that the author doesn't need the id, so that problem can be solved by a GROUP BY name HAVING COUNT(name) = 1
However, I need to extract the entire row (could be tens or hundreds of columns) including the id, where COUNT(name) = 1, but I cannot GROUP BY id, name as every combination of those is unique.
EDIT:
Am using Google BigQuery.
Expected results:
| id | Name |
| 02 | Chad |
| 04 | Tim |
Simply do a GROUP BY. Use HAVING to make sure a name is only there once. Use MIN() to pick the only id for the name.
select min(id), name
from tablename
group by name
having count(*) = 1
Reading the table only once will increase performance! (And don't forget to create an index on (name, id).)
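As a rough check of the GROUP BY / HAVING approach, here is a small sketch that runs it on the sample data. It uses SQLite through Python's sqlite3 module rather than BigQuery, and the helper name unique_names and the alias only_id are made up for the example.

```python
import sqlite3

def unique_names(conn):
    """Return (id, name) for names that occur exactly once."""
    query = """
        SELECT MIN(id) AS only_id, name
        FROM tablename
        GROUP BY name
        HAVING COUNT(*) = 1   -- keep groups with a single row
        ORDER BY only_id
    """
    return conn.execute(query).fetchall()

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "Bob"), (2, "Chad"), (3, "Bob"), (4, "Tim"), (5, "Bob")],
)

result = unique_names(conn)
print(result)
```

Because each surviving group holds exactly one row, MIN(id) is simply that row's id. The same trick would have to be repeated for every other column, which is why the whole-row variants in the other answers may be more convenient for wide tables.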
Use a correlated subquery:
select * from tablename a
where not exists (select 1 from tablename b where a.name=b.name having count(*)>1)
OUTPUT:
id name
2 Chad
4 Tim
You can use NOT EXISTS:
SELECT t.*
FROM table t
WHERE NOT EXISTS (SELECT 1 FROM table t1 WHERE t1.name = t.Name AND t1.id <> t.id);
This would need an index on table(id, name) to produce the result set faster.
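The NOT EXISTS form returns whole rows without listing the columns. A minimal sketch, assuming SQLite via Python's sqlite3 module instead of BigQuery; the table name people, the extra dept column, and the helper name are hypothetical and only there to show that every column comes back.

```python
import sqlite3

def rows_with_unique_name(conn):
    """Return full rows whose name is not shared by any other row."""
    query = """
        SELECT t.*
        FROM people t
        WHERE NOT EXISTS (
            SELECT 1
            FROM people t1
            WHERE t1.name = t.name   -- same name ...
              AND t1.id <> t.id      -- ... on a different row
        )
        ORDER BY t.id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
# dept is an illustrative extra column, not part of the original question.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Bob", "ops"),
        (2, "Chad", "ops"),
        (3, "Bob", "qa"),
        (4, "Tim", "qa"),
        (5, "Bob", "dev"),
    ],
)

unique_rows = rows_with_unique_name(conn)
print(unique_rows)
```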
How about a simple aggregation?
select any_value(id), name
from t
group by name
having count(*) = 1;
BigQuery works quite well with aggregations so this might be quite efficient as well.
Use EXISTS and check that the name is unique:
select id,name
from table t1
where exists ( select 1 from table t2 where t1.name=t2.name
having count(*)=1
)
Please try this.
SELECT
DISTINCT id,NAME
FROM
tableName
You can use multiple subqueries to extract what you need.
SELECT * FROM tableName
WHERE name IN (SELECT name FROM (SELECT name, COUNT(name) FROM tableName
GROUP BY name
HAVING COUNT(name) = 1) AS subQuery)
Below is for BigQuery Standard SQL and works for any number of columns w/o explicitly calling them out and does not require any join'ing or sub-selects
#standardSQL
SELECT t.*
FROM (
SELECT ANY_VALUE(t) t
FROM `project.dataset.table` t
GROUP BY name
HAVING COUNT(1) = 1
)

Get row which matched in each group

I am trying to write a SQL query. I get the results below from two tables, and they are fine as they stand. Now I want only those values that are present in every group. For example, A and B are present in each group (that is, for each ID), so I want only A and B in the result. I also want the query to be dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
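To see the CTE query at work, the sketch below loads the sample result set into a table and runs the same HAVING comparison. It assumes SQLite via Python's sqlite3 module; the table name results stands in for "your current query", and the helper name is hypothetical.

```python
import sqlite3

def values_in_every_group(conn):
    """Return the values that appear under every distinct ID."""
    query = """
        WITH cte AS (
            SELECT ID, Value FROM results   -- stand-in for the current query
        )
        SELECT Value
        FROM cte
        GROUP BY Value
        HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte)
        ORDER BY Value
    """
    return [row[0] for row in conn.execute(query)]

# Sample data from the question: ID 1 has A-D, ID 2 has A-C, ID 3 has A-B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (ID INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [
        (1, "A"), (1, "B"), (1, "C"), (1, "D"),
        (2, "A"), (2, "B"), (2, "C"),
        (3, "A"), (3, "B"),
    ],
)

common = values_in_every_group(conn)
print(common)
```

Since the right-hand side of the HAVING clause is computed from the data, the query keeps working when more IDs are added, which is presumably what "dynamic" means in the question.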
The solution is simple, and there are at least two ways to do it. Group by the letters (Value) and aggregate the IDs with SUM or COUNT. Then keep only those letters whose SUM(ID) or COUNT(ID) equals the same aggregate taken over the distinct IDs of the whole table. Both variants assume that each (ID, Value) pair appears only once.
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable)
select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable)
Use this:
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

How to select all attributes (*) with distinct values in a particular column(s)?

Here is a link to the W3Schools database for learners:
W3Schools Database
If we execute the following query:
SELECT DISTINCT city FROM Customers
it returns us a list of different City attributes from the table.
What should we do if we want all the columns, as returned by the SELECT * FROM Customers query, but with a unique value of the City attribute in each row?
DISTINCT when used with multiple columns, is applied for all the columns together. So, the set of values of all columns is considered and not just one column.
If you want to have distinct values, then concatenate all the columns, which will make it distinct.
Or, you could group the rows using GROUP BY.
You need to select all values from the customers table where city is unique. So, logically, I came up with this query:
SELECT * FROM `customers` WHERE `city` in (SELECT DISTINCT `city` FROM `customers`)
I think you want something like this:
(change PK field to your Customers Table primary key or index like Id)
In SQL Server (and standard SQL)
SELECT
*
FROM (
SELECT
*, ROW_NUMBER() OVER (PARTITION BY City ORDER BY PK) rn
FROM
Customers ) Dt
WHERE
(rn = 1)
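The ROW_NUMBER() variant can be tried out on a small table. This is a sketch under a few assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), the primary key is called CustomerID, and the five sample rows are invented for illustration rather than taken from the W3Schools data.

```python
import sqlite3

def one_row_per_city(conn):
    """Return the lowest-CustomerID row for each city."""
    query = """
        SELECT CustomerID, CustomerName, City
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY City      -- restart numbering per city
                       ORDER BY CustomerID    -- lowest id gets rn = 1
                   ) AS rn
            FROM Customers
        ) dt
        WHERE rn = 1
        ORDER BY CustomerID
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT)"
)
# Invented sample rows: two customers each in Berlin and London, one in Madrid.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Ann", "Berlin"),
        (2, "Bob", "London"),
        (3, "Cid", "Berlin"),
        (4, "Dee", "London"),
        (5, "Eve", "Madrid"),
    ],
)

per_city = one_row_per_city(conn)
print(per_city)
```

The ORDER BY inside the OVER clause decides which row represents each city, so swapping CustomerID for another column changes the pick without touching the rest of the query.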
In MySQL
SELECT
*
FROM (
SELECT
a.City, a.PK, count(*) as rn
FROM
Customers a
JOIN
Customers b ON a.City = b.City AND a.PK >= b.PK
GROUP BY a.City, a.PK ) As DT
WHERE (rn = 1)
This query, I hope, will return each city only once and also show the other columns.
You can use GROUP BY clause for getting distinct values in a particular column. Consider the following table - 'contact':
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 2 | PQR | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
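To select all columns with distinct values in the City attribute, use the following query (it relies on MySQL with ONLY_FULL_GROUP_BY disabled, and which row is kept for each city is not guaranteed):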
SELECT *
FROM contact
GROUP BY city;
This will give you the output as follows:
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table: expected result:
+--------+-----+--- +--------+-----+
|category|value|... |category|value|
+--------+-----+--- +--------+-----+
| 1 | a | | 1 | a |
| 1 | a | | 2 | b |
| 1 | b | | 3 | a # or b
| 2 | a | +--------+-----+
| 2 | b |
| 2 | c |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | b |
| 3 | b |
+--------+-----+---
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
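Since the question mentions SQLite, the correlated-subquery approach can be checked directly with Python's sqlite3 module. The sketch below uses the sample data from the question. The extra "value" tie-breaker in the ORDER BY is an addition for this example, so that ties (category 3) resolve the same way on every run; the helper name is hypothetical.

```python
import sqlite3

def mode_per_category(conn):
    """Return (category, most common value) pairs, ordered by category."""
    query = """
        SELECT DISTINCT category,
               (SELECT value
                FROM some_table t2
                WHERE t2.category = t.category
                GROUP BY value
                ORDER BY COUNT(*) DESC, value   -- tie-breaker added for determinism
                LIMIT 1
               ) AS mode_value
        FROM some_table t
        ORDER BY category
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        (1, "a"), (1, "a"), (1, "b"),
        (2, "a"), (2, "b"), (2, "c"), (2, "b"),
        (3, "a"), (3, "a"), (3, "b"), (3, "b"),
    ],
)

modes = mode_per_category(conn)
print(modes)
```

For a table of around 10000 rows, an index on (category, value) would likely help the inner lookup, though that is untested here.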
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value`=(
SELECT `value`
FROM `some_table`
WHERE `category`=`the_category`
GROUP BY `value`
ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
You can replace a part of this with WHERE `id`=( SELECT `id` if the table has a unique/primary key column, then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
This returns the count of each value in each category.
select category, value
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
We actually need the first value in each group, because the rows are sorted.
I am not sure SQLite keeps the first one and I can't test it, but IMHO it should work.

Choose rows based on two connected column values in one statement - ORACLE

First, I'm not sure the title describes the issue well. Any better suggestion is welcome. My problem is that I have the following table:
+----+----------+-------+-----------------+
| ID | SUPPLIER | BUYER | VALIDATION_CODE |
+----+----------+-------+-----------------+
| 1 | A | Z | 937886521 |
| 2 | A | X | 937886521 |
| 3 | B | Z | 145410916 |
| 4 | C | V | 775709785 |
+----+----------+-------+-----------------+
I need to show SUPPLIERS A and B which have BUYER Z, X. However, I want this condition to be a one-to-one relationship rather than one-to-many. That is, for supplier A, I want to show the rows with ID 1, 2. For supplier B, I want to show row 3 only. The following script will show supplier A with all possible buyers (which I do not want):
SELECT *
FROM validation
WHERE supplier IN ( 'A', 'B' )
AND buyer IN ( 'X', 'Z');
This will show the following pairs: (A,Z), (A,X), (B, Z). I need to show only the following: (A,X)(B,Z) in one statement.
The desired result should be like this:
+----+----------+-------+-----------------+
| ID | SUPPLIER | BUYER | VALIDATION_CODE |
+----+----------+-------+-----------------+
| 2 | A | X | 937886521 |
| 3 | B | Z | 145410916 |
+----+----------+-------+-----------------+
You can update the WHERE clause to filter on the desired pairs:
select *
from sample
where (upper(supplier),upper(buyer))
in (('A','X'),('A','Y'),('A','Z'),('B','X'),('B','Y'),('B','Z'));
I used the UPPER function based on your mixed case examples.
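Filtering on (supplier, buyer) pairs can be sketched on the question's data as follows. This runs on SQLite through Python's sqlite3 module, not Oracle, and the syntax differs slightly: SQLite needs a VALUES clause on the right-hand side of a row-value IN, whereas the Oracle query above lists the tuples directly. Only the two pairs the question asks for are listed, and the helper name is hypothetical.

```python
import sqlite3

def rows_for_pairs(conn):
    """Return rows matching the exact (supplier, buyer) pairs (A, X) and (B, Z)."""
    query = """
        SELECT id, supplier, buyer, validation_code
        FROM validation
        WHERE (supplier, buyer) IN (VALUES ('A', 'X'), ('B', 'Z'))
        ORDER BY id
    """
    return conn.execute(query).fetchall()

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE validation ("
    "id INTEGER PRIMARY KEY, supplier TEXT, buyer TEXT, validation_code INTEGER)"
)
conn.executemany(
    "INSERT INTO validation VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Z", 937886521),
        (2, "A", "X", 937886521),
        (3, "B", "Z", 145410916),
        (4, "C", "V", 775709785),
    ],
)

pairs = rows_for_pairs(conn)
print(pairs)
```

Comparing the two columns as one tuple is what prevents the cross-matching seen with two separate IN lists, where (A, Z) also qualifies.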
See if this what you need:
SELECT MAX(id),
supplier,
MAX(buyer),
MAX(validation_code)
FROM
(SELECT *
FROM Validation
WHERE supplier IN ( 'A', 'B' ) AND buyer IN ( 'X', 'Z')
) filtered
GROUP BY supplier;
I used GROUP BY supplier to flatten the table and included maximum values of ID, Buyer, and Validation_Code.
Alternatively, you could try this:
SELECT id
, supplier
, buyer
, validation_code
FROM (SELECT id
,max(id) OVER(PARTITION BY supplier) AS maxid
,supplier
,buyer
,validation_code
FROM sample) x
WHERE x.id=x.maxid
You may want to look at the results of the inner SQL statement to see what it does.
Try this query:
select ID,SUPPLIER,BUYER,VALIDATION_CODE from
(select
t2.*,t1.counter
from
validation t2,
(select supplier,count(supplier) as counter from validation group by supplier)t1
where
t1.supplier = t2.supplier)t3
where t3.supplier in('A','B') and
id = case when t3.counter > 1 then
(select max(id) from validation t4 where t4.supplier = t3.supplier) else t3.id end;