Identify the counts of each distinct value in one column in sqlite3 - sql

I am trying to identify the count of each distinct value in one column (name) in a table called brgy.
---------------------
| ID | name |
---------------------
| 1 | Alfonso |
| 2 | Arakan |
| 3 | Poblacion |
| 4 | Ilaya |
| 5 | Poblacion |
----------------------
I tried using this code but it keeps giving the COUNT as 1 despite Poblacion appearing twice in the name column:
SELECT name,COUNT(name) AS distinct_name
FROM (
SELECT DISTINCT name
FROM brgy
GROUP BY name
)
GROUP BY name;
The intended output should eliminate duplicate names but sum up the number of times the distinct name appears in the name column:
Expected Output is as below,
-----------------------------
| name | distinct_name |
-----------------------------
| Alfonso | 1 |
| Arakan | 1 |
| Ilaya | 1 |
| Poblacion | 2 |
-----------------------------

A simple GROUP BY name:
SELECT name, COUNT(name) AS distinct_name
FROM brgy
GROUP BY name;
you don't need the subquery:
SELECT DISTINCT name FROM brgy GROUP BY name
because GROUP BY name takes care of it.

Just removed the subquery because it will give you unique entry :
select name, count(*) as distinct_name
from brgy
group by name
order by distinct_name, name;

Related

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
Online example
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)
enter image description here

Oracle SQL: Counting how often an attribute occurs for a given entry and choosing the attribute with the maximum number of occurs

I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+------------
I want to make the number unique, and the attribute to be whichever attribute occured most often for that number, like this (This is the end-product im interrested in) :
2.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 2 | b |
+------------
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-----+
| num | att |count|
------------------+
| 1 | a | 1 |
| 1 | b | 2 |
| 2 | a | 1 |
| 2 | b | 2 |
+-----------------+
But I can't think of a way to only select those rows from the above table where the count is the highest (for each number of course).
So basically what I am asking is given table 3, how do I select only the rows with the highest count for each number (Of course an answer describing providing a way to get from table 1 to table 2 directly also works as an answer :) )
You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this brings the most frequent att; if there are ties, the smaller att is retained.
Oracle has an aggregation function that does this, stats_mode().:
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
Here is a db<>fiddle.
You can use group by and count as below
select id, col, count(col) as count
from
df_b_sql
group by id, col

Join Table From Minimum Value and Specific Name

I have:
Table id
+--------+
| number |
+--------+
| 1 |
| 2 |
| 3 |
+--------+
Table data
+-------+--------------+
| name | phone_number |
+-------+--------------+
| Bob | 111 |
| John | 333 |
| Alice | 555 |
+-------+--------------+
How to join table with results: (number from minimum value & name='John') ?
+--------+-------+--------------+
| number | name | phone_number |
+--------+-------+--------------+
| 1 | John | 333 |
+--------+-------+--------------+
You can try below -
select
(select min(number) FROM ID) as number, name, phone_number
from date
where name = 'John'
You can use cross join:
select min(number) as number, name, phone_number
from Table_Id
cross join Table_Data
group by name, phone_number
Depending on the RDBMS you're using, this query should get you close.
SELECT
MIN_NUMBER, NAME, PHONE_NUMBER
FROM
DATA LEFT JOIN (SELECT MIN(NUMBER) AS MIN_NUMBER FROM ID) ON 1=1
WHERE NAME = 'JOHN'

how to sum multiple rows with same id in SQL Server

Lets say I have following table:
id | name | no
--------------
1 | A | 10
1 | A | 20
1 | A | 40
2 | B | 20
2 | B | 20
And I want to perform a select query in SQL server which sums the value of "no" field which have same id.
Result should look like this,
id | name | no
--------------
1 | A | 70
2 | B | 40
Simple GROUP BY and SUM should work.
SELECT ID, NAME, SUM([NO])
FROM Your_TableName
GROUP BY ID, NAME;
Use SUM and GROUP BY
SELECT ID,NAME, SUM(NO) AS TOTAL_NO FROM TBL_NAME GROUP BY ID, NAME
SELECT *, SUM(no) AS no From TABLE_NAME GROUP BY name
This will return the same table by summing up the no column of the same name column.

SQL : Getting duplicate rows along with other variables

I am working on Terradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1