Get a distinct row count from another column - sql

SQL Table is as follows:
Category | Subcategory
---------|------------
A        | 1
A        | 1
A        | 2
B        | 1
B        | 2
I need the number of each subcategory for each category, not including duplicate subcategories within the category.
You'll notice there are 3 total "1" subcategories, but only a count of 2 as the duplicate is redundant and not included.
Example output:
subcategory | count
------------|------
1           | 2
2           | 2
How can I achieve this? I am familiar with COUNT but I can only get the raw number of rows.
Using Snowflake.
Thanks!

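You can use GROUP BY with COUNT(DISTINCT ...). The example output has one row per subcategory, so group by Subcategory and count the distinct categories:
select Subcategory, count(distinct Category) as cnt
from t
group by Subcategory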
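For anyone who wants to check the numbers against the sample data, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Snowflake (that substitution and the alias cnt are assumptions for illustration). It groups by Subcategory and counts distinct Category values, which is the shape of the expected output in the question:

```python
import sqlite3

# In-memory SQLite stands in for Snowflake here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Category TEXT, Subcategory INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2)],
)

# Each category is counted once per subcategory, so the repeated (A, 1) row is ignored.
subcategory_counts = conn.execute(
    """
    SELECT Subcategory, COUNT(DISTINCT Category) AS cnt
    FROM t
    GROUP BY Subcategory
    ORDER BY Subcategory
    """
).fetchall()

print(subcategory_counts)  # [(1, 2), (2, 2)]
```

A plain COUNT(*) would report 3 for subcategory 1, because the (A, 1) row appears twice.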

Related

Filter rows to return the exact relationship

I have two tables, expenses and categories, that have a many-to-many relationship through the table expenses_categories. I'm trying to implement a filter by categories. Let's say I provide the ids for categories A and B; I want to return the expenses that have exactly A and B and nothing else. For example:
Expense X has Categories A, B, and C
Expense Y has Categories A and B
Expense Z has Category B
I want to return only Expense Y.
I'm using PostgreSQL, by the way. I really need to learn how to do this kind of stuff.
Categories
ID | NAME
---|-----
1  | TV
2  | CC
3  | NET
ExpensesCategories
expense_id | category_id
-----------|------------
1          | 1
1          | 2
2          | 1
2          | 2
2          | 3
3          | 1
4          | 2
I want to get all the Expenses that have ONLY the Categories 1 and 2.
In that case, I expect to only get the Expense 1
expense_id | category_id
-----------|------------
1          | 1
1          | 2
You can group by expense_id and use STRING_AGG() in the HAVING clause to collect all the category_ids of each expense_id and compare it to a string like '1,2' which contains the category_ids that you want in ascending order as a comma separated list:
SELECT expense_id
FROM ExpensesCategories
GROUP BY expense_id
HAVING STRING_AGG(category_id::text, ',' ORDER BY category_id) = '1,2';
If you want all the rows of these expense_ids in ExpensesCategories, use the above query as a CTE:
WITH cte AS (
SELECT expense_id
FROM ExpensesCategories
GROUP BY expense_id
HAVING STRING_AGG(category_id::text, ',' ORDER BY category_id) = '1,2'
)
SELECT *
FROM ExpensesCategories
WHERE expense_id IN (SELECT expense_id FROM cte);
See the demo.
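If you would rather not depend on string aggregation, a different technique (not the STRING_AGG approach above) is to check two conditions in HAVING: the expense has exactly as many distinct categories as you asked for, and none of its categories fall outside that list. A minimal sketch, assuming Python's sqlite3 module as a stand-in for PostgreSQL and a hypothetical list wanted of distinct category ids:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExpensesCategories (expense_id INTEGER, category_id INTEGER)"
)
conn.executemany(
    "INSERT INTO ExpensesCategories VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (4, 2)],
)

# Hypothetical filter input: the category ids the expense must have, no more, no fewer.
# The ids are assumed to be distinct.
wanted = [1, 2]
placeholders = ", ".join("?" for _ in wanted)

query = f"""
    SELECT expense_id
    FROM ExpensesCategories
    GROUP BY expense_id
    HAVING COUNT(DISTINCT category_id) = ?
       AND SUM(CASE WHEN category_id IN ({placeholders}) THEN 0 ELSE 1 END) = 0
    ORDER BY expense_id
"""

# The first parameter is the required number of categories, the rest fill the IN list.
matching_expenses = [row[0] for row in conn.execute(query, [len(wanted), *wanted])]

print(matching_expenses)  # [1]
```

Expense 2 is rejected because it has three distinct categories, and expenses 3 and 4 because they have only one each.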

What is the proper way to complete cross-tab on the following segment in SQL?

I create frequencies on one column in SQL in a standard way.
My code is
select id , count(*) as counts
from TABLE
group by id
order by counts desc
Suppose the output is as follows for six ids:
id counts
-- ------
1  3       (two ids have a count of 3)
2  3
---------
3  6       (three ids have a count of 6)
4  6
5  6
---------
6  2       (one id has a count of 2)
How can I produce the following?
nid counts
--- ------
1 2
2 3
3 6
I am working in a Hive environment, but the query should be standard SQL.
Thanks in advance for answering.
You want two levels of aggregation:
select counts, count(*) as nid
from (select id, count(*) as counts
      from TABLE
      group by id
     ) c
group by counts
order by counts;
I call this a "histogram-of-histograms" query. I usually include min(id) and max(id) in the outer select, so I have examples of ids with given frequencies.
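To see the two levels of aggregation on the sample data, here is a small sketch. It assumes Python's sqlite3 module rather than Hive, and a hypothetical table name events in place of TABLE (which is a reserved word in most dialects). The aliases nid, min_id and max_id are also illustrative:

```python
import sqlite3

# In-memory SQLite stands in for Hive here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")  # hypothetical table name

# Rebuild the sample: ids 1-2 appear 3 times, ids 3-5 appear 6 times, id 6 twice.
rows_per_id = {1: 3, 2: 3, 3: 6, 4: 6, 5: 6, 6: 2}
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(i,) for i, n in rows_per_id.items() for _ in range(n)],
)

# Inner query: rows per id. Outer query: how many ids share each row count,
# plus the smallest and largest id as examples for that frequency.
histogram = conn.execute(
    """
    SELECT counts, COUNT(*) AS nid, MIN(id) AS min_id, MAX(id) AS max_id
    FROM (SELECT id, COUNT(*) AS counts FROM events GROUP BY id) AS c
    GROUP BY counts
    ORDER BY counts
    """
).fetchall()

print(histogram)  # [(2, 1, 6, 6), (3, 2, 1, 2), (6, 3, 3, 5)]
```

Each tuple reads as (counts, nid, min_id, max_id), so (6, 3, 3, 5) means three ids have six rows each, ranging from id 3 to id 5.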

Get duplicate on single column after distinct across multiple columns in SQL

I have a table that looks like this:
name | id
-----------
A 1
A 1
B 2
C 1
D 3
D 3
F 2
I want to return ids 1 and 2 because each of them appears under more than one name. I don't want to return 3, because it only ever appears with D.
Basically, I'm thinking of doing a query to first get a distinct pairing, so the above reduces to
name | id
-----------
A 1
B 2
C 1
D 3
F 2
And then doing a duplicate find on the id column. However, I'm struggling to find the correct syntax to construct that query.
You should be able to get the result you want by using a GROUP BY along with a HAVING clause that counts the distinct names. The HAVING clause will filter for those ids that have more than one distinct name:
select id
from Table1
group by id
having count(distinct name) > 1
Here is a demo
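The same query can be tried locally. This is a minimal sketch that assumes Python's sqlite3 module as the engine (the question does not name a database) and adds an ORDER BY only to make the output order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A", 1), ("A", 1), ("B", 2), ("C", 1), ("D", 3), ("D", 3), ("F", 2)],
)

# COUNT(DISTINCT name) collapses repeated (name, id) pairs before counting,
# so no separate de-duplication step is needed.
shared_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM Table1
        GROUP BY id
        HAVING COUNT(DISTINCT name) > 1
        ORDER BY id
        """
    )
]

print(shared_ids)  # [1, 2]
```

Id 3 drops out because both of its rows carry the same name, D.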

Postgres: get average for all values of a column for each distinct from another column

I have a table that looks like:
sku | qty
----|----
sku1| 1
sku1| 3
sku1| 1
sku1| 3
sku2| 1
And I'm trying to write a query that will return the average of qty for each distinct sku.
So for the data above, the output from the query would look like:
sku | qty
----|----
sku1| 2
sku2| 1
So the average for sku1 comes from (1 + 3 + 1 + 3) / 4 = 2, and the average for sku2 is just 1.
I know it's going to involve some kind of subquery, but I just can't seem to figure it out.
SELECT sku, AVG(qty)
FROM (SELECT DISTINCT sku FROM table)
How do I query for the average qty for each sku?
That's precisely what group by is for:
SELECT sku, AVG(qty)
FROM the_table
GROUP BY sku;
The manual has some examples: http://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-GROUP
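As a quick check against the sample rows, here is a sketch that assumes Python's sqlite3 module in place of Postgres. One difference to be aware of: SQLite's AVG returns a float, while Postgres returns numeric for an integer column.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [("sku1", 1), ("sku1", 3), ("sku1", 1), ("sku1", 3), ("sku2", 1)],
)

# GROUP BY produces one row per sku, and AVG is computed over that sku's rows.
averages = conn.execute(
    """
    SELECT sku, AVG(qty) AS avg_qty
    FROM the_table
    GROUP BY sku
    ORDER BY sku
    """
).fetchall()

print(averages)  # [('sku1', 2.0), ('sku2', 1.0)]
```

No subquery with DISTINCT is needed, because grouping already yields one output row per distinct sku.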

SQL: 2 Counts using joins?

I have these 2 tables:
Table: Unit
UnitID | Title
-------|-------
1      | Unit 1
2      | Unit 2
3      | Unit 3
Table: Piece
PieceID | UnitID | Category
--------|--------|---------
1       | 1      | A
2       | 1      | A
3       | 1      | B
4       | 2      | A
5       | 3      | B
What I need to do is show a count of the units that contain Piece rows with Category A, as well as the total number of Piece rows with Category A (regardless of UnitID). So using the data above, the result would be 2 units, 3 Piece rows.
I could do this with two statements, but I would like to do it in one.
Any suggestions from craftier folks than I?
Filter out the pieces with the correct category, then count the units distinctly:
select count(distinct UnitId) as Units, count(*) as Pieces
from Piece
where Category = 'A'
Try:
select count(distinct UnitID) total_units, count(*) total_rows
from Piece
where Category = 'A';
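Both answers use the same idea, and it can be checked on the sample Piece rows. A minimal sketch, assuming Python's sqlite3 module as the engine (the question does not name a database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Piece (PieceID INTEGER, UnitID INTEGER, Category TEXT)")
conn.executemany(
    "INSERT INTO Piece VALUES (?, ?, ?)",
    [(1, 1, "A"), (2, 1, "A"), (3, 1, "B"), (4, 2, "A"), (5, 3, "B")],
)

# The WHERE clause keeps only Category A pieces. COUNT(DISTINCT UnitID) then counts
# each unit once, while COUNT(*) counts every remaining piece row.
total_units, total_rows = conn.execute(
    """
    SELECT COUNT(DISTINCT UnitID) AS total_units, COUNT(*) AS total_rows
    FROM Piece
    WHERE Category = 'A'
    """
).fetchone()

print(total_units, total_rows)  # 2 3
```

No join to Unit is needed here, since UnitID is already present on every Piece row.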