SQL Count number of occurrences for each distinct combination

I need to get the count for each distinct combination of two columns. Here is the query to get the distinct combinations.
SELECT distinct Sales_Cat, Sub_cat
FROM rtx_Sales
It returns all combinations as seen below. I need to add a count field that also shows the number of occurrences for each of the combinations. Is SQL capable of this, or should I write a Python script to query for each combo?

Use COUNT(*) together with GROUP BY:
SELECT sales_cat, sub_cat, COUNT(*)
FROM rtx_sales
GROUP BY sales_cat, sub_cat
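Since the question mentions falling back to a Python script, here is a minimal sketch suggesting that a single GROUP BY query is enough. It uses Python's built-in sqlite3 module with an in-memory table. Only the table and column names come from the question; the sample rows and the occurrences alias are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtx_sales (sales_cat TEXT, sub_cat TEXT)")
conn.executemany(
    "INSERT INTO rtx_sales VALUES (?, ?)",
    [("Toys", "Dolls"), ("Toys", "Dolls"), ("Toys", "Cars"), ("Books", "Fiction")],
)

# One row per distinct (sales_cat, sub_cat) pair, with its row count.
combo_counts = conn.execute(
    "SELECT sales_cat, sub_cat, COUNT(*) AS occurrences "
    "FROM rtx_sales "
    "GROUP BY sales_cat, sub_cat "
    "ORDER BY sales_cat, sub_cat"
).fetchall()

for sales_cat, sub_cat, occurrences in combo_counts:
    print(sales_cat, sub_cat, occurrences)
```

The ORDER BY is only there to make the output order predictable; it is not needed for the counting itself.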

Related

Access query finding the duplicates without using DISTINCT

I have this query:
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo;
PersonalInfo.k-commission is a multi-value field. CommissionAbsent shows duplicate values for each k-commission value. When I use DISTINCT, I get an error saying that the keyword cannot be used with a multi-value field.
Now I want to remove the duplicates and show only one result for each. I tried using a WHERE clause, but I don't know how.
Edit: I have many more columns; the example shows only the few I need.
You can use GROUP BY and COUNT to solve your problem. Here is an example:
SELECT clmn1, clmn2, COUNT(*) as count
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1;
The query groups the rows in the table by the clmn1 and clmn2 columns and counts the number of rows in each group. The HAVING clause then filters the groups and returns only those with a count greater than 1, which indicates duplicates.
If you want to select all columns of the duplicated rows, you can do it like this (note that the row-value form (clmn1, clmn2) IN (...) is not accepted by every database engine):
SELECT *
FROM table
WHERE (clmn1, clmn2) IN (SELECT clmn1, clmn2
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1);
Applied to your query, the grouped version would be:
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo
GROUP BY PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value]))
HAVING COUNT(*) > 1
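To make the GROUP BY / HAVING idea concrete, here is a small sketch using Python's built-in sqlite3 module. The items table, its id column, and the sample rows are hypothetical. To fetch the full duplicated rows it joins against the grouped subquery rather than using a row-value IN, on the assumption that a join is accepted by more engines. It does not attempt to reproduce Access multi-value fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, clmn1 TEXT, clmn2 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"),
     (4, "c", "z"), (5, "c", "z"), (6, "c", "z")],
)

# Groups that occur more than once, with their counts.
dup_groups = conn.execute(
    "SELECT clmn1, clmn2, COUNT(*) AS cnt "
    "FROM items "
    "GROUP BY clmn1, clmn2 "
    "HAVING COUNT(*) > 1 "
    "ORDER BY clmn1, clmn2"
).fetchall()

# Every full row that belongs to one of those duplicated groups.
dup_rows = conn.execute(
    "SELECT t.id, t.clmn1, t.clmn2 "
    "FROM items AS t "
    "JOIN (SELECT clmn1, clmn2 FROM items "
    "      GROUP BY clmn1, clmn2 HAVING COUNT(*) > 1) AS d "
    "  ON t.clmn1 = d.clmn1 AND t.clmn2 = d.clmn2 "
    "ORDER BY t.id"
).fetchall()

print(dup_groups)
print([row[0] for row in dup_rows])
```

The row with id 3 is the only one left out, because its (clmn1, clmn2) pair occurs once.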

What's the difference between select distinct count, and select count distinct?

I am aware of select count(distinct a), but I recently came across select distinct count(a).
I'm not sure whether that is even valid.
If it is a valid use, could you give me sample code with sample data that would explain the difference?
Hive doesn't allow the latter.
Any leads would be appreciated!
The query select count(distinct a) gives you the number of unique values in a.
The query select distinct count(a) gives you the list of unique counts of values in a. Without grouping, it is just one row with the total count.
See the following example:
create table t(a int);
insert into t values (1),(2),(3),(3);
select count(distinct a) from t;
select distinct count(a) from t
group by a;
It gives 3 for the first query and the values 1 and 2 for the second query.
I cannot think of any useful situation where you would want to use:
select distinct count(a)
If the query has no group by, then the distinct is redundant: the query returns only one row anyway. If there is a group by, then the grouping columns should be in the select, to identify each row.
I mean, technically, with a group by, it would be answering the question: "how many different non-null values of a are in groups". Usually, it is much more useful to know the value per group.
If you want to count the number of distinct values of a, then use count(distinct a).
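The difference can be checked with a short script. This is a sketch using Python's built-in sqlite3 module and the same four sample values as the example above; other engines (Hive, as the question notes) may reject some of these forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,), (3,)])

# Number of unique values in a.
count_distinct = conn.execute("SELECT COUNT(DISTINCT a) FROM t").fetchone()[0]

# Per-group counts (1, 1, 2), then de-duplicated by DISTINCT.
distinct_counts = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT COUNT(a) FROM t GROUP BY a")
)

# Without GROUP BY there is a single row, so DISTINCT changes nothing.
ungrouped = conn.execute("SELECT DISTINCT COUNT(a) FROM t").fetchall()

print(count_distinct)   # 3
print(distinct_counts)  # [1, 2]
print(ungrouped)        # [(4,)]
```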

Need SQL with subquery to get distinct values for VBA code

I have a table BAR_DATA with two fields: LongDate, Time. Both are long integers. No Access Date/Time involved here.
For each distinct LongDate value there are hundreds of records, each with Time value which may be distinct or duplicate within that LongDate.
I need to create an SQL statement that will group by LongDate and give me a count of distinct Times within each LongDate.
The following SQL statement (built by an Access query) does NOT work (some LongDates are omitted):
Query A
SELECT DISTINCT BAR_DATA.LongDate, Count(BAR_DATA.Time) AS CountOfTime
FROM BAR_DATA
GROUP BY BAR_DATA.LongDate
HAVING (((Count(BAR_DATA.Time))<>390 And (Count(BAR_DATA.Time))<>210));
However, if I use Query B to reference Query DistinctDateTime, it does work:
Query B
SELECT DistinctDateTime.LongDate, Count(DistinctDateTime.Time) AS CountOfTime
FROM DistinctDateTime
GROUP BY DistinctDateTime.LongDate
HAVING (((Count(DistinctDateTime.Time))<>390 And (Count(DistinctDateTime.Time))<>210));
Query DistinctDateTime
SELECT DISTINCT BAR_DATA.LongDate, BAR_DATA.Time
FROM BAR_DATA;
My problem:
I need to get Query B and Query DistinctDateTime wrapped into a single SQL statement so I can paste it into a VBA function. I presume there are some subquery techniques, but I have failed at every attempt and have found no pertinent example.
Any help will be greatly appreciated. Thanks!
Subquery your distinct table inside and perform your aggregates outside until you get the desired result:
SELECT DistinctDateTime.LongDate, Count(DistinctDateTime.Time) AS CountOfTime
FROM
(
SELECT DISTINCT BAR_DATA.LongDate, BAR_DATA.Time
FROM BAR_DATA
) AS DistinctDateTime
GROUP BY DistinctDateTime.LongDate
HAVING (((Count(DistinctDateTime.Time))<>390 And (Count(DistinctDateTime.Time))<>210));
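The effect of de-duplicating in a subquery before counting can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than Access, the sample rows are invented, and the expected count of 3 is a hypothetical stand-in for the 390 and 210 thresholds in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BAR_DATA (LongDate INTEGER, Time INTEGER)")
conn.executemany(
    "INSERT INTO BAR_DATA VALUES (?, ?)",
    [(20240101, 930), (20240101, 930), (20240101, 931), (20240101, 932),
     (20240102, 930), (20240102, 931)],
)

# Counting raw rows: the duplicated 930 on 20240101 inflates that day's count to 4.
raw_counts = conn.execute(
    "SELECT LongDate, COUNT(Time) AS CountOfTime "
    "FROM BAR_DATA "
    "GROUP BY LongDate "
    "HAVING COUNT(Time) <> 3 "
    "ORDER BY LongDate"
).fetchall()

# De-duplicate (LongDate, Time) pairs first, then count per LongDate.
distinct_counts = conn.execute(
    "SELECT d.LongDate, COUNT(d.Time) AS CountOfTime "
    "FROM (SELECT DISTINCT LongDate, Time FROM BAR_DATA) AS d "
    "GROUP BY d.LongDate "
    "HAVING COUNT(d.Time) <> 3 "
    "ORDER BY d.LongDate"
).fetchall()

print(raw_counts)       # [(20240101, 4), (20240102, 2)]
print(distinct_counts)  # [(20240102, 2)]
```

Only the second query treats 20240101 as having the expected three distinct times, which is the behavior the single-statement subquery is meant to provide.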

Selecting counts of a substring

I know this has to be an easy select, but I am having no luck figuring it out. I've got a table that has a field of customer grouping codes, and I'm trying to get a count of each distinct set of characters 2 through 6. In my past FoxPro experience a simple
select distinct substr(custcode,2,5), count (*) from a group by 1
would work, but this doesn't appear to work in SQL Server queries. The error message indicated it didn't like the number reference in the group by, so I changed it to custcode, but the count just returns 1 for each. I assume the count happens after the distinct, so there is only one. If I change the count to count(distinct substring(custcode,2,5)) and remove the first distinct substring, I just get a count of how many different codes exist. Can someone point out what I'm doing wrong here? Thanks.
The DISTINCT and GROUP BY are redundant, you just want GROUP BY, and you want to GROUP BY the same thing you are selecting:
select substring(custcode,2,5), count(*)
from a
group by substring(custcode,2,5)
In SQL Server you can use column aliases/numbers in the ORDER BY clause, but not in GROUP BY.
I.e., ORDER BY 1 will order by the first selected column, but many consider it bad practice to use column indexes; using aliases or column names is clearer.
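Grouping by the same expression that is selected can be tried out with a short sketch. It runs on SQLite through Python's built-in sqlite3 module, so it uses SQLite's substr() where SQL Server would use SUBSTRING(). The customers table name and the sample codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (custcode TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("AABCDE1",), ("BABCDE2",), ("CXYZQW3",)],
)

# Characters 2 through 6 of each code, counted per distinct substring.
# No DISTINCT is needed: GROUP BY already yields one row per substring.
code_counts = conn.execute(
    "SELECT substr(custcode, 2, 5) AS code_part, COUNT(*) AS cnt "
    "FROM customers "
    "GROUP BY substr(custcode, 2, 5) "
    "ORDER BY code_part"
).fetchall()

print(code_counts)  # [('ABCDE', 2), ('XYZQW', 1)]
```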

Count of 2 columns by GROUP BY and catx giving different outputs

I have to find distinct count of combination of 2 variables. I used the following 2 queries to find the count:
select count(*) from
( select V1, V2
from table1
group by 1,2
) a
select count(distinct catx('-', V1, V2))
from table1
Logically, both the above queries should give the same count but I am getting different counts. Note that:
- Both V1 and V2 are integers.
- Both variables can have null values, though there are no null values in my table.
- There are no negative values.
Any idea why I might be getting different outputs? And which is the best way to find the count of distinct combinations of 2 or more columns?
Thanks.
The SAS log gives the answer when you run the first sql code. Using 'group by' requires a summary function, otherwise it is ignored. The count will therefore return the overall number of rows instead of a distinct count of the 2 variables combined.
Just add count(*) to the subquery and you will get the same answer with both methods.
select count(*) from
( select V1, V2, count(*)
from table1
group by 1,2
) a
Use distinct in the subquery for the first query.
When you do a group by but don't include any aggregate function, SAS discards the group by,
so you will still have duplicate combinations of V1 and V2.
It seems that GROUP BY doesn't work that way in SAS. You can't use it to remove duplicates unless you have an aggregate function in your query. I found this in the log of my query output:
NOTE: A GROUP BY clause has been discarded because neither the SELECT
clause nor the optional HAVING clause of the associated
table-expression referenced a summary function.
This answers the question.
You can also drop the group by and just add a distinct in the subquery. Also, the second query you wrote is more efficient.
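The DISTINCT-in-subquery approach can be sketched as follows. This runs on SQLite through Python's built-in sqlite3 module, not SAS, so it does not reproduce the discarded GROUP BY described in the log note. The sample rows are invented, and the || concatenation is only a rough stand-in for catx.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (V1 INTEGER, V2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 3), (2, 3)],
)

# Count distinct (V1, V2) pairs by de-duplicating in a subquery.
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT V1, V2 FROM table1) AS a"
).fetchone()[0]

# Count distinct pairs by concatenating both values with a delimiter.
via_concat = conn.execute(
    "SELECT COUNT(DISTINCT V1 || '-' || V2) FROM table1"
).fetchone()[0]

print(via_subquery, via_concat)  # 3 3
```

With non-null, non-negative integers and a delimiter, both forms agree here; the subquery form avoids depending on how values are converted to text.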