SQL How many of a counted row is in another counted row? - sql

I've been stuck on how to write a particular query for the following question:
How many employees are in how many businesses?
My end result should look like this:
EmployeeId Count    BusinessId Count
1                   23473423
2                   56245764
3                   834456
So there are 23473423 businesses that have 1 employee, 56245764 businesses that have 2 employees, etc.
I have a table with a list of items including EmployeeId and BusinessId. A BusinessId can connect to many EmployeeIds.
So far I have the following code to get me employees per business
Select BusinessId,
Count(EmployeeId) as EIdCount
From Table
Group by BusinessId
Which gets me a list of BusinessIds and how many EmployeeIds are attached to each.
BusinessId    EIdCount
23            2
24            5
25            1
26            3
But now I need to figure out how to further group it so that the BusinessIds are grouped by the counted EmployeeIds. I've looked at subqueries, HAVING, and GROUP BY, but I am still at a loss as to how to progress without running into an error.
Thank you for your help in advance!

Not sure if this is what you want:
Select EIdCount, Count(BusinessId)
From (
Select BusinessId,
Count(EmployeeId) As EIdCount
From Table
Group by BusinessId
) A
Group By EIdCount

Just use a subquery:
select
EIdCount,count(BusinessId) as [BusinessId Count]
from
(
--your original query/start
Select BusinessId,
Count(EmployeeId) as EIdCount
From Table
Group by BusinessId
--your original query/end
)t
group by EIdCount
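Both answers above nest the per-business count inside an outer GROUP BY. As a minimal sketch of that idea, the snippet below runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. The table name Staff and the sample rows are assumptions made up for illustration; the original question targets a different database, where syntax details may differ.

```python
import sqlite3

# Hypothetical sample data: one row per (business, employee) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (BusinessId INTEGER, EmployeeId INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [(23, 1), (23, 2), (24, 3), (25, 4), (26, 5), (26, 6), (26, 7), (27, 8)],
)

# Inner query: employees per business. Outer query: businesses per employee count.
count_of_counts = conn.execute(
    """
    SELECT EIdCount, COUNT(BusinessId) AS BusinessIdCount
    FROM (
        SELECT BusinessId, COUNT(EmployeeId) AS EIdCount
        FROM Staff
        GROUP BY BusinessId
    ) AS t
    GROUP BY EIdCount
    ORDER BY EIdCount
    """
).fetchall()

print(count_of_counts)  # [(1, 3), (2, 1), (3, 1)]
```

Here three businesses have one employee, one has two, and one has three.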

Related

Add column to count records for unique ID

I'm dealing with data regarding people who have visited a certain place. Each person has their own unique PersonID and each of their visits has a unique VisitID. What I'd like to do is add a column to my query that counts the number of distinct records for each person (i.e. counts and displays the number of times that person visited). The logic makes sense in my head, but I'm unsure about syntax, and the similar questions I've looked at while researching just haven't quite applied to my situation.
So here's what I'm looking at:
SELECT
PersonID,
[a few other demographic fields we'll skip for now],
VisitID,
COUNT(DISTINCT VisitID) as PersonVisits
FROM VisitInfo
WHERE VisitID IS NOT NULL
ORDER BY PersonID, VisitID
And I'm hoping to see results like this:
PersonID ... VisitID PersonVisits
------------------------------------------------
1001 ... 0277 2
1001 ... 1429 2
1002 ... 1103 1
1003 ... 0925 3
1003 ... 2276 3
1003 ... 5018 3
I know the PersonVisits count would just repeat for each of a given person's records, but that's something I can deal with for the purposes of this project (unless anyone has any suggestions for how to improve that aspect of the query).
My main problem is that (1) I'm not sure if what I'm doing is even the correct way to go about this, and (2) as it stands now, this query is giving me the following error:
Column PersonID is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY function.
I wasn't getting that error until I added the COUNT function to the select list.
Am I on the right track?
When you mix aggregates (SUM, COUNT, etc.) with non-aggregated columns in a SELECT, you need a GROUP BY clause:
SELECT
PersonID,
COUNT(DISTINCT VisitID) as PersonVisits
FROM VisitInfo
WHERE VisitID IS NOT NULL
GROUP BY PersonID
ORDER BY PersonID
As a rule of thumb, ALL non-aggregated columns in your SELECT need to be in the GROUP BY:
SELECT
PersonID,
[a few other demographic fields we'll skip for now],
VisitID,
COUNT(DISTINCT VisitID) as PersonVisits
FROM VisitInfo
WHERE VisitID IS NOT NULL
GROUP BY
PersonID,
[a few other demographic fields we'll skip for now],
VisitID
ORDER BY PersonID, VisitID
This is probably going to give you weird/incorrect results: because VisitID is in the GROUP BY and each visit has a unique VisitID, every group contains a single visit, so PersonVisits will be 1 on every row.
Just another option is the window function sum() over()
SELECT PersonID,
VisitID,
PersonVisits = sum(1) over (partition by PersonID)
FROM VisitInfo
WHERE VisitID IS NOT NULL
ORDER BY PersonID, VisitID
Join the table to a query which groups by PersonID and returns the number of distinct VisitIDs for each one of them:
SELECT
t.PersonID,
t.col1, t.col2, ...,
t.VisitID,
g.PersonVisits
FROM VisitInfo t
INNER JOIN (
SELECT PersonID, COUNT(DISTINCT VisitID) PersonVisits
FROM VisitInfo
GROUP BY PersonID
) g ON g.PersonID = t.PersonID
WHERE t.VisitID IS NOT NULL
ORDER BY t.PersonID, t.VisitID
This is how I would write it...
select PersonID, demo1, demo2, demo3, count(distinct VisitID) as visits
from VisitInfo
where VisitID is not null
group by PersonID, demo1, demo2, demo3
order by PersonID, demo1, demo2, demo3
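To see the repeated per-person count from the question produced without a GROUP BY, here is a small sketch of the window-function approach, run on the sample rows from the question. It uses SQLite through Python's sqlite3 purely so the query can be executed (window functions need SQLite 3.25 or newer). COUNT(*) OVER counts rows rather than distinct VisitIDs; that matches the desired output only because each visit has a unique VisitID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VisitInfo (PersonID INTEGER, VisitID TEXT)")
conn.executemany(
    "INSERT INTO VisitInfo VALUES (?, ?)",
    [
        (1001, "0277"), (1001, "1429"),
        (1002, "1103"),
        (1003, "0925"), (1003, "2276"), (1003, "5018"),
    ],
)

# Every row is kept; the window count repeats on each of a person's rows.
visits = conn.execute(
    """
    SELECT PersonID,
           VisitID,
           COUNT(*) OVER (PARTITION BY PersonID) AS PersonVisits
    FROM VisitInfo
    WHERE VisitID IS NOT NULL
    ORDER BY PersonID, VisitID
    """
).fetchall()

for row in visits:
    print(row)
```

The rows come back as (1001, '0277', 2), (1001, '1429', 2), (1002, '1103', 1), and then three rows for 1003, each with a count of 3.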

Identifying a Distinct Count for a Column Without Using Group By

I'm trying to figure out how to get the distinct count of something that's conditional and doesn't use group by. I've got a table that has columns as seen here:
Employeeid, Training_Course_name, CompletedDate
Some of the courses have the word Rope in them.
I want to take the number of completed courses per person with the word "Rope" in the title and divide it by the number of unique courses there are that have the word rope in the title. If there are 15 unique course names that have the word rope in the title, regardless of who they're assigned to, I want to come up with that number and have it divided into the number of completed rope courses per person.
You can use conditional aggregation:
select count(distinct case when Training_Course_name like '%rope%'
then Training_Course_name
end) as courses_with_rope
This will help you to solve your problem
-- Table variable holding the distinct course names that contain "Rope"
Declare @UniqueCourses As TABLE(Course VarChar(32))
Insert Into @UniqueCourses (Course)
SELECT DISTINCT Training_Course_Name
FROM Employees
WHERE Training_Course_Name LIKE '%Rope%'
-- Completed courses per employee, via a correlated subquery
SELECT
EmpId,
(SELECT COUNT(1) FROM Employees innerEmployees
WHERE innerEmployees.EmpId = outerEmployees.EmpId AND
innerEmployees.CompletedDate is not null
) AS [Completed Courses]
From Employees outerEmployees
You can get the courses with the word rope in them with this query
SELECT Employeeid, Training_Course_name, CompletedDate
FROM Table_Name_You_Did_Not_Say
WHERE Training_Course_name LIKE '%rope%'
And a distinct count like this
SELECT count(distinct Training_Course_name) as distinct_names
FROM Table_Name_You_Did_Not_Say
WHERE Training_Course_name LIKE '%rope%'
Anything "by employee id" would require a group by -- so what exactly is your requirement?
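One possible way to combine the pieces above into the ratio the question describes is sketched below: completed rope courses per employee divided by the number of distinct rope course names overall. It does use a GROUP BY for the per-employee part, as the last answer suggests is unavoidable. The table name Training and the sample rows are assumptions, and SQLite (via Python's sqlite3) stands in for whatever database the asker uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Training (Employeeid INTEGER, Training_Course_name TEXT, CompletedDate TEXT)"
)
# Hypothetical rows: four distinct course names contain "Rope".
conn.executemany(
    "INSERT INTO Training VALUES (?, ?, ?)",
    [
        (1, "Rope Basics", "2020-01-01"),
        (1, "Rope Rescue", "2020-02-01"),
        (1, "First Aid", "2020-03-01"),
        (2, "Rope Basics", "2020-01-05"),
        (2, "Rope Rescue", None),
        (2, "Advanced Rope Work", None),
        (2, "Rope Knots", None),
        (3, "First Aid", "2020-04-01"),
    ],
)

# Numerator: distinct completed rope courses per employee (conditional aggregation).
# Denominator: distinct rope course names across the whole table (scalar subquery).
ratios = conn.execute(
    """
    SELECT Employeeid,
           1.0 * COUNT(DISTINCT CASE WHEN CompletedDate IS NOT NULL
                                     THEN Training_Course_name END)
               / (SELECT COUNT(DISTINCT Training_Course_name)
                  FROM Training
                  WHERE Training_Course_name LIKE '%Rope%') AS rope_ratio
    FROM Training
    WHERE Training_Course_name LIKE '%Rope%'
    GROUP BY Employeeid
    ORDER BY Employeeid
    """
).fetchall()

print(ratios)  # [(1, 0.5), (2, 0.25)]
```

Employee 3 has no rope courses assigned and therefore does not appear; whether such employees should show up with a ratio of 0 depends on the actual requirement.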

SQL Select employees with work history in multiple states

I need a query that will show only employees who have been paid in more than one state.
The query will pull three columns:
EmployeeID
WorkLocation
LastPayDate
My current, unsuccessful attempt:
Select EmployeeID
, WorkLocation
, max(LastPayDate)
from Table
group by EmployeeID, WorkLocation
having COUNT(distinct WorkLocation) > 1
This query pulls zero records. I know there are employees who have worked in multiple states, however. I am not sure where my logic breaks down.
Any instruction is much appreciated.
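You need HAVING COUNT(DISTINCT WorkLocation) > 1, which indicates that the employee has worked in more than one state, since you're only concerned with groups that contain multiple WorkLocations. Group by EmployeeId alone: if WorkLocation is also in the GROUP BY, each group contains exactly one location, so the count can never exceed 1 and the query returns no rows.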
If you're trying to check for multiple work locations within a specific year, you will perform that logic in the WHERE clause.
select EmployeeId
from table xyz
--where year(LastPayDate) = 2015
group by EmployeeId
having count(distinct WorkLocation) > 1
Figured it out. I needed to use a subquery. Solution as follows:
Select t.EmployeeID
, t.WorkLocation
, max(t.LastPayDate) as LastPayDate
From Table t
Where t.EmployeeID in
(
Select t2.EmployeeID
From Table t2
Group by t2.EmployeeID
Having count(distinct t2.WorkLocation) > 1
)
Group by t.EmployeeID, t.WorkLocation
Order by t.EmployeeID
Thanks to everyone for helping.
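The difference between the failing attempt and the subquery solution can be checked on a few rows. The sketch below uses an invented table Pay and invented sample data, with SQLite via Python's sqlite3 standing in for the real database; it runs both query shapes side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pay (EmployeeID INTEGER, WorkLocation TEXT, LastPayDate TEXT)")
conn.executemany(
    "INSERT INTO Pay VALUES (?, ?, ?)",
    [
        (1, "TX", "2015-01-15"), (1, "TX", "2015-02-15"), (1, "CA", "2015-03-15"),
        (2, "NY", "2015-01-15"), (2, "NY", "2015-02-15"),
        (3, "WA", "2015-01-31"), (3, "OR", "2015-02-28"),
    ],
)

# Original attempt: WorkLocation is a grouping column, so each group
# holds exactly one distinct location and HAVING filters everything out.
failing = conn.execute(
    """
    SELECT EmployeeID, WorkLocation, MAX(LastPayDate)
    FROM Pay
    GROUP BY EmployeeID, WorkLocation
    HAVING COUNT(DISTINCT WorkLocation) > 1
    """
).fetchall()

# Subquery version: find multi-state employees first, then list their locations.
working = conn.execute(
    """
    SELECT t.EmployeeID, t.WorkLocation, MAX(t.LastPayDate) AS LastPayDate
    FROM Pay AS t
    WHERE t.EmployeeID IN (
        SELECT EmployeeID
        FROM Pay
        GROUP BY EmployeeID
        HAVING COUNT(DISTINCT WorkLocation) > 1
    )
    GROUP BY t.EmployeeID, t.WorkLocation
    ORDER BY t.EmployeeID, t.WorkLocation
    """
).fetchall()

print(failing)  # []
print(working)
```

With this data, employee 2 (paid only in NY) is excluded, while employees 1 and 3 each appear once per state with their latest pay date there.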

Find duplicate records in a table using SQL Server

I am validating a table that holds transaction-level data from an eCommerce site, and I need to find the exact errors.
I would like your help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
it will return:
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
you can remove duplicates by replacing select statement by
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of multiple records use following command
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as it can produce collisions (different rows with the same checksum), which would be reported as false duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername,count(shoppername) as cnt
from sales
group by shoppername
having count(shoppername)>1)
select t.* from x,sales t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I doubt the expected result is accurate: there seem to be three 'Sam' rows in the original table. But that is not critical to the question.
As for the question itself: based on your table, the best way to show duplicate values is to use count(*) with a GROUP BY clause. The query would look like this:
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1
The reason is that all the columns together identify a record, so two records count as duplicates only when every column value is exactly the same. You also want to show all fields of the duplicate records, so the GROUP BY must list every column; you can only select columns that participate in the GROUP BY clause (or aggregates).
Now I would like to give you an example of WITH...ROW_NUMBER() OVER(...), which combines a common table expression with the ROW_NUMBER function.
Suppose you have a nearly identical table with one extra column called Shipping Date, whose value may differ even when the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row 2 is no longer a duplicate if you compare all columns as a unit. But what if you still want to treat it as a duplicate? In that case, use WITH...ROW_NUMBER() OVER(...); the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) as Identifier --if you consider the one with late shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The query above returns the duplicate row along with its Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note that this row is different from the one dated 2016-01-01. The 2016-02-02 row is flagged because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: Shipping Date is not one of the columns compared when looking for duplicates, so within each partition the row with the earliest shipping date gets Identifier 1 and every later row is treated as a duplicate. The 2016-02-02 row is therefore still a valid result for your question.
To summarize: using count(*) together with a GROUP BY clause is the best choice when you only need to show the columns listed in the GROUP BY clause; any column that does not participate in the GROUP BY is left out of the result.
WITH...ROW_NUMBER() OVER(...), on the other hand, works in every scenario where you want to find duplicate records, but the query is a little more complicated to write and somewhat over-engineered compared to the former.
If your goal is to delete duplicate records from the table, you have to use the latter: WITH...ROW_NUMBER() OVER(...)...DELETE FROM...WHERE.
Hope this helps!
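The two approaches discussed above, GROUP BY over every column and ROW_NUMBER() over a partition, can be compared on the question's sample rows. This is only a sketch: it runs in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer) rather than SQL Server, and the table is named sales without the dbo schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, amountpayed INTEGER, city TEXT, Item TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sam", 10, "A", "Iphone"),
        (1, "Sam", 10, "A", "Iphone"),  # the duplicate to be detected
        (1, "Sam", 5, "A", "Ipod"),
        (2, "John", 20, "B", "Macbook"),
        (3, "John", 25, "B", "Macbookair"),
        (4, "Jack", 5, "A", "Ipod"),
    ],
)

# Approach 1: group by every column; groups with more than one row are duplicates.
dup_groups = conn.execute(
    """
    SELECT OrderNo, shoppername, amountpayed, city, Item, COUNT(*) AS cnt
    FROM sales
    GROUP BY OrderNo, shoppername, amountpayed, city, Item
    HAVING COUNT(*) > 1
    """
).fetchall()

# Approach 2: number the rows inside each identical group; rn > 1 marks extra copies.
extra_copies = conn.execute(
    """
    WITH x AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderNo, shoppername, amountpayed, city, Item
                   ORDER BY OrderNo
               ) AS rn
        FROM sales
    )
    SELECT OrderNo, shoppername, amountpayed, city, Item
    FROM x
    WHERE rn > 1
    """
).fetchall()

print(dup_groups)    # [(1, 'Sam', 10, 'A', 'Iphone', 2)]
print(extra_copies)  # [(1, 'Sam', 10, 'A', 'Iphone')]
```

The first query reports each duplicated combination once with its count; the second returns one row per surplus copy, which is the set a delete would target.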
You can use below methods to find the output
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select shoppername, count(Item) as cnt
from dbo.sales
group by shoppername
having(count(Item) > 1)
Select EventID,count(*) as cnt
from dbo.EventInstances
group by EventID
having count(*) > 1
The following is running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )

How can a group by query be used to get the number of occurrences of a particular column?

I want to get the results in a way that each order is displayed with the number of times they occur in a table. For example:
Chicken Parmessan - 3
Polo Pizza - 5
select food, count(*) from tablename group by food
If you have a table like this:
create table orders (
menu_item varchar(100) not null
)
Then you'd want something like this:
select menu_item, count(*)
from orders
group by menu_item
The aggregate function count will then count the number of entries in each group and each group will be identified by menu_item.