Count returning incorrect result - sql

I've written a count query which isn't working as expected.
The query is below and it returns three columns - DepartmentName, CategoryName and NumberOfCats.
The NumberOfCats column should show the number of Categories in a given Department.
So with the table below, the NumberOfCats column should have the number 4 in each row, instead of 1, because the Bakery Department has 4 Categories, in this case.
Does anyone know how I can amend the code, so it returns the right result, please?
SELECT
DepartmentName,
CategoryName,
COUNT(DISTINCT CategoryName) as NumberOfCats
FROM v_EnterpriseStructure
GROUP BY
DepartmentName,
CategoryName
ORDER BY DepartmentName;

Seems like a window function will take care of these pretty easily:
SELECT
DepartmentName,
CategoryName,
COUNT( CategoryName) OVER(Partition by DepartmentName) as NumberOfCats
FROM v_EnterpriseStructure
group by DepartmentName,
CategoryName
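
To see why the original query returns 1 and the windowed one does not, here is a minimal sketch that runs both against a handful of sample rows. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for window functions), and the rows, including the duplicated Storage row, are illustrative. The window function is evaluated after GROUP BY, so it counts the grouped rows per department.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v_EnterpriseStructure (DepartmentName TEXT, CategoryName TEXT);
INSERT INTO v_EnterpriseStructure VALUES
  ('BAKERY', 'Bread'), ('BAKERY', 'Cakes'), ('BAKERY', 'Scones'), ('BAKERY', 'Croissants'),
  ('BAR', 'Beer'), ('BAR', 'Cola'),
  ('Storage', 'MrProper'), ('Storage', 'MrProper');
""")

# Original query: every group holds exactly one CategoryName, so the count is always 1.
per_group = conn.execute("""
SELECT DepartmentName, CategoryName, COUNT(DISTINCT CategoryName) AS NumberOfCats
FROM v_EnterpriseStructure
GROUP BY DepartmentName, CategoryName
ORDER BY DepartmentName, CategoryName
""").fetchall()

# Windowed query: GROUP BY first collapses duplicates, then the window counts
# the remaining (department, category) rows within each department.
per_department = conn.execute("""
SELECT DepartmentName, CategoryName,
       COUNT(CategoryName) OVER (PARTITION BY DepartmentName) AS NumberOfCats
FROM v_EnterpriseStructure
GROUP BY DepartmentName, CategoryName
ORDER BY DepartmentName, CategoryName
""").fetchall()

for row in per_department:
    print(row)
```

With these rows, every BAKERY line reports 4, every BAR line 2, and the duplicated Storage row is counted once.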

One way is to separate the count from the rest:
SELECT
T1.DepartmentName,
T1.CategoryName,
AG.NumberOfCats
FROM v_EnterpriseStructure T1
INNER JOIN (
SELECT DepartmentName,
COUNT(DISTINCT CategoryName) as NumberOfCats
FROM v_EnterpriseStructure
GROUP BY DepartmentName
) AS AG
ON AG.DepartmentName = T1.DepartmentName
ORDER BY DepartmentName;
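You may need to add DISTINCT to the outer SELECT, since the join returns one row per source row and the view can contain the same department and category more than once.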

One option, using COUNT() as an analytic function:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY DepartmentName) cnt
FROM v_EnterpriseStructure
)
SELECT DISTINCT
DepartmentName,
CategoryName,
cnt AS NumberOfCats
FROM cte
ORDER BY DepartmentName;

By adding a subquery to your query you can get the data you need. If your table has fewer than 100,000 rows this should run without issues. If it's larger, you might wish to consider altering your data structures.
http://sqlfiddle.com/#!18/d30e1/1
The create statement.
create table v_EnterpriseStructure (DepartmentName nvarchar(50), CategoryName nvarchar(50))
CREATE INDEX IX_DepartmentName
ON v_EnterpriseStructure(DepartmentName);
insert into v_EnterpriseStructure VALUES
('BAKERY', 'Bread'),('BAKERY', 'Cakes'),('BAKERY', 'Scones'),('BAKERY', 'Croissants'),
('BAR', 'Beer'),('BAR', 'Cola'),
('Storage', 'MrProper'),('Storage', 'MrProper');
insert into v_EnterpriseStructure select * from v_EnterpriseStructure;
insert into v_EnterpriseStructure select * from v_EnterpriseStructure;
insert into v_EnterpriseStructure select * from v_EnterpriseStructure;
insert into v_EnterpriseStructure select * from v_EnterpriseStructure;
insert into v_EnterpriseStructure select * from v_EnterpriseStructure;
The query:
SELECT
DepartmentName,
CategoryName,
(select
COUNT(distinct CategoryName) from v_EnterpriseStructure as v2
where v1.DepartmentName = v2.DepartmentName
) as NumberOfCats
FROM v_EnterpriseStructure as v1
group by DepartmentName,
CategoryName
ORDER BY DepartmentName;

Related

SQL show orders where 2 values are distinct and match the first

I'm looking for a way to let me select all orders that have multiple distinct names within the same order-number, it looks like this:
order - name
111-Paul
112-Paula
113-John
113-John
113-Jessica
114-Eric
114-Eric
114-John
115-Zack
115-Zack
115-Zack
etc.
so that I would get all the orders that have 2 or more distinct names in them:
113-John
113-Jessica
114-Eric
114-John
with which I could do further queries but I'm stuck. Can anyone give me some hints on how to tackle this problem please? I've tried it with count(*) which looked like this:
select order, name, count(name) from dbo.orders
group by order, name
having count(name) > 1
which gave me all the orders which had more than 1 name in it but I don't know how to let it only show orders with the distinct names.
Here's one approach using exists:
select distinct [order], name
from orders o
where exists (
select 1
from orders o2
where o.[order] = o2.[order] and o.name != o2.name)
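I would use window functions for this.
For example:
select distinct [order]
from
(select
[order],
dense_rank() over(partition by [order] order by name asc) as rn
from orders
) as t1
where rn > 1
Partitioning by order alone and ranking by name gives each distinct name in an order its own rank, so rn > 1 only occurs in orders with at least two distinct names. A windowed count partitioned by both columns, by contrast, shows how often each name repeats within an order:
count(*) over(partition by [order], name) as cnt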
Here's a straightforward implementation in SQL Server:
select distinct *
from table1
where [order] in (
select [order]
from (select distinct * from table1) iq
group by [order]
having count(*) > 1)
It's essentially breaking down the problem into:
Finding the orders that have more than one distinct value.
Finding the pairs of distinct order - name that belong to the list previously calculated.
When you use HAVING COUNT(name) > 1 while grouping by order and name, it is counting all of the rows in those groups, including duplicate rows (rows 113-John and 113-John are 2 rows for order 113). I would query distinct rows from your table, and then group those by order alone:
SELECT [order] FROM (
SELECT DISTINCT [order], [name] FROM dbo.orders
) A
GROUP BY [order]
HAVING COUNT([name]) > 1
As a note, if a [name] is null, then it will not be counted with COUNT(name). If you want nulls to be counted, use COUNT(*) instead.
You can use count(distinct name) to get the number of unique names for each order:
select [order], count(distinct name)
from orders
group by [order]
To just get the order for those you can use having:
select [order]
from orders
group by [order]
having count(distinct name) > 1
To get the details for those orders you can put that in the where clause to just return the rows with order in that list:
select *
from orders
where [order] in (
select [order]
from orders
group by [order]
having count(distinct name) > 1
)
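
As a quick check of the three steps above, the following sketch loads the sample rows from the question and runs the final query. It assumes Python's sqlite3 module as a stand-in for SQL Server (SQLite also accepts the [order] bracket quoting), and adds DISTINCT and ORDER BY only so the output matches the list the question asks for.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders ([order] INTEGER, name TEXT);
INSERT INTO orders VALUES
  (111, 'Paul'), (112, 'Paula'),
  (113, 'John'), (113, 'John'), (113, 'Jessica'),
  (114, 'Eric'), (114, 'Eric'), (114, 'John'),
  (115, 'Zack'), (115, 'Zack'), (115, 'Zack');
""")

# Inner query: orders with more than one distinct name.
# Outer query: the distinct (order, name) pairs belonging to those orders.
multi_name_rows = conn.execute("""
SELECT DISTINCT [order], name
FROM orders
WHERE [order] IN (
    SELECT [order]
    FROM orders
    GROUP BY [order]
    HAVING COUNT(DISTINCT name) > 1
)
ORDER BY [order], name
""").fetchall()

print(multi_name_rows)
```

Order 115 drops out because its three rows share one name, while 113 and 114 each contribute two distinct names.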
I would use RANK (or DENSE_RANK) for this as shown below.
SELECT [Order]
FROM (SELECT
[Order],
RANK() OVER(PARTITION BY [Order] ORDER BY Name) AS NameRank
FROM [StackOverflow].[dbo].[OrderAndName]) ranked
WHERE ranked.NameRank > 1
GROUP BY [Order]
The sub-query ranks (gives a seeding) to the names in an order according to their value. Names with the same value would have the same rank i.e. when an order has one name multiple times (like 115) the rank of all names would be 1.
The partition is important here as otherwise you would get the rank for all names for all orders which wouldn't give you the result you'd like.
It is then just a case of pulling out the orders that have a RANK greater than 1 and grouping (could use distinct if that's a preference).
You can then join to this table to get the orders and names as follows:
SELECT oan.[Order], [Name]
FROM [StackOverflow].[dbo].[OrderAndName] oan
INNER JOIN (SELECT [Order]
FROM (SELECT [Order],
RANK() OVER(PARTITION BY [Order] ORDER BY Name) AS NameRank
FROM [StackOverflow].[dbo].[OrderAndName]) ranked
WHERE ranked.NameRank > 1
GROUP BY [Order]) twoOrMore ON oan.[Order] = twoOrMore.[Order]

SQL code looks too complicated

Test Table
create table Test (
Id integer,
Store_N varchar(25),
Department varchar(25)
);
INSERT INTO Test (Id, Store_N, Department )
Values (25,'1','A'), (67,'1','A'), (34,'1','A'), (97,'1','C'),
(21,'1','C'), (268,'1','B'), (456,'2','A'), (349,'2','A'),
(935,'2','B'), (36,'3','B'), (637,'3','B'), (388,'3','B'),
(891,'3','B'), (344,'4','A'), (763,'4','A'), (836,'4','A')
SELECT * , ROW_NUMBER() OVER( Partition BY Store_N ORDER BY Store_N ) AS AA
FROM Test;
Result is
I need to exclude all stores which have only one department, and return only the DISTINCT departments for each remaining store. The result looks like this
And this is code
SELECT DISTINCT TB4.Department, TB4.Store_N
From
(
SELECT TB0.Store_N, TB0.Department FROM Test TB0
INNER JOIN
(
SELECT TB2.Store_N , Count(*) AS AA1
FROM
(
SELECT DISTINCT TB1.Department , TB1.Store_N
FROM
( SELECT * , ROW_NUMBER() OVER( Partition BY Store_N ORDER BY Store_N ) AA
FROM Test ) TB1
) TB2
group by TB2.Store_N
HAVING
COUNT(*) > 1 ) TB3
ON TB0.Store_N = TB3.Store_N
) TB4
Now the question: how can I simplify this code?
Thank you
You can basically do:
select store_n, department
from test
group by store_n, department;
But you want to exclude stores that have only one department, so let's do a count:
select store_n, department
from (select store_n, department, count(*) over (partition by store_n) as cnt
from test
group by store_n, department
) t
where cnt > 1;
Here is a SQL Fiddle.
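
The windowed count over the grouped rows can be checked against the Test data from the question. This is a minimal sketch that assumes Python's sqlite3 module in place of the original database (SQLite 3.25 or later for window functions); the ORDER BY is added only to make the output stable.

```python
import sqlite3

# In-memory SQLite stands in for the original database (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (Id INTEGER, Store_N VARCHAR(25), Department VARCHAR(25));
INSERT INTO Test (Id, Store_N, Department) VALUES
  (25,'1','A'), (67,'1','A'), (34,'1','A'), (97,'1','C'),
  (21,'1','C'), (268,'1','B'), (456,'2','A'), (349,'2','A'),
  (935,'2','B'), (36,'3','B'), (637,'3','B'), (388,'3','B'),
  (891,'3','B'), (344,'4','A'), (763,'4','A'), (836,'4','A');
""")

# GROUP BY yields one row per (store, department); the window then counts
# those rows per store, i.e. the number of distinct departments in the store.
stores_with_several_departments = conn.execute("""
SELECT store_n, department
FROM (SELECT store_n, department,
             COUNT(*) OVER (PARTITION BY store_n) AS cnt
      FROM Test
      GROUP BY store_n, department) t
WHERE cnt > 1
ORDER BY store_n, department
""").fetchall()

print(stores_with_several_departments)
```

Stores 3 and 4 each have a single department and are filtered out; stores 1 and 2 remain with one row per department.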
You are going a long way round to get the functionality of the "GROUP BY" clause:
SELECT TB2.Store_N , TB2.Department
FROM
(
SELECT Department , Store_N
FROM Test
GROUP BY Department, Store_N) as TB2
WHERE TB2.Store_N IN (
SELECT Store_N
FROM Test
GROUP BY Store_N
HAVING COUNT(DISTINCT Department) > 1)

Remove duplicate columns from Query result

Please help me! I'm a newbie with SQL queries.
Select *
from(
select EmpID,
sum(IncomeTax) as TaxAmount,
sum(bsalary) as SalaryAmount
from PayrollHistory Pay
group by EmpID
) cumSalary
Right JOIN (
Select PayrollHistory.EmpID,
(select firstName +' '+coalesce(middleInitial,' ')+' '+ lastName
from Employee
where Employee.EmpID=PayrollHistory.EmpID)as name,
PayrollHistory.IncomeTax,
(PayrollHistory.bsalary+sum(ISNULL(Allw.amount,0)))totalTaxableSUM
from PayrollHistory
left join (
select *
from AllowanceHistory
where AllowanceHistory.taxStatus=1
) as Allw
on Allw.EmpID=PayrollHistory.EmpID and Allw.payMonth=PayrollHistory.payMonth
where PayrollHistory.payMonth=3
group by PayrollHistory.EmpID, PayrollHistory.IncomeTax, PayrollHistory.bsalary
) as tbl
on tbl.EmpID =cumSalary.EmpID
The above query result contains two EmpID columns that are the same. How can I remove one of them and still get the same result?
Instead of the first Select *, specify all the columns that you need, like:
select cumSalary.EmpID,
cumSalary.TaxAmount,
cumSalary.SalaryAmount,
tbl.name,
tbl.IncomeTax,
tbl.totalTaxableSUM
etc.
Select columns by name instead of using *, as below:
Select cumSalary.*, tbl.name /* ....etc */
from(
select EmpID,
sum(IncomeTax) as TaxAmount,
sum(bsalary) as SalaryAmount
from PayrollHistory Pay
group by EmpID
) cumSalary
Right JOIN (
Select PayrollHistory.EmpID,
(select firstName +' '+coalesce(middleInitial,' ')+' '+ lastName
from Employee
where Employee.EmpID=PayrollHistory.EmpID)as name,
PayrollHistory.IncomeTax,
(PayrollHistory.bsalary+sum(ISNULL(Allw.amount,0)))totalTaxableSUM
from PayrollHistory
left join (
select *
from AllowanceHistory
where AllowanceHistory.taxStatus=1
) as Allw
on Allw.EmpID=PayrollHistory.EmpID and Allw.payMonth=PayrollHistory.payMonth
where PayrollHistory.payMonth=3
group by PayrollHistory.EmpID, PayrollHistory.IncomeTax, PayrollHistory.bsalary
) as tbl
on tbl.EmpID =cumSalary.EmpID

SQL Query to identify "Top Performers" [?]

I'm still learning Oracle SQL and would like your guidance.
Let's say we have a MONTHLY_SALES_TOTALS table that has 3 fields: name, region, amount. We need to determine the best salespeople per region. Best means that their amount is equal to the maximum for the region.
CREATE TABLE montly_sales_totals
(
name varchar(20),
amount numeric(9),
region varchar(30)
);
INSERT ALL
INTO montly_sales_totals (name, amount, region) VALUES ('Peter', 55555, 'east')
INTO montly_sales_totals (name, amount, region) VALUES ('Susan', 55555, 'east')
INTO montly_sales_totals (name, amount, region) VALUES ('Mark', 1000000, 'south')
INTO montly_sales_totals (name, amount, region) VALUES ('Glenn', 50000, 'east')
INTO montly_sales_totals (name, amount, region) VALUES ('Paul', 500000, 'south')
SELECT * from dual;
Possible solution:
SELECT m1.name, m1.region, m1.amount
FROM montly_sales_totals m1
JOIN
(SELECT MAX(amount) max_amount, region FROM montly_sales_totals GROUP BY region) m2
ON (m1.region = m2.region)
WHERE m1.amount = m2.max_amount
ORDER by 2,1;
SQL Fiddle: http://sqlfiddle.com/#!4/6a2d8/6
Now my questions:
How efficient is such query?
How can/should it be simplified and/or improved?
I could not use Top since the number of "max" rows varies by region. Is there another, more direct feature I could have used instead?
I would use RANK():
SELECT *
FROM (
SELECT name, amount, region,
RANK() OVER (PARTITION BY region ORDER BY amount DESC) rnk
FROM montly_sales_totals
) t
WHERE t.rnk = 1
Here's a modified version of the SQL Fiddle
There are a number of ways one can go about this. Here's another:
select S.region, S.name, V.regionmax
from montly_sales_totals S
inner join
(
select region, max(amount) as regionmax
from montly_sales_totals group by region
) V
on S.region = V.region and S.amount = V.regionmax
As to efficiency, the main factor is the use of the proper index(es). Inline views can perform very well.
I like CTE syntax, but using that website the time taken is the same 2ms, so I can't beat yours :)
with Maximums as (
SELECT region,
MAX(amount) max_amount
FROM montly_sales_totals GROUP BY region
)
SELECT m1.name, m1.region, m1.amount
FROM montly_sales_totals m1, Maximums
WHERE (m1.amount = Maximums.max_amount)
and (m1.region = Maximums.region)
ORDER by 2,1;
You can do this with an analytic function too:
select *
from (select m1.*,
row_number() over (partition by m1.region order by m1.amount desc, m1.name desc) max_sal
from montly_sales_totals m1)
where max_sal = 1;
Note that this query returns only one row per region: when two salespeople have the same amount, the tie is broken by name.
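
The difference between RANK() and ROW_NUMBER() on tied amounts is easiest to see on the sample data from the question. This is a minimal sketch that assumes Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later for window functions); the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE montly_sales_totals (name VARCHAR(20), amount NUMERIC(9), region VARCHAR(30));
INSERT INTO montly_sales_totals (name, amount, region) VALUES
  ('Peter', 55555, 'east'), ('Susan', 55555, 'east'),
  ('Mark', 1000000, 'south'), ('Glenn', 50000, 'east'),
  ('Paul', 500000, 'south');
""")

# RANK() gives tied amounts the same rank, so every top performer is kept.
top_with_ties = conn.execute("""
SELECT name, amount, region
FROM (SELECT name, amount, region,
             RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
      FROM montly_sales_totals) t
WHERE t.rnk = 1
ORDER BY region, name
""").fetchall()

# ROW_NUMBER() numbers rows uniquely, so exactly one row per region survives;
# the extra ORDER BY key (name DESC) decides which one.
one_per_region = conn.execute("""
SELECT name, amount, region
FROM (SELECT name, amount, region,
             ROW_NUMBER() OVER (PARTITION BY region
                                ORDER BY amount DESC, name DESC) AS rn
      FROM montly_sales_totals) t
WHERE t.rn = 1
ORDER BY region
""").fetchall()

print(top_with_ties)
print(one_per_region)
```

RANK() keeps both Peter and Susan for the east region, which matches the stated definition of "best"; ROW_NUMBER() keeps only Susan.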

T-SQL Subquery Question

I have two queries.
For each tuple of query1 I want to run query2. I don't want to use cursors. I tried several approaches using subqueries.
query1:
select
distinct
category,
Count(category) as CategoryCount
from
mytable
group by
category
query2:
select
top 3
Text,
Title,
Category
from
mytable
where
Category = '1'
Category = '1' is a sample. The value should come from query1.
Try this
WITH TBL AS
(
SELECT TEXT, TITLE, CATEGORY,
COUNT(*) OVER(PARTITION BY CATEGORY) AS CATEGORYCOUNT,
ROW_NUMBER() OVER(PARTITION BY CATEGORY ORDER BY (SELECT 0)) AS RC
FROM MYTABLE
)
SELECT TEXT, TITLE, CATEGORY, CATEGORYCOUNT
FROM TBL
WHERE RC <= 3
ORDER BY CATEGORY
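
To illustrate how the CTE combines both queries, here is a minimal sketch on made-up rows, assuming Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later for window functions). The sample data is hypothetical, and the row numbers are ordered by Title instead of (SELECT 0) purely so the demo output is deterministic; that ordering is an assumption, not part of the answer.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable ([Text] TEXT, Title TEXT, Category TEXT);
-- Hypothetical rows: four in category '1', two in category '2'.
INSERT INTO mytable VALUES
  ('t1', 'A', '1'), ('t2', 'B', '1'), ('t3', 'C', '1'), ('t4', 'D', '1'),
  ('t5', 'E', '2'), ('t6', 'F', '2');
""")

# COUNT(*) OVER gives the per-category total (query1); ROW_NUMBER() OVER
# numbers the rows inside each category so the outer filter keeps three (query2).
top_three = conn.execute("""
WITH tbl AS (
    SELECT [Text], Title, Category,
           COUNT(*) OVER (PARTITION BY Category) AS CategoryCount,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Title) AS rc
    FROM mytable
)
SELECT [Text], Title, Category, CategoryCount
FROM tbl
WHERE rc <= 3
ORDER BY Category, Title
""").fetchall()

for row in top_three:
    print(row)
```

The count is computed before the rc <= 3 filter is applied, so category '1' still reports 4 even though only three of its rows are returned.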