How to fix count() when linking multiple tables together

How to fix count() when linking multiple tables together - sql

Im removing an attribute I have called B_Price_per_hour since it should've been derived in the first place but it messed up one of my codes so I had to recreate it and I couldn't quite get it right.
SELECT c.Customer_ID, SUM(hours_of_use * C_price_per_hour) AS Total_Sales, COUNT(b.Booking_ID) AS Total_Visits
FROM CafeCustomers c, Booking b, Computer AS cc, BookingToComputer AS bc
WHERE c.Customer_ID = b.Customer_ID AND b.Booking_ID = bc.Booking_ID AND cc.Computer_ID = bc.Computer_ID
GROUP BY c.Customer_ID
ORDER BY SUM(hours_of_use * C_Price_per_hour) DESC;
SELECT c.Customer_ID, SUM(hours_of_use * B_Price_per_hour) AS Total_Sales, COUNT(Booking_ID) AS Total_Visits
FROM CafeCustomers c, Booking b
WHERE c.Customer_ID = b.Customer_ID
GROUP BY c.Customer_ID
ORDER BY SUM(hours_of_use * B_Price_per_hour) DESC;
The query that includes BookingToComputer and Computer shows the incorrect result of the count() function.
Results of the first query:
https://imgur.com/aMYHKUG
Results of the second (desired results) query:
https://imgur.com/KfaGAge
Created this as well and still has the same problem:
SELECT cc.Customer_ID, SUM(hours_of_use * C_price_per_hour) AS Total_Sales, COUNT(b.Booking_ID) AS Total_Visits
FROM CafeCustomers AS cc
INNER JOIN Booking AS b ON b.Customer_ID = cc.Customer_ID
INNER JOIN BookingToComputer AS bc ON bc.Booking_ID = b.Booking_ID
INNER JOIN Computer AS c ON c.Computer_ID = bc.Computer_ID
GROUP BY cc.Customer_ID;
Table Info:
CREATE TABLE [dbo].[Booking](
[Booking_ID] [int] NOT NULL,
[B_price_per_hour] [int] NOT NULL, (Removing this one)
[Customer_ID] [int] NOT NULL, )
CREATE TABLE [dbo].[BookingToComputer](
[Booking_ID] [int] NOT NULL,
[Computer_ID] [int] NOT NULL, )
CREATE TABLE [dbo].[CafeCustomers](
[Customer_ID] [int] NOT NULL,)
CREATE TABLE [dbo].[Computer](
[Computer_ID] [int] NOT NULL,
[C_price_per_hour] [int] NOT NULL,)
INSERT INTO Booking VALUES (1, 14, 1)
INSERT INTO Booking VALUES (2, 5, 1)
INSERT INTO Booking VALUES (3, 12, 2)
INSERT INTO Booking VALUES (4,7,3)
INSERT INTO Booking VALUES (5, 12, 2)
INSERT INTO Booking VALUES (6, 7, 5)
INSERT INTO Computer VALUES (1, 7)
INSERT INTO Computer VALUES (2, 7)
INSERT INTO Computer VALUES (3, 7)
INSERT INTO Computer VALUES (4, 7)
INSERT INTO Computer VALUES (5, 7)
INSERT INTO Computer VALUES (6, 7)
INSERT INTO Computer VALUES (7, 5)
INSERT INTO Computer VALUES (8, 5)
INSERT INTO BookingToComputer VALUES (1,1)
INSERT INTO BookingToComputer VALUES (1,2)
INSERT INTO BookingToComputer VALUES (2,5)
INSERT INTO BookingToComputer VALUES (3,3)
INSERT INTO BookingToComputer VALUES (3,8)
INSERT INTO BookingToComputer VALUES (4,7)
INSERT INTO BookingToComputer VALUES (5,6)
INSERT INTO BookingToComputer VALUES (6,4)
INSERT INTO CafeCustomers VALUES (1)
INSERT INTO CafeCustomers VALUES (2)
INSERT INTO CafeCustomers VALUES (3)
INSERT INTO CafeCustomers VALUES (4)
INSERT INTO CafeCustomers VALUES (5)

Was playing around and figured out I had to add DISTINCT to the count

Related

COUNT of different tables in GROUP

I have a table that holds a list of classes available at a school. Each class can have a number of sessions. And each class can have pupils assigned to it.
What I need to do is get a count of all sessions for each class, as well as the number of students attending the class. I have done the first bit, but if I join to the pupil allocation table, my counts will be wrong.
I have conjured up some fake SQL that you can use.
I'm stuck with efficiently getting a count from the Pupil link table.
DECLARE #Class TABLE
(
ClassID INT NOT NULL,
ClassName VARCHAR(20) NOT NULL
)
INSERT INTO #Class VALUES (1, 'English')
INSERT INTO #Class VALUES (2, 'Maths')
DECLARE #ClassSession TABLE
(
ClassSessionID INT NOT NULL,
ClassID INT NOT NULL,
Description VARCHAR(100) NOT NULL
)
INSERT INTO #ClassSession VALUES (1, 1, 'Basic English')
INSERT INTO #ClassSession VALUES (2, 1, 'Advanced English')
INSERT INTO #ClassSession VALUES (3, 1, 'Amazing English')
INSERT INTO #ClassSession VALUES (4, 2, 'Basic English')
INSERT INTO #ClassSession VALUES (5, 2, 'Basic English')
DECLARE #ClassPupil TABLE
(
ClassPupilID INT NOT NULL,
ClassID INT NOT NULL,
PupilID INT NOT NULL -- FK to the Pupils table.
)
INSERT INTO #ClassPupil VALUES (1, 1, 1000)
INSERT INTO #ClassPupil VALUES (2, 1, 1001)
INSERT INTO #ClassPupil VALUES (3, 1, 1002)
INSERT INTO #ClassPupil VALUES (4, 1, 1003)
INSERT INTO #ClassPupil VALUES (5, 1, 1004)
INSERT INTO #ClassPupil VALUES (6, 2, 1005)
INSERT INTO #ClassPupil VALUES (7, 2, 1006)
INSERT INTO #ClassPupil VALUES (8, 2, 1007)
SELECT ClassName, COUNT(*) AS Sessions, '??' AS NumerOfPupils
FROM #Class c
INNER JOIN #ClassSession cs
ON cs.ClassID = c.ClassID
GROUP BY c.ClassID, c.ClassName
It can maybe be done with a sub query? Is that the best way?

You have two independent dimensions for each class. You need to aggregat them separately:
SELECT c.ClassName, cs.Sessions, cp.Pupils
FROM #Class c INNER JOIN
(SELECT ClassId, COUNT(*) as sessions
FROM #ClassSession cs
GROUP BY ClassId
) cs
ON cs.ClassID = c.ClassID INNER JOIN
(SELECT ClassId, COUNT(*) as pupils
FROM #ClassPupil cp
GROUP BY ClassId
) cp
ON cp.ClassId = c.ClassId;

Another method is to use CROSS APPLY to get the count of pupils:
SELECT
ClassName, COUNT(*) AS Sessions, cp.NumberOfPupils
FROM #Class c
INNER JOIN #ClassSession cs
ON cs.ClassID = c.ClassID
CROSS APPLY (
SELECT COUNT(*) AS NumberOfPupils
FROM #ClassPupil
WHERE
ClassID = c.ClassID
) cp
GROUP BY c.ClassID, c.ClassName, cp.NumberOfPupils

SELECT ClassName, COUNT(distinct cs.ClassSessionID) AS Sessions, /*'??'*/ count( distinct cp.PupilID) AS NumerOfPupils
FROM #Class c
INNER JOIN #ClassSession cs
ON cs.ClassID = c.ClassID
inner join #ClassPupil cp on c.ClassID=cp.ClassID
GROUP BY /*c.ClassID,*/ c.ClassName
count(distinct...) solves (works around) the problem.
Generally, this is (A -> B, C) issue.

Getting a single result on one table from criteria on multiple rows in another table

Imagine a Student table with the name and id of students at a school, and a Grades table that has grades on the form:
grade_id | student_id.
What I want to do is find all the students that match an arbitrary criteria of say "find all students that have grade A, grade B, but not C or D".
In a school situation a student could have several A's and B's, but for my particular problem they will allways have one or none of each grade.
Also, the tables i'm working on are huge (several million rows in each), but i only need to find say 10-20 on each query (the purpose of this is to find test data).
Thanks!

Change the table variables to your physical tables and this should help?
DECLARE #Students TABLE (
StudentId INT,
StudentName VARCHAR(50));
INSERT INTO #Students VALUES (1, 'Tom');
INSERT INTO #Students VALUES (2, 'Dick');
INSERT INTO #Students VALUES (3, 'Harry');
DECLARE #StudentGrades TABLE (
StudentId INT,
GradeId INT);
INSERT INTO #StudentGrades VALUES (1, 1);
INSERT INTO #StudentGrades VALUES (1, 1);
INSERT INTO #StudentGrades VALUES (1, 2);
INSERT INTO #StudentGrades VALUES (1, 3);
INSERT INTO #StudentGrades VALUES (2, 1);
INSERT INTO #StudentGrades VALUES (2, 2);
INSERT INTO #StudentGrades VALUES (3, 1);
INSERT INTO #StudentGrades VALUES (3, 1);
INSERT INTO #StudentGrades VALUES (3, 3);
INSERT INTO #StudentGrades VALUES (3, 4);
INSERT INTO #StudentGrades VALUES (3, 4);
DECLARE #Grades TABLE (
GradeId INT,
GradeName VARCHAR(10));
INSERT INTO #Grades VALUES (1, 'A');
INSERT INTO #Grades VALUES (2, 'B');
INSERT INTO #Grades VALUES (3, 'C');
INSERT INTO #Grades VALUES (4, 'D');
--Student/ Grade Summary
SELECT
s.StudentId,
s.StudentName,
g.GradeName,
COUNT(sg.GradeId) AS GradeCount
FROM
#Students s
CROSS JOIN #Grades g
LEFT JOIN #StudentGrades sg ON sg.StudentId = s.StudentId AND sg.GradeId = g.GradeId
GROUP BY
s.StudentId,
s.StudentName,
g.GradeName;
--Find ten students with A and B but not C or D
SELECT TOP 10
*
FROM
#Students s
WHERE
EXISTS (SELECT * FROM #StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId = 1) --Got an A
AND EXISTS (SELECT * FROM #StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId = 2) --Got a B
AND NOT EXISTS (SELECT * FROM #StudentGrades sg WHERE sg.StudentId = s.StudentId AND sg.GradeId IN (3, 4)); --Didn't get a C or D

Make sure all your id fields are indexed.
select *
from students s
where exists
(
select *
from grades g
where g.grade_id in (1, 2)
and g.student_id = s.student_id
)

sql server query: how to select customers with more than 1 order

I am a sql server newbie and trying to select all the customers which have more than 1 orderid. The table looks as follows:
CREATE TABLE [dbo].[orders](
[customerid] [int] NULL,
[orderid] [int] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[orders] ([customerid], [orderid]) VALUES (1, 2)
INSERT [dbo].[orders] ([customerid], [orderid]) VALUES (1, 3)
INSERT [dbo].[orders] ([customerid], [orderid]) VALUES (2, 4)
INSERT [dbo].[orders] ([customerid], [orderid]) VALUES (2, 5)
INSERT [dbo].[orders] ([customerid], [orderid]) VALUES (3, 1)

select customerid
, count(*) as order_count
from orders
group by
customerid
having count(*) > 1

as you'll probably need customer data at some point, you could also try:
select *
from customers
where exists (
select count(*)
from orders
where customers.id = customerid
group by customerid
having count(*) > 1
)

Complex SQL query for inventory app

Given the following 2 tables, I need to find the warehouses that have all the parts in the right quantity to build a particular kit, or more appropriately, how many kits each can warehouse can build.
Inventory table: Warehouse, Part, and QuantityOnHand
Kit table: Kit, Part, QuantityForKit
For example: Kit1 requires 1 of Part1, 2 of Part2, and 1 of Part3. Warehouse A has 20 Part1, 5 Part2 and 3 Part3. Warehouse B has 5 Part1, 10 Part2, and no Part3.
Warehouse A can only build 2 of Kit1 because it doesn't have enough Part2 to make more than 2 kits. Warehouse B can't build any Kit1 because it doesn't have all the necessary parts.
I've got the following demo that works, but it seems really cumbersome and uses mostly table/index scans. Our inventory table is large and this just runs too slow. I'm looking for a better way to accomplish the same thing. In the demo there's an unbounded cross join, but in the actual app, it's limited to a single kit.
CREATE TABLE #warehouse
(
Warehouse CHAR(1) NOT NULL PRIMARY KEY
)
INSERT INTO #warehouse VALUES ('A')
INSERT INTO #warehouse VALUES ('B')
INSERT INTO #warehouse VALUES ('C')
INSERT INTO #warehouse VALUES ('D')
CREATE TABLE #inventory
(
Warehouse CHAR(1) NOT NULL ,
Part INT NOT NULL ,
OnHand INT NOT NULL ,
CONSTRAINT pk_inventory PRIMARY KEY CLUSTERED (Part, Warehouse)
)
INSERT INTO #inventory VALUES ('A', 1, 20)
INSERT INTO #inventory VALUES ('A', 2, 5)
INSERT INTO #inventory VALUES ('A', 3, 3)
INSERT INTO #inventory VALUES ('B', 1, 5)
INSERT INTO #inventory VALUES ('B', 2, 10)
INSERT INTO #inventory VALUES ('C', 1, 1)
INSERT INTO #inventory VALUES ('C', 3, 1)
INSERT INTO #inventory VALUES ('D', 1, 1)
INSERT INTO #inventory VALUES ('D', 2, 2)
INSERT INTO #inventory VALUES ('D', 3, 1)
CREATE TABLE #kit
(
Kit INT NOT NULL ,
Part INT NOT NULL ,
Quantity INT NOT NULL ,
CONSTRAINT pk_kit PRIMARY KEY CLUSTERED (Kit, Part)
)
INSERT INTO #kit VALUES (1, 1, 1)
INSERT INTO #kit VALUES (1, 2, 2)
INSERT INTO #kit VALUES (1, 3, 1)
INSERT INTO #kit VALUES (2, 1, 1)
INSERT INTO #kit VALUES (2, 2, 1)
-- Here's the statement I need to optimize
SELECT
Warehouse,
Kit,
MIN(Capacity) AS [Capacity]
FROM
(
SELECT
A.Warehouse,
A.Kit,
A.Part,
ISNULL(B.OnHand, 0) AS [Quantity],
ISNULL(B.OnHand, 0) / A.Quantity AS Capacity
FROM
(
SELECT *
FROM
#warehouse
CROSS JOIN
-- (SELECT * FROM
#kit
-- WHERE #kit.Kit = #Kit) K
) A
LEFT OUTER JOIN
#inventory B
ON A.Warehouse = B.Warehouse
AND A.Part = B.Part
) C
GROUP BY
Warehouse,
Kit
;
Suggestions appreciated.

Try this:
SELECT warehouse, MIN(capacity) FROM (
SELECT i.warehouse, i.onhand / k.quantity as capacity
FROM #kit k
JOIN #inventory i
ON k.part = i.part AND k.quantity <= i.onhand
WHERE k.kit = #kit) c
GROUP BY warehouse
HAVING COUNT(*) = (SELECT COUNT(*) FROM #kit WHERE kit = #kit)
sqlfiddle here

How many times are the results of this common table expression evaluated?

I am trying to work out a bug we've found during our last iteration of testing. It involves a query which uses a common table expression. The main theme of the query is that it simulates a 'first' aggregate operation (get the first row for this grouping).
The problem is that the query seems to choose rows completely arbitrarily in some circumstances - multiple rows from the same group get returned, some groups simply get eliminated altogether. However, it always picks the correct number of rows.
I have created a minimal example to post here. There are clients and addresses, and a table which defines the relationships between them. This is a much simplified version of the actual query I'm looking at, but I believe it should have the same characteristics, and it is a good example to use to explain what I think is going wrong.
CREATE TABLE [Client] (ClientID int, Name varchar(20))
CREATE TABLE [Address] (AddressID int, Street varchar(20))
CREATE TABLE [ClientAddress] (ClientID int, AddressID int)
INSERT [Client] VALUES (1, 'Adam')
INSERT [Client] VALUES (2, 'Brian')
INSERT [Client] VALUES (3, 'Charles')
INSERT [Client] VALUES (4, 'Dean')
INSERT [Client] VALUES (5, 'Edward')
INSERT [Client] VALUES (6, 'Frank')
INSERT [Client] VALUES (7, 'Gene')
INSERT [Client] VALUES (8, 'Harry')
INSERT [Address] VALUES (1, 'Acorn Street')
INSERT [Address] VALUES (2, 'Birch Road')
INSERT [Address] VALUES (3, 'Cork Avenue')
INSERT [Address] VALUES (4, 'Derby Grove')
INSERT [Address] VALUES (5, 'Evergreen Drive')
INSERT [Address] VALUES (6, 'Fern Close')
INSERT [ClientAddress] VALUES (1, 1)
INSERT [ClientAddress] VALUES (1, 3)
INSERT [ClientAddress] VALUES (2, 2)
INSERT [ClientAddress] VALUES (2, 4)
INSERT [ClientAddress] VALUES (2, 6)
INSERT [ClientAddress] VALUES (3, 3)
INSERT [ClientAddress] VALUES (3, 5)
INSERT [ClientAddress] VALUES (3, 1)
INSERT [ClientAddress] VALUES (4, 4)
INSERT [ClientAddress] VALUES (4, 6)
INSERT [ClientAddress] VALUES (5, 1)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 2)
INSERT [ClientAddress] VALUES (8, 4)
INSERT [ClientAddress] VALUES (5, 6)
INSERT [ClientAddress] VALUES (6, 3)
INSERT [ClientAddress] VALUES (7, 5)
INSERT [ClientAddress] VALUES (8, 1)
INSERT [ClientAddress] VALUES (5, 4)
INSERT [ClientAddress] VALUES (6, 6)
;WITH [Stuff] ([ClientID], [Name], [Street], [RowNo]) AS
(
SELECT
[C].[ClientID],
[C].[Name],
[A].[Street],
ROW_NUMBER() OVER (ORDER BY [A].[AddressID]) AS [RowNo]
FROM
[Client] [C] INNER JOIN
[ClientAddress] [CA] ON
[C].[ClientID] = [CA].[ClientID] INNER JOIN
[Address] [A] ON
[CA].[AddressID] = [A].[AddressID]
)
SELECT
[CTE].[ClientID],
[CTE].[Name],
[CTE].[Street],
[CTE].[RowNo]
FROM
[Stuff] [CTE]
WHERE
[CTE].[RowNo] IN (SELECT MIN([CTE2].[RowNo]) FROM [Stuff] [CTE2] GROUP BY [CTE2].[ClientID])
ORDER BY
[CTE].[Name] ASC,
[CTE].[Street] ASC
DROP TABLE [ClientAddress]
DROP TABLE [Address]
DROP TABLE [Client]
The query is designed to get all clients, and their first address (the address with the lowest ID). This appears to me that it should work.
I have a theory about why it sometimes will not work. The statement that follows the CTE refers to the CTE in two places. If the CTE is non-deterministic, and it gets run more than once, the result of the CTE may be different in the two places it's referenced.
In my example, the CTE's RowNo column uses ROW_NUMBER() with an order by clause that will potentially result in different orderings when run multiple times (we're ordering by address, the clients can be in any order depending on how the query is executed).
Because of this is it possible that CTE and CTE2 can contain different results? Or is the CTE only executed once and do I need to look elsewhere for the problem?

It is not guaranteed in any way.
SQL Server is free to evaluate CTE each time it's accessed or cache the results, depending on the plan.
You may want to read this article:
Generating XML in subqueries
If your CTE is not deterministic, you will have to store its result in a temporary table or a table variable and use it instead of the CTE.
PostgreSQL, on the other hand, always evaluates CTEs only once, caching their results.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to fix count() when linking multiple tables together - sql

Was playing around and figured out I had to add DISTINCT to the count

Related

COUNT of different tables in GROUP

Getting a single result on one table from criteria on multiple rows in another table

sql server query: how to select customers with more than 1 order

Complex SQL query for inventory app

How many times are the results of this common table expression evaluated?

Categories

Resources