Distinct Count with two tables - sql

I have two tables on Access, Customer and Transaction. I'm trying to find the total # of transactions sorted by Dogs (0-3). The Transaction table has a line for each item bought, so multiple lines can be for one TransactionID.
Here's what I have so far:
SELECT Customer.Dogs, COUNT(Transaction.TransactionID) AS TotTrans
FROM Transaction, Customer
WHERE Transaction.CustomerID = Customer.CustomerID
GROUP BY Dogs
And I get
Dogs | TotTrans
0 | 130104
1 | 59132
2 | 17811
3 | 1401
Obviously this counts the total rows in the Transaction Table and sorts them by # of dogs. However, it is counting for the duplicates in the Transaction Table (e.g. There are three rows with TransactionID = 2, because in that transaction the customer bought 3 items. The Count is obviously including the extra 2 rows).
When I try to do COUNT(DISTINCT Transaction.TransactionID) it doesn't work, and the message
"Syntax error (missing operator) in query expression 'COUNT(DISTINCT Transaction.TransactionID)'.
I have looked around, but can't seem to find the solution. I think part of the problem stems from the fact that I'm selecting two attributes.
If anyone could help explain what to do and the logic behind it, that would be great!

You should join the customer table with an already distinct-ed table (using inner query)
SELECT Customer.Dogs, COUNT(distinctTransactions.TransactionID) AS TotTrans
FROM (select distinct TransactionID,CustomerID from Transaction) as
distinctTransactions, Customer
WHERE distinctTransactions.CustomerID = Customer.CustomerID
GROUP BY Dogs

You should learn to use proper join syntax. Also, table aliases make the query easier to write and to read:
SELECT c.Dogs, COUNT(DISTINCT t.TransactionID) AS TotTrans
FROM Transaction t JOIN
Customer c
ON t.CustomerID = c.CustomerID
GROUP BY c.Dogs
ORDER BY c.Dogs;

Related

Counting Unique IDs In a LEFT/RIGHT JOIN Query in Access

I am working on a database to track staff productivity. Two of the ways we do that is by monitoring the number of orders they fulfil and by tracking their error rate.
Each order they finish is recorded in a table. In one day they can complete many orders.
It is also possible for a single order to have multiple errors.
I am trying to create a query that provides a summary of their results. This query should have one column with "TotalOrders" and another with "TotalErrors".
I connect the two tables with a LEFT/RIGHT join since not all orders will have errors.
The problem comes when I want to total the number of orders. If someone made multiple mistakes on an order, that order gets counted multiple times; once for each error.
I want to modify my query so that when counting the number of orders it only counts records with distinct OrderID's; yet, in the same query, also count the total errors without losing any.
Is this possible?
Here is my SQL
SELECT Count(tblTickets.TicketID) AS TotalOrders,
Count(tblErrors.ErrorID) AS TotalErrors
FROM tblTickets
LEFT JOIN tblErrors ON tblTickets.TicketID = tblErrors.TicketID;
I have played around with SELECT DISTINCT and UNION but am struggling with the correct syntax in Access. Also, a lot of the examples I have seen are trying to total a single field rather than two fields in different ways.
To be clear when totalling the OrderCount field I want to only count records with DISTINCT TicketID's. When totalling the ErrorCount field I want to count ALL errors.
Ticket = Order.
Query Result: Order Count Too High
Ticket/Order Table: Total of 14 records
Error Table: You can see two errors for the same order on 8th
do a query that counts orders by staff from Orders table and a query that counts errors by staff from Errors table then join those two queries to Staff table (queries can be nested for one long SQL statement)
correlated subqueries
SELECT Staff.*,
(SELECT Count(*) FROM Orders WHERE StaffID = Staff.ID) AS CntOrders,
(SELECT Count(*) FROM Errors WHERE StaffID = Staff.ID) AS CntErrors
FROM Staff;
use DCount() domain aggregate function
Option 1 is probably the most efficient and option 3 the least.

My question is about SQL, using a TOP function inside a sub-query in MS Access

Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database. To achieve this I've tried making a query showing how many times a customer has ordered an item, and now I am trying to create a sub-query in it using TOP1 to discern the most bought items.
With the SQL from the first query (looking weird because I made it with the Access automatic creator):
SELECT
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName,
COUNT(SalesQuantity.ProductCode) AS CountOfProductCode
FROM (Employees
INNER JOIN (Customers
INNER JOIN Sales
ON Customers.CustomerCode = Sales.CustomerCode)
ON Employees.EmployeeCode = Sales.EmployeeCode)
INNER JOIN (Products
INNER JOIN SalesQuantity
ON Products.ProductCode = SalesQuantity.ProductCode)
ON Sales.SalesCode = SalesQuantity.SalesCode
GROUP BY
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName
ORDER BY
COUNT(SalesQuantity.ProductCode) DESC;
I have tried putting in a subquery after FROM line:
(SELECT TOP1 CountOfProduct(s)
FROM (.....)
ORDER by Count(SalesQuantity.ProductCode) DESC)
I'm just not sure what to put in for the "from"-every other tutorial has the data from an already created table, however this is from a query that is being made at the same time. Just messing around I've put "FROM" and then listed every table, as well as
FROM Count(SalesQuantity.ProductCode)
just because I've seen that in the order by from the other code, and assume that the query is discerning from this count. Both tries have ended with an error in the syntax of the "FROM" line.
I'm new to SQL so sorry if it's blatantly obvious, but any help would be greatly appreciated.
Thanks
As I understand, you want the most purchased product for each customer.
So, begin by building aggregate query that counts product purchases by customer (appears to be done in the posted image). Including customer ID in the query would simplify the next step which is to build another query with TOP N nested query.
Part of what complicates this is unique record identifier is lost because of aggregation. Have to use other fields from the aggregate query to provide unique identifier. Consider:
SELECT * FROM Query1 WHERE CustomerID & ProductName IN
(SELECT TOP 1 CustomerID & ProductName FROM Query1 AS Dupe
WHERE Dupe.CustomerID = Query1.CustomerID
ORDER BY Dupe.CustomerID, Dupe.CountOfProductCode DESC);
Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database.
This answers your question. It does not modify your query which is only tangentially related.
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode;
To get the most ordered items, you can use this twice:
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode
HAVING sq.ProductCode IN (SELECT TOP 1 sq2.ProductCode
FROM Sales as s2 INNER JOIN
SalesQuantity as sq2
ON s2.SalesCode = sq2.SalesCode
WHERE s2.CustomerCode = s.CustomerCode
GROUP BY sq2.ProductCode
);
In almost any other database, this would be simpler, because you would be able to use window functions.

Counting multiple different one to many relationships

I have the following SQL:
SELECT j.AssocJobKey
, COUNT(DISTINCT o.ID) AS SubjectsOrdered
, COUNT(DISTINCT s.ID) AS SubjectsShot
FROM Jobs j
LEFT JOIN Orders o ON o.AssocJobKey = j.AssocJobKey
LEFT JOIN Subjects s ON j.AssocJobKey = s.AssocJobKey
GROUP BY
j.AssocJobKey
,j.JobYear
The basic structure is a Job is the parent that is unique by AssocJobKey and has a one to many relationships with Subjects and Orders.
The query gives me what I want, the output looks like this:
| AssocJobKey | SubjectsOrdered | SubjectsShot |
|-----------------------|------------------------|---------------------|
| BAT-H181 | 107 | 830 |
|--------------------- |------------------------|---------------------|
| BAT-H131 | 226 | 1287 |
The problem is the query is way to heavy and my memory is spiking, there's no way I could run this on a large dataset. If I remove one of the LEFT JOINs on the corresponding count the query executes instantly and theres no problem. So somehow things are bouncing around between the two left joins more than they should, but I don't understand why they would.
Really hoping to avoid joining on sub selects if at all possible.
Your query is generating a Cartesian product for each job. And this is big -- your second row has about 500k rows being generated. COUNT(DISTINCT) then has to figure out the unique ids among this Cartesian product.
The solution is simple: pre-aggregate:
SELECT j.AssocJobKey, o.SubjectsOrdered, s.SubjectsShot
FROM Jobs j LEFT JOIN
(SELECT o.AssocJobKey, COUNT(*) as SubjectsOrdered
FROM Orders o
GROUP BY o.AssocJobKey
) o
ON o.AssocJobKey = j.AssocJobKey LEFT JOIN
(SELECT j.AssocJobKey, COUNT(s.ID) AS SubjectsShot
FROM Subjects s
GROUP BY j.AssocJobKey
) s
ON j.AssocJobKey = s.AssocJobKey;
This makes certain assumptions that I think are reasonable:
The ids in the orders and subjects table are unique and non-NULL.
jobs.AssocJobKey is unique.
The query can be easily adapted if either of these are not true, but they seem like reasonable assumptions.
Often for these types of joins over different dimensions, COUNT(DISTINCT) is a reasonable solution (the queries are certainly simpler). This is true when there are at most a handful of values.

Joining table issue with SQL Server 2008

I am using the following query to obtain some sales figures. The problem is that it is returning the wrong data.
I am joining together three tables tbl_orders tbl_orderitems tbl_payment. The tbl_orders table holds summary information, the tbl_orderitems holds the items ordered and the tbl_payment table holds payment information regarding the order. Multiple payments can be placed against each order.
I am trying to get the sum of the items sum(mon_orditems_pprice), and also the amount of items sold count(uid_orderitems).
When I run the following query against a specific order number, which I know has 1 order item. It returns a count of 2 and the sum of two items.
Item ProdTotal ProdCount
Westvale Climbing Frame 1198 2
This order has two payment records held in the tbl_payment table, which is causing the double count. If I remove the payment table join it reports the correct figures, or if I select an order which has a single payment it works as well. Am I missing something, I am tired!!??
SELECT
txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN dbo.tbl_payment ON (dbo.tbl_orders.uid_orders = dbo.tbl_payment.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
Any suggestions?
Thank you.
Drill down Table columns
dbo.tbl_payment.bit_pay_paid (1/0) Has this payment been paid, yes no
dbo.tbl_orders.bit_order_archive (1/0) Is this order archived, yes no
dbo.tbl_orders.uid_order_webid (integer) Web Shop's ID
dbo.tbl_orders.bit_order_preorder (1/0) Is this a pre-order, yes no
YEAR(dbo.tbl_orders.dte_order_stamp) (2012) Sales year
dbo.tbl_orders.txt_order_status (varchar) Is the order dispatched, awaiting delivery
dbo.tbl_orderitems.uid_orditems_pcatid (integer) Product category ID
It's a normal behavior, if you remove grouping clause you'll see that there really are 2 rows after joining and they both have 599 as a mon_orditems_pprice hence the SUM is correct. When there is a multiple match in any joined table the entire output row becomes multiple and the data that is being summed (or counted or aggregated in any other way) also gets summed multiple times. Try this:
SELECT txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN
(
SELECT x.uid_pay_orderid
FROM dbo.tbl_payment x
GROUP BY x.uid_pay_orderid
) AS payments ON (dbo.tbl_orders.uid_orders = payments.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
I don't know what data from tbl_payment you are using, are any of the columns from the SELECT list actually from tbl_payment? Why is tbl_payment being joined?

Multiple records joined Access SQL

I'm not sure if what I want to do is possible but if it is possible, it's probably a really easy solution that I just can't figure out. Once things get to a certain complexity though, my head starts spinning. Please forgive my ignorance.
I have a database running in MS Access 2007 for a school which has a plethora of tables joined to each other. I'm trying to create a query in which I get information from several tables. I'm looking up sales and payment information for different customers, pulling info from several different linked tables. Each sale is broken down into one of 4 categories, Course Fee, Registration Fee, Book Fee and Others. Because each customer will have multiple purchases, each one is a separate entry in the Sales table. The payment information is also in its own table.
My SQL currently looks like this:
SELECT StudentContracts.CustomerID, (Customers.CFirstName & " " & Customers.CLastName) AS Name, Customers.Nationality, Courses.CourseTitle, (StudentContracts.ClassesBought + StudentContracts.GiftClasses) AS Weeks, StudentContracts.StartDate, Sales.SaleAmount, SaleType.SaleType, Sales.DueDate, Payments.PaymentAmount
FROM (
(
(Customers INNER JOIN StudentContracts ON Customers.CustomerID = StudentContracts.CustomerID)
INNER JOIN Payments ON Customers.CustomerID = Payments.CustomerID)
INNER JOIN
(SaleType INNER JOIN Sales ON SaleType.SalesForID = Sales.SalesForID)
ON Customers.CustomerID = Sales.CustomerID)
INNER JOIN
(
(Courses INNER JOIN Classes ON Courses.CourseID = Classes.CourseID)
INNER JOIN StudentsClasses ON Classes.ClassID = StudentsClasses.ClassID)
ON Customers.CustomerID = StudentsClasses.CustomerID;
This works and brings up the information I need. However, I am getting one record for each sale as in:
CustomerID Name ... SaleAmount SaleType PaymentAmount
1 Bob $600 Course $1000
1 Bob $300 RgnFee $1000
1 Bob $100 Book $1000
What I need is one line for each customer but each sale type in it's own column in the row with the sale amount listed in its value field. As so:
CustomerID Name ... Course RgnFee Book Others PaymentAmount
1 Bob $600 $300 $100 $1000
Can anyone help and possibly explain what I should/need to be doing?
Thanks in advance!
You can create a cross tab from the query you have already created. Add the query to the Query Design Grid, choose Crosstab from query types, and select a Row or rows, Column and Value.
Say:
TRANSFORM Sum(t.SaleAmount) AS SumOfSaleAmount
SELECT t.ID, t.Name, Sum(t.SaleAmount) AS Total
FROM TableQuery t
GROUP BY t.ID, t.Name
PIVOT t.SaleType
If you want a certain order, you can edit the property sheet to include column headings, or you can add an In statement to the SQL. Note that if you add column headings, a column will be included for each column, whether or not data is available, and more importantly, a column will not be included that has data, if it is not listed.
TRANSFORM Sum(t.SaleAmount) AS SumOfSaleAmount
SELECT t.ID, t.Name, Sum(t.SaleAmount) AS Total
FROM TableQuery t
GROUP BY t.ID, t.Name
PIVOT t.SaleType In ("Course","RgnFee","Book","Others");