Sum matching entries in SQL

In this database I need to find the total amount that each customer paid for books in a category, and then sort the results by customer ID. The query appears to run correctly, but I get about 20 more rows than I should, although the sums are correct in the rows that should be there.
The customer ID is a column of customer but is not supposed to appear in the SELECT clause. When I try to ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;

You should move the conditions from the WHERE clause into your join conditions (relationships between tables usually fit better in an ON clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But neither of those is your question. To order by a column that is not in the GROUP BY, you can use an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.cid)
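As a rough illustration of the ORDER BY trick above, here is a minimal sketch using Python's built-in sqlite3 module rather than DB2. The two-table schema and the sample rows are made up for the demo (the real database has more tables), and DB2 may differ in details.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (cid INTEGER, cat TEXT, price REAL);
    INSERT INTO customer VALUES (1, 'Zoe'), (2, 'Adam');
    INSERT INTO purchase VALUES
        (1, 'fiction', 10.0), (1, 'fiction', 5.0), (2, 'fiction', 7.5);
""")

# cid is neither selected nor grouped, so it is wrapped in an aggregate.
rows = conn.execute("""
    SELECT c.name, p.cat, SUM(p.price) AS cost
    FROM purchase p
    JOIN customer c ON c.cid = p.cid
    GROUP BY c.name, p.cat
    ORDER BY MIN(c.cid)
""").fetchall()

print(rows)  # Zoe (cid 1) is listed before Adam (cid 2)
```

Note that if two customers share a name, grouping by name alone merges them; adding the ID to the GROUP BY would keep them apart.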

Related

SQL COUNT(*) Confusion

This is from the book Murach's SQL Server for Developers:
I have this summary query that calculates the number of invoices and the average invoice amount for the vendors in each state and city group. I understand most of the code, but one part that confuses me is the COUNT(*) aggregate. According to the book, this aggregate returns the number of invoices in each state and city group.
I cannot follow the logic. To me it looks like the COUNT(*) in the SELECT statement will give the number of times a state/city group appears in the Vendors table, not in the Invoices table.
SELECT VendorState, VendorCity, COUNT(*) AS 'Invoice QTY',
AVG(InvoiceTotal) AS 'InvoiceAvg'
FROM Invoices JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
GROUP BY VendorState, VendorCity
HAVING COUNT(*) >= 2
ORDER BY VendorState, VendorCity;
This is how I would parse your query:
SELECT VendorState, VendorCity, COUNT(*) AS 'Invoice QTY',
AVG(InvoiceTotal) AS 'InvoiceAvg'
Okay, you select some blah blah. I'd come back here after parsing to see whether the chosen columns are really available, but here you have no errors so I'll assume the columns are good.
FROM Invoices
You start with all the invoices (perhaps this is the point that confuses you).
JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
Each invoice is joined to a single vendor (VendorID is a primary key), so the cardinality does not change. This assumes all vendors are still in place, of course; an invoice with no matching VendorID will "disappear". Usually this is not the case when invoices and vendors are involved; you might have a flag to exclude "terminated" vendors, but you wouldn't remove invoices from the database. The important thing is, if you had 1,000 invoices, you now have 1,000 rows after the JOIN, not 2,000 or any other number. So, you're still working with invoices.
GROUP BY VendorState, VendorCity
Okay, so the COUNT(*) refers to the invoices of each single city in each single state. The HAVING clause restricts the results to those cities where at least two invoices are present.
Simply put, the HAVING clause is evaluated after the JOIN is performed and the rows are grouped. So the rows counted are invoices (less any invoices that lack a valid VendorID and thus fail to join).
First, I would suggest writing the query as:
SELECT v.VendorState, v.VendorCity,
COUNT(*) AS InvoiceQTY, AVG(i.InvoiceTotal) AS InvoiceAvg
FROM Invoices i JOIN
Vendors v
ON i.VendorID = v.VendorID
GROUP BY v.VendorState, v.VendorCity
HAVING COUNT(*) >= 2
ORDER BY v.VendorState, v.VendorCity;
The changes are:
Use table aliases to make the query easier to write and read.
Qualify all column references.
Only use single quotes for strings, not column names.
Avoid column names that need to be escaped.
The COUNT(*) is counting neither the number of "vendors" nor "invoices" -- well, not directly. It is counting the number of rows that match after the JOIN takes place.
Based on your naming convention, each invoice matches exactly one vendor. So, when you use COUNT(*) you are counting invoices, not vendors.
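To see this with numbers, here is a small sketch using Python's sqlite3 module instead of SQL Server. The sample data is invented: two vendors share one city, and one invoice points at a vendor that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors (VendorID INTEGER PRIMARY KEY,
                          VendorState TEXT, VendorCity TEXT);
    CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                           VendorID INTEGER, InvoiceTotal REAL);
    -- Two vendors in Fresno, one in Albany (made-up data).
    INSERT INTO Vendors VALUES (1, 'CA', 'Fresno'), (2, 'CA', 'Fresno'),
                               (3, 'NY', 'Albany');
    -- Invoice 5 references a vendor that is not in Vendors.
    INSERT INTO Invoices VALUES (1, 1, 100.0), (2, 1, 200.0), (3, 2, 300.0),
                                (4, 3, 50.0), (5, 99, 75.0);
""")

# Rows that survive the inner join: one per invoice with a valid vendor.
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM Invoices i JOIN Vendors v ON i.VendorID = v.VendorID
""").fetchone()[0]

groups = conn.execute("""
    SELECT v.VendorState, v.VendorCity,
           COUNT(*) AS InvoiceQTY, AVG(i.InvoiceTotal) AS InvoiceAvg
    FROM Invoices i JOIN Vendors v ON i.VendorID = v.VendorID
    GROUP BY v.VendorState, v.VendorCity
    HAVING COUNT(*) >= 2
    ORDER BY v.VendorState, v.VendorCity
""").fetchall()

print(joined_rows)  # 4 of the 5 invoices have a matching vendor
print(groups)       # Fresno: 3 invoices even though it has only 2 vendors
```

Fresno is reported with a count of 3, which matches its invoices and not its vendors; Albany has a single invoice and is filtered out by the HAVING clause.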

SQL: Nested subquery is returning entries incorrectly

I'm having some trouble querying a dataset with a nested subquery, which I thought would be pretty straightforward.
I have a table of customers and their addresses, dbo.PERSON_ADDRESSES, and a table of transactions with customers, dbo.TRANSACT_CUSTOMERS. It is very common for customers to have multiple addresses stored in the dbo.PERSON_ADDRESSES table over time. I would simply like to use the most recent transaction in the dbo.TRANSACT_CUSTOMERS table to produce a table of most recent addresses from the dbo.PERSON_ADDRESSES table.
When I run the inner subquery independently, it works fine: it shows the one most recent transaction per customer like I envisioned. But, for some reason when I run this entire query, I obtain many, many addresses per customer. I don't understand why.
SELECT MaxTransaction.PERSON_ID, Addr.*
FROM dbo.PERSON_ADDRESSES AS Addr
INNER JOIN
(SELECT PERSON_ID, Max(TRANSACTION_ID) AS MaxTID
FROM dbo.TRANSACTION_CUSTOMERS
GROUP BY PERSON_ID) AS MaxTransaction
ON MaxTransaction.MaxTID = Addr.TRANSACTION_ID
I am guessing that one transaction can have multiple customers. To get one row per person, use an additional JOIN condition:
SELECT maxp.PERSON_ID, pa.*
FROM dbo.PERSON_ADDRESSES pa JOIN
(SELECT PERSON_ID, Max(TRANSACTION_ID) AS MaxTID
FROM dbo.TRANSACTION_CUSTOMERS
GROUP BY PERSON_ID
) maxp
ON maxp.person_id = pa.person_id AND
maxp.MaxTID = pa.TRANSACTION_ID;
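The effect of the extra join condition can be reproduced with a few made-up rows. This sketch uses Python's sqlite3 module and assumes, as guessed above, that one transaction can belong to several people; the table contents and the ADDRESS column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TRANSACTION_CUSTOMERS (PERSON_ID INTEGER, TRANSACTION_ID INTEGER);
    CREATE TABLE PERSON_ADDRESSES (PERSON_ID INTEGER, TRANSACTION_ID INTEGER,
                                   ADDRESS TEXT);
    -- Transaction 10 is shared by persons 1 and 2 (assumed scenario).
    INSERT INTO TRANSACTION_CUSTOMERS VALUES (1, 5), (1, 10), (2, 10);
    INSERT INTO PERSON_ADDRESSES VALUES
        (1, 5, 'Old St'), (1, 10, 'A St'), (2, 10, 'B St');
""")

latest = """
    (SELECT PERSON_ID, MAX(TRANSACTION_ID) AS MaxTID
     FROM TRANSACTION_CUSTOMERS
     GROUP BY PERSON_ID) AS maxp
"""

# Joining on the transaction alone pairs every person with every address
# recorded for that transaction.
loose = conn.execute(f"""
    SELECT maxp.PERSON_ID, pa.ADDRESS
    FROM PERSON_ADDRESSES pa JOIN {latest}
      ON maxp.MaxTID = pa.TRANSACTION_ID
""").fetchall()

# Adding the person to the join condition keeps one row per person.
strict = conn.execute(f"""
    SELECT maxp.PERSON_ID, pa.ADDRESS
    FROM PERSON_ADDRESSES pa JOIN {latest}
      ON maxp.PERSON_ID = pa.PERSON_ID AND maxp.MaxTID = pa.TRANSACTION_ID
    ORDER BY maxp.PERSON_ID
""").fetchall()

print(len(loose))  # 4 rows: each person matched to both addresses
print(strict)      # one address per person
```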

My question is about SQL, using a TOP function inside a sub-query in MS Access

Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database. To achieve this I've made a query showing how many times a customer has ordered an item, and now I am trying to create a sub-query in it using TOP 1 to find the most bought items.
With the SQL from the first query (looking weird because I made it with the Access automatic creator):
SELECT
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName,
COUNT(SalesQuantity.ProductCode) AS CountOfProductCode
FROM (Employees
INNER JOIN (Customers
INNER JOIN Sales
ON Customers.CustomerCode = Sales.CustomerCode)
ON Employees.EmployeeCode = Sales.EmployeeCode)
INNER JOIN (Products
INNER JOIN SalesQuantity
ON Products.ProductCode = SalesQuantity.ProductCode)
ON Sales.SalesCode = SalesQuantity.SalesCode
GROUP BY
Customers.CustomerFirstName,
Customers.CustomerLastName,
Products.ProductName
ORDER BY
COUNT(SalesQuantity.ProductCode) DESC;
I have tried putting in a subquery after FROM line:
(SELECT TOP1 CountOfProduct(s)
FROM (.....)
ORDER by Count(SalesQuantity.ProductCode) DESC)
I'm just not sure what to put in for the FROM. Every other tutorial takes the data from an already created table, whereas here it comes from a query that is being built at the same time. Just messing around, I've put "FROM" followed by a list of every table, as well as
FROM Count(SalesQuantity.ProductCode)
just because I've seen that in the ORDER BY of the other code and assumed that the query ranks by this count. Both tries ended with a syntax error in the FROM line.
I'm new to SQL so sorry if it's blatantly obvious, but any help would be greatly appreciated.
Thanks
As I understand, you want the most purchased product for each customer.
So, begin by building an aggregate query that counts product purchases by customer (this appears to be done in the posted image). Including the customer ID in that query would simplify the next step, which is to build another query with a nested TOP N query.
Part of what complicates this is that the unique record identifier is lost because of the aggregation. You have to use other fields from the aggregate query to provide a unique identifier. Consider:
SELECT * FROM Query1 WHERE CustomerID & ProductName IN
(SELECT TOP 1 CustomerID & ProductName FROM Query1 AS Dupe
WHERE Dupe.CustomerID = Query1.CustomerID
ORDER BY Dupe.CustomerID, Dupe.CountOfProductCode DESC);
Overall what I'm trying to achieve is a query that shows the most ordered item from a customer in a database.
This answers your question. It does not modify your query which is only tangentially related.
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode;
To get the most ordered items, you can use this twice:
SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) as qty
FROM Sales as s INNER JOIN
SalesQuantity as sq
ON s.SalesCode = sq.SalesCode
GROUP BY s.CustomerCode, sq.ProductCode
HAVING sq.ProductCode IN (SELECT TOP 1 sq2.ProductCode
FROM Sales as s2 INNER JOIN
SalesQuantity as sq2
ON s2.SalesCode = sq2.SalesCode
WHERE s2.CustomerCode = s.CustomerCode
GROUP BY sq2.ProductCode
ORDER BY SUM(sq2.quantity) DESC
);
In almost any other database, this would be simpler, because you would be able to use window functions.
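For comparison, here is a sketch of the window-function approach in a database that supports it. It uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) and will not run in MS Access; the sample rows are invented, and ties are broken by ProductCode as an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (SalesCode INTEGER PRIMARY KEY, CustomerCode TEXT);
    CREATE TABLE SalesQuantity (SalesCode INTEGER, ProductCode TEXT,
                                quantity INTEGER);
    INSERT INTO Sales VALUES (1, 'C1'), (2, 'C1'), (3, 'C2');
    INSERT INTO SalesQuantity VALUES
        (1, 'P1', 2), (1, 'P2', 1), (2, 'P1', 3), (3, 'P2', 4), (3, 'P1', 1);
""")

best = conn.execute("""
    WITH totals AS (
        -- Total quantity per customer and product.
        SELECT s.CustomerCode, sq.ProductCode, SUM(sq.quantity) AS qty
        FROM Sales s JOIN SalesQuantity sq ON s.SalesCode = sq.SalesCode
        GROUP BY s.CustomerCode, sq.ProductCode
    ),
    ranked AS (
        -- Rank products within each customer, largest total first.
        SELECT CustomerCode, ProductCode, qty,
               ROW_NUMBER() OVER (PARTITION BY CustomerCode
                                  ORDER BY qty DESC, ProductCode) AS rn
        FROM totals
    )
    SELECT CustomerCode, ProductCode, qty
    FROM ranked
    WHERE rn = 1
    ORDER BY CustomerCode
""").fetchall()

print(best)  # the top product per customer with its total quantity
```

ROW_NUMBER returns exactly one product per customer; RANK would be the choice if tied products should all be listed.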

How to glue two dependent tables?

I have Customer and Application tables. I want to create a select query that provides info about each customer and also counts the number of applications that customer has in the system.
select distinct c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a on c.id=a.customer_id_id
group by c.id;
But I get an error saying that I need to group by region and city. I want to display info about each application, not group by region and city, because that way I would get the number of applications for each group of users rather than for each user.
I read that it's possible to do this with nested queries and a full outer join, but I tried and it didn't work. Can you explain to me how to do that?
You are close.
Use a LEFT OUTER JOIN so that customers with 0 records in applications will also be included (assuming that is your intent).
Don't use DISTINCT and GROUP BY together. DISTINCT means "If all the fields have the same value across multiple records in the record set produced by this SELECT statement, then only give back distinct records, dropping the duplicates". GROUP BY instead means "Group by this list of fields. Any remaining fields not in this list will be aggregated using a formula in your SELECT clause like count(a.customer_id_id)." They are similar, but you can't aggregate a field with merely a DISTINCT.
When using GROUP BY, if you are not going to aggregate a field with an aggregation formula (count, sum, avg, etc.) then you must include it in your GROUP BY. This isn't necessary with some RDBMS (older versions of MySQL, for example), but it's poor practice: a field that isn't explicitly aggregated and is also missing from the GROUP BY is like telling the RDBMS "Just pick whichever value you wish from the matching records", which might have unexpected consequences.
SELECT c.id, c.region, c.city, count(a.customer_id_id)
FROM customers c
LEFT OUTER JOIN applications a on c.id=a.customer_id_id
GROUP BY c.id, c.region, c.city;
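The difference between the two join types is easy to check on toy data. This is a minimal sketch with Python's sqlite3 module; the rows are invented, and the applications table is reduced to the one column used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, city TEXT);
    CREATE TABLE applications (id INTEGER PRIMARY KEY, customer_id_id INTEGER);
    -- Customer 2 has no applications (made-up data).
    INSERT INTO customers VALUES (1, 'North', 'Oslo'), (2, 'South', 'Rome');
    INSERT INTO applications VALUES (1, 1), (2, 1);
""")

query = """
    SELECT c.id, c.region, c.city, COUNT(a.customer_id_id)
    FROM customers c
    {join} applications a ON c.id = a.customer_id_id
    GROUP BY c.id, c.region, c.city
    ORDER BY c.id
"""

# COUNT(column) skips NULLs, so unmatched customers get 0 instead of 1.
left_rows = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()

print(left_rows)   # both customers, the second with a count of 0
print(inner_rows)  # only the customer that has applications
```

Because id is the key, adding region and city to the GROUP BY does not change the groups: there is still one output row per customer.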
Not sure what your problem is. I assume that region and city are functionally dependent on id (that is, id is a candidate key). Newer versions of PostgreSQL will therefore accept your query. However, if you're on an older version you can expand your GROUP BY clause to:
select c.id, c.region, c.city, count(a.customer_id_id)
from customers c
join applications a
on c.id=a.customer_id_id
group by c.id, c.region, c.city;
You say that you would like to display information about each application, but why are you then counting the number of applications per customer? Do you mean something like:
select c.id, c.region, c.city, a.customer_id_id, a.<other attributes>
from customers c
join applications a
on c.id=a.customer_id_id;

sql join query not returning data, just blank

I am trying to pull specific columns from 2 separate tables and join them so that only one row per client is displayed, with the total sum of their payments, but no information is displayed. The code is below. It is for a reports page, so think of getting the sum of all payments. I know what I am looking for; I just think there may be a bug in the query that I can't seem to catch. I could use an extra pair of eyes to point out the flaw if possible. Thanks
SELECT pre.id, pre.loanAmount, pre.custId,
SUM(pay.amount) AS amount,
DISTINCT(pay.company) AS company,
DISTINCT(pay.loanId) AS loanId
FROM preQualForm pre INNER JOIN
payments pay
ON pre.custId=pay.custId
The DISTINCT keyword applies to all the columns you SELECT, so if you also need aggregate functions like SUM, then it is better achieved using a GROUP BY clause on the non-aggregate columns.
The following should work.
SELECT
pre.id
, pre.loanAmount
, pre.custId
, SUM(pay.amount) AS Amount
, pay.company AS Company
, pay.loanId AS LoanId
FROM preQualForm pre
INNER JOIN payments pay
ON pre.custId = pay.custId
GROUP BY
pre.id
, pre.loanAmount
, pre.custId
, pay.company
, pay.loanId
For starters, you are missing the GROUP BY clause. (There might be other issues, but for that we need data and expected output.)
SELECT pre.id, pre.loanAmount, pre.custId, SUM(pay.amount) AS amount,
DISTINCT(pay.company) AS company,
DISTINCT(pay.loanId) AS loanId
FROM preQualForm pre
INNER JOIN payments pay
ON pre.custId=pay.custId
GROUP BY pre.id, pre.loanAmount, pre.custId
DISTINCT() is not a function in SQL (at least not in any dialect I am familiar with). I would start with this query:
SELECT pre.custId, SUM(pay.amount) AS amount
FROM preQualForm pre INNER JOIN
payments pay
ON pre.custId = pay.custId
GROUP BY pre.custId;
It would seem to return what you want. You can enhance it if this does not return all the information you really want.
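A quick way to try that starting query is an in-memory database. The sketch below uses Python's sqlite3 module with invented rows and a guessed column layout for both tables, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column layout, based on the columns named in the question.
    CREATE TABLE preQualForm (id INTEGER PRIMARY KEY, loanAmount REAL,
                              custId INTEGER);
    CREATE TABLE payments (custId INTEGER, amount REAL, company TEXT,
                           loanId INTEGER);
    INSERT INTO preQualForm VALUES (1, 1000.0, 7), (2, 500.0, 8);
    INSERT INTO payments VALUES
        (7, 100.0, 'Acme', 1), (7, 150.0, 'Acme', 1), (8, 50.0, 'Bolt', 2);
""")

# One row per customer with the sum of that customer's payments.
totals = conn.execute("""
    SELECT pre.custId, SUM(pay.amount) AS amount
    FROM preQualForm pre
    INNER JOIN payments pay ON pre.custId = pay.custId
    GROUP BY pre.custId
    ORDER BY pre.custId
""").fetchall()

print(totals)  # customer 7 paid twice, customer 8 once
```

If a customer can have several preQualForm rows, each payment is joined to every one of them and the sum is inflated, so that is worth checking on the real data.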