SQL COUNT(*) Confusion

This is from the book Murach's SQL Server for Developers:
I have this summary query that calculates the number of invoices and the average invoice amount for the vendors in each state and city group. I understand most of the code, but one part that confuses me is the COUNT(*) aggregate. According to the book, this aggregate returns the number of invoices each state/city group has.
I cannot follow the logic. To me it looks like the COUNT(*) in the SELECT statement will give the number of times a state/city group appears in the Vendors table, not in the Invoices table.
SELECT VendorState, VendorCity, COUNT(*) AS 'Invoice QTY',
AVG(InvoiceTotal) AS 'InvoiceAvg'
FROM Invoices JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
GROUP BY VendorState, VendorCity
HAVING COUNT(*) >= 2
ORDER BY VendorState, VendorCity;

This is how I would parse your query:
SELECT VendorState, VendorCity, COUNT(*) AS 'Invoice QTY',
AVG(InvoiceTotal) AS 'InvoiceAvg'
Okay, you select some blah blah. I'd come back here after parsing to see whether the chosen columns are really available, but here you have no errors so I'll assume the columns are good.
FROM Invoices
You start with all the invoices (perhaps this is the point that confuses you).
JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
Each Invoice is joined to a single Vendor (VendorID is a primary key), so the cardinality does not change (assuming all vendors are still in place of course; an invoice with no matching VendorId will "disappear". Usually this is not the case when invoices and vendors are involved; you might have a flag to exclude "terminated" vendors, but you wouldn't remove invoices from the database). The important thing is, if you had 1,000 invoices, you now have 1,000 rows after the JOIN, not 2,000 or any other number. So, you're still working with invoices.
GROUP BY VendorState, VendorCity
Okay, so the COUNT(*) refers to the invoices of each single city in each single state. The HAVING clause restricts the results to those cities where at least two invoices are present.

Simply put, the HAVING clause is evaluated after the JOIN is performed. So the number of rows counted will be the number of invoices (less any invoices that are missing a valid VendorID and thus fail to join).
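To make the row counting concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The table and column names come from the question; the rows themselves are made up for illustration. Two vendors share one city, so a vendor count and an invoice count would differ, and one invoice has no matching vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors (VendorID INTEGER PRIMARY KEY, VendorState TEXT, VendorCity TEXT);
    CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, VendorID INTEGER, InvoiceTotal REAL);
    -- Illustrative data: two vendors in Fresno, one in Albany.
    INSERT INTO Vendors VALUES (1, 'CA', 'Fresno'), (2, 'CA', 'Fresno'), (3, 'NY', 'Albany');
    INSERT INTO Invoices VALUES
        (10, 1, 100.0), (11, 1, 200.0), (12, 2, 300.0),  -- three Fresno invoices
        (13, 3, 50.0),                                   -- one Albany invoice
        (14, 99, 75.0);                                  -- no matching vendor
""")

# Rows that survive the inner join: one per invoice with a valid vendor.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM Invoices i JOIN Vendors v ON i.VendorID = v.VendorID"
).fetchone()[0]

# The query from the question, with aliases.
groups = conn.execute("""
    SELECT v.VendorState, v.VendorCity, COUNT(*) AS InvoiceQty, AVG(i.InvoiceTotal) AS InvoiceAvg
    FROM Invoices i JOIN Vendors v ON i.VendorID = v.VendorID
    GROUP BY v.VendorState, v.VendorCity
    HAVING COUNT(*) >= 2
    ORDER BY v.VendorState, v.VendorCity
""").fetchall()

print(joined_rows)  # 4: five invoices, minus the one with no vendor
print(groups)       # [('CA', 'Fresno', 3, 200.0)]
```

Fresno has two vendors but three invoices, and COUNT(*) reports 3, so it is the invoices that are being counted. Albany is dropped by HAVING because it has only one invoice.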

First, I would suggest writing the query as:
SELECT v.VendorState, v.VendorCity,
COUNT(*) AS InvoiceQTY, AVG(i.InvoiceTotal) AS InvoiceAvg
FROM Invoices i JOIN
Vendors v
ON i.VendorID = v.VendorID
GROUP BY v.VendorState, v.VendorCity
HAVING COUNT(*) >= 2
ORDER BY v.VendorState, v.VendorCity;
The changes are:
Use table aliases to make the query easier to write and read.
Qualify all column references.
Only use single quotes for strings, not column names.
Avoid column names that need to be escaped.
The COUNT(*) is counting neither the number of "vendors" nor "invoices" -- well, not directly. It is counting the number of rows that match after the JOIN takes place.
Based on your naming convention, each invoice matches exactly one vendor. So, when you use COUNT(*) you are counting invoices, not vendors.

Related

Sum matching entries in SQL

In this database I need to find the total amount that each customer paid for books in a category, and then sort the results by customer ID. The code appears to run correctly, but I end up with approximately 20 more rows than I should, although the sum appears to be correct in the right rows.
The customer ID is part of customer, but is not supposed to appear in the select clause. When I try to ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;

You should fix your join conditions to include the ones in the where clause (relationships between tables usually fit better in an on clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But those are not your question. You can use an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.cid)
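As a rough illustration of ordering by an aggregated column that is not in the select list, here is a sketch run against SQLite through Python's sqlite3 module (not DB2). The two-table schema is a simplified, hypothetical stand-in for the question's tables; it keeps only the cid, name, cat and price columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: purchase carries cat and price directly.
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (cid INTEGER, cat TEXT, price INTEGER);
    INSERT INTO customer VALUES (1, 'Zoe'), (2, 'Adam');
    INSERT INTO purchase VALUES (1, 'scifi', 5), (1, 'scifi', 7), (2, 'history', 3);
""")

# cid is not selected and not grouped, so it is wrapped in MIN() for ORDER BY.
rows = conn.execute("""
    SELECT customer.name, purchase.cat, SUM(purchase.price) AS cost
    FROM purchase
    INNER JOIN customer ON customer.cid = purchase.cid
    GROUP BY customer.name, purchase.cat
    ORDER BY MIN(customer.cid)
""").fetchall()

print(rows)  # [('Zoe', 'scifi', 12), ('Adam', 'history', 3)]
```

Zoe sorts before Adam because the order follows the customer ID, not the name.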

calculates between These two columns in SQL server [closed]

I want to add a column to the query that calculates the difference between these two columns in the same query:
,isnull(sum(ORDERS.Net_Amount) + 0,0) as orders
,isnull(sum (convert(float,(RECEIPTS.Amount))) + 0,0) as recepts
in SQL Server:
SELECT CUSTOMERS.[ID_CUSTOMER]
,[FIRST_NAME]
,[TEL]
,[EMAIL]
,isnull(sum(ORDERS.Net_Amount) + 0,0) as orders
,isnull(sum (convert(float,(RECEIPTS.Amount))) + 0,0) as recepts
,[CRIDIT_LIMIT]
,[CUSTOMER_SINCE]
,[ADRESS]
,CUSTOMERS.[state]
FROM [CUSTOMERS]
LEFT JOIN ORDERS on CUSTOMERS.ID_CUSTOMER = ORDERS.CUSTOMER_ID
LEFT JOIN RECEIPTS on CUSTOMERS.ID_CUSTOMER = RECEIPTS.ID_CUSTOMER
GROUP BY CUSTOMERS.[ID_CUSTOMER]
,[FIRST_NAME]
,[TEL]
,[EMAIL]
,[CRIDIT_LIMIT]
,[CUSTOMER_SINCE]
,[ADRESS]
,CUSTOMERS.[state]
CUSTOMERS table
SELECT [ID_CUSTOMER]
,[FIRST_NAME]
,[TEL]
,[EMAIL]
,[IMAGE_CUSTOMER]
,[CRIDIT_LIMIT]
,[CUSTOMER_SINCE]
,[ADRESS]
,[Balance]
,[state]
FROM [CUSTOMERS]
ORDERS table
SELECT [ID_ORDER]
,[DATE_ORDER]
,[CUSTOMER_ID]
,[DESCRIPTION_ORDERS]
,[SALEMAN]
,[ORDER_TOTAL]
,[Discount_Of_Total]
,[Total_After_Discount]
,[Paid_Up]
,[Net_Amount]
,[state]
FROM [ORDERS]
RECEIPTS table
SELECT [image_state]
,[ID_RECEIPT]
,[ID_CUSTOMER]
,[Date]
,[Ref]
,[Amount]
,[Memo]
,[User_Name]
,[state]
,[Payment_Method]
,[Account_ID]
FROM [RECEIPTS]

Orders and receipts are not really related to each other (the receipt doesn't refer to a specific order), so don't join the two. What you want to do instead is find the order amount and the receipt amount per customer and show them. So aggregate the two tables per customer and outer-join the results to the customer table.
select
c.id_customer,
c.first_name,
c.tel,
c.email,
coalesce(o.sum_net_amount, 0) as order_amount,
coalesce(r.sum_amount, 0) as receipt_amount,
c.cridit_limit,
c.customer_since,
c.adress,
c.balance,
c.state
from customers c
left join
(
select customer_id, sum(net_amount) as sum_net_amount
from orders
group by customer_id
) o on c.id_customer = o.customer_id
left join
(
select id_customer, sum(amount) as sum_amount
from receipts
group by id_customer
) r on c.id_customer = r.id_customer;
I see you have updated your request and are now also asking for the difference of the sums. Well, the operator for subtraction in SQL is, unsurprisingly, the minus sign:
coalesce(o.sum_net_amount, 0) - coalesce(r.sum_amount, 0) as diff
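To see why the two tables should be aggregated separately, here is a small sketch using Python's sqlite3 module (SQLite rather than SQL Server). The tables are cut down to the columns involved and the amounts are invented: customer 1 has two orders and three receipts, customer 2 has neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id_customer INTEGER PRIMARY KEY);
    CREATE TABLE orders (customer_id INTEGER, net_amount INTEGER);
    CREATE TABLE receipts (id_customer INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (1, 100), (1, 50);            -- true order total: 150
    INSERT INTO receipts VALUES (1, 10), (1, 20), (1, 30);  -- true receipt total: 60
""")

# Joining both child tables directly pairs every order with every receipt.
naive = conn.execute("""
    SELECT c.id_customer, COALESCE(SUM(o.net_amount), 0), COALESCE(SUM(r.amount), 0)
    FROM customers c
    LEFT JOIN orders o ON c.id_customer = o.customer_id
    LEFT JOIN receipts r ON c.id_customer = r.id_customer
    GROUP BY c.id_customer
    ORDER BY c.id_customer
""").fetchall()

# Aggregating each table per customer first avoids the multiplication.
pre_aggregated = conn.execute("""
    SELECT c.id_customer,
           COALESCE(o.sum_net_amount, 0) AS order_amount,
           COALESCE(r.sum_amount, 0) AS receipt_amount,
           COALESCE(o.sum_net_amount, 0) - COALESCE(r.sum_amount, 0) AS diff
    FROM customers c
    LEFT JOIN (SELECT customer_id, SUM(net_amount) AS sum_net_amount
               FROM orders GROUP BY customer_id) o ON c.id_customer = o.customer_id
    LEFT JOIN (SELECT id_customer, SUM(amount) AS sum_amount
               FROM receipts GROUP BY id_customer) r ON c.id_customer = r.id_customer
    ORDER BY c.id_customer
""").fetchall()

print(naive)           # [(1, 450, 120), (2, 0, 0)]
print(pre_aggregated)  # [(1, 150, 60, 90), (2, 0, 0, 0)]
```

In the naive version the order total is tripled (once per receipt) and the receipt total is doubled (once per order), which is the inflation the pre-aggregated joins avoid.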

Please try the following...
SELECT Customers.ID_Customer,
first_name,
tel,
email,
SUM( net_amount ) AS OrdersTotal,
COALESCE( sumAmount, 0 ) AS PaymentsTotal,
SUM( net_amount ) - COALESCE( sumAmount, 0 ) AS DifferenceInTotals,
cridit_limit,
customer_since,
adress,
balance,
state
FROM Customers
INNER JOIN Orders ON Customers.ID_Customer = Orders.Customer_ID
LEFT JOIN ( SELECT ID_Customer,
SUM( amount ) AS sumAmount
FROM Receipts
GROUP BY ID_Customer
) AS sumAmountFinder ON Customers.ID_Customer = sumAmountFinder.ID_Customer
GROUP BY Customers.ID_Customer,
first_name,
tel,
email,
cridit_limit,
customer_since,
adress,
balance,
state;
This Answer is based on the assumption that every Customer will have at least one Order, but possibly no Receipts.
This statement is essentially the one that you supplied with the following modifications...
I have changed the JOIN to Orders to an INNER JOIN, since I am assuming that each Customer will have at least one Order. The LEFT JOIN is only necessary where you wish to retain all records from the left table that do not have at least one matching record from the right table as defined by the ON clause. (Note : If you wish to retain all the records from the right table where there is no matching records from the left table, use a RIGHT JOIN).
I have replaced the JOIN to the Receipts table with a subquery that calculates the total of the amount field for each Customer in the Receipts table. A LEFT JOIN is necessary between Customers and the results of this subquery as not all Customers will have a Receipt. In such situations the LEFT JOIN will set each of the fields from the subquery in the joined dataset to NULL.
Where the SUM() function encounters only NULL values it returns NULL, not 0. So that PaymentsTotal will be set to 0 for records where the Customer has no Receipts, I have used the COALESCE() function. This function will return the first non-NULL argument it encounters. Here I have set it to return the total of amount where it encounters one, and 0 where it encounters no total amount.
I have removed all of the square brackets from your field and table names. They are only required where you have used an otherwise disallowed name, such as names with spaces (use [Full Name] instead of Full Name) or names that are also reserved by SQL-Server (if you had decided to call PaymentsTotal Sum, then you would have had to use AS [Sum]). Many programmers consider giving fields such names to be bad practice, even when it is possible with []'s, but fortunately you have not used any otherwise disallowed names.
I have removed the table names from your SUM() calculations. Since only one table has a field called net_amount, then it will be a unique field name in the joined dataset, and you will be able to refer to them without specifying the name of the source table as well. Specifying the source table is still necessary in the case of Customers.ID_Customer as the joined dataset will have more than one field called ID_Customer. Also, you will need to specify the source tables names when creating the joined dataset using the JOIN's.
I have also taken the liberty of changing your capitalisation scheme. Having just about everything in constant upper-case is monotonous to the eye. Using different casing for SQL terms, table names and field names makes recognising each type of statement part much easier, and thus makes debugging code much easier.
Finally, and relatively trivially, cridit is actually spelt credit and adress is actually spelt address.
If you have any questions or comments, then please feel free to post a Comment accordingly.
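The point above about SUM() returning NULL rather than 0 can be checked with a short sketch. It uses Python's sqlite3 module, so it shows SQLite's behaviour rather than SQL Server's, and the Receipts rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Receipts (ID_Customer INTEGER, amount INTEGER)")
# Customer 1 has one real amount and one NULL; customer 2 has no receipts at all.
conn.execute("INSERT INTO Receipts VALUES (1, 25), (1, NULL)")

# SUM() skips NULL inputs when at least one value is present.
sum_with_null = conn.execute(
    "SELECT SUM(amount) FROM Receipts WHERE ID_Customer = 1"
).fetchone()[0]

# With nothing to add up, SUM() yields NULL; COALESCE() turns that into 0.
raw_total, safe_total = conn.execute(
    "SELECT SUM(amount), COALESCE(SUM(amount), 0) FROM Receipts WHERE ID_Customer = 2"
).fetchone()

print(sum_with_null)  # 25
print(raw_total)      # None (SQL NULL)
print(safe_total)     # 0
```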

sql join query not returning data, just blank

I am trying to pull data from 2 separate tables, but only specific columns. I then do a join so that only one row per client is displayed, with the total sum of their payments, but the information is not displaying. Code is below. It is for a reports page, so think about getting the sum of all payments. I know what I am looking for; I just think that maybe there is a bug in the query that I can't seem to catch. I could use an extra pair of eyes to point out the flaw if possible. Thanks
SELECT pre.id, pre.loanAmount, pre.custId,
SUM(pay.amount) AS amount,
DISTINCT(pay.company) AS company,
DISTINCT(pay.loanId) AS loanId
FROM preQualForm pre INNER JOIN
payments pay
ON pre.custId=pay.custId

The DISTINCT keyword applies to all the columns you SELECT, so if you also need aggregate functions like SUM, it is better to use a GROUP BY clause on the non-aggregate columns.
The following should work.
SELECT
pre.id
, pre.loanAmount
, pre.custId
, SUM(pay.amount) AS Amount
, pay.company AS Company
, pay.loanId AS LoanId
FROM preQualForm pre
INNER JOIN payments pay
ON pre.custId = pay.custId
GROUP BY
pre.id
, pre.loanAmount
, pre.custId
, pay.company
, pay.loanId

For starters, you are missing the GROUP BY clause. (There might be other issues, but for that we need data and expected output.)
SELECT pre.id, pre.loanAmount, pre.custId, SUM(pay.amount) AS amount,
DISTINCT(pay.company) AS company,
DISTINCT(pay.loanId) AS loanId
FROM preQualForm pre
INNER JOIN payments pay
ON pre.custId=pay.custId
GROUP BY pre.id, pre.loanAmount, pre.custId

DISTINCT() is not a function in SQL (at least not in any dialect I am familiar with). I would start with this query:
SELECT pre.custId, SUM(pay.amount) AS amount
FROM preQualForm pre INNER JOIN
payments pay
ON pre.custId = pay.custId
GROUP BY pre.custId;
It would seem to return what you want. You can enhance it if this does not return all the information you really want.
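Here is a small sketch of both points, run against SQLite via Python's sqlite3 module (the question's actual database engine is not stated, so this is an assumption). The table and column names come from the question; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE preQualForm (id INTEGER PRIMARY KEY, loanAmount INTEGER, custId INTEGER);
    CREATE TABLE payments (custId INTEGER, amount INTEGER, company TEXT, loanId INTEGER);
    INSERT INTO preQualForm VALUES (1, 1000, 7), (2, 500, 8);
    INSERT INTO payments VALUES (7, 100, 'Acme', 1), (7, 150, 'Acme', 1), (8, 50, 'Bolt', 2);
""")

# DISTINCT in the middle of the select list is rejected as a syntax error here.
try:
    conn.execute("""
        SELECT pre.custId, DISTINCT(pay.company)
        FROM preQualForm pre INNER JOIN payments pay ON pre.custId = pay.custId
    """)
    distinct_rejected = False
except sqlite3.OperationalError:
    distinct_rejected = True

# One row per customer with the total of their payments.
totals = conn.execute("""
    SELECT pre.custId, SUM(pay.amount) AS amount
    FROM preQualForm pre INNER JOIN payments pay ON pre.custId = pay.custId
    GROUP BY pre.custId
    ORDER BY pre.custId
""").fetchall()

print(distinct_rejected)  # True
print(totals)             # [(7, 250), (8, 50)]
```

If the reports page swallows database errors, a rejected query like the first one would plausibly show up as a blank result.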

GROUP BY clause with logical functions

I'm using Oracle 11g Application Express, and executing these commands within the SQL Plus CLI. This is for a class, and I cannot get past this problem. I don't know how to add the total quantity of the items on the orders - I get confused as I don't know how to take the SUM of the QUANTITY per ORDER (customers have multiple orders).
For each customer having an order, list the customer number, the number of orders
that customer has, the total quantity of items on those orders, and the total price for
the items. Order the output by customer number. (Hint: You must use a GROUP BY
clause in this query).
Tables (we will use):
CUSTOMER: contains customer_num
ORDERS: contains order_num, customer_num
ITEMS: contains order_num, quantity, total_price
My logic: I need to be able to calculate the sum of the quantity per order number per customer. I have sat here for over an hour and cannot figure it out.
So far this is what I can formulate..
SELECT customer_num, count(customer_num)
FROM orders
GROUP BY customer_num;
I don't understand how to GROUP BY very well (yes, I have googled and researched it for a bit, and it just isn't clicking), and I have no clue how to take the SUM of the QUANTITY per ORDER per CUSTOMER.
Not looking for someone to do my work for me, just some guidance - thanks!

select o.customer_num,
count(distinct o.order_num) as num_orders,
sum(i.quantity) as total_qty,
sum(i.total_price) as total_price
from orders o
join items i
on o.order_num = i.order_num
group by o.customer_num
order by o.customer_num
First thing:
you have to join the two tables necessary to solve the problem (orders and items). The related field appears to be order_num
Second thing:
Your group by clause is fine, you want one row per customer. But because of the join to the items table, you will have to count DISTINCT orders (because there may be a one to many relationship between orders and items). Otherwise an order with 2 different associated items would be counted twice.
Next, sum the quantity and total price, you can do this now because you've joined to the needed table.
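The double counting described above is easy to reproduce. The following sketch uses Python's sqlite3 module rather than Oracle, with the table and column names from the question and invented rows: order 1 has two item lines, so a plain COUNT(*) sees it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER);
    CREATE TABLE items (order_num INTEGER, quantity INTEGER, total_price INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO items VALUES (1, 2, 20), (1, 1, 5), (2, 4, 40), (3, 1, 10);
""")

rows = conn.execute("""
    SELECT o.customer_num,
           COUNT(*) AS joined_rows,                  -- counts item lines, not orders
           COUNT(DISTINCT o.order_num) AS num_orders,
           SUM(i.quantity) AS total_qty,
           SUM(i.total_price) AS total_price
    FROM orders o
    JOIN items i ON o.order_num = i.order_num
    GROUP BY o.customer_num
    ORDER BY o.customer_num
""").fetchall()

print(rows)  # [(100, 3, 2, 7, 65), (200, 1, 1, 1, 10)]
```

Customer 100 has two orders but three joined rows, which is why the answer counts DISTINCT order numbers.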

This can also be solved with an implicit join, putting the join condition in the WHERE clause:
SELECT orders.customer_num, /*customer number*/
COUNT(DISTINCT orders.order_num) AS "num_orders", /*number of orders/c*/
SUM(items.quantity) as "total_qty", /*total quantity/c*/
SUM(items.total_price) as "total_price" /*total price/items*/
/* For each customer having an order, list the customer number,
the number of orders that customer has, the total quantity of items
on those orders, and the total price for the items.
Order the output by customer number.
(Hint: You must use a GROUP BY clause in this query). */
FROM orders, items
WHERE orders.order_num = items.order_num
GROUP BY orders.customer_num
ORDER BY orders.customer_num
;

Select records for MySQL only once based on a column value

I have a table that stores transaction information. Each transaction has a unique (auto-incremented) id column, a column with the customer's id number, a column called bill_paid which indicates with a yes or no whether the transaction has been paid for by the customer, and a few other columns which hold other information not relevant to my question.
I want to select all customer ids from the transaction table for which the bill has not been paid, but if the customer has had multiple transactions where the bill has not been paid I DO NOT want to select them more than once. This way I can generate that customer one bill with all the transactions they owe for instead of a separate bill for each transaction. How would I build a query that did that for me?

Returns exactly one customer_id for each customer with bill_paid equal to 'no':
SELECT
t.customer_id
FROM
transactions t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
Edit:
GROUP BY summarises your resultset.
Caveat: Every column selected must be either 'grouped by' or aggregated in some fashion. As shown by nikic you could use SUM to get the total amount owed, e.g.:
SELECT
t.customer_id
, SUM(t.amount) AS TOTAL_OWED
FROM
transactions AS t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
t is simply an alias.
So instead of typing transactions everywhere you can now simply type t. The alias is not necessary here since you query only one table, but I find them invaluable for larger queries. You can optionally type AS to make it more clear that you're using an alias.
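A quick way to try the grouped query is the sketch below, which uses Python's sqlite3 module in place of MySQL. The rows are illustrative, and the amount column is the one assumed by the second query above. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER,
        bill_paid TEXT,
        amount INTEGER
    );
    INSERT INTO transactions (customer_id, bill_paid, amount) VALUES
        (5, 'no', 10), (5, 'no', 15),  -- customer 5 owes for two transactions
        (6, 'yes', 20),                -- customer 6 has paid
        (7, 'no', 8);
""")

# One row per customer with unpaid bills, plus the total they owe.
owed = conn.execute("""
    SELECT t.customer_id, SUM(t.amount) AS total_owed
    FROM transactions AS t
    WHERE t.bill_paid = 'no'
    GROUP BY t.customer_id
    ORDER BY t.customer_id
""").fetchall()

print(owed)  # [(5, 25), (7, 8)]
```

Customer 5 appears once even though two of their transactions are unpaid, and customer 6 does not appear at all.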

You might try the GROUP BY clause, e.g. group by the customer.

SELECT customer, SUM(toPay) FROM .. GROUP BY customer
