PostgreSQL Query to JOIN and SUM

I have 2 tables:
orders
orderItems
The total of each order (the sum of its product prices) is saved in the total field of the orders table. I need to join these two tables and get the sum of total and the count of orders from the values saved in the orders table. An example is below:
SELECT
count(orders.id), sum(orders.total)
FROM
orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
AND orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1','2','3','4')
How do I get the sum and count from a single query?

This is a stab in the dark, but based on your updated comments I think I might know what you are dealing with. It seems like you are doing a sum and count at the order header level from the "orders" table, but by joining to the lines table you get one row per matching order line, so both aggregates are multiplied by a seemingly arbitrary factor (the number of matching lines in each order).
If this is the case, and you only want to sum and count an order header when one or more of its lines meet your criteria (pCode in 1, 2, 3, 4), then what you want is a semi-join, using the EXISTS clause.
SELECT
count(o.id), sum(o.total)
FROM
orders o
where
o.order_no like 'P%' and
exists (
select null
from orderItems i
where
o.order_no = i.order_no and
i.pCode in ('1', '2', '3', '4')
)
With this construct, even if an order has multiple lines meeting your condition(s), each header is still summed and counted only once. The syntax takes some getting used to, but the construct itself is very useful and efficient. The alternative would be a subquery "IN" list, which, depending on the planner, may not run as efficiently for large datasets; PostgreSQL often plans both forms as a semi-join, so compare them with EXPLAIN on your own data.
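To see the multiplication effect and the semi-join fix side by side, here is a small sketch that runs both queries through Python's built-in sqlite3 module. SQLite only stands in for PostgreSQL here, and the rows are hypothetical toy data; the table and column names come from the question.

```python
import sqlite3

# Hypothetical toy data (in-memory SQLite as a stand-in for PostgreSQL):
# order P1 has two qualifying lines, P2 has one, X1 fails the LIKE 'P%' filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER);
    CREATE TABLE orderItems (order_no TEXT, pCode TEXT);
    INSERT INTO orders VALUES (1, 'P1', 100), (2, 'P2', 50), (3, 'X1', 70);
    INSERT INTO orderItems VALUES ('P1', '1'), ('P1', '2'), ('P2', '3'), ('X1', '1');
""")

# Plain INNER JOIN: P1 produces one row per matching line, so it is counted twice.
joined = conn.execute("""
    SELECT count(orders.id), sum(orders.total)
    FROM orders
    INNER JOIN orderItems ON orderItems.order_no = orders.order_no
    WHERE orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1', '2', '3', '4')
""").fetchone()

# Semi-join with EXISTS: each qualifying order header contributes exactly once.
semi = conn.execute("""
    SELECT count(o.id), sum(o.total)
    FROM orders o
    WHERE o.order_no LIKE 'P%'
      AND EXISTS (SELECT NULL FROM orderItems i
                  WHERE i.order_no = o.order_no
                    AND i.pCode IN ('1', '2', '3', '4'))
""").fetchone()

print(joined)  # (3, 250): P1 counted twice
print(semi)    # (2, 150)
```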
If that's not what you meant, please edit your question with the sample data and what you expect to see for the final output.

If you want to compute aggregates (e.g. SUM, COUNT) per value of another column (e.g. per pCode), then you need a GROUP BY clause on the non-aggregated columns:
SELECT
orderItems.pCode,
COUNT(orders.id) AS order_count,
SUM(orders.total) AS order_total
FROM orders
INNER JOIN orderItems
ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%'
AND orderItems.pCode IN ('1','2','3','4')
GROUP BY
orderItems.pCode
Note how orderItems.pCode is in both the SELECT clause and the GROUP BY clause. If you wanted to list by orders.order_no as well then you would add that column to both clauses too.
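A quick way to check the grouped query is to run it on a few rows. This sketch uses Python's sqlite3 module with hypothetical data; SQLite is only a stand-in for PostgreSQL.

```python
import sqlite3

# Hypothetical toy data: P1 has lines with pCode '1' and '2', P2 has pCode '1'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER);
    CREATE TABLE orderItems (order_no TEXT, pCode TEXT);
    INSERT INTO orders VALUES (1, 'P1', 100), (2, 'P2', 50), (3, 'X1', 70);
    INSERT INTO orderItems VALUES ('P1', '1'), ('P1', '2'), ('P2', '1'), ('X1', '1');
""")

# One result row per pCode; pCode appears in both SELECT and GROUP BY.
rows = conn.execute("""
    SELECT orderItems.pCode,
           COUNT(orders.id) AS order_count,
           SUM(orders.total) AS order_total
    FROM orders
    INNER JOIN orderItems ON orderItems.order_no = orders.order_no
    WHERE orders.order_no LIKE 'P%'
      AND orderItems.pCode IN ('1', '2', '3', '4')
    GROUP BY orderItems.pCode
    ORDER BY orderItems.pCode
""").fetchall()

for row in rows:
    print(row)
# ('1', 2, 150)
# ('2', 1, 100)
```

Note that P1 contributes to both the '1' and the '2' group, so the per-pCode rows do not add up to a count of distinct orders.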

Related

Customers who have purchased two specific products

I want to make two lists.
1. Customers who have bought product 'a' OR product 'b'
2. Customers who have bought product 'a' AND product 'b'
Number 1 is easy:
SELECT SalesTable.OrderAccount
FROM SalesTable
WHERE SalesTable.ItemID = 'a' OR SalesTable.ItemId = 'b'
How do I solve number 2?
Thank you
There are a few ways to do this.
One way is to select where SalesTable.ItemID IN ('a', 'b'), then GROUP BY the customer and keep the groups HAVING two records. You have to be careful here to count each product only once per customer. This method can be good because, done right, it makes only one pass through the table, but it can be tricky to handle a customer who ordered the same product more than once. It looks something like this:
With T As (
SELECT DISTINCT OrderAccount, ItemID
FROM SalesTable
WHERE ItemID IN ('a', 'b')
)
SELECT OrderAccount
FROM T
GROUP BY OrderAccount
HAVING COUNT(ItemID) = 2
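The reason for the DISTINCT step shows up as soon as a customer repeats a purchase. This sketch runs the grouped query with and without de-duplication on hypothetical rows, using Python's sqlite3 module as a stand-in database.

```python
import sqlite3

# Hypothetical purchases: c1 bought 'a' twice and 'b' once,
# c2 bought 'a' twice, c3 bought only 'b'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesTable (OrderAccount TEXT, ItemID TEXT);
    INSERT INTO SalesTable VALUES
        ('c1', 'a'), ('c1', 'b'), ('c1', 'a'),
        ('c2', 'a'), ('c2', 'a'),
        ('c3', 'b');
""")

# Without DISTINCT, repeat purchases distort the count.
naive = conn.execute("""
    SELECT OrderAccount
    FROM SalesTable
    WHERE ItemID IN ('a', 'b')
    GROUP BY OrderAccount
    HAVING COUNT(ItemID) = 2
""").fetchall()

# De-duplicate (customer, product) pairs first, then require two of them.
fixed = conn.execute("""
    WITH T AS (
        SELECT DISTINCT OrderAccount, ItemID
        FROM SalesTable
        WHERE ItemID IN ('a', 'b')
    )
    SELECT OrderAccount
    FROM T
    GROUP BY OrderAccount
    HAVING COUNT(ItemID) = 2
""").fetchall()

print(naive)  # [('c2',)]: the wrong customer
print(fixed)  # [('c1',)]
```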
Another way is to JOIN SalesTable to itself, using a different alias for each instance of the table, where the join conditions restrict each instance to a different one of the desired products and to the same customer. This handles multiple orders of the same product more reliably, but it has to read the table twice and multiplies rows in the result set when there are multiple matches on both sides of the JOIN (hence the DISTINCT).
SELECT DISTINCT s1.OrderAccount
FROM SalesTable s1
INNER JOIN SalesTable s2 ON s1.OrderAccount = s2.OrderAccount AND s2.ItemID = 'b'
WHERE s1.ItemID = 'a'
Another option is to use the row_number() window function: number each customer's distinct products (partitioning by customer) and look for a second row. This fits somewhere in between the two: it only goes through the full table once, but must then review the (somewhat smaller) initial results to get to the final answer. However, the query optimizer can often make this perform just as well as the first option.
SELECT OrderAccount
FROM (
SELECT OrderAccount, row_number()
OVER (PARTITION BY OrderAccount ORDER BY ItemID) rn
FROM (SELECT DISTINCT OrderAccount, ItemID
FROM SalesTable
WHERE ItemID IN ('a', 'b')) D
) T
WHERE rn = 2
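The following sketch runs a self-join and a row_number() query on a few hypothetical rows through Python's sqlite3 module (window functions need SQLite 3.25 or later). The row_number() query here numbers distinct (OrderAccount, ItemID) pairs within each customer, so rn = 2 can only appear for a customer who bought two different products.

```python
import sqlite3

# Hypothetical purchases: only c1 bought both 'a' and 'b'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesTable (OrderAccount TEXT, ItemID TEXT);
    INSERT INTO SalesTable VALUES
        ('c1', 'a'), ('c1', 'b'), ('c1', 'a'),
        ('c2', 'a'), ('c2', 'a'),
        ('c3', 'b');
""")

# Number each customer's distinct products; rn = 2 means a second product.
row_number_result = conn.execute("""
    SELECT OrderAccount
    FROM (
        SELECT OrderAccount,
               row_number() OVER (PARTITION BY OrderAccount ORDER BY ItemID) AS rn
        FROM (SELECT DISTINCT OrderAccount, ItemID
              FROM SalesTable
              WHERE ItemID IN ('a', 'b')) d
    ) t
    WHERE rn = 2
    ORDER BY OrderAccount
""").fetchall()

# Self-join: c1's two 'a' rows each match its 'b' row; DISTINCT removes the repeat.
self_join_result = conn.execute("""
    SELECT DISTINCT s1.OrderAccount
    FROM SalesTable s1
    INNER JOIN SalesTable s2
        ON s1.OrderAccount = s2.OrderAccount AND s2.ItemID = 'b'
    WHERE s1.ItemID = 'a'
    ORDER BY s1.OrderAccount
""").fetchall()

print(row_number_result)  # [('c1',)]
print(self_join_result)   # [('c1',)]
```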
The first query can be written as
SELECT DISTINCT orderaccount
FROM salestable
WHERE itemid IN ('a', 'b');
The DISTINCT is necessary to get each orderaccount only once, no matter if they ordered only item a or only item b or both and whether they ordered them once or many times.
If you want to get the customers that bought both products, use the same query, but group by customer and count the distinct products:
SELECT orderaccount
FROM salestable
WHERE itemid IN ('a', 'b')
GROUP BY orderaccount
HAVING COUNT(DISTINCT itemid) = 2;
Because we GROUP BY orderaccount, we don't need SELECT DISTINCT anymore: the GROUP BY clause already aggregates per orderaccount and returns each one just once.
If you have an order account table, you can also use IN or EXISTS for the lookup. This way the DBMS can stop reading once it has found a matching purchase. This may not matter here (a customer probably won't buy an item again and again and again), but in other situations (imagine a store selling an item a million times when you merely want to know whether the store sold it at least once or not at all) it can be very beneficial:
SELECT orderaccount
FROM orderaccounts
WHERE orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'a')
AND orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'b');
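As a sanity check, this sketch runs the OR variant, the AND variant, and the GROUP BY ... HAVING COUNT(DISTINCT ...) query against the same hypothetical data in Python's sqlite3 module; orderaccounts plays the role of the assumed customer table.

```python
import sqlite3

# Hypothetical data; c4 bought only an unrelated product 'c'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderaccounts (orderaccount TEXT PRIMARY KEY);
    CREATE TABLE salestable (orderaccount TEXT, itemid TEXT);
    INSERT INTO orderaccounts VALUES ('c1'), ('c2'), ('c3'), ('c4');
    INSERT INTO salestable VALUES
        ('c1', 'a'), ('c1', 'b'), ('c1', 'a'),
        ('c2', 'a'), ('c2', 'a'),
        ('c3', 'b'),
        ('c4', 'c');
""")

def accounts(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

either = accounts("""
    SELECT orderaccount FROM orderaccounts
    WHERE orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'a')
       OR orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'b')
""")
both_in = accounts("""
    SELECT orderaccount FROM orderaccounts
    WHERE orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'a')
      AND orderaccount IN (SELECT orderaccount FROM salestable WHERE itemid = 'b')
""")
both_having = accounts("""
    SELECT orderaccount FROM salestable
    WHERE itemid IN ('a', 'b')
    GROUP BY orderaccount
    HAVING COUNT(DISTINCT itemid) = 2
""")

print(either)       # ['c1', 'c2', 'c3']
print(both_in)      # ['c1']
print(both_having)  # ['c1']
```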
The same applies to the first query of course, where you would have OR instead of AND. With a large intermediate data set, DISTINCT can be a costly operation.

Using a subquery with a HAVING clause

So the goal is to get a list of customers that have on average ordered more than the total average of all customers.
Select customerNumber, customerName, orderNumber, SUM(quantityOrdered)as 'total_qty', ROUND(AVG(quantityOrdered),2) as 'avg'
From customers
join orders using(customerNumber)
join orderdetails using (orderNumber)
Group by customerNumber, OrderNumber
Having ROUND(AVG(quantityOrdered),2) > ROUND(AVG(quantityOrdered),2) IN
(SELECT ROUND(AVG(quantityOrdered),2) FROM orderdetails)
ORDER BY customerName;
My code runs but it doesn't filter the results on the avg quantity ordered column to only show results over the total average of 35.22.
Possibly, you mean:
select c.customernumber, c.customername,
sum(od.quantityOrdered) as sum_qty,
round(avg(od.quantityOrdered), 2) as avg_qty
from customers c
join orders o using(customerNumber)
join orderdetails od using (orderNumber)
group by customernumber, customername
having avg(od.quantityOrdered) > (select avg(quantityOrdered) from orderdetails)
Rationale:
you describe computing the average ordered, but what your query actually compares is the average order detail quantity per customer; this answer assumes that the latter is what you want
since you want an average per customer, do not put the order number in the GROUP BY
there is no need for IN or the like in the HAVING clause: just compare the customer's average against a scalar subquery that computes the overall average
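To check the HAVING-with-scalar-subquery pattern on numbers you can verify by hand, this sketch runs the query on hypothetical data in Python's sqlite3 module (a stand-in for the asker's database). Alpha's lines average 40, Beta's average 15, and the overall average is 150 / 5 = 30.

```python
import sqlite3

# Hypothetical data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerNumber INTEGER, customerName TEXT);
    CREATE TABLE orders (orderNumber INTEGER, customerNumber INTEGER);
    CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER);
    INSERT INTO customers VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO orderdetails VALUES (10, 30), (10, 50), (11, 40), (20, 10), (20, 20);
""")

rows = conn.execute("""
    SELECT c.customerNumber, c.customerName,
           SUM(od.quantityOrdered) AS sum_qty,
           ROUND(AVG(od.quantityOrdered), 2) AS avg_qty
    FROM customers c
    JOIN orders o USING (customerNumber)
    JOIN orderdetails od USING (orderNumber)
    GROUP BY c.customerNumber, c.customerName
    -- scalar subquery: one overall average to compare each customer against
    HAVING AVG(od.quantityOrdered) > (SELECT AVG(quantityOrdered) FROM orderdetails)
""").fetchall()

print(rows)  # [(1, 'Alpha', 120, 40.0)]
```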
Notes:
don't use single quotes for identifiers (such as column aliases) - they are meant for literal strings
table aliases make the query easier to write and read; prefixing all columns with the alias of the table they belong to makes the query understandable

Creating a variable in SQL and using it in a WHERE clause

I want to create a variable that counts the number of times each customer ID appears in the CSV, and then I want the output to be all customer IDs that appear 0, 1, or 2 times. Here is my code so far:
SELECT Customers.customer_id , COUNT(*) AS counting
FROM Customers
LEFT JOIN Shopping_cart ON Customers.customer_id = Shopping_cart.customer_id
WHERE counting = '0'
OR counting = '1'
OR counting = '2'
GROUP BY Customers.customer_id;
SELECT Customers.customer_id, COUNT(Shopping_cart.customer_id) AS counting
FROM Customers LEFT JOIN Shopping_cart ON Customers.customer_id = Shopping_cart.customer_id
GROUP BY Customers.customer_id
HAVING COUNT(Shopping_cart.customer_id) < 3;
The query groups by customer ID, and COUNT() gives the number of cart rows in each group. The HAVING clause then keeps only the groups whose count is smaller than 3, which covers 0, 1, and 2. Aggregates such as COUNT() cannot be used in WHERE, which filters individual rows before grouping; repeat the COUNT() expression in HAVING rather than referring to the alias, since not every database allows aliases there.
You're probably thinking of HAVING, not WHERE.
For example:
select JOB, COUNT(JOB) from SCOTT.EMP
group by JOB
HAVING count(JOB) > 1 ;
While a tad odd, you may be specific about the HAVING condition(s):
HAVING count(JOB) = 2 or count(JOB) = 4
Note: the WHERE clause filters individual rows before grouping, while the HAVING clause filters groups after aggregation.
You can apply a filter after the aggregation with the HAVING clause.
Please note that count(*) counts all rows, including the row of NULLs that the LEFT JOIN produces for a customer without any shopping cart, so such a customer gets a count of 1 rather than 0. You have to count the non-NULL values in some column of Shopping_cart instead:
SELECT customer_id,
count(Shopping_cart.some_id) AS counting
FROM Customers
LEFT JOIN Shopping_cart USING (customer_id)
GROUP BY customer_id
HAVING count(Shopping_cart.some_id) BETWEEN 0 and 2;
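The difference between count(*) and count(column) after a LEFT JOIN is easy to see on a few rows. In this sketch (Python's sqlite3 module, hypothetical data), cart_id is an assumed name standing in for the answer's some_id placeholder.

```python
import sqlite3

# Hypothetical data; cart_id stands in for the answer's "some_id" column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE Shopping_cart (cart_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customers VALUES (1), (2), (3), (4);
    -- customer 1 has three carts, 2 has one, 3 has none, 4 has two
    INSERT INTO Shopping_cart (customer_id) VALUES (1), (1), (1), (2), (4), (4);
""")

rows = conn.execute("""
    SELECT c.customer_id,
           count(*)         AS all_rows,  -- 1 even for customer 3
           count(s.cart_id) AS carts      -- the NULL cart_id is not counted
    FROM Customers c
    LEFT JOIN Shopping_cart s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    HAVING count(s.cart_id) BETWEEN 0 AND 2
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
# (2, 1, 1)
# (3, 1, 0)
# (4, 2, 2)
```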

Counting multiple columns from multiple tables with a JOIN in SQL Server within a stored procedure

I have the task of creating a stored procedure that will count SalesOrderID and SalesOrderDetailID columns from three different tables by orderdate.
Table 1: Salesorderheader
Columns: SalesOrderID, OrderDate
Table 2: Salesorderdetail
Columns: SalesOrderID, SalesOrderDetailID
I am inner joining both tables on the SalesID column, but I keep getting the same count for both the Salesid and the salesdetailId column.
When I count both columns separately, I get the correct counts, but when I put them in a join I get the same count for both columns.
My question is why is SQL Server making the columns equal?
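When you count several columns in a single query, each COUNT(column) simply counts the non-NULL values in the joined rows. After the join every row has both an order ID and a detail ID, so both counts equal the number of joined rows. To get different counts per column, you need COUNT(DISTINCT ...) or sub-queries.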
If you want a count of salesid and salesdetailid:
select count(distinct soh.salesid), count(sod.salesdetailid)
from Salesorderheader soh join
Salesorderdetail sod
on soh.salesid = sod.salesid;
In general count(<column name>) counts the non-null values of the column. It does not count the number of distinct values.
The COUNT function counts all non-NULL values, including duplicates. In your case, SalesOrderID has duplicate values in the result set after the join with the SalesOrderDetail table. Either you would have to use
Count(Distinct SalesOrderID)
OR you can use a subquery like
select (select COUNT(SalesorderID) from Salesorderheader) as OrderIDCount,
(select COUNT(SalesorderDetailID) from SalesOrderDetail) as OrderDetailIDCount
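To compare the three variants directly, this sketch runs them on hypothetical rows in Python's sqlite3 module (standing in for SQL Server). Order 1 has three detail lines, order 2 has one, and order 3 has none, which is why the independent subqueries report one more header than the join does.

```python
import sqlite3

# Hypothetical data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesorderheader (SalesOrderID INTEGER, OrderDate TEXT);
    CREATE TABLE Salesorderdetail (SalesOrderID INTEGER, SalesOrderDetailID INTEGER);
    INSERT INTO Salesorderheader VALUES (1, '2024-01-01'), (2, '2024-01-01'), (3, '2024-01-02');
    INSERT INTO Salesorderdetail VALUES (1, 101), (1, 102), (1, 103), (2, 201);
""")

join_sql = """
    FROM Salesorderheader soh
    JOIN Salesorderdetail sod ON soh.SalesOrderID = sod.SalesOrderID
"""
# Both counts see the same four joined rows.
naive = conn.execute(
    "SELECT COUNT(soh.SalesOrderID), COUNT(sod.SalesOrderDetailID)" + join_sql
).fetchone()
# DISTINCT collapses the repeated header IDs.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT soh.SalesOrderID), COUNT(sod.SalesOrderDetailID)" + join_sql
).fetchone()
# Independent scalar subqueries; order 3 (no details) is counted too.
separate = conn.execute("""
    SELECT (SELECT COUNT(SalesOrderID) FROM Salesorderheader),
           (SELECT COUNT(SalesOrderDetailID) FROM Salesorderdetail)
""").fetchone()

print(naive)     # (4, 4)
print(distinct)  # (2, 4)
print(separate)  # (3, 4)
```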

PostgreSQL how to compare a column's numeric value against the SUM of another numeric value?

I am new to PostgreSQL and restored this database in order to practice my queries. It contains the following tables:
What is the best query to find how many orders have an order_total that is less than the sum of their line_total(s)?
This is the query I have but I doubt that my number is accurate. I feel like I am doing something wrong:
select COUNT(order_total) from orders
join order_lines
on orders.id = order_lines.order_id
having count(order_total) < sum(line_total)
Am I querying correctly or not?
Thanks
Pill
Perhaps something like this:
select o.id, o.order_total, sum(l.line_total) as sum_line_total
from orders o join order_lines l on o.id = l.order_id
group by o.id, o.order_total
having o.order_total < sum(l.line_total)
That answer gave more than one result. I experimented and came across the correct answer by using the following sub-query:
select count(*) from orders,
(Select order_id, sum(line_total) from order_lines group by order_id) a
where order_total < a.sum and order_id = orders.id;
Thanks though for the response
Pill
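For reference, here is a sketch that checks the derived-table approach against a GROUP BY on the order id, using Python's sqlite3 module and hypothetical data. Orders 1, 3 and 4 deliberately share the same order_total, which shows why grouping by order_total alone merges separate orders into one group.

```python
import sqlite3

# Hypothetical data; orders 1, 3 and 4 share the same order_total of 100.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_total INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, line_total INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 80), (3, 100), (4, 100);
    INSERT INTO order_lines VALUES
        (1, 60), (1, 50),   -- lines sum to 110, above the total of 100
        (2, 30), (2, 50),   -- lines sum to 80, equal to the total
        (3, 20),            -- lines sum to 20
        (4, 70), (4, 40);   -- lines sum to 110, above the total of 100
""")

# Aggregate the lines per order first, then compare and count.
derived = conn.execute("""
    SELECT count(*)
    FROM orders o
    JOIN (SELECT order_id, sum(line_total) AS sum_line_total
          FROM order_lines
          GROUP BY order_id) a ON a.order_id = o.id
    WHERE o.order_total < a.sum_line_total
""").fetchone()[0]

# Same result with GROUP BY on the order id plus HAVING, wrapped in a count.
grouped = conn.execute("""
    SELECT count(*)
    FROM (SELECT o.id
          FROM orders o
          JOIN order_lines l ON l.order_id = o.id
          GROUP BY o.id, o.order_total
          HAVING o.order_total < sum(l.line_total)) t
""").fetchone()[0]

# Grouping by order_total alone merges orders 1, 3 and 4 into one group.
by_total = conn.execute("""
    SELECT o.order_total, sum(l.line_total)
    FROM orders o
    JOIN order_lines l ON l.order_id = o.id
    GROUP BY o.order_total
    HAVING o.order_total < sum(l.line_total)
""").fetchall()

print(derived, grouped)  # 2 2
print(by_total)          # [(100, 240)]
```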