Aggregate after join without duplicates - sql

Consider this query:
select
count(p.id),
count(s.id),
sum(s.price)
from
(select * from orders where <condition>) as s,
(select * from products where <condition>) as p
where
s.id = p.order;
There are, for example, 200 records in products and 100 in orders (one order can contain one or more products).
I need to join them and then:
1) count products (should return 200)
2) count orders (should return 100)
3) sum one of the orders fields (should return the sum of 100 prices)
The problem is that after the join p and s have the same length. For 2) I can write count(distinct s.id), but for 3) I'm getting duplicates (for example, if an order has 2 products, its price is summed twice), so sum works on the entire 200-record set when it should only cover 100 records.
Any thoughts on how to sum only distinct records from the joined table without breaking the other aggregates?
For example, the joined table has
id sale price
0 0 4
0 0 4
1 1 3
2 2 4
2 2 4
2 2 4
So the sum(s.price) will return:
4+4+3+4+4+4=23
but I need:
4+3+4=11

If the products table is really more of an "order lines" table, then the query would make sense. You can do what you want in several ways. Here I'm going to suggest conditional aggregation:
select count(distinct p.id), count(distinct s.id),
sum(case when seqnum = 1 then s.price end)
from (select o.* from orders o where <condition>) s join
(select p.*, row_number() over (partition by p.order order by p.order) as seqnum
from products p
where <condition>
) p
on s.id = p.order;
Normally, a table called "products" would have one row per product, with things like a description and name. A table called something like "OrderLines" or "OrderProducts" or "OrderDetails" would have the products within a given order.
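To see the conditional aggregation at work, here is a minimal sketch that runs the same idea on the question's six-row example using Python's built-in sqlite3 module. The column name order_id (instead of the reserved word order), the toy rows, and the use of SQLite are assumptions for illustration only; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Toy data mirroring the question: three orders with 2, 1 and 3 product rows.
# order_id is a hypothetical column name, used because ORDER is a reserved word.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (0, 4), (1, 3), (2, 4);
    INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

row = conn.execute("""
    SELECT count(DISTINCT p.id),
           count(DISTINCT s.id),
           -- only the first product row of each order contributes the price
           sum(CASE WHEN p.seqnum = 1 THEN s.price END)
    FROM orders s
    JOIN (SELECT pr.*,
                 row_number() OVER (PARTITION BY pr.order_id ORDER BY pr.id) AS seqnum
          FROM products pr) p
      ON s.id = p.order_id
""").fetchone()

product_count, order_count, price_sum = row
print(product_count, order_count, price_sum)
```

Each order's price is counted exactly once because only the row numbered 1 within each order passes the CASE test; the other rows yield NULL, which sum ignores.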

You are not interested in single product records, but only in their number. So join the aggregate (one record per order) instead of the single rows:
select
count(*) as count_orders,
sum(p.cnt) as count_products,
sum(s.price)
from orders as s
join
(
select order, count(*) as cnt
from products
where <condition>
group by order
) as p on p.order = s.id
where <condition>;
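As a rough check of the pre-aggregation approach, the sketch below compares the naive join with the aggregated join on the question's example data. It uses Python's sqlite3 module; the order_id column name and the toy rows are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (0, 4), (1, 3), (2, 4);
    INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

# Naive join: every order row is repeated once per product row.
naive = conn.execute("""
    SELECT count(p.id), count(s.id), sum(s.price)
    FROM orders s
    JOIN products p ON s.id = p.order_id
""").fetchone()

# Aggregate products to one row per order first, then join.
fixed = conn.execute("""
    SELECT count(*)     AS count_orders,
           sum(p.cnt)   AS count_products,
           sum(s.price) AS sum_price
    FROM orders AS s
    JOIN (SELECT order_id, count(*) AS cnt
          FROM products
          GROUP BY order_id) AS p
      ON p.order_id = s.id
""").fetchone()

print("naive:", naive)
print("fixed:", fixed)
```

The naive version sums the price over six joined rows, while the aggregated version sees one row per order, so no DISTINCT is needed anywhere.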

Your main problem is with table design. You currently have no way of knowing the price of a product if there were no sales of it. Price should be in the product table: a product costs a certain price. Then you can count all the products of a sale and also get the total price of the sale.
Also, why are you using subqueries? Depending on the database, the optimizer may materialize the two subqueries and join them without using the indexes on the underlying tables. If your joins are that complicated, use views. In many databases they can be indexed.

Related

Finding the most frequently occurring combination

I have two tables named Orders and Products. The Orders table contains the orders made by customers, and the products included in each order are in the Products table.
My requirement is to get the total number of orders for the most frequently occurring products.
That is, for Product 1, Product 2 and Product 3, what is the total number of orders? If an order contains 10 products that include Product 1, Product 2 and Product 3, that order should be counted.
An order_id can have multiple products, and I'm confused about how to get this result. Can anyone share or suggest a solution?
I'm using PostgreSQL.
Below is the sample query:
SELECT
"Orders"."order_id",pr.product_name
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i On i."order_id"="Orders"."order_id"
LEFT join data.products pr on pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
ORDER BY "Orders"."order_id"
The desired result looks like this (3 columns): the most purchased product combination with the number of orders in which it occurs.
Product 1, Product 2, Product 3, etc., Number Of Orders
This is the sample data output. I need the list of products that are purchased together most often. (For now I have shown only 3 columns as a sample, but the number may vary with the number of products in an order.)
An example:
SELECT
"Orders"."order_id",
string_agg(DISTINCT pr.product_name::character varying, ',') AS product_name,
count(1) AS product_no
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i On i."order_id"="Orders"."order_id"
LEFT join data.products pr on pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
GROUP BY "Orders"."order_id"
ORDER BY count(1);
You can try to use a GROUP BY clause.
If you want the number of orders per product, count the orders grouped by product in the items table (the table that links orders to products), sorted so that the most frequent product comes first. The query should look something like this:
SELECT product_id, COUNT(DISTINCT order_id)
FROM data.items
GROUP BY product_id
ORDER BY COUNT(DISTINCT order_id) DESC
LIMIT 1;
Hope this helps!
Try GROUP BY and take the highest count, as below:
SELECT
pr.product_name,
COUNT(DISTINCT "Orders"."order_id")
FROM
"data"."orders" AS "Orders"
LEFT JOIN data.items i On i."order_id"="Orders"."order_id"
LEFT join data.products pr on pr."product_id"=i."product_id"
WHERE TO_CHAR("Orders"."created_at_order",'YYYY-MM-DD') BETWEEN '2019-02-01' AND '2019-04-30'
GROUP BY pr.product_name
ORDER BY COUNT(DISTINCT "Orders"."order_id") DESC
LIMIT 1 -- keep or drop the LIMIT as required
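The sketch below tries the per-product count on a few toy rows and then, as one possible reading of "most frequent combination", counts identical product sets per order in Python. The table layout follows the question's query, but the sample rows, the use of SQLite instead of PostgreSQL, and the interpretation of a combination as the full set of products in an order are all assumptions.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE items    (order_id INTEGER, product_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO products VALUES (10, 'Product 1'), (20, 'Product 2'), (30, 'Product 3');
    -- Product 1 appears twice in order 1 to show why DISTINCT matters.
    INSERT INTO items VALUES (1, 10), (1, 10), (1, 20), (2, 10), (2, 30), (3, 20), (3, 10);
""")

# Most frequently ordered single product, counting each order once.
top_product = conn.execute("""
    SELECT pr.product_name, COUNT(DISTINCT o.order_id) AS order_count
    FROM orders o
    JOIN items i     ON i.order_id = o.order_id
    JOIN products pr ON pr.product_id = i.product_id
    GROUP BY pr.product_name
    ORDER BY order_count DESC
    LIMIT 1
""").fetchone()

# Most frequent combination: collect the sorted product names of each order,
# then count how many orders share exactly the same set.
rows = conn.execute("""
    SELECT DISTINCT i.order_id, pr.product_name
    FROM items i
    JOIN products pr ON pr.product_id = i.product_id
    ORDER BY i.order_id, pr.product_name
""").fetchall()
baskets = {}
for order_id, name in rows:
    baskets.setdefault(order_id, []).append(name)
combo_counts = Counter(tuple(names) for names in baskets.values())
top_combo, top_combo_orders = combo_counts.most_common(1)[0]

print(top_product)
print(top_combo, top_combo_orders)
```

In PostgreSQL the second step could probably stay in SQL by grouping on an ordered string_agg per order; it is done in Python here only to keep the sketch portable.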

Select most Occurred Value SQL with Inner Join

I am using this query to get the following data from different linked tables. But let's say there were three vendors for an item. In the result I want to show the vendor that occurred most. I mean, if item ABC was supplied by 3 different vendors many times, I want to get the vendor who supplied item ABC the most times.
My query is this:
use iBusinessFlex;
SELECT Items.Name,
Max(Items.ItemID) as ItemID ,
MAX(Items.Description)as Description,
MAX(ItemsStock.CurrentPrice) as UnitPrice,
MAX(ItemsStock.Quantity) as StockQuantiity,
MAX(Vendors.VendorName) as VendorName,
SUM(ItemReceived.Quantity) as TotalQuantity
From ItemReceived
INNER JOIN Items ON ItemReceived.ItemId=Items.ItemID
INNER JOIN ItemsStock ON ItemReceived.ItemId=ItemsStock.ItemID
INNER JOIN PurchaseInvoices ON PurchaseInvoices.PurchaseInvoiceId = ItemReceived.PurchaseInvoiceId
INNER JOIN Vendors ON Vendors.VendorId = PurchaseInvoices.VendorId
Group By Items.Name
EDIT: I have included this subquery, but I am not sure it shows the correct result, i.e. for each item, the vendor who provided that item the most times.
use iBusinessFlex;
SELECT Items.Name,
Max(Items.ItemID) as ItemID ,
MAX(Items.Description)as Description,MAX(ItemsStock.CurrentPrice) as UnitPrice,
MAX(ItemsStock.Quantity) as StockQuantiity,MAX(Vendors.VendorName) as VendorName,
SUM(ItemReceived.Quantity) as TotalQuantity
From ItemReceived
INNER JOIN Items ON ItemReceived.ItemId=Items.ItemID INNER JOIN ItemsStock
ON ItemReceived.ItemId=ItemsStock.ItemID INNER JOIN PurchaseInvoices
ON PurchaseInvoices.PurchaseInvoiceId = ItemReceived.PurchaseInvoiceId INNER JOIN Vendors
ON Vendors.VendorId IN (
SELECT Top 1 MAX(PurchaseInvoices.VendorId) as VendorOccur
FROM PurchaseInvoices INNER JOIN Vendors ON Vendors.VendorId=PurchaseInvoices.VendorId
GROUP BY PurchaseInvoices.VendorId
ORDER BY COUNT(*) DESC
)
Group By Items.Name
And the Result Looks like this.
First, I would start with which vendor supplied which item the most. But "most" is based on what: the most quantity? Price? Number of times? If you order from one vendor 6 times with a quantity of 10, you have 60 items. But if you order once from another vendor with a quantity of 100, which one wins? You have to decide the basis of "most", but I will go by most times, per your original question.
All the data comes from PurchaseInvoices, which has a vendor ID. For the counts I only need the vendor's ID, not the vendor's details, and I don't need the item name either. The query below shows, per item, each vendor with the number of times ordered and the quantity ordered. I added the Items and Vendors table joins just to show the names.
select
IR.ItemID,
PI.VendorID,
max( I.Name ) Name,
max( V.VendorName ) VendorName,
count(*) as TimesOrderedFrom,
SUM( IR.Quantity ) as QuantityFromVendor
from
ItemReceived IR
JOIN PurchaseInvoices PI
on IR.PurchaseInvoiceID = PI.PurchaseInvoiceID
JOIN Items I
on IR.ItemID = I.ItemID
JOIN Vendors V
on PI.VendorID = V.VendorID
group by
IR.ItemID,
PI.VendorID
order by
-- Per item
IR.ItemID,
-- Most count ordered first
count(*) DESC,
-- If multiple vendors, same count, get total quantity
sum( IR.Quantity ) DESC
Now, to get only one vendor per item, turn this into a correlated subquery and add 'TOP 1' so it returns only the first row in that order. SQL Server does not let a derived table in a plain JOIN reference the outer query, so the subquery is attached with CROSS APPLY. Since the aggregate of count is already done, you can then get the vendor contact info.
select
I.Name,
V.VendorName,
TopVendor.TimesOrderedFrom,
TopVendor.QuantityFromVendor
from
Items I
CROSS APPLY ( select TOP 1
IR.ItemID,
PI.VendorID,
count(*) as TimesOrderedFrom,
SUM( IR.Quantity ) as QuantityFromVendor
from
ItemReceived IR
JOIN PurchaseInvoices PI
on IR.PurchaseInvoiceID = PI.PurchaseInvoiceID
where
-- correlated subquery based on the outer-most item
IR.ItemID = I.ItemID
group by
IR.ItemID,
PI.VendorID
order by
-- Most count ordered first
count(*) DESC,
-- If multiple vendors, same count, get total quantity
sum( IR.Quantity ) DESC ) TopVendor
JOIN Vendors V
on TopVendor.VendorID = V.VendorID
No sense in having the INNER Subquery joining on the vendor and items just for the names. Get those once and only at the end when the top vendor is selected.
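The "top vendor per item" idea can be tried out with a small sketch. It uses Python's sqlite3 module, so TOP 1 becomes LIMIT 1 and the correlated subquery sits in the select list rather than in an APPLY; the sample rows and vendor names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items            (ItemID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Vendors          (VendorId INTEGER PRIMARY KEY, VendorName TEXT);
    CREATE TABLE PurchaseInvoices (PurchaseInvoiceId INTEGER PRIMARY KEY, VendorId INTEGER);
    CREATE TABLE ItemReceived     (ItemId INTEGER, PurchaseInvoiceId INTEGER, Quantity INTEGER);
    INSERT INTO Items VALUES (1, 'ABC'), (2, 'XYZ');
    INSERT INTO Vendors VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO PurchaseInvoices VALUES (100, 1), (101, 1), (102, 2), (103, 2);
    -- ABC: Acme twice (10 + 10), Globex once (100). XYZ: Globex once.
    INSERT INTO ItemReceived VALUES (1, 100, 10), (1, 101, 10), (1, 102, 100), (2, 103, 5);
""")

top_vendors = conn.execute("""
    SELECT I.Name,
           (SELECT V.VendorName
            FROM ItemReceived IR
            JOIN PurchaseInvoices Inv ON Inv.PurchaseInvoiceId = IR.PurchaseInvoiceId
            JOIN Vendors V            ON V.VendorId = Inv.VendorId
            WHERE IR.ItemId = I.ItemID            -- correlate to the outer item
            GROUP BY Inv.VendorId, V.VendorName
            ORDER BY COUNT(*) DESC,               -- most times ordered first
                     SUM(IR.Quantity) DESC        -- tie-break on total quantity
            LIMIT 1) AS TopVendor
    FROM Items I
    ORDER BY I.ItemID
""").fetchall()

print(top_vendors)
```

With "most times" as the basis, Acme wins for item ABC even though Globex delivered the larger quantity; swapping the two ORDER BY terms would flip that result.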

Subquery amount not coming in full

I have this query:
Select I.Invoice_Number, PA.Invoice_Number, I.Line_Amount, PA.Invoiced_Amount
from XXX as PA
Left join (select Invoice_Number, Line_Amount from Invoices) as I
on PA.Invoice_Number = I.Invoice_Number
Group by PA.Invoice_Number;
Both should give me the same cost (I.Line_Amount = PA.Invoiced_Amount) per Invoice_Number, yet I.Line_Amount only brings the first row on the list, while PA.Invoiced_Amount brings the sum of the cost on that invoice.
I tried using sum(Line_Amount) within the subquery, but all records come out as NULL.
Is there a way to join both tables and make sure that the amounts per invoice match the total amount of that invoice?
Thanks!!
If I understand you correctly (and you want to make sure that sum of Line_Amount in Invoices table is the same as Invoiced_Amount in XXX table) the second table should have invoice number and sum of amounts:
select I.Invoice_Number, PA.Invoice_Number, I.total, PA.Invoiced_Amount
from XXX as PA
left join (
select Invoice_Number, sum(Line_Amount) as total
from Invoices
group by Invoice_Number
) as I
on PA.Invoice_Number = I.Invoice_Number
You can try it here: http://sqlfiddle.com/#!9/d1d010/1/0
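A small local check of the same query, as a sketch with Python's sqlite3 module. The table name XXX comes from the question; the invoice numbers and amounts are invented so that one invoice matches and one does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE XXX      (Invoice_Number TEXT, Invoiced_Amount INTEGER);
    CREATE TABLE Invoices (Invoice_Number TEXT, Line_Amount INTEGER);
    INSERT INTO XXX VALUES ('A1', 30), ('A2', 50);
    -- A1 lines add up to 30 (match); A2 lines add up to 45 (mismatch).
    INSERT INTO Invoices VALUES ('A1', 10), ('A1', 20), ('A2', 45);
""")

rows = conn.execute("""
    SELECT PA.Invoice_Number, I.total, PA.Invoiced_Amount
    FROM XXX AS PA
    LEFT JOIN (SELECT Invoice_Number, sum(Line_Amount) AS total
               FROM Invoices
               GROUP BY Invoice_Number) AS I
      ON PA.Invoice_Number = I.Invoice_Number
    ORDER BY PA.Invoice_Number
""").fetchall()

# Invoices whose summed lines differ from the invoiced amount.
mismatched = [number for number, total, invoiced in rows if total != invoiced]

print(rows)
print(mismatched)
```

Because the subquery returns one row per invoice, the outer query needs no GROUP BY, and the two amounts can be compared row by row.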

Rails/SQL: finding invoices by checking two sums

I have an Invoice model that has_many lines and has_many payments.
Invoice:
id
ref
Line:
invoice_id:
total (decimal)
Payment:
invoice_id:
total(decimal)
I need to find all paid invoices. So I'm doing the following:
Invoice.joins(:lines, :payments).having('sum(lines.total) = sum(payments.total)').group('invoices.id')
Which queries:
SELECT *
FROM "invoices"
INNER JOIN "lines" ON "lines"."invoice_id" = "invoices"."id"
INNER JOIN "payments" ON "payments"."invoice_id" = "invoices"."id"
GROUP BY invoices.id
HAVING sum(lines.total) = sum(payments.total)
But it always returns an empty array even if there are fully paid invoices.
Is something wrong with my code?
If you join to more than one table with a 1:n relationship, the joined rows can multiply each other.
This related answer has more detailed explanation for the problem:
Two SQL LEFT JOINS produce incorrect result
To avoid that, sum the totals before you join. This way you join to exactly 1 (or 0) rows, and nothing is multiplied. Not only correct, also considerably faster.
SELECT i.*, l.sum_total
FROM invoices i
JOIN (
SELECT invoice_id, sum(total) AS sum_total
FROM lines
GROUP BY 1
) l ON l.invoice_id = i.id
JOIN (
SELECT invoice_id, sum(total) AS sum_total
FROM payments
GROUP BY 1
) p ON p.invoice_id = i.id
WHERE l.sum_total = p.sum_total;
Using [INNER] JOIN, not LEFT [OUTER] JOIN on purpose. Invoices that do not have any lines or payments are not of interest to begin with. Since we want "paid" invoices. For lack of definition and by the looks of the provided query, I am assuming that means invoices with actual lines and payments, both totaling the same.
If an invoice has one line and two payments that pay it in full, like this:
lines:
id total invoice_id
1 30 1
payments:
id total invoice_id
1 10 1
2 20 1
Then joining lines and payments to invoices on invoice_id yields 2 rows like this:
payment_id payment_total line_id line_total invoice_id
1 10 1 30 1
2 20 1 30 1
So the sum of line_total (60) will not equal the sum of payment_total (30).
To get all paid invoices, you could use exists instead of joins:
Invoice.where(
"exists
(select 1 from
(select l.invoice_id
from (select invoice_id,sum(total) as line_total
from lines
group by invoice_id) as l
inner join (select invoice_id,sum(total) as payment_total
from payments
group by invoice_id) as p
on l.invoice_id = p.invoice_id
where payment_total = line_total) as paid
where invoices.id = paid.invoice_id) ")
The subquery paid returns all paid invoice_ids.
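The row multiplication described in this thread is easy to reproduce. The sketch below runs the original HAVING query and the pre-aggregated version side by side with Python's sqlite3 module; invoice 1 uses the rows from the example above, and invoice 2 (partly paid) is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    CREATE TABLE lines    (id INTEGER PRIMARY KEY, total INTEGER, invoice_id INTEGER);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, total INTEGER, invoice_id INTEGER);
    INSERT INTO invoices VALUES (1), (2);
    INSERT INTO lines    VALUES (1, 30, 1), (2, 50, 2);
    -- Invoice 1 is fully paid (10 + 20); invoice 2 is only partly paid.
    INSERT INTO payments VALUES (1, 10, 1), (2, 20, 1), (3, 20, 2);
""")

# Joining both 1:n tables at once repeats the single line of invoice 1
# for each payment, so sum(l.total) becomes 60 instead of 30.
naive_paid = [r[0] for r in conn.execute("""
    SELECT i.id
    FROM invoices i
    JOIN lines l    ON l.invoice_id = i.id
    JOIN payments p ON p.invoice_id = i.id
    GROUP BY i.id
    HAVING sum(l.total) = sum(p.total)
""")]

# Summing each table before the join keeps one row per invoice.
paid = [r[0] for r in conn.execute("""
    SELECT i.id
    FROM invoices i
    JOIN (SELECT invoice_id, sum(total) AS sum_total
          FROM lines GROUP BY invoice_id) l ON l.invoice_id = i.id
    JOIN (SELECT invoice_id, sum(total) AS sum_total
          FROM payments GROUP BY invoice_id) p ON p.invoice_id = i.id
    WHERE l.sum_total = p.sum_total
""")]

print(naive_paid)
print(paid)
```

The naive query finds nothing, which matches the empty array reported in the question, while the pre-aggregated query returns the fully paid invoice.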

SQL SUM, COUNT for only unique id

I want to calculate sum and count for only unique ids.
SELECT COUNT(orders.id), SUM(orders.total), SUM(orders.shipping) FROM "orders"
INNER JOIN "designer_orders" ON "designer_orders"."order_id" = "orders"."id"
WHERE (designer_orders.state = 'pending' OR
designer_orders.state = 'dispatched' OR
designer_orders.state = 'completed')
Do this only for unique orders ids.
Add orders.total only if orders.id is unique. Same goes for shipping.
Avoid adding duplicates.
For example, orders table inner joined designer_orders table:
OrderId Total Some designer order column
1 1000 2
1 1000 3
1 1000 5
2 100 7
3 133 8
4 1000 10
4 1000 20
In this case:
count of orders should be 4.
total of orders should be 2233.
Schema:
One order has many designer orders.
One designer order has only one order.
Try it this way
SELECT COUNT(o.id) no_of_orders,
SUM(o.total) total,
SUM(o.shipping) shipping
FROM orders o JOIN
(
SELECT DISTINCT order_id
FROM designer_orders
WHERE state IN('pending', 'dispatched', 'completed')
) d
ON o.id = d.order_id
Here is SQLFiddle demo
Since you are only interested in whether any row with a qualifying state exists in the table designer_orders, the most obvious query style would be an EXISTS semi-join. It is typically fastest when there are potentially many duplicate rows on the n-side of the relationship:
SELECT COUNT(o.id) AS no_of_orders
,SUM(o.total) AS total
,SUM(o.shipping) AS shipping
FROM orders o
WHERE EXISTS (
SELECT 1
FROM designer_orders d
WHERE d.state = ANY('{pending, dispatched, completed}')
AND d.order_id = o.id
);
-> SQLfiddle demo
For fast SELECT queries with bigger tables (and at some cost for write performance), you would have a partial index like:
CREATE INDEX designer_orders_order_id_idx ON designer_orders (order_id)
WHERE state = ANY('{pending, dispatched, completed}');
The index condition must match the WHERE condition of the query to talk the query planner into actually using the index.
A partial index is particularly attractive if there are many rows with a status that does not qualify. Else, an index without condition might be the better choice overall.
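The EXISTS semi-join and the partial index can both be tried on the question's sample rows. This is a sketch using Python's sqlite3 module, so the PostgreSQL ANY('{...}') array syntax is replaced by IN (...); the shipping values, the designer_orders ids, and the extra cancelled order are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER, shipping INTEGER);
    CREATE TABLE designer_orders (id INTEGER PRIMARY KEY, order_id INTEGER, state TEXT);
    INSERT INTO orders VALUES (1, 1000, 10), (2, 100, 5), (3, 133, 5),
                              (4, 1000, 10), (5, 500, 5);
    INSERT INTO designer_orders VALUES
        (2, 1, 'pending'), (3, 1, 'dispatched'), (5, 1, 'completed'),
        (7, 2, 'pending'), (8, 3, 'completed'),
        (10, 4, 'pending'), (20, 4, 'dispatched'),
        (30, 5, 'cancelled');
    -- Partial index covering only the qualifying states.
    CREATE INDEX designer_orders_order_id_idx ON designer_orders (order_id)
        WHERE state IN ('pending', 'dispatched', 'completed');
""")

# Plain join: each order is repeated once per qualifying designer order.
joined = conn.execute("""
    SELECT COUNT(o.id), SUM(o.total)
    FROM orders o
    JOIN designer_orders d ON d.order_id = o.id
    WHERE d.state IN ('pending', 'dispatched', 'completed')
""").fetchone()

# Semi-join: each order is counted once if any qualifying row exists.
semi = conn.execute("""
    SELECT COUNT(o.id), SUM(o.total), SUM(o.shipping)
    FROM orders o
    WHERE EXISTS (SELECT 1
                  FROM designer_orders d
                  WHERE d.state IN ('pending', 'dispatched', 'completed')
                    AND d.order_id = o.id)
""").fetchone()

print("join:  ", joined)
print("exists:", semi)
```

Whether a planner actually picks the partial index depends on the engine and the data, so that part is best confirmed with the database's own EXPLAIN output.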