SQL SUM, COUNT for only unique id - sql

I want to calculate sum and count for only unique ids.
SELECT COUNT(orders.id), SUM(orders.total), SUM(orders.shipping) FROM "orders"
INNER JOIN "designer_orders" ON "designer_orders"."order_id" = "orders"."id"
WHERE (designer_orders.state = 'pending' OR
designer_orders.state = 'dispatched' OR
designer_orders.state = 'completed')
Do this only for unique order ids.
Add orders.total only once per unique orders.id. The same goes for shipping.
Avoid counting duplicates.
For example, orders table inner joined designer_orders table:
OrderId Total Some designer order column
1 1000 2
1 1000 3
1 1000 5
2 100 7
3 133 8
4 1000 10
4 1000 20
In this case:
count of orders should be 4.
total of orders should be 2233.
Schema:
One order has many designer orders.
One designer order has only one order.

Try it this way
SELECT COUNT(o.id) no_of_orders,
SUM(o.total) total,
SUM(o.shipping) shipping
FROM orders o JOIN
(
SELECT DISTINCT order_id
FROM designer_orders
WHERE state IN('pending', 'dispatched', 'completed')
) d
ON o.id = d.order_id
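To check the result against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The shipping values, the designer order ids and states, and the extra order 5 (which has only a 'cancelled' designer order) are assumptions made up for illustration; only the ids and totals come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER, shipping INTEGER);
CREATE TABLE designer_orders (id INTEGER PRIMARY KEY, order_id INTEGER, state TEXT);
-- totals from the question; shipping and order 5 are made-up values
INSERT INTO orders VALUES (1, 1000, 10), (2, 100, 10), (3, 133, 10), (4, 1000, 10), (5, 500, 10);
INSERT INTO designer_orders VALUES
  (2, 1, 'pending'), (3, 1, 'dispatched'), (5, 1, 'completed'),
  (7, 2, 'pending'), (8, 3, 'completed'),
  (10, 4, 'pending'), (20, 4, 'dispatched'),
  (30, 5, 'cancelled');
""")

# Plain join: every order is repeated once per matching designer order.
naive = conn.execute("""
SELECT COUNT(o.id), SUM(o.total), SUM(o.shipping)
FROM orders o
JOIN designer_orders d ON d.order_id = o.id
WHERE d.state IN ('pending', 'dispatched', 'completed')
""").fetchone()

# Join against the DISTINCT order ids: every order appears at most once.
deduplicated = conn.execute("""
SELECT COUNT(o.id), SUM(o.total), SUM(o.shipping)
FROM orders o
JOIN (SELECT DISTINCT order_id
      FROM designer_orders
      WHERE state IN ('pending', 'dispatched', 'completed')) d
  ON o.id = d.order_id
""").fetchone()

print(naive)         # (7, 5233, 70)
print(deduplicated)  # (4, 2233, 40)
```

The second result matches the count of 4 and total of 2233 that the question asks for.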

Since you are only interested in whether any row with a qualifying status exists in the table designer_orders, the most obvious query style would be an EXISTS semi-join. This is typically fastest when the n-side table (designer_orders) may hold many rows per order:
SELECT COUNT(o.id) AS no_of_orders
,SUM(o.total) AS total
,SUM(o.shipping) AS shipping
FROM orders o
WHERE EXISTS (
SELECT 1
FROM designer_orders d
WHERE d.state = ANY('{pending, dispatched, completed}')
AND d.order_id = o.id
);
For fast SELECT queries with bigger tables (and at some cost for write performance), you would have a partial index like:
CREATE INDEX designer_orders_order_id_idx ON designer_orders (order_id)
WHERE state = ANY('{pending, dispatched, completed}');
The index condition must match the WHERE condition of the query to talk the query planner into actually using the index.
A partial index is particularly attractive if there are many rows with a status that does not qualify. Else, an index without condition might be the better choice overall.

Related

Returning complex query on update sql

I want to return query with multiple joins and with clause after updating something.
For example my query is:
WITH orders AS (
SELECT product_id, SUM(amount) AS orders
FROM orders_summary
GROUP BY product_id
)
SELECT p.id, p.name,
p.date_of_creation,
o.orders, s.id AS store_id,
s.name AS store_name
FROM products AS p
LEFT JOIN orders AS o
ON p.id = o.product_id
LEFT JOIN stores AS s
ON s.id = p.store_id
WHERE p.id = '1'
id name date orders store_id store_name
1 pen 11/16/2022 10 1 jj
2 pencil 11/10/2022 30 2 ff
I want to return the exact query but with updated result in my update:
UPDATE products
SET name = 'ABC'
WHERE id = '1'
RETURNING up_query
Desired result on update:
id name date orders store_id store_name
1 ABC 11/16/2022 10 1 jj
You can try UPDATE products ... RETURNING *. That returns the content of the row you just updated.
As for UPDATE ... RETURNING someQuery, You Can't Do That™. You want to do both the update and a full SELECT with joins in one go, and a RETURNING clause is not designed for that.
If you must be sure your SELECT works on precisely the same data you just UPDATEd, you can wrap your two queries in a BEGIN; / COMMIT; transaction. Other sessions then cannot modify the row you updated between your UPDATE and your SELECT, because it stays locked until you commit.
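The UPDATE-then-SELECT pattern inside one transaction might look like the following sketch, written with Python's sqlite3 module. The table contents are assumptions that mirror the sample rows in the question (the two orders_summary amounts for product 1 are invented so that they sum to 10), and the query is the one from the question with the id passed as a parameter.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN / COMMIT below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                       date_of_creation TEXT, store_id INTEGER);
CREATE TABLE orders_summary (product_id INTEGER, amount INTEGER);
INSERT INTO stores VALUES (1, 'jj'), (2, 'ff');
INSERT INTO products VALUES (1, 'pen', '11/16/2022', 1), (2, 'pencil', '11/10/2022', 2);
INSERT INTO orders_summary VALUES (1, 4), (1, 6), (2, 30);
""")

report = """
WITH orders AS (
  SELECT product_id, SUM(amount) AS orders
  FROM orders_summary
  GROUP BY product_id
)
SELECT p.id, p.name, p.date_of_creation, o.orders,
       s.id AS store_id, s.name AS store_name
FROM products AS p
LEFT JOIN orders AS o ON p.id = o.product_id
LEFT JOIN stores AS s ON s.id = p.store_id
WHERE p.id = ?
"""

conn.execute("BEGIN")
conn.execute("UPDATE products SET name = 'ABC' WHERE id = ?", (1,))
row = conn.execute(report, (1,)).fetchone()  # sees the uncommitted update
conn.execute("COMMIT")

print(row)  # (1, 'ABC', '11/16/2022', 10, 1, 'jj')
```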

SQL - Guarantee at least n unique users with 2 appearances each in query

I'm working with AWS Personalize, and one of the service quotas is to have "At least 1000 records containing a min of 25 unique users with at least 2 records each". I know my raw data has those numbers, but I'm trying to find a way to guarantee that they will always be met, even if the query is run by someone else in the future.
The easy way out would be to just use the full dataset, but right now we are working towards a POC, so that is not really my first option. I have covered the "two records each" part by counting the appearances, but I don't know how to guarantee the minimum of 25 users.
It is important to note that my data is not shuffled in any way when it is saved.
My query
SELECT C.productid AS ITEM_ID,
A.userid AS USER_ID,
A.createdon AS "TIMESTAMP",
B.fromaddress_countryname AS "LOCATION"
FROM A AS orders
JOIN B AS sub_orders ON orders.order_id = sub_orders.order_id
JOIN C AS order_items ON orders.order_id = order_items.order_id
WHERE orders.userid IN (
SELECT orders.userid
FROM A AS ORDERS
GROUP BY orders.userid
HAVING count(*) > 2
)
LIMIT 10
I use the LIMIT to just query a subset since I'm in AWS Athena.
The IN subquery can be inefficient, since in the worst case each row has to be compared with every element the subquery returns to find a match.
It would be easier to start by storing all users with at least 2 records in a common table expression (CTE) and join to it to select them.
To ensure at least 25 distinct users, you need a window function that counts the distinct users seen so far, plus a condition on that count. Since you can't use a window function in the WHERE clause, you need a second CTE and a final query that filters it.
For example:
with users as (
select userid
from A
group by userid
having count(*) > 1 -- this condition ensures at least 2 records
),
cte as (
SELECT order_items.productid AS ITEM_ID,
orders.userid AS USER_ID,
orders.createdon AS "TIMESTAMP",
sub_orders.fromaddress_countryname AS "LOCATION",
-- running count of distinct users; the ORDER BY belongs inside the window,
-- an ORDER BY on the CTE itself does not define the window order.
-- If your engine rejects DISTINCT here, dense_rank() over (order by orders.userid)
-- yields the same running count.
count(distinct orders.userid) over (order by orders.userid rows between unbounded preceding and current row) as n_distinct_users
FROM A AS orders
JOIN B AS sub_orders ON orders.order_id = sub_orders.order_id
JOIN C AS order_items ON orders.order_id = order_items.order_id
JOIN users ON orders.userid = users.userid -- only users with at least 2 records
)
select * from cte where n_distinct_users < 26
Ordering the window by userid keeps each user's rows together, so every user in the result appears with all of their records, that is, with at least 2.
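The two-CTE idea can be tried out locally with the sketch below, which uses Python's sqlite3 module (window functions need SQLite 3.25 or newer). It uses dense_rank() as the running distinct-user count because not every engine accepts DISTINCT inside a window function. The single orders table and its generated rows are hypothetical stand-ins for the real tables A, B and C.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER)")

# Hypothetical data: users 1..60, even-numbered users have 3 orders,
# odd-numbered users have only 1 and must be filtered out.
rows = [(u,) for u in range(1, 61) for _ in range(3 if u % 2 == 0 else 1)]
conn.executemany("INSERT INTO orders (userid) VALUES (?)", rows)

result = conn.execute("""
WITH users AS (
  SELECT userid
  FROM orders
  GROUP BY userid
  HAVING COUNT(*) > 1              -- at least 2 records per user
),
ranked AS (
  SELECT o.order_id, o.userid,
         DENSE_RANK() OVER (ORDER BY o.userid) AS n_distinct_users
  FROM orders o
  JOIN users u ON u.userid = o.userid
)
SELECT order_id, userid
FROM ranked
WHERE n_distinct_users <= 25       -- all rows of the first 25 qualifying users
""").fetchall()

per_user = Counter(userid for _, userid in result)
print(len(per_user), min(per_user.values()), len(result))  # 25 3 75
```

Because the filter is on the user rank rather than on a row LIMIT, no user is cut off partway through their records.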

Why join query give different result than subquery?

I am learning PostgreSQL and working with the Northwind database.
Now I am testing JOIN and a subquery with ANY.
I want to select every product_name that was ordered in a quantity of exactly 10 (column quantity from order_details).
So I have 2 different queries:
SELECT product_name FROM products
WHERE product_id = ANY(
SELECT product_id FROM order_details
WHERE quantity = 10
)
and
SELECT products.product_name FROM products
JOIN order_details ON order_details.product_id = products.product_id
WHERE order_details.quantity = 10
But they are giving different results!
The first one gives only 60 rows.
And the second one gives 181 rows.
Why is that and which result is right?
The first query will output each products row at most once.
The second query can have several result rows for a single products row: one for each matching order_details row.
Which of the queries is better depends on your requirements.
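The difference is easy to reproduce on a tiny data set. The sketch below uses Python's sqlite3 module with made-up rows (not the real Northwind data) and writes the subquery with IN, which is equivalent to = ANY(subquery) and is the form SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES (1, 'Chai'), (2, 'Chang');
-- 'Chai' was ordered with quantity 10 in two different orders
INSERT INTO order_details VALUES (100, 1, 10), (101, 1, 10), (102, 1, 5), (103, 2, 10);
""")

semi_join = conn.execute("""
SELECT product_name FROM products
WHERE product_id IN (SELECT product_id FROM order_details WHERE quantity = 10)
ORDER BY product_name
""").fetchall()

plain_join = conn.execute("""
SELECT products.product_name FROM products
JOIN order_details ON order_details.product_id = products.product_id
WHERE order_details.quantity = 10
ORDER BY products.product_name
""").fetchall()

print(semi_join)   # [('Chai',), ('Chang',)]
print(plain_join)  # [('Chai',), ('Chai',), ('Chang',)]
```

Adding DISTINCT to the join query would bring it back to one row per product name.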

How can I sum data from multiple rows which have the same foreign key into one row?

I have two tables in a 1-to-n relation. One table contains the general information of a bill (named bill), and the other contains the items on the bill (named items). I want a query that lists all bills and sums up the prices of their items in a new column. But of course I want every bill listed just once, not once per item.
Usually I don't post anything, but I can't find an answer because I don't know how to search for this problem. Sorry if this is obvious.
What my tables look like:
bill:
bill_id - customer - date
items:
item_id - bill_id - amount - price
A simple join with aggregation should work here:
SELECT
b.bill_id,
COALESCE(SUM(i.price), 0) AS total_price
FROM bill b
LEFT JOIN items i
ON b.bill_id = i.bill_id
GROUP BY
b.bill_id;
If you want to include the other two columns from the bill table, then just add them to the SELECT and GROUP BY clauses.
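As a quick check of that query, here is a small sketch with Python's sqlite3 module. The customers, dates and prices are invented for illustration; bill 2 deliberately has no items, to show what the LEFT JOIN and COALESCE are for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bill (bill_id INTEGER PRIMARY KEY, customer TEXT, date TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, bill_id INTEGER,
                    amount INTEGER, price INTEGER);
INSERT INTO bill VALUES (1, 'alice', '2024-01-01'), (2, 'bob', '2024-01-02');
INSERT INTO items VALUES (1, 1, 1, 10), (2, 1, 2, 15);  -- bill 2 has no items
""")

totals = conn.execute("""
SELECT b.bill_id, COALESCE(SUM(i.price), 0) AS total_price
FROM bill b
LEFT JOIN items i ON b.bill_id = i.bill_id
GROUP BY b.bill_id
ORDER BY b.bill_id
""").fetchall()

print(totals)  # [(1, 25), (2, 0)]
```

With an INNER JOIN bill 2 would disappear from the result, and without COALESCE its total would be NULL instead of 0.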
You may try this.
; with cte as (
select b.bill_id, i.item_id ,isnull(i.price,0) as Price from
Bill as b inner join items as i on b.bill_id =i.bill_id
union all
select b.bill_id , null, sum(isnull(i.price,0)) from
Bill as b inner join items as i on b.bill_id =i.bill_id
group by b.bill_id
)
select * from cte order by bill_id, item_id desc

Aggregate after join without duplicates

Consider this query:
select
count(p.id),
count(s.id),
sum(s.price)
from
(select * from orders where <condition>) as s,
(select * from products where <condition>) as p
where
s.id = p.order;
There are, for example, 200 records in products and 100 in orders (one order can contain one or more products).
I need to join them and then:
count products (should return 200)
count orders (should return 100)
sum by one of orders field (should return sum by 100 prices)
The problem is that after the join, p and s have the same length. For 2) I can write count(distinct s.id), but for 3) I'm getting duplicates (for example, if a sale has 2 products, its price is summed twice), so the sum runs over the entire set of 200 records when it should cover only 100.
Any thoughts on how to sum only distinct records from the joined table without breaking the other aggregates?
Example, joined table has
id sale price
0 0 4
0 0 4
1 1 3
2 2 4
2 2 4
2 2 4
So the sum(s.price) will return:
4+4+3+4+4+4=23
but I need:
4+3+4=11
If the products table is really more of an "order lines" table, then the query would make sense. You can do what you want in several ways. Here I'm going to suggest conditional aggregation:
select count(distinct p.id), count(distinct s.id),
sum(case when seqnum = 1 then s.price end)
from (select o.* from orders o where <condition>) s join
(select p.*, row_number() over (partition by p.order order by p.order) as seqnum
from products p
where <condition>
) p
on s.id = p.order;
Normally, a table called "products" would have one row per product, with things like a description and name. A table called something like "OrderLines" or "OrderProducts" or "OrderDetails" would have the products within a given order.
You are not interested in single product records, but only in their number. So join the aggregate (one record per order) instead of the single rows:
select
count(*) as count_orders,
sum(p.cnt) as count_products,
sum(s.price)
from orders as s
join
(
select order, count(*) as cnt
from products
where <condition>
group by order
) as p on p.order = s.id
where <condition>;
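Run against the example rows from the question, the aggregate-then-join approach can be sketched with Python's sqlite3 module as follows. The column is named order_id here instead of order (a reserved word) and the <condition> filters are left out; both are simplifications for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (0, 4), (1, 3), (2, 4);
-- order 0 has 2 products, order 1 has 1, order 2 has 3
INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

summary = conn.execute("""
SELECT COUNT(*)     AS count_orders,
       SUM(p.cnt)   AS count_products,
       SUM(s.price) AS sum_price
FROM orders AS s
JOIN (
  SELECT order_id, COUNT(*) AS cnt   -- one row per order
  FROM products
  GROUP BY order_id
) AS p ON p.order_id = s.id
""").fetchone()

print(summary)  # (3, 6, 11)
```

Since the derived table has exactly one row per order, each order's price enters the sum once: 4+3+4=11.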
Your main problem is with the table design. You currently have no way of knowing the price of a product if there were no sales of it. The price should be in the product table, since a product costs a certain price. Then you can count all the products of a sale and also get the total price of the sale.
Also, why are you using subqueries? Depending on the database, indexes may not be used when joining two subqueries. If your joins are that complicated, use views. In many databases they can be indexed.