Finding the total revenue by aggregating - sql

I want to produce a table with two columns in the form (country, total_revenue).
This is what the relational model looks like:
Each entry in the orderdetails table produces revenue equal to quantityordered (a column) * priceEach (also a column).
The revenue an order produces is the sum of the revenue from the orderdetails rows in that order, but only if the order's status is shipped. The two tables orderdetails and orders are related by the column ordernumber.
An order has a customer number that references the customers table, and the customers table has a country field. The total_country_revenue is the sum over all shipped orders for customers in a country.
So far I have tried using GROUP BY (on ordernumber or on the customer number?) to first produce a table with the orderdetails revenue and the customer number, then joining that with customers and using GROUP BY again, but I keep getting weird results.
-orderdetails table-
ordernumber  quantityordered  price_each
1            10               2.39
1            12               1.79
2            12               1.79
3            12               1.79
-orders table-
ordernumber  status     customer_num
1            shipped    11
1            shipped    12
2            cancelled  13
3            shipped    11
-customers table-
custom_num  country
11          USA
12          France
13          Japan
11          USA
-Result table-
country  total_revenue
11       1300
12       1239
13       800
11       739

Your description is a bit inconsistent. You write that you want to build the sum per country, but the table that should show the desired outcome contains no sum and shows customer numbers instead of countries.
Furthermore, you write that you want to exclude orders that don't have the status "shipped", but your sample outcome includes them.
This query will produce the outcome you have described in words, not the one you have added as a table:
SELECT c.country,
SUM(d.quantityordered * d.price_each) AS total_revenue
FROM
orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country;
As you can see, you need to JOIN your tables and apply a GROUP BY country clause.
A note: You could remove the WHERE clause and add its condition to the JOIN instead. For an inner join the result is the same. It might reduce the execution time of your query on some systems, but that is not guaranteed, and it can be less readable.
A further note: You could also consider using a window function for this, with PARTITION BY c.country. Since you didn't tag your DB type, the exact syntax for that option is unclear.
A last note: Your sample data looks really strange. Is it really intended that order 1 is counted for both France and the USA at the same time?
If the query above isn't what you were looking for, please review your description and fix it.
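To check the query above against concrete numbers, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The rows are a de-duplicated variant of the sample data (each order number and each customer number appears once), which is an assumption made for illustration and not the asker's exact data.

```python
import sqlite3

# Assumed, de-duplicated sample rows (not the asker's exact data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (custom_num INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (ordernumber INTEGER PRIMARY KEY, status TEXT,
                     customer_num INTEGER);
CREATE TABLE orderdetails (ordernumber INTEGER, quantityordered INTEGER,
                           price_each REAL);
INSERT INTO customers VALUES (11, 'USA'), (12, 'France'), (13, 'Japan');
INSERT INTO orders VALUES (1, 'shipped', 11), (2, 'cancelled', 13),
                          (3, 'shipped', 12);
INSERT INTO orderdetails VALUES (1, 10, 2.39), (1, 12, 1.79),
                                (2, 12, 1.79), (3, 12, 1.79);
""")

# Same shape as the query in the answer: join, filter on status, group by country.
revenue_by_country = dict(conn.execute("""
SELECT c.country, SUM(d.quantityordered * d.price_each) AS total_revenue
FROM orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country
""").fetchall())

for country, total in sorted(revenue_by_country.items()):
    print(country, round(total, 2))
```

With these rows, Japan does not appear in the result at all, because its only order is cancelled and the WHERE clause removes it before grouping.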

Related

COUNT with multiple LEFT joins [duplicate]

This question already has answers here:
Two SQL LEFT JOINS produce incorrect result
(3 answers)
Closed 12 months ago.
I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales,
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I do a count distinct and take more than 1 week of data into account, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?

As Philipxy indicated, the cause is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry from the first store is joined to every entry with the same customer id in the second. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on, and you can see the bloat. If each store is pre-aggregated per customer in its own sub-query, there will be AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
There is no need for a GROUP BY or HAVING in the outer query, since each pre-aggregate returns at most 1 record per unique combination. As for your need to filter by date ranges, I would just add a WHERE clause within each of the AllCustProducts, PQStore1, and PQStore2 sub-queries.
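The 3 * 5 bloat is easy to reproduce. The sketch below uses Python's sqlite3 module with a simplified schema, which is an assumption: the store tables carry only product_code (as in the question), customer_df holds one row per customer and product, and the join is on product_code rather than on the customerid column used in the query above.

```python
import sqlite3

# Simplified, assumed schema: one customer/product row, 3 sales in store 1,
# 5 sales in store 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_df (customer_name TEXT, product_code INTEGER);
CREATE TABLE store1_df (product_code INTEGER);
CREATE TABLE store2_df (product_code INTEGER);
INSERT INTO customer_df VALUES ('Luigi', 120012);
INSERT INTO store1_df VALUES (120012), (120012), (120012);
INSERT INTO store2_df VALUES (120012), (120012), (120012), (120012), (120012);
""")

# Joining both detail tables first multiplies the rows (3 * 5 = 15).
naive = conn.execute("""
SELECT c.customer_name, COUNT(s1.product_code), COUNT(s2.product_code)
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.product_code = c.product_code
LEFT JOIN store2_df s2 ON s2.product_code = c.product_code
GROUP BY c.customer_name, c.product_code
""").fetchone()

# Aggregating each store before joining keeps at most one row per product.
pre_aggregated = conn.execute("""
SELECT c.customer_name, COALESCE(p1.n, 0), COALESCE(p2.n, 0)
FROM customer_df c
LEFT JOIN (SELECT product_code, COUNT(*) AS n
           FROM store1_df GROUP BY product_code) p1
  ON p1.product_code = c.product_code
LEFT JOIN (SELECT product_code, COUNT(*) AS n
           FROM store2_df GROUP BY product_code) p2
  ON p2.product_code = c.product_code
""").fetchone()

print(naive)           # both counts inflated to 15
print(pre_aggregated)  # 3 and 5
```

A date filter would go inside each sub-query, so that rows are removed before they are counted.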

Grouping and Summing Totals in a Joined Table

I have two tables, MEDICATION and INVENTORY. I'm trying to SELECT all the details below from both tables, but the INVENTORY table has multiple rows per medication id, each with a different BRANCH_NO (the primary key in INVENTORY is actually the composite key BRANCH_NO, MEDICATION_ID).
I need to total up the stock per MEDICATION_ID, join the tables in one SELECT command, and display all the information for each med (there are 5) with the total for that med at the end of each row. But I'm getting all muddled trying GROUP BY and SUM, and at one point PARTITION. Help please, I'm new to this.
Below is the latest non-working version. It doesn't display
Medication Name
Medication Desc
Manufacturer
Pack Size
like I hoped it might.
SELECT I.MEDICATION_ID,
SUM(I.STOCK_LEVEL)
FROM INVENTORY I
INNER JOIN (SELECT MEDICATION_NAME, SUBSTR(MEDICATION_DESC,1,20) "Medication Description",
MANUFACTURER, PACK_SIZE FROM MEDICATION) M ON MEDICATION_ID=I.MEDICATION_ID
GROUP BY I.MEDICATION_ID;
For the data imagine I want this sort of output:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 10
2 Bravo 20
3 Charlie 20
1 Alpha 30
4 Delta 10
5 Echo 20
5 Echo 40
2 Bravo 10
grouping and totalling into this:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 40
2 Bravo 30
3 Charlie 20
4 Delta 10
5 Echo 60
I can get this when it's just one table, but when I'm trying to join tables and also SELECT other columns it's just not working.
Thanks in advance, guys. I appreciate it may be a simple solution, but it will be a big help.

You need to list every non-aggregated column explicitly in both the SELECT and the GROUP BY lists. (By the way, there is no need for a nested query, and if you do use one, the MEDICATION_ID column is missing from it.)
SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
SUBSTR(M.MEDICATION_DESC,1,20) "Medication Description", M.MANUFACTURER, M.PACK_SIZE
FROM INVENTORY I
JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC,1,20),
M.MANUFACTURER, M.PACK_SIZE;
This way, you'll be able to return all the listed columns.
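As a quick check, the query above can be run against the stock levels from the question in an in-memory SQLite database. This is only a sketch: the BRANCH_NO values, descriptions, manufacturers and pack sizes are made up for illustration, and SQLite stands in for the asker's actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MEDICATION (MEDICATION_ID INTEGER PRIMARY KEY, MEDICATION_NAME TEXT,
                         MEDICATION_DESC TEXT, MANUFACTURER TEXT, PACK_SIZE INTEGER);
CREATE TABLE INVENTORY (BRANCH_NO INTEGER, MEDICATION_ID INTEGER, STOCK_LEVEL INTEGER,
                        PRIMARY KEY (BRANCH_NO, MEDICATION_ID));
-- Descriptions, manufacturers and pack sizes are placeholder values.
INSERT INTO MEDICATION VALUES
  (1, 'Alpha', 'desc a', 'Maker A', 10), (2, 'Bravo', 'desc b', 'Maker B', 20),
  (3, 'Charlie', 'desc c', 'Maker C', 30), (4, 'Delta', 'desc d', 'Maker D', 40),
  (5, 'Echo', 'desc e', 'Maker E', 50);
-- Stock levels follow the question; branch numbers are assumed.
INSERT INTO INVENTORY VALUES
  (1, 1, 10), (1, 2, 20), (1, 3, 20), (2, 1, 30),
  (1, 4, 10), (1, 5, 20), (2, 5, 40), (2, 2, 10);
""")

# Every non-aggregated SELECT column is repeated in GROUP BY.
rows = conn.execute("""
SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
       SUBSTR(M.MEDICATION_DESC, 1, 20), M.MANUFACTURER, M.PACK_SIZE
FROM INVENTORY I
JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC, 1, 20),
         M.MANUFACTURER, M.PACK_SIZE
ORDER BY I.MEDICATION_ID
""").fetchall()

totals = [(r[0], r[1], r[2]) for r in rows]
print(totals)
```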

Possible to keep fraction in a query?

I am looking for a way to add up averages in SQL. Here is an example of the data I have:
product avg_price
phone 104.28
car 1000.00
And I'm looking to build something like this:
product avg_price
[all] 544.27
phone 104.28
car 1000.00
The way I'm currently doing it is to store the count and sum in two different columns, such as:
product cnt total
phone 203 20,304.32
car 404 304,323.30
And from that get the average. However, I was wondering if it is possible in SQL to just 'keep the fraction' and be able to add them as needed. For example:
product avg_price
[all] [add the fractions]
phone 20,304.32 / 203
car 304,323.30 / 404
Or do I need to use two columns in order to get an average of multiple aggregated rows?

You don't need 2 columns to get the average, but if you want to display as a fraction then you will need both numbers. They don't need to be in 2 columns though.
-- your_table is a placeholder for the table that holds product, cnt and total
select '[all]' as product, sum(total) || '/' || sum(cnt) as avg_price
from your_table
union all
select product, total || '/' || cnt
from your_table;
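To see why the count and the total both have to be kept, here is a minimal sketch using Python's sqlite3 module with the cnt and total values from the question. The table name product_totals is hypothetical. The overall average is SUM(total) / SUM(cnt), which generally differs from the plain mean of the per-product averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- product_totals is a hypothetical table name; values come from the question.
CREATE TABLE product_totals (product TEXT, cnt INTEGER, total REAL);
INSERT INTO product_totals VALUES ('phone', 203, 20304.32), ('car', 404, 304323.30);
""")

# The '[all]' row divides the summed totals by the summed counts.
averages = dict(conn.execute("""
SELECT '[all]' AS product, SUM(total) / SUM(cnt) AS avg_price FROM product_totals
UNION ALL
SELECT product, total / cnt FROM product_totals
""").fetchall())

# Averaging the two per-product averages ignores how many rows each one covers.
mean_of_averages = (averages["phone"] + averages["car"]) / 2

print(round(averages["[all]"], 2), round(mean_of_averages, 2))
```

An average on its own has lost the count it was computed from, so two averages cannot be combined correctly unless the counts (or the totals) are stored alongside them.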

Having problems fully understanding GROUP BY

I'm going over some practise questions for an exam that I have coming up and I'm having a problem fully understanding group by. I see GROUP BY as the following: group the result set by one or more columns.
I have the following database schema
My query
SELECT orders.customer_numb, sum(order_lines.cost_line), customers.customer_first_name, customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb, order_lines.cost_line, customers.customer_first_name, customers.customer_last_name
ORDER BY order_lines.cost_line DESC
What I'm struggling to understand
Why can't I simply use just GROUP BY orders.cost_line and group the data by cost_line?
What I'm trying to achieve
I'd like to achieve the name of the customer who has spent the most money. I just don't fully understand how to achieve this. I understand how joins work, I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent). I seem to always get "not a GROUP BY expression", if someone could explain what I'm doing wrong (not just give me the answer), that would be great - I'd really appreciate that, and of course any resources that you have for using GROUP by properly.
Sorry for the long essay and If I've missed anything I apologise. Any help would be greatly appreciated.

"I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent)."
When you say group by customer_numb, you know that customer_numb uniquely identifies a row in the customers table (assuming customer_numb is either a primary or alternate key), so any given customers.customer_numb will have one and only one value for customers.customer_first_name and customers.customer_last_name. But at parse time Oracle does not know that, or at least acts like it does not. And it says, in a bit of a panic, "What do I do if a single customer_numb has more than one value for customer_first_name?"
Roughly, the rule is that expressions in the select clause can use expressions from the group by clause and/or use aggregate functions (as well as constants, system variables that don't depend on the base tables, etc.). By "use" I mean be the expression or be part of the expression. So once you group on first name and last name, customer_first_name || customer_last_name would be a valid expression too.
When you are grouping by a table's primary key, or by a column with a unique key and a not null constraint, you can safely add the other columns of that table to the group by clause, because they cannot split the groups any further. In this particular instance: group by customers.customer_numb, customers.customer_first_name, customers.customer_last_name.
Also note that the order by in your query will fail, since order_lines.cost_line doesn't have a single value for the group. You can order on sum(order_lines.cost_line), or use a column alias in the select clause and order on that alias:
SELECT orders.customer_numb,
sum(order_lines.cost_line),
customers.customer_first_name,
customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb,
customers.customer_first_name,
customers.customer_last_name
ORDER BY sum(order_lines.cost_line)
or
SELECT orders.customer_numb,
sum(order_lines.cost_line) as sum_cost_line,
. . .
ORDER BY sum_cost_line
Note: I've heard that some RDBMSes will imply additional expressions for the grouping without them being explicitly stated. Oracle is not one of those RDBMSes.
As for grouping by both customer_numb and cost_line: consider a DB with two customers, 1 and 2, each with two orders of one line each:
Customer Number | Cost Line
1 | 20.00
1 | 20.00
2 | 35.00
2 | 30.00
select customer_number, cost_line, sum(cost_line)
FROM ...
group by customer_number, cost_line
order by sum(cost_line) desc
Customer Number | Cost Line | sum(cost_line)
1 | 20.00 | 40.00
2 | 35.00 | 35.00
2 | 30.00 | 30.00
The first row with highest sum(cost_line) is not the customer who spent the most.
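The four-row example above can be reproduced with a small sketch using Python's sqlite3 module. The single table order_costs is a hypothetical stand-in for the joined orders and order_lines tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- order_costs is a hypothetical flattened table with the four rows shown above.
CREATE TABLE order_costs (customer_number INTEGER, cost_line REAL);
INSERT INTO order_costs VALUES (1, 20.0), (1, 20.0), (2, 35.0), (2, 30.0);
""")

# Grouping by customer AND cost_line only merges rows that share both values.
by_both = conn.execute("""
SELECT customer_number, cost_line, SUM(cost_line)
FROM order_costs
GROUP BY customer_number, cost_line
ORDER BY SUM(cost_line) DESC
""").fetchall()

# Grouping by customer alone gives one total per customer.
by_customer = conn.execute("""
SELECT customer_number, SUM(cost_line)
FROM order_costs
GROUP BY customer_number
ORDER BY SUM(cost_line) DESC
""").fetchall()

print(by_both[0])      # customer 1 leads with 40.0
print(by_customer[0])  # customer 2 actually spent the most: 65.0
```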

"I understand how joins work, I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent)."
This should give you the sum for every customer.
SELECT orders.customer_numb, sum(order_lines.cost_line)
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
Note that every column in the SELECT clause that's not an argument to an aggregate function is also a column in the GROUP BY clause.
Now you can join that with other tables to get more detail. Here's one way using a common table expression. (There are other ways to express what you want.)
with customer_sums as (
-- We give the columns useful aliases here.
SELECT orders.customer_numb as customer_numb,
sum(order_lines.cost_line) as total_orders
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
)
select c.customer_numb, c.customer_first_name, c.customer_last_name, cs.total_orders
from customers c
inner join customer_sums cs
on cs.customer_numb = c.customer_numb
order by cs.total_orders desc
"Why can't I simply use just GROUP BY orders.cost_line and group the data by cost_line?"
Applying GROUP BY to order_lines.cost_line will give you one row for each distinct value in order_lines.cost_line. (The column orders.cost_line doesn't exist.) Here's what that data might look like.
OL.ORDER_NUMB OL.COST_LINE O.CUSTOMER_NUMB C.CUSTOMER_FIRST_NAME C.CUSTOMER_LAST_NAME
--
1 1.45 2014 Julio Savell
1 2.33 2014 Julio Savell
1 1.45 2014 Julio Savell
2 1.45 2014 Julio Savell
2 1.45 2014 Julio Savell
3 13.00 2014 Julio Savell
You can group by order_lines.cost_line, but it won't give you any useful information. This query
select order_lines.cost_line, orders.customer_numb
from order_lines
inner join orders on orders.order_numb = order_lines.order_numb
group by order_lines.cost_line, orders.customer_numb;
should return something like this.
OL.COST_LINE O.CUSTOMER_NUMB
--
1.45 2014
2.33 2014
13.00 2014
Not terribly useful.
If you're interested in the sum of the order line items, you need to decide what column or columns to group (summarize) by. If you group (summarize) by order number, you'll get three rows. If you group (summarize) by customer number, you'll get one row.
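The row counts mentioned above can be checked with a minimal sketch using Python's sqlite3 module and the six sample order lines. The table layouts are assumptions inferred from the queries (orders with order_numb and customer_numb, order_lines with order_numb and cost_line), since the schema diagram is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal layouts; only the columns the queries need.
CREATE TABLE orders (order_numb INTEGER PRIMARY KEY, customer_numb INTEGER);
CREATE TABLE order_lines (order_numb INTEGER, cost_line REAL);
INSERT INTO orders VALUES (1, 2014), (2, 2014), (3, 2014);
INSERT INTO order_lines VALUES (1, 1.45), (1, 2.33), (1, 1.45),
                               (2, 1.45), (2, 1.45), (3, 13.00);
""")

# Summarize by order number: one row per order.
per_order = conn.execute("""
SELECT order_numb, SUM(cost_line)
FROM order_lines
GROUP BY order_numb
ORDER BY order_numb
""").fetchall()

# Summarize by customer number: one row per customer.
per_customer = conn.execute("""
SELECT o.customer_numb, SUM(ol.cost_line)
FROM orders o
INNER JOIN order_lines ol ON ol.order_numb = o.order_numb
GROUP BY o.customer_numb
""").fetchall()

print(len(per_order), len(per_customer))
```

The grouping columns decide how many rows come back; the sum is then computed inside each of those groups.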

SQL query for child table summary and generalization

I have 4 tables, shown in the diagram below.
I want a summary query for the Institution table, where the result contains only
InstitutionType ProductName Quantity
For example, sample data of the Institution table:
Id Name Address InstitutionTypeId
1 aaa ny132 1001
2 bbb dx23 1001
3 ccc bn33 1002
And the InstitutionProduct table looks like this:
Id ProductId Quantity InstitutionId
1 1000 120 1
2 1000 100 2
3 1000 50 3
Then I want a query that outputs the total quantity of a given product per institution type. The sample output would look like this:
InstitutionTypeId productId quantity
1001 1000 220
1002 1000 50
So I want to group the institutions by type and aggregate the product quantity over all institutions in each type group.
I tried to use the GROUP BY clause, but because the product quantity is not a grouping element, it results in an error.

SELECT
Institution.InstitutionTypeID,
InstitutionProduct.ProductID,
SUM(InstitutionProduct.Quantity)
FROM
Institution
LEFT JOIN
InstitutionProduct
ON InstitutionProduct.InstitutionID = Institution.ID
GROUP BY
Institution.InstitutionTypeID,
InstitutionProduct.ProductID

If you are querying with GROUP BY, every selected field must either be wrapped in an aggregate function or be listed in the GROUP BY. The reason is that GROUP BY returns exactly one row per combination of grouping values, so an ungrouped field would conflict if it had more than one value within a group. Even though this might not be the case for your dataset, the query engine cannot know this, and raises an error.
The solution is to introduce aggregates for all non-grouping fields, with aggregates being (among others): average (avg), sum (sum), minimum (min) and maximum (max). This would lead to something like
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity)
FROM Institution i LEFT JOIN InstitutionProduct ip
ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
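As a sanity check, the sample rows from the question can be loaded into an in-memory SQLite database through Python's sqlite3 module and grouped by institution type and product. This is a sketch under the assumption that the column names are InstitutionTypeID, ProductID, Quantity and InstitutionID, as used in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Institution (ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                          InstitutionTypeID INTEGER);
CREATE TABLE InstitutionProduct (ID INTEGER PRIMARY KEY, ProductID INTEGER,
                                 Quantity INTEGER, InstitutionID INTEGER);
INSERT INTO Institution VALUES (1, 'aaa', 'ny132', 1001), (2, 'bbb', 'dx23', 1001),
                               (3, 'ccc', 'bn33', 1002);
INSERT INTO InstitutionProduct VALUES (1, 1000, 120, 1), (2, 1000, 100, 2),
                                      (3, 1000, 50, 3);
""")

# Quantity is aggregated; the two remaining columns are the grouping columns.
rows = conn.execute("""
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity)
FROM Institution i
LEFT JOIN InstitutionProduct ip ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
ORDER BY i.InstitutionTypeID
""").fetchall()

print(rows)
```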
