SQL query for child table summary and generalization - sql

I have 4 tables, with the diagram below.
I want a summary query for the Institution table that returns only these columns:
InstitutionType ProductName Quantity
For example, here is sample data from the Institution table:
Id Name Address InstitutionTypeId
1 aaa ny132 1001
2 bbb dx23 1001
3 ccc bn33 1002
And the InstitutionProduct table looks like this:
Id ProductId Quantity InstitutionId
1 1000 120 1
2 1000 100 2
3 1000 50 3
Then I want a query result that outputs the total quantity of a given product per institution type. The sample output would look like this:
InstitutionTypeId productId quantity
1001 1000 220
1002 1000 50
So I want to group the institutions by type and aggregate the product quantity across all institutions in each type group.
I tried to use the GROUP BY clause, but with the product quantity not as a grouping element it results in an error.

SELECT
Institution.InstitutionTypeID,
InstitutionProduct.ProductID,
SUM(InstitutionProduct.Quantity)
FROM
Institution
LEFT JOIN
InstitutionProduct
ON InstitutionProduct.InstitutionID = Institution.ID
GROUP BY
Institution.InstitutionTypeID,
InstitutionProduct.ProductID

If you are querying with GROUP BY, every selected field must either be wrapped in an aggregate function or be listed in the GROUP BY clause. The reason is that GROUP BY returns exactly one row per combination of grouping values, so an ungrouped field would be ambiguous whenever it has more than one value within a group. Even though this might not be the case for your dataset, the query engine cannot know this, and raises an error.
The solution is to introduce aggregates for all non-grouping fields, with aggregates being (among others): average (avg), sum (sum), minimum (min) and maximum (max). This would lead to something like
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity)
FROM Institution i LEFT JOIN InstitutionProduct ip
ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
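As a quick check, the grouped query can be run against the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the column types are assumptions, since the question does not give the schema.

```python
import sqlite3

# In-memory database; column types are assumed, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Institution (
    ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, InstitutionTypeID INTEGER);
CREATE TABLE InstitutionProduct (
    ID INTEGER PRIMARY KEY, ProductID INTEGER, Quantity INTEGER, InstitutionID INTEGER);
INSERT INTO Institution VALUES
    (1, 'aaa', 'ny132', 1001), (2, 'bbb', 'dx23', 1001), (3, 'ccc', 'bn33', 1002);
INSERT INTO InstitutionProduct VALUES
    (1, 1000, 120, 1), (2, 1000, 100, 2), (3, 1000, 50, 3);
""")

# Every selected column is either a grouping column or inside an aggregate.
rows = conn.execute("""
SELECT i.InstitutionTypeID, ip.ProductID, SUM(ip.Quantity) AS Quantity
FROM Institution i
LEFT JOIN InstitutionProduct ip ON ip.InstitutionID = i.ID
GROUP BY i.InstitutionTypeID, ip.ProductID
ORDER BY i.InstitutionTypeID, ip.ProductID
""").fetchall()

print(rows)  # [(1001, 1000, 220), (1002, 1000, 50)]
```

The two result rows match the sample output requested in the question.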

Finding the total revenue by aggregating

I want to produce a table with two columns in the form of (country, total_revenue).
This is what the relational model looks like:
Each entry in the orderdetails table produces revenue equal to quantityordered * priceEach (both are columns).
The revenue an order produces is the sum of the revenue from the orderdetails in the order, but only if the order's status is shipped. The two tables orderdetails and orders are related by the column ordernumber.
An order has a customer number that references the customers table, and the customers table has a country field. The total_country_revenue is the sum over all shipped orders for customers in a country.
So far I have tried first producing a table using GROUP BY (on ordernumber or customer number?) with the orderdetails revenue and the customer number as columns, then joining it with customers and using GROUP BY again, but I keep getting weird results.
-orderdetails table-
ordernumber quantityordered price_each
1 10 2.39
1 12 1.79
2 12 1.79
3 12 1.79
-orders table-
ordernumber status customer_num
1 shipped 11
1 shipped 12
2 cancelled 13
3 shipped 11
-customers table-
custom_num country
11 USA
12 France
13 Japan
11 USA
-Result table-
country total_revenue
11 1300
12 1239
13 800
11 739
Your description is a bit inconsistent. You write that you want to build the sum per country, but the table that is supposed to show the desired outcome doesn't contain a sum and doesn't show the country either.
Furthermore, you wrote that you want to exclude orders that don't have the status "shipped", but your sample outcome includes them.
This query will produce the outcome you have described in words, not the one you have added as a table:
SELECT c.country,
SUM(d.quantityordered * d.price_each) AS total_revenue
FROM
orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country;
As you can see, you need to JOIN your tables and apply a GROUP BY country clause.
A note: You could remove the WHERE clause and add its condition to a JOIN. It's possible this will reduce the execution time of your query, but it might be less readable.
A further note: You could also consider using a window function with PARTITION BY c.country. Since you didn't tag your DB type, the exact syntax for that option is unclear.
A last note: Your sample data looks really strange. Is it really intended that an order is counted for France and for the USA at the same time?
If the query above isn't what you were looking for, please review your description and fix it.
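To see the join-then-group approach run end to end, here is a small sketch using Python's sqlite3 module. The data is an assumption: it is a cleaned-up variant of the sample in which each order number and each customer number appears only once, so the rows are hypothetical rather than the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderdetails (ordernumber INTEGER, quantityordered INTEGER, price_each REAL);
CREATE TABLE orders (ordernumber INTEGER PRIMARY KEY, status TEXT, customer_num INTEGER);
CREATE TABLE customers (custom_num INTEGER PRIMARY KEY, country TEXT);
-- Hypothetical, de-duplicated sample rows.
INSERT INTO orderdetails VALUES (1, 10, 2.39), (1, 12, 1.79), (2, 12, 1.79), (3, 12, 1.79);
INSERT INTO orders VALUES (1, 'shipped', 11), (2, 'cancelled', 13), (3, 'shipped', 12);
INSERT INTO customers VALUES (11, 'USA'), (12, 'France'), (13, 'Japan');
""")

# Revenue per detail line, restricted to shipped orders, summed per country.
rows = conn.execute("""
SELECT c.country,
       ROUND(SUM(d.quantityordered * d.price_each), 2) AS total_revenue
FROM orders o
JOIN orderdetails d ON o.ordernumber = d.ordernumber
JOIN customers c ON o.customer_num = c.custom_num
WHERE o.status = 'shipped'
GROUP BY c.country
ORDER BY c.country
""").fetchall()

revenue = dict(rows)
for country, total in rows:
    print(country, total)
```

With this data Japan does not appear in the result at all, because its only order is cancelled and the inner joins plus the WHERE filter drop it.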

COUNT with multiple LEFT joins [duplicate]

I am having some trouble with a COUNT function. The problem seems to come from a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales,
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name Product_code Store1_weekly_sales Store2_weekly_sales
Luigi 120012 4 8
James 100022 6 10
But instead, I get:
Customer_name Product_code Store1_weekly_sales Store2_weekly_sales
Luigi 120012 290 60
James 100022 290 60
It works when instead of COUNT(product_code) I do COUNT(DISTINCT product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I do a count distinct and take into account more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider just a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15. This is because each entry in the first store is joined to every entry for the same customer in the second store. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and you can see the bloat. By having each store pre-query its aggregates per customer, you will have AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
There is no need for a GROUP BY or HAVING, since each pre-aggregate returns at most 1 record per unique combination. As for your need to filter by date ranges, I would just add a WHERE clause within each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
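The 3 * 5 bloat is easy to reproduce. The sketch below uses Python's sqlite3 module with a deliberately simplified, assumed schema (one customer, one product, a customerid column on every table) and compares the double LEFT JOIN with the pre-aggregated joins. It groups per customer only, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
INSERT INTO customer_df VALUES (1, 'Luigi');
-- 3 purchases in store 1, 5 purchases in store 2
INSERT INTO store1_df VALUES (1, 120012), (1, 120012), (1, 120012);
INSERT INTO store2_df VALUES (1, 120012), (1, 120012), (1, 120012), (1, 120012), (1, 120012);
""")

# Joining both detail tables directly pairs every store-1 row with every store-2 row.
naive = conn.execute("""
SELECT c.customer_name, COUNT(s1.product_code), COUNT(s2.product_code)
FROM customer_df c
LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
GROUP BY c.customer_name
""").fetchall()

# Pre-aggregating each store yields at most one row per customer before the join.
fixed = conn.execute("""
SELECT c.customer_name, COALESCE(p1.n, 0), COALESCE(p2.n, 0)
FROM customer_df c
LEFT JOIN (SELECT customerid, COUNT(*) AS n FROM store1_df GROUP BY customerid) p1
       ON p1.customerid = c.customerid
LEFT JOIN (SELECT customerid, COUNT(*) AS n FROM store2_df GROUP BY customerid) p2
       ON p2.customerid = c.customerid
""").fetchall()

print(naive)  # [('Luigi', 15, 15)]
print(fixed)  # [('Luigi', 3, 5)]
```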

SQL rollup - prevent summing records multiple times

Firstly, I could not think of a better question title. Apologies for that.
So, I am writing a query, and here is what (I think) it would return without aggregate functions and GROUP BY. I am using this as an example; the actual query contains a lot more fields:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
B 2 50
C 3 60
A 2 50
A 1 25 <--Not actually duplicate
Now you would say there are duplicate records. But in fact they are not duplicate in a way that there are some extra fields(not shown here) which would have different values for those seemingly duplicate records.
What I want:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
2 50
TOTAL 75
B 2 50
TOTAL 50
C 3 60
TOTAL 60
//EDIT - Apparently following line is causing too much confusion. Ignore it. How can I get rest of the table correctly?
TOTAL 135 //It seems its quite difficult to get 135 here. Its ok if this total is messed up
What I am trying:
SELECT
SOME_FIELDS,
SUBJ,
CLASSROOM,
SUM(CLASSROOM_CAPACITY)
FROM
MYTABLE
WHERE .....
GROUP BY SOME_FIELDS, ROLLUP(SUBJ,CLASSROOM)
The problem:
Thanks to those "seemingly duplicate" records, classroom capacities are being summed up multiple times. How do I prevent that? Am I doing this the wrong way?
The actual query is lot more complicated but I think if I can get this right, I can apply it to bigger query.
PS: I know how to get text "Total" instead of blank entry with ROLLUP using GROUPING so you can skip that part.
The cardinality you're introducing is a little off; once you sort that out, ROLLUP starts to work. You're saying that:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
is equal to:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
But the SOME_FIELDS could vary per row. When you aggregate up to just the columns above, what do you expect to happen to SOME_FIELDS?
If these can be ignored for the purposes of this query, your best bet is to first find the DISTINCT records (i.e. records that contain a unique tuple of subj, classroom and classroom_capacity) and then do the ROLLUP on this data set. The following query achieves this:
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT
subj
, classroom
, SUM(classroom_capacity)
FROM distinct_subj_classrm_capacity
GROUP BY ROLLUP(subj, classroom)
If you're not interested in the break report results that ROLLUP gives you and you simply want the raw totals then you can use the analytic version of SUM (see here for more on Oracle analytic functions: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm)
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT DISTINCT
subj
, SUM(classroom_capacity) OVER (PARTITION BY subj) classroom_capacity_per_subj
FROM distinct_subj_classrm_capacity
This gives results in the format:
SUBJ CLASSROOM_CAPACITY_PER_SUBJ
A 75
B 50
C 60
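The effect of de-duplicating first can be checked with a small sketch in Python's sqlite3 module. Two caveats: SQLite does not support ROLLUP, so this only compares per-subject totals with a plain GROUP BY, and the other_field column is a hypothetical stand-in for the extra fields that make the rows differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    subj TEXT, classroom INTEGER, classroom_capacity INTEGER,
    other_field TEXT  -- hypothetical extra column that distinguishes the rows
);
INSERT INTO mytable VALUES
    ('A', 1, 25, 'x'), ('B', 2, 50, 'x'), ('C', 3, 60, 'x'),
    ('A', 2, 50, 'x'), ('A', 1, 25, 'y');
""")

# Summing the raw rows counts classroom 1 twice for subject A.
naive = conn.execute("""
SELECT subj, SUM(classroom_capacity)
FROM mytable
GROUP BY subj
ORDER BY subj
""").fetchall()

# De-duplicating on (subj, classroom, classroom_capacity) first avoids that.
deduped = conn.execute("""
WITH distinct_subj_classrm_capacity AS (
    SELECT DISTINCT subj, classroom, classroom_capacity
    FROM mytable
)
SELECT subj, SUM(classroom_capacity)
FROM distinct_subj_classrm_capacity
GROUP BY subj
ORDER BY subj
""").fetchall()

print(naive)    # [('A', 100), ('B', 50), ('C', 60)]
print(deduped)  # [('A', 75), ('B', 50), ('C', 60)]
```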

Extract id along with count of other columns

I have four columns in my table
CUSTOMER, TRANSACTION(UNIQUE) PRODUCTA PRODUCTB
Product A or Product B is either 0 or 1 depending on the item bought. Both are not equal to 1 as each row corresponds to a transaction and it is either A or B.
Now I want to extract data such that each customer is listed along with the number of product A purchases and product B purchases they made.
select customer,count(PRODUCTA),count(PRODUCTB) from rm_saicharan_final6 group by customer
It's returning the count of all rows, including the 0s.
CUSTOMER PRODUCTA PRODUCTB
-------- -------- ---------
32444 209 209
But I want to count only the rows where the value is 1, not all rows.
Just use SUM as follows:
select customer,SUM(PRODUCTA),SUM(PRODUCTB)
from rm_saicharan_final6 group by customer
SQLFiddle: http://sqlfiddle.com/#!4/ee7da/596
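Not entirely sure if this is what you need, but if you want to keep COUNT, you can count only the rows where the flag is set. (A filter such as where PRODUCTA>0 and PRODUCTB>0 would not help: the two flags are never both 1, and a WHERE clause has to come before GROUP BY.)
select customer,count(case when PRODUCTA>0 then 1 end),count(case when PRODUCTB>0 then 1 end) from rm_saicharan_final6 group by customer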
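The difference between COUNT and SUM on 0/1 flag columns can be seen in a short sketch using Python's sqlite3 module. The table name tx, the transaction_id column, and the five rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (customer INTEGER, transaction_id INTEGER PRIMARY KEY,
                 producta INTEGER, productb INTEGER);
-- Hypothetical rows: 3 purchases of A and 2 purchases of B.
INSERT INTO tx VALUES
    (32444, 1, 1, 0), (32444, 2, 0, 1), (32444, 3, 1, 0),
    (32444, 4, 0, 1), (32444, 5, 1, 0);
""")

# COUNT(col) counts every non-NULL value, so the 0s are counted too.
counted = conn.execute(
    "SELECT customer, COUNT(producta), COUNT(productb) FROM tx GROUP BY customer"
).fetchall()

# SUM adds up the flags, which equals the number of rows where the flag is 1.
summed = conn.execute(
    "SELECT customer, SUM(producta), SUM(productb) FROM tx GROUP BY customer"
).fetchall()

# COUNT over a CASE expression skips rows where the CASE yields NULL.
conditional = conn.execute("""
SELECT customer,
       COUNT(CASE WHEN producta = 1 THEN 1 END),
       COUNT(CASE WHEN productb = 1 THEN 1 END)
FROM tx GROUP BY customer
""").fetchall()

print(counted)      # [(32444, 5, 5)]
print(summed)       # [(32444, 3, 2)]
print(conditional)  # [(32444, 3, 2)]
```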

Query to find and display unique records in ms-access?

I am using MS Access. I have a table with a field named Receipt_No. Many values in this field are repeated. I want to display each repeated value only once rather than several times.
Here is my table:
Registration_No Payment_Date Charges Receipt_No
T-11 8/7/2011 200 105
T-12 8/7/2011 200 106
T-13 7/12/11 200 107
T-14 12/7/2011 200 108
T-15 12/7/2011 400 108
Here, in the Receipt_No field, 108 appears 2 times. I want to display it only once, as shown below (the charges can be either 200 or 400, but the Receipt_No should appear once). Please help me.
Registration_No Payment_Date Charges Receipt_No
T-11 8/7/2011 200 105
T-12 8/7/2011 200 106
T-13 7/12/11 200 107
T-14 12/7/2011 200 108
If you want to display only the records in your table with a receipt number that appears exactly once, use this query:
select * from Demand
where receipt_no in (
select receipt_no
from Demand
group by receipt_no
having count(*) = 1
)
With the clarifications you've provided, it looks like what you want is more like in this question, where you want to return all fields, but only one record per receipt number. Here is a variation on the accepted answer:
select * from demand
inner join
(
select
receipt_no,
min(charges) AS min_charges
from
demand
group by
receipt_no
) sq
on demand.receipt_no = sq.receipt_no
and demand.charges = sq.min_charges
Note that this is still not exactly what you want: if there are two or more records with the same values for receipt_no and charges, this query will return them all.
Part of the problem is that your table is not well-defined: it does not appear to have a field that is unique for every record. With such a field, you can modify the query above to return a single row for each receipt_no. (Another part of the problem is that there seems to be something missing from the business requirement: usually, we would want to report the total charges from a receipt, or each charge from a receipt.)
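As a sketch of what such a unique field makes possible, the example below adds a hypothetical autonumber-style id column and keeps the row with the lowest id for each receipt_no. It runs on SQLite through Python's sqlite3 module rather than on Access, and the id column and the snake_case column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demand (
    id INTEGER PRIMARY KEY,  -- hypothetical unique field, not in the original table
    registration_no TEXT, payment_date TEXT, charges INTEGER, receipt_no INTEGER
);
INSERT INTO demand (registration_no, payment_date, charges, receipt_no) VALUES
    ('T-11', '8/7/2011', 200, 105),
    ('T-12', '8/7/2011', 200, 106),
    ('T-13', '7/12/11', 200, 107),
    ('T-14', '12/7/2011', 200, 108),
    ('T-15', '12/7/2011', 400, 108);
""")

# Keep exactly one row per receipt_no: the one with the smallest id.
rows = conn.execute("""
SELECT registration_no, payment_date, charges, receipt_no
FROM demand
WHERE id IN (SELECT MIN(id) FROM demand GROUP BY receipt_no)
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Because id is unique, the subquery picks exactly one row per receipt number even when two rows share the same receipt_no and charges.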
Not sure exactly what you need in your query since you didn't provide many details, but SELECT DISTINCT omits records that contain duplicate data in the selected fields. To be included in the results of the query, the values for each field listed in the SELECT statement must be unique.
See the MS Access docs for more detail.
As an example, the following query would select all LastName values but remove duplicates:
SELECT DISTINCT LastName
FROM Employees;