Eliminating duplicate rows in subquery when calculating sum

I am having trouble getting correct values for sumSquare, which is a calculated column in the following query:
SELECT
    building_type,
    COUNT(DISTINCT building.id) AS buildingCount,
    SUM(squareTable.sumSquare) AS sumSquare
FROM building
LEFT JOIN (
    SELECT building_id, SUM(square) AS sumSquare
    FROM building_square
    WHERE (square >= '500')
    GROUP BY building_id
) squareTable ON (squareTable.building_id = building.id)
JOIN building_square ON (building_square.building_id = building.id)
WHERE building_square.square >= '500'
GROUP BY building_type
building_type is a column of the building table (for example, some of the types are house, apartment...), and I need to group by building type.
There is also a 1:N relation between the building and building_square tables. In the WHERE clause of the main query I need to filter by square, which is a column of the building_square table.
I guess the problem is that I join the building_square table in the main query, because I need to filter on some fields from that table, while I also use a subquery that calculates the sum of squares for each building. As a result, some rows are counted more than once in the sum.
How can I calculate the sum of squares in the subquery only once per distinct building_id?
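The double counting described above can be reproduced on a small data set. The following is a minimal sketch, not a definitive answer: it runs the query through Python's sqlite3 module against made-up rows, and the second query assumes that the only filter needed on building_square is the square >= 500 condition the subquery already applies, so the extra join can simply be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, building_type TEXT);
    CREATE TABLE building_square (building_id INTEGER, square INTEGER);
    -- Hypothetical sample data: building 1 has two squares >= 500.
    INSERT INTO building VALUES (1, 'house'), (2, 'house'), (3, 'apartment');
    INSERT INTO building_square VALUES
        (1, 600), (1, 700), (1, 100), (2, 800), (3, 900);
""")

# Query from the question: the extra join repeats building 1 once per
# qualifying building_square row, so its subtotal (1300) is added twice.
original = conn.execute("""
    SELECT building_type,
           COUNT(DISTINCT building.id) AS buildingCount,
           SUM(squareTable.sumSquare) AS sumSquare
    FROM building
    LEFT JOIN (
        SELECT building_id, SUM(square) AS sumSquare
        FROM building_square
        WHERE square >= 500
        GROUP BY building_id
    ) squareTable ON squareTable.building_id = building.id
    JOIN building_square ON building_square.building_id = building.id
    WHERE building_square.square >= 500
    GROUP BY building_type
    ORDER BY building_type
""").fetchall()

# Assumed fix: keep the filter inside the subquery only, and inner join to
# it so each building contributes exactly one row.
fixed = conn.execute("""
    SELECT building_type,
           COUNT(building.id) AS buildingCount,
           SUM(squareTable.sumSquare) AS sumSquare
    FROM building
    JOIN (
        SELECT building_id, SUM(square) AS sumSquare
        FROM building_square
        WHERE square >= 500
        GROUP BY building_id
    ) squareTable ON squareTable.building_id = building.id
    GROUP BY building_type
    ORDER BY building_type
""").fetchall()

print(original)  # house sum is inflated: 3400
print(fixed)     # house sum is 1300 + 800 = 2100
```

If further filters on building_square are needed, they would presumably have to move into the subquery as well, so that the outer query never joins the N-side table directly.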

Related

Select minimum value after grouping by two columns

The following table contains two grouping variables, id_point and sci_name and a continuous variable distance. There are multiple distance values per unique combinations of id_point and sci_name. I would like to create a table that contains each unique combination between id_point and sci_name and the minimum distance value.
This was my best attempt:
CREATE TABLE sar.test AS
SELECT DISTINCT id_point, sci_name, MIN(distance)
GROUP BY id_point, sci_name,
FROM sar.dist_mam_na;

The GROUP BY goes after the FROM, and by definition of aggregation, the groups will be unique so the DISTINCT is irrelevant:
SELECT id_point, sci_name, MIN(distance)
FROM sar.dist_mam_na
GROUP BY id_point, sci_name
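As a quick check of the corrected statement, here is a small sketch using Python's sqlite3 module. The rows are invented, the sar schema prefix is dropped because an in-memory SQLite database has no such schema, and the min_distance alias is an addition for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dist_mam_na (id_point INTEGER, sci_name TEXT, distance REAL);
    -- Hypothetical sample rows with several distances per combination.
    INSERT INTO dist_mam_na VALUES
        (1, 'Lynx rufus', 5.0),
        (1, 'Lynx rufus', 2.5),
        (1, 'Ursus americanus', 7.0),
        (2, 'Lynx rufus', 9.0),
        (2, 'Lynx rufus', 4.0);

    -- FROM comes before GROUP BY; no DISTINCT is needed.
    CREATE TABLE test AS
    SELECT id_point, sci_name, MIN(distance) AS min_distance
    FROM dist_mam_na
    GROUP BY id_point, sci_name;
""")

rows = conn.execute(
    "SELECT id_point, sci_name, min_distance FROM test "
    "ORDER BY id_point, sci_name"
).fetchall()
print(rows)  # one row per (id_point, sci_name) pair
```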

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!

You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MTAvg.AvgAge
FROM Mytable mt
INNER JOIN (
    SELECT mtavgs.country,
           AVG(mtavgs.age) AS AvgAge
    FROM Mytable mtavgs
    GROUP BY mtavgs.country
) MTAvg
    ON MTAvg.country = mt.country
   AND mt.Age > MTAvg.AvgAge
GROUP BY always returns one row per unique combination of values in the listed GROUP BY columns (provided the group is not removed by a HAVING clause). The subquery in this example (alias: MTAvg) therefore produces a single row per country. Its result is used to filter the rows of the main table through the condition in the INNER JOIN clause, and the calculated average age is also included in the output.
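To see the join-to-subquery approach at work, here is a minimal sketch with Python's sqlite3 module. The name column and the sample rows are assumptions made for illustration; only country and age appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER);
    -- Hypothetical rows: the average age is 30 in both countries.
    INSERT INTO Mytable VALUES
        ('Ann', 'US', 20), ('Bob', 'US', 30), ('Dan', 'US', 40),
        ('Cal', 'FR', 25), ('Eve', 'FR', 35);
""")

# The subquery yields one row per country; the join condition keeps only
# the people older than their own country's average.
above_average = conn.execute("""
    SELECT mt.name, mt.country, mt.age, MTAvg.AvgAge
    FROM Mytable mt
    INNER JOIN (
        SELECT country, AVG(age) AS AvgAge
        FROM Mytable
        GROUP BY country
    ) MTAvg
        ON MTAvg.country = mt.country
       AND mt.age > MTAvg.AvgAge
    ORDER BY mt.name
""").fetchall()
print(above_average)
```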

GROUP BY is a clause that is used together with aggregate functions. For further reading, see an SQL GROUP BY tutorial.
What it does is collapse all the rows of each group into one row. In your example it would lump all the rows with the same country together.
I am not quite sure what exactly your query needs to be to solve your exact problem. I would, however, look into what are called window functions in SQL. I believe you first need to write a window function to find the average age in each group; then you can write a query to return the results you need.

Depending on your DBMS type and version, you may be able to use a "window function" that calculates the average per country and makes that value available on every row. Once that data is present as a "derived table", you can simply use a WHERE clause to filter for the ages that are greater than the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge
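The window-function variant can be tried the same way. This sketch assumes an SQLite build of version 3.25 or later (the first with window functions), and again uses an invented name column and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER);
    -- Hypothetical rows: the average age is 30 in both countries.
    INSERT INTO Mytable VALUES
        ('Ann', 'US', 20), ('Bob', 'US', 30), ('Dan', 'US', 40),
        ('Cal', 'FR', 25), ('Eve', 'FR', 35);
""")

# AVG(...) OVER (PARTITION BY country) attaches the per-country average to
# every row without collapsing the rows, so the outer WHERE can compare
# each person's age against it.
windowed = conn.execute("""
    SELECT mt.name, mt.country, mt.age, mt.AvgAge
    FROM (
        SELECT *, AVG(age) OVER (PARTITION BY country) AS AvgAge
        FROM Mytable
    ) mt
    WHERE mt.age > mt.AvgAge
    ORDER BY mt.name
""").fetchall()
print(windowed)
```

The derived table is needed because a window function cannot be referenced directly in the WHERE clause of the same query level.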

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why this is the case with the second query. The value that ultimately is the max amount is already contained in the table - it is not calculated like SUM is. thank you
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm

It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding the Group By, we fundamentally change the query: instead of returning one number for the whole table, it will return one row for each group - which in this case, means one row for each film.
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
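A small sketch of both queries, run through Python's sqlite3 module on invented film rows (the film names and win counts are hypothetical), shows the single-row aggregate next to the subquery that returns every tied film.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER);
    -- Hypothetical data: two films are tied on the maximum.
    INSERT INTO tblFilm VALUES
        ('Film A', 12), ('Film B', 12), ('Film C', 3);
""")

# No unaggregated column, so no GROUP BY is needed: one row for the table.
max_only = conn.execute(
    "SELECT MAX(FilmOscarWins) FROM tblFilm"
).fetchall()

# The subquery finds the single maximum; the outer query lists every film
# whose win count matches it.
top_films = conn.execute("""
    SELECT FilmOscarWins, FilmName
    FROM tblFilm
    WHERE FilmOscarWins = (SELECT MAX(FilmOscarWins) FROM tblFilm)
    ORDER BY FilmName
""").fetchall()

print(max_only)   # a single row holding 12
print(top_films)  # both tied films
```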

If you want the film with the most Oscar wins, then use select top (SQL Server syntax; note that it returns only one film even when several are tied):
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
In an aggregation query, the select columns need to be consistent with the group by columns -- the unaggregated columns in the select must match the group by.

Is GROUP BY needed in the following correlated subquery?

Given scenario:
table fd
(cust_id, fd_id) primary-key and amount
table loan
(cust_id, l_id) primary-key and amount
I want to list all customers who have a fixed deposit with an amount less than the sum of all their loans.
Query:
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id);
OR should we use
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id group by cust_id);
A customer can have multiple loans but one FD is considered at a time.

GROUP BY can be omitted in this case, because the SELECT list contains only an aggregate function and all rows are guaranteed to belong to the same cust_id group (by the WHERE clause).
The aggregation will be over all rows with matching cust_id in both cases. So both queries are correct.
A cleaner way to implement the same thing:
SELECT fd.cust_id
FROM fd
JOIN loan USING (cust_id)
GROUP BY fd.cust_id, fd.amount
HAVING fd.amount < sum(loan.amount)
There is one difference: rows with identical (cust_id, amount) in fd only appear once in the result of my query, while they would appear multiple times in the original.
Either way, if a customer has no matching row with a non-null amount in table loan, no row is returned for that customer. I assume you are aware of that.
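The claims above can be checked on a small data set. The following sketch uses Python's sqlite3 module with invented rows; other database systems may differ in details, but the three queries are expected to agree here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fd (cust_id INTEGER, fd_id INTEGER, amount INTEGER,
                     PRIMARY KEY (cust_id, fd_id));
    CREATE TABLE loan (cust_id INTEGER, l_id INTEGER, amount INTEGER,
                       PRIMARY KEY (cust_id, l_id));
    -- Hypothetical data: customer 1 has loans totalling 130,
    -- customer 2 has one loan of 200, customer 3 has no loans.
    INSERT INTO fd VALUES (1, 1, 100), (1, 2, 200), (2, 1, 500), (3, 1, 50);
    INSERT INTO loan VALUES (1, 1, 60), (1, 2, 70), (2, 1, 200);
""")

without_group_by = conn.execute("""
    SELECT cust_id FROM fd
    WHERE amount < (SELECT SUM(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id)
    ORDER BY cust_id
""").fetchall()

# The WHERE clause already restricts the subquery to one cust_id, so the
# GROUP BY produces at most one group.
with_group_by = conn.execute("""
    SELECT cust_id FROM fd
    WHERE amount < (SELECT SUM(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id
                    GROUP BY loan.cust_id)
    ORDER BY cust_id
""").fetchall()

join_having = conn.execute("""
    SELECT fd.cust_id
    FROM fd
    JOIN loan USING (cust_id)
    GROUP BY fd.cust_id, fd.amount
    HAVING fd.amount < SUM(loan.amount)
    ORDER BY fd.cust_id
""").fetchall()

print(without_group_by, with_group_by, join_having)
```

Customer 3 is absent from all three results: the correlated subquery yields NULL for a customer without loans, and the inner join drops that customer entirely.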

There is no need for GROUP BY, since you already filter the data by cust_id. In either case the inner query returns the same result.

No, it isn't, because you calculate sum(amount) for the customer with id = fd.cust_id, that is, for a single customer.
However, if your subquery somehow calculated the sum for more than one customer, the group by would cause the subquery to generate more than one row; this would cause the condition (<) to fail, and thus the query to fail.

A query with an aggregate like sum but without a group by will output one group. The aggregates will be computed over all matching rows.
A subquery in a condition clause is only allowed to return one row. If the subquery returned multiple rows, what would the following expression mean?
where 1 > (... subquery ...)
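So the group by can be omitted. In your second query the WHERE clause restricts the subquery to a single cust_id, so grouping still yields at most one row; a subquery that produced several groups would raise an error.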
N.B. When you specify all, any, or some a subquery can return multiple rows:
where 1 > ALL (... subquery ...)
But it's easy to see why that doesn't make sense in your case; you'd compare one customer's data to that of another.

MySQL Single Row Returned From Temporary Table

I am running the following queries against a database:
CREATE TEMPORARY TABLE med_error_third_party_tmp
SELECT `med_error_category`.description AS category, `med_error_third_party_category`.error_count AS error_count
FROM
`med_error_category` INNER JOIN `med_error_third_party_category` ON med_error_category.`id` = `med_error_third_party_category`.`category`
WHERE
year = 2003
GROUP BY `med_error_category`.id;
The only problem is that when I create the temporary table and do a select * on it, it returns multiple rows, but the query below returns only one row. It always seems to return a single row unless I specify a GROUP BY, but then it returns a percentage of 1.0, as it should with a GROUP BY.
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
Here are the server specs:
Server version: 5.0.77
Protocol version: 10
Server: Localhost via UNIX socket
Does anybody see a problem with this that is causing the problem?

Standard SQL requires you to specify a GROUP BY clause if any column is not wrapped in an aggregate function (e.g. MIN, MAX, COUNT, SUM, AVG), but MySQL supports "hidden columns in the GROUP BY" -- which is why:
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
...runs without error. The problem with the functionality is that because there's no GROUP BY, the SUM is the SUM of the error_count column for the entire table. But the other column values are completely arbitrary - they can't be relied upon.
This:
SELECT category,
error_count/(SELECT SUM(error_count)
FROM med_error_third_party_tmp) AS percentage
FROM med_error_third_party_tmp;
...will give you a percentage on a per row basis -- category values will be duplicated because there's no grouping.
This:
SELECT category,
SUM(error_count)/x.total AS percentage
FROM med_error_third_party_tmp
JOIN (SELECT SUM(error_count) AS total
FROM med_error_third_party_tmp) x
GROUP BY category
...will give you a percentage per category: the sum of that category's error_count values divided by the sum of the error_count values for the entire table.
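The two corrected queries can be compared on a tiny data set. This is a sketch using Python's sqlite3 module rather than MySQL, with invented rows; because SQLite performs integer division on integer operands (unlike MySQL's / operator), the sketch multiplies by 1.0, which is an adaptation and not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE med_error_third_party_tmp (category TEXT, error_count INTEGER);
    -- Hypothetical rows: the total error_count is 16.
    INSERT INTO med_error_third_party_tmp VALUES
        ('dose', 1), ('dose', 3), ('label', 12);
""")

# Percentage per row: categories repeat because nothing is grouped.
per_row = conn.execute("""
    SELECT category,
           1.0 * error_count / (SELECT SUM(error_count)
                                FROM med_error_third_party_tmp) AS percentage
    FROM med_error_third_party_tmp
    ORDER BY category, error_count
""").fetchall()

# Percentage per category: the one-row derived table x carries the grand
# total, and GROUP BY sums the counts inside each category.
per_category = conn.execute("""
    SELECT category,
           1.0 * SUM(error_count) / x.total AS percentage
    FROM med_error_third_party_tmp
    JOIN (SELECT SUM(error_count) AS total
          FROM med_error_third_party_tmp) x
    GROUP BY category
    ORDER BY category
""").fetchall()

print(per_row)
print(per_category)
```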

Another way to do it, without the temp table as a separate item:
select category, error_count/sum(error_count) "Percentage"
from (SELECT mec.description category
, metpc.error_count
FROM med_error_category mec
, med_error_third_party_category metpc
WHERE mec.id = metpc.category
AND year = 2003
GROUP BY mec.id
) t;
I think you will notice that the percentage is unchanging over the categories. This is probably not what you want; you probably want to group the errors by category as well.
