Select minimum value after grouping by two columns - sql

My table, sar.dist_mam_na, contains two grouping variables, id_point and sci_name, and a continuous variable, distance. There are multiple distance values per unique combination of id_point and sci_name. I would like to create a table that contains each unique combination of id_point and sci_name along with its minimum distance value.
This was my best attempt:
CREATE TABLE sar.test AS
SELECT DISTINCT id_point, sci_name, MIN(distance)
GROUP BY id_point, sci_name,
FROM sar.dist_mam_na;

The GROUP BY goes after the FROM, and the trailing comma after sci_name has to go. By definition of aggregation, the groups are already unique, so the DISTINCT is redundant:
CREATE TABLE sar.test AS
SELECT id_point, sci_name, MIN(distance) AS min_distance  -- alias name is arbitrary
FROM sar.dist_mam_na
GROUP BY id_point, sci_name;
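A minimal sketch of the corrected query, run through Python's built-in sqlite3 module against invented rows (the species names and distances are made up). SQLite has no schemas, so an in-memory database is attached under the name sar to keep the sar.dist_mam_na and sar.test names from the question; the min_distance alias is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "sar" to mimic the schema prefix.
conn.execute("ATTACH DATABASE ':memory:' AS sar")
conn.execute("CREATE TABLE sar.dist_mam_na (id_point INTEGER, sci_name TEXT, distance REAL)")

# Invented sample rows: several distances per (id_point, sci_name) pair.
conn.executemany(
    "INSERT INTO sar.dist_mam_na VALUES (?, ?, ?)",
    [
        (1, "Lynx rufus", 12.5),
        (1, "Lynx rufus", 3.2),
        (1, "Canis latrans", 7.0),
        (2, "Lynx rufus", 4.4),
        (2, "Lynx rufus", 9.9),
    ],
)

# GROUP BY after FROM, no DISTINCT: each group appears exactly once.
conn.execute("""
    CREATE TABLE sar.test AS
    SELECT id_point, sci_name, MIN(distance) AS min_distance
    FROM sar.dist_mam_na
    GROUP BY id_point, sci_name
""")

rows = conn.execute("SELECT * FROM sar.test ORDER BY id_point, sci_name").fetchall()
print(rows)  # [(1, 'Canis latrans', 7.0), (1, 'Lynx rufus', 3.2), (2, 'Lynx rufus', 4.4)]
```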

Related

(Hive) SQL retrieving data from a column that has 1 to N relationship in another column

How can I retrieve rows where BID comes up multiple times in AID
You can see the sample below: the AID and BID columns are under PrimaryID, and BIDs are under AIDs. I want an output that only includes records where BIDs have a one-to-many relationship with records in the AID column. Example output below.
I provided a small sample of data; in reality I am retrieving 20+ columns and joining 4 tables. I have unique PrimaryIDs, and under each of those I have multiple unique AIDs; however, under these AIDs I can have multiple non-unique BIDs that can come up repeatedly under different AIDs.
Hive supports window functions. A window function can associate every row in a group with an attribute of the group, and COUNT() is one of the supported functions. In your case you can use it and select the rows for which that count is greater than 1.
In the PARTITION BY clause you specify which columns define the group, the same way you would in the more familiar GROUP BY clause.
Something like this:
SELECT * FROM
(
  SELECT *,
         COUNT(*) OVER (PARTITION BY primaryID, AID) AS counts
  FROM mytable
) x
WHERE counts > 1
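A minimal sketch of the same windowed count, using Python's sqlite3 with SQLite 3.25 or later (the first version with window functions) as a stand-in for Hive. The rows in mytable are invented, and each kept row carries its group size so the filter is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (PrimaryID INTEGER, AID TEXT, BID TEXT)")
# Invented rows: only the (1, 'A1') group has more than one row.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A1", "B1"),
        (1, "A1", "B2"),
        (1, "A2", "B3"),
        (2, "A3", "B1"),
    ],
)

# COUNT(*) OVER (...) attaches the group size to every row without collapsing rows;
# the outer query then keeps only rows whose group has more than one member.
rows = conn.execute("""
    SELECT PrimaryID, AID, BID, counts
    FROM (
        SELECT t.*,
               COUNT(*) OVER (PARTITION BY PrimaryID, AID) AS counts
        FROM mytable t
    ) x
    WHERE counts > 1
    ORDER BY PrimaryID, AID, BID
""").fetchall()

print(rows)  # [(1, 'A1', 'B1', 2), (1, 'A1', 'B2', 2)]
```

Unlike GROUP BY, the window version keeps all the detail columns of each qualifying row, which matters when, as in the question, 20+ columns are selected.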

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why this is the case with the second query. The value that ultimately is the max amount is already contained in the table; it is not calculated the way SUM is. Thank you.
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm
It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding the GROUP BY, we fundamentally change the query: instead of returning one number for the whole table, it returns one row for each group, which in this case means one row for each film.
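A small sketch of that difference, using Python's sqlite3 and three hypothetical films ("Film A", "Film B", "Film C") with made-up win counts, two of them tied at 12:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER)")
conn.executemany(
    "INSERT INTO tblFilm VALUES (?, ?)",
    [("Film A", 12), ("Film B", 12), ("Film C", 5)],
)

# No GROUP BY: a single row for the whole table.
whole_table = conn.execute("SELECT MAX(FilmOscarWins) FROM tblFilm").fetchall()

# GROUP BY FilmName: one row per film, so FilmName has one value in each row.
per_film = conn.execute("""
    SELECT FilmName, MAX(FilmOscarWins)
    FROM tblFilm
    GROUP BY FilmName
    ORDER BY FilmName
""").fetchall()

print(whole_table)  # [(12,)]
print(per_film)     # [('Film A', 12), ('Film B', 12), ('Film C', 5)]
```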
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
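If you want the film with the most Oscar wins, use TOP (SQL Server syntax; if several films are tied, TOP (1) returns only one of them, while TOP (1) WITH TIES returns them all):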
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
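A sketch contrasting the sub-query approach with the "top one row" approach on hypothetical data where two films tie at 12. SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for it here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER)")
conn.executemany(
    "INSERT INTO tblFilm VALUES (?, ?)",
    [("Film A", 12), ("Film B", 12), ("Film C", 5)],
)

# Sub-query: find the maximum first, then return every film that matches it.
ties = conn.execute("""
    SELECT FilmOscarWins, FilmName
    FROM tblFilm
    WHERE FilmOscarWins = (SELECT MAX(FilmOscarWins) FROM tblFilm)
    ORDER BY FilmName
""").fetchall()

# LIMIT 1 (SQLite's analogue of TOP (1)): exactly one row, and which of the
# tied films it is is not specified by the ORDER BY.
top_one = conn.execute("""
    SELECT FilmName, FilmOscarWins
    FROM tblFilm
    ORDER BY FilmOscarWins DESC
    LIMIT 1
""").fetchall()

print(ties)  # [(12, 'Film A'), (12, 'Film B')]
print(top_one)
```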
In an aggregation query, the SELECT columns need to be consistent with the GROUP BY: every unaggregated column in the SELECT list must also appear in the GROUP BY.

Eliminating duplicate rows in subquery when calculating sum

I am having trouble getting correct values for sumSquare, which is a calculated column in the following query:
SELECT
building_type,
COUNT(distinct building.id) as buildingCount,
SUM(squareTable.sumSquare) as sumSquare
FROM building
LEFT JOIN (
SELECT building_id, SUM(square) as sumSquare
FROM building_square
WHERE (square >= '500')
GROUP BY building_id
) squareTable on (squareTable.building_id = building.id)
JOIN building_square ON (building_square.building_id = building.id)
WHERE building_square.square >= '500'
GROUP BY building_type
building_type is a column from the building table (e.g., some of the types are house, apartment, ...); I need to group by building type.
Also, there is a 1:N relationship between the building and building_square tables. In the main query's WHERE clause I need to filter by square, which is a column of the building_square table.
I guess the problem is that I am joining the building_square table in the main query because I need to filter on some of its fields, while I also use a subquery that calculates the sum of squares for each building. As a result, some rows are counted more than once in the sum.
How can I calculate the sum of squares in the subquery only once per distinct building_id?
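The thread as captured here includes no answer, so the following is only one possible approach, not a solution from the thread: a sketch with invented rows that reproduces the double counting and then moves the outer building_square filter into an EXISTS subquery, which filters buildings without multiplying their rows. It assumes the outer filter only needs building_square columns, and it compares against the number 500 rather than the string '500'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, building_type TEXT);
    CREATE TABLE building_square (building_id INTEGER, square INTEGER);
    INSERT INTO building VALUES (1, 'house'), (2, 'apartment');
    -- Invented data: building 1 has two rows with square >= 500.
    INSERT INTO building_square VALUES (1, 600), (1, 700), (1, 100), (2, 800);
""")

# Per-building sum of qualifying squares (same shape as the question's subquery).
SQUARE_TABLE = """
    SELECT building_id, SUM(square) AS sumSquare
    FROM building_square
    WHERE square >= 500
    GROUP BY building_id
"""

# Question's shape: the extra JOIN repeats each building once per matching
# building_square row, so its sumSquare is added several times.
fanned_out = conn.execute(f"""
    SELECT building_type,
           COUNT(DISTINCT building.id) AS buildingCount,
           SUM(squareTable.sumSquare) AS sumSquare
    FROM building
    LEFT JOIN ({SQUARE_TABLE}) squareTable ON squareTable.building_id = building.id
    JOIN building_square ON building_square.building_id = building.id
    WHERE building_square.square >= 500
    GROUP BY building_type
    ORDER BY building_type
""").fetchall()

# EXISTS only tests for a matching row, so each building stays a single row.
with_exists = conn.execute(f"""
    SELECT building_type,
           COUNT(DISTINCT building.id) AS buildingCount,
           SUM(squareTable.sumSquare) AS sumSquare
    FROM building
    LEFT JOIN ({SQUARE_TABLE}) squareTable ON squareTable.building_id = building.id
    WHERE EXISTS (
        SELECT 1 FROM building_square bs
        WHERE bs.building_id = building.id AND bs.square >= 500
    )
    GROUP BY building_type
    ORDER BY building_type
""").fetchall()

print(fanned_out)   # [('apartment', 1, 800), ('house', 1, 2600)]
print(with_exists)  # [('apartment', 1, 800), ('house', 1, 1300)]
```

The f-strings only splice in the constant SQUARE_TABLE text, never user input.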

SQL: Find duplicates and for each duplicate group assign value of first duplicate of that group

I have the results in the top table. I would like the results in the bottom table.
Using an SQL query on the table above, I would like to find groups of duplicates (where the values in all columns except Id and Category are identical) and from that create a result that has for each entry the lowest Id from its group of duplicates and the (unmodified) Category from the original table.
The window function MIN can be used here:
select min(id) over (partition by first_name, last_name, company) id,
category
from t;
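A minimal sketch of that window query, using Python's sqlite3 (SQLite 3.25+ for window functions) and a hypothetical table t whose first_name, last_name and company columns stand for "all columns except Id and Category" in the question; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, first_name TEXT, last_name TEXT, company TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "Acme", "X"),
        (2, "Bob", "Ray", "Beta", "Y"),
        (3, "Ann", "Lee", "Acme", "Z"),  # duplicate of id 1 apart from category
        (4, "Bob", "Ray", "Beta", "X"),  # duplicate of id 2 apart from category
        (5, "Cy", "Oh", "Acme", "Y"),
    ],
)

# Every row gets the lowest id of its duplicate group; category is left unchanged.
rows = sorted(conn.execute("""
    SELECT MIN(id) OVER (PARTITION BY first_name, last_name, company) AS id,
           category
    FROM t
""").fetchall())

print(rows)  # [(1, 'X'), (1, 'Z'), (2, 'X'), (2, 'Y'), (5, 'Y')]
```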

Counting how many times a value appeared

I am using SELECT DISTINCT name FROM table to get a table with the distinct names that appear in the column. How can I get an extra column with the number of times each name appears in the column?
select name, count(1) from table
group by name;
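A minimal sketch with Python's sqlite3, a hypothetical people table (TABLE is a reserved word, so it cannot be used unquoted as a table name) and invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [(n,) for n in ["alice", "bob", "alice", "carol", "alice", "bob"]],
)

# One row per distinct name, plus how many rows carry that name.
rows = conn.execute("""
    SELECT name, COUNT(1) AS times_seen
    FROM people
    GROUP BY name
    ORDER BY name
""").fetchall()

print(rows)  # [('alice', 3), ('bob', 2), ('carol', 1)]
```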