exclude one column from grouping in sql query - sql

I run an SQL query to calculate the mean and count after grouping by combinations of themes and country. This works fine.
create table georisk as (select themes, country , AVG(value) as mean, count(themes) as count from mytable group by themes, country order by themes, suppliers_country)
However, now I want to add an additional column to my table holding max(date_t) without grouping by anything, so that a single value is added for all the rows. If I do this:
create table georisk as (select themes, country , AVG(value) as mean, count(themes) as count, max(date_t) as last_included_date from mytable group by themes, country order by themes, suppliers_country)
the max(date_t) will also be computed per group. How can I extract one overall max value within a single query?

I think you want this...
select
themes,
country,
AVG(value) as mean,
count(themes) as count,
max(date_t) as last_included_date,
max(max(date_t)) over () as very_last_include_date
from
mytable
group by
themes,
country
order by
themes,
country -- Note, you had a typo here ; suppliers_country
The GROUP BY is evaluated first, then the aggregates, and finally the window function, so the window function sees one row per group.
MAX(
MAX(date_t) -- normal aggregate
)
OVER () -- window function across whole result set's values of `MAX(date_t)`
Normally a window function has a PARTITION BY; leaving it empty means 'no partition' and therefore 'whole result set'.
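To see the window-over-aggregate pattern in action, here is a minimal sketch using Python's built-in sqlite3 module (this assumes a SQLite build with window function support, 3.25 or later). The table and column names follow the question; the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (themes TEXT, country TEXT, value REAL, date_t TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("flood", "FR", 1.0, "2020-01-01"),
        ("flood", "FR", 3.0, "2020-02-01"),
        ("storm", "DE", 5.0, "2020-03-15"),
    ],
)

rows = conn.execute(
    """
    SELECT themes,
           country,
           AVG(value)               AS mean,
           COUNT(themes)            AS count,
           MAX(date_t)              AS last_included_date,     -- per group
           MAX(MAX(date_t)) OVER () AS very_last_include_date  -- whole result
    FROM mytable
    GROUP BY themes, country
    ORDER BY themes, country
    """
).fetchall()

for row in rows:
    print(row)
```

Every output row carries its own group's latest date plus the same overall latest date in the last column.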

Related

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MtAvg.AvgAge
FROM Mytable mt
inner join
(
select mtavgs.country
, avg(mtavgs.age) as AvgAge
from Mytable mtavgs
group by mtavgs.country
) MTAvg
on mtavg.country=mt.country
and mt.Age > mtavg.AvgAge
GROUP BY always returns one row per unique combination of values in the listed GROUP BY columns (provided the row is not removed by a HAVING clause). The subquery in our example (alias: MTAvg) calculates a single row per country. We use its results to filter the main table's rows through the condition in the INNER JOIN clause, and we also report the calculated average age in the output.
GROUP BY is a clause that is used together with aggregate functions such as AVG or COUNT; it is not an aggregate function itself. See an SQL GROUP BY tutorial for further reading.
What it does is collapse all the rows of a group into one row. In your example it would collapse all the rows with the same country together.
I'm not sure exactly what your query needs to be to solve your problem, but I would look into what are called window functions in SQL. I believe you first need a window function to find the average age in each group. Then you can write a query that returns the rows you need.
Depending on your DBMS type and version, you may be able to use a "window function" that calculates the average per country and makes the result available on every row. Once that data is present as a "derived table", you can simply use a WHERE clause to filter for the ages that are greater than the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge
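As a quick check that the join approach and the window-function approach return the same rows, here is a small sketch with Python's sqlite3 module. The name column and the sample data are assumptions added for illustration; only Mytable, country and age come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The "name" column is hypothetical; it just makes the rows easy to tell apart.
conn.execute("CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [
        ("Ann", "US", 20), ("Bob", "US", 40), ("Cid", "US", 30),
        ("Dee", "UK", 50), ("Eli", "UK", 60),
    ],
)

# Window function: the per-country average is attached to every row.
window_rows = conn.execute(
    """
    SELECT mt.name, mt.country, mt.age
    FROM (
        SELECT *, AVG(age) OVER (PARTITION BY country) AS AvgAge
        FROM Mytable
    ) mt
    WHERE mt.age > mt.AvgAge
    ORDER BY mt.name
    """
).fetchall()

# Join: the per-country average comes from a grouped subquery.
join_rows = conn.execute(
    """
    SELECT mt.name, mt.country, mt.age
    FROM Mytable mt
    INNER JOIN (
        SELECT country, AVG(age) AS AvgAge
        FROM Mytable
        GROUP BY country
    ) avgs
      ON avgs.country = mt.country
     AND mt.age > avgs.AvgAge
    ORDER BY mt.name
    """
).fetchall()

print(window_rows)
print(join_rows)
```

With these rows the US average is 30 and the UK average is 55, so only Bob and Eli are older than their group's average.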

count distinct issue in Hive

I'm trying to count the number of unique occurrences of each element in a Hive table column with respect to other columns.
I tried this query, but I get this error: Expression not in GROUP BY key custom
SELECT custom, dist_pt, dt, art, COUNT(DISTINCT art) OVER (PARTITION BY custom, dist_pt) as nb_art FROM Tab ;
Remove DISTINCT from your COUNT() and add "GROUP BY art" at the end of your query. You need to segment, or group by, art in order to count how many records have each unique value of art.
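If the goal is to keep every row and attach the number of distinct art values per (custom, dist_pt), one alternative on engines that reject COUNT(DISTINCT ...) as a window function is to compute the distinct count in a grouped subquery and join it back. This is a different technique from the answer above, sketched here with Python's sqlite3 module rather than Hive; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tab (custom TEXT, dist_pt TEXT, dt TEXT, art TEXT)")
conn.executemany(
    "INSERT INTO Tab VALUES (?, ?, ?, ?)",
    [
        ("c1", "p1", "d1", "a"),
        ("c1", "p1", "d2", "a"),
        ("c1", "p1", "d3", "b"),
        ("c2", "p1", "d1", "a"),
    ],
)

rows = conn.execute(
    """
    SELECT t.custom, t.dist_pt, t.dt, t.art, c.nb_art
    FROM Tab t
    JOIN (
        -- one row per (custom, dist_pt) with its distinct art count
        SELECT custom, dist_pt, COUNT(DISTINCT art) AS nb_art
        FROM Tab
        GROUP BY custom, dist_pt
    ) c
      ON c.custom = t.custom
     AND c.dist_pt = t.dist_pt
    ORDER BY t.custom, t.dt
    """
).fetchall()

for row in rows:
    print(row)
```

The subquery does the grouping, so the outer query needs no GROUP BY and every original row is preserved.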

Usage of aggregate function Group by

I have observed that the COUNT function can be used without a GROUP BY. For example:
Select Count(*) from Employee
It returns the count of all the rows without any GROUP BY. So where do we really need GROUP BY?
Omitting the GROUP BY implies that the entire table is one group. Sometimes you want there to be multiple groups. Consider the following example:
SELECT month, SUM(sales) AS total_sales
FROM all_sales
GROUP BY month;
This query gives you a month-by-month breakdown of sales. If you omitted month and the GROUP BY clause, you would only receive the total sales of all time which may not have the granularity you require.
You can also group by multiple columns, giving finer detail still:
SELECT state, city, COUNT(*) AS population
FROM all_people
GROUP BY state, city;
Additionally, GROUP BY pairs with the HAVING clause, which lets us filter groups. Using the above example, we can filter the result to cities with over 1,000,000 people:
SELECT state, city, COUNT(*) AS population
FROM all_people
GROUP BY state, city
HAVING COUNT(*) > 1000000;
The GROUP BY clause is used to break up aggregate results into groups of unique values. E.g., let's say you don't want to know how many employees you have, but how many there are for each first name (e.g., two Gregs, one Adam and three Scotts):
SELECT first_name, COUNT(*)
FROM employee
GROUP BY first_name
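The difference between the whole-table count, the per-group count, and a HAVING filter can be sketched with Python's sqlite3 module. The employee rows below are made up to match the "two Gregs, one Adam and three Scotts" example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (first_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?)",
    [("Greg",), ("Greg",), ("Adam",), ("Scott",), ("Scott",), ("Scott",)],
)

# No GROUP BY: the entire table is treated as a single group.
total = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# GROUP BY: one output row per distinct first_name.
by_name = conn.execute(
    """
    SELECT first_name, COUNT(*)
    FROM employee
    GROUP BY first_name
    ORDER BY first_name
    """
).fetchall()

# HAVING: keep only the groups that satisfy an aggregate condition.
repeated = conn.execute(
    """
    SELECT first_name, COUNT(*)
    FROM employee
    GROUP BY first_name
    HAVING COUNT(*) >= 2
    ORDER BY first_name
    """
).fetchall()

print(total)
print(by_name)
print(repeated)
```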

Beginning SQL group by and AVG

I am trying to pull information from two columns titled clientstate and clientrevenue in my table. I want clientstate to show up as the state, and have only distinct names in it, and under client revenue I want the average revenue per state, and that will only show up if there are at least two clients from that state. I am very new at this, so what I have is pretty iffy:
SELECT clientstate, clientrevenue
FROM client
GROUP BY clientrevenue
HAVING COUNT (*) >=2;
Where am I going wrong here?
SELECT clientstate AS [State]
, AVG(clientrevenue) AS [Average Revenue]
FROM client
GROUP BY clientstate
HAVING COUNT(*) >= 2
Grouping by clientrevenue groups rows that have identical revenue values, which makes no logical sense here.
First, in order to get distinct states, the clientstate column needs to be used in the GROUP BY clause.
Thus, the code would be:
SELECT clientstate, AVG(clientrevenue)
FROM Source_Table
GROUP BY clientstate --this would get you distinct states
Now, for the requirement of at least two clients per state: HAVING filters the groups on an aggregate condition, and that aggregate does not have to appear in the SELECT list. So adding HAVING COUNT(*) >= 2 after the GROUP BY is enough.
You can also express it as a condition in the WHERE clause with a correlated subquery, like
SELECT clientstate, AVG(clientrevenue)
FROM Source_Table A
WHERE (SELECT count(DISTINCT client_ID) FROM Source_Table B
WHERE A.clientstate = B.clientstate) >= 2 --Condition
GROUP BY clientstate --this would get you distinct states
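A minimal sketch with Python's sqlite3 module shows that HAVING COUNT(*) >= 2 works even though COUNT(*) is not in the SELECT list. The table and column names follow the question; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (clientstate TEXT, clientrevenue REAL)")
conn.executemany(
    "INSERT INTO client VALUES (?, ?)",
    [
        ("NY", 100), ("NY", 300),            # two clients
        ("CA", 50),                          # one client, should be dropped
        ("TX", 10), ("TX", 20), ("TX", 30),  # three clients
    ],
)

rows = conn.execute(
    """
    SELECT clientstate, AVG(clientrevenue) AS avg_revenue
    FROM client
    GROUP BY clientstate
    HAVING COUNT(*) >= 2   -- aggregate used only for filtering
    ORDER BY clientstate
    """
).fetchall()

print(rows)
```

CA is filtered out because its group has a single row, while NY and TX keep their averages.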

MySQL Single Row Returned From Temporary Table

I am running the following queries against a database:
CREATE TEMPORARY TABLE med_error_third_party_tmp
SELECT `med_error_category`.description AS category, `med_error_third_party_category`.error_count AS error_count
FROM
`med_error_category` INNER JOIN `med_error_third_party_category` ON med_error_category.`id` = `med_error_third_party_category`.`category`
WHERE
year = 2003
GROUP BY `med_error_category`.id;
The only problem is that when I create the temporary table and do a select * on it then it returns multiple rows, but the query above only returns one row. It seems to always return a single row unless I specify a GROUP BY, but then it returns a percentage of 1.0 like it should with a GROUP BY.
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
Here are the server specs:
Server version: 5.0.77
Protocol version: 10
Server: Localhost via UNIX socket
Does anybody see what is causing this?
Standard SQL requires every non-aggregated column in the SELECT list to appear in a GROUP BY clause when the query also uses an aggregate function (e.g. MIN, MAX, COUNT, SUM, AVG), but MySQL supports "hidden columns in the GROUP BY" -- which is why:
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
...runs without error. The problem with this functionality is that because there's no GROUP BY, the whole table is treated as one group: the query returns a single row, and the SUM is the SUM of the error_count column for the entire table. The other column values in that row are completely arbitrary - they can't be relied upon.
This:
SELECT category,
error_count/(SELECT SUM(error_count)
FROM med_error_third_party_tmp) AS percentage
FROM med_error_third_party_tmp;
...will give you a percentage on a per row basis -- category values will be duplicated because there's no grouping.
This:
SELECT category,
SUM(error_count)/x.total AS percentage
FROM med_error_third_party_tmp
JOIN (SELECT SUM(error_count) AS total
FROM med_error_third_party_tmp) x
GROUP BY category
...will give you a percentage per category: the sum of that category's error_count values divided by the sum of the error_count values for the entire table.
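The per-category percentage can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the sample rows are made up, the CROSS JOIN is written explicitly, x.total is added to the GROUP BY so the query does not rely on MySQL's hidden-column behaviour, and the * 1.0 is there because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE med_error_third_party_tmp (category TEXT, error_count INTEGER)"
)
conn.executemany(
    "INSERT INTO med_error_third_party_tmp VALUES (?, ?)",
    [("dose", 2), ("dose", 2), ("label", 4), ("route", 2)],
)

rows = conn.execute(
    """
    SELECT t.category,
           SUM(t.error_count) * 1.0 / x.total AS percentage
    FROM med_error_third_party_tmp t
    CROSS JOIN (
        -- single-row derived table holding the grand total
        SELECT SUM(error_count) AS total
        FROM med_error_third_party_tmp
    ) x
    GROUP BY t.category, x.total
    ORDER BY t.category
    """
).fetchall()

for category, percentage in rows:
    print(category, percentage)
```

The percentages across all categories add up to 1, which is a handy sanity check for this kind of query.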
Another way to do it, without the temp table as a separate item:
select category, error_count/sum(error_count) "Percentage"
from (SELECT mec.description category
, metpc.error_count
FROM med_error_category mec
, med_error_third_party_category metpc
WHERE mec.id = metpc.category
AND year = 2003
GROUP BY mec.id
) AS t; -- MySQL requires an alias for every derived table
I think you will notice that the percentage is unchanging over the categories. This is probably not what you want - you probably want to group the errors by category as well.