SQL Group by one column - sql

so I have this table:
ID INITIAL_DATE TAX
A 18-02-2012 105
A 19-02-2012 95
A 20-02-2012 105
A 21-02-2012 100
B 18-02-2012 135
B 19-02-2012 150
B 20-02-2012 130
B 21-02-2012 140
and what I need is, for each distinct ID, the highest TAX ever. And if that TAX occurs twice I want the record with the highest INITIAL_DATE.
This is the query I have:
SELECT ID, MAX (initial_date) initial_date, tax
FROM t t0
WHERE t0.tax = (SELECT MAX (t1.tax)
FROM t t1
WHERE t1.ID = t0.ID
GROUP BY ID)
GROUP BY ID, tax
ORDER BY id, initial_date, tax
but I want to believe there is a better way of grouping these records.
Is there any way of NOT grouping by all the columns in the SELECT?

Have you tried analytic functions?
SELECT t0.ID, t0.INITIAL_DATE, t0.TAX
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TAX DESC, INITIAL_DATE DESC) AS Corr
FROM t) t0
WHERE t0.Corr = 1
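To check the ROW_NUMBER() approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window function support (3.25 or later) and stores the dates as ISO strings so they sort correctly; the subquery alias ranked and the column name rn are illustrative choices, not part of the original answer.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, initial_date TEXT, tax INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "2012-02-18", 105), ("A", "2012-02-19", 95),
        ("A", "2012-02-20", 105), ("A", "2012-02-21", 100),
        ("B", "2012-02-18", 135), ("B", "2012-02-19", 150),
        ("B", "2012-02-20", 130), ("B", "2012-02-21", 140),
    ],
)

# Number the rows of each id, highest tax first, latest date breaking ties,
# then keep only the first row of each id.
rows = conn.execute(
    """
    SELECT id, initial_date, tax
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY tax DESC, initial_date DESC) AS rn
          FROM t) AS ranked
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(rows)
```

For ID A the tax 105 occurs twice, and the row dated 2012-02-20 is the one kept, which matches the tie-breaking rule in the question.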

As far as I know, every column in a SELECT that is not aggregated in some way must be part of the GROUP BY clause.

I have tested this, and it is a solution that works:
SELECT t1.ID,
MAX(t1.data),
t1.tax
FROM test t1
INNER JOIN (SELECT ID,
MAX(tax) AS maxtax
FROM test
GROUP BY ID
) t2 ON t1.ID = t2.ID
AND t1.tax = t2.maxtax
GROUP BY t1.ID, t1.tax

Related

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id  area  count  function
1   100   20     living
1   200   30     industry
2   400   10     living
2   400   10     industry
2   400   20     education
3   150   1      industry
3   150   1      education
I want to group by id, summing area and count over each group's rows, and get the dominant function, i.e. the function of the row with the largest area. When area is tied, the largest count should decide; when both area and count are tied, a priority among functions should decide (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id  area  count  function
1   300   50     industry
2   1200  40     education
3   300   2      industry
I tried a lot of things, and maybe it's easy, but I don't get it. Can someone help me get the right SQL?
One method uses row_number() and conditional aggregation:
select id, sum(area), sum(count),
max(function) filter (where seqnum = 1) as function
from (select t.*,
row_number() over (partition by id order by area desc, count desc) as seqnum
from t
) t
group by id;
Another method uses distinct on:
select distinct on (t.id) t.id, sum(area) over (partition by t.id) as area,
sum(count) over (partition by t.id) as count,
function
from t
order by t.id, t.area desc;
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
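The scalar sub-query approach above can be tried on the question's sample data with a small sketch like the following. It uses Python's sqlite3 module rather than Postgres (an assumption made only to keep the example self-contained), quotes the count and function column names to be safe, and takes "function desc" as the final tie-break, as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE the_table (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)'
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        (1, 100, 20, "living"), (1, 200, 30, "industry"),
        (2, 400, 10, "living"), (2, 400, 10, "industry"),
        (2, 400, 20, "education"),
        (3, 150, 1, "industry"), (3, 150, 1, "education"),
    ],
)

# Sums come from the GROUP BY; the dominant function comes from a correlated
# sub-query that sorts each id's rows by area, then count, then function name.
rows = conn.execute(
    '''
    SELECT t.id, SUM(t.area), SUM(t."count"),
           (SELECT s."function"
            FROM the_table AS s
            WHERE s.id = t.id
            ORDER BY s.area DESC, s."count" DESC, s."function" DESC
            LIMIT 1) AS "function"
    FROM the_table AS t
    GROUP BY t.id
    ORDER BY t.id
    '''
).fetchall()

for row in rows:
    print(row)
```

With this tie-break, id 3 resolves to industry because it sorts after education; reversing the priority would mean changing the last ORDER BY term.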
You can use sum as a window function:
select distinct on (t.id)
id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc limit 1 ) as function
from tbl_test t
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(We ensure that no yt2 exists with the same id but a higher area.)
This would work nicely, but several rows may tie for the max area while having different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;

Select multiple max values after GROUP BY query

Suppose I have a table that looks like this:
date ID income
0 9/1 C 10.40
1 9/3 A 33.90
2 9/3 B 29.10
3 9/4 C 19.30
4 9/4 B 17.80
5 9/5 B 9.55
6 9/5 C 11.10
7 9/5 A 13.10
8 9/7 A 29.10
9 9/7 B 29.10
I want to find the ID that made the most income on each date. The most intuitive approach would be to write
SELECT ID, MAX(income) FROM table GROUP BY date
But two IDs made the same max income on 9/7, and I want to retain all ties on the same date; that query would drop one ID on 9/7. Note also that 29.1 appears on both 9/3 and 9/7. Is there another approach?
A join based approach doesn't have this problem, and would retain all records tied for the max income on a given date.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
ORDER BY
t1.date;
The way the above query works is to join the complete original table to a subquery which finds, for each date, the maximum income value. This has the effect of filtering off any record which did not have the max income on a given date. Pay close attention to the join condition, which has two components, the date, and the income.
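As a quick check of the join-based query on the question's data, here is a small sketch using Python's sqlite3 module. The dates are kept as the short text values shown in the question, which is an assumption made for the demo, and the extra ORDER BY on ID is only there to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (date TEXT, ID TEXT, income REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("9/1", "C", 10.40), ("9/3", "A", 33.90), ("9/3", "B", 29.10),
        ("9/4", "C", 19.30), ("9/4", "B", 17.80), ("9/5", "B", 9.55),
        ("9/5", "C", 11.10), ("9/5", "A", 13.10), ("9/7", "A", 29.10),
        ("9/7", "B", 29.10),
    ],
)

# Join every row to its date's maximum income; only rows that match the
# maximum survive, so ties on the same date are all kept.
rows = conn.execute(
    """
    SELECT t1.date, t1.ID, t1.income
    FROM yourTable AS t1
    INNER JOIN (SELECT date, MAX(income) AS max_income
                FROM yourTable
                GROUP BY date) AS t2
        ON t1.date = t2.date AND t1.income = t2.max_income
    ORDER BY t1.date, t1.ID
    """
).fetchall()

for row in rows:
    print(row)
```

Both A and B are returned for 9/7, while B's 29.10 on 9/3 is not, because the join also matches on the date.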
If your database supports analytic functions, we can also use RANK here:
SELECT date, ID, income
FROM
(
SELECT t.*, RANK() OVER (PARTITION BY date ORDER BY income DESC) rnk
FROM yourTable t
) t
WHERE rnk = 1
ORDER BY date;
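The reason RANK fits here, rather than ROW_NUMBER, is that tied rows share rank 1. The sketch below compares the two on the question's data using Python's sqlite3 module; it assumes SQLite 3.25 or later for window functions, and the helper name top_per_date is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (date TEXT, ID TEXT, income REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("9/1", "C", 10.40), ("9/3", "A", 33.90), ("9/3", "B", 29.10),
        ("9/4", "C", 19.30), ("9/4", "B", 17.80), ("9/5", "B", 9.55),
        ("9/5", "C", 11.10), ("9/5", "A", 13.10), ("9/7", "A", 29.10),
        ("9/7", "B", 29.10),
    ],
)


def top_per_date(ranking):
    """Return the rows ranked first per date; ranking is 'RANK' or 'ROW_NUMBER'."""
    # The function name is interpolated only from the two fixed strings above.
    sql = f"""
        SELECT date, ID, income
        FROM (SELECT t.*,
                     {ranking}() OVER (PARTITION BY date ORDER BY income DESC) AS rnk
              FROM yourTable AS t) AS ranked
        WHERE rnk = 1
        ORDER BY date, ID
    """
    return conn.execute(sql).fetchall()


rank_rows = top_per_date("RANK")              # ties share rank 1
row_number_rows = top_per_date("ROW_NUMBER")  # exactly one row per date

print(len(rank_rows), len(row_number_rows))
```

RANK returns both IDs for 9/7, while ROW_NUMBER keeps a single, arbitrarily chosen row per date, so the choice depends on whether ties should be retained.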
One approach is shown below:
with cte1 as
(
Select t1.*
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS max_income
FROM yourTable
GROUP BY date
) t2
ON t1.date = t2.date AND t1.income = t2.max_income
) select min(ID) as ID, date,income from cte1
group by date,income
You did not mention which ID you need when two IDs have the same income on a particular date, so I took the minimum ID among them. You could use the max() function instead.
Try the query below, which uses a subquery. Since you have a tie on one date, it takes the minimum ID, which gives you one ID for 9/7:
select date,min(ID),income
from
(SELECT t1.date, t1.ID,t1.income
FROM yourTable t1
INNER JOIN
(
SELECT date, MAX(income) AS mincome
FROM yourTable
GROUP BY date
) t2 ON t1.date = t2.date AND t1.income = t2.mincome
)X group by date,income

How to query specific values for some columns and sum of values in others SQL

I'm trying to query some data from SQL such that it sums some columns, gets the max of another column and the corresponding row for a third column. For example,
|dataset|
|shares| |date| |price|
100 05/13/16 20.4
200 05/15/16 21.2
300 06/12/16 19.3
400 02/22/16 20.0
I want my output to be:
|shares| |date| |price|
1000 06/12/16 19.3
The shares have been summed up, the date is max(date), and the price is the price at max(date).
So far, I have:
select sum(shares), max(date), max(price)
but that gives me an incorrect price.
EDIT:
I realize I was unclear in my OP: all the other relevant data is in one table, and the price is in another. My full code is:
select id, stock, side, exchange, max(startdate), max(enddate),
sum(shares), sum(execution_price*shares)/sum(shares), max(limitprice), max(price)
from table1 t1
INNER JOIN table2 t2 on t2.id = t1.id
where location = 'CHICAGO' and startdate > '1/1/2016' and order_type = 'limit'
group by id, stock, side, exchange
You can do this with window functions and aggregation. Here is an example:
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
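To see the conditional aggregation at work on the question's four rows, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later and stores the dates as ISO strings (for example 2016-06-12 for 06/12/16) so that MAX and ORDER BY compare them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (shares INTEGER, date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (100, "2016-05-13", 20.4),
        (200, "2016-05-15", 21.2),
        (300, "2016-06-12", 19.3),
        (400, "2016-02-22", 20.0),
    ],
)

# seqnum = 1 marks the latest row; the CASE expression is NULL for every other
# row, so MAX over it simply picks out the price on the latest date.
row = conn.execute(
    """
    SELECT SUM(shares),
           MAX(date),
           MAX(CASE WHEN seqnum = 1 THEN price END) AS price
    FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY date DESC) AS seqnum
          FROM t) AS ranked
    """
).fetchone()

print(row)
```

The result carries the price from the 2016-06-12 row rather than the overall maximum price of 21.2, which is what the question asks for.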
EDIT:
If the results that you are looking at are in fact the result of a query, you can do:
with t as (<your query here>)
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
Here's one way to do it. The join would obviously include the ticker symbol for the share as well:
select
a.sum_share,
a.max_date,
b.price
FROM
(
select ticker , sum(shares) sum_share, max(date) max_date from table where ticker = 'MSFT' group by ticker
) a
inner join table b on a.max_date = b.date and a.ticker = b.ticker

Custom GROUP BY clause

SAMPLE DATA
Suppose I have table like this:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
1 C1 V1 C1 2016-03-07
1 C1 V1 C2 2016-03-06
DESIRED OUTPUT
Desired output should be:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
It should take the max Date per No, Company, Vendor (group by these columns). It shouldn't group by Code, though; Code has to be taken from the row with that Date.
QUERY
SQL query like:
.....
LEFT JOIN (
SELECT No_, Company, Vendor, Code, MAX(Date)
FROM tbl
GROUP BY No_, Company, Vendor, Code
) t2 ON t1.Company = t2.Company and t1.No_ = t2.No_
.....
OUTPUT FOR NOW
But the output I get for now is:
No Company Vendor Code Date
1 C1 V1 C1 2016-03-08
1 C1 V1 C2 2016-03-06
That's because the Code values are different, but it should take code C1 in this case (because No, Company, Vendor match).
WHAT I'VE TRIED
I've tried removing Code from the GROUP BY clause and using SELECT MAX(Code)..., but this is wrong because it takes the alphabetically highest Code.
Do you have any ideas on how I can achieve this? If something is not clear, I can explain more.
If you don't have an identity column in your table, then each row is identified only by the combination of all its column values. That leads to a somewhat unwieldy ON clause: it includes all the columns we are grouping by, plus the date column, which is the max for the given tuple (No_, Company, Vendor).
select t1.No_, t1.Company, t1.Vendor, t1.Code, t1.Date
from tbl t1
join (select No_, Company, Vendor, MAX(Date) as Date
from tbl
group by No_, Company, Vendor) t2
on t1.No_ = t2.No_ and
t1.Company = t2.Company and
t1.Vendor = t2.Vendor and
t1.Date = t2.Date
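Run against the three sample rows from the question, the join above picks the code from the latest row. The following is a minimal sketch using Python's sqlite3 module instead of SQL Server, which is an assumption made only so the example is self-contained; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (No_ INTEGER, Company TEXT, Vendor TEXT, Code TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "V1", "C1", "2016-03-08"),
        (1, "C1", "V1", "C1", "2016-03-07"),
        (1, "C1", "V1", "C2", "2016-03-06"),
    ],
)

# The inner query finds the latest Date per (No_, Company, Vendor); joining
# back on all four columns retrieves the Code stored on that latest row.
rows = conn.execute(
    """
    SELECT t1.No_, t1.Company, t1.Vendor, t1.Code, t1.Date
    FROM tbl AS t1
    JOIN (SELECT No_, Company, Vendor, MAX(Date) AS Date
          FROM tbl
          GROUP BY No_, Company, Vendor) AS t2
      ON t1.No_ = t2.No_
     AND t1.Company = t2.Company
     AND t1.Vendor = t2.Vendor
     AND t1.Date = t2.Date
    """
).fetchall()

print(rows)
```

If two rows in the same group shared the latest Date, this join would return both of them.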
Take a look at this similar question.
Edit
Thank you for the answer, but this returns duplicates. Suppose there can be rows with equal No, Company, Vendor and Date where some other columns differ, which I don't care about. The inner SELECT is fine and returns distinct values, but the problem occurs when joining to t1, because it has multiple matching rows.
Then you might be interested in T-SQL constructs such as rank or row_number. Take a look at Ullas' answer. Try rank as well, since it can give slightly different output which might fit your needs.
You could assign a row_number partitioned by No, Company and Vendor and ordered by Date descending.
Query
;with cte as (
select rn = row_number() over(
partition by [No], Company, Vendor
order by [Date] desc
), *
from tbl
)
select [No], Company, Vendor, Code, [Date] from cte
where rn = 1;
If each date can belong to only one record, then you can find the max date per group first and filter on it.
select No_, Company, Vendor, Code, Date
FROM tbl
where Date in
(select MAX(Date) from tbl GROUP BY No_, Company, Vendor)
If more than one row can have the same date, then you can use a partition:
with cte as
(
select *, ROW_NUMBER() over(partition by No_, Company, Vendor order by Date DESC) as rn
from tbl
)
select No_, Company, Vendor, Code, Date
from cte
where rn=1
A Common Table Expression will do it for you.
WITH cte(N,C,V,D)
AS
(
SELECT t1.[No]
,t1.[Company]
,t1.[Vendor]
,MAX(t1.[Date])
FROM [MyTest] t1
GROUP BY t1.[No]
,t1.[Company]
,t1.[Vendor]
)
SELECT N,C,V,t2.Code,D
FROM cte c
INNER JOIN MyTest t2 ON c.N = t2.No AND c.C = t2.Company AND c.V = t2.Vendor AND c.D = t2.Date

Group-by statement doesn't work

So currently, I have the following table:
ID, Name, Code, Date
1 AB x1 01/03/2014
1 AB x2 01/04/2014
1 AB x3 01/05/2014
2 BC x3 01/05/2014
2 BC x5 01/06/2014
3 CD x1 01/06/2014
I want the following output:
ID, Name, Code, Date
1 AB x3 01/05/2014
2 BC x5 01/06/2014
3 CD x1 01/06/2014
So basically, I just want the latest date, without caring for the code.
In my code, I have
select id, name, code, max(date)
group by id, name, code
But the group by does not work, as it also takes the code into consideration, so I don't get just the latest date. Also, I can't leave code out of the group by statement, as that gives me an error.
How do I use a group by without including code?
I'm using PL/SQL developer as IDE.
select id, name, code, date
from (
select id, name, code,
date,
max(date) over (partition by id) as max_date
from the_table
)
where date = max_date;
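Here is a minimal sketch of the MAX() OVER query on the question's sample rows, using Python's sqlite3 module in place of Oracle (an assumption made for a self-contained demo, requiring SQLite 3.25 or later). The dates are stored as ISO strings, reading the question's values as day/month/year; that reading is also an assumption, though the ordering comes out the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, name TEXT, code TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        (1, "AB", "x1", "2014-03-01"),
        (1, "AB", "x2", "2014-04-01"),
        (1, "AB", "x3", "2014-05-01"),
        (2, "BC", "x3", "2014-05-01"),
        (2, "BC", "x5", "2014-06-01"),
        (3, "CD", "x1", "2014-06-01"),
    ],
)

# Attach each id's latest date to every row, then keep the rows whose own
# date equals it. The code column is never grouped on.
rows = conn.execute(
    """
    SELECT id, name, code, date
    FROM (SELECT id, name, code, date,
                 MAX(date) OVER (PARTITION BY id) AS max_date
          FROM the_table) AS with_max
    WHERE date = max_date
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the filter compares against the maximum, an id with two rows on its latest date would yield both rows.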
If you want to pick exactly one of the rows when several share the max date, you can use row_number() instead:
select id, name, code, date
from (
select id, name, code,
date,
row_number() over (partition by id order by date desc) as rn
from the_table
)
where rn = 1;
Btw: date is a horrible name for a column. For one because it's also the name of a data type but more importantly because it does not document at all what the column contains. An "end date"? A "start date"? A "due date"? ...
What you want is the most recently updated record, right?
select t1.*
from table t1
inner join (select id, name, max(date) as latest_date
from table
group by id, name) t2 on t1.date = t2.latest_date
and t1.id = t2.id and t1.name = t2.name
It would be good to have an index on the date column.
I assume you want to get whatever the code is that is on the row that has the max date. If you truly don't care what code gets returned, just use an aggregate function on it like max(code).
Otherwise, you can do this:
SELECT DISTINCT t1.id, t1.name, t2.code, t2.date
FROM MyTable t1
CROSS APPLY (
SELECT TOP 1 code, date
FROM MyTable t3
WHERE t3.id=t1.id
AND t3.name=t1.name
ORDER BY t3.date DESC
) t2
CROSS APPLY and TOP are SQL Server syntax. I'm not sure whether they are available in your Oracle version, but you can find the equivalent, I'm sure.