GROUP BY-like row filtering - SQL

Let's say I have a table describing cars with make, model, year and some other columns.
From that, I would like to get one full row for each make and model, for the latest year.
To put it another way: each row would have a unique combination of make and model, with the other data taken from the row with the largest value of year.
A standard SQL solution would be great.

select t1.make, t1.model, t1.year, t1.other_cols
from cars t1
where t1.year = (select max(t2.year) from cars t2
where t2.make = t1.make
and t2.model = t1.model
);
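To see the correlated subquery in action, here is a small sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. The sample rows are hypothetical, and the price column simply stands in for "other_cols".

```python
import sqlite3

# Hypothetical sample rows: (make, model, year, price); price stands in for "other_cols".
ROWS = [
    ("Ford", "Focus", 2015, 15000),
    ("Ford", "Focus", 2018, 19000),
    ("Ford", "Fiesta", 2016, 12000),
    ("Toyota", "Corolla", 2017, 17000),
    ("Toyota", "Corolla", 2014, 13000),
]


def latest_per_model(rows):
    """Return the full row with the largest year for each (make, model) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (make TEXT, model TEXT, year INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", rows)
    # Keep a row only if its year equals the max year of its own make/model group.
    query = """
        SELECT t1.make, t1.model, t1.year, t1.price
        FROM cars t1
        WHERE t1.year = (SELECT MAX(t2.year) FROM cars t2
                         WHERE t2.make = t1.make AND t2.model = t1.model)
        ORDER BY t1.make, t1.model
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(latest_per_model(ROWS))
```

The subquery is re-evaluated for each outer row, so an index on (make, model, year) is what usually keeps this fast on larger tables.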

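MySQL solution:
SELECT * FROM cars GROUP BY CONCAT(make, "-", model) ORDER BY year DESC;
Be aware that ORDER BY is applied after grouping, so it does not decide which row supplies the non-aggregated columns; MySQL picks those values indeterminately, and servers with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7) reject the query outright.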

This should work almost anywhere:
SELECT c1.*
FROM cars c1
INNER JOIN
(
SELECT Make, Model, Max(Year) AS Year
FROM cars
GROUP BY Make, Model
) c2 ON c1.Make=c2.Make AND c1.Model=c2.Model AND c1.Year=c2.Year
The main caveat is that Year is often a reserved word (or function name), and the way reserved words are escaped varies by platform. To avoid that, rename the year column to something like ModelYear.
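A minimal sketch of the join-to-aggregate approach in SQLite (via Python's sqlite3), using a ModelYear column as suggested above. The rows and the Color column are hypothetical; they include two Civics that share the latest year, to show that the join returns every row tied for the maximum.

```python
import sqlite3


def latest_via_join(rows):
    conn = sqlite3.connect(":memory:")
    # ModelYear instead of Year, following the advice to avoid the reserved word.
    conn.execute("CREATE TABLE cars (Make TEXT, Model TEXT, ModelYear INTEGER, Color TEXT)")
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT c1.*
        FROM cars c1
        INNER JOIN (
            SELECT Make, Model, MAX(ModelYear) AS ModelYear
            FROM cars
            GROUP BY Make, Model
        ) c2 ON c1.Make = c2.Make AND c1.Model = c2.Model AND c1.ModelYear = c2.ModelYear
        ORDER BY c1.Make, c1.Model, c1.Color
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: two 2019 Civics tie for the latest year, so both come back.
rows = [
    ("Honda", "Civic", 2018, "red"),
    ("Honda", "Civic", 2019, "blue"),
    ("Honda", "Civic", 2019, "black"),
    ("Mazda", "3", 2020, "white"),
]
print(latest_via_join(rows))
```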

Related

How do we find the frequency of one column based on two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads that solve my question. I have three columns in a table: name, customer, and location. I'd like to add an additional column that shows which location is most frequent for each name and customer (the first two columns).
I have included a photo of an example where, for name Jane and customer BEC, my created column would be "Texas", as that has 2 occurrences as opposed to one for California. Would there be any way to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these versions put 'Texas' in all four rows. However, either can be tweaked with minimal effort to put 'California' in the row for ARC.
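A minimal sketch of that tweak for the correlated-subquery version, run in SQLite through Python's sqlite3: it correlates on customer as well as name, and uses LIMIT 1 where the answer uses FETCH FIRST 1 ROW ONLY. The table contents are hypothetical, and the extra location sort key is an assumption added so that ties resolve predictably.

```python
import sqlite3

# Hypothetical rows: Jane/BEC has Texas twice and California once; Jane/ARC has California.
DATA = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),
]


def most_frequent_location(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, customer TEXT, location TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # Correlating on customer as well as name gives one answer per (name, customer).
    # LIMIT 1 plays the role of FETCH FIRST 1 ROW ONLY; sorting by location as a
    # second key makes ties resolve alphabetically instead of arbitrarily.
    query = """
        SELECT t.name, t.customer, t.location,
               (SELECT t2.location
                FROM table1 t2
                WHERE t2.name = t.name AND t2.customer = t.customer
                GROUP BY t2.location
                ORDER BY COUNT(*) DESC, t2.location
                LIMIT 1) AS most_frequent_location
        FROM table1 t
        ORDER BY t.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(most_frequent_location(DATA))
```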
In Oracle, you can use the aggregate function stats_mode() to compute the most frequently occurring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s on s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
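Outside Oracle there is no stats_mode(), so as an illustration of what the grouped-mode-plus-join computes, here is a plain-Python sketch over hypothetical rows. It counts locations per (name, customer) group and attaches the most common one to every row.

```python
from collections import Counter, defaultdict


def top_location_per_group(rows):
    """Attach the most common location of each (name, customer) group to every row."""
    counts = defaultdict(Counter)
    for name, customer, location in rows:
        counts[(name, customer)][location] += 1
    # Like STATS_MODE, return a single value per group. most_common(1) breaks ties
    # by first occurrence, which is this sketch's choice, not Oracle's rule.
    mode = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    # Equivalent of joining the grouped result back to the original rows.
    return [(n, cu, loc, mode[(n, cu)]) for n, cu, loc in rows]


# Hypothetical rows.
DATA = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),
]
print(top_location_per_group(DATA))
```

Oracle's documentation says that when several modes exist, STATS_MODE returns only one of them, so tied groups deserve a closer look in either implementation.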
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put it in an additional column. It is derived from other rows of the table, so why would you store it in the table itself? It is complex to code, and it would also be very expensive for the database (imagine all the rows you would have to recalculate that value for if someone inserted a million rows).
Instead, you can do one of the following:
1. Calculate it at runtime, as shown in the other answers.
2. If you want to make it more persistent, embed the query above in a view.
3. If you want to physically store the information, use a materialized view.
There is plenty of documentation on these three options in the official Oracle documentation.
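The view option can be sketched in SQLite through Python's sqlite3 (SQLite has no materialized views, so only a plain view is shown). The view name table1_with_top_location and the rows are hypothetical. The point is that the derived value follows new inserts automatically, with no stored column to maintain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, customer TEXT, location TEXT)")
# Hypothetical view; top_location is computed each time the view is queried.
conn.execute("""
    CREATE VIEW table1_with_top_location AS
    SELECT t.name, t.customer, t.location,
           (SELECT t2.location FROM table1 t2
            WHERE t2.name = t.name AND t2.customer = t.customer
            GROUP BY t2.location
            ORDER BY COUNT(*) DESC, t2.location
            LIMIT 1) AS top_location
    FROM table1 t
""")


def top_locations():
    """Distinct (name, customer, top_location) triples currently exposed by the view."""
    return sorted(set(conn.execute(
        "SELECT name, customer, top_location FROM table1_with_top_location")))


conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [("Jane", "BEC", "Texas"), ("Jane", "BEC", "California")])
before = top_locations()  # a 1-1 tie; the alphabetical tie-break picks California
conn.execute("INSERT INTO table1 VALUES ('Jane', 'BEC', 'Texas')")
after = top_locations()   # Texas now leads, and nothing stored had to be updated
print(before, after)
```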
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
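This isn't immediately useful on its own, but the same logic can be used inside row_number(), which numbers the rows within each partition. In the query below, I'm ordering by count(*) in descending order so that the most frequent location gets the value 1.
Note that row_number() assigns 1 to exactly one row per partition, so when two locations tie, only one of them is returned (chosen arbitrarily unless you add a tie-breaker to the ORDER BY).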
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.
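The final query can be tried in SQLite (3.25 or newer, for window functions) through Python's sqlite3. The rows are hypothetical, with a deliberate one-to-one tie for Mike/ARC; Location is added as a second ORDER BY key, an assumption of this sketch, so the tie resolves alphabetically instead of arbitrarily.

```python
import sqlite3

# Hypothetical rows: Mike/ARC has a one-to-one tie between Texas and California.
DATA = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Mike", "ARC", "Texas"),
    ("Mike", "ARC", "California"),
]


def most_freq_via_row_number(rows):
    conn = sqlite3.connect(":memory:")  # ROW_NUMBER() needs SQLite 3.25 or newer
    conn.execute("CREATE TABLE table1 (Name TEXT, Customer TEXT, Location TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # Same shape as the final query above, plus Location as a tie-breaker.
    query = """
        SELECT tab.Name, tab.Customer, tab.Location, tb_.Location AS most_freq_location
        FROM table1 tab
        INNER JOIN (
            SELECT Name, Customer, Location,
                   ROW_NUMBER() OVER (PARTITION BY Name, Customer
                                      ORDER BY COUNT(*) DESC, Location) AS freq_name_cust
            FROM table1
            GROUP BY Name, Customer, Location
        ) tb_
          ON tb_.Name = tab.Name
         AND tb_.Customer = tab.Customer
         AND tb_.freq_name_cust = 1
        ORDER BY tab.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(most_freq_via_row_number(DATA))
```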

Table inner join itself

I have a table with 3 columns (code, state, date) that records the history of each code's state; each code may have changed state multiple times.
I want to show the last state of each code. What I did was this:
SELECT code,MAX(date), ....
FROM table
GROUP BY code.
I don't know exactly what to put to get the state. I tried just adding state, expecting it to return the state corresponding to the combination of code and MAX(date), but it gives me an error saying state is not in an aggregate function.
Thank you in advance for your help.
If I understand correctly, you have data such as
CODE State Date
1 IL 1/1/2016
1 IA 1/1/2017
1 AL 1/1/2015
and you want to see in your results
1 IA 1/1/2017
Using a window function and a common table expression (WITH): we assign a row number to the rows of each code, ordered by date descending, and return only the first row for each code.
With CTE AS (SELECT code
, date
, state
, Row_number() over (partition by code order by date desc) RN
FROM table )
SELECT Code, Date, State
FROM CTE
WHERE RN =1
Using a subquery: we get the max date for each code and then join back to the base set to limit the rows returned.
SELECT A.code, A.date, A.state
FROM table A
INNER JOIN (SELECT max(date) mdate, code
FROM table
GROUP BY code) B
on A.Code = B.Code
and A.Date = B.MDate
The latter query is the approach to use when window functions are not available. The modern way to solve your problem is the first approach.
In essence, the first query assigns row number 1 to the most recent row of each code (ordering by date descending), so the max date gets an RN of 1 for each code. Thus, when we say WHERE RN = 1, we only return the records that have the max date for each code. We use a WITH clause because a window function result such as RN cannot be referenced in the WHERE clause of the same query level; computing it in the common table expression first lets the outer query filter on it.
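Both queries can be compared in SQLite (3.25 or newer) through Python's sqlite3. The table name history replaces the reserved word table, dates are stored as ISO text so they sort correctly, and the rows are hypothetical; code 3 has two rows on its latest date, which is exactly where the two approaches differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (code INTEGER, state TEXT, date TEXT)")
# Hypothetical data: code 3 has two rows on its latest date.
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", [
    (1, "IL", "2016-01-01"), (1, "IA", "2017-01-01"), (1, "AL", "2015-01-01"),
    (3, "NY", "2020-06-01"), (3, "NJ", "2020-06-01"),
])

# Subquery + join: every row matching the max date is kept, ties included.
join_rows = conn.execute("""
    SELECT A.code, A.date, A.state
    FROM history A
    INNER JOIN (SELECT MAX(date) AS mdate, code FROM history GROUP BY code) B
      ON A.code = B.code AND A.date = B.mdate
    ORDER BY A.code, A.state
""").fetchall()

# CTE + ROW_NUMBER: exactly one row per code; state breaks the tie here.
cte_rows = conn.execute("""
    WITH CTE AS (
        SELECT code, date, state,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC, state) AS RN
        FROM history)
    SELECT code, date, state FROM CTE WHERE RN = 1 ORDER BY code
""").fetchall()

print(join_rows)
print(cte_rows)
```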
If you're doing an aggregate like MAX(), then all other non-aggregate columns in your SELECT also need to be in your GROUP BY. That's why you get the error when you add state only to the SELECT. If you add it to both the SELECT and the GROUP BY, the query will run, although it returns the latest date for every state/code combination rather than just the last state of each code:
SELECT State, Code, MAX(Date)
FROM table
GROUP BY State, Code
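A quick SQLite check through Python's sqlite3, with the sample rows from the first answer (dates rewritten as ISO text and the table renamed history, both assumptions of this sketch), shows what grouping by state and code returns: one row per state the code has ever had.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (code INTEGER, state TEXT, date TEXT)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?)",
                 [(1, "IL", "2016-01-01"), (1, "IA", "2017-01-01"), (1, "AL", "2015-01-01")])

# Grouping by state as well as code yields one row per (state, code) pair,
# so every state that code 1 ever had shows up, not just the latest one.
grouped = conn.execute("""
    SELECT state, code, MAX(date) FROM history
    GROUP BY state, code
    ORDER BY state
""").fetchall()
print(grouped)
```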
If you want to use an inner join, as you mention in your post, join the table back to itself on matching code and date:
SELECT *
FROM table t1
INNER JOIN (SELECT code, MAX(date) AS date
FROM table
GROUP BY code) codeWithLatestDate ON t1.code = codeWithLatestDate.code AND t1.date = codeWithLatestDate.date
However, I would suggest adding state to your GROUP BY and SELECT clauses:
SELECT code,MAX(date),state
FROM table
GROUP BY code, state
You can do it with a join to itself:
SELECT t.State, t.Code, t.Date
FROM table t
JOIN (
SELECT Code, MAX(Date) as Date
FROM table
GROUP BY Code) t1 on t1.Code= t.Code and t.Date=t1.Date

Selecting the max value of two different columns

I have the following table named 'MoviesInStock'
I would like to select the latest movies, i.e. those from the most recent month.
In this case, the result should be only the movie 'The Mummy', since it is the latest one.
I tried the following query:
SELECT MovieName
FROM MovieInStock
WHERE Month = (SELECT MAX(Month) FROM MovieInStock) AND
(SELECT MovieName FROM MovieInStock WHERE Year = (SELECT MAX(Year) FROM MovieInStock))
But using the AND operator that way was not smart. I also tried creating a temporary table with SELECT INTO # to hold the max Year and then selecting the max Month from that temp table, but it became too complicated for me.
You are overcomplicating the problem. You can use TOP with ORDER BY.
Because you say "movies" (plural), WITH TIES returns every movie from the latest month:
select top (1) with ties mis.*
from movieinstock mis
order by year desc, month desc
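SQLite has no TOP ... WITH TIES, so this sketch (Python's sqlite3, SQLite 3.25 or newer) reproduces the same result with RANK(), which gives rank 1 to every row tied on the latest (Year, Month). The stock list is hypothetical, with a second movie in the same month as 'The Mummy'.

```python
import sqlite3

# Hypothetical stock list; "Movie B" shares the latest month with "The Mummy".
ROWS = [
    ("The Mummy", 2017, 6),
    ("Movie B", 2017, 6),
    ("Movie C", 2017, 3),
    ("Movie D", 2016, 11),
]


def latest_movies(rows):
    conn = sqlite3.connect(":memory:")  # RANK() needs SQLite 3.25 or newer
    conn.execute("CREATE TABLE MovieInStock (MovieName TEXT, Year INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO MovieInStock VALUES (?, ?, ?)", rows)
    # Every movie sharing the latest (Year, Month) gets rank 1, like TOP (1) WITH TIES.
    query = """
        SELECT MovieName FROM (
            SELECT MovieName,
                   RANK() OVER (ORDER BY Year DESC, Month DESC) AS rnk
            FROM MovieInStock)
        WHERE rnk = 1
        ORDER BY MovieName
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(latest_movies(ROWS))
```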
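Another solution, though Gordon's solution is better:
with maxdt as (
-- restrict to the latest year first; MAX(Month) over the whole table could come from an earlier year
select MAX(Year) MaxYear, MAX(Month) MaxMonth
FROM MovieInStock
where Year = (select MAX(Year) FROM MovieInStock)
)
SELECT top 1 MovieName
FROM MovieInStock f1
inner join maxdt f2 on f1.Month=f2.MaxMonth and f1.Year=f2.MaxYear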
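One thing to watch with a CTE of maxima: MAX(Month) over the whole table is not necessarily a month of the latest year. The sketch below (SQLite via Python's sqlite3, hypothetical rows, and without TOP, which SQLite lacks) compares taking both maxima independently with restricting to the latest year first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieInStock (MovieName TEXT, Year INTEGER, Month INTEGER)")
# Hypothetical rows: the largest month (12) belongs to the earlier year.
conn.executemany("INSERT INTO MovieInStock VALUES (?, ?, ?)",
                 [("Older Movie", 2016, 12), ("The Mummy", 2017, 6)])

# Independent maxima: MaxMonth = 12 and MaxYear = 2017 match no row.
independent = conn.execute("""
    WITH maxdt AS (SELECT MAX(Month) AS MaxMonth, MAX(Year) AS MaxYear FROM MovieInStock)
    SELECT MovieName FROM MovieInStock f1
    INNER JOIN maxdt f2 ON f1.Month = f2.MaxMonth AND f1.Year = f2.MaxYear
""").fetchall()

# Latest year first, then the latest month within that year.
nested = conn.execute("""
    WITH maxdt AS (
        SELECT MAX(Year) AS MaxYear, MAX(Month) AS MaxMonth
        FROM MovieInStock
        WHERE Year = (SELECT MAX(Year) FROM MovieInStock))
    SELECT MovieName FROM MovieInStock f1
    INNER JOIN maxdt f2 ON f1.Month = f2.MaxMonth AND f1.Year = f2.MaxYear
""").fetchall()

print(independent, nested)
```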

Several MAX values based on another column value

I'm trying to write a SQL query in MS Access. I've narrowed it down to the table below, but can't seem to get the last step right without writing several extremely large queries.
Here's the structure of the table I'm trying to query:
The results I want: the MemberId and year where that MemberId had the most visits in that year. (That is, which MemberId had the most visits in 2014, which had the most visits in 2015, etc., with the relevant year shown in the result.)
Thanks!
Sounds like you need to determine MAX(Visits) by year in a subquery, then JOIN to that:
SELECT a.*,b.Max_Visits
FROM YourTable a
INNER JOIN (SELECT Year,MAX(Visits) AS Max_Visits
FROM YourTable
GROUP BY Year
) b
ON a.Year = b.Year
AND a.Visits = b.Max_Visits
If you want to see all members and not just those that had the most visits per year, you can change the JOIN to a LEFT JOIN.
If there's a tie, this returns both members.
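The tie behaviour is easy to confirm with a quick SQLite run through Python's sqlite3 (not Access itself, so Access-specific quoting such as [Year] is not exercised). The column names MemberId, Year and Visits and the data are assumptions, since the original table layout is not shown.

```python
import sqlite3

# Hypothetical visit counts: (MemberId, Year, Visits); members 2 and 3 tie in 2014.
ROWS = [(1, 2014, 5), (2, 2014, 7), (3, 2014, 7), (1, 2015, 9), (2, 2015, 4)]


def top_members_per_year(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (MemberId INTEGER, Year INTEGER, Visits INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    # Per-year maximum in a subquery, then join back to keep the matching members.
    query = """
        SELECT a.*, b.Max_Visits
        FROM YourTable a
        INNER JOIN (SELECT Year, MAX(Visits) AS Max_Visits
                    FROM YourTable
                    GROUP BY Year) b
          ON a.Year = b.Year
         AND a.Visits = b.Max_Visits
        ORDER BY a.Year, a.MemberId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(top_members_per_year(ROWS))
```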

SELECT a variable as condition and display another one

My problem is the following.
I want to SELECT the minimum of a list of years and display another column of my table.
An example:
SELECT MIN(Year)
FROM table  -- searching for the lowest year
Then I want it to display the winners of the first year.
Is there a way to do this in just one line?
select winner from table where year in (select min(year) from table)
You need to do a self-JOIN between table and itself (I suppose you want to do it in a single statement, not a single line):
SELECT A.*
FROM table AS A
JOIN ( SELECT MIN(Year) AS Year FROM table ) AS B
ON (A.Year = B.Year);
This assumes that there is only one record per minimum-year in every group of interest.
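Both the self-join and the IN-subquery form can be checked in SQLite through Python's sqlite3. The table name results (standing in for the reserved word table), the Winner column and the rows are hypothetical; two winners share the earliest year, and both queries return both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; two winners share the earliest year.
conn.execute("CREATE TABLE results (Winner TEXT, Year INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [("Alice", 2001), ("Bob", 2001), ("Carol", 2002)])

# Self-join against the single-row MIN(Year) result.
self_join = conn.execute("""
    SELECT A.*
    FROM results AS A
    JOIN (SELECT MIN(Year) AS Year FROM results) AS B
      ON (A.Year = B.Year)
    ORDER BY A.Winner
""").fetchall()

# The IN-subquery form from the first answer.
in_subquery = conn.execute("""
    SELECT Winner FROM results
    WHERE Year IN (SELECT MIN(Year) FROM results)
    ORDER BY Winner
""").fetchall()

print(self_join, in_subquery)
```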
Try this:
select t.columntoshow, min(t.year) from table t group by t.columntoshow