SQL MAX funtion where not all atributes are in the group by - sql

So my current problem is that I have two tables that look like this:
table1(name, num_patient, quant, inst)
table2(inst_name, num_region)
Where I want to find the patient with max quantity per region.
I first had the idea of doing something like this:
SELECT num_region, num_patient, MAX(quant)
FROM
(SELECT num_patient, quant, num_region
FROM table1
INNER JOIN table2
ON table1.inst = table2.inst_name) AS joined_tables
GROUP BY num_region;
But this doesn't work since either num_patient has to be on the GROUP BY (and this way it doesn't return the max value by region anymore) or I have to remove it from the SELECT (also doesn't work because I need the name of each patient). I have tried to fix my issue with a WHERE quant = MAX() statement but couldn't get it to work. Is there any workaround to this?

Use DISTINCT ON:
SELECT DISTINCT ON (num_region), num_patient, quant, num_region
FROM table1 t1 JOIN
table2 t2
ON t1.inst = t2.inst_name
ORDER BY num_region, quant DESC;
DISTINCT ON is a convenient Postgres extension. It returns one row per keys specified in the SELECT, based on the ordering in the ORDER BY.
Being an extension, not all databases support this functionality -- even databases derived from Postgres. The traditional method would use ROW_NUMBER():
SELECT t.*
FROM (SELECT num_patient, quant, num_region,
ROW_NUMBER() OVER (PARTITION BY num_region ORDER BY quant DESC) as seqnum
FROM table1 t1 JOIN
table2 t2
ON t1.inst = t2.inst_name
) t
WHERE seqnum = 1;

This is a duplicate of the DISTINCT ON question I linked.
SELECT distinct on (num_region) num_patient, quant, num_region
FROM table1
INNER JOIN table2
ON table1.inst = table2.inst_name
ORDER BY num_region, quant desc

Related

SQL Server query showing most recent distinct data

I am trying to build a SQL query to recover only the most young record of a table (it has a Timestamp column already) where the item by which I want to filter appears several times, as shown in my table example:
.
Basically, I have a table1 with Id, Millis, fkName and Price, and a table2 with Id and Name.
In table1, items can appear several times with the same fkName.
What I need to achieve is building up a single query where I can list the last record for every fkName, so that I can get the most actual price for every item.
What I have tried so far is a query with
SELECT DISTINCT [table1].[Millis], [table2].[Name], [table1].[Price]
FROM [table1]
JOIN [table2] ON [table2].[Id] = [table1].[fkName]
ORDER BY [table2].[Name]
But I don't get the correct listing.
Any advice on this? Thanks in advance,
A simple and portable approach to this greatest-n-per-group problem is to filter with a subquery:
select t1.millis, t2.name, t1.price
from table1 t1
inner join table2 t2 on t2.id = t1.fkName
where t1.millis = (select max(t11.millis) from table1 t11 where t11.fkName = t1.fkName)
order by t1.millis desc
using Common Table Expression:
;with [LastPrice] as (
select [Millis], [Price], ROW_NUMBER() over (Partition by [fkName] order by [Millis] desc) rn
from [table1]
)
SELECT DISTINCT [LastPrice].[Millis],[table2].[Name],[LastPrice].[Price]
FROM [LastPrice]
JOIN [table2] ON [table2].[Id] = [LastPrice].[fkName]
WHERE [LastPrice].rn = 1
ORDER BY [table2].[Name]

How to run the subquery with the non equality clause on Spark?

I have this query and I want to execute it on spark
SELECT A.PFR,
A.MFR,
A.MST,
(SELECT COUNT(*)
FROM Table1 T2
WHERE T1.PFR = T2.PFR
AND T1.MFR = T2.MFR
AND T1.MST >= T2.MST) AS RANK
FROM Table1 A
But spark didn't support subquery with non equality clause
I get this error
The correlated scalar subquery can only contain equality predicates
So I tried to use group by but I didn't get the correct results (I have the input and the out result)
SELECT A.PFR,
A.MFR,
A.MST,
B.countRank
FROM Table1 A
LEFT OUTER JOIN
(SELECT PFR,
MFR,
MST,
COUNT(MFR) countRank
FROM Table1 B
GROUP BY PFR,
MFR,
MST) B ON B.PFR = A.PFR
There are a method to convert this query to a join query.
Thanks in advance.
Just use rank():
SELECT A.PFR, A.MFR, A.MST,
RANK() OVER (PARTITION BY PFR, MFR
ORDER BY MST DESC
) as rank
FROM Table1 A;
If rank() doesn't do exactly what you want, then perhaps row_number() or dense_rank() work.

How to use order by and rownum without subselect?

I need to build a query with a order by and rownum but without use a sublect.
It is needed to get the first row of the query ordered.
In other words, I want the result of
select * from (
SELECT CAMP1,ORDERCAMP
FROM TABLENAME
ORDER BY ORDERCAMP) where rownum=1;
but whithout use a subselect. Is it possible?
I have a Oracle 11. You could say this is my whole query:
SELECT T1.CAMP_ID,
T2.CAMP
(SELECT OT.CAMP
FROM OTHERTABLE OT
WHERE OT.FK_TO_TABLE1=T1.CAMP_ID
ORDER BY OT.ORDERCAMP
)
FROM TABLE1 T1,
TABLE2 T2
WHERE T1.FK_TO_T2=T2.PK;
The subquery returns more than one row, and I cant use another subquery like
SELECT T1.CAMP_ID,
T2.CAMP
(SELECT *
FROM
(SELECT OT.CAMP
FROM OTHERTABLE OT
WHERE OT.FK_TO_TABLE1=T1.CAMP_ID
ORDER BY OT.ORDERCAMP
)
WHERE ROWNUM=1
)
FROM TABLE1 T1,
TABLE2 T2
WHERE T1.FK_TO_T2=T2.PK;
SELECT CAMP1,ORDERCAMP FROM TABLE2 ORDER BY ORDERCAMP
Because the T1.CAMP_ID is an invalid identifier in the third level subquery.
I hope I have explained myself enough.
Your current query (without the invalid ORDER BY) gets ORA-01427: single-row subquery returns more than one row. You can nest subqueries, but you can only refer back one level when joining; so if you did:
SELECT T1.CAMP_ID, T2.CAMP,
(SELECT CAMP FROM
FROM
(SELECT OT.CAMP
FROM OTHERTABLE OT
WHERE OT.FK_TO_TABLE1=T1.CAMP_ID
ORDER BY OT.ORDERCAMP
)
WHERE ROWNUM = 1)
FROM TABLE1 T1, TABLE2 T2 WHERE T1.FK_TO_T2=T2.PK;
... then you would get ORA-00904: "T1"."CAMP_ID": invalid identifier. Hence your question, presumably.
What you could do instead is join to the third table, and use the analytic ROW_NUMBER() function to assign the row number, and then use an outer select wrapped around the whole thing to only find the records with the lowest ORDERCAMP:
SELECT CAMP_ID, CAMP, OT_CAMP
FROM (
SELECT T1.CAMP_ID, T2.CAMP, OT.CAMP AS OT_CAMP,
ROW_NUMBER() OVER (PARTITION BY T1.CAMP_ID ORDER BY OT.ORDERCAMP) AS RN
FROM TABLE2 T2
JOIN TABLE1 T1 ON T1.FK_TO_T2=T2.PK
JOIN OTHERTABLE OT ON OT.FK_TO_TABLE1=T1.CAMP_ID
)
WHERE RN = 1;
The ROW_NUMBER() can partition on the T1.CAMP_ID primary key value, or anything else that is unique.
SQL Fiddle demo, including the inner query run on its own so you can see the RN numbers assigned before the outer filter is applied.
Another approach is to use the aggregate KEEP DENSE_RANK FIRST function
SELECT T1.CAMP_ID, T2.CAMP,
MAX(OT.CAMP) KEEP (DENSE_RANK FIRST ORDER BY OT.ORDERCAMP) AS OT_CAMP
FROM TABLE2 T2
JOIN TABLE1 T1 ON T1.FK_TO_T2=T2.PK
JOIN OTHERTABLE OT ON OT.FK_TO_TABLE1=T1.CAMP_ID
GROUP BY T1.CAMP_ID, T2.CAMP;
Which is a bit shorter and doesn't need an inner query. I'm not sure if there's any real advantage of one over the other.
SQL Fiddle demo.
In the most recent version of Oracle, you can do:
SELECT CAMP1, ORDERCAMP
FROM TABLENAME
ORDER BY ORDERCAMP
FETCH FIRST 1 ROWS ONLY;
Otherwise, I think you need a subquery of some sort.
You could use LIMIT or SELECT TOP 1
SELECT CAMP1, ORDERCAMP FROM TABLENAME ORDER BY ORDERCAMP LIMIT 1

SQL selecting minimum value in a sub query when exact value is unknown

I have an SQL query that is meant to select a list of things from different tables using a subquery. I am meant to find those things with the lowest value in a particular column.
This is the query that i currently have. I know the minimum rate is 350 but i cant use it in my query. Any effort to change it to MIN(rate) has been unsuccessful.
SELECT DISTINCT name
FROM table1 NATURAL JOIN table2
WHERE table2.code = (SELECT Code FROM rates WHERE rate = '100')
How do i change that subquery to find the minimum rate?
Most general way to do this would be
select distinct name
from table1 natural join table2
where
table2.code in
(
select t.Code
from rates as t
where t.rate in (select min(r.rate) from rates as r)
)
if you have windowed functions, you can use rank() function:
...
where
table2.code in
(
select t.Code
from (
select r.Code, rank() over(order by r.rate) as rn
from rates as r
) as t
where t.rn = 1
)
in SQL Server you can use top ... with ties syntax:
...
where
table2.code in
(
select top 1 with ties r.Code
from rates as r
order by r.rate
)
SELECT DISTINCT name
FROM table1 NATURAL JOIN table2
WHERE table2.code =
(SELECT CODE FROM RATE WHERE RATE=(SELECT MIN(RATE) FROM RATE))
Considering you are expecting only one record of minimum value.
try this,
WHERE table2.code = (SELECT Code FROM rates ORDER BY rate LIMIT 1)

Can the result of a subquery be joined with itself?

Let's say I need to find the oldest animal in each zoo. It's a typical maximum-of-a-group sort of query. Only here's a complication: the zebras and giraffes are stored in separate tables. To get a listing of all animals, be they giraffes or zebras, I can do this:
(SELECT id,zoo,age FROM zebras
UNION ALL
SELECT id,zoo,age FROM giraffes) t1
Then given t1, I could build a typical maximum-of-a-group query:
SELECT t1.*
FROM t1
JOIN
(SELECT zoo,max(age) as max_age
FROM t1
GROUP BY zoo) t2
ON (t1.zoo = t2.zoo)
Clearly I could store t1 as a temporary table, but is there a way I could do this all within one query, and without having to repeat the definition of t1? (Please let's not discuss modifications to the table design; I want to focus on the issue of working with the subquery result.)
Here is a link to the with clause.
Understanding the WITH Clause
with t1 as
(select id, zoo, age from zebras
union all
select id, zoo, age from giraffes)
select t1.*
from t1
join
(SELECT zoo,max(age) as max_age
FROM t1
GROUP BY zoo) t2
on (t1.zoo = t2.zoo);
Note: You could move t2 up to your with clause as well.
Note 2: An alternative solution is to simply create t1 as a view and use it in your query instead.
To find the oldest animal, you should use wjndoq functions:
select z.*
from (select z.*,
row_number() over (partition by zoo order by age desc) as seqnum
from ( <subquery to union all animals>) z
)
where seqnum = 1