Using MAX to compute MAX value in a subquery column - sql

What I am trying to do: I have a table, "band_style" with schema (band_id, style).
One band_id may occur multiple times, listed with different styles.
I want ALL rows of band_id, NUM (where NUM is the number of different styles a band has) for the band ids with the SECOND MOST number of styles.
I have spent hours on this query, and almost nothing seems to work.
This is how far I got. The CTE (data) successfully computes all bands that have fewer styles than the maximum number of styles per band. Now I need ALL rows that have the maximum NUM in that resulting table. This will give me the bands with the second most number of styles.
However, the final result seems to ignore the MAX function and just returns the table (data) as is. Can someone please provide some insight or a working method? I have made over 20 attempts at this query, with this one being the closest.
Using SQL*PLUS on Oracle
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.m = data.band_id
AND t.n = data.NUM;

Something like this... based on a Comment under your post, you are looking for DENSE_RANK()
select band_id
from ( select band_id, dense_rank() over (order by count(style) desc) as drk
from band_style
group by band_id
)
where drk = 2;
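A minimal sketch of the DENSE_RANK() idea, run against an in-memory SQLite database through Python's sqlite3 module rather than Oracle. The sample rows are hypothetical, and the count is computed in an inner subquery before ranking (a choice made here for portability, not part of the answer above). It also returns NUM, as the question asks.

```python
import sqlite3

# Hypothetical sample data: band 1 has 3 styles, bands 2 and 3 have 2, band 4 has 1.
rows = [(1, "rock"), (1, "jazz"), (1, "folk"),
        (2, "rock"), (2, "pop"),
        (3, "jazz"), (3, "blues"),
        (4, "pop")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE band_style (band_id INTEGER, style TEXT)")
conn.executemany("INSERT INTO band_style VALUES (?, ?)", rows)

# Count styles per band first, then rank the counts.
# DENSE_RANK() gives tied bands the same rank and never skips a rank,
# so rank 2 always means "second highest count".
query = """
SELECT band_id, num
FROM (SELECT band_id, num,
             DENSE_RANK() OVER (ORDER BY num DESC) AS drk
      FROM (SELECT band_id, COUNT(*) AS num
            FROM band_style
            GROUP BY band_id) AS counts) AS ranked
WHERE drk = 2
ORDER BY band_id
"""
second_most = conn.execute(query).fetchall()
print(second_most)
```

With this sample data, both bands that have two styles come back, because they share rank 2.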

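I would use a windowing function (RANK() in this case), which is great for finding the n-th ranked thing in a set. Note that RANK() leaves gaps after ties, so if several bands tie for first place no band gets rank 2; DENSE_RANK() does not skip ranks.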
SELECT DISTINCT bs.band_id
FROM band_style bs
WHERE EXISTS (
SELECT NULL
FROM (
SELECT
bs2.band_id,
bs2.num,
RANK() OVER (ORDER BY bs2.num DESC) AS numrank
FROM (
SELECT bs1.band_id, COUNT(*) as num
FROM band_style bs1
GROUP BY bs1.band_id ) bs2 ) bs3
WHERE bs.band_id = bs3.band_id
AND bs3.numrank = 2 )

Related

Can I use a CTE data inside another CTE by joining both of them (Oracle SQL)

Requirement
I want to get the first four hundred GROUP_IDs from a table (those greater than an input GROUP_ID). In the same table there are two LOG_IDs for each GROUP_ID, of which I want the lowest. Once I have the lowest LOG_ID, I will use it to fetch data from another table where it is a foreign key.
APPROACH I USED
First, I formed a subset of the top 400 GROUP_IDs that are greater than the input GROUP_ID.
Then I used those GROUP_IDs in a second subset to get the lowest LOG_ID for each GROUP_ID.
Finally, with the lowest LOG_ID in hand, I used it to get the details from the other table.
QUERY USED
WITH INIT AS (
SELECT GROUP_ID
FROM PV_ADAPTER_LOG
WHERE GROUP_ID > 2004141441192825
AND ADAPTER_ID IN ('2568','2602')
ORDER BY GROUP_ID
FETCH FIRST 400 ROWS ONLY
)
,INIT2 AS (
SELECT MIN(L.LOG_ID) AS LOG_ID
FROM PV_ADAPTER_LOG L
JOIN INIT ON INIT.GROUP_ID =L.GROUP_ID
GROUP BY L.GROUP_ID
)
SELECT A.LOG_ID,A.OPER_SEQ AS CALL_SEQUENCE,A.GROUP_ID ,B.INTERFACE_ID,A.INSTRUCTION_NAME, B.ADAPTER_DETAIL AS XML_CONTENT,B.SEQ AS XML_SEQUENCE
FROM INIT2
JOIN PV_ADAPTER_LOG A ON A.LOG_ID=INIT2.LOG_ID
JOIN PV_ADAPTER_LOG_DETAIL B ON B.LOG_ID=A.LOG_ID
Is my approach right, or is there another way to achieve this?
I think this is what you're looking for:
Use row_number ordered by group to find the first 400 rows
Use row_number partitioned by group and ordered by log to find the first log per group
Which is:
WITH INIT AS (
SELECT P.*,
ROW_NUMBER () OVER (
ORDER BY GROUP_ID
) RN,
ROW_NUMBER () OVER (
PARTITION BY GROUP_ID
ORDER BY LOG_ID
) MN
FROM PV_ADAPTER_LOG p
WHERE GROUP_ID > 2004141441192825
AND ADAPTER_ID IN ('2568','2602')
)
SELECT * FROM INIT
WHERE RN <= 400
AND MN = 1
You can use analytic functions to get the first 400 groups and the record with the minimum LOG_ID per group in a single query, as follows:
SELECT GROUP_ID, LOG_ID FROM
(SELECT P.GROUP_ID, P.LOG_ID,
ROW_NUMBER() OVER (ORDER BY GROUP_ID) AS RNGRP,
ROW_NUMBER() OVER (PARTITION BY GROUP_ID ORDER BY LOG_ID) AS RNLOG
FROM PV_ADAPTER_LOG P
WHERE GROUP_ID > 2004141441192825
AND ADAPTER_ID IN ('2568','2602'))
WHERE RNGRP <= 400 AND RNLOG = 1;
You can then use this wherever you need it (in a CTE or in an inline view).
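One point worth checking in the queries above: ROW_NUMBER() OVER (ORDER BY GROUP_ID) numbers rows, not groups, so with two LOG_IDs per group a limit of 400 rows covers roughly 200 groups. The sketch below compares it with DENSE_RANK(), which numbers distinct GROUP_ID values. It runs on SQLite via Python's sqlite3 as a stand-in for Oracle; the sample rows, the cut-off of 2 groups, and the names grp_rank, log_rank and run are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pv_adapter_log (log_id INTEGER, group_id INTEGER)")
# Hypothetical data: three groups, two log rows each.
conn.executemany("INSERT INTO pv_adapter_log VALUES (?, ?)",
                 [(101, 10), (102, 10), (201, 20), (202, 20), (301, 30), (302, 30)])

QUERY = """
SELECT group_id, log_id
FROM (SELECT group_id, log_id,
             {rank_fn}() OVER (ORDER BY group_id) AS grp_rank,
             ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY log_id) AS log_rank
      FROM pv_adapter_log
      WHERE group_id > :min_group) AS ranked
WHERE grp_rank <= :max_groups AND log_rank = 1
ORDER BY group_id
"""

def run(rank_fn):
    """Run the query with the given ranking function for the group cut-off."""
    params = {"min_group": 5, "max_groups": 2}
    return conn.execute(QUERY.format(rank_fn=rank_fn), params).fetchall()

by_groups = run("DENSE_RANK")  # counts distinct group_id values
by_rows = run("ROW_NUMBER")    # counts individual rows
print(by_groups, by_rows)
```

With this data the DENSE_RANK() variant returns the lowest LOG_ID for the first two groups, while the ROW_NUMBER() variant stops after the first group because its two rows already use up the limit.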

Most frequent attribute for a subject not returning correct number of distinct subjects

I am trying to write a SQL SELECT that returns the most frequent TIPO_ASSEPSIA for each distinct EPISODIO in SQL Server.
From what I have seen, the SELECT technically works (e.g. in an EPISODIO with 3 rows, if two of the TIPO_ASSEPSIA values are the same, it picks that value as the most frequent), but the result has 3822 rows. That cannot be right, since counting the distinct EPISODIO values gives 3897, so several dozen rows are missing and I don't know why. Any ideas?
The code I am using is the following:
SELECT DISTINCT
F1.EPISODIO,
F1.TIPO_ASSEPSIA
FROM DWS_DM F1
WHERE
F1.TIPO_ASSEPSIA =
( SELECT t.TIPO_ASSEPSIA from
(
SELECT TOP 1 TIPO_ASSEPSIA , (COUNT(*)) AS freq
FROM DWS_DM F2
WHERE F2. EPISODIO = F1.EPISODIO
GROUP BY F2.TIPO_ASSEPSIA
ORDER BY count(TIPO_ASSEPSIA) DESC)t
)
SELECT DISTINCT
F1.EPISODIO,
x.TIPO_ASSEPSIA
FROM DWS_DM F1
OUTER APPLY
( SELECT TOP 1 r.TIPO_ASSEPSIA , (COUNT(*)) AS freq
FROM DWS_DM r
WHERE r.EPISODIO = F1.EPISODIO
GROUP BY TIPO_ASSEPSIA
ORDER BY count(TIPO_ASSEPSIA) DESC
) x
Why are you referring back to your F1 alias? I think you want to do this:
SELECT DISTINCT
F1.EPISODIO,
F1.TIPO_ASSEPSIA
FROM DWS_DM F1
WHERE
F1.TIPO_ASSEPSIA =
( SELECT t.TIPO_ASSEPSIA from
(
SELECT TOP 1 TIPO_ASSEPSIA , (COUNT(*)) AS freq
FROM DWS_DM
GROUP BY TIPO_ASSEPSIA
ORDER BY count(TIPO_ASSEPSIA) DESC)t
)
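One possible explanation for the missing rows, offered as an assumption since the data is not shown: if some EPISODIO values have only NULL in TIPO_ASSEPSIA, the correlated subquery returns NULL for them, and the comparison F1.TIPO_ASSEPSIA = NULL is never true, so those episodes are filtered out. The sketch below reproduces that effect on hypothetical rows using SQLite through Python's sqlite3 (LIMIT 1 stands in for TOP 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dws_dm (episodio INTEGER, tipo_assepsia TEXT)")
# Hypothetical data: episode 1 has real values, episode 2 has only NULLs.
conn.executemany("INSERT INTO dws_dm VALUES (?, ?)",
                 [(1, "A"), (1, "A"), (1, "B"), (2, None), (2, None)])

distinct_episodes = conn.execute(
    "SELECT COUNT(DISTINCT episodio) FROM dws_dm").fetchone()[0]

# Same shape as the query in the question: keep rows whose value equals
# the most frequent value of their episode.
query = """
SELECT DISTINCT f1.episodio, f1.tipo_assepsia
FROM dws_dm f1
WHERE f1.tipo_assepsia = (SELECT f2.tipo_assepsia
                          FROM dws_dm f2
                          WHERE f2.episodio = f1.episodio
                          GROUP BY f2.tipo_assepsia
                          ORDER BY COUNT(f2.tipo_assepsia) DESC
                          LIMIT 1)
"""
result = conn.execute(query).fetchall()
print(distinct_episodes, result)
```

Here there are two distinct episodes but only one row in the result. The OUTER APPLY version keeps such episodes, because it selects from the applied subquery instead of comparing against it.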

SQL query to select distinct row with minimum value

I want an SQL statement to get the row with a minimum value.
Consider this table:
id  game  point
1   x     5
1   z     4
2   y     6
3   x     2
3   y     5
3   z     8
How do I select the ids that have the minimum value in the point column, grouped by game? Like the following:
id  game  point
1   z     4
2   y     6
3   x     2
Use:
SELECT tbl.*
FROM TableName tbl
INNER JOIN
(
SELECT Id, MIN(Point) MinPoint
FROM TableName
GROUP BY Id
) tbl1
ON tbl1.id = tbl.id
WHERE tbl1.MinPoint = tbl.Point
This is another way of doing the same thing, which would allow you to do interesting things like select the top 5 winning games, etc.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Point) as RowNum, *
FROM Table
) X
WHERE RowNum = 1
You can now correctly get the actual row that was identified as the one with the lowest score and you can modify the ordering function to use multiple criteria, such as "Show me the earliest game which had the smallest score", etc.
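As a runnable sketch of the ROW_NUMBER() approach, the snippet below loads the table from the question into an in-memory SQLite database via Python's sqlite3 and keeps the lowest-point row per id. The table name games and the alias ranked are assumptions; SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER, game TEXT, point INTEGER)")
# The sample rows from the question.
conn.executemany("INSERT INTO games VALUES (?, ?, ?)",
                 [(1, "x", 5), (1, "z", 4), (2, "y", 6),
                  (3, "x", 2), (3, "y", 5), (3, "z", 8)])

# Number the rows of each id from lowest to highest point, then keep row 1.
# Extra ORDER BY terms inside OVER (...) can act as tie-breakers.
query = """
SELECT id, game, point
FROM (SELECT id, game, point,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY point) AS rn
      FROM games) AS ranked
WHERE rn = 1
ORDER BY id
"""
min_rows = conn.execute(query).fetchall()
print(min_rows)
```

Changing PARTITION BY id to PARTITION BY game gives one row per game instead, which is the reading some of the other answers take.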
This will work
select * from table
where (id,point) IN (select id,min(point) from table group by id);
As this is tagged with sql only, the following is using ANSI SQL and a window function:
select id, game, point
from (
select id, game, point,
row_number() over (partition by game order by point) as rn
from games
) t
where rn = 1;
Ken Clark's answer didn't work in my case. It might not work in yours either. If not, try this:
SELECT *
from table T
INNER JOIN
(
select id, MIN(point) MinPoint
from table T
group by id
) NewT on T.id = NewT.id and T.point = NewT.MinPoint
ORDER BY game desc
SELECT DISTINCT
FIRST_VALUE(ID) OVER (Partition by Game ORDER BY Point) AS ID,
Game,
FIRST_VALUE(Point) OVER (Partition by Game ORDER BY Point) AS Point
FROM #T
SELECT * from room
INNER JOIN
(
select DISTINCT hotelNo, MIN(price) MinPrice
from room
Group by hotelNo
) NewT
on room.hotelNo = NewT.hotelNo and room.price = NewT.MinPrice;
This alternative approach uses SQL Server's OUTER APPLY clause. It first creates the distinct list of games, and then fetches and outputs the record with the lowest point number for each game.
The OUTER APPLY clause can be thought of as a LEFT JOIN, with the advantage that you can use values from the main query as parameters in the subquery (here: game).
SELECT colMinPointID
FROM (
SELECT game
FROM table
GROUP BY game
) As rstOuter
OUTER APPLY (
SELECT TOP 1 id As colMinPointID
FROM table As rstInner
WHERE rstInner.game = rstOuter.game
ORDER BY point
) AS rstMinPoints
This is portable, at least between Oracle and PostgreSQL:
select t.* from table t
where not exists(select 1 from table ti where ti.attr < t.attr);
Most of the answers use an inner query. I am wondering why the following isn't suggested.
select
*
from
table
order by
point
fetch next 1 row only -- ... or the appropriate syntax for the particular DB
This query is very simple to write with JPAQueryFactory (a Java Query DSL class).
return new JPAQueryFactory(manager).
selectFrom(QTable.table).
setLockMode(LockModeType.OPTIMISTIC).
orderBy(QTable.table.point.asc()).
fetchFirst();
Try:
select id, game, min(point) from t
group by id

Compare SQL groups against each other

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?
In MySQL (which I assume you are using, since you posted SELECT *, COUNT(*) FROM T GROUP BY X, which would fail in every other RDBMS that I know of), you can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use windowed functions and one of TOP/LIMIT with Ties or Common Table expressions it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
FROM T
) AS T2
ORDER BY Records DESC
Lastly, I have never used Oracle, so apologies for not adding a solution that works on Oracle...
EDIT
My solution for MySQL did not take ties into account, and the fix rather steps on the toes of what you said you wanted to avoid (duplicated subqueries), so I am not sure I can help after all. In case it is still preferable, here is a version that works as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X
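To make the tie behaviour concrete, here is a small sketch on hypothetical data where two groups share the maximum count. It runs on SQLite through Python's sqlite3 (not MySQL or SQL Server), and compares the windowed COUNT(*) plus MAX filter with the LIMIT 1 join. The column y and the variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, y INTEGER)")
# Hypothetical data: groups 'a' and 'b' tie with two rows each, 'c' has one.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)])

# Windowed count per group, then keep every row whose group size is the maximum.
with_ties = conn.execute("""
    WITH cte AS (
        SELECT x, y, COUNT(*) OVER (PARTITION BY x) AS records
        FROM t
    )
    SELECT x, y, records
    FROM cte
    WHERE records = (SELECT MAX(records) FROM cte)
    ORDER BY x, y
""").fetchall()

# Join against the single largest group: only one of the tied groups survives.
limit_one = conn.execute("""
    SELECT t.x, t.y
    FROM t
    INNER JOIN (SELECT x, COUNT(*) AS records
                FROM t
                GROUP BY x
                ORDER BY records DESC
                LIMIT 1) AS t2 ON t2.x = t.x
    ORDER BY t.y
""").fetchall()

print(with_ties)
print(limit_one)
```

The first query returns all four rows of the two tied groups. The second returns the rows of just one of them, and which one is not specified.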
For the exact question you give, one way to look at it is that you want the group of records where there is no other group that has more records. So if you say
SELECT taxid, COUNT(*) as howMany
FROM flats
GROUP by taxid
You get all counties and their counts
You can then treat that expression as a table by making it a subquery and giving it an alias. Below I give two "copies" of the query the names X and Y and ask for the taxids in X for which no group in Y has a larger count. If two groups tie for the largest count, both are returned. Different databases have proprietary syntax, notably TOP and LIMIT, that makes this kind of query simpler and easier to understand.
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)
Try this:
SELECT * FROM (
SELECT *, MAX(Records) as max_records FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t
) WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.

Group by every N records in T-SQL

I have some performance test results on the database, and what I want to do is to group every 1000 records (previously sorted in ascending order by date) and then aggregate results with AVG.
I'm actually looking for a standard SQL solution, however any T-SQL specific results are also appreciated.
The query looks like this:
SELECT TestId,Throughput FROM dbo.Results ORDER BY id
WITH T AS (
SELECT RANK() OVER (ORDER BY ID) Rank,
P.Field1, P.Field2, P.Value1, ...
FROM P
)
SELECT (Rank - 1) / 1000 GroupID, AVG(...)
FROM T
GROUP BY ((Rank - 1) / 1000)
;
Something like that should get you started. If you can provide your actual schema I can update as appropriate.
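A small runnable sketch of the same bucketing idea, using SQLite through Python's sqlite3 instead of SQL Server. The table results, its rows, and a chunk size of 3 (instead of 1000) are assumptions chosen to keep the output short. ROW_NUMBER() is used here instead of RANK(); the two agree when the ordering column is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, throughput REAL)")
# Hypothetical data: seven rows whose throughput equals their id.
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(i, float(i)) for i in range(1, 8)])

# (row number - 1) / chunk is integer division here, so rows 1-3 fall in
# group 0, rows 4-6 in group 1, and the leftover row 7 in group 2.
query = """
SELECT group_id, COUNT(*) AS n, AVG(throughput) AS avg_throughput
FROM (SELECT throughput,
             (ROW_NUMBER() OVER (ORDER BY id) - 1) / :chunk AS group_id
      FROM results) AS numbered
GROUP BY group_id
ORDER BY group_id
"""
chunks = conn.execute(query, {"chunk": 3}).fetchall()
print(chunks)
```

The last group holds only the remainder, which matters when the group averages are compared with each other.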
Give the credit to Yuck; I am only posting this as an answer so that I can include a code block. I ran a count test to check that it was grouping by 1000: dividing Rank by 1000 directly made the first set 999 rows, while the (Rank-1) version below produced set sizes of 1,000. Great query, Yuck.
WITH T AS (
SELECT RANK() OVER (ORDER BY sID) Rank, sID
FROM docSVsys
)
SELECT (Rank-1) / 1000 GroupID, count(sID)
FROM T
GROUP BY ((Rank-1) / 1000)
order by GroupID
I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().
Reason being, if you have 10,010 records (for example), then you'll have 11 groupings -- the first 10 with 1000 in them, and the last with just 10.
If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.
NTILE() would make all groups the same size; the only caveat is that you'd need to know how many groups you wanted.
So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.
You could get the number of groups simply with:
DECLARE @ntile int
SET @ntile = (SELECT count(1) from myTable) / 1000
And then modify @Yuck's approach with the NTILE() substitution:
;WITH myCTE AS (
SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
col1, col2, ...
FROM dbo.myTable
)
SELECT myGroup, col1, col2...
FROM myCTE
GROUP BY (myGroup), col1, col2...
;
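To illustrate how NTILE() evens out the group sizes, the sketch below uses 25 hypothetical rows and a target chunk size of 10, again on SQLite through Python's sqlite3 rather than SQL Server. The names my_table, bucket and n_buckets are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(i,) for i in range(1, 26)])

n_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
chunk = 10
n_buckets = n_rows // chunk  # 25 rows / 10 per chunk -> 2 buckets

# n_buckets is an int computed above, so formatting it into the SQL is safe.
query = f"""
SELECT bucket, COUNT(*) AS n
FROM (SELECT NTILE({n_buckets}) OVER (ORDER BY id) AS bucket
      FROM my_table) AS b
GROUP BY bucket
ORDER BY bucket
"""
bucket_sizes = conn.execute(query).fetchall()
print(bucket_sizes)
```

Fixed-size chunking would give groups of 10, 10 and 5 here; NTILE(2) gives 13 and 12, so bucket sizes differ by at most one row.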
The answer above does not assign a unique group id to each block of 1000 records on engines where / is floating-point division (BigQuery, used here, is one), so Floor() is needed. The following returns all records from your table, with a unique GroupID for each 1000 rows:
WITH T AS (
SELECT RANK() OVER (ORDER BY your_field) Rank,
your_field
FROM your_table
WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
FROM T
And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:
TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 10) AS STRING),'seed1'))) GroupID
Without the seed value, you and I would get exactly the same output, because we would both just be running SHA256 on the numbers 1, 2, 3, etc. Adding the seed makes the output specific to you while keeping it repeatable.
This is BigQuery syntax. T-SQL might be slightly different.
Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by doing:
WITH T AS (
SELECT RANK() OVER (ORDER BY your_field) Rank,
your_field
FROM your_table
WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
, COUNT(*) OVER(PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1')))) AS CountInGroup
FROM T
ORDER BY CountInGroup
You can also use Row_Number() instead of rank. No Floor required.
declare @groupsize int = 50
;with ct1 as ( select YourColumn, RowID = Row_Number() over(order by YourColumn)
from YourTable
)
select YourColumn, RowID, GroupID = (RowID-1)/@groupsize + 1
from ct1
I read more about NTILE after reading @user15481328's answer
(resource: https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-ntile-function/ )
and this solution allowed me to find the max date within each of the 25 groups of my data set:
with cte as (
select date,
NTILE(25) OVER ( order by date ) bucket_num
from mybigdataset
)
select max(date), bucket_num
from cte
group by bucket_num
order by bucket_num