sql keep the first row according to an attribute

I wrote an SQL query that groups by SEASON in ascending order and CHAM in descending order. How can I select the largest value of CHAM in each SEASON? For further development, I also need to keep the CHAMPION_ID.

Generally, you can do this with window functions, but your database, MariaDB v5.5, is too old to support window functions and a bunch of other things. I'd recommend either upgrading MariaDB or doing your work on dbfiddle or using the stand-alone SQLite.
You can do this without window functions using a subquery.
select *
from champs as c1
where cham = (
    select max(cham)
    from champs as c2
    where c1.season = c2.season
)
Pick only the rows whose cham equals the highest cham for that season.
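To see the tie behaviour of that subquery, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the question, but the rows are invented for illustration, and SQLite stands in for MariaDB.

```python
import sqlite3

# Hypothetical sample data: column names follow the question,
# the rows themselves are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table champs (champion_id integer, season text, cham integer)")
conn.executemany(
    "insert into champs values (?, ?, ?)",
    [
        (1, "spring", 5),
        (2, "spring", 5),  # ties with champion 1 for the spring maximum
        (3, "summer", 7),
        (4, "summer", 2),
    ],
)

# Correlated subquery: keep rows whose cham equals the season's maximum.
top_rows = conn.execute(
    """
    select champion_id, season, cham
    from champs as c1
    where cham = (
        select max(cham)
        from champs as c2
        where c1.season = c2.season
    )
    order by season, champion_id
    """
).fetchall()

print(top_rows)
```

Both spring rows come back because they share the maximum, while summer returns only its single best row.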
You might be tempted to use a group by, but that returns only one row per season, so ties are lost. For example, spring is all ties.
-- Only shows one row per season.
select *
from champs
group by season
having cham = max(cham);
With window functions...
First, add the seasonal ranks. The window is partitioned by season and ordered by cham descending. rank() assigns the same rank to ties.
select
*,
rank() over (partition by season order by cham desc) as season_rank
from champs
Then use that as a common table expression and select only the rows with season_rank = 1.
with ranked as (
select
*,
rank() over (partition by season order by cham desc) as season_rank
from champs
)
select *
from ranked
where season_rank = 1;
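The two steps can be tried out with Python's sqlite3 module. This is only a sketch: the rows are invented, and it assumes the bundled SQLite is 3.25 or newer, since older versions have no window functions.

```python
import sqlite3

# Hypothetical rows; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table champs (champion_id integer, season text, cham integer)")
conn.executemany(
    "insert into champs values (?, ?, ?)",
    [
        (1, "spring", 5),
        (2, "spring", 5),
        (3, "spring", 3),
        (4, "summer", 7),
        (5, "summer", 2),
    ],
)

# Step 1: rank every row within its season.
ranked = conn.execute(
    """
    select champion_id, season, cham,
           rank() over (partition by season order by cham desc) as season_rank
    from champs
    order by season, season_rank, champion_id
    """
).fetchall()

# Step 2: wrap the ranking in a CTE and keep only rank 1.
winner_ids = [
    row[0]
    for row in conn.execute(
        """
        with ranked as (
            select *,
                   rank() over (partition by season order by cham desc) as season_rank
            from champs
        )
        select champion_id
        from ranked
        where season_rank = 1
        order by champion_id
        """
    )
]

print(ranked)
print(winner_ids)
```

Note that the tied spring rows both get rank 1 and the next row gets rank 3, because rank() leaves a gap after ties.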

Related

Select one random row by group (Oracle 10g)

This post is similar to this thread in that I have multiple observations per group. However, I want to randomly select only one of them. I am also working on Oracle 10g.
There are multiple rows per person_id in table df. I want to order each group of person_ids by dbms_random.value() and select the first observation from each group. To do so, I tried:
select
person_id, purchase_date
from
df
where
row_number() over (partition by person_id order by dbms_random.value()) = 1
The query returns:
ORA-30483: window functions are not allowed here
30483. 00000 - "window functions are not allowed here"
*Cause: Window functions are allowed only in the SELECT list of a query. And, window function cannot be an argument to another window or group function.

Use a subquery:
select person_id, purchase_date
from (select df.*,
row_number() over (partition by person_id order by dbms_random.value()) as seqnum
from df
) df
where seqnum = 1;
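A rough way to check the pattern without an Oracle instance is Python's sqlite3 module. Note the assumptions: SQLite's random() is swapped in for dbms_random.value(), the rows are invented, and SQLite 3.25+ is needed for row_number().

```python
import sqlite3

# Hypothetical purchases; several rows per person_id.
rows = [
    (1, "2020-01-01"),
    (1, "2020-02-01"),
    (1, "2020-03-01"),
    (2, "2020-01-15"),
    (3, "2020-05-01"),
    (3, "2020-06-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table df (person_id integer, purchase_date text)")
conn.executemany("insert into df values (?, ?)", rows)

# The window function lives in the inner query; the outer query filters on it.
picked = conn.execute(
    """
    select person_id, purchase_date
    from (
        select df.*,
               row_number() over (partition by person_id order by random()) as seqnum
        from df
    ) as numbered
    where seqnum = 1
    order by person_id
    """
).fetchall()

print(picked)
```

Each run returns exactly one row per person_id, and which purchase_date is chosen can change from run to run.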

One option is to use a WITH ... AS clause (a common table expression):
WITH t AS
(
SELECT df.*,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY dbms_random.value()) AS rn
FROM df
)
SELECT person_id, purchase_date
FROM t
WHERE rn = 1

Aggregate queries (using GROUP BY and aggregate functions) are often faster than equivalent analytic functions that do the same job. So, if you have a lot of data to process, or if the data is not excessively large but you must run this query often, you may want a more efficient query that uses aggregation instead of analytic functions.
Here is one possible approach:
select person_id,
max(purchase_date) keep (dense_rank first order by dbms_random.value())
as random_purchase_date
from df
group by person_id
;

Find the second largest value with Groupings

In SQL Server, I am attempting to pull the second latest NOTE_ENTRY_DT_TIME. With the query written below it still pulls the latest date (I believe it's because of the grouping, but the grouping is required to join later). What is the best method to achieve this?
SELECT
hop.ACCOUNT_ID,
MAX(hop.NOTE_ENTRY_DT_TIME) AS latest_noteid
FROM
NOTES hop
WHERE
hop.GEN_YN IS NULL
AND hop.NOTE_ENTRY_DT_TIME < (SELECT MAX(hope.NOTE_ENTRY_DT_TIME)
FROM NOTES hope
WHERE hop.GEN_YN IS NULL)
GROUP BY
hop.ACCOUNT_ID

One of the "easier" ways to get the Nth row in a group is to use a CTE and ROW_NUMBER:
WITH CTE AS (
    SELECT Account_ID,
           Note_Entry_Dt_Time,
           ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Note_Entry_Dt_Time DESC) AS RN
    FROM dbo.YourTable
)
SELECT Account_ID,
       Note_Entry_Dt_Time
FROM CTE
WHERE RN = 2;
Of course, if an ACCOUNT_ID only has 1 row, then it will not be returned in the result set.
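Here is a small sketch of that behaviour, run through Python's sqlite3 module rather than SQL Server. The notes table and its rows are invented for the example, and SQLite 3.25+ is assumed for ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table notes (account_id integer, note_entry_dt_time text)")
conn.executemany(
    "insert into notes values (?, ?)",
    [
        (100, "2021-01-01 09:00"),
        (100, "2021-01-02 09:00"),
        (100, "2021-01-03 09:00"),
        (200, "2021-02-01 10:00"),
        (200, "2021-02-05 10:00"),
        (300, "2021-03-01 08:00"),  # only one note, so there is no second latest
    ],
)

# Number each account's notes from newest to oldest and keep number 2.
second_latest = conn.execute(
    """
    with cte as (
        select account_id,
               note_entry_dt_time,
               row_number() over (
                   partition by account_id
                   order by note_entry_dt_time desc
               ) as rn
        from notes
    )
    select account_id, note_entry_dt_time
    from cte
    where rn = 2
    order by account_id
    """
).fetchall()

print(second_latest)
```

Account 300 is missing from the result because it has a single row.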
The OP's statement "The row will not always be 2." from the comments conflicts with their statement "I am attempting to pull the second latest NOTE_ENTRY_DT_TIME" in the question. At a best guess, this means that the OP has rows with the same date that could be the "latest" date. If so, they would simply need to replace ROW_NUMBER with DENSE_RANK. Their sample data, however, doesn't suggest this is the case.

You can use window functions:
select *
from (
select
n.*,
row_number() over(partition by account_id order by note_entry_dt_time desc) rn
from notes n
) t
where rn = 2

Window Function- Dense_Rank and Row_Number difference

The query below uses the dense_rank window function and gives me the output I want: it orders each signup's transactions by date ascending and assigns rank 1 to the earliest one:
select p.billing_cycle_in_months, avg(t.days)
from (
select *,
datediff(day,transaction_settled_at, transaction_refunded_at) as days,
dense_rank() over (partition by signup_id order by transaction_settled_at asc) as rank
from transactions
) t
join signups s on s.signup_id = t.signup_id
join plans p on p.id = s.plan_id
where datediff(year,s.started_at, current_date) > 1 and t.rank = 1
group by p.billing_cycle_in_months
Would I get essentially the same result using the row_number window function over the same column (transaction_settled_at asc)?
Basically, grouped by billing cycle, I want to rank the earliest day as 1. I just wanted to clarify whether row_number would give me the same result in this case.
Thanks

In your query, the difference between using dense_rank() and row_number() is that the former allows top ties, while the latter does not.
So if two (or more) records have the same, earliest, transaction_settled_at for a given signup_id, then condition dense_rank() ... = 1 will keep them both, while row_number() will select an undefined record out of the two.
If there is no risk of ties, both functions will produce the same result set in your context.
To reduce the possibility of ties, you can also add more sorting criteria to the order by clause of the window function:
dense_rank() over (
partition by signup_id
order by transaction_settled_at, some_other_column desc, some_more_column
)
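To make the tie difference concrete, here is a minimal sketch with Python's sqlite3 module. The transactions rows and the helper first_ids are invented for illustration, and SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (id integer, signup_id integer, transaction_settled_at text)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [
        (1, 1, "2022-01-01"),
        (2, 1, "2022-01-01"),  # same earliest date as id 1: a tie
        (3, 1, "2022-02-01"),
        (4, 2, "2022-01-10"),
        (5, 2, "2022-03-01"),
    ],
)


def first_ids(func):
    """Return ids ranked 1 per signup; func is 'dense_rank' or 'row_number'."""
    sql = f"""
        select id
        from (
            select id,
                   {func}() over (
                       partition by signup_id
                       order by transaction_settled_at
                   ) as rnk
            from transactions
        ) as t
        where rnk = 1
        order by id
    """
    return [row[0] for row in conn.execute(sql)]


dense_ids = first_ids("dense_rank")  # keeps both tied rows for signup 1
row_ids = first_ids("row_number")    # keeps exactly one row per signup

print(dense_ids)
print(row_ids)
```

With dense_rank both tied transactions of signup 1 survive the filter, so they would both feed into the average. With row_number only one of them does, and which one is not guaranteed.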

filtering out duplicate rows using max

I have a table that, for the most part, is individual users. Occasionally there is a joint user. For a joint user, all the fields in the table will be exactly the same as the primary user except for a b-score field. I want to only display one row of data per account, and use the highest b-score to decide which row to use when it is a joint account (so the highest score is displayed only)
I thought it would be a simple
SELECT DISTINCT accountNo, MAX(bscore) FROM table, GROUP BY accountNo
but I'm still getting multiple rows for joints

You seem to want the ANSI-standard row_number() function:
select t.*
from (select t.*, row_number() over (partition by accountNo order by bscore desc) as seqnum
from t
) t
where seqnum = 1;

This worked for me, maybe not the most efficient. Correlated sub-query. The key part is accountNo = a.accountNo.
SELECT DISTINCT a.accountNo,
       (SELECT MAX(bscore) FROM table WHERE accountNo = a.accountNo) bscore
FROM table a
GROUP BY a.accountNo
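If only the account number and its highest score are needed, a plain GROUP BY (the query from the question without DISTINCT and without the stray comma) may already be enough. Below is a sketch using Python's sqlite3 module. The accounts table name and its rows are made up, since table is a reserved word and the real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per user, joint users share an accountNo.
conn.execute("create table accounts (accountNo integer, name text, bscore integer)")
conn.executemany(
    "insert into accounts values (?, ?, ?)",
    [
        (1, "Ann", 700),
        (2, "Bob", 650),
        (2, "Bob", 720),  # joint user on account 2 with a higher score
        (3, "Cy", 680),
    ],
)

# One row per account, carrying the highest bscore of that account.
best = conn.execute(
    """
    select accountNo, max(bscore) as bscore
    from accounts
    group by accountNo
    order by accountNo
    """
).fetchall()

print(best)
```

This only covers the two selected columns. To return every column of the winning row, the row_number() approach is the safer choice.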

can we get totalcount and last record from postgresql

I have a table with 23 records. I am trying to get the total count of records and the last record in a single query, something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to pull out only the last record as well as the total count in PostgreSQL?

This can be done using window functions
select *
from (
    select m.*,
           row_number() over (order by createddate desc) as rn,
           count(*) over () as total_count
    from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
       (select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration, and the table definition, this might be faster than the two window functions.
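The scalar sub-query variant can be tried quickly with Python's sqlite3 module. This is just a sketch: SQLite stands in for PostgreSQL, and the music columns other than createddate are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only createddate is taken from the question.
conn.execute("create table music (id integer, title text, createddate text)")
conn.executemany(
    "insert into music values (?, ?, ?)",
    [
        (1, "first song", "2023-01-01"),
        (2, "second song", "2023-02-01"),
        (3, "third song", "2023-03-01"),
    ],
)

# Newest row plus the total row count in a single statement.
last_with_count = conn.execute(
    """
    select m.*,
           (select count(*) from music) as total_count
    from music as m
    order by createddate desc
    limit 1
    """
).fetchone()

print(last_with_count)
```

The result is a single row: the columns of the most recent record followed by the total count.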

No, it's not possible to do what is being asked. SQL does not work that way: the moment you ask for a count(), SQL changes the level of your data to an aggregation. The only way to do what you are asking is to do the count() and the order by in separate queries.

Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies on partitions defined by windows and not on groups defined by GROUP BY.
I did not run any tests, but I suspect my solution is the least efficient of those posted. It is, however, the closest to your example query so far.
