Selecting 1 column's value in a group after grouping by another column - sql

How would I include the name of any one of the books that belong to that particular type in the below query?
select distinct
(select sum(ob.Balance)),
ob.BookType
from orders.OrderBooks ob
group by ob.BookType
In its current state it does what I need: it groups books by BookType and sums their balances.
However I need the name of any book that belongs to that BookType as part of the result.
If I select the BookName column and group by it as well, like below, I get one row per distinct BookType and BookName pair, which largely undoes the original grouping.
select distinct
(select sum(ob.Balance)),
ob.BookType,
ob.BookName
from orders.OrderBooks ob
group by ob.BookType, ob.BookName

;WITH x AS
(
SELECT
Balance = SUM(Balance) OVER (PARTITION BY BookType),
BookType,
BookName,
rn = ROW_NUMBER() OVER (PARTITION BY BookType ORDER BY BookName DESC)
FROM orders.OrderBooks
)
SELECT Balance, BookType, BookName
FROM x
WHERE rn = 1;
ORDER BY BookName DESC was dealer's choice. If you truly don't care which title shows up in the result, you can use any ordering you like. If you want the results to be random every time, you can use ORDER BY NEWID().
In general I like this flexibility better than the TOP (1) subquery approach, and it needs only a single scan instead of an additional table access per row. But you can also do it a different way; just take min/max of the bookname, too:
SELECT Balance = SUM(Balance),
BookType,
BookName = MIN(BookName) -- or MAX()
FROM orders.OrderBooks
GROUP BY BookType;
You can see these give similar results in this db<>fiddle. The plan is simpler, too; most notably, there are no spools. However, when you use an aggregate function against that column, it becomes harder to provide arbitrary/random results, and if you intend to add other columns pulled from the same row, you'll need to go back to the row_number solution.
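To check that the two queries above agree, here is a minimal sketch using Python's built-in sqlite3 module. Everything about the setup is an assumption for illustration: the rows are made-up sample data, the table is called OrderBooks without the orders schema, and SQLite (3.25 or later, for window functions) stands in for SQL Server, so the T-SQL `alias = expression` form is written with AS. MAX(BookName) is used so that it picks the same title as ORDER BY BookName DESC.

```python
import sqlite3

# Hypothetical sample data; not taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderBooks (BookName TEXT, BookType TEXT, Balance REAL)")
conn.executemany(
    "INSERT INTO OrderBooks VALUES (?, ?, ?)",
    [
        ("Alpha", "Fiction", 10.0),
        ("Beta", "Fiction", 5.0),
        ("Gamma", "Science", 7.5),
    ],
)

# Window-function version: total per BookType plus the first row per BookType.
window_sql = """
WITH x AS (
    SELECT SUM(Balance) OVER (PARTITION BY BookType) AS Balance,
           BookType,
           BookName,
           ROW_NUMBER() OVER (PARTITION BY BookType ORDER BY BookName DESC) AS rn
    FROM OrderBooks
)
SELECT Balance, BookType, BookName FROM x WHERE rn = 1 ORDER BY BookType
"""

# Plain GROUP BY version: aggregate the name as well.
aggregate_sql = """
SELECT SUM(Balance) AS Balance, BookType, MAX(BookName) AS BookName
FROM OrderBooks
GROUP BY BookType
ORDER BY BookType
"""

window_rows = conn.execute(window_sql).fetchall()
aggregate_rows = conn.execute(aggregate_sql).fetchall()
print(window_rows)     # [(15.0, 'Fiction', 'Beta'), (7.5, 'Science', 'Gamma')]
print(aggregate_rows)  # same rows
```

With this data both forms return one row per BookType. They would stop agreeing as soon as you asked for a second column from the chosen row, which is where the row_number form is needed.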

You can use a correlated subquery to get a single book name of that type. This assumes there's an ID column and that you want to pull the most recent book:
select
Balance = sum(ob.Balance),
ob.BookType,
BookName = (SELECT TOP(1) ob2.BookName FROM orders.OrderBooks ob2 WHERE ob2.BookType = ob.BookType ORDER BY ob2.ID DESC)
from orders.OrderBooks ob
group by ob.BookType

Related

filtering out duplicate rows using max

I have a table that, for the most part, is individual users. Occasionally there is a joint user. For a joint user, all the fields in the table will be exactly the same as the primary user except for a b-score field. I want to only display one row of data per account, and use the highest b-score to decide which row to use when it is a joint account (so the highest score is displayed only)
I thought it would be a simple
SELECT DISTINCT accountNo, MAX(bscore) FROM table, GROUP BY accountNo
but I'm still getting multiple rows for joints
You seem to want the ANSI-standard row_number() function:
select t.*
from (select t.*, row_number() over (partition by accountNo order by bscore desc) as seqnum
from t
) t
where seqnum = 1;
This worked for me, though it may not be the most efficient: a correlated subquery. The key part is accountNo = a.accountNo.
SELECT a.accountNo,
(SELECT MAX(bscore) FROM table WHERE accountNo = a.accountNo) bscore
FROM table a
GROUP BY a.accountNo
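For comparison, a small runnable sketch of the row_number() approach, which keeps the whole winning row rather than just the score. It uses Python's sqlite3 module (SQLite 3.25+ assumed); the table name t follows the answer above, while the holder column and the sample rows are hypothetical, added only to make it visible which row survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "holder" is a made-up column so the surviving row can be identified.
conn.execute("CREATE TABLE t (accountNo INTEGER, holder TEXT, bscore INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "solo", 700),     # individual account
        (2, "primary", 650),  # joint account, lower score
        (2, "joint", 720),    # joint account, higher score
    ],
)

rows = conn.execute(
    """
    SELECT accountNo, holder, bscore
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY accountNo
                                    ORDER BY bscore DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY accountNo
    """
).fetchall()
print(rows)  # [(1, 'solo', 700), (2, 'joint', 720)]
```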

Rank Over Partition By in Oracle SQL (Oracle 11g)

I have 4 columns in a table
Company Part Number
Manufacturer Part Number
Order Number
Part Receipt Date
Ex.
I just want to return one record based on the maximum Part Receipt Date which would be the first row in the table (The one with Part Receipt date 03/31/2015).
I tried
RANK() OVER (PARTITION BY Company Part Number,Manufacturer Part Number
ORDER BY Part Receipt Date DESC,Order Number DESC) = 1
at the end of the WHERE statement and this did not work.
This would seem to do what you want:
select t.*
from (select t.*
from t
order by partreceiptdate desc
) t
where rownum = 1;
Analytic functions like rank() are available in the SELECT clause; they can't be invoked directly in a WHERE clause. To use rank() the way you want, you must compute it in a subquery and then filter on it in the WHERE clause of the outer query. Something like this:
select company_part_number, manufacturer_part_number, order_number, part_receipt_date
from ( select t.*, rank() over (partition by... order by...) as rnk
from your_table t
)
where rnk = 1
Note also that you can't have a column name like company part number (with spaces in it) - at least not unless they are enclosed in double-quotes, which is a very poor practice, best avoided.
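The same restriction can be seen outside Oracle. Below is a rough sketch in Python's sqlite3 module (SQLite 3.25+ assumed, standing in for Oracle 11g); the table name parts, the underscore column names, and the rows are all hypothetical. It shows the window function being rejected in WHERE, then working once it is moved into a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (company_part_number TEXT, manufacturer_part_number TEXT, "
    "order_number INTEGER, part_receipt_date TEXT)"
)
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?)",
    [
        ("C1", "M1", 100, "2015-03-31"),
        ("C1", "M1", 99, "2015-02-15"),
        ("C2", "M2", 50, "2014-12-01"),
    ],
)

# A window function directly in WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT * FROM parts WHERE RANK() OVER (ORDER BY part_receipt_date DESC) = 1"
    )
    rank_in_where_failed = False
except sqlite3.Error:
    rank_in_where_failed = True

# Computing the rank in a subquery and filtering outside works.
latest = conn.execute(
    """
    SELECT company_part_number, manufacturer_part_number, order_number, part_receipt_date
    FROM (SELECT p.*,
                 RANK() OVER (PARTITION BY company_part_number, manufacturer_part_number
                              ORDER BY part_receipt_date DESC, order_number DESC) AS rnk
          FROM parts p) AS ranked
    WHERE rnk = 1
    ORDER BY company_part_number
    """
).fetchall()
print(rank_in_where_failed)
print(latest)
```

Dates are stored as ISO-8601 text here so that string ordering matches date ordering; in Oracle you would use a DATE column instead.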

Ambiguous column name using row_number() without alias

I'm trying to implement pagination in a query that is built using information from a view, and I need to use the row_number() function over a column when I don't know which table it is from.
SELECT * FROM (
SELECT class.ID as ID, user.ID as USERID, row_number() over (ORDER BY
ID desc) as row_number FROM class, user
) out_q WHERE row_number > #startrow ORDER BY row_number
The problem is that I only have the result column name (ID or USERID) that came from a previous query. If I execute this query, it will raise the error 'Ambiguous column name "ID"'. Is there a way to specify that I'm referencing the column ID that is being selected and not from a different table?
Is it possible to specify an alias to the query result itself?
I have already tried the following,
SELECT TOP 30 * FROM (
SELECT *, row_number() over (ORDER BY ID desc) as row_number FROM(
SELECT class.ID as ID, user.ID as USERID FROM class, user
) in_q
) out_q WHERE row_number > #startrow ORDER BY row_number
It works, but the DBMS gets confused about which query plan to use, because of the small row goal in the outer query and the big result set returned by the inner query. When #startrow is a small number, the query executes in less than one second; when it is a big number, the query takes minutes.
Your problem is the id in the row_number itself. If you want a stable sort, then include both ids:
SELECT *
FROM (SELECT class.ID as ID, user.ID as USERID,
row_number() over (ORDER BY class.ID desc, user.id) as row_number
FROM class CROSS JOIN user
) out_q
WHERE row_number > #startrow
ORDER BY row_number;
I assume the cartesian product is intentional. Sometimes, this indicates an error in the query. In general, I would advise you to avoid using commas in the from clause. If you do want a cartesian product, then be explicit by using CROSS JOIN.
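Here is a minimal sketch of why ordering by both ids matters for paging, again in Python's sqlite3 module (SQLite 3.25+ assumed). The tables are renamed classes and users, the data is made up, the fetch_page helper is hypothetical, and LIMIT stands in for TOP since SQLite has no TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes (ID INTEGER)")
conn.execute("CREATE TABLE users (ID INTEGER)")
conn.executemany("INSERT INTO classes VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO users VALUES (?)", [(10,), (20,), (30,)])


def fetch_page(start_row, page_size):
    """Return page_size rows whose row number is greater than start_row."""
    sql = """
    SELECT ID, USERID, rn
    FROM (SELECT classes.ID AS ID, users.ID AS USERID,
                 -- Both ids in ORDER BY: every pair gets a unique, repeatable number.
                 ROW_NUMBER() OVER (ORDER BY classes.ID DESC, users.ID) AS rn
          FROM classes CROSS JOIN users) AS out_q
    WHERE rn > ?
    ORDER BY rn
    LIMIT ?
    """
    return conn.execute(sql, (start_row, page_size)).fetchall()


page = fetch_page(2, 3)
print(page)  # [(2, 30, 3), (1, 10, 4), (1, 20, 5)]
```

If the window ordered by classes.ID alone, the three rows sharing a class id could be numbered in any order, so the same pair might show up on two pages or on none.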
You could keep the query you already tried and add the OPTIMIZE FOR hint to it:
OPTION ( OPTIMIZE FOR (#startrow = 100000) );
See a description of the hint in MSDN docs here: https://msdn.microsoft.com/en-us/library/ms181714.aspx.

Is it possible to get a function result with columns which are not in the group by (SQL)?

I am trying to get the last registration date of a course, but I also want to know the id of that record. As MAX is an aggregate function, I must use GROUP BY id, which I do not want, because the result is very different (one record per id instead of a single record).
What is the right way to write a query like this?
SELECT id, MAX(registration_date) AS registration_date
FROM courses;
Because it gives an error and I must do this to avoid it:
SELECT id, MAX(registration_date) AS registration_date
FROM courses
GROUP BY id;
And I do not want the result of the last one.
You could use the rank() window function for that:
SELECT id
FROM (SELECT id, RANK() OVER (ORDER BY registration_date DESC) AS rk
FROM courses) t
WHERE rk = 1
One method is to use a sub query like this:
select *
from [dbo].[Courses]
where registration_date =
(select max(registration_date)
from [dbo].[Courses])
but with only a date to match this may return more than one record.
If possible, include more fields in the where clause to narrow it down.
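To illustrate the caveat about ties, here is a small sketch in Python's sqlite3 module (SQLite 3.25+ assumed) with made-up rows where two courses share the latest date. The MAX subquery returns both; a ROW_NUMBER with a tie-breaker returns exactly one. Breaking the tie on the highest id is purely an assumption for the example, and any column that makes the order unique would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (id INTEGER, registration_date TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2021-06-15"), (3, "2021-06-15")],
)

# Matching on the maximum date: every row sharing that date comes back.
max_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM courses
        WHERE registration_date = (SELECT MAX(registration_date) FROM courses)
        ORDER BY id
        """
    )
]

# ROW_NUMBER with a tie-breaker (assumed: highest id wins) yields one row.
single_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (SELECT id,
                     ROW_NUMBER() OVER (ORDER BY registration_date DESC, id DESC) AS rn
              FROM courses) AS ranked
        WHERE rn = 1
        """
    )
]
print(max_ids)     # [2, 3]
print(single_ids)  # [3]
```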

Fastest/most efficient way to perform this SQL Server 2008 query?

I have a table which contains:
-an ID for a financial instrument
-the price
-the date the price was recorded
-the actual time the price was recorded
-the source of the price
I want to get the index ID, the latest price, price source and the date of this latest price, for each instrument, where the source is either "L" or "R". I prefer source "L" to "R", but the latest price is more important (so if the latest price date only has a source of "R"- take this, but if for the latest date we have both, take "L").
This is the SQL I have:
SELECT tab1.IndexID, tab1.QuoteDate, tab2.Source, tab2.ActualTime FROM
(SELECT IndexID, Max(QuoteDate) as QuoteDate FROM PricesTable GROUP BY IndexID) tab1
JOIN
(SELECT IndexID, Min(Source) AS Source, Max(UpdatedTime) AS ActualTime, QuoteDate FROM PricesTable WHERE Source IN ('L','R') GROUP BY IndexID, QuoteDate) tab2
ON tab1.IndexID = tab2.IndexID AND tab1.QuoteDate = tab2.QuoteDate
However, I also want to extract the price field but cannot get this due to the GROUP BY clause. I cannot extract the price without including price in either the GROUP BY, or an aggregate function.
Instead, I have had to join the above SQL code to another piece of SQL which just gets the prices and index IDs and joins on the index ID.
Is there a faster way of performing this query?
EDIT: thanks for the replies so far. Would it be possible to have some advice on which are more efficient in terms of performance?
Thanks
Use ROW_NUMBER within a subquery or CTE to order the rows the way you're interested in them, then select only the rows that come at the top of that ordering. (Use PARTITION BY so that row numbers are reassigned starting at 1 for each IndexID):
;WITH OrderedValues as (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY IndexID ORDER BY QuoteDate desc,Source asc) as rn
FROM
PricesTable
WHERE
Source IN ('L','R') -- only the two sources the question asks about
)
SELECT * from OrderedValues where rn=1
Try:
select * from
(select p.*,
row_number() over (partition by IndexID
order by QuoteDate desc, Source) rn
from PricesTable p
where Source IN ('L','R')
) sq
where rn = 1
(This syntax should work in relatively recent versions of Oracle, SQL Server or PostgreSQL. It won't work in MySQL before 8.0, the release that added window functions.)
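A quick way to convince yourself that the ordering encodes "latest date first, then prefer L over R" is to run it on a few rows. The sketch below uses Python's sqlite3 module (SQLite 3.25+ assumed) rather than SQL Server 2008; the column set is reduced and the prices are invented, including a source "X" row that the filter should ignore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PricesTable (IndexID INTEGER, Price REAL, QuoteDate TEXT, Source TEXT)"
)
conn.executemany(
    "INSERT INTO PricesTable VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, "2020-01-02", "L"),  # latest date, both sources: L should win
        (1, 101.0, "2020-01-02", "R"),
        (1, 99.0, "2020-01-01", "L"),
        (2, 50.0, "2020-01-03", "R"),   # latest date has only R: R should win
        (2, 49.0, "2020-01-02", "L"),
        (3, 10.0, "2020-01-05", "X"),   # other source: must be filtered out
        (3, 9.0, "2020-01-04", "R"),
    ],
)

latest_prices = conn.execute(
    """
    SELECT IndexID, QuoteDate, Source, Price
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY IndexID
                                    ORDER BY QuoteDate DESC, Source) AS rn
          FROM PricesTable p
          WHERE Source IN ('L', 'R')) AS sq
    WHERE rn = 1
    ORDER BY IndexID
    """
).fetchall()
for row in latest_prices:
    print(row)
```

Sorting Source ascending only expresses the preference because 'L' happens to sort before 'R'; with other source codes you would order by a CASE expression instead. Because the whole row is kept, Price comes along without any extra join.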