Is it possible to do so without using nested SELECTS? - sql

Suppose I have the following table:
--------------------------------------------
ReceiptNo | Date | EmployeeID | Qty
--------------------------------------------
1 | 12-DEC-2015 | 1 | 200
2 | 13-DEC-2015 | 1 | 500
3 | 13-DEC-2015 | 1 | 100
4 | 13-DEC-2015 | 3 | 100
5 | 13-DEC-2015 | 3 | 500
6 | 13-DEC-2015 | 2 | 75
--------------------------------------------
Show the tuples with maximum Qty.
Answer:
--------------------------------------------
2 | 13-DEC-2015 | 1 | 500
5 | 13-DEC-2015 | 3 | 500
--------------------------------------------
I need to use aggregate function MAX().
Is it possible to do so without using nested SELECTS?

Try this in SQL Server:
SELECT TOP 1 WITH TIES *
FROM TABLE
ORDER BY QTY DESC

No.
You can't show the tuples with the maximum Qty using the MAX aggregate function while avoiding nested selects.
VR46 posted a nice way to do it without using nested selects, but also without the max aggregate. A similar approach can be used in Oracle 12c using the FETCH clause:
select *
from table
order by qty desc
fetch first row with ties
If you want to use the max aggregate, this is the way to do it:
select *
from table
where qty = (select max(qty) from table)
Another way to do it would be using the rank or dense_rank window functions, but they require a nested select, and do not use the max aggregate function:
select *
from (select t.*,
dense_rank() over (order by t.qty desc) as rnk
from table t) t
where t.rnk = 1
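As a rough check of the two nested-select variants above, here is a minimal sketch that runs them against the question's sample rows. It assumes an in-memory SQLite database via Python's sqlite3 (not the asker's engine), and the table and column names (receipts, receipt_no, receipt_date, employee_id, qty) are placeholders chosen for this example. The DENSE_RANK() query needs SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite stands in for the asker's database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    "receipt_no INTEGER, receipt_date TEXT, employee_id INTEGER, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "12-DEC-2015", 1, 200),
        (2, "13-DEC-2015", 1, 500),
        (3, "13-DEC-2015", 1, 100),
        (4, "13-DEC-2015", 3, 100),
        (5, "13-DEC-2015", 3, 500),
        (6, "13-DEC-2015", 2, 75),
    ],
)

# Variant 1: MAX() inside a scalar subquery.
by_max = conn.execute(
    "SELECT receipt_no FROM receipts "
    "WHERE qty = (SELECT MAX(qty) FROM receipts) "
    "ORDER BY receipt_no"
).fetchall()

# Variant 2: DENSE_RANK() in a derived table, keeping rank 1 only.
by_rank = conn.execute(
    "SELECT receipt_no FROM ("
    "  SELECT r.*, DENSE_RANK() OVER (ORDER BY r.qty DESC) AS rnk"
    "  FROM receipts r"
    ") WHERE rnk = 1 "
    "ORDER BY receipt_no"
).fetchall()

print(by_max)   # [(2,), (5,)]
print(by_rank)  # [(2,), (5,)]
```

Both variants return the two tied receipts, which matches the expected answer in the question.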

Not using max, but plain "cross-platform" ANSI SQL without nested queries:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2 ON t2.Qty > t1.Qty
WHERE t2.Qty IS NULL
This retrieves all rows for which no row with a greater quantity exists in the same table.
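The self-join idea can be tried on the question's sample rows. This is only a sketch: it assumes an in-memory SQLite database through Python's sqlite3, and the table and column names (receipts, receipt_no, qty) are placeholders for this example.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (receipt_no INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?)",
    [(1, 200), (2, 500), (3, 100), (4, 100), (5, 500), (6, 75)],
)

# Anti-join: keep rows of t1 that have no partner t2 with a larger qty.
rows = conn.execute(
    "SELECT t1.receipt_no, t1.qty "
    "FROM receipts t1 "
    "LEFT OUTER JOIN receipts t2 ON t2.qty > t1.qty "
    "WHERE t2.qty IS NULL "
    "ORDER BY t1.receipt_no"
).fetchall()

print(rows)  # [(2, 500), (5, 500)]
```

Because the join condition pairs each row with every row of larger qty, this form may be slower than the MAX() subquery on large tables; it is worth comparing plans on real data.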

Related

Django: Is there a way to apply an aggregate function on a window function?

I have already made a raw SQL of this query as a last resort.
I have a gaps-and-islands problem, where I get the respective groups from the difference of two ROW_NUMBER() calls. Later on I use a COUNT and a MAX like so:
SELECT id, name, MAX(count)
FROM (
    SELECT id, name, COUNT(*)
    FROM (
        SELECT players.id, players.name,
            (ROW_NUMBER() OVER(ORDER BY match_details.id, goals.time) -
             ROW_NUMBER() OVER(PARTITION BY match_details.id, players.id ORDER BY match_details.id, goals.time)) AS grp
        FROM match_details
        JOIN players
            ON players.id = match_details.player_id
        JOIN goals
            ON goals.match_detail_id = match_details.id
        ORDER BY match_details.id, goals.time
    ) AS x
    GROUP BY grp, id, name
    ORDER BY count DESC
) AS y
GROUP BY id, name
ORDER BY MAX(count) DESC, name
players example:
id | name
----+-------
1 | John
2 | Mark
match_details example:
id | player_id
----+------------
1 | 1
2 | 1
3 | 2
4 | 2
goals example:
id | match_detail_id | time
----+------------------+---------
1 | 1 | 2
2 | 1 | 10
3 | 2 | 2
4 | 3 | 1
5 | 3 | 5
6 | 4 | 6
output example:
id | name | max
----+--------+---------
1 | John | 2
2 | Mark | 2
So far, I have finished the innermost query with the Django ORM, but when I try to annotate over grp, it throws an error:
django.db.utils.ProgrammingError: aggregate function calls cannot contain window function calls
I haven't yet wrapped my head around using Subquery, but I'm also not sure if that would work at all. I do not need to filter over the window function, only use aggregates on it.
Is there a way to solve this with plain Django, or do I have to resort to hybrid raw-ORM queries, or perhaps to django-cte?
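The error message reflects that a window function has to be evaluated in an inner query before an aggregate can consume its result, which is what the nested raw SQL does. The sketch below does not use Django; it only replays that nested query on the sample tables with Python's sqlite3 (an assumption, since the question appears to target another engine) to confirm the expected output. The aliases streak and max_streak are names chosen for this example, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE match_details (id INTEGER PRIMARY KEY, player_id INTEGER);
CREATE TABLE goals (id INTEGER PRIMARY KEY, match_detail_id INTEGER, time INTEGER);
INSERT INTO players VALUES (1, 'John'), (2, 'Mark');
INSERT INTO match_details VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO goals VALUES (1, 1, 2), (2, 1, 10), (3, 2, 2), (4, 3, 1), (5, 3, 5), (6, 4, 6);
""")

# Innermost level: the difference of two row numbers labels each island (grp).
# Middle level: COUNT(*) per island. Outer level: MAX of those counts per player.
query = """
SELECT id, name, MAX(streak) AS max_streak
FROM (
    SELECT id, name, COUNT(*) AS streak
    FROM (
        SELECT p.id, p.name,
               ROW_NUMBER() OVER (ORDER BY md.id, g.time)
             - ROW_NUMBER() OVER (PARTITION BY md.id, p.id ORDER BY g.time) AS grp
        FROM match_details md
        JOIN players p ON p.id = md.player_id
        JOIN goals g ON g.match_detail_id = md.id
    ) AS x
    GROUP BY grp, id, name
) AS y
GROUP BY id, name
ORDER BY max_streak DESC, name
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 'John', 2), (2, 'Mark', 2)]
```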

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.
In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?
Here is an example. The condition is all rows (per group) after the last 0 in the conditional column.
category | timestamp | condition
--------------------------------------
A | 1 | 0
A | 2 | 1
A | 3 | 0
A | 4 | 1
A | 5 | 1
B | 1 | 0
B | 2 | 1
B | 3 | 1
The result I would like to achieve is
category | timestamp | condition
--------------------------------------
A | 4 | 1
A | 5 | 1
B | 2 | 1
B | 3 | 1
If you want everything after the last 0, you can use window functions:
select t.*
from (select t.*,
max(case when condition = 0 then timestamp end) over (partition by category) as max_timestamp_0
from t
) t
where timestamp > max_timestamp_0 or
max_timestamp_0 is null;
With an index on (category, condition, timestamp), the correlated subquery version might also perform quite well:
select t.*
from t
where t.timestamp > all (select t2.timestamp
from t t2
where t2.category = t.category and
t2.condition = 0
);
You might want to try window functions:
select category, timestamp, condition
from (
select
t.*,
min(condition) over(partition by category order by timestamp desc) min_cond
from mytable t
) t
where min_cond = 1
The window min() with the order by clause computes the minimum value of condition over the current row and all later rows (by timestamp) of the same category; we can use it as a filter to eliminate rows for which there is a more recent row with a 0.
Compared to the correlated subquery approach, the upside of using window functions is that it reduces the number of scans needed on the table. Of course this computation also has a cost, so you'll need to assess both solutions against your sample data.
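Both window-function queries from the answers above can be checked against the question's sample rows. This sketch assumes an in-memory SQLite database (3.25 or later) through Python's sqlite3; the columns are renamed to ts and cond here purely for this example. Note that the min() variant relies on the condition column holding only 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, ts INTEGER, cond INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
        ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
    ],
)

# Variant 1: find the last timestamp with cond = 0 per category, keep later rows.
after_last_zero = conn.execute(
    "SELECT category, ts FROM ("
    "  SELECT t.*,"
    "         MAX(CASE WHEN cond = 0 THEN ts END)"
    "           OVER (PARTITION BY category) AS max_ts_0"
    "  FROM t"
    ") WHERE ts > max_ts_0 OR max_ts_0 IS NULL "
    "ORDER BY category, ts"
).fetchall()

# Variant 2: running minimum of cond from the newest row backwards.
running_min = conn.execute(
    "SELECT category, ts FROM ("
    "  SELECT t.*,"
    "         MIN(cond) OVER (PARTITION BY category ORDER BY ts DESC) AS min_cond"
    "  FROM t"
    ") WHERE min_cond = 1 "
    "ORDER BY category, ts"
).fetchall()

print(after_last_zero)  # [('A', 4), ('A', 5), ('B', 2), ('B', 3)]
print(running_min)      # [('A', 4), ('A', 5), ('B', 2), ('B', 3)]
```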

Order by two columns regarding their relationship

I want to order the following rows by their OrderNum (unique) while keeping rows with the same RefID (which holds the same products together) adjacent, using SQL on an Oracle database:
It has to be first ordered by OrderNum followed by every product with the same RefID. The row with the lowest OrderNum should be first, then the products with the same RefID, after that the next higher OrderNum and so on...
OrderNum | RefID | ID
10 | 100 | 8
1 | 200 | 9
2 | 100 | 4
8 | 200 | 12
3 | 200 | 20
0 | 10 | 11
What I tried, which only gives me the result ordered by OrderNum without keeping the same RefIDs together:
SELECT * FROM products
ORDER BY OrderNum, RefID
Expected result
0 | 10 | 11
1 | 200 | 9
3 | 200 | 20
8 | 200 | 12
2 | 100 | 4
10 | 100 | 8
I think this has to be done with a subselect, right? But what would that look like?
I believe that Oracle supports CTE and window functions, so something like the following should work:
WITH Extras as (
SELECT
p.*,
MIN(OrderNum) OVER (PARTITION BY RefID) as LowNum,
ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY OrderNum) as rn
FROM
Products p
)
SELECT * from Extras ORDER BY LowNum,rn;
Common Table Expressions (CTEs) are similar to subqueries but I tend to prefer to use them, all other things being equal - there's no specific advantage in this query, but they can be reused multiple times, and they can easily build on previous ones without introducing lots of nesting.
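To see the CTE approach reproduce the expected ordering, here is a small sketch on the question's sample rows. It assumes SQLite 3.25 or later via Python's sqlite3 rather than Oracle, and the snake_case column names (order_num, ref_id, low_num) are renamings made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (order_num INTEGER, ref_id INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(10, 100, 8), (1, 200, 9), (2, 100, 4), (8, 200, 12), (3, 200, 20), (0, 10, 11)],
)

# low_num: smallest order_num within the ref_id group (sorts the groups).
# rn: position of the row inside its group (sorts rows within a group).
ordered = conn.execute(
    "WITH extras AS ("
    "  SELECT p.*,"
    "         MIN(order_num) OVER (PARTITION BY ref_id) AS low_num,"
    "         ROW_NUMBER() OVER (PARTITION BY ref_id ORDER BY order_num) AS rn"
    "  FROM products p"
    ") "
    "SELECT order_num, ref_id, id FROM extras ORDER BY low_num, rn"
).fetchall()

for row in ordered:
    print(row)
# (0, 10, 11)
# (1, 200, 9)
# (3, 200, 20)
# (8, 200, 12)
# (2, 100, 4)
# (10, 100, 8)
```

Since OrderNum is unique, sorting by low_num and then order_num would give the same result as sorting by low_num and rn.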
You are ordering by OrderNum and inside each OrderNum by RefID, but what you want to do is just the opposite, i.e. to order by RefID first and inside each RefID by OrderNum:
SELECT * FROM products
ORDER BY RefID, OrderNum;
Order your result by RefID and OrderNum. Use ASC and DESC for proper ordering (the default is ASC):
select * from products
order by RefID DESC,OrderNum ASC
Result:
OrderNum RefID ID
-----------------------
1 200 9
3 200 20
8 200 12
2 100 4
10 100 8
Sample result in SQL Fiddle
If you want to order by OrderNum within each RefID, I think this may give the result you want:
select * from products
order by RefID,OrderNum
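For comparison, the plain ORDER BY variants can be run on the full six-row sample from the question. This sketch assumes an in-memory SQLite database via Python's sqlite3, with column names renamed to order_num and ref_id for this example. On this data, neither variant reproduces the expected result, because the groups end up sorted by the RefID value rather than by the lowest OrderNum in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (order_num INTEGER, ref_id INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(10, 100, 8), (1, 200, 9), (2, 100, 4), (8, 200, 12), (3, 200, 20), (0, 10, 11)],
)

# The ordering the question asks for, copied from its "Expected result".
expected = [(0, 10, 11), (1, 200, 9), (3, 200, 20), (8, 200, 12), (2, 100, 4), (10, 100, 8)]

asc = conn.execute(
    "SELECT order_num, ref_id, id FROM products ORDER BY ref_id, order_num"
).fetchall()
desc = conn.execute(
    "SELECT order_num, ref_id, id FROM products ORDER BY ref_id DESC, order_num ASC"
).fetchall()

print(asc)   # [(0, 10, 11), (2, 100, 4), (10, 100, 8), (1, 200, 9), (3, 200, 20), (8, 200, 12)]
print(desc)  # [(1, 200, 9), (3, 200, 20), (8, 200, 12), (2, 100, 4), (10, 100, 8), (0, 10, 11)]
print(asc == expected, desc == expected)  # False False
```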

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table:
+--------+-----+---
|category|value|...
+--------+-----+---
|   1    |  a  |
|   1    |  a  |
|   1    |  b  |
|   2    |  a  |
|   2    |  b  |
|   2    |  c  |
|   2    |  b  |
|   3    |  a  |
|   3    |  a  |
|   3    |  b  |
|   3    |  b  |
+--------+-----+---
expected result:
+--------+-----+
|category|value|
+--------+-----+
|   1    |  a  |
|   2    |  b  |
|   3    |  a  | # or b
+--------+-----+
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
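The correlated-subquery form can be tried directly in SQLite, which the asker plans to use. This is a minimal sketch using Python's sqlite3 with the question's sample rows; only the category and value columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        (1, "a"), (1, "a"), (1, "b"),
        (2, "a"), (2, "b"), (2, "c"), (2, "b"),
        (3, "a"), (3, "a"), (3, "b"), (3, "b"),
    ],
)

# For each distinct category, the subquery picks the value with the highest count.
modes = dict(
    conn.execute(
        "SELECT DISTINCT category,"
        "  (SELECT value FROM some_table t2"
        "   WHERE t2.category = t.category"
        "   GROUP BY value"
        "   ORDER BY COUNT(*) DESC"
        "   LIMIT 1) AS mode_value "
        "FROM some_table t "
        "ORDER BY category"
    ).fetchall()
)

# Category 3 is a tie, so either 'a' or 'b' may come back for it.
print(modes)
```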
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value`=(
SELECT `value`
FROM `some_table`
WHERE `category`=`the_category`
GROUP BY `value`
ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
If the table has a unique/primary key column, you can replace a part of this with WHERE `id`=( SELECT `id`; then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
This returns the count of each value in each category.
select category, value
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
We actually need the first value of each group, because the rows are sorted by count.
I am not sure SQLite keeps the first one, and I can't test it, but in my opinion it should work.
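In SQLite specifically, there is a documented behaviour that makes a variant of this idea dependable: when a query contains exactly one MAX() (or MIN()) aggregate, the bare columns in the result take their values from a row that holds that maximum. The sketch below uses it in place of relying on row order; it is SQLite-only and is run through Python's sqlite3 on the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        (1, "a"), (1, "a"), (1, "b"),
        (2, "a"), (2, "b"), (2, "c"), (2, "b"),
        (3, "a"), (3, "a"), (3, "b"), (3, "b"),
    ],
)

# Inner query: count per (category, value).
# Outer query: MAX(value_count) per category; the bare column "value" comes
# from a row holding that maximum (SQLite-specific behaviour).
rows = conn.execute(
    "SELECT category, value, MAX(value_count) FROM ("
    "  SELECT category, value, COUNT(*) AS value_count"
    "  FROM some_table"
    "  GROUP BY category, value"
    ") GROUP BY category "
    "ORDER BY category"
).fetchall()

# Category 3 is a tie, so its value may be 'a' or 'b'.
print(rows)
```

Other engines either reject a bare column like this or return an arbitrary value for it, so this form should not be treated as portable.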

SQL - select distinct only on one column [duplicate]

I have searched far and wide for an answer to this problem. I'm using Microsoft SQL Server. Suppose I have a table that looks like this:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 2 | 3968 | Spain | Spanish |
| 3 | 3968 | USA | English |
| 4 | 1234 | Greece | Greek |
| 5 | 1234 | Italy | Italian |
I want to perform one query which returns only one row per unique 'NUMBER' value (whether it is the first or last row doesn't bother me). So this would give me:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 4 | 1234 | Greece | Greek |
How is this achievable?
A very typical approach to this type of problem is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by number order by id) as seqnum
from t
) t
where seqnum = 1;
This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.
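Here is a minimal sketch of the row_number() pattern on the question's sample rows, including the "first 2 rows per group" generalisation mentioned above. It assumes SQLite 3.25 or later through Python's sqlite3 rather than SQL Server, and the helper name first_n_per_number is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, number INTEGER, country TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 3968, "UK", "English"),
        (2, 3968, "Spain", "Spanish"),
        (3, 3968, "USA", "English"),
        (4, 1234, "Greece", "Greek"),
        (5, 1234, "Italy", "Italian"),
    ],
)


def first_n_per_number(n):
    """Return the ids of the first n rows (lowest id first) for each number."""
    rows = conn.execute(
        "SELECT id FROM ("
        "  SELECT t.*,"
        "         ROW_NUMBER() OVER (PARTITION BY number ORDER BY id) AS seqnum"
        "  FROM t"
        ") WHERE seqnum <= ? "
        "ORDER BY id",
        (n,),
    ).fetchall()
    return [row[0] for row in rows]


print(first_n_per_number(1))  # [1, 4]
print(first_n_per_number(2))  # [1, 2, 4, 5]
```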
Since you don't care, I chose the max ID for each number.
select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id
Query Explanation
select
    tbl.*                        -- give me all the data from the base table (tbl)
from
    tbl
    inner join (                 -- only return rows in tbl which match this subquery
        select
            max(id) as maxID     -- MAX (ie distinct) ID per GROUP BY below
        from
            tbl
        group by
            NUMBER               -- how to group rows for the MAX aggregation
    ) maxID
        on maxID.maxID = tbl.id  -- join condition ie only return rows in tbl
                                 -- whose ID is also a MAX ID for a given NUMBER
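The join against the per-group MAX(id) can be checked on the question's sample rows. This sketch assumes an in-memory SQLite database via Python's sqlite3 rather than SQL Server; the alias m and the column alias max_id are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, number INTEGER, country TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, 3968, "UK", "English"),
        (2, 3968, "Spain", "Spanish"),
        (3, 3968, "USA", "English"),
        (4, 1234, "Greece", "Greek"),
        (5, 1234, "Italy", "Italian"),
    ],
)

# The derived table yields one max id per number; the join keeps only those rows.
latest = conn.execute(
    "SELECT tbl.id, tbl.number, tbl.country "
    "FROM tbl "
    "INNER JOIN ("
    "  SELECT MAX(id) AS max_id, number FROM tbl GROUP BY number"
    ") m ON m.max_id = tbl.id "
    "ORDER BY tbl.id"
).fetchall()

print(latest)  # [(3, 3968, 'USA'), (5, 1234, 'Italy')]
```

This returns the last row per NUMBER instead of the first one shown in the question, which the asker said is acceptable.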
You could use the following query:
SELECT * FROM [table] GROUP BY NUMBER;
Where [table] is the name of the table.
This provides a unique listing for the NUMBER column; however, the other columns may be meaningless depending on the vendor implementation, which is to say they may not together correspond to a specific row or rows. Note that SQL Server, which the question targets, rejects this query, because every selected column that is not aggregated must appear in the GROUP BY clause.