Join to replace sub-query - sql

I am almost a novice in database queries.
However, I do understand why and how correlated subqueries are expensive and best avoided.
Given the following simple example, could someone help replace it with a join, to help me understand how a join scores better?
select
  book_key,
  store_key,
  quantity
from
  sales s
where
  quantity < (select max(quantity)
              from sales
              where book_key = s.book_key);
Apart from a join, what other options do we have to avoid the subquery?

In this case, it ought to be better to use a window function with a single access to the table, like so:
with s as
(select book_key,
store_key,
quantity,
max(quantity) over (partition by book_key) mq
from sales)
select book_key, store_key, quantity
from s
where quantity < s.mq
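For comparison, the join form that the question asks about can be sketched next to the correlated subquery and the window-function version. This is only a minimal sketch: it runs the SQL through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later, which is needed for window functions), and the sample rows are made up for illustration, not taken from the post.

```python
import sqlite3

# Hypothetical sample rows (book_key, store_key, quantity); not from the original post.
ROWS = [
    ("B1", "S1", 10),
    ("B1", "S2", 25),
    ("B1", "S3", 25),
    ("B2", "S1", 7),
    ("B2", "S2", 3),
]

# The correlated subquery from the question.
CORRELATED = """
    select book_key, store_key, quantity
    from sales s
    where quantity < (select max(quantity) from sales where book_key = s.book_key)
"""

# Join against a derived table that holds one maximum per book.
JOINED = """
    select s.book_key, s.store_key, s.quantity
    from sales s
    join (select book_key, max(quantity) as mq
          from sales
          group by book_key) m
      on m.book_key = s.book_key
    where s.quantity < m.mq
"""

# Window function: the maximum is attached to every row in a single pass.
WINDOWED = """
    with w as (
        select book_key, store_key, quantity,
               max(quantity) over (partition by book_key) as mq
        from sales)
    select book_key, store_key, quantity
    from w
    where quantity < mq
"""


def run(conn, sql):
    """Run a query and return its rows sorted, so results can be compared."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("create table sales (book_key text, store_key text, quantity integer)")
conn.executemany("insert into sales values (?, ?, ?)", ROWS)

results = {
    "correlated": run(conn, CORRELATED),
    "joined": run(conn, JOINED),
    "windowed": run(conn, WINDOWED),
}
for name, rows in results.items():
    print(name, rows)
```

All three forms return the same rows on this data; which one is cheapest depends on the optimizer and indexes, so it is worth comparing execution plans on the real table.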

A Common Table Expression (CTE) lets you name the result of a SELECT and reference it from the main query, so the table is read once and no possibly expensive JOIN is needed. This solution uses ROW_NUMBER() and the OVER clause to number the rows of each BOOK_KEY in descending order of quantity, and then keeps only the rows numbered after the first, i.e. those below the top row for each BOOK_KEY. Note that if several rows tie for the maximum quantity, ROW_NUMBER() numbers only one of them 1, so the other tied rows are returned; using RANK() instead matches the original quantity < max(quantity) filter exactly.
with CTE as
(
select
book_key,
store_key,
quantity,
row_number() over(partition by book_key order by quantity desc) rn
from sales
)
select
book_key,
store_key,
quantity
from CTE where rn > 1;
Working Demo: http://sqlfiddle.com/#!3/f0051/1

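Apart from a join, what other options do we have to avoid the subquery?
You can compute the per-book maximums once into a temporary table and filter against that. A single variable cannot hold one maximum per book, so the second statement still joins to the temporary table. (SQL Server temp-table syntax is assumed; #max_qty is just an illustrative name.)
select book_key, max(quantity) as max_quantity
into #max_qty
from sales
group by book_key;
select s.book_key, s.store_key, s.quantity
from sales s
join #max_qty m on m.book_key = s.book_key
where s.quantity < m.max_quantity;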

Related

SQL query can't use variable in FROM statement

I'm new to SQL, so sorry if this is a stupid question.
The table comes from this SQL sandbox:
https://www.w3schools.com/sql/trysql.asp?filename=trysql_asc
The table has the following format:
OrderDetailID OrderID ProductID Quantity
1 10248 11 12
2 10248 42 10
3 10248 72 5
4 10249 14 9
5 10249 51 40
I want to get products with maximum average quantity.
I can get this using the following query:
SELECT avg.ProductID, avg.Quantity
FROM (
SELECT ProductID, AVG(Quantity) Quantity
FROM OrderDetails
GROUP BY ProductID
) avg
WHERE avg.Quantity = (
SELECT MAX(Quantity) FROM (
SELECT ProductID, AVG(Quantity) Quantity
FROM OrderDetails
GROUP BY ProductID
)
)
ProductID Quantity
8 70
48 70
Here I use the following block twice:
SELECT ProductID, AVG(Quantity) Quantity
FROM OrderDetails
GROUP BY ProductID
because if I refer to avg instead of repeating the block:
SELECT avg.ProductID, avg.Quantity
FROM (
SELECT ProductID, AVG(Quantity) Quantity
FROM OrderDetails
GROUP BY ProductID
) avg
WHERE avg.Quantity = (SELECT MAX(Quantity) FROM avg)
I get the error: could not prepare statement (1 no such table: avg)
So my questions are:
Is this a syntax mistake that can be simply corrected, or is there some reason I can't use variables like that?
Is there a simpler way to write the query I need?
Consider a Common Table Expression (CTE) using the WITH clause, which allows you to avoid repeating and re-calculating the aggregate subquery. Most RDBMSs support CTEs (they are fully valid in your linked SQL TryIt page).
WITH avg AS (
SELECT ProductID, AVG(Quantity) Quantity
FROM OrderDetails
GROUP BY ProductID
)
SELECT avg.ProductID, avg.Quantity
FROM avg
WHERE avg.Quantity = (
SELECT MAX(Quantity) FROM avg
)
This is not really a syntax issue but a scoping one: you are trying to reference an alias from a place where it is not visible. The alias names the derived table only for column references within that query (including its correlated subqueries); it cannot be used as a table name in another FROM clause. (The identifier there is an alias, not a variable; that's a different thing.)
A simpler way is to create a temporary set before you run the filter condition, either with a CTE as in the previous answer or with a temp table. These can be referenced anywhere in the statement because their scope is not limited to a subquery.
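The scoping difference can be reproduced in SQLite through Python's sqlite3 module. This is a small sketch under assumptions: the rows are invented, and the derived table is aliased avgs (a hypothetical name chosen here) rather than avg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetails (ProductID INTEGER, Quantity INTEGER)")
# Hypothetical rows: products 11 and 42 both average 16, product 72 averages 5.
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?)",
    [(11, 12), (42, 10), (72, 5), (11, 20), (42, 22)],
)

# Derived-table alias used as a table name inside another FROM clause: not visible there.
DERIVED = """
    SELECT avgs.ProductID, avgs.Quantity
    FROM (SELECT ProductID, AVG(Quantity) Quantity
          FROM OrderDetails
          GROUP BY ProductID) avgs
    WHERE avgs.Quantity = (SELECT MAX(Quantity) FROM avgs)
"""

# The same query with the aggregate named in a CTE: visible to every part of the statement.
WITH_CTE = """
    WITH avgs AS (
        SELECT ProductID, AVG(Quantity) Quantity
        FROM OrderDetails
        GROUP BY ProductID)
    SELECT avgs.ProductID, avgs.Quantity
    FROM avgs
    WHERE avgs.Quantity = (SELECT MAX(Quantity) FROM avgs)
"""

try:
    conn.execute(DERIVED).fetchall()
    derived_error = None
except sqlite3.OperationalError as exc:
    derived_error = str(exc)

cte_rows = sorted(conn.execute(WITH_CTE).fetchall())

print("derived table:", derived_error)
print("CTE:", cte_rows)
```

The first query fails because the inner FROM looks for a real table or CTE with that name, while the second succeeds because a CTE name is in scope for the whole statement.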

Is it possible to create and use window function in the same query?

I'm using PostgreSQL and I have the following situation:
table of Sales (short version):
itemid quantity
5 10
5 12
6 1
table of stock (short version):
itemid stock
5 30
6 1
I have a complex query that also needs to present, in one of its columns, the SUM of quantity for each itemid.
So it's going to be:
Select other things,itemid,stock, SUM (quantity) OVER (PARTITION BY itemid) AS total_sales
from .....
sales
stock
This query is OK. However, it will present:
itemid stock total_sales
5 30 22
6 1 1
But I don't need to see itemid=6 because the whole stock was sold, meaning that I need a WHERE condition like:
WHERE total_sales<stock
but I can't do that, as total_sales is computed after the WHERE is applied.
Is there a way to solve this without surrounding the whole query with another one? I'm trying to avoid it if I can.
You can use a subquery or CTE:
select s.*
from (Select other things,itemid,stock,
SUM(quantity) OVER (PARTITION BY itemid) AS total_sales
from .....
) s
where total_sales < stock;
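You cannot use column aliases defined in a SELECT list in the SELECT, WHERE, or FROM clauses of that same SELECT, and window functions are evaluated after WHERE, so they cannot be filtered there directly. However, a subquery or CTE gets around this restriction.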
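As a rough, runnable illustration of the wrapping approach, the sketch below loads the sample rows from the question into SQLite via Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The join between sales and stock is an assumption, since the original query elides its FROM clause, and DISTINCT is added only to collapse the repeated per-sale rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (itemid INTEGER, quantity INTEGER);
    CREATE TABLE stock (itemid INTEGER, stock INTEGER);
    INSERT INTO sales VALUES (5, 10), (5, 12), (6, 1);
    INSERT INTO stock VALUES (5, 30), (6, 1);
    """
)

# The window function is computed in the inner query; the outer WHERE can
# then filter on its alias because the alias is an ordinary column there.
QUERY = """
    SELECT DISTINCT itemid, stock, total_sales
    FROM (
        SELECT sa.itemid,
               st.stock,
               SUM(sa.quantity) OVER (PARTITION BY sa.itemid) AS total_sales
        FROM sales sa
        JOIN stock st ON st.itemid = sa.itemid
    ) s
    WHERE total_sales < stock
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Item 6 drops out because its total sales equal its stock, which is the filter the question is after.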
You can also use a correlated subquery in your WHERE clause, comparing the stock with the summed quantity (t stands for the rows of your existing query):
SELECT *, SUM (quantity) OVER (PARTITION BY itemid) AS total_sales
FROM t
WHERE stock > (SELECT SUM(quantity) FROM t ti WHERE t.itemid = ti.itemid);

Is it possible to calculate the sum of each group in a table without using group by clause

I am trying to find out if there is any way to aggregate the sales for each product. I realise I can achieve it either by using a GROUP BY clause or by writing a procedure.
example:
Table name: Details
Sales Product
10 a
20 a
4 b
12 b
3 b
5 c
Is there a way to perform the following query without using GROUP BY?
select
product,
sum(sales)
from
Details
group by
product
having
sum(sales) > 20
I realize it is possible using a procedure; could it be done in any other way?
You could do
SELECT product,
(SELECT SUM(sales) FROM details x where x.product = a.product) sales
from Details a;
(and wrap it into another select to simulate the HAVING).
It's possible to use analytic functions to do the sum calculation, and then wrap that with another query to do your filtering.
The example below uses a Posts table, with OwnerUserId playing the role of product and score the role of sales:
select
running_sum,
OwnerUserId
from (
select
id,
score,
OwnerUserId,
sum(score) over (partition by OwnerUserId order by Id) running_sum,
max(id) over (partition by OwnerUserId) last_id
from
Posts
where
OwnerUserId in (2934433, 10583)
) inner_q
where inner_q.id = inner_q.last_id
--and running_sum > 20;
We keep a running sum going on the partition of the owner (product), and we tally up the last id for the same window, which is the ID we'll use to get the total sum. Wrap it all up with another query to make sure you get the "last id", take the sum, and then do any filtering you want on the result.
This is an extremely round-about way to avoid using GROUP BY though.
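A shorter variant of the same idea, sketched against the Details table from the question: compute the per-product total with SUM() OVER (PARTITION BY ...), collapse the repeated rows with DISTINCT, and filter in an outer query. It is run here through Python's sqlite3 module (assuming SQLite 3.25 or later); the column alias total_sales is a name introduced for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Details (Sales INTEGER, Product TEXT)")
conn.executemany(
    "INSERT INTO Details VALUES (?, ?)",
    [(10, "a"), (20, "a"), (4, "b"), (12, "b"), (3, "b"), (5, "c")],
)

# Every row carries its product's total; DISTINCT leaves one row per product,
# and the outer WHERE plays the role of HAVING.
WINDOWED = """
    SELECT DISTINCT product, total_sales
    FROM (
        SELECT product,
               SUM(sales) OVER (PARTITION BY product) AS total_sales
        FROM Details
    ) d
    WHERE total_sales > 20
"""

# The GROUP BY / HAVING form from the question, for comparison.
GROUPED = """
    SELECT product, SUM(sales)
    FROM Details
    GROUP BY product
    HAVING SUM(sales) > 20
"""

windowed_rows = conn.execute(WINDOWED).fetchall()
grouped_rows = conn.execute(GROUPED).fetchall()
print(windowed_rows, grouped_rows)
```

Only product a has a total above 20 (30), so both forms return the same single row.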
If you don't want nested SELECT statements (which can run slower), you can put a CASE inside the aggregate:
select
sum(case
when c.qty > 20
then c.qty
else 0
end) as mySum
from Sales.CustOrders c

SQL. Is there any efficient way to find second lowest value?

I have the following table:
ItemID Price
1 10
2 20
3 12
4 10
5 11
I need to find the second lowest price. So far, I have a query that works, but I am not sure it is the most efficient query:
select min(price)
from table
where itemid not in
(select itemid
from table
where price=
(select min(price)
from table));
What if I have to find third OR fourth minimum price? I am not even mentioning other attributes and conditions... Is there any more efficient way to do this?
PS: note that minimum is not a unique value. For example, items 1 and 4 are both minimums. Simple ordering won't do.
SELECT MIN( price )
FROM table
WHERE price > ( SELECT MIN( price )
FROM table )
select distinct x.price
from
(select t.price, dense_rank() over (order by t.price) as rnk from table t) as x
where x.rnk = 2 --or 3, 4, 5, etc
Not sure if this would be the fastest, but it would make it easier to select the second, third, etc... Just change the TOP value.
UPDATED
SELECT MIN(price)
FROM table
WHERE price NOT IN (SELECT DISTINCT TOP 1 price FROM table ORDER BY price)
To find the second minimum salary among employees, you can use the following:
select min(salary)
from table
where salary > (select min(salary) from table);
This is a good answer:
SELECT MIN( price )
FROM table
WHERE price > ( SELECT MIN( price )
FROM table )
Make sure that the subquery (the part in parentheses at the end) returns only one row.
For example, if you want to use GROUP BY, you have to correlate the subquery with the outer query:
SELECT MIN( price )
FROM table te1
WHERE price > ( SELECT MIN( price )
FROM table te2 WHERE te1.brand = te2.brand)
GROUP BY brand
This is because a subquery that is itself grouped by brand returns multiple rows, and you would get the error:
SQL Error [21000]: ERROR: more than one row returned by a subquery used as an expression
I guess the simplest way is to use the OFFSET-FETCH filter from standard SQL. DISTINCT is not necessary if you don't have repeated values in your column.
select distinct(price) from table
order by price
offset 1 row fetch first 1 row only;
There is no need to write complex subqueries.
In Amazon Redshift, use LIMIT and OFFSET instead, for example:
Select distinct(price) from table
order by price
limit 1
offset 1;
You can use either of the following:
select min(your_field) from your_table where your_field NOT IN (select distinct TOP 1 your_field from your_table ORDER BY your_field ASC)
OR
select top 1 ColumnName from TableName where ColumnName not in (select top 1 ColumnName from TableName order by ColumnName asc) order by ColumnName asc
I think you can find the second minimum using LIMIT and ORDER BY:
select max(price) as minimum from (select distinct price from tableName order by price asc limit 2) t --or 3, 4, 5, etc
To find the third or fourth minimum and so on, change the number in the LIMIT.
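You can use the ranking functions. The query may look complex, but it achieves results similar to the other answers. Note that DENSE_RANK is needed rather than RANK here: with tied minimums, RANK skips straight from 1 to 3, so Rnk=2 would match nothing.
WITH Temp_table AS (SELECT ITEM_ID, PRICE, DENSE_RANK() OVER (ORDER BY PRICE) AS Rnk
FROM YOUR_TABLE_NAME)
SELECT ITEM_ID FROM Temp_table
WHERE Rnk=2;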
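To see how ranking handles the tied minimums from the question, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later assumed). The table name items and the helper nth_lowest_price are hypothetical names introduced for this example; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ItemID INTEGER, Price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 12), (4, 10), (5, 11)],
)


def nth_lowest_price(conn, n):
    """Return the n-th lowest distinct price, or None if there are fewer than n."""
    row = conn.execute(
        """
        SELECT DISTINCT price
        FROM (SELECT price,
                     DENSE_RANK() OVER (ORDER BY price) AS rnk
              FROM items) ranked
        WHERE rnk = ?
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None


# Items 1 and 4 tie at 10, so they share dense rank 1 and 11 gets rank 2.
second = nth_lowest_price(conn, 2)
third = nth_lowest_price(conn, 3)
missing = nth_lowest_price(conn, 5)
print(second, third, missing)
```

Because DENSE_RANK assigns consecutive ranks to distinct values, changing n is all that is needed for the third or fourth minimum.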
Maybe you can get the min value first and then filter with a not-equal or greater-than operator. This eliminates the subquery but requires a two-step process:
select min(price)
from table
where price > :min_price -- placeholder for the min price you previously got

See whether an item appears more than once in a database column

I want to check if a piece of data appears more than once in a particular column in my table using SQL. Here is my SQL code of what I have so far:
select * from AXDelNotesNoTracking where count(salesid) > 1
salesid is the column I wish to check for, any help would be appreciated, thanks.
It should be:
SELECT SalesID, COUNT(*)
FROM AXDelNotesNoTracking
GROUP BY SalesID
HAVING COUNT(*) > 1
Regarding your initial query:
You cannot do a SELECT * since this operation requires a GROUP BY, and columns need to either be in the GROUP BY or in an aggregate function (i.e. COUNT, SUM, MIN, MAX, AVG, etc.).
As this is a GROUP BY operation, a HAVING clause will filter it instead of a WHERE.
Edit:
And I just thought of this, if you want to see WHICH items are in there more than once (but this depends on which database you are using):
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY SalesID ORDER BY SalesID) AS [Num]
FROM AXDelNotesNoTracking
)
SELECT *
FROM cte
WHERE cte.Num > 1
Of course, this just shows the rows that have appeared with the same SalesID but does not show the initial SalesID value that has appeared more than once. Meaning, if a SalesID shows up 3 times, this query will show instances 2 and 3 but not the first instance. Still, it might help depending on why you are looking for multiple SalesID values.
Edit2:
The following query was posted by APC below and is better than the CTE I mention above in that it shows all rows in which a SalesID has appeared more than once. I am including it here for completeness. I merely added an ORDER BY to keep the SalesID values grouped together. The ORDER BY might also help in the CTE above.
SELECT *
FROM AXDelNotesNoTracking
WHERE SalesID IN
( SELECT SalesID
FROM AXDelNotesNoTracking
GROUP BY SalesID
HAVING COUNT(*) > 1
)
ORDER BY SalesID
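The two queries can be tried side by side with a small sketch in Python's sqlite3 module. The Note column and the sample rows are invented for illustration; only the table name and SalesID come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Note is a hypothetical extra column so that SELECT * has something to show.
conn.execute("CREATE TABLE AXDelNotesNoTracking (SalesID TEXT, Note TEXT)")
conn.executemany(
    "INSERT INTO AXDelNotesNoTracking VALUES (?, ?)",
    [("SO1", "a"), ("SO2", "b"), ("SO1", "c"),
     ("SO3", "d"), ("SO1", "e"), ("SO3", "f")],
)

# Which SalesID values occur more than once, and how often.
duplicate_counts = conn.execute(
    """
    SELECT SalesID, COUNT(*)
    FROM AXDelNotesNoTracking
    GROUP BY SalesID
    HAVING COUNT(*) > 1
    ORDER BY SalesID
    """
).fetchall()

# Every full row whose SalesID occurs more than once.
duplicate_rows = conn.execute(
    """
    SELECT *
    FROM AXDelNotesNoTracking
    WHERE SalesID IN (SELECT SalesID
                      FROM AXDelNotesNoTracking
                      GROUP BY SalesID
                      HAVING COUNT(*) > 1)
    ORDER BY SalesID, Note
    """
).fetchall()

print(duplicate_counts)
print(duplicate_rows)
```

The first query gives one summary row per repeated SalesID; the second returns all of the underlying rows, including the first occurrence of each.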
How about:
select salesid from AXDelNotesNoTracking group by salesid having count(*) > 1;
To expand on Solomon Rutzky's answer, if you are looking for a piece of data that shows up in a range (i.e. more than once but less than 5x), you can use
having count(*) > 1 and count(*) < 5
And you can use whatever qualifiers you desire in there; they don't have to match, as they are all just part of the HAVING clause.
https://webcheatsheet.com/sql/interactive_sql_tutorial/sql_having.php
try this:
select salesid,count (salesid) from AXDelNotesNoTracking group by salesid having count (salesid) >1