SQL queries with slight modification giving different results - sql

I am learning SQL and doing some exercises with analytic functions. I have the following query to find the ship_name and order_value of the highest-value order. These are my tables:
orders(id, ship_name, city_of_origination)
order_details(id, order_id, unit_price, quantity)
In order to solve this problem, I wrote the following query:
select o.ship_name, od.quantity*od.unit_price, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc)
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
Here is the sample output after removing limit from the above query:
Changing the problem statement slightly, I only want the ship_name. So I wrote this query:
select tmp.ship_name
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
To my surprise, the result changed. Here is the result of the above query without limit:
At the same time, if I execute the following query:
select tmp.ship_name, tmp.fv
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
I get the same result (and the expected one) as with the first query. My question is: why do the above queries give different results?

limit without order by returns an arbitrary row. It might not even return the same row for the same query when executed subsequent times.
So, use order by to control which row is returned.
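As a minimal sketch of that advice, the following uses Python's built-in sqlite3 module rather than the asker's database, with made-up sample rows. The join on o.id is an assumption based on the table definitions in the question. Sorting by the computed order value before applying LIMIT 1 pins down which row comes back.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ship_name TEXT);
    CREATE TABLE order_details (id INTEGER PRIMARY KEY, order_id INTEGER,
                                unit_price REAL, quantity INTEGER);
    INSERT INTO orders VALUES (1, 'Alpha'), (2, 'Bravo'), (3, 'Charlie');
    INSERT INTO order_details VALUES
        (1, 1, 10.0, 2),   -- order value 20
        (2, 2, 50.0, 3),   -- order value 150 (highest)
        (3, 3, 5.0, 4);    -- order value 20
""")

# ORDER BY decides which row LIMIT 1 keeps: the one with the largest value.
top_order = conn.execute("""
    SELECT o.ship_name, od.quantity * od.unit_price AS order_value
    FROM orders o
    INNER JOIN order_details od ON o.id = od.order_id
    ORDER BY order_value DESC
    LIMIT 1
""").fetchone()

print(top_order)
```

If several rows tie for the highest value, the row returned is still arbitrary among the tied ones unless a tie-breaking column is added to the ORDER BY.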

In Postgres, rows selected without an ORDER BY usually come back in the order they are found on disk (roughly the order of the hidden ctid column, which for a simple sequential scan is close to last-inserted/last-updated order). That order is not guaranteed, though, and can change with the query plan. Using LIMIT does not change that order; it only cuts the output short.
LIMIT 1 therefore shows you only the first row the executor happens to produce. To control which row that is, use ORDER BY.

Related

Why I get error missing right parenthesis ORA-00907

I am stuck on a problem and I have no idea where I made a mistake.
I have checked everything and every solution, but I cannot see what I did wrong.
SELECT
o.OrderID,
o.Order_date,
o.status,
o.OrderAcceptanceCommentsSaved,
o.OrderFileAttachment,
o.HasErrors,
o.ErrorsResolved,
(SELECT ou.Status FROM order_unload ou WHERE ou.OrderID = o.OrderID
AND rownum <= 1 ORDER BY ou.Id DESC) AS UnloadStatus
FROM
orders o
WHERE
ProjectID = 141
ORDER BY ou.Id DESC;
The problem here is the second SELECT:
(SELECT ou.Status FROM order_unload ou WHERE ou.OrderID = o.OrderID
AND rownum <= 1 ORDER BY ou.Id DESC) AS UnloadStatus
However, when I execute only the second SELECT I also get an error:
o.OrderID invalid identifier
Can someone guide me and tell me where I made the mistake? What is wrong with this query?
You have several problems:
The ORDER BY clause is not allowed in a correlated sub-query, so the SQL engine expects the query to end before the ORDER BY and expects a closing parenthesis at that point. Remove the ORDER BY clause in the inner select and that error will go away (and you will get a different error).
ROWNUM is applied before the ORDER BY is evaluated. So even if the query were syntactically valid, it would not do what you want: an arbitrary row (the first one the SQL engine happens to read) would be given a ROWNUM of 1, the rest of the rows would be discarded, and only then would that single row be ordered. You want to order first and then take the first row.
You are using ou.id to order the outer query but the ou alias is not visible in that outer select.
You can use:
SELECT o.OrderID,
o.Order_date,
o.status,
o.OrderAcceptanceCommentsSaved,
o.OrderFileAttachment,
o.HasErrors,
o.ErrorsResolved,
ou.status AS UnloadStatus
FROM orders o
LEFT OUTER JOIN (
SELECT status,
orderid,
id,
ROW_NUMBER() OVER ( PARTITION BY orderid ORDER BY id DESC ) AS rn
FROM order_unload
) ou
ON ( o.orderid = ou.OrderID AND ou.rn = 1 )
WHERE ProjectID = 141
ORDER BY ou.Id DESC;
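The "order first, then take the first row per group" idea can be tried outside Oracle as well. The sketch below is an illustration only: it uses Python's sqlite3 module (window functions need SQLite 3.25 or newer), a cut-down version of the tables, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, projectid INTEGER);
    CREATE TABLE order_unload (id INTEGER PRIMARY KEY, orderid INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 141), (2, 141), (3, 999);
    INSERT INTO order_unload VALUES
        (10, 1, 'queued'), (11, 1, 'done'), (12, 3, 'done');
""")

# Number the unload rows of each order from newest (highest id) to oldest,
# then keep only rn = 1, i.e. the latest unload row per order.
rows = conn.execute("""
    SELECT o.orderid, ou.status AS unload_status
    FROM orders o
    LEFT OUTER JOIN (
        SELECT status, orderid,
               ROW_NUMBER() OVER (PARTITION BY orderid ORDER BY id DESC) AS rn
        FROM order_unload
    ) ou ON o.orderid = ou.orderid AND ou.rn = 1
    WHERE o.projectid = 141
    ORDER BY o.orderid
""").fetchall()

print(rows)
```

Order 2 has no unload rows, so the LEFT OUTER JOIN keeps it with a NULL status instead of dropping it.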

add column to select statement when having a certain condition

I have a SQL statement for data from order_details, a table which has many columns including product name, code, etc. How can I add a column to the select statement that whenever the order has a certain product (The product_code I need is called 'Pap') it writes a flag 'pap', so I can visually know which orders have this product?
I tried the code below:
select distinct order_id, customer_id,
(select distinct order_id from order_details
group by 1 having sum (case when product_code='pap'
then 1 else 0 end)=1
) as pap from orders
left join order_details
on order_details.order_id=orders.order_id
group by 1,2,3
The code I am trying is giving me an error "[Firebird]multiple rows in singleton select; HY000".
At a guess, you want to show 'pap' for orders that have one or more order_details with product_code 'pap'. In that case you can use:
select order_id, customer_id,
(select max(order_details.product_code)
from order_details
where order_details.order_id = orders.order_id
and order_details.product_code = 'pap') as pap
from orders
Or a more generic solution (that doesn't rely on the product_code for the value to display):
select order_id, customer_id,
case
when exists(
select 1
from order_details
where order_details.order_id = orders.order_id
and order_details.product_code = 'pap')
then 'pap'
end as pap
from orders
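To see the EXISTS variant in action, here is a small sketch using Python's sqlite3 module instead of Firebird, with invented sample rows. Order 1 has two 'pap' lines on purpose, to show that EXISTS does not multiply the order row or fail on multiple matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_details (order_id INTEGER, product_code TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 200);
    INSERT INTO order_details VALUES
        (1, 'pap'), (1, 'pap'), (1, 'pen'), (2, 'pen');
""")

# EXISTS only asks "is there at least one matching row?", so duplicates
# in order_details cannot cause a "multiple rows" error.
flags = conn.execute("""
    SELECT order_id, customer_id,
           CASE
               WHEN EXISTS (SELECT 1
                            FROM order_details
                            WHERE order_details.order_id = orders.order_id
                              AND order_details.product_code = 'pap')
               THEN 'pap'
           END AS pap
    FROM orders
    ORDER BY order_id
""").fetchall()

print(flags)
```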
Let's try to build your query step by step, from simple to more complex, in the obsolete bottom-to-top fashion :-)
I suggest you run every query to see the results, see how the data gets refined step by step, and check early whether your assumptions hold true.
The first unknown is order_details: can one order have several rows with the same product? Is it possible to have an order with 2+3 Paps, or only one summary row of 5 Paps? Is (order_id, product_code) a unique constraint or primary key over that table, or not?
Select Count(1), order_id, product_code
From order_details
Group by 2,3
Order by 1 DESC
This can show whether such repetition exists, but even if it does not, you have to check the metadata (schema) to see whether it is allowed by table constraints or indices.
The thing is, when you JOIN tables, their matching rows get multiplied (in set theory terms). So if you can have several rows about Paps in one order, then we have to deal with that specially, which would add extra load on the server unless we find a way to do it for free.
We can easily check for one specific order to have that product.
select 'pap' from order_details where order_id = :parameter_id and product_code='pap'
We can then suppress repetitions - if they were not prohibited by constraints - in a standard way (but requiring extra sorting) or Firebird-specific (but free) way.
select DISTINCT 'pap' from order_details where order_id = :parameter_id and product_code='pap'
or
select FIRST(1) 'pap' from order_details where order_id = :parameter_id and product_code='pap'
These can be combined with the correlated sub-query from Mark's answer:
select o.order_id, o.customer_id,
coalesce(
( select first(1) 'pap' /* the flag */ from order_details d
where o.order_id = d.order_id and d.product_code = 'pap' )
, '' /* just avoiding NULL */
) as pap
from orders o
Lifehack: notice how the use of coalesce and first(1) here substitutes for the use of case and exists in Mark's original answer. This trick can be used in Firebird wherever you use a singular (and potentially empty) one-column query as an expression.
To avoid multiple sub-queries and switch to an outer join, we need one query that returns ALL the order IDs with Paps, but each of them only once.
select distinct order_id from order_details where product_code='pap'
Should do the trick. But probably at the cost of extra sorting to suppress possible duplication (again, is it possible though?)
select order_id, count(order_id)
from order_details
where product_code='pap'
group by 1 order by 2 desc
This would show us the repetitions if they are already there, just to explain what I mean. It also shows whether you can enforce SQL constraints on the already existing data, if you did not have them and choose to harden your database structure.
This way we just have to outer-join with it and use CASE (or one of its shorthand forms) to do the typical trick of filtering the outer join's NULL rows.
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o
left join (
select distinct order_id
from order_details
where product_code = 'pap'
and product_quantity > 0 ) d
on o.order_id=d.order_id
As someone said this looks ugly, there is one more 'modern' way to write exactly that query, maybe it would look better :-D
with d as (
select distinct order_id
from order_details
where product_code = 'pap'
and product_quantity > 0 )
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o left join d on o.order_id=d.order_id
If the 'pap' repetitions can not (notice, not DO not, but CAN not) occur within one single order_id, then the query gets even simpler and faster:
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o
left join order_details d
on o.order_id=d.order_id
and d.product_code='pap'
and d.product_quantity>0
Notice the crucial detail: d.product_code='pap' is set as an internal condition on (before) the join. If you put it into the outer WHERE clause after the join, it would not work!
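The difference between filtering in the ON clause and filtering in the WHERE clause is easy to check on a toy data set. This is a sketch with Python's sqlite3 module and invented rows, not the asker's Firebird schema; the product_quantity condition is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_details (order_id INTEGER, product_code TEXT);
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO order_details VALUES (1, 'pap'), (2, 'pen');
""")

# Condition inside ON: applied while matching, so order 2 survives with NULL.
filter_in_on = conn.execute("""
    SELECT o.order_id, d.product_code
    FROM orders o
    LEFT JOIN order_details d
           ON o.order_id = d.order_id AND d.product_code = 'pap'
    ORDER BY o.order_id
""").fetchall()

# Condition in WHERE: applied after the join, so the NULL row for order 2
# is filtered out and the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT o.order_id, d.product_code
    FROM orders o
    LEFT JOIN order_details d ON o.order_id = d.order_id
    WHERE d.product_code = 'pap'
    ORDER BY o.order_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

Only the first form keeps every order in the result, which is what a flag column needs.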
Now, to compare those two approaches, JOIN vs correlated subqueries, you have to look at the query statistics: how many fetches and cached fetches each would generate. Chances are that on medium-sized tables, with OS disk caches and Firebird caches warmed up, you would not see a difference in time. But if you at least shut down and restart the Firebird service (or better, the whole computer) to clear those caches, and then run the queries to the last row (by issuing "fetch all" or "scroll to the last row" in your database IDE, or by wrapping my and Mark's queries into the select count(1) statement shown below), you may start to see the timing change too.
select count(1) from ( /* measured query here */ )
SELECT
...
<foreign_table>.<your_desired_extra_column>
FROM
<current_table>
LEFT JOIN
<foreign_table> ON <foreign_table>.id = <current_table>.id
AND
<current_table>.<condition_field> = <condition_value>
Extra column will be NULL if the condition is not met.

Can we use order by in subquery? If not why sometime could use top(n) order by?

I'm at entry level and trying to learn more about SQL.
I have a question: can we use ORDER BY in a subquery? I found some articles that say no, we cannot.
But on the other hand, I saw examples using top(n) with order by in subquery:
select c.CustomerId,
c.OrderId
from CustomerOrder c
inner join (
select top 2
with TIES CustomerId,
COUNT(distinct OrderId) as Count
from CustomerOrder
group by CustomerId
order by Count desc
) b on c.CustomerId = b.CustomerId
So now I'm bit confused.
Could anyone advise?
Thank you very much.
Yes, you are right: in general we cannot use ORDER BY in an inner query, because the inner query acts as a table, and a table has no order of its own. Whoever queries it sorts it as needed.
In your query the inner query selects some records using TOP 2, and the ORDER BY there only decides which records count as the top 2. Even though there are only a few records, they still form a table, which can then be joined with another table.
Alternatively, the query can be written with a ranking function:
SELECT * FROM
(
SELECT c.CustomerId, c.OrderId, DENSE_RANK() OVER(ORDER BY b.count DESC) AS RANK
FROM CustomerOrder c
INNER JOIN
(SELECT CustomerId, COUNT(distinct OrderId) as Count
FROM CustomerOrder GROUP BY CustomerId) b
ON c.CustomerId = b.CustomerId
) a
WHERE RANK IN (1,2);
Hope I have answered your question.
Yes, we can use an ORDER BY clause in a subquery. For example, I have a table named product (see the screenshot of the table: http://prntscr.com/f15j3z). Check this query on your side and get back to me in case of any doubt.
select p1.* from product as p1 where product_id = (select p2.product_id from product as p2 order by product_id limit 0,1)
Yes, we can use ORDER BY in a subquery, but it is pointless to use it.
It is better to use it in the outer query. There is no use in ordering the result of a subquery, because the result of the inner query becomes the input for the outer query, and the outer query does not depend on the order of the subquery's result.
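The answers above can be reconciled with a small experiment. In the sketch below (Python's sqlite3 module, an invented product table with a hypothetical price column), the ORDER BY inside the subquery matters only because it is paired with LIMIT: together they choose which rows the subquery returns. The order of the final result still comes from the outer ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO product VALUES (1, 'a', 30.0), (2, 'b', 10.0), (3, 'c', 20.0);
""")

# Inner ORDER BY + LIMIT picks the two most expensive products (ids 1 and 3).
# The outer ORDER BY alone determines the order of the rows we get back.
two_most_expensive = conn.execute("""
    SELECT name
    FROM product
    WHERE product_id IN (SELECT product_id
                         FROM product
                         ORDER BY price DESC
                         LIMIT 2)
    ORDER BY name
""").fetchall()

print(two_most_expensive)
```

Syntax support differs between engines: SQL Server requires TOP (or OFFSET/FETCH) for an ORDER BY inside a derived table, and MySQL restricts LIMIT inside IN subqueries, so treat this as a demonstration of the idea rather than portable SQL.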

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want to get rows distinct by pageId, so that in the end I will not have rows with the same pageId more than once (duplicates).
any Ideas
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity)
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
If this is still not what you wish, you need to give us data and specify better what it is you want.
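For reference, a per-pageId total can be checked quickly with made-up data. This sketch uses Python's sqlite3 module; the column layout of orders and cartlines is an assumption inferred from the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
    CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                            pageId INTEGER, quantity INTEGER);
    INSERT INTO orders VALUES (1, 5), (2, 5), (3, 7);
    INSERT INTO cartlines VALUES
        (1, 1, 10, 2),   -- user 5, page 10
        (2, 2, 10, 3),   -- user 5, page 10 again
        (3, 1, 20, 1),   -- user 5, page 20
        (4, 3, 10, 9);   -- user 7, ignored by the filter
""")

# One output row per pageId: GROUP BY collapses the duplicates and
# SUM combines the quantities that belonged to them.
totals = conn.execute("""
    SELECT c.pageId, SUM(c.quantity) AS total_quantity
    FROM orders o
    INNER JOIN cartlines c ON c.orderId = o.id
    WHERE o.userId = 5
    GROUP BY c.pageId
    ORDER BY c.pageId
""").fetchall()

print(totals)
```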

database paging design

I'm fetching data for my grid like this
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
I also need the total count for the pagination.
There're two options.
1- Do an another fetch
SELECT count(*) FROM dbo.Orders
2- Put the count statement in the query
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
(SELECT count(*) FROM dbo.Orders) as Count
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
Which way should I go ?
Of the 2 methods you've put forward, the first (separate query) is better. The second method means the count will appear in every row returned, which is a bit unnecessary. Also, if the query returns 20 rows, the select count(*) may be executed 20 times (if I remember right; this could depend on which database engine you're using).
Additionally, depending on how much traffic you're envisaging and how big the table is likely to get, you can improve upon this by caching the result of select count(*) somewhere, and then refreshing it upon insertions / deletions to the table.
If this is for SQL Server 2005 or higher, one of the best ways to get pagination is to use a Common Table Expression.
CREATE PROC MyPaginatedDataProc
@pageNumber INT
AS
WITH OrdersCTE (CustomerID, OrderTime, ProductID, Quantity, RowNumber)
AS
(
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER (ORDER BY OrderItems.OrderID) AS RowNumber
FROM
dbo.Orders INNER JOIN dbo.OrderItems ON Orders.ID = OrderItems.OrderID
)
SELECT
CustomerID,
OrderTime,
ProductId,
Quantity
FROM
OrdersCTE
WHERE
RowNumber BETWEEN (@pageNumber * 10) + 1 AND ((@pageNumber + 1) * 10)
Otherwise for getting the total row count, I'd use a separate query like Mailslut said.
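Because ROW_NUMBER() numbers rows starting at 1, the bounds of a page are easy to get off by one. The arithmetic can be sketched as below, assuming a zero-based page number (an assumption; the procedure above does not say which convention it expects). The helper name page_bounds is hypothetical.

```python
from typing import Tuple


def page_bounds(page_number: int, page_size: int = 10) -> Tuple[int, int]:
    """Return the inclusive (first_row, last_row) range for a zero-based page.

    Row numbers are 1-based, matching ROW_NUMBER().
    """
    first_row = page_number * page_size + 1
    last_row = (page_number + 1) * page_size
    return first_row, last_row


print(page_bounds(0))  # first page
print(page_bounds(1))  # second page
```

With a page size of 10 this gives rows 1 to 10 for page 0 and rows 11 to 20 for page 1, so every page holds exactly ten rows and no row is skipped or repeated.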
If you are using Oracle you can use COUNT(*) OVER ( ) AS CNT. This is more efficient, as it takes a single table scan.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
COUNT(*) OVER ( ) AS CNT
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
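The same window-function idea can be tried with Python's sqlite3 module (SQLite 3.25 or newer). This is a sketch with invented rows, and LIMIT/OFFSET stands in for whatever paging syntax your engine uses. The window count is computed over the full joined result before the page is cut, so every returned row carries the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID TEXT);
    CREATE TABLE OrderItems (OrderID INTEGER, ProductID TEXT, Quantity INTEGER);
    INSERT INTO Orders VALUES (1, 'A'), (2, 'B');
    INSERT INTO OrderItems VALUES
        (1, 'p1', 1), (1, 'p2', 2), (2, 'p3', 3), (2, 'p4', 4), (2, 'p5', 5);
""")

# COUNT(*) OVER () counts all joined rows; LIMIT/OFFSET then keeps one page.
page = conn.execute("""
    SELECT o.CustomerID, i.ProductID, i.Quantity,
           COUNT(*) OVER () AS total_rows
    FROM Orders o
    INNER JOIN OrderItems i ON o.ID = i.OrderID
    ORDER BY i.ProductID
    LIMIT 2 OFFSET 2
""").fetchall()

print(page)
```

Note that this counts the joined rows shown in the grid (5 here), not the rows of Orders alone (2 here), which is what SELECT count(*) FROM dbo.Orders would return.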
As @Mailslut suggests, you should probably use two queries. However, you should probably add a WHERE clause to the query that fetches the data, so you only fetch the data that you actually need to show (unless you are caching it).
If more than one thread is accessing the database at a time, you will also need to somehow make sure that the count is kept in sync with the database.
I would consider something different, because what you are trying to do is not very simple, but quite necessary. Have you considered using the SQL Server row_number function? This way you will know how many records there are by looking at the max row_number returned, but also in the order you want.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER(ORDER BY Orders.CustomerId) rn
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID