add column to select statement when having a certain condition - sql

I have a SQL statement for data from order_details, a table which has many columns including product name, code, etc. How can I add a column to the select statement that whenever the order has a certain product (The product_code I need is called 'Pap') it writes a flag 'pap', so I can visually know which orders have this product?
I tried the code below:
select distinct order_id, customer_id,
(select distinct order_id from order_details
group by 1 having sum (case when product_code='pap'
then 1 else 0 end)=1
) as pap from orders
left join order_details
on order_details.order_id=orders.order_id
group by 1,2,3
The code I am trying is giving me an error "[Firebird]multiple rows in singleton select; HY000".

At a guess, you want to show 'pap' for orders that have one or more order_details rows with product_code 'pap'. In that case you can use:
select order_id, customer_id,
(select max(order_details.product_code)
from order_details
where order_details.order_id = orders.order_id
and order_details.product_code = 'pap') as pap
from orders
Or a more generic solution (that doesn't rely on the product_code for the value to display):
select order_id, customer_id,
case
when exists(
select 1
from order_details
where order_details.order_id = orders.order_id
and order_details.product_code = 'pap')
then 'pap'
end as pap
from orders
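As a rough way to try the exists-based variant without a Firebird server, here is a small sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in here, and the table contents are made up for illustration; the point is that an order with two 'pap' rows still yields a single flagged row, and orders without the product get NULL.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Firebird.
# Order 1 has 'pap' twice, order 2 has no 'pap', order 3 has no details at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_details (order_id INTEGER, product_code TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO order_details VALUES (1, 'pap'), (1, 'pap'), (1, 'abc'), (2, 'abc');
""")

# Same shape as the CASE WHEN EXISTS query above; ORDER BY only makes the output stable.
flag_rows = conn.execute("""
    SELECT order_id, customer_id,
           CASE WHEN EXISTS (
               SELECT 1
               FROM order_details
               WHERE order_details.order_id = orders.order_id
                 AND order_details.product_code = 'pap')
           THEN 'pap' END AS pap
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in flag_rows:
    print(row)
```

Each order appears exactly once regardless of how many matching detail rows it has, which is what the original attempt with the uncorrelated sub-select could not deliver.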

Let's try to build your query step by step. From simple to more complex in the obsolete bottom-to-top fashion :-)
I suggest you run every query to see the results, watch how the data gets refined step by step, and check early whether your assumptions hold.
The first unknown is order_details: can one order have several rows with the same product? Is it possible to have an order with 2+3 Paps, or only a single row totalling 5 Paps? Is (order_id, product_code) a unique constraint or primary key on that table, or not?
Select Count(1), order_id, product_code
From order_details
Group by 2,3
Order by 1 DESC
This can show whether such repetition exists, but even if it does not, you have to check the metadata (schema) to see whether it is allowed by table constraints or indices.
The thing is, when you JOIN tables, their matching rows get multiplied (in set theory terms). So if one order can have several rows for Paps, we have to handle that specially, which adds extra load on the server unless we find a way to do it for free.
We can easily check for one specific order to have that product.
select 'pap' from order_details where order_id = :parameter_id and product_code='pap'
We can then suppress repetitions - if they were not prohibited by constraints - in a standard way (but requiring extra sorting) or Firebird-specific (but free) way.
select DISTINCT 'pap' from order_details where order_id = :parameter_id and product_code='pap'
or
select FIRST(1) 'pap' from order_details where order_id = :parameter_id and product_code='pap'
These can then be plugged into Mark's answer as a correlated sub-query:
select o.order_id, o.customer_id,
coalesce(
( select first(1) 'pap' /* the flag */ from order_details d
where o.order_id = d.order_id and d.product_code = 'pap' )
, '' /* just avoiding NULL */
) as pap
from orders o
Lifehack: notice how coalesce and first(1) here substitute for the case and exists in Mark's original answer. This trick can be used in Firebird wherever you use a singular (and potentially empty) 1-column query as an expression.
To avoid multiple sub-queries and switch to outer-join we need to make one query to have ALL the order IDs with Paps, but only once.
select distinct order_id from order_details where product_code='pap'
Should do the trick. But probably at the cost of extra sorting to suppress possible duplication (again, is it possible though?)
select order_id, count(order_id)
from order_details
where product_code='pap'
group by 1 order by 2 desc
Would show us the repetitions if they are already there. Just to explain what I mean, and to see whether you could enforce SQL constraints on the existing data if you did not have them and chose to harden your database structure.
This way we just have to outer-join with it and use CASE (or one of its shorthand forms) to do the typical trick of detecting the outer join's NULL rows.
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o
left join (
select distinct order_id
from order_details
where product_code = 'pap'
and product_quantity > 0 ) d
on o.order_id=d.order_id
As someone said this looks ugly, there is one more 'modern' way to write exactly that query, maybe it would look better :-D
with d as (
select distinct order_id
from order_details
where product_code = 'pap'
and product_quantity > 0 )
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o left join d on o.order_id=d.order_id
Where the 'pap' repetitions can not (notice, not DO not, but CAN not) occur within one single order_id then the query would get even simpler and faster:
select o.order_id, o.customer_id,
iif( d.order_id is null, '', 'pap') as pap
from orders o
left join order_details d
on o.order_id=d.order_id
and d.product_code='pap'
and d.product_quantity>0
Notice the crucial detail: d.product_code='pap' is set as an internal condition on (before) the join. If you put it into the outer WHERE clause after the join, it would not work: the orders without Paps would be filtered out of the result entirely.
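To see the ON-versus-WHERE difference concretely, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Firebird. The sample rows are invented, and CASE is used in place of iif only to stay portable across SQLite versions.

```python
import sqlite3

# Hypothetical data: order 1 contains 'pap', order 2 does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_details (order_id INTEGER, product_code TEXT, product_quantity INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20);
    INSERT INTO order_details VALUES (1, 'pap', 5), (2, 'abc', 1);
""")

# Product filter inside the ON clause: every order survives, unmatched ones get NULLs.
on_rows = conn.execute("""
    SELECT o.order_id,
           CASE WHEN d.order_id IS NULL THEN '' ELSE 'pap' END AS pap
    FROM orders o
    LEFT JOIN order_details d
      ON o.order_id = d.order_id AND d.product_code = 'pap'
    ORDER BY o.order_id
""").fetchall()

# Product filter in the outer WHERE: rows without 'pap' are discarded after the join.
where_rows = conn.execute("""
    SELECT o.order_id,
           CASE WHEN d.order_id IS NULL THEN '' ELSE 'pap' END AS pap
    FROM orders o
    LEFT JOIN order_details d ON o.order_id = d.order_id
    WHERE d.product_code = 'pap'
    ORDER BY o.order_id
""").fetchall()

print("filter in ON:   ", on_rows)
print("filter in WHERE:", where_rows)
```

With the filter in WHERE, order 2 disappears, so the left join effectively degrades into an inner join.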
Now, to compare those two approaches, JOIN vs correlated subqueries, you have to look at query statistics: how many fetches and cached fetches each would generate. Chances are that on medium-sized tables, with OS disk caches and Firebird caches warmed up, you would not see a difference in time. But if you at least shut down and restart the Firebird service (or better, the whole computer) to clear those caches, and then run the queries to the last row (by issuing "fetch all" or "scroll to the last row" in your database IDE, or by wrapping my and Mark's queries into the query below), you may start to see the timing change too.
select count(1) from ( /* measured query here */ )

SELECT
...
<foreign_table>.<your_desired_extra_column>
FROM
<current_table>
LEFT JOIN
<foreign_table> ON <foreign_table>.id = <current_table>.id
AND
<current_table>.<condition_field> = <condition_value>
Extra column will be NULL if the condition is not met.

Related

SQL queries with slight modification giving different results

I am learning SQL and doing some exercise with analytics functions. I have following query to find out ship_name and order_value of the highest placed order. Following are my tables:
orders(id, ship_name, city_of_origination)
order_details(id, order_id, unit_price, quantity)
In order to solve this problem, I wrote following query:
select o.ship_name, od.quantity*od.unit_price, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc)
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
Here is the sample output after removing limit from the above query:
Changing the problem statement slightly, I only want the ship_name. So I wrote this query:
select tmp.ship_name
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
To my surprise, the result changed. Here is the result of the above query without limit:
At the same time, if I execute following query:
select tmp.ship_name, tmp.fv
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
I get the same result (and the expected one) as that of the first query. My question is: Why is there a difference of results in above queries?
limit without order by returns an arbitrary row. It might not even return the same row for the same query when executed subsequent times.
So, use order by to control which row is returned.
In Postgres, a plain sequential scan usually returns rows in the order it finds them on disk (physical ctid order), which roughly reflects insert/update history. That order is an implementation detail, not a guarantee, and joins or a different plan can change it. LIMIT does not impose any order of its own; it just cuts off whatever order the plan happens to produce.
LIMIT 1 therefore only shows the first row the plan happens to produce. To control which row that is, use ORDER BY.
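A minimal sketch of the ORDER BY fix, run through Python's built-in sqlite3 module rather than Postgres. The rows are invented, and the join uses orders.id as given in the table definitions of the question; treat it as an illustration of the idea, not as the asker's exact schema.

```python
import sqlite3

# Hypothetical data for orders(id, ship_name) and order_details(order_id, unit_price, quantity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ship_name TEXT);
    CREATE TABLE order_details (order_id INTEGER, unit_price REAL, quantity INTEGER);
    INSERT INTO orders VALUES (1, 'Alpha'), (2, 'Bravo');
    INSERT INTO order_details VALUES (1, 10.0, 2), (2, 5.0, 10), (1, 3.0, 1);
""")

# An explicit ORDER BY decides which row LIMIT 1 keeps, so the result
# no longer depends on the scan order the planner happens to choose.
top = conn.execute("""
    SELECT o.ship_name, od.quantity * od.unit_price AS order_value
    FROM orders o
    INNER JOIN order_details od ON o.id = od.order_id
    ORDER BY order_value DESC
    LIMIT 1
""").fetchone()

print(top)
```

Once the ordering is explicit, wrapping the query in a derived table and selecting only ship_name returns the same row.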

SQL how to count the number of relations between two tables and include zeroes?

I have a table of orders, and a table of products contained in these orders. (The products-table has order_id, a foreign key referring to orders.id).
I would like to query the number of products contained in each order. However, I also want orders to be contained in the results if they do not contain any products at all.
This means that a simple
SELECT *, COUNT(*) n_products FROM `orders` INNER JOIN `products` ON `products`.`order_id` = `orders`.`id` GROUP BY `order_id`
does not work, since orders without any products disappear.
Using a LEFT OUTER JOIN instead would add rows without product-information, but the distinction between an order with 1 product and an order with 0 products is lost.
What am I missing here?
You need a left join here, and you should be counting some column from the products table:
SELECT
o.*,
COUNT(p.order_id) AS n_products
FROM orders o
LEFT JOIN products p
ON p.order_id = o.id
GROUP BY
o.id;
Note that I assume Postgres allows grouping by orders.id and then selecting all columns from that table, which it does when id is the primary key of orders. If not, then you would only be able to select o.id in addition to the count.
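The difference between counting a column from the joined table and counting rows can be checked with a small sketch like the one below. It uses Python's built-in sqlite3 module and made-up rows; order 3 deliberately has no products.

```python
import sqlite3

# Hypothetical data: order 1 has two products, order 2 has one, order 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE products (name TEXT, order_id INTEGER REFERENCES orders(id));
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO products VALUES ('a', 1), ('b', 1), ('c', 2);
""")

# COUNT(p.order_id) skips the NULLs produced by the left join,
# while COUNT(*) counts the NULL-extended row as one.
counts = conn.execute("""
    SELECT o.id,
           COUNT(p.order_id) AS n_products,
           COUNT(*)          AS n_rows
    FROM orders o
    LEFT JOIN products p ON p.order_id = o.id
    GROUP BY o.id
    ORDER BY o.id
""").fetchall()

for row in counts:
    print(row)
```

For order 3, n_products is 0 while n_rows is 1, which is exactly the distinction between "no products" and "one product" that COUNT(*) loses.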

Improve efficiency of PostgreSQL Query - One to Many, Count is 1

I would like to improve the efficiency of the following query, if possible:
SELECT * FROM orders o
INNER JOIN order_items oi
ON o.id = oi.order_id
WHERE o.fulfilled = false
AND o.id NOT IN (SELECT order_id
FROM order_items
WHERE sku = '011111'
GROUP BY order_id
HAVING COUNT(order_id) = 1)
There is a one to many relationship between the orders and order_items tables (o.id = oi.order_id).
The goal is to select all of the information from two tables, with the following conditions:
The order has not been fulfilled (orders.fulfilled = false).
Exclude all of the orders that have exactly one order item with an SKU of '011111' (oi.sku like '011111').
Any help is appreciated!
IN can be slower, so I modified the query to use an inner join:
select * from orders o
inner join order_items oi
on o.id = oi.order_id
and o.fulfilled = false
inner join( select order_id
from order_items
where sku != '011111'
group by order_id
having count(order_id) = 1) T
on T.order_id = o.id
count(whatever) will usually force a full table scan (because it has no idea how many order_items rows fall into each order_id group, and you cannot create an index on an aggregate), unless there is another clause that can use an index. Most likely a sku not equaling something will not be selective enough (I'm guessing you have a lot of SKUs). You can look at the explain output, and you will probably see a full table scan in the IN part of your query.
If that's the case, then you have the option of caching the count data and indexing it, via a trigger function that updates a current_count column every time an order is placed or fulfilled. Or you could cache a query that keeps track of the count (say, if the information does not need to be refreshed very often).
Can we assume that an order can't have more than one item with the same sku on the same order?
Can we assume that you can't have an order with no items?
If so, writing the opposite might be faster. The query below finds all orders that have any sku other than '011111'. Also, correlated subqueries are usually faster than non-correlated subqueries (although optimizers are smart enough to rewrite this a lot of the time). Exists clauses are usually faster than an in clause since the engine can exit before looking through all of the subquery rows.
SELECT *
FROM orders o
INNER JOIN order_items oi
ON o.id = oi.order_id
WHERE o.fulfilled = false
AND EXISTS (SELECT 'x'
FROM order_items oi2
WHERE o.id = oi2.order_id
AND oi2.sku != '011111')

A simple nested SQL statement

I have a question in SQL that I am trying to solve. I know that the answer is very simple but I just cannot get it right. I have two tables, one with customers and the other one with orders. The two tables are connected using customer_id. The question is to list all the customers that did not make any order! The query is to be run in MapInfo Professional, a GIS desktop software, so not every SQL command is applicable to that program. In other words, I will be thankful if I get more than one approach to solve that problem.
Here is how I have been thinking:
SELECT customer_id
from customers
WHERE order_id not in (select order_id from order)
and customer.customer_id = order.customer_id
How about this:
SELECT * from customers
WHERE customer_id not in (select customer_id from order)
The logic is, if we don't have a customer_id in order that means that customer has never placed an order. As you have mentioned that customer_id is the common key, hence above query should fetch the desired result.
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o ON (o.customer_id = c.customer_id)
WHERE o.order_id IS NULL
... The NOT EXISTS way:
SELECT * FROM customers
WHERE NOT EXISTS (
SELECT * FROM orders
WHERE orders.customer_id = customers.customer_id
)
There are some problems with your approach:
There is probably no order_id in the customers table, but in your where-statement you refer to it
The alias (or table-name) order in the where-statement (order.customer_id) is not known because there is no join statement in there
If there were a join, you would filter out all customers without orders, exactly the opposite of what you want
Your question is difficult for me to answer because I do not know which SQL subset MapInfo understands, but let's try:
select * from customers c where not exists (select * from order o where o.customer_id=c.customer_id)
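Since it is unclear which of these forms MapInfo accepts, it may help to confirm on a neutral engine that they agree. The sketch below runs the NOT IN, LEFT JOIN and NOT EXISTS variants through Python's built-in sqlite3 module on invented data. It assumes the orders table is named orders (ORDER is a reserved word in most dialects) and that orders.customer_id is never NULL.

```python
import sqlite3

# Hypothetical data: customer 2 has never placed an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
""")

# Three ways of asking for customers without any order.
queries = {
    "not_in": """
        SELECT customer_id FROM customers
        WHERE customer_id NOT IN (SELECT customer_id FROM orders)""",
    "left_join": """
        SELECT c.customer_id FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.customer_id
        WHERE o.order_id IS NULL""",
    "not_exists": """
        SELECT c.customer_id FROM customers c
        WHERE NOT EXISTS (SELECT 1 FROM orders o
                          WHERE o.customer_id = c.customer_id)""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return only customer 2 here; whichever form MapInfo's SQL subset supports should therefore be usable.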

What is the most efficient way to write a select statement with a "not in" subquery?

What is the most efficient way to write a select statement similar to the below.
SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)
The gist is you want the records from one table when the item is not in another table.
For starters, a link to an old article in my blog on how NOT IN predicate works in SQL Server (and in other systems too):
Counting missing rows: SQL Server
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
However, most databases will treat these queries the same.
Both these queries will use some kind of an ANTI JOIN.
This is useful for SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
Note, however, that NOT IN may be tricky due to the way it treats NULL values.
If HeldOrders.OrderID is nullable, no matching record is found, and the subquery returns even a single NULL, the whole query will return nothing (both IN and NOT IN evaluate to NULL in this case).
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1.
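The two queries can be replayed on exactly this data with a short sketch using Python's built-in sqlite3 module. SQLite is used here only as a convenient engine that follows the same three-valued logic for NOT IN; it is not SQL Server.

```python
import sqlite3

# The sample data from the text: Orders has 1, HeldOrders has 2 and NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER);
    CREATE TABLE HeldOrders (OrderID INTEGER);
    INSERT INTO Orders VALUES (1);
    INSERT INTO HeldOrders VALUES (2), (NULL);
""")

# 1 NOT IN (2, NULL) evaluates to NULL, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT * FROM Orders o
    WHERE OrderID NOT IN (SELECT OrderID FROM HeldOrders ho)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL never matches.
not_exists_rows = conn.execute("""
    SELECT * FROM Orders o
    WHERE NOT EXISTS (SELECT NULL FROM HeldOrders ho
                      WHERE ho.OrderID = o.OrderID)
""").fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

If the NULL row is removed from HeldOrders, both queries return the row with OrderID = 1.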
Note that the LEFT JOIN solution proposed by others is far from being the most efficient one.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that needs to evaluate and filter out all matching rows, which can be numerous.
An ANTI JOIN method used by both IN and EXISTS just needs to make sure that a record does not exist, once per row in Orders, so it will eliminate all possible duplicates first:
NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
A HASH ANTI JOIN will eliminate duplicates when building the hash table.
"Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
You can use a LEFT OUTER JOIN and check for NULL on the right table.
SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
I'm not sure what is the most efficient, but other options are:
1. Use EXISTS
SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
FROM HeldOrders HO
WHERE O.Order_ID = HO.Order_ID)
2. Use EXCEPT
SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders HO
Try
SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL