SQL ignore rows based on criteria

This is going to sound stupid.
I have two tables, ORDER and ORDERDETAILS:
ORDER_ID ITEM_NAME
======== =========
111 Paper
111 Toner
222 Paper
333 Pencils
I want to query only the ORDER_IDs where the ITEM_NAME is Paper,
so for instance my query result should be only:
ORDER_ID ITEM_NAME
======== =========
222 Paper
I don't want ORDER_IDs that have other items related to them. I only want ORDER_IDs where the only ITEM_NAME is Paper.

If you want order ids that are only paper, I would recommend using group by and a having clause:
select od.Order_Id
from OrderDetails od
group by od.Order_Id
having sum(case when od.Item_Name = 'Paper' then 1 else 0 end) > 0 and
sum(case when od.Item_Name <> 'Paper' then 1 else 0 end) = 0
The having clause has two conditions. The first counts the number of rows for an order that have Paper as an item. The > 0 says that there needs to be at least one. The second counts the number of rows that do not have paper. The = 0 says that there needs to be none.
This can also be written as:
having sum(case when od.Item_Name = 'Paper' then 1 else 0 end) = count(*)
i.e. All the items for the order are "Paper".
I like this method because it is very general. You can readily extend it to include scissors and rocks. Or to get orders that have paper, but no rocks, and so on.
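To see the conditional-aggregation idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module. The first four rows are the question's sample data; order 444 and the "Paper but no Toner" variant are assumptions added only to illustrate how the HAVING clause extends.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetails (Order_Id INTEGER, Item_Name TEXT)")
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?)",
    [
        (111, "Paper"), (111, "Toner"), (222, "Paper"), (333, "Pencils"),
        (444, "Paper"), (444, "Pencils"),  # hypothetical extra order, not in the question
    ],
)

# Orders whose only item is Paper: at least one Paper row, zero non-Paper rows.
paper_only = conn.execute(
    """
    SELECT od.Order_Id
    FROM OrderDetails od
    GROUP BY od.Order_Id
    HAVING SUM(CASE WHEN od.Item_Name = 'Paper' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN od.Item_Name <> 'Paper' THEN 1 ELSE 0 END) = 0
    ORDER BY od.Order_Id
    """
).fetchall()

# Same pattern, different conditions: orders that have Paper but no Toner.
paper_no_toner = conn.execute(
    """
    SELECT od.Order_Id
    FROM OrderDetails od
    GROUP BY od.Order_Id
    HAVING SUM(CASE WHEN od.Item_Name = 'Paper' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN od.Item_Name = 'Toner' THEN 1 ELSE 0 END) = 0
    ORDER BY od.Order_Id
    """
).fetchall()

print(paper_only)      # [(222,)]
print(paper_no_toner)  # [(222,), (444,)]
```

Only the conditions inside the SUM(CASE ...) expressions change between the two queries, which is what makes the approach easy to extend.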

This way, you can get every ORDER_ID whose ITEM_NAME equals Paper:
SELECT ORDER_ID FROM TABLE_NAME WHERE ITEM_NAME = 'Paper';

If you want to retrieve only the order_ids:
select DISTINCT(ORDER_ID)
from orderdetails
where ITEM_NAME='Paper'
AND ORDER_ID NOT IN
(SELECT DISTINCT(ORDER_ID) FROM ORDERDETAILS WHERE ITEM_NAME!='Paper')
If you want all the columns from the order details:
select *
from orderdetails
where ITEM_NAME='Paper'
AND ORDER_ID NOT IN
(SELECT DISTINCT(ORDER_ID) FROM ORDERDETAILS WHERE ITEM_NAME!='Paper')

SELECT OrderDetails.*
FROM OrderDetails
WHERE ITEM_Name = 'Paper'
AND Order_ID NOT IN (SELECT Order_ID FROM OrderDetails WHERE ITEM_Name <> 'Paper')
In this one, you select all rows where Item_Name = 'Paper', then exclude every order that has a line where Item_Name <> 'Paper'.
Alternatives:
If we can assume that the Item_Name is unique for each order, then (at least in MS SQL Server) you could do:
;with NumberOfRows as
(
SELECT COUNT(1) as TotalRows, Order_ID
FROM OrderDetails
GROUP BY Order_ID
)
SELECT OrderDetails.*
FROM OrderDetails
INNER JOIN NumberOfRows
ON NumberOfRows.Order_ID = OrderDetails.Order_ID
AND NumberOfRows.TotalRows = 1
WHERE OrderDetails.Item_Name = 'Paper'
As another option:
select OrderDetails1.*
from OrderDetails as OrderDetails1
LEFT JOIN OrderDetails OrderDetails2
ON OrderDetails1.Order_ID = OrderDetails2.Order_ID
AND OrderDetails2.ITEM_Name <> 'Paper'
Where OrderDetails1.ITEM_Name = 'Paper'
AND OrderDetails2.Order_ID IS NULL
In this last one, you take all the Paper lines and join the table back onto itself wherever the same order has a second line that is not Paper. Then you exclude all of the orders where the join was successful. This would be easier to understand if the order details table had something unique (like a line id).
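The NOT IN version and the LEFT JOIN ... IS NULL version can be compared side by side. This is a minimal sketch in Python's sqlite3 with the question's sample rows; the Line_Id column is a hypothetical unique key (not in the original table), added as the answer suggests so the "join found nothing" test has an unambiguous column to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Line_Id is an assumed surrogate key; the original table has no such column.
conn.execute(
    "CREATE TABLE OrderDetails ("
    "Line_Id INTEGER PRIMARY KEY, Order_ID INTEGER, Item_Name TEXT)"
)
conn.executemany(
    "INSERT INTO OrderDetails (Order_ID, Item_Name) VALUES (?, ?)",
    [(111, "Paper"), (111, "Toner"), (222, "Paper"), (333, "Pencils")],
)

# Variant 1: keep Paper lines, drop orders that appear in the non-Paper set.
not_in_rows = conn.execute(
    """
    SELECT Order_ID, Item_Name
    FROM OrderDetails
    WHERE Item_Name = 'Paper'
      AND Order_ID NOT IN (SELECT Order_ID FROM OrderDetails
                           WHERE Item_Name <> 'Paper')
    """
).fetchall()

# Variant 2: anti-join. od2 only matches non-Paper lines of the same order,
# so od2.Line_Id is NULL exactly when the order has no such line.
anti_join_rows = conn.execute(
    """
    SELECT od1.Order_ID, od1.Item_Name
    FROM OrderDetails AS od1
    LEFT JOIN OrderDetails AS od2
      ON od1.Order_ID = od2.Order_ID
     AND od2.Item_Name <> 'Paper'
    WHERE od1.Item_Name = 'Paper'
      AND od2.Line_Id IS NULL
    """
).fetchall()

print(not_in_rows)     # [(222, 'Paper')]
print(anti_join_rows)  # [(222, 'Paper')]
```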

Related

How does a left outer join work in SQL Server?

First: I know how to use all types of joins, but I don't understand why this query behaves the way it does.
I have a scenario where I need to build a SQL query over 3 tables, using a left outer join between the selling (invoice) items and the order items.
My Tables:
--------------------
Item
--------------------
ID | Code
--------------------
1 | 7502
SQL > select * from Item where id = 1
---------------------
Item_Order
---------------------------
Item | Box | Quantity
---------------------------
1 | 30 | 15000
1 | 12 | 6000
SQL > select * from Item_Order where Item = 1
--------------------------
Invoice_Item
-------------------
Item | Num | Quantity
-------------------------
1 | 1.64 | 10
1 | 2.4 | 8
SQL > select * from Invoice_Item where Item = 1
I want this output:
Item | OrderQ | OrderB | SellN | SellQ
-----------------------------------------
1 | 15000 | 30 | 1.64 | 10
1 | 6000 | 12 | 2.4 | 8
My SQL code:
SELECT Item.ID, Item_Order.Box As OrderB, Item_Order.Quantity As OrderQ, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item LEFT OUTER JOIN
Item_Order ON Item_Order.Item = Item.ID
where Item.ID = 1
Why is my output doubled? That is, why does my query return 4 records?

Your result can be achieved with row_number:
select a.ID
, a.OrderB
, a.OrderQ
, b.Quantity SellQ
, b.Num SellN
from
(SELECT Item.ID
, Item_Order.Box As OrderB
, Item_Order.Quantity As OrderQ
, row_number () over (order by Item.ID) rn
FROM Item
left outer JOIN Item_Order ON Item.ID = Item_Order.Item) a
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b
on a.ID = b.Item
and a.rn = b.rn
Here is a demo
You can add more tables like this:
left outer join (select Item
, Num
, Quantity
, row_number () over (order by Item) rn
from Invoice_Item ) b

Because when you first join Item with Invoice_Item, the result has two records, since there are two records in Invoice_Item for that item. This intermediate result is then left-joined with Item_Order, and each of the two records is joined with all of the matching records of Item_Order.
You can better understand it like this:
SELECT Item.ID, Invoice_Item.Num As SellN, Invoice_Item.Quantity As SellQ
FROM Item LEFT OUTER JOIN
Invoice_Item ON Item.ID = Invoice_Item.Item
where Item.ID = 1 -- imagine this result stored as table4 (only to explain)
Now the result of this first query, table4, is joined with Item_Order, and both of its rows match both Item_Order rows, giving 4 rows.

You are joining on one key -- two rows with the same key in one table times two rows in the second table = 4 rows.
You need a separate key. You can generate one using row_number():
SELECT i.ID, io.Box As OrderB, io.Quantity As OrderQ,
ii.Num As SellN, ii.Quantity As SellQ
FROM Item i LEFT OUTER JOIN
((SELECT ii.*,
ROW_NUMBER() OVER (PARTITION BY ii.item ORDER BY ii.item) as seqnum
FROM Invoice_Item ii
) ii FULL JOIN
(SELECT io.*,
ROW_NUMBER() OVER (PARTITION BY io.item ORDER BY io.item) as seqnum
FROM Item_Order io
) io
ON io.Item = ii.Item AND io.seqnum = ii.seqnum
)
-- COALESCE so the match still works when only one side of the FULL JOIN has a row
ON i.ID = COALESCE(ii.Item, io.Item)
where i.ID = 1;
Note that this is one of the few cases where I use parentheses in the FROM clause. This code can handle additional rows in either of the tables -- if one table is longer than the other, the columns from the other will be NULL.
If you know the two tables have the same number of rows (for a given item) you can just use inner joins and no parentheses.
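The effect of the generated key can be checked on the question's sample data. Below is a minimal sketch in Python's sqlite3, covering only the simpler inner-join case where both tables have the same number of rows per item. It assumes an SQLite build with window-function support (3.25 or later) and, as a further assumption, numbers the rows by insertion order (rowid) so that the pairing is deterministic; the original answer does not specify an ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Item (ID INTEGER, Code INTEGER);
    CREATE TABLE Item_Order (Item INTEGER, Box INTEGER, Quantity INTEGER);
    CREATE TABLE Invoice_Item (Item INTEGER, Num REAL, Quantity INTEGER);
    INSERT INTO Item VALUES (1, 7502);
    INSERT INTO Item_Order VALUES (1, 30, 15000), (1, 12, 6000);
    INSERT INTO Invoice_Item VALUES (1, 1.64, 10), (1, 2.4, 8);
    """
)

# Joining on Item alone pairs every order row with every invoice row.
(plain_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM Item i
    JOIN Item_Order io ON io.Item = i.ID
    JOIN Invoice_Item ii ON ii.Item = i.ID
    """
).fetchone()

# Adding a per-item sequence number gives a second key, so rows pair up 1:1.
paired = conn.execute(
    """
    SELECT i.ID, io.Quantity AS OrderQ, io.Box AS OrderB,
           ii.Num AS SellN, ii.Quantity AS SellQ
    FROM Item i
    JOIN (SELECT o.*,
                 ROW_NUMBER() OVER (PARTITION BY o.Item ORDER BY o.rowid) AS seqnum
          FROM Item_Order o) io
      ON io.Item = i.ID
    JOIN (SELECT v.*,
                 ROW_NUMBER() OVER (PARTITION BY v.Item ORDER BY v.rowid) AS seqnum
          FROM Invoice_Item v) ii
      ON ii.Item = i.ID AND ii.seqnum = io.seqnum
    WHERE i.ID = 1
    ORDER BY io.seqnum
    """
).fetchall()

print(plain_count)  # 4
print(paired)       # [(1, 15000, 30, 1.64, 10), (1, 6000, 12, 2.4, 8)]
```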

It is duplicating because you have no secondary association between Invoice_Item and Item_Order. Each record in Invoice_Item is matched to Item_Order based only on the Item ID, which gives a Cartesian product of the rows for that item. Your data appears to be meant as a 1:1 reference: the first Invoice_Item row (Quantity 10) is meant to be associated with Item_Order Box = 30, and the row with Quantity 8 is meant to be associated with Item_Order Box = 12.
Item_Order
Item Box Quantity
1 30 15000
1 12 6000
Invoice_Item
Item Num Quantity
1 1.64 10
1 2.4 8
You probably need to tack on the "Box" reference so Item_Order and Invoice_Item are a 1:1 match.
What is happening is that each row in Invoice_Item is joined to every Item_Order row with the same Item ID, so you get two result rows for each. If you had 3 Invoice_Item rows and 6 Item_Order rows for the same item, you would get 18 rows.
FEEDBACK
Even though you have an accepted answer based on OVER/PARTITION BY/ROW_NUMBER, that process forces a surrogate secondary ID onto each row. Relying on this approach is not the best basis for associating the data overall. What happens if you delete the second item on an order? Are you positive you are also deleting the second item in Invoice_Item?
As for returning 2 records in the original scenario, you can do it via the surrogate process, but I think it would be better for you long term to understand what is happening in the join. Going back to your sample data of Item_Order and Invoice_Item, let's start with the Item_Order table. The SQL engine is going to process each row individually.
First row: SQL grabs Item = 1, Box = 30, Qty = 15000.
Now it joins to the Invoice_Item table, and by your criteria it joins based only on Item. So it sees the first row and says: yup, this is item 1, so include that with the Item_Order record (first row returned). Now it goes to the second line in the Invoice_Item table: yup, it too is item 1, so it returns it again (second row returned).
Now SQL grabs the second row: Item = 1, Box = 12, Qty = 6000.
It goes back to the Invoice_Item table and does the exact same test for each row that has Item = 1, returning the 3rd and 4th rows, hence your doubling. If either table had more records with the same Item ID, it would return that many more records: 3 and 3 records would return 9 rows, 4 and 4 records would return 16 rows, etc. The surrogate approach will work, but I don't think it is as safe as a better, updated design structure.
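The row arithmetic described above is easy to confirm. This is a small sketch in Python's sqlite3; the helper name joined_row_count and the cut-down two-column tables are assumptions made for the illustration, not part of the original schema.

```python
import sqlite3


def joined_row_count(n_orders, n_invoices):
    """Count rows produced when n_orders Item_Order rows and n_invoices
    Invoice_Item rows all share Item = 1 and are joined on Item alone."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Item_Order (Item INTEGER, Box INTEGER)")
    conn.execute("CREATE TABLE Invoice_Item (Item INTEGER, Num INTEGER)")
    conn.executemany(
        "INSERT INTO Item_Order VALUES (1, ?)", [(k,) for k in range(n_orders)]
    )
    conn.executemany(
        "INSERT INTO Invoice_Item VALUES (1, ?)", [(k,) for k in range(n_invoices)]
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Item_Order io "
        "JOIN Invoice_Item ii ON io.Item = ii.Item"
    ).fetchone()
    conn.close()
    return count


print(joined_row_count(2, 2))  # 4, the doubling seen in the question
print(joined_row_count(3, 3))  # 9
print(joined_row_count(6, 3))  # 18
```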

SQL Query Exclude Records

I want to query a database of guests that bought certain items. I want to see what customers bought item 'A' but not item 'B'.
I tried:
SELECT customerName
FROM Customers
WHERE NOT item = 'A' AND item = 'B';
But this returns customers who bought both items. I would like to exclude these customers from the query.
I am using SQLite

There are multiple ways to do this. I like to use group by and having, because it is very flexible for many conditions:
SELECT customerName
FROM Customers
GROUP BY customerName
HAVING SUM(CASE WHEN item = 'A' THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN item = 'B' THEN 1 ELSE 0 END) = 0;

You can also use the EXCEPT operator (called MINUS in Oracle), which returns all rows in the first SELECT statement that are not returned by the second SELECT statement. For example:
SELECT customerName FROM Customers WHERE item='A'
EXCEPT
SELECT customerName FROM Customers WHERE item='B';

I would use EXISTS with NOT EXISTS:
select c.*
from Customers c
where exists (select 1
from Customers c1
where c1.customerName = c.customerName and c1.item = 'A'
) and not exists
(select 1
from Customers c2
where c2.customerName = c.customerName and c2.item = 'B'
);
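Since the question is about SQLite, the approaches from the answers can be run there directly and compared. This is a minimal sketch using Python's sqlite3; the Customers table layout follows the question, while the sample rows (Ann, Bob, Cid, Dee) are made up for the illustration. Note that SQLite spells the set-difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (customerName TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [  # hypothetical sample data
        ("Ann", "A"), ("Ann", "B"),  # bought both, must be excluded
        ("Bob", "A"),                # bought only A
        ("Cid", "B"),                # bought only B
        ("Dee", "A"), ("Dee", "C"),  # bought A and something else, but not B
    ],
)

having_rows = conn.execute(
    """
    SELECT customerName
    FROM Customers
    GROUP BY customerName
    HAVING SUM(CASE WHEN item = 'A' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN item = 'B' THEN 1 ELSE 0 END) = 0
    ORDER BY customerName
    """
).fetchall()

except_rows = conn.execute(
    """
    SELECT customerName FROM Customers WHERE item = 'A'
    EXCEPT
    SELECT customerName FROM Customers WHERE item = 'B'
    ORDER BY customerName
    """
).fetchall()

exists_rows = conn.execute(
    """
    SELECT DISTINCT c.customerName
    FROM Customers c
    WHERE EXISTS (SELECT 1 FROM Customers c1
                  WHERE c1.customerName = c.customerName AND c1.item = 'A')
      AND NOT EXISTS (SELECT 1 FROM Customers c2
                      WHERE c2.customerName = c.customerName AND c2.item = 'B')
    ORDER BY c.customerName
    """
).fetchall()

print(having_rows)  # [('Bob',), ('Dee',)]
print(except_rows)  # [('Bob',), ('Dee',)]
print(exists_rows)  # [('Bob',), ('Dee',)]
```

The EXISTS version here selects DISTINCT names only so that all three result sets have the same shape; selecting c.* would return one row per purchase.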

Oracle SQL - Sum up and show text if > 0

I'm totally new to Oracle SQL and my question may seem stupid, but I'm having some difficulty solving my problem.
Current situation:
I have following tables: Supplier, Debtor, Invoice
Every Supplier has various Debtors and every Debtor has various Invoices.
I want to create an evaluation which shows me the following scenario:
A list of all Debtors for a Supplier. I also want to see in that list whether the Debtor has ever failed to pay an invoice. I have this information in the Invoice table as the attribute "payed"; its possible values are 0 (paid) and 1 (not paid). I want to see each debtor only ONCE in the list, so if there is even ONE invoice with the value 1, it should show "1" or "Not Payed" in the list. Right now, when a debtor has 100 invoices, the debtor is shown 100 times with the info "0" or "1".
Currently:
SELECT company.company_id,
company.companyname_1,
supplier.supplier_id,
supplier.suppliername_1,
debtor.debtor_id_from_supplier,
debtor_ext.debtorname_1,
debtor_ext.street,
debtor_ext.street_number,
debtor_ext.postcode,
debtor_ext.city,
debtor.approved_limit,
debtor.limit_left,
debtor.limit_status,
debtor.limit_type,
debtor.prosecution,
CASE
WHEN invoice.payed = 0 THEN 'Yes'
ELSE 'No'
END as deb_payment
FROM debtor,
debtor_ext,
company,
supplier,
invoice
WHERE ( company.company_id = supplier.company_id ) and
( supplier.supplier_id = debtor.supplier_id ) and
( debtor.debtor_id = debtor_ext.debtor_id ) and
( debtor.supplier_id = invoice.supplier_id ) and
( debtor.debtor_id_from_supplier = invoice.debtor_id_from_supplier )
CODE CORRECTED!
Hope you guys can help me

First of all, your query uses the old comma-separated join syntax with the join conditions in the WHERE clause. I'd advise you to use explicit INNER JOINs here, which are easier to read and make it harder to produce an accidental cross join.
Also, as the schema is not provided in the question, I am giving a solution for the schema below:
Supplier:
Supplier_Id (PK) | Supplier_Name
Debtor:
Debtor_Id (PK) | Debtor_Name | Supplier_Id (FK)
Invoice:
Invoice_Id (PK) | Supplier_Id (FK) | Debtor_Id (FK) | Amount | Payed
Query:
Select company.company_id,
company.companyname_1,
s.supplier_id,
s.suppliername_1,
d.debtor_id_from_supplier,
de.debtorname_1,
de.street,
de.street_number,
de.postcode,
de.city,
d.approved_limit,
d.limit_left,
d.limit_status,
d.limit_type,
d.prosecution,
tmp.deb_payment
From
(
SELECT d.supplier_id,
d.debtor_id,
(
CASE
-- payed is 0 (paid) or 1 (not paid), so max = 1 means at least one unpaid invoice
WHEN max(i.payed) = 1 Then 'Not Payed'
ELSE 'Payed'
END
)as deb_payment
FROM supplier s
inner join debtor d
on s.supplier_id = d.supplier_id
inner join invoice i
on i.supplier_id = d.supplier_id
and i.debtor_id = d.debtor_id
Group by d.supplier_id,d.debtor_id
) tmp
inner join supplier s
on s.supplier_id = tmp.supplier_id
inner join company
on company.company_id = s.company_id
inner join debtor d
on d.supplier_id = tmp.supplier_id
and d.debtor_id = tmp.debtor_id
inner join debtor_ext de
on d.debtor_id = de.debtor_id
;
Hope it helps!
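The core idea, collapsing many invoice rows into one status per debtor, can be tried on a reduced schema. This is a minimal sketch in Python's sqlite3 rather than Oracle; the two-table layout, the debtor names and the invoice rows are all assumptions for the illustration. The sketch uses MAX(payed) = 1 as the test, which flags a debtor as soon as any invoice is unpaid, including the case where every invoice is unpaid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, cut-down schema: payed is 0 (paid) or 1 (not paid).
    CREATE TABLE debtor (debtor_id INTEGER PRIMARY KEY, debtor_name TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                          debtor_id INTEGER, payed INTEGER);
    INSERT INTO debtor VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO invoice (debtor_id, payed) VALUES
        (1, 0), (1, 0),   -- Alpha: everything paid
        (2, 0), (2, 1),   -- Beta: one unpaid invoice
        (3, 1), (3, 1);   -- Gamma: nothing paid
    """
)

# One output row per debtor, regardless of how many invoices it has.
rows = conn.execute(
    """
    SELECT d.debtor_name,
           CASE WHEN MAX(i.payed) = 1 THEN 'Not Payed' ELSE 'Payed' END
               AS deb_payment
    FROM debtor d
    JOIN invoice i ON i.debtor_id = d.debtor_id
    GROUP BY d.debtor_id, d.debtor_name
    ORDER BY d.debtor_id
    """
).fetchall()

print(rows)  # [('Alpha', 'Payed'), ('Beta', 'Not Payed'), ('Gamma', 'Not Payed')]
```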

Making simple SQL more efficient

SQL Fiddle.
I'm having a slow start to the morning. I thought there was a more efficient way to make the following query using a join, instead of two independent selects -- am I wrong?
Keep in mind that I've simplified/reduced my query into this example for SO purposes, so let me know if you have any questions as well.
SELECT DISTINCT c.*
FROM customers c
WHERE c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
AND c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;
Sample Table Schemas
-- Simple tables to demonstrate point
CREATE TABLE customers (
customer_id serial,
name text
);
CREATE TABLE customers_cars (
customer_id integer,
car_make text
);
-- Populate tables
INSERT INTO customers(name) VALUES
('Joe Dirt'),
('Penny Price'),
('Wooten Nagen'),
('Captain Planet')
;
INSERT INTO customers_cars(customer_id,car_make) VALUES
(1,'BMW'),
(1,'Merc'),
(1,'Ford'),
(2,'BMW'),
(2,'BMW'), -- Notice car_make is not unique
(2,'Ferrari'),
(2,'Porche'),
(3,'BMW'),
(3,'Ford');
-- ids 1 and 3 both have BMW and Ford
Other Expectations
There are ~20 car_make in the database
There are typically 1-3 car_make per customer_id
There is expected to be not more than 50 car_make assignments per customer_id (generally 20-30)
The query is generally only going to look for 2-3 specific car_make per customer (e.g., BMW and Ford), but not 10-20

And here another option, don't know what the fastest one would be on large tables.
SELECT customers.*
FROM customers
JOIN customers_cars USING(customer_id)
WHERE car_make = ANY(ARRAY['BMW','Ford'])
GROUP BY
customer_id, name
HAVING array_agg(car_make) @> ARRAY['BMW','Ford'];
vol7ron:
Fiddle
The following is a modification of the above, using the same idea of an array for comparison. I'm not sure how much more efficient it would be than the dual-query approach, since it has to build an array in one pass and then do a more heavy-handed comparison of the array elements.
SELECT DISTINCT c.*
FROM customers c
WHERE customer_id IN (
select customer_id
from customers_cars
group by customer_id
having array_agg(car_make) @> ARRAY['BMW','Ford']
);

I would write it as
SELECT DISTINCT c.customer_id
FROM customers c
JOIN customers_cars cc_f on c.customer_id = cc_f.customer_id and cc_f.car_make = 'Ford'
JOIN customers_cars cc_b on c.customer_id = cc_b.customer_id and cc_b.car_make = 'BMW'
;
Whether this is better or not, I don't know. In some RDBMSs plain joins like this work better than subqueries, but I don't know about Postgres. From a readability point of view it is also questionable.

It seems to me that you are trying to find customers that have at least 1 BMW and at least 1 Ford car.
This query should get that for you:
SELECT
customers.customer_id
FROM
customers
INNER JOIN customers_cars ON
customers.customer_id = customers_cars.customer_id
AND customers_cars.car_make IN ('BMW', 'Ford')
GROUP BY
customers.customer_id
HAVING
COUNT(CASE WHEN car_make = 'BMW' THEN 1 ELSE NULL END) > 0
AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 ELSE NULL END) > 0
Make sure you have indexes on customers_cars.customer_id and customers_cars.car_make to achieve maximum performance.

You don't need to join to customers at all (given relational integrity).
Generally, this is a case of relational division. We assembled an arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
Unique combinations
If (customer_id, car_make) was defined unique in customers_cars, it would get much simpler:
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1
HAVING count(*) = 2;
Combinations not unique
Since (customer_id, car_make) is not unique, we need an extra step.
For only a few cars, your original query is not that bad. But (especially with duplicates!) EXISTS is typically faster than IN, and we don't need the final DISTINCT:
SELECT customer_id -- no DISTINCT needed.
FROM customers c
WHERE EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
Above query gets verbose and less efficient for a longer list of cars. For an arbitrary number of cars I suggest:
SELECT customer_id
FROM (
SELECT customer_id, car_make
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1, 2
) sub
GROUP BY 1
HAVING count(*) = 2;
SQL Fiddle.
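To see why the duplicate (2,'BMW') row matters, the count-based queries can be run against the question's sample data. This is a minimal sketch in Python's sqlite3 instead of Postgres, which is an assumption made for easy reproduction; the SQL used here is plain enough to behave the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT)")
conn.executemany(
    "INSERT INTO customers_cars VALUES (?, ?)",
    [
        (1, "BMW"), (1, "Merc"), (1, "Ford"),
        (2, "BMW"), (2, "BMW"), (2, "Ferrari"), (2, "Porche"),
        (3, "BMW"), (3, "Ford"),
    ],
)

# Valid only if (customer_id, car_make) is unique: customer 2 owns two BMWs,
# so count(*) = 2 lets it through even though it has no Ford.
naive = conn.execute(
    """
    SELECT customer_id
    FROM customers_cars
    WHERE car_make IN ('BMW', 'Ford')
    GROUP BY customer_id
    HAVING count(*) = 2
    ORDER BY customer_id
    """
).fetchall()

# Collapse duplicates in a subquery first, then count distinct makes.
deduplicated = conn.execute(
    """
    SELECT customer_id
    FROM (
        SELECT customer_id, car_make
        FROM customers_cars
        WHERE car_make IN ('BMW', 'Ford')
        GROUP BY customer_id, car_make
    ) sub
    GROUP BY customer_id
    HAVING count(*) = 2
    ORDER BY customer_id
    """
).fetchall()

print(naive)         # [(1,), (2,), (3,)]
print(deduplicated)  # [(1,), (3,)]
```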

How to ensure outer join with filter still returns all desired rows?

Imagine I have two tables in a DB like so:
products:
product_id name
----------------
1 Hat
2 Gloves
3 Shoes
sales:
product_id store_id sales
----------------------------
1 1 20
2 2 10
Now I want to do a query to list ALL products, and their sales, for store_id = 1. My first crack at it would be to use a left join, and filter to the store_id I want, or a null store_id, in case the product didn't get any sales at store_id = 1, since I want all the products listed:
SELECT name, coalesce(sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE store_id = 1 or store_id is null;
Of course, this doesn't work as intended, instead I get:
name sales
---------------
Hat 20
Shoes 0
No Gloves! This is because Gloves did get sales, just not at store_id = 1, so the WHERE clause has filtered them out.
How then can I get a list of ALL products and their sales for a specific store?
Here are some queries to create the test tables:
create temp table test_products as
select 1 as product_id, 'Hat' as name;
insert into test_products values (2, 'Gloves');
insert into test_products values (3, 'Shoes');
create temp table test_sales as
select 1 as product_id, 1 as store_id, 20 as sales;
insert into test_sales values (2, 2, 10);
UPDATE: I should note that I am aware of this solution:
SELECT name, case when store_id = 1 then sales else 0 end as sales
FROM test_products p
LEFT JOIN test_sales s ON p.product_id = s.product_id;
however, it is not ideal... in reality I need to create this query for a BI tool in such a way that the tool can simply add a where clause to the query and get the desired results. Inserting the required store_id into the correct place in this query is not supported by this tool. So I'm looking for other options, if there are any.

Move the WHERE condition into the LEFT JOIN clause so that no rows go missing.
SELECT p.name, coalesce(s.sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
AND s.store_id = 1;
Edit for additional request:
I assume you can manipulate the SELECT items? Then this should do the job:
SELECT p.name
,CASE WHEN s.store_id = 1 THEN coalesce(s.sales, 0) ELSE NULL END AS sales
FROM products p
LEFT JOIN sales s USING (product_id)
Also simplified the join syntax in this case.

I'm not near SQL, but give this a shot:
SELECT name, coalesce(sales, 0)
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id AND store_id = 1
You don't want a WHERE on the whole query, just a condition on your join.
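The difference between filtering in WHERE and filtering in the join condition shows up clearly on the question's sample data. This is a minimal sketch using Python's sqlite3; the ORDER BY p.product_id is an addition made only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER, name TEXT);
    CREATE TABLE sales (product_id INTEGER, store_id INTEGER, sales INTEGER);
    INSERT INTO products VALUES (1, 'Hat'), (2, 'Gloves'), (3, 'Shoes');
    INSERT INTO sales VALUES (1, 1, 20), (2, 2, 10);
    """
)

# Filter in WHERE: Gloves joins to its store 2 row, which the filter then removes.
where_rows = conn.execute(
    """
    SELECT p.name, coalesce(s.sales, 0)
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    WHERE s.store_id = 1 OR s.store_id IS NULL
    ORDER BY p.product_id
    """
).fetchall()

# Filter in ON: non-matching sales rows are dropped before the outer join
# pads the missing side with NULLs, so every product survives.
on_rows = conn.execute(
    """
    SELECT p.name, coalesce(s.sales, 0)
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
                     AND s.store_id = 1
    ORDER BY p.product_id
    """
).fetchall()

print(where_rows)  # [('Hat', 20), ('Shoes', 0)]
print(on_rows)     # [('Hat', 20), ('Gloves', 0), ('Shoes', 0)]
```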
