Compare 2 tables and find the missing record - sql

I have 2 database tables:
customers and customers_1
I have 100 customers in the customers table but only 99 customers in the customers_1 table. I would like to write a query that will compare the 2 tables and will result in the missing row.
I have tried this following SQL:
select * from customers c where in (select * from customers_1)
But this will only check for the one table.

Your query is not valid as written: WHERE needs a column to compare against the subquery. To find the missing row, compare one column to another and use NOT IN instead of IN:
select *
from customers c
where customerid not in (select customerid from customers_1)
However, since you are on SQL Server 2008, you can use EXCEPT:
SELECT * FROM customers
EXCEPT
SELECT * FROM customers_1;
This will give you the rows which are in the customers table that are not in customers_1 table:
EXCEPT returns any distinct values from the left query that are not
also found on the right query.
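To try both approaches without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite also supports NOT IN and EXCEPT). The column names customerid and name are assumptions, since the question does not show the schema, and the missing customer (42) is made up for the demo.

```python
import sqlite3

# Hypothetical schema: the question does not show the real columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE customers_1 (customerid INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"customer {i}") for i in range(1, 101)]
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
# customers_1 gets every row except customerid 42, so it holds 99 rows.
conn.executemany(
    "INSERT INTO customers_1 VALUES (?, ?)", [r for r in rows if r[0] != 42]
)

# Compare on the key column only.
not_in_rows = conn.execute(
    "SELECT * FROM customers "
    "WHERE customerid NOT IN (SELECT customerid FROM customers_1)"
).fetchall()

# Compare whole rows.
except_rows = conn.execute(
    "SELECT * FROM customers EXCEPT SELECT * FROM customers_1"
).fetchall()

print(not_in_rows)
print(except_rows)
```

Both queries return the single missing row here. Keep in mind that EXCEPT compares entire rows, so a customer that exists in both tables but differs in any column would also be reported, whereas the NOT IN version only looks at customerid.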

This is easy. Just join them with a left outer join and check for NULL in the table which has the 99 rows. It will look something like this.
SELECT * FROM customers c
LEFT JOIN customers_1 c1 ON c.some_key = c1.some_key
WHERE c1.some_key IS NULL

Instead of NOT IN, consider using NOT EXISTS. It often performs at least as well, and unlike NOT IN it still returns the expected rows when the compared column in the subquery contains NULLs. Your query would look like:
SELECT * FROM Customer c WHERE NOT EXISTS (SELECT 1 FROM Customer_1 c1 WHERE c.Customer_Id = c1.Customer_Id)
SELECT 1 is just for readability so everyone will know that I don't care about the actual data.
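The difference between NOT IN and NOT EXISTS is easiest to see when the second table contains a NULL key. The following is a small sketch with Python's sqlite3 module; the tables reuse the question's names, and the single customerid column and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER)")
conn.execute("CREATE TABLE customers_1 (customerid INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
# customers_1 is missing customer 3 and contains one NULL key.
conn.executemany("INSERT INTO customers_1 VALUES (?)", [(1,), (2,), (None,)])

# "3 NOT IN (1, 2, NULL)" evaluates to NULL rather than true,
# so the row for customer 3 is filtered out.
not_in_rows = conn.execute(
    "SELECT customerid FROM customers "
    "WHERE customerid NOT IN (SELECT customerid FROM customers_1)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute(
    "SELECT c.customerid FROM customers c "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM customers_1 c1 WHERE c1.customerid = c.customerid)"
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, the NOT IN query returns no rows at all, while NOT EXISTS still finds customer 3.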

Related

SQLite - How to select records from one table that are not in another table

I have a database with 3 tables; tblCustomers, tblBookings, tblFlights.
I want to find the customer's last name (LName), from the Customers table where the customers do not appear in the bookings table. It should return just three names, but it returns the names 10 times each. There are 10 records in the bookings table, so I think the command is returning the correct names, but not once...
I have tried:
SELECT tblCustomers.LName
FROM tblCustomers, tblBookings
WHERE tblCustomers.CustID
NOT IN (SELECT CustID FROM tblBookings)
How do I return just one instance of the name, not the name repeated 10 times?
You are doing a CROSS JOIN of the 2 tables.
Use only NOT IN:
SELECT LName
FROM tblCustomers
WHERE CustID NOT IN (SELECT CustID FROM tblBookings)
The (implicit) cross join on the bookings table in the outer query makes no sense, and it multiplies the customer rows.
Also, I would recommend not exists for filtering instead of not in: it usually performs better - with the right index in place, and it is null-safe:
SELECT c.LName
FROM tblCustomers c
WHERE NOT EXISTS (SELECT 1 FROM tblBookings b WHERE b.CustID = c.CustID)
For performance, make sure to have an index on tblBookings(CustID). Note that SQLite does not create one automatically when you declare a foreign key, so you may need to create it yourself.
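To see the row multiplication and the fix side by side, here is a rough sketch using Python's sqlite3 module. The sample names, the BookingID column, and the index name are assumptions; only tblCustomers, tblBookings, CustID and LName come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblCustomers (CustID INTEGER PRIMARY KEY, LName TEXT)")
# BookingID is a hypothetical column; the question does not show the schema.
conn.execute(
    "CREATE TABLE tblBookings (BookingID INTEGER PRIMARY KEY, CustID INTEGER)"
)
# Explicit index on the lookup column (the name is made up).
conn.execute("CREATE INDEX idx_bookings_custid ON tblBookings (CustID)")

conn.executemany(
    "INSERT INTO tblCustomers VALUES (?, ?)",
    [(1, "Adams"), (2, "Baker"), (3, "Clark"), (4, "Davis"), (5, "Evans")],
)
# Ten bookings, all made by customers 1 and 2.
conn.executemany(
    "INSERT INTO tblBookings (CustID) VALUES (?)",
    [(1 + i % 2,) for i in range(10)],
)

# Original query: the extra table in FROM pairs every unbooked customer
# with every booking row.
cross_rows = conn.execute(
    "SELECT tblCustomers.LName FROM tblCustomers, tblBookings "
    "WHERE tblCustomers.CustID NOT IN (SELECT CustID FROM tblBookings)"
).fetchall()

# Fixed query: one row per customer without a booking.
fixed_rows = conn.execute(
    "SELECT c.LName FROM tblCustomers c "
    "WHERE NOT EXISTS (SELECT 1 FROM tblBookings b WHERE b.CustID = c.CustID) "
    "ORDER BY c.LName"
).fetchall()

print(len(cross_rows))
print(fixed_rows)
```

Three unbooked customers times ten bookings gives thirty rows for the original query, which matches the "10 times each" symptom described in the question.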

SQL Join and contains

I'm struggling with this query:
SELECT *
FROM Transactions
WHERE CustomerID IN (SELECT ID FROM Customers
WHERE Name LIKE '%Test%')
It takes 10 seconds to run, however if I create the query manually by taking the 4 values returned by the sub query it runs in milliseconds, for example:
SELECT *
FROM Transactions
WHERE (CustomerID = 1 OR CustomerID = 2 OR
CustomerID = 3 OR CustomerID = 4)
To clarify, running
SELECT ID FROM Customers WHERE Name LIKE '%Test%'
returns the values 1,2,3,4 immediately
Any ideas? What am I missing?
As you already said, when you have the customer IDs it runs in milliseconds, so the filtering on the customer name is the problem.
The leading wildcard (WHERE Name LIKE '%Test%') is the suspect here: an ordinary index on Name cannot be used to seek to the matches, so SQL Server has to read every string in the Name column and search it for "Test", for every row in the table.
If the names you are filtering for always start with "Test" and you could use WHERE Name LIKE 'Test%', it would work much better because SQL Server only needs to look at the start of each string.
Edit:
Here is a little bit different version of the original query if you want to try:
SELECT * FROM Transactions t
WHERE EXISTS (
SELECT 1 FROM
Customers c
WHERE c.ID = t.CustomerID
AND c.Name LIKE '%Test%'
)
What happens with a join?
SELECT t.*
FROM Transactions t JOIN
Customers c
ON t.CustomerID = c.ID
WHERE c.Name Like '%Test%';
Sometimes, JOINs optimize better than IN.
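Before swapping one form for another, it is worth checking that they return the same rows. Here is a minimal sketch with Python's sqlite3 module; the TransactionID column and the sample data are assumptions, and since this runs on SQLite it says nothing about how SQL Server will plan each variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT)")
# TransactionID is a hypothetical column added for the demo.
conn.execute(
    "CREATE TABLE Transactions (TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Test Shop"), (2, "Alice"), (3, "Bob Test"), (4, "Carol")],
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4)],
)

in_rows = conn.execute(
    "SELECT * FROM Transactions "
    "WHERE CustomerID IN (SELECT ID FROM Customers WHERE Name LIKE '%Test%') "
    "ORDER BY TransactionID"
).fetchall()

exists_rows = conn.execute(
    "SELECT * FROM Transactions t "
    "WHERE EXISTS (SELECT 1 FROM Customers c "
    "              WHERE c.ID = t.CustomerID AND c.Name LIKE '%Test%') "
    "ORDER BY t.TransactionID"
).fetchall()

join_rows = conn.execute(
    "SELECT t.* FROM Transactions t "
    "JOIN Customers c ON t.CustomerID = c.ID "
    "WHERE c.Name LIKE '%Test%' "
    "ORDER BY t.TransactionID"
).fetchall()

print(in_rows)
```

The three results agree here because Customers.ID is unique, so the join cannot duplicate a transaction. Which form is fastest has to be measured on the real database.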

Using BETWEEN with a subquery postgres

I need to create a query to get all attendance records of an employee within a time limit, but the time limit comes from a different table. I need a query like the one below, but I don't know how to write it:
SELECT * FROM attendance WHERE employeeid = 25 AND attendance_date BETWEEN (SELECT bill_fromdate,bill_todate FROM bill WHERE bill_id = 21487)
I am using PostgreSQL 8.4.
You could use a join instead of a subquery:
SELECT *
FROM attendance a
JOIN bill b ON
a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate
WHERE a.employeeid = 25 AND b.bill_id = 21487
Either use a JOIN (as in Mureinik's answer) or use a sub-select with an exists condition:
SELECT a.*
FROM attendance a
WHERE a.employeeid = 25
AND exists (select 1
from bill b
where b.bill_id = 21487
and a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate)
Given your example query, most probably there isn't a difference between using the join or the sub-select.
But they have different meanings and a join could return a different result (i.e. more rows) than the sub-select (but again I doubt it in this situation).
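To illustrate when the two forms diverge, here is a contrived sketch using Python's sqlite3 module. It assumes that bill_id is not unique, so two bill rows with overlapping date ranges match the filter; that assumption, the ISO-formatted text dates, and the sample data are all made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (employeeid INTEGER, attendance_date TEXT)")
# Assumption: bill_id is NOT unique here, purely to show the difference.
conn.execute(
    "CREATE TABLE bill (bill_id INTEGER, bill_fromdate TEXT, bill_todate TEXT)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?)",
    [(25, "2024-01-05"), (25, "2024-01-20"), (25, "2024-02-10"), (30, "2024-01-10")],
)
conn.executemany(
    "INSERT INTO bill VALUES (?, ?, ?)",
    [(21487, "2024-01-01", "2024-01-31"), (21487, "2024-01-15", "2024-02-15")],
)

# JOIN: one output row per (attendance, bill) pair that matches.
join_rows = conn.execute(
    "SELECT a.attendance_date FROM attendance a "
    "JOIN bill b ON a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate "
    "WHERE a.employeeid = 25 AND b.bill_id = 21487 "
    "ORDER BY a.attendance_date"
).fetchall()

# EXISTS: each attendance row appears at most once.
exists_rows = conn.execute(
    "SELECT a.attendance_date FROM attendance a "
    "WHERE a.employeeid = 25 "
    "AND EXISTS (SELECT 1 FROM bill b "
    "            WHERE b.bill_id = 21487 "
    "            AND a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate) "
    "ORDER BY a.attendance_date"
).fetchall()

print(join_rows)
print(exists_rows)
```

The date 2024-01-20 falls inside both bill ranges, so the join returns it twice while EXISTS returns it once. If bill_id is a primary key, as is likely in the question, at most one bill row matches and both queries return the same rows.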

How to have a set of combination as a condition in SQL?

I'm working on an integration project and have created a batchlog table where I store which combinations have been exported already and then check new data against that batchlog table. It worked pretty well as long as I mostly just stored one ID in batchlog table, let's say Customer ID and then selected new rows from Customer table like this:
SELECT *
FROM Customer
WHERE CusId NOT IN (SELECT CusID FROM IntegrationBatchlog)
However, the solution is now more complex: the same row from the Customer table will be exported several times in combination with other data. I now have a couple of separate stored procedures, more columns in the IntegrationBatchlog table (CusID, OrdertypeID and PaymentMethod), and join clauses in my select, so it looks more like this:
SELECT * FROM Customer c
JOIN....
JOIN...
JOIN...
WHERE there is not a row with that CusID AND OrderTypeID AND PaymentMethod in batchlog table yet.
So I need to check whether this exact combination has already been exported. How do you do that when the batchlog table has three ID columns and you want to exclude only those rows where all three IDs are already present in the same batchlog row?
One way is to do a LEFT JOIN to the IntegrationBatchLog table and keep only the rows that have no match there.
select *
from Customer c
LEFT OUTER JOIN IntegrationBatchLog i
on c.CusId = i.CusId
and c.OrderTypeID = i.OrderTypeID
and c.PaymentMethod = i.PaymentMethod
where
i.CusId is null
Use NOT EXISTS, not NOT IN. This allows matching on multiple columns, and it is standard SQL:
SELECT * FROM Customer c
JOIN....
JOIN...
JOIN...
WHERE NOT EXISTS (
SELECT * FROM IntegrationBatchlog I
WHERE C.CusID = I.CusID
AND C.OrderTypeID = I.OrderTypeID
AND C.PaymentMethod = I.PaymentMethod)
SELECT ...
FROM Customer c JOIN ...
WHERE NOT EXISTS (SELECT *
FROM IntegrationBatchLog I
WHERE I.CusID = c.CusId AND
I.OrderTypeId = c.OrderTypeID ...)
Maybe NOT EXISTS would work here. Here's an example from the MySQL docs (I don't know your DB) - http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
The second example:
"What kind of store is present in no cities?"
SELECT DISTINCT store_type FROM stores
WHERE NOT EXISTS (SELECT * FROM cities_stores
WHERE cities_stores.store_type = stores.store_type);
Maybe yours could be:
SELECT * FROM Customer c
JOIN....
JOIN...
JOIN...
WHERE NOT EXISTS (
SELECT * FROM batchlog WHERE
c.CusID = batchlog.CusID AND
c.OrderTypeID = batchlog.OrderTypeID AND
c.PaymentMethod = batchlog.PaymentMethod
)
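The multi-column check can be tried out with a small sketch in Python's sqlite3 module. Because the joins in the question are elided, a hypothetical ExportCandidates table stands in for the joined result that supplies CusID, OrderTypeID and PaymentMethod; the sample values are likewise made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ExportCandidates is a hypothetical stand-in for "Customer JOIN ... JOIN ...".
conn.execute(
    "CREATE TABLE ExportCandidates (CusID INTEGER, OrderTypeID INTEGER, PaymentMethod TEXT)"
)
conn.execute(
    "CREATE TABLE IntegrationBatchlog (CusID INTEGER, OrderTypeID INTEGER, PaymentMethod TEXT)"
)
conn.executemany(
    "INSERT INTO ExportCandidates VALUES (?, ?, ?)",
    [(1, 10, "card"), (1, 10, "cash"), (1, 20, "card"), (2, 10, "card")],
)
# Already exported: one exact match, and one row that matches customer 2
# on only two of the three columns.
conn.executemany(
    "INSERT INTO IntegrationBatchlog VALUES (?, ?, ?)",
    [(1, 10, "card"), (2, 10, "cash")],
)

order_by = " ORDER BY c.CusID, c.OrderTypeID, c.PaymentMethod"

not_exists_rows = conn.execute(
    "SELECT c.* FROM ExportCandidates c "
    "WHERE NOT EXISTS (SELECT 1 FROM IntegrationBatchlog i "
    "                  WHERE i.CusID = c.CusID "
    "                  AND i.OrderTypeID = c.OrderTypeID "
    "                  AND i.PaymentMethod = c.PaymentMethod)" + order_by
).fetchall()

left_join_rows = conn.execute(
    "SELECT c.* FROM ExportCandidates c "
    "LEFT JOIN IntegrationBatchlog i "
    "  ON i.CusID = c.CusID "
    "  AND i.OrderTypeID = c.OrderTypeID "
    "  AND i.PaymentMethod = c.PaymentMethod "
    "WHERE i.CusID IS NULL" + order_by
).fetchall()

print(not_exists_rows)
```

Only the combination (1, 10, 'card') is excluded, because it is the only one where all three values match a single batchlog row. The LEFT JOIN variant gives the same rows in this data.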

INNER JOIN vs IN

SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = @StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = @StockID)
Which is considered the correct (syntactically) and most performant approach and why?
The syntax in the latter example seems more logical to me but my assumption is the JOIN will be faster.
I have looked at the query plans and haven't been able to decipher anything from them.
The two syntaxes serve different purposes. Using the Join syntax presumes you want something from both the StockToCategory and Category table. If there are multiple entries in the StockToCategory table for each category, the Category table values will be repeated.
Using the IN function presumes that you want only items from the Category whose ID meets some criteria. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example, they will produce the same output. However, IMO the latter syntax makes your intent (only wanting categories) clearer.
Btw, yet a third syntax which is similar to using the IN function:
Select ...
From Category
Where Exists (
Select 1
From StockToCategory
Where StockToCategory.CategoryId = Category.CategoryId
And StockToCategory.StockID = @StockId
)
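The point about repeated rows can be checked with a short sketch in Python's sqlite3 module. It assumes StockToCategory has no unique constraint, so the same (StockID, CategoryID) pair can appear twice; the Name column and the sample data are also assumptions, and a ? placeholder takes the place of the StockID parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT)")
# Assumption: no unique constraint, so duplicate pairs are possible.
conn.execute("CREATE TABLE StockToCategory (StockID INTEGER, CategoryID INTEGER)")
conn.executemany(
    "INSERT INTO Category VALUES (?, ?)",
    [(1, "Tools"), (2, "Garden"), (3, "Toys")],
)
conn.executemany(
    "INSERT INTO StockToCategory VALUES (?, ?)",
    [(7, 1), (7, 1), (7, 2), (8, 3)],
)

stock_id = 7

join_rows = conn.execute(
    "SELECT C.* FROM StockToCategory STC "
    "INNER JOIN Category C ON STC.CategoryID = C.CategoryID "
    "WHERE STC.StockID = ? ORDER BY C.CategoryID",
    (stock_id,),
).fetchall()

in_rows = conn.execute(
    "SELECT * FROM Category "
    "WHERE CategoryID IN (SELECT CategoryID FROM StockToCategory WHERE StockID = ?) "
    "ORDER BY CategoryID",
    (stock_id,),
).fetchall()

exists_rows = conn.execute(
    "SELECT * FROM Category "
    "WHERE EXISTS (SELECT 1 FROM StockToCategory "
    "              WHERE StockToCategory.CategoryID = Category.CategoryID "
    "              AND StockToCategory.StockID = ?) "
    "ORDER BY CategoryID",
    (stock_id,),
).fetchall()

print(join_rows)
print(in_rows)
```

The join repeats the Tools category once per matching StockToCategory row, while IN and EXISTS return each category once. If (StockID, CategoryID) is unique in the real table, all three return the same rows.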
Syntactically (semantically too) these are both correct. In terms of performance they are effectively equivalent, in fact I would expect SQL Server to generate the exact same physical plans for these two queries.
I think these are just two ways to specify the same desired result.
For SQLite:
Table device_group_folders contains 10 records.
Table device_groups contains ~100000 records.
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN: <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);
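For anyone who wants to reproduce this comparison, here is a tiny sketch with Python's sqlite3 module that runs the three query shapes on a toy hierarchy and confirms they return the same rows. The uuid values and the table contents are made up, and the sketch does not measure timing: the numbers above depend on the real data volume and on which indexes exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_group_folders (uuid TEXT PRIMARY KEY, parent TEXT)")
conn.execute("CREATE TABLE device_groups (uuid TEXT PRIMARY KEY, parent TEXT)")
# Toy hierarchy: root -> a -> b, plus an unrelated folder x.
conn.executemany(
    "INSERT INTO device_group_folders VALUES (?, ?)",
    [("root", None), ("a", "root"), ("b", "a"), ("x", None)],
)
conn.executemany(
    "INSERT INTO device_groups VALUES (?, ?)",
    [("g1", "root"), ("g2", "a"), ("g3", "b"), ("g4", "x")],
)

# Shared recursive CTE: the start folder and all folders below it.
CTE = """
WITH RECURSIVE select_childs(uuid) AS (
    SELECT uuid FROM device_group_folders WHERE uuid = 'root'
    UNION ALL
    SELECT device_group_folders.uuid
    FROM device_group_folders
    INNER JOIN select_childs ON parent = select_childs.uuid
)
"""

join_rows = conn.execute(CTE + """
SELECT device_groups.uuid
FROM select_childs
INNER JOIN device_groups ON device_groups.parent = select_childs.uuid
ORDER BY device_groups.uuid
""").fetchall()

where_rows = conn.execute(CTE + """
SELECT device_groups.uuid
FROM select_childs, device_groups
WHERE device_groups.parent = select_childs.uuid
ORDER BY device_groups.uuid
""").fetchall()

in_rows = conn.execute("""
SELECT device_groups.uuid
FROM device_groups
WHERE device_groups.parent IN (""" + CTE + """ SELECT * FROM select_childs)
ORDER BY device_groups.uuid
""").fetchall()

print(join_rows)
```

To compare speed on real data, wrap each execute call with time.perf_counter() and run it several times, since a single measurement can be dominated by caching effects.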