How to select records which don't exist in another table or have a different status? - sql

I am trying to select records from a temp table based on another temp table which holds their previous statuses (StatusHistory).
So if the record doesn't exist in the status table, then it should be selected. If the status of the record is different than the one in the StatusHistory table, then the record should be selected. Otherwise, if it exists with the same status in the StatusHistory table, then it should be ignored.
I have this SQL but it doesn't seem to be the best solution. Can you please point me to a better way to achieve that assuming that there are thousands of records in the tables? Would it be possible to achieve the same result with a JOIN statement?
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS (
SELECT * FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
) OR NOT AI.itemStatus IN ( SELECT HS.itemStatusHistory
FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
AND HS.itemId = AI.itemId )

Yes, you can do this with a LEFT JOIN.
SELECT AI.item
FROM #AllItems AI
LEFT JOIN #StatusHistory HS ON AI.itemId = HS.itemId
AND AI.itemStatus = HS.itemStatusHistory
WHERE HS.itemId IS NULL
A better solution, however, is to use NOT EXISTS:
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS
(
SELECT 1 FROM #StatusHistory SH
WHERE SH.itemId = AI.itemId
AND SH.itemStatusHistory = AI.itemStatus
);
As pointed out by Aaron, this usually performs better than a LEFT JOIN.

Related

How do I get a records for a table that do not exist in another table?

I Have a staging table which gets updated as part of scheduled batch script.
Lets call this table Staging_Table
Now on a daily basis I update a tabled called Product_Table with entries from Staging_Table.
I need to delete the rows in Products_Table which do not have entries from the Staging table.
Now to uncomplicate things the staging table hold around 97000 records while the producst table has only 7000. However ona daily basis the entries in the staging table go up by 97000. i have a key for these products called TDC_IDP_ID....
So i have this query which seems to be taking forever to execute...
DELETE FROM Product_Table
WHERE PRODUCT_TD_PARTCODE NOT IN ( SELECT TDC_TD_PARTCODE FROM Staging_Table WHERE TDC_IDP_ID = #TDC_IDP_ID )
Now the inner query has 97000 records. How can i optimise this query(to atleast run) or is there another way to go about this? Instead of select i tried the following query and it is still running as i type this question. Its been 11 minutes that it is running....
SELECT COUNT(*)
FROM Product_Table
WHERE PRODUCT_TD_PARTCODE NOT IN ( SELECT TDC_TD_PARTCODE FROM Staging_Table WHERE TDC_IDP_ID = #TDC_IDP_ID )
First, rephrase the index as not exists:
DELETE FROM Product_Table
WHERE NOT EXISTS (SELECT 1
FROM Staging_Table st
WHERE st.TDC_IDP_ID = #TDC_IDP_ID AND
st.TDC_TD_PARTCODE = product_table.PRODUCT_TD_PARTCODE
);
Then you want an index on the staging table:
create index idx_Staging_Table_2 on Staging_Table(TDC_TD_PARTCODE, TDC_IDP_ID);
Use LEFT JOIN instead of NOT IN
Try this:
SELECT COUNT(*)
FROM Product_Table PT
LEFT OUTER JOIN Staging_Table ST ON PT.PRODUCT_TD_PARTCODE = ST.TDC_TD_PARTCODE AND ST.TDC_IDP_ID = #TDC_IDP_ID
WHERE ST.TDC_TD_PARTCODE IS NULL
DELETE PT
FROM Product_Table PT
LEFT OUTER JOIN Staging_Table ST ON PT.PRODUCT_TD_PARTCODE = ST.TDC_TD_PARTCODE AND ST.TDC_IDP_ID = #TDC_IDP_ID
WHERE ST.TDC_TD_PARTCODE IS NULL
For these heavy data you must use LEFT JOIN, and the other thing 'IN/ NOT IN' will make you query so heavy to execute and the run time will be more. Use of join will give you faster execution. In your case use left join

Delete with "Join" in Oracle sql Query

I am not deeply acquainted with Oracle Sql Queries, therefore I face a problem on deleting some rows from a table which must fulfill a constraint which includes fields of another (joining) table. In other words I want to write a query to delete rows including JOIN.
In my case I have a table ProductFilters and another table Products joined on fields ProductFilters.productID = Products.ID. I want to delete the rows from ProductFilters having an ID higher or equal to 200 and the product they refer has the name 'Mark' (name is a field in Product).
I would like to be informed initially if JOIN is acceptable in a Delete Query in Oracle. If not how should I modify this Query in order to make it work, since on that form I receive an error:
DELETE From PRODUCTFILTERS pf
where pf.id>=200
And pf.rowid in
(
Select rowid from PRODUCTFILTERS
inner join PRODUCTS on PRODUCTFILTERS.PRODUCTID = PRODUCTS.ID
And PRODUCTS.NAME= 'Mark'
);
Recently I learned of the following syntax:
DELETE (SELECT *
FROM productfilters pf
INNER JOIN product pr
ON pf.productid = pr.id
WHERE pf.id >= 200
AND pr.NAME = 'MARK')
I think it looks much cleaner then other proposed code.
Based on the answer I linked to in my comment above, this should work:
delete from
(
select pf.* From PRODUCTFILTERS pf
where pf.id>=200
And pf.rowid in
(
Select rowid from PRODUCTFILTERS
inner join PRODUCTS on PRODUCTFILTERS.PRODUCTID = PRODUCTS.ID
And PRODUCTS.NAME= 'Mark'
)
);
or
delete from PRODUCTFILTERS where rowid in
(
select pf.rowid From PRODUCTFILTERS pf
where pf.id>=200
And pf.rowid in
(
Select PRODUCTFILTERS.rowid from PRODUCTFILTERS
inner join PRODUCTS on PRODUCTFILTERS.PRODUCTID = PRODUCTS.ID
And PRODUCTS.NAME= 'Mark'
)
);
Use a subquery in the where clause. For a delete query requirig a join, this example will delete rows that are unmatched in the joined table "docx_document" and that have a create date > 120 days in the "docs_documents" table.
delete from docs_documents d
where d.id in (
select a.id from docs_documents a
left join docx_document b on b.id = a.document_id
where b.id is null
and floor(sysdate - a.create_date) > 120
);
Personally, I would use the EXISTS construct. As described in the examples on this web page:
DELETE ProductFilters pf
WHERE EXISTS (
SELECT *
FROM Products p
WHERE p."productID"=pf."productID"
AND p.NAME= 'Mark'
)
AND pf."id">=200;
Please use a subquery
delete from productfilters
where productid in (Select id from products where name='Mark') and Id>200;

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1)
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2)
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3)
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
Do inner join to get results for (A) only and do left join to get results for (B) only (you will have to put something like this in the where clause: filterN.column is null) combine results from inner join and left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that is not a part of MASTER you will have "null columns" where the slots are missing
I think that is what you looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions on you problems.
But I think you could go for a solution where you split your initial search problem into a matter of constructing the set of ids from the master table, and then select the data joining on that set of ids. Here I naturally assume you have a kind of ID on your master table. The filter tables contains the filter values only. This could then be combined into the statement below, where each SELECT in the eligble subset provides a set of master ids, these are unioned to avoid duplicates and that set of ids are joined to the table with data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.

INNER JOIN vs IN

SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = #StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = #StockID)
Which is considered the correct (syntactically) and most performant approach and why?
The syntax in the latter example seems more logical to me but my assumption is the JOIN will be faster.
I have looked at the query plans and havent been able to decipher anything from them.
Query Plan 1
Query Plan 2
The two syntaxes serve different purposes. Using the Join syntax presumes you want something from both the StockToCategory and Category table. If there are multiple entries in the StockToCategory table for each category, the Category table values will be repeated.
Using the IN function presumes that you want only items from the Category whose ID meets some criteria. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example, they will produce the same output however IMO, the later syntax makes your intent (only wanting categories), clearer.
Btw, yet a third syntax which is similar to using the IN function:
Select ...
From Category
Where Exists (
Select 1
From StockToCategory
Where StockToCategory.CategoryId = Category.CategoryId
And StockToCategory.Stock = #StockId
)
Syntactically (semantically too) these are both correct. In terms of performance they are effectively equivalent, in fact I would expect SQL Server to generate the exact same physical plans for these two queries.
T think There are just two ways to specify the same desired result.
for sqlite
table device_group_folders contains 10 records
table device_groups contains ~100000 records
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);

MYSQL join - reference external field from nested select?

Is it allowed to reference external field from nested select?
E.g.
SELECT
FROM ext1
LEFT JOIN (SELECT * FROM int2 WHERE int2.id = ext1.some_id ) as x ON 1=1
in this case, this is referencing ext1.some_id in nested select.
I am getting errors in this case that field ext1.some_id is unknow.
Is it possible? Is there some other way?
UPDATE:
Unfortunately, I have to use nested select, since I am going to add more conditions to it, such as LIMIT 0,1
and then I need to use a second join on the same table with LIMIT 1,1 (to join another row)
The ultimate goal is to join 2 rows from the same table as if these were two tables
So I am kind of going to "spread" a few related rows into one long row.
The answer to your initial question is: No, remove your sub-query and put the condition into the ON-clause:
SELECT *
FROM ext1
LEFT JOIN int2 ON ( int2.id = ext1.some_id )
One solution could be to use variables to find the first (or second) row, but this solution would not work efficiently with indexes, so you might end up with performance problems.
SELECT ext1.some_id, int2x.order_col, int2x.something_else
FROM ext1
LEFT JOIN (SELECT `int2`.*, #i:=IF(#id=(#id:=id), #i+1, 0) As rank
FROM `int2`,
( SELECT #i:=0, #id:=-1 ) v
ORDER BY id, order_col ) AS int2x ON ( int2x.id = ext1.some_id
AND int2x.rank = 0 )
;
This assumes that you have a column that you want to order by (order_col) and Left Joins the first row per some_id.
Do you mean this?
SELECT ...
FROM ext1
LEFT JOIN int2 ON int2.id=ext1.some_id
That's what the ON clause is for:
SELECT
FROM ext1
LEFT JOIN int2 AS x ON x.id = ext1.some_id