sql query logic - sql server 2008 - sql

is there a way to improve this query..
INSERT INTO mastertable
VALUES (SELECT *
FROM staging_tbl s
WHERE s.pac NOT IN (SELECT pac
FROM mastertable)
AND s.store NOT IN (SELECT store
FROM mastertable))
Not sure if this will work at first place.. Basically..want to select records from Staging_Tbl only if same PAC-STORE combination do not currently exist.. If PAC exist but for another STORE..yes, we should select and vice versa.
For eg: Should if MasterTable is as below,
PAC1 STORE1
PAC1 STORE2
PAC2 STORE1
PAC2 STORE2
I should insert only if there is a record like PAC1 STORE3 in the staging table..
and NOT PAC1 STORE2

Do you have indexes on those columns..that will make a change
you can also use NOT EXISTS
INSERT INTO MASTERTABLE
SELECT * FROM Staging_Tbl S
WHERE NOT EXISTS ( SELECT 1 FROM MasterTable M
WHERE S.STORE = M.STORE
AND S.PAC = M.PAC)
Or A LEFT JOIN
INSERT INTO MASTERTABLE
SELECT S.* FROM Staging_Tbl S
LEFT OUTER JOIN MasterTable M
ON S.STORE = M.STORE
AND S.PAC = M.PAC
WHERE M.PAC IS NULL
AND M.STORE IS NULL
Except, make sure to test performance with this one
INSERT INTO MASTERTABLE
SELECT * FROM Staging_Tbl
EXCEPT
SELECT * FROM MASTERTABLE
I myself like NOT EXISTS the best
See also Select all rows from one table that don't exist in another table for usage of OUTER APPLY and EXCEPT to do the same

INSERT MASTERTABLE
SELECT * FROM Staging_Tbl S
WHERE NOT EXISTS
(SELECT 1 FROM MASTERTABLE M
WHERE M.PAC = S.PAC AND M.STORE = S.STORE)

Related

From 2 tables, get rows unique to each table

Starting with 2 tables, I want to get all rows with value in a certain column(cName) that is present on 1 table but not the other. I want to do this for both tables. I found a solution to use LEFT JOIN which gives me solution for 1 of the tables and I used UNION to combine. Is this a good way to do this or is there a better way?
select *
from College C1 LEFT JOIN myTestTable T1 on C1.cName = T1.cName
where T1.cName IS NULL
UNION
select *
from myTestTable T1 LEFT JOIN College C1 on T1.cName = C1.cName
where C1.cName IS NULL
You can use full join with a where:
SELECT *
FROM College C1 FULL JOIN
myTestTable T1
ON C1.cName = T1.cName
WHERE T1.cName IS NULL OR C1.cName IS NULL;
I prefer anti-join (NOT EXISTS) operators rather than LEFT JOIN. For one, if CName is not unique the left join produces multiple rows which the UNION must eliminate.
select * from College C1
WHERE NOT EXISTS (SELECT 1 FROM myTestTable T1 WHERE C1.cName = T1.cName)
UNION
select * from myTestTable T1
WHERE NOT EXISTS (SELECT 1 FROM College C1 WHERE T1.cName = C1.cName);
If indexes aren't available on CName you'll have some table scans with either LEFT JOIN or the NOT EXISTS.
You could also do this:
select * from College
union all
select * from myTestTable
MINUS ( select * from College intersect select * from myTestTable );

SQL: JOIN problem using temp tables and one column

I am created two temp tables in which TABLE1 contains all the items and TABLE2 only has the partial list of TABLE1. How can I find out which parts TABLE 1 has that TABLE2 doesn't have or vice versa? Please keep in mind, the temp table only has one column due to the DISTINCT statement.
I do have to use Joins but my thought is if I JOIN on the individual columns of each table and then in the Where clause state that e.g. column 1 is not equal column 2, it's contradicting.
IF EXISTS (
SELECT *
FROM tempdb.dbo.sysobjects
WHERE id = Object_id(N'tempdb..#TABLE1')
)
BEGIN
DROP TABLE #TABLE1
END
IF EXISTS (
SELECT *
FROM tempdb.dbo.sysobjects
WHERE id = Object_id(N'tempdb..#TABLE2')
)
BEGIN
DROP TABLE #TABLE2
END
------------------------------------------------
select distinct 1.parts as #TABLE1 from List1 1 --- MAIN LIST
select distinct 2.parts as #TABLE2 from List2 2 --- ADDITIONAL LIST
select *
from #TABLE2 left join
#TABLE1
on 2.parts = 1.parts
where 2.parts <> 1.parts
Your where clause is undoing the left join. I would recommend not exists:
select t1.*
from #table1 t1
where not exists (select 1 from #table2 t2 where t2.parts = t1.parts);

How to take distinct values in hive join

I need to take the distinct values from Table 2 while joining with Table 1 in Hive. Because the table 2 has duplicate records.
Considering below join condition is it possible to take only distinct key_col from table 2? i dont want to use select distinct * from ...
select * from Table_1 a left join Table_2 b on a.key_col = b.key_col
Note: This is in Hive
Use Left semi join. This will give you all the record in table1 which exist in table2(duplicate record) without duplicates.
select a.* from Table_1 a left semi join Table_2 b on a.key_col = b.key_col

SQL query to find record with ID not in another table

I have two tables with binding primary key in database and I desire to find a disjoint set between them. For example,
Table1 has columns (ID, Name) and sample data: (1 ,John), (2, Peter), (3, Mary)
Table2 has columns (ID, Address) and sample data: (1, address2), (2, address2)
So how do I create a SQL query so I can fetch the row with ID from table1 that is not in table2. In this case, (3, Mary) should be returned?
PS: The ID is the primary key for those two tables.
Try this
SELECT ID, Name
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table2)
Use LEFT JOIN
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
on a.ID = b.ID
WHERE b.id IS NULL
There are basically 3 approaches to that: not exists, not in and left join / is null.
LEFT JOIN with IS NULL
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON r.value = l.value
WHERE r.value IS NULL
NOT IN
SELECT l.*
FROM t_left l
WHERE l.value NOT IN
(
SELECT value
FROM t_right r
)
NOT EXISTS
SELECT l.*
FROM t_left l
WHERE NOT EXISTS
(
SELECT NULL
FROM t_right r
WHERE r.value = l.value
)
Which one is better? The answer to this question might be better to be broken down to major specific RDBMS vendors. Generally speaking, one should avoid using select ... where ... in (select...) when the magnitude of number of records in the sub-query is unknown. Some vendors might limit the size. Oracle, for example, has a limit of 1,000. Best thing to do is to try all three and show the execution plan.
Specifically form PostgreSQL, execution plan of NOT EXISTS and LEFT JOIN / IS NULL are the same. I personally prefer the NOT EXISTS option because it shows better the intent. After all the semantic is that you want to find records in A that its pk do not exist in B.
Old but still gold, specific to PostgreSQL though: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
Fast Alternative
I ran some tests (on postgres 9.5) using two tables with ~2M rows each. This query below performed at least 5* better than the other queries proposed:
-- Count
SELECT count(*) FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2;
-- Get full row
SELECT table1.* FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2 JOIN table1 ON t1_not_in_t2.id=table1.id;
Keeping in mind the points made in #John Woo's comment/link above, this is how I typically would handle it:
SELECT t1.ID, t1.Name
FROM Table1 t1
WHERE NOT EXISTS (
SELECT TOP 1 NULL
FROM Table2 t2
WHERE t1.ID = t2.ID
)
SELECT COUNT(ID) FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For count
SELECT ID FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For results

An issue possibly related to Cursor/Join

Here is my situation:
Table one contains a set of data that uses an id for an unique identifier. This table has a one to many relationship with about 6 other tables such that.
Given Table 1 with Id of 001:
Table 2 might have 3 rows with foreign key: 001
Table 3 might have 12 rows with foreign key: 001
Table 4 might have 0 rows with foreign key: 001
Table 5 might have 28 rows with foreign key: 001
I need to write a report that lists all of the rows from Table 1 for a specified time frame followed by all of the data contained in the handful of tables that reference it.
My current approach in pseudo code would look like this:
select * from table 1
foreach(result) {
print result;
select * from table 2 where id = result.id;
foreach(result2) {
print result2;
}
select * from table 3 where id = result.id
foreach(result3) {
print result3;
}
//continued for each table
}
This means that the single report can run in the neighbor hood of 1000 queries. I know this is excessive however my sql-fu is a little weak and I could use some help.
LEFT OUTER JOIN Tables2-N on Table1
SELECT Table1.*, Table2.*, Table3.*, Table4.*, Table5.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.ID
LEFT OUTER JOIN Table3 ON Table1.ID = Table3.ID
LEFT OUTER JOIN Table4 ON Table1.ID = Table4.ID
LEFT OUTER JOIN Table5 ON Table1.ID = Table5.ID
WHERE (CRITERIA)
Join doesn't do it for me. I hate having to de-tangle the data on the client side. All those nulls from left-joining.
Here's a set-based solution that doesn't use Joins.
INSERT INTO #LocalCollection (theKey)
SELECT id
FROM Table1
WHERE ...
SELECT * FROM Table1 WHERE id in (SELECT theKey FROM #LocalCollection)
SELECT * FROM Table2 WHERE id in (SELECT theKey FROM #LocalCollection)
SELECT * FROM Table3 WHERE id in (SELECT theKey FROM #LocalCollection)
SELECT * FROM Table4 WHERE id in (SELECT theKey FROM #LocalCollection)
SELECT * FROM Table5 WHERE id in (SELECT theKey FROM #LocalCollection)
Ah! Procedural! My SQL would look like this, if you needed to order the results from the other tables after the results from the first table.
Insert Into #rows Select id from Table1 where date between '12/30' and '12/31'
Select * from Table1 t join #rows r on t.id = r.id
Select * from Table2 t join #rows r on t.id = r.id
--etc
If you wanted to group the results by the initial ID, use a Left Outer Join, as mentioned previously.
You may be best off to use a reporting tool like Crystal or Jasper, or even XSL-FO if you are feeling bold. They have things built in to handle specifically this. This is not something the would work well in raw SQL.
If the format of all of the rows (the headers as well as all of the details) is the same, it would also be pretty easy to do it as a stored procedure.
What I would do: Do it as a join, so you will have the header data on every row, then use a reporting tool to do the grouping.
SELECT * FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.resultid -- this could be a left join if the table is not guaranteed to have entries for t1.id
INNER JOIN table2 t3 ON t1.id = t3.resultid -- etc
OR if the data is all in the same format you could do.
SELECT cola,colb FROM table1 WHERE id = #id
UNION ALL
SELECT cola,colb FROM table2 WHERE resultid = #id
UNION ALL
SELECT cola,colb FROM table3 WHERE resultid = #id
It really depends on the format you require the data in for output to the report.
If you can give a sample of how you would like the output I could probably help more.
Join all of the tables together.
select * from table_1 left join table_2 using(id) left join table_3 using(id);
Then, you'll want to roll up the columns in code to format your report as you see fit.
What I would do is open up cursors on the following queries:
SELECT * from table1 order by id
SELECT * from table1 r, table2 t where t.table1_id = r.id order by r.id
SELECT * from table1 r, table3 t where t.table1_id = r.id order by r.id
And then I would walk those cursors in parallel, printing your results. You can do this because all appear in the same order. (Note that I would suggest that while the primary ID for table1 might be named id, it won't have that name in the other tables.)
Do all the tables have the same format? If not, then if you have to have a report that can display the n different types of rows. If you are only interested in the same columns then it is easier.
Most databases have some form of dynamic SQL. In that case you can do the following:
create temporary table from
select * from table1 where rows within time frame
x integer
sql varchar(something)
x = 1
while x <= numresults {
sql = 'SELECT * from table' + CAST(X as varchar) + ' where id in (select id from temporary table'
execute sql
x = x + 1
}
But I mean basically here you are running one query on your main table to get the rows that you need, then running one query for each sub table to get rows that match your main table.
If the report requires the same 2 or 3 columns for each table you could change the select * from tablex to be an insert into and get a single result set at the end...