Assume we have two tables.
TabA(Acol1,Acol2,Acol3)
TabB(Bcol1,Bcol2,Bcol3)
The requirement is to join the two tables on Acol1 = Bcol1 and, if Acol3 = 'C', to additionally join on Acol2 = Bcol2. Can this be done in a single SQL query? Is a join condition evaluated per record or per table?
One solution I came up with uses UNION, but I don't think it would be well optimized. Are there any other solutions?
Another solution I figured out:
SELECT A.*,B.* FROM TabA A
INNER JOIN TabB B ON A.Acol1 = B.BCol1
and case when A.Acol3='C' then A.ACol2 else '1' end =
case when A.Acol3='C' then B.BCol2 else '1' end ;
Is there any other solution without CASE and UNION?
Thanks in advance
If you want to join on TabA.col2 only when it is 'C', then in those cases TabB.col2 will also be 'C', since you are already joining on col1. So your output will be the same as what you get from the first join alone.
select a.*, b.* from tabA a join tabB b
on a.col1=b.col1
This should give you the same output anytime. Try creating a different scenario on values of 'C'. The result will always be a subset of your first join result.
Hmmm, after thinking about the question, I think you might just want a complicated on clause:
select a.*, b.*
from tabA a join
tabB b
on a.col1 = b.col1 and
(a.col3 <> 'C' or a.col2 = b.col2);
Note: the above assumes that a.col2 is not null (that condition is easily included if needed).
You may need to work out some examples by hand to see that the or method is equivalent to the case statement.
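One way to work through an example without doing it by hand is a small sketch using Python's built-in sqlite3 module. The sample rows below are made up for illustration, and the column names follow the question (Acol1, Bcol1, ...) rather than the shorthand used in the answers. It runs the CASE form and the OR form on the same data and also counts the rows of the plain join on the first column for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TabA (Acol1 INTEGER, Acol2 TEXT, Acol3 TEXT);
    CREATE TABLE TabB (Bcol1 INTEGER, Bcol2 TEXT, Bcol3 TEXT);
    -- Hypothetical sample rows, not taken from the original question.
    INSERT INTO TabA VALUES (1, 'x', 'C'), (2, 'y', 'D'), (3, 'z', 'C');
    INSERT INTO TabB VALUES (1, 'x', 'p'), (1, 'q', 'p'), (2, 'n', 'p'), (3, 'w', 'p');
""")

select_from = """
    SELECT A.Acol1, A.Acol2, A.Acol3, B.Bcol2
    FROM TabA A
    INNER JOIN TabB B ON A.Acol1 = B.Bcol1
"""

# The asker's CASE-based condition.
case_rows = conn.execute(select_from + """
     AND CASE WHEN A.Acol3 = 'C' THEN A.Acol2 ELSE '1' END =
         CASE WHEN A.Acol3 = 'C' THEN B.Bcol2 ELSE '1' END
    ORDER BY 1, 4
""").fetchall()

# The OR-based condition (assumes Acol3 and Acol2 are not NULL).
or_rows = conn.execute(select_from + """
     AND (A.Acol3 <> 'C' OR A.Acol2 = B.Bcol2)
    ORDER BY 1, 4
""").fetchall()

# Plain join on the first column only, for comparison.
plain_rows = conn.execute(select_from + " ORDER BY 1, 4").fetchall()

print(case_rows)
print(or_rows)
print(len(plain_rows))
```

On this data both conditional forms return the same two rows, while the plain join returns four, so the extra condition does filter rows whenever Acol3 = 'C' and the second columns differ.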
I am very new to Vertica and am looking for efficient ways to compare two tables averaging 500-800 million rows each. I have a process that reads data from a Vertica view and dumps it into SQL Server, where it is later merged into a final table. For a few large tables combined, it dumps about 3 billion rows daily. Instead of dumping all the data, I want to take a daily snapshot, compare it with the previous day's snapshot on the Vertica side only, and then push only the changed rows into SQL Server.
Let's say the previous snapshot is stored in tableA and today's snapshot in tableB. The PK on both tables is a column named OrderId.
The simplest way I can think of is:
Select * from tableB
Where OrderId NOT IN (
SELECT OrderId FROM (
SELECT * from tableA
INTERSECT
SELECT * from tableB
) unchanged -- rows that are identical in both snapshots
)
So my questions are:
Is there any other/better option in Vertica to get only the changed rows between two tables? Or should I even consider doing this comparison on the Vertica side?
How long should such a comparison take?
What should I consider to improve the performance of such a query?
If your columns have no NULL values, then a massive LEFT JOIN would seem to do what you want:
select b.*
from tableB b left join
tableA a
on b.OrderId = a.OrderId and
b.col1 = a.col1 and
. . . -- for all the columns you care about
where a.OrderId is null; -- keep only rows of tableB with no identical match in tableA
However, I think you want except:
select b.*
from tableB b
except
select a.*
from tableA a;
I imagine this would have reasonable performance.
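As a rough illustration of the EXCEPT approach, here is a sketch run against SQLite through Python's sqlite3 module rather than Vertica. The status and amount columns and all sample rows are assumptions made up for the example. It also shows one difference from the column-by-column join: set operators such as EXCEPT treat two NULLs as equal, so an unchanged row containing a NULL is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical snapshot tables; only OrderId comes from the question.
    CREATE TABLE tableA (OrderId INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    CREATE TABLE tableB (OrderId INTEGER PRIMARY KEY, status TEXT, amount INTEGER);
    -- Yesterday's snapshot.
    INSERT INTO tableA VALUES (1, 'open', NULL), (2, 'open', 20), (3, 'open', 30);
    -- Today's snapshot: 1 unchanged (with a NULL), 2 modified, 3 gone, 4 new.
    INSERT INTO tableB VALUES (1, 'open', NULL), (2, 'shipped', 20), (4, 'open', 40);
""")

# New or modified rows: everything in today's snapshot that is not
# present, column for column, in yesterday's snapshot.
changed = conn.execute("""
    SELECT * FROM tableB
    EXCEPT
    SELECT * FROM tableA
    ORDER BY 1
""").fetchall()

print(changed)
```

Note that this only finds inserted and updated rows. Deleted rows (OrderId 3 here) would need the same query with the two tables swapped.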
Do you have a primary key in the two tables?
Then my technique, for a complete Change Data Capture, is:
SELECT
'I' AS to_do
, newrows.*
FROM tb_today newrows
LEFT
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.id IS NULL
UNION ALL
SELECT
'U' AS to_do
, newrows.*
FROM tb_today newrows
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.fname <> newrows.fname
OR oldrows.lname <> newrows.lname
OR oldrows.bdate <> newrows.bdate
OR oldrows.sal <> newrows.sal
[...]
OR oldrows.lastcol <> newrows.lastcol
UNION ALL
SELECT
'D' AS to_do
, oldrows.*
FROM tb_yesterday oldrows
LEFT
JOIN tb_today newrows USING(id)
WHERE newrows.id IS NULL
;
Just leave out the last leg of the UNION SELECT if you don't want to cater for DELETEs ('D')
Good luck
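A cut-down version of this change-data-capture query can be tried out with Python's sqlite3 module. This is only a sketch: the tables have just three columns (id, fname, sal), the rows are invented, and SQLite stands in for Vertica. Like the query above, the <> comparisons assume the compared columns are not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced versions of the two snapshot tables.
    CREATE TABLE tb_yesterday (id INTEGER PRIMARY KEY, fname TEXT, sal INTEGER);
    CREATE TABLE tb_today     (id INTEGER PRIMARY KEY, fname TEXT, sal INTEGER);
    INSERT INTO tb_yesterday VALUES (1, 'Ann', 100), (2, 'Bob', 200), (3, 'Cid', 300);
    INSERT INTO tb_today     VALUES (1, 'Ann', 100), (2, 'Bob', 250), (4, 'Dee', 400);
""")

changes = conn.execute("""
    -- Inserts: in today's snapshot but not in yesterday's.
    SELECT 'I' AS to_do, newrows.id, newrows.fname, newrows.sal
    FROM tb_today newrows
    LEFT JOIN tb_yesterday oldrows USING (id)
    WHERE oldrows.id IS NULL
    UNION ALL
    -- Updates: same id, at least one differing column.
    SELECT 'U', newrows.id, newrows.fname, newrows.sal
    FROM tb_today newrows
    JOIN tb_yesterday oldrows USING (id)
    WHERE oldrows.fname <> newrows.fname
       OR oldrows.sal <> newrows.sal
    UNION ALL
    -- Deletes: in yesterday's snapshot but no longer in today's.
    SELECT 'D', oldrows.id, oldrows.fname, oldrows.sal
    FROM tb_yesterday oldrows
    LEFT JOIN tb_today newrows USING (id)
    WHERE newrows.id IS NULL
    ORDER BY 2
""").fetchall()

for row in changes:
    print(row)
```

Row 1 is unchanged and does not appear; rows 2, 3 and 4 come back tagged 'U', 'D' and 'I' respectively.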
You can also do it nicely using joins:
SELECT b.*
FROM tableB AS b
LEFT JOIN tableA AS a ON a.id = b.id
WHERE a.id IS NULL
The above query returns only the rows in tableB that have no match in tableA, i.e. rows whose id is present in both tables are skipped.
Simple problem. I have a simple SQL query, like this:
SELECT a.Col1, a.Col2, XXX
FROM table1 AS a
LEFT JOIN table2 as b
ON b.Key1 = a.Key1
What can I put in place of 'XXX' above to say something like "does a matching row in table b exist?"
i.e.: EXISTS(b) AS YesTable2
I am hoping there is a simpler solution than just using CASE...END statements.
Thanks
You could use ISNULL(b.Key1, 'XXX'), or COALESCE to pick the first non-NULL value from several candidates.
Pick any column from b that is not allowed to be NULL. If there is a NULL there, the record does not exist. If there is a value there, the record does exist. If every column in b is allowed to be NULL (rare: you should always have something that's not nullable in the primary key) you'll have to build an expression that mimics the JOIN conditions.
"I am hoping there is a simpler solution than just using CASE...END statements."
Your guess is spot-on: you can use CASE...END to compare b.Key1 to NULL, like this:
SELECT
a.Col1
, a.Col2
, CASE WHEN b.Key1 IS NOT NULL THEN 1 ELSE 0 END as YesTable2
FROM table1 AS a
LEFT JOIN table2 as b
ON b.Key1 = a.Key1
If you just want to know if a record exists, I would suggest using exists in the select clause:
SELECT a.Col1, a.Col2,
(CASE WHEN EXISTS (SELECT 1 FROM table2 b WHERE b.Key1 = a.Key1)
THEN 1 ELSE 0
END) as ExistsInTable2
FROM table1 a;
This version will guarantee that you do not get duplicated rows if there are multiple matches in the two tables.
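The duplicate-row difference between the two approaches can be seen in a small sketch using Python's sqlite3 module. The sample data is invented: table2 deliberately contains two rows for the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Key1 INTEGER, Col1 TEXT, Col2 TEXT);
    CREATE TABLE table2 (Key1 INTEGER);
    -- Hypothetical rows: key 1 matches twice in table2, key 2 never.
    INSERT INTO table1 VALUES (1, 'a', 'p'), (2, 'b', 'q');
    INSERT INTO table2 VALUES (1), (1);
""")

# LEFT JOIN plus CASE: one output row per matching table2 row.
join_rows = conn.execute("""
    SELECT a.Col1, a.Col2,
           CASE WHEN b.Key1 IS NOT NULL THEN 1 ELSE 0 END AS YesTable2
    FROM table1 AS a
    LEFT JOIN table2 AS b ON b.Key1 = a.Key1
    ORDER BY a.Key1
""").fetchall()

# EXISTS in the select list: exactly one output row per table1 row.
exists_rows = conn.execute("""
    SELECT a.Col1, a.Col2,
           CASE WHEN EXISTS (SELECT 1 FROM table2 b WHERE b.Key1 = a.Key1)
                THEN 1 ELSE 0 END AS ExistsInTable2
    FROM table1 a
    ORDER BY a.Key1
""").fetchall()

print(join_rows)
print(exists_rows)
```

With this data the join version returns the row for key 1 twice, while the EXISTS version returns each table1 row once.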
I have a table, let's call it "a" that is used in a left join in a view that involves a lot of tables. However, I only want to return rows of "a" if they also join with another table "b". So the existing code looks like
SELECT ....
FROM main ...
...
LEFT JOIN a ON (main.col2 = a.col2)
but it's returning too many rows, specifically ones where a doesn't have a match in b. I tried
SELECT ...
FROM main ...
...
LEFT JOIN (
SELECT a.col1, a.col2
FROM a
JOIN b ON (a.col3 = b.col3)) a ON (a.col2 = main.col2)
which gives me the correct results, but unfortunately "EXPLAIN PLAN" tells me that doing it this way ends up forcing a full table scan of both a and b, which is making things quite slow. One of my co-workers suggested another LEFT JOIN on b, but that doesn't work: it gives me the b row when it's present, but doesn't stop returning the rows from a that don't have a match in b.
Is there any way to put the main.col2 condition in the sub-SELECT, which would get rid of the full table scans? Or some other way to do what I want?
SELECT ...
FROM ....
LEFT JOIN ( a INNER JOIN b ON .... ) ON ....
Add a WHERE (main.col2 = a.col2).
Just do a JOIN instead of a LEFT JOIN.
What if you created a view that gets you the "a" to "b" join, then do your left joins to that view?
Select ...
From Main
Left Join a on main.col2 = a.col2
where a.col3 in (select b.col3 from b) or a.col3 is null
You may also need to add indexes on a.col3 and b.col3.
First define your query between table "a" and "b" to make sure it is returning the rows you want:
Select
a.field1,
a.field2,
b.field3
from
table_a a
JOIN table_b b
on (b.someid = a.someid)
then put that in as a sub-query of your main query:
select
m.field1,
m.field2,
m.field3,
sq.field1 as a_field1,
sq.field3 as b_field3
from
Table_main m
LEFT OUTER JOIN
(
Select
a.field1,
a.field2,
b.field3
from
table_a a
JOIN table_b b
on (b.someid = a.someid)
) sq
on (sq.field1 = m.field1)
that should do it.
Ahh, missed the performance problem note. What I usually end up doing is putting the query from the view in a stored procedure, so I can generate the sub-queries into temp tables and put indexes on them. Surprisingly faster than you would expect. :-)
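The sub-query pattern above can be checked on a toy data set with Python's sqlite3 module. This is a sketch only: the table contents are invented, and it says nothing about the query plan the original database would choose. The point is that a row of table_a without a partner in table_b is filtered out inside the sub-query, so the main row still appears but with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables using the field names from the answer above.
    CREATE TABLE Table_main (field1 INTEGER, field2 TEXT, field3 TEXT);
    CREATE TABLE table_a (field1 INTEGER, field2 TEXT, someid INTEGER);
    CREATE TABLE table_b (someid INTEGER, field3 TEXT);
    INSERT INTO Table_main VALUES (1, 'm1', 'x'), (2, 'm2', 'y'), (3, 'm3', 'z');
    -- a row 1 has a match in b; a row 2 does not.
    INSERT INTO table_a VALUES (1, 'a1', 100), (2, 'a2', 200);
    INSERT INTO table_b VALUES (100, 'b100');
""")

rows = conn.execute("""
    SELECT m.field1,
           sq.field2 AS a_field2,
           sq.field3 AS b_field3
    FROM Table_main m
    LEFT OUTER JOIN (
        -- Only rows of table_a that also exist in table_b survive here.
        SELECT a.field1, a.field2, b.field3
        FROM table_a a
        JOIN table_b b ON b.someid = a.someid
    ) sq ON sq.field1 = m.field1
    ORDER BY m.field1
""").fetchall()

for row in rows:
    print(row)
```

All three main rows are returned; only the first carries values from table_a and table_b.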
I have two tables and I need to remove rows from the first table if an exact copy of a row exists in the second table.
Does anyone have an example of how I would go about doing this in MSSQL server?
Well, at some point you're going to have to check all the columns - might as well get joining...
DELETE a
FROM a -- first table
INNER JOIN b -- second table
ON b.ID = a.ID
AND b.Name = a.Name
AND b.Foo = a.Foo
AND b.Bar = a.Bar
That should do it... there is also CHECKSUM(*), but this only helps - you'd still need to check the actual values to preclude hash-conflicts.
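To try the idea outside SQL Server, here is a sketch using Python's sqlite3 module. SQLite does not accept the DELETE ... FROM ... JOIN form, so the sketch swaps in a correlated EXISTS that compares the same columns; the table contents are invented. As with the join, the = comparisons will not match rows that contain NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ID INTEGER, Name TEXT, Foo TEXT, Bar TEXT);
    CREATE TABLE b (ID INTEGER, Name TEXT, Foo TEXT, Bar TEXT);
    -- Hypothetical rows: ID 1 is an exact copy, ID 2 differs in Foo,
    -- ID 3 exists only in the first table.
    INSERT INTO a VALUES (1, 'n1', 'f', 'x'), (2, 'n2', 'f', 'x'), (3, 'n3', 'f', 'x');
    INSERT INTO b VALUES (1, 'n1', 'f', 'x'), (2, 'n2', 'other', 'x');
""")

# Remove rows from the first table that have an exact copy in the second.
conn.execute("""
    DELETE FROM a
    WHERE EXISTS (
        SELECT 1 FROM b
        WHERE b.ID = a.ID
          AND b.Name = a.Name
          AND b.Foo = a.Foo
          AND b.Bar = a.Bar
    )
""")

remaining = conn.execute("SELECT ID FROM a ORDER BY ID").fetchall()
print(remaining)
```

Only the row with ID 1 is deleted, since it is the only one that matches on every column.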
If you're using SQL Server 2005, you can use intersect:
delete t1 from table1 t1 where exists (select t1.* intersect select * from table2)
I think the pseudocode below would do it:
DELETE FirstTable
FROM FirstTable
FULL OUTER JOIN SecondTable
ON FirstTable.Field1 = SecondTable.Field1
-- ... continue for all fields
WHERE FirstTable.Field1 IS NOT NULL
AND SecondTable.Field1 IS NOT NULL
Chris's INTERSECT post is far more elegant though and I'll use that in future instead of writing out all of the outer join criteria :)
I would try a DISTINCT query and do a union of the two tables.
You can use a scripting language like ASP/PHP to format the output into a series of INSERT statements to rebuild the table from the resulting unique data.
try this:
DELETE t1 FROM t1 INNER JOIN t2 ON t1.name = t2.name WHERE t1.id = t2.id
I have researched this and haven't found a way to run INTERSECT and MINUS operations in MS Access. Is there any way to do it?
INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.
INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT will not. You can get equivalent results with:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
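The join-based emulations can be compared with the real set operators in a database that has them. The sketch below uses Python's sqlite3 module (SQLite spells MINUS as EXCEPT) with invented tables a and b, and it assumes, as the answer does, that rows sharing a primary key are identical in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; pk stands in for the PK column of the answer.
    CREATE TABLE a (pk INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE b (pk INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (2, 'y'), (3, 'z'), (4, 'w');
""")

# INTERSECT emulated with an inner join on the primary key.
intersect_join = conn.execute("""
    SELECT DISTINCT a.* FROM a
    INNER JOIN b ON a.pk = b.pk
    ORDER BY a.pk
""").fetchall()

# MINUS emulated with a left join that keeps only unmatched rows.
minus_join = conn.execute("""
    SELECT DISTINCT a.* FROM a
    LEFT JOIN b ON a.pk = b.pk
    WHERE b.pk IS NULL
    ORDER BY a.pk
""").fetchall()

# The native set operators, for comparison.
intersect_native = conn.execute(
    "SELECT * FROM a INTERSECT SELECT * FROM b ORDER BY 1").fetchall()
minus_native = conn.execute(
    "SELECT * FROM a EXCEPT SELECT * FROM b ORDER BY 1").fetchall()

print(intersect_join, intersect_native)
print(minus_join, minus_native)
```

If two rows could share a key but differ in other columns, the join would need to compare every column, as in the no-PK variant shown earlier.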
They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. You just need to use a GROUP BY or DISTINCT if you don't have a pure one-to-one relationship going on. Otherwise, as others have mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.
Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.
No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL