How to loop through rows in two tables and create a new set based on the merged results in SQL - sql

Here is my obstacle.
I have two tables. Table A contains more rows than Table B. I have to merge the results and if Table A does not contain a row from Table B then I insert it into the new set. If however, a row from Table A contains a row with the same primary key as Table B, the new set will take the row from Table B.
Would this best be done in a cursor or is there an easier way to do this? I ask because there are 20 million rows and while I am new to sql, i've heard cursors are expensive.

Your phrasing is a little vague. It seems that you want everything from TableB and then rows from TableA that have no matching primary key in B. The following query solves this problem:
select *
from tableB union all
select *
from tableA
where tableA.pk not in (select pk from tableB)

Yep, cursors are expensive.
There's a MERGE command in later versions of SQL that will do this in one shot, but it's sooo cumbersome. Better to do it in two pieces - first:
UPDATE A SET
field1 = B.field1
,field2 = B.field2
, etc
FROM A JOIN B on B.id = A.id
Then:
INSERT A SELECT * FROM B --enumerate fields if different
WHERE B.id not in (select id FROM A)

An OUTER JOIN should do what you need and be more efficient than a cursor.
Try this query
--first get the rows that match between TableA and TableB
INSERT INTO [new set]
SELECT TableB.* --or columns of your choice
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NOT NULL
--then get the rows from TableA that don't have a match
INSERT INTO [new set]
SELECT TableA.* --you didn't say what was inserted if there was no matching row
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NULL

Related

duplicate query result when join table

I face issue about duplicate data when join table, here my sample data table I have
-- Table A
I want to join with
-- Table B
this my query notation for join both table,
select a.trans_id, name
from tableA a
inner join tableB b
on a.ID_Trans = b.trans_id
and this the result, why I get the duplicating data which should show only two lines of data, please help me to solve this case.
Firstly, as you have been told multiple times in the comments, this is working exactly as you have written, and (more importantly) as intended. You have 2 rows in tableA and those 2 rows match 2 rows in your table tableB according to the ON clause. This means that each join operation, for the each of the rows in tableA, results in 2 rows as well; thus 4 rows (2 * 2 = 4).
Considering that your table, TableA only has one column then it seems that you should be cleaning up that data and deleting the duplicates. There are plenty of examples on how to do that already (example).
Perhaps the column you show us in TableA is one many, and thus instead you have a denormalisation issue, and instead there should be another table with the details of Id_trans and a PRIMARY KEY or UNIQUE CONSTRAINT/INDEX on it. Then you would join fron that table to TableB.
Finally, what you might be after is an EXISTS, which would look like this:
SELECT B.trans_id, B.[name]
FROM dbo.TableB B
WHERE EXISTS(SELECT 1
FROM dbo.TableA A
WHERE A.ID_Trans = B.trans_id); --Odd that it's called ID_Trans in one table, and Trans_ID in another
As the comments mentioned your query does exactly what you asked it to do but I think you wanted something like:
select a.trans_id, a.name, b.name
from tableA a
inner join tableB b on a.trans_id = b.trans_id
group by a.trans_id, a.name, b.name
Since there are two rows in both table with same ID join will make them four. You can use distinct to remove duplicates:
select distinct a.trans_id, name
from tableA a
inner join tableB b
on a.id_trans = b.trans_id
But I would suggest to use exists:
select trans_id, name
from tableB b
exists (select 1 from tableA a where a.trans_id=b.trans_id)

INTERSECT two table of size 500ml rows in vertica

I am very new to vertica db and hence looking for different efficient ways for comparing two tables of average size 500ml-800ml rows in vertica. I have a process that gets the data from vertica view and dump in to SQL server for later merge to final table in sql server. for few large tables combine it is dumping about 3bl rows daily. Instead of dumping all data I want to take daily snapshot, and compare it with previous days snapshot on vertica side only and then push changed rows only in to SQL SEREVER.
lets say previous snapshot is stored in tableA, today's snapshot stored in tableB. PK on both table is column named OrderId.
Simplest way I can think of is
Select * from tableB
Where OrderId NOT IN (
SELECT * from tableA
INTERSECT
SELECT * from tbleB
)
So my questions are:
Is there any other/better option in vertica to get only changed rows between two tables? Or should I
even consider doing this compare on vertica side?
How much doing such comparison should take?
What should I consider to improve the performance of such query?
If your columns have no NULL values, then a massive LEFT JOIN would seem to do what you want:
select b.*
from tableB b left join
tableA a
on b.OrderId = a.OrderId and
b.col1 = a.col1 and
. . . -- for all the columns you care about
However, I think you want except:
select b.*
from tableB b
except
select a.*
from tableA a;
I imagine this would have reasonable performance.
Do you have a primary key in the two tables?
Then my technique, for a complete Change Data Capture, is:
SELECT
'I' AS to_do
, newrows.*
FROM tb_today newrows
LEFT
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.id IS NULL
UNION ALL
SELECT
'U' AS to_do
, newrows.*
FROM tb_today newrows
JOIN tb_yesterday oldrows
WHERE oldrows.fname <> newrows.fname
OR oldrows.lnamd <> newrows.lname
OR oldrows.bdate <> newrwos.bdate
OR oldrows.sal <> newrows.sal
[...]
OR oldrows.lastcol <> newrows.lastcol
UNION ALL
SELECT
'D' AS to_do
, oldrows.*
FROM tb_yesterday oldrows
LEFT
JOIN tb_today oldrows USING(id)
WHERE newrows.id IS NULL
;
Just leave out the last leg of the UNION SELECT if you don't want to cater for DELETEs ('D')
Good luck
you also do it nicely using joins:
SELECT b.*
FROM tableB AS b
LEFT JOIN tableA AS a ON a.id = b.id
WHERE a.id IS NULL
so above query return only diff from TableB to TableA i.e. data which is present in both table will be skipped...

Replace null with ID from right table in join

I am doing a select with right join. And I have some rows with IDS from table A but not all rows have corresponding IDs in Table B.
I wish to replace null IDS from table B with IDS of table A based on other row.
That alternative row has no null values. How can it be done?
SQL has a command for that...
ISNULL(TableB.Id, TableA.Id) AS SomeId
In the event of TableB.Id being NULL, you'll get TableA.Id as SomeId.
THIS ADDRESSES THE ORIGINAL VERSION OF THE QUESTION.
First, I strongly recommend left join over right join. It is easier to read a query when the logic is "keep all the rows in the first table (and I know what that is)".
You can use coalesce(). In a select query, you would do:
select coalesce(b.id, a.id)
from a left join
b
on a.anothercol = b.anothercol;
However, I suspect you intend:
update b
set id = (select a.id from a where a.anothercol = b.anothercol)
where id is null;

SQL Select Into Field

I want to accomplish something of the following:
Select DISTINCT(tableA.column) INTO tableB.column FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
SELECT column INTO tableB FROM tableA
SELECT INTO will create a table as it inserts new records into it. If that is not what you want (if tableB already exists), then you will need to do something like this:
INSERT INTO tableB (
column
)
SELECT DISTINCT
column
FROM tableA
Remember that if tableb has more columns that just the one, you will need to list the columns you will be inserted into (like I have done in my example).
You're pretty much there.
SELECT DISTINCT column INTO tableB FROM tableA
It's going to insert into whatever column(s) are specified in the select list, so you would need to alias your select values if you need to insert into columns of tableB that aren't in tableA.
SELECT INTO
Try the following...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
I don't know what the schema of tableB is... if table B already exists and there is no unique constraint on the column you can do as any of the others suggest here....
INSERT INTO tableB (column)Select DISTINCT(tableA.column)FROM tableA
but if you have a unique constraint on table B and it already exists you'll have to exclude those values already in table B...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
WHERE tableA.column NOT IN (SELECT /* NOTE */ tableB.column FROM tableB)
-- NOTE: Remember if there is a unique constraint you don't need the more
-- costly form of a "SELECT DISTICT" in this subquery against tableB
-- This could be done in a number of different ways - this is just
-- one version. Best version will depend on size of data in each table,
-- indexes available, etc. Always prototype different ways and measure perf.

MySQL how to remove records in a table that are in another table

I have table A with close to 15000 entries. I have a second table B with 7900 entries with a common field with table A.
I need to extract into a third temporary tableC all entries from table A except the ones that also appear in table B. Simple as it may sound, i havent found a way to do it. The closest i got was this:
INSERT INTO tableC
SELECT *
FROM tableA
INNER JOIN tableB
ON tableA.field IS NOT tableB.field
This SQL just selects everything in tableA, even entries that are in tableB.
Any ideas where i'm going wrong?
What if you try this?
INSERT INTO tableC
SELECT *
FROM tableA
WHERE tableA.field NOT IN (SELECT tableB.field FROM tableB)
Or you can try the alternate EXISTS syntax
INSERT INTO tableC
SELECT *
FROM tableA
WHERE NOT EXISTS (SELECT * FROM tableB WHERE tableB.field = tableA.field)