Find differences between two large tables in Oracle - SQL

I have two different tables, say table A and table B, in Oracle, with around 15 million records in each. Table A has columns (a, b, c, d) and Table B has columns (e, f, g, h).
The objective is to write a stored procedure to check if every record present in table A is also present in table B and vice versa. Differences between these two should be inserted into a third table.
My problem is that:
column a in Table A should be compared with the concatenation of columns e and f in Table B if column e contains a certain string (0311); if not, I have to compare it with just column f.
Column b should be compared with column g in Table B, and
I also have to compare column c in Table A with column g in Table B; if the two aren't a match, column d should be compared with column g.
What's the fastest way to do so?
For example, these two are a match:
Table A: 9353456789,03117884657,12082200003035,12082123595535
Table B: 9353456789,0311,7884657,12082200003035
or:
Table A: 9353456789,03117884657,12082200003035,12082123595535
Table B: 9353456789,0311,7884657,12082123595535
Example of records that do not need concatenation and are a match:
Table A: 9353456789,03617884657,12082200003035,12082123595535
Table B: 9353456789,0361,03617884657,12082200003035

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TableA ( a VARCHAR2(20), b VARCHAR2(20), c VARCHAR2(20), d VARCHAR2(20) );
CREATE TABLE TableB ( e VARCHAR2(20), f VARCHAR2(20), g VARCHAR2(20), h VARCHAR2(20) );
CREATE TABLE TableC ( i VARCHAR2(20), j VARCHAR2(20), k VARCHAR2(20), l VARCHAR2(20) );
INSERT INTO TableA
SELECT '9353456789','03117884657','12082200003035','12082123595535' FROM DUAL
UNION ALL SELECT '9353456789','03617884657','12082200003035','12082123595535' FROM DUAL
UNION ALL SELECT '9353456789','03617884657','12082200003034','12082123595534' FROM DUAL;
INSERT INTO TableB
SELECT '9353456789','0311','7884657','12082200003035' FROM DUAL
UNION ALL SELECT '9353456789','0311','7884657','12082123595535' FROM DUAL
UNION ALL SELECT '9353456789','0361','03617884657','12082200003035' FROM DUAL
UNION ALL SELECT '9353456789','0361','03617884657','12082200003036' FROM DUAL;
Query 1:
To insert the rows, perform an INSERT INTO ... SELECT using a FULL OUTER JOIN between both tables, with your requirements as the join condition. For rows that do not match, either all of TableA's columns (a, b, c, d) or all of TableB's columns (e, f, g, h) will be NULL, and this can be used in the WHERE clause to return only the unmatched rows. Finally, COALESCE() is used so that the returned values are not NULL.
INSERT INTO TableC
SELECT COALESCE( ta.a, tb.e ) AS i,
COALESCE( ta.b, tb.f ) AS j,
COALESCE( ta.c, tb.g ) AS k,
COALESCE( ta.d, tb.h ) AS l
FROM TableA ta
FULL OUTER JOIN
TableB tb
ON ( ta.a = tb.e
AND ta.b = CASE tb.f WHEN '0311' THEN tb.f || tb.g ELSE tb.g END
AND ( ta.c = tb.h OR ta.d = tb.h )
)
WHERE ta.a IS NULL
OR tb.e IS NULL;
Query 2:
SELECT * FROM TableC
Results:
| I | J | K | L |
|------------|-------------|----------------|----------------|
| 9353456789 | 03617884657 | 12082200003034 | 12082123595534 |
| 9353456789 | 0361 | 03617884657 | 12082200003036 |
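To check the matching logic without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module and the fiddle's sample rows. SQLite is an assumption standing in for Oracle, and because SQLite only gained FULL OUTER JOIN in version 3.39, the sketch swaps in an equivalent formulation rather than the answer's exact statement: two NOT EXISTS anti-joins (unmatched A rows, then unmatched B rows) sharing the same condition as the ON clause above.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the rows are the fiddle's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (a TEXT, b TEXT, c TEXT, d TEXT);
CREATE TABLE TableB (e TEXT, f TEXT, g TEXT, h TEXT);
CREATE TABLE TableC (i TEXT, j TEXT, k TEXT, l TEXT);
INSERT INTO TableA VALUES
  ('9353456789', '03117884657', '12082200003035', '12082123595535'),
  ('9353456789', '03617884657', '12082200003035', '12082123595535'),
  ('9353456789', '03617884657', '12082200003034', '12082123595534');
INSERT INTO TableB VALUES
  ('9353456789', '0311', '7884657', '12082200003035'),
  ('9353456789', '0311', '7884657', '12082123595535'),
  ('9353456789', '0361', '03617884657', '12082200003035'),
  ('9353456789', '0361', '03617884657', '12082200003036');
""")

# Same match rule as the FULL OUTER JOIN's ON clause.
MATCH = """
    ta.a = tb.e
    AND ta.b = CASE tb.f WHEN '0311' THEN tb.f || tb.g ELSE tb.g END
    AND (ta.c = tb.h OR ta.d = tb.h)
"""

# Rows of A with no partner in B, followed by rows of B with no partner in A.
conn.execute(f"""
INSERT INTO TableC
SELECT ta.a, ta.b, ta.c, ta.d FROM TableA ta
WHERE NOT EXISTS (SELECT 1 FROM TableB tb WHERE {MATCH})
UNION ALL
SELECT tb.e, tb.f, tb.g, tb.h FROM TableB tb
WHERE NOT EXISTS (SELECT 1 FROM TableA ta WHERE {MATCH})
""")

rows = conn.execute("SELECT * FROM TableC ORDER BY i, j, k, l").fetchall()
for row in rows:
    print(row)
```

Both formulations return each unmatched row once. The FULL OUTER JOIN version needs COALESCE because it folds both sides into one row shape, whereas here each branch selects its own table's columns directly.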

I'd do this as two statements, though they can be combined:
Select a.*
from tablea a left join tableb b on a.a =
case when b.e = 'string' then b.e || b.f else b.f end
and ...
where b.e is null
The left join returns NULLs where a row isn't found in table b, so this should bring up a list of rows in table a that are not in table b. Change the statement to a right join, select b.* and check a column of table a for NULL instead, and you'll see what's in b but not in a.
Either statement can be turned into a CREATE TABLE ... AS, which creates a new table with the results of the SELECT.
I put "and ..." where the rest of your conditions go because they are a bit confusing; you'll just need to use CASE expressions to pick which columns you want to compare/join on.
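A sketch of the two-statement approach, again with Python's sqlite3 standing in for Oracle and a subset of the fiddle's sample rows. The answer's author deliberately leaves the rest of the condition as "and ..."; purely as an assumption, this sketch fills it in with the match rule from the FULL OUTER JOIN answer. The second statement swaps the table order in a LEFT JOIN, which is equivalent to the suggested RIGHT JOIN.

```python
import sqlite3

# SQLite stands in for Oracle; two of the fiddle's rows per table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (a TEXT, b TEXT, c TEXT, d TEXT);
CREATE TABLE tableb (e TEXT, f TEXT, g TEXT, h TEXT);
INSERT INTO tablea VALUES
  ('9353456789', '03117884657', '12082200003035', '12082123595535'),
  ('9353456789', '03617884657', '12082200003034', '12082123595534');
INSERT INTO tableb VALUES
  ('9353456789', '0311', '7884657', '12082200003035'),
  ('9353456789', '0361', '03617884657', '12082200003036');
""")

# Assumption: "and ..." filled with the FULL OUTER JOIN answer's rule.
ON_CLAUSE = """
    ta.a = tb.e
    AND ta.b = CASE tb.f WHEN '0311' THEN tb.f || tb.g ELSE tb.g END
    AND (ta.c = tb.h OR ta.d = tb.h)
"""

# Rows in A but not in B: unmatched A rows come back with tb.e IS NULL.
conn.execute(f"""
CREATE TABLE only_in_a AS
SELECT ta.* FROM tablea ta LEFT JOIN tableb tb ON {ON_CLAUSE}
WHERE tb.e IS NULL
""")

# Rows in B but not in A: the same LEFT JOIN with the tables swapped,
# so the NULL check moves to a column of tablea.
conn.execute(f"""
CREATE TABLE only_in_b AS
SELECT tb.* FROM tableb tb LEFT JOIN tablea ta ON {ON_CLAUSE}
WHERE ta.a IS NULL
""")

only_a = conn.execute("SELECT * FROM only_in_a").fetchall()
only_b = conn.execute("SELECT * FROM only_in_b").fetchall()
print(only_a)
print(only_b)
```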

Related

SQL - Accessing data in 2 tables

I have two tables (table A and table B) that have a 1 to many mapping. For every record in table A, I want to check if any of its events in table B occur after 2010. For example:
Table A
ID REGISTER
A qwer
B ghlk
C thasdj
Table B
ID DATE
A 1995-01-01
A 1997-01-31
A 2006-03-15
B 2001-03-15
B 2003-04-03
B 2021-08-01
B 1995-01-01
C 2001-01-01
C 2010-01-01
Therefore, the resulting table would be:
Table C
ID Register
A qwer
C thasdj
This is because for IDs A and C, none of their events happen after 2010.
This is the script I tried, but I'm not sure why it's not working. Any help?
SELECT *
INTO Table C
FROM Table A
where ID not in(
SELECT distinct ID from Table B
where [DATE] >= 2011-01-01
You can do it with the INSERT INTO {tablename} (column list) SELECT syntax (table C must already exist):
INSERT INTO C ( ID, Register )
SELECT A.ID, A.Register
FROM A
WHERE A.ID not in (
SELECT distinct ID from B
where [DATE] >= '2011-01-01'
)
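The two fixes in this version (a quoted date literal and a table name without a space) can be checked with a small sqlite3 sketch using the question's sample rows. SQLite is an assumption standing in for SQL Server; it accepts the same [DATE] bracket quoting. The first query shows why the unquoted literal misbehaves: 2011-01-01 is parsed as integer subtraction.

```python
import sqlite3

# SQLite stands in for SQL Server; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID TEXT, Register TEXT);
CREATE TABLE B (ID TEXT, [DATE] TEXT);
CREATE TABLE C (ID TEXT, Register TEXT);
INSERT INTO A VALUES ('A', 'qwer'), ('B', 'ghlk'), ('C', 'thasdj');
INSERT INTO B VALUES
  ('A', '1995-01-01'), ('A', '1997-01-31'), ('A', '2006-03-15'),
  ('B', '2001-03-15'), ('B', '2003-04-03'), ('B', '2021-08-01'),
  ('B', '1995-01-01'), ('C', '2001-01-01'), ('C', '2010-01-01');
""")

# Unquoted, 2011-01-01 is 2011 - 1 - 1, not a date.
unquoted = conn.execute("SELECT 2011-01-01").fetchone()[0]
print(unquoted)  # 2009

# Quoted ISO dates compare correctly as text.
conn.execute("""
INSERT INTO C (ID, Register)
SELECT A.ID, A.Register
FROM A
WHERE A.ID NOT IN (
    SELECT DISTINCT ID FROM B
    WHERE [DATE] >= '2011-01-01'
)
""")
result = conn.execute("SELECT ID, Register FROM C ORDER BY ID").fetchall()
print(result)
```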
You can use NOT EXISTS for this task. Presumably your example query is contrived; however, note that you must properly delimit object names that contain spaces, are reserved words, etc., and a date value must be quoted.
select *
into TableC
from TableA a
where not exists (
select * from TableB b
where b.Id = a.Id and b.[Date] >='20110101'
);
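Here is the NOT EXISTS version as a sqlite3 sketch, with SQLite assumed as a stand-in for SQL Server. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS is swapped in, and the ISO string '2011-01-01' replaces '20110101' because the dates here are stored and compared as text. The row with a NULL Id is hypothetical; it illustrates one practical reason to prefer NOT EXISTS, namely that NOT IN returns no rows at all once its subquery produces a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The (NULL, '2015-06-01') row is hypothetical, added to show the NOT IN pitfall.
conn.executescript("""
CREATE TABLE TableA (Id TEXT, Register TEXT);
CREATE TABLE TableB (Id TEXT, [Date] TEXT);
INSERT INTO TableA VALUES ('A', 'qwer'), ('B', 'ghlk'), ('C', 'thasdj');
INSERT INTO TableB VALUES
  ('A', '2006-03-15'), ('B', '2021-08-01'), ('C', '2010-01-01'),
  (NULL, '2015-06-01');
""")

# SQLite has no SELECT ... INTO; CREATE TABLE ... AS plays the same role.
conn.execute("""
CREATE TABLE TableC AS
SELECT * FROM TableA a
WHERE NOT EXISTS (
    SELECT * FROM TableB b
    WHERE b.Id = a.Id AND b.[Date] >= '2011-01-01'
)
""")
not_exists_rows = conn.execute("SELECT * FROM TableC ORDER BY Id").fetchall()

# For comparison: NOT IN against a subquery that yields a NULL matches nothing.
not_in_rows = conn.execute("""
SELECT * FROM TableA
WHERE Id NOT IN (SELECT Id FROM TableB WHERE [Date] >= '2011-01-01')
""").fetchall()

print(not_exists_rows)
print(not_in_rows)
```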

Postgres SQL SELECT data from the table the entry exists with ID

I have the following scenario. I have 3 tables with the following structure:
TABLE A
-entry_id (PRIMARY KEY, INTEGER)
TABLE B
-entry_id (FOREIGN_KEY -> TABLE A)
-content (TEXT)
TABLE C
-entry_id (FOREIGN_KEY -> TABLE A)
-content (INTEGER)
I want to retrieve the content cell value from either table B or table C. The value can be in just one of the tables, so either table B or table C has an entry with a given entry_id.
PS: Sorry if this is a duplicate; I did not manage to find anything that matches what I need.
If I correctly understand, you need something like:
select entry_id, content::text from TABLEB where entry_id = ?
union all
select entry_id, content::text from TABLEC where entry_id = ?
If it can only exist in one table at a time, use a union
select a1.entry_id, b2.content
from TableA a1
inner join TableB b2
on a1.entry_id = b2.entry_id
union -- This removes any duplicates. Use UNION ALL to show duplicates
select a1.entry_id, c3.content::text
from TableA a1
inner join TableC c3
on a1.entry_id = c3.entry_id
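A minimal sqlite3 sketch of the UNION ALL lookup, assuming SQLite in place of Postgres: CAST(content AS TEXT) replaces Postgres's content::text so both branches return the same type, and the stored values ('hello' for entry 1, 42 for entry 2) are hypothetical. Since a value lives in only one table, UNION ALL is enough; UNION would additionally spend work removing duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: entry 1 has its content in TableB, entry 2 in TableC.
conn.executescript("""
CREATE TABLE TableA (entry_id INTEGER PRIMARY KEY);
CREATE TABLE TableB (entry_id INTEGER REFERENCES TableA(entry_id), content TEXT);
CREATE TABLE TableC (entry_id INTEGER REFERENCES TableA(entry_id), content INTEGER);
INSERT INTO TableA VALUES (1), (2);
INSERT INTO TableB VALUES (1, 'hello');
INSERT INTO TableC VALUES (2, 42);
""")

# The id is bound once per branch; CAST aligns the INTEGER column with TEXT.
LOOKUP = """
SELECT entry_id, content FROM TableB WHERE entry_id = ?
UNION ALL
SELECT entry_id, CAST(content AS TEXT) FROM TableC WHERE entry_id = ?
"""

def lookup(entry_id):
    return conn.execute(LOOKUP, (entry_id, entry_id)).fetchall()

print(lookup(1))
print(lookup(2))
```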

Hive: filter a table using another table

I am very new to Hive and SQL and I have a question about how I would go about the following:
I have table A:
Name id
Amy 1
Bob 4
Josh 9
Sam 6
And I want to filter it using values from another table (table B):
Value id
.2 4
.7 6
To get a new table that looks like table A but only contains rows with values in the id column that also appeared in the id column of table B:
Name id
Bob 4
Sam 6
So I'm assuming I would write something that started like...
CREATE TABLE Table C AS
SELECT * FROM Table A
WHERE id....
Just join it:
hive> CREATE TABLE TableC AS
> SELECT A.* FROM TableA as A,
> TableB as B
> WHERE A.id = B.id;
hive> SELECT * FROM TableC;
OK
Bob 4
Sam 6
Or try this, with explicit JOIN ... ON syntax:
hive> CREATE TABLE TableD AS
> SELECT A.* FROM TableA as A join
> TableB as B
> on A.id = B.id;
hive> SELECT * FROM TableD;
OK
Bob 4
Sam 6
For comparison, here is the same idea in Oracle syntax. Two tables were created with the columns below:
CREATE TABLE TABLE_1
( NAMES VARCHAR2(10) NOT NULL,
ID_1 NUMBER (2) NOT NULL);
CREATE TABLE TABLE_2
( VALUES_1 VARCHAR2(10) NOT NULL,
ID_1 NUMBER (2) NOT NULL);
and values were inserted into these tables. The final table should be created as:
CREATE TABLE TABLE_3 AS (
SELECT T1.NAMES, T2.ID_1 FROM TABLE_1 T1, TABLE_2 T2
WHERE T1.ID_1 = T2.ID_1);
The correct syntax for the result I wanted was:
CREATE TABLE tableC AS
SELECT tableA.*
FROM tableA LEFT SEMI JOIN tableB on (tableA.id = tableB.id);
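SQLite, used here through Python's sqlite3 as a stand-in for Hive, has no LEFT SEMI JOIN, so this sketch swaps in WHERE EXISTS, the standard SQL way to express a semi-join. The extra (0.9, 4) row in TableB is hypothetical; it shows the practical difference between the answers above: a plain JOIN repeats Bob once per matching TableB row, while a semi-join returns each TableA row at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Name TEXT, id INTEGER);
CREATE TABLE TableB (Value REAL, id INTEGER);
INSERT INTO TableA VALUES ('Amy', 1), ('Bob', 4), ('Josh', 9), ('Sam', 6);
-- (0.9, 4) is a hypothetical extra row, added to show duplicate handling.
INSERT INTO TableB VALUES (0.2, 4), (0.7, 6), (0.9, 4);
""")

# Semi-join: keep each TableA row that has at least one match in TableB.
conn.execute("""
CREATE TABLE TableC AS
SELECT A.* FROM TableA A
WHERE EXISTS (SELECT 1 FROM TableB B WHERE B.id = A.id)
""")
semi = conn.execute("SELECT * FROM TableC ORDER BY id").fetchall()

# Plain inner join: one output row per matching pair, so Bob appears twice.
joined = conn.execute("""
SELECT A.* FROM TableA A JOIN TableB B ON A.id = B.id ORDER BY A.id
""").fetchall()

print(semi)
print(joined)
```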
Create the result table:
CREATE TABLE TABLE3 (Name STRING, id INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Then insert into the new table using a JOIN:
INSERT INTO TABLE TABLE3 SELECT t1.Name, t1.Id FROM Table1 t1
JOIN Table2 t2 ON t1.id = t2.id;
That will give you the desired result.

Join Query in Hive

I want to create a table C that contains the customer_id values from table A that are not in table B. I wrote the following query, but it didn't populate any data.
create table C AS
select *
from (
select customer_id
from A al
join B bl
on al.customer_id=bl.customer_id
where bl.customer_id is null
) x;
This query shows 0 results.
SELECT a1.customer_id
FROM
A a1 LEFT OUTER JOIN
B b1 ON a1.customer_id = b1.customer_id
WHERE b1.customer_id IS NULL;
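The inner JOIN in your query only returns customers that exist in both tables, so bl.customer_id is never NULL and the WHERE clause filters out every row. A LEFT OUTER JOIN keeps the unmatched rows from A with NULLs on the B side, and those are exactly the rows you want. That should do it.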
Regards,
Dino
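A small sqlite3 sketch contrasting the question's inner join with the LEFT OUTER JOIN answer, assuming SQLite as a stand-in for Hive and hypothetical customer ids (1, 2, 3 in A; only 2 in B).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ids: customers 1 and 3 are missing from B.
conn.executescript("""
CREATE TABLE A (customer_id INTEGER);
CREATE TABLE B (customer_id INTEGER);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (2);
""")

# The question's shape: an inner JOIN never yields a NULL bl.customer_id,
# so the WHERE clause removes every row.
inner = conn.execute("""
SELECT al.customer_id FROM A al JOIN B bl ON al.customer_id = bl.customer_id
WHERE bl.customer_id IS NULL
""").fetchall()

# The answer's shape: LEFT OUTER JOIN keeps unmatched A rows with NULLs from B.
conn.execute("""
CREATE TABLE C AS
SELECT a1.customer_id FROM A a1
LEFT OUTER JOIN B b1 ON a1.customer_id = b1.customer_id
WHERE b1.customer_id IS NULL
""")
missing = conn.execute("SELECT customer_id FROM C ORDER BY customer_id").fetchall()

print(inner)    # []
print(missing)  # [(1,), (3,)]
```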

Combining sql select and Count

I have two tables
A and B
Table A:
a_pk (int)
a_name (varchar)
Table B:
b_pk (int)
a_pk (int)
b_name (varchar)
I could write a query
SELECT a.a_name, b.b_name
FROM a LEFT OUTER JOIN b ON a.a_pk = b.a_pk
and this would return a non-distinct list of everything in table a and its joined table b data. Duplicates would display for column a where different b records shared a common a_pk value.
But what I want to do is get a full list of values from table A column a_name and ADD a column that is a COUNT of the joined values of table B.
So if a_pk = 1 and a_name = test, and table b has 5 records with an a_pk value of 1, my result set would be
a_name b_count
------ -------
test 5
The query should look like this:
SELECT
a.a_name,
(
SELECT Count(b.b_pk)
FROM b
Where b.a_pk = a.a_pk
) as b_count
FROM a
SELECT a.a_name, COUNT(b.b_pk) AS b_count
FROM
A a
LEFT JOIN B b
ON a.a_pk = b.a_pk
GROUP BY a.a_pk, a.a_name
SELECT
a.a_name,
(
SELECT COUNT(1)
FROM B b
WHERE b.a_pk = a.a_pk
) AS b_count
FROM A a
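The answers differ when a row in A has no rows in B. This sqlite3 sketch, assuming SQLite as a stand-in and hypothetical rows ('test' has five b rows, 'empty' has none), compares the correlated subquery, a LEFT JOIN with COUNT(b.b_pk), and an inner JOIN with COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: a_pk 1 ('test') has five b rows, a_pk 2 ('empty') has none.
conn.executescript("""
CREATE TABLE a (a_pk INTEGER PRIMARY KEY, a_name TEXT);
CREATE TABLE b (b_pk INTEGER PRIMARY KEY, a_pk INTEGER, b_name TEXT);
INSERT INTO a VALUES (1, 'test'), (2, 'empty');
INSERT INTO b VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 1, 'z'), (13, 1, 'u'), (14, 1, 'v');
""")

# Correlated subquery: evaluated per row of a, so 'empty' gets 0.
correlated = conn.execute("""
SELECT a.a_name,
       (SELECT COUNT(b.b_pk) FROM b WHERE b.a_pk = a.a_pk) AS b_count
FROM a ORDER BY a.a_pk
""").fetchall()

# LEFT JOIN + COUNT(b.b_pk): unmatched rows survive and NULLs are not counted.
grouped = conn.execute("""
SELECT a.a_name, COUNT(b.b_pk) AS b_count
FROM a LEFT JOIN b ON a.a_pk = b.a_pk
GROUP BY a.a_pk, a.a_name ORDER BY a.a_pk
""").fetchall()

# Inner JOIN: rows of a without any b rows disappear from the result.
inner = conn.execute("""
SELECT a.a_name, COUNT(*) AS b_count
FROM a JOIN b ON a.a_pk = b.a_pk
GROUP BY a.a_pk, a.a_name ORDER BY a.a_pk
""").fetchall()

print(correlated)
print(grouped)
print(inner)
```

COUNT(b.b_pk) counts only non-NULL values, so after a LEFT JOIN an A row with no matches reports 0 rather than the 1 that COUNT(*) would give for its single NULL-padded row.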